Why is the pause time goal satisfied first?
GC ergonomics tries to satisfy a pause time goal before considering any throughput goal.
Why not the throughput goal first? I tried both ways with a variety of applications.
As one might expect, the results were not black and white. In the end we chose to consider the
goals in this order.
The pause time goal definitely
has the potential for being the hardest goal to meet. Its dependence on heap size
is complicated, and trying to meet the pause time goal without the encumbrances of either
of the other goals was easier to think about. If we could meet the pause time goal, then
increasing the heap to try to meet a throughput goal felt safer (i.e., the relationship
between throughput and heap size is more linear, so it was easier to understand
how undoing an increase would get us back to where we started).
In retrospect it also seems more natural
to have the pause time goal (which pushes heap size down) competing with the throughput
goal (which pushes heap size up), and only then to have the throughput goal (which again
pushes the heap size up) competing with the footprint goal (which, of course, pushes the
heap size down).
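The ordering described above can be sketched in a few lines. This is only an illustration of the priority order, not the HotSpot code: the class, enum, and parameter names are hypothetical, and expressing the throughput goal as a maximum fraction of time spent in GC is an assumption made for the sketch.

```java
// Hypothetical sketch of the goal ordering; none of these names come from HotSpot.
final class GoalOrderSketch {
    enum Action { SHRINK_FOR_PAUSE, GROW_FOR_THROUGHPUT, SHRINK_FOR_FOOTPRINT }

    // pauseMs:     observed average pause time
    // pauseGoalMs: desired maximum pause time
    // gcCost:      observed fraction of time spent in GC (0.0 to 1.0)
    // gcCostGoal:  assumed form of the throughput goal (maximum GC fraction)
    static Action decide(double pauseMs, double pauseGoalMs,
                         double gcCost, double gcCostGoal) {
        // 1. The pause time goal is considered first; missing it pushes the heap down.
        if (pauseMs > pauseGoalMs) {
            return Action.SHRINK_FOR_PAUSE;
        }
        // 2. Only when pauses are acceptable does throughput push the heap up.
        if (gcCost > gcCostGoal) {
            return Action.GROW_FOR_THROUGHPUT;
        }
        // 3. With both goals met, the footprint goal pushes the heap down again.
        return Action.SHRINK_FOR_FOOTPRINT;
    }
}
```

Note that when both the pause time and throughput goals are being missed, the sketch shrinks the heap, which is what considering the pause time goal first amounts to.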
A pause is a pause ...
We talked quite a bit about whether the pause time goal should apply to both the major
and minor pause times. The issue was whether it would be effective to shrink the
size of the old generation to reduce the major pause times. With a young generation
collection you can shrink the heap more easily because there is always some place to
put any live objects in the young generation (namely into the old generation). It was clear
that reducing the young generation size would reduce the minor collection times
(after you've paid the cost of getting the collection started and shutting it down).
Well, that's true if you can ignore the fact that more frequent collections give objects less
time to die. With the old generation it was much less obvious what would happen. The old generation
can only be shrunk down to a size big enough to hold all the live data in the old generation.
Also the amount of free space in the old generation has an effect on the young generation
collection in that young generation collection may need to copy objects into the old
generation. In the end we decided that trying to limit both the major pauses and minor pauses with the
pause time goal, while harder, was more meaningful. Would you have accepted the
excuse "Yes, we missed the goal, but it was a major collection pause, not a minor
collection pause"?
System.gc()'s. Just ignore them.
During development I initially tried to include the costs of System.gc()'s in the
calculation of the averages used by GC ergonomics. In calculating the cost of
collections, the frequency of collections matters. If you are
having collections more often, then the cost of GC is higher.
The strategy to reduce that cost is to increase the size of the heap so that collections
are less frequent (i.e., since the heap is larger you can do more allocations
before having to do another collection). The difficulty
with System.gc()'s is that increasing the size of the heap does not in general increase
the time between System.gc()'s. I tried to finesse the cost of a System.gc() by considering
how full the heap was when the System.gc() happened and extrapolating to how long the interval
between collections would have been. After some experimentation I found that picking how
to do the extrapolation was basically picking the answer (i.e., what the GC cost would have
been). I could tailor an extrapolation to fit one application, but invariably it did not fit
some other applications. Basically it was too hard. So GC ergonomics ignores System.gc()'s.
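To make the extrapolation problem concrete, here is a minimal sketch. It assumes GC cost is modeled as pause time divided by pause time plus the interval between collections; that formula, the two extrapolation rules, and all names are hypothetical and chosen only for illustration.

```java
// Hypothetical sketch; the cost model and both extrapolations are assumptions.
final class SystemGcCostSketch {
    // Assumed cost model: fraction of elapsed time spent in the collection.
    static double gcCost(double pauseSec, double intervalSec) {
        return pauseSec / (pauseSec + intervalSec);
    }

    // Extrapolation A: the heap keeps filling at the rate observed so far.
    // occupancy is the fraction of the heap in use (0 < occupancy <= 1)
    // when System.gc() was called, elapsedSec seconds after the last collection.
    static double extrapolateLinear(double elapsedSec, double occupancy) {
        return elapsedSec / occupancy;
    }

    // Extrapolation B: an equally arbitrary rule that assumes the heap
    // would have filled more slowly than the observed rate suggests.
    static double extrapolateSlower(double elapsedSec, double occupancy) {
        return elapsedSec / (occupancy * occupancy);
    }
}
```

For a 0.5 second System.gc() that arrives 10 seconds after the previous collection with the heap one quarter full, rule A extrapolates an interval of 40 seconds and rule B an interval of 160 seconds, so the computed GC cost differs by roughly a factor of four. Nothing in the measurements favors one rule over the other, which is the sense in which picking the extrapolation was picking the answer.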