What Were We Thinking?
By jonthecollector on Oct 24, 2005
Why is the pause time goal satisfied first?
GC ergonomics tries to satisfy a pause time goal before considering any throughput goal. Why not the throughput goal first? I tried both ways with a variety of applications. As one might expect, it was not black and white. In the end we chose to consider the goals in this order.
The pause time goal definitely has the potential for being the hardest goal to meet. Its dependence on heap size is complicated, and trying to meet the pause time goal without the encumbrances of either of the other goals was easier to think about. If we could meet the pause time goal, then increasing the heap to try to meet a throughput goal felt safer (i.e., the relationship between throughput and heap size is more linear, so it was easier to understand how undoing an increase would get us back to where we started).
In retrospect it also seems more natural to have the pause time goal (which pushes heap size down) competing with the throughput goal (which pushes heap size up). And only then to have the throughput goal (which again pushes the heap size up) competing with the footprint goal (which, of course, pushes the heap size down).
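The ordering described above can be illustrated with a small sketch. This is not the HotSpot implementation. The class name, the `Action` values, and the idea of comparing a single averaged pause and a single GC cost fraction against their goals are all assumptions made for illustration.

```java
// Hypothetical sketch of the goal ordering; not HotSpot's actual code.
class GoalOrderSketch {
    // Illustrative actions only: which way a goal pushes the heap size.
    enum Action { SHRINK_FOR_PAUSE, GROW_FOR_THROUGHPUT, SHRINK_FOR_FOOTPRINT }

    // gcCost and gcCostGoal are fractions of total time spent in GC (assumed units).
    static Action decide(double avgPauseMs, double pauseGoalMs,
                         double gcCost, double gcCostGoal) {
        // 1. Pause time goal is checked first: a missed goal pushes the heap down.
        if (avgPauseMs > pauseGoalMs) {
            return Action.SHRINK_FOR_PAUSE;
        }
        // 2. Only when pauses are acceptable does throughput push the heap up.
        if (gcCost > gcCostGoal) {
            return Action.GROW_FOR_THROUGHPUT;
        }
        // 3. With both goals met, the footprint goal pushes the heap down.
        return Action.SHRINK_FOR_FOOTPRINT;
    }

    public static void main(String[] args) {
        System.out.println(decide(250.0, 200.0, 0.05, 0.01));  // pause goal missed
        System.out.println(decide(150.0, 200.0, 0.05, 0.01));  // throughput goal missed
        System.out.println(decide(150.0, 200.0, 0.005, 0.01)); // both goals met
    }
}
```

Note that in the first call the throughput goal is also missed, but the pause time goal wins because it is considered first.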
A pause is a pause ...
We talked quite a bit about whether the pause time goal should apply to both the major and minor pause times. The issue was whether it would be effective to shrink the size of the old generation to reduce the major pause times. With a young generation collection you can shrink the heap more easily because there is always some place to put any live objects in the young generation (namely into the old generation). It was clear that reducing the young generation size would reduce the minor collection times (after you've paid the cost of getting the collection started and shutting it down). Well, that's true if you can ignore the fact that more frequent collections give objects less time to die. With the old generation it was much less obvious what would happen. The old generation can only be shrunk down to a size big enough to hold all the live data in the old generation. Also, the amount of free space in the old generation has an effect on the young generation collection, in that a young generation collection may need to copy objects into the old generation. In the end we decided that trying to limit both the major pauses and minor pauses with the pause time goal, while harder, was more meaningful. Would you have accepted the excuse "Yes, we missed the goal but it was a major collection pause not a minor collection pause"?
System.gc()'s. Just ignore them.
During development I initially tried to include the costs of System.gc()'s in the calculation of the averages used by GC ergonomics. In calculating the cost of collections, the frequency of collections matters. If you are having collections more often, then the cost of GC is higher. The strategy to reduce that cost is to increase the size of the heap so that collections are less frequent (i.e., since the heap is larger you can do more allocations before having to do another collection). The difficulty with System.gc()'s is that increasing the size of the heap does not in general increase the time between System.gc()'s. I tried to finesse the cost of a System.gc() by considering how full the heap was when the System.gc() happened and extrapolating to how long the interval between collections would have been. After some experimentation I found that picking how to do the extrapolation was basically picking the answer (i.e., what the GC cost would have been). I could tailor an extrapolation to fit one application, but invariably it did not fit some other applications. Basically it was too hard. So GC ergonomics ignores System.gc()'s.
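To make the frequency argument concrete, here is a minimal sketch of a GC cost calculation that skips explicit collections. It is an illustration under assumptions, not the code used by GC ergonomics: the `GcSample` type and its fields are hypothetical, and a plain ratio of pause time to total time stands in for whatever averages the real implementation keeps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and the simple ratio are assumptions.
class GcCostSketch {
    // One collection: its pause, the mutator time since the previous
    // collection, and whether it was triggered by System.gc().
    static final class GcSample {
        final double pauseMs;
        final double intervalMs;
        final boolean explicit;

        GcSample(double pauseMs, double intervalMs, boolean explicit) {
            this.pauseMs = pauseMs;
            this.intervalMs = intervalMs;
            this.explicit = explicit;
        }
    }

    // GC cost as the fraction of time spent in collections,
    // leaving out collections requested through System.gc().
    static double gcCost(List<GcSample> samples) {
        double pauseTotal = 0.0;
        double timeTotal = 0.0;
        for (GcSample s : samples) {
            if (s.explicit) {
                continue; // its interval does not depend on heap size
            }
            pauseTotal += s.pauseMs;
            timeTotal += s.pauseMs + s.intervalMs;
        }
        return timeTotal == 0.0 ? 0.0 : pauseTotal / timeTotal;
    }

    public static void main(String[] args) {
        List<GcSample> samples = new ArrayList<>();
        samples.add(new GcSample(10.0, 90.0, false));
        samples.add(new GcSample(10.0, 90.0, false));
        samples.add(new GcSample(100.0, 5.0, true)); // explicit, ignored
        System.out.println(gcCost(samples));
    }
}
```

With the same pause but a longer interval between collections (the effect a larger heap is expected to have), the computed cost drops. An explicit collection's interval is set by the application, so including it would distort the figure without giving the heap-sizing policy anything it can act on.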