What Are We Thinking?
By jonthecollector on Dec 06, 2005
Just a friendly warning. This one verges on GC stream-of-consciousness ramblings.
GC ergonomics has so far been implemented only in the throughput collector. We've been thinking about how to extend it to the low pause collector. The low pause collector is currently implemented as a collector that does some of its work while the application continues to run. It's described in
Some of the policies we used in the throughput collector will also be useful for the low pause collector, but because the low pause collector can be running at the same time as the application, there are some intriguing differences. By the way, the low pause collector does completely stop the application in order to do some parts of the collection, so some of our experience with the throughput collector is directly applicable. On the other hand, having this mix of behaviors can be interesting in and of itself.
When we were developing the low pause collector, we decided that any part of the collection that we could do while the application continued to run was good. It was free. If there are spare cycles on the machine, that's almost true. If there aren't spare cycles, then it gets fuzzy. If the collection steals cycles that the application could use, then there is a cost, especially if there is only one processor on the machine. If there is more than one processor on the machine and I'm doing GC, am I stealing cycles from the application? If I steal cycles from another process on the machine, does it become free again? We've been thinking about how to assess the load on a machine and what we should do in different load situations. That type of information may turn out to be an input to GC ergonomics.
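To make the "spare cycles" question a little more concrete, here is a minimal sketch of one way a JVM-side policy might look at machine load. It is not how the low pause collector decides anything; the class name, the methods, and the "at least one idle processor" threshold are all hypothetical. It uses the standard java.lang.management API, where getSystemLoadAverage() (Java 6 and later) returns a negative value if the platform cannot report a load average.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Hypothetical sketch: use the system load average as a rough proxy for how
 * many processors are already busy, and guess whether concurrent GC work
 * would be running on otherwise idle cycles.
 */
public class SpareCycles {

    /** Processors not accounted for by the load average, never negative. */
    static double spareProcessors(int processors, double loadAverage) {
        return Math.max(0.0, processors - loadAverage);
    }

    /**
     * Hypothetical policy: call concurrent GC work "free" only if at least
     * one whole processor appears idle. A negative load average means the
     * platform could not report one, so we conservatively answer no.
     */
    static boolean concurrentWorkLooksFree(int processors, double loadAverage) {
        if (loadAverage < 0) {
            return false;
        }
        return spareProcessors(processors, loadAverage) >= 1.0;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage(); // negative if not available
        if (load < 0) {
            System.out.println("load average not available on this platform");
        } else {
            System.out.printf("processors=%d load=%.2f spare=%.2f free=%b%n",
                    cpus, load, spareProcessors(cpus, load),
                    concurrentWorkLooksFree(cpus, load));
        }
    }
}
```

The load average counts runnable work from every process on the machine, including the JVM's own application threads, so a sketch like this cannot by itself tell whether the cycles GC would take belong to our application or to some other process. That is exactly the ambiguity raised above.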
Another aspect that we have to deal with is the connection between the young generation size and the tenured generation pause times (that is, the pauses in collecting the tenured generation). When collecting the tenured generation, we need to be aware of objects in the young generation that may be referencing (and thus keeping alive) objects in the tenured generation. In fact, we have to find those objects by examining the young generation, and the larger the young generation is, the longer it takes to find them. With the throughput collector, the time to collect the tenured generation is only distantly related to the size of the young generation. With the low pause collector the connection is stronger. If we're trying to meet a pause time goal for a pause that is part of the tenured generation collection, then maybe we should reduce the size of the young generation as well as the size of the tenured generation. But maybe not.
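One way to think about the trade-off is to pretend that the young generation's contribution to a tenured generation pause is linear in its size. The sketch below does that under loudly stated assumptions: the linear model, the parameter names, and the numbers in main are hypothetical and illustrative, not measurements or the collector's actual cost model.

```java
/**
 * Hypothetical sketch: model a tenured generation pause as a fixed cost
 * plus a per-megabyte cost for scanning the young generation, and use the
 * model to pick a young generation size that fits a pause time goal.
 */
public class YoungSizeForPauseGoal {

    /** Estimated pause: fixed cost plus young generation scan cost. */
    static double estimatePauseMs(double fixedMs, double scanMsPerMB, double youngMB) {
        return fixedMs + scanMsPerMB * youngMB;
    }

    /**
     * Largest young generation (MB) whose estimated pause meets the goal,
     * clamped to [minYoungMB, currentYoungMB] so this policy only shrinks.
     * Assumes scanMsPerMB > 0.
     */
    static double youngSizeForGoal(double goalMs, double fixedMs, double scanMsPerMB,
                                   double minYoungMB, double currentYoungMB) {
        double limit = (goalMs - fixedMs) / scanMsPerMB;
        return Math.max(minYoungMB, Math.min(currentYoungMB, limit));
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        double fixedMs = 20, scanMsPerMB = 0.5, goalMs = 100, youngMB = 256;
        System.out.println(estimatePauseMs(fixedMs, scanMsPerMB, youngMB));            // 148.0
        System.out.println(youngSizeForGoal(goalMs, fixedMs, scanMsPerMB, 16, youngMB)); // 160.0
    }
}
```

The "maybe not" is where a model like this falls short: a smaller young generation also means more frequent young collections and objects being promoted into the tenured generation sooner, and neither cost shows up in this one-line estimate.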
With the throughput collector, a collection is started when the application attempts to allocate an object and there is no room left in the Java heap. With the low pause collector, we want the collection to finish before we run out of room in the Java heap. So when does the low pause collector start a collection of the tenured generation? Just In Time, hopefully. Starting too early means that some of the capacity of the tenured generation goes unused. Starting too late means the tenured generation fills up before the concurrent collection finishes, and the collector has to fall back to stopping the application for a full collection, which makes the low pause collector not a low pause collector. In the 5.0 release we did some good work on measuring how quickly the tenured generation was being filled and used that measurement to decide when to start a collection. It's a nice self-contained problem as long as we can start a collection early enough. But if we cannot start a collection in time, then we probably need a larger tenured generation. So a failure to JIT/GC needs to feed into GC ergonomics decisions. Well, really we'd rather not wait for a JIT/GC failure before we expand the tenured generation, so there's more to think about. But not right now.
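The Just In Time idea can be sketched in a few lines: estimate how long until the tenured generation is full at the measured fill rate, and start once that time is no longer comfortably larger than a concurrent collection takes. This is a hypothetical illustration, not the 5.0 heuristic; the class, the safetyFactor and growthFactor parameters, and the simple "grow by a factor" reaction to a failure are all assumptions made for the example.

```java
/**
 * Hypothetical sketch of a "just in time" start decision for a concurrent
 * tenured generation collection, plus a naive reaction to a JIT/GC failure.
 */
public class JustInTimeStart {

    private final double safetyFactor; // hypothetical margin on cycle length
    private final double growthFactor; // hypothetical growth after a failure

    JustInTimeStart(double safetyFactor, double growthFactor) {
        this.safetyFactor = safetyFactor;
        this.growthFactor = growthFactor;
    }

    /** Seconds until the tenured generation is full at the current fill rate. */
    static double secondsUntilFull(double freeMB, double fillMBPerSec) {
        if (fillMBPerSec <= 0) {
            return Double.POSITIVE_INFINITY; // not filling, never full
        }
        return freeMB / fillMBPerSec;
    }

    /** Start when the remaining time no longer covers a padded concurrent cycle. */
    boolean shouldStartCollection(double freeMB, double fillMBPerSec, double concurrentCycleSec) {
        return secondsUntilFull(freeMB, fillMBPerSec) <= concurrentCycleSec * safetyFactor;
    }

    /** After a JIT/GC failure, suggest a larger tenured generation, capped at maxMB. */
    double tenuredSizeAfterFailure(double currentMB, double maxMB) {
        return Math.min(maxMB, currentMB * growthFactor);
    }

    public static void main(String[] args) {
        JustInTimeStart policy = new JustInTimeStart(1.5, 1.5);
        // 200 MB free at 10 MB/s leaves 20 s; an 8 s cycle padded to 12 s fits.
        System.out.println(policy.shouldStartCollection(200, 10, 8));
        // 100 MB free leaves only 10 s, so it is time to start.
        System.out.println(policy.shouldStartCollection(100, 10, 8));
        System.out.println(policy.tenuredSizeAfterFailure(512, 1024));
    }
}
```

Note that tenuredSizeAfterFailure only reacts after a collection has already failed to finish in time. A policy that expands the tenured generation before that happens would need some forecast of failure, which is the part left for later.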