Jon Masamitsu's Weblog

  • Java
    January 26, 2006

Why not a Grand Unified Garbage Collector?

Guest Author
At last count we have three garbage collectors.

  • the parallel collector

  • the low pause collector

  • the serial collector

    Why is there more than one? Actually, the usual answer applies. Specialization often
    results in
    better performance. If you're interested in more particulars about our
    garbage collectors, read on.

    All three collectors are generational (young generation and tenured generation).

    Let's do the easy comparison first: why have both a parallel collector and a serial
    collector? Parallelism has overhead. 'Nuff said? Yeah, I used
    to read comic books when I was a kid. If you don't understand that reference, ignore
    it. I'm just older.

    As you might infer from the names, the serial collector uses one thread to do the
    GC work and the parallel collector uses multiple threads to do the same. As usual,
    multiple threads doing the same tasks have to synchronize. That's pretty much it.
    On a single-CPU machine the additional cost of the synchronization means that the
    parallel collector is slower than the serial collector. On a two-CPU machine with
    a VM that has a small heap, the parallel collector is about as fast as the serial
    collector. With two CPUs and large heaps the parallel collector will usually do
    better. We keep asking ourselves if we can get rid of the serial collector and
    use the parallel collector in its place. The answer so far keeps coming back no.
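
    If you want to see which collectors your own JVM picked, the following is a
    minimal sketch rather than anything official. It assumes a JVM that exposes its
    collectors through the standard java.lang.management API; the class name
    CollectorReport is made up for this example. As far as I know, HotSpot JVMs of
    this era selected the three collectors with -XX:+UseSerialGC, -XX:+UseParallelGC
    and -XX:+UseConcMarkSweepGC (the low pause collector), but treat those flags as
    an assumption to check against the documentation for your JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original post): reports the collectors
// that the running JVM has installed, typically one per generation.
class CollectorReport {

    // Returns one line per collector: its name, how many collections it has
    // run so far, and the approximate accumulated collection time.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // The number of CPUs matters for the serial versus parallel trade-off.
        System.out.println("CPUs visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        describe().forEach(System.out::println);
    }
}
```

    The names printed differ between JVM vendors and versions, so use the output to
    confirm a choice, not as a stable identifier.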

    More interesting is the case of the low pause collector versus the parallel collector.
    Above I made the remark about specialization and better performance. This is actually
    a case of more complexity and lower performance.
    These two collectors collect the young generation using almost exactly the same
    techniques. The differences between the collectors have to do with the collection of the
    tenured generation. The low pause collector does parts of that collection
    while the application continues to run. One way to do that
    is to not move the live objects when collecting the dead objects. The application tends
    to get confused if the objects it is using move around while it is running.
    The other two collectors
    compact the heap during a collection of the tenured generation (i.e., live objects are
    moved so as to occupy one contiguous region of the heap). The low pause collector instead
    collects the dead objects and coalesces their space into blocks that are kept
    in free lists. Maintaining free lists and allocating from them takes effort, so
    it's slower than allocating from a heap that is compacted. Having applications run while a
    collection is happening means that new objects can be allocated during a collection.
    That leads to some more complexity. Also, the collection of the tenured generation can
    be interrupted for a collection of the young generation. More complexity still. The bottom line
    is that the low pause collector has shorter GC pauses but it costs performance.
    That performance difference is not huge, but it's large enough to keep us from ditching
    the parallel collector and always using the low pause collector.
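
    To make the allocation cost concrete, here is a toy model of the two styles of
    allocation. It is a sketch under heavy assumptions: the classes BumpAllocator,
    FreeListAllocator and AllocationDemo are hypothetical, addresses are plain
    integers, the free list uses a simple first-fit search, and none of this is
    how the real collectors are implemented. It only shows why allocating from a
    compacted heap is less work than allocating from free lists.

```java
import java.util.ArrayList;
import java.util.List;

// Compacted heap: free space is one contiguous region, so allocating
// is just a bounds check and a pointer bump.
class BumpAllocator {
    private final int capacity;
    private int top = 0;

    BumpAllocator(int capacity) { this.capacity = capacity; }

    // Returns the start address of the new object, or -1 if it does not fit.
    int allocate(int size) {
        if (size > capacity - top) return -1;
        int address = top;
        top += size;
        return address;
    }
}

// Non-compacted heap: free space is a list of blocks that has to be
// searched (first fit in this sketch) and then split.
class FreeListAllocator {
    private final List<int[]> freeBlocks = new ArrayList<>(); // each entry is {start, size}
    int blocksExamined = 0; // counts the search work done so far

    void addFreeBlock(int start, int size) {
        freeBlocks.add(new int[] {start, size});
    }

    // Returns the start address of the new object, or -1 if no block is big enough.
    int allocate(int size) {
        for (int i = 0; i < freeBlocks.size(); i++) {
            blocksExamined++;
            int[] block = freeBlocks.get(i);
            if (block[1] >= size) {
                int address = block[0];
                block[0] += size; // keep the unused tail of the block on the list
                block[1] -= size;
                if (block[1] == 0) freeBlocks.remove(i);
                return address;
            }
        }
        return -1;
    }
}

class AllocationDemo {
    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(100);
        System.out.println(bump.allocate(30)); // prints 0
        System.out.println(bump.allocate(30)); // prints 30

        FreeListAllocator freeList = new FreeListAllocator();
        freeList.addFreeBlock(0, 10);
        freeList.addFreeBlock(20, 10);
        freeList.addFreeBlock(40, 40);
        System.out.println(freeList.allocate(30));   // prints 40
        System.out.println(freeList.blocksExamined); // prints 3
    }
}
```

    The bump allocator does the same small amount of work for every request, while
    the free list allocator had to look at three blocks before it found room.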

    And last but not least, can we replace the serial collector with the low pause
    collector? Very tempting. The serial collector is used by default on desktop machines.
    We expect those to have 1 or 2 CPUs and to be running applications that need tens
    of megabytes of Java heap as opposed to hundreds of megabytes. With small heaps the differences
    in collection times tend to matter less. Even if the low pause collector were
    10% slower than the serial collector, the difference between, for example, 70 ms
    and 77 ms often isn't large enough to matter. It would probably be a done deal except
    that the low pause collector has a larger memory footprint. It has additional data
    structures that it uses (for example, to keep track of what references are being
    changed by the running application while a collection is ongoing). It also usually
    needs a larger Java heap to run an application. Recall that the low pause collector
    uses free lists to keep track of the available space in the heap. Because it does not
    compact the heap, fragmentation can become a problem. The best bet is that we'll replace
    the serial collector with the low pause collector some day but not just yet.
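
    The fragmentation point can be illustrated with another toy model. Again this is
    only a sketch: the class FragmentationDemo and its methods are hypothetical, the
    heap is 100 abstract units, and live and free blocks simply alternate. It shows
    how a heap managed with free lists can have plenty of free space in total and
    still be unable to satisfy a request that a compacted heap of the same size
    could.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a heap is described only by its free blocks,
// each stored as {start, size}.
class FragmentationDemo {

    // Builds a heap in which free and live blocks of equal size alternate,
    // starting with a free block at address 0.
    static List<int[]> checkerboard(int heapSize, int blockSize) {
        List<int[]> free = new ArrayList<>();
        for (int start = 0; start + blockSize <= heapSize; start += 2 * blockSize) {
            free.add(new int[] {start, blockSize});
        }
        return free;
    }

    static int totalFree(List<int[]> free) {
        int total = 0;
        for (int[] block : free) total += block[1];
        return total;
    }

    static int largestBlock(List<int[]> free) {
        int largest = 0;
        for (int[] block : free) largest = Math.max(largest, block[1]);
        return largest;
    }

    // Without compaction a request needs a single block that is big enough.
    static boolean canAllocate(List<int[]> free, int size) {
        return largestBlock(free) >= size;
    }

    // Models compaction: live objects slide to the bottom of the heap,
    // leaving all free space as one block at the top.
    static List<int[]> compact(List<int[]> free, int heapSize) {
        int total = totalFree(free);
        List<int[]> result = new ArrayList<>();
        if (total > 0) result.add(new int[] {heapSize - total, total});
        return result;
    }

    public static void main(String[] args) {
        List<int[]> fragmented = checkerboard(100, 10);
        System.out.println(totalFree(fragmented));       // prints 50
        System.out.println(largestBlock(fragmented));    // prints 10
        System.out.println(canAllocate(fragmented, 30)); // prints false

        List<int[]> compacted = compact(fragmented, 100);
        System.out.println(canAllocate(compacted, 30));  // prints true
    }
}
```

    With half of the heap free, the fragmented heap still cannot place a 30-unit
    object, which is one reason a collector that does not compact tends to need a
    larger Java heap for the same application.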
