So What's It Worth to You?
By jonthecollector on Feb 15, 2006
Well, you're not the only one. That question came up around here recently.
There are techniques for doing pauseless garbage collection. The problem is that it costs in terms of application performance or additional Java heap size. Threads in an application read and write objects. The garbage collector reads and writes (i.e., frees or moves) objects. All those reads and writes have to have some coordination. "Pauseless" GC very often means that the application as a whole is not paused, but individual threads are paused one at a time. Imagine two application threads that are furiously playing hot potato with an object. As soon as thread 1 passes the object A to thread 2, thread 1 forgets about A and thread 2 realizes it has A. Then thread 2 immediately passes it back to thread 1 with the analogous result. A pauseless collector pauses thread 1 to look at what objects it is holding, then restarts thread 1 (so only one thread is paused at a time) and pauses thread 2 to examine it. Did object A get passed from thread 2 to thread 1 in that window during which both were executing? If it did and that is the only reference to object A, then the collector will not know that object A is live without doing more work.
I'm going to skip the details here. The point of this note is not to delve into the particular difficulties of pauseless GC, but rather to give you an idea of why it can be more costly.
Also I'm going to consider collectors that compact objects (i.e., move them). Mostly because it's an easier example to talk about. Some collectors don't move objects during a collections. Not moving objects simplifies some aspects of a collection but has its own complications (e.g., maintaining free lists of objects, splitting and coalescing blocks of free space, and fragmentation).
If the collector does compact the objects, it has to be sure that all references to live objects point to their new locations. A common way to catch object A when it has slipped through the window is to use what is fondly referred to as a read barrier. Any read of an object by the application goes through code (the read barrier) that looks at the object and decides whether something special has to be done. The "something special" depends on the particulars of the collector. Without getting into those particulars just having to do extra stuff on every read has got to hurt. Yes, I'm really over simplifying this example, but what can you expect from my really simple mind.
So what's pauseless GC worth to you (i.e., in terms of the extra space and performance costs, the extra complexity, maybe special hardware)? It's definitely worth a bunch to some. But for many it isn't really necessary to have truly pauseless GC. Shorter is good enough.
Why can't those guys do a pauseless garbage collector?
Would "pauseless garbage collection" give us the biggest bang for the buck (i.e., most benefits to most user)? When we were deciding what to do for the next release, we asked ourselves that question. And we decided there were things we should do before "pauseless GC". Did I mention that I recently worked on the parallel collection of the tenured generation for the throughput collector? I'll tell you about it some time.