So What's It Worth to You?

Have you ever gone to your favorite restaurant, had a great meal and sat back with your second cup of coffee and asked "Why can't those guys do a pausless garbage collector? How hard can it be?"

Well, you're not the only one. That question came up around here recently.

There are techniques for doing pauseless garbage collection. The problem is that it costs in terms of application performance or additional Java heap size. Threads in an application read and write objects. The garbage collector reads and writes (i.e., frees or moves) objects. All those reads and writes have to have some coordination. "Pauseless" GC very often means that the application as a whole is not paused, but individual threads are paused one at a time. Imagine two application threads that are furiously playing hot potato with an object. As soon as thread 1 passes the object A to thread 2, thread 1 forgets about A and thread 2 realizes it has A. Then thread 2 immediately passes it back to thread 1 with the analogous result. A pauseless collector pauses thread 1 to look at what objects it is holding, then restarts thread 1 (so only one thread is paused at a time) and pauses thread 2 to examine it. Did object A get passed from thread 2 to thread 1 in that window during which both were executing? If it did and that is the only reference to object A, then the collector will not know that object A is live without doing more work.

I'm going to skip the details here. The point of this note is not to delve into the particular difficulties of pauseless GC, but rather to give you an idea of why it can be more costly.

Also I'm going to consider collectors that compact objects (i.e., move them). Mostly because it's an easier example to talk about. Some collectors don't move objects during a collections. Not moving objects simplifies some aspects of a collection but has its own complications (e.g., maintaining free lists of objects, splitting and coalescing blocks of free space, and fragmentation).

If the collector does compact the objects, it has to be sure that all references to live objects point to their new locations. A common way to catch object A when it has slipped through the window is to use what is fondly referred to as a read barrier. Any read of an object by the application goes through code (the read barrier) that looks at the object and decides whether something special has to be done. The "something special" depends on the particulars of the collector. Without getting into those particulars just having to do extra stuff on every read has got to hurt. Yes, I'm really over simplifying this example, but what can you expect from my really simple mind.

So what's pauseless GC worth to you (i.e., in terms of the extra space and performance costs, the extra complexity, maybe special hardware)? It's definitely worth a bunch to some. But for many it isn't really necessary to have truly pauseless GC. Shorter is good enough.

Why can't those guys do a pauseless garbage collector?

Would "pauseless garbage collection" give us the biggest bang for the buck (i.e., most benefits to most user)? When we were deciding what to do for the next release, we asked ourselves that question. And we decided there were things we should do before "pauseless GC". Did I mention that I recently worked on the parallel collection of the tenured generation for the throughput collector? I'll tell you about it some time.


Hi Jon, Do you mind explaining again a previous article ? thanks, brice I'm missing something. In the text below: -a thread gets 50 TLAB's during a epoch: is that new TLAB // or free TLAB // the same TLAB ? - 50 TLAB per thread, half used in an epoch, 1/2 a TLAB not used ? don't get the calculation Each thread starts with a small TLAB. Between the end of the last young generation collection and the start of the next (let me call that period an epoch), we keep track of the number of TLAB's a thread has used. We also know the size of the TLAB's for each thread. Averages for each thread are maintained for these two numbers (number and size of TLAB's). These averages are weighted toward the most recent epoch. Based on these averages the sizes of the TLAB's are adjusted so that a thread gets 50 TLAB's during a epoch. Why 50? All things being equal we figured that a thread would have used half its TLAB by the end of the epoch. Per thread that gave us 1/2 a TLAB not used (out of the magic 50) for a wastage of 1%. That seemed acceptable. Also if the young generation was not large enough to provide the desired TLAB's, the size of the young generation would be increased in order to make it so (within the usual heap size constraints, of course).

Posted by Brice Beard on February 15, 2006 at 05:29 PM PST #

We decided against Java because the business logic inside a JVM is not able to garantee ansers within 250 milli seconds. This is a very long time for moderne computers but Java does not match the requirement only because of the garbage collector. And the worse thing is the more you cache the longer you wait. The pause time depends at least O(n\^2) on the number of objects. This is inacceptable for many server processes. And what really bugs me is that you do not have a choice. There is a trade off between answering time and performance, indeed. Fore some applications I might want to go with lower performance and predictable ansering times, but SUN offers no choice. BEA has announced a real time application server somethimes this year. They may have spend more time making it possible than discussing its drawbacks, let's see.

Posted by Haug Bürger on February 20, 2006 at 12:05 AM PST #


The target is for a thread to get 50 new TLAB's per epoch. A thread gets a TLAB, allocates out of it until it is full, and then gets a new one.

The expectation is to leave 1/2 of 1 TLAB unused per epoch. Sorry that was not clear. It probably should have said "used half its last TLAB".

Posted by Jon Masamitsu on February 21, 2006 at 04:46 AM PST #


If you are looking for a guarantee, take a look at the Sun's real-time system.

My reading of the BEA product is that it offers to satisfy a pause time goal a large fraction of the time for a given heap size and on certain platforms. Yes, it will be interesting to see what comes out. For customers that want lower pause times Sun offers a collector that does parts of the collection of the tenured generation concurrently. The young generation is still collected with a stop-the-world collector, but in my experience it's the tenured generation collection pauses that are the longer pauses. We've worked with several telecommunications applications where this lower pause collector has worked well. Add the command line flag -XX:+UseConcMarkSweepGC to get this lower pause collector. More information for it can be found at

in one of the "Tuning Garbage Collection ..." documents.

Posted by Jon Masamitus on February 21, 2006 at 05:28 AM PST #

Post a Comment:
Comments are closed for this entry.



« July 2016