Jon Masamitsu's Weblog

  • Java
    February 15, 2006

So What's It Worth to You?

Guest Author
Have you ever gone to your favorite restaurant, had a great meal and sat back with your
second cup of coffee and asked "Why can't those guys do a pausless garbage collector?
How hard can it be?"

Well, you're not the only one. That question came up around here recently.

There are techniques for doing pauseless garbage collection. The problem is
that it costs in terms of application performance or additional Java heap size.
Threads in an application read and
write objects. The garbage collector reads and writes (i.e., frees or moves) objects.
All those reads and writes have to have some coordination. "Pauseless" GC very often means
that the application as a whole is not paused, but individual threads are paused one at a time.
Imagine two application threads
that are furiously playing hot potato with an object. As soon as thread 1 passes the object A to thread 2,
thread 1 forgets about A and thread 2 realizes it has A. Then thread 2 immediately passes it back
to thread 1 with the analogous result. A pauseless collector pauses thread 1 to look at what objects
it is holding, then restarts thread 1 (so only one thread is paused at a time) and pauses thread 2
to examine it. Did object A get passed from thread 2 to thread 1 in that window during which both
were executing? If it did and that is the only reference to object A, then the collector will not
know that object A is live without doing more work.

I'm going to skip the details here.
The point of this note is not to delve into the particular
difficulties of pauseless GC, but rather to give you an idea of why it can be more costly.

Also I'm going to consider collectors that compact objects (i.e., move them). Mostly because it's
an easier example to talk about. Some collectors don't move objects
during a collections. Not moving objects simplifies
some aspects of a collection but has its own complications (e.g., maintaining free
lists of objects, splitting and coalescing blocks of free space, and fragmentation).

If the collector does compact the objects, it has to be sure that all references
to live objects point to their new locations.
A common way to catch object A when it has slipped through the window is to use
what is fondly referred to as a read barrier. Any read of an object by the application goes
through code (the read barrier) that looks at the object and decides whether something special
has to be done. The "something special" depends on the particulars of the collector.
Without getting into those particulars just having to do extra stuff on every read has got to hurt.
Yes, I'm really over simplifying this example, but what can you expect from my really simple mind.

So what's pauseless GC worth to you (i.e., in terms of the extra space and performance costs,
the extra complexity, maybe special hardware)? It's definitely worth a bunch to some. But for
many it isn't really necessary to have truly pauseless GC. Shorter is good enough.

Why can't those guys do a pauseless garbage collector?

Would "pauseless garbage collection" give us the biggest bang for the buck (i.e., most
benefits to most user)? When we were deciding what
to do for the next release, we asked ourselves that question. And we decided there were things
we should do before "pauseless GC". Did I mention that I recently worked on the
parallel collection of the tenured generation for the throughput collector? I'll tell you about it some time.

Join the discussion

Comments ( 4 )
  • Brice Beard Thursday, February 16, 2006
    Hi Jon,
    Do you mind explaining again a previous article ?
    I'm missing something. In the text below:
    -a thread gets 50 TLAB's during a epoch: is that new TLAB // or free TLAB // the same TLAB ?
    - 50 TLAB per thread, half used in an epoch, 1/2 a TLAB not used ? don't get the calculation
    Each thread starts with a small TLAB. Between the end of the last young generation collection and the start of the next (let me call that period an epoch), we keep track of the number of TLAB's a thread has used. We also know the size of the TLAB's for each thread. Averages for each thread are maintained for these two numbers (number and size of TLAB's). These averages are weighted toward the most recent epoch. Based on these averages the sizes of the TLAB's are adjusted so that a thread gets 50 TLAB's during a epoch. Why 50? All things being equal we figured that a thread would have used half its TLAB by the end of the epoch. Per thread that gave us 1/2 a TLAB not used (out of the magic 50) for a wastage of 1%. That seemed acceptable. Also if the young generation was not large enough to provide the desired TLAB's, the size of the young generation would be increased in order to make it so (within the usual heap size constraints, of course).
  • Haug Bürger Monday, February 20, 2006
    We decided against Java because the business logic inside a JVM is not able to garantee ansers within 250 milli seconds. This is a very long time for moderne computers but Java does not match the requirement only because of the garbage collector. And the worse thing is the more you cache the longer you wait. The pause time depends at least O(n\^2) on the number of objects. This is inacceptable for many server processes.
    And what really bugs me is that you do not have a choice. There is a trade off between answering time and performance, indeed. Fore some applications I might want to go with lower performance and predictable ansering times, but SUN offers no choice.
    BEA has announced a real time application server somethimes this year. They may have spend more time making it possible than discussing its drawbacks, let's see.
  • Jon Masamitsu Tuesday, February 21, 2006

    The target is for a thread to get 50 new TLAB's per epoch. A thread gets a TLAB, allocates out of it until it is full, and then gets a new one.

    The expectation is to leave 1/2 of 1 TLAB unused per epoch. Sorry that was not clear. It probably should have said "used half its last TLAB".

  • Jon Masamitus Tuesday, February 21, 2006

    If you are looking for a guarantee, take a look at
    the Sun's real-time system.


    My reading of the BEA product is that it offers to satisfy a pause time goal a large fraction of the time for a given heap size and on certain platforms. Yes, it will be interesting to see what comes out. For customers that want lower pause times Sun offers a collector that does parts of the collection of the tenured generation concurrently. The young generation is still collected with a stop-the-world collector, but in my experience it's the tenured generation collection pauses that are the longer pauses. We've worked with several telecommunications applications where this lower pause collector has worked well. Add the command line flag -XX:+UseConcMarkSweepGC to get this lower pause collector. More information for it can be found at


    in one of the "Tuning Garbage Collection ..." documents.

Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.