
Jon Masamitsu's Weblog

    March 6, 2006

When the Sum of the Parts

doesn't equal a big enough hole.

Did I mention that the low pause collector maintains free lists
for the space available in the tenured generation and
that fragmentation can become a problem? If you're using the low pause collector and things are
going just peachy for days and days and then there is a huge (relatively speaking) pause,
the cause may be fragmentation in the tenured generation.

In 1.4.2 and older releases, in order to do a young generation collection
there was a requirement that there be a contiguous chunk of
free space in the tenured generation that was big enough to hold
all of the young generation. In the GC tuning documents at

http://java.sun.com/docs/hotspot/

this is referred to as the young generation guarantee. Basically,
during a young generation collection, any data that survives may have to be
promoted into the tenured generation, and we just don't know how much is going to
survive. Being our usual conservative selves, we assumed all of it would survive, and
so there needed to be room in the tenured generation for all of it. How does this
cause a big pause? If the young generation is full and needs to be collected but
there is not a big enough contiguous chunk of free space in the tenured generation,
then a full collection of both the young generation and the tenured generation is done.
And this collection is a stop-the-world collection, not a concurrent collection, so you
generally see a pause much longer than you want to. By the way, this full
collection is also a compacting collection, so there is no fragmentation at the
end of the full collection.
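
To make the young generation guarantee concrete, here is a minimal sketch of the check as described above. It is not HotSpot code: the class name, the method name, and the idea of representing the tenured free list as a plain list of chunk sizes are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class YoungGenGuarantee {
    /**
     * Hypothetical model of the 1.4.2-style check. freeChunks holds the sizes
     * of the free chunks in the tenured generation (an assumed representation).
     */
    static boolean canCollectYoung(List<Long> freeChunks, long youngGenSize) {
        // Worst case: everything in the young generation survives and is
        // promoted, so a single contiguous chunk must be able to hold all of it.
        long largest = 0;
        for (long chunk : freeChunks) {
            largest = Math.max(largest, chunk);
        }
        return largest >= youngGenSize;
    }

    public static void main(String[] args) {
        // 115 units free in total, but the biggest single hole is only 40.
        List<Long> fragmented = Arrays.asList(30L, 25L, 40L, 20L);
        System.out.println(canCollectYoung(fragmented, 64L));
        // The same 115 units as one chunk, as after a compacting collection.
        System.out.println(canCollectYoung(Arrays.asList(115L), 64L));
    }
}
```

With these made-up numbers the fragmented heap fails the check even though the total free space would be plenty, which is the situation that forces the full collection.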

In 5.0 we added the ability in the low pause collector to start a young
generation collection and then to back out of it if there was not enough
space in the tenured generation. Being able to back out of a young generation
collection allowed us to make a couple of changes.
We now keep an average of the amount of space
that is used for promotions and use that (with some appropriate
padding to be on the safe side) as the requirement for the space
needed in the tenured generation. Additionally, we no longer need
a single contiguous chunk of space for the promotions, so we look at the total
amount of free space in the tenured generation in deciding if we can
do a young generation collection. Not having to have a single contiguous chunk of space to support
promotions is where fragmentation comes in (or rather where it doesn't come in as often).
Yes, sometimes using the averages for the
amount promoted and the total amount of free space in the tenured generation tells us to
go ahead and do a young generation collection and we get surprised (there really isn't enough
space in the tenured generation). In that situation we have to back out of the
young generation collection. It's expensive to back out of a collection, but it's doable.
That's a very long way of saying that fragmentation is less of
a problem in 5.0. It still occurs, but we have better ways of dealing with it.
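
The 5.0 decision described above can be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, the average is a plain arithmetic mean, and the padding is a simple multiplier, none of which is claimed to match what the collector actually does internally.

```java
import java.util.Arrays;
import java.util.List;

class PromotionCheck {
    private long promotedTotal;   // bytes promoted over all recorded young collections
    private long collections;     // number of young collections recorded
    private final double padding; // safety multiplier on the average (assumed value)

    PromotionCheck(double padding) {
        this.padding = padding;
    }

    /** Record how much one young generation collection promoted. */
    void recordPromotion(long bytes) {
        promotedTotal += bytes;
        collections++;
    }

    /** Average promotion size so far, padded to be on the safe side. */
    long paddedEstimate() {
        if (collections == 0) {
            return 0; // no history yet; this sketch then always attempts the collection
        }
        double average = (double) promotedTotal / collections;
        return (long) Math.ceil(average * padding);
    }

    /** Compare the padded estimate against TOTAL free space, not the largest chunk. */
    boolean shouldAttemptYoungCollection(List<Long> freeChunks) {
        long totalFree = 0;
        for (long chunk : freeChunks) {
            totalFree += chunk;
        }
        return totalFree >= paddedEstimate();
    }

    public static void main(String[] args) {
        PromotionCheck check = new PromotionCheck(1.5);
        check.recordPromotion(10);
        check.recordPromotion(20);
        check.recordPromotion(30);
        System.out.println(check.paddedEstimate()); // mean 20, padded by 1.5
        // Four chunks of 8: no single chunk is large, but the total is 32.
        System.out.println(check.shouldAttemptYoungCollection(Arrays.asList(8L, 8L, 8L, 8L)));
    }
}
```

A "go ahead" answer from a check like this is only an estimate. The attempt can still run out of space, which is the case where the collector has to back out.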

What should you do if you run into a fragmentation problem?

Try 5.0.

Or you could try a larger total heap and/or smaller young generation.
If your application is on the edge, it might give you just enough
extra space to fit all your live data. But often it just delays the problem.

Or you can try to make your application do a full, compacting collection
at a time which will not disturb your users.
If your application can go for a day without hitting a
fragmentation problem, try a System.gc() in the middle of the
night. That will compact the heap and you can hopefully go
another day without hitting the fragmentation problem. Clearly this is no help for an
application that does not have a logical "middle of the night".
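
One way to arrange the middle-of-the-night System.gc() from inside the application is sketched below. It uses java.time and ScheduledExecutorService, which are newer than the releases discussed here, and the class name, thread name, and 3:00 quiet time are assumptions for illustration. Keep in mind that System.gc() is only a request to the JVM, and that a JVM started with -XX:+DisableExplicitGC ignores it.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class NightlyCompaction {
    /** Time from 'now' until the next occurrence of 'quietTime'. */
    static Duration untilNext(LocalDateTime now, LocalTime quietTime) {
        LocalDateTime next = now.toLocalDate().atTime(quietTime);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's quiet time has already passed
        }
        return Duration.between(now, next);
    }

    /** Request a full collection once a day at roughly 'quietTime'. */
    static ScheduledExecutorService schedule(LocalTime quietTime) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "nightly-gc");
                thread.setDaemon(true); // do not keep the JVM alive just for this
                return thread;
            });
        long initialDelayMs = untilNext(LocalDateTime.now(), quietTime).toMillis();
        // A fixed 24-hour period drifts by an hour across daylight saving changes.
        scheduler.scheduleAtFixedRate(System::gc, initialDelayMs,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        schedule(LocalTime.of(3, 0)); // assumed quiet time
        // The application's normal work would continue here.
    }
}
```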


Or if by chance most
of the data in the tenured generation is read in when your application
first starts up and you can do a System.gc() after you complete
initialization, that might help by compacting all data into
a single chunk leaving the rest of the tenured generation
available for promotions. Depending on the allocation pattern
of the application, that might be adequate.

Or you might want to start the
concurrent collections earlier. The low pause collector tries to start a concurrent
collection just in time (with some safety factor) to collect the
tenured generation before it is full. If you are doing concurrent
collections and freeing enough space, you can try starting a concurrent collection sooner so that
it finishes before the fragmentation becomes a problem.
The concurrent collections don't do a compaction, but they do
coalesce adjacent free blocks so larger chunks of free space
can result from a concurrent collection.
One of the triggers for starting a concurrent collection is the amount
of free space in the tenured generation.
You can cause a concurrent collection to
occur early by setting the
option -XX:CMSInitiatingOccupancyFraction=<NNN> where NNN is the
percentage of the tenured generation that is in use above which a
concurrent collection is started. This will increase the overall time you spend
doing GC but may avoid the fragmentation problem. And this will be more
effective with 5.0 because a single contiguous chunk of space is not required
for promotions.
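
To illustrate why coalescing helps even without compaction, here is a small sketch that merges adjacent free blocks in a toy free list. The Block type, the address and size units, and the sort-then-merge approach are assumptions made for the example and do not describe how the low pause collector does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FreeListCoalescing {
    /** A free block in a toy address space: [start, start + size). */
    static final class Block {
        final long start;
        final long size;

        Block(long start, long size) {
            this.start = start;
            this.size = size;
        }

        long end() {
            return start + size;
        }
    }

    /** Merge blocks that touch each other; live objects are never moved. */
    static List<Block> coalesce(List<Block> free) {
        List<Block> sorted = new ArrayList<>(free);
        sorted.sort(Comparator.comparingLong((Block b) -> b.start));
        List<Block> merged = new ArrayList<>();
        for (Block block : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).end() == block.start) {
                Block previous = merged.get(last);
                merged.set(last, new Block(previous.start, previous.size + block.size));
            } else {
                merged.add(block);
            }
        }
        return merged;
    }

    static long largest(List<Block> blocks) {
        long max = 0;
        for (Block block : blocks) {
            max = Math.max(max, block.size);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Block> free = Arrays.asList(
            new Block(0, 16), new Block(16, 16), new Block(64, 8), new Block(32, 16));
        System.out.println(largest(free));           // biggest hole before coalescing
        System.out.println(largest(coalesce(free))); // biggest hole after coalescing
    }
}
```

In this example the three blocks covering addresses 0 to 48 become a single block, while the block at 64 stays separate because whatever sits between 48 and 64 is still in use. Only a compacting collection could close that gap.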

By the way, I've increased the comment period for my blogs. I hadn't realized it was so short.


Comments ( 4 )
  • Rob Eden Monday, March 6, 2006
    Do these types of issues also exist with parallel GC or are they unique to concurrent GC?
  • Jon Masamitsu Monday, March 6, 2006
    The issue of fragmentation is limited to the low pause (concurrent) collector. The throughput (parallel) collector does a compacting collection and the type of fragmentation discussed in this entry is not an issue. There are some extreme cases where objects are large enough and the generations are small enough to cause fragmentation in the young generation or the tenured generation (e.g. 1 object fills enough of the generation so another won't fit) but that is really extreme.

    My apologies if comments on this entry were closed for a bit. I thought I did it right, but I obviously pushed the wrong button. And apologies in advance if I still didn't get it right. I'm trying.

  • Brice Beard Tuesday, March 7, 2006
    Could you give a perf comparison between the normal promotion from new to old and using the CMS free list in best/worst case scenarios ?
  • Jon Masamitsu Wednesday, March 8, 2006
    I assume that by "normal promotion" you mean promotions done by one of the compacting collectors. Also I don't have numbers for the actual difference in the time spent doing just promotions. What I have looked at is the higher cost of a young generation collection when the tenured generation is collected with a compacting collector versus CMS.
    With the caveat that I have not measured this rigorously, in the best case the young generation collection cost for CMS is 10% to 15% higher. On average I expect the cost to be 20%-25% more with CMS. I don't have a number for the worst case. Much of this extra cost has to do with keeping track of objects that have been newly (during the current collection) promoted. During a young generation collection, the collector finds live objects and then moves them (either to elsewhere in the young generation or into the tenured generation). Objects that are moved still have to be scanned to find references to other objects. An object that is promoted (in the CMS case) can end up anywhere in the tenured generation because allocation is from a free list. There is a mechanism for keeping track of these promoted objects and that takes much of the extra time and some additional space.