When the Sum of the Parts

doesn't equal a big enough hole.

Did I mention that the low pause collector maintains free lists for the space available in the tenured generation and that fragmentation can become a problem? If you're using the low pause collector and things are going just peachy for days and days and then there is a huge (relatively speaking) pause, the cause may be fragmentation in the tenured generation.

In 1.4.2 and older releases, a young generation collection required a contiguous chunk of free space in the tenured generation that was big enough to hold all of the young generation. In the GC tuning documents at

http://java.sun.com/docs/hotspot/

this is referred to as the young generation guarantee. Basically, during a young generation collection any data that survives may have to be promoted into the tenured generation, and we just don't know how much is going to survive. Being our usual conservative selves, we assumed all of it would survive, so there needed to be room in the tenured generation for all of it. How does this cause a big pause? If the young generation is full and needs to be collected but there is not enough room in the tenured generation, then a full collection of both the young generation and the tenured generation is done. This collection is a stop-the-world collection, not a concurrent collection, so you generally see a pause much longer than you want. By the way, this full collection is also a compacting collection, so there is no fragmentation at the end of it.
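To make the guarantee concrete, here is a minimal sketch of the check as described above. It is a toy model, not HotSpot code: the class name, the method name, and the idea of passing free chunks as an array of sizes are all assumptions made for illustration.

```java
// Toy model of the 1.4.2 "young generation guarantee" described above.
// All names are hypothetical; this is not how HotSpot is written.
final class YoungGenGuarantee {

    /**
     * @param freeChunkSizes sizes in bytes of the free chunks in the tenured generation
     * @param youngGenSize   size in bytes of the whole young generation
     * @return true if a young generation collection may go ahead
     */
    static boolean canCollectYoung(long[] freeChunkSizes, long youngGenSize) {
        long largest = 0;
        for (long size : freeChunkSizes) {
            largest = Math.max(largest, size);
        }
        // Worst case: everything in the young generation survives and is
        // promoted, so one contiguous chunk has to be able to hold all of it.
        return largest >= youngGenSize;
    }

    public static void main(String[] args) {
        long youngGen = 64L << 20;                              // 64 MB young generation
        long[] fragmented = {40L << 20, 40L << 20, 40L << 20};  // 120 MB free, largest chunk 40 MB
        long[] compacted = {120L << 20};                        // same free space in one chunk

        System.out.println(canCollectYoung(fragmented, youngGen)); // false: falls back to a full collection
        System.out.println(canCollectYoung(compacted, youngGen));  // true
    }
}
```

Note that the fragmented heap has plenty of free space in total. It fails the check only because no single chunk is large enough, which is exactly the situation that leads to the surprise long pause.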

In 5.0 we added the ability in the low pause collector to start a young generation collection and then to back out of it if there was not enough space in the tenured generation. Being able to back out of a young generation collection allowed us to make a couple of changes. We now keep an average of the amount of space that is used for promotions and use that (with some appropriate padding to be on the safe side) as the requirement for the space needed in the tenured generation. Additionally, we no longer need a single contiguous chunk of space for the promotions, so we look at the total amount of free space in the tenured generation in deciding if we can do a young generation collection. Not having to have a single contiguous chunk of space to support promotions is where fragmentation comes in (or rather where it doesn't come in as often). Yes, sometimes the average amount promoted and the total amount of free space in the tenured generation tell us to go ahead and do a young generation collection and we get surprised (there really isn't enough space in the tenured generation). In that situation we have to back out of the young generation collection. It's expensive to back out of a collection, but it's doable. That's a very long way of saying that fragmentation is less of a problem in 5.0. It still occurs, but we have better ways of dealing with it.
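The 5.0 decision can be sketched the same way. The text above says only that "an average" with "some appropriate padding" is used, so the exponentially weighted average, the weight, and the padding multiplier below are assumptions chosen for illustration, and every name is hypothetical.

```java
// Toy model of the 5.0 check: compare a padded average of past promotions
// against the TOTAL free space in the tenured generation (no contiguity
// requirement). The averaging scheme and constants are assumptions.
final class PromotionEstimate {
    private final double weight;   // weight given to the newest sample (assumed)
    private final double padding;  // safety multiplier on the average (assumed)
    private double average;        // bytes promoted per young collection
    private boolean seeded;

    PromotionEstimate(double weight, double padding) {
        this.weight = weight;
        this.padding = padding;
    }

    /** Record how many bytes the last young collection promoted. */
    void record(long promotedBytes) {
        if (!seeded) {
            average = promotedBytes;
            seeded = true;
        } else {
            average = weight * promotedBytes + (1.0 - weight) * average;
        }
    }

    /** Space we want available in the tenured generation before starting. */
    long paddedEstimate() {
        return (long) Math.ceil(average * padding);
    }

    /**
     * True if a young collection should be attempted. This is only a
     * prediction: if more survives than expected, the collection still
     * has to be backed out.
     */
    boolean canAttemptYoungCollection(long totalFreeInTenured) {
        return totalFreeInTenured >= paddedEstimate();
    }
}
```

For example, with `new PromotionEstimate(0.5, 1.5)` and recorded promotions of 10 and then 20 units, the average is 15 and the padded requirement is 23, so 23 units of total free space pass the check and 22 do not.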

What should you do if you run into a fragmentation problem?

Try 5.0.

Or you could try a larger total heap and/or a smaller young generation. If your application is on the edge, it might give you just enough extra space to fit all your live data. But often it just delays the problem.

Or you can try to make your application do a full, compacting collection at a time which will not disturb your users. If your application can go for a day without hitting a fragmentation problem, try a System.gc() in the middle of the night. That will compact the heap and you can hopefully go another day without hitting the fragmentation problem. Clearly this is no help for an application that does not have a logical "middle of the night".
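One way to arrange the middle-of-the-night System.gc() is a daily scheduled task. The sketch below assumes a modern JDK (it uses java.time, which did not exist in the releases discussed here), and the class name, thread name, and choice of quiet hour are all hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: request a full collection once a day at a quiet hour.
final class NightlyCompaction {

    /** Time from 'now' until the next occurrence of 'quietTime'. */
    static Duration delayUntil(LocalDateTime now, LocalTime quietTime) {
        LocalDateTime next = now.toLocalDate().atTime(quietTime);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // already past today's quiet time
        }
        return Duration.between(now, next);
    }

    /** Starts a daemon scheduler that calls System.gc() every 24 hours. */
    static ScheduledExecutorService schedule(LocalTime quietTime) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread t = new Thread(runnable, "nightly-gc");
                    t.setDaemon(true); // don't keep the JVM alive just for this
                    return t;
                });
        long initialDelayMs = delayUntil(LocalDateTime.now(), quietTime).toMillis();
        // Fixed 24-hour period; this ignores daylight saving shifts.
        scheduler.scheduleAtFixedRate(System::gc, initialDelayMs,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Calling `NightlyCompaction.schedule(LocalTime.of(3, 0))` once at startup would request a collection around 3 AM local time each day. Keep in mind that System.gc() is only a request, and that it does nothing at all if the JVM is run with -XX:+DisableExplicitGC.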

Or if by chance most of the data in the tenured generation is read in when your application first starts up and you can do a System.gc() after you complete initialization, that might help by compacting all data into a single chunk leaving the rest of the tenured generation available for promotions. Depending on the allocation pattern of the application, that might be adequate.

Or you might want to start the concurrent collections earlier. The low pause collector tries to start a concurrent collection just in time (with some safety factor) to collect the tenured generation before it is full. If you are doing concurrent collections and freeing enough space, you can try starting a concurrent collection sooner so that it finishes before the fragmentation becomes a problem. The concurrent collections don't do a compaction, but they do coalesce adjacent free blocks, so larger chunks of free space can result from a concurrent collection. One of the triggers for starting a concurrent collection is the amount of free space in the tenured generation. You can cause a concurrent collection to occur early by setting the option -XX:CMSInitiatingOccupancyFraction=<NNN> where NNN is the percentage of the tenured generation that is in use above which a concurrent collection is started. This will increase the overall time you spend doing GC but may avoid the fragmentation problem. And this will be more effective with 5.0 because a single contiguous chunk of space is not required for promotions.
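Two ideas from the paragraph above are easy to picture in code: coalescing adjacent free blocks, and starting a concurrent collection once occupancy passes a percentage threshold like the one given to -XX:CMSInitiatingOccupancyFraction. The following is a toy model only. The names and the representation of the free list as a list of (start, size) blocks are assumptions, and the real collector's data structures are certainly different.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of a tenured generation free list. Not HotSpot code.
final class TenuredModel {

    /** A free block covering addresses [start, start + size). */
    static final class Block {
        final long start;
        final long size;

        Block(long start, long size) {
            this.start = start;
            this.size = size;
        }

        long end() {
            return start + size;
        }
    }

    /** Merges blocks that touch each other. Nothing is moved (no compaction). */
    static List<Block> coalesce(List<Block> freeBlocks) {
        List<Block> sorted = new ArrayList<>(freeBlocks);
        sorted.sort(Comparator.comparingLong((Block b) -> b.start));
        List<Block> merged = new ArrayList<>();
        for (Block b : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).end() == b.start) {
                Block prev = merged.get(last);
                merged.set(last, new Block(prev.start, prev.size + b.size));
            } else {
                merged.add(b);
            }
        }
        return merged;
    }

    /** Size of the largest single free block, or 0 if there are none. */
    static long largest(List<Block> blocks) {
        long max = 0;
        for (Block b : blocks) {
            max = Math.max(max, b.size);
        }
        return max;
    }

    /** True once the percentage in use is above the initiating threshold. */
    static boolean shouldStartConcurrentCollection(long used, long capacity,
                                                   int initiatingOccupancyPercent) {
        return used * 100.0 / capacity > initiatingOccupancyPercent;
    }
}
```

With free blocks at (0, 16), (16, 16), (32, 8) and (64, 8), the largest block before coalescing is 16. Afterwards the first three merge into a single block of 40, while the block at 64 stays separate because live data sits between 40 and 64. Only a compacting collection could close that gap.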

By the way, I've increased the comment period for my blogs. I hadn't realized it was so short.

Comments:

Do these type of issues also exist with parallel GC or are they unique to concurrent GC?

Posted by Rob Eden on March 06, 2006 at 02:44 AM PST #

The issue of fragmentation is limited to the low pause (concurrent) collector. The throughput (parallel) collector does a compacting collection and the type of fragmentation discussed in this entry is not an issue. There are some extreme cases where objects are large enough and the generations are small enough to cause fragmentation in the young generation or the tenured generation (e.g. 1 object fills enough of the generation so another won't fit) but that is really extreme.

My apologies if comments on this entry were closed for a bit. I thought I did it right, but I obviously pushed the wrong button. And apologies in advance if I still didn't get it right. I'm trying.

Posted by Jon Masamitsu on March 06, 2006 at 07:48 AM PST #

Could you give a perf comparison between the normal promotion from new to old and using the CMS free list in best/worst case scenarios ?

Posted by Brice Beard on March 06, 2006 at 05:17 PM PST #

I assume that by "normal promotion" you mean promotions done by one of the compacting collectors. Also, I don't have numbers for the actual difference in the time spent doing just promotions. What I have looked at is the higher cost of a young generation collection when the tenured generation is collected with a compacting collector versus CMS. With the caveat that I have not measured this rigorously, in the best case the young generation collection cost for CMS is 10% to 15% higher. On average I expect the cost to be 20% to 25% more with CMS. I don't have a number for the worst case. Much of this extra cost has to do with keeping track of objects that have been newly (during the current collection) promoted. During a young generation collection, the collector finds live objects and then moves them (either to elsewhere in the young generation or into the tenured generation). Objects that are moved still have to be scanned to find references to other objects. An object that is promoted (in the CMS case) can end up anywhere in the tenured generation because allocation is from a free list. There is a mechanism for keeping track of these promoted objects and that takes much of the extra time and some additional space.

Posted by Jon Masamitsu on March 08, 2006 at 01:33 AM PST #
