Jon Masamitsu's Weblog

  • Java
    June 22, 2007

Size Matters

Guest Author
I recently had an interesting exchange with a customer about concurrent mode
failures with the low-pause collector. My initial response was with my litany of whys
and wherefores of concurrent mode failures. But it got more interesting, because it
turned out to be an example of where object sizes and the young generation size made
an unexpected difference.

Before I get to that let's talk a little bit about the JVM's behavior with regard to
allocating large objects. Large, of course, is a relative term and what I mean here
is large relative to the size of the young generation. My comments here apply to
JDK 5 update 10 and later and JDK 1.4.2 update 13 and later. Earlier versions of those
JVM's had a bug in them (6369448) that made their behavior different.
All versions of JDK 6 are also free from this bug. By the way the customer was
on JDK 5 update 6 so was subject to 6369448. That made it a little more interesting
for me but may not be of interest to you if you are running on the later releases.
And also the policy I describe below does not apply to the throughput collector
(UseParallelGC) which has its own variation on this policy.

Objects are normally allocated out of the young generation and get moved to the
tenured generation as they age. But what happens when a object is larger than the
young generation (actually larger than the eden space in the young generation)?
If the application tries to allocate object AAA and there is not enough free space in
eden for the allocation, a garbage collection usually occurs. The
exception to this is if AAA is larger than eden. An outline of
the policy for allocation and collections follow.

  1. Allocate out of the young generation

  2. Possibly allocate out of the tenured generation

  3. Collect the young generation

  4. Possibly collect the tenured generation

  5. Heroics

If the allocation in 1. fails, before starting
a GC the JVM considers allocating AAA out of the tenured generation at 2.
The JVM compares the size of AAA to the current capacity of eden. By capacity I
mean the total size available in an empty eden.
If the capacity of eden is too small to hold AAA, then collecting the young generation
will not help with the allocation of AAA. So the JVM tries to allocate AAA directly
out of the tenured generation. If the allocation out of the tenured generation
succeeds, no young generation GC is done and the JVM just continues. If AAA is
smaller than eden, then the JVM proceeds to the
collection of the young generation
at 3. After the the young generation collection, a check is done at 4. to
see if enough space has been freed to allocate AAA in eden. If AAA still
cannot be allocated out of the young generation, then the tenured generation is collected
at 5. and an allocation attempt is made out of the tenured generation. If AAA cannot be
allocated in the tenured generation, some additional heroics are attempted (e.g.,
clearing all SoftReferences). Failing those heroics an out-of-memory is thrown.

A few things to note. The initial allocation at 1. is what we refer
to as fast-path allocation.
Fast-path allocation does not call into the JVM to allocate an object. The JIT
compilers know how to allocate out of the young generation and code for an allocation
is generated in-line for object allocation. The interpreter also knows how to do the
allocation without making a call to the VM. And yes, both know about thread local
allocation buffers (TLAB's) and use them. The allocation out of the tenured generation at
2. could be attempted unconditionally for any
sized object. That could avoid a collection of the young
generation, but would also defeat the purpose of having the young generation. We want
new objects allocated in the young generation because that's where they will probably
die. We only want the longed-lived objects to make it into the tenured generation.
Also an allocation at 2. is not a fast-path allocation so the execution of the application
could be significantly affected by too much allocation at 2. At 4. a check is made to see
if enough space has been freed in the young generation for AAA and if not the tenured
generation is collected. Why not attempt the allocation of AAA out of the tenured
generation before doing the collection? Attempting the allocation first would probably
work lots of the time, but there are pathological cases where allocations would
repeatedly be done at 4. If we called allocation at 2. slow-path allocation, we'd have
to call allocation at 4. slow-slow-path allocation. At 4. we've already stopped the world
and done a young generation collection. The pathological case would execute
very slowly. We know. We've tried variations that did the
slow-slow-path allocation first and the reports that we usually got was the the JVM was hung!
The JVM was not hung. Going slow-slow-path just made it seem so. Doing the collection
of the tenured generation actually also collects the young generation and doing that
avoids the pathological case in practice.

The final thing to note is that you don't want to be stuck using slow-path allocation
(allocation at 2.) It's really slow compared to fast-path allocation. So if you
have large objects (and only you would know), try to make your young generation
large enough to hold plenty of them (10's would be good, 100's would be better).
Or that you have very few of them.

The policy for the throughput collector differs from the above at 2. and 5. At 2.
the throughput collector will allocate an object out of the tenured generation if
it is larger than half of eden. That variation tries to not fill up the
young generation with large objects. Is this better?
I just don't know. The heroics at 5. are also different.

The interesting exchange with the customer that got me thinking about this blog
started with this snippet of a GC log. Again this was with JDK 5 update 6 so the
bug 6369448 was in play.

963236.282: [GC 963236.282: [ParNew: 15343K->0K(21184K), 0.1025429 secs] 8289743K->8274557K(12287936K), 0.1029416 secs]
963236.479: [GC 963236.479: [ParNew: 10666K->0K(21184K), 0.1072986 secs] 963236.587: [CMS (concurrent mode failure): 8285141K->7092744K(12266752K), 91.3603721 secs] 8285224K->7092744K(12287936K), 91.4763098 secs]
963328.194: [GC 963328.194: [ParNew: 21120K->0K(21184K), 0.1455909 secs] 7135030K->7114033K(12287936K), 0.1459930 secs]
963328.434: [GC 963328.435: [ParNew: 7442K->0K(21184K), 0.0745429 secs] 963328.509: [CMS (concurrent mode failure): 7114084K->7073781K(12266752K), 78.1535121 secs] 7121475K->7073781K(12287936K), 78.2286852 secs]
963408.503: [GC 963408.503: [ParNew : 21120K->0K(21184K), 0.0651745 secs] 7116067K->7097487K(12287936K), 0.0656080 secs]

When I first looked at this log I jumped right to the first concurrent mode failure. The first thing to notice was that the available free space in the tenured generation was large (about 4g). The total space in the tenured generation was 12266752K and the occupancy
(amount of space currently being used)
of the tenured generation was only 8285141K (see the part of the log immediately following the first "concurrent mode failure" message). That's about 4g of space. With that much free space the only thing I could think of was a really bad case of fragmentation. Now a concurrent mode failure is bad for the low-pause collector. If it is due to fragmentation at least the resulting collection is a compacting collection and you should not have to worry about fragmentation for a while. So why is there another concurrent mode failure after just one minor collection? And there is even more free space in the tenured generation when this
second one occurred (about 5g). So now I'm sweating. I fortunately started noticing some other peculiarities in the log. For example the first minor collection starts when the young generation had only allocated 15343K out of a total of 21184K. That's not that unusual because the survivor space can be sizable and mostly empty. But the third collection is also a minor collection and the young generation has allocated 21120K by the time the minor collection starts (as does the fifth collection). Hmmm. Looking at the occupancies of the young generation at the point of the two concurrent mode failures shows the pattern: the young generation has a good chunk of free space but is still being collected. So I asked the customer if very large objects are being allocated, objects that are approaching the size of the young generation. To my relief the answer was yes. What follows now is my
interpretation of the logs. I didn't have the application to run and really debug
the behavior, but I think the pieces fit together.

The first collection (a minor collection) starts seemingly early because a large object is being allocated. The object is larger than the avalable space in
eden but is not larger than all of eden so the minor collection
does free up enough space to allocate that large object. The next collection
is started when an even larger object allocation is attempted.
This is where bug 6369448 comes into play.
If not for 6369448 the larger object would have been allocated out of the tenured
generation at 2. (the corrected case that I discussed above). There was an inconsistency
in the test at 2. and the check at 4. before the collection of the tenured generation.
A young generation is divided into an eden and two survivor spaces. The test at 2.
compared the object size to the total size of the young generation. The check at 4.
only checked the size (as is correct) against eden. So the short circuit at 2. was not
taken, the collection at 3. doesn't free up enough space, and a tenured generation
collection (concurrent mode failure) is the result.
The third collection is an uninteresting minor collection. The fourth collection again is a collection prompted by the allocation of an object too big to fit into eden so another concurrent mode failure happens. The fifth collection is also uninteresting as are the many minor collections that follow.

My story for the two concurrent mode failures does depend on the large objects fitting
in that gap in size between eden and the total young generation, but I'm willing to assume
that there was a mix of object sizes and occasionally there was something in the gap. The
two concurrent mode failure pattern above didn't happen often in the entire log and it was
a long log. And the customer increased the size of the young generation and these types
of concurrent mode failures did not reoccur.

I've always expected the occurrence of slow-path allocation to be a rare event. I think
it is rare, but that's not the same as never so I'm thinking about adding a counter to
be printed into the GC log to indicate how many slow-path allocations have occurred since
the last GC.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.