Tuesday Oct 27, 2015

JVM hang with CMS collector

This blog post was long due but better late than never. Sometime back I worked on an interesting GC problem which initially looked like an issue with the JVM runtime system. I am talking about 'JDK-8085965: VM hangs in C2Compiler'. This bug has been fixed in JDK 9 and 8u45+. In this post, I want to share interesting symptoms of this bug, and the troubleshooting steps taken that helped in getting to the root of the problem.

This report came in as a JVM hang when running with the CMS collector. Looking at the core file revealed that the JVM was in the '_synchronizing' state at that point.

(gdb) print SafepointSynchronize::_state
$1 = SafepointSynchronize::_synchronizing
(gdb) print SafepointSynchronize::_waiting_to_block
$2 = 1

It was waiting for one thread to come to safe-point state and block.

And the only thread running at that time was the Compiler thread. Stack trace of the Compiler thread:
(gdb) where
#0  Dependencies::find_finalizable_subclass (k=0x101047428)
#1  0x00007fd6804a38ee in ciInstanceKlass::has_finalizable_subclass (this=0x7fd67913e950)
#2  0x00007fd6809c28fb in Parse::call_register_finalizer (this=0x7fd6395405a0)
#3  0x00007fd6809c2fb9 in Parse::return_current (this=0x101047428, value=0x0)
#4  0x00007fd6809cf526 in Parse::do_one_bytecode (this=0x101047428)    
#5  0x00007fd6809c17d8 in Parse::do_one_block (this=0x7fd6395405a0)
#6  0x00007fd6809c1a57 in Parse::do_all_blocks (this=0x7fd6395405a0)
#7  0x00007fd6809c5fe9 in Parse::Parse (this=0x7fd6395405a0, caller=<optimized out>, parse_method=<optimized out>, expected_uses=5377)
#8  0x00007fd6804784e8 in ParseGenerator::generate (this=0x7fd63004dd60, jvms=0x7fd61c402f40)
#9  0x00007fd6805eea36 in Parse::do_call (this=0x7fd6395409c0)
#10 0x00007fd6809cf5aa in Parse::do_one_bytecode (this=0x101047428)
#11 0x00007fd6809c17d8 in Parse::do_one_block (this=0x7fd6395409c0)
#12 0x00007fd6809c1a57 in Parse::do_all_blocks (this=0x7fd6395409c0)
#13 0x00007fd6809c5fe9 in Parse::Parse (this=0x7fd6395409c0, caller=<optimized out>, parse_method=<optimized out>, expected_uses=5377)

The klass* passed to find_finalizable_subclass() was: 0x101047428

(gdb) info frame
Stack level 0, frame at 0x7fd639540240:
 rip = 0x7fd6805955a0 in Dependencies::find_finalizable_subclass

(gdb) print k
$1 = (Klass *) 0x101047428

(gdb) print *$1
$2 = {<Metadata> = {<MetaspaceObj> = {<No data fields>}, _vptr.Metadata =
0x7fd680fc9670 <vtable for InstanceKlass+16>}, _layout_helper = 16,
  _super_check_offset = 64, _name = 0x7fd618009850, _secondary_super_cache =
 0x100009608, _secondary_supers = 0x7fd668848a88, _primary_supers = {
    0x100000f28, 0x100009418, 0x1000097f0, 0x101047428, 0x0, 0x0, 0x0, 0x0},
 _java_mirror = 0xa77b4b40, _super = 0x1000097f0, _subklass = 0x0,
   _next_sibling = 0x101047428, _next_link = 0x0, _class_loader_data =
 0x7fd618003ab0, _modifier_flags = 1, _access_flags = {_flags = 536870913},
   _last_biased_lock_bulk_revocation_time = 0, _prototype_header = 0x5,
 _biased_lock_revocation_count = 0, _trace_id = 1685520384,
   _modified_oops = 0 '\000', _accumulated_modified_oops = 0 '\000',
 _shared_class_path_index = -1}
(gdb) x/s 0x7fd618009850+8
 0x7fd618009858: "sun/reflect/GeneratedMethodAccessor1389"

 _next_sibling = 0x101047428

The next_sibling in this case was the same as the klass itself. So, the compiler thread was stuck in the following loop:

 1187 Klass* Dependencies::find_finalizable_subclass(Klass* k) {
 1188   if (k->is_interface())  return NULL;
 1189   if (k->has_finalizer()) return k;
 1190   k = k->subklass();
 1191   while (k != NULL) {
 1192     Klass* result = find_finalizable_subclass(k);
 1193     if (result != NULL) return result;
 1194     k = k->next_sibling(); 
 1195   }
 1196   return NULL;
 1197 }

Then we did another run with the failing program using TraceClassLoading and TraceClassUnloading JVM options. From the core file of this run, found that VM was stuck while going over a circular siblings list. Note that the subklasses of a Klass are maintained using the _next_sibling field of its first _subklass.

Here's the klass and subklasses chain built from the core file:

super class sun/reflect/SerializationConstructorAccessorImpl - 0x1001acec0
   |--subklasses chain
0x1010eec30-sun/reflect/GeneratedSerializationConstructorAccessor232->
0x1010ef030-sun/reflect/GeneratedSerializationConstructorAccessor231->
0x1010ef430-sun/reflect/GeneratedSerializationConstructorAccessor230->
0x1010ef830-sun/reflect/GeneratedSerializationConstructorAccessor229->
0x1010efc30-sun/reflect/GeneratedSerializationConstructorAccessor228->
0x1010f0030-sun/reflect/GeneratedSerializationConstructorAccessor227->
0x1010f0430-sun/reflect/GeneratedSerializationConstructorAccessor226->
0x1010f0830-sun/reflect/GeneratedSerializationConstructorAccessor225->
0x1010f0c30- sun/reflect/GeneratedSerializationConstructorAccessor224->
0x1010f1030-sun/reflect/GeneratedSerializationConstructorAccessor223->
0x1010f1430-sun/reflect/GeneratedSerializationConstructorAccessor222->
0x1010f1830-sun/reflect/GeneratedSerializationConstructorAccessor221->
0x1010f1c30-sun/reflect/GeneratedSerializationConstructorAccessor220->
0x1010f2030-sun/reflect/GeneratedSerializationConstructorAccessor219->
0x1010f2430- sun/reflect/GeneratedSerializationConstructorAccessor218->
0x1010f2830-sun/reflect/GeneratedSerializationConstructorAccessor217->
0x10110b830-sun/reflect/GeneratedMethodAccessor94->
0x1010e8430-sun/reflect/GeneratedSerializationConstructorAccessor256->
0x1010e8830-sun/reflect/GeneratedSerializationConstructorAccessor255->
0x1010e8c30-sun/reflect/GeneratedSerializationConstructorAccessor254->
0x1010e9830-sun/reflect/GeneratedSerializationConstructorAccessor253->
0x1010e9c30-sun/reflect/GeneratedSerializationConstructorAccessor252->
0x1010ea030-sun/reflect/GeneratedSerializationConstructorAccessor251->
0x1010ea430-sun/reflect/GeneratedSerializationConstructorAccessor250->
0x1010ea830-sun/reflect/GeneratedSerializationConstructorAccessor249->
0x1010eac30-sun/reflect/GeneratedSerializationConstructorAccessor248->
0x1010eb030-sun/reflect/GeneratedSerializationConstructorAccessor247->
0x1010eb430-sun/reflect/GeneratedSerializationConstructorAccessor246->
0x1010eb830-sun/reflect/GeneratedSerializationConstructorAccessor245->
0x1010ebc30-sun/reflect/GeneratedSerializationConstructorAccessor244->
0x1010ec030-sun/reflect/GeneratedSerializationConstructorAccessor243->
0x1010ec430-sun/reflect/GeneratedSerializationConstructorAccessor242->
0x1010ec830-sun/reflect/GeneratedSerializationConstructorAccessor241->
0x1010ecc30-sun/reflect/GeneratedSerializationConstructorAccessor240->
0x1010ed030-sun/reflect/GeneratedSerializationConstructorAccessor239->
0x1010ed430-sun/reflect/GeneratedSerializationConstructorAccessor238->
0x1010ed830-sun/reflect/GeneratedSerializationConstructorAccessor237->
0x1010edc30-sun/reflect/GeneratedSerializationConstructorAccessor236->
0x1010ee030-sun/reflect/GeneratedSerializationConstructorAccessor235->
0x1010ee430-sun/reflect/GeneratedSerializationConstructorAccessor234->
0x1010ee830-sun/reflect/GeneratedSerializationConstructorAccessor233->
0x1010eec30-sun/reflect/GeneratedSerializationConstructorAccessor232

The GC logs generated with TraceClassLoading and TraceClassUnloading options had:

[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor216 0x000000010110b830]


So, GeneratedSerializationConstructorAccessor216 was unloaded from address 0x10110b830, and sun/reflect/GeneratedMethodAccessor94 got loaded at the same address. But the link to GeneratedSerializationConstructorAccessor216 in the siblings list was not updated after this class was unloaded, and that corrupted the subKlasses chain.

That made me look at the other JVM options used for this failing program. hs_err file showed that -Xnoclassgc was being used to turn off the class unloading but the logs indicated that classes were still getting unloaded. From that point, this mystery was not very hard to solve.

In JDK8, CMSClassUnloadingEnabled option which is used to control the class-unloading in CMS collector, was enabled by default. And this problem was occurring due to the conflicting values getting set for CMSClassUnloadingEnabled and ClassUnloading options. One phase of the class-unloading which is common to the other collectors too that updates subkalsses/siblings links is performed only when ClassUnloading option is enabled. So when -Xnoclassgc or -XX:-ClassUnloading are specified on the command line, due to this bug CMSClassUnloadingEnabled stays enabled and classes get unloaded but that common step gets skipped which corrupts the class hierarchy links leading to hangs or crashes.

The fix was simple - disable CMS class unloading options when -Xnoclassgc or -XX:-ClassUnloading are specified.

There is a similar bug open for G1 collector which is being worked and is targeted for JDK9:
    JDK-8114823: G1 doesn't honor request to disable class unloading


About

I share my work experiences as a JVM Sustaining Engineer through this blog

Search

Categories
Archives
« January 2017
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    
       
Today