Wednesday Nov 02, 2016

Hung JVM due to the threads stuck in pthread_cond_timedwait()

In this post, I am going to discuss a couple of scenarios where the Java process hangs are actually NOT because of a Java/JVM issue but happen due to the OS problems. Both the hangs being discussed here occur on the Linux operating system.

Lets take a look at the first case where a thread seems to be stuck in the Java Thread.sleep() call. From the process stack trace, the two threads of interest are:

Thread 26755: (state = IN_VM)
 - java.lang.Thread.interrupt0() @bci=0 (Interpreted frame)
 - java.lang.Thread.interrupt() @bci=51, line=852 (Interpreted frame)
 - oracle.core.ojdl.BufferedLogWriter.stopFlusher() @bci=17, line=380
(Interpreted frame)

Thread 21714: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - oracle.core.ojdl.BufferedLogWriter$ @bci=30, line=401
(Interpreted frame)

Thread 26755 is attempting to interrupt thread 21714 which is executing Thread.sleep(). Thread 21714 stays in the BLOCKED state preventing other threads in the process from making any progress, and thus causing the process to hang.

Looking at their native stack traces:

Thread 26755:
#0  0xf77e7430 in __kernel_vsyscall ()
#1  0x4dcc31b9 in __lll_lock_wait () from /lib/
#2  0x4dcbe550 in _L_lock_677 () from /lib/
#3  0x4dcbe421 in pthread_mutex_lock () from /lib/
#4  0x064ff034 in os::Linux::Event::unpark() ()
#5  0x064fc60a in os::interrupt(Thread*) ()

Thread 21714:
#0  0xf77e7430 in __kernel_vsyscall ()
#1  0x4dc0fc43 in __lll_lock_wait_private () from /lib/
#2  0x4db94348 in _L_lock_9835 () from /lib/
#3  0x4db91f27 in calloc () from /lib/
#4  0x4dcc0e1c in pthread_cond_timedwait@GLIBC_2.0 () from /lib/
#5  0x064fefa1 in os::Linux::Event::timedwait(timespec*) ()
#6  0x064fc2bf in os::sleep(Thread*, long long, bool) ()
#7  0x063d5d90 in JVM_Sleep ()

You might be wondering as to what was the value of time duration passed to pthread_cond_timedwait() function. It was 5 secs, and so after 5 seconds pthread_cond_timedwait() should have come out of its wait.

pthread_cond_timedwait() accepts 'pthread_cond_t* cond' and 'pthread_mutex_t* mutex' arguments that it uses to coordinate with the other threads ( Take a look at the cond and the mutex arguments:

(gdb) print *((pthread_cond_t*)0xad533ff0)
$3 = {__c_lock = {__status = 0, __spinlock = 17}, __c_waiting = 0xad534050}

(gdb) print *((pthread_mutex_t*)0xad533fd8)
$4 = {__m_reserved = 2, __m_count = 0, __m_owner = 0x54d2, __m_kind = 0, __m_lock = {__status = 1, __spinlock = 0}}
__m_owner = 0x54d2 = 21714

(gdb) print *$11->_osthread->_interrupt_event
$14 = {<CHeapObj> = {<No data fields>}, _count = 0, _nParked = 1,
cachePad = {-2.3622328335110874e-90, 4.2439915835115547e-313,
    1.4480588116526359e-314, 4.2439915866735748e-313}, _mutex =
{{__m_reserved = 2, __m_count = 0, __m_owner = 0x54d2, __m_kind = 0,
__m_lock = {
        __status = 1, __spinlock = 0}}}, _cond = {{__c_lock = {__status
= 0, __spinlock = 17}, __c_waiting = 0xad534050}}, FreeNext = 0xbad, Immortal = 1}

From the above data, we can see that the sleeping thread owns the _mutex, and the interrupting thread is waiting to acquire that _mutex to signal the sleeping thread. If we look closely, the sleeping thread is actually waiting on a native heap lock to complete some allocation (in calloc) before it could release the _mutex.

So it turned out not to be a JVM issue but some problem with the lock synchronization at the linux kernel level, causing the thread to be stuck in pthread_cond_timedwait() for infinite amount of time.

Let's take look at another case where there were three threads stuck in __pthread_cond_timedwait(), stopping the progress of other threads in the process.

----------------- 11374 -----------------
0x00000032f1c0ba0e      __pthread_cond_timedwait + 0x13e
0x00007fb1453bbf94      _ZN13ObjectMonitor4waitElbP6Thread + 0x224
0x00007fb1454a0d83      _ZN18ObjectSynchronizer4waitE6HandlelP6Thread + 0x53
0x00007fb14520b34b      JVM_MonitorWait + 0x1fb
0x00007fb13d010eee      * java.lang.Object.wait(long) bci:0 (Interpreted frame)
0x00007fb13d005a82      * java.lang.Thread.join(long) bci:70 line:1214 (Interpreted frame)
0x00007fb13d005a82      * iasauto.execution.GetCfgRunDTE.execute() bci:864 line:504 (Interpreted frame)
0x00007fb13d005a82      * iasauto.execution.RunOneJob.execute() bci:1675 line:1521 (Interpreted frame)
0x00007fb13d005a82      * iasauto.execution.RunOneJob.main(java.lang.String[]) bci:18 line:58 (Interpreted frame)
0x00007fb13d000438      <StubRoutines>
0x00007fb14519e8d0      _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x1e0
0x00007fb1453ca829      _ZN2os20os_exception_wrapperEPFvP9JavaValueP12methodHandleP17JavaCallArgumentsP6ThreadES1_S3_S5_S7_ + 0x19
0x00007fb14519e6e5      _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread + 0x25
0x00007fb1451d99f7      _Z17jni_invoke_staticP7JNIEnv_P9JavaValueP8_jobject11JNICallTypeP10_jmethodIDP18JNI_ArgumentPusherP6Thread + 0x147
0x00007fb1451c753f      jni_CallStaticVoidMethod + 0x20f
0x0000000040002284      JavaMain + 0x2a4

----------------- 31490 -----------------
0x00000032f1c0ba0e      __pthread_cond_timedwait + 0x13e
0x00007fb1453bbf94      _ZN13ObjectMonitor4waitElbP6Thread + 0x224
0x00007fb1454a0d83      _ZN18ObjectSynchronizer4waitE6HandlelP6Thread + 0x53
0x00007fb14520b34b      JVM_MonitorWait + 0x1fb
0x00007fb13d010eee      * java.lang.Object.wait(long) bci:0 (Interpreted frame)
0x00007fb13d005a82      * java.lang.Thread.join(long) bci:70 line:1214 (Interpreted frame)
0x00007fb13d005a82      * iasauto.engine.LocalTaskProcess.execute() bci:70 line:206 (Interpreted frame)
0x00007fb13d005a82      * iasauto.engine.TaskProcess.executeLocal(boolean) bci:277 line:751 (Interpreted frame)
0x00007fb13d005a82      * bci:631 line:289 (Interpreted frame)
0x00007fb13d005f5c      * bci:11 line:682 (Interpreted frame)
0x00007fb13d000438      <StubRoutines>
0x00007fb14519e8d0      _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x1e0
0x00007fb1453ca829      _ZN2os20os_exception_wrapperEPFvP9JavaValueP12methodHandleP17JavaCallArgumentsP6ThreadES1_S3_S5_S7_ + 0x19
0x00007fb14519e246      _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandle12symbolHandleS3_P17JavaCallArgumentsP6Thread + 0x116
0x00007fb14519e2c7      _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandle12symbolHandleS4_P6Thread + 0x47
0x00007fb145230c12      _Z12thread_entryP10JavaThreadP6Thread + 0xa2
0x00007fb1454d3401      _ZN10JavaThread3runEv + 0x121

----------------- 11681 -----------------
0x00000032f1c0ba0e      __pthread_cond_timedwait + 0x13e
0x00007fb1453bbf94      _ZN13ObjectMonitor4waitElbP6Thread + 0x224
0x00007fb1454a0d83      _ZN18ObjectSynchronizer4waitE6HandlelP6Thread + 0x53
0x00007fb14520b34b      JVM_MonitorWait + 0x1fb
0x00007fb13d010eee      * java.lang.Object.wait(long) bci:0 (Interpreted frame)
0x00007fb13d005a82      * java.util.TimerThread.mainLoop() bci:201 line:509 (Interpreted frame)
0x00007fb13d005a82      * bci:1 line:462 (Interpreted frame)
0x00007fb13d000438      <StubRoutines>
0x00007fb14519e8d0      _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x1e0
0x00007fb1453ca829      _ZN2os20os_exception_wrapperEPFvP9JavaValueP12methodHandleP17JavaCallArgumentsP6ThreadES1_S3_S5_S7_ + 0x19
0x00007fb14519e246      _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandle12symbolHandleS3_P17JavaCallArgumentsP6Thread + 0x116
0x00007fb14519e2c7      _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandle12symbolHandleS4_P6Thread + 0x47
0x00007fb145230c12      _Z12thread_entryP10JavaThreadP6Thread + 0xa2
0x00007fb1454d3401      _ZN10JavaThread3runEv + 0x121
0x00007fb1453cbcdf      _Z10java_startP6Thread + 0x13f

From the investigations so far, we believe it to be a Linux Kernel issue, probably related to the fix done by the following commit.

To summarize, even when on the surface the Java process hangs look to be Java/JVM issues, they may not be. We should be looking closely at the native stack traces of the threads to understand what is really holding them up, and where they are stuck.

Thursday Oct 27, 2016

G1 and Flight Recorder's -XX:FlightRecorderOptions=stackdepth option

Recently there was a report from a customer that they observed performance degradation while using JFR with G1 when they used it along with -XX:FlightRecorderOptions=stackdepth=512, but didn't see any performance impact when using the same setting for the stackdepth with the Parallel Collector. The flight recordings revealed that the poor performance was due to the long pauses introduced by the JFRCheckpoint operations. The long JFRCheckpoints occurred only for the recordings started with -XX:FlightRecorderOptions=stackdepth=512. When tested with -XX:FlightRecorderOptions=stackdepth=3, there were no long pause JFRCheckpoint operations.

Lets look at the possible reasons for that. The time taken by the JFRCheckpoint operations is directly proportional to the amount of data that needs to be written to the disk. In G1, TLAB size is smaller, so in effect it generates more number of 'Allocation in TLAB' and 'Allocation outside TLAB' events. And if we increase the stack depth using 'stackdepth' option, in G1 we'd have much more stack traces data to be written to the disk as compared to the other collectors.

I ran a simple test where I started a GC intensive application first with the Parallel collector, and then with the G1 collector. For both of these tests, I started HotSpot Default recording, and then started a manual time-fixed recording with the profile settings. Observations:

1. The number of GC events created in case of G1 are much more than the parallel collector.
2. The TLABs are smaller in G1 and that leads to the generation of more allocation events in comparison to the parallel collector.
3. If we compare the size of the file written for a 2 minute profile recording with both the collectors, for G1 the size was 3M and for parallel collector it was 600K. This shows that the amount of data that gets written to the disk with G1 collector is more as compared to the ParallelGC, and that contributes towards longer JFRCheckpoint pauses.

To summarize, if you are using JFR with the G1 collector, my suggestion would be to either use the default value for the 'stackdepth' which is 64, or use an even lower value if you observe long JFRCheckpoint operations.

Wednesday Sep 28, 2016

Crashes in ZIP_GetEntry

Recently, received quite a few reports of the application crashes from different customers and product groups with the stack trace looking like this:

Stack: [0xb0c00000,0xb0c80000],  sp=0xb0c7c890,  free space=498k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  []  memcpy+0x2f8
C  []
C  []  ZIP_GetEntry+0xe4
C  []  Java_java_util_zip_ZipFile_getEntry+0xc4
j  java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;+2
j  java.util.jar.JarFile.getJarEntry(Ljava/lang/String;)Ljava/util/jar/JarEntry;+2

Most of the times, the crashes in ZIP_GetEntry occur when the jar file being accessed has been modified/overwritten while the JVM instance was running. For performance reasons, HotSpot JVM memory maps each Jar file's central directory structure using mmap. This is done so as to avoid reading the central directory structure data from the disk every time it needs to read entries from the Jar file. When a Jar file is modified or overwritten on the disk, the JVM's copy of the data read earlier becomes inconsistent with the jar file on the disk, and any attempt to read and load entries from the modified jar can result in an application crash.

Since 1.6.0_23, a property can be used to disable the memory mapping of the central directory structure of the Jar files:

Please note that enabling this property would have some performance impact on the application as the JVM needs to read the central directory structure from the Jar files on the disk again and again whenever it reads a Jar file entry. So, it is best to ensure that the jar files are not modified or overwritten while the JVM has an image of them loaded.

In Java 9, this ZIP crash has been resolved with the following enhancement:
JDK-8142508: To bring j.u.z.ZipFile's native implementation to Java to remove the expensive jni cost and mmap crash risk

The code review thread for this change is here:

This enhancement replaces the ZIP file native implementation that used mmap with the Java implementation, and that removes the risk of application crashes described above.

Tuesday Sep 27, 2016

Eager reclamation of Humongous objects with G1

G1 collector in JDK 8u40 and 8u60 got a very nice enhancement that might come as a surprise to users when they upgrade to either of these releases.

As we all know that the objects larger than half of the size of the regions are called Humongous objects, and are allocated in special regions called Humongous regions. Up until 8u40, the Humongous regions were not collected during Evacuation pauses, and were collected only at the end of the marking cycle or during Full GCs.

In 8u40, the following G1 enhancement was added:
JDK-8027959: Early reclamation of large objects in G1

This enabled the reclamation of Humongous objects that don't have any reference pointing into it. It added two new experimental JVM options:
  • G1ReclaimDeadHumongousObjectsAtYoungGC - Experimental option which is enabled by default. When enabled, G1 reclaims dead objects at every young GC.
  • G1TraceReclaimDeadHumongousObjectsAtYoungGC - Experimental option which is disabled by default. When enabled, G1 prints information about the collection of humongous objects at young GCs.
The above change however introduced a regression which is fixed in 8u60.
JDK-8069367: Eagerly reclaimed humongous objects left on mark stack

In 8u60, this enhancement was improved, and the early reclamation now also looks at the Humongous regions that could have a few incoming references. Upon inspection of those incoming references, if G1 finds those references to be stale/dead then that Humongous region is reclaimed. This work was done under the following enhancement request:
JDK-8048179: Early reclaim of large objects that are referenced by a few objects

This change renamed the JVM option G1ReclaimDeadHumongousObjectsAtYoungGC to G1EagerReclaimHumongousObjects, and G1TraceReclaimDeadHumongousObjectsAtYoungGC to G1TraceEagerReclaimHumongousObjects. It added a new JVM option G1EagerReclaimHumongousObjectsWithStaleRefs.
  • G1EagerReclaimHumongousObjectsWithStaleRefs - This is an Experimental option, and is enabled by default. When this option is enabled, G1 attempts to reclaim Humongous objects that may have a few stale incoming references, at every young GC.

Monday Feb 15, 2016

Using NMT with custom JVM launcher

I recently worked with a customer, facing problem in using Native Memory Tracker with their custom JVM launcher created using the JNI API JNI_CreateJavaVM, and learned how NMT can be enabled when used with the custom JVM launchers (non 'java' launchers). This blog post shares details on how that can be done.

Let's start with the following JNI code example where I am passing the JVM option -XX:+NativeMemoryTracking=detail as an argument while creating the JVM with
JNI_CreateJavaVM call. -XX:+NativeMemoryTracking=detail option should enable the Native Memory Tracker at the detail level.

JavaVM *javaVM;       
JNIEnv *env;       
// VM initialization arguments
JavaVMInitArgs vmArgs;
JavaVMOption* options = new JavaVMOption[1];
options[0].optionString = "-XX:+NativeMemoryTracking=detail";
vmArgs.version = JNI_VERSION_1_8;
vmArgs.nOptions = 1;
vmArgs.options = options;
vmArgs.ignoreUnrecognized = false;
//Create Java VM
JNI_CreateJavaVM(&javaVM, (void**)&env, &vmArgs);
delete options;


JAVA_INCLUDES=-I$(JAVA_HOME)/include/ -I$(JAVA_HOME)/include/$(JAVA_OS)/ -L$(JAVA_HOME)/jre/lib/amd64/server
g++ jnicode.cpp $(JAVA_INCLUDES) -ljvm

When this program is compiled using the above compile command, and executed, it reports the following error:

Java HotSpot(TM) 64-Bit Server VM warning: Native Memory Tracking did not setup properly, using wrong launcher?

The reason for this behavior is that the Native Memory Tracker requires an environment variable to be set for the process before the JVM can be initialized and created. That environment variable is NMT_LEVEL_<pid> where <pid> is the identifier of the process. Also, please note that since this environment variable should already be set before the JVM gets initialized, we need to dynamically load the JVM shared library at runtime after setting the env variable .

Here's the sample code showing how we can do that:

// env variable to be set for NMT
const char*  NMT_Env_Name    = "NMT_LEVEL_";
const int TOTAL_LEN = strlen(NMT_Env_Name)+20;
char *nmtEnv = (char*)calloc(TOTAL_LEN,1);
snprintf(nmtEnv,TOTAL_LEN, "%s%d", NMT_Env_Name, getpid());
// Set the env variable
setenv(nmtEnv, "detail", 1);

//Dynamically load
const char* jvmpath = "<path to>/";
void* libjvm = dlopen(jvmpath, RTLD_NOW + RTLD_GLOBAL);
if (libjvm == NULL) {
 printf("Could not open\n");
 return -1;

// VM initialization arguments
JavaVMInitArgs vmArgs;
JavaVMOption* options = new JavaVMOption[1];
options[0].optionString = "-XX:+NativeMemoryTracking=detail";
vmArgs.version = JNI_VERSION_1_8;
vmArgs.nOptions = 1;
vmArgs.options = options;
vmArgs.ignoreUnrecognized = false;

// get a handle to JNI_CreateJavaVM function in the dynamically loaded JVM
CreateJavaVM_t CreateJavaVM;
CreateJavaVM = (CreateJavaVM_t)dlsym(libjvm, "JNI_CreateJavaVM");
// Create the JVM
JavaVM *javaVM;
JNIEnv *jniEnv;
long flag = CreateJavaVM(&javaVM, (void**)&jniEnv, &vmArgs);
delete options;


g++ $(JAVA_INCLUDES) -ldl

In the above example code, I am setting env variable NMT_LEVEL_<pid> and dynamically loading after setting this env variable. Note that I removed -ljvm from the g++ options so as not to load the JVM shared library at the process load time. With these simple changes, NMT can be used with the custom JVM launchers.

Tuesday Dec 08, 2015

VM warning: PICL ( is missing

To increase the compiled code performance, following two fixes were integrated into JDK 8u40 and 7u80. With these changes, HotSpot uses PICL library available on Sparc Platform to get the CPU L2 cache line data size. This information can be used by the Compiler to optimize the generated compiled code and thus improve the runtime performance.

1. JDK-8056124: Hotspot should use PICL interface to get cacheline size on SPARC

2. JDK-8059200:Promoted JDK9 b31 for Solaris-amd64 fails (Error: dl failure on line 744, no picl library) on Solaris 11.1

This enhancement attempts to load the PICL library dynamically, and if the library is not found on the system it issues the following warning and continues with the execution of the process:

Java HotSpot(TM) Server VM warning: PICL ( is missing.
Performance will not be optimal.

Please note that this warning just indicates that the JVM could not load the PICL library so it won't be able to turn on some of the JIT Compiler optimizations that require the cache line data size information. It does not stop the Java process, or impact the process in any other way. Also, note that this is not a Java bug. The warning message is just intended to inform the user that the platform it is running on does not have the PICL library.

Tuesday Oct 27, 2015

JVM hang with CMS collector

This blog post was long due but better late than never. Sometime back I worked on an interesting GC problem which initially looked like an issue with the JVM runtime system. I am talking about 'JDK-8085965: VM hangs in C2Compiler'. This bug has been fixed in JDK 9 and 8u45+. In this post, I want to share interesting symptoms of this bug, and the troubleshooting steps taken that helped in getting to the root of the problem.

This report came in as a JVM hang when running with the CMS collector. Looking at the core file revealed that the JVM was in the '_synchronizing' state at that point.

(gdb) print SafepointSynchronize::_state
$1 = SafepointSynchronize::_synchronizing
(gdb) print SafepointSynchronize::_waiting_to_block
$2 = 1

It was waiting for one thread to come to safe-point state and block.

And the only thread running at that time was the Compiler thread. Stack trace of the Compiler thread:
(gdb) where
#0  Dependencies::find_finalizable_subclass (k=0x101047428)
#1  0x00007fd6804a38ee in ciInstanceKlass::has_finalizable_subclass (this=0x7fd67913e950)
#2  0x00007fd6809c28fb in Parse::call_register_finalizer (this=0x7fd6395405a0)
#3  0x00007fd6809c2fb9 in Parse::return_current (this=0x101047428, value=0x0)
#4  0x00007fd6809cf526 in Parse::do_one_bytecode (this=0x101047428)    
#5  0x00007fd6809c17d8 in Parse::do_one_block (this=0x7fd6395405a0)
#6  0x00007fd6809c1a57 in Parse::do_all_blocks (this=0x7fd6395405a0)
#7  0x00007fd6809c5fe9 in Parse::Parse (this=0x7fd6395405a0, caller=<optimized out>, parse_method=<optimized out>, expected_uses=5377)
#8  0x00007fd6804784e8 in ParseGenerator::generate (this=0x7fd63004dd60, jvms=0x7fd61c402f40)
#9  0x00007fd6805eea36 in Parse::do_call (this=0x7fd6395409c0)
#10 0x00007fd6809cf5aa in Parse::do_one_bytecode (this=0x101047428)
#11 0x00007fd6809c17d8 in Parse::do_one_block (this=0x7fd6395409c0)
#12 0x00007fd6809c1a57 in Parse::do_all_blocks (this=0x7fd6395409c0)
#13 0x00007fd6809c5fe9 in Parse::Parse (this=0x7fd6395409c0, caller=<optimized out>, parse_method=<optimized out>, expected_uses=5377)

The klass* passed to find_finalizable_subclass() was: 0x101047428

(gdb) info frame
Stack level 0, frame at 0x7fd639540240:
 rip = 0x7fd6805955a0 in Dependencies::find_finalizable_subclass

(gdb) print k
$1 = (Klass *) 0x101047428

(gdb) print *$1
$2 = {<Metadata> = {<MetaspaceObj> = {<No data fields>}, _vptr.Metadata =
0x7fd680fc9670 <vtable for InstanceKlass+16>}, _layout_helper = 16,
  _super_check_offset = 64, _name = 0x7fd618009850, _secondary_super_cache =
 0x100009608, _secondary_supers = 0x7fd668848a88, _primary_supers = {
    0x100000f28, 0x100009418, 0x1000097f0, 0x101047428, 0x0, 0x0, 0x0, 0x0},
 _java_mirror = 0xa77b4b40, _super = 0x1000097f0, _subklass = 0x0,
   _next_sibling = 0x101047428, _next_link = 0x0, _class_loader_data =
 0x7fd618003ab0, _modifier_flags = 1, _access_flags = {_flags = 536870913},
   _last_biased_lock_bulk_revocation_time = 0, _prototype_header = 0x5,
 _biased_lock_revocation_count = 0, _trace_id = 1685520384,
   _modified_oops = 0 '\000', _accumulated_modified_oops = 0 '\000',
 _shared_class_path_index = -1}
(gdb) x/s 0x7fd618009850+8
 0x7fd618009858: "sun/reflect/GeneratedMethodAccessor1389"

 _next_sibling = 0x101047428

The next_sibling in this case was the same as the klass itself. So, the compiler thread was stuck in the following loop:

 1187 Klass* Dependencies::find_finalizable_subclass(Klass* k) {
 1188   if (k->is_interface())  return NULL;
 1189   if (k->has_finalizer()) return k;
 1190   k = k->subklass();
 1191   while (k != NULL) {
 1192     Klass* result = find_finalizable_subclass(k);
 1193     if (result != NULL) return result;
 1194     k = k->next_sibling(); 
 1195   }
 1196   return NULL;
 1197 }

Then we did another run with the failing program using TraceClassLoading and TraceClassUnloading JVM options. From the core file of this run, found that VM was stuck while going over a circular siblings list. Note that the subklasses of a Klass are maintained using the _next_sibling field of its first _subklass.

Here's the klass and subklasses chain built from the core file:

super class sun/reflect/SerializationConstructorAccessorImpl - 0x1001acec0
   |--subklasses chain
0x1010f0c30- sun/reflect/GeneratedSerializationConstructorAccessor224->
0x1010f2430- sun/reflect/GeneratedSerializationConstructorAccessor218->

The GC logs generated with TraceClassLoading and TraceClassUnloading options had:

[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor216 0x000000010110b830]

So, GeneratedSerializationConstructorAccessor216 was unloaded from address 0x10110b830, and sun/reflect/GeneratedMethodAccessor94 got loaded at the same address. But the link to GeneratedSerializationConstructorAccessor216 in the siblings list was not updated after this class was unloaded, and that corrupted the subKlasses chain.

That made me look at the other JVM options used for this failing program. hs_err file showed that -Xnoclassgc was being used to turn off the class unloading but the logs indicated that classes were still getting unloaded. From that point, this mystery was not very hard to solve.

In JDK8, CMSClassUnloadingEnabled option which is used to control the class-unloading in CMS collector, was enabled by default. And this problem was occurring due to the conflicting values getting set for CMSClassUnloadingEnabled and ClassUnloading options. One phase of the class-unloading which is common to the other collectors too that updates subkalsses/siblings links is performed only when ClassUnloading option is enabled. So when -Xnoclassgc or -XX:-ClassUnloading are specified on the command line, due to this bug CMSClassUnloadingEnabled stays enabled and classes get unloaded but that common step gets skipped which corrupts the class hierarchy links leading to hangs or crashes.

The fix was simple - disable CMS class unloading options when -Xnoclassgc or -XX:-ClassUnloading are specified.

There is a similar bug open for G1 collector which is being worked and is targeted for JDK9:
    JDK-8114823: G1 doesn't honor request to disable class unloading

Thursday Sep 10, 2015

Why do I get message "CodeCache is full. Compiler has been disabled"?

JVM JIT generates compiled code and stores that in a memory area called CodeCache. The default maximum size of the CodeCache on most of the platforms is 48M. If any application needs to compile large number of methods resulting in huge amount of compiled code then this CodeCache may become full. When it becomes full, the compiler is disabled to stop any further compilations of methods, and a message like the following gets logged:

Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
Code Cache  [0xffffffff77400000, 0xffffffff7a390000, 0xffffffff7a400000) total_blobs=11659 nmethods=10690 adapters=882 free_code_cache=909Kb largest_free_block=502656

When this situation occurs, JVM may invoke sweeping and flushing of this space to make some room available in the CodeCache. There is a JVM option UseCodeCacheFlushing that can be used to control the flushing of the Codeache. With this option enabled JVM invokes an emergency flushing that discards older half of the compiled code(nmethods) to make space available in the CodeCache. In addition, it disables the compiler until the available free space becomes more than the configured CodeCacheMinimumFreeSpace. The default value of CodeCacheMinimumFreeSpace option is 500K.

UseCodeCacheFlushing is set to false by default in JDK6, and is enabled by default since JDK7u4. This essentially means that in jdk6 when the CodeCache becomes full, it is not swept and flushed and further compilations are disabled, and in jdk7u+, an emergency flushing is invoked when the CodeCache becomes full. Enabling this option by default made some issues related to the CodeCache flushing visible in jdk7u4+ releases. The following are two known problems in jdk7u4+ with respect to the CodeCache flushing:

1. The compiler may not get restarted even after the CodeCache occupancy drops down to almost half after the emergency flushing.
2. The emergency flushing may cause high CPU usage by the compiler threads leading to overall performance degradation.

This performance issue, and the problem of the compiler not getting re-enabled again has been addressed in JDK8. To workaround these in JDK7u4+, we can increase the code cache size using ReservedCodeCacheSize option by setting it to a value larger than the compiled-code footprint so that the CodeCache never becomes full. Another solution to this is to disable the CodeCache Flushing using -XX:-UseCodeCacheFlushing JVM option.

The above mentioned issues have been fixed in JDK8 and its updates. Here's the list of the bugs that have been fixed in jdk8 and 8u20 that address these problems:
    * JDK-8006952: CodeCacheFlushing degenerates VM with excessive codecache freelist iteration (fixed in 8)
    * JDK-8012547: Code cache flushing can get stuck without completing reclamation of memory (fixed in 8)
    * JDK-8020151: PSR:PERF Large performance regressions when code cache is filled (fixed in 8)
    * JDK-8029091 Bug in calculation of code cache sweeping interval (fixed in 8u20)
    * 8027593: performance drop with constrained codecache starting with hs25 b111 (fixed in 8)

Tuesday Apr 21, 2015

Throughput collector and PrintTenuringDistribution option

This is going to be a very short blog post. The inspiration for this post is a customer report that the PrintTenuringDistribution option does not show age information with the throughput collector (Parallel or Parallel Old GC).

2.798: [GC
Desired survivor size 11534336 bytes, new threshold 7 (max 15)
[PSYoungGen: 67584K->9802K(78848K)] 67584K->9810K(254976K), 0.0467029 secs]
[Times: user=0.07 sys=0.02, real=0.05 secs]
3.604: [GC
Desired survivor size 11534336 bytes, new threshold 7 (max 15)
[PSYoungGen: 77386K->11156K(146432K)] 77394K->11236K(322560K), 0.0436806 secs]

Well, this is the expected behavior and is not a bug. Throughput collector does not use the age table like the other collectors do. And due to that reason, the age histogram information is not printed with the throughput collector. With the throughput collector, we can only see the desired survivor size, threshold age, and the maximum threshold age with PrintTenuringDistribution option.

Friday Apr 17, 2015

Longer Young Collections with JDK7 as Compared to JDK6

If you have moved from JDK 6 to 7 and are observing some increase in the times that young collections take to do their job, then here's a remedy: -XX:+JavaObjectsInPerm.

In JDK7, class statics and interned strings were moved from PermGen to the Java Heap and that may contribute towards a little increase in the young generation collection times. There is an option JavaObjectsInPerm which when enabled reverts that change i.e. class statics and interned strings get allocated back into the PermGen.

So, if tuning PermGen is not an issue and young gen collection times are bothering you then using -XX:+JavaObjectsInPerm might give some relief. Note that this option is available in jdk7 from 7u80 and was made available as a Diagnostic VM option with this CR: JDK-8048271

Since it is a diagnostic option, it needs to be used with -XX:+UnlockDiagnosticVMOptions.

Please note that this option is not available in JDK8 since PermGen has been completely removed in JDK8. Some more details on the PermGen removal can be read here.

Thursday Apr 02, 2015

Updates to the Java Troubleshooting Guide

Mattis Castegren who is my manager at Oracle, and is the guest writer for this blog post would like to share some details on the Java Troubleshooting Guide. Here's what he has to say:--

With the release of JDK 8, the official Java Troubleshooting Guide got a big overhaul. All pages were looked over and updated for JDK 8, and the two previous guides for the JVM and for Desktop Technologies were merged into one.

In the last month, with the release of 8u40, we have launched the next phase in this project. In this phase, we have added a lot of chapters about the new supportability tools that have been introduced in Java over the last few years.

The biggest new additions are a set of pages on how to use the Java Flight Recorder (JFR). If you haven't used JFR before, you should definitely check out the following chapters:

2.3 What are Java Flight Recordings

2.4 How to Produce a Flight Recording

2.5 Inspect a Flight Recording

These chapters contain step by step instructions on how to record a JFR and also goes through what to look for in a recording to find common issues. The chapters contain lots of screen shots and details on how to interpret the data.

Once you have learned the basics, you can also look at the following two chapters on how to use JFR to debug specific issues:

3.1 Debug a Memory Leak Using Java Flight Recorder

4 Troubleshoot Performance Issues Using JFR

When you have read through these chapters, you will be ready to use this great tool to find bottle necks in your own application.

Other new additions to the troubleshooting guide is a new chapter on how to set up Java for better troubleshooting: Prepare Java for Troubleshooting, as well as a lot of minor updates and clarifications.

Tuesday Feb 03, 2015

Analysis of Strange Hotspot Crashes

In this blog post, I am going to talk about the core file analysis of some strange looking hotspot crashes using gdb and Serviceability Agent debugging tools. At the first glance, these crashes appeared to be caused by some Hotspot Server Compiler issue but in the end these actually turned out to be due to a Linux Kernel bug.

This first crash happened in the compiled code of method SafeHashMap.put()

# Problematic frame:
# J 3575 C2
com.tangosol.util.SafeHashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (190 bytes) @ 0x00002aaaabda7ffd [0x00002aaaabda7da0+0x25d]
# Core dump written.

I opened the core with gdb and the stack trace of the crashing thread looked like this:

#15 0x00002afd41aa5e7f in JVM_handle_linux_signal (sig=11, info=0x40fd7c90, ucVoid=0x40fd7b60, abort_if_unrecognized=1)
#16 0x00002afd41a9ce13 in signalHandler (sig=11, info=0x40fd7c90, uc=0x40fd7b60)
#17 <signal handler called>
#18 0x00002aaaabda7ffd in ?? ()
#19 0x00000000c005e3c8 in ?? ()
#20 0x00000000c005e3c8 in ?? ()
#21 0x00000000c005e3c8 in ?? ()
#22 0xc005e3c82000cb42 in ?? ()
#23 0x0000000000000007 in ?? ()
#24 0x0000000000000000 in ?? ()

Opened the core file with Serviceability Agent tool too and inspected the addresses present in the above stack trace:

hsdb> whatis 0x00002aaaabda7ffd
Address 0x00002aaaabda7ffd: In code in NMethod for com/tangosol/util/SafeHashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; content: [0x00002aaaabda7da0, 0x00002aaaabda9280),  code: [0x00002aaaabda7da0, 0x00002aaaabda9280),  data: [0x00002aaaabda9280, 0x00002aaaabda9f18),  oops: [0x00002aaaabda9280, 0x00002aaaabda9288),  frame size: 144

hsdb> whatis 0x00000000c005e3c8
Address 0x00000000c005e3c8: In unknown section of Java heap

So, all the above digging confirmed that the crash happened in the compiled code of com.tangosol.util.SafeHashMap.put(). I looked at the assembly of the generated code in gdb:

(gdb) x/610i 0x00002aaaabda7da0
   0x2aaaabda7f83:      mov    $0x2000cbc5,%r9d
   0x2aaaabda7f89:      movabs $0x0,%r11
   0x2aaaabda7f93:      lea    (%r11,%r9,8),%r11
   0x2aaaabda7f97:      mov    0xa8(%r11),%r11
   0x2aaaabda7f9e:      mov    %r11,0x0(%r13)
   0x2aaaabda7fa2:      movl   $0x2000cbc5,0x8(%r13)
   0x2aaaabda7faa:      mov    %r12d,0xc(%r13)
   0x2aaaabda7fae:      mov    %r12,0x10(%r13)
   0x2aaaabda7fb2:      mov    %r12,0x18(%r13)
   0x2aaaabda7fb6:      mov    %r13,%r9
   0x2aaaabda7fb9:      mov    %r14,%r11
   0x2aaaabda7fbc:      mov    %r11d,0x10(%r13)
   0x2aaaabda7fc0:      shr    $0x9,%r9
   0x2aaaabda7fc4:      movabs $0x2aaab9f5b000,%r11
   0x2aaaabda7fce:      mov    %r12b,(%r11,%r9,1)
   0x2aaaabda7fd2:      mov    (%rsp),%r9d
   0x2aaaabda7fd6:      mov    %r9d,0x14(%r13)
   0x2aaaabda7fda:      mov    %r13,%r11
   0x2aaaabda7fdd:      shr    $0x9,%r11
   0x2aaaabda7fe1:      movabs $0x2aaab9f5b000,%r9
   0x2aaaabda7feb:      mov    %r12b,(%r9,%r11,1)
   0x2aaaabda7fef:      lock addl $0x0,(%rsp)
   0x2aaaabda7ff4:      mov    %r10d,0xc(%r13)
   0x2aaaabda7ff8:      movq   %xmm0,%r11
   0x2aaaabda7ffd:      mov    (%r11),%rax  <---crash here, r11 is 0
   0x2aaaabda8000:      mov    %rax,%r11

The thread crashed while accessing the value at an address stored in r11 register. Looking at the value of r11 register:
r11 0x0

r11 contains 0 and accessing the value at null address caused the crash.

Now, if we look at the previous instruction "movq   %xmm0,%r11", value in r11 was copied over from xmm0 register. The value in xmm0 register is 0.

There is a bug in Linux kernel where the xmm registers may get corrupted under certain circumstances. Details on the kernel bug:

The machine where this crash was observed:
RELEASE=2.6.18-308.  VERSION=#1 SMP Tue Apr 17 16:41:30 EDT 2012  MACHINE=x86_64

So, the process was running on a machine which is vulnerable to that Linux kernel bug.

Let's take a look at another such Hotspot crash.

# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (sharedRuntime.cpp:833), pid=26946, tid=1082554688
#  fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x00002aaaab1f5340

Current thread (0x000000001ee9b000):  JavaThread "main" [_thread_in_Java, id=26949, stack(0x0000000040767000,0x0000000040868000)]

Stack: [0x0000000040767000,0x0000000040868000],  sp=0x00000000408631a0,  free space=1008k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  []  VMError::report_and_die()+0x2ba
V  []  report_fatal(char const*, int, char const*)+0x59
V  []  SharedRuntime::continuation_for_implicit_exception(JavaThread*, unsigned char*, SharedRuntime::ImplicitExceptionKind)+0x33a
V  []  JVM_handle_linux_signal+0x48a
V  []  signalHandler(int, siginfo*, void*)+0x43
C  []
J 528 C2;Lcom/sun/org/apache/xerces/internal/xni/Augmentations;)V (110 bytes) @ 0x00002aaaab607423 [0x00002aaaab606ec0+0x563]

(gdb) where
#0  0x0000003968a30265 in raise ()
#1  0x0000003968a31d10 in abort ()
#2  0x00002b96ba0bbea5 in os::abort (dump_core=true)
#3  0x00002b96ba258af3 in VMError::report_and_die (this=0x40863320)
#4  0x00002b96b9ca0749 in report_vm_error (detail_msg=<optimized out>,
error_msg=<optimized out>, line=<optimized out>, file=<optimized out>)   
#5  report_fatal (file=0x2b96ba2e0330, line=833, message=0x408633b8)   
#6  0x00002b96ba15752a in SharedRuntime::continuation_for_implicit_exception(thread=0x1ee9b000,
    pc=0x2aaaab1f5340, exception_kind=<optimized out>)   
#7  0x00002b96ba0c170a in JVM_handle_linux_signal (sig=11, info=0x408637c0,
ucVoid=0x40863690, abort_if_unrecognized=1)   
#8  0x00002b96ba0b7c83 in signalHandler (sig=11, info=0x408637c0, uc=0x40863690)
#9  <signal handler called>
#10 0x00002aaaab1f5340 in ?? ()
#11 0x00000000fe8bace8 in ?? ()
#12 0x00002aaaab607423 in ?? ()
#13 0x00000000fe94d0a0 in ?? ()
#14 0x00000000fe8bacd0 in ?? ()
#15 0x00000000fe8bacd0 in ?? ()
#16 0x00002aaac0cdd850 in ?? ()
#17 0x0000000000000007 in ?? ()

Inspecting the crashing PC 0x00002aaaab1f5340 with SA:

hsdb> whatis 0x00002aaaab1f5340
Address 0x00002aaaab1f5340: In code in StubRoutines (2) content:
[0x00002aaaab1f4e20, 0x00002aaaab1fa410),  code: [0x00002aaaab1f4e20,
0x00002aaaab1fa410),  data: [0x00002aaaab1fa410, 0x00002aaaab1fa410),  frame
size: 0

Frame #12:

hsdb> whatis 0x00002aaaab607423
Address 0x00002aaaab607423: In code in NMethod for
rnal/xni/Augmentations;)V content: [0x00002aaaab606ec0, 0x00002aaaab6083d8),  code: [0x00002aaaab606ec0, 0x00002aaaab6083d8),  data: [0x00002aaaab6083d8,
0x00002aaaab608e98),  oops: [0x00002aaaab6083d8, 0x00002aaaab6083e8),  frame size: 144

Disassembly around the crashing PC:

0x2aaaab1f5280:      movdqu 0x30(%rdi,%rdx,8),%xmm0
   0x2aaaab1f5286:      movdqu %xmm0,0x30(%rsi,%rdx,8)
   0x2aaaab1f528c:      movdqu 0x20(%rdi,%rdx,8),%xmm1
   0x2aaaab1f5292:      movdqu %xmm1,0x20(%rsi,%rdx,8)
   0x2aaaab1f5298:      movdqu 0x10(%rdi,%rdx,8),%xmm2
   0x2aaaab1f529e:      movdqu %xmm2,0x10(%rsi,%rdx,8)
   0x2aaaab1f52a4:      movdqu (%rdi,%rdx,8),%xmm3
   0x2aaaab1f52a9:      movdqu %xmm3,(%rsi,%rdx,8)
   0x2aaaab1f52ae:      sub    $0x8,%rdx
   0x2aaaab1f52b2:      jge    0x2aaaab1f5280
   0x2aaaab1f52b4:      add    $0x4,%rdx
   0x2aaaab1f52b8:      jl     0x2aaaab1f52d4
   0x2aaaab1f52ba:      movdqu 0x10(%rdi,%rdx,8),%xmm0
   0x2aaaab1f52c0:      movdqu %xmm0,0x10(%rsi,%rdx,8)
   0x2aaaab1f52c6:      movdqu (%rdi,%rdx,8),%xmm1
   0x2aaaab1f52cb:      movdqu %xmm1,(%rsi,%rdx,8)
   0x2aaaab1f52d0:      sub    $0x4,%rdx
   0x2aaaab1f52d4:      add    $0x4,%rdx
   0x2aaaab1f52d8:      jg     0x2aaaab1f5265
   0x2aaaab1f52da:      xor    %rax,%rax
   0x2aaaab1f52dd:      leaveq
   0x2aaaab1f52de:      retq
   0x2aaaab1f52df:      nop
   0x2aaaab1f52e0:      push   %rbp
   0x2aaaab1f52e1:      mov    %rsp,%rbp
   0x2aaaab1f52e4:      mov    %rdx,%rcx
   0x2aaaab1f52e7:      shr    $0x2,%rdx
   0x2aaaab1f52eb:      lea    -0x8(%rdi,%rdx,8),%rdi
   0x2aaaab1f52f0:      lea    -0x8(%rsi,%rdx,8),%rsi
   0x2aaaab1f52f5:      neg    %rdx
   0x2aaaab1f52f8:      jmpq   0x2aaaab1f5370
   0x2aaaab1f52fd:      mov    0x8(%rdi,%rdx,8),%rax
   0x2aaaab1f5302:      mov    %rax,0x8(%rsi,%rdx,8)
   0x2aaaab1f5307:      inc    %rdx
   0x2aaaab1f530a:      jne    0x2aaaab1f52fd
   0x2aaaab1f530c:      test   $0x2,%ecx
   0x2aaaab1f5312:      je     0x2aaaab1f5322
   0x2aaaab1f5314:      mov    0x8(%rdi),%eax
   0x2aaaab1f5317:      mov    %eax,0x8(%rsi)
   0x2aaaab1f531a:      add    $0x4,%rdi
   0x2aaaab1f531e:      add    $0x4,%rsi
   0x2aaaab1f5322:      test   $0x1,%ecx
   0x2aaaab1f5328:      je     0x2aaaab1f5332
   0x2aaaab1f532a:      mov    0x8(%rdi),%ax
   0x2aaaab1f532e:      mov    %ax,0x8(%rsi)
   0x2aaaab1f5332:      xor    %rax,%rax
   0x2aaaab1f5335:      leaveq
   0x2aaaab1f5336:      retq
   0x2aaaab1f5337:      nopw   0x0(%rax,%rax,1)
=> 0x2aaaab1f5340:      movdqu -0x38(%rdi,%rdx,8),%xmm0
   0x2aaaab1f5346:      movdqu %xmm0,-0x38(%rsi,%rdx,8)
   0x2aaaab1f534c:      movdqu -0x28(%rdi,%rdx,8),%xmm1
   0x2aaaab1f5352:      movdqu %xmm1,-0x28(%rsi,%rdx,8)
   0x2aaaab1f5358:      movdqu -0x18(%rdi,%rdx,8),%xmm2
   0x2aaaab1f535e:      movdqu %xmm2,-0x18(%rsi,%rdx,8)
   0x2aaaab1f5364:      movdqu -0x8(%rdi,%rdx,8),%xmm3
   0x2aaaab1f536a:      movdqu %xmm3,-0x8(%rsi,%rdx,8)
   0x2aaaab1f5370:      add    $0x8,%rdx

In the assembly code of the StubRoutine, we can see that the values are being moved to and from xmm registers and the crash happens while accessing value at an address present in rdi register.

rdx 0xffffffffffffffc8
rdi 0x1f8

Machine details where this crash happened:
RELEASE=2.6.18-308.  VERSION=#1 SMP Tue Apr 17 16:41:30 EDT 2012  MACHINE=x86_64

If you encounter such weird looking Hotspot crashes where the cause of the register/memory corruption seems to be unreasonable, please make sure that you are running on a latest patched Linux kernel. Kernels of version older that 2.6.32 may be vulnerable to this kernel bug.

Wednesday Dec 17, 2014

Long Class-Unloading Pauses with JDK8

Recently, I came across a report where a user was facing long GC pauses with JDK8. They didn't see this issue while running with JDK7. The problem with jdk8 was that sometimes the class unloading step of CMS-remark phase was taking a very long time to do its job. They were running with large pages enabled with both jdk7 and jdk8.

2014-11-30T01:51:35.632+0000: [GC (CMS Final Remark) [YG occupancy: 292006 K (1179648 K)][Rescan (parallel) , 0.0397261 secs][weak refs processing, 0.0954205 secs][class unloading, 29.2450253 secs][scrub symbol table, 4.9592268 secs][scrub string table, 0.0034572 secs][1 CMS-remark: 1101173K(1835008K)] 1393180K(3014656K), 34.3796360 secs] [Times: user=0.92 sys=0.38, real=34.37 secs]

2014-12-01T02:01:44.452+0000: [GC (CMS Final Remark) [YG occupancy: 550275 K (1179648 K)][Rescan (parallel) , 0.0785401 secs][weak refs processing, 0.0931797 secs][class unloading, 28.0163134 secs][scrub symbol table, 5.2806525 secs][scrub string table, 0.0034023 secs][1 CMS-remark: 1101982K(1835008K)] 1652257K(3014656K), 33.5087749 secs] [Times: user=1.48 sys=0.36, real=33.51 secs]

Now, here if we look closely, there is a huge difference in the times reported for 'user' 'sys' and 'real'. As we all know, 'user' is the amount of CPU time spent in user-mode within the process and 'sys' is the amount of CPU time spent in the kernel within the process. Other processes and the time the process spends blocked does not count towards these two figures. The 'real' time is the total elapsed time including time slices used by other processes and the time the process spends blocked (for example in waiting for I/O to complete, or system is performing memory activities such as paging/swapping).

In this case too, the problem seemed not be with the JVM or more specifically with the class unloading step but more with the GC threads getting blocked out. It turned out that the class unloading phase was causing lot of paging activity to occur at the system level which took away the CPU time from the GC threads.

What changed in the JVM from JDK7 to JDK8 is that in JDK8 we don't have PermGen. Instead we have MetaSpace which is allocated out of native memory space. As I mentioned earlier, the user was running with Large Pages enabled. With JDK7 by using option -XX:+UseLargePages, we get the large pages enabled for the Java Heap as well as for the PermGen. But with jdk8, with the class meta data stored in MetaSpace(native space), we don't get large pages enabled for the MetaSpace by default when we use +UseLargePages.

In this particular case, the system was configured to have higher number of large pages and a limited memory resource was left for the regular small pages. And a limited number of regular pages available for the MetaSpace caused paging activity during the class unloading phase leading to long GC pauses. The solution for this problem was to use -XX:+UseLargePagesInMetaspace that enables the large pages support for the MetaSpace. This avoided the paging activity and the class-unloading times returned to normal.

Please note that UseLargePagesInMetaspace is not enabled by default when UseLargePages is ON. This is intentional and is done to avoid any memory usage regressions. A classloader (and its classes metadata) is stored in a chunk of memory called Metachunk in the MetaSpace. To be able to hand back pages to the system, we have to clean out an entire page. It is possible that there may be many Metachunks of different classloaders present on a page, and that page can be returned to the system only when all the classloaders are dead and their Metachunks are not needed any more. This leads to delayed deallocation of pages and it is a bit more likely with large pages.

Another reason for not turning UseLargePagesInMetaspace option on by default is that we can not use large pages for the CompressedClass space. The problem there is that we can not dynamically grow the number of large pages; we have to pre-commit all the large pages when memory is reserved. CompressedClass space by default is 1GB of reserved memory and committing that 1GB up-front would cause memory usage regressions.

So, if one has the system configured in such a way that a larger part of memory is set aside for large pages and a relatively smaller part for the regular pages, it is likely to observe the above described long GC pauses during the clean up of MetaSpace. In such situation, using UseLargePagesInMetaspace may help resolve the problem.

Tuesday Nov 25, 2014

Increased heap usage with G1 GC

G1 garbage collector is a predictable low pause collector that works efficiently to achieve the low pause time goals. To meet the set pause time goals, G1 tries to:

  • pick only those many old heap regions for the collection-set that it thinks could be collected in the specified goal time
  • adapt the young generation size to contain the collection times within the set goal.

In that effort, however, G1 may end up causing a side effect of increased usage of the Java Heap.

For example, in an experimental run with CMS using a heap size of 8GB, the maximum observed heap usage was around 4GB. But after switching to G1, with -XX:MaxGCPauseMillis=50, the heap usage went up to 6GB. In both the cases, the application load was same.

Part of the CMS GC logs:
2014-11-21T05:18:18.790-0500: 1918944.850: [GC2014-11-21T05:18:18.790-0500: 1918944.850: [ParNew: 32704K->64K(32704K), 0.0698450 secs] 4274067K->4247066K(8388544K), 0.0700480 secs] [Times: user=0.32 sys=0.01, real=0.06 secs]
2014-11-21T05:18:18.864-0500: 1918944.924: [GC [1 CMS-initial-mark: 4247002K(8355840K)] 4247095K(8388544K), 0.1249300 secs] [Times: user=0.12 sys=0.00, real=0.13 secs]
2014-11-21T05:18:18.989-0500: 1918945.050: [CMS-concurrent-mark-start]
2014-11-21T05:18:31.163-0500: 1918957.223: [GC2014-11-21T05:18:31.163-0500: 1918957.223: [ParNew: 32704K->64K(32704K), 0.0365220 secs] 4279706K->4247775K(8388544K), 0.0367530 secs] [Times: user=0.26 sys=0.00, real=0.04 secs]
2014-11-21T05:18:37.588-0500: 1918963.648: [CMS-concurrent-mark: 18.547/18.598 secs] [Times: user=39.03 sys=0.47, real=18.59 secs]
2014-11-21T05:18:37.588-0500: 1918963.648: [CMS-concurrent-preclean-start]
2014-11-21T05:18:37.708-0500: 1918963.768: [CMS-concurrent-preclean: 0.114/0.120 secs] [Times: user=0.13 sys=0.00, real=0.12 secs]
2014-11-21T05:18:37.708-0500: 1918963.768: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2014-11-21T05:18:42.724-0500: 1918968.785: [CMS-concurrent-abortable-preclean: 3.008/5.016 secs] [Times: user=3.34 sys=0.12, real=5.02 secs]
2014-11-21T05:18:42.728-0500: 1918968.789: [GC[YG occupancy: 25109 K (32704 K)]2014-11-21T05:18:42.728-0500: 1918968.789: [Rescan (parallel) , 0.1196710 secs]2014-11-21T05:18:42.848-0500: 1918968.909: [weak refs processing, 0.0002300 secs]2014-11-21T05:18:42.848-0500: 1918968.909: [class unloading, 0.0219670 secs]2014-11-21T05:18:42.870-0500: 1918968.931: [scrub symbol table, 0.0144900 secs]2014-11-21T05:18:42.885-0500: 1918968.945: [scrub string table, 0.0046370 secs] [1 CMS-remark: 4247711K(8355840K)] 4272821K(8388544K), 0.1740580 secs] [Times: user=0.99 sys=0.00, real=0.17 secs]
2014-11-21T05:18:42.903-0500: 1918968.963: [CMS-concurrent-sweep-start]

Part of the G1 GC logs:
  [Times: user=0.40 sys=0.01, real=0.06 secs]
    [Eden: 368.0M(368.0M)->0.0B(372.0M) Survivors: 40.0M->36.0M Heap: 6622.9M(8192.0M)->6259.7M(8192.0M)]
 2014-10-08T04:38:46.675-0400: 1311963.055: [GC pause (young) (initial-mark), 0.0564110 secs]
    [Parallel Time: 51.4 ms, GC Workers: 8]
       [GC Worker Start (ms): Min: 1311963055.7, Avg: 1311963055.7, Max: 1311963055.8, Diff: 0.1]
       [Ext Root Scanning (ms): Min: 6.3, Avg: 11.8, Max: 16.2, Diff: 9.9, Sum: 94.7]
       [Update RS (ms): Min: 3.0, Avg: 4.7, Max: 7.4, Diff: 4.4, Sum: 37.8]
          [Processed Buffers: Min: 2, Avg: 8.8, Max: 18, Diff: 16, Sum: 70]
       [Scan RS (ms): Min: 1.0, Avg: 1.3, Max: 1.4, Diff: 0.5, Sum: 10.0]
       [Object Copy (ms): Min: 29.0, Avg: 33.0, Max: 39.7, Diff: 10.7, Sum: 264.3]
       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
       [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.9]
       [GC Worker Total (ms): Min: 50.9, Avg: 51.0, Max: 51.1, Diff: 0.2, Sum: 407.8]
       [GC Worker End (ms): Min: 1311963106.7, Avg: 1311963106.7, Max: 1311963106.9, Diff: 0.2]
    [Code Root Fixup: 0.0 ms]
    [Clear CT: 1.5 ms]
    [Other: 3.6 ms]
       [Choose CSet: 0.0 ms]
       [Ref Proc: 1.8 ms]
       [Ref Enq: 0.1 ms]
       [Free CSet: 0.7 ms]
    [Eden: 372.0M(372.0M)->0.0B(368.0M) Survivors: 36.0M->40.0M Heap: 6631.7M(8192.0M)->6262.8M(8192.0M)]
  [Times: user=0.42 sys=0.01, real=0.05 secs]
 2014-10-08T04:34:59.344-0400: 1311735.724: [GC pause (mixed), 0.0666200 secs]
    [Parallel Time: 60.4 ms, GC Workers: 8]
       [GC Worker Start (ms): Min: 1311735724.4, Avg: 1311735724.5, Max: 1311735724.5, Diff: 0.2]
       [Ext Root Scanning (ms): Min: 10.1, Avg: 13.0, Max: 19.3, Diff: 9.2, Sum: 104.3]
       [Update RS (ms): Min: 0.0, Avg: 4.5, Max: 6.6, Diff: 6.6, Sum: 35.6]
          [Processed Buffers: Min: 0, Avg: 9.1, Max: 20, Diff: 20, Sum: 73]
       [Scan RS (ms): Min: 3.5, Avg: 4.8, Max: 5.1, Diff: 1.6, Sum: 38.7]
       [Object Copy (ms): Min: 37.1, Avg: 37.6, Max: 38.4, Diff: 1.3, Sum: 301.1]
       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
       [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.1]
       [GC Worker Total (ms): Min: 60.0, Avg: 60.1, Max: 60.2, Diff: 0.2, Sum: 481.1]
       [GC Worker End (ms): Min: 1311735784.5, Avg: 1311735784.6, Max: 1311735784.7, Diff: 0.2]
    [Code Root Fixup: 0.0 ms]
    [Clear CT: 2.2 ms]
    [Other: 4.0 ms]
       [Choose CSet: 0.1 ms]
       [Ref Proc: 1.9 ms]
       [Ref Enq: 0.0 ms]
       [Free CSet: 0.9 ms]
    [Eden: 372.0M(372.0M)->0.0B(368.0M) Survivors: 36.0M->40.0M Heap: 6633.2M(8192.0M)->6254.9M(8192.0M)]

The reason for this unexpected behaviour is that G1 tries to select minimal number of old regions in the collection-set so that it can do the evacuation pauses within the specified pause time goal. Due to this, it fails to clean up the garbage in the old regions, and hence the increased heap usage.

To overcome this side effect with G1, if one does not have very strict pause time goals, the young generation size can be increased so that most of the die-young objects get collected in the young regions itself and don't get promoted to the old generation. One can also try relaxing the pause time goals so that G1 includes more number of old regions in the collection set and good amount of garbage gets collected with the evacuation pauses.

Friday Oct 10, 2014

Running on a 64bit platform and still running out of memory

It sounds strange that one is running with 64bit JVM but still running out of native memory or Java Heap space. On a 64bit machine, with 64bit JVM we get almost unlimited memory then how is it possible to run out of memory? Well, it may happen in certain situations.

HotSpot JVM has a feature called CompressedOops with which it uses 32-bit (compressed) pointers on 64-bit platforms to have smaller footprints and better performance on 64-bit platforms. 64-bit address values are stored into 32-bit pointers using an encoding and are retrieved back using the corresponding decoding.

CompressedOops implementation tries to allocate Java Heap using different strategies based on the heap size and the platform it runs on. If the heap size is less than 4Gb then it tries to allocate it into low virtual address space (below 4Gb) so that the compressed oops can be used without encoding/decoding. If it fails or heap size > 4Gb then it tries to allocate the heap below 32Gb to use zero based compressed oops. If this also fails then it switches to regular compressed oops with narrow oop base.

The heap base where the Java Heap would be based at is computed by the JVM function Universe::preferred_heap_base():

Heap base computation depends on the heap size and the HeapBaseMinAddress value for the platform we are running on. If the heap_size+HeapBaseMinAddress can fit in the first 4G address space, then heap is allocated below 4G. Then it checks if it can fit under 32G boundary otherwise java heap is allocated in the higher addresses.

In doing so, it puts a cap on the size that the native memory can grow up to, and also limits the growth of the Java Heap, and this may lead to out of memory errors for memory intensive applications. The following process map shows that the java heap is allocated just next to the native heap.

0000000100000000          8K r-x--  /sw/.es-base/sparc/pkg/jdk-1.7.0_60/bin/sparcv9/java
0000000100100000          8K rwx--  /sw/.es-base/sparc/pkg/jdk-1.7.0_60/bin/sparcv9/java
0000000100102000         56K rwx--    [ heap ]
0000000100110000       2624K rwx--    [ heap ]   <--- native Heap
00000001FB000000      24576K rw---    [ anon ]   <--- Java Heap starts here
0000000200000000    1396736K rw---    [ anon ]
0000000600000000     700416K rw---    [ anon ]

Workarounds for this situation:
1. Run with -XX:-UseCompressedOops (keeping in mind that you would lose the performance benefit that comes with enabling CompressedOops). This will instruct the JVM to run without the Compressed Oops feature and it would not try to fit the Java Heap in the first 4GB of heap.
2. Keep the CompressedOops and set the base of the Java Heap with the JVM option -XX:HeapBaseMinAddress=n to specify the address where the Java Heap should start from. If you are running out of native memory, set it to a higher address so that there is enough room available for the native mallocs, and if Java Heap is getting exhausted then set it to a lower value. And if you need java heap to be allocated beyond 4GB or 32GB then HeapBaseMinAddress should be set to a value so that HeapBaseMinAddress + heap size > 4GB or 32GB.
The default value of HeapBaseMinAddress on most of the platforms is 2GB.

Let's run a simple test program on solaris-sparc 10 and solaris-x64 11 and see where the Java Heap is placed on these platforms when using CompressedOops.

On Solaris-sparc 10 with JDK 1.7.0_67

% uname -a
SunOS svr 5.10 Generic_148888-03 sun4v sparc SUNW,Sun-Fire-T200

% java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b32)
Java HotSpot(TM) Server VM (build 24.65-b04, mixed mode)

% java -d64 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal TestClass &
[3] 22638
% -XX:InitialHeapSize=266338304 -XX:MaxHeapSize=4261412864 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal -XX:+UseCompressedOops -XX:+UseParallelGC
[Global flags]
    uintx AdaptivePermSizeWeight                    = 20              {product}
    uintx GCTimeRatio                               = 99              {product}
    uintx HeapBaseMinAddress                        = 4294967296      {pd product}
    bool HeapDumpAfterFullGC                        = false           {manageable}
    Heap Parameters:
ParallelScavengeHeap [ PSYoungGen [ eden =  [0xffffffff1e800000,0xffffffff1ebf5d88,0xffffffff22a00000] , from =  [0xffffffff235000
00,0xffffffff23500000,0xffffffff24000000] , to =  [0xffffffff22a00000,0xffffffff22a00000,0xffffffff23500000]  ] PSOldGen [  [0xfffffffe75400000,0xfffffffe75400000,0xfffffffe7fc00000]  ] PSPermGen [  [0xfffffffe70400000,0xfffffffe706843a0,0xfffffffe71c00000]  ]

The default value of HeapBaseMinAddress is 4GB here and the Java Heap is allocated in the higher virtual address space.

On Solaris-x64 11 with JDK 1.7.0_67

bash-4.1$ uname -a
SunOS hsdev-15 5.11 11.0 i86pc i386 i86pc

bash-4.1$ java -d64 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal TestClass &
[1] 6049
bash-4.1$ -XX:InitialHeapSize=805015296 -XX:MaxHeapSize=12880244736 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal -XX:+UseCompressedOops -XX:+UseParallelGC
[Global flags]
    uintx GCTimeRatio                               = 99              {product}
    uintx HeapBaseMinAddress                        = 268435456       {pd product}
    bool HeapDumpAfterFullGC                       = false           {manageable}
bash-4.1$ java -d64 -classpath $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB 6049
Attaching to process 6049, please wait...
hsdb> universe
Heap Parameters:
ParallelScavengeHeap [ PSYoungGen [ eden =  [0x0000000700000000,0x00000007007ae238,0x000000070c000000] , from =  [0x000000070e0000
00,0x000000070e000000,0x0000000710000000] , to =  [0x000000070c000000,0x000000070c000000,0x000000070e000000]  ] PSOldGen [  [0x0000000500400000,0x0000000500400000,0x0000000520200000]  ] PSPermGen [  [0x00000004fb200000,0x00000004fb483380,0x00000004fc800000]  ]

bash-4.1$ pmap 6049
0000000000400000          4K r-x--  /java/bin/amd64/java
0000000000410000          4K rw---  /java/bin/amd64/java
0000000000411000       2288K rw---    [ heap ]
00000004FB200000      22528K rw---    [ anon ]
0000000500400000     522240K rw---    [ anon ]
0000000700000000     262144K rw---    [ anon ]

Here, the default value of HeapBaseMinAddress is 256MB and the Java Heap is allocated in the 32G address space.

On Solaris-x64 11 with JDK 1.8.0_20

bash-4.1$ java -version
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
bash-4.1$ java -d64 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal TestClass &
[1] 19069
bash-4.1$ -XX:InitialHeapSize=805015296 -XX:MaxHeapSize=12880244736 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
[Global flags]
    uintx AdaptiveSizeDecrementScaleFactor          = 4                                   {product}
    uintx GCTimeRatio                               = 99                                  {product}
    uintx HeapBaseMinAddress                        = 2147483648                          {pd product}
    bool HeapDumpAfterFullGC                        = false                               {manageable}
bash-4.1$ java -d64 -classpath $JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB 19069
Attaching to process 19069, please wait...
hsdb> universe
Heap Parameters:
ParallelScavengeHeap [ PSYoungGen [ eden =  [0x00000006c0200000,0x00000006c190a4d8,0x00000006cc200000] , from =  [0x00000006ce2000
00,0x00000006ce200000,0x00000006d0200000] , to =  [0x00000006cc200000,0x00000006cc200000,0x00000006ce200000]  ] PSOldGen [  [0x00000004c0400000,0x00000004c0400000,0x00000004e0400000]  ]  ]

bash-4.1$ pmap 19069
19069:  java -d64 -XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal TestClass
0000000000411000       3696K rw---    [ heap ]
00000004C0400000     524288K rw---    [ anon ]
00000006C0200000     262144K rw---    [ anon ]
00000007C0000000        512K rw---    [ anon ]

Here, HeapBaseMinAddress is 2GB by default and heap is allocated in the first 32GB address space.

The default value of HeapBaseMinAddress for Solaris-x64 systems in jdk1.7.0_xx is 256M. It has been changed to 2G in jdk8 with the following bug fix: 


I share my work experiences as a JVM Sustaining Engineer through this blog


« December 2016