Saturday Jan 04, 2014

UseLargePages on Linux

There is a JVM option UseLargePages (introduced in JDK 5.0u5) that can be used to request large memory pages from the system if large pages memory is supported by the system. The goal of the large page support is to optimize processor Translation-Lookaside Buffers and hence increase performance.

Recently we saw few instances of HotSpot crashes with JDK7 on the Linux platform when using the large memory pages.

8013057: assert(_needs_gc || SafepointSynchronize::is_at_safepoint()) failed: only read at safepoint
https://bugs.openjdk.java.net/browse/JDK-8013057

8007074: SIGSEGV at ParMarkBitMap::verify_clear()
https://bugs.openjdk.java.net/browse/JDK-8007074

Cause: The cause of these crashes is the way mmap works on the Linux platform. If the large page support is enabled on the system, commit_memory() implementation of HotSpot on Linux platform tries to commit the earlier reserved memory with 'mmap' call using the large pages. If there are not enough number of large pages available, the mmap call fails releasing the reserved memory, allowing the same memory region to be used for other allocations. This causes the same memory region to be used for different purposes and leads to unexpected behaviors.

Symptoms: With the above mentioned issue, we may see crashes with stack traces something like this:
 V  [libjvm.so+0x759a1a]  ParMarkBitMap::mark_obj(HeapWord*, unsigned long)+0x7a
 V  [libjvm.so+0x7a116e]  PSParallelCompact::MarkAndPushClosure::do_oop(oopDesc**)+0xce
 V  [libjvm.so+0x485197]  frame::oops_interpreted_do(OopClosure*, RegisterMap const*, bool)+0xe7
 V  [libjvm.so+0x863a4a]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x15a
 V  [libjvm.so+0x77c97e]  ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xae
 V  [libjvm.so+0x4b7ec0]  GCTaskThread::run()+0x130
 V  [libjvm.so+0x748f90]  java_start(Thread*)+0x100

Here the crash happens while writting to an address 0x00007f2cf656eef0 in the mapped region of ParMarkBitMap. And that memory belongs to the rt.jar (from hs_err log file):
7f2cf6419000-7f2cf65d7000 r--s 039dd000 00:31 106601532                  /jdk/jdk1.7.0_21/jre/lib/rt.jar

Due to this bug, the same memory region got mapped for two different allocations and caused this crash.

Fixes:

8013057 strengthened the error handling of mmap failures on Linux platform and also added some diagnostic information for these failures. It is fixed in 7u40.

8007074 fixes the reserved memory mapping loss issue when using the large pages on the Linux platform. Details on this fix: http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-July/010117.html. It is fixed in JDK 8 and will also be included into 7u60, scheduled to be released in May 2014.

Workarounds:

1. Disable the use of large pages with JVM option -XX:-UseLargePages.

2. Increase the number of large pages available on the system. By having the sufficient number of large pages on the system, we can reduce the risk of memory commit failures and thus reduce the chances of hitting the large pages issue. Please see the details on how to configure the number of large pages here:
http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html

Other related fixes:

8026887: Make issues due to failed large pages allocations easier to debug
https://bugs.openjdk.java.net/browse/JDK-8026887

With the fix of 8013057, diagnostic information for the memory commit failures was added. It printed the error messages something like this:
os::commit_memory(0x00000006b1600000, 352321536, 2097152, 0) failed;
error='Cannot allocate memory' (errno=12)

With this fix of 8026887, this error message has been modified to suggest that the memory commit failed due to the lack of large pages, and it now looks like the following:
os::commit_memory(0x00000006b1600000, 352321536, 2097152, 0) failed;
error='Cannot allocate large pages, falling back to small pages' (errno=12)

This change has been integrated into 7u51.

The fix for 8007074 will be available in 7u60 and could not be included into 7u51, so this change (JDK-8026887) makes the error messages printed for the large pages related commit memory failures more informative. If we see these messages in the JVM logs that indicates that we have the risk of hitting the unexpected behaviors due to the bug 8007074.

8024838: Significant slowdown due to transparent huge pages
https://bugs.openjdk.java.net/browse/JDK-8024838

With the fix of 8007074, significant performance degradation was detected. This regression has been fixed with JDK-8024838 in JDK 8 and will also be included in JDK 7u60.

About

poonam

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today