Saturday Jan 04, 2014

UseLargePages on Linux

There is a JVM option, UseLargePages (introduced in JDK 5.0u5), that requests large memory pages from the operating system if the system supports them. The goal of large page support is to make better use of the processor's Translation-Lookaside Buffers (TLBs) and hence increase performance.
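To check whether a running JVM actually has this option turned on (for example, to confirm that a command-line flag was picked up), the flag can be read through the HotSpot diagnostic MXBean. This is a minimal sketch assuming a HotSpot-based JDK 7 or later, where com.sun.management is available; the class name LargePagesFlag is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Sketch: report whether the running HotSpot JVM has UseLargePages enabled.
// Requires a HotSpot-based JDK 7+ (com.sun.management is JDK-specific).
class LargePagesFlag {
    static VMOption useLargePages() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist.
        return hotspot.getVMOption("UseLargePages");
    }

    public static void main(String[] args) {
        VMOption opt = useLargePages();
        // getValue() is "true" or "false"; getOrigin() shows whether the value
        // came from the command line (VM_CREATION) or was left at DEFAULT.
        System.out.println("UseLargePages=" + opt.getValue()
                + " (origin: " + opt.getOrigin() + ")");
    }
}
```

Running it as `java -XX:-UseLargePages LargePagesFlag` should report `false` with origin VM_CREATION.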

Recently we saw a few instances of HotSpot crashes with JDK 7 on Linux when large memory pages were in use.

8013057: assert(_needs_gc || SafepointSynchronize::is_at_safepoint()) failed: only read at safepoint
https://bugs.openjdk.java.net/browse/JDK-8013057

8007074: SIGSEGV at ParMarkBitMap::verify_clear()
https://bugs.openjdk.java.net/browse/JDK-8007074

Cause: These crashes are caused by the way mmap works on Linux. If large page support is enabled on the system, HotSpot's Linux implementation of commit_memory() tries to commit previously reserved memory with an 'mmap' call that uses large pages. If not enough large pages are available, the mmap call fails and the reserved memory is released, so the same address range can then be handed out for other allocations. The region ends up being used for two different purposes, which leads to unexpected behavior.
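The following Java program is a toy model of that sequence, not HotSpot code: the class, its methods, and the simplified placement rule are hypothetical stand-ins for the kernel's real behavior. It only illustrates why a failed large-page commit that discards the reservation lets an unrelated mapping land in a range the JVM still believes it owns.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HotSpot code): committing a reserved range with large pages
// drops the reservation first, so when large pages run out the range is left
// unmapped and can be handed out again.
class ToyAddressSpace {
    static final class Mapping {
        final long start, end; // end is exclusive
        final String owner;
        Mapping(long start, long end, String owner) {
            this.start = start; this.end = end; this.owner = owner;
        }
    }

    final List<Mapping> mappings = new ArrayList<>();
    final long largePageSize;
    long freeLargePages;

    ToyAddressSpace(long largePageSize, long freeLargePages) {
        this.largePageSize = largePageSize;
        this.freeLargePages = freeLargePages;
    }

    boolean isFree(long start, long end) {
        for (Mapping m : mappings) {
            if (start < m.end && m.start < end) return false; // ranges overlap
        }
        return true;
    }

    void reserve(long start, long len, String owner) {
        mappings.add(new Mapping(start, start + len, owner));
    }

    // Models the problematic commit: the old reservation is removed before we
    // know whether enough large pages exist to back the new mapping.
    boolean commitWithLargePages(long start, long len, String owner) {
        mappings.removeIf(m -> m.start == start && m.end == start + len);
        long needed = (len + largePageSize - 1) / largePageSize;
        if (needed > freeLargePages) {
            return false; // caller still believes it owns [start, start + len)
        }
        freeLargePages -= needed;
        mappings.add(new Mapping(start, start + len, owner));
        return true;
    }

    // Simplified placement for an unrelated mapping (e.g. a file like rt.jar):
    // the lowest free 4 KB-aligned address at or above the hint.
    long mapAnywhere(long hint, long len, String owner) {
        long addr = hint;
        while (!isFree(addr, addr + len)) addr += 4096;
        mappings.add(new Mapping(addr, addr + len, owner));
        return addr;
    }
}
```

With a single free 2 MB large page, committing a 4 MB reservation fails, and the next unrelated mapping is placed at the start of the old reservation: the same kind of overlap that the hs_err excerpt below shows.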

Symptoms: Because of this issue, we may see crashes with stack traces similar to this one:
 V  [libjvm.so+0x759a1a]  ParMarkBitMap::mark_obj(HeapWord*, unsigned long)+0x7a
 V  [libjvm.so+0x7a116e]  PSParallelCompact::MarkAndPushClosure::do_oop(oopDesc**)+0xce
 V  [libjvm.so+0x485197]  frame::oops_interpreted_do(OopClosure*, RegisterMap const*, bool)+0xe7
 V  [libjvm.so+0x863a4a]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x15a
 V  [libjvm.so+0x77c97e]  ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int)+0xae
 V  [libjvm.so+0x4b7ec0]  GCTaskThread::run()+0x130
 V  [libjvm.so+0x748f90]  java_start(Thread*)+0x100

Here the crash happens while writing to address 0x00007f2cf656eef0 in the mapped region of ParMarkBitMap, but that memory belongs to rt.jar (from the hs_err log file):
7f2cf6419000-7f2cf65d7000 r--s 039dd000 00:31 106601532                  /jdk/jdk1.7.0_21/jre/lib/rt.jar

Due to this bug, the same memory region was mapped for two different allocations, which caused this crash.
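Confirming that the faulting address really lies inside the rt.jar mapping is plain unsigned arithmetic on the maps line. A small sketch, assuming the line has the /proc/<pid>/maps layout shown above (start-end, permissions, offset, device, inode, path); the class name MapsLookup is hypothetical.

```java
// Sketch: check whether an address falls inside one memory-map line from an
// hs_err file ("start-end perms offset dev inode path", /proc/<pid>/maps style).
class MapsLookup {
    // Returns true if addr lies in [start, end) of the given maps line.
    static boolean contains(String mapsLine, long addr) {
        String range = mapsLine.trim().split("\\s+")[0];
        String[] bounds = range.split("-");
        long start = Long.parseUnsignedLong(bounds[0], 16);
        long end = Long.parseUnsignedLong(bounds[1], 16);
        // Compare as unsigned so that very high addresses are handled too.
        return Long.compareUnsigned(addr, start) >= 0
                && Long.compareUnsigned(addr, end) < 0;
    }

    public static void main(String[] args) {
        String line = "7f2cf6419000-7f2cf65d7000 r--s 039dd000 00:31 106601532 "
                + "/jdk/jdk1.7.0_21/jre/lib/rt.jar";
        long fault = Long.parseUnsignedLong("00007f2cf656eef0", 16);
        System.out.println(contains(line, fault)); // prints true
    }
}
```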

Fixes:

8013057 strengthened the error handling of mmap failures on the Linux platform and also added diagnostic information for these failures. It is fixed in 7u40.

8007074 fixes the loss of the reserved memory mapping when large pages are used on the Linux platform. Details on this fix: http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-July/010117.html. It is fixed in JDK 8 and will also be included in 7u60, scheduled to be released in May 2014.

Workarounds:

1. Disable the use of large pages with the JVM option -XX:-UseLargePages.

2. Increase the number of large pages available on the system. Having a sufficient number of large pages on the system reduces the risk of memory commit failures and thus the chance of hitting this issue. Details on how to configure the number of large pages are here:
http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html
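Before raising the number of large pages, it helps to see how many are free and roughly how many a heap of a given size would need. A minimal sketch, assuming the standard Linux /proc/meminfo fields HugePages_Free and Hugepagesize (the latter reported in kB). It counts only the Java heap, so its estimate is a lower bound; the class name HugePagesCheck is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Sketch: read hugepage counters from /proc/meminfo and estimate how many
// large pages a given number of bytes needs (heap only, so a lower bound).
class HugePagesCheck {
    // Returns the first numeric value of a meminfo field, or -1 if absent.
    static long field(List<String> meminfo, String name) {
        for (String line : meminfo) {
            if (line.startsWith(name + ":")) {
                String[] parts = line.substring(name.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // Large pages needed to back `bytes`, rounded up (overflow-safe).
    static long pagesNeeded(long bytes, long pageSizeBytes) {
        return bytes / pageSizeBytes + (bytes % pageSizeBytes == 0 ? 0 : 1);
    }

    public static void main(String[] args) throws IOException {
        List<String> meminfo = Files.readAllLines(Paths.get("/proc/meminfo"));
        long free = field(meminfo, "HugePages_Free");
        long pageKb = field(meminfo, "Hugepagesize");
        if (free < 0 || pageKb <= 0) {
            System.out.println("No hugepage information in /proc/meminfo");
            return;
        }
        // Size to check: first argument in bytes, else this JVM's max heap.
        long bytes = args.length > 0 ? Long.parseLong(args[0])
                                     : Runtime.getRuntime().maxMemory();
        System.out.println("HugePages_Free=" + free + ", needed for "
                + bytes + " bytes: " + pagesNeeded(bytes, pageKb * 1024));
    }
}
```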

Other related fixes:

8026887: Make issues due to failed large pages allocations easier to debug
https://bugs.openjdk.java.net/browse/JDK-8026887

With the fix for 8013057, diagnostic information for memory commit failures was added. It prints error messages like this:
os::commit_memory(0x00000006b1600000, 352321536, 2097152, 0) failed;
error='Cannot allocate memory' (errno=12)

With the fix for 8026887, this error message has been changed to indicate that the memory commit failed because of a lack of large pages; it now looks like this:
os::commit_memory(0x00000006b1600000, 352321536, 2097152, 0) failed;
error='Cannot allocate large pages, falling back to small pages' (errno=12)

This change has been integrated into 7u51.

The fix for 8007074 will be available in 7u60 and could not be included in 7u51, so this change (JDK-8026887) makes the error messages printed for large-page-related commit memory failures more informative. If we see these messages in the JVM logs, we are at risk of hitting the unexpected behavior caused by bug 8007074.
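To find these messages in a saved JVM log, a simple line scan is enough. The sketch below also estimates how many large pages each failed commit asked for, under the assumption (inferred from the messages quoted above, not from a documented format) that the third argument of os::commit_memory is the alignment hint and equals the large page size, 2097152 bytes in the example. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: scan JVM output for os::commit_memory failures. Assumption: the
// third argument is the alignment hint, i.e. the large page size.
class CommitFailureScanner {
    private static final Pattern COMMIT_FAILED = Pattern.compile(
            "os::commit_memory\\((0x[0-9a-fA-F]+), (\\d+), (\\d+), (\\d+)\\) failed");

    // One "address: N large pages" entry per failed commit found.
    static List<String> scan(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = COMMIT_FAILED.matcher(line);
            if (m.find()) {
                long size = Long.parseLong(m.group(2));
                long alignment = Long.parseLong(m.group(3));
                long pages = alignment > 0
                        ? size / alignment + (size % alignment == 0 ? 0 : 1) : 0;
                hits.add(m.group(1) + ": " + pages + " large pages");
            }
        }
        return hits;
    }

    // True if the 7u51-style message naming large pages is present.
    static boolean mentionsLargePages(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("Cannot allocate large pages")) return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0])); // JVM log path
        scan(lines).forEach(System.out::println);
        if (mentionsLargePages(lines)) {
            System.out.println("Large-page commit failures found; see JDK-8007074");
        }
    }
}
```

For the message quoted above, 352321536 / 2097152 gives 168 large pages for that single commit.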

8024838: Significant slowdown due to transparent huge pages
https://bugs.openjdk.java.net/browse/JDK-8024838

The fix for 8007074 introduced a significant performance degradation. This regression has been fixed with JDK-8024838 in JDK 8 and will also be included in JDK 7u60.

Sunday Jul 12, 2009

Important CMS Fixes

In this entry, I would like to talk about some CMS (Concurrent Mark Sweep) issues, their workarounds and the releases in which they are fixed.


* 6558100: CMS crash following parallel work queue overflow.
This crash is seen when -XX:+ParallelRefProcEnabled is set.
The workaround is to use -XX:-ParallelRefProcEnabled.
This is fixed in 1.4.2_17, 5.0u14 and 6u4.


* 6578335: CMS: BigApps failure with -XX:CMSInitiatingOccupancyFraction=1 -XX:+CMSMarkStackOverflowALot.
For clarity's sake, this issue was broken into three separate bugs: 6722112, 6722113 and 6722116.


* 6722112: CMS: Incorrect encoding of overflown ObjectArrays during concurrent precleaning.
The workaround is to use -XX:-CMSPrecleaningEnabled. Increasing the size of the marking stack via -XX:CMSMarkStackSize and -XX:CMSMarkStackSizeMax also reduces the probability of hitting this bug.
This is fixed in 1.4.2_19-rev-b09, 5.0u18-rev-b03, 6u7-rev-b15 and 6u12.


* 6722113: CMS: Incorrect overflow handling during Precleaning of Reference lists.
The workaround is to use the options -XX:-CMSPrecleanRefLists1 and -XX:-CMSPrecleanRefLists2.
This is fixed in 6u14, 5.0u18-rev-b05 and 6u13-rev-b05.


* 6722116: CMS: Incorrect overflow handling when using parallel concurrent marking.
The workaround is to switch off parallel concurrent marking with -XX:-CMSConcurrentMTEnabled. Increasing the CMS marking stack size (-XX:CMSMarkStackSize, -XX:CMSMarkStackSizeMax) also reduces the probability of hitting this bug.
This is fixed in 6u7-rev-b15 and 6u12.



So, if you face any of the above crashes, please upgrade to the JDK version in which it is fixed. If an upgrade is not possible, the workaround can be used to avoid the issue.
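To tell whether a given JDK already contains one of these fixes, the "fixed in" releases above can be compared with a java.version string. A minimal sketch, assuming pre-JDK 9 version strings of the form 1.F.M_U (so 6u4 is written "1.6.0_04") and assuming that a release family newer than every listed one already has the fix. Revision releases such as 6u7-rev-b15 are not handled, and the class name FixCheck is hypothetical.

```java
// Sketch: is a fix present in a pre-JDK 9 release such as "1.6.0_14"?
// Revision releases and the JDK 9+ version scheme are not handled.
class FixCheck {
    // Parses "1.F.M_U[-suffix]" into {F, M, U}; a missing "_U" means update 0.
    static int[] parse(String version) {
        if (!version.startsWith("1.")) {
            throw new IllegalArgumentException("pre-JDK 9 version expected: " + version);
        }
        String[] mainAndUpdate = version.split("_", 2);
        String[] parts = mainAndUpdate[0].split("\\.");
        int feature = Integer.parseInt(parts[1]);
        int micro = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        int update = 0;
        if (mainAndUpdate.length > 1) {
            // Keep only the leading digits, e.g. "14-ea" -> "14".
            String digits = mainAndUpdate[1].replaceAll("\\D.*", "");
            update = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return new int[] {feature, micro, update};
    }

    static boolean isFixed(String running, String... fixedIn) {
        int[] r = parse(running);
        int newestListedFeature = 0;
        for (String release : fixedIn) {
            int[] f = parse(release);
            newestListedFeature = Math.max(newestListedFeature, f[0]);
            if (r[0] == f[0] && r[1] == f[1]) {
                return r[2] >= f[2]; // same family: compare update numbers
            }
        }
        // Assumption: a family newer than every listed one already has the fix.
        return r[0] > newestListedFeature;
    }

    public static void main(String[] args) {
        String running = args.length > 0 ? args[0] : "1.6.0_14"; // example version
        // 6558100 is fixed in 1.4.2_17, 5.0u14 and 6u4.
        System.out.println("6558100 fixed in " + running + ": "
                + isFixed(running, "1.4.2_17", "1.5.0_14", "1.6.0_04"));
    }
}
```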

Please note that a Java SE for Business support contract is required to use Revision Releases (e.g. 1.4.2_19-rev-b09).

Saturday Dec 27, 2008

More on Windows Crash Dumps for Java Processes...

User Mode Process Dumper

Another very good tool, 'User Mode Process Dumper', can be used to collect user-mode dumps for crashing Java processes.

You can get it and install it from here:
http://www.microsoft.com/downloads/details.aspx?FamilyID=E089CA41-6A87-40C8-BF69-28AC08570B7E&displaylang=en

After installation, run it from the Control Panel by clicking 'Process Dumper'. Add the name of the application to be monitored (java.exe in our case).

Set the process monitoring rules by clicking the 'Rules' button.

Here, specify the folder where the dump should be created when the process crashes. Select the 'Access Violation' exception so that a dump is created whenever an access violation occurs in the process.

Now, let's run the same test program.

D:\demo>java test

You will see a message box indicating that the crash dump is being created.

Then an hs_err log file is also written:

# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x10001032, pid=8112, tid=6788
#
# Java VM: OpenJDK Client VM (14.0-b05-internal-debug mixed mode windows-x86 )
# Problematic frame:
# C [test.dll+0x1032]
#
# An error report file with more information is saved as:
# D:\demo\hs_err_pid8112.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

So, using this tool, we get both the crash dump and the hs_err log file for the crash. :) The crash dump created this way shows the crash in the native method test.f().

OnError JVM option

The JVM option 'OnError' can be used to perform any action or invoke any tool when a fatal error occurs in the VM.

For example:
-XX:OnError="drwtsn32 -p %p"
-XX:OnError="userdump -p %p"

This invokes drwtsn32 or userdump whenever a fatal error occurs in the VM; %p is replaced with the ID of the crashing process.
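When the crashing program is started from another Java program rather than from a shell, the OnError value can be passed as one argument without the surrounding quotes, which are only there for the shell. A minimal sketch; the class name CrashDumpLauncher is hypothetical, userdump must be installed and on the PATH for the action to do anything, and the test class must be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: launch "java test" with an OnError action. ProcessBuilder passes
// each list element as one argument, so no shell quoting is written here.
class CrashDumpLauncher {
    static List<String> command(String onErrorAction, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-XX:OnError=" + onErrorAction); // the JVM expands %p to the pid
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command("userdump -p %p", "test"));
        pb.inheritIO(); // show the child's hs_err banner in this console
        int exit = pb.start().waitFor();
        System.out.println("child exited with " + exit);
    }
}
```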

UseOSErrorReporting JVM option

There is a new option, UseOSErrorReporting, in JDK 7 that passes the exception to the OS after the JVM has handled it and written the hs_err log file; the OS then invokes the default system debugger. For example, if Dr. Watson is set as the default debugger, then with this option Dr. Watson is invoked after the hs_err file is written and creates a crash dump for the exception.

This work is done under CR 6227246: Improve Windows unhandled structured exception reporting

Sunday Oct 19, 2008

Windows crash dumps for Java Processes

A Windows crash dump is a memory dump of a process running on a Windows system. These dumps can be very useful for debugging Java process crashes. In this entry I discuss how to collect usable crash dumps for Java process crashes on Windows machines, which can later be analyzed using Windbg or other 'Debugging Tools for Windows'.

I have a simple Java class, test, which uses the native library test.dll through JNI. test.dll implements a native method that accesses a null pointer and causes a crash. The complete source of this crashing program is here


Windows NT, 2000 and XP

Let's first run this program on a Windows NT/2000/XP machine:

D:\demo>java test
#
# An unexpected error has been detected by Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x10011f68, pid=2020, tid=640
#
# Java VM: Java HotSpot(TM) Client VM (10.0-b19 mixed mode, sharing windows-x86)
# Problematic frame:
# C [test.dll+0x11f68]
#
# An error report file with more information is saved as:
# D:\demo\hs_err_pid2020.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Running this program created an hs_err log file in the same folder. The error log file contains the following stack trace:

C [test.dll+0x1032]
j test.f()V+0
j test.main([Ljava/lang/String;)V+19
v ~StubRoutines::call_stub
V [jvm.dll+0x873ed]
V [jvm.dll+0xdfb96]
V [jvm.dll+0x872be]
V [jvm.dll+0x8e501]
C [java.exe+0x14c5]
C [java.exe+0x69cd]
C [kernel32.dll+0x16fd7]

Now, let's run it with the -XX:+ShowMessageBoxOnError JVM option. I get the following message box:

Attaching Visual Studio or Windbg to this process, or collecting a crash dump with Dr. Watson and opening that dump with Windbg, shows the following call trace:

ntdll!DbgBreakPoint
jvm!VMError::show_message_box+0x7f
jvm!VMError::report_and_die+0xe7
jvm!report_error+0x2d
jvm!topLevelExceptionFilter+0x54d
jvm!os::os_exception_wrapper+0x7d
msvcr71!except_handler3+0x61
jvm!JavaCalls::call+0x23
jvm!jni_invoke_static+0xb1
jvm!jni_CallStaticVoidMethod+0x86
java+0x209e
java+0x898f
kernel32!GetModuleFileNameA+0x1b4

This shows the JVM's error handling frames and does not show that the crash happened in the function test.f().

This is because, by default, first chance exceptions are not sent to the system debugger; debuggers receive only second chance exceptions. The exception in test.f() was a first chance exception that was handled by the JVM. For details on first and second chance exceptions, see: http://support.microsoft.com/kb/286350

So how can we collect crash dumps that contain the correct stack trace of the crash? Let's try 'adplus'.

Start adplus in crash mode:
D:\windbg>adplus -crash -pn java.exe

and then start your Java process with -XX:+ShowMessageBoxOnError.

By default, adplus creates mini dumps on first chance exceptions and full memory dumps on second chance exceptions. This can be changed; details here: http://support.microsoft.com/kb/286350

'adplus' creates dump files in a folder like Crash_Mode__Date_08-12-2008__Time_15-12-56PM. Using Windbg, open the dump created at the first chance exception; it shows the crashing frame as:

test!Java_test_f+0x22

Ah! that's what I was looking for.



Windows 2008 and Windows Vista

Dr. Watson is not available on Windows 2008 and Windows Vista.

Starting with Windows Server 2008 and Windows Vista Service Pack 1 (SP1), Windows has a new error reporting mechanism called 'Windows Error Reporting' (WER). WER can be configured so that full user-mode dumps are collected and stored locally after a user-mode application crashes. This feature is not enabled by default, and enabling it requires administrator privileges. To enable and configure it, we need to set registry values under the
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps key.

Details here:

http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx

And yes, here also we can use 'adplus' to collect Crash Dumps for First Chance and Second Chance Exceptions.



Enjoy debugging!

About

poonam
