Friday Feb 06, 2015

Thread Stuck at readBytesPinned / writeBytesPinned on JRockit

Introduction

Sometimes you will see one or more threads in your application hang while executing a method called either readBytesPinned or writeBytesPinned. This is a common occurrence and does not indicate any JVM-level issue. In this post I will explain what these methods do and why they might block.

Background

Before explaining what these methods do, it is important to introduce the idea of object pinning. Pinning is where we temporarily tag an object on the heap so that the garbage collector will not try to move the object until we remove the tag. Normally, an object might be moved from one address to another if it is being promoted (i.e. being moved from young space to old space) or as part of compaction (defragmentation). But if an object is pinned, the GC will not try to move it until it is unpinned.

So why would we want to pin an object? There are several scenarios where pinning is important, but in the case of readBytesPinned or writeBytesPinned, it is simply a performance optimization. Pinning a buffer (a byte array) during an I/O operation allows us to hand its address directly to the operating system. Because the buffer is pinned, we do not need to worry that the garbage collector will try to move it to a different address before the I/O operation finishes. If we were not able to pin the buffer, we would need to allocate additional native (off-heap) memory to pass to the OS's native I/O call and also copy data between the on-heap and off-heap buffers. So by pinning the buffer to a constant address on the heap, we avoid both an otherwise redundant native memory allocation and a copy. If this sounds similar to the use case for NIO’s direct buffers, you've got the right idea. Basically, JRockit gives you the best of both worlds: the I/O speed of direct I/O operations and the safety/convenience of pure-Java memory management (no off-heap allocations).
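
To make the trade-off concrete, here is a minimal sketch (plain java.nio code, not JRockit internals; the class name and the file-name argument are just for illustration) of the copy-based approach that pinning lets the JVM skip: stage the read in an off-heap direct buffer whose address never moves, then copy the bytes onto the Java heap.

=== CopyBasedRead.java
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class CopyBasedRead {
    public static void main(String[] args) throws Exception {
        byte[] onHeap = new byte[10];     // the array our Java code actually wants filled
        FileInputStream in = new FileInputStream(args[0]);
        try {
            FileChannel ch = in.getChannel();
            // Stage the I/O in off-heap memory: its address never changes, so it
            // is safe to hand to the OS without any pinning.
            ByteBuffer offHeap = ByteBuffer.allocateDirect(onHeap.length);
            ch.read(offHeap);             // the OS writes into the fixed native buffer
            offHeap.flip();
            offHeap.get(onHeap);          // the extra copy back onto the Java heap
        } finally {
            in.close();
        }
    }
}
===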

Let’s Try It!

Now let's try a really contrived example to see what this might look like:

First, we’ll make a named pipe and open it by redirecting some output. (Please don’t forget to kill the cat process when you are done.)

$ mkfifo pipe
$ cat - > pipe &

Now let's make a trivial Java program [1] that tries to read from our new pipe.

import java.io.FileInputStream;

public class PipeRead {
    public static void main(String[] args) throws Exception {
        FileInputStream in = new FileInputStream("pipe");
        in.read(new byte[10]);
    }
}

Finally, we compile and run.

$ javac PipeRead.java
$ java PipeRead

Now if we collect a thread dump, we can see that the main thread is blocked waiting for data (that in this case will never come) from our pipe.

"Main Thread" id=1 idx=0x4 tid=2570 prio=5 alive, in native
    at jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native Method)
    at jrockit/io/FileNativeIO.readBytes(FileNativeIO.java:32)
    at java/io/FileInputStream.readBytes([BII)I(FileInputStream.java)
    at java/io/FileInputStream.read(FileInputStream.java:198)
    at PipeRead.main(PipeRead.java:6)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

If we were to try HotSpot on the exact same testcase, we would see it doing a blocking read just like JRockit does.

"main" prio=10 tid=0x00007f25e4006800 nid=0xa8a runnable [0x00007f25e9c44000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:198)
    at PipeRead.main(PipeRead.java:6)

So even though the top of the stack trace for JRockit has JRockit-specific classes/methods, the JVM itself does not have anything to do with why the thread is stopped. It is simply waiting for input from a data source it is trying to read from.

Troubleshooting

So what should you do when you have a thread that appears stuck in a call to readBytesPinned or writeBytesPinned? That depends entirely on where the application is trying to read data from or write data to.

Let's look at a real-world example of a thread stuck doing a blocking read:

    "ExecuteThread: '2' for queue: 'weblogic.kernel.Default'" id=20 idx=0x2e tid=16946 prio=5 alive, in native, daemon
        at jrockit/net/SocketNativeIO.readBytesPinned(I[BIII)I(Native Method)
        at jrockit/net/SocketNativeIO.socketRead(Ljava/io/FileDescriptor;[BIII)I(Unknown Source)[inlined]
        at java/net/SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I(Unknown Source)[inlined]
        at java/net/SocketInputStream.read([BII)I(SocketInputStream.java:113)[optimized]
        at oracle/net/ns/Packet.receive()V(Unknown Source)[inlined]
        at oracle/net/ns/DataPacket.receive()V(Unknown Source)[optimized]
        at oracle/net/ns/NetInputStream.getNextPacket()V(Unknown Source)[optimized]
        at oracle/net/ns/NetInputStream.read([BII)I(Unknown Source)[inlined]
        at oracle/net/ns/NetInputStream.read([B)I(Unknown Source)[inlined]
        at oracle/net/ns/NetInputStream.read()I(Unknown Source)[optimized]
        at oracle/jdbc/driver/T4CMAREngine.unmarshalUB1()S(T4CMAREngine.java:1099)[optimized]
<rest of stack omitted>

In the above case, you can tell from the stack trace that the JDBC (database) driver is doing a blocking read from a network socket. So the typical next step would be to see if there is a reason why the expected data may have been delayed (or may never arrive at all). For example, the database server we are talking to could be hung, there could be a network issue that is delaying (or even dropping) the database’s response, or there could be some sort of protocol mismatch where both parties believe it is the other side’s turn to talk. Analyzing log files on both sides may provide clues as to what happened. If the issue is reproducible, collecting a network trace and analyzing it with a tool like Wireshark may also prove useful.

Obviously, this is just one of countless scenarios where a thread may get stuck waiting on some external (to the JVM) resource. But other cases should be similar: you must look further down in the stack, determine what the application is waiting for (where it is expecting to receive data from or trying to send data to) and then troubleshoot the issue from there. Sometimes, tools like lsof, truss, or strace can come in handy here. Very often, this troubleshooting involves investigating other processes or even other machines across your network.

Conclusion

Seeing a thread block temporarily at readBytesPinned or writeBytesPinned is completely normal and does not usually indicate a problem. However, if one or more threads block on a call to either of these methods for an unreasonable amount of time, you should examine further down the stack trace and attempt to determine what external resource the Java application is waiting for.

[1] Obviously, this is horrible code. Real production code would include proper exception handling and of course close the FileInputStream when we are finished with it. You have been warned.

Friday Nov 21, 2014

Where did all of these ConstPoolWrapper objects come from?!

Introduction

A very small number of applications using JRockit R28.2.2 and later may suffer performance issues when compared to earlier versions of R28. In this post, I will describe the issue in detail, and explain how to resolve or work around it.

What Changed in R28.2.2

In JRockit, object finalization requires that each instance of a finalizable class be registered with the JVM at object creation time so that the JVM knows to treat the instance differently (add it to the finalizer queue) when the garbage collector determines that the object is no longer live.

While developing R28, we unfortunately introduced a bug where finalizable objects that are instantiated from native code (think JNI's NewObject function) would never get registered. In other words, objects created from JNI, or other native code internal to the JVM, would simply be discarded by the garbage collector as soon as they are determined to be dead. Thankfully, objects created from pure Java code via the new keyword would still be finalized without issue.

While never executing finalizers that are expected to be called is bad enough, this bug indirectly had a much bigger impact. In JRockit, we depend on finalizers to help us prune our shared constant pool (a pool of constants from loaded classes and other long-living strings).

Each time Java code asks for a copy of an individual class's constant pool, we make a copy of the constant pool data for that class and store it on the object heap. This was necessary because JRockit never had a PermGen like HotSpot to keep class meta-data available for direct access from Java. Every copy of the constant pool on the object heap is wrapped in a ConstPoolWrapper object that holds the constant pool data. Each time a new copy is made, we also increment a reference counter for each string in the shared pool so that we can keep track of which strings are still live. When the ConstPoolWrapper is finalized, we decrement the applicable reference counters.
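
To illustrate the pattern in plain Java, here is a hypothetical sketch of the mechanism described above (this is not the real jrockit/reflect/ConstPoolWrapper; the names and the use of AtomicInteger counters are made up for illustration):

=== ConstPoolWrapperSketch.java
import java.util.concurrent.atomic.AtomicInteger;

// A shared constant pool entry with a reference count.
class SharedPoolEntry {
    final AtomicInteger refCount = new AtomicInteger();
}

// Hypothetical sketch only -- not actual JRockit source.
public class ConstPoolWrapperSketch {
    private final SharedPoolEntry[] entries;   // the shared strings this on-heap copy refers to

    public ConstPoolWrapperSketch(SharedPoolEntry[] entries) {
        this.entries = entries;
        for (SharedPoolEntry e : entries) {
            e.refCount.incrementAndGet();      // entry stays in the pool while this copy is live
        }
    }

    @Override
    protected void finalize() {
        // Runs only after the GC decides this wrapper is dead. If finalization never
        // happens (the pre-R28.2.2 bug), the counters are never decremented and the
        // shared pool can never be pruned.
        for (SharedPoolEntry e : entries) {
            e.refCount.decrementAndGet();
        }
    }
}
===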

So what happens if the finalize method for each ConstPoolWrapper is never called? All sorts of bad things. The worst-case scenario is that we eventually overflow the 32-bit reference counter, and when the count loops back to zero, the string is removed from the pool despite still being live! This can cause all sorts of crashes and other completely unpredictable behavior.

Fortunately, we were able to fix this issue in R28.2.2, and finalizers for natively instantiated objects now work exactly as expected.

Performance Impact of this Fix

The negative performance impact of finalizable objects has been very well known since almost the very beginning of the Java platform. Finalization adds a significant amount of overhead to garbage collection. At best, every finalizable object we discard needs to be "collected" at least twice: once before finalization and once after. Also, depending on the JVM implementation and the nature of the finalizer's work, often the act of finalization needs to be single-threaded, resulting in scalability issues for multi-core systems.

Now that we have fixed the "forgotten" finalizers bug in R28.2.2, all of a sudden, every copy of a class's constant pool is now a finalizable object again. For the vast majority of applications, use of reflection or other activities that retrieve copies of constant pools are infrequent enough that this has no noticeable performance impact. But there are a few applications out there that can indirectly generate very large numbers of ConstPoolWrapper objects. For such applications, the finalizer thread(s) may have trouble keeping up with the massive number of ConstPoolWrapper objects being added to the finalizer queue. This can result in a significant increase in memory pressure. In a worst case scenario, this additional memory pressure can even lead to an OutOfMemoryError.

Identifying this Issue

If you suspect that you are hitting this issue, the quickest way to confirm is to use the heap_diagnostics command and examine the state of the finalizer queue. If you see tens of thousands of ConstPoolWrapper objects awaiting finalization, you are almost certainly impacted by this issue.

===
Finalizers:
    1271096  37979  77 1233040      0 Total for all Finalizers
    1125384      1  66 1125317      0 => jrockit/reflect/ConstPoolWrapper
===

Another clue would be if you notice that the finalizer thread is busy finalizing ConstPoolWrapper objects. For example, you may see something like the following in a thread dump:

===
"Finalizer" id=7 idx=0x50 tid=17238 prio=8 alive, native_blocked, daemon
    at jrockit/reflect/ConstPoolWrapper.finalize()V(Native Method)
    at jrockit/memory/Finalizer.doFinalize(Finalizer.java:29)
    at jrockit/memory/Finalizer.access$300(Finalizer.java:12)
    at jrockit/memory/Finalizer$4.run(Finalizer.java:186)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace
===

Basically, seeing "ConstPoolWrapper" just about anywhere should be considered a red flag that you may be facing this issue.

One final possible hint is if you notice a severe discrepancy between an hprof heap dump and the live-set size. Our hprof code currently may only dump a subset of the entire finalization queue. So if your heap is filled with ConstPoolWrappers awaiting finalization, you may see a big difference between the amount of heap in use and the size of any hprof dump files generated.

Resolving Issue at the Application Level

The majority of applications that we have seen impacted by this issue dynamically create a large number of new classes at runtime. The most common example of this is applications that (mis)use JAXB to instantiate a set of completely new XML parser classes for every single request they process.

The standard troubleshooting steps to follow in this case are to run your application with the -Xverbose:class flag enabled and examine the output to see what kinds of classes are being loaded continuously, even after the application should have reached its steady-state. Once you know the names of the classes that are being generated at runtime, hopefully you can determine why these classes are being created and possibly alter the application to not use classes in such a "disposable" manner.

For example, we have seen several customers who have changed their use of JAXB to create a single XML parser (or one-per-thread to avoid scalability issues) and reuse that as opposed to dynamically recreating the exact same classes again and again.
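
As a minimal sketch of that kind of change (assuming a hypothetical JAXB-bound Order class; the important detail is that the JAXBContext, which is what triggers the runtime class generation and which is thread-safe, is created once and reused):

=== OrderParser.java
import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
class Order {
    public String id;
}

public class OrderParser {
    // JAXBContext is expensive to create (it generates new classes at runtime),
    // so build it once and reuse it for every request.
    private static final JAXBContext CONTEXT;
    static {
        try {
            CONTEXT = JAXBContext.newInstance(Order.class);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static Order parse(String xml) throws Exception {
        // Unmarshallers are cheap but not thread-safe: create one per call
        // (or cache one per thread) instead of a new JAXBContext each time.
        Unmarshaller u = CONTEXT.createUnmarshaller();
        return (Order) u.unmarshal(new StringReader(xml));
    }
}
===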

I should also point out that modifying your application to limit redundant class creation is a good idea on any JVM. Class loading is expensive and your application is very likely to see a performance benefit from the elimination of unneeded class loading regardless of which JVM you use.

JVM-Level Workaround

But of course, it is not always possible to modify your application. So I have made some changes in R28.3.2 to try and help users who run into this issue and can not resolve it at the application level:

1. I have added a new flag, -XX:-UseCPoolGC, that will cause the ConstPoolWrappers to no longer be finalizable.

2. I have built overflow protection into the JVM's shared constant pool reference counter so that we do not experience the crashes and other stability issues that R28 versions before R28.2.2 faced.

So by adding this flag to your java command line, you should get the same performance that you did before R28.2.2, but also not face the crashes from overflowing the reference counter.

The one downside is that this could possibly lead to a native memory leak, as an application that continuously creates new classes with unique constant pool entries may never be able to prune some entries from the shared constant pool. While we have never seen an application where this memory consumption would even be noticeable, it is a risk, at least in theory. As a result, I only recommend using this new flag if you have confirmed that you are hitting the ConstPoolWrapper issue and are unable to resolve the issue by modifying your application.

Obviously a design-level change to JRockit so that it does not depend on finalization like this at all would be the "ideal fix". If we were going to do another major release of JRockit (R29), this would be something worth serious consideration. But given JRockit's legacy status and focus on stability for the remaining planned maintenance releases, I believe the workaround flag is the best choice for helping our users while not risking further unexpected changes in behavior.

Conclusion

If you are using a JRockit release R28.2.2 or later, and you notice unexplained memory pressure, collect heap_diagnostics output and look for a large number of ConstPoolWrappers in the finalizer queue. If you are hitting this issue, consider the application-level changes recommended above, or use the new -XX:-UseCPoolGC flag (added in R28.3.2) to work around the issue at the JVM level.

And finally (pun intended, sadly), you may wish to keep this in mind as another example of why any design that depends on the timely execution of finalizers is flawed. Finalizers are almost always bad news for performance, and you simply can't rely on them being called quickly. In short: Java finalizer != C++ destructor.

Tuesday Nov 18, 2014

Where Are My Dump Files?

Intro

In this post I want to write a little bit about what happens when JRockit crashes. Specifically, I want to talk about some of the situations where you may have trouble finding the dump files (text and binary dumps). While this is usually not an issue, it can be very frustrating to experience a JVM crash and not be able to find any of the important data needed to troubleshoot the issue further. Hopefully this post will shed some light on some of the scenarios where these files are not created or not where you expect them to be.

Background

The dump files are described in detail here. For this discussion, you only need to know the difference between the text dump file (jrockit.<pid>.dump) and binary dump file ("core.<pid>" on Linux, "core" on Solaris, and "jrockit.<pid>.mdmp" on Windows).

Default Behavior

The default behavior is for these two files to be written to the current working directory (cwd) of the JVM process. This behavior is consistent with other JVMs (including HotSpot) and most other software. But there are situations where this is not ideal. For example, if the JVM's cwd is on a file system with limited space, you may want to designate another location for crash files to be output. It is often a good idea to isolate crash file output from the rest of the system so that a series of crashes do not exhaust the free space on a filesystem that is needed to keep your application up and running. This reasoning is similar to using a dedicated filesystem for log file output.

Changing the Default Behavior

On Linux and Solaris, one option is to use the tools built into the OS to specify an alternate output location for all binary dump files. On Linux, you can edit the /proc/sys/kernel/core_pattern file to specify a different path and/or naming convention for binary dump files. Likewise, on Solaris you can use coreadm to specify a different path for output. Note that with Linux, your only option for configuration at the OS level is to change the global behavior for all processes, not just JRockit (Solaris thankfully gives you per-process control, if needed). Also note that both of these options only impact where the binary dump file is output; the text dump file will still be output to the cwd of the JVM, regardless of the OS-level settings. This is often not a concern as the size of most text dump files is usually negligible compared to the space requirements for binary dump files.

Another consideration is that when JRockit crashes, the dump file paths printed as part of the error message assume the default OS behavior. If you use either of the above OS-level methods to specify a non-default output location for binary dump files, JRockit has no way of knowing this and will output incorrect paths. It is best to think of the path output for the binary dump as simply an "educated guess".

Another option is to set the JRockit-specific environment variable, JROCKIT_DUMP_PATH. This will work even on Windows. It also differs from the OS-level settings above in that both binary and text dump files will be written to the specified directory. Note that JROCKIT_DUMP_PATH depends on default settings for OS-level configuration. If you use either of the above methods to override the location for binary dump file output, JROCKIT_DUMP_PATH will only impact the location of the text dump file.

When Things Go Wrong...

There are also two well-known exceptions to my description above. In both cases, our signal handler code (or exception handler code on Windows) is not able to run successfully (or at all), therefore causing issues. The result is that JROCKIT_DUMP_PATH may be ignored, or the dump files could be truncated or even completely missing. Depending on the platform, output of both dump files can be dependent on the correct operation of the signal handler.

The first case is when a binary dump is produced via an external signal. The most common scenario is when you use the kill command to send a signal like SIGABRT to intentionally abort the process and generate a binary dump file. On both Linux and Solaris, this will result in the signal being "delivered" to the process's primordial thread. Starting from Java SE 6, the java launcher forks off from the primordial thread as soon as possible and the primordial thread does nothing but wait for the JVM to shut down. As the signal handlers are never registered for the primordial thread (What would be the point? It doesn't do anything but wait.), a signal delivered to the primordial thread will result in the OS-defined default action, and JRockit will never have a chance to influence the output of any binary dump created. This is also why you will not get a text dump in this situation. The recommended way to "artificially" trigger a crash is to use the force_crash diagnostic command, which avoids this issue.

The other scenario where you may not find the expected dump files is when the JVM's state is so corrupted that our signal handler can not run without itself crashing. By far, the most common cause of this is a stack overflow. Especially on Windows, it is very common to end up with a 0-byte mdmp (mini-dump) file when you blow the stack. If you ever find missing or truncated dump files, a stack overflow should be the first culprit you suspect.

Conclusion

By default, dump files will be output into the JVM's current working directory, but you can override that behavior with OS-level settings (on Linux/Solaris) or the JROCKIT_DUMP_PATH environment variable. Also remember that JROCKIT_DUMP_PATH may be totally ignored if an external signal is received, and the dump files may never even get created correctly if you suffer a stack overflow.

Tuesday Nov 11, 2014

Never Disable Bytecode Verification in a Production System

I wanted to write a brief post about the security implications of disabling bytecode verification. As the risks are already covered in the CERT secure coding guidelines and many other excellent sources, I have no intention of reproducing those efforts. I just want to highlight one unfortunately common myth that we sometimes still hear regarding verification: That bytecode verification is unnecessary if you only run "trusted" code.

For those of you unfamiliar with bytecode verification, it is simply part of the JVM's classloading process that checks the code for certain dangerous and disallowed behavior. You can (but shouldn't) disable this protection on many JVMs by adding -Xverify:none or -noverify to the Java command line.

Many people still incorrectly believe that the only point of bytecode verification is to protect us from malicious code. If you believe none of the code you are running could possibly be malicious, then bytecode verification is just needless overhead, or so the thinking goes. This is absolutely incorrect. Bytecode verification is indispensable for running a secure application, regardless of the level of trust you have in the sources of each and every class you load and run.

Bytecode verification lies at the foundation of the Java security model. One of the fundamental strengths of Java security is not only its ability to safely sandbox and run untrusted code (i.e. an arbitrary applet from the Internet), but to also make guarantees about the security of trusted code. While programmers in any language must have a solid understanding of security principles and constantly think about the security implications of the code they write, there are many security mistakes that you simply can't make on the Java platform; the bytecode verifier will not let you.

If you disable bytecode verification, you are saying not only that you trust all of the code you are running not to be malicious, but also that you trust every single class you load to be bug-free (at the bytecode level). This is a much higher standard than simply trusting code to not be overtly malicious.

I suspect that one of the reasons for this misconception is that people underestimate just how many different tools and components out there generate and/or modify bytecode. Do you know how your Java EE application server compiles JSP files? Do any of the frameworks in your application dynamically generate new classes during runtime (answer: most likely yes)? How about profiling / instrumentation tools like Introscope or even the method profiling built into JRockit Mission Control?

The reality is, in any sufficiently complex system, Java bytecode is being generated and modified all of the time by a large number of different tools, products, and frameworks. By disabling bytecode verification, you are trusting that all of those components in your entire stack are completely bug-free. I have seen (and sometimes fixed) many bugs in all of these types of software; they are just as likely to contain bugs as any other software.

Does this mean that there is never a reasonable time or place to disable bytecode verification? Not necessarily. One could argue that it may be appropriate in a development environment where restart time is critical and security concerns are low or even non-existent. However, even in most development environments, bytecode verification may help you discover important issues before you move to testing or production. As a general rule I always recommend leaving it enabled.

In conclusion, you should never disable bytecode verification for any system where security is a concern. These options, -Xverify:none and -noverify, were never intended to be used in production environments under any circumstances. If you are disabling bytecode verification (or any products you use have disabled verification), please consider removing these options to eliminate this security risk immediately.

Sunday Apr 27, 2014

JRockit: Artificial GC: Cleaning up Finished Threads

Introduction

Sometimes we see JRockit users who experience frequent full GCs as the JVM tries to clean up after application threads that have finished executing. In this post I'll write about why this happens, how to identify it in your own environment, and what the possible solutions might be.

What exactly is happening?

If there are many threads that have finished execution, but their corresponding java.lang.Thread objects have not yet been garbage collected from the Java heap, the JVM will automatically trigger a full GC event to try and clean up the leftover Thread objects.

Why does JRockit do this?

Due to JRockit's internal design, the JVM is unable to deallocate all of the internal resources associated with an execution thread until after the corresponding Thread object has been removed from the Java heap. While this is normally not a problem, if we go too long between garbage collections and the application creates (and then abandons) a lot of threads, the memory and other native resources consumed by these dead threads can grow very large. For example, if I run a test program that constantly "leaks" dead threads, but keeps strong references to each of the corresponding Thread objects, the memory footprint of the JVM can quickly grow to consume many gigabytes of off-heap (native) memory, just to retain data about our (almost) dead threads.
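
A contrived sketch along the lines of that test program (hypothetical code, just to show the pattern of finished threads kept alive by strong references) might look like this:

=== DeadThreadLeak.java
import java.util.ArrayList;
import java.util.List;

public class DeadThreadLeak {
    // Strong references keep the java.lang.Thread objects on the heap, so the JVM
    // cannot release the native resources of the already-finished threads.
    static final List<Thread> finished = new ArrayList<Thread>();

    public static void main(String[] args) throws Exception {
        while (true) {
            Thread t = new Thread(new Runnable() {
                public void run() { /* finish immediately */ }
            });
            t.start();
            t.join();            // the thread is now done executing...
            finished.add(t);     // ...but its Thread object is still strongly reachable
        }
    }
}
===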

What does this actually look like in practice?

If you suspect that this is happening with your application, the easiest way to confirm is to enable JRockit's memdbg verbose logging module (*1). If a full collection is triggered by threads waiting for cleanup of their Thread objects, you will see something like the following at the start of the collection event (*2):

===
[DEBUG][memory ] [OC#7] GC reason: Artificial, description: Cleaning up Finished Threads.
===

Another verbose logging module that will give you information about these thread-related full GC events would be the thread module:

===
[INFO ][thread ] Trigging GC due to 5772 dead threads and 251 threads since last collect.
===

The above message indicates that there are currently 5772 threads that have finished executing, but the JVM is still waiting for their Thread objects to be collected by the GC subsystem before it can completely deallocate all of the resources associated with each thread. We also see that 251 new threads have been created since the last full GC triggered by the thread management code. In other words, "last collect" really means "last collection triggered by the thread cleanup code". This number will not be reset back to zero by a "natural" GC event; only a thread cleanup GC will reset it.

What can I do to avoid these collections?

The best solution is of course to not create so many threads! Threads are expensive. If an application is continuously generating (and then abandoning) so many new threads that a full GC for every 250 threads becomes a performance concern, perhaps the application can be redesigned to not use threads in such a disposable manner. For example, thread pooling may be a reasonable solution. Even if you run your application on HotSpot, creating so many temporary threads is very likely a performance issue. Changing your application to use threads in a more efficient manner is worth considering, regardless of which JVM you use.
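
As a sketch of that kind of change (plain java.util.concurrent, nothing JRockit-specific; the pool size and task count are arbitrary), submit tasks to a fixed pool instead of spawning a new Thread per unit of work:

=== PooledWorkers.java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PooledWorkers {
    public static void main(String[] args) throws Exception {
        // A small, fixed set of long-lived worker threads instead of one
        // throwaway Thread (and one dead Thread object) per task.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100000; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    // the work a disposable thread used to do
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
===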

But of course sometimes redesigning your application is not a viable option. Perhaps the code that is generating all of these threads is from some third party and you can't make changes. Perhaps you do plan on changing the application, but in the meantime, you need to do something about all of these full GC events. There is a possible JVM-side "solution", but it should be considered a temporary workaround.

JRockit has two diagnostic options that allow you to tune this behavior: MaxDeadThreadsBeforeGC and MinDeadThreadsSinceGC. There is a subtle difference between them, but basically they both allow you to modify the thresholds used to decide when the thread management code will trigger a full GC.

MaxDeadThreadsBeforeGC (default: 250)

This option specifies the number of dead threads (threads that have finished executing, but can not be cleaned up because we are waiting for the corresponding Thread objects to be removed from the Java heap) that will trigger an artificial full GC.

MinDeadThreadsSinceGC (default: 250)

This option specifies a minimum number of new threads that are created between each thread-cleaning GC event. The idea is that even if we trigger a full GC, some subset of the dead threads may still remain because there are still strong references to the Thread objects. If the number of dead threads that survive a full GC is higher than MaxDeadThreadsBeforeGC, we could end up triggering a full GC for every single new thread that is created. The MinDeadThreadsSinceGC option guarantees that at least a certain number of new threads have been created since the last thread-cleaning GC, even if the number of dead threads remains above MaxDeadThreadsBeforeGC.

Note that these two options correspond to the two numbers output from the Xverbose:thread output:

===
[INFO ][thread ] Trigging GC due to 5772 dead threads and 251 threads since last collect.
===

The first number, 5772, is the number of dead threads waiting to be cleaned up and is compared to the value of MaxDeadThreadsBeforeGC. The second number, 251, is the number of new threads that have been spawned since the last thread-cleaning full GC. This second number is compared to MinDeadThreadsSinceGC. Only if both of these numbers are above the set thresholds is a thread-cleaning GC triggered.
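
Expressed as a hypothetical sketch (this is not JRockit source, and the exact comparison operators used internally are an assumption), the trigger condition described above looks roughly like this:

=== ThreadCleanupPolicy.java
public class ThreadCleanupPolicy {
    static final int MAX_DEAD_THREADS_BEFORE_GC = 250;  // -XX:MaxDeadThreadsBeforeGC
    static final int MIN_DEAD_THREADS_SINCE_GC  = 250;  // -XX:MinDeadThreadsSinceGC

    // Both thresholds must be exceeded before an artificial full GC is triggered.
    static boolean shouldTriggerCleanupGC(int deadThreads, int newThreadsSinceLastCleanupGC) {
        return deadThreads > MAX_DEAD_THREADS_BEFORE_GC
            && newThreadsSinceLastCleanupGC > MIN_DEAD_THREADS_SINCE_GC;
    }

    public static void main(String[] args) {
        System.out.println(shouldTriggerCleanupGC(5772, 251));  // true: matches the log line above
        System.out.println(shouldTriggerCleanupGC(5772, 100));  // false: too few new threads yet
    }
}
===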

If you need to reduce the frequency of thread-cleaning full collections and are not able to modify the application, I would recommend trying to set MinDeadThreadsSinceGC to a larger number. The trade-off here is that you may see an increase in native memory consumption as the JVM is not able to proactively clean up after finished threads as often. Obviously, you should do a full round of performance and load testing before using either of these options in a production environment.

One more important point: these are both undocumented diagnostic options. On R28, in order to use either of these two options, you must add the -XX:+UnlockDiagnosticVMOptions option before any diagnostic options on the command line. For example, to set MinDeadThreadsSinceGC to 1000, you would use a command line like below:

===
$ java -XX:+UnlockDiagnosticVMOptions -XX:MinDeadThreadsSinceGC=1000 MyJavaApp
===

If you have read this entire post, understand the risks (increased native memory consumption) and are willing to thoroughly load test your application before rolling out to production, either of these options should be safe to use.

(*1) For details on using the Xverbose option, please have a look at the command line reference:
http://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionx.htm#i1020876

(*2) Please note that this example output is from R28. The corresponding verbose output from R27 will refer to the GC cause as "Floating Dead Threads".

Tuesday Apr 22, 2014

JRockit's New Shutdown Verbose Logging Option

Have you ever had to troubleshoot an issue where the JVM seems to shutdown (in other words, disappear, but not crash), but you have no idea what triggered the shutdown? Perhaps some other process sent a SIGTERM? Or maybe some 3rd-party library you don't even have the source code for decided to call System.exit()? Another common scenario is that the last of your application's non-daemon threads have exited (normally, or as the result of an uncaught exception).

For R28.3.2, I created a new verbose logging module that instruments the JVM shutdown code and tries to log details regarding each possible event that can trigger a shutdown. In JRockit, a logging module is enabled with the "-Xverbose" command-line option, so the new module "shutdown" would be enabled with "-Xverbose:shutdown" (*). Let's see some examples of this in action.

Here is what a shutdown initiated by a call to System.exit() would look like. (Please note the truncated module name.):

===
[INFO ][shutdow] JVM_Halt() called in response to:
[INFO ][shutdow] System.exit() or direct call from JNI code
===

Here is an example from using the "kill <pid>" command to shutdown the JVM:

===
[INFO ][shutdow] JVM_Halt() called in response to:
[INFO ][shutdow] Signal 15 received from PID:21152 UID:142
===

Here is one last example from a process where the main method has returned (and there were no other application threads still running).

===
[INFO ][shutdow] No remaining non-daemon Java threads.
===

All you need to do to get log output like the above is to run your application with a JRockit >= R28.3.2 and add "-Xverbose:shutdown" to the java command line. The logging itself is absolutely minimal and is only output during JVM shutdown. In fact, it would not be unreasonable to enable this new logging module as a proactive measure, even if you were not currently facing an unexplained shutdown issue.

While there is already a wide array of tools and techniques available for dealing with this kind of scenario (SystemTap on Linux and DTrace on Solaris being two of my favorites), I thought it would be useful to have something built into the JVM itself. The main advantage is in having a simple, platform-independent way to troubleshoot these kind of issues across all of JRockit's supported platforms.

It is important to understand that this new module can only output the reason for graceful shutdowns. If the JVM crashes, or is forcibly killed (for example via SIGKILL on a POSIX system), the JVM will not have an opportunity to output information about the root cause of its unfortunate demise. Outside of a JVM crash (which will hopefully leave behind other artifacts like a crash log or core file), another common cause of sudden process disappearances is forced termination by the OS. On Linux, in response to a system-wide lack of resources, a component called the OOM killer will pick a specific process and forcibly kill it. As another example, on recent versions of Windows, closing the GUI window of the command line shell also seems to suddenly force any child processes to die without warning. In circumstances like these, the new verbose module will not be able to log the cause of the shutdown (although the lack of output itself would indicate either a forced shutdown or a crash, hopefully helping you to troubleshoot the issue further).

Obviously, there are scenarios where you will need to use other tools or techniques to further narrow down your issue even after trying the new shutdown logging module. But hopefully having this cross-platform and easy-to-use tool built right into the JVM will come in handy for resolving simple issues and at least point you in the right direction when starting a more complicated investigation.

(*) For details on using the Xverbose option, please have a look at the command line reference:
http://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionx.htm#i1020876

Tuesday Feb 18, 2014

Finite Number of Fat Locks in JRockit

Introduction

JRockit has a hard limit on the number of fat locks that can be "live" at once. While this limit is very large, the use of ever larger heap sizes makes hitting this limit more likely. In this post, I want to explain what exactly this limit is and how you can work around it if you need to.

Background

Java locks (AKA monitors) in JRockit basically come in one of two varieties, thin and fat. (We'll leave recursive and lazy locking out of the conversation for now.) For a detailed explanation of how we implement locking in JRockit, I highly recommend reading chapter 4 of JR:TDG. But for now, all that you need to understand is the basic difference between thin and fat locks. Thin locks are lightweight locks with very little overhead, but any thread trying to acquire a thin lock must spin until the lock is available. Fat locks are heavyweight and have more overhead, but threads waiting for them can queue up and sleep while waiting, saving CPU cycles. As long as there is only very low contention for a lock, thin locks are preferred. But if there is high contention, then a fat lock is ideal. So normally a lock will begin its life as a thin lock, and only be converted to a fat lock once the JVM decides that there is enough contention to justify using a fat lock. This conversion of locks between thin and fat is known as inflation and deflation.

Limitation

One of the reasons we call fat locks "heavyweight" is that we need to maintain much more data for each individual lock. For example, we need to keep track of any threads that have called wait() on the lock (the wait queue) and also any threads that are waiting to acquire the lock (the lock queue). For quick access, we store this lock information in an array (giving us a constant lookup time). We'll call this the monitor array. Each object that corresponds to a fat lock holds an index into this array. We store this index value in a part of the object header known as the lock word. The lock word is a 32-bit value that contains several flags related to locking (and the garbage collection system) in addition to the monitor array index value (in the case of a fat lock). After the 10 flag bits, there are 22 bits left for our index value, limiting the maximum size of our monitor array to 2^22, or space to keep track of just over 4 million fat locks.
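
The arithmetic is easy to check against the error message shown below (a trivial sketch, using the bit counts from the description above):

=== LockWordMath.java
public class LockWordMath {
    public static void main(String[] args) {
        int lockWordBits = 32;
        int flagBits = 10;
        int indexBits = lockWordBits - flagBits;   // 22 bits left for the monitor array index
        System.out.println(1 << indexBits);        // 4194304 slots (maximum index 4194303)
    }
}
===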

Now for a fat lock to be considered "live", meaning it requires an entry in the monitor array, its object must still be on the heap. If the object is garbage collected or the lock is deflated, its slot in the array will be cleared and made available to hold information about a different lock. Note that because we depend on GC to clean up the monitor array, even if the object itself is no longer part of the live set (meaning it is eligible for collection), the lock information will still be considered "live" and can not be recycled until the object gets collected.

So what happens when we use up all of the available slots in the monitor array? Unfortunately, we abort and the JVM exits with an error message like this:

===
[ERROR] JRockit Fatal Error: The number of active Object monitors has overflowed. (87)
[ERROR] The number of used monitors is 4194304, and the maximum possible monitor index 4194303
===

Want to see for yourself? Try the test case below. One way to guarantee that a lock gets inflated by JRockit is to call wait() on it. So we'll just keep calling wait() on new objects until we run out of slots.

=== LockLeak.java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class LockLeak extends Thread {

      static List<Object> list  = new LinkedList<Object>();

      public static void main(String[] arg) {
            boolean threadStarted = false;
            for (int i = 0; i < 5000000; i++) {
                  Object obj = new Object();
                  synchronized(obj) {
                      list.add(0, obj);
                      if (!threadStarted) {
                          (new LockLeak()).start();
                          threadStarted = true;
                      }
                      try {
                          obj.wait();
                      } catch (InterruptedException ie) {} // eat Exception
                  }
            }
            System.out.println("done!"); // you must not be on JRockit!
            System.exit(0);
      }

      public void run() {
            while (true) {
                  Object obj = list.get(0);
                  synchronized(obj) {
                      obj.notify();
                  }
            }
      }

}
===

(Yes, this code is not even remotely thread safe. Please don't write code like this in real life and blame whatever horrible fate that befalls you on me. Think of this code as for entertainment purposes only. You have been warned.)

Resolution

While this may seem like a very serious limitation, in practice it is very unlikely to see even the most demanding application hit this limit. The good news is, even if you do have a system that runs up against this limit, you should be able to tune around the issue without too much difficulty. The key point is that GC is required to clean up the monitor array. The more frequently you collect your heap, the quicker "stale" monitor information (lock information for an object that is no longer part of the live set) will be removed.

As an example, one of our fellow product teams here at Oracle recently hit this limit while using a 50GB heap with a single space collector. By enabling the nursery (switching to a generational collector), they were able to completely avoid the issue. By proactively collecting short-lived objects, they avoided filling up the monitor array with entries for dead objects (that would otherwise have to wait for a full GC to be removed).

One other possible solution may be to set the -XX:FatLockDeflationThreshold option to a value below the default of 50 to more aggressively deflate fat locks. While this does work well for simple test cases like LockLeak.java above, I believe that more aggressive garbage collection is more likely to resolve any issues without a negative performance impact.

Either way, we have never seen anyone hit this problem that was not able to tune around the limitation very easily. It is hard to imagine that any real system will ever need more than 4 million fat locks all at once. But in all seriousness, given JRockit's current focus on stability and the lack of a use case that requires more, we are almost certainly not going to ever make the significant (read: risky) changes that removing or expanding this limit would require. The good news is that HotSpot does not seem to have a similar limitation.

Conclusion

You are very unlikely to ever see this issue unless you are running an application with a very large heap, a lot of lock contention, and very infrequent collections. By tuning to collect dead objects that correspond to fat locks faster, for example by enabling a young collector, you should be able to avoid this limit easily. In practice, no application today (or for the near future) will really need over 4 million fat locks at once. As long as you help the JVM prune the monitor array frequently enough, you should never even notice this limit.

Sunday Feb 02, 2014

Inflation System Properties

I wanted to write a quick post about the two inflation-related system properties: sun.reflect.inflationThreshold and sun.reflect.noInflation. There seems to be a lot of confusion on Oracle's forums (and the rest of the net) regarding their behavior. Since neither of these properties is officially documented by us, I thought an informal explanation here might help some people.

There are a ton of good resources out on the net that explain inflation in detail and why we do it. I won't try to duplicate the level of detail of those efforts here. But just to recap:

There are two ways for Java reflection to invoke a method (or constructor) of a class: JNI or pure-Java. JNI is slow to execute (mostly because of the transition overhead from Java to JNI and back), but it incurs zero initial cost because we do not need to generate anything; a generic accessor implementation is already built-in. The pure-Java solution runs much faster (no JNI overhead), but has a high initial cost because we need to generate custom bytecode at runtime for each method we need to call. So ideally we want to only generate pure-Java implementations for methods that will be called many times. Inflation is the technique the Java runtime uses to try and achieve this goal. We initially use JNI by default, but later generate pure-Java versions only for accessors that are invoked more times than a certain threshold. If you think this sounds similar to HotSpot method compilation (interpreter before JIT) or tiered compilation (c1 before c2), you've got the right idea.

This brings us to our two system properties that influence inflation behavior:

sun.reflect.inflationThreshold

This integer specifies the number of times a method will be accessed via the JNI implementation before a custom pure-Java accessor is generated. (default: 15)

sun.reflect.noInflation

This boolean will disable inflation (the default use of JNI before the threshold is reached). In other words, if this is set to true, we immediately skip to generating a pure-Java implementation on the first access. (default: false)
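
To watch inflation happen, here is a small stand-alone program (a sketch; the generated accessor class names such as sun.reflect.GeneratedMethodAccessor1 are internal implementation details, so treat the comments as assumptions about typical HotSpot/JRockit behavior):

=== InflationDemo.java
import java.lang.reflect.Method;

public class InflationDemo {
    public static void ping() { }

    public static void main(String[] args) throws Exception {
        Method m = InflationDemo.class.getMethod("ping");
        // With the default inflationThreshold of 15, roughly the first 15 calls go
        // through the JNI accessor; after that, a pure-Java accessor class is
        // generated and used for subsequent invocations.
        for (int i = 0; i < 100; i++) {
            m.invoke(null);
        }
        // Running with -verbose:class should show a class along the lines of
        // sun.reflect.GeneratedMethodAccessor1 being loaded partway through the loop;
        // with -Dsun.reflect.noInflation=true it should be loaded on the first call.
    }
}
===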

There are a few points I would like to make about the behavior of these two properties:

1. noInflation does NOT disable the generation of pure-Java accessors, it disables use of the JNI accessor. This behavior is the exact opposite of what many users assume based on the name. In this case, "inflation" does not refer to the act of generating a pure-Java accessor, it refers to the 2-stage process of using JNI to try and avoid the overhead of generating a pure-Java accessor for a method that may only be called a handful of times. Setting this to true means you don't want to use JNI accessors at all, and always generate pure-Java accessors.

2. Setting inflationThreshold to 0 does NOT disable the generation of pure-Java accessors. In fact, it has almost the exact opposite effect! If you set this property to 0, on the first access, the runtime will determine that the threshold has already been crossed and will generate a pure-Java accessor (which will be used starting from the next invocation). Apparently, IBM's JDK interprets this property differently, but on both of Oracle's JDKs (OracleJDK and JRockit) and OpenJDK, 0 will not disable generation of Java accessors, it will almost guarantee it. (Note that because the first invocation will still use the JNI accessor, any value of 0 or less will behave the same as a setting of 1. If you want to generate and use a pure-Java accessor from the very first invocation, setting noInflation to true is the correct way.)

So there is no way to completely disable the generation of pure-Java accessors using these two system properties. The closest we can get is to set the value of inflationThreshold to some really large value. This property is a Java int, so why not use Integer.MAX_VALUE ((2^31)-1)?

$ java -Dsun.reflect.inflationThreshold=2147483647 MyApp

This should hopefully meet the needs for anyone looking to prevent continuous runtime generation of pure-Java accessors.

For those of you interested in all of the (not really so) gory details, the following source files (from OpenJDK) correspond to most of the behavior I have described above:

jdk/src/share/classes/sun/reflect/ReflectionFactory.java
jdk/src/share/classes/sun/reflect/NativeMethodAccessorImpl.java
jdk/src/share/classes/sun/reflect/DelegatingMethodAccessorImpl.java
jdk/src/share/classes/sun/reflect/NativeConstructorAccessorImpl.java
jdk/src/share/classes/sun/reflect/DelegatingConstructorAccessorImpl.java

As with anything undocumented, you should not rely on the behavior of these options (or even their continued existence). The idea is you should not normally need to set these properties or even understand what inflation is; it should be transparent and "just work" right out of the box. The inflation implementation could change at any point in the future without notice. But for now, hopefully this post will help prevent some of the confusion out there.

Tuesday Jan 21, 2014

JRockit R28 issue with exact profiling and generics

Some users of JRockit R28 may have noticed problems using Mission Control's exact profiling on methods that use Java's generics facility. Invocation counters for these methods would simply not respond; calling each method would fail to increment the corresponding call count.

For exact method profiling in R28, we replaced our homegrown bytecode instrumentation solution with Apache's Byte Code Engineering Library (BCEL). A version of BCEL was already included in the JDK as an internal component of JAXP, and using BCEL helped provide a cleaner code generation pipeline.

The problem was that while the internal version of BCEL contained in the JDK is very stable and works fine for the very narrow use cases JAXP requires, there were problems using it to instrument arbitrary code as needed by Mission Control Console's profiling tool.

One of those issues was that support for Java generics was never fully implemented in BCEL. In particular, instrumenting a method with generic code could produce bytecode with inconsistencies between the LocalVariableTable (LVT) attribute and the LocalVariableTypeTable (LVTT) attribute (Please see the class file format for details). Thankfully, this issue was found and fixed (in development branches) by the BCEL project:

Bug 39695 - java.lang.ClassFormatError: LVTT entry for 'local' in class file org/shiftone/jrat/test/dummy/CrashTestDummy does not match any LVT entry

Unfortunately, the JDK version of BCEL predated this fix. So when JRockit tried to instrument such methods using BCEL, the new method's bytecode would be invalid and fail subsequent bytecode validation, leaving the original, uninstrumented, version of the method in place.

While I briefly considered adding code on the JRockit side to work around the BCEL issue, it seemed that fixing the version of BCEL in the JDK itself was The Right Thing to do here. Unfortunately for me, the fix for bug 39695 was based on a version of BCEL that was much more recent than the one contained in the JDK, so I needed to port over a lot of other code to get things working.

JDK-8003147: port fix for BCEL bug 39695 to our copy bundled as part of jaxp

My port of the BCEL fix and other needed code went into 5u45, 6u60 and 7u40. Note that for Java 5, our only option was to put the fix into a CPU release, as we no longer provide non-CPU releases for Java 5. This means that the exact version of JRockit this fix made it into depends on the major Java version: For Java 5: R28.2.7. For Java 6: R28.2.9. As the LVT/LVTT would normally only be included with debug builds, recompiling affected classes without the -g flag should also be a viable workaround for users of earlier releases.

Hopefully, not too many users were impacted by this issue. As explained very well in JR:TDG, sampling-based profiling (like that provided by Flight Recorder), is almost always a better choice than exact profiling. But this story is interesting for another reason: it is a great example of how depending on internal (not officially documented) classes within the JDK is almost always a really bad idea (*1). Even we have been known to get bit.

(*1) Upcoming modularization of the Java Class Library is expected to do a better job preventing outside use of internal JDK classes. Not that it would have helped here.

Wednesday Jan 15, 2014

JRockit R27.8.1 and R28.3.1 versioning

As part of today's CPU release, JRockit R27.8.1 and R28.3.1 are now available to our support customers for download from MOS. (If you don't have a support contract, upgrading to Java 7 update 51 is the way to go.)

I just wanted to post a brief comment about our versioning scheme. It seems that many people have noticed that we have increased the "minor" version numbers for both R27 and R28. For example, R28 suddenly went from R28.2.9 to R28.3.1. Please let me assure you: these are just ordinary maintenance releases, not feature releases. There is zero significance to the jump in minor version number.

The reasoning behind the jump is simple: fear of breaking stuff. For as long as we have used the Rxx.y.z versioning scheme for JRockit, y and z have been single digits. For better or worse, version strings are often read and parsed by all sorts of tools, scripts, and sometimes even the Java applications themselves. While R28.2.10 may have been the most intuitive choice for today's release, we didn't want to risk breaking anyone's system that somehow depended on these numbers being single digits. So why R28.3.1 as opposed to R28.3.0? We thought that a dot zero release would sound too much like a feature release, so to help emphasize the fact that this is just another maintenance release, we went to .1 instead of .0. R27 had an even bigger sudden jump, from R27.7.7 to R27.8.1. This was done to synchronize the last version digits between R27 and R28 to make it easier to tell what versions were released at the same time (and hence contain the same security fixes).

We have actually done this once before in the past, when R27 jumped from R27.6.9 to R27.7.1. Because so many JRockit users had already moved on to R28 by then, that bump seems to have gotten a lot less attention than today's release.

So in summary, all recent JRockit releases (R27.6.1 and later for R27. R28.2.1 and later for R28.) are maintenance releases. If you are still using JRockit, please plan to upgrade as soon as possible to get these important fixes. (Or even better, make the move to Java 7!)

Monday Jan 13, 2014

<< "Hello, Blog!" MSGBOX >>

Welcome to my new work blog!

For those of you that don't know me, a quick introduction:

I am a member of the Java SE Sustaining Engineering team which is made up of the former Sun and BEA (JRockit) Java sustaining teams. My work is divided between our two JVMs (HotSpot and JRockit) and the various Java client technologies (Java2D, AWT, Swing, JavaWS, applets, Java Sound, etc.). Currently, most of my time is still spent working on JRockit. I am based out of Oracle's Akasaka, Tokyo office. In my spare time, I enjoy programming.

My plan is to post regularly about anything interesting I come across related to Java or even software development in general. There will probably be a lot of posts about JRockit for the immediate future, but I am also looking forward to talking more about HotSpot as the JRockit user base continues the move to JDK7 and beyond.

One of the most fun and exciting things about working on Java is that our users are programmers, just like us. Interacting with the people at user group meetings, JavaOne, and other conventions is always a complete blast. I started this blog to have another outlet to communicate with my fellow programmers.

Thank you for visiting my new blog! I look forward to your comments and feedback.
