
Buck's Blog

Recent Posts

Repost: Understanding Compressed References

This is a repost of an article that was published on October 18th, 2010, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/understanding_compressed_refer The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. Coincidentally, I also happen to be the original author of this particular post. -Buck

This post is a guest post from David Buck, our man in Tokyo. David (well, everyone that knows him calls him Buck, but until then you will have to call him David) has offered to write a few guest posts on deep JVM issues, and this is the first one.

Compressed references are an optimization that helps give 64-bit versions of JRockit a considerable speed boost (sometimes as much as 15% for memory-intensive workloads). In this post, we will talk about what they are, how they work, and some of the issues you might run into when compressed references are enabled.

Background

Because of Java's Write Once, Run Anywhere design, migrating from a 32-bit to a 64-bit JVM is usually painless, as your Java code can be left untouched (as long as it doesn't rely on JNI). By simply swapping JVMs, you gain the advantages of running as a 64-bit process (huge Java heap sizes, more CPU registers, etc.) without even having to recompile your Java code. This is exactly what the Java platform has always promised: allowing applications to take advantage of new hardware and advances in virtual machine technology without the need to update the application itself.

However, there are several tradeoffs (usually performance related) that must be considered when choosing between a 32-bit and a 64-bit JVM. In some circumstances, the increased code size and pointer length of a 64-bit process can result in less cache efficiency and heavier demands on CPU memory bandwidth. While each application is different and needs to be load-tested to find which JVM will give you the best performance, a 32-bit JVM will often run the same Java application faster than its 64-bit counterpart.

Compressed references represent a solution that allows your application running on a 64-bit JRockit JVM to enjoy some of the performance benefits of a 32-bit JVM while retaining the advantages of a 64-bit JVM. When you run with a Java heap size that is small enough, compressed references allow the use of 32-bit object references (pointers) on a 64-bit JRockit JVM. Use of 32-bit pointers not only allows for much faster performance, it also reduces the amount of heap memory required to store any Java object that references other objects.

The Details

Fortunately, you do not need to understand the details of how compressed references work to benefit from the performance advantages they provide. Anyone out there who just wants the need-to-know information, please feel free to jump to the "Usage" section below. For those of you who think compressed references sound too good to be true and love details, please read on!

How do we use 32-bit pointers with a 64-bit address space? Well, some behind-the-scenes trickery is involved to give your applications the best of both worlds. There are three types of compressed references, each corresponding to a maximum supported heap size: 4GB, 32GB and 64GB.
The easiest scenario to understand is the use of 4GB compressed references, where the Java heap is small enough that we can store the entire heap in the first 4GB of address space. With the heap within the 32-bit addressable range, we can simply use a 32-bit pointer as-is to point to Java objects on the heap. Because this is the simplest form of compressed references to implement, it is also the oldest: JRockit has featured 4GB compressed references for many years now.

Once you cross the 4GB barrier, things get a little bit more complicated. We can no longer directly use a 32-bit pointer to reference all of the objects in the heap. We are helped by a fortunate aspect of JRockit's internal design: every single object on the heap is aligned to an 8 byte boundary. In other words, every single object address will always be a multiple of 8. Those who are used to bit fiddling will quickly realize that the last 3 bits of any such address will always be 0. In other words, we are only "using" the higher order 29 bits of each 32-bit heap address. The key to making use of these "unused" bits is to rotate each address by 3 bits to the right before storing it, and then rotate it back 3 places to the left later when we need to de-reference it. This allows us to address a 35-bit address space (32GB) with a 32-bit pointer. This may sound complicated, but a rotation like this is executed in a single machine code instruction, and the performance impact of rotating the address for each access is close to negligible (when compared to the simpler under-4GB compressed references, we've seen at most a 2% performance hit). These types of compressed references are known as 32GB compressed references.

Every year, the demands applications place on the JVM for memory and performance increase dramatically. Especially for J2EE server-side applications, the use of larger and more complicated frameworks combined with ever more ambitious projects requires performance and functionality unimaginable only a few years ago. Case in point: we have more customers than ever who are using larger than 32GB Java heaps. For these customers, we have one last type of compressed reference: 64GB. For heap sizes between 32GB and 64GB, a variation on the bit-rotation trick described above is used. With 64GB compressed references, JRockit will limit itself to only storing objects on 16 byte boundaries. For the price we pay in the form of a very small increase in fragmentation (the wasted space between objects that don't each consume an exact multiple of 16 bytes), we now have 4 bits of each 32-bit address that are guaranteed to be 0 and can therefore be shifted to give us a 36-bit (64GB) range.

The point is that compressed references use every single bit of a 32-bit value to reference as much address space as possible. By making a few assumptions about where we store the Java heap (within the 64-bit address space) and how we store the addresses that point to it, we can avoid the overhead of using 64-bit pointers even though we are running on a 64-bit JVM.
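Before moving on to usage, here is a minimal sketch of the arithmetic behind 32GB compressed references. This is an illustration only, not JRockit source code; the class and method names are made up.

// Illustration only: how a 3-bit shift lets a 32-bit value address 32GB
// of 8-byte-aligned memory (the 32GB compressed-reference scheme).
public class CompressedRefDemo {
    // "Compress": the low 3 bits are always 0 for 8-byte-aligned addresses,
    // so shifting right by 3 fits any address below 32GB into 32 bits.
    static int compress(long address) {
        assert address % 8 == 0 && address < (32L << 30);
        return (int) (address >>> 3);
    }

    // "Decompress": widen back to 64 bits (treating the value as unsigned)
    // and shift left by 3 to recover the original address.
    static long decompress(int compressed) {
        return (compressed & 0xFFFFFFFFL) << 3;
    }

    public static void main(String[] args) {
        long address = 31L << 30; // an 8-byte-aligned address just below 32GB
        int ref = compress(address);
        System.out.println(decompress(ref) == address); // prints true
    }
}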
Usage

For most users, compressed references work perfectly right out of the box. In fact, you may be using them already and not know about it! Depending on the maximum heap size (-Xmx) specified on the Java command line, JRockit will automatically enable the use of compressed references. With JRockit R28, the following defaults are used:

-Xmx <= 3GB: 4GB compressed references
-Xmx <= 25GB: 32GB compressed references
-Xmx <= 57GB: 64GB compressed references

Earlier versions of JRockit (R27 and earlier) only supported 4GB compressed references. If you are using a larger heap size (still under 64GB), you may want to consider upgrading to JRockit R28 so you can gain the advantage of compressed references. From JRockit R26.4, compressed references are enabled by default for all heap sizes below 4GB. While JRockit's out-of-the-box behavior is ideal for most users, you can disable compressed references or even specify the type of compressed reference used (R28 only). For more information on overriding JRockit's default behavior, see the on-line documentation: [ JRockit R28 Command-Line Reference -XXcompressedRefs ] [ JRockit R26/R27 Command-Line Reference -XXcompressedRefs ]

Complications

Occasionally the use of compressed references can cause problems if the JVM runs low on available address space below the 4GB barrier. There are other data structures and code that can only be stored in the first 4GB of address space (usually because they are also referenced by 32-bit pointers at some point). When using 4GB compressed references, the entire Java heap is also stored in this valuable under-4GB address range, possibly leading to native OutOfMemoryErrors (OOME). In practice, 3GB will be close to the largest heap size you'll want to use with 4GB compressed references. Because of the practically limitless address space that can be reserved by a 64-bit process, native OOMEs are very rare on 64-bit platforms. If you are seeing a native OOME with a 64-bit JRockit and are using 4GB compressed references, exhaustion of the below-4GB address space is most likely the cause of your unhappiness.

Luckily this problem can be fixed very easily. On more recent JRockit releases (R28), manually forcing JRockit to use 32GB compressed references (as opposed to 4GB compressed references) will allow the heap to be stored above the 4GB range, leaving ample room for any other data that needs to be stored in this valuable address space.

$ java -Xmx3584m -XXcompressedRefs:size=32GB MyWonderfullJavaApp

On older releases (R27 and earlier), only 4GB compressed references were supported, so the only option is to disable compressed references:

$ java -Xmx3584m -XXcompressedRefs=0 MyWonderfullJavaApp

Again, R26/R27 users who have heap sizes too big for 4GB compressed references will want to consider upgrading to R28 to take advantage of the added compressed reference support for larger heap sizes.

Conclusion

Compressed references can really improve the performance of applications that run on 64-bit JRockit. For many applications, they can smooth the transition from a 32-bit JVM by fixing what would otherwise be a substantial performance hit. Even better, they are simple to use and work right out of the box. In fact, you may already be benefiting from them!

PS. The HotSpot JVM also now supports a similar feature known as CompressedOops.


Repost: Class Loading Deadlocks Part 2

This is a repost of an article that was published on May 10th, 2010, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/class_loading_deadlocks_1 The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

Mattis follows up on his previous post with one more exposé on class loading deadlocks.

As I wrote in a previous post, the class loading mechanism in Java is very powerful. There are many advanced techniques you can use, and when used wrongly you can get into all sorts of trouble. But one of the sneakiest deadlocks you can run into when it comes to class loading doesn't require any home-made class loaders or anything. All you need is classes depending on each other, and some bad luck.

First of all, here are some basic facts about class loading:

1) If a thread needs to use a class that is not yet loaded, it will try to load that class
2) If another thread is already loading the class, the first thread will wait for the other thread to finish the loading
3) During the loading of a class, one thing that happens is that the <clinit> method of the class is run
4) The <clinit> method initializes all static fields, and runs any static blocks in the class

Take the following class for example:

class Foo {
    static Bar bar = new Bar();
    static {
        System.out.println("Loading Foo");
    }
}

The first time a thread needs to use the Foo class, the class will be initialized. The <clinit> method will run, creating a new Bar object and printing "Loading Foo".

But what happens if the Bar object has never been used before either? Well, then we will need to load that class as well, calling the Bar <clinit> method as we go. Can you start to see the potential problem here? A hint is in fact #2 above. What if another thread is currently loading class Bar? The thread loading class Foo will have to wait for that thread to finish loading. But what happens if the <clinit> method of class Bar tries to initialize a Foo object? That thread will have to wait for the first thread, and there we have the deadlock. Thread one is waiting for thread two to initialize class Bar, thread two is waiting for thread one to initialize class Foo.

All that is needed for a class loading deadlock is static cross dependencies between two classes (and a multi-threaded environment):

class Foo {
    static Bar b = new Bar();
}

class Bar {
    static Foo f = new Foo();
}

If two threads cause these classes to be loaded at exactly the same time, we will have a deadlock.

So, how do you avoid this? Well, one way is of course to not have these circular (static) dependencies. On the other hand, it can be very hard to detect these, and sometimes your design may depend on it. What you can do in that case is to make sure that the classes are first loaded single-threadedly, for example during an initialization phase of your application.

The following program shows this kind of deadlock. To help bad luck on the way, I added a one second sleep in the static block of the classes to trigger the unlucky timing. Notice that if you uncomment the "// Foo f = new Foo();" line in the main method, the class will be loaded single-threadedly, and the program will terminate as it should.

public class ClassLoadingDeadlock {
    // Start two threads. The first will instantiate a Foo object,
    // the second one will instantiate a Bar object.
    public static void main(String[] arg) {
        // Uncomment next line to stop the deadlock
        // Foo f = new Foo();
        new Thread(new FooUser()).start();
        new Thread(new BarUser()).start();
    }
}

class FooUser implements Runnable {
    public void run() {
        System.out.println("FooUser causing class Foo to be loaded");
        Foo f = new Foo();
        System.out.println("FooUser done");
    }
}

class BarUser implements Runnable {
    public void run() {
        System.out.println("BarUser causing class Bar to be loaded");
        Bar b = new Bar();
        System.out.println("BarUser done");
    }
}

class Foo {
    static {
        // We are deadlock prone even without this sleep...
        // The sleep just makes us more deterministic
        try { Thread.sleep(1000); } catch (InterruptedException e) {}
    }
    static Bar b = new Bar();
}

class Bar {
    static {
        try { Thread.sleep(1000); } catch (InterruptedException e) {}
    }
    static Foo f = new Foo();
}
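As a minimal sketch of the single-threaded initialization phase suggested above (my own illustration, not from the original post; it reuses the FooUser and BarUser classes from the example, and Class.forName with initialize=true simply forces <clinit> to run if it has not already), the fix could look like this:

public class EagerInit {
    public static void main(String[] arg) throws Exception {
        // Initialization phase: force Foo (and, through its static field, Bar)
        // to be initialized by a single thread before any workers start.
        Class.forName("Foo", true, EagerInit.class.getClassLoader());
        Class.forName("Bar", true, EagerInit.class.getClassLoader());

        // Now it is safe to let multiple threads use Foo and Bar concurrently.
        new Thread(new FooUser()).start();
        new Thread(new BarUser()).start();
    }
}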


Repost: Class Loading Deadlocks Part 1

This is a repost of an article that was published on March 1st, 2010, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/class_loading_deadlocks The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

Mattis keeps going strong; in this installment you get to learn everything you never knew you may need to know about class loaders.

As I wrote in a previous post, the class loading mechanism in Java is very powerful. There are many advanced techniques you can use, and when used wrongly you can get into all sorts of trouble. But one of the sneakiest deadlocks you can run into when it comes to class loading doesn't require any home-made class loaders or anything. All you need is classes depending on each other, and some bad luck.

First of all, here are some basic facts about class loading:

1) If a thread needs to use a class that is not yet loaded, it will try to load that class
2) If another thread is already loading the class, the first thread will wait for the other thread to finish the loading
3) During the loading of a class, one thing that happens is that the <clinit> method of the class is run
4) The <clinit> method initializes all static fields, and runs any static blocks in the class

Take the following class for example:

class Foo {
    static Bar bar = new Bar();
    static {
        System.out.println("Loading Foo");
    }
}

The first time a thread needs to use the Foo class, the class will be initialized. The <clinit> method will run, creating a new Bar object and printing "Loading Foo".

But what happens if the Bar object has never been used before either? Well, then we will need to load that class as well, calling the Bar <clinit> method as we go. Can you start to see the potential problem here? A hint is in fact #2 above. What if another thread is currently loading class Bar? The thread loading class Foo will have to wait for that thread to finish loading. But what happens if the <clinit> method of class Bar tries to initialize a Foo object? That thread will have to wait for the first thread, and there we have the deadlock. Thread one is waiting for thread two to initialize class Bar, thread two is waiting for thread one to initialize class Foo.

All that is needed for a class loading deadlock is static cross dependencies between two classes (and a multi-threaded environment):

class Foo {
    static Bar b = new Bar();
}

class Bar {
    static Foo f = new Foo();
}

If two threads cause these classes to be loaded at exactly the same time, we will have a deadlock.

So, how do you avoid this? Well, one way is of course to not have these circular (static) dependencies. On the other hand, it can be very hard to detect these, and sometimes your design may depend on it. What you can do in that case is to make sure that the classes are first loaded single-threadedly, for example during an initialization phase of your application.

The following program shows this kind of deadlock. To help bad luck on the way, I added a one second sleep in the static block of the classes to trigger the unlucky timing. Notice that if you uncomment the "// Foo f = new Foo();" line in the main method, the class will be loaded single-threadedly, and the program will terminate as it should.

public class ClassLoadingDeadlock {
    // Start two threads. The first will instantiate a Foo object,
    // the second one will instantiate a Bar object.
    public static void main(String[] arg) {
        // Uncomment next line to stop the deadlock
        // Foo f = new Foo();
        new Thread(new FooUser()).start();
        new Thread(new BarUser()).start();
    }
}

class FooUser implements Runnable {
    public void run() {
        System.out.println("FooUser causing class Foo to be loaded");
        Foo f = new Foo();
        System.out.println("FooUser done");
    }
}

class BarUser implements Runnable {
    public void run() {
        System.out.println("BarUser causing class Bar to be loaded");
        Bar b = new Bar();
        System.out.println("BarUser done");
    }
}

class Foo {
    static {
        // We are deadlock prone even without this sleep...
        // The sleep just makes us more deterministic
        try { Thread.sleep(1000); } catch (InterruptedException e) {}
    }
    static Bar b = new Bar();
}

class Bar {
    static {
        try { Thread.sleep(1000); } catch (InterruptedException e) {}
    }
    static Foo f = new Foo();
}

/Mattis


Repost: Why won't JRockit find my classes

This is a repost of an article that was published on January 29th, 2010, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/why_wont_jrockit_find_my_class The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

This is the second post by Mattis, diving deep into JVM specifics.

NoClassDefFoundErrors are a drag. The classloader mechanism in the Java specification is very powerful, but it also gives you plenty of ways to mess things up. In which jar did you put that class file, and why isn't your classloader looking in that jar? In rare cases, you might even have an application that works using Sun Java, but throws a NoClassDefFoundError with JRockit. Surely, this must be a JRockit bug? Not necessarily. There is a slight difference in how the two JVMs work that can explain this behaviour, especially if you modify your classloaders during runtime.

Let's take an example. In a separate folder "foo", create a file Foo.java:

public class Foo {
    public Foo() {
        System.out.println("Foo created");
    }
}

Now, in your root folder for this experiment, create the file ClasspathTest.java:

import java.io.File;
import java.net.URLClassLoader;
import java.net.URL;
import java.lang.reflect.Method;

public class ClasspathTest {
    private static final Class[] parameters = new Class[]{URL.class};

    // Adds a URL to the classpath (by some dubious means)
    // method.setAccessible(true) is not the trademark of good code
    public static void addURL(URL u) throws Exception {
        Method method = URLClassLoader.class.getDeclaredMethod("addURL", parameters);
        method.setAccessible(true);
        method.invoke((URLClassLoader) ClassLoader.getSystemClassLoader(), new Object[]{u});
    }

    public static void main(String[] arg) throws Exception {
        // Add foo to the classpath, then create a Foo object
        addURL(new File("foo").toURL());
        Foo a = new Foo();
    }
}

This class has a method "addURL" that basically adds a URL to the classpath of the system classloader. The main method uses this method to first add the folder "foo" to the classpath and then creates a Foo object.

When you compile this class, add "foo" to the classpath:

> javac -classpath .;foo ClasspathTest.java

But when you run the program, don't add foo, simply run:

> java ClasspathTest

Using Sun Java, this will work fine. In the first line of main, we add the foo-folder to the classpath. When we create our first Foo-object, we find the Foo class in the foo folder. Using JRockit however, you get:

Exception in thread "Main Thread" java.lang.NoClassDefFoundError: Foo at ClasspathTest.main(ClasspathTest.java:20)

To understand this behaviour, you have to first understand how Sun and JRockit run code. Sun Java is an interpreting JVM. This means that the first time you run a method, the JVM will interpret every line step by step. Therefore, Sun will first interpret and run the first line of main, adding "foo" to the classpath, and then the second line, creating the Foo object. JRockit however uses another strategy. The first time a method is run, the entire method is compiled into machine code. To do this, all classes used in the method need to be resolved first. Therefore, JRockit tries to find the Foo class BEFORE the "foo" folder is added to the classpath, resulting in the NoClassDefFoundError (still thrown just before trying to use the class).

So, who is right? Actually, according to the Java spec, both are. Resolving the classes can be done either right before the class is used or as early as during method invocation. For most developers, this is just trivia, but from time to time we see problems with this from customers. The solution? Don't modify your classloaders in the same method where you need the change in order to load a class. In the example, the following change works fine in both Sun and JRockit:

public static void main(String[] arg) throws Exception {
    // Add foo to the classpath, then create a Foo object in another method
    addURL(new File("foo").toURL());
    useFoo();
}

public static void useFoo() {
    Foo a = new Foo();
}

Here, using JRockit, the class is not resolved until the method useFoo is compiled, which will be AFTER "foo" is added to the classpath.

/Mattis

PS: Adding URLs to the system classloader during runtime might not be a good idea. But when using your own defined classloaders, modifying these during runtime could very well be according to design.
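Following up on that PS: if you control the class loader yourself, a dedicated loader created once the path is known sidesteps the problem entirely. Here is a minimal sketch (my own illustration, not from the original post; it assumes Foo.class is in the "foo" folder as above):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class OwnLoaderTest {
    public static void main(String[] arg) throws Exception {
        // Build a class loader that knows about "foo" up front,
        // instead of patching the system class loader afterwards.
        URL fooUrl = new File("foo").toURI().toURL();
        URLClassLoader loader = new URLClassLoader(new URL[] { fooUrl });

        // Load and instantiate Foo reflectively; there is no compile-time
        // reference to Foo, so nothing needs it resolved early.
        Class<?> fooClass = loader.loadClass("Foo");
        Object foo = fooClass.getDeclaredConstructor().newInstance();
        System.out.println("Created: " + foo.getClass().getName());
    }
}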


Repost: Tuning that was great in old JRockit versions might not be so good anymore

This is a repost of an article that was published on January 22nd, 2010, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/tuning_that_was_great_in_old_j The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

(This post is written by Mattis Castegren. Mattis is one of our gurus in our Sustaining Engineering team.)

Did you fine-tune your JRockit application a few years ago? Did you search the web for command line options to shave off seconds in your favorite benchmark? In that case, I have a challenge for you.

Tuning a JVM for performance is hard work. Few programs are as complicated as a JVM. There are tons of more or less useful command line options, and some of the knobs and dials you can manipulate will change some really arcane aspects of the run-time environment. This is the reason why Out Of The Box performance (running with no tuning parameters) has been an important area for JRockit over the last few years. What good is it to have a really fast JVM, if you can't tune it to its full potential? In a perfect world, an expertly tuned system and a system run without any command line options at all would give more or less the same performance.

So, how are we doing so far? Pretty good actually. The R27 line of JRockit has shown a big improvement in OOTB performance. Compared to the previous code line, R26, the OOTB performance has increased by over 100% on the SPECjbb2005 benchmark (see this). One of several reasons was the re-write of how the command line options -XXtlaSize and -XXlargeObjectLimit work, which leads us to the topic of this blog post: "Tuning that was great in old JRockit versions might not be so good anymore"

In older versions of JRockit (basically, pre R27), increasing the TLA size was often a great way to increase performance. Wait a minute, you might think, what is the TLA size? That is not really important for this discussion, and many who tuned the TLA size did not really care. If you really want to know, read the first part of this. Setting a TLA size of 64k was used in some of our older benchmark submissions, and many happy system admins used the same flag to great success.

But with JRockit R27, the way that XXtlaSize works changed. Before, there was only a fixed TLA size. But with the new way things work the size is dynamic, and you tune it by setting a minimum and a preferred size. This means that setting it to a fixed 64k size will actually limit the TLA sizing, which will often decrease performance and increase memory fragmentation. The once very useful JRockit tuning flag has become one of the most misused flags there is.

So, is there never a need for XXtlaSize? Sure there is; correctly used, you might increase the score on your favorite benchmark by a few percent in some cases. But if used badly, you might find half of your Java heap to be lost due to memory fragmentation.

My suggestion is this: If you tuned your JRockit application some years ago and have recently upgraded to a newer version of JRockit, remove any X and XX flags used for tuning except Xmx, Xms, XgcPrio and XpauseTarget. Run some tests before and after, and don't be surprised if the OOTB performance is actually better than the old tuning.
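As a hypothetical before-and-after (the heap sizes and application name are made up, and the old-style -XXtlaSize=64k spelling is only illustrative; check the command-line reference for your release):

Old tuning carried forward from the R26 days:
$ java -Xms2g -Xmx2g -XXtlaSize=64k MyServerApp

Letting a newer JRockit use its own defaults:
$ java -Xms2g -Xmx2g MyServerApp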


Repost: Object Allocation - Better the devil you know...

This is a repost of an article that was published on December 15th, 2009, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/object_allocation_-_better_the The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

(The post below is written by Josefin Dahlstedt. Mainly due to x-mas holidays, there will be no accompanying webex to this post.)

Being a server JVM, JRockit constantly tries to make the application be all that it can be. However, the applications running in server environments are complex and difficult to predict, making a one-sided effort difficult. By knowing something about your allocation needs you will be able to make informed tuning choices to optimize the runtime for your application. Also, by knowing something of the internal workings of JRockit, your favorite server-side JVM, you will be able to write your application code to maximize performance and avoid poor design choices when running on JRockit. Before you read further, know that creating a lot of objects, i.e. doing allocation, is going to cost you; the fewer allocations, the fewer garbage collections and the fewer fragmentation issues. That said, this post will address some opportunities to maximize the advantage of JRockit, when developing a server application in Java, with respect to object allocation and access patterns.

Generally, JRockit allocation and garbage collection rely on two old assumptions about objects in Java:

1. Objects allocated together closely in time are usually accessed together closely in time
2. Most objects die young (the weak generational hypothesis)

Also, it is assumed that most objects allocated are smaller than a few hundred kB.

Using Locality

To accommodate these assumedly prevalent smallish objects and to handle (1) well, JRockit implements the concept of thread local areas, TLAs. The areas successfully co-locate objects allocated closely together in time by one thread. Having good locality for objects accessed closely together in time will have a huge impact on cache locality. This quality is increasingly important as computer architectures move towards multi-core and NUMA technologies.

The TLA is a small area of consecutive memory that is handed out to a Java thread for allocation. In a naïve implementation all allocating threads have to synchronize on the global Java heap structure to get new memory for each allocation; this would create massive contention for the heap lock, increasing with the number of Java threads. The TLAs help avoid global synchronization for most allocations. The size of the TLAs for smallish object allocations may be configured through a command line option -XXtlaSize:[min= | preferred= | wasteLimit=]. By setting min you specify the minimum size for TLAs, preferred specifies the preferred size, and wasteLimit sets the limit of how much of a TLA you are willing to waste when trying to get memory for an object. The value for wasteLimit will affect fragmentation as it decides if you throw away a partly filled TLA to get a new one for allocation or take the additional cost of allocating directly on the heap. The value for min will also affect fragmentation, as it specifies how small areas of consecutive memory you accept and partition for allocation; areas smaller than min will not be used and may thus be a cause of fragmentation. The options are inherently related so that: wasteLimit <= min <= preferred.
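As a purely hypothetical example of that syntax (the values and application name are made up, and I am assuming the sub-options are comma-separated; check the JRockit command-line reference for the exact form and for values that make sense for your workload), such a setting might look like:

$ java -XXtlaSize:min=4k,preferred=64k,wasteLimit=4k MyServerApp

Note that these values respect the wasteLimit <= min <= preferred constraint mentioned above.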
Improving Fragmentation

To handle the second assumption (2), a generational garbage collector is usually used to address fragmentation of the Java heap. It will attempt to keep newly allocated objects in a special area with the hope that when the area has been used up most of the objects there will have become unreachable. Then the area can be recycled for new allocations, making the cost of promotion negligible. This assumption has been shown to not be true enough, so the current solution in JRockit is to keep the part of the nursery where the most recent allocations were made uncollected, i.e. the keep area, to reduce the cost of promoting still-used objects out of the nursery.

If you are running a generational garbage collector you will have a young space pool set up for allocation of the application's Java objects. This pool will have two parts: a part that will be garbage collected at the next young collection, and a part that will be full of the most recently allocated objects, which will be kept uncollected on the assumption that they are still reachable and would only be promoted to old space.

Previously, when running a generational garbage collector, the entire nursery was split up into TLAs for thread local allocation. Each time an object larger than the TLA wasteLimit was allocated, the allocation had to be done in old space. But a new feature in the JRockit memory system for the upcoming release is to enable allocations in the nursery outside TLAs, thus avoiding medium-sized objects cluttering old space unnecessarily. The allocation will still mostly be done in TLAs, but when trying to allocate objects larger than the wasteLimit, instead of trying to allocate them in old space, in the worst case potentially causing garbage collection of the entire heap, JRockit will attempt to allocate the object in the nursery first. This should reduce fragmentation of old space and put off an old space garbage collection and compaction to a later time.

What do I do with this information?

These two concepts, TLAs and the possibility to accommodate larger objects in the nursery, are useful to understand when designing a Java application for performance. With this information you should be able to understand why large objects that are needed for the application may be good to pool for reuse over time. It may also help you take advantage of the TLAs when designing the contents of small objects. Knowing how allocations are locality optimized per thread may help you when distributing work to a number of threads.

Being mindful of object allocation size and patterns in a runtime environment like Java may seem like something the Java developer should be able to mostly ignore. But when writing for performance, trying to balance resource use and maximize predictability, the little details tend to matter very much and make all the difference.

-Josefin Dahlstedt


Repost: Why is my JVM process larger than max heap size?

This is a repost of an article that was published on February 2nd, 2009, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/why_is_my_jvm_process_larger_t The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

(by Stefan Särne)

A perfectly valid question, and one that is hard to answer without knowing a bit about what goes on under the hood of your JVM (Java Virtual Machine). This blog entry will give a short description of why this can be the case and also describe the diagnostic command in JRockit called print_memusage, which can be used to ask a running JRockit process how much memory it uses, and for what.

But why larger than -Xmx?

The command line argument -Xmx sets the maximum Java heap size (mx). All Java objects are created on the Java heap, but the total memory consumption of the JVM process consists of more things than just the Java heap. A few examples:

- Generated (JIT:ed) code
- Loaded libraries (including jar and class files)
- Control structures for the Java heap
- Thread stacks
- User native memory (malloc:ed in JNI)

It is important to think of this when dimensioning how many processes should run on a single server and what to set the maximum heap size to. Usually the heap is by far the largest part of the process space, but the total size of the process will also depend on your application.

Reserved != Committed

It is easy to be alarmed by the number for reserved (or mapped) memory, but a process will not use any physical memory resources until the memory is committed. JRockit will for example reserve the heap up to the maximum heap size, but not use it until needed. It will also, for example, reserve 1 GB for generated code on 64-bit x86 architectures, but only commit what is needed.

Understanding print_memusage

JRockit has a diagnostic command called print_memusage. This little tool started with developers in the JRockit engineering team who wanted to understand in detail what the JVM used the memory for. After the development of the first version it became evident that customers also wanted to know what the JVM uses the memory for, and what it actually means. This brief history is included so that you, as a user, will forgive us for not having a more user-friendly tool. It will be improved in the next major release of JRockit (normal disclaimers on release predictions apply).

Memusage is a command that will ask the sub-systems in JRockit how much memory they are using; each sub-system in turn will walk through the memory it holds on to and report it. It will also ask the OS how much the process is indeed using and scan the virtual memory space for what type it is. This approach makes it possible to retrieve the information from any running JVM, without any cost until it is invoked. Due to technical limitations, the tool is “good” but not 100% exact.

In this example I use jrcmd to connect to JRockit in my local development environment.
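For reference, invoking the command looks something like this (the process id 12345 is just an example; running jrcmd with no arguments should list the local JRockit processes it can see):

$ jrcmd
$ jrcmd 12345 print_memusage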
As you can see in this screen shot, the tool reports data hierarchically, starting with “Total mapped”, and then showing details broken down into smaller pieces. Some things worth highlighting:

The process has reserved/mapped about 1 GB of memory. Out of that, about 708692 KB is in use, i.e. committed. Both Total mapped and Total in-use are gathered from the OS.

The process has 28544 KB as executable memory. This includes both native code, like the jvm library itself, and code generated (jit:ed) by the JVM. In this example, 21.1% of the executable memory is allocated by the compiler in the JVM; 5652 KB (or 94%) is currently in use. The fact that the total is higher than 100% is not as strange as it may seem. The JVM will pool the code memory and reuse it, and only free it when a large enough amount is unused. There are several reasons for code not being in use any more; for example, if the method has been optimized the first version may no longer be needed, or the class may have been unloaded. This is handled by the Code GC in JRockit.

The largest chunk is usually the rw-memory, which includes the Java heap. The Stacks are the thread stacks, one for each thread. Under “classes” and under “malloced memory” is meta-information about the Java code, which is needed both for the JVM's own bookkeeping and for upholding the Java model, like debug info when throwing exceptions. It also includes all static Strings, and all jar/zip handlers, which are live when reading from jar files. If there is a number inside parentheses, it indicates how many items of the indicated type have been allocated. In the example above, the #97701 says that there are 97701 method structs.

Under "unaccounted for" is "the rest", in other words the difference between the total in use and what the subsystems are responsible for. It includes some fragmentation in the memory malloced by the JVM, but also memory allocated from JNI code.

Summary

So this is a neat little tool to get insight into the process memory. Whoever said that memory is cheap these days? I hope this will serve as a good starting point to understand where all the memory goes.


ReRepost: Help! JRockit hangs on my Linux!

This is a repost of an article that was published on September 11th, 2008, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/repost_help_jrockit_hangs_on_my_linux The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

Originally posted 8/14 2006 on http://dev2dev.bea.com/blogs/hstahl, reposted on request.

We sometimes get bug reports from users who are having issues with JRockit on a Linux distribution that we do not support officially, such as Debian, Ubuntu or Fedora Core. Sometimes - in particular when JRockit hangs intermittently - the cause is a broken OS. Here are two things you can do to check your Linux installation.

1. Verify that you are using the NPTL threading library

Before the 2.6 kernel, most Linux distributions used the old pthreads library. We have seen a number of issues with this over the years. To check your kernel version:

hstahl@sthx6421:~> cat /proc/version
Linux version 2.6.5-7.244-smp (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Mon Dec 12 18:32:25 UTC 2005

In this case, the kernel is 2.6.5. Anything based on 2.6 or later uses NPTL (plus the 2.4-based kernel in Red Hat EL 3.0, but that's a special case). Also, make sure that you are not setting the LD_ASSUME_KERNEL environment variable!

2. Verify that your distribution handles signals correctly

JRockit uses OS signals to suspend and resume Java threads. Unfortunately, this sometimes exposes bugs in the underlying OS, causing the JRockit process to hang (or crash). To check your Linux installation, download this small C program, then compile and run it on your computer following the instructions in the source code. If the program hangs or crashes, then you most likely have a broken kernel and/or glibc. Try updating your OS to a later build/patch level/service pack and rerun. If you still have an issue, please report it to the appropriate vendor and/or community. If the test program fails, then you will see intermittent JRockit hangs and/or crashes on your platform. It may take days or weeks, but it will happen eventually.

Note that we have for the past two years or so run this program or similar tests on all OSes we support, and we always make sure such OS bugs are fixed before we add an OS to our list of supported configurations, so if you are using one of those then you should be safe.
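For convenience, a few standard shell checks cover step 1 above (these are ordinary Linux commands, nothing JRockit specific): uname -r prints the kernel version, getconf GNU_LIBPTHREAD_VERSION should report NPTL on a 2.6-based system, and echo $LD_ASSUME_KERNEL should print an empty line if the variable is unset.

$ uname -r
$ getconf GNU_LIBPTHREAD_VERSION
$ echo $LD_ASSUME_KERNEL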


ReRepost: Tips and tricks for dealing with a fragmented Java heap

This is a repost of an article that was published on September 11th, 2008, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/repost_tips_and_tricks_for_dealing_with_a_fragmented_java_heap The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

Originally posted in December 2007, reposted on request. The problem described here is generic and applies to all JVMs using mark-n-sweep or similar GC algorithms.

Heap fragmentation occurs when a Java application allocates a mix of small and large objects that have different lifetimes. The main negative effect of fragmentation is the long GC pause times caused by the JVM being forced to compact the Java heap. These long pause times are typically triggered when your Java program attempts to allocate a large object, such as an array. As described in my previous blog entry on this topic, you can use JRockit Mission Control to find out how fragmented the heap is. But a fragmented heap is only a problem if it leads to long pause times (or an OutOfMemoryError). To find out the impact on pause times, you can run with the -Xverbose:gcpause command line flag, which will give you something like:

[INFO ][gcpause] old collection phase 1-0 pause time: 73.214054 ms, (start time: 15.807 s)
[INFO ][gcpause] (pause includes compaction: 1.029 ms (external), update ref: 1.532 ms)
[INFO ][gcpause] Threads waited for memory 66.612 ms starting at 17.568 s
[INFO ][gcpause] old collection phase 1-0 pause time: 66.449507 ms, (start time: 17.569 s)
[INFO ][gcpause] (pause includes compaction: 1.236 ms (internal), update ref: 1.488 ms)

The pauses in this example are clearly not a problem, but they can sometimes be much longer than this. If you don't want to restart your JVM, you can enable this during runtime by running jrcmd <pid> verbosity set=gcpause=info. After you have the data you need, disable informational logs with jrcmd <pid> verbosity set=gcpause=error. Or you can do a JRA recording (see the previous blog entry) and look in the GC details tab, where the time spent in compaction is clearly visible.

Before we look into the possible strategies for dealing with fragmentation, it is crucial to understand what causes it. The first key observation is that fragmentation is caused by GC. When the JVM performs GC it will clear out dead objects. It's the act of removing these dead objects that creates the holes in the heap. Memory allocation only has an indirect impact, in that it can create a pattern in the heap that later leads to the GC fragmenting the heap. A second key observation is that fragmentation is only a problem if you can't use the holes in the heap. As long as you only allocate small objects, it doesn't matter how fragmented the heap is. With these two observations in mind, here are some tips:

1. Increase the heap size

Increasing the heap size will decrease the frequency of GCs. One benefit of this is that objects are more likely to be dead than if GCs are very frequent, and if more objects are dead then there will be fewer live objects around that can contribute to creating holes. In other words, the heap holes will on average be larger, which implies less fragmentation.
Also, if GCs are less frequent, you can possibly afford longer pause times since the impact of GC on throughput will be lower. Be aware that increasing the heap size can cause a slight increase in pause times. A special case is to run with an infinitely large heap and never GC, which will of course avoid fragmentation completely.

2. Use a generational GC

Running JRockit with -Xgc:gencon or -Xgc:genpar will enable the use of a nursery or young space. The nursery will store recently allocated objects and when it is full a nursery GC will be performed, in which objects that are still alive will be moved to the old space. Since all objects that survive are (eventually) moved to the old space, the nursery will never be fragmented. And fragmentation of the old space will happen much more slowly since objects moved there will on average survive for a long time. Also, since the old space will fill up less rapidly, fragmentation-causing old space GCs will be less frequent.

A common strategy for avoiding the cost of old space GCs (often called "full GCs") is to configure your nursery size so that almost all objects die before they reach the old space. If you do this carefully, you can postpone old space GCs for a very long time. I've seen some installations where the app has been configured to avoid old space GCs for a full day, after which it is restarted to force creation of a "clean" heap. This may be more common among non-JRockit users, since our compaction is fairly efficient and the pause times tend to be acceptable. One word of warning here: This strategy is not guaranteed to avoid full GCs, since that depends on the load on your application, exact heap layout etc, so don't rely on it too much and configure it with a large safety margin.

3. Tune compaction parameters

The default behavior of JRockit is to analyze the fragmentation of the heap and do a little bit of compaction every old space GC cycle. The proportion of the heap that it decides to defragment is called the compaction ratio, which is typically stated as a percentage of the heap size, where a common figure would be perhaps 5%. If your application causes a lot of fragmentation, you can configure this ratio manually, which gives you the ability to create a balance of power between the GC and your memory-hungry Java program. You can try with -XXcompactratio=10 or so to start with. A high number will lead to longer pause times, but also means that the JVM will be able to cope with higher fragmentation. If you want to do advanced tuning, look in the JRockit reference guide for parameters that impact compaction. Two examples are -XXinternalcompactratio and -XXexternalcompaction.

4. Don't allocate memory

Ok, so it's time to look at what you can do in your Java code. The most obvious tip is to avoid memory allocation. This will have a direct impact on the frequency of GCs and an indirect impact on the GC pause time, since fewer objects will be alive at the time of a GC. You can use the Memory Leak Detector to analyze your application's allocation pattern and trace down excessive allocation to where in your source code it occurs. See Marcus Hirt's blog entry on this subject for tips.

5. Avoid allocating large objects

Arrays and other large objects are always the biggest culprit when it comes to fragmentation.
They cause the heap to fill up quickly, leading to frequent GCs; they create irregular patterns in the heap; and they request big contiguous chunks of memory on the heap at allocation time, which can be impossible for the JVM to fulfill without first doing compaction. To avoid excessive allocation of large objects, think twice before copying arrays etc. All code involving string processing (char arrays), XML, I/O (byte arrays) etc is a target for optimization in this area. Again, the Memory Leak Detector is a very powerful tool for analyzing this.

6. Allocate objects with similar lifespan in chunks

This is my last tip, and it is a bit esoteric so please bear with me... The idea is that any larger operation can decrease its impact on fragmentation by allocating the memory it needs in a chunk and then keeping it alive until the operation is complete. Consider a J2EE transaction such as a servlet request. When you start this request, you can have one metaobject created that allocates most or all of the objects that you need to process that particular request. Since these allocations will all be performed by a single thread and very closely spaced in time, they will typically end up stored as a contiguous block in the Java heap. If you keep all these objects alive until the transaction is done, they will all become eligible for GC at the same time, and be cleared as dead objects during the same GC cycle. Ergo, less fragmentation. So nulling out objects prematurely to decrease live data may not be the best in the long run. (A short code sketch of this idea follows below.)

Final words

That's it for this time... I hope you found this useful! Don't hesitate to ask if you found something unclear. Keep up the good coding!
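As promised, here is a minimal sketch of tip 6 (the class names and fields are my own illustration, not from the original post): one context object allocates everything a request needs up front, keeps it reachable for the whole request, and lets it all die together when the request completes.

// Illustration of "allocate objects with similar lifespan in chunks":
// the context is allocated by one thread, closely spaced in time,
// and everything it holds becomes garbage in the same GC cycle.
class RequestContext {
    final byte[] ioBuffer = new byte[8 * 1024];
    final StringBuilder response = new StringBuilder(4 * 1024);
    final java.util.List<String> parsedFields = new java.util.ArrayList<>(64);
}

public class ChunkedAllocationDemo {
    static void handleRequest() {
        RequestContext ctx = new RequestContext(); // allocate up front, together
        ctx.parsedFields.add("example");
        ctx.response.append("processed ").append(ctx.parsedFields.size()).append(" fields");
        // Nothing is nulled out early: ctx and everything it allocated
        // stay alive until the request ends, then die together.
    }

    public static void main(String[] args) {
        handleRequest();
    }
}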


ReRepost: How fragmented is my Java heap?

This is a repost of an article that was published on September 11th, 2008, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/repost_how_fragmented_is_my_ja The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. Note the example in this post was taken from JRockit R27 and uses the JRockit Runtime Analyzer. Users of JRockit R28 can use JRockit Flight Recorder, the descendant of JRockit Runtime Analyzer, in a similar fashion. Users of Oracle JDK 7 and later can use Java Flight Recorder, the HotSpot-based descendant of JRockit's Flight Recorder technology. -Buck

Originally posted in December 2007, reposted on request. The problem described here is generic and applies to all JVMs using mark-n-sweep or similar GC algorithms. JRockit Mission Control is now free for development and evaluation; you can download it here.

One major cause of long GC pause times is heap fragmentation. How problematic this is for an application depends on its allocation pattern. The worst possible case is an application that allocates a mix of objects with very different sizes and lifetimes. After the application has been running for a while, the Java heap will be fragmented by lots of long-lived Java objects spread out across the heap. There may be plenty of free space available, but no large block of contiguous free memory. When the application then attempts to allocate a large object such as an array, it is unable to find room to store it. The result will be a long GC pause while the heap is compacted.

If you suspect you have this issue with your application, the first step is to try to find out exactly how fragmented the heap is. One simple way of doing this is by using the JRockit Runtime Analyzer. Here's an example:

1. Start a Java application on your workstation
2. Start up JRockit Mission Control using the Start menu icon in Windows or the $JR_HOME/bin/jrmc executable
3. Locate the process you want to analyze, right-click and select to start a JRA recording
4. Select recording time (here 2 minutes) and start the recording
5. After the recording has finished, it will be opened in the JRMC GUI. Select the Heap tab.

Heap fragmentation is displayed in black in the Heap Contents pie chart. The application in this example is well behaved and shows only 11% fragmentation - well within acceptable limits. I would start getting concerned if it went above 30%, and more if it continued to increase. Another warning sign is if the free memory distribution (pie chart on the right) contains a very large proportion of smaller free blocks.


ReRepost: How to get (almost) 3 GB heap on Windows!

This is a repost of an article that was published on September 11th, 2008, to Oracle's JRockit Blog at https://blogs.oracle.com/jrockit/entry/how_to_get_almost_3_gb_heap_on_windows. The JRockit Blog will soon be discontinued, at which point the original posts will no longer be accessible to the public. This post is one of several that contained information that I believe is still of interest to users of JRockit. I have decided to repost such articles here on my own blog so that they remain available for reference. -Buck

Originally posted in December 2005 on http://dev2dev.bea.com/blogs/hstahl, reposted on request with a few minor updates. I find it amazing that the very useful "non-contiguous heap" feature described here is still unique to JRockit after 2.5 years...

As you may be aware, the maximum heap size on Windows using JRockit - or for that matter any JVM we are aware of - has been limited to slightly below 2 GB. There have been two reasons for this. One is that the maximum process size on Windows has been limited to 2 GB, though this can be worked around by using the /3GB kernel switch. The second is that JVMs have required a contiguous memory space for the Java heap for efficiency reasons, which causes the maximum Java heap size to be limited by DLLs loaded into the process address space. For more information and previous discussion on this topic see:

Microsoft PAE & /3GB docs
JRockit docs
Sun blog

With the soon-to-be-released JRockit 5.0 R26 (update: released in 2006) this barrier is broken! By introducing support for split heaps, we have been able to significantly increase the Java heap size on Windows.

Windows 2003/XP or later using the /3GB switch (32-bit OS):
1.85 GB - JRockit versions prior to 2006
2.85 GB - JRockit versions post 2006

Windows 2003/XP x64 Edition with a 32-bit JVM (64-bit OS):
2.05 GB - JRockit versions prior to 2006
3.85 GB - JRockit versions post 2006

How do I enable this feature?

32-bit Windows:
1. Add the /3GB switch to your boot.ini file as described here
2. Install and run a recent version of JRockit specifying a large -Xmx

64-bit Windows:
1. Install and run the 32-bit version of JRockit specifying a large -Xmx

How large is the performance overhead? Zero!

What if I need an even larger heap? Use one of the many 64-bit versions of JRockit.

Are there any drawbacks? You still need to make sure there is enough memory for JVM internals, compiled code and any native libraries you will be using. If JRockit exits with an out of memory error in native code, try decreasing the heap size slightly.
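As a hypothetical example for the 32-bit Windows case above (the heap size and application name are made up; the point is simply to stay somewhat below the 2.85 GB limit so the JVM itself still has room, per the drawback noted above):

$ java -Xmx2700m MyLargeHeapApp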

A Great Example of the Power of Flight Recorder

The following is a guest post by Mattis Castegren, our fearless leader and manager of Oracle’s JVM Sustaining Engineering team: I recently got a great example of a bug solved using a JFR recording, so I wanted to share it on this blog. The Problem The problem came from another team, and they were seeing that their application froze completely for 5-6 seconds, roughly once every five minutes. GC? I got a bunch of JFR recordings from their runs. So, the first thing to look at is the GC, right? They were running with a Parallel GC, and twice in a 30-minute recording they saw a 4-second stop-the-world pause. However, their application froze more often than that. I wondered if the problem was other machines in the cluster seeing GC pauses, and if the machines somehow waited for those, but after running all the machines with G1, the GC pauses went down but they still saw the long pauses. WLS Events Right, so what more can we say? This team was running with WLDF, or the WebLogic Diagnostic Framework. This framework generates events for WLS actions like servlet requests, database queries, etc, and is extremely powerful. Looking at the event graph, I only enabled servlet requests, and if you squint a bit, you can kind of see the pauses as bands of longer requests: To make it clearer, I went to the Log tab, sorted the servlet invocation events on duration, selected the ones longer than 4 seconds, right-clicked and selected "Operative Set"->"Set Selection" from the drop-down menu. Now, only the longest servlet invocations are part of our operative set. Going back to the graph and selecting "Show Only Operative Set" gives a pretty clear image: So yes, it seems the work is frozen at somewhat regular intervals, and it is not caused by the GC (at least not all of the pauses). Is the JVM frozen? The next question was: is the entire JVM frozen, or only the worker threads? Could it be some locking issue or something? To answer that, I zoomed in on one of the pauses, enabled all events, and unchecked "Show Only Operative Set". If you are not used to the Graph view, it may look a bit cluttered now. (Note the zoom in the time slider at top): Basically, each thread has several lines representing different event producers. We can zoom in on one of the threads: The tan event at the top is the long servlet invocation. The red event under that is a long database request, and the long blue event is an I/O event. Holding the mouse over them gives you information about each event. Here, it seems like this thread is waiting on a long I/O caused by a long database request, but most of the other threads did not have this pattern. Was it a problem that long I/O caused one thread to pause, and that thread held some kind of resource? We did not see any synchronization events, but maybe there was something else? If so, we should be able to see some other events during that time, from threads that did not require that resource. Scrolling further down, I found some threads that gave me the answer: Zooming out a little bit, I saw several threads that looked like this: The yellow events are "Wait" events. They all have a timeout of 1s, but suddenly there are two Wait events that are 6 seconds long, even though the timeout is still 1 second. This is a pretty strong indication that the entire JVM is indeed frozen. Why is it frozen? So, why is the JVM frozen? Here, I again zoomed in on one of the pauses and went to the Log tab. I wanted to know the last event that was generated before the pause.
So, I sorted all events on start time, and scrolling through the list of all events, I looked at the last event before the gap: So, at 08:46:08, there is a "VM Operation" event that takes 4.8 seconds. After that, no other events get created until 08:46:13 (roughly 4.8 seconds later). Looking at the event, we see that it is a ThreadDump event, called from the "oracle.webcenter.DefaultTimer" thread. Thread dumps stop the world, so this makes sense! Putting the pieces together Let's grow our operative set. I started by zooming out, and selected all the VM Operation events over 4 seconds that were caused by thread dumps (there were also two events with the ParallelGCFailedAllocation operation caused by the two GCs). I added those events to the operative set. I also went to the Events tab, selected the oracle.webcenter.DefaultTimer thread and added all events in that thread to the operative set too. Going back to the graph, enabling all event types and selecting "Show Only Operative Set", I now got this nice image: The DefaultTimer thread periodically waits for 4m 45s. When it wakes up, it triggers two VM Operations collecting thread dumps in the VMThread. These operations correlate perfectly with the pauses seen in the application (except two that were caused by the GC). The problem? So, what was the problem? Now, I handed the investigation back to the application team, who found the stack trace in the DefaultTimer thread that triggered the thread dumps, and in the source code we found: ThreadInfo[] threadInfos = threadMXBean.getThreadInfo(threadIds, true, true); Looking at the Javadoc, we can read: “This method obtains a snapshot of the thread information for each thread including: the entire stack trace, the object monitors currently locked by the thread if lockedMonitors is true, and the ownable synchronizers currently locked by the thread if lockedSynchronizers is true. … This method is designed for troubleshooting use, but not for synchronization control. It might be an expensive operation.” And there we have it. By calling this method with "true, true", they triggered thread dumps, including monitor information, that froze the system for several seconds (as the Javadoc warns). Once they stopped gathering the monitor information, the problem went away (apart from the slightly long parallel GC pauses, but they seemed fine with that). Summary In summary, this debugging was made possible by the JFR events. Without them, it would have taken several iterations back and forth to find the right logging to capture that the long pauses were caused by thread dumps, and still more iterations to find what thread triggered the thread dumps. Now, all we needed was the data from one of the runs, taken the first time it failed. A great example showing the power of JFR!
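For illustration, here is a hedged sketch of what "not gathering the monitor information" might look like in code (the class name and surrounding plumbing are hypothetical); passing false for both flags skips the expensive monitor and synchronizer collection:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class CheapThreadSnapshot {
    public static ThreadInfo[] snapshot() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        long[] threadIds = threadMXBean.getAllThreadIds();
        // false, false: no lockedMonitors / lockedSynchronizers,
        // so no multi-second stop-the-world monitor dump
        return threadMXBean.getThreadInfo(threadIds, false, false);
    }
}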

Thread Stuck at readBytesPinned / writeBytesPinned on JRockit

Introduction Sometimes you will see one or more threads in your application hang while executing a method called either readBytesPinned or writeBytesPinned. This is a common occurrence and does not indicate any JVM-level issue. In this post I will explain what these methods do and why they might block. Background Before explaining what these methods do, it is important to introduce the idea of object pinning. Pinning is where we temporarily tag an object on the heap so that the garbage collector will not try to move the object until we remove the tag. Normally, an object might be moved from one address to another if it is being promoted (i.e. being moved from young space to old space) or as part of compaction (defragmentation). But if an object is pinned, the GC will not try to move it until it is unpinned. So why would we want to pin an object? There are several scenarios where pinning is important, but in the case of readBytesPinned or writeBytesPinned, it is simply a performance optimization. Pinning a buffer (a byte array) during an I/O operation allows us to hand its address directly to the operating system. Because the buffer is pinned, we do not need to worry that the garbage collector will try to move it to a different address before the I/O operation finishes. If we were not able to pin the buffer, we would need to allocate additional native (off-heap) memory to pass to the OS's native I/O call and also copy data between the on-heap and off-heap buffers. So by pinning the buffer to a constant address on the heap, we avoid both an otherwise redundant native memory allocation and a copy. If this sounds similar to the use case for NIO’s direct buffers, you've got the right idea. Basically, JRockit gives you the best of both worlds: the I/O speed of direct I/O operations, and the safety / convenience of pure-Java memory management (no off-heap allocations). Let's Try It! Now let's try a really contrived example to see what this might look like. First, we'll make a named pipe and open it by redirecting some output. (Please don't forget to kill the cat process when you are done.)
$ mkfifo pipe
$ cat - > pipe &
Now let's make a trivial Java program [1] that tries to read from our new pipe.
import java.io.FileInputStream;

public class PipeRead {
    public static void main(String[] args) throws Exception {
        FileInputStream in = new FileInputStream("pipe");
        in.read(new byte[10]);
    }
}
Finally, we compile and run.
$ javac PipeRead.java
$ java PipeRead
Now if we collect a thread dump, we can see that the main thread is blocked waiting for data (that in this case will never come) from our pipe.
"Main Thread" id=1 idx=0x4 tid=2570 prio=5 alive, in native
    at jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native Method)
    at jrockit/io/FileNativeIO.readBytes(FileNativeIO.java:32)
    at java/io/FileInputStream.readBytes([BII)I(FileInputStream.java)
    at java/io/FileInputStream.read(FileInputStream.java:198)
    at PipeRead.main(PipeRead.java:6)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace
If we were to try HotSpot on the exact same testcase, we would see it doing a blocking read just like JRockit does.
"main" prio=10 tid=0x00007f25e4006800 nid=0xa8a runnable [0x00007f25e9c44000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:198)
    at PipeRead.main(PipeRead.java:6)
So even though the top of the stack trace for JRockit has JRockit-specific classes/methods, the JVM itself does not have anything to do with why the thread is stopped. It is simply waiting for input from a data source it is trying to read from. Troubleshooting So what should you do when you have a thread that appears stuck in a call to readBytesPinned or writeBytesPinned? That depends entirely on where the application is trying to read data from or write data to. Let's look at a real-world example of a thread stuck doing a blocking read:
"ExecuteThread: '2' for queue: 'weblogic.kernel.Default'" id=20 idx=0x2e tid=16946 prio=5 alive, in native, daemon
    at jrockit/net/SocketNativeIO.readBytesPinned(I[BIII)I(Native Method)
    at jrockit/net/SocketNativeIO.socketRead(Ljava/io/FileDescriptor;[BIII)I(Unknown Source)[inlined]
    at java/net/SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I(Unknown Source)[inlined]
    at java/net/SocketInputStream.read([BII)I(SocketInputStream.java:113)[optimized]
    at oracle/net/ns/Packet.receive()V(Unknown Source)[inlined]
    at oracle/net/ns/DataPacket.receive()V(Unknown Source)[optimized]
    at oracle/net/ns/NetInputStream.getNextPacket()V(Unknown Source)[optimized]
    at oracle/net/ns/NetInputStream.read([BII)I(Unknown Source)[inlined]
    at oracle/net/ns/NetInputStream.read([B)I(Unknown Source)[inlined]
    at oracle/net/ns/NetInputStream.read()I(Unknown Source)[optimized]
    at oracle/jdbc/driver/T4CMAREngine.unmarshalUB1()S(T4CMAREngine.java:1099)[optimized]
    <rest of stack omitted>
In the above case, you can tell from the stack trace that the JDBC (database) driver is doing a blocking read from a network socket. So the typical next step would be to see if there is a reason why the expected data may have been delayed (or may even never arrive at all). For example, the database server we are talking to could be hung, there could be a network issue that is delaying (or even dropping) the database's response, or there could be some sort of protocol mismatch where both parties believe it is the other side's turn to talk. Analyzing log files on both sides may provide clues as to what happened. If the issue is reproducible, collecting a network trace and analyzing it with a tool like Wireshark may also prove useful. Obviously, this is just one of countless scenarios where a thread may get stuck waiting on some external (to the JVM) resource. But other cases should be similar: you must look further down in the stack, determine what the application is waiting for (where it is expecting to receive data from or trying to send data to) and then troubleshoot the issue from there. Sometimes, tools like lsof, truss, or strace can come in handy here. Very often, this troubleshooting involves investigating other processes or even other machines across your network. Conclusion Seeing a thread block temporarily at readBytesPinned or writeBytesPinned is completely normal and does not usually indicate a problem.
However, if one or more threads block on a call to either of these methods for an unreasonable amount of time, you should examine further down the stack trace and attempt to determine what external resource the Java application is waiting for. [1] Obviously, this is horrible code. Real production code would include proper exception handling and of course close the FileInputStream when we are finished with it. You have been warned.
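For reference, a hedged sketch of how the OS-level tools mentioned in the troubleshooting section above might be used on Linux (the process ID 12345 is a placeholder):

$ lsof -p 12345                              # which files and sockets does the JVM have open?
$ strace -f -p 12345 -e trace=read,write     # which file descriptor is the blocking read or write stuck on?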

Where did all of these ConstPoolWrapper objects come from?!

Introduction A very small number of applications using JRockit R28.2.2 and later may suffer performance issues when compared to earlier versions of R28. In this post, I will describe the issue in detail and explain how to resolve or work around it. What Changed in R28.2.2 In JRockit, object finalization requires that each instance of a finalizable class be registered with the JVM at object creation time so that the JVM knows to treat the instance differently (add it to the finalizer queue) when the garbage collector determines that the object is no longer live. While developing R28, we unfortunately introduced a bug where finalizable objects that are instantiated from native code (think JNI's NewObject function) would never get registered. In other words, objects created from JNI, or other native code internal to the JVM, would simply be discarded by the garbage collector as soon as they were determined to be dead. Thankfully, objects created from pure Java code via the new keyword would still be finalized without issue. While never executing finalizers that are expected to be called is bad enough, this bug indirectly had a much bigger impact. In JRockit, we depend on finalizers to help us prune our shared constant pool (a pool of constants from loaded classes and other long-living strings). Each time Java code asks for a copy of an individual class's constant pool, we make a copy of the constant pool data for that class and store it on the object heap. This was necessary because JRockit never had a PermGen like HotSpot to keep class meta-data available for direct access from Java. Every copy of the constant pool on the object heap is wrapped in a ConstPoolWrapper object that holds the constant pool data. Each time a new copy is made, we also increment a reference counter for each string in the shared pool so that we can keep track of which strings are still live. When the ConstPoolWrapper is finalized, we decrement the applicable reference counters. So what happens if the finalize method for each ConstPoolWrapper is never called? All sorts of bad things. The worst case scenario is that we eventually overflow the 32-bit reference counter and, when the count loops back to zero, the string is removed from the pool despite still being live! This can cause all sorts of crashes and other completely unpredictable behavior. Fortunately, we were able to fix this issue in R28.2.2, and finalizers for natively instantiated objects now work exactly as expected. Performance Impact of this Fix The negative performance impact of finalizable objects has been very well known since almost the very beginning of the Java platform. Finalization adds a significant amount of overhead to garbage collection. At best, every finalizable object we discard needs to be "collected" at least twice: once before finalization and once after. Also, depending on the JVM implementation and the nature of the finalizer's work, often the act of finalization needs to be single-threaded, resulting in scalability issues for multi-core systems. Now that we have fixed the "forgotten" finalizers bug in R28.2.2, all of a sudden, every copy of a class's constant pool is a finalizable object again. For the vast majority of applications, use of reflection or other activities that retrieve copies of constant pools are infrequent enough that this has no noticeable performance impact. But there are a few applications out there that can indirectly generate very large numbers of ConstPoolWrapper objects.
For such applications, the finalizer thread(s) may have trouble keeping up with the massive number of ConstPoolWrapper objects being added to the finalizer queue. This can result in a significant increase in memory pressure. In a worst case scenario, this additional memory pressure can even lead to an OutOfMemoryError. Identifying this Issue If you suspect that you are hitting this issue, the quickest way to confirm is to use the heap_diagnostics command and examine the state of the finalizer queue. If you see tens of thousands of ConstPoolWrapper objects awaiting finalization, you are almost certainly impacted by this issue.
===
Finalizers:
    1271096  37979  77 1233040      0 Total for all Finalizers
    1125384      1  66 1125317      0 => jrockit/reflect/ConstPoolWrapper
===
Another clue would be if you notice that the finalizer thread is busy finalizing ConstPoolWrapper objects. For example, you may see something like the following in a thread dump:
===
"Finalizer" id=7 idx=0x50 tid=17238 prio=8 alive, native_blocked, daemon
    at jrockit/reflect/ConstPoolWrapper.finalize()V(Native Method)
    at jrockit/memory/Finalizer.doFinalize(Finalizer.java:29)
    at jrockit/memory/Finalizer.access$300(Finalizer.java:12)
    at jrockit/memory/Finalizer$4.run(Finalizer.java:186)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace
===
Basically, seeing "ConstPoolWrapper" just about anywhere should be considered a red flag that you may be facing this issue. One final possible hint is if you notice a severe discrepancy between an hprof heap dump and the live-set size. Our hprof code currently may only dump a subset of the entire finalization queue. So if your heap is filled with ConstPoolWrappers awaiting finalization, you may see a big difference between the amount of heap in use and the size of any hprof dump files generated. Resolving the Issue at the Application Level The majority of applications that we have seen impacted by this issue dynamically create a large number of new classes at runtime. The most common example of this is applications that (mis)use JAXB to instantiate a set of completely new XML parser classes for every single request they process. The standard troubleshooting steps to follow in this case are to run your application with the -Xverbose:class flag enabled and examine the output to see what kinds of classes are being loaded continuously, even after the application should have reached its steady state. Once you know the names of the classes that are being generated at runtime, hopefully you can determine why these classes are being created and possibly alter the application to not use classes in such a "disposable" manner. For example, we have seen several customers who have changed their use of JAXB to create a single XML parser (or one per thread to avoid scalability issues) and reuse that, as opposed to dynamically recreating the exact same classes again and again. I should also point out that modifying your application to limit redundant class creation is a good idea on any JVM. Class loading is expensive, and your application is very likely to see a performance benefit from the elimination of unneeded class loading regardless of which JVM you use. JVM-Level Workaround But of course, it is not always possible to modify your application. So I have made some changes in R28.3.2 to try and help users who run into this issue and cannot resolve it at the application level:
1. I have added a new flag, -XX:-UseCPoolGC, that will cause the ConstPoolWrappers to no longer be finalizable.
2. I have built overflow protection into the JVM's shared constant pool reference counter so that we do not experience the crashes and other stability issues that R28 versions before R28.2.2 faced.
So by adding this flag to your java command line, you should get the same performance that you did before R28.2.2, but also not face the crashes from overflowing the reference counter. The one downside is that this could possibly lead to a native memory leak, as an application that continuously creates new classes with unique constant pool entries may never be able to prune some entries from the shared constant pool. While we have never seen an application where this memory consumption would even be noticeable, it is a risk, at least in theory. As a result, I only recommend using this new flag if you have confirmed that you are hitting the ConstPoolWrapper issue and are unable to resolve the issue by modifying your application. Obviously a design-level change to JRockit so that it does not depend on finalization like this at all would be the "ideal fix". If we were going to do another major release of JRockit (R29), this would be something worth serious consideration. But given JRockit's legacy status and focus on stability for the remaining planned maintenance releases, I believe the workaround flag is the best choice for helping our users while not risking further unexpected changes in behavior. Conclusion If you are using a JRockit release R28.2.2 or later, and you notice unexplained memory pressure, collect heap_diagnostics output and look for a large number of ConstPoolWrappers in the finalizer queue. If you are hitting this issue, consider the application-level changes recommended above, or use the new -XX:-UseCPoolGC flag (added in R28.3.2) to work around the issue at the JVM level. And finally (pun intended, sadly), you may wish to keep this in mind as another example of why any design that depends on the timely execution of finalizers is flawed. Finalizers are almost always bad news for performance and you simply can't rely on them being called quickly. In short: Java finalizer != C++ destructor.
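To make the application-level fix concrete, here is a hedged sketch (the Order class is a hypothetical JAXB-annotated type) of building the class-generating JAXBContext once and reusing it, instead of rebuilding it for every request:
===
import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

public class OrderParser {
    // JAXBContext creation is the expensive, class-generating step; do it once.
    private static final JAXBContext CONTEXT;
    static {
        try {
            CONTEXT = JAXBContext.newInstance(Order.class);
        } catch (JAXBException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public Order parse(String xml) throws JAXBException {
        // Unmarshallers are cheap to create but not thread-safe: one per call (or per thread).
        Unmarshaller unmarshaller = CONTEXT.createUnmarshaller();
        return (Order) unmarshaller.unmarshal(new StringReader(xml));
    }
}
===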

Where Are My Dump Files?

Intro In this post I want to write a little bit about what happens when JRockit crashes. Specifically, I want to talk about some of the situations where you may have trouble finding the dump files (text and binary dumps). While this is usually not an issue, it can be very frustrating to experience a JVM crash and not be able to find any of the important data needed to troubleshoot the issue further. Hopefully this post will shed some light on some of the scenarios where these files are not created or are not where you expect them to be. Background The dump files are described in detail here. For this discussion, you only need to know the difference between the text dump file (jrockit.<pid>.dump) and the binary dump file ("core.<pid>" on Linux, "core" on Solaris, and "jrockit.<pid>.mdmp" on Windows). Default Behavior The default behavior is for these two files to be written to the current working directory (cwd) of the JVM process. This behavior is consistent with other JVMs (including HotSpot) and most other software. But there are situations where this is not ideal. For example, if the JVM's cwd is on a file system with limited space, you may want to designate another location for crash files to be output. It is often a good idea to isolate crash file output from the rest of the system so that a series of crashes does not exhaust the free space on a filesystem that is needed to keep your application up and running. This reasoning is similar to using a dedicated filesystem for log file output. Changing the Default Behavior On Linux and Solaris, one option is to use the tools built into the OS to specify an alternate output location for all binary dump files. On Linux, you can edit the /proc/sys/kernel/core_pattern file to specify a different path and/or naming convention for binary dump files. Likewise, on Solaris you can use coreadm to specify a different path for output. Note that with Linux, your only option for configuration at the OS level is to change the global behavior for all processes, not just JRockit (Solaris thankfully gives you per-process control, if needed). Also note that both of these options only impact where the binary dump file is output; the text dump file will still be output to the cwd of the JVM, regardless of the OS-level settings. This is often not a concern, as the size of most text dump files is usually negligible compared to the space requirements for binary dump files. When JRockit crashes, another consideration is that the dump file paths printed as part of the error message assume the default OS behavior. If you use either of the above OS-level methods to specify a non-default output location for binary dump files, JRockit has no way of knowing this and will output incorrect paths. It is best to think of the path output for the binary dump as simply an "educated guess". Another option is to set the JRockit-specific environment variable JROCKIT_DUMP_PATH. This will work even on Windows. It also differs from the OS-level settings above in that both binary and text dump files will be written to the specified directory. Note that JROCKIT_DUMP_PATH depends on default settings for the OS-level configuration. If you use either of the above methods to override the location for binary dump file output, JROCKIT_DUMP_PATH will only impact the location of the text dump file. When Things Go Wrong... There are also two well-known exceptions to my description above.
In both cases, our signal handler code (or exception handler code on Windows) is not able to run successfully (or at all), therefore causing issues. The result is that JROCKIT_DUMP_PATH may be ignored, or the dump files could be truncated or even completely missing. Depending on the platform, output of both dump files can be dependent on the correct operation of the signal handler. The first case is when a binary dump is produced via an external signal. The most common scenario is when you use the kill command to send a signal like SIGABRT to intentionally abort the process and generate a binary dump file. On both Linux and Solaris, this will result in the signal being "delivered" to the process's primordial thread. Starting from Java SE 6, the java launcher forks off from the primordial thread as soon as possible and the primordial thread does nothing but wait for the JVM to shut down. As the signal handlers are never registered for the primordial thread (What would be the point? It doesn't do anything but wait.), a signal delivered to the primordial thread will result in the OS-defined default action, and JRockit will never have a chance to influence the output of any binary dump created. This is also why you will not get a text dump in this situation. The recommended way to "artificially" trigger a crash is to use the force_crash diagnostic command and avoid this issue. The other scenario where you may not find the expected dump files is when the JVM's state is so corrupted that our signal handler cannot run without itself crashing. By far, the most common cause of this is a stack overflow. Especially on Windows, it is very common to end up with a 0-byte mdmp (mini-dump) file when you blow the stack. If you ever find missing or truncated dump files, a stack overflow should be the first culprit you suspect. Conclusion By default, dump files will be output into the JVM's current working directory, but you can override that behavior with OS-level settings (on Linux/Solaris) or the JROCKIT_DUMP_PATH environment variable. Also remember that JROCKIT_DUMP_PATH may be totally ignored if an external signal is received, and the dump files may never even get created correctly if you suffer a stack overflow.
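A hedged sketch of the two approaches described above (all paths are placeholders; editing core_pattern requires root and changes the behavior of every process on the machine):

# Linux, OS level: write all binary core files to a dedicated filesystem
echo "/var/crash/core.%p" > /proc/sys/kernel/core_pattern

# JRockit level: both the text and binary dump files go to this directory
export JROCKIT_DUMP_PATH=/var/crash/jrockit
java MyApp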

Never Disable Bytecode Verification in a Production System

I wanted to write a brief post about the security implications of disabling bytecode verification. As the risks are already covered in the CERT secure coding guidelines and many other excellent sources, I have no intention of reproducing those efforts. I just want to highlight one unfortunately common myth that we sometimes still hear regarding verification: that bytecode verification is unnecessary if you only run "trusted" code. For those of you unfamiliar with bytecode verification, it is simply the part of the JVM's classloading process that checks the code for certain dangerous and disallowed behavior. You can (but shouldn't) disable this protection on many JVMs by adding -Xverify:none or -noverify to the Java command line. Many people still incorrectly believe that the only point of bytecode verification is to protect us from malicious code. If you believe none of the code you are running could possibly be malicious, then bytecode verification is just needless overhead, or so the thinking goes. This is absolutely incorrect. Bytecode verification is indispensable for running a secure application, regardless of the level of trust you have in the sources of each and every class you load and run. Bytecode verification lies at the foundation of the Java security model. One of the fundamental strengths of Java security is not only its ability to safely sandbox and run untrusted code (i.e. an arbitrary applet from the Internet), but to also make guarantees about the security of trusted code. While programmers in any language must have a solid understanding of security principles and constantly think about the security implications of the code they write, there are many security mistakes that you simply can't make on the Java platform; the bytecode verifier will not let you. If you disable bytecode verification, you are saying not only that you trust all of the code you are running not to be malicious, but also that you trust every single class you load to be bug-free (at the bytecode level). This is a much higher standard than simply trusting code to not be overtly malicious. I suspect that one of the reasons for this misconception is that people underestimate just how many different tools and components out there generate and/or modify bytecode. Do you know how your Java EE application server compiles JSP files? Do any of the frameworks in your application dynamically generate new classes during runtime (answer: most likely yes)? How about profiling / instrumentation tools like Introscope or even the method profiling built into JRockit Mission Control? The reality is, in any sufficiently complex system, Java bytecode is being generated and modified all of the time by a large number of different tools, products, and frameworks. By disabling bytecode verification, you are trusting that all of those components in your entire stack are completely bug-free. I have seen (and sometimes fixed) many bugs in all of these types of software; they are just as likely to contain bugs as any other software. Does this mean that there is never a reasonable time or place to disable bytecode verification? Not necessarily. One could argue that it may be appropriate in a development environment where restart time is critical and security concerns are low or even non-existent. However, even in most development environments, bytecode verification may help you discover important issues before you move to testing or production. As a general rule I always recommend leaving it enabled.
In conclusion, you should never disable bytecode verification for any system where security is a concern. These options, -Xverify:none and -noverify, were never intended to be used in production environments under any circumstances. If you are disabling bytecode verification (or any products you use have disabled verification), please consider removing these options to eliminate this security risk immediately.

JRockit: Artificial GC: Cleaning up Finished Threads

Introduction Sometimes we see JRockit users who experience frequent full GCs as the JVM tries to clean up after application threads that have finished executing. In this post I'll write about why this happens, how to identify it in your own environment, and what the possible solutions might be. What exactly is happening? If there are many threads that have finished execution, but their corresponding java.lang.Thread objects have not yet been garbage collected from the Java heap, the JVM will automatically trigger a full GC event to try and clean up the leftover Thread objects. Why does JRockit do this? Due to JRockit's internal design, the JVM is unable to deallocate all of the internal resources associated with an execution thread until after the corresponding Thread object has been removed from the Java heap. While this is normally not a problem, if we go too long between garbage collections and the application creates (and then abandons) a lot of threads, the memory and other native resources consumed by these dead threads can grow very large. For example, if I run a test program that constantly "leaks" dead threads, but keeps strong references to each of the corresponding Thread objects, the memory footprint of the JVM can quickly grow to consume many gigabytes of off-heap (native) memory, just to retain data about our (almost) dead threads. What does this actually look like in practice? If you suspect that this is happening with your application, the easiest way to confirm is to enable JRockit's memdbg verbose logging module (*1). If a full collection is triggered by threads waiting for cleanup of their Thread objects, you will see something like the following at the start of the collection event (*2):
===
[DEBUG][memory ] [OC#7] GC reason: Artificial, description: Cleaning up Finished Threads.
===
Another verbose logging module that will give you information about these thread-related full GC events is the thread module:
===
[INFO ][thread ] Trigging GC due to 5772 dead threads and 251 threads since last collect.
===
The above message indicates that there are currently 5772 threads that have finished executing, but the JVM is still waiting for their Thread objects to be collected by the GC subsystem before it can completely deallocate all of the resources associated with each thread. We also see that 251 new threads have been created since the last full GC triggered by the thread management code. In other words, "last collect" really means "last collection triggered by the thread cleanup code"; this number will not be reset back to zero by a "natural" GC event, only a thread cleanup GC will reset this number. What can I do to avoid these collections? The best solution is of course to not create so many threads! Threads are expensive. If an application is continuously generating (and then abandoning) so many new threads that a full GC for every 250 threads becomes a performance concern, perhaps the application can be redesigned to not use threads in such a disposable manner. For example, thread pooling may be a reasonable solution. Even if you run your application on HotSpot, creating so many temporary threads is very likely a performance issue. Changing your application to use threads in a more efficient manner is worth considering, regardless of which JVM you use. But of course sometimes redesigning your application is not a viable option. Perhaps the code that is generating all of these threads is from some third party and you can't make changes.
Perhaps you do plan on changing the application, but in the meantime, you need to do something about all of these full GC events. There is a possible JVM-side "solution", but it should be considered a temporary workaround. JRockit has two diagnostic options that allow you to tune this behavior: MaxDeadThreadsBeforeGC and MinDeadThreadsSinceGC. There is a subtle difference between them, but basically they both allow you to modify the thresholds used to decide when the thread management code will trigger a full GC.
MaxDeadThreadsBeforeGC (default: 250) This option specifies the number of dead threads (threads that have finished executing, but can not be cleaned up because we are waiting for the corresponding Thread objects to be removed from the Java heap) that will trigger an artificial full GC.
MinDeadThreadsSinceGC (default: 250) This option specifies a minimum number of new threads that must be created between each thread-cleaning GC event. The idea is that even if we trigger a full GC, some subset of the dead threads may still remain because there are still strong references to the Thread objects. If the number of dead threads that survive a full GC is higher than MaxDeadThreadsBeforeGC, we could end up triggering a full GC for every single new thread that is created. The MinDeadThreadsSinceGC option guarantees that at least a certain number of new threads have been created since the last thread-cleaning GC, even if the number of dead threads remains above MaxDeadThreadsBeforeGC.
Note that these two options correspond to the two numbers in the -Xverbose:thread output:
===
[INFO ][thread ] Trigging GC due to 5772 dead threads and 251 threads since last collect.
===
The first number, 5772, is the number of dead threads waiting to be cleaned up and is compared to the value of MaxDeadThreadsBeforeGC. The second number, 251, is the number of new threads that have been spawned since the last thread-cleaning full GC. This second number is compared to MinDeadThreadsSinceGC. Only if both of these numbers are above the set thresholds is a thread-cleaning GC triggered. If you need to reduce the frequency of thread-cleaning full collections and are not able to modify the application, I would recommend trying a larger value for MinDeadThreadsSinceGC. The trade-off here is that you may see an increase in native memory consumption, as the JVM is not able to proactively clean up after finished threads as often. Obviously, you should do a full round of performance and load testing before using either of these options in a production environment. One more important point: these are both undocumented diagnostic options. On R28, in order to use either of these two options, you must add the -XX:+UnlockDiagnosticVMOptions option before any diagnostic options on the command line. For example, to set MinDeadThreadsSinceGC to 1000, you would use a command line like the one below:
===
$ java -XX:+UnlockDiagnosticVMOptions -XX:MinDeadThreadsSinceGC=1000 MyJavaApp
===
If you have read this entire post, understand the risks (increased native memory consumption) and are willing to thoroughly load test your application before rolling out to production, either of these options should be safe to use. (*1) For details on using the Xverbose option, please have a look at the command line reference: http://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionx.htm#i1020876 (*2) Please note that this example output is from R28.
The corresponding verbose output from R27 will refer to the GC cause as "Floating Dead Threads".
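Returning to the thread-pooling suggestion above, here is a minimal, hedged sketch (the pool size and task are purely illustrative) of reusing a fixed set of worker threads instead of spawning a new Thread per task:
===
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledWork {
    public static void main(String[] args) {
        // A small, fixed pool of long-lived threads instead of thousands of disposable ones.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100000; i++) {
            final int taskId = i;
            pool.submit(new Runnable() {
                public void run() {
                    System.out.println("task " + taskId); // placeholder for real work
                }
            });
        }
        pool.shutdown();
    }
}
===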

JRockit's New Shutdown Verbose Logging Option

Have you ever had to troubleshoot an issue where the JVM seems to shut down (in other words, disappear, but not crash), but you have no idea what triggered the shutdown? Perhaps some other process sent a SIGTERM? Or maybe some 3rd-party library you don't even have the source code for decided to call System.exit()? Another common scenario is that the last of your application's non-daemon threads has exited (normally, or as the result of an uncaught exception). For R28.3.2, I created a new verbose logging module that instruments the JVM shutdown code and tries to log details regarding each possible event that can trigger a shutdown. In JRockit, a logging module is enabled with the "-Xverbose" command-line option, so the new module "shutdown" would be enabled with "-Xverbose:shutdown" (*). Let's see some examples of this in action. Here is what a shutdown initiated by a call to System.exit() would look like. (Please note the truncated module name.):
===
[INFO ][shutdow] JVM_Halt() called in response to:
[INFO ][shutdow] System.exit() or direct call from JNI code
===
Here is an example from using the "kill <pid>" command to shut down the JVM:
===
[INFO ][shutdow] JVM_Halt() called in response to:
[INFO ][shutdow] Signal 15 received from PID:21152 UID:142
===
Here is one last example from a process where the main method has returned (and there were no other application threads still running):
===
[INFO ][shutdow] No remaining non-daemon Java threads.
===
All you need to do to get log output like the above is to run your application with a JRockit >= R28.3.2 and add "-Xverbose:shutdown" to the java command line. The logging itself is absolutely minimal and is only output during JVM shutdown. In fact, it would not be unreasonable to enable this new logging module as a proactive measure, even if you were not currently facing an unexplained shutdown issue. While there is already a wide array of tools and techniques available for dealing with this kind of scenario (SystemTap on Linux and DTrace on Solaris being two of my favorites), I thought it would be useful to have something built into the JVM itself. The main advantage is in having a simple, platform-independent way to troubleshoot these kinds of issues across all of JRockit's supported platforms. It is important to understand that this new module can only output the reason for graceful shutdowns. If the JVM crashes, or is forcibly killed (for example via SIGKILL on a POSIX system), the JVM will not have an opportunity to output information about the root cause of its unfortunate demise. Outside of a JVM crash (which will hopefully leave behind other artifacts like a crash log or core file), another common cause of sudden process disappearances is forced termination by the OS. On Linux, in response to a system-wide lack of resources, a component called the OOM killer will pick a specific process and forcibly kill it. As another example, on recent versions of Windows, closing the GUI window of the command line shell also seems to suddenly force any child processes to die without warning. In circumstances like these, the new verbose module will not be able to log the cause of the shutdown (although the lack of output itself would indicate either a forced shutdown or a crash, hopefully helping you to troubleshoot the issue further). Obviously, there are scenarios where you will need to use other tools or techniques to further narrow down your issue even after trying the new shutdown logging module.
But hopefully having this cross-platform and easy-to-use tool built right into the JVM will come in handy for resolving simple issues and at least point you in the right direction when starting a more complicated investigation. (*) For details on using the Xverbose option, please have a look at the command line reference: http://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionx.htm#i1020876

Finite Number of Fat Locks in JRockit

Introduction JRockit has a hard limit on the number of fat locks that can be "live" at once. While this limit is very large, the use of ever larger heap sizes makes hitting this limit more likely. In this post, I want to explain what exactly this limit is and how you can work around it if you need to. Background Java locks (AKA monitors) in JRockit basically come in one of two varieties, thin and fat. (We'll leave recursive and lazy locking out of the conversation for now.) For a detailed explanation of how we implement locking in JRockit, I highly recommend reading chapter 4 of JR:TDG. But for now, all that you need to understand is the basic difference between thin and fat locks. Thin locks are lightweight locks with very little overhead, but any thread trying to acquire a thin lock must spin until the lock is available. Fat locks are heavyweight and have more overhead, but threads waiting for them can queue up and sleep while waiting, saving CPU cycles. As long as there is only very low contention for a lock, thin locks are preferred. But if there is high contention, then a fat lock is ideal. So normally a lock will begin its life as a thin lock, and only be converted to a fat lock once the JVM decides that there is enough contention to justify using a fat lock. This conversion of locks between thin and fat is known as inflation and deflation. Limitation One of the reasons we call fat locks "heavyweight" is that we need to maintain much more data for each individual lock. For example, we need to keep track of any threads that have called wait() on it (the wait queue) and also any threads that are waiting to acquire the lock (the lock queue). For quick access to this lock information, we store this information in an array (giving us a constant lookup time). We'll call this the monitor array. Each object that corresponds to a fat lock holds an index into this array. We store this index value in a part of the object header known as the lock word. The lock word is a 32-bit value that contains several flags related to locking (and the garbage collection system) in addition to the monitor array index value (in the case of a fat lock). After the 10 flag bits, there are 22 bits left for our index value, limiting the maximum size of our monitor array to 2^22, or space to keep track of just over 4 million fat locks. Now for a fat lock to be considered "live", meaning it requires an entry in the monitor array, its object must still be on the heap. If the object is garbage collected or the lock is deflated, its slot in the array will be cleared and made available to hold information about a different lock. Note that because we depend on GC to clean up the monitor array, even if the object itself is no longer part of the live set (meaning it is eligible for collection), the lock information will still be considered "live" and can not be recycled until the object gets collected. So what happens when we use up all of the available slots in the monitor array? Unfortunately, we abort and the JVM exits with an error message like this:
===
[ERROR] JRockit Fatal Error: The number of active Object monitors has overflowed. (87)
[ERROR] The number of used monitors is 4194304, and the maximum possible monitor index 4194303
===
Want to see for yourself? Try the test case below. One way to guarantee that a lock gets inflated by JRockit is to call wait() on it. So we'll just keep calling wait() on new objects until we run out of slots.
=== LockLeak.java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class LockLeak extends Thread {

    static List<Object> list = new LinkedList<Object>();

    public static void main(String[] arg) {
        boolean threadStarted = false;
        for (int i = 0; i < 5000000; i++) {
            Object obj = new Object();
            synchronized (obj) {
                list.add(0, obj);
                if (!threadStarted) {
                    (new LockLeak()).start();
                    threadStarted = true;
                }
                try {
                    obj.wait();
                } catch (InterruptedException ie) {} // eat Exception
            }
        }
        System.out.println("done!"); // you must not be on JRockit!
        System.exit(0);
    }

    public void run() {
        while (true) {
            Object obj = list.get(0);
            synchronized (obj) {
                obj.notify();
            }
        }
    }
}
===
(Yes, this code is not even remotely thread safe. Please don't write code like this in real life and blame whatever horrible fate that befalls you on me. Think of this code as for entertainment purposes only. You have been warned.) Resolution While this may seem like a very serious limitation, in practice it is very unlikely that even the most demanding application will hit this limit. The good news is, even if you do have a system that runs up against this limit, you should be able to tune around the issue without too much difficulty. The key point is that GC is required to clean up the monitor array. The more frequently you collect your heap, the quicker "stale" monitor information (lock information for an object that is no longer part of the live set) will be removed. As an example, one of our fellow product teams here at Oracle recently hit this limit while using a 50GB heap with a single space collector. By enabling the nursery (switching to a generational collector), they were able to completely avoid the issue. By proactively collecting short-lived objects, they avoided filling up the monitor array with entries for dead objects (that would otherwise have to wait for a full GC to be removed). One other possible solution may be to set the -XX:FatLockDeflationThreshold option to a value below the default of 50 to more aggressively deflate fat locks. While this does work well for simple test cases like LockLeak.java above, I believe that more aggressive garbage collection is more likely to resolve any issues without a negative performance impact. Either way, we have never seen anyone hit this problem who was not able to tune around the limitation very easily. It is hard to imagine that any real system will ever need more than 4 million fat locks all at once. But in all seriousness, given JRockit's current focus on stability and the lack of a use case that requires more, we are almost certainly not going to ever make the significant (read: risky) changes that removing or expanding this limit would require. The good news is that HotSpot does not seem to have a similar limitation. Conclusion You are very unlikely to ever see this issue unless you are running an application with a very large heap, a lot of lock contention, and very infrequent collections.
By tuning to collect dead objects that correspond to fat locks faster, for example by enabling a young collector, you should be able to avoid this limit easily. In practice, no application today (or for the near future) will really need over 4 million fat locks at once. As long as you help the JVM prune the monitor array frequently enough, you should never even notice this limit.
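As a hedged illustration of the two tuning approaches above (the heap size and threshold are illustrative values, not recommendations):
===
$ java -Xgc:gencon -Xmx50g MyApp                 # generational collector: young collections prune the monitor array sooner
$ java -XX:FatLockDeflationThreshold=25 MyApp    # deflate fat locks more aggressively than the default of 50
===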

Inflation System Properties

I wanted to write a quick post about the two inflation-related system properties: sun.reflect.inflationThreshold and sun.reflect.noInflation. There seems to be a lot of confusion on Oracle's forums (and the rest of the net) regarding their behavior. Since neither of these properties is officially documented by us, I thought an informal explanation here might help some people. There are a ton of good resources out on the net that explain inflation in detail and why we do it. I won't try to duplicate the level of detail of those efforts here. But just to recap: there are two ways for Java reflection to invoke a method (or constructor) of a class: JNI or pure-Java. JNI is slow to execute (mostly because of the transition overhead from Java to JNI and back), but it incurs zero initial cost because we do not need to generate anything; a generic accessor implementation is already built in. The pure-Java solution runs much faster (no JNI overhead), but has a high initial cost because we need to generate custom bytecode at runtime for each method we need to call. So ideally we want to only generate pure-Java implementations for methods that will be called many times. Inflation is the technique the Java runtime uses to try and achieve this goal. We initially use JNI by default, but later generate pure-Java versions only for accessors that are invoked more times than a certain threshold. If you think this sounds similar to HotSpot method compilation (interpreter before JIT) or tiered compilation (c1 before c2), you've got the right idea. This brings us to our two system properties that influence inflation behavior:
sun.reflect.inflationThreshold This integer specifies the number of times a method will be accessed via the JNI implementation before a custom pure-Java accessor is generated. (default: 15)
sun.reflect.noInflation This boolean will disable inflation (the default use of JNI before the threshold is reached). In other words, if this is set to true, we immediately skip to generating a pure-Java implementation on the first access. (default: false)
There are a few points I would like to make about the behavior of these two properties:
1. noInflation does NOT disable the generation of pure-Java accessors; it disables use of the JNI accessor. This behavior is the exact opposite of what many users assume based on the name. In this case, "inflation" does not refer to the act of generating a pure-Java accessor, it refers to the 2-stage process of using JNI to try and avoid the overhead of generating a pure-Java accessor for a method that may only be called a handful of times. Setting this to true means you don't want to use JNI accessors at all, and will always generate pure-Java accessors.
2. Setting inflationThreshold to 0 does NOT disable the generation of pure-Java accessors. In fact, it has almost the exact opposite effect! If you set this property to 0, on the first access, the runtime will determine that the threshold has already been crossed and will generate a pure-Java accessor (which will be used starting from the next invocation). Apparently, IBM's JDK interprets this property differently, but on both of Oracle's JDKs (OracleJDK and JRockit) and OpenJDK, 0 will not disable generation of Java accessors, it will almost guarantee it. (Note that because the first invocation will still use the JNI accessor, any value of 0 or less will behave the same as a setting of 1.
If you want to generate and use a pure-Java accessor from the very first invocation, setting noInflation to true is the correct way.) So there is no way to completely disable the generation of pure-Java accessors using these two system properties. The closest we can get is to set the value of inflationThreshold to some really large value. This property is a Java int, so why not use Integer.MAX_VALUE ((2^31)-1)?
$ java -Dsun.reflect.inflationThreshold=2147483647 MyApp
This should hopefully meet the needs of anyone looking to prevent continuous runtime generation of pure-Java accessors. For those of you interested in all of the (not really so) gory details, the following source files (from OpenJDK) correspond to most of the behavior I have described above:
jdk/src/share/classes/sun/reflect/ReflectionFactory.java
jdk/src/share/classes/sun/reflect/NativeMethodAccessorImpl.java
jdk/src/share/classes/sun/reflect/DelegatingMethodAccessorImpl.java
jdk/src/share/classes/sun/reflect/NativeConstructorAccessorImpl.java
jdk/src/share/classes/sun/reflect/DelegatingConstructorAccessorImpl.java
As with anything undocumented, you should not rely on the behavior of these options (or even their continued existence). The idea is that you should not normally need to set these properties or even understand what inflation is; it should be transparent and "just work" right out of the box. The inflation implementation could change at any point in the future without notice. But for now, hopefully this post will help prevent some of the confusion out there.
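For anyone who wants to watch the threshold in action, here is a small, hedged demo (the class and method names are made up) that invokes the same method reflectively more times than the default threshold of 15:

import java.lang.reflect.Method;

public class InflationDemo {
    public static void hello() {
        // intentionally empty
    }

    public static void main(String[] args) throws Exception {
        Method m = InflationDemo.class.getMethod("hello");
        for (int i = 0; i < 30; i++) {
            // roughly the first 15 calls go through the JNI accessor; later calls
            // use a generated pure-Java accessor (unless the properties above change that)
            m.invoke(null);
        }
    }
}

Running it with different values of -Dsun.reflect.inflationThreshold (or with -Dsun.reflect.noInflation=true) is a simple way to experiment with the behavior described above.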


JRockit R28 issue with exact profiling and generics

Some users of JRockit R28 may have noticed problems using Mission Control's exact profiling on methods that use Java's generics facility. Invocation counters for these methods would simply not respond; calling each method would fail to increment the corresponding call count.

For exact method profiling in R28, we replaced our homegrown bytecode instrumentation solution with Apache's Byte Code Engineering Library (BCEL). A version of BCEL was already included in the JDK as an internal component of JAXP, and using BCEL helped provide a cleaner code generation pipeline. The problem was that while the internal version of BCEL contained in the JDK is very stable and works fine for the very narrow use cases JAXP requires, there were problems using it to instrument arbitrary code as needed by the Mission Control Console's profiling tool.

One of those issues was that support for Java generics was never fully implemented in BCEL. In particular, instrumenting a method with generic code could produce bytecode with inconsistencies between the LocalVariableTable (LVT) attribute and the LocalVariableTypeTable (LVTT) attribute (please see the class file format for details; a minimal example of the kind of method affected is shown at the end of this post). Thankfully, this issue was found and fixed (in development branches) by the BCEL project:

Bug 39695 - java.lang.ClassFormatError: LVTT entry for 'local' in class file org/shiftone/jrat/test/dummy/CrashTestDummy does not match any LVT entry

Unfortunately, the JDK version of BCEL predated this fix. So when JRockit tried to instrument such methods using BCEL, the new method's bytecode would be invalid and fail subsequent bytecode validation, leaving the original, uninstrumented version of the method in place.

While I briefly considered adding code on the JRockit side to work around the BCEL issue, it seemed that fixing the version of BCEL in the JDK itself was The Right Thing to do here. Unfortunately for me, the fix for bug 39695 was based on a version of BCEL that was much more recent than the one contained in the JDK, so I needed to port over a lot of other code to get things working.

JDK-8003147: port fix for BCEL bug 39695 to our copy bundled as part of jaxp

My port of the BCEL fix and other needed code went into 5u45, 6u60, and 7u40. Note that for Java 5, our only option was to put the fix into a CPU release, as we no longer provide non-CPU releases for Java 5. This means that the exact version of JRockit this fix made it into depends on the major Java version: for Java 5, R28.2.7; for Java 6, R28.2.9. As the LVT/LVTT attributes would normally only be included in debug builds, recompiling affected classes without the -g flag should also be a viable workaround for users of earlier releases.

Hopefully, not too many users were impacted by this issue. As explained very well in JR:TDG, sampling-based profiling (like that provided by Flight Recorder) is almost always a better choice than exact profiling. But this story is interesting for another reason: it is a great example of how depending on internal (not officially documented) classes within the JDK is almost always a really bad idea (*1). Even we have been known to get bit.

(*1) Upcoming modularization of the Java Class Library is expected to do a better job of preventing outside use of internal JDK classes. Not that it would have helped here.
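For reference, here is a small sketch (not from the original post) of the kind of method that carries both attributes: any method declaring a local variable of a generic type, compiled with -g, gets an LVT entry recording the erased type and a matching LVTT entry recording the generic signature. The class name LvttExample is purely illustrative; running javap -v -l on the compiled class should show both tables.

    import java.util.ArrayList;
    import java.util.List;

    public class LvttExample {
        public static int countNames() {
            // With -g, "names" gets an entry in the LocalVariableTable
            // (erased type java.util.List) and a matching entry in the
            // LocalVariableTypeTable (generic signature List<String>).
            List<String> names = new ArrayList<String>();
            names.add("Buck");
            return names.size();
        }
    }

An instrumentation tool that rewrites such a method has to keep the two tables consistent with each other, which is exactly what the unpatched BCEL failed to do.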


JRockit R27.8.1 and R28.3.1 versioning

As part of today's CPU release, JRockit R27.8.1 and R28.3.1 are now available to our support customers for download from MOS. (If you don't have a support contract, upgrading to Java 7 update 51 is the way to go.) I just wanted to post a brief comment about our versioning scheme. It seems that many people have noticed that we have increased the "minor" version numbers for both R27 and R28. For example, R28 suddenly went from R28.2.9 to R28.3.1. Please let me assure you: these are just ordinary maintenance releases, not feature releases. There is zero significance to the jump in minor version number.

The reasoning behind the jump is simple: fear of breaking stuff. For as long as we have used the Rxx.y.z versioning scheme for JRockit, y and z have been single digits. For better or worse, version strings are often read and parsed by all sorts of tools, scripts, and sometimes even the Java applications themselves. While R28.2.10 may have been the most intuitive choice for today's release, we didn't want to risk breaking anyone's system that somehow depended on these numbers being single digits.

So why R28.3.1 as opposed to R28.3.0? We thought that a dot-zero release would sound too much like a feature release, so to help emphasize the fact that this is just another maintenance release, we went to .1 instead of .0. R27 had an even bigger sudden jump, from R27.7.7 to R27.8.1. This was done to synchronize the last version digits between R27 and R28 to make it easier to tell which versions were released at the same time (and hence contain the same security fixes). We have actually done this once before in the past, when R27 jumped from R27.6.9 to R27.7.1. Because so many JRockit users had already moved on to R28 by then, that bump seems to have gotten a lot less attention than today's release.

So in summary, all recent JRockit releases (R27.6.1 and later for R27; R28.2.1 and later for R28) are maintenance releases. If you are still using JRockit, please plan to upgrade as soon as possible to get these important fixes. (Or even better, make the move to Java 7!)
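As an aside (not from the original post), here is a minimal sketch of how a tool could parse these version strings without assuming single-digit components, which is exactly the assumption the release numbering tried not to break. The class name JRockitVersion and the idea of matching the Rxx.y.z substring are my own illustration.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class JRockitVersion {
        // Accept any number of digits per component, e.g. "R28.3.1" or a
        // hypothetical "R28.2.10".
        private static final Pattern VERSION =
                Pattern.compile("R(\\d+)\\.(\\d+)\\.(\\d+)");

        public static int[] parse(String s) {
            Matcher m = VERSION.matcher(s);
            if (!m.find()) {
                throw new IllegalArgumentException("not a JRockit version: " + s);
            }
            return new int[] {
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3))
            };
        }

        public static void main(String[] args) {
            int[] v = parse("R28.3.1");
            System.out.println(v[0] + "." + v[1] + "." + v[2]); // prints 28.3.1
        }
    }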


Personal

<< "Hello, Blog!" MSGBOX >>

Welcome to my new work blog! For those of you that don't know me, a quick introduction: I am a member of the Java SE Sustaining Engineering team which is made up of the former Sun and BEA (JRockit) Java sustaining teams. My work is divided between our two JVMs (HotSpot and JRockit) and the various Java client technologies (Java2D, AWT, Swing, JavaWS, applets, Java Sound, etc.). Currently, most of my time is still spent working on JRockit. I am based out of Oracle's Akasaka, Tokyo office. In my spare time, I enjoy programming. My plan is to post regularly about anything interesting I come across related to Java or even software development in general. There will probably be a lot of posts about JRockit for the immediate future, but I am also looking forward to talking more about HotSpot as the JRockit user base continues the move to JDK7 and beyond. One of the most fun and exciting things about working on Java is that our users are programmers, just like us. Interacting with the people at user group meetings, JavaOne, and other conventions is always a complete blast. I started this blog to have another outlet to communicate with my fellow programmers. Thank you for visiting my new blog! I look forward to your comments and feedback.
