
Buck's Blog

  • February 19, 2014

Finite Number of Fat Locks in JRockit

David Buck
Principal Member of Technical Staff
Introduction

JRockit has a hard limit on the number of fat locks that can be "live"
at once. While this limit is very large, the use of ever larger heap
sizes makes hitting this limit more likely. In this post, I want to
explain what exactly this limit is and how you can work around it if you
need to.


Background

Java locks (AKA monitors) in JRockit come in one of two basic
varieties, thin and fat. (We'll leave recursive and lazy locking out of the conversation for
now.) For a detailed explanation of how we implement locking in JRockit,
I highly recommend reading chapter 4 of JR:TDG (Oracle JRockit: The Definitive Guide). But
for now, all that you need to understand is the basic difference between
thin and fat locks. Thin locks are lightweight locks with very little
overhead, but any thread trying to acquire a thin lock must spin until
the lock is available. Fat locks are heavyweight and have more overhead,
but threads waiting for them can queue up and sleep while waiting,
saving CPU cycles. As long as contention for a lock is low, thin locks
are preferred. But under high contention, a fat lock is ideal. So
normally a lock begins its life as a thin lock and is only converted to
a fat lock once the JVM decides that there is enough contention to
justify it. Converting a lock from thin to fat is known as inflation;
the reverse is deflation.
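

To make the difference concrete, here is a minimal, hypothetical sketch
(ContentionDemo is my own example, not code from JRockit or the original
post) of the kind of workload where inflation typically happens: many
threads repeatedly competing for the same monitor. Under this sort of
contention, JRockit would normally inflate the lock on sharedLock into a
fat lock so that the waiting threads can sleep instead of spinning.


=== ContentionDemo.java

// Illustrative sketch only: heavy contention on a single monitor is the
// situation where a JVM like JRockit would typically inflate a thin lock
// into a fat lock, letting waiting threads queue up and sleep.
public class ContentionDemo {

    private static final Object sharedLock = new Object(); // one hot monitor
    private static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[16];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000000; j++) {
                        synchronized (sharedLock) { // heavily contended monitor
                            counter++;
                        }
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // wait for all workers to finish
        }
        System.out.println("counter = " + counter);
    }
}

===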


Limitation

One of the reasons we call fat locks "heavyweight" is that we need to
maintain much more data for each individual lock. For example, we need
to keep track of any threads that have called wait() on it (the wait
queue) and also any threads that are waiting to acquire the lock (the
lock queue). For quick access, we store this lock information in an
array, giving us constant lookup time. We'll call this the monitor
array. Each object that corresponds to a fat lock holds an index into
this array. We store this index value in a part of the object header
known as the lock word. The lock word is a 32-bit value that contains
several flags related to locking (and the garbage collection system) in
addition to the monitor array index value (in the case of a fat lock).
After the 10 flag bits, there are 22 bits left for our index value,
limiting the maximum size of the monitor array to 2^22 entries, or space
to keep track of just over 4 million fat locks.
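

To make the arithmetic explicit, here is a small back-of-the-envelope
sketch (the constant names are mine, not JRockit's internal ones) of how
10 flag bits in a 32-bit lock word leave 22 bits of index, and therefore
4,194,304 possible monitor array slots:


=== LockWordMath.java

// Back-of-the-envelope sketch of the lock word budget described above.
// The names here are illustrative, not JRockit's internal names.
public class LockWordMath {
    public static void main(String[] args) {
        int lockWordBits = 32;                        // size of the lock word
        int flagBits     = 10;                        // locking / GC flag bits
        int indexBits    = lockWordBits - flagBits;   // 22 bits remain
        long maxMonitors = 1L << indexBits;           // 2^22
        System.out.println(indexBits + " index bits -> "
                + maxMonitors + " monitor array slots"); // prints 4194304
    }
}

===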


Now, for a fat lock to be considered "live", meaning it requires an entry
in the monitor array, its object must still be on the heap. If the
object is garbage collected or the lock is deflated, its slot in the
array is cleared and made available to hold information about a
different lock. Note that because we depend on GC to clean up the
monitor array, even if the object itself is no longer part of the live
set (meaning it is eligible for collection), the lock information will
still be considered "live" and cannot be recycled until the object is
actually collected.


So what happens when we use up all of the available slots in the monitor
array? Unfortunately, we abort and the JVM exits with an error message
like this:


===

[ERROR] JRockit Fatal Error: The number of active Object monitors has
overflowed. (87)

[ERROR] The number of used monitors is 4194304, and the maximum possible
monitor index 4194303

===


Want to see for yourself? Try the test case below. One way to guarantee
that a lock gets inflated by JRockit is to call wait() on it. So we'll
just keep calling wait() on new objects until we run out of slots.


=== LockLeak.java

import java.util.LinkedList;
import java.util.List;

public class LockLeak extends Thread {

    static List<Object> list = new LinkedList<Object>();

    public static void main(String[] arg) {
        boolean threadStarted = false;
        for (int i = 0; i < 5000000; i++) {
            Object obj = new Object();
            synchronized (obj) {
                list.add(0, obj);
                if (!threadStarted) {
                    (new LockLeak()).start();
                    threadStarted = true;
                }
                try {
                    // Calling wait() forces JRockit to inflate this lock,
                    // consuming a slot in the monitor array. The object stays
                    // reachable via the list, so the slot is never reclaimed.
                    obj.wait();
                } catch (InterruptedException ie) {} // eat Exception
            }
        }
        System.out.println("done!"); // you must not be on JRockit!
        System.exit(0);
    }

    // Helper thread: keeps notifying the newest object so the main thread
    // wakes up and moves on to inflate the next lock.
    public void run() {
        while (true) {
            Object obj = list.get(0);
            synchronized (obj) {
                obj.notify();
            }
        }
    }
}

===


(Yes, this code is not even remotely thread safe. Please don't write
code like this in real life, and don't blame whatever horrible fate
befalls you on me if you do. Think of this code as being for
entertainment purposes only. You have been warned.)


Resolution


While this may seem like a very serious limitation, in practice even the
most demanding applications are very unlikely to hit it. The good news
is that even if you do have a system that runs up against this limit,
you should be able to tune around the issue without too much difficulty.
The key point is that GC is required to clean up the monitor array. The
more frequently you collect your heap, the more quickly "stale" monitor
information (lock information for an object that is no longer part of
the live set) will be removed.


As an example, one of our fellow product teams here at Oracle recently
hit this limit while using a 50GB heap with a single-space collector. By
enabling the nursery (switching to a generational collector), they were
able to completely avoid the issue. Proactively collecting short-lived
objects kept the monitor array from filling up with entries for dead
objects that would otherwise have had to wait for a full GC to be
removed.
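

As a rough sketch of what that kind of tuning might look like (the heap
and nursery sizes below are illustrative values rather than
recommendations, the application name is a placeholder, and the exact
flags should be checked against the JRockit documentation for your
release):


===

# Illustrative only: select JRockit's generational concurrent collector so
# the nursery is collected frequently and stale monitor entries get pruned.
# The nursery size passed to -Xns is an arbitrary example value.
java -Xms50g -Xmx50g -Xgc:gencon -Xns:2g MyLargeHeapApplication

===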


Another possible solution is to set the
-XX:FatLockDeflationThreshold option to a value below the default of 50
to more aggressively deflate fat locks. While this does work well for
simple test cases like LockLeak.java above, I believe that more
aggressive garbage collection is more likely to resolve any issues
without a negative performance impact.
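

For completeness, here is the kind of invocation I mean; the threshold
value of 5 is just an arbitrary example, and the exact option syntax
should be confirmed against the JRockit documentation:


===

# Illustrative only: deflate fat locks more aggressively than the default
# threshold of 50. The value 5 here is an arbitrary example.
java -XX:FatLockDeflationThreshold=5 LockLeak

===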


Either way, we have never seen anyone hit this problem who was not able
to tune around the limitation very easily. It is hard to imagine that any real
system will ever need more than 4 million fat locks at once. But in all
seriousness, given JRockit's current focus on stability and the lack of
a use case that requires more, we are almost certainly never going to
make the significant (read: risky) changes that removing or expanding
this limit would require. The good news is that HotSpot does not seem to
have a similar limitation.


Conclusion

You are very unlikely to ever see this issue unless you are running an
application with a very large heap, a lot of lock contention, and very
infrequent collections. By tuning the JVM to collect the dead objects
that correspond to fat locks sooner, for example by enabling a young
collector, you should be able to avoid this limit easily. In practice,
no application today (or in the near future) will really need over 4
million fat locks at once. As long as you help the JVM prune the monitor
array frequently enough, you should never even notice this limit.
