java.util.concurrent ReentrantLock vs synchronized() - which should you use?

In J2SE 6 there's little difference - either should suffice in most circumstances. ReentrantLock might be more convenient if you need to implement a hand-over-hand locking protocol. A classic example of hand-over-hand locking would be a thread that traverses a linked list, locking the next node and then unlocking the current node. That's hard to express with Java's lexically balanced synchronized construct. Synchronized, however, is a mature and first-class language feature.
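To make the hand-over-hand idea concrete, here is a minimal sketch using one ReentrantLock per node. The class name HandOverHandList, its Node type and the sentinel head are assumptions made for illustration; they are not taken from any JDK class.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sorted singly linked list, used only to illustrate
// hand-over-hand (lock-coupling) traversal with ReentrantLock.
final class HandOverHandList {
    private static final class Node {
        final int value;
        Node next;
        final ReentrantLock lock = new ReentrantLock();

        Node(int value) {
            this.value = value;
        }
    }

    // Sentinel head so that every real node has a lockable predecessor.
    private final Node head = new Node(Integer.MIN_VALUE);

    // Inserts value in ascending order, holding at most two node locks at once.
    void add(int value) {
        Node pred = head;
        pred.lock.lock();
        try {
            Node curr = pred.next;
            while (curr != null && curr.value < value) {
                curr.lock.lock();    // lock the next node first...
                pred.lock.unlock();  // ...then release the node behind us
                pred = curr;
                curr = pred.next;
            }
            Node node = new Node(value);
            node.next = curr;
            pred.next = node;
        } finally {
            pred.lock.unlock();      // pred is whichever node we still hold
        }
    }

    // Same traversal pattern, read-only.
    boolean contains(int value) {
        Node pred = head;
        pred.lock.lock();
        try {
            Node curr = pred.next;
            while (curr != null && curr.value < value) {
                curr.lock.lock();
                pred.lock.unlock();
                pred = curr;
                curr = pred.next;
            }
            return curr != null && curr.value == value;
        } finally {
            pred.lock.unlock();
        }
    }
}
```

The lock on the next node is acquired inside the loop and released in a later iteration, so the acquire and release are not lexically nested. That is the part a synchronized block cannot express directly.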

With respect to performance, both mechanisms are up to the task. (It's worth noting that the synchronization primitives in modern JVMs provide latency and throughput performance that is typically better than that of the native pthread_mutex constructs.) The built-in synchronized construct currently has a few advantages, such as lock coarsening, biased locking (see below) and the potential for lock elision via escape analysis. Those optimizations aren't currently implemented for ReentrantLock and friends. There's no fundamental reason, however, why a JVM couldn't eventually apply such optimizations to ReentrantLock.
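As a rough illustration of the kind of code those optimizations target, consider the sketch below. The class and method names are hypothetical, and whether a given JVM actually elides or coarsens these locks depends on the JIT and its settings; nothing here forces either optimization.

```java
// Hypothetical examples of code shapes that lock elision and lock
// coarsening are aimed at. Behavior is identical with or without them.
final class LockOptimizationCandidates {
    // StringBuffer's methods are synchronized, but this instance never
    // escapes the method, so no other thread can contend for its monitor.
    // A JIT with escape analysis may elide the locking entirely.
    static String join(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    private final Object lock = new Object();
    private int first;
    private int second;

    // Two adjacent critical sections on the same monitor: a JIT may
    // coarsen them into a single lock/unlock pair.
    void bump() {
        synchronized (lock) {
            first++;
        }
        synchronized (lock) {
            second++;
        }
    }

    int total() {
        synchronized (lock) {
            return first + second;
        }
    }
}
```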

The synchronized implementation also provides adaptive spinning, whereas ReentrantLock currently does not. Adaptive spinning employs a two-phase spin-then-block strategy. In short, on multiprocessor systems a contended synchronized enter attempt will spin for a short time before blocking in order to avoid context switching. Context switching is wasted work -- it doesn't contribute toward forward progress of the application. Worse, it causes TLBs and caches to be repopulated when the blocked thread eventually resumes. (This is the so-called "cache reload transient".) The spin duration varies as a function of the success/failure ratio of recent spin attempts on that same monitor, so the mechanism adapts automatically to parallelism, current system load, application modality, critical section length, etc. In addition, we avoid spinning for a lock where the current lock owner is itself blocked and unlikely to release the lock in a timely fashion. On Solaris our checks can be more refined, determining if the target thread is ONPROC (running), for instance, via the contract private thr_schedctl interface. And it should go without saying that we spin "politely", using a backoff to avoid generating excessive and wasteful traffic on the coherency bus, as well as using PAUSE on IA32 and AMD64 platforms. We'll likely add spinning support to ReentrantLock in a future release.
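The two-phase idea can be approximated in user code on top of ReentrantLock. The following is only a sketch under simplifying assumptions: the class name SpinThenBlock and the fixed MAX_SPINS budget are hypothetical, and it does not adapt the spin duration, check whether the owner is running, or back off politely the way the JVM's synchronized implementation is described as doing above.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: a fixed-budget spin-then-block acquire.
final class SpinThenBlock {
    // Assumed constant; the JVM adapts its spin duration per monitor.
    private static final int MAX_SPINS = 1000;

    static void lock(ReentrantLock lock) {
        // Phase 1: spinning only makes sense if the owner can be running
        // on another processor at the same time.
        if (Runtime.getRuntime().availableProcessors() > 1) {
            for (int i = 0; i < MAX_SPINS; i++) {
                if (lock.tryLock()) {
                    return; // acquired without blocking
                }
            }
        }
        // Phase 2: give up spinning and block until the lock is available.
        lock.lock();
    }
}
```

A caller would pair SpinThenBlock.lock(l) with l.unlock() in a finally block, exactly as with a plain lock() call. Note that tryLock() ignores the lock's fairness setting, so this sketch is unsuitable where fair ordering matters.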

If you're curious about the inner workings of ReentrantLock, see Doug Lea's "The java.util.concurrent Synchronizer Framework". If you're curious about adaptive spinning, see synchronizer.cpp in the J2SE 6 source kit.

Finally, ReentrantLock and synchronized are equivalent with respect to the clarified Java Memory Model (JMM, JSR133).


What about LockSupport.park() and LockSupport.unpark(<thread>) on Linux and Windows? There seem to be periodic latencies associated with unpark calls. Do other, more intelligent locking classes, such as locks, use park/unpark behind the scenes or not? Thank you

Posted by Alex on May 15, 2007 at 09:18 AM EDT #

Hello Alex, All the JSR166 j.u.c operators that might need to block or wake threads are based on LockSupport.park() and unpark(). As for synchronized, it also uses an internal form of park-unpark at the lowest levels. I'm curious about your comment regarding periodical latencies. Could you provide more details? Regards Dave

Posted by Dave on May 30, 2007 at 02:07 AM EDT #

We're also seeing periodic latencies on LockSupport.unpark(). The normal case is 2-3us, but in 5-10% of the cases we see 80-200us.

Posted by Nils on May 07, 2009 at 07:12 AM EDT #

Hi Nils, if there's no thread waiting then unpark() will just set a flag -- think of a restricted range semaphore -- that a subsequent park() will consume. This particular path is obviously very fast. If there's already a thread park()ed, however, we need to make a kernel call to make it runnable (ready). That can take longer and varies considerably by platform. Finally, it's not uncommon to find that the waker is preempted by the wakee, which can result in long apparent latencies. What OS are you using, and how many CPUs do you have? Regards, -Dave

Posted by David Dice on May 09, 2009 at 12:44 PM EDT #
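The flag behavior described in the comment above can be seen with a few lines of Java. This is a minimal sketch; the class name ParkPermitDemo is made up for illustration, and the timing it reports will vary by machine.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo of the per-thread permit used by park()/unpark().
final class ParkPermitDemo {
    // Calls unpark() on the current thread while it is not parked, then
    // park(). Returns the nanoseconds spent inside park().
    static long parkAfterUnparkNanos() {
        Thread self = Thread.currentThread();
        LockSupport.unpark(self);   // nobody is parked: this just sets the permit
        long start = System.nanoTime();
        LockSupport.park();         // consumes the permit and returns without blocking
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("park() after unpark() took "
                + parkAfterUnparkNanos() + " ns");
    }
}
```

If the unpark() call were removed, park() would block the thread until some other thread unparked or interrupted it (or it returned spuriously), which is the slower path that involves the kernel.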

> if there's no thread waiting then unpark() will just set a flag

Within what context? The entire VM?

> If there's already a thread park()ed, however, we need to make a kernel call to make it runnable (ready).

You mean if we try to unpark a thread that's already parked (the most common case I would assume) then unpark takes a kernel call?

>What OS are you using, and how many CPUs do you have?
We're running CentOS on 8 and 16 core boxes.

BTW, thanks for responding.

Posted by Nils on May 09, 2009 at 12:53 PM EDT #

Hi Nils, The flag I mentioned is per-thread. Regarding your 2nd comment, if unpark() needs to wake a thread, then yes, it requires a kernel call. The latency of the call depends quite a bit on the platform and the scheduler state. On Linux (and CentOS, I presume) when unpark() needs to wake a thread it'll use pthread_cond_signal() on a condvar that's uniquely associated with the wakee thread. pthread_cond_signal() should be a thin veneer over the futex interfaces. In kernel-land making a blocked thread ready can be as simple as moving the wakee onto a dispatch queue, or as complicated as preempting the waker in preference of the wakee, or perhaps if the wakee retains affinity for some other CPU on which it recently ran, firing off inter-processor interrupts to poke that CPU into dispatching the wakee. And of course there can be contention on queues, locks, etc., so the range of possible latencies you might experience is extremely wide. The variation you're seeing doesn't strike me as excessive, particularly if there are lots of threads in play. Regards, -Dave

Posted by David Dice on May 09, 2009 at 01:13 PM EDT #
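For readers who want to look at the slow path themselves, the sketch below times a single unpark() call against a thread that is known to be parked. The class name UnparkTiming is hypothetical, and this is not the measurement code used by anyone in this thread; it only times the call in the waker and says nothing about how long the wakee takes to resume running.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper that times unpark() when the target is parked.
final class UnparkTiming {
    // Returns the nanoseconds spent inside a single unpark() call.
    static long timeUnparkOfParkedThread() {
        final AtomicBoolean released = new AtomicBoolean(false);
        Thread wakee = new Thread(new Runnable() {
            public void run() {
                // park() may return spuriously, so re-check the condition.
                while (!released.get()) {
                    LockSupport.park();
                }
            }
        });
        wakee.start();
        try {
            // A thread blocked in park() reports the WAITING state.
            while (wakee.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            released.set(true);
            long start = System.nanoTime();
            LockSupport.unpark(wakee);   // slow path: a thread must be woken
            long elapsed = System.nanoTime() - start;
            wakee.join();
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            System.out.println("unpark() took " + timeUnparkOfParkedThread() + " ns");
        }
    }
}
```

A single sample like this is noisy. Any serious comparison would need warm-up, many iterations and a look at the distribution rather than the mean.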

We've actually seen latencies up to 700us, so we're talking more than two orders of magnitude difference. For us it matters :-)

Do you know if Solaris, with its real-time capabilities, can improve on the excessive variance we see with CentOS?

Posted by Nils on May 09, 2009 at 01:18 PM EDT #

Hi Nils, If the system is saturated then almost any latency would be possible. What does "top" show for a load average, by the way? If the system isn't fully utilized then I think your most likely cause would be self-preemption where the waker is displaced by the wakee. (There are more exotic but less likely explanations, such as memory issues or forced migration, but we'll ignore those.) It's possible that some of the utilities in the "schedtools" packages might be of use.

As for Solaris, the RT scheduling class is intended for just this type of problem, and we've found it works very well when used judiciously. (Not surprisingly, real-time Java uses the RT scheduling class.)

Regards, -Dave

Posted by David Dice on May 09, 2009 at 01:40 PM EDT #

We have reasonably low CPU utilization, so self-preemption might be to blame. Unfortunately Java has no support for processor affinity.

Regarding Solaris, are you saying that the stock Sun JVM won't be able to take advantage of any of the real-time capabilities in Solaris, only something specifically coded for it, like RTSJ?

Posted by Nils on May 09, 2009 at 02:22 PM EDT #

Hi Nils, If you're pressed, you might try making JNI calls to native code to call sched_setaffinity(). We generally advise against such approaches (it presumes a thread model, for instance) but in practice it'll work -- explicitly dispersing your threads over the CPUs might provide relief.

On Solaris you should be able to force any stock Sun JVM into the RT scheduling class. See the man page for the priocntl command. Similar to the situation on CentOS, if necessity dictates, you could again use JNI calls to force binding, set RT priorities, use cpusets, etc., for fine-grained control.

RTSJ goes above and beyond that by providing much stronger latency & predictability guarantees. In addition you have useful constructs such as priority inheritance locking, scoped memory, threads that are effectively immune to GC latency, etc. But it's a somewhat different programming model.

Regards, -Dave

Posted by David Dice on May 09, 2009 at 02:50 PM EDT #

Isn't sched_setaffinity() for the entire process, i.e. VM, not per thread, which is really what we'd want?

Yes, we've looked into RTSJ, but for our purpose it seems overkill.

Thanks again, this has been most helpful.

Posted by Nils on May 11, 2009 at 01:30 AM EDT #

Hi Nils, I recall that the affinity APIs accept "gettid()" values, in which case the attribute is per-thread instead of per-process. If that's indeed accurate then you might have a workable solution. Regards, -Dave

Posted by David Dice on May 11, 2009 at 03:07 AM EDT #

Nice article, you have indeed covered the topic in great detail. I have also blogged about my experience with Java: How Synchronization works in Java. Let me know what you think of it.

Posted by JP @ classpath in java on May 17, 2011 at 03:03 AM EDT #

My microbenchmark tests show that ReentrantLock is ~30% faster than synchronized() construct. I made an assumption that thread collisions are very rare so I only tested single thread scenario.

My environment is Linux 3.2.0-34-generic x86_64 GNU/Linux, Java version "1.7.0_09".

Posted by yurx on December 07, 2012 at 12:20 PM EST #

Yurx, If you'd like, post your source on pastebin or some other location and I'll see if I can find time to try it. Regards, -Dave

Posted by guest on December 07, 2012 at 05:07 PM EST #


Dave is a senior research scientist in the Scalable Synchronization Research Group within Oracle Labs : Google Scholar.

