
An Oracle blog about Transactional locks

java.util.concurrent ReentrantLock vs synchronized() - which should you use?

Dave Dice
Senior Research Scientist

In J2SE 6 there's little difference - either should suffice in most circumstances. ReentrantLock might be more convenient if you need to implement a hand-over-hand locking protocol. A classic example of hand-over-hand locking would be a thread that traverses a linked list, locking the next node and then unlocking the current node. That's hard to express with Java's lexically balanced synchronized construct. Synchronized, however, is a mature and first-class language feature.
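A minimal sketch of such a protocol (the class and method names here are illustrative, not from any particular library): each node carries its own ReentrantLock, and the traversal acquires the successor's lock before releasing the predecessor's.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of hand-over-hand (lock-coupling) traversal:
// each node carries its own lock; we acquire the successor's lock
// before releasing the predecessor's. Names here are hypothetical.
class LockCoupledList {
    static final class Node {
        final ReentrantLock lock = new ReentrantLock();
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private final Node head = new Node(Integer.MIN_VALUE); // sentinel

    void addFirst(int v) {
        head.lock.lock();
        try {
            Node n = new Node(v);
            n.next = head.next;
            head.next = n;
        } finally {
            head.lock.unlock();
        }
    }

    boolean contains(int v) {
        Node curr = head;
        curr.lock.lock();
        try {
            while (curr.next != null) {
                Node next = curr.next;
                next.lock.lock();   // take the next node's lock first...
                curr.lock.unlock(); // ...then release the current node's
                curr = next;
                if (curr.value == v) return true;
            }
            return false;
        } finally {
            curr.lock.unlock();     // curr always holds exactly one lock here
        }
    }

    public static void main(String[] args) {
        LockCoupledList list = new LockCoupledList();
        list.addFirst(1);
        list.addFirst(2);
        System.out.println(list.contains(2)); // prints true
        System.out.println(list.contains(3)); // prints false
    }
}
```

The unbalanced lock/unlock in the loop is exactly the shape that synchronized, which must nest lexically, can't express.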

With respect to performance, both mechanisms are up to the task. (It's worth noting that the synchronization primitives in modern JVMs provide latency and throughput performance that is typically better than that of the native pthread_mutex constructs). The built-in synchronized construct currently has a few advantages, such as lock coarsening, biased locking (see below) and the potential for lock elision via escape analysis. Those optimizations aren't currently implemented for ReentrantLock and friends. There's no fundamental reason, however, why a JVM couldn't eventually apply such optimizations to ReentrantLock.

The synchronized implementation also provides adaptive spinning, whereas ReentrantLock currently does not. Adaptive spinning employs a two-phase spin-then-block strategy. Briefly, on multiprocessor systems a contended synchronized enter attempt will spin briefly before blocking in order to avoid context switching. Context switching is wasted work -- it doesn't contribute toward forward progress of the application. Worse, it causes TLBs and caches to be repopulated when the blocked thread eventually resumes. (This is the so-called "cache reload transient"). The spin duration varies as a function of the success/failure ratio of recent spin attempts on that same monitor, so the mechanism adapts automatically to parallelism, current system load, application modality, critical section length, etc. In addition, we avoid spinning for a lock where the current lock owner is itself blocked and unlikely to release the lock in a timely fashion. On Solaris our checks can be more refined, determining if the target thread is ONPROC (running), for instance, via the contract private thr_schedctl interface. And it should go without saying that we spin "politely", using a backoff to avoid generating excessive and wasteful traffic on the coherency bus, as well as using PAUSE on IA32 and AMD64 platforms. We'll likely add spinning support to ReentrantLock in a future release.
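The two-phase idea can be sketched in user code (this is a toy illustration, not HotSpot's implementation -- the real mechanism adapts its spin count per monitor, whereas SPIN_LIMIT here is an arbitrary fixed guess):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Toy spin-then-block lock, purely to illustrate the two-phase idea.
// This is NOT HotSpot's implementation: the real adaptive mechanism
// varies the spin count per monitor; here SPIN_LIMIT is a fixed guess.
class SpinThenBlockLock {
    private static final int SPIN_LIMIT = 1000;
    private final AtomicBoolean held = new AtomicBoolean(false);
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    void lock() {
        // Phase 1: spin briefly and politely before resorting to parking.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (held.compareAndSet(false, true)) return;
            Thread.onSpinWait(); // maps to PAUSE on x86 (JDK 9+)
        }
        // Phase 2: enqueue and park until an unlock() wakes us.
        Thread self = Thread.currentThread();
        waiters.add(self);
        while (!held.compareAndSet(false, true)) {
            LockSupport.park(this);
        }
        waiters.remove(self);
    }

    void unlock() {
        held.set(false);
        Thread w = waiters.peek();
        if (w != null) LockSupport.unpark(w);
    }

    public static void main(String[] args) throws InterruptedException {
        SpinThenBlockLock l = new SpinThenBlockLock();
        int[] count = {0};
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                l.lock();
                try { count[0]++; } finally { l.unlock(); }
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start(); a.join(); b.join();
        System.out.println(count[0]); // prints 200000: the lock excluded correctly
    }
}
```

Under contention, a short critical section is often released while the contender is still in phase 1, so the park/unpark round trip (and its context switch) is avoided entirely.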

If you're curious about the inner workings of ReentrantLock, see Doug Lea's The java.util.concurrent Synchronizer Framework. If you're curious about adaptive spinning, see synchronizer.cpp in the J2SE 6 source kit.

Finally, ReentrantLock and synchronized are equivalent with respect to the clarified Java Memory Model (JMM, JSR133).


Comments (16)
  • Alex Tuesday, May 15, 2007
    What about LockSupport.park() and LockSupport.unpark(<thread>) on Linux and Windows? There seem to be periodic latencies associated with unpark calls. Do other, more intelligent locking classes, such as the j.u.c locks, use park/unpark behind the scenes or not?
    Thank you
  • Dave Wednesday, May 30, 2007
    Hello Alex,
    All the JSR166 j.u.c operators that might need to block or wake threads are based on LockSupport.park() and unpark().
    As for synchronized, it also uses an internal form of park-unpark at the lowest levels.
    I'm curious about your comment regarding periodic latencies. Could you provide more details?
    Regards
    Dave
  • Nils Thursday, May 7, 2009

    We're also seeing periodic latencies on LockSupport.unpark(). The normal case is 2-3us, but in 5-10% of the cases we see 80-200us.


  • David Dice Saturday, May 9, 2009

    Hi Nils, if there's no thread waiting then unpark() will just set a flag -- think of a restricted range semaphore -- that a subsequent park() will consume. This particular path is obviously very fast. If there's already a thread park()ed, however, we need to make a kernel call to make it runnable (ready). That can take longer and varies considerably by platform. Finally, it's not uncommon to find that the waker is preempted by the wakee, which can result in long apparent latencies. What OS are you using, and how many CPUs do you have? Regards, -Dave
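    [Ed. note: the "flag" behaviour is easy to observe. An unpark() issued before any park() is remembered as a one-shot permit, so the subsequent park() returns immediately instead of blocking. A small sketch:]

    ```java
    import java.util.concurrent.locks.LockSupport;

    // Demonstrates the per-thread "permit" described above: an unpark()
    // issued while no one is parked just sets a flag, and the next park()
    // consumes it and returns at once rather than blocking.
    public class ParkPermitDemo {
        static boolean permitIsRemembered() {
            Thread self = Thread.currentThread();
            LockSupport.unpark(self);   // nobody parked: merely sets the flag
            long start = System.nanoTime();
            LockSupport.park();         // consumes the flag, returns immediately
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return elapsedMs < 1_000;   // we did not actually block
        }

        public static void main(String[] args) {
            System.out.println("permit remembered: " + permitIsRemembered());
        }
    }
    ```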


  • Nils Saturday, May 9, 2009

    > if there's no thread waiting then unpark() will just set a flag

    Within what context? The entire VM?

    > If there's already a thread park()ed, however, we need make a kernel call to make it runnable (ready).

    You mean if we try to unpark a thread that's already parked (the most common case I would assume) then unpark takes a kernel call?

    >What OS are you using, and how many CPUs do you have?

    We're running CentOS on 8 and 16 core boxes.

    BTW, thanks for responding.


  • David Dice Saturday, May 9, 2009

    Hi Nils, The flag I mentioned is per-thread. Regarding your 2nd comment, if unpark() needs to wake a thread, then yes, it requires a kernel call. The latency of the call depends quite a bit on the platform and the scheduler state. On Linux (and CentOS, I presume) when unpark() needs to wake a thread it'll use pthread_cond_signal() on a condvar that's uniquely associated with the wakee thread. pthread_cond_signal() should be a thin veneer over the futex interfaces. In kernel-land making a blocked thread ready can be as simple as moving the wakee onto a dispatch queue, or as complicated as preempting the waker in preference of the wakee, or perhaps, if the wakee retains affinity for some other CPU on which it recently ran, firing off inter-processor interrupts to poke that CPU into dispatching the wakee. And of course there can be contention on queues, locks, etc., so the range of possible latencies you might experience is extremely wide. The variation you're seeing doesn't strike me as excessive, particularly if there are lots of threads in play. Regards, -Dave


  • Nils Saturday, May 9, 2009

    We've actually seen latencies up to 700us, so we're talking more than two orders of magnitude difference. For us it matters :-)

    Do you know if Solaris, with its real-time capabilities, can improve on the excessive variance we see with CentOS?


  • David Dice Saturday, May 9, 2009

    Hi Nils, If the system is saturated then almost any latency would be possible. What does "top" show for a load average, by the way? If the system isn't fully utilized then I think your most likely cause would be self-preemption, where the waker is displaced by the wakee. (There are more exotic but less likely explanations, such as memory issues or forced migration, but we'll ignore those). It's possible that some of the utilities in the "schedtools" package might be of use.

    As for Solaris, the RT scheduling class is intended for just this type of problem, and we've found it works very well when used judiciously. (Not surprisingly, real-time Java uses the RT scheduling class).

    Regards, -Dave


  • Nils Saturday, May 9, 2009

    We have reasonably low CPU utilization, so self-preemption might be to blame. Unfortunately Java has no support for processor affinity.

    Regarding Solaris, are you saying that the stock Sun JVM won't be able to take advantage of any of the real-time capabilities in Solaris, only something specifically coded for it, like RTSJ?


  • David Dice Saturday, May 9, 2009

    Hi Nils, If you're pressed, you might try making JNI calls to native code to call sched_setaffinity(). We generally advise against such approaches (it presumes a thread model, for instance) but in practice it'll work -- explicitly dispersing your threads over the CPUs might provide relief.

    On Solaris you should be able to force any stock Sun JVM into the RT scheduling class. See the man page for the priocntl command. Similar to the situation on CentOS, if necessity dictates, you could again use JNI calls to force binding, set RT priorities, use cpusets, etc., for fine-grained control.

    RTSJ goes above and beyond that by providing much stronger latency & predictability guarantees. In addition you have useful constructs such as priority inheritance locking, scoped memory, threads that are effectively immune to GC latency, etc. But it's a somewhat different programming model.

    Regards, -Dave


  • Nils Monday, May 11, 2009

    Isn't sched_setaffinity() for the entire process, i.e. VM, not per thread, which is really what we'd want?

    Yes, we've looked into RTSJ, but for our purpose it seems overkill.

    Thanks again, this has been most helpful.


  • David Dice Monday, May 11, 2009

    Hi Nils, I recall that the affinity APIs accept "gettid()" values, in which case the attribute is per-thread instead of per-process. If that's indeed accurate then you might have a workable solution. Regards, -Dave


  • JP @ classpath in java Tuesday, May 17, 2011
    Nice article, you have indeed covered the topic in great detail. I have also blogged my experience with Java: <a href="http://javarevisited.blogspot.com/2011/04/synchronization-in-java-synchronized.html">How Synchronization works in Java</a>. Let me know what you think.
  • yurx Friday, December 7, 2012

    My microbenchmark tests show that ReentrantLock is ~30% faster than the synchronized() construct. I made the assumption that thread collisions are very rare, so I only tested the single-thread scenario.

    My environment is Linux 3.2.0-34-generic x86_64 GNU/Linux, Java version "1.7.0_09".
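    [Ed. note: an uncontended comparison along the lines yurx describes might be sketched as follows. Class and method names are illustrative, and toy loops like this are easily distorted by JIT warm-up, dead-code elimination and lock elision, so treat the numbers with suspicion; a proper benchmark harness gives far more trustworthy results.]

    ```java
    import java.util.concurrent.locks.ReentrantLock;

    // Naive single-threaded comparison of synchronized vs ReentrantLock.
    // A shared counter keeps the critical sections from being optimized away.
    public class UncontendedLockBench {
        static final int ITERS = 1_000_000;
        static long counter;
        static final Object mon = new Object();
        static final ReentrantLock lock = new ReentrantLock();

        static long timeSynchronized() {
            long t0 = System.nanoTime();
            for (int i = 0; i < ITERS; i++) {
                synchronized (mon) { counter++; }
            }
            return System.nanoTime() - t0;
        }

        static long timeReentrantLock() {
            long t0 = System.nanoTime();
            for (int i = 0; i < ITERS; i++) {
                lock.lock();
                try { counter++; } finally { lock.unlock(); }
            }
            return System.nanoTime() - t0;
        }

        public static void main(String[] args) {
            timeSynchronized(); timeReentrantLock(); // crude warm-up
            long s = timeSynchronized();
            long r = timeReentrantLock();
            System.out.printf("synchronized: %d us, ReentrantLock: %d us%n",
                    s / 1_000, r / 1_000);
        }
    }
    ```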


  • guest Friday, December 7, 2012

    Yurx, If you'd like, post your source on pastebin or some other location and I'll see if I can find time to try it. Regards, -Dave

