    Friday, November 23, 2012

Java @Contended annotation to help reduce false sharing

By: Dave Dice | Senior Research Scientist

See this posting by Aleksey Shipilev for details -- @Contended is something we've wanted for a long time. The JVM provides automatic layout and placement of fields. Usually it'll (a) sort fields by descending size to improve footprint, and (b) pack reference fields so the garbage collector can process a contiguous run of reference fields when tracing. @Contended gives the program a way to provide more explicit guidance with respect to concurrency and false sharing. Using this facility we can sequester hot, frequently written shared fields away from other mostly read-only or cold fields. The simple rule is that read-sharing is cheap and write-sharing is very expensive. We can also pack together fields that tend to be written together by the same thread at about the same time.
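
As a rough illustration of the separation described above, here is a minimal Java sketch that does by hand what @Contended asks the JVM to do. It assumes 64-byte cache lines and relies on HotSpot placing superclass fields ahead of subclass fields, which is an implementation detail and not a guarantee. All class and field names (PaddedLayoutSketch, Stats, count, sum) are hypothetical. In JDK 8 the annotation itself is sun.misc.Contended, and for application classes it is honored only when the JVM runs with -XX:-RestrictContended.

```java
// Manual-padding sketch: hot written fields are fenced off from cold fields.
// Assumes 64-byte cache lines; all names are hypothetical.
public class PaddedLayoutSketch {

    // 56 bytes of padding placed ahead of the hot fields.
    static class LeadPad {
        long p0, p1, p2, p3, p4, p5, p6;
    }

    // Fields written together by the same thread are packed together.
    static class HotFields extends LeadPad {
        volatile long count;
        volatile long sum;
    }

    // 56 bytes of padding placed after the hot fields.
    static class TrailPad extends HotFields {
        long q0, q1, q2, q3, q4, q5, q6;
    }

    // Read-mostly fields live on the far side of the padding.
    static final class Stats extends TrailPad {
        final String name;
        final int limit;

        Stats(String name, int limit) {
            this.name = name;
            this.limit = limit;
        }

        // Single writer assumed, so plain read-modify-write is enough here.
        void record(long sample) {
            count = count + 1;
            sum = sum + sample;
        }
    }

    public static void main(String[] args) {
        Stats s = new Stats("latency", 100);
        for (long i = 1; i <= 10; i++) {
            s.record(i);
        }
        System.out.println(s.name + " count=" + s.count + " sum=" + s.sum);
    }
}
```

With the annotation, the two padding classes would go away and the hot fields would carry @Contended instead, leaving the choice of padding width to the JVM.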

More generally, we're trying to influence relative field placement to minimize coherency misses. In a simple single-threaded environment, fields that are accessed close together in time should be placed close together in space to promote cache locality. That is, temporal locality should condition spatial locality. That having been said, when threads are accessing our fields concurrently we have to be careful to avoid false sharing and excessive invalidation from coherence traffic. As such, we try to cluster or otherwise sequester fields that tend to be written at approximately the same time by the same thread onto the same cache line. Note that there's a tension at play: if we try too hard to minimize single-threaded capacity misses then we can end up with excessive coherency misses when running in a parallel environment. In native C/C++ code it's fairly typical for programmers to use informed concurrency-aware structure layout. @Contended should give us the same capability in Java, although in native code the binding of fields to offsets happens at compile-time, while it happens at load-time for Java. It's worth pointing out that in the general case there is no single optimal layout for both single-threaded and multithreaded environments, and the ideal layout problem itself is NP-hard.
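
One way to observe the cost of write-sharing is a small two-thread experiment. The sketch below is a demo under stated assumptions, not a benchmark: it assumes cache lines of at most 128 bytes, does no JIT warm-up, and uses hypothetical names (FalseSharingDemo, run). Each thread increments only its own slot of an AtomicLongArray, so there is no true sharing; the only difference between the two runs is whether the slots are adjacent or 128 bytes apart.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class FalseSharingDemo {

    // Two threads each increment their own slot; returns elapsed nanoseconds.
    static long run(AtomicLongArray slots, int a, int b, long iterations) {
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < iterations; i++) {
                slots.incrementAndGet(a);
            }
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < iterations; i++) {
                slots.incrementAndGet(b);
            }
        });
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        // Slots 0 and 1 are 8 bytes apart and very likely share a cache line.
        long adjacent = run(new AtomicLongArray(32), 0, 1, n);
        // Slots 0 and 16 are 128 bytes apart, so they sit on different lines.
        long spread = run(new AtomicLongArray(32), 0, 16, n);
        System.out.printf("adjacent: %d ms, spread: %d ms%n",
                adjacent / 1_000_000, spread / 1_000_000);
    }
}
```

On a multi-core machine the adjacent run is typically the slower one, though the size of the gap depends on the hardware and on how the threads are scheduled.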

Ideally, a JVM would employ hardware monitoring facilities to detect sharing behavior and change the layout on the fly. That's a bit difficult as we don't yet have the right plumbing to deliver that information to the JVM efficiently and promptly. Hint: we need to disintermediate the OS and hypervisor. Another challenge is that raw field offsets are used in the unsafe facility, so we'd need to address that issue, possibly with an extra level of indirection.
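
To make the raw-offset concern concrete, here is a minimal sketch of the usual pattern: code looks up a field's byte offset once, caches it in a static final, and uses it for every later access. It assumes a JDK on which sun.misc.Unsafe is still reachable through reflection and its memory-access methods are still enabled (newer JDKs deprecate them). The class names RawOffsetSketch and Counter are hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class RawOffsetSketch {

    static final class Counter {
        volatile long hits;
    }

    private static final Unsafe UNSAFE;
    // Byte offset of Counter.hits, looked up once and cached for good.
    private static final long HITS_OFFSET;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            HITS_OFFSET = UNSAFE.objectFieldOffset(
                    Counter.class.getDeclaredField("hits"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Reads the field through the cached offset instead of by name.
    static long readHits(Counter c) {
        return UNSAFE.getLong(c, HITS_OFFSET);
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.hits = 42;
        System.out.println("offset=" + HITS_OFFSET + " hits=" + readHits(c));
    }
}
```

If the JVM moved hits to a different offset after HITS_OFFSET had been computed, readHits would quietly read whatever now lives at the old location, which is why on-the-fly relayout needs some answer for such cached offsets.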

Finally, I'd like to be able to pack final fields together as well, as those are known to be read-only.

Comments ( 12 )
  • Damon Hart-Davis Saturday, November 24, 2012

    Hmm, I like the idea of @Contented, presumably to go with relaxed memory semantics and lazy eval? B^>

    I also like the idea of @Contended and the layout dynamically chosen to fit the CPU/cache properties is kewl.

    Rgds

    Damon


  • guest Saturday, November 24, 2012

    Hi Damon, Even on a system with a strong sequentially consistent memory model we'll be exposed to issues with false sharing. That is, our concern isn't particularly related to strong or weak memory models.

    Regards, -Dave


  • Damon Hart-Davis Saturday, November 24, 2012

    Sorry, I was mainly ribbing you about the typo in your title, though I *do* also like the real thrust of this feature (and it's something I can tease my C++ addicted friends about with their old-fashioned static compilation)...

    Rgds

    Damon


  • guest Saturday, November 24, 2012

    Thanks for catching the embarrassing typo, which I just fixed! Like you, I wonder exactly what the semantics of "@contented" would mean (:>). Regards, -Dave


  • guest Tuesday, April 23, 2013

    Hi,

    Many local variables need not go through the GC, as they could be reclaimed when they go out of scope. Is it possible to introduce this feature?

    Suminda


  • Suminda Dharmasena Friday, May 3, 2013

    Hi,

    I am thinking that if Java introduced defined GC timing and overload relative to code location, it would reduce GC-related issues for real-time / embedded systems programming. Proper escape analysis can determine which objects can be deleted in the code.

    I am thinking it would be great if the following annotations were introduced:

    GCAtBlockEnd

    GCAtBlockExit - for loops the GC happens when the

    GCAtLastReference

    GCAtEndOfScope

    GCOnAssignment

    GCOnReturn

    GCOnGCOfContainingObject - GC on GC of aggregate / composing object

    GCOnStatementCompletion - GC parameters and return value (if not assigned) of a call when the function returns. Need to support statement level annotations.

    GCOnLastResort - Only GC at last resort before outofmemory exception - class level

    ExcludeFinalising - prevent finalizers running on the object - class level

    GCUsingCurrentThread

    GCUsingDefaultThread

    GCUsingNewThread(Priority = n)

    GCUsing(GCSystem) - all objects created in the specified scope will be GCed using specified GC - annotations need to support static references

    GCSuspend - a synchronous object can be used so that the GC threads do not run while the code block is executing

    Suminda


  • Nathan Reynolds Sunday, June 2, 2013

    In the penultimate paragraph, you raise a concern about the unsafe facility using raw field offsets. JEP 159 Enhanced Class Redefinition allows for adding and removing fields. Wouldn't JEP 159 have to deal with the unsafe facility as well?


  • guest Monday, June 3, 2013

    Regarding variables going out of scope, escape analysis should provide the desired benefit. But escape analysis is tricky and the JIT is not always able to prove non-escape.

    Regards, -Dave


  • guest Monday, June 3, 2013

    Hi Nathan, The raw offsets used by unsafe present a number of challenges. First, as mentioned above, we can't easily change object layout on the fly. Second, as you mentioned, it interacts poorly with JEP-159. I'm not certain what the expected outcomes would be regarding redefinition. That is, the JEP-159 implementation might check, for instance, whether offsets have been queried or whether unsafe code using such accesses is currently extant and reachable, and if so, deny the edit. But then again unsafe is just that, so responsibility may fall on the users of unsafe to properly hide their types.

    Regards

    Dave


  • Mohan Radhakrishnan Thursday, November 7, 2013

    What tool can I use to detect false sharing and find a remedy? I didn't know how to use Oracle Studio Analyzer for Java.

    Thanks,

    Mohan


  • guest Thursday, November 7, 2013

    Hi Mohan,

    Good question. There are some conservative static tools that detect that false sharing might be taking place, but they don't really have the ability to gauge the magnitude of the issue. They're also vulnerable to false positives, so you have to sort through the output. (Static race detection operates on similar ideas.) There are also runtime approaches that try to tag areas (regions, objects, lines, pages, etc.) with the ID of the last thread to write, and then try to intuit false sharing based on the ID of threads that subsequently read or write that region. The overheads and probe effect can be rather large with such approaches. Based on what I've seen, most of the viable approaches use hardware performance counters based on coherence events, and statistical sampling (with precise or semi-precise traps) to try to identify pieces of code that are generating lots of coherence traffic. This captures both true intentional sharing and false sharing. I'd recommend starting with Oracle Studio Analyzer. Once you've found a suspected site in the code, you can usually change the data structure layout (@Contended or other approaches) or change the access frequency to the hot fields.

    Regards

    Dave

