Sunday Oct 03, 2010

Memory ordering

I've just had a couple of white papers published on memory ordering. It's a complex topic, and one that is quite hard to find documentation on. Fortunately, it's also one that most developers rarely run into.

In Oracle Solaris Studio 12.2 we introduced the file mbarrier.h. This defines some intrinsics which allow the developer to enforce memory ordering.

The first paper covers avoiding the reordering of memory operations that the compiler may perform when compiling an application. The second paper covers the more complex issue of avoiding the reordering of memory operations that the processor may do at runtime.

Tuesday Mar 02, 2010

Compiler memory barriers

I've written in the past about memory barriers. Basically, a membar instruction ensures that other processors see memory operations in the order in which they appear in the source code. The obvious example is a mutex lock, where you want the memory operations that occurred while the lock was held to be visible to other processors before the store that releases the lock.
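To make the lock-release case concrete, here is a minimal sketch of a spin lock written with C11 atomics. This is an illustration under assumptions rather than anything specific to Solaris Studio: it needs a compiler that provides stdatomic.h, and the names lock_acquire, lock_release and increment_under_lock are hypothetical. The release ordering on the store that clears the lock plays the role of the barrier described above.

```c
#include <stdatomic.h>

/* Hypothetical names, chosen only for this sketch. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static int shared_counter = 0;

void lock_acquire(void)
{
  /* Spin until the flag was previously clear. Acquire ordering keeps
     later memory operations from moving above the lock. */
  while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
  {
  }
}

void lock_release(void)
{
  /* Release ordering makes the operations performed under the lock
     visible before the store that clears the flag. */
  atomic_flag_clear_explicit(&lock, memory_order_release);
}

int increment_under_lock(void)
{
  lock_acquire();
  int value = ++shared_counter;   /* protected by the lock */
  lock_release();
  return value;
}
```

A thread that later acquires the same lock is then guaranteed to see the updated counter.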

There's actually another kind of memory ordering, and that is the order in which the compiler emits memory operations. Suppose you write:

  *a=1;
  *b=2;

If the compiler can determine that a and b do not alias, then there's no reason for it not to swap the stores to a and b if it thinks that will produce better code.

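The most portable way of enforcing this ordering is to put a function call between the two stores:

  *a=1;
  reorder_barrier();
  *b=2;

At the call, memory has to hold the state that the program defines, because the called function might read or write it. So the compiler cannot defer the store to a until after the call, and cannot hoist the store to b above it. This relies on the compiler not being able to see into the function; if the call gets inlined, the ordering is no longer guaranteed.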

This works, but it incurs the overhead of a function call, and function calls can have significant costs. Some compilers provide intrinsics that enforce the desired memory ordering without a call. Sun Studio 12 Update 1 supports the GCC flavour:

  *a=1;
  asm volatile ("":::"memory");
  *b=2;
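One way to package this is to wrap the asm statement in a macro. The following is a minimal sketch, assuming a compiler that accepts GCC-style inline assembly; the names COMPILER_BARRIER and ordered_stores are made up for the example. The empty asm with a "memory" clobber emits no instructions, it only tells the compiler that memory may have changed.

```c
/* Hypothetical macro name; expands to the GCC-style compiler barrier. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Store to *a, then to *b, without letting the compiler swap them. */
void ordered_stores(int *a, int *b)
{
  *a = 1;
  COMPILER_BARRIER();   /* compiler may not move memory accesses across this */
  *b = 2;
}
```

Note that this only constrains the compiler. It does nothing to stop the processor from reordering the stores at runtime; that needs a hardware barrier.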

You can test the performance overhead using the following code:

void barrier(void){}   /* empty function: the call itself acts as the barrier */

int main(void)
{
  for (int i=0; i<1000000000;i++)
  {
    barrier();
  }
  return 0;
}

On the test system this code took about 5 seconds to run. The alternative code is:

int main(void)
{
  for (int i=0; i<1000000000;i++)
  {
    asm volatile ("":::"memory");   /* compiler barrier, no instructions emitted */
  }
  return 0;
}

This code took under a second.
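If you would rather have the program report the timings itself, the two loops can be wrapped in a small harness. This is only a sketch under a few assumptions: it uses the standard clock() function, which measures CPU time rather than wall-clock time, and it calls the empty function through a volatile function pointer so that an optimising compiler cannot inline the call and delete the loop. The names time_call_loop, time_asm_loop and barrier_ptr are hypothetical.

```c
#include <time.h>

static void barrier(void) {}

/* Calling through a volatile pointer forces a real call on every
   iteration, even when optimisation is enabled. */
static void (*volatile barrier_ptr)(void) = barrier;

/* CPU seconds spent making `iterations` function calls. */
double time_call_loop(long iterations)
{
  clock_t start = clock();
  for (long i = 0; i < iterations; i++)
  {
    barrier_ptr();
  }
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent executing `iterations` compiler barriers. */
double time_asm_loop(long iterations)
{
  clock_t start = clock();
  for (long i = 0; i < iterations; i++)
  {
    __asm__ __volatile__("" ::: "memory");
  }
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both functions with the same iteration count, for example 1000000000 as in the loops above, and printing the results gives a direct comparison. The absolute numbers will depend on the machine and the compiler flags.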

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
