I was recently having a discussion on one of the OpenSolaris lists on the topic of when to use the volatile keyword, and when it was necessary to use membars.
volatile is a clue to the compiler to load the variable from memory and immediately store it back to memory. What it does not do is to tell the hardware anything. So the application can perform the store, but that store may not be immediately visible to the rest of the system. Most of the time this is not a problem - so long as the store is visible to the processor on which the thread is executing it's fine. Variability of when the store is visible to other processors may also be fine. There is one clear situation where the ordering of store operations could be a problem - and that's unlocking mutexes.
The problem here is best illustrated by the following scenario. I lock some data structure, then store new values into it, then unlock the structure. Immediately another thread comes along and uses the values in that structure. Not an uncommon situation. Unlocking a mutex is often just a case of storing a value (of zero) into the mutex structure. And here's the potential problem. In some weaker ordering architectures there is no guarantee that other processors see the stores in the same order that they are performed. So if you have
Store A followed by
Store B it may be possible for other processors to observe the change in the value of B before they see the change in the value of A. In the case of mutex unlock, the store of B would be the action that unlocked the mutex, enabling other threads to access the variable A... and there could be problems if they see the old value of A.
The solution to this is to put a membar in before the store that unlocks the mutex. You can see this happening in the OpenSolaris code:
42 \* lock_clear(lp)
43 \* - clear lock.
46 membar #LoadStore|#StoreStore
48 clrb [%o0]
The membar ensures that all the pending stores are visible to other processors before the store that releases the lock becomes visible to them.