By meem on Jul 30, 2006
Recently, I've been hip-deep in the 105,000 lines of heavily multithreaded code that comprise Solaris's IPv4/IPv6 implementation, finishing the bring-up of our new IP Network Multipathing (IPMP) implementation for Clearview. Along the way, I've been reminded of some collected wisdom regarding locking that could use wider dissemination:
- An object cannot synchronize its own visibility.
Most long-lived objects are put into a structure such as a hash table that allows them to be looked up at some point in the future. In order to track the number of threads that currently have an object "checked out", objects are usually reference counted, with the reference count itself being manipulated under the object's lock.
Things get tricky when the object must go away. In particular, many make the mistake of trying to synchronize the object's removal from visibility under the object's lock when the reference count reaches one. This is not possible: any thread looking up the object -- by definition -- does not yet have the object and thus cannot hold the object's lock during the lookup operation (unless the object lock is not tied to the object itself -- see (2)). Thus, another thread can race and acquire another reference at the same time the object is being destroyed, leading to incorrect behavior. Accordingly, whatever higher-level synchronization is used to coordinate the threads looking up the object must also be used as part of removing the object from visibility.
Once the object has been removed from visibility, an object can indeed synchronize its own destruction. The simplest approach is to have the object's destruction done by whatever thread causes the object's reference count to reach zero -- that is, "if you're the last one out, turn off the lights". Note that this logic can be part of the standard code that decrements the object's reference count, but it is guaranteed to be unreachable until the object is removed from visibility. This is because the object's reference count is incremented when it is made visible through another structure (e.g., the hash table providing its visibility), and that reference remains until the object is removed from visibility.
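As a concrete illustration, here's a minimal pthreads sketch of the pattern. All names (obj_t, obj_rele(), table_slot, and so on) are invented for this example, a single pointer stands in for the hash table, and the obj_destroyed flag exists only to make destruction observable:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical refcounted object; all names here are illustrative. */
typedef struct obj {
	pthread_mutex_t o_lock;		/* protects o_refcnt only */
	int		o_refcnt;
} obj_t;

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static obj_t *table_slot;		/* stand-in for a hash table */
static int obj_destroyed;		/* observable, for illustration */

/*
 * Drop a reference; the thread that takes the count to zero turns off
 * the lights.  This branch is unreachable while the table still holds
 * its reference, i.e., until the object is removed from visibility.
 */
static void
obj_rele(obj_t *op)
{
	pthread_mutex_lock(&op->o_lock);
	int last = (--op->o_refcnt == 0);
	pthread_mutex_unlock(&op->o_lock);
	if (last) {
		pthread_mutex_destroy(&op->o_lock);
		free(op);
		obj_destroyed = 1;
	}
}

/*
 * Look up the object, taking a new reference.  The higher-level
 * table_lock -- not the object's own lock -- is what makes lookup
 * and removal mutually exclusive, which is the crux of the rule.
 */
static obj_t *
obj_lookup(void)
{
	pthread_mutex_lock(&table_lock);
	obj_t *op = table_slot;
	if (op != NULL) {
		pthread_mutex_lock(&op->o_lock);
		op->o_refcnt++;
		pthread_mutex_unlock(&op->o_lock);
	}
	pthread_mutex_unlock(&table_lock);
	return (op);
}

/* Remove from visibility under table_lock, then drop the table's ref. */
static void
obj_unpublish(void)
{
	pthread_mutex_lock(&table_lock);
	obj_t *op = table_slot;
	table_slot = NULL;
	pthread_mutex_unlock(&table_lock);
	if (op != NULL)
		obj_rele(op);
}
```

Note that once obj_unpublish() has run, no new references can appear, so whichever thread calls obj_rele() last can safely free the object without further coordination.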
- An object should not synchronize itself without due cause.
This is a generalization of (1). When building objects that are intended to be used in a multithreaded environment, it is tempting to build the locks into the objects themselves. For instance, a stack object might contain an internal lock to ensure that multiple threads issuing a push or pop operation simultaneously will operate without corrupting the underlying stack object or returning erroneous results. Languages like Java have promoted this sort of locking to a first-class concept through the "synchronized" keyword.
While this "works" in the small, it misses the big picture. Specifically, each of those aforementioned threads working on the stack object was performing its pushes and pops as part of accomplishing some larger task. Those larger tasks are what actually need to be synchronized with one another, so that they appear atomic to each other as a whole. However, only the callers (the threads using the stack objects) have insight into the granularity and semantics of those tasks and the objects that comprise them -- so only they can implement that locking. And once that locking has been implemented, any internal object locks performing similar functions become superfluous, and only end up complicating the object's implementation (e.g., to avoid recursive mutex locking).
Thus, many objects are better off leaving their synchronization to their callers, since those users will have to synchronize between each other anyway. Of course, objects will in turn use other objects -- so it's still quite likely that an object will have embedded locks. However, those locks will be used to synchronize access to the objects it's using, rather than attempting to synchronize its own use across multiple threads. In short, locks are best kept external to the data structures they manage, unless the structure itself must support high-performance concurrent access.
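To make the distinction concrete, here's a hedged pthreads sketch -- the istack_t type, fold_top_two(), and task_lock are all invented for illustration. The stack itself carries no lock; the caller's lock covers the whole multi-operation task, which is something no per-push or per-pop internal lock could provide:

```c
#include <pthread.h>

/*
 * A stack with no internal locking; callers serialize access with
 * whatever lock already protects their larger task.  (The name and
 * the fixed capacity are illustrative only.)
 */
typedef struct {
	int	s_items[16];
	int	s_top;
} istack_t;

static void
istack_push(istack_t *sp, int v)
{
	sp->s_items[sp->s_top++] = v;
}

static int
istack_pop(istack_t *sp)
{
	return (sp->s_items[--sp->s_top]);
}

static pthread_mutex_t task_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * A larger task: pop two operands and push their sum.  Only the
 * caller knows these three operations must appear atomic as a whole,
 * so the lock lives here -- an internal lock inside istack_push() and
 * istack_pop() could not supply this guarantee, and would only add
 * overhead once task_lock exists.
 */
static void
fold_top_two(istack_t *sp)
{
	pthread_mutex_lock(&task_lock);
	int a = istack_pop(sp);
	int b = istack_pop(sp);
	istack_push(sp, a + b);
	pthread_mutex_unlock(&task_lock);
}
```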
- A condition can be signaled without holding the condition's lock.
There is a longstanding and deeply embedded superstition that signaling a condition without holding that condition's lock will compromise correctness. In fact, Solaris's own cond_signal(3C) manpage contains:
    Both functions should be called under the protection of the same mutex that is used with the condition variable being signaled. Otherwise, the condition variable may be signaled between the test of the associated condition and blocking in cond_wait(). This can cause an infinite wait.

This is completely false: the thread heading into cond_wait() must have already tested the condition under the lock and concluded it was false in order to decide to cond_wait(). Since any thread changing state that would affect the condition must also be holding the lock, there is no way for the state (and thus the outcome of the test) to change between the test and the cond_wait(), and thus any cond_signal() sent during that window would end up being spurious anyway. Accordingly, 6437070 has been filed.
All that said, it may be true that signaling the condition while holding the lock improves overall determinism, since it eliminates a possible avenue for priority inversion. It may also waste cycles, since the thread being signaled may wake only to find it cannot grab the lock (if the signaling thread has not yet dropped it). This recent thread on mdb-discuss contains more on this issue.
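The argument is easy to demonstrate with a minimal sketch using the POSIX equivalents of the Solaris primitives (the names waiter, wakeup, and ready are invented for this example):

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int ready;

/* Waiter: tests the condition under the lock before blocking. */
static void *
waiter(void *arg)
{
	pthread_mutex_lock(&m);
	while (!ready)
		pthread_cond_wait(&cv, &m);
	pthread_mutex_unlock(&m);
	return (arg);
}

/*
 * Wakeup: because the state change happens under the lock, the waiter
 * either sees ready != 0 and never blocks, or has already atomically
 * dropped m inside pthread_cond_wait() by the time we acquire m here.
 * Signaling after dropping the lock is therefore just as correct, and
 * spares the waiter from waking only to block on a still-held mutex.
 */
static void
wakeup(void)
{
	pthread_mutex_lock(&m);
	ready = 1;
	pthread_mutex_unlock(&m);
	pthread_cond_signal(&cv);
}
```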
Footnotes

(1) As David Powell mentioned to me while discussing this blog entry, "This problem is frequently misrepresented as an inability to synchronize against one's own destruction. To be precise, the problem is that an object can't synchronize its own visibility."

(2) The other common option is to have the thread removing the object wait for the reference count to drop, but that forces the thread to block for a potentially unbounded period of time, and should only be used if required for correctness.