Beware of the Performance of RW Locks

In my naive little mind a rw lock would represents a performant scalable construct inasmuch as WRITERS do not hold the lock for a significant amount of time. One figures that the lock would be held for short WRITERS times followed by concurrent execution of RW_READERS.

What I recently found out is quite probably well known to seasoned kernel engineer but this was new to me. So I figured it could be of interest to others.

The SETUP

So Reader/Writer locks (RW) can be used in kernel and user level code to allow multiple READERS of, for instance, a data structure, to access the structure while allowing only a single WRITER at a time within the bounds of the rwlock().

A RW locks (rwlock(9F), rwlock(3THR)) is more complex that a simple mutex. So acquiring such locks will be more expensive. This means that if the expected hold times of a lock is quite small (say to update or read 1 or 2 fields of a structure) then regular mutexes can usually do that job very well. A common programming mistake is to expect faster execution of RW locks for those cases. However when READ hold times need to be fairly long; then RW locks represent an alternative construct. With those locks we expect to have multiple READERS executing concurrently thus leading to performant code that scales to large numbers of threads. As I said, if WRITERS are just quick updates to the structure, we can naively believe that our code will scale well.

Not So

Let’s see how it goes. A WRITER lock cannot get in the protected code while READERS are executing protected code. The WRITER must then wait at the door until READERS releases their hold. If the implementation of RW locks didn’t pay attention, there would be cases in which at least one READER is always present within the protected code and WRITERS would get starved of access. To prevent such starvation, RW lock must block READERS as soon as a WRITER has requested access. But no matter, our WRITERS will quickly update the structure and we will get concurrent execution most of the time. Isn’t it ?

Well not quite. As just stated, a RW locks will block readers as soon as a WRITER has hit the door. This means that the construct does not allow parallel execution at that point. Moreover the WRITER will stay at the door while READERS are executing. So the construct stays fully serializing from the time a WRITER hits until all current READERS are done followed by the WRITERS time.
For Instance:

- a RW_READER gets in and will keep a long time. 

---|

- a RW_WRITER hits the lock; is put on hold.

   |

- other RW_READERS now also block.

   |

.... time passes

   |

- the long RW_READER releases

<----------------|

- the RW_WRITER gets the lock; work; releases

- other RW_READER now work concurrently.

Pretty obvious once you think about it. So to assess the capacity of a RW lock to allow parallel execution, one must consider the average hold time as a READER but also the frequency of access as a WRITER. The construct becomes efficient and scalable to N threads if and only if:

(avg interval between writers) >> (N \* avg read hold times).

Roundup

In the end, from a performance point of view, RW locks should be used only when the average hold times is significant in order to justify the use of this more complex type of lock: for instance, calling a function of unknown latency or issuing an I/O while holding the lock represent good candidates. But the construct will be scalable to N threads, if and only if WRITERS are very infrequent.

[T]: NiagaraCMT Solaris Sun

Beware of the Performance of RW Locks

The SETUP

Not So

Roundup

Roch Bourbonnais

Principal Performance Engineer

Runtime Token Expansion - some clarification

C++ Dynamic Linking - symbol visibility issues

Beware of the Performance of RW Locks

The SETUP

Not So

Roundup

Authors

Roch Bourbonnais

Principal Performance Engineer

Runtime Token Expansion - some clarification

C++ Dynamic Linking - symbol visibility issues