CAS and Cache Trivia - Invalidate or Update In-Place
By Dave on Mar 14, 2008
In a previous blog entry on biased locking I noted that CAS (the atomic Compare-And-Swap instruction) usually has the same semantics as store with respect to caches and the interconnect. It's worth calling out an interesting exception, however. For background, the level-1 data cache -- often shortened to L1D$ or just D$ -- of Niagara-1 and Niagara-2 processors uses a write-around policy. All stores go over the cross-bar, but if the line is also present in the D$, the store instruction (ST) will also update the line in-place. This is where CAS differs from ST. If the line targeted by the CAS is also in the D$ of the processor executing the CAS, the line will be invalidated instead of being updated in-place.
Of course it's not uncommon for the line to be present in the cache given the prevalence of the LD...CAS usage idiom. More importantly, it's extremely common for a thread to access the same line in short order after a CAS. This is where the CAS-Invalidates policy can impact performance. A good example is a lock that's acquired via CAS where the lock metadata is collocated with the data covered by the lock. Obviously, after having acquired a lock, a thread is quite likely to access data protected by that same lock. The first data access to the just-CASed-to line will miss in the D$. Another example would be a thread that repeatedly locks and unlocks the same lock. Assuming the lock and unlock operators both use a LD...CAS idiom, even if the line containing the lock metadata wasn't accessed within the critical section, the CAS in the lock operation will cause the LD in the subsequent unlock to miss, and the CAS in unlock will cause the LD in the subsequent lock operation to miss.
Thankfully, on Niagara-family processors a miss in the D$ that hits in the level-2 cache can be satisfied fairly quickly. Still, we'd like to avoid that D$ miss if at all possible.
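To make the lock/unlock example concrete, here's a toy Python model that tracks only whether a single line is present in one strand's D$ and counts load misses. It's a sketch under loud assumptions, not a simulation of real hardware: it ignores evictions, other strands, and coherence traffic, and the "update in-place" CAS policy is the hypothetical alternative, not something Niagara implements. The names ToyD1Line and lock_unlock_misses are made up for illustration.

```python
class ToyD1Line:
    """Presence of ONE cache line in one strand's D$ (toy model, not real hardware)."""

    def __init__(self, cas_invalidates):
        self.cas_invalidates = cas_invalidates
        self.present = False   # is the line currently in the D$?
        self.misses = 0        # D$ load misses observed so far

    def ld(self):
        # Assumption: a load that misses is filled from L2 and allocates the line.
        if not self.present:
            self.misses += 1
            self.present = True

    def cas(self):
        # The CAS itself always goes over the cross-bar to L2.
        # CAS-Invalidates: drop the local copy of the line.
        # Hypothetical update-in-place policy: presence is unchanged.
        if self.cas_invalidates:
            self.present = False


def lock_unlock_misses(iterations, cas_invalidates, touch_data=False):
    """Count D$ load misses for a thread that repeatedly locks and unlocks."""
    line = ToyD1Line(cas_invalidates)
    for _ in range(iterations):
        # lock: LD...CAS on the lock word
        line.ld()
        line.cas()
        if touch_data:
            # data collocated on the same line as the lock word
            line.ld()
        # unlock: LD...CAS on the lock word
        line.ld()
        line.cas()
    return line.misses


if __name__ == "__main__":
    print(lock_unlock_misses(1000, cas_invalidates=True))    # 2000
    print(lock_unlock_misses(1000, cas_invalidates=False))   # 1
```

In this model, CAS-Invalidates costs two D$ misses per lock/unlock pair whether or not the critical section touches the line (with touch_data=True the miss simply moves from the unlock's LD to the data access), while an update-in-place CAS would leave only the initial cold miss.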
I'd argue that to the extent feasible, processor designers should avoid a CAS-Invalidates policy, which I believe to be an implementation artifact and not strictly fundamental to CAS. CAS-Invalidates is cache-antagonistic to common synchronization usage.