"Simplifying Concurrent Algorithms by Exploiting Hardware Transactional Memory" will appear in SPAA 2010.
A general question, not really related to the above post, but it concerns any algorithm, whether in software or hardware, that uses an attempt/commit/rollback loop, so I guess that would include non-blocking queues.
I'm arguing that this leads to wasted CPU and memory bandwidth under heavy load, compared to a thread that takes a fine-grained lock and spins on it, or better yet spins a bit and then sleeps.
For example, if 20 threads tried to enqueue at the same time in a non-blocking algorithm, only one would win; the other 19 would just waste CPU and memory bandwidth on their attempts, something a mere spin on locally cached memory would have avoided. A local spin would have avoided the extra memory traffic, and sleeping would have let threads from other processes make meaningful progress.
Am I simply not \*read\* enough in this area and these problems are solved, or is this really a problem under heavy load?
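The retry pattern the question describes can be sketched as follows. This is a hypothetical illustration (a simple Treiber-style CAS stack, not from the post): a failed compareAndSet discards the attempted work, and the local spin with exponential backoff between retries is exactly the mitigation the question proposes for reducing coherence traffic.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: an attempt/commit/retry loop of the kind the
// question describes, with a bounded local spin between failed attempts.
class CasStack<T> {
    private static final class Node<T> {
        final T value; final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }
    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        int spins = 1;
        for (;;) {
            Node<T> oldTop = top.get();           // optimistic read
            if (top.compareAndSet(oldTop, new Node<>(value, oldTop)))
                return;                           // commit succeeded
            // CAS failed: the attempt is wasted work. Spin locally
            // (backoff) before retrying to reduce coherence traffic.
            for (int i = 0; i < spins; i++) Thread.onSpinWait();
            if (spins < 1024) spins <<= 1;        // exponential backoff
        }
    }

    T pop() {
        for (;;) {
            Node<T> oldTop = top.get();
            if (oldTop == null) return null;      // empty
            if (top.compareAndSet(oldTop, oldTop.next)) return oldTop.value;
        }
    }
}
```

Under contention, every losing thread in the CAS loop has burned a read, an allocation, and a failed atomic, which is the wasted CPU and memory bandwidth the question points at.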
on May 02, 2010 at 09:21 AM EDT
It's not at all uncommon to find that pessimistic concurrency control (locks) will beat optimistic concurrency control (e.g., lock-free techniques) under some loads and on some systems. In that circumstance the usual approach is to try to reduce and throttle concurrency and see whether the optimistic mechanism can still beat the lock or locks, or to make the system adaptive and switch to locks.
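The adaptive idea mentioned above can be sketched roughly as follows. This is an illustrative example, not a mechanism from the post: the class, the threshold, and the fallback policy are all assumptions. It tries the optimistic CAS path a bounded number of times, then falls back to a lock, which parks waiters instead of burning further CPU and memory bandwidth on failed retries.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of an adaptive scheme: optimistic first,
// pessimistic fallback. The threshold of 8 is purely illustrative.
class AdaptiveCounter {
    private final AtomicLong value = new AtomicLong();
    private final ReentrantLock fallback = new ReentrantLock();
    private static final int OPTIMISTIC_ATTEMPTS = 8;

    long increment() {
        // Optimistic phase: a bounded number of CAS retries.
        for (int i = 0; i < OPTIMISTIC_ATTEMPTS; i++) {
            long cur = value.get();
            if (value.compareAndSet(cur, cur + 1)) return cur + 1;
            Thread.onSpinWait();                  // brief local spin
        }
        // Pessimistic phase: serialize through a lock. Contending
        // threads block (park) rather than spinning on failed CASes.
        fallback.lock();
        try {
            return value.incrementAndGet();
        } finally {
            fallback.unlock();
        }
    }
}
```

A real adaptive implementation would also need a policy for switching back to the optimistic path once contention subsides; that is omitted here.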
But as usual, the answer is that "it depends" and there's no simple answer. And as you point out, coherency and communication costs play into the equation.
A rather poor analogy is that optimistic concurrency control is like CSMA/CD access protocols while pessimistic is (vaguely) like token ring. Optimistic techniques can often beat pessimistic ones under light load, but pessimistic may be the better choice under extreme load.
on May 03, 2010 at 07:06 AM EDT
Dave is a senior research scientist in the Scalable Synchronization Research Group within Oracle Labs (Google Scholar).