Simplifying Concurrent Algorithms by Exploiting Hardware Transactional Memory will appear in SPAA 2010.
Hi Davegeneral question not really related to the above post but involves any algorithm whether in sw/hw that involves (work attemptedcommit rollback loop) so I guess that would include non blocking queues.
I m arguing it leads to wasted cpu and memory bandwidth under heavy load as compared to a thread that takes a fine grained lock and spins on it or better yet spins a bit and then sleeps.
For example if 20 threads tried to enqueue in a non blocking algorithm at the same time only one would win the other 9 just wasted cpu/memory in their attempts something a mere spin on locally cached memory would have avoided. local spin would have avoided extra memory traffic and sleep would ahve let other threads from other processes make meaningful progress.
Am I simply not \*read\* enough in this area and these problems are solved or is this really a problem under heavy load.
It's not at uncommon to find that pessimistic concurrency control (locks) will beat optimistic concurrency control (e.g., lock-free techniques) under some loads and on some systems. In that circumstance the usual approach is to try to reduce and throttle concurrency and see if the optimistic mechanism can still beat the lock or locks, or make the system adaptive and switch to locks.
But as usual, the answer is that "it depends" and there's no simple answer. And as you point out, coherency and communication costs play into the equation.
A rather poor analogy is that optimistic concurrency control is like CSMA-CD access protocols while pessimistic is (vaguely) like Token-ring. Optimistic techniques can often beat pessimistic under light load but pessimistic may the better choice under extreme load.
Malthusian Locks appears in EuroSys 2017. An extended version is in
arxiv.Abstract:Applications running in modern multithreaded...
Pointers: Experiences with HTM-Based Reference Counting in C++by Maria Carpen-Amarie , Dave Dice, Gaël Thomas and Pascal...
The following is inspired by some discussions with John Rose on how we might
implement atomic access to value type "tuples" for subsequent...