Multicore expo available - Microparallelisation
By Darryl Gove on Apr 29, 2008
My presentation "Strategies for improving the performance of single threaded codes on a CMT system" has been made available on the OpenSPARC site.
The presentation discusses "microparallelisation" in the context of parallelising an example loop. Microparallelisation aims to obtain parallelism by assigning small chunks of work to discrete processors. Taking a step back...
With traditional parallelisation the idea is to identify large chunks of work that can be split between multiple processors. The chunks of work need to be large to amortise the synchronisation costs, which usually means that the loops have a huge trip count.
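The traditional approach can be sketched as follows. This is an illustrative example, not code from the presentation: each worker takes one large contiguous slice of the iteration space, so the thread start/join synchronisation cost is paid only once per worker.

```python
import threading

def parallel_sum(data, nworkers=4):
    """Sum `data` by giving each worker one large contiguous chunk."""
    partials = [0] * nworkers
    chunk = (len(data) + nworkers - 1) // nworkers

    def worker(idx):
        lo = idx * chunk
        hi = min(lo + chunk, len(data))
        s = 0
        for x in data[lo:hi]:   # one big chunk amortises the sync cost
            s += x
        partials[idx] = s

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

The per-thread partial sums avoid any synchronisation inside the loop; the only coordination is the final join and reduction.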
The synchronisation costs are derived from the time it takes to signal that a core has completed its work. The lower the synchronisation costs, the smaller the amount of work needed to make parallelisation profitable.
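That trade-off can be put into a back-of-the-envelope model (the numbers and the function name here are illustrative assumptions, not figures from the presentation): if each iteration costs t_iter, the loop runs on p threads, and the parallel region adds a fixed synchronisation cost t_sync, then parallelisation pays off once n * t_iter / p + t_sync < n * t_iter.

```python
def min_profitable_trip_count(t_iter, t_sync, p):
    """Smallest trip count n for which n*t_iter/p + t_sync < n*t_iter."""
    # Saving per iteration from using p threads is t_iter * (1 - 1/p);
    # we need n such savings to exceed the fixed sync cost t_sync.
    saving = t_iter * (1.0 - 1.0 / p)
    return int(t_sync / saving) + 1
```

For example, with a sync cost of 100 iteration-times and 2 threads, the loop needs a trip count of just over 200 before splitting it is worthwhile; halve the sync cost and the break-even trip count roughly halves too, which is exactly the effect the post attributes to CMT.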
Now, a CMT processor has two big advantages here. First of all it has many threads. Secondly these threads have low latency access to a shared level of cache. The result of this is that the cost of synchronisation between threads is greatly reduced, and therefore each thread is free to do a smaller chunk of work in a parallel region.
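A minimal sketch of the microparallel style, under the assumption that synchronisation is cheap: instead of taking one big slice each, the workers repeatedly grab small chunks of the iteration space from a shared dispatch point. The lock here stands in for the low-latency shared-cache synchronisation a CMT chip provides; all names are illustrative.

```python
import threading

def micro_parallel_sum(data, nworkers=4, chunk=8):
    """Sum `data` by having workers claim small chunks from a shared counter."""
    partials = [0] * nworkers
    next_index = [0]                 # next unclaimed chunk start
    lock = threading.Lock()          # cheap sync point for chunk dispatch

    def worker(idx):
        s = 0
        while True:
            with lock:               # claim the next small chunk
                lo = next_index[0]
                next_index[0] += chunk
            if lo >= len(data):
                break
            for x in data[lo:lo + chunk]:
                s += x
        partials[idx] = s

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

The smaller the chunk, the better the load balance, but the more often the dispatch cost is paid, so this layout only makes sense when that cost is low, which is the CMT argument above.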
All that's great in theory; the presentation uses some example code to try the idea out, and discovers, rather fortunately, that it also works in practice!
The presentation also covers using atomic operations rather than microparallelisation.
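The atomic-operation alternative might look roughly like this. Python has no hardware atomic instructions, so a lock-protected update stands in for an atomic add, and the structure is my own illustration rather than the presentation's code: each worker accumulates directly into a single shared result instead of leaving a partial sum to be reduced afterwards.

```python
import threading

def atomic_style_sum(data, nworkers=4):
    """Sum `data` with workers folding results into one shared total."""
    total = [0]
    lock = threading.Lock()          # stands in for a hardware atomic add
    chunk = (len(data) + nworkers - 1) // nworkers

    def worker(idx):
        lo = idx * chunk
        s = sum(data[lo:lo + chunk]) # local work, no synchronisation
        with lock:                   # single "atomic" update per worker
            total[0] += s

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

The appeal is that no final reduction step is needed; the cost is contention on the shared location, which again is less painful when the threads share a level of cache.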
In summary, the presentation is more research than solid science, but I hoped that presenting it would get some people thinking about non-traditional ways to extract parallelism from applications. I'm not alone in this area of work; Lawrence Spracklen is also working on it. We're both presenting at CommunityOne next week.