SPECcpu : an increasingly multi-threaded benchmark
By sprack on Oct 08, 2008
It is interesting to look at recent SPEC2006 results and compare them with the results from just a year or so ago. In addition to the normal improvements one would expect as compilers become increasingly familiar with this fairly new benchmark suite, autoparallelism is being employed to boost scores on this traditionally single-thread performance benchmark.
For SPECint, the autopar gains seem to be limited to one benchmark – libquantum, while on the FP side, there are several. Looking at libquantum:
1) August 07, 3.00 GHz Intel Dual core, 4MB L2$ (4MB between 2-cores), no autopar : libquantum 31.6
2) August 07, 3.00 GHz Intel Dual core, 4MB L2$ (4MB between 2-cores), autopar : libquantum 78.9
3) September 08, 3.20 GHz Intel Quad core, 12MB L2$ (6MB between 2-cores), autopar : libquantum 283
4) September 08, 2.66 GHz Intel Six core, 9MB L2$ (3MB between 2-cores), autopar : libquantum 316
For libquantum there are obviously other compiler optimizations being undertaken in these latest benchmark submissions (as evidenced by various interim results e.g. here), but the comparison of (3) and (4) alone illustrates the power of threading: a 50% reduction in per-core L2$ and a roughly 17% reduction in frequency, yet the libquantum score improves by about 12%. Indeed, if you factor out the frequency difference, the scaling from 4 to 6 cores is very good. It is also interesting to note that the new libquantum scores are around 10X higher than the scores for the other benchmarks. In fact, if you assume that (3) would be, say, 3X lower without autoparallelism support in the compiler, then the processor's overall SPECint ratio score (which is a geometric mean of all 12 benchmark scores) falls by roughly 9%. If the libquantum score is reduced to that seen in (1), then the overall score falls by about 17%!
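For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the overall SPECint score is a plain geometric mean over the 12 CINT2006 benchmarks, so scaling one benchmark by a factor k scales the overall score by k^(1/12). The variable and function names are mine, and the scores are the libquantum numbers quoted above. Note that quoting the change as a fall (with autopar as the baseline) or as a gain (with no autopar as the baseline) gives slightly different percentages.

```python
# Assumption: overall score = geometric mean of 12 component scores.
N_SPECINT = 12

def overall_factor(component_factor, n=N_SPECINT):
    """Factor applied to a geometric mean of n scores when one
    component score is multiplied by component_factor."""
    return component_factor ** (1.0 / n)

# libquantum scores taken from results (1), (3) and (4)
libq_no_autopar = 31.6   # (1) dual core, no autopar
libq_quad = 283.0        # (3) quad core, 3.20 GHz
libq_six = 316.0         # (4) six core, 2.66 GHz

# Hypothetical: libquantum on (3) is 3X lower without autopar
drop_3x = 1.0 - overall_factor(1.0 / 3.0)
gain_3x = overall_factor(3.0) - 1.0

# Hypothetical: libquantum on (3) falls back to the score seen in (1)
drop_to_1 = 1.0 - overall_factor(libq_no_autopar / libq_quad)
gain_from_1 = overall_factor(libq_quad / libq_no_autopar) - 1.0

print(f"3X lower libquantum : overall falls {drop_3x:.1%} (gain of {gain_3x:.1%} the other way)")
print(f"libquantum as in (1): overall falls {drop_to_1:.1%} (gain of {gain_from_1:.1%} the other way)")

# Factor out frequency when comparing (3) and (4)
score_ratio = libq_six / libq_quad   # score improvement, 4 -> 6 cores
freq_ratio = 2.66 / 3.20             # clock ratio, (4) relative to (3)
per_clock_scaling = score_ratio / freq_ratio
core_ratio = 6 / 4
print(f"per-clock scaling 4 -> 6 cores: {per_clock_scaling:.2f}X "
      f"({per_clock_scaling / core_ratio:.0%} of the ideal {core_ratio:.1f}X)")
```

This ignores the cache differences and any other compiler changes between the two submissions, so treat the per-clock scaling figure as a rough indication only.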
Comparing the systems in (1) and (3), the overall SPECint score has improved from 21 to 29.3, an almost 40% improvement. However, roughly half of that gain comes from threading libquantum! Of the remainder, some comes from frequency, some comes from compiler optimizations that significantly improve hmmer performance, and the rest from a number of other small increases.
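One way to put a number on "roughly half" is to work in logs, since the gains multiply rather than add. This is just a sketch under the same assumption as before (a geometric mean over 12 SPECint benchmarks). The variable names are mine, and attributing the whole libquantum jump to threading is a simplification, because other compiler optimizations are in there too.

```python
import math

N_SPECINT = 12  # assumption: overall score is a geomean of 12 benchmarks

overall_gain = 29.3 / 21.0   # overall SPECint score, (1) -> (3)
libq_gain = 283.0 / 31.6     # libquantum score, (1) -> (3)

# Contribution of the libquantum jump to the overall geometric mean
libq_contribution = libq_gain ** (1.0 / N_SPECINT)

# Share of the overall gain, measured multiplicatively (log scale)
share = math.log(libq_contribution) / math.log(overall_gain)

print(f"overall gain           : {overall_gain - 1.0:.1%}")
print(f"libquantum contribution: {libq_contribution - 1.0:.1%}")
print(f"share of total (log)   : {share:.0%}")
```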
It is striking that the majority of the gains we have observed on this traditionally single-threaded benchmark over the last several processor generations come from multithreading.
Similarly, for SPECfp, there are a number of benchmarks which benefit significantly from autoparallelism. This is not surprising, as FP workloads are typically more amenable to this style of optimization. Comparing the recent results from a quad-core Opteron both with and without autopar, it looks like there are 4 FP benchmarks that benefit significantly:
bwaves : 3X improvement
cactusADM : 7.6X improvement
gemsFDTD : 2.2X improvement
wrf : 1.48X improvement
Cumulatively, threading these 4 benchmarks delivers about a 30% improvement in the SPECfp score!
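The cumulative figure can be sanity-checked the same way. The sketch below assumes the SPECfp score is a geometric mean over the 17 CFP2006 benchmarks and that the other benchmarks are unchanged by autopar. The dictionary and variable names are mine, and the speedups are the ones listed above.

```python
import math

N_SPECFP = 17  # assumption: overall score is a geomean of 17 benchmarks

# Per-benchmark autopar speedups quoted above
speedups = {
    "bwaves": 3.0,
    "cactusADM": 7.6,
    "gemsFDTD": 2.2,
    "wrf": 1.48,
}

# All other benchmarks are assumed unchanged (factor of 1.0), so only the
# product of these speedups moves the geometric mean.
combined = math.prod(speedups.values()) ** (1.0 / N_SPECFP)

print(f"estimated SPECfp improvement: {combined - 1.0:.1%}")
```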