By sprack on Jun 04, 2009
Some great work by Krishan Yenduri has led to nice improvements in the multi-socket bulk cipher performance on UltraSPARC T2+ processors. The improvements are available in the current build, snv_117. Krishna has performance data for scaling on a 4-socket T5440 system in his recent blog. Using the same kernel umicrobenchmark, the following plot shows the scaling on a dual-socket UltraSPARC T2 Plus system:
In this test, the requesting threads are scheduled by Solaris (rather than bound to specific cores), so Solaris will tend to even distribute the threads across the 16 cores in the system – this explains by you get this rapid increase in aggregate cryptographic throughout as the number of threads is increased. If the first 8-threads where bound to core 0, the second 8 to core 1 and so on, the scaling would be essentially linear as the cores are added.
So, a 2-socket T2+ system is delivering around 9GBytes/second. Not bad, given most other dual-socket systems can deliver at max around 2GB/s. Further, from the above it is apparent that we hit 9GB/s on the T2 system with less than 50% of the HW strands being utilized.