By Josh Simons on Oct 13, 2008
Sun just introduced yet another chip multi-threading (CMT) SPARC system, the Sun SPARC Enterprise T5440. To me, that's a Batoka (sorry, branding police) because I can't keep the model names straight. In any case, this time we've put 256 hardware threads and up to 1/2 TeraByte of memory into a four-RU server. That works out to a thread-count of about 36 threads per vertical inch, which doesn't compete with fine Egyptian cotton, but it can beat the crap out of servers with 3X faster processor clocks. If you don't understand why, then you should take the time to grok this fact. Clock speed is dead as a good metric for assessing the value of a system. For years it has been a shorthand proxy for performance, but with multi-core and multi-threaded processors becoming ubiquitous that has all changed.
Old-brain logic would say that for a particular OpenMP problem size, a faster clock will beat a slower one. But it isn't about clock anymore--it is about parallelism and latency hiding and efficient workload processing. Which is why the T5440 can perform anywhere from 1.5X to almost 2X better on a popular OpenMP benchmark against systems with almost twice the clock rate of the T5540. Those 256 threads and 32 separate floating point units are a huge advantage in parallel benchmarks and in real-world environments in which heavy workloads are the norm, especially those that can benefit from the latency-hiding offered by a CMT processor like the UltraSPARC T2 Plus. Check out BM Seer over the next few days for more specific, published benchmark results for the T5440.
Yes, sure. If you have a single-threaded application and don't need to run a lot of instances, then the higher clock rate will help you significantly. But if that is your situation, then you have a big problem since you will no longer see clock rate increasing as it has in the past. You are surely starting to see this now with commodity CPUs slowly increasingly clock rates and embrace of multicore, but you can look to our CMT systems to understand the logical progression of this trend. It's all going parallel, baby, and it is past time to start thinking about how you are going to deal with that new reality. When you do, you'll see the value of this highly threaded approach for handling real-world workloads. You can see my earlier post on CMT and HPC for more information, including a pointer to an interesting customer analysis of the value of CMT for HPC workloads.
A pile of engineers have been blogging about both the hardware and software aspects of the new system.
Check out Allan Packer's stargate blog entry
for jump coordinates into the
Batoka T5440 blogosphere.