IBM POWER7 SPECfp_rate2006: Poor Scaling? Or Configuration Confusion?
By jhenning on Feb 22, 2010
The recently announced IBM POWER7 systems occupy the top of some SPEC benchmark lists. With the CPU chip at the center of the announcement, it is perhaps of interest to examine the POWER7 results for the SPECfp_rate2006 benchmarks, which exercise the processor, compiler, and memory hierarchy.
Scaling POWER7 from 2 to 4 chips is not impressive
As of 23-Feb-2010, IBM's best published 2-chip result and best 4-chip result for SPECfp_rate2006 are, respectively, 586 and 851. The scaling from 2 chips to 4 chips is less than 1.5x (851/586=1.452). SPEC also includes a less aggressively tuned metric, "SPECfp_rate_base2006", and for base the scaling is similar (776/531=1.461).
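To make the ratios concrete, here is a minimal Python sketch that recomputes them from the four published scores and expresses each as a fraction of ideal (linear) scaling. It assumes, for simplicity, that doubling the chips should ideally double throughput and ignores the 4-chip system's slightly lower clock rate. The function names are illustrative only.

```python
# Published SPECfp_rate2006 scores quoted above (IBM, as of 23-Feb-2010).
SCORES = {
    "peak": {"2-chip": 586, "4-chip": 851},
    "base": {"2-chip": 531, "4-chip": 776},
}

# Assumption: twice the chips (and cores) would ideally give twice the throughput.
IDEAL_SPEEDUP = 4 / 2


def scaling(small: float, large: float) -> float:
    """Ratio of the larger system's score to the smaller system's score."""
    return large / small


def efficiency(small: float, large: float, ideal: float = IDEAL_SPEEDUP) -> float:
    """Fraction of ideal (linear) scaling actually achieved."""
    return scaling(small, large) / ideal


if __name__ == "__main__":
    for metric, s in SCORES.items():
        r = scaling(s["2-chip"], s["4-chip"])
        e = efficiency(s["2-chip"], s["4-chip"])
        print(f"{metric}: {r:.3f}x scaling, {e:.1%} of linear")
```

Run as a script, it reports roughly 1.452x (about 72.6% of linear) for peak and 1.461x (about 73.1% of linear) for base.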
As can be seen in Table 1, although the 4-chip system has twice as many cores, its clock rate is slightly lower. In all other dimensions listed in Table 1, it would seem to provide twice the capability of the 2-chip system.
Scaling is not uniform
The CPU2006 floating point suite contains 17 individual benchmarks. Do they show uniform scaling?
In fact, scaling is not uniform; when twice as many copies are run on twice as many chips, some of the programs scale well, while others stall out.
In the detailed graph, seven of the tested programs (shown in red) scale relatively poorly. According to SPEC, the poorly scaling tests are drawn from fluid dynamics, speech recognition, physics, linear programming, and electromagnetics applications.
Explanation: memory system
The computations performed in these benchmarks exercise more than just the chip and its caches. They are memory-intensive and will not scale well unless memory bandwidth scales as well.
Differences between the IBM Power 750 and Power 780
If the memory system is so important to the benchmarks that scale poorly, is that consistent with what we know of the 2-chip and 4-chip systems? It would seem so. As noted in Table 1, the current best-published 2-chip result is on the Power 780, whereas the best 4-chip result is on the Power 750. The 750 uses one 4-RU box to hold up to 4 chips, whereas the 780 places only 2 chips into each 4-RU enclosure. Presumably, the extra space in the 780 is used to take better advantage of the POWER7 memory system, perhaps by using more of its memory controllers and channels.
As of 23-Feb-2010, the "IBM Web price" at IBM's Power 750 Express server browse and buy page for a 4-chip, 3.3 GHz 750 with 128 GB was $174,192.
The 4-chip system in Table 1 used 256 GB. Replacing the base 128 GB (priced at $2,130 per 2x8GB) with 256 GB (priced at $6,390 per 2x16GB) would add $34,080 to the cost, for a total of $208,272:
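$$\left(\frac{\$6390}{2 \times 16\,\text{GB}} \times 256\,\text{GB}\right) - \left(\frac{\$2130}{2 \times 8\,\text{GB}} \times 128\,\text{GB}\right) = \$51{,}120 - \$17{,}040 = \$34{,}080$$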
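The same arithmetic can be checked with a short Python sketch. The prices are the ones quoted from IBM's web page; reading each price as covering one pair of DIMMs follows the form of the calculation above, and the constant and function names are hypothetical, chosen only for illustration.

```python
# Prices quoted above from IBM's Power 750 Express web page, 23-Feb-2010.
BASE_PRICE = 174_192     # 4-chip, 3.3 GHz Power 750 with 128 GB
PAIR_16GB_PRICE = 6_390  # hypothetical name: price per 2x16GB
PAIR_8GB_PRICE = 2_130   # hypothetical name: price per 2x8GB


def memory_cost(total_gb: int, pair_price: int, dimm_gb: int) -> float:
    """Cost of total_gb of memory when each price buys a pair of dimm_gb DIMMs."""
    per_gb = pair_price / (2 * dimm_gb)
    return per_gb * total_gb


def upgrade_delta() -> float:
    """Extra cost of 256 GB in 16 GB DIMMs over the base 128 GB in 8 GB DIMMs."""
    return memory_cost(256, PAIR_16GB_PRICE, 16) - memory_cost(128, PAIR_8GB_PRICE, 8)


if __name__ == "__main__":
    delta = upgrade_delta()
    print(f"memory delta: ${delta:,.0f}")
    print(f"total:        ${BASE_PRICE + delta:,.0f}")
```

Run as a script, it prints a memory delta of $34,080 and a total of $208,272, matching the figures in the text.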
But wait, now how much would you pay?
The $208,272 price would still be for a 3.3 GHz system, but the tested system in Table 1 ran at 3.55 GHz, so it would presumably cost noticeably more. As of 23-Feb-2010, the IBM web pages referenced above simply say "Call for price" for the 3.55 GHz model.
Finally, if you wanted a 4-chip system that scaled well for all of the SPECfp_rate2006 benchmarks when compared to the 2-chip 780, you should presumably build a 4-chip 780 rather than a 4-chip 750 - and, one presumes, the 780 will again cost noticeably more than the 750.
Disclaimer: this blogger is going solely by pricing found on the IBM web site as of 23-Feb-2010. I do not claim to be an expert in IBM pricing. The presumptions of the previous paragraphs are, IMHO, reasonable; but are unproven.
Bottom line: caveat emptor
IBM has some strong results, but if you want scaling, you have to pay attention to whether your application is hungry for memory bandwidth; if it is, you need to pay careful attention to which model you are looking at. Try not to be confused by benchmark results that exercise different capabilities of differently configured systems.