Sun Studio Trounces Intel Compiler on Intel Chip

Today Sun announces a new world record for SPECfp2006: 50.4 on a 2-chip Nehalem (Intel Xeon X5570) Sun Blade X6270.

Congratulations to my colleagues in the Sun Studio Compiler group: the fun thing about this result is that it beats Intel's own compiler on this Intel chip by 20%, thanks to the optimization technologies in the Sun Studio 12 Update 1 compiler.


System                        Processor Type  GHz   Chips  Cores  Peak  Base  Comments
Sun Blade X6270               Xeon 5570       2.93  2      8      50.4  45.0  New
Hitachi BladeSymphony BS2000  Xeon 5570       2.93  2      8      42.0  39.3  Top result as of 14 Apr 2009
IBM Power 595                 POWER6          5.00  1      1      24.9  20.1  Best POWER6 as of 14 Apr 2009

Note that even with the less aggressive "Base" tuning [SPECfp_base2006], the Sun Blade X6270 (45.0) beats the best posted "Peak" tuning from competitors [SPECfp2006] (42.0).

Of course, the Intel compiler engineers are bright folks too, and they will no doubt quickly provide additional performance on Nehalem. Still, it's fun to see the multi-target Sun Studio optimization technology deliver top results on a variety of platforms, now including Nehalem.

As for integer performance, the Sun Blade also takes top honors there [for peak]:


System                        Processor Type  GHz   Chips  Cores  Peak  Base  Comments
Sun Blade X6270               Xeon 5570       2.93  2      8      36.9  32.0  New
Fujitsu Celsius R570          Xeon 5570       2.93  2      8      36.3  32.2  Top SPECint2006 result as of 14 Apr 2009

The Sun Blade results have been submitted to SPEC for review, and should appear at SPEC's website in about 2 weeks.

On a personal note, this was my first time using OpenSolaris. Compatibility with other operating systems is substantially improved, the utilities this tester likes to have handy are built in (e.g., the NUMA Observability Tools), and ZFS zips right along, needing less attention than UFS and delivering better performance.

SPEC, SPECint, and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Competitive results as of 4/14/2009.


While SPECint2006/SPECfp2006 are considered "speed" metrics, they aren't necessarily single-threaded metrics any longer, right? When I look up the IBM SPECfp2006 result it says Auto Parallel: No, and the Hitachi result says Auto Parallel: Yes. Was the X6270 result also parallel?

Posted by rick jones on April 16, 2009 at 03:36 PM EDT #

Why draw the distinction? As long as we don't put out N-CPU results and claim that you get the same performance with fewer CPUs, it's all legal and moral, isn't it?
Parallelism used to be "special". Now, with multiple cores, do you really think it's fair NOT to use the other cores on the socket, just to claim "non-parallel" performance?
IMO, if IBM's parallel numbers were good, they would no doubt be posting them. After all, SIMD, automatic micro-vectorization, etc., are also "parallelization" technologies. Using the entire real estate on the chip isn't a crime, is it? [You aren't asking IBM what FP result they would get if they used only one FP unit instead of both, are you?]
Seems to me the results are a fair measure of the compiler's ability to generate code that fully utilizes all the silicon on the chip.
I think there is one fair angle to this question: CPU benchmarks tend to be very well analyzed, with every bit of parallelism in them squeezed out; in real apps that's probably not as feasible, so how well do the results reflect apps that can't be parallelized? Beats me. As with any real app, look for the right comparator within CPU2006 to get a realistic picture of performance for your app, not just broad or even best numbers. The best numbers are a guide to the system's potential (the HW, OS, and compiler combination), not a guarantee.
IMO the team has done a bang-up job of delivering these numbers, particularly on x86, where the playing field is level and killer competition exists for the platform (with GCC, Intel, PGI, and PathScale all providing great optimizations).

Posted by Vijay Tatkar on April 19, 2009 at 07:36 AM EDT #

The claimed result is indeed with Auto Parallel = Yes, but to be fair the claim is sound, because the comparison result, run on the same hardware with the Intel compiler, was auto-parallel too.

However, this comes at a cost. The results for the throughput version of the same benchmark are not so marked. In fact, Sun's SPECfp_rate_base2006 is a few points *worse* (182 vs 187), and the SPECfp_rate2006 (which permits the vendor to be as aggressive as they can) only slightly better (197 vs 194). So for heavy server workloads they start to look very similar.

So clearly SS12u1 is doing a great job of using single-thread resources, but that benefit mostly disappears when the CPU is more completely engaged on multiple tasks.

It would be interesting to learn at what percentage of saturation the Intel compiler reaches parity with the Sun compiler.

Posted by Kosh on June 25, 2009 at 10:27 PM EDT #
