OpenMP vs. Autopar for STREAM on SunStudio/Linux


The Sun Studio compilers on Linux (available for download as the latest Technology Preview) are now showing good signs of maturity and stability, with performance characteristics equivalent to the Solaris version. In particular, we now have OpenMP and Automatic Parallelization implemented in the compiler.
OpenMP is based on parallelization directives that are inserted manually, as defined by the de facto OpenMP standard. You can get the OpenMP API User's Guide here and this portal page has a bunch of related HPTC articles. Automatic Parallelization is inferred directly by the compiler based on opportunities found during the optimizer's analysis phase.
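To make the difference concrete, here is a minimal sketch of a STREAM-style triad loop written both ways. The function names are hypothetical and this is not the actual STREAM source. The first version carries a hand-inserted OpenMP directive. The second is the plain loop that an auto-parallelizing compiler would have to analyze on its own, and whether it actually gets parallelized depends on the compiler's analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Triad with a hand-inserted OpenMP directive: a[i] = b[i] + q * c[i].
 * When built with OpenMP enabled (e.g. -xopenmp), the pragma asks the
 * compiler to split the iterations across threads. Without OpenMP the
 * pragma is ignored and the loop simply runs serially. */
void triad_openmp(double *a, const double *b, const double *c,
                  double q, size_t n)
{
    long i; /* signed index: older OpenMP versions require a signed loop variable */

    assert(a != NULL && b != NULL && c != NULL);

    #pragma omp parallel for
    for (i = 0; i < (long)n; i++)
        a[i] = b[i] + q * c[i];
}

/* The same loop with no directive at all. This is the form an automatic
 * parallelizer (e.g. -xautopar) sees; it must decide by itself whether
 * the iterations are independent. */
void triad_plain(double *a, const double *b, const double *c,
                 double q, size_t n)
{
    size_t i;

    assert(a != NULL && b != NULL && c != NULL);

    for (i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}
```

Both functions compute the same result; the only difference is who identifies the parallelism, the programmer or the compiler.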
This article on the portal that introduces OpenMP and parallelization concepts is actually a nice way of learning about the issues encountered in parallelizing programs.
It seemed logical that I should give it a spin with the STREAM benchmark.
The STREAM Benchmark is the de facto industry standard benchmark for the measurement of computer memory bandwidth. To be fair, it's not a measure of compiler performance in the strict computational sense that the SPEC CPU benchmarks are, but it is a popular enough measure of how well the compiler optimizes for memory.
The STREAM benchmark results pages are maintained here and you can download the sources here.
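For readers new to STREAM, the reported rate is just bytes moved divided by elapsed time. The sketch below shows that arithmetic under the usual STREAM accounting, as I understand it: Copy and Scale touch two arrays per iteration, Add and Triad touch three, and 1 MB is counted as 10^6 bytes. The enum and function names are hypothetical, not taken from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>

/* The four STREAM kernels (names here are illustrative only). */
enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Arrays touched per loop iteration:
 * Copy and Scale read one array and write one (2 total);
 * Add and Triad read two arrays and write one (3 total). */
static int arrays_touched(enum stream_kernel k)
{
    return (k == STREAM_COPY || k == STREAM_SCALE) ? 2 : 3;
}

/* Bandwidth in MB/s for n double-precision elements processed in
 * `seconds`, counting 1 MB as 10^6 bytes. */
double stream_rate_mb_per_s(enum stream_kernel k, size_t n, double seconds)
{
    double bytes;

    assert(seconds > 0.0);

    bytes = (double)arrays_touched(k) * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / seconds;
}
```

For example, a Copy over one million doubles that takes one second moves 16 million bytes, which this sketch reports as about 16 MB/s.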

So, I thought I would post results for a SunFire V40z machine (4 CPUs x 2.6GHz, 16GB of DDR1-400 memory as 8 x 2GB DIMMs).
I compiled it this way on a Linux box running SuSE 9:
For OpenMP:
cc -fast -xarch=amd64a -xvector=simd -xprefetch -xprefetch_level=3 -xopenmp stream_d_omp.c second.c
For Automatic parallelization:
cc -fast -xarch=amd64a -xvector=simd -xprefetch -xprefetch_level=3 -xautopar stream_d.c second.c

Here's what I got for various levels of OMP and Parallelization scaling:

setenv OMP_NUM_THREADS 4  vs. setenv PARALLEL 4
Function    OpenMP Rate (MB/s)    Parallel Rate (MB/s)
Copy        17660.2274            18120.3922
Scale       17467.1692            18108.1662
Add         17750.5371            17758.3657
Triad       17731.7766            17626.2119

setenv OMP_NUM_THREADS 2  vs. setenv PARALLEL 2
Function    OpenMP Rate (MB/s)    Parallel Rate (MB/s)
Copy        9029.7180             9211.2915
Scale       8789.0595             9169.1302
Add         9082.2661             9090.8784
Triad       9072.0346             9066.7234


setenv OMP_NUM_THREADS 1  vs. setenv PARALLEL 1
Function    OpenMP Rate (MB/s)    Parallel Rate (MB/s)
Copy        4559.0261             4657.2653
Scale       4425.3925             4614.3545
Add         4621.6104             4628.7296
Triad       4617.5824             4627.3465

These numbers are virtually identical to the numbers on Solaris.
The conclusion: the compiler is finding parallelization opportunities on par with hand-inserted OpenMP directives, which is indeed good news on Linux. The same is true on Solaris, BTW.
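The thread scaling in the tables above can also be put into numbers. The helper functions below are a small hypothetical sketch, not part of STREAM: speedup is the N-thread rate divided by the 1-thread rate, and parallel efficiency is that speedup divided by the thread count.

```c
#include <assert.h>

/* Speedup of an N-thread run relative to the 1-thread run,
 * using the reported bandwidth rates (MB/s) directly. */
double speedup(double rate_n, double rate_1)
{
    assert(rate_1 > 0.0);
    return rate_n / rate_1;
}

/* Parallel efficiency: fraction of ideal linear scaling achieved,
 * where 1.0 would mean the rate grew exactly with the thread count. */
double parallel_efficiency(double rate_n, double rate_1, int threads)
{
    assert(threads > 0);
    return speedup(rate_n, rate_1) / (double)threads;
}
```

Plugging in the Copy rows, speedup(17660.2274, 4559.0261) comes to roughly 3.87 for OpenMP and speedup(18120.3922, 4657.2653) to roughly 3.89 for autopar, so both variants reach about 97% efficiency on 4 threads.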
I am going to measure these with competitive offerings as well (Intel, GCC, PGI and Pathscale) and will post those results separately.
Comments:

Looks promising. I just tried doing something similar with SunStudio 12, an Intel Core 2 Quad and a PIC (particle in cell) code and I don't get any acceleration. The OS is Suse 10.2 (x86_64).

Posted by Danny Holstein on October 17, 2007 at 04:19 AM PDT #

About

I have worked with Sun and Oracle for 25 years now; in compilers and tools organization for most of these years followed by a couple of years in Cloud Computing. I am now in ISV Engineering, where our primary task is to improve synergy between Oracle Sun Systems and our rich ISV ecosystem
