Thursday Apr 14, 2011

C++ FAQ

The Solaris Studio C++ FAQ has been updated here.

Thursday Jun 03, 2010

We're Back! With a New Release!

The Oracle Solaris Studio Express 6/10 release is now available. Follow this link.

Lots of updates and new compiler features, like optimizations for the latest SPARC and x86/x64 platforms, new dbx commands for debugging OpenMP programs, and new performance analyzer features, like the ability to compare two runtime experiments.

Glad to be back! Stay tuned.

Monday Feb 16, 2009

Feedback

Another useful optimization option available with Sun Studio compilers is profile feedback.

This option can be especially helpful with codes that contain a lot of branching. The compiler is unable to determine from the source code alone which branches in an IF or CASE statement are the most likely to be taken. Using the profile feedback feature, you can run an instrumented version of the code using typical data to collect statistics on code coverage and branching, and then recompile the code using this collected data.
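To make the branching problem concrete, here is a minimal sketch in C. The function names and the 0..1000 range are hypothetical, made up for illustration. With typical input almost every value passes the check, but nothing in the source tells the compiler that, so it cannot know which side of the branch to favor.

```c
#include <stddef.h>

/* Hypothetical validity check: with typical input nearly every value
 * is in range, so callers rarely see a 0 result. */
static int is_valid(int value)
{
    return value >= 0 && value <= 1000;
}

/* Sums the valid entries and counts the rejected ones.
 * The source alone does not say which branch is the common one;
 * a profile collected from a typical run does. */
long sum_valid(const int *values, size_t n, size_t *rejected)
{
    long total = 0;
    size_t i;

    *rejected = 0;
    for (i = 0; i < n; i++) {
        if (!is_valid(values[i])) {
            (*rejected)++;       /* rare path with typical data */
            continue;
        }
        total += values[i];      /* common path with typical data */
    }
    return total;
}
```

A profile run over representative data would record how seldom the rejection path executes, which is exactly the information the compiler is missing when it only has the source.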

Darryl Gove has a great description of profile feedback in his book Solaris Application Programming.

With profile feedback, the compiler is better able to do certain optimizations that it cannot do by just analyzing the source code:

  • Lay out the compiled code so that branches are rarely taken. The most frequent branches "fall through" to the next memory location, avoiding a fetch and branch to a distant location.
  • Inline routines called many times. This avoids costly function calls.
  • Move infrequently executed code out of the "hot" parts of the code. This improves utilization of the instruction cache.
  • Lots more optimizations based on how variables are and are not utilized along the most likely paths the program will take.

Of course, how well these optimizations work depends on how typical the data used for the profile run is. In some cases it might be useful to identify several sets of typical data, collect a profile for each set, and compile a separate version from each profile. Whether that is worthwhile depends on the application.

Compiling with profile feedback takes three phases:

  1. Compile with -xprofile=collect to produce an instrumented executable.
  2. Run the instrumented executable with a typical data set to create a performance profile.
  3. Recompile with -xprofile=use and -xO5 to produce the optimized executable.

 % cc -xO3 -xprofile=collect:/tmp/profile myapp.c
 % a.out
 % cc -xO5 -xprofile=use:/tmp/profile -o myapp myapp.c


Read about profile feedback in the compiler man pages: C++, C, Fortran

Tuesday Jan 13, 2009

What Am I Compiling For?

It's worth thinking about the target processor you intend your code to run on. If performance is not an issue, then you can go with whatever default the compiler offers. But overall performance will improve if you can be more specific about the target hardware.

Both SPARC and x86 processors have 32-bit and 64-bit modes. Which is best for your code? And are you letting the compiler generate code that utilizes the full instruction set of the target processor?

32-bit mode is fine for most applications, and a 32-bit application will run even if the target system is running in 64-bit mode. But the opposite is not true. An application compiled for 64-bit must be run on a system with a 64-bit kernel; it will get errors on a 32-bit system.

How do you find out if the (Solaris) system you're running on is in 32-bit or 64-bit mode? Use the isainfo -v command:

 >isainfo -v
64-bit sparcv9 applications
        vis2 vis
32-bit sparc applications
        vis2 vis v8plus div32 mul32

This SPARC system is running in 64-bit mode. The command also tells me that this processor has the VIS2 instruction set.

On another system, isainfo reports this:

 >isainfo -v
64-bit amd64 applications
    sse2 sse fxsr amd_3dnowx amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu
32-bit i386 applications
    sse2 sse fxsr amd_3dnowx amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu

On UltraSPARC systems, the only advantage to running a code in 64-bit mode is the ability to access very large address spaces. Otherwise there is very little performance gain, and some codes might even run slower. On x86/x64 systems, there is the added advantage of being able to utilize additional machine instructions and additional registers. For both, compiling for 64-bit may increase the binary size of the program (long data and pointers become 8 instead of 4 bytes). But if you're intending your code to run on x86/x64 systems, compiling for 64-bit is probably a good idea. It might even run faster.
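The size change is easy to check for yourself. Here is a small sketch in C; the function names and the node structure are hypothetical, made up for illustration. It reports the width of pointers and longs in whatever mode the file was compiled for, and the size of a simple structure built from them.

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of a pointer in the current compilation mode. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width in bits of a long in the current compilation mode. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Hypothetical list node: one pointer and one long, so its size
 * follows the pointer and long sizes of the compilation mode. */
struct node {
    struct node *next;
    long key;
};

/* Size in bytes of the node above, including any padding. */
size_t node_size(void)
{
    return sizeof(struct node);
}
```

Printing these three values from a small driver, once built for 32-bit and once for 64-bit, shows how a data structure full of pointers and longs grows even though the source has not changed.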

So how do you do it?

The compiler options -m64 and -m32 specify compiling for 64-bit or 32-bit execution. And it's important to note that 64-bit and 32-bit objects and libraries cannot be intermixed in a single executable. Also, on Solaris systems -m32 is the default, but on 64-bit x64 Linux systems -m64 -xarch=sse2 is the default.

>f95 -m32 -o ran ran.f
>file ran
ran:    ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped
>f95 -m64 -o ran64 ran.f
>file ran64
ran64:  ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], dynamically linked, not stripped

It's also most helpful to tell the compiler what processor you intend to run the application on. The default is to produce a generic binary that will run well on most current processors. But that leaves out a lot of opportunities for optimization. As newer and newer processors are made available, new machine instructions or other hardware features are added to the basic architecture to improve performance. The compiler needs to be told whether or not to utilize these new features. However, this can produce backward incompatibilities, rendering the binary code unable to run on older systems. To handle this, application developers will make various binary versions available for current and legacy platforms.

For example, if you compile with the -fast option, the compiler will generate the best code it can for the processor it is compiling on. -fast includes -xtarget=native. You can override this choice by adding a different -xtarget after the -fast option on the command line (the command line is processed from left to right). For example, to compile for an UltraSPARC T2 system when that is not the native system you are compiling on, use -fast -xtarget=ultraT2.

New processors appear on the scene often. And with each new release of the Sun Studio compilers, the list of -xtarget options expands to handle them. These new processor values are usually announced in the Sun Studio compiler READMEs. Tipping off the compiler about the target processor helps performance.

More about -xtarget and what it means next time.

(For details, check the compiler man pages)

Thursday Jan 08, 2009

Optimization Shortcut with -fast

So if I've got a code and I've already compiled it without any real options, so I know it will compile, where do I start with trying to get the best performance?

Well, the Sun Studio compilers have many options for performance optimization. You can try them all one by one and see what works. 

Or, you can start off by compiling with -fast.

-fast is a macro -- it's a set of options that are all invoked simultaneously. Some of the options that it uses can be problematic for some codes. Also, compiling with -fast may increase compile time. But the resulting executable should run faster than compiling with default options for most codes.

Also, the set of options that makes up -fast is different for each compiler, and depends on whether you're compiling on a SPARC or x86/x64 processor.

One way to see what the component options of -fast are is by using the compiler's -dryrun or -# options.

For example, on a SPARC Solaris system:

edgard:/home/rchrd<42>f95 -dryrun -fast | grep ###
###     command line files and options (expanded):
### -dryrun -xO5 -xarch=sparcvis2 -xcache=64/32/4:1024/64/4 -xchip=ultra3i -xpad=local -xvector=lib -dalign -fsimple=2 -fns=yes -ftrap=common -xlibmil -xlibmopt -fround=nearest

edgard:/home/rchrd<43>CC -dryrun -fast | grep ###
###     command line files and options (expanded):
### -dryrun -xO5 -xarch=sparcvis2 -xcache=64/32/4:1024/64/4 -xchip=ultra3i -xmemalign=8s -fsimple=2 -fns=yes -ftrap=%none -xlibmil -xlibmopt -xbuiltin=%all -D__MATHERR_ERRNO_DONTCARE


On my AMD64 OpenSolaris laptop we see:

FerrariOS:/export/home/rchrd<25>CC -dryrun -fast | grep ###
###     command line files and options (expanded):
### -dryrun -xO5 -xarch=sse3a -xcache=64/64/2:1024/64/16 -xchip=opteron -xdepend=yes -fsimple=2 -fns=yes -ftrap=%none -xlibmil -xlibmopt -xbuiltin=%all -D__MATHERR_ERRNO_DONTCARE -nofstore -xregs=frameptr -Qoption CC -iropt -Qoption CC -xcallee64

FerrariOS:/export/home/rchrd<22>cc -fast -# no.c |& grep ###
###     command line files and options (expanded):
### -D__MATHERR_ERRNO_DONTCARE -fns -nofstore -fsimple=2 -fsingle -xalias_level=basic -xarch=sse3a -xbuiltin=%all -xcache=64/64/2:1024/64/16 -xchip=opteron -xdepend -xlibmil -xlibmopt -xO5 -xregs=frameptr no.c



The particular options are chosen to get the best performance on the host platform ... so this assumes that you're going to run the executable binary on the same processor that compiled it.

I have one computationally intensive Fortran 95 program that runs on an UltraSPARC IIIi system in 54.4 seconds using just default compiler options. Just adding -fast to the compile command line gives me an executable that runs in only 12.2 seconds, less than a quarter of the time. The same program on my AMD64 laptop runs in about one-third the time with -fast as it does without it.

But you do have to be careful. Check the manuals, which caution:

Because -fast invokes -dalign, -fns, -fsimple=2, programs compiled with -fast can result in nonstandard floating-point arithmetic, nonstandard alignment of data, and nonstandard ordering of expression evaluation. These selections might not be appropriate for most programs.
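Why would the ordering of expression evaluation matter? Here is a minimal sketch in C, with hypothetical function names, showing that floating-point addition is not associative: the same three values summed in two groupings can give two different answers. I'm assuming here only that a relaxed floating-point mode is allowed to regroup operations like this; the man pages spell out what -fsimple=2 actually permits.

```c
/* Adds left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    /* volatile forces the intermediate to be stored as a double,
     * so extra register precision cannot hide the rounding. */
    volatile double t = a + b;
    return t + c;
}

/* Same three values, regrouped: a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With IEEE 754 double precision, sum_left(1e16, -1e16, 1.0) returns 1.0, while sum_right(1e16, -1e16, 1.0) returns 0.0, because adding 1.0 to -1e16 is lost to rounding before the large values cancel. A program that depends on one particular grouping can change its results if the compiler is free to pick the other.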

Looks like we may have some more explaining to do.

 
  

When Run Was A Compiler

Back in the day (I mean around 1965), the Fortran compiler for the CDC 6600 (the supercomputer of the moment, pictured at the left) was called "run".

Odd choice perhaps. Seymour Cray, the 6600 designer, and Garner McCrossen, the programmer of the run compiler, figured that all you needed to put on a command line (actually a punched card in the control deck) was

run

and the system would invoke the compiler to read the Fortran source cards in the deck and run the program.

There were no compiler options of any significance.

The compiler was written in assembly language for the 6600 and was a remarkable piece of code.

Click on the photo and it will take you to a Google search for more images of the CDC 6600. (Back in the day, I was a systems programmer at NYU on the serial 4 machine in 1965, maintaining the run compiler and library.)

Wednesday Jan 07, 2009

Options Not Optional

Compiler options can be mysterious. They can have kind of a "don't go there" mystique about them. But actually they're there to help. 

There are some things the compiler can't do without help from the programmer. So that's when the compiler designers say "let's leave it to the programmer and create an option". Options also accumulate with time, so that's why there are so many of them. Some are "legacy" options, needed for certain situations that rarely come up these days. But the rest are really quite useful, and can greatly improve the kind of code the compiler generates from your source.

There are compiler command-line options for various things, like code optimization levels and run-time performance, parallelization, numeric and floating-point issues, data alignment, debugging, performance profiling, target processor and instruction sets, source code conventions, output mode, linker and library choices, warning and error message filtering, and more. Choosing the right set of options to compile with can make a great difference on how your code performs on a variety of platforms.

Darryl Gove has a great article on selecting the right compiler options.

Over the next couple of weeks I'll be taking a look at individual compiler options, dissecting them one at a time.

 

In the meantime, you can find a detailed list of Sun Studio 12 compiler options organized by function and source language here.
About


Deep thoughts on compiling C, C++, and Fortran codes with Oracle Solaris Studio compilers, especially optimization and parallelization, from the Solaris Studio documentation lead, Richard Friedman. Email him at
Richard dot Friedman at Oracle dot com
