Wednesday Jan 21, 2009

Not So Simple - The -fsimple Option

As mentioned earlier, the -fast compiler option is a good way to start because it is a combination of options that result in good execution performance.

But one of the options included in -fast is -fsimple=2. What does this do?

Unless directed to, the compiler does not attempt to simplify floating-point computations (the default is -fsimple=0). -fsimple=2 enables the optimizer to make aggressive simplifications with the understanding that this might cause some programs to produce slightly different results due to rounding effects. 

Here's what the man page says:

-fsimple=1 

Allow conservative simplifications. The resulting code does not strictly conform to IEEE 754, but numeric results of most programs are unchanged.

With -fsimple=1, the optimizer can assume the following:

  • IEEE 754 default rounding/trapping modes do not change after process initialization.

  • Computations producing no visible result other than potential floating point exceptions may be deleted.

  • Computations with NaNs (“Not a Number”) as operands need not propagate NaNs to their results; for example, x*0 may be replaced by 0.

  • Computations do not depend on sign of zero.

With -fsimple=1, the optimizer is not allowed to optimize completely without regard to roundoff or exceptions. In particular, a floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at run time.

-fsimple=2

In addition to -fsimple=1, permit aggressive floating point optimizations. This can cause some programs to produce different numeric results due to changes in the way expressions are evaluated. In particular, the standard rule requiring compilers to honor explicit parentheses around subexpressions to control expression evaluation order may be broken with -fsimple=2. This could result in numerical rounding differences with programs that depend on this rule.

For example, with -fsimple=2, the compiler may evaluate C-(A-B) as (C-A)+B, breaking the standard’s rule about explicit parentheses, if the resulting code is better optimized. The compiler might also replace repeated computations of x/y with x*z, where z=1/y is computed once and saved in a temporary, to eliminate the costly divide operations.

Programs that depend on particular properties of floating-point arithmetic should not be compiled with -fsimple=2.

Even with -fsimple=2, the optimizer still is not permitted to introduce a floating point exception in a program that otherwise produces none.

So if you use -fast, some numerically unstable programs might get different results than when compiled without it. If this happens to your program, you can experiment by overriding the -fsimple=2 component of -fast: compile with -fast -fsimple=0, placing -fsimple=0 after -fast so that it takes effect.

Tuesday Jan 13, 2009

What Am I Compiling For?

It's worth thinking about the target processor you intend your code to run on. If performance is not an issue, then you can go with whatever default the compiler offers. But overall performance will improve if you can be more specific about the target hardware.

Both SPARC and x86 processors have 32-bit and 64-bit modes. Which is best for your code? And are you letting the compiler generate code that utilizes the full instruction set of the target processor?

32-bit mode is fine for most applications, and a 32-bit binary will run even if the target system is running in 64-bit mode. But the opposite is not true: an application compiled for 64-bit must be run on a system with a 64-bit kernel, and it will fail with errors on a 32-bit system.

How do you find out if the (Solaris) system you're running on is in 32-bit or 64-bit mode? Use the isainfo -v command:

 >isainfo -v
64-bit sparcv9 applications
        vis2 vis
32-bit sparc applications
        vis2 vis v8plus div32 mul32

This SPARC system is running in 64-bit mode. The command also tells me that this processor has the VIS2 instruction set.

On another system, isainfo reports this:

 >isainfo -v
64-bit amd64 applications
    sse2 sse fxsr amd_3dnowx amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu
32-bit i386 applications
    sse2 sse fxsr amd_3dnowx amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu

On UltraSPARC systems, the only advantage to running a code in 64-bit mode is the ability to access very large address spaces. Otherwise there is very little performance gain, and some codes might even run slower. On x86/x64 systems, there is the added advantage of being able to utilize additional machine instructions and additional registers. For both, compiling for 64-bit may increase the binary size of the program (long data and pointers become 8 instead of 4 bytes). But if you're intending your code to run on x86/x64 systems, compiling for 64-bit is probably a good idea. It might even run faster.

So how do you do it?

The compiler options -m64 and -m32 specify compiling for 64-bit or 32-bit execution. And it's important to note that 64-bit and 32-bit objects and libraries cannot be intermixed in a single executable. Also, on Solaris systems -m32 is the default, but on 64-bit x64 Linux systems -m64 -xarch=sse2 is the default.

>f95 -m32 -o ran ran.f
>file ran
ran:    ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped
>f95 -m64 -o ran64 ran.f
>file ran64
ran64:  ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU], dynamically linked, not stripped

It's also most helpful to tell the compiler what processor you intend to run the application on. The default is to produce a generic binary that will run well on most current processors. But that leaves out a lot of opportunities for optimization. As newer and newer processors are made available, new machine instructions or other hardware features are added to the basic architecture to improve performance. The compiler needs to be told whether or not to utilize these new features. However, this can produce backward incompatibilities, rendering the binary code unable to run on older systems. To handle this, application developers will make various binary versions available for current and legacy platforms.

For example, if you compile with the -fast option, the compiler will generate the best code it can for the processor it is compiling on, because -fast includes -xtarget=native. You can override this choice by adding a different -xtarget after the -fast option on the command line (the command line is processed from left to right). For example, to compile for an UltraSPARC T2 system when that is not the native system you are compiling on, use -fast -xtarget=ultraT2.

New processors appear on the scene often, and with each new release of the Sun Studio compilers the list of -xtarget options expands to handle them. These new processor values are usually announced in the Sun Studio compiler READMEs. Tipping off the compiler about the target processor helps performance.

More about -xtarget and what it means next time.

(For details, check the compiler man pages)

Friday Jan 09, 2009

Optimization Levels

Sun Studio compilers provide five levels of optimization, -xO1 through -xO5. Each increasing level adds more optimization strategies for the compiler, with -xO5 being the highest level.

The higher the optimization level, the longer the compilation takes (how much longer depends on the complexity of the source code), which is understandable because the compiler has to do more.

When no optimization level is specified on the command line, the default is to do no optimization at all. This is good when you just want to get the code to compile, checking for syntax errors in the source and the right runtime behavior, with minimal compile time.

So, if you are concerned about runtime performance you need to specify an optimization level at compile time. A good starting point is to use the -fast macro, as described in an earlier post, which includes -xO5, the highest optimization level. Or, compile with an explicit level, like -xO3, which provides a reasonable amount of optimization without increasing compilation time significantly.

But keep in mind that the effectiveness of the compiler's optimization strategies depends on the source code being compiled. This is especially true in C and C++, where the use of pointers can frustrate the compiler's attempts to generate optimal code because of the side effects such optimizations can cause. (But, of course, there are other options, like -xalias_level, that you can use to help the compiler make assumptions about the use of pointers in the source code.)

Another concern is whether or not you might need to use the debugger, dbx, during or after execution of the program.  For the debugger to provide useful information, it needs to see the symbol tables and linker data that are usually thrown away after compilation. The -g debug option preserves these tables in the executable file so the debugger can read them and associate the binary dump file with the symbolic program.

But the optimized code that the compiler generates may mix things up so that it's hard to tell where the code for one source statement starts and another ends. So that's why the compiler man pages talk a lot about the interaction between optimization levels and debugging. With optimization levels greater than 3, the compiler provides best-effort symbolic information for the debugger.

Bottom line, you almost always get better performance by specifying an optimization level (or -fast which includes -xO5) on the compile command.

(Find out more...)

Wednesday Jan 07, 2009

Options Not Optional

Compiler options can be mysterious. They can have kind of a "don't go there" mystique about them. But actually they're there to help. 

There are some things the compiler can't do without help from the programmer. So that's when the compiler designers say "let's leave it to the programmer and create an option". Options also accumulate with time, so that's why there are so many of them. Some are "legacy" options, needed for certain situations that rarely come up these days. But the rest are really quite useful, and can greatly improve the kind of code the compiler generates from your source.

There are compiler command-line options for various things, like code optimization levels and run-time performance, parallelization, numeric and floating-point issues, data alignment, debugging, performance profiling, target processor and instruction sets, source code conventions, output mode, linker and library choices, warning and error message filtering, and more. Choosing the right set of options to compile with can make a great difference on how your code performs on a variety of platforms.

Darryl Gove has a great article on selecting the right compiler options.

Over the next couple of weeks I'll be taking a look at individual compiler options, dissecting them one at a time.

 

In the meantime, you can find a detailed list of Sun Studio 12 compiler options organized by function and source language here.
About


Deep thoughts on compiling C, C++, and Fortran codes with Oracle Solaris Studio compilers, especially optimization and parallelization, from the Solaris Studio documentation lead, Richard Friedman. Email him at
Richard dot Friedman at Oracle dot com

When Run Was A Compiler
