How To Compile for Performance: Compiler Options For Beginners

When I build source code and want the best performance, I use the Sun Studio compilers, especially on Solaris and SPARC.

Sun Studio offers a unique set of optimization features tailored to the processor instruction set, which help me squeeze the best performance out of C, C++, or Fortran code. Yet these options are so numerous that looking into them can be a bit daunting.

If you are in a rush, you can use the -fast option. What it really does is trigger a set of other options for maximum runtime performance. These options can be listed with:

$ CC -fast -dryrun
### command line files and options (expanded):
### -xO5 -xarch=sparc -xcache=8/16/4:3072/64/12 -xchip=ultraT1 ... -dryrun

Yet -fast has its own drawbacks. First, the options it triggers might change from one compiler release to another. Also, the values for -xarch, -xcache, and -xchip specify the processor for which to optimize, and -fast chooses these values based on the processor on which the compiler runs, which can differ from the processor on which the code will eventually be executed. This is why I usually stay away from -fast.
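One way to spot the mismatch described above is to save the -dryrun expansion on the build machine and on the runtime machine, then compare the processor-specific flags. The following is only a minimal sketch: target_flags and same_target are hypothetical helper names, not Sun Studio tools, and the second sample line is a made-up stand-in for output captured on a different machine.

```shell
#!/bin/bash
# Sketch: compare the processor-specific flags of two saved "-dryrun" expansions.
# target_flags and same_target are illustrative names, not compiler tools.

# Print the -xarch, -xcache and -xchip flags found in the text on stdin,
# one per line, sorted so that two expansions can be compared.
target_flags() {
  tr -s ' \t' '\n\n' | grep -e '^-xarch=' -e '^-xcache=' -e '^-xchip=' | sort
}

# Succeed (exit status 0) when both expansions target the same processor.
same_target() {
  [ "$(printf '%s\n' "$1" | target_flags)" = "$(printf '%s\n' "$2" | target_flags)" ]
}

# Sample expansions: the first comes from the output shown earlier,
# the second is a made-up example standing in for another machine.
build_host='### -xO5 -xarch=sparc -xcache=8/16/4:3072/64/12 -xchip=ultraT1'
run_host='### -xO5 -xarch=sparc -xcache=64/32/4:2048/64/4 -xchip=ultra3'

if same_target "$build_host" "$run_host"; then
  echo "same target"
else
  echo "targets differ"
fi
```

On real machines, the two strings would come from running CC -fast -dryrun 2>&1 on each host instead of being hard-coded.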

Instead, here is a basic set of rules to easily decide on optimization options.

First, if some binary code already exists, I run a quick sanity check to see which options were used to build it. On a non-stripped executable, library, or object file, I run the following commands:

$ dump -C Bar.o     # for C++ code
...
<122> .../tmanfe/; /opt/SUNWspro/bin/CC -G -xtarget=native -compat=4 -xO4 Bar.cpp

$ dwarfdump getpagesize     # for C code
...
DW_AT_SUN_command_line /opt/SUNWspro/bin/cc -c -xarch=sse2 -m32 -xO3 +w getpagesize.c


Make sure the -g option is not present. This option tells the compiler to compile for debugging, which in turn disables some optimizations. Also verify that -xOn (with n=[1|2|3|4|5]) is present: this turns on generic optimization. -xO1 and -xO2 are conservative. -xO5 is aggressive and may lead to performance degradation, so I don't recommend using it for a complete application. Limit its use to specific portions of code that are known to be heavily used and to benefit from optimization. I usually pick -xO3 as a basic level of optimization.
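The two checks above (no -g, some -xOn present) can be automated once the command line has been copied out of the dump or dwarfdump output. This is a rough sketch under that assumption: check_build_line is a hypothetical helper name, and it only looks for the exact -g and -xO1 to -xO5 spellings discussed here.

```shell
#!/bin/bash
# Sketch: review a compiler command line recovered with dump or dwarfdump.
# check_build_line is an illustrative name, not a Sun Studio tool.

check_build_line() {
  local line=$1 word debug=no level=""
  local words=()
  # Split the command line into words without glob expansion.
  read -r -d '' -a words <<< "$line" || true
  for word in "${words[@]}"; do
    case $word in
      -g) debug=yes ;;                      # compiled for debugging
      -xO[1-5]) level=${word#-xO} ;;        # generic optimization level
    esac
  done
  if [ "$debug" = yes ]; then
    echo "warning: -g present, some optimizations may be disabled"
  fi
  if [ -z "$level" ]; then
    echo "warning: no -xOn option, generic optimization is off"
  else
    echo "optimization level: -xO$level"
  fi
}

# The command line recorded in getpagesize (from the dwarfdump output above).
check_build_line '/opt/SUNWspro/bin/cc -c -xarch=sse2 -m32 -xO3 +w getpagesize.c'
```

Note that the match is case-sensitive, so the -G option in the Bar.o example is not mistaken for -g.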

Use the -xarch and -xchip options that match the targeted runtime processor. -xarch specifies the instruction set to be used, while -xchip specifies the scheduling, that is, the ordering of the instructions.

The best value for -xarch can be found by running CC -xtarget=native -dryrun on the runtime platform. Here is a code snippet that does the work for you:

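#!/bin/bash
# Print the -xarch value that -xtarget=native expands to on this machine.
for flag in `CC -xtarget=native -dryrun 2>&1 | grep xarch`
do
  if echo $flag | grep xarch >/dev/null ; then
    target=`echo $flag | grep xarch`
  fi
done
length=${#target}
# Skip the 7 characters of the "-xarch=" prefix.
echo ${target:7:$length}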

The right value for -xchip is found the same way:

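#!/bin/bash
# Print the -xchip value that -xtarget=native expands to on this machine.
for flag in `CC -xtarget=native -dryrun 2>&1 | grep xchip`
do
  if echo $flag | grep xchip >/dev/null ; then
    target=`echo $flag | grep xchip`
  fi
done
length=${#target}
# Skip the 7 characters of the "-xchip=" prefix.
echo ${target:7:$length}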

The two code snippets above can be used to set up your compiler flags dynamically when generating Makefiles, but again, make sure to run them on the processor targeted for runtime.
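As an illustration of the Makefile use mentioned above, here is a hedged sketch that turns a -dryrun expansion into a small Makefile fragment. The helper names flag_value and emit_makefile_flags, the Makefile variables OPT_LEVEL and TARGET_FLAGS, and the file name flags.mk are all assumptions made for this example; -xO3 is hard-coded only because it is the level suggested earlier.

```shell
#!/bin/bash
# Sketch: generate Makefile variables from a "-dryrun" expansion.
# flag_value and emit_makefile_flags are illustrative names; OPT_LEVEL and
# TARGET_FLAGS are hypothetical Makefile variables.

# Print the value of the first "-NAME=value" flag found in the text $2.
flag_value() {
  local name=$1 word
  local words=()
  # Split on spaces, tabs and newlines without glob expansion.
  read -r -d '' -a words <<< "$2" || true
  for word in "${words[@]}"; do
    case $word in
      "-$name="*) echo "${word#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Print a Makefile fragment; fail if -xarch or -xchip is missing.
emit_makefile_flags() {
  local expansion=$1 arch chip
  arch=$(flag_value xarch "$expansion") || return 1
  chip=$(flag_value xchip "$expansion") || return 1
  printf 'OPT_LEVEL = -xO3\n'
  printf 'TARGET_FLAGS = -xarch=%s -xchip=%s\n' "$arch" "$chip"
  printf 'CFLAGS = $(OPT_LEVEL) $(TARGET_FLAGS)\n'
}

# Demo with the sample expansion shown earlier. On the runtime machine
# this could instead be:
#   emit_makefile_flags "$(CC -xtarget=native -dryrun 2>&1)" > flags.mk
emit_makefile_flags '### -xO5 -xarch=sparc -xcache=8/16/4:3072/64/12 -xchip=ultraT1'
```

Keeping the parsing in a function that takes the expansion as text means the fragment can be generated once on the runtime machine and then reused on the build machine.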

This type of generic optimization usually brings a performance gain of between 10 and 20%, and it also sets the baseline for more sophisticated optimizations that focus on the portions of code that are most heavily used during execution. These portions of code can be identified by running the Sun Studio Collector and Performance Analyzer on your code: no need to instrument your binary, no need to recompile for profiling. Just run the collect utility on the optimized binary you generated. Simple and easy!
