Compiling for Nehalem and other processors
By Darryl Gove-Oracle on Apr 17, 2009
First thing I'd suggest is to make sure that you're using a recent compiler. Practically that means Sun Studio 12 (which is quite old now), or the Sun Studio Express releases (aka Sun Studio 12 Update 1 Early Access). Obviously the latest features are only found in the latest compilers.
In terms of flags, the starting point should be
-fast, you can later trim that if you need to remove floating point simplification, or are not happy with one or other of the flags that it includes.
The next flags to use are:
-xipo=2This flag enables crossfile optimisation and tracking of allocated memory. I find crossfile optimisation useful because it limits the impact of the source code structure on the final application.
-xarch=sse4_2This flag is included in
-fast(assuming that the build system supports it). However, if you later plan to fine tune the compiler flags, it's best to start off with it explicitly. This allows the compiler to use the SSE4 instruction set. Probably
-xarch=sse2will be sufficient in most circumstances - it's a call depending on the system that the application will be deployed on.
-xvector=simdThis flag tells the compiler to generate SIMD (single instruction multiple data) instructions - basically the combination of this and the architecture flag enables the compiler to generate applications that use SSE instructions. These instructions can lead to substantial performance gains in some floating point applications.
-m64On x86 there's performance to be gained from using the 64-bit instruction set extensions. The code gets more registers and a better calling convention. These tend to outweigh the costs of the larger memory footprint of 64-bit applications.
-xpagesize=2MThis tells the operating system to provide large pages to the application.
- The other optimisation that I find very useful is profile feedback. This does complicate and lengthen the build process, but is often the most effective way of getting performance gains for codes dominated by branches and conditional code.
- The other flags to consider are the aliasing flags
-xalias_level=compatiblefor C++, and
-xrestrict. These flags do lead to performance gains, but require the developer to be comfortable that their code does conform to the requirements of the flags. (IMO, most code does.)
All this talk about flags should not be a replacement for what I consider to be the basic first step in optimising the performance of an application: to take a profile. Compiler flags tell the compiler how to do a good job of producing the code, but the compiler can't do much about the algorithms used. Profiling the application will often give a clue as to a way that the performance of the application can be improved by a change of algorithm - something that even the best compiler flags can't always do.