The Sun Studio compilers and performance - part 1

This is a big topic and I plan to cover it in bits and pieces as time goes by.

With Sun Studio on Solaris we ship compilers for Fortran, C and C++. These are Sun-developed products and have come a long way. The compilers support SPARC processors, certain Intel processors and the AMD Opteron processor. In the remainder I will focus on SPARC and AMD.

When it comes to performance, the -fast option is key. This option is really a macro that expands to a series of options. What it expands to depends on the platform, language and compiler release. The expansion typically changes from release to release, for example because we get a better grip on the heuristics behind some of the options implied by -fast, or because we decide to pull an additional option in under -fast.

Decisions like this are based on the outcome of extensive performance tests covering a wide range of applications.

In general, -fast is meant to give good performance across a diverse range of applications, but there are always exceptions. Exploring some other or additional options is always recommended.

I also recommend linking with -fast. This ensures the optimal libraries are linked in.

In the future I will write more about what goes on under the hood of -fast. Before I do that, however, I want to draw attention to two other options.

The -xarch option is used to select the instruction set. Various processors support various instruction sets.

On SPARC for example we have -xarch=v8plusb for 32-bit and -xarch=v9b for 64-bit. These are the most recent instruction sets available. They are supported on all UltraSPARC III based systems and follow-on processors. For example, UltraSPARC III Cu, UltraSPARC IIIi, UltraSPARC IV and the recently announced UltraSPARC IV+ processor all support these instruction sets.
Several older SPARC instruction sets are also accessible through the -xarch option. Check the documentation for a list.

I often get asked about the performance of 64-bit. There are (too) many misconceptions floating around about this. Basically, 64-bit means the address range is expanded from 32 to 64 bits. This gives a dramatic increase in the number of addresses that can be covered and therefore one can use more memory. Nothing more, nothing less. The increase in address space is the main reason to go to 64-bit.

In some cases, 64-bit may cause a slight performance degradation. A pointer is an address, so in a 64-bit program each pointer occupies 64 bits in the cache instead of 32. For pointer data the effective cache capacity is therefore cut in half, and pointer-intensive applications will be affected by that.

On the other hand, on some processors 64-bit means more than an increase in address space, and AMD's Opteron processor is a good example of that.

Just like SPARC, the AMD Opteron is a 64-bit processor. This is why the Sun compilers provide the -xarch=amd64 option to use the 64-bit instructions and extensions from AMD. Several 32-bit instruction sets are supported as well, for example -xarch=sse2 for those processors that support the SSE2 instruction set.
In general, however, it is recommended to use -xarch=amd64 on AMD Opteron. This not only gives a much bigger address range, but also exploits the architectural enhancements AMD has put in.

The choice of -xarch also controls compatibility. Typically, a more recent instruction set is not supported on older processors. For example, -xarch=v8plusb is not supported on UltraSPARC II. SPARC processors are backward compatible, however: if one compiles and links with -xarch=v8plusa, for example, the resulting binary will run on UltraSPARC II as well as on more recent SPARC processors.

Of course a more recent instruction set tends to have richer features and therefore a "wrong" choice may affect performance as well.

The last option covered is called -xchip. This instructs the compiler to optimize the instruction schedule for a specific processor. For example, use -xchip=ultra4plus to request that the compiler optimize for the new UltraSPARC IV+ processor, or -xchip=opteron for AMD Opteron. The choice of -xchip also affects some of the higher level optimizations performed by the compiler.
In contrast with -xarch, a "wrong" choice for -xchip does not affect compatibility, but it may impact performance.

In the absence of specific settings for -xarch and/or -xchip, the compiler will select a default.

In summary:

Use -fast to get good performance with one single option. The -xarch option is used to specify the instruction set (32-bit or 64-bit). The -xchip option controls which processor the compiler should optimize for. In the absence of -xarch and/or -xchip the compiler will select a default.



Ruud van der Pas is a Senior Principal Software Engineer in the SPARC Microelectronics organization at Oracle. His focus is on application performance, both for single-threaded and for multi-threaded programs. He is also co-author of the book Using OpenMP.
