Monday Nov 07, 2005

The Sun Studio compilers and performance - part 1

This is a big topic and I plan to cover it in bits and pieces as time goes by.

With Sun Studio on Solaris we ship compilers for Fortran, C and C++. These are Sun-developed products and have come a long way. These compilers support SPARC processors, certain Intel processors and the AMD Opteron processor. In the remainder of this post I will focus on SPARC and AMD.

When it comes to performance, the -fast option is key. This option is really a macro that expands to a series of options. Which options it expands to depends on the platform, language and compiler release. The expansion also typically changes from release to release, for example because we get a better grip on the heuristics behind some of the options implied by -fast, or because we decide to pull an additional option in under -fast.

Decisions like this are based on the outcome of extensive performance tests covering a wide range of applications.

In general, -fast is meant to give good performance across a diverse range of applications, but there are always exceptions. Exploring some other or additional options is always recommended.

I also recommend linking with -fast. This ensures the optimal libraries are linked in.

In the future I will write more about what goes on under the hood of -fast. Before I do that, however, I want to draw attention to two other options.

The -xarch option is used to select the instruction set. Various processors support various instruction sets.

On SPARC for example we have -xarch=v8plusb for 32-bit and -xarch=v9b for 64-bit. These are the most recent instruction sets available. They are supported on all UltraSPARC III based systems and follow-on processors. For example, UltraSPARC III Cu, UltraSPARC IIIi, UltraSPARC IV and the recently announced UltraSPARC IV+ processor all support these instruction sets.
Several older SPARC instruction sets are also accessible through the -xarch option. Check the documentation for a list.

I often get asked about the performance of 64-bit. There are (too) many misconceptions floating around about this. Basically, 64-bit means the address range is expanded from 32 to 64 bits. This gives a dramatic increase in the number of addresses that can be covered and therefore one can use more memory. Nothing more, nothing less. The increase in address space is the main reason to go to 64-bit.

In some cases, 64-bit may give rise to a slight performance degradation. A pointer is an address, so instead of occupying 32 bits in the cache it now occupies 64 bits, twice as much space. For pointer data the effective cache capacity is therefore cut in half, and pointer-intensive applications will be affected by that.
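To make the cache argument a bit more concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: the list node layout (one pointer plus a 4-byte value), the helper names and the rule that a pointer is aligned to its own size are all hypothetical choices of mine, not something a particular compiler guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `align` (align must be > 0). */
size_t round_up(size_t size, size_t align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}

/*
 * Bytes taken by a hypothetical list node { pointer next; 4-byte value; }
 * for a given pointer size. Assumption: the struct is padded to a multiple
 * of the pointer size, which is typical but not guaranteed.
 */
size_t node_bytes(size_t pointer_bytes)
{
    const size_t payload_bytes = 4;
    return round_up(pointer_bytes + payload_bytes, pointer_bytes);
}

/* Number of such nodes that fit in a cache of `cache_bytes` bytes. */
size_t nodes_in_cache(size_t cache_bytes, size_t pointer_bytes)
{
    return cache_bytes / node_bytes(pointer_bytes);
}
```

With these assumptions, nodes_in_cache(65536, 4) and nodes_in_cache(65536, 8) compare how many nodes fit in a 64 KB cache with 4-byte and 8-byte pointers. Data that contains no pointers is not affected in this way.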

On the other hand, on some processors, 64-bit means more than increasing the address space and AMD's Opteron processor is a good example of that.

Just like SPARC, the AMD Opteron is a 64-bit processor. This is why the Sun compilers provide the -xarch=amd64 option to use the 64-bit instructions and extensions from AMD. Several 32-bit instruction sets are supported as well, for example -xarch=sse2 for those processors that support the SSE2 instruction set.
In general, however, it is recommended to use -xarch=amd64 on AMD Opteron. This not only gives a much bigger address range, but also exploits the architectural enhancements AMD has put in.

The choice of -xarch also controls compatibility. Typically, a more recent instruction set is not supported on older processors. For example, -xarch=v8plusb is not supported on UltraSPARC II. The SPARC processors are backward compatible, however. If one compiles and links with -xarch=v8plusa, for example, the resulting binary will run on UltraSPARC II as well as on more recent SPARC processors.

Of course a more recent instruction set tends to have richer features and therefore a "wrong" choice may affect performance as well.

The last option covered is called -xchip. This instructs the compiler to optimize the instruction schedule for a specific processor. For example, use -xchip=ultra4plus to request that the compiler optimize for the new UltraSPARC IV+ processor, or -xchip=opteron for AMD Opteron. The choice of -xchip also affects some of the higher level optimizations performed by the compiler.
In contrast with -xarch, a "wrong" choice for -xchip does not affect compatibility, but may impact performance.

In the absence of specific settings for -xarch and/or -xchip, the compiler will select a default.

In summary:

Use -fast to get good performance with one single option. The -xarch option is used to specify the instruction set (32 or 64-bit). The -xchip option controls which processor the compiler should optimize for. In the absence of -xarch and/or -xchip the compiler will select a default.

Monday Oct 31, 2005

Why another weblog?

My name is Ruud van der Pas. I'm in engineering and have been with Sun for a little over 7 years now. My main interests are in the area of application performance and interval arithmetic/analysis.

I am starting this weblog because I meet a lot of our customers. It is always very inspiring to talk with them. I'm interested to find out what they're doing, how they use our products and what sort of problems they're struggling with. In talking with them, I learn a lot and hopefully they sometimes also pick up something from me.

I realized a weblog could be a convenient and easy way to share some of that information with a larger group of people all around the world. We will see how this works out, but for now I plan to go for it.

Regarding application performance I focus on technical-scientific programs. I'm not only interested in cranking up single processor performance, but also in applying shared memory parallelization, either by asking the Sun compiler to automatically parallelize your application, and/or by using the OpenMP programming model.
I expect both to get increasingly popular, given the CMT technologies that are out there today and what looms on the horizon. Think about it. If you have a chip with multiple cores, isn't it great if you can take advantage of that and speed up a single application? For a while, most people will go for the additional throughput and run several applications (e.g. Mozilla and StarOffice) side by side. But how far can you push that? Eventually you will want one single application to go faster, especially as the number of cores on a chip is going to increase, and then OpenMP provides a very nice solution. I plan to write a lot about that in the future.
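To give a flavor of what speeding up a single application with OpenMP can look like, here is a minimal sketch in C. The function name dot_product is my own choice for illustration. The pragma asks the compiler to divide the loop iterations over the available threads and to combine the per-thread partial sums at the end. I am assuming OpenMP support is enabled at compile time; check the compiler documentation for the option that does this. Without it, the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

/*
 * Dot product of two vectors of length n.
 * With OpenMP enabled, the iterations are shared among the threads and
 * each thread accumulates a private partial sum; the reduction clause
 * adds those partial sums into `sum` when the loop finishes.
 */
double dot_product(const double *x, const double *y, long n)
{
    double sum = 0.0;
    long i;

    assert(n >= 0);

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Note that a parallel sum may add the terms in a different order than the serial loop, so for general floating-point data the result can differ slightly in the last digits.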

My second passion is interval arithmetic and interval analysis. You can expect me to write a lot about that as well. So what is it? Conceptually it is easy. Instead of using a single variable to store some value, you use an interval, say [a,b], to store a range of values.
In many cases, that is a very natural approach, as data is usually not known precisely and/or may fluctuate (think of the wind speeds around a building for example).
Once you do this, a whole new world opens up. And it is a fascinating world. A description of many problems in physics, chemistry and math is often more natural when using intervals, because the parameters in the model are, for example, not known with 100 percent accuracy. So, intervals are not only more natural, one can also solve problems that cannot be solved otherwise.
I'm the first to admit that this is not easy. To me, though, it is the way to go. The fact that it is hard is a challenge that should encourage people to figure things out and make progress.
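The basic idea of computing with ranges instead of single values can be sketched in a few lines of C. This is only a toy illustration: the type and function names are hypothetical, and the sketch ignores rounding. A real interval implementation rounds the lower bound down and the upper bound up, so that the result is guaranteed to contain the true answer.

```c
#include <assert.h>

/* A closed interval [lo, hi] with lo <= hi. */
typedef struct {
    double lo;
    double hi;
} interval;

interval interval_make(double lo, double hi)
{
    interval r;
    assert(lo <= hi);
    r.lo = lo;
    r.hi = hi;
    return r;
}

/* [a,b] + [c,d] = [a+c, b+d] */
interval interval_add(interval x, interval y)
{
    return interval_make(x.lo + y.lo, x.hi + y.hi);
}

/* [a,b] * [c,d] = [min, max] of the four endpoint products. */
interval interval_mul(interval x, interval y)
{
    double p[4];
    double lo, hi;
    int k;

    p[0] = x.lo * y.lo;
    p[1] = x.lo * y.hi;
    p[2] = x.hi * y.lo;
    p[3] = x.hi * y.hi;

    lo = p[0];
    hi = p[0];
    for (k = 1; k < 4; k++) {
        if (p[k] < lo) lo = p[k];
        if (p[k] > hi) hi = p[k];
    }
    return interval_make(lo, hi);
}

/* Returns 1 if v lies inside x, 0 otherwise. */
int interval_contains(interval x, double v)
{
    return x.lo <= v && v <= x.hi;
}
```

For example, with a = [1,2] and b = [-3,4], interval_add(a, b) gives [-2,6] and interval_mul(a, b) gives [-6,8].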

Well, that is it for the first time.



Ruud van der Pas is a Senior Principal Software Engineer in the SPARC Microelectronics organization at Oracle. His focus is on application performance, both for single-threaded and for multi-threaded programs. He is also a co-author of the book Using OpenMP.
