By Darryl Gove on Oct 26, 2008
I'll be presenting in Second Life on Tuesday 28th at 9am PST. The title of the talk is "Utilising CMT systems".
OK, there are a few glitches. I need to coach Kuldip on pronouncing my last name 'Gove': think 'mauve' or 'stove' rather than 'love' or 'glove'. Hopefully, they can revise the caption to correct my first name to 'Darryl' rather than 'Darrel', and also to remove the suggestion that I'm a director of anything (still early stages for my plans of world domination). That said, I'm used to my name being mangled, so this is not a big thing.
The compiler flag -fast gets an unfair rap. Even the compiler reports:
cc: Warning: -xarch=native has been explicitly specified, or implicitly specified by a macro option, -xarch=native on this architecture implies -xarch=sparcvis2 which generates code that does not run on pre UltraSPARC III processors
which is hardly fair given that the UltraSPARC III line came out about 8 years ago! So I want to quickly discuss what's good about the option, and what reasons there are to be cautious.
The first thing to talk about is the warning message. -xtarget=native, which -fast implies, is a good option to use when the build platform is also the deployment platform. For me, this is the common case, but for people producing applications that are more generally deployed, it's not. The best way to avoid the warning and produce binaries that work with the widest range of hardware is to add the flag -xtarget=generic after -fast (compiler flags are parsed from left to right, so the rightmost flag is the one that gets obeyed). The generic target represents a mix of all the important processors, so it produces code that should work well on all of them.
The next option in -fast for C that might cause some concern is -xalias_level=basic. This tells the compiler to assume that pointers to different basic types (e.g. integers, floats, etc.) don't alias. Most people code to this, and the C standard actually lets the compiler assume stricter aliasing rules than this. So code that conforms to the C standard will work correctly with this option. Of course, it's still worth being aware that the compiler is making the assumption.
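To make the aliasing assumption concrete, here is a minimal sketch of the kind of code that breaks it and one common way to avoid the problem. The function names are hypothetical and chosen only for illustration, and the example assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Risky: reads a float object through a pointer to a different basic type.
   If the compiler assumes float and uint32_t pointers never alias, it is
   free to reorder or drop accesses around this read. Shown only as the
   pattern to look for. */
uint32_t float_bits_cast(const float *f)
{
    return *(const uint32_t *)f;
}

/* Safer: copy the bytes instead of reinterpreting the pointer. */
uint32_t float_bits_copy(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);  /* assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single precision, float_bits_copy(1.0f) returns 0x3f800000. Code written in the second style does not depend on what the compiler assumes about aliasing.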
The final area is floating point simplification. These are the flags -fsimple=2, which allows the compiler to reorder floating point expressions, -fns, which allows the processor to flush subnormal numbers to zero, and some other flags that use faster floating point libraries or inline templates. I've previously written about my rather odd views on floating point maths. Basically it comes down to this: if these options make a difference to the performance of your code, then you should investigate why they make a difference.
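As a rough illustration of why these flags can change results, the sketch below shows that floating point addition is not associative and what a subnormal value looks like. The function names are hypothetical, and the checks assume strict IEEE 754 double arithmetic; they describe the behaviour the flags are allowed to relax, not what any particular compiler will actually do.

```c
#include <assert.h>
#include <float.h>

/* The same three values summed with two different groupings. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

/* True when x is nonzero but smaller in magnitude than the smallest
   normal double, i.e. a subnormal that flush-to-zero would discard. */
int is_subnormal(double x)
{
    return (x > 0.0 && x < DBL_MIN) || (x < 0.0 && x > -DBL_MIN);
}

/* Checks that hold under strict IEEE 754 double arithmetic. With
   expression reordering or flush-to-zero enabled they may fail. */
void check_strict_ieee(void)
{
    /* 1.0 is lost when added to -1e20, so the grouping matters. */
    assert(sum_left(1e20, -1e20, 1.0) == 1.0);
    assert(sum_right(1e20, -1e20, 1.0) == 0.0);

    /* Halving the smallest normal double yields a subnormal. */
    assert(is_subnormal(DBL_MIN / 2.0));
}
```

If an application's results or run time depend on details like these, that is usually a sign that the calculation itself deserves a closer look.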
Since -fast contains a number of flags that affect performance, it's probably a good plan to identify exactly which of those flags make a difference, and use only those. A tool like ats can really help here.
Dan Berger posted a comment about the compiler flags we'd used for Ruby. Basically, we've not done compiler flag tuning yet, so I'll write a quick outline of the first steps in tuning an application.
These directions are more a list of possible experiments than necessarily an item-by-item checklist, but they form a good basis. And they are not an exhaustive list...