Tuesday Jun 23, 2009

Sun Studio 12 Update 1

Sun Studio 12 Update 1 went live yesterday. It's still a free download, and it's got a raft of new features. Many people will have been using the express releases, so they will already be familiar with the improvements.

It's been about two years since Sun Studio 12 came out, and the most obvious change in that time is the prevalence of multicore processors. I figured the easiest way to discern this would be to look at the submissions of SPEC CPU2006 results in that time period. The following chart shows the cumulative number of SPEC CPU2006 Integer speed results over that time, broken down by the number of threads that the chip was capable of supporting.

OK, the first surprising thing about the chart is that there are very few single-threaded chips. There were a few results when the suite was launched back in 2006, but nothing much since. What is more apparent is the number of dual-thread chips; that was where the majority of the market was. There were also a number of quad-thread chips at that point. If we fast-forward to the situation today, we can see that the number of dual-thread chips has pretty much leveled off, and the bulk of the chips are capable of supporting four threads. But you can see the start of a ramp of chips that are capable of supporting 6 or 8 simultaneous threads.

The relevance of this chart to Sun Studio is that Sun Studio has always been a tool that supports the development of multi-threaded applications, and every release of the product improves on the support in the previous release. Sun Studio 12 Update 1 includes improvements in three areas: the compiler's ability to automatically parallelise codes (after all, the easiest way to develop parallel applications is to have the compiler do it for you); support for parallelisation specifications like OpenMP, with this release supporting the latest OpenMP 3.0 specification; and the tools' ability to give the developer meaningful feedback about parallel code, for example the ability of the Performance Analyzer to profile MPI code.
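As a rough illustration of the kind of loop these features are aimed at, here is a minimal sketch of an OpenMP reduction in C. The function name sum_array is made up for this example. It assumes an OpenMP-aware compiler (with Sun Studio the pragmas are enabled by -xopenmp); a compiler without OpenMP enabled simply ignores the pragma and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum the elements of an array.
   The pragma asks an OpenMP-aware compiler to divide the iterations
   among threads and to combine the per-thread partial sums. */
double sum_array(const double *values, size_t count)
{
    double total = 0.0;
    long i; /* signed index, which every OpenMP version accepts */

    assert(values != NULL || count == 0);

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)count; i++) {
        total += values[i];
    }
    return total;
}
```

The reduction clause is what makes this safe: each thread accumulates into its own copy of total, and the copies are added together at the end of the loop.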

Footnote: SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Benchmark results stated above reflect results posted on www.spec.org as of 15 June 2009.

Wednesday Feb 25, 2009

Maximising application performance

I was asked to provide the material for the 2008-2009 Techdays session "Maximising Application Performance". I recorded this as a presentation last year, and it's now available through SDN. The talk covers basic compiler and profiling material, and is a relatively short 37 minutes in duration.

Wednesday Jan 28, 2009

Tying the bell on the cat

Diane Meirowitz has finally written the document that many of us have either thought about writing, or wished that someone had already written. This is the document that maps gcc compiler flags to Sun Studio compiler flags.

Wednesday Jul 23, 2008

Hot topics presentation: "Compilers, Tools, and Performance"

Just put together the slides for tomorrow's presentation: "Compilers, Tools, and Performance". One of my colleagues has kindly agreed to translate the presentation into Japanese, so I don't expect to get much material into half an hour, since we'll be sharing the time.

Friday Jul 18, 2008

OpenSolaris Hot Topics talk 25th July - Tokyo!

The last few weeks have been quite manic, which is why I've managed to write fewer blog entries than I'd hoped. Unfortunately that's likely to continue for a while longer - I'm traveling to Japan next week for some customer visits. Due to some fortuitous timing, there's an OpenSolaris meeting on Friday the 25th. I've been invited to talk for half an hour.

Monday Jun 09, 2008

Compiler Forensics - update

In my post on compiler forensics I talk about how to find out as much build information as possible from the metadata left in the compiler. Richard Friedman has pointed me to the table which enables you to go from the internal version number to the name of the actual product.

Wednesday May 28, 2008

Updated article on compiler options

An updated version of my article on compiler options has been put up. I think I last refreshed this a couple of years back. It's interesting to see what difference a couple of years makes. I'd started the update with the expectation that things would have become much simpler. Although this is true, there's still some complexity left. The major changes were:

  • More focus on -xtarget=generic. Really that's the best option to start with and to use when possible. But there are exceptions....
    • For x86 it's a good plan to use -xarch=sse2 -xvector=simd to get the compiler to use the SSE2 instruction set which is common on all recent x86 processors.
    • For SPARC, the SPARC64 VI processor supports floating point multiply accumulate instructions. These are a boost for floating point codes, and are generated under the flags -xarch=sparcfmaf -xfma=fused.
  • The compiler is pretty good at generating prefetch instructions by default, so there's no real need to emphasise the prefetch flags.
  • Profile feedback and crossfile optimisation are pretty common options, and should be considered for all applications.
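To make the x86 and SPARC exceptions above a little more concrete, here is a sketch of the sort of loop they are aimed at. The function name scale_and_add is hypothetical. Each iteration is one multiply feeding one add, which is the pattern that SIMD instructions (under -xarch=sse2 -xvector=simd) or a fused multiply-add (under -xarch=sparcfmaf -xfma=fused) can serve. Whether a given compiler actually vectorises or fuses it depends on the optimisation level and the target, so treat this as an illustration rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
   The restrict qualifiers promise the compiler that x and y do not
   overlap, which removes one common obstacle to vectorisation. */
void scale_and_add(float a, const float *restrict x,
                   float *restrict y, size_t n)
{
    size_t i;

    assert(n == 0 || (x != NULL && y != NULL));

    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```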

Monday May 19, 2008

Compiler forensics

If you need to find out which version of the compiler is installed use:

$ cc -V
cc: Sun C 5.9 SunOS_sparc 2007/05/03

I've not been able to find a table which maps the version numbers back to the product names. For the record this is Sun Studio 12.

A more interesting question is what compiler generated an executable.

$ mcs -p test.o
acomp: Sun C 5.8 2005/10/13
as: Sun Compiler Common 11 2005/10/13

The test file was generated by Sun Studio 11.

Finally, what flags were used to generate an executable? Use dwarfdump and grep for "command" for binaries generated with Sun Studio 12, or for C code compiled with Sun Studio 11.

$ dwarfdump test.o|grep command
                DW_AT_SUN_command_line       /opt/SUNWspro/prod/bin/cc -xtarget=generic64 -c  test.c
      <   13>   DW_AT_SUN_command_line      DW_FORM_string

For older compilers use dumpstabs and grep for "CMD":

$ dumpstabs getr.o|grep CMD
   2:  .stabs "/export/home; /opt/SUNWspro/bin/../prod/bin/cc -c -O  getr.c",N_CMDLINE,0x0,0x0,0x0

Of course it's very unlikely that shipped binaries will contain information about compiler flags.

Thursday May 15, 2008

Redistributable libraries

Steve Clamage and I just put together a short article on using the redistributable libraries that are shipped as part of the compiler. The particular one we focus on is stlport4 since this library is commonly substituted for the default libCstd.

There are two points to take away from the article. First of all, that the required libraries should be copied into a new directory structure for distribution with your application - this makes it easy to patch them, and ensures that the correct version is picked up. The second point is to use the $ORIGIN token when linking the application to specify the path, relative to the location of the executable, where the library will be found at runtime.

Runtime linking is one of my bugbears. I really get fed up with software that requires libraries to be located in particular places in order for it to run, or worse, software that requires LD_LIBRARY_PATH to be set for the application to locate the libraries (see Rod Evans' blog entry).

Monday Apr 28, 2008

static and inline functions

Hit a problem when compiling a library. The problem is with mixing static and inline functions, which is not allowed by the standard, but is allowed by gcc. Example code looks like:

char * c;

static void foo(char *);

inline void work()
{
  foo(c);
}

void foo(char* c)
{
}

When this code is compiled it generates the following error:

% cc s.c
"s.c", line 7: reference to static identifier "foo" in extern inline function
cc: acomp failed for s.c

It turns out that there is a workaround for this problem, which is the flag -features=no%extinl. Douglas Walls describes the issue in much more detail.
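Where the source can be changed, another option is to give the inline function internal linkage as well, since the restriction in the standard applies only to inline functions with external linkage. A minimal sketch follows; run_work and the calls counter are hypothetical additions so that the effect of the call can be observed.

```c
#include <assert.h>
#include <stddef.h>

static char text[] = "example";
static char *c = text;
static size_t calls = 0;   /* hypothetical counter, for observation only */

static void foo(char *s);

/* static inline: internal linkage, so referring to the static
   function foo is permitted by the standard. */
static inline void work(void)
{
    foo(c);
}

static void foo(char *s)
{
    assert(s != NULL);
    calls++;
}

/* Hypothetical externally visible entry point. */
size_t run_work(void)
{
    work();
    return calls;
}
```

The trade-off is that each translation unit including such a definition gets its own copy of work, which is usually acceptable for small helper functions.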

Monday Mar 17, 2008

auto_ilp32, unsafe for production?

A while back I was reading up on the Intel compiler's -auto_ilp32 flag. This flag produces a binary that enjoys the benefits of the EM64T instruction set extensions, but uses a 32-bit memory model. The idea of the flag is to get the performance from the instruction set while avoiding the increase in memory footprint that comes from 64-bit pointers. I'm totally in favour of the idea; after all, that's the idea behind the v8plus/sparcvis architecture.

However, I was a bit distressed at the details of the implementation. The flag tells the compiler to make two assumptions: firstly, that longs can be constrained to 4 bytes (rather than 8 bytes); secondly, that pointers can also be held in 4 bytes (rather than 8).

The assumption for longs can be justified as follows: if the application works when compiled for IA32, then any longs in the program do fit into 32 bits, so using only 32 bits for longs is OK.

The docs place this restriction on the use of the flag for pointers:

Specifies that the application cannot exceed a 32-bit address space, which allows the compiler to use 32-bit pointers whenever possible.

The idea is that if the application only uses a 32-bit memory range, then the upper bits of the 64-bit pointer are going to be zero anyway, so why store them?

The problem with both of these is that the code ends up being run as if it were a 64-bit application. So the OS thinks the app is 64-bit, so will quite happily pass 8 byte longs to the app, or pass memory that is not in the low 32-bit addressable memory.

To go into a couple of situations where this could be 'bad':

  • Imagine a program which relies on a zero return value from malloc to tell it to stop allocating memory (perhaps it caches data, or maybe the algorithm it chooses depends on how much memory is available). Under -auto_ilp32, the OS keeps returning new memory, so the application thinks it has not run out of memory, but in fact it is getting memory that is no longer addressable using just 32-bits.
  • Consider an application which allocates a chunk of memory to handle a problem. The larger the problem, the more memory is required. At some point the size of the problem will require a chunk of memory that is not entirely 32-bit addressable. Because malloc does not return zero, the application has no way of knowing that the problem is too large.
  • The application takes the pointers to memory that is not 32-bit addressable, and drops the upper 32-bits. Now it has a pointer into 32-bit addressable memory that has already been used for other data, and it starts storing new data on top of the old data.
  • Imagine another situation where the application is profiled, and it's realised that too much time is spent in the memory management library. So the developer produces a new library and gets the application to use this in preference to the system provided library. The only side-effect of this library is that it starts allocating memory at a higher address, which doesn't matter if you have a 64-bit address, but it eats up a sizable amount of the 32-bit addressable space.

In many of these cases the memory corruption problem would just leap out, and the user would know to remove the flag, but there will be some cases where the flag could cause silent data corruption.
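The silent corruption case can be simulated without the flag itself. The sketch below is only a model of the failure, not of the Intel compiler's behaviour: addresses are plain 64-bit integers, the function names are made up, and low_memory stands in for the 32-bit addressable region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit address, as storing it in a
   4-byte pointer would. */
uint32_t truncate_address(uint64_t address)
{
    return (uint32_t)(address & 0xFFFFFFFFu);
}

/* Hypothetical model: write one byte through a truncated pointer into
   a buffer that stands in for the low, 32-bit addressable memory. */
void store_truncated(unsigned char *low_memory, size_t size,
                     uint64_t address, unsigned char value)
{
    uint32_t offset = truncate_address(address);

    assert(low_memory != NULL && offset < size);
    low_memory[offset] = value;
}
```

A store aimed at address 0x100000008 lands at offset 8, on top of whatever the application had already placed there, and nothing reports an error.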

But hang on a second ... didn't I just say that SPARC has the same hybrid? Well, not quite. v8plus is, as its name suggests, built on SPARC V8, so the OS thinks that it is a 32-bit application. Hence longs are 32-bits in size, as are pointers. There's some code necessary to store the additional state of a V9 processor. But basically, the application is a 32-bit app which happens to use a few more instructions.

The thing that's frustrating is that it's quite possible to make the OS aware of the 32-bit/64-bit hybrid, or to engineer a layer to be able to safely run a 32-bit hybrid app with the OS believing it to be a 64-bit app, but that was not the approach taken. A bit of a missed opportunity, IMO.

Monday Jan 28, 2008

Sample chapter from Solaris Application Programming available

There's a sample chapter from my book up on sun.com/books.

It's chapter 4, which discusses the tools that come with Solaris and Sun Studio. The chapter exists because I find that there are some tools that I use every day, some tools that I might touch once a month, and some that I use even more rarely. The problems I hit are:

  • What was the name of the tool which ....?
  • What are the command line options to ...?
  • Is there a tool to ....?

Obviously I hit the third problem very infrequently, but I'm sometimes surprised when I discover a tool which I'd previously never heard of which just happens to do exactly what I need. Anyway I hope you find the chapter useful. It's one of my two solutions to this problem.

The other solution is spot, which attempts to collect all the data that you routinely need for performance analysis of an application. It calls the other tools, so you don't need to know the command lines or the names of the tools. One of the things that should be noticeable with spot is that it has few command line options. I was hoping that we'd end up with none, but some are inevitable, and those are really house-keeping options (where to put the report, what to call it). The only other option is -X, which generates an extended report. Given the time it can take to get the data, it seemed appropriate to do the high value stuff quickly, with an option for the tool to take longer when the user specified that it was OK.

Tuesday Nov 06, 2007

Values defined by the compiler

The Sun Studio compilers have a number of default #defines included in the header files; more details on these can be found here.
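As a small sketch of how such predefined values are typically used, the code below tests a handful of them to report the compiler and target. The function names are hypothetical, and only a few well-known macros are checked (__SUNPRO_C, __GNUC__, __sparc, __x86_64, __amd64, __i386); it is not a complete list.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: name the compiler from its predefined macro. */
const char *compiler_name(void)
{
#if defined(__SUNPRO_C)
    return "Sun Studio C";
#elif defined(__GNUC__)
    return "GCC-compatible";
#else
    return "unknown compiler";
#endif
}

/* Hypothetical helper: name the target architecture. */
const char *target_name(void)
{
#if defined(__sparc)
    return "SPARC";
#elif defined(__x86_64) || defined(__x86_64__) || defined(__amd64)
    return "x86-64";
#elif defined(__i386) || defined(__i386__)
    return "x86";
#else
    return "unknown target";
#endif
}

/* Write "<compiler> on <target>" into buffer; returns snprintf's result. */
int describe_build(char *buffer, size_t size)
{
    assert(buffer != NULL && size > 0);
    return snprintf(buffer, size, "%s on %s",
                    compiler_name(), target_name());
}
```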

Tuesday Oct 09, 2007

Compiling for the UltraSPARC T2

Today, Sun launched systems based on the UltraSPARC T2. A question that is bound to come up is what compiler flags should be used for the processor?

Sun Studio 12 has the flag -xtarget=ultraT2 to specifically target the UltraSPARC T2. But before jumping off and using this flag, let's take the flag apart and see what it actually means. There are three components that are set by the -xtarget flag:

  • -xcache flag. This flag tells the compiler to target a particular cache configuration. The flag will have an impact on floating point code where the loops can be tiled to fit into cache. Obviously not all codes are amenable to this optimisation, so the -xcache setting is usually unimportant.
  • -xchip flag. This sets the instruction latencies and instruction selection preferences. The UltraSPARC T2 (in common with the UltraSPARC T1) has a simple pipeline so there is nothing much to gain from accurately modelling the instruction latencies. There are also no real situations where it will do better with one instruction sequence in preference to another (unless one is longer than the other). So for the UltraSPARC T2 this flag has little impact on the generated code.
  • -xarch flag. The -xarch flag controls the target architecture. This is traditionally used principally to control whether 32-bit or 64-bit binaries are generated. However, Sun Studio 12 introduced the flags -m32 and -m64 to separate the address-size of the binary from the instruction set selection. There are no UltraSPARC T2 specific instructions which the compiler currently generates, so the default of the SPARC V9 ISA is fine.

To summarise, there is an UltraSPARC T2 specific compiler flag, but for most situations the best target to use would be -xtarget=generic which should give good performance over a wide range of processors.

Thursday Aug 16, 2007

Presenting at Stanford HPC conference

I'll be presenting at Stanford next week as part of their HPC conference (Register here). I plan to cover:

Wednesday Jul 25, 2007

List of Sun Studio redistributable libraries

List of libraries that are included with Sun Studio, and can be redistributed with applications compiled by Sun Studio.

All the documentation for Sun Studio 12.

Tuesday Jul 24, 2007

CMT Developer Tools

We've just released the CMT Developer Tools for Sun Studio 12. The tools are available for both SPARC and x64 systems, although not all tools are on the x64 platform. The list of tools is as follows:

  • ATS - Automatic Tuning and Troubleshooting System. Automatically finds the best compiler flags or locates optimisation bugs in an application without access to the source code. (SPARC and x64)
  • SPOT - Simple Performance Optimisation Tool. Generates an html report detailing where the application is spending time, and information about why it might be spending time there. (SPARC and x64)
  • BIT - Binary Improvement Tool. Reports information about application coverage, and also produces instruction execution counts, branch taken data etc. (SPARC only)
  • Discover - Sun Memory Error Discovery Tool. Detects memory access issues in an application. Examples are accesses to uninitialised memory, accesses beyond array bounds, memory leaks, etc. (SPARC only)

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
