Wednesday Mar 16, 2011

We've Got Articles

Just a reminder that we've got a bunch of technical articles over at the Oracle Technology Network (OTN) regarding Oracle Solaris Studio. These are deep dives into the technology of compilers and application development:

Recently Published

Stability of the C++ ABI: Evolution of a Programming Language (revised March 2011)
As C++ evolved over the years, the Application Binary Interface (ABI) used by a compiler often needed changes to support new or evolving language features. This paper addresses these issues in Oracle Solaris Studio C++, and what you can expect when you develop programs using the Oracle Solaris Studio C++ compiler.

Mixing C and C++ Code in the Same Program (revised February 2011)

Profiling MPI Applications (Updated January 2011)
Profiling of Message Passing Interface (MPI) applications with the Oracle Solaris Studio Performance Tools.

Oracle Solaris Studio Performance Tools
This article describes the kinds of performance questions users typically ask, and then it describes the Oracle Solaris Studio performance tools and shows examples of what the tools can do.

Taking Advantage of OpenMP 3.0 Tasking with Oracle Solaris Studio
A technical white paper that shows how to use Oracle Solaris Studio 12.2 to implement, profile, and debug an example OpenMP program.

Oracle Solaris Studio FORTRAN Runtime Checking Options Whitepaper

Translating gcc/g++/gfortran Options to Oracle Solaris Studio Compiler Options Technical Article

Examine MPI Applications with the Oracle Solaris Studio Performance Analyzer How to Guide

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 1, Compiler Barriers Technical Article

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences Technical Article

Developing Enterprise Applications with Oracle Solaris Studio Whitepaper

Developing Parallel Programs — A Discussion of Popular Models Whitepaper

» See the complete list

Tuesday Feb 08, 2011

Overview of Oracle Solaris Studio Compilers & Tools

There's a great overview of the components and features of Oracle Solaris Studio compilers and tools now available in HTML and PDF:

»Oracle Solaris Studio Overview - HTML - PDF

Oracle Solaris Studio provides everything you need to develop C, C++, and Fortran applications to run on Oracle Solaris 10 on SPARC, x86, and x64 platforms, or on Oracle Linux on x86 and x64 platforms. The compilers and tools are engineered to make your applications run optimally on Oracle Sun systems.

In particular, Oracle Solaris Studio tools are designed to leverage the capabilities of multicore CPUs, including the Sun SPARC T3, UltraSPARC T2, and UltraSPARC T2 Plus processors, and the Intel® Xeon® and AMD Opteron processors. The tools allow you to more easily create parallel and concurrent software applications for these platforms.

The components of Oracle Solaris Studio include:

  • IDE for application development in a graphical environment. The Oracle Solaris Studio IDE integrates several other Oracle Solaris Studio tools and uses Oracle Solaris technologies such as DTrace.

  • C, C++, and Fortran compilers for compiling your code at the command line or through the IDE. The compilers are engineered to work well with the Oracle Solaris Studio debugger (dbx), and include the ability to optimize your code by specifying compiler options (see the brief example below).

  • Libraries to add advanced performance and multithreading capabilities to your applications.

  • Make utility (dmake) for building your code in distributed computing environments at the command line or through the IDE.

  • Debugger (dbx) for finding bugs in your code at the command line, through the IDE, or through an independent graphical interface (dbxtool).

  • Performance tools that employ Oracle Solaris technologies such as DTrace, for use at the command line or through independent graphical interfaces to find trouble spots in your code that you cannot detect through debugging.

These tools together enable you to build, debug, and tune your applications for high performance on Oracle Solaris running on Oracle Sun systems. Each component is described in greater detail later in this document.
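
As a minimal illustration of the command-line compilers listed above, the drivers are cc (C), CC (C++), and f95 (Fortran). The source and output file names here are just placeholders, and -O selects a default level of optimization:

>cc -O -o c_app main.c
>CC -O -o cxx_app main.cc
>f95 -O -o f_app main.f90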

Wednesday Feb 02, 2011

Where To Find Oracle Solaris Studio Resources

Here's where to find information and discussions for the latest Oracle Solaris Studio compilers and tools at its new home on the Oracle Technology Network (OTN).

There are also pages focused on primary topics regarding Solaris Studio compilers and tools:

Building
Oracle Solaris Studio C, C++, and Fortran compilers include advanced features for building applications on Oracle Solaris SPARC and x86/x64 platforms.

Debugging
Successful program debugging is more an art than a science. dbx is an interactive, source-level command-line debugging tool that supports both post-mortem and real-time debugging, plus much more.

Analyzing
Performance analysis is the first step toward program optimization. Oracle Solaris Studio Performance Analyzer can help you assess the performance of your code, identify potential performance problems, and locate the part of the code where the problems occur.

Tuning
Oracle Solaris Studio C, C++, and Fortran compilers offer a rich set of compile-time options for specifying target hardware and advanced optimization techniques. 

Multicore/Parallel Programming
High Performance and Technical Computing (HPTC) applies numerical computation techniques to highly complex scientific and engineering problems. Oracle Solaris Studio compilers and tools provide a seamless, integrated environment from desktop to TeraFLOPS for both floating-point and data-intensive computing.

Computing
The floating-point environment on Oracle Sun SPARC and x86/x64 platforms enables you to develop robust, high-performance, portable numerical applications. The floating-point environment can also help investigate unusual behavior of numerical programs written by others. The Sun Performance Library provides highly optimized versions of many advanced math function routines.

These pages are still under development, and there's more to do. Suggestions are welcome.


Wednesday Jan 21, 2009

Not So Simple - The -fsimple Option

As mentioned earlier, the -fast compiler option is a good way to start because it is a combination of options that result in good execution performance.

But one of the options included in -fast is -fsimple=2. What does this do?

Unless directed to, the compiler does not attempt to simplify floating-point computations (the default is -fsimple=0). -fsimple=2 enables the optimizer to make aggressive simplifications with the understanding that this might cause some programs to produce slightly different results due to rounding effects. 

Here's what the man page says:

-fsimple=1 

Allow conservative simplifications. The resulting code does not strictly conform to IEEE 754, but numeric results of most programs are unchanged.

With -fsimple=1, the optimizer can assume the following:

  • IEEE 754 default rounding/trapping modes do not change after process initialization.

  • Computations producing no visible result other than potential floating point exceptions may be deleted.

  • Computations with NaNs (“Not a Number”) as operands need not propagate NaNs to their results; for example, x*0 may be replaced by 0.

  • Computations do not depend on sign of zero.

With -fsimple=1, the optimizer is not allowed to optimize completely without regard to roundoff or exceptions. In particular, a floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at run time.

-fsimple=2

In addition to -fsimple=1, permit aggressive floating-point optimizations. This can cause some programs to produce different numeric results due to changes in the way expressions are evaluated. In particular, the standard rule requiring compilers to honor explicit parentheses around subexpressions to control expression evaluation order may be broken with -fsimple=2. This could result in numerical rounding differences in programs that depend on this rule.

For example, with -fsimple=2, the compiler may evaluate C-(A-B) as (C-A)+B, breaking the standard’s rule about explicit parentheses, if the resulting code is better optimized. The compiler might also replace repeated computations of x/y with x*z, where z=1/y is computed once and saved in a temporary, to eliminate the costly divide operations.
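
To see what a parentheses-breaking re-association can do to a result, here's a small, self-contained C sketch (my own example, not from the man page; the values are contrived so the effect shows up). Compile it without optimization so both expressions are evaluated exactly as written:

#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from folding these at compile time */
    volatile double A = 1.0e16;  /* large value */
    volatile double B = 1.0;     /* small value, below the spacing of doubles near A */
    volatile double C = 1.0e16;

    /* Explicit parentheses honored: A-B rounds back to 1.0e16, so this is 0 */
    double honored = C - (A - B);

    /* The re-associated form that -fsimple=2 is allowed to substitute:
       C-A is exactly 0, so this is 1 */
    double reassociated = (C - A) + B;

    printf("C-(A-B) = %g\n", honored);
    printf("(C-A)+B = %g\n", reassociated);
    return 0;
}

(In this contrived case the re-associated form happens to be the more accurate one; the point is only that the two results can differ, which is exactly the kind of rounding-dependent divergence the man page describes. Whether -fsimple=2 actually performs this particular transformation on any given expression is up to the optimizer.)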

Programs that depend on particular properties of floating-point arithmetic should not be compiled with -fsimple=2.

Even with -fsimple=2, the optimizer still is not permitted to introduce a floating point exception in a program that otherwise produces none.

So if you use -fast, some programs that are numerically unstable might get different results than when compiled without -fast. If this happens to your program, you can experiment by overriding the -fsimple=2 component of -fast, compiling with -fast -fsimple=0.

Monday Jan 12, 2009

Debugging

Debugging your code is a necessary evil. Things never seem to work the way you expect them to. Programs crash or get the wrong results, leaving you wondering why. So the next step is to invoke a debugger on the code.

Most compilers, like the Sun Studio compilers, have a -g option or equivalent, which adds debugging information, like symbol tables, to the object code. The Sun Studio debugging tool, dbx, reads the object code, symbol tables, and a core dump if available, and tries to locate the spot in the program where it died. Now you can look at what was happening when the code crashed and try to determine the cause.
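
A typical post-mortem session looks something like this (a minimal sketch; the program and file names are hypothetical, and the exact shell and dbx output will vary):

>cc -g -o myprog myprog.c
>./myprog
Segmentation Fault (core dumped)
>dbx myprog core
(dbx) where
(dbx) print some_variable
(dbx) quit

The where command shows the call stack at the point of the crash, and print lets you inspect variables in the current stack frame.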

But debugging code is an art. There's no good book on the subject of debugging code that I know of. Programmers learn by accumulating experience.

Debuggers have typically been command-line tools, like dbx. But it helps to use a GUI debugger that can reference the source code.

A new standalone lightweight GUI debugger, dbxtool, is part of the Sun Studio Express 11/08 release and is fully integrated into the Sun Studio NetBeans-based IDE.

There's a new screencast you can watch to learn about dbxtool and the features of the dbx debugger, presented by Dave Ford from the Sun Studio dbx engineering team. Click on the image to start the screencast.

UPDATE: dbxtool is now part of the current Sun Studio 12 Update 1 release.

Saturday Jan 10, 2009

What Am I Optimizing?

Let's think about this a little bit more.

If I add an optimization option, like -xO3 or -fast, to my compile command-line, what does that actually mean?

Well, it means that everything in that compilation unit (source file) will be compiled with a certain set of optimization strategies. The compiler will try to produce the best code it can at that level. But ambiguities in the source code might inhibit some optimizations because the compiler has to make sure that the machine code it generates will always do the right thing ... that is, do what the programmer expects it to do.

Note that all the routines, functions, modules, procedures, and classes compiled in that compilation unit will be compiled with the same options. In some cases the extra time spent by the compiler might be wasted on routines that are rarely called and do not really participate in the compute-intensive parts of the program.

For short programs, this hardly matters ... compile time is short, and you might only compile infrequently.

But this can become an issue with "industrial-strength" codes consisting of thousands of lines and hundreds of program units (routines, functions, and so on). Compile time might become a major concern, so you probably want to spend the extra compile time only on those routines that factor into the overall performance of the complete program.

That means you really need to know where your program is spending most of its CPU time, and focus your performance optimization efforts primarily on those program units. This goes for any kind of performance optimization ... you do need to know and understand the flow of the program and its footprint.

The Sun Studio Performance Analyzer is the tool to do that. While it does provide extensive features for gathering every piece of information about your program's execution, it also has a simple command-line interface that you can use immediately to find out where the program is spending most of its time.

Compile your code with the -g option (to produce a symbol table) and run the executable under the collect command.

>f95 -g -fixed -o shal shalow.f90

>collect shal

Creating experiment database test.1.er ...

 NUMBER OF POINTS IN THE X DIRECTION     256

 NUMBER OF POINTS IN THE Y DIRECTION     256

....

Running under the collect command generates runtime execution data in test.1.er/ that can be used by the er_print command of the Performance Analyzer:

>er_print -functions test.1.er
Functions sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name  
User CPU  User CPU         
  sec.      sec.      
18.113    18.113     <Total>
 6.805     6.805     calc1_
 6.384     6.384     calc2_
 4.893     4.893     calc3_
 0.020     0.020     inital_
 0.010     0.010     calc3z_
 0.        0.        cosf
 0.        0.        cputim_
 0.        0.        etime_
 0.        0.        getrusage
 0.       18.113     main
 0.       18.113     MAIN
 0.        0.        __rusagesys
 0.        0.        sinf
 0.       18.113     _start


The er_print -functions command gives us a quick way of seeing timings for all routines (this was a Fortran 95 program), including library routines. Right away I know that calc1, calc2, and calc3 do all the work, as expected. But we also see that calc3 is not as significant as calc1. ("Inclusive Time" includes time spent in the routines called by that routine, while "Exclusive Time" only counts time spent in the routine itself, excluding any calls to other routines.)
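
If you want to dig deeper into how time rolls up through the call tree, er_print can also show caller-callee relationships (a sketch using the same experiment; output omitted here):

>er_print -callers-callees test.1.er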

Well, this is a start. Note that no optimization was specified here. Let's see what happens with -fast.

>f95 -o shalfast -fast -fixed -g shalow.f90
>collect shalfast
Creating experiment database test.3.er ...
....
>er_print -functions test.3.er
Functions sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name  
User CPU  User CPU         
 sec.      sec.       
7.695     7.695      <Total>
7.675     7.695      MAIN
0.020     0.020      __rusagesys
0.        0.020      etime_
0.        0.020      getrusage



Yikes! What happened?

Clearly, with -fast the compiler compressed the program as much as it could, replacing the calls to the calc routines by inlining them into the main program. Note also the better-than-2x improvement in performance.

Of course, this was a little toy test program. Things would look a lot more complicated with a large "industrial" program.

But you get the idea.

More information on er_print and collect.

Friday Jan 09, 2009

Optimization Levels

Sun Studio compilers provide five levels of optimization, -xO1 through -xO5. Each increasing level adds more optimization strategies for the compiler, with -xO5 being the highest level.

And the higher the optimization level, the longer the compilation time, depending on the complexity of the source code, which is understandable because the compiler has to do more work.

The default, when an optimization level is not specified on the command line, is to do no optimization at all. This is good when you just want to get the code to compile with minimal compile time, checking for syntax errors in the source and for the right runtime behavior.

So, if you are concerned about runtime performance, you need to specify an optimization level at compile time. A good starting point is the -fast macro, as described in an earlier post, which includes -xO5, the highest optimization level. Or compile with an explicit level, like -xO3, which provides a reasonable amount of optimization without increasing compilation time significantly.
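
For example (a sketch with a placeholder source file), the first command picks a moderate, explicit level, and the second uses the -fast macro together with -g so that dbx and the Performance Analyzer still have symbolic information to work with:

>f95 -xO3 -o app app.f90
>f95 -fast -g -o app app.f90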

But keep in mind that the effectiveness of the compiler's optimization strategies depends on the source code being compiled. This is especially true in C and C++, where the use of pointers can frustrate the compiler's attempts at generating optimal code due to the side effects such optimizations can cause. (But, of course, there are other options, like -xalias_level, that you can use to help the compiler make assumptions about the use of pointers in the source code.)

Another concern is whether or not you might need to use the debugger, dbx, during or after execution of the program.  For the debugger to provide useful information, it needs to see the symbol tables and linker data that are usually thrown away after compilation. The -g debug option preserves these tables in the executable file so the debugger can read them and associate the binary dump file with the symbolic program.

But the optimized code that the compiler generates may mix things up so that it's hard to tell where the code for one source statement starts and another ends. That's why the compiler man pages talk a lot about the interaction between optimization levels and debugging. With optimization levels greater than 3, the compiler provides best-effort symbolic information for the debugger.

Bottom line, you almost always get better performance by specifying an optimization level (or -fast which includes -xO5) on the compile command.

(Find out more...)

About


Deep thoughts on compiling C, C++, and Fortran codes with Oracle Solaris Studio compilers, especially optimization and parallelization, from the Solaris Studio documentation lead, Richard Friedman. Email him at
Richard dot Friedman at Oracle dot com
