What Am I Optimizing?

Let's think about this a little bit more.

If I add an optimization option, like -xO3 or -fast, to my compile command-line, what does that actually mean?

Well, it means that everything in that compilation unit (the source file) will be compiled with a certain set of optimization strategies. The compiler will try to produce the best code it can at that level. But ambiguities in the source code might inhibit some optimizations, because the compiler has to make sure that the machine code it generates will always do the right thing, that is, do what the programmer expects it to do.

Note that all the routines, functions, modules, procedures, and classes in that compilation unit will be compiled with the same options. In some cases the extra time the compiler spends optimizing is wasted on routines that are rarely called and do not really participate in the compute-intensive parts of the program.

For short programs, this hardly matters: compile time is short, and you might compile only infrequently.

But this can become an issue with "industrial-strength" codes consisting of thousands of lines and hundreds of program units (routines, functions, etc.). Compile time might become a major concern, so we would probably want to apply the higher optimization levels only to those routines that factor into the overall performance of the complete program.

That means you really need to know where your program is spending most of its CPU time, and focus your performance optimization efforts primarily on those program units. This goes for any kind of performance optimization: you do need to know and understand the flow of the program -- its footprint.

The Sun Studio Performance Analyzer is the tool to do that. While it does provide extensive features for gathering every piece of information about your program's execution, it also has a simple command-line interface that you can use immediately to find out where the program is spending most of its time.

Compile your code with the -g option (to produce a symbol table) and run the executable under the collect command.

>f95 -g -fixed -o shal shalow.f90
>collect shal
Creating experiment database test.1.er ...
1NUMBER OF POINTS IN THE X DIRECTION     256
 NUMBER OF POINTS IN THE Y DIRECTION     256
....

Running under the collect command generates runtime execution data in test.1.er/ that can be used by the er_print command of the Performance Analyzer:

>er_print -functions test.1.er
Functions sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name  
User CPU  User CPU         
  sec.      sec.      
18.113    18.113     <Total>
 6.805     6.805     calc1_
 6.384     6.384     calc2_
 4.893     4.893     calc3_
 0.020     0.020     inital_
 0.010     0.010     calc3z_
 0.        0.        cosf
 0.        0.        cputim_
 0.        0.        etime_
 0.        0.        getrusage
 0.       18.113     main
 0.       18.113     MAIN
 0.        0.        __rusagesys
 0.        0.        sinf
 0.       18.113     _start


The er_print -functions command gives us a quick way of seeing timings for all routines (this was a Fortran 95 program), including library routines. Right away we know that calc1, calc2, and calc3 do all the work, as expected. But we also see that calc3 is not as significant as calc1. ("Inclusive" time includes time spent in the routines called by that routine, while "exclusive" time counts only the time spent in the routine itself, excluding any calls to other routines. That is why main, MAIN, and _start show zero exclusive time but the full 18.113 seconds of inclusive time.)
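
To make the reading of that table concrete, here is a minimal Python sketch that ranks the routines by their share of exclusive time. It is not part of the Performance Analyzer: the data rows are pasted in from the listing above, and the names parse_functions and HOT_THRESHOLD, along with the 5% cutoff, are assumptions made for this illustration only.

```python
# Rank routines from an `er_print -functions` listing by their share of
# exclusive user CPU time. Only the data rows of the table are embedded here.

REPORT = """\
18.113    18.113     <Total>
 6.805     6.805     calc1_
 6.384     6.384     calc2_
 4.893     4.893     calc3_
 0.020     0.020     inital_
 0.010     0.010     calc3z_
 0.        0.        cosf
 0.        0.        cputim_
 0.        0.        etime_
 0.        0.        getrusage
 0.       18.113     main
 0.       18.113     MAIN
 0.        0.        __rusagesys
 0.        0.        sinf
 0.       18.113     _start
"""


def parse_functions(report):
    """Return {name: (exclusive_sec, inclusive_sec)} for each data row."""
    rows = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines
        excl, incl, name = fields
        rows[name] = (float(excl), float(incl))  # float("0.") parses as 0.0
    return rows


rows = parse_functions(REPORT)
total = rows.pop("<Total>")[0]

# HOT_THRESHOLD is an arbitrary cutoff chosen for this sketch.
HOT_THRESHOLD = 0.05
hot = [name for name, (excl, _) in rows.items() if excl / total >= HOT_THRESHOLD]
hot_share = sum(rows[name][0] for name in hot) / total

# Zero exclusive time with nonzero inclusive time: the routine only calls others.
callers_only = [n for n, (excl, incl) in rows.items() if excl == 0.0 and incl > 0.0]

for name in sorted(rows, key=lambda n: rows[n][0], reverse=True)[:5]:
    print(f"{name:10s} {100 * rows[name][0] / total:5.1f}%")
print("hot routines:", hot, f"({100 * hot_share:.1f}% of total)")
print("callers only:", callers_only)
```

With this data the hot list comes out as calc1_, calc2_, and calc3_, which are the routines worth the optimization effort, while main, MAIN, and _start are flagged as callers that do no work of their own.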

Well, this is a start. Note that no optimization was specified here. Let's see what happens with -fast.

>f95 -o shalfast -fast -fixed -g shalow.f90
>collect shalfast
Creating experiment database test.3.er ...
....
>er_print -functions test.3.er
Functions sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name  
User CPU  User CPU         
 sec.      sec.       
7.695     7.695      <Total>
7.675     7.695      MAIN
0.020     0.020      __rusagesys
0.        0.020      etime_
0.        0.020      getrusage



Yikes! What happened?

Clearly, with -fast the compiler compacted the program as much as it could, replacing the calls to the calc routines with inline code in one hunk, so their time now shows up under MAIN. Note also the improvement in performance: total user CPU time dropped from 18.113 to 7.695 seconds, better than 2x.
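
The comparison between the two runs can be sketched in a few lines of Python. The totals are taken from the two er_print listings; inline_into is a hypothetical helper, written only for this illustration, that models what inlining does to a profile: the callees' own time gets charged to the caller and the callees disappear from the list. It says nothing about how the compiler actually decides what to inline.

```python
# Totals reported by er_print for the two runs in this post.
TOTAL_NO_OPT = 18.113  # seconds, compiled with -g only
TOTAL_FAST = 7.695     # seconds, compiled with -fast

speedup = TOTAL_NO_OPT / TOTAL_FAST
print(f"speedup: {speedup:.2f}x")


def inline_into(exclusive, caller, callees):
    """Model inlining: charge the callees' exclusive time to the caller
    and drop the callees from the profile."""
    merged = {name: t for name, t in exclusive.items() if name not in callees}
    merged[caller] = exclusive.get(caller, 0.0) + sum(exclusive[c] for c in callees)
    return merged


# Exclusive times from the unoptimized profile (seconds).
before = {"MAIN": 0.0, "calc1_": 6.805, "calc2_": 6.384, "calc3_": 4.893}
after = inline_into(before, "MAIN", ["calc1_", "calc2_", "calc3_"])
print(after)
```

In this model MAIN ends up holding all of the calc time and the calc routines vanish, which is the same shape as the -fast profile. The real optimized MAIN is much smaller than the modeled sum because -fast also made the code itself faster.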

Of course, this was a little toy test program. Things would look a lot more complicated with a large "industrial" program.

But you get the idea.

More information on er_print and collect.

About


Deep thoughts on compiling C, C++, and Fortran codes with Oracle Solaris Studio compilers, especially optimization and parallelization, from the Solaris Studio documentation lead, Richard Friedman. Email him at
Richard dot Friedman at Oracle dot com

When Run Was A Compiler
