See Compiler Run, Run Compiler, Run...

If you've ever wondered what the compiler is doing when it optimizes your code, you can use the command-line tool, er_src, which is part of the Sun Studio Performance Analyzer, to view the "compiler commentary".

Just compile with an optimization level and -g, then pass the resulting object file to er_src.

>f95 -O3 -g -c fall.f95 ; er_src fall.o
Source file: fall.f95
Object file: fall.o
Load Object: fall.o

     1.         parameter (n=100)
        <Function: MAIN>
     2.         real psi(n,n)
     3.         a = 1E6
     4.         tpi = 2*3.14159265
     5.         di = tpi/float(n)
     6.         dj = di

    Source loop below has tag L1
    Source loop below has tag L2
    L1 could not be pipelined because it contains calls
     7.     forall (j=1:n, i=1:n) psi(i,j)= a*sin((float(i)-.5) * di) * sin((float(j)-.5)*dj)
     8.         print*, psi(50,50)
     9.         end

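This is a little test example using a Fortran 95 FORALL loop, compiled at optimization level -O3. Beyond tagging the loops, the only commentary at this level is that L1 could not be pipelined because it contains calls, presumably the calls to sin().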

Let's try it again, but this time with -fast for full optimization:

>f95 -fast -g -c fall.f95 ; er_src fall.o
Source file: fall.f95
Object file: fall.o
Load Object: fall.o

     1.         parameter (n=100)
        <Function: MAIN>
     2.         real psi(n,n)
     3.         a = 1E6
     4.         tpi = 2*3.14159265
     5.         di = tpi/float(n)
     6.         dj = di

    Source loop below has tag L1
    Source loop below has tag L2
    L1 fissioned into 2 loops, generating: L3, L4
    L1 transformed to use calls to vector intrinsics: __vsinf_
    L4 scheduled with steady-state cycle count = 2
    L4 unrolled 3 times
    L4 has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 1 FPmuls, and 0 FPdivs per iteration
    L4 has 0 int-loads, 0 int-stores, 4 alu-ops, 0 muls, 0 int-divs and 0 shifts per iteration
    L3 scheduled with steady-state cycle count = 4
    L3 unrolled 2 times
    L3 has 0 loads, 1 stores, 0 prefetches, 3 FPadds, 1 FPmuls, and 0 FPdivs per iteration
    L3 has 0 int-loads, 0 int-stores, 3 alu-ops, 0 muls, 0 int-divs and 0 shifts per iteration
     7.     forall (j=1:n, i=1:n) psi(i,j)= a*sin((float(i)-.5) * di) * sin((float(j)-.5)*dj)
     8.         print*, psi(50,50)
     9.         end

There's a lot more going on here. Note that the compiler splits ("fissions") the FORALL into two loops, L3 and L4, and then unrolls them. It also uses a vector version of the sin() function, __vsinf_, to process a bunch of arguments in a single call.
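
To make the idea of loop fission a bit more concrete, here is a hand-written sketch of what splitting this FORALL might look like at the source level. It is only an illustration, not the code the compiler actually generates: the temporaries arg and s, the comparison array ref, and the program name are made up for this example, a whole-array sin() stands in for the vector intrinsic call, and unrolling is not shown.

```fortran
program fission_sketch
  implicit none
  integer, parameter :: n = 100
  real :: psi(n,n), ref(n,n)
  real :: arg(n), s(n)      ! hypothetical temporaries, not compiler-generated names
  real :: a, tpi, di, dj
  integer :: i, j

  a   = 1E6
  tpi = 2*3.14159265
  di  = tpi/float(n)
  dj  = di

  ! Original form, as in fall.f95, kept here as a reference result
  forall (j=1:n, i=1:n) ref(i,j) = a*sin((float(i)-.5)*di) * sin((float(j)-.5)*dj)

  ! First loop: only computes and stores the sin() arguments
  do i = 1, n
     arg(i) = (float(i) - .5)*di
  end do

  ! One whole-array call stands in for the vector sin() intrinsic
  s = sin(arg)

  ! Second loop: load a sin() value, multiply, store.
  ! Because dj equals di here, the same table s serves both i and j.
  do j = 1, n
     do i = 1, n
        psi(i,j) = s(i) * (a*s(j))
     end do
  end do

  ! The two versions should agree to within single-precision rounding
  print *, 'max abs difference:', maxval(abs(psi - ref))
  print *, psi(50,50)
end program fission_sketch
```

The point of the split is that the sin() calls no longer sit inside the loop that fills psi, so each of the simple loops can be pipelined and unrolled, which is what the commentary reports for L3 and L4.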

While the compiler commentary can get somewhat cryptic, you can still get a feel for the kinds of optimizations the compiler is performing on your code.

It's also useful when using the auto-parallelization options; we'll have more to say about that. Either way, it's worth using er_src to get an idea of what the compiler can and cannot do. And don't forget to also compile with -g.

About


Deep thoughts on compiling C, C++, and Fortran codes with Oracle Solaris Studio compilers, especially optimization and parallelization, from the Solaris Studio documentation lead, Richard Friedman. Email him at
Richard dot Friedman at Oracle dot com

When Run Was A Compiler
