Using pragma opt

The Studio compiler has the ability to control the optimisation level that is applied to particular functions in an application. This can be useful if the functions are designed to work at a specific optimisation level, or if the application fails at a particular optimisation level, and you need to figure out where the problem lies.

The optimisation levels are controlled through pragma opt. The following steps need to be followed to use the pragma:

  • The directive needs to be inserted into the source file. The format of the directive is #pragma opt /level/ (/function/). This needs to be inserted into the code before the start of the function definition, but after the function header.
  • The code needs to be compiled with the flag -xmaxopt=level. This sets the maximum optimisation level for all functions in the file - including those tagged with #pragma opt.

We can see this in action using the following code snippet. This contains two identical functions, both return the square of a global variable. However, we are using #pragma opt to control the optimisation level of the function f().

int f();
int g();

#pragma opt 2 (f)

int d;

int f()
{
  return d\*d;
}

int g()
{
  return d\*d;
}

The code is compiled with the flag -xmaxopt=5, this specifies the maximum optimisation level that can be applied to any functions in the file.

$ cc -O -xmaxopt=5 -S opt.c

If we compare the disassembly for the functions f() and g(), we can see that g() is more optimal as it does not reload the global data.

                        f:
/\* 000000          0 \*/         sethi   %hi(d),%o5

!   10                !  return d\*d;

/\* 0x0004         10 \*/         ldsw    [%o5+%lo(d)],%o4 ! volatile    // First load of d
/\* 0x0008            \*/         ldsw    [%o5+%lo(d)],%o3 ! volatile    // Second load of d
/\* 0x000c            \*/         retl    ! Result =  %o0
/\* 0x0010            \*/         mulx    %o4,%o3,%o0

                        g:
/\* 000000         14 \*/         sethi   %hi(d),%o5
/\* 0x0004            \*/         ld      [%o5+%lo(d)],%o4               // Single load of d

!   15                !  return d\*d;

/\* 0x0008         15 \*/         sra     %o4,0,%o3
/\* 0x000c            \*/         retl    ! Result =  %o0
/\* 0x0010            \*/         mulx    %o3,%o3,%o0
Comments:

Wow, loading d in o4 then copying it to o3 using o3=o4>>0 is one fancy way of loading a variable into o3... Good thing we're not writing so much asm by hand anymore if that's what efficient code looks like (and I haven't seen the new sparc T3 instructions documented anywhere).

Posted by Marc on April 25, 2011 at 06:11 AM PDT #

Yeah, looks a bit rough :)

It's a shift right zero which is used to sign extend to the upper bits of the destination register. A more efficient snippet of code would be to use an signed load (rather than the unsigned load actually used).

Darryl.

Posted by Darryl Gove on April 25, 2011 at 06:20 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
5
6
8
9
10
12
13
14
15
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
Bookmarks
The Developer's Edge
Solaris Application Programming
Publications
Webcasts
Presentations
OpenSPARC Book
Multicore Application Programming
Docs