Monday Apr 02, 2012

Efficient inline templates and C++

I've talked before about calling inline templates from C++, I've also talked about calling inline templates efficiently. This time I want to talk about efficiently calling inline templates from C++.

The obvious starting point is that I need to declare the inline templates as being extern "C":

  extern "C"
  {
    int mytemplate(int);
  }

This enables us to call it, but the call may not be very efficient because the compiler will treat it as a function call, and may produce suboptimal code based on that premise. So we need to add the no_side_effect pragma:

  extern "C"
  {
    int mytemplate(int); 
    #pragma no_side_effect(mytemplate)
  }

However, this may still not produce optimal code. We've discussed how the no_side_effect pragma cannot be combined with exceptions, well we know that the code cannot produce exceptions, but the compiler doesn't know that. If we tell the compiler that information it may be able to produce even better code. We can do this by adding the "throw()" keyword to the template declaration:

  extern "C"
  {
    int mytemplate(int) throw(); 
    #pragma no_side_effect(mytemplate)
  }

The following is an example of how these changes might improve performance. We can take our previous example code and migrate it to C++, adding the use of a try...catch construct:

#include <iostream>

extern "C"
{
  int lzd(int);
  #pragma no_side_effect(lzd)
}

int a;
int c=0;

class myclass
{
  int routine();
};

int myclass::routine()
{
  try
  {
    for(a=0; a<1000; a++)
    {
      c=lzd(c);
    }
  }
  catch(...)
  {
    std::cout << "Something happened" << std::endl;
  }
 return 0;
}

Compiling this produces a slightly suboptimal code sequence in the hot loop:

$ CC -O -xtarget=T4 -S t.cpp t.il
...
/* 0x0014         23 */         lzd     %o0,%o0
/* 0x0018         21 */         add     %l6,1,%l6
/* 0x001c            */         cmp     %l6,1000
/* 0x0020            */         bl,pt   %icc,.L77000033
/* 0x0024         23 */         st      %o0,[%l7]

There's a store in the delay slot of the branch, so we're repeatedly storing data back to memory. If we change the function declaration to include "throw()", we get better code:

$ CC -O -xtarget=T4 -S t.cpp t.il
...
/* 0x0014         21 */         add     %i1,1,%i1
/* 0x0018         23 */         lzd     %o0,%o0
/* 0x001c         21 */         cmp     %i1,999
/* 0x0020            */         ble,pt  %icc,.L77000019
/* 0x0024            */         nop

The store has gone, but the code is still suboptimal - there's a nop in the delay slot rather than useful work. However, it's good enough for this example. The point I'm making is that the compiler produces the better code with both the "throw()" and the no side effect pragma.

Pragmas and exceptions

The compiler pragmas:

  #pragma no_side_effect(routinename)
  #pragma does_not_write_global_data(routinename)
  #pragma does_not_read_global_data(routinename)

are used to tell the compiler more about the routine being called, and enable it to do a better job of optimising around the routine. If a routine does not read global data, then global data does not need to be stored to memory before the call to the routine. If the routine does not write global data, then global data does not need to be reloaded after the call. The no side effect directive indicates that the routine does no I/O, does not read or write global data, and the result only depends on the input.

However, these pragmas should not be used on routines that throw exceptions. The following example indicates the problem:

#include <iostream>

extern "C"
{
  int exceptional(int);
  #pragma no_side_effect(exceptional)
}

int exceptional(int a)
{
  if (a==7)
  {
    throw 7;
  }
  else
  {
   return a+1;
  } 
}


int a;
int c=0;

class myclass
{
  public:
  int routine();
};

int myclass::routine()
{
  for(a=0; a<1000; a++)
  {
    c=exceptional(c);
  }
 return 0;
}

int main()
{
  myclass f;
  try
  {
    f.routine();
  }
  catch(...)
  {
    std::cout << "Something happened" << a << c << std::endl;
  }
  
}

The routine "exceptional" is declared as having no side effects, however it can throw an exception. The no side effects directive enables the compiler to avoid storing global data back to memory, and retrieving it after the function call, so the loop containing the call to exceptional is quite tight:

$ CC -O -S test.cpp
...
                        .L77000061:
/* 0x0014         38 */         call    exceptional     ! params =  %o0 ! Result =  %o0
/* 0x0018         36 */         add     %i1,1,%i1
/* 0x001c            */         cmp     %i1,999
/* 0x0020            */         ble,pt  %icc,.L77000061
/* 0x0024            */         nop

However, when the program is run the result is incorrect:

$ CC -O t.cpp
$ ./a.out
Something happend00

If the code had worked correctly, the output would have been "Something happened77" - the exception occurs on the seventh iteration. Yet, the current code produces a message that uses the original values for the variables 'a' and 'c'.

The problem is that the exception handler reads global data, and due to the no side effects directive the compiler has not updated the global data before the function call. So these pragmas should not be used on routines that have the potential to throw exceptions.

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge

Search

Categories
Archives
« April 2012 »
SunMonTueWedThuFriSat
1
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
21
22
24
25
26
27
28
29
30
     
       
Today
Bookmarks
The Developer's Edge
Solaris Application Programming
Publications
Webcasts
Presentations
OpenSPARC Book
Multicore Application Programming
Docs