Thursday May 22, 2008

Alloca on SPARC

On SPARC there's a slight complication. The load and store instructions take a 13-bit signed immediate offset, which gives a range of -4096 to +4095. To use a larger offset than that, it is necessary to put the offset into a register and use that register to calculate the address.
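The range check can be sketched in C. This is only an illustration of the arithmetic, not code from any compiler; the names fits_simm13, SIMM13_MIN and SIMM13_MAX are made up for this example.

```c
#include <stdbool.h>

/* Assumption for illustration: the offset field is a 13-bit signed
   immediate, so it can hold values from -4096 to 4095 inclusive. */
enum { SIMM13_MIN = -4096, SIMM13_MAX = 4095 };

/* Returns true if 'offset' can be encoded directly in a load or store,
   false if it would first have to be placed in a register. */
static bool fits_simm13(long offset)
{
    return offset >= SIMM13_MIN && offset <= SIMM13_MAX;
}
```

An offset such as 4000 passes the check, while 5000 fails it and would need the register form of the address calculation.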

If the size of the local variables is less than 4KB, then a load or store instruction can use the frame pointer together with an offset to access the memory on the stack. If the stack frame is larger than 4KB, then it's possible to use the frame pointer to access memory in the upper 4KB, and the stack pointer to access memory in the lower 4KB, as this diagram shows:

frame pointer -> top of stack
               ^
               | Upper 4KB can be accessed
               v using offset + frame pointer
               ^
               | Lower 4KB can be accessed
               v using offset + stack pointer
stack pointer -> bottom of stack

The complication arises when temporary memory is allocated on the stack using alloca and the size of the local variables exceeds 4KB. In this case it's not possible to just shift the stack pointer downwards. Doing so could push variables that were previously accessed through the stack pointer out of the 4KB offset range, or it would change the offset from the stack pointer at which variables are stored (by an amount which may only be known at runtime). Neither of these situations would be good.

Instead of just shifting the stack pointer, a slightly more complex operation has to be carried out. The memory gets allocated in the middle of the range, and the lower memory gets shifted (or copied) downwards. The end result is something like this:

frame pointer -> top of stack
               ^
               | Upper 4KB can be accessed
               v using offset + frame pointer
               [Alloca'd memory]
               ^
               | Lower 4KB can be accessed
               v using offset + stack pointer
stack pointer -> bottom of stack

The routine that does this manipulation of memory is called __builtin_alloca. You can see in the code that it moves the stack pointer, and then has a copy loop to move the contents of the stack.
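To make the move-and-copy idea concrete, here is a much simplified model in C. It is not the source of __builtin_alloca: the sim_stack type, the sim_alloca function and the low_size parameter are all hypothetical, the stack is just a byte array, and details such as alignment are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the stack is a byte array that grows towards
   index 0. 'sp' is the index of the lowest byte currently in use. */
typedef struct {
    unsigned char *mem;   /* backing storage for the simulated stack */
    size_t sp;            /* simulated stack pointer */
} sim_stack;

/* Reserve 'n' bytes just above the 'low_size' bytes that are addressed
   relative to the stack pointer. Returns the index of the new block. */
static size_t sim_alloca(sim_stack *s, size_t low_size, size_t n)
{
    assert(n <= s->sp);            /* there must be room below sp */

    size_t new_sp = s->sp - n;     /* 1. move the stack pointer down */

    /* 2. copy the sp-relative region down by n bytes, so every item
          keeps the same offset from the stack pointer */
    memmove(s->mem + new_sp, s->mem + s->sp, low_size);
    s->sp = new_sp;

    /* 3. the n-byte gap left above the moved region is the block
          handed back to the caller */
    return new_sp + low_size;
}
```

In this model a value stored at offset 3 from the old stack pointer is still found at offset 3 from the new one, which is the property the copy is there to preserve. The cost of the memmove grows with the size of the region being moved.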

Unfortunately, the need to copy the data means that it takes longer to allocate memory. So if the function __builtin_alloca appears in a profile, the first thing to do is to see whether it's possible to reduce the amount of local variables/stack space needed for the routine.

As a footnote, take a look at the equivalent code for the x86 version of __builtin_alloca. The x86, being CISC, does not have the limit on the size of the offset that can be used. Hence the x86 code does not need the copy routine to move variables in the stack around.

Reserving temporary memory using alloca

There are occasions where it's useful to allocate a small temporary working area for the duration of a call to a routine - for example to hold an array. One way of doing this is to use malloc and free:

#include <stdlib.h>

void f(int a)
{
  /* heap allocation: must be released explicitly before returning */
  int* array=(int*)malloc(sizeof(int)*a);
  ...
  free(array);
}

Obviously the use of malloc and free incurs some overhead, which is undesirable, particularly if this is a performance-critical routine.

An alternative approach is to use alloca, which allocates memory on the stack (see the alloca man page). Being on the stack, the memory is 'freed' when the routine exits. Strictly speaking, the memory was never allocated in the malloc sense, so it's not really freed, but accesses to it after the routine exits are not valid: you'll be accessing data either beyond the stack, or in the stack frame of another routine. Neither situation is likely to be good.

The equivalent code is:

#include <alloca.h>

void f(int a)
{
  /* stack allocation: released automatically when f returns */
  int* array=(int*)alloca(sizeof(int)*a);
  ...
}

For routines which do require temporary storage, this can be a much faster way of allocating it.
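As a small complete example, the sketch below uses alloca for a scratch array that is only needed inside one routine. The function sum_of_squares is made up for illustration, and the code assumes a platform that provides the <alloca.h> header (Solaris and glibc both do).

```c
#include <alloca.h>
#include <stddef.h>

/* Hypothetical routine: fills a temporary array with i*i for
   i = 0..n-1 and returns the sum of its elements. */
static long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;

    /* scratch space lives in this routine's stack frame */
    long *scratch = (long *)alloca(n * sizeof(long));

    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratch[i];

    /* no free() needed: the space goes away when the routine returns,
       so 'scratch' must not be returned or stored anywhere */
    return total;
}
```

Because the space comes out of the stack frame, this only makes sense for small values of n; a large request can overflow the stack.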

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
