On allocas


The alloca call is used when one would like to dynamically allocate memory within function scope. Such memory is reclaimed after the function call returns, hence obviating the necessity of explicitly freeing the memory.
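
As a minimal sketch of that scope-bound lifetime, the routine below takes a scratch buffer from alloca and never frees it; the space is released when the function returns. The function name sum_of_squares and the MAX_SCRATCH limit are illustrative assumptions, not part of any library. The bound is there because alloca has no way to report failure if the request does not fit on the stack.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit, chosen only to keep the stack allocation small. */
#define MAX_SCRATCH 1024

/* Returns 1*1 + 2*2 + ... + n*n using a scratch array taken from the stack. */
long sum_of_squares(size_t n)
{
    assert(n > 0 && n <= MAX_SCRATCH);

    /* No matching free(): the buffer goes away when this function returns. */
    long *scratch = alloca(n * sizeof *scratch);
    long total = 0;

    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i + 1) * (long)(i + 1);
    for (size_t i = 0; i < n; i++)
        total += scratch[i];

    return total;
}
```

A caller simply writes sum_of_squares(4) and gets 30 back; there is nothing to clean up afterwards. The pointer must not be returned or stored anywhere that outlives the call.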
On SPARC, alloca is a macro, defined in alloca.h.

     #if defined(__BUILTIN_VA_ARG_INCR) || \
     	defined(__sparc) || defined(__i386) || defined(__amd64)
     #define	alloca(x)	__builtin_alloca(x)

When the compiler pre-processes a source file containing an alloca call, it replaces the call with a call to __builtin_alloca (in Sun Studio, "cc -P" writes the pre-processed source to a file named filename.i).

After the file is compiled, however, the allocas may not show up when DTrace is used to profile the application. This is because the compiler generates inline assembly for the __builtin_alloca call. This happens even if inlining is disabled using the "-xinline=" compiler option. All that the alloca implementation needs to do is decrement the stack pointer by the number of bytes requested, and the compiler generates the code to do this in place of the alloca call. (There is an exception to this simple algorithm, which is documented by Darryl.)
For example,

     #include <stdio.h>
     #include <alloca.h>

     void main(void)
     {
       void *ptr = alloca(262144);
     }

is compiled into an object which, on disassembly (this can be done using the er_src command with Sun Studio), looks like
Annotated disassembly
---------------------------------------
Source file: ./allocate.c
Object file: ./allocate
Load Object: ./allocate

   
   
   
     1. #include <stdio.h>
     2. #include <alloca.h>
     3. 
     4. void main(void)
     5. {
        
        [5]    10b70:  save        %sp, -104, %sp
     6.   void *ptr = alloca(262144);
        [6]    10b74:  sethi       %hi(0x40000), %o0
        [6]    10b78:  sub         %sp, %o0, %sp
        [6]    10b7c:  ret         
        [6]    10b80:  restore     %g0, 0, %g0
     7. }


The save and restore instructions execute on entry to and exit from main; they give the main routine a fresh set of registers, using a SPARC hardware feature called register windows. The alloca call is reduced to the sethi and sub instructions. sethi copies the 22 most significant bits of the hex value 0x40000 (which is what %hi extracts) into the 22 most significant bits of register %o0 and clears the remaining 10 bits. The sub instruction then subtracts the value now held in %o0, 0x40000 (262144 in decimal), from the stack pointer. (The code was compiled with -xO4 optimization.)
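
The arithmetic behind the sethi and sub pair can be sketched in C. This is only a model of the two instructions, assuming 32-bit registers; the helper names hi22, sethi and sp_after_alloca are made up for illustration, and the stack pointer value in the usage note is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* %hi(x): the upper 22 bits of a 32-bit value. */
uint32_t hi22(uint32_t x)
{
    return x >> 10;
}

/* sethi imm22, rd: put imm22 in bits 31..10 of rd and zero bits 9..0. */
uint32_t sethi(uint32_t imm22)
{
    return imm22 << 10;
}

/* Model of: sethi %hi(size), %o0 ; sub %sp, %o0, %sp */
uint32_t sp_after_alloca(uint32_t sp, uint32_t size)
{
    /* sethi alone can only load constants whose low 10 bits are zero. */
    assert((size & 0x3ff) == 0);

    uint32_t o0 = sethi(hi22(size));
    return sp - o0;
}
```

For the size used here, hi22(0x40000) is 0x100 and sethi(0x100) gives back 0x40000, so a made-up stack pointer of 0xffbff000 becomes 0xffbbf000 after the subtraction. A size with non-zero low bits would need an extra instruction to fill in the low 10 bits, which this sketch does not model.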

The alloca code is inlined in the object, hence a call to alloca (or __builtin_alloca) will not show up through DTrace or nm (unless the code falls into the exception category mentioned above).
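
Since there is no call to trace, one rough way to see the effect from inside a program is to compare the addresses returned by two consecutive allocas in the same function. The sketch below makes several assumptions: the name alloca_gap and the 4096-byte cap are illustrative, and the exact gap depends on the alignment and padding the compiler chooses, so the only property it relies on is that two live blocks cannot overlap.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the distance in bytes between two consecutive alloca blocks. */
size_t alloca_gap(size_t size)
{
    /* Hypothetical cap, to keep the stack usage small. */
    assert(size > 0 && size <= 4096);

    unsigned char *first = alloca(size);
    unsigned char *second = alloca(size);

    /* Touch both blocks so that they are really used. */
    memset(first, 0xAA, size);
    memset(second, 0x55, size);

    uintptr_t a = (uintptr_t)first;
    uintptr_t b = (uintptr_t)second;
    return (size_t)(a > b ? a - b : b - a);
}
```

On a typical implementation the result is the requested size rounded up to the stack alignment, which is consistent with the stack pointer simply being moved once per alloca.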

A good way to trace this is to use the collector (and analyzer) or SPOT.
It helps if the binary is compiled using "-g -xO4 -xbinopt=prepare". Using -g does not reduce performance when -xO4 or higher is used, and -xbinopt=prepare does not affect performance.
