Faster globals access on Sparc (part 1)

For reasons unclear to me, the sparcv9 CPU has no PC-relative addressing, which means that variable access from position-independent code (PIC) can become really expensive. Consider the following innocent-looking piece of code:
int a;

int foo1() {
  return a;
}
Compile it with cc -c -xarch=v9 -KPIC -xO3 ~/pic.c and you'll get
0x0000000100000870: foo1       :        mov      %o7, %g1
0x0000000100000874: foo1+0x0004:        sethi    %hi(0x0), %o5
0x0000000100000878: foo1+0x0008:        call     foo1+0x10      ! 0x100000880
0x000000010000087c: foo1+0x000c:        mov      %o7, %o7
0x0000000100000880: foo1+0x0010:        sethi    %hi(0x100000), %o4
0x0000000100000884: foo1+0x0014:        xor      %o5, 928, %o3
0x0000000100000888: foo1+0x0018:        inc      160, %o4
0x000000010000088c: foo1+0x001c:        add      %o4, %o7, %o2
0x0000000100000890: foo1+0x0020:        mov      %g1, %o7
0x0000000100000894: foo1+0x0024:        add      %o2, %o3, %o1
0x0000000100000898: foo1+0x0028:        ld       [%o1], %o0
// this part isn't so interesting
0x000000010000089c: foo1+0x002c:        retl     
0x00000001000008a0: foo1+0x0030:        sra      %o0, 0, %o0
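The address arithmetic in the listing above can be traced by hand. The sketch below is only an illustration: the function name foo1_load_address is made up, the constants are copied from the disassembly, and it assumes (as the SPARC call instruction does) that call writes its own address into %o7. It returns the effective address that the final ld reads from; it makes no claim about what the linker placed there.

```c
#include <stdint.h>

/* Hypothetical helper: replays the register arithmetic of the foo1 listing
 * and returns the address that "ld [%o1], %o0" reads from. */
uint64_t foo1_load_address(void) {
    /* mov %o7, %g1 saves the return address; it is restored before retl. */
    uint64_t o5 = 0x0;               /* sethi %hi(0x0), %o5                 */
    uint64_t o7 = 0x100000878ULL;    /* call foo1+0x10: %o7 = address of call */
    uint64_t o4 = 0x100000;          /* sethi %hi(0x100000), %o4            */
    uint64_t o3 = o5 ^ 928;          /* xor %o5, 928, %o3                   */
    o4 += 160;                       /* inc 160, %o4                        */
    uint64_t o2 = o4 + o7;           /* add %o4, %o7, %o2                   */
    uint64_t o1 = o2 + o3;           /* add %o2, %o3, %o1                   */
    return o1;                       /* ld [%o1], %o0 loads from here       */
}
```

Most of the sequence exists only to obtain the current PC (the call) and to add link-time offsets to it.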
So we get about a dozen instructions just to load an integer variable from memory. How could one speed up access to a critical piece of data from a shared library? One possible approach is to pick an absolute address in the address space and do direct loads from there. We can rewrite foo1() as
int foo2() {
  return ((int*)0x8000)[0];
}
which gives us
0x00000001000008c0: foo2       :        sethi    %hi(0x8000), %o5
0x00000001000008c4: foo2+0x0004:        ld       [%o5 + 0x0000000000000000], %o3        ! 0x8000
// this part isn't so interesting
0x00000001000008c8: foo2+0x0008:        retl     
0x00000001000008cc: foo2+0x000c:        sra      %o3, 0, %o0
So now we need only 2 instructions, instead of about a dozen, to access an integer variable. We intentionally chose address 0x8000: sethi takes a 22-bit immediate and places it in the upper 22 bits of a 32-bit value, clearing the low 10 bits, so an address whose low 10 bits are zero can be formed with a single instruction. This approach has some drawbacks, though. The most important one is that the shared object has to be able to steal part of the address space without introducing bugs. In my next posting I'll explain a technique to deal with that.
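The sethi remark can be checked with a few lines of C. This is a minimal sketch, not compiler code: the helper names hi22, lo10, sethi and single_sethi_suffices are invented here to mirror the assembler's %hi() and %lo() operators for 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* %hi(x): the upper 22 bits of a 32-bit value, the part sethi can supply. */
uint32_t hi22(uint32_t x) { return x >> 10; }

/* %lo(x): the low 10 bits, which sethi always leaves as zero. */
uint32_t lo10(uint32_t x) { return x & 0x3ffu; }

/* Emulates sethi: put a 22-bit immediate in bits 31..10, clear bits 9..0. */
uint32_t sethi(uint32_t imm22) {
    assert(imm22 < (1u << 22));   /* the immediate field is only 22 bits wide */
    return imm22 << 10;
}

/* One sethi is enough when nothing is left over for a follow-up or/add. */
int single_sethi_suffices(uint32_t addr) { return lo10(addr) == 0; }
```

For 0x8000 the low 10 bits are zero, so sethi(hi22(0x8000)) reproduces the address exactly. An address such as 0x8004 would leave a remainder of 4, which the load can still absorb in its immediate offset field, as in ld [%o5 + 4].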
About

nike
