Friday Mar 25, 2011

R_AMD64_PC32 error? There, I Fixed It!

I try to fairly regularly build recent git checkouts of all the upstream modules from X.Org (at least all those listed in the current build.sh) on Solaris. Normally I do this in 32-bit mode on x86 machines using the Sun compilers on the latest Solaris 11 internal development build, but I also occasionally do it in 64-bit mode, or with gcc compilers, or on a SPARC machine. This helps me catch issues that would break our builds when we integrate the new releases before those releases happen. (Ideally I'd set up a Solaris client of the X.Org tinderbox, but I've not gotten around to that.)

Anyways, recently I finally decided to track down an error that only shows up in the 64-bit builds of the xscope protocol monitor/decoder for X11 on Solaris. The builds run fine up until the final link stage, which fails with:

ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol littleEndian: value 0x8086c355 does not fit
ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol ServerHostName: value 0x8086b4fe does not fit
ld: fatal: relocation error: R_AMD64_PC32: file decode11.o: symbol LBXEvent: value 0x808664c3 does not fit
(and over 150 more symbols that didn't fit)

A google search turned up some forum posts, a blog post, and an article on the AMD64 ABI support in the Sun Studio compilers. And indeed, the solutions they offered did work - building with -Kpic did allow the program to link.

But is that really the best answer? xscope is a simple program, and shouldn't be overflowing the normal memory model. Once it linked, looking at the resulting binary was a bit shocking:

% /usr/gnu/bin/size  xscope
   text	   data	    bss	    dec	    hex	filename
 416753	   5256	2155921980	2156343989	808732b5	xscope

% /usr/bin/size -f xscope

23(.interp) + 32(.SUNW_cap) + 5860(.eh_frame_hdr) + 27200(.eh_frame)
 + 2964(.SUNW_syminfo) + 5944(.hash) + 4224(.SUNW_ldynsym)
 + 17784(.dynsym) + 14703(.dynstr) + 192(.SUNW_version)
 + 1482(.SUNW_versym) + 3168(.SUNW_dynsymsort) + 96(.SUNW_reloc)
 + 1944(.rela.plt) + 1312(.plt) + 291018(.text) + 33(.init) + 33(.fini)
 + 280(.rodata) + 38461(.rodata1) + 1376(.got) + 784(.dynamic)
 + 1952(.data) + 0(.bssf) + 1144(.picdata) + 0(.tdata) + 0(.tbss)
 + 2155921980(.bss) = 2156343989

% pmap -x `pgrep xscope`
26151:	./xscope
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000000400000        408        408          -          - r-x--  xscope
0000000000476000          8          8          8          - rw---  xscope
0000000000478000    2105388       1064       1064          - rw---  xscope
0000000080C83000         52         52         52          - rw---    [ heap ]
[....]
FFFFFD7FFFDF8000         32         32         32          - rw---    [ stack ]
---------------- ---------- ---------- ---------- ----------
        total Kb    2108668       3204       1300          -

Two gigabytes of .bss space allocated!?!?! That can't be right. Looking through the output of the elfdump and nm programs a single symbol stood out:

Symbol Table Section:  .SUNW_ldynsym
     index    value              size              type bind oth ver shndx          name
[...]
      [89]  0x00000000009ff280 0x0000000080280000  OBJT GLOB  D    1 .bss           FDinfo

[Index]   Value                Size                Type  Bind  Other Shndx   Name
[...]
[528]   |            10482304|          2150105088|OBJT |GLOB |0    |28     |FDinfo

Unfortunately, that wasn't one of the ones listed in the linker errors, since it's starting address fit inside the normal memory model, but everything that came after it was out of range.

So what is this giant static allocation for? It's defined in scope.h:

#define BUFFER_SIZE (1024 \* 32)

struct fdinfo
{
  Boolean Server;
  long    ClientNumber;
  FD      pair;
  unsigned char   buffer[BUFFER_SIZE];
  int     bufcount;
  int     bufstart;
  int     buflimit;     /\* limited writes \*/
  int     bufdelivered; /\* total bytes delivered \*/
  Boolean writeblocked;
};

extern struct fdinfo   FDinfo[StaticMaxFD];

So it allocates a 32k buffer for up to StaticMaxFD file descriptors. How many is that? For that we need to look in xscope's fd.h:

/\* need to change the MaxFD to allow larger number of fd's \*/
#define StaticMaxFD FD_SETSIZE

and from there to the Solaris system headers, which define FD_SETSIZE in <sys/select.h>:

/\*
 \* Select uses bit masks of file descriptors in longs.
 \* These macros manipulate such bit fields.
 \* FD_SETSIZE may be defined by the user, but the default here
 \* should be >= NOFILE (param.h).
 \*/
#ifndef FD_SETSIZE
#ifdef _LP64
#define FD_SETSIZE      65536
#else
#define FD_SETSIZE      1024
#endif  /\* _LP64 \*/

So this makes the buffer fields alone in FDinfo become 65536 \* 32 \* 1024 bytes, aka 2 gigabytes.

Thus in this case, while compiler flags like -Kpic allow the code to link, using -DFD_SETSIZE=256 instead, builds code that's a little bit saner, fits in the normal memory model, and is less likely to fail with out of memory errors when you need it most:

% /usr/gnu/bin/size -f xscope
   text	   data	    bss	    dec	    hex	filename
 409388	   3352	8449804	8862544	 873b50	xscope

% pmap -x `pgrep xscope`
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000000400000        404        404          -          - r-x--  xscope
0000000000475000          4          4          4          - rw---  xscope
0000000000476000       8248         20         20          - rw---  xscope
0000000000C84000         52         52         52          - rw---    [ heap ]
[...]
FFFFFD7FFFDFD000         12         12         12          - rw---    [ stack ]
---------------- ---------- ---------- ---------- ----------
        total Kb      11500       2136        232          -

Of course that assumes that xscope is not going to be monitoring more than about 120 clients at a time (since it opens two file descriptors for each client, one connected to the client and one to the real X server), and still wastes many page mappings if you're only monitoring one client. The real fix being worked on for the next upstream release is to make the buffer allocation be dynamic, and allocate just enough for the number of clients we actually are monitoring.

The moral of this story? Just because you can make it build doesn't mean you've fixed it well, and sometimes it's useful to understand why the linker is giving you a hard time.

About

Engineer working on Oracle Solaris and with the X.Org open source community.

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle, the X.Org Foundation, or anyone else.

See Also
Follow me on twitter

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today