I try to fairly regularly build recent git checkouts of all the upstream modules from X.Org (at least all those listed in the current build.sh) on Solaris. Normally I do this in 32-bit mode on x86 machines using the Sun compilers on the latest Solaris 11 internal development build, but I also occasionally do it in 64-bit mode, or with gcc compilers, or on a SPARC machine. This helps me catch issues that would break our builds when we integrate the new releases before those releases happen. (Ideally I’d set up a Solaris client of the X.Org tinderbox, but I’ve not gotten around to that.)
Anyways, recently I finally decided to track down an error that only shows up in the 64-bit builds of the xscope protocol monitor/decoder for X11 on Solaris. The builds run fine up until the final link stage, which fails with:
ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol littleEndian: value 0x8086c355 does not fit
ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol ServerHostName: value 0x8086b4fe does not fit
ld: fatal: relocation error: R_AMD64_PC32: file decode11.o: symbol LBXEvent: value 0x808664c3 does not fit
(and over 150 more symbols that didn't fit)
A Google search turned up some forum posts, a blog post, and an article on the AMD64 ABI support in the Sun Studio compilers. And indeed, the solution they offered worked: building with -Kpic allowed the program to link.
But is that really the best answer? xscope is a simple program, and shouldn’t be overflowing the normal memory model. Once it linked, looking at the resulting binary was a bit shocking:
% /usr/gnu/bin/size xscope
text data bss dec hex filename
416753 5256 2155921980 2156343989 808732b5 xscope
% /usr/bin/size -f xscope
23(.interp) + 32(.SUNW_cap) + 5860(.eh_frame_hdr) + 27200(.eh_frame)
+ 2964(.SUNW_syminfo) + 5944(.hash) + 4224(.SUNW_ldynsym)
+ 17784(.dynsym) + 14703(.dynstr) + 192(.SUNW_version)
+ 1482(.SUNW_versym) + 3168(.SUNW_dynsymsort) + 96(.SUNW_reloc)
+ 1944(.rela.plt) + 1312(.plt) + 291018(.text) + 33(.init) + 33(.fini)
+ 280(.rodata) + 38461(.rodata1) + 1376(.got) + 784(.dynamic)
+ 1952(.data) + 0(.bssf) + 1144(.picdata) + 0(.tdata) + 0(.tbss)
+ 2155921980(.bss) = 2156343989
% pmap -x `pgrep xscope`
26151: ./xscope
Address Kbytes RSS Anon Locked Mode Mapped File
0000000000400000 408 408 - - r-x-- xscope
0000000000476000 8 8 8 - rw--- xscope
0000000000478000 2105388 1064 1064 - rw--- xscope
0000000080C83000 52 52 52 - rw--- [ heap ]
[....]
FFFFFD7FFFDF8000 32 32 32 - rw--- [ stack ]
---------------- ---------- ---------- ---------- ----------
total Kb 2108668 3204 1300 -
Two gigabytes of .bss space allocated!?!?! That can't be right. Looking through the output of the elfdump and nm programs, a single symbol stood out:
Symbol Table Section: .SUNW_ldynsym
index value size type bind oth ver shndx name
[...]
[89] 0x00000000009ff280 0x0000000080280000 OBJT GLOB D 1 .bss FDinfo
[Index] Value Size Type Bind Other Shndx Name
[...]
[528] | 10482304| 2150105088|OBJT |GLOB |0 |28 |FDinfo
Unfortunately, that wasn't one of the symbols listed in the linker errors, since its starting address fit inside the normal memory model, but everything that came after it was out of range.
So what is this giant static allocation for? It’s defined in scope.h:
#define BUFFER_SIZE (1024 * 32)
struct fdinfo
{
Boolean Server;
long ClientNumber;
FD pair;
unsigned char buffer[BUFFER_SIZE];
int bufcount;
int bufstart;
int buflimit; /* limited writes */
int bufdelivered; /* total bytes delivered */
Boolean writeblocked;
};
extern struct fdinfo FDinfo[StaticMaxFD];
So it allocates a 32k buffer for up to StaticMaxFD file descriptors. How many is that? For that we need to look in xscope’s fd.h:
/* need to change the MaxFD to allow larger number of fd's */
#define StaticMaxFD FD_SETSIZE
and from there to the Solaris system headers, which define FD_SETSIZE in <sys/select.h>:
/*
 * Select uses bit masks of file descriptors in longs.
 * These macros manipulate such bit fields.
 * FD_SETSIZE may be defined by the user, but the default here
 * should be >= NOFILE (param.h).
 */
#ifndef FD_SETSIZE
#ifdef _LP64
#define FD_SETSIZE 65536
#else
#define FD_SETSIZE 1024
#endif /* _LP64 */
So this makes the buffer fields alone in FDinfo become 65536 * 32 * 1024 bytes, aka 2 gigabytes.
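As a quick sanity check, that arithmetic is easy to reproduce in a standalone snippet (the constants are copied from the headers quoted above rather than pulled in from xscope's build):

#include <stdio.h>

#define BUFFER_SIZE      (1024 * 32)  /* per-connection buffer, from scope.h    */
#define LP64_FD_SETSIZE  65536        /* Solaris FD_SETSIZE default under _LP64 */

int main(void)
{
    unsigned long long buffers =
        (unsigned long long)LP64_FD_SETSIZE * BUFFER_SIZE;
    /* 65536 * 32768 = 2147483648 bytes, i.e. 2 GB of .bss for the buffers alone */
    printf("%llu bytes\n", buffers);
    return 0;
}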
Thus in this case, while compiler flags like -Kpic allow the code to link, building with -DFD_SETSIZE=256 instead produces code that's a little bit saner, fits in the normal memory model, and is less likely to fail with out-of-memory errors when you need it most:
% /usr/gnu/bin/size xscope
text data bss dec hex filename
409388 3352 8449804 8862544 873b50 xscope
% pmap -x `pgrep xscope`
Address Kbytes RSS Anon Locked Mode Mapped File
0000000000400000 404 404 - - r-x-- xscope
0000000000475000 4 4 4 - rw--- xscope
0000000000476000 8248 20 20 - rw--- xscope
0000000000C84000 52 52 52 - rw--- [ heap ]
[...]
FFFFFD7FFFDFD000 12 12 12 - rw--- [ stack ]
---------------- ---------- ---------- ---------- ----------
total Kb 11500 2136 232 -
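Incidentally, the <sys/select.h> comment quoted above notes that FD_SETSIZE may be defined by the user, which is exactly the hook that -DFD_SETSIZE=256 uses. In principle the same override could live in the sources instead of the compiler command line, as long as it appears before any system include in every file that sizes FDinfo; a hypothetical sketch, not how xscope is actually built:

/* Must precede any system header that pulls in <sys/select.h>, and must be
 * identical in every compilation unit that uses FDinfo, which is why doing
 * it once on the compile line with -D is the simpler option. */
#define FD_SETSIZE 256

#include <sys/select.h>
#include <stdio.h>

int main(void)
{
    /* The Solaris header only supplies its 65536/1024 defaults under
     * #ifndef FD_SETSIZE, so the user-supplied value wins. */
    printf("FD_SETSIZE = %d, sizeof(fd_set) = %zu bytes\n",
           FD_SETSIZE, sizeof(fd_set));
    return 0;
}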
Of course that assumes that xscope is not going to be monitoring more than about 120 clients at a time (since it opens two file descriptors for each client, one connected to the client and one to the real X server), and it still wastes many page mappings if you're only monitoring one client. The real fix, being worked on for the next upstream release, is to make the buffer allocation dynamic, allocating just enough for the number of clients actually being monitored, as sketched below.
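To make that concrete, here's a minimal sketch of what such a dynamic scheme could look like (hypothetical helper names, not the actual upstream patch):

#include <stdlib.h>

#define BUFFER_SIZE (1024 * 32)

struct fdinfo
{
    /* ...other fields as in scope.h... */
    unsigned char *buffer;   /* was: unsigned char buffer[BUFFER_SIZE]; */
    int bufcount;
    int bufstart;
};

/* Hypothetical helper: allocate a buffer only when a connection is set up. */
static int
InitBuffer(struct fdinfo *fd)
{
    fd->buffer = malloc(BUFFER_SIZE);
    if (fd->buffer == NULL)
        return -1;
    fd->bufcount = fd->bufstart = 0;
    return 0;
}

/* Hypothetical helper: release the buffer when the connection closes, so
 * unused FDinfo slots cost only a pointer and a couple of counters. */
static void
FreeBuffer(struct fdinfo *fd)
{
    free(fd->buffer);
    fd->buffer = NULL;
}

With a change along those lines, the 32 KB buffers are only paid for connections actually being monitored, and the FDinfo array shrinks from gigabytes to a few megabytes even at the 64-bit FD_SETSIZE.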
The moral of this story? Just because you can make it build doesn’t mean you’ve fixed it well, and sometimes it’s useful to understand why the linker is giving you a hard time.
