I received an application from a customer the other day. It's quite a big sucker, consisting of the application and over 70 shared objects (that's besides the system objects that also get used).
% size -x main *.so
main:       2df35c + 2675a4 + 80918f8 = 0x85d81f8
libxxx.so:  64d4d9c + 9af9f6 + 19604ba = 0x87e4c4c
libyyy.so:  4db7aeb + 76aa4c + 32cc16c = 0x87ee6a3
libzzz.so:  3f347ce + d8ebb1 + 4642a3b = 0x9305dba
....
The customer has complained that it takes a long time to load this application. In particular, it takes a long time to verify their objects using ldd(1) and the -r option. See the Solaris 10 man pages section 1: User Commands.
Using ldd(1) with the -d option emulates the cost of starting a process: the relocation processing is exactly the same as that applied by ld.so.1(1) at runtime. The -r option processes all relocations, as ld.so.1(1) would if the environment variable LD_BIND_NOW were in effect. Running ldd(1) with -r is a convenient way of testing that all symbol references can be found for a particular object.
The set of shared objects supplied by the customer does not record any dependencies. In fact, the application seems responsible for establishing all dependencies: not only those that the application itself references, but also those needed to satisfy every shared object. If each shared object recorded its own dependencies, then ldd(1) could be run against each object to validate its symbol requirements. With this set of objects, however, the only way to validate symbol requirements is to run ldd(1) against the whole application, and that is what takes so long.
A quick poke around with elfdump(1) revealed that there are over 3.3 million relocations to process for this application and its dependencies. Some profiling (DTrace is your friend here) revealed that relocation processing accounts for nearly 99% of the startup cost; locating and mapping all the objects is trivial by comparison.
Looking a little deeper, I found that around 2.3 million of these are RELATIVE relocations. These relocations simply require the base address at which the object was loaded to be added to the value at the relocation offset. This is a cheap operation, involving no symbol lookup, and it accounts for only a few percent of the cost.
The rest of the startup cost stems from the symbolic relocations, of which some 740,000 needed processing with ldd(1) and the -d option. Poking some more revealed that 680,000 of these symbols are of the form $XBGEQEsZEHHBGQS.... (the $X prefix is the giveaway). These are local symbols that have been made unique and promoted to global symbols by the compilers when the -g (debugging) option is used. By all accounts, they allow debuggers to provide fix-and-continue processing or, if your compilers have the capability, inter-object optimization.
I don't have the source for this set of objects, so I can't experiment with building them without -g. But I'm left concluding that the bulk of the startup cost of this process is due to these $X.... symbols.
If you don't want fix-and-continue, be careful how you use the compiler flags. This overhead in relocation processing probably isn't what you want in production software.
Note, you can also build objects with the -zcombreloc flag of ld(1). This option combines the relocation sections into a single table. The RELATIVE relocations are all grouped together and identified by dynamic entries that allow the runtime linker to process them in an optimized loop that is even faster than normal relative relocation processing.