Sunday May 08, 2005

Loading Multiple Files - same name, different directories

A recent customer observation reminded me of a subtlety of shared object dependency lookup, and of a change that occurred between Solaris 8 and 9. The customer observed different dependencies being loaded on the two systems, although the application's file system hierarchy was the same on both.

On Solaris 8, the following was observed.

  $ ldd ./app
      libX.so.1 =>     /opt/ISV/weblib/libX.so.1
      libY.so.1 =>     /opt/ISV/weblib/libY.so.1
      libZ.so.1 =>     /opt/ISV/weblib/libZ.so.1

And, on Solaris 9, the following was observed.

  $ ldd ./app
      libX.so.1 =>     /opt/ISV/weblib/libX.so.1
      libY.so.1 =>     /opt/ISV/weblib/libY.so.1
      libZ.so.1 =>     /opt/ISV/weblib/libZ.so.1
      libX.so.1 =>     /opt/ISV/lib/libX.so.1
      libY.so.1 =>     /opt/ISV/lib/libY.so.1

Notice that with Solaris 9 we seem to have gained two new dependencies from the directory /opt/ISV/lib.

In a previous posting I'd discussed some warnings in regard to using LD_LIBRARY_PATH, and how using a runpath was a better alternative. This customer's application and dependencies do use runpaths; however, the runpaths are not consistent, and they reveal the different behavior between Solaris 8 and Solaris 9.

In Solaris 8 and prior releases, dependencies were loaded by:

  • First, determining whether an object with this dependency name had already been loaded. This comparison used the NEEDED entry, in this case:

      $ elfdump -d /opt/ISV/weblib/libY.so.1
      ....
          [1]  NEEDED    0x122ef      libX.so.1
    

    This name is a simple filename, and thus gets matched against the libX.so.1 that has already been loaded as a dependency of app, as /opt/ISV/weblib/libX.so.1.

  • Had this filename comparison failed, ld.so.1 would then use any search paths to try to locate the file.

It was discovered that this dependency name pattern matching was becoming a significant bottleneck, especially as the number of application dependencies continued to increase. A second drawback to this model was that requirements started to materialize for processes to be able to open different dependencies that shared the same filename but lived in different directories.

These observations resulted in a change to the loading behavior. With Solaris 9 we no longer carry out the filename dependency pattern match against previously loaded objects. We simply search for the file using any search paths relevant to the caller (which includes the RPATH of the caller). Should this search result in a file that has already been loaded, a quick dev/inode check catches this, and prevents a duplicate loading.

The result is a much faster and more scalable search for dependencies, plus the flexibility required to locate the same filename in different locations.

Hence, starting with Solaris 9 we now see:

  $ ldd -s ./app
  ....
  find object=libX.so.1; required by /opt/ISV/weblib/libY.so.1
    search path=/opt/ISV/lib:/opt/ISV/lib/../SS:......:/usr/lib/lwp \
        (RPATH from file /opt/ISV/weblib/libY.so.1)
    trying path=/opt/ISV/lib/libX.so.1
      libX.so.1 =>     /opt/ISV/lib/libX.so.1

This search path, initiated from libY.so.1, is *different* from the path search that originated from ./app, and we are therefore finding two different versions of the file with the same name.

Whether two different versions of the same file are required in this user's hierarchy is still unknown; perhaps they should be consolidated. But if you want to ensure that the dependencies located by your components are the same, the search paths (runpaths, set using ld -R) for all components of your system should be the same.
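
For example, here's a minimal sketch of building the pieces with one consistent runpath; the file names and the single directory /opt/ISV/weblib are hypothetical choices for this example:

    % cc -o libY.so.1 -G -Kpic -R/opt/ISV/weblib libY.c
    % cc -o libZ.so.1 -G -Kpic -R/opt/ISV/weblib libZ.c
    % cc -o app main.c -R/opt/ISV/weblib -L/opt/ISV/weblib -lX -lY -lZ

With every component recording the same runpath, a search initiated from app, or from any of its dependencies, resolves libX.so.1 from the same directory.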

Tuesday Apr 26, 2005

My Relocations Don't Fit - Position Independence

A couple of folks have come across the following relocation error when running their applications on AMD64:

  $ prog
  ld.so.1: prog: fatal: relocation error: R_AMD64_32: file \
      libfoo.so.1: symbol (unknown): value 0xfffffd7fff0cd457 does not fit

The culprit, libfoo.so.1, has been built from position-dependent code (often referred to as non-pic).

Shared objects are typically built using position independent code, using compiler options such as -Kpic. This position independence allows the code to execute efficiently at a different address in each process that uses the code.

If a shared object is built from position-dependent code, the text segment can require modification at runtime. This modification allows relocatable references to be assigned the address at which the object has been loaded. The relocation of the text segment requires the segment to be remapped as writable. This modification requires a swap space reservation, and results in a private copy of the text segment for the process. The text segment is no longer sharable between multiple processes. Position-dependent code typically requires more runtime relocations than the corresponding position-independent code. Overall, the overhead of processing text relocations can cause serious performance degradation.

When a shared object is built from position-independent code, relocatable references are generated as indirections through data in the shared object's data segment. The code within the text segment requires no modification. All relocation updates are applied to corresponding entries within the data segment.

The runtime linker attempts to handle text relocations should these relocations exist. However, some relocations can not be satisfied at runtime.

The AMD64 position-dependent code sequence typically generates code that can only be loaded into the lower 32 bits of memory. The upper 32 bits of any address must all be zeros. Since shared objects are typically loaded at the top of memory, the upper 32 bits of an address are required. Position-dependent code within an AMD64 shared object is therefore insufficient to cope with the relocation requirements. Use of such code within a shared object can result in the runtime relocation error cited above.

This situation differs from the default ABS64 mode that is used for 64-bit SPARCV9 code. This position-dependent code is typically compatible with the full 64-bit address range. Thus, position-dependent code sequences can exist within SPARCV9 shared objects. Use of either the ABS32 mode or the ABS44 mode for 64-bit SPARCV9 code can still result in relocations that can not be resolved at runtime. However, each of these modes requires the runtime linker to relocate the text segment.

Build all your shared objects using position independent code.
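
For example, a minimal sketch of such a build with the Sun compilers (gcc users would use -fpic or -fPIC instead of -Kpic):

    % cc -Kpic -c foo.c bar.c
    % cc -o libfoo.so.1 -G foo.o bar.o

A shared object built entirely from pic objects should show no TEXTREL entry in its dynamic section, and its text segment remains sharable between processes.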


Update - Wednesday March 21, 2007

If you believe you have compiled all the components of your shared object using -Kpic, and still see an error of this sort, look a little more closely. First, determine if the link-editor thinks the shared object contains text relocations.

  $ elfdump -d library | fgrep TEXTREL

If this flag is found, then the link-editor thinks this file contains non-pic code. One explanation might be that you have included an assembler file in the shared library. Any assembler code must be written using position-independent instructions. Another explanation is that you might have included objects from an archive library as part of the link-edit. Typically, archives are built with non-pic objects.

You can track down the culprit with the link-editor's debugging capabilities. Build your shared object.

  $ LD_OPTIONS=-Dreloc,detail cc -o library .... 2> dbg

The diagnostic output in dbg can be inspected to locate the non-pic relocation, and from this you can trace back to the input file that supplied the relocation as part of building your library.


Update - Wednesday April 14, 2010

A question arose on how to interpret the diagnostic output so as to determine which input file is non-pic. You can look for the "creating output relocations" information. These are the relocation records that are created in the output file, and must be processed at runtime. The non-pic relocations will probably be against the .text section. So, if you have:

  debug: creating output relocations
  debug:          type                       offset     addend  section        symbol
  debug:        R_SPARC_HI22                  0x4d4          0  .SUNW_reloc    foo1

Then you can associate the offset with an output section:

  % elfdump -cN.text output-file.so 

  Section Header[7]:  sh_name: .text
    sh_addr:      0x4c0           sh_flags:   [ SHF_ALLOC SHF_EXECINSTR ]
    sh_size:      0x28            sh_type:    [ SHT_PROGBITS ]
    sh_offset:    0x4c0           sh_entsize: 0
    ...

Another aid might be to use the link-editor's -znocombreloc option. This suppresses the normal combination of output relocation sections, and might provide a more informative diagnostic:

  debug: creating output relocations
  debug:          type                       offset     addend  section        symbol
  debug:        R_SPARC_HI22                  0x4d4          0  .rela.text     foo1

Here you can now see that the relocation is against the .text section.

Having found a non-pic relocation, search back in the diagnostics and try to find the matching input relocation using the relocation type and symbol name. You should find something like:

  debug: collecting input relocations: section=.text, file=foo.o
  debug:          type                       offset     addend  section        symbol
  debug:     in R_SPARC_HI22                   0x14          0  [13].rela.text foo1
  debug:    out R_SPARC_HI22                   0x14          0  .text          foo1

Here, the input relocation is against the text section (.rela.text), and to provide for this, an output relocation must be produced against the .text section. The non-pic culprit is the file foo.o.

You might be able to discover your non-pic relocations by just scanning through the "collecting input relocations" information. But for large links this can be a substantial amount of information to digest.

Wednesday Feb 02, 2005

Loading Relocatable Objects at Runtime - Very Expensive

The runtime linker, ld.so.1(1), is capable of loading relocatable objects. This capability arrived somewhat by accident, and as a side effect of some other projects related to the link-editor, ld(1). But, it was thought that this capability could prove useful within a development environment. A user can preload relocatable objects, and this technique might provide a quick prototyping turnaround instead of having to first combine the relocatable objects into some huge shared object.

However, this capability doesn't come cheap, and it was never thought to be useful for production software. Recently, a customer problem uncovered a regression in relocatable object processing. But, it was a surprise to see that this technique, in this case the dlopen(3c) of a relocatable object, was being used in production code.

The runtime linker only knows how to deal with dynamic objects, that is, dynamic executables and shared objects. When a relocatable object is encountered, the runtime linker first loads the link-editor. The link-editor then converts the relocatable object into a shared object memory image within the process. The runtime linker then processes this memory image as it would any other shared object.
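
For example, here's a minimal sketch of the kind of call that triggers this machinery; the object name foo.o is hypothetical:

    % cat main.c
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /*
         * dlopen() of a relocatable object. Before this call can return,
         * the runtime linker must load the link-editor and convert
         * foo.o into a shared object memory image within the process.
         */
        void *handle = dlopen("./foo.o", RTLD_LAZY);

        if (handle == NULL)
            (void) fprintf(stderr, "dlopen: %s\n", dlerror());
        return (0);
    }
    % cc -o main main.c -ldl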

The link-editor is really a family of libraries, and you can see these if you inspect the link-maps of a process that has triggered the loading of a relocatable object.

   % mdb main
   > ::bp exit
   > :r
   > ::Rt_maps
   Link-map lists (dynlm_list): 0x8045f38
   ----------------------------------------------
     Lm_list: 0xfeffa204  (LM_ID_BASE)
     ----------------------------------------------
       Link_map*  ADDR()     NAME()
       ----------------------------------------------
       0xfeffc8f4 0x08050000 main
       0xfeffccc4 0xfeef0000 /lib/libc.so.1
     ----------------------------------------------
     Lm_list: 0xfeffa1c8  (LM_ID_LDSO)
     ----------------------------------------------
       0xfeffc590 0xfefc8000 /lib/ld.so.1
       0xfeee06e4 0xfee60000 /lib/libld.so.2
       0xfeee0af0 0xfed90000 /lib/libc.so.1
       0xfed80068 0xfed50000 /lib/libelf.so.1

As you can see, the runtime linker's link-map list (LM_ID_LDSO) contains the runtime linker itself, and all the libraries that effectively make up the link-editor.

Besides loading all these support libraries, the shared object image adds additional overhead. Relocatable objects are typically non-pic, and thus require considerably more relocations than their pic (position independent) counterparts. This relocation overhead is passed on to the shared object memory image.

And, because the image is created in memory, each process that uses this technique creates its own private image. There is no sharing of text pages from the shared object image, as is common for shared objects. This customer's application forked off a number of processes that all dlopen()'ed the same relocatable object.

So, this technique may have some use within a development environment, but there are expenses:

  • the link-editor must be loaded into the process,

  • the relocatable object is then converted into a shared object memory image,

  • there are typically many more relocations to process in the memory image than there would be from a comparable pic shared object, and

  • the resulting shared object image isn't sharable between processes.

Personally, I'd recommend not using this technique in production software.
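
If you've inherited code like this, a minimal sketch of the alternative (file names hypothetical) is to perform the conversion once, at build time, and have the process dlopen(3c) the result:

    % cc -Kpic -c part1.c part2.c
    % cc -o libparts.so.1 -G part1.o part2.o

The process then dlopen()'s libparts.so.1 instead of the individual relocatable objects. The conversion cost is paid once on the build machine, the link-editor libraries stay out of the process, and the text pages of the shared object can be shared between all those forked processes.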

Sunday Jan 02, 2005

Interface Creation - using the compilers

In a previous posting, I covered how the interface of a dynamic object could be established by defining the interface symbols within a mapfile, and feeding this file to the link-edit of the final object. Establishing an object's interface in this manner hides all non-interface symbols, making the object more robust and less vulnerable to symbol name space pollution. This symbol reduction also goes a long way toward reducing the runtime relocation cost of the dynamic object.

This mapfile technique, while useful with languages such as C, can be challenging to exploit with languages such as C++. There are two major difficulties.

First, the link-editor only processes symbols in their mangled form. For example, even a simple interface such as:

    void foo(int bar)

has a C++ symbolic representation of:

    % elfdump -s foo.o
    ....
    [2]  .... FUNC GLOB  D   0 .text  __1cDfoo6Fi_v_

As no tool exists that can determine a symbol's mangled name other than the compilers themselves, trying to establish definitions of this sort within a mapfile is no simple task.

The second issue is that some interfaces created by languages such as C++ provide implementation details of the language itself. These implementation interfaces often must remain global within a group of similar dynamic objects, as one interface must interpose on all the others for the correct execution of the application. As users generally are not aware of what implementation symbols are created, they can blindly demote these symbols to local when applying any interface definitions with a mapfile. Even the use of linker options like -B symbolic is discouraged with C++, as these options can lead to implementation symbols being created that are non-interposable.

Thankfully, some recent ELF extension work carried out with various Unix vendors has established a set of visibility attributes that can be applied to ELF symbol table entries. These attributes are maintained within the symbol entry's st_other field, and are fully documented under "ELF Symbol Visibility" in the "Object File Format" chapter of the Linker and Libraries Guide.

The compilers, starting with Sun ONE Studio 8, are now capable of describing symbol visibility. These definitions are then encoded in the symbol table, and used by ld(1) in a similar manner to reading definitions from a mapfile. Using a combination of code "definitions" and command line options, you can now define the runtime interface of a C++ object.

As with any interface definition technique, this compilation method can greatly reduce the number of symbols that would normally be employed in runtime relocations. Given the number and size of C++ symbols, this technique can produce runtime relocation reductions that far exceed those that would be found in similar C objects. In addition, as the compiler knows what implementation symbols must remain global within the final object, these symbols are given the appropriate visibility attribute to ensure their correct use.

Presently there are two recommendations for establishing an object's interface. The first is to define all interface symbols using the __global directive, and reduce all other symbols to local using the -xldscope=hidden compiler option. This model provides the most flexibility. All global symbols are interposable, and allow for any copy relocations1 to be processed correctly.

The second model is to define all interface symbols using the __symbolic directive, and again reduce all other symbols to local using the -xldscope=hidden compiler option. Symbolic symbols (also termed protected) are globally visible, but have been internally bound to. This means that these symbols do not require symbolic runtime relocation, but can not be interposed upon, or have copy relocations against them.

In practice, I'd expect to see significant savings in the runtime relocation of any modules that used either model. However, the savings between using the __global or the __symbolic model may be harder to measure. In a nutshell, if you do not want a user to interpose upon your interfaces, and don't export data items, you can probably go with __symbolic. If in doubt, stick with the more flexible use of __global.
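
For reference, here's a minimal sketch of how these directives appear in code (the function names are hypothetical):

    % cat scope.cc
    __global   void api_func();    // interface - exported, interposable
    __symbolic void lib_func();    // exported, but internally bound to
    __hidden   void helper();      // always reduced to local

Symbols that carry no explicit directive take their scope from the -xldscope=value option, which is why combining explicit __global or __symbolic definitions with -xldscope=hidden establishes the object's interface.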

The following example uses C++ code that was furnished to me as being representative of what users may develop.

    % cat interface.h
    class item {
    protected:
        item();
    public:
        virtual void method1() = 0;
        virtual void method2() = 0;
        virtual ~item();
    };

    extern item *make_item();

    % cat implementation.cc
    #include "interface.h"

    class __global item; /* Ensures global linkage for any
                            implicitly generated members. */

    item::item() { }
    item::~item() { }

    class item_impl : public item {
        void method1();
        void method2();
    };

    void item_impl::method1() { }
    void item_impl::method2() { }

    void helper_func() { }

    __global item *make_item() {
        helper_func();
        return new item_impl;
    }

All interface symbols have employed the __global attribute. Compiling this module with -xldscope=hidden reveals the following symbol table entries.

    % elfdump -CsN.symtab implementation.so.1
    ...
    [31]  .... FUNC LOCL  H  0 .text  void helper_func()
    [32]  .... OBJT LOCL  H  0 .data  item_impl::__vtbl
    [35]  .... FUNC LOCL  H  0 .text  void item_impl::method1()
    [36]  .... FUNC LOCL  H  0 .text  void item_impl::method2()
    [54]  .... OBJT GLOB  D  0 .data  item::__vtbl
    [55]  .... FUNC GLOB  D  0 .text  item::item()
    [58]  .... FUNC GLOB  D  0 .text  item::~item #Nvariant 1()
    [59]  .... FUNC GLOB  D  0 .text  item*make_item()
    [61]  .... OBJT GLOB  D  0 .data  _edata
    [67]  .... FUNC GLOB  D  0 .text  item::~item()
    [77]  .... FUNC GLOB  D  0 .text  item::item #Nvariant 1()

Notice that the first 4 local (LOCL) symbols would normally have been defined as global without the use of the symbol definitions and compiler option. This is a simple example; as implementations get more complex, expect to see a larger fraction of symbols demoted to locals.

For a definition of other related compiler options, at least how they relate to C++, see Linker Scoping2.


1 Copy relocations are a technique employed to allow references from non-pic code to external data items, while maintaining the read-only permission of a typical text segment. This relocation's use, and overhead, can be avoided by designing shared objects that do not export data interfaces.

2 It's rumored the compiler folks are also working on __declspec and GCC __attribute__ clause implementations. These should aid porting code and interface definitions from other platforms.


Update - Sunday May 29, 2005

Giri Mandalika has posted a very detailed article on this topic, including the __declspec implementation.

Monday Dec 06, 2004

Static Linking - where did it go?

With Solaris 10 you can no longer build a static executable. It's not that ld(1) doesn't allow static linking, or the use of archives; it's just that libc.a, the archive version of libc.so.1, is no longer provided. This library provides the interfaces between user land and the kernel, and without this library it is rather hard to create any form of application.

We've been warning users against static linking for some time now, and linking against libc.a has been especially problematic. Every Solaris release, or update (even some patches), has resulted in some application that was built against libc.a failing. The problem is that libc is supposed to isolate an application from the user/kernel boundary, a boundary which can undergo changes from release to release.

If an application is built against libc.a, then any kernel interface it references is extracted from the archive and becomes a part of the application. Thus, this application can only run on a kernel that is in-sync with the kernel interfaces used. Should these interfaces change, the application is treading on shaky ground.

In addition, implementation details of libc, such as localization and the name service switch, require dynamic linking (they dlopen() other objects). As we anticipated libc.a being used in a static environment, any dynamic capabilities were #ifdef'ed out of the code. libc.a has always been a subset of libc.so.1.

One common failing we have discovered is that many folks built against libc.a but otherwise used dynamic objects to supply other interfaces. These applications, termed partially static, are particularly fragile. The dynamic objects these partially static applications invoke commonly depend on interfaces contained in libc.so.1. However, at runtime these dynamic objects are bound to the portions of libc that have been statically linked with the executable, leaving any other references to be bound to the dynamic libc.so.1. This combination of inconsistent interfaces can lead to chaos and ruin.

Because of this potential for destroying any application binary interface guarantee, the 64-bit version of Solaris never delivered any 64-bit system archive libraries. We figured we'd nip that problem in the bud right away.

But why did we wait until Solaris 10 to stop delivering 32-bit system archives? It turns out the merge of libc and libthread put the last nail in the coffin.

In earlier releases of Solaris, threaded applications needed to build against libthread. This library offered the true thread interfaces that allow you to create and manage threads. However, to allow library developers to create libraries that were thread aware (i.e., that could be used in a threading environment and a non-threading environment), libc also provided all the thread interfaces. These interfaces were basically no-op stubs. The gyrations we had to go through to make these two libraries cooperate were a story in themselves.

Now, if you built an application against libc.a and happened to reference the thread interfaces, you ended up with the no-op stubs becoming part of the application. This turned out to be a major problem when the application moved to another release and became bound to new dynamic objects.

In Solaris 10, libthread and libc have been merged. Effectively all applications are thread capable, but until any new threads are created, the application remains single threaded. This model simplifies application and especially library development because there are no longer three process models to contend with (threaded, non-threaded, and statically linked). There is now only one model, one that is thread-capable. Libraries can be assured they always operate in a thread-capable application environment. These libraries can take advantage of threading interfaces and features like thread-local storage that cannot be provided in a non-thread-capable environment.

The merger of libthread and libc means any partially static application is doomed to failure, even if a stripped-down, crippled archive version of libc.a were made available.

Therefore, to put to rest the consistent failure of partially static applications from release to release, we've now made it impossible to make such applications.

Some folks thought of static applications as being a means of insulating themselves from system changes. But as I explained above, they were not insulating themselves from user/kernel interface changes.

Note that there is some flexibility even in a dynamic linking environment. Although applications are built to encode the interpreter /usr/lib/ld.so.1, applications can be built to specify an alternative interpreter:

    % cp  /usr/lib/ld.so.1  /local/safe/lib
    % cp  /usr/lib/libc.so.1  /local/safe/lib
    ...
    % LD_OPTIONS=-I/local/safe/lib/ld.so.1 \
        cc -o main main.c -R/local/safe/lib ...

Or, you can invoke an alternative interpreter directly:

    % /local/safe/lib/ld.so.1  main ...

But, even if you create an environment with an alternate runtime linker and dynamic libc, it must be in-sync with the kernel on which it will execute. Otherwise, this technique is no more robust than using an archive version of libc.a.

See also this explanation.

Friday Oct 22, 2004

Shared Object Filters

Filters are a class of shared objects that are available with Solaris. These objects allow for the redirection of symbol bindings at runtime. They have been employed to provide standards interfaces and to allow the selection of optimized function implementations at runtime. For those of you unfamiliar with filters, here's an introduction, plus a glimpse of some new techniques available with Solaris 10.

Filters can exist in two forms, standard filters and auxiliary filters. At runtime, these objects redirect bindings from themselves to alternative shared objects, known as filtees. Shared objects are identified as filters by using the link-editor's -F and -f options.

Standard filters provide no implementation for the interfaces they define. Essentially they provide a symbol table. When used to build a dynamic object they satisfy the symbol resolution requirements of the link-editing process. However at runtime, standard filters redirect a symbol binding from themselves to a filtee.
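
As a minimal sketch, a standard filter can be built as follows; the file names are hypothetical, and the stub body exists only to satisfy the link-edit:

    % cat filter.c
    int foo() { return (0); }     /* stub - never executed at runtime */
    % cc -Kpic -c filter.c
    % ld -o libfilter.so.1 -G -F libreal.so.1 filter.o

Any object linked against libfilter.so.1 resolves foo() at link-edit time from the filter, but at runtime the binding is redirected to the foo() within the filtee libreal.so.1.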

Standard filters have been used to implement libdl.so.1, the library that offers the dynamic linking interfaces. This library has no implementation details; it simply points to the real implementation within the runtime linker:

    % elfdump -d /usr/lib/libdl.so.1

    Dynamic Section:  .dynamic
      index  tag       value
      [0]  SONAME      0x138  libdl.so.1
      [1]  FILTER      0x143  /usr/lib/ld.so.1
      ...

Auxiliary filters work in much the same way; however, if a filtee implementation can't be found, the symbol binding is satisfied by the filter itself. Basically, auxiliary filters allow an interface to find a better alternative. If an alternative is found, it will be used; if not, the generic implementation provided by the filter acts as a fallback.

Auxiliary filters have been used to provide platform specific implementations. Typically, these are implementations that provide a performance improvement on various platforms. libc.so.1 has used this technique to provide optimized versions of the memcpy() family of routines:

    % elfdump -d /usr/lib/libc.so.1

    Dynamic Section:  .dynamic
      index  tag        value
      ...
      [3]  SONAME      0x6280  libc.so.1
      [4]  AUXILIARY   0x628a  /usr/platform/$PLATFORM/lib/libc_psr.so.1
      ...

You can observe that a symbol binding has been satisfied by a filtee by using the runtime linker's tracing. The following output shows the symbol memset being searched for in the application date, the dependency libc.so.1, and then in libc's filtee, libc_psr.so.1.

    % LD_DEBUG=symbols,bindings  date
    .....
    11055: symbol=memset;  lookup in file=/usr/bin/date
    11055: symbol=memset;  lookup in file=/usr/lib/libc.so.1
    11055: symbol=memset;  lookup in file=/usr/platform/SUNW,Sun-Fire/lib/libc_psr.so.1
    11055: binding file=/usr/lib/libc.so.1 to \
        file=/usr/platform/SUNW,Sun-Fire/lib/libc_psr.so.1: symbol `memset'
    .....

Until now, a filter has been an identification applied to a whole shared object. With Solaris 10, per-symbol filtering has been introduced. This allows individual symbol table entries to identify themselves as standard or auxiliary filters. These filters provide greater flexibility, together with less runtime overhead, than whole object filters. Individual symbols are identified as filters using mapfile entries at the time an object is built.

For example, libc.so.1 now provides a number of per-symbol filters. Each filter is defined using a mapfile entry:

   % cat mapfile
   SUNW_1.22 {
       global:
           ....
           dlopen = FUNCTION FILTER /usr/lib/ld.so.1;
           dlsym = FUNCTION FILTER /usr/lib/ld.so.1;
           ....
   };
   ....
   SUNW_0.7 {
      global:
          ....
          memcmp = AUXILIARY /platform/$PLATFORM/lib/libc_psr.so.1;
          memcpy = AUXILIARY /platform/$PLATFORM/lib/libc_psr.so.1;
          ....
   };

The above definitions provide a couple of advantages. First, you no longer need to link against libdl to obtain the dynamic linking family of routines. Second, the overhead of searching for a filtee will only occur if a search for the associated symbol is requested. With whole object filters, any reference to a symbol within the filter would trigger a filtee lookup, plus every interface offered by the filter would be searched for within the filtee.

Per-symbol filters have also proved useful in consolidating existing interfaces. For example, for historic standards compliance, libc.so.1 has offered a small number of math routines. However, the full family of math routines is provided in libm.so.2, the library that most math users link with. Providing the small family of duplicate math routines in libc was a maintenance burden, plus there was always the chance of them getting out of sync. With per-symbol filtering, the libc interfaces are maintained, while pointing at the one true implementation. elfdump(1) can be used to reveal the filter symbols (F) offered by a shared object:

   % elfdump -y /lib/libc.so.1

   Syminfo Section:  .SUNW_syminfo
     index  flgs         bound to           symbol
      ....
      [93]  F        [1] libm.so.2          isnand
      [95]  F        [1] libm.so.2          isnanf
      ....

The definition of a filtee has often employed runtime tokens such as $PLATFORM. These tokens are expanded to provide pathnames specific to the environment in which the filter is found. A new capability provided with Solaris 10 is the $HWCAP token. This token is used to identify a directory in which one or more hardware capability libraries can be found.

Shared objects can be built to record their hardware capabilities requirements. Filtees can be constructed that use various hardware capabilities as a means of optimizing their performance. These filtees can be collected in a single directory. The $HWCAP token can then be employed by the filter to provide the selection of the optimal filtee at runtime:

   % elfdump -H /opt/ISV/lib/hwcap/*

   /opt/ISV/lib/hwcap/libfoo_hwcap1.so.1:

   Hardware/Software Capabilities Section:  .SUNW_cap
     index  tag               value
       [0]  CA_SUNW_HW_1     0x869  [ SSE  MMX  CMOV  SEP  FPU ]

   /opt/ISV/lib/hwcap/libfoo_hwcap2.so.1:

   Hardware/Software Capabilities Section:  .SUNW_cap
     index  tag               value
      [0]  CA_SUNW_HW_1     0x1871  [ SSE2  SSE  MMX  CMOV  AMD_SYSC  FPU ]

   % elfdump -d /opt/ISV/lib/libfoo.so.1

    Dynamic Section:  .dynamic
      index  tag       value
      ...
      [3]  SONAME      0x138  libfoo.so.1
      [4]  AUXILIARY   0x4124 /opt/ISV/lib/hwcap/$HWCAP
      ...

The hardware capabilities of a platform are conveyed to the runtime linker from the kernel. The runtime linker then matches these capabilities against the requirements of each filtee. The filtees are then sorted in descending order of their hardware capability values. These sorted filtees are used to resolve symbols that are defined within the filter.

This model of file identification provides greater flexibility than the existing use of $PLATFORM, and is well suited to filtering use.

As usual, examples and all the gory details of filters and the new techniques outlined above can be found in the Solaris 10 Linker and Libraries Guide.

Discussions on how to alter any hardware capabilities established by the compilers can be found here, and here.

Wednesday Sep 29, 2004

Tracing a link-edit

Since Solaris 2.0, the link-editors have provided a mechanism for tracing what they're doing. As this mechanism has been around for so long, and I've used some small examples in previous postings, I figured most folks knew of its existence. I was reminded the other day that this isn't the case. For those of you unfamiliar with this tracing, here's an introduction, plus a glimpse of a new analysis tool available with Solaris 10.

You can set the environment variable LD_DEBUG to one or more pre-defined tokens. This setting causes the runtime linker, ld.so.1(1), to display information regarding the processing of any application that inherits this environment variable. The special token help provides a list of token capabilities without executing any application.

One of the most common tracing selections reveals the binding of a symbol reference to a symbol definition.

    % LD_DEBUG=bindings  main
    .....
    00966: binding file=main to file=/lib/libc.so.1 symbol `_iob'
    .....
    00966: binding file=/lib/libc.so.1 to file=main: symbol `_end'
    .....
    00966: 1: transferring control: main
    .....
    00966: 1: binding file=main to file=/lib/libc.so.1: symbol `atexit'
    .....
    00966: 1: binding file=main to file=/lib/libc.so.1: symbol `exit'

Those bindings that occur before transferring to main are the immediate (data) bindings. These bindings must be completed before any user code is executed. Those bindings that occur after the transfer to main are established when the associated function is first called. These are lazy bindings.

Another common tracing selection reveals what files are loaded.

    % LD_DEBUG=files  main
    .....
    16763: file=libc.so.1;  needed by main
    16763: file=/lib/libc.so.1  [ ELF ]; generating link map
    .....
    16763: 1: transferring control: ./main
    .....
    16763: 1: file=/lib/libc.so.1;  \
        filter for /platform/$PLATFORM/lib/libc_psr.so.1
    16763: 1: file=/platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1;  \
        filtered by /lib/libc.so.1
    16763: 1: file=/platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1  [ ELF ]; \
        generating link map
    .....
    16763: 1: file=libelf.so.1;  dlopen() called from file=./main \
        [ RTLD_LAZY  RTLD_LOCAL  RTLD_GROUP  RTLD_WORLD ]
    16763: 1: file=/lib/libelf.so.1  [ ELF ]; generating link map

This reveals initial dependencies that are loaded prior to transferring control to main. It also reveals objects that are loaded during process execution, such as filters and dlopen(3c) requests.

Note, the environment variable LD_DEBUG_OUTPUT can be used to specify a file name to which diagnostics are written (the file name gets appended with the pid). This is helpful to prevent the tracing information from interfering with normal program output, or for collecting large amounts of data for later processing.
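
For example, a simple sketch, assuming the same main used above:

    % LD_DEBUG=bindings LD_DEBUG_OUTPUT=/tmp/dbg main

The diagnostics end up in /tmp/dbg.<pid> rather than on the terminal, leaving the program's own output undisturbed.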

In a previous posting I described how you could discover unused, or unreferenced dependencies. You can also discover these dependencies at runtime.

    % LD_DEBUG=unused  main
    .....
    11143: 1: file=libWWW.so.1  unused: does not satisfy any references
    11143: 1: file=libXXX.so.1  unused: does not satisfy any references
    .....
    11143: 1: transferring control: ./main
    .....

Unused objects are determined prior to calling main and after any objects are loaded during process execution. The two libraries above aren't referenced before main, and thus make ideal lazy-loading candidates (that's if they are used at all).

Lastly, there are our old friends, .init sections. Executing these sections in an attempt to fulfill the expectations of modern languages (I'm being polite here), and expected programming techniques, has been, shall we say, challenging. .init tracing is produced no matter what debugging token you choose.

    % LD_DEBUG=basic  main
    .....
    34561: 1: calling .init (from sorted order): libYYY.so.1
    34561: 1: calling .init (done): libYYY.so.1
    .....
    34561: 1: calling .init (from sorted order): libZZZ.so.1
    .....
    34561: 1: calling .init (dynamically triggered): libAAA.so.1
    34561: 1: calling .init (done): libAAA.so.1
    .....
    34561: 1: calling .init (done): libZZZ.so.1

Note that in this example, the topologically sorted order established to fire .init's has been interrupted. We dynamically fire the .init of libAAA.so.1 that has been bound to while running the .init of libZZZ.so.1. Try to avoid this. I've seen bindings cycle back into dependencies whose .init hasn't completed.

The debugging library that provides these tracing diagnostics is also available to the link-editor, ld(1). This debugging library provides a common diagnostic format for tracing both linkers. Use the link-editor's -D option to obtain tracing info. As most compilers have already laid claim to this option, the LD_OPTIONS environment variable provides a convenient setting. For example, to see all the gory details of the symbol resolution undertaken to build an application, try:

    % LD_OPTIONS=-Dsymbols,detail cc -o main $(OBJS) $(LIBS) ...

and stand back ... the output can be substantial.

Although tracing a process at runtime can provide useful information to help diagnose process bindings, the output can be substantial. Plus, it only tells you what bindings have occurred. This information lacks the full symbolic interface data of each object involved, which in turn can hide what you think should be occurring. In Solaris 10, we added a new utility, lari(1), which provides the Link Analysis of Runtime Interfaces.

This perl(1) script analyzes a debugging trace, together with the symbol tables of each object involved in a process. lari(1) tries to discover any interesting symbol relationships. Interesting typically means that a symbol name exists in more than one dynamic object, and interposition is at play. Interposition can be your friend, or your enemy - lari(1) doesn't know which. But historically, a number of application failures or irregularities have boiled down to some unexpected interposition, which at the time was hard to track down.

For example, a typical interposition might show up as:

   % lari  main
   [2:3]: foo(): /opt/ISV.I/lib/libfoo.so.1
   [2:0]: foo(): /opt/ISV.II/lib/libbar.so.1
   [2:4]: bar[0x80]: /opt/ISV.I/lib/libfoo.so.1
   [2:0]: bar[0x100]: /opt/ISV.II/lib/libbar.so.1

Here, two versions of the function foo(), and two versions of the data item bar[], exist. With interposition, all bindings have resolved to the first library loaded. Hopefully the 3 callers of foo() expect the signature and functionality provided by ISV.I. But you have to wonder, do the 4 users of bar[] expect the array to be 0x80 or 0x100 in size?

lari(1) also uncovers direct bindings or symbols that are defined with protected visibility. These can result in multiple instances of a symbol being bound to from different callers:

   % lari  main
   [2:1D]: foo(): ./libA.so
   [2:1D]: foo(): ./libB.so

Again, perhaps this is what the user wants to achieve, perhaps not ... but it is interesting.

There are many more permutations of symbol diagnostic that can be produced by lari(1), including the identification of explicit interposition (such as preloading, or objects built with -z interpose), copy relocations, and dlsym(3c) requests. Plus, as lari(1) is effectively discovering the interfaces used by each object within a process, it can create versioning mapfiles that can be used as templates to rebuild each object.

Monday Aug 30, 2004

Relocations - careful with that debugging flag

I received an application from a customer the other day. It's quite a big sucker, consisting of the application and over 70 shared objects (that's besides the system objects that also get used).

    % size -x main *.so
    main:       2df35c + 2675a4 + 80918f8 = 0x85d81f8
    libxxx.so: 64d4d9c + 9af9f6 + 19604ba = 0x87e4c4c
    libyyy.so: 4db7aeb + 76aa4c + 32cc16c = 0x87ee6a3
    libzzz.so: 3f347ce + d8ebb1 + 4642a3b = 0x9305dba
    ....

The customer has complained that it takes a long time to load this application. In particular, it takes a long time to verify their objects using ldd(1) and the -r option; see ldd(1) in the Solaris 10 man pages section 1: User Commands.

Using ldd(1) and the -d option emulates the cost of starting a process. The relocation processing is exactly the same as applied by ld.so.1(1) at runtime. Using the -r option processes all relocations as would be applied by ld.so.1(1) if the environment variable LD_BIND_NOW were in effect. Using ldd(1) and the -r option is a convenient way of testing that all symbol references can be found for a particular object.

The shared objects supplied by the customer do not specify any of their dependencies. In fact, the application seems responsible for establishing all dependencies, not only those that the application references, but also those needed to satisfy all dependencies. If each shared object defined its dependencies, then ldd(1) could be used on each object to validate its symbol requirements. However, with this set of objects, the only means of validating symbol requirements is to run ldd(1) against the application, and this is taking a long time.

A quick poke around with elfdump(1) reveals that there are over 3.3 million relocations to process for this application and its dependencies. Some profiling (DTrace is your friend here) revealed that relocation processing is taking nearly 99% of the startup cost. Locating and mapping all the objects is trivial.

Looking a little deeper, I found that around 2.3 million relocations are RELATIVE relocations. These are relocations that simply require the base address of the object to be added to the value at the relocation offset. This is a simple operation, involving no symbol lookup, and it accounts for only a few percent of the cost.

The rest of the startup cost stems from the symbolic relocations, of which some 740,000 needed processing with ldd(1) and the -d option. Poking some more revealed that 680,000 of these symbols are of the form $XBGEQEsZEHHBGQS.... (it's the $X prefix that's the giveaway). These are local symbols that have been made unique and promoted to global symbols by the compilers when the -g (debugging) option is used. By all accounts, they allow the debuggers to provide fix-and-continue processing, or, if your compilers have this capability, inter-object optimization.

I don't have the source for this set of objects to experiment with not using -g. But I'm left concluding that the bulk of the startup cost of this process is due to these $X.... symbols.

If you don't want fix-and-continue, be careful how you use the compiler flags. This overhead in relocation processing probably isn't what you want in production software.

Note, you can also build objects with the -zcombreloc flag of ld(1). This option combines relocation sections into one table. The RELATIVE relocations are all concatenated, and represented by dynamic entries that allow the runtime linker to process them through an optimized loop that is even faster than normal relative relocation processing.
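
For example, a minimal sketch, driving ld(1) through the compiler; the RELACOUNT check assumes elfdump's usual dynamic section display:

    % LD_OPTIONS=-zcombreloc cc -o libfoo.so.1 -G -Kpic foo.c
    % elfdump -d libfoo.so.1 | fgrep RELACOUNT

The RELACOUNT dynamic entry records how many relative relocations were concatenated at the head of the combined table for the runtime linker's optimized loop.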

Sunday Aug 22, 2004

Dynamic Object Versioning

For some time now, we've been versioning core system libraries. You can display version definitions, and version requirements with pvs(1). For example, the latest version of libelf.so.1 from Solaris 10, provides the following versions:

    % pvs -d /lib/libelf.so.1
        libelf.so.1;
        SUNW_1.5;
        SUNW_1.4;
        ....
        SUNWprivate_1.1;

So, what do these versions provide? Shared object versioning has often been established with various conventions of renaming the file itself with different major or minor (or micro) version numbers. However, as applications have become more complex, specifically because they are constructed from objects that are asynchronously delivered from external partners, this file naming convention can be problematic.

In developing the core Solaris libraries, we've been rather obsessed with compatibility, and rather than expect customers to rebuild against different shared object file names (i.e., libfoo.so.1, and later libfoo.so.2), we've maintained compatibility by defining fixed interface sets within the same library file. And the only changes we've made to the library are to add new interface sets. These interface sets are described by version names.

Now you could maintain compatibility by retaining all existing public interfaces, and only adding new interfaces, without the versioning scheme. However, the version scheme has a couple of advantages:

  • consumers of the interface sets record their requirements on the version name they reference.

  • establishing interface sets removes unnecessary interfaces from the name-space.

  • the version sets provide a convenient means of policing interface evolution.

When a consumer references a versioned shared object, the version name representing the interfaces the consumer references is recorded. For example, an application that references the elf_getshnum(3elf) interface from libelf.so.1 will record a dependency on the SUNW_1.4 version:

    % cc -o main main.c -lelf
    % pvs -r main
        libelf.so.1 (SUNW_1.4);

This version name requirement is verified at runtime. Therefore, should this application be executed in an environment consisting of an older libelf.so.1, one that perhaps only offers version names up to SUNW_1.3, then a fatal error will result when libelf.so.1 is processed:

    % pvs -dn /lib/libelf.so.1
        SUNW_1.3;
        SUNWprivate_1.1;
    % main
    ld.so.1: ./main: fatal: libelf.so.1: version `SUNW_1.4' not found \
        (required by file ./main)

This verification might seem simplistic. And won't the application be terminated anyway if a required interface can't be located? Well yes, but function binding normally occurs at the time the function is first called. And this call can be some time after an application is started (think of scientific applications that can run for days or weeks). It is far better to be informed that an interface can't be located when a library is first loaded, than to be killed some time later when a specific interface can't be found.

Defining a version typically results in the demotion of many other global symbols to local scope. This localization can prevent unintended symbol collisions. For example, most shared objects are built from many relocatable objects, each referencing one another. The interface that the developer wishes to export from the shared object is normally a subset of the global symbols that would otherwise remain visible.

Version definitions can be defined using a mapfile. For example, the following mapfile defines a version containing two interfaces. Any other global symbols that would normally be made available by the objects that contribute to the shared object are demoted, and hence hidden as locals:

    % cat mapfile
    ISV_1.1 {
        global:
            foo1;
            foo2;
        local:
            *;
    };
    % cc -o libfoo.so.1 -G -Kpic -Mmapfile foo.c bar.c ...
    % pvs -dos libfoo.so.1
    libfoo.so.1 -       ISV_1.1: foo1;
    libfoo.so.1 -       ISV_1.1: foo2;

The demotion of unnecessary global symbols to locals greatly reduces the relocation requirements of the object at runtime, and can significantly reduce the runtime startup cost of loading the object.

Of course, interface compatibility requires a disciplined approach to maintaining interfaces. In the previous example, should the signature of foo1() be changed, or foo2() be deleted, then the use of a version name is meaningless. Any application that was built against the original interfaces will fail at runtime when the new library is delivered, even though the version name verification will have been satisfied.

With the core Solaris libraries we maintain compatibility as we evolve through new releases by maintaining existing public interfaces and only adding new version sets. Auditing of the version sets helps catch any mistaken interface deletions or additions. Yeah, we fall foul of cut-and-paste errors too :-)

For more information on versioning refer to the Versioning Quick Reference. Or for a detailed description refer to Application Binary Interfaces and Versioning.

Sunday Aug 01, 2004

Lazy Loading - there's even a fall back

In my previous posting, I described the use of lazy loading. Of course, when we initially played with an implementation of this technology, a couple of applications immediately fell over. It turns out that a fall back was necessary.

Let's say an application developer creates an application with two dependencies. The developer wishes to employ lazy loading for both dependencies.

    % ldd main
        foo.so =>        ./foo.so
        bar.so =>        ./bar.so
        ...

The application developer has no control over the dependency bar.so, as this dependency is provided by an outside party. In addition, this shared object has its own dependency on foo.so; however, it does not express the required dependency information. If we were to inspect this dependency, we would see that it is not ldd(1) clean.

    % ldd -r bar.so
        symbol not found: foo     (./bar.so)

The only reason this library has been successfully employed by any application is because the application, or some other shared object within the process, has made the dependency foo.so available. This is probably more by accident than design, but sadly it is an all too common occurrence.

Now, suppose the application main makes reference to a symbol that causes the lazy loading of bar.so before the application makes reference to a symbol that would cause the lazy loading of foo.so to occur.

    % LD_DEBUG=bindings,symbols,files main  
    .....
    07683: 1: transferring control: ./main
    .....
    07683: 1: file=bar.so;  lazy loading from file=./main: symbol=bar
    .....
    07683: 1: binding file=./main to file=./bar.so: symbol `bar'

When control is passed to bar(), the reference it makes to its implicit dependency foo() is not going to be found, because the shared object foo.so is not yet available. Because this scenario is so common, the runtime linker provides a fall back. If a symbol can not be found, and lazy loadable dependencies are still pending, the runtime linker will process these pending dependencies in a final attempt to locate the symbol. This can be observed from the remaining debugging output.

    07683: 1: symbol=foo;  lookup in file=./main  [ ELF ]
    07683: 1: symbol=foo;  lookup in file=./bar.so  [ ELF ]
    07683: 1: 
    07683: 1: rescanning for lazy dependencies for symbol: foo
    07683: 1: 
    07683: 1: file=foo.so;  lazy loading from file=./main: symbol=foo
    .....
    07683: 1: binding file=./bar.so to file=./foo.so: symbol `foo'

Of course, there can be a down-side to this fall back. If main were to have many lazy loadable dependencies, each will be processed until foo() is found. Thus, several dependencies may get loaded that aren't necessary. The use of lazy loading is never going to be more expensive than non-lazy loading, but if this fall back mechanism has to kick in to find implicit dependencies, the advantage of lazy loading is going to be compromised.

To prevent lazy loading from being compromised, always record those dependencies you need (and nothing else).
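
In this example, a sketch of the fix would be for the outside party to rebuild bar.so with its dependency recorded, using -z defs so that any unrecorded dependency fails the build rather than the run:

    % cc -o bar.so bar.c -G -Kpic -zdefs -R'$ORIGIN' foo.so

Built this way, loading bar.so causes the runtime linker to load foo.so through bar.so's own dependency information, and the fall back rescan never needs to kick in.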

Tuesday Jul 27, 2004

Dependencies - perhaps they can be lazily loaded

In a previous posting, I stated that you should only record those dependencies you need, and nothing else. There's another step you can take to reduce start-up processing overhead.

Dynamic objects need to resolve symbolic references from each other. Function calls are typically implemented through an indirection that allows the function binding to be deferred until the function call is first made. See When Relocations Are Performed. Because of this deferral, it is also possible to cause the defining dependency to be loaded when the function call is first made. This model is referred to as Lazy Loading.

To establish lazy loading, you must pass the -z lazyload option to ld(1) when you build your dynamic object. In addition, the association of a symbol reference to a dependency requires that the dependency is specified as part of the link-edit. It is recommended that you use the link-editor's -z defs option to ensure that all dependencies are specified when you build your dynamic object. The following example establishes lazy dependencies for the references foo() and bar().

    % cat wally.c
    extern void foo(), bar();

    void wally(int who)
    {
        who ? foo() : bar();
    }
    % cc -o wally.so wally.c -G -Kpic -zdefs -zlazyload -R'$ORIGIN' foo.so bar.so

The lazy loading attribute of these dependencies can be displayed with elfdump(1).

   % elfdump -d wally.so | egrep "NEEDED|POSFLAG"
        [0]  POSFLAG_1        0x1               [ LAZY ]
        [1]  NEEDED           0x66              foo.so
        [2]  POSFLAG_1        0x1               [ LAZY ]
        [3]  NEEDED           0x6d              bar.so

By default, ldd(1) displays all dependencies, in that it forces any lazy loaded objects to be processed. To reveal lazy loading, use the -L option. For example, when a dynamic object is loaded into memory, all data relocations are performed before the object can gain control. Thus, the following operation reveals that neither dependency is loaded.

    % ldd -Ld wally.so
    %

Once function relocations are processed, both dependencies are loaded to resolve the function reference.

    % ldd -Lr wally.so
        foo.so =>       ./foo.so
        bar.so =>       ./bar.so

ldd(1) becomes a convenient tool for discovering whether lazy loading might be applicable. Suppose we rebuilt wally.so without the -z lazyload option. And recall from my previous posting that the -u option can be used to discover unused dependencies.

    % cc -o wally.so wally.c -G -Kpic -zdefs -R'$ORIGIN' foo.so bar.so
    % ldd -Ldu wally.so
        foo.so =>        ./foo.so
        bar.so =>        ./bar.so

      unused object=./foo.so
      unused object=./bar.so

This has revealed that loading wally.so, and relocating it as would occur at process startup, did not require the dependencies foo.so or bar.so to be loaded. This confirms that these two dependencies can be lazily loaded when reference to them is first made.

Lazy loading can be observed at runtime using the runtime linker's debugging capabilities (LD_DEBUG=files). For example, if wally() was called with a zero argument, we'd see bar.so lazily loaded.

   % LD_DEBUG=files main
   .....
   25670: 1: transferring control: ./main
   .....
   25670: 1: file=bar.so;  lazy loading from file=./wally.so: symbol=bar
   .....

Note, not only does lazy loading have the potential of reducing the cost of start-up processing, but if lazily loaded references are never called, the dependencies will never be loaded as part of the process.

Thursday Jul 22, 2004

Linker Alien Spotting

Excellent, Mike has decided to join the party.

Thursday Jul 15, 2004

Dependencies - define what you need, and nothing else

I recently attended Usenix, where Bryan explained how DTrace had been used to uncover some excessive system load brought on by the behavior of one application. A member of the audience asked whether the application was uncovering a poorly implemented part of the system. Bryan responded that in such cases the system will always be analyzed to determine whether it could do better. But there comes a point where, if an application requests an expensive service, that's what it will get. Perhaps the application should be reexamined to see if it needs the service in the first place?

This observation is very applicable to the runtime linking environment. Over the years we've spent countless hours pruning the cost of ld.so.1(1), only to see little improvement materialize with real applications. Put another way, there's no better means of reducing the overhead of servicing a particular operation than not requesting the operation in the first place :-)

Think about the work the runtime linker has to do to load an object. It has to find the object (sometimes through a plethora of LD_LIBRARY_PATH components), load the object, and relocate it. The runtime linker then repeats this process for any dependencies of the loaded object. That's a lot of work. So why do so many applications load dependencies they don't need?

Perhaps it's sloppiness, too much Makefile cut-and-pasting, or the inheritance of global build flags. Or perhaps the developer doesn't realize a dependency isn't required. One way to discover dependency requirements is with ldd(1) and the -u option. For example, neither this application nor any of its dependencies makes reference to libmd5.so.1:

    % ldd -u -r app
    ...
    unused object=/lib/libmd5.so.1

Note the use of the -r option. We want to force ldd(1) to bind all relocations, both data and function. However, here we're wastefully loading libmd5.so.1; it should be removed as a dependency.

The -u option uncovers totally unused objects, but there can still be wasteful references. For example, the same application reveals that a number of objects have wasteful dependencies:

    % ldd -U -r app
    ...
    unreferenced object=/usr/openwin/lib/libX11.so.4; unused dependency of app
    unreferenced object=/usr/openwin/lib/libXt.so.4; unused dependency of app
    unreferenced object=/lib/libw.so.1; unused dependency of app

Although the X libraries are used by some dependency within the process, they're not referenced by the application. There are data structures maintained by the runtime linker that track dependencies. If a dependency isn't required, we've wasted time creating these data structures. Also, should the object that requires the X libraries be redelivered in a form that no longer needs the X libraries, the application is still going to cause them to be wastefully loaded.

To reduce system overhead, only record those dependencies you need, and nothing else. As part of building the core OS, we run scripts that perform actions such as ldd(1) -U in an attempt to prevent unnecessary dependency loading from creeping in.

Note, you can also observe unused object processing using the runtime linker's debugging capabilities (LD_DEBUG=unused). Or, you can uncover unused objects during a link-edit using the same debugging technique (LD_OPTIONS=-Dunused). Another way of pruning unwanted dependencies is to use the -z ignore option of ld(1) when building your application or shared object.
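
For example, here's a hypothetical rebuild of the earlier application using -z ignore. Note that this option applies to the dependencies that follow it on the command line, so place it before the libraries you want ld(1) to consider discarding:

    % cc -o app app.c -zignore -L/usr/openwin/lib -lX11 -lXt -lw ...
    % ldd -U -r app
    %

With the unreferenced dependencies no longer recorded, ldd(1) -U has nothing to report.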

Saturday Jul 10, 2004

LD_LIBRARY_PATH - just say no

A recent email discussion reminded me of how fragile, and prevalent, LD_LIBRARY_PATH use is. Within a development environment, this variable is very useful. I use it all the time to experiment with new libraries. But within a production environment, use of this environment variable can be problematic. See Directories Searched by the Runtime Linker for an overview of LD_LIBRARY_PATH use at runtime.

People use this environment variable to establish search paths for applications whose dependencies do not reside in constant locations. Sometimes wrapper scripts are employed to set this variable; other times users maintain an LD_LIBRARY_PATH within their .profile. This latter model can often get out of hand - try running:

    % ldd -s /usr/bin/date
    ...
    find object=libc.so.1; required by /usr/bin/date
	search path=/opt/ISV/lib	 (LD_LIBRARY_PATH)

If you have a large number of LD_LIBRARY_PATH components specified, you'll see libc.so.1 being wastefully searched for, until it is finally found in /usr/lib. Excessive LD_LIBRARY_PATH components don't help application startup performance.

Wrapper scripts attempt to compensate for inherited LD_LIBRARY_PATH use. For example, a version of acroread reveals:

    LD_LIBRARY_PATH="`prepend "$ACRO_INSTALL_DIR/$ACRO_CONFIG/lib:\
	$ACRO_INSTALL_DIR/$ACRO_CONFIG/lib" "$LD_LIBRARY_PATH"`"

The script is prepending its LD_LIBRARY_PATH requirement to any inherited definition. Although this provides the necessary environment for acroread to execute, we're still wasting time looking for any system libraries in the acroread sub-directories.

When 64-bit binaries came along, we had a bit of a dilemma over how to interpret LD_LIBRARY_PATH. But, because of its popularity, it was decided to leave it applicable to both classes of binary (64-bit and 32-bit), even though it's unusual for a directory to contain both 64-bit and 32-bit dependencies. We also added LD_LIBRARY_PATH_64 and LD_LIBRARY_PATH_32 as a means of specifying search paths that are specific to a class of objects. These class-specific environment variables are used instead of any generic LD_LIBRARY_PATH setting.
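
For example, given a hypothetical hierarchy where the 64-bit versions of the dependencies live in a 64 subdirectory:

    % LD_LIBRARY_PATH_64=/opt/ISV/lib/64 LD_LIBRARY_PATH_32=/opt/ISV/lib ./app

A 64-bit ./app searches only /opt/ISV/lib/64, while any 32-bit process it spawns searches only /opt/ISV/lib.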

Which leads me back to the recent email discussion. It seems a customer was setting both the _64 and _32 variables as part of their startup script, because both 64-bit and 32-bit processes could be spawned. However, one spawned process was acroread. Its LD_LIBRARY_PATH setting was being overridden by the _32 variable, and hence it failed to execute. Sigh.

Is there a solution to this mess? I guess we could keep bashing LD_LIBRARY_PATH into submission some way, but why not get rid of the LD_LIBRARY_PATH requirement altogether? This can be done. Applications and dependencies can be built to include a runpath using ld(1), and the -R option. This path is used to search for the dependencies of the object in which the runpath is recorded. If the dependencies are not in a constant location, use the $ORIGIN token as part of the pathname.
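
For example, here's a minimal sketch, assuming a hypothetical application delivered in a bin directory, with its dependencies alongside in lib:

    % cc -o bin/app app.c -R'$ORIGIN/../lib' -Llib -lfoo
    % elfdump -d bin/app | grep RPATH
         [4]  RPATH           0x152             $ORIGIN/../lib

(The index and offset shown are illustrative.) Wherever this bin/lib pair is installed, app locates its libfoo dependency relative to itself, with no LD_LIBRARY_PATH required.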

Is there a limitation to $ORIGIN use? Yes, as directed by the security folks, expansion of this token is not allowed for secure applications. But then again, for secure applications, LD_LIBRARY_PATH components are ignored for non-secure directories anyway. See Security.

For a flexible mechanism of finding dependencies, use a runpath that includes the $ORIGIN token, and try not to create secure applications :-)

Wednesday Jul 07, 2004

Hello there

So, blogs seem popular, and a couple of folks have suggested I start one, so here it is. I've been at Sun for 15 years, most of that time spent developing and maintaining the link-editors, various related tools and documentation, and building lots of software.

The link-editors start with ld(1), which takes various input from the compilers and typically spits out a dynamic executable or shared object. Then there's ld.so.1(1), the runtime linker, which takes a dynamic executable and combines it with its dependencies as part of executing a process. The utilities that complement these processes include crle(1), elfdump(1), pvs(1), and a bunch of support libraries, auditors, etc.

I maintain all related manual pages, and the Linker and Libraries Guide. This is one of the few manuals written by the engineers that maintain the code. The DTrace crew have followed this example, and have produced their own excellent documentation.

For those looking for link-editor information, I suggest starting with the latest and greatest, which at this point is a version of the Solaris 10 Linker and Libraries Guide available off of http://docs.sun.com. A good starting point is Appendix A - Link-Editor Quick Reference, a cheat sheet of the various objects that are created by the link-editor. From this section you can vector off to all sorts of gory details. If you're not running Solaris 10 yet (which you could be if you used Solaris Express), don't worry. Appendix D - Linker and Libraries Updates and New Features itemizes the major changes from release to release, so you can always determine what is, or isn't, available for your release.

But you might be running something newer than you think. The link-editors are delivered as part of the Solaris core OS; however, we're always providing new features that are required by other utilities, such as the compilers. And the compilers are delivered asynchronously with various OS releases. Consequently, we're always providing patches, and our patches are a snapshot of some of the latest bits available at the time the patch was created. Effectively, we only have one source base for the link-editors. Changes are made in one place, and integrated into the latest patches. Thus a patch to Solaris 8 or Solaris 9, SPARC and Intel, will be the same, and comprise a snapshot of what's been integrated into Solaris 10. Again, the best place to find documentation is the Solaris 10 Linker and Libraries Guide.

So, that completes this introduction. Hopefully I'll follow up with other postings, perhaps some clarification of existing practice, some new cheat-sheets, or other items that seem helpful. If you've got any comments, questions or advice for improvements, let us know. The door is always open, and we're always looking for ideas and feedback.

Oh yeah, you're supposed to get a little personal with this blog stuff aren't you? When I'm not working, I'm on a bike (road and mountain), or chasing my daughter around. And, having originated from the British Isles, a passion for real beer remains :-)
