Sun | Wednesday, January 31, 2018

Weak Filters: Dealing With libc Refactoring Over The Years

By: Ali Bahrami | Principal Software Engineer
(Written: December 2015)

libc Refactoring Over The Years

libc has always been the core library in Unix, providing the system calls, and basic math and I/O abilities (e.g. printf). In the era before dynamic linking, it was an archive, libc.a. It's safe to say that some part of libc is in every program. When dynamic linking was introduced to Unix with SunOS 4.x, libc was delivered as a shared library. Years later, we eliminated the ability to create static executables in Solaris 10, by no longer delivering the libc archive (libc.a). You are not required to use any other shared objects on Solaris, but if you build a program on Solaris today that does anything useful, your program will be dynamic, and be linked to libc.so.1.

A central organizational problem for libc is deciding what should be in it, and what should be kept out. libc is the most desirable real estate in the system, so everyone wants to be in it. That's usually a mistake. For most libraries, this isn't a complicated question — you can generally tell what belongs and what doesn't with a superficial glance. libc is different, because it contains a lot of seemingly unrelated functionality. Does printf() really need to be in the same library as socket(), or math? A basic truism for creating shared objects is that libraries should be combined if they are never used separately, and kept apart otherwise. At first glance, there's a lot in libc that would seem to be separable.

As the core library for the entire system, libc is something of a singularity, and normal rules break down under that pressure. Still, here are some approximate general rules we use to decide whether something belongs there.

  • Is it really general purpose, and likely to be used by lots of unrelated programs?
  • Is it callable from the C language?
  • History: Is it already in libc? Do other systems deliver it in libc, and if so, do programs expect to find it there?
  • Is it large?
  • Is it unproven, or a fad?
  • Removing APIs from libc is nearly impossible, as is changing a calling signature (return type, arguments). Is there any question that these things could change, become abandoned, or become unmaintainable?

That's clearly not a full list, and I'm sure we could add numerous other rules and qualifications. The overriding point is that it involves making judgments, sometimes subjective, and that opinions can change, particularly over time.

When Sun and AT&T made the move from SunOS 4.x to 5.x, more commonly known as Solaris 2, the general feeling of the time was that libc had become uncomfortably bloated. Solaris 2 was seen as a rare opportunity to fix perceived mistakes of the past, and so, libc was duly broken up, with many pieces migrating to other system libraries (libsocket, libnsl, etc). At the time, this was considered beneficial, because applications that don't need a particular type of functionality (e.g. networking) wouldn't pay for loading it, and everything would be cleaner, smaller, and simpler.

It didn't play out that way:

  • Developers found it confusing, leading to the creation of a whole genre of Solaris 2.x questions such as "Where did XXX go?", and "Why doesn't Solaris 2.x support XXX when SunOS 4.x did?" I was a Sun ISV in those days, creating an interactive language used by scientists, and I certainly asked my share of these.
  • It was thought that this functionality was unrelated, and most applications would only use a small subset, but in practice it became common to see large numbers of these libraries being pulled in.
  • The interdependency of things like threading APIs with other functionality turned out to be larger than anticipated.
  • The inefficiencies of multiple mappings and runtime linker processing for large numbers of small libraries cancel out the minor memory savings.
  • libc is largely pure text, and so, a single copy of most of libc ends up being shared by all the processes in the system.

The authors of these changes, expecting thanks and praise, were surprised to find that the new organization was universally disliked. You might think that hard measurement would win out over popularity, but the math doesn't really bear that out, particularly if you consider it within the context of Moore's Law. I happen to still have a CD for SunOS 4.1.3U1, dating from 1994. For purposes of comparison, I pulled a copy of libc off that CD. This was nearly the final release of SunOS 4.x, and so, is likely to have one of the larger versions of libc in the 4.x series:

% file libc.so.1.9
libc.so.1.9:    Sun demand paged SPARC executable dynamically linked 
% ls -alF libc.so.1.9
-rwxr-xr-x   1 ali      emvision  516096 Jan 20  1994 libc.so.1.9* 

Let's compare that to the libc in use on a SPARC system running a prerelease version of Solaris 11 Update 4 in December 2015:

% file /lib/libc.so.1
/lib/libc.so.1: ELF 32-bit MSB dynamic lib SPARC32PLUS Version 1, V8+ Required, dynamically linked, not stripped, no debugging information available
% ls -alF /lib/libc.so.1
-rwxr-xr-x   1 root     bin      2556980 Dec 16 12:50 /lib/libc.so.1*

In 1994, I ran SunOS 4.1.3U1 on a SPARCstation IPX, which boasted 16MB of memory. That was one of Sun's cheaper machines at the time, but I recall my company paying between $5,000 and $10,000 USD for it, a number that includes Sun's generous ISV discount. Today, I run Solaris 11 Update 4 on a PC. If my employer spent $1,000 on this machine, I'd be rather surprised. My cheap PC has 16GB of memory, 1024 times as much as the IPX. Over the same period, libc has grown to roughly 5 times its 1994 size. Relative to available memory, libc has shrunk by an amazing amount over the last 21 years. In 1994, a fully loaded libc represented about 3% of physical memory. Today, it represents about 0.015%. And that's using a low end PC for the comparison, rather than a T7-4 with 2TB of memory.
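The ratios in this comparison can be recomputed directly from the file sizes in the ls output above. The following is a minimal sketch; the function names are mine, and it assumes that "16MB" and "16GB" mean binary megabytes and gigabytes.

```c
#include <stdio.h>

/* Percentage of physical memory that a fully loaded library would occupy. */
double
share_percent(unsigned long long lib_bytes, unsigned long long mem_bytes)
{
        return (100.0 * (double)lib_bytes / (double)mem_bytes);
}

/* Print the 1994 vs. 2015 comparison using the sizes reported by ls. */
void
print_libc_shares(void)
{
        const unsigned long long mb = 1024ULL * 1024ULL;
        const unsigned long long gb = 1024ULL * mb;
        const unsigned long long libc_1994 = 516096ULL;   /* libc.so.1.9 */
        const unsigned long long libc_2015 = 2556980ULL;  /* /lib/libc.so.1 */

        (void) printf("libc growth factor: %.2f\n",
            (double)libc_2015 / (double)libc_1994);
        (void) printf("1994 share of 16MB: %.4f%%\n",
            share_percent(libc_1994, 16ULL * mb));
        (void) printf("2015 share of 16GB: %.4f%%\n",
            share_percent(libc_2015, 16ULL * gb));
}
```

Calling print_libc_shares() from main() prints the growth factor and both memory shares.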

libc could always be smaller, and perhaps it should be. Nonetheless, the conclusion should be obvious: libc "bloat" is nothing to lose sleep over. It was small in 1994, and now you can't even see it without a microscope.

Over the years, most of the libraries cleaved from libc have found their way back, and have been remerged, as have other core system components:

Solaris 2.6:   libintl, libw
Solaris 10:    libdl, libpthread, libsched, libthread
Solaris 11:    libaio, libdoor, librt, libsecdb
Solaris 11.4:  libnsl, libsendfile, libsocket, libxnet

libc Pure Filter Libraries

Recombining libc has made life easier in many respects, but it created a new problem: What to do about existing programs that link against the libraries that got merged back into libc? These emptied out libraries still exist, in the form of ELF filter objects, for the benefit of existing objects that are linked to them. At runtime, these filter dependencies are loaded, but when other objects access symbols from them, the runtime linker transparently redirects them to the filtee object. In the case of the above system libraries, the filtee is libc. One can think of filters as ELF-aware symbolic links.

The benefit of these filters is clear: They have allowed the system to evolve, while preserving our backward compatibility guarantee. The downside is less obvious: Ideally, no one would link to these objects anymore, as they represent pure overhead. We would keep them around forever, for old objects, but new objects would not use them, and their use would fade away over time. This solution is effective for small tightly controlled code bases. A primary example of that is the core OS itself, where we've manually eradicated their use, and use build checks to prevent them from coming back. That approach works well in our small and disciplined world, but it scales poorly in the wider world of multi-platform open source, and customers with 25 year old makefiles. New objects continue to link against these redundant filters, and likely always will.

Outside of the core OS, we've been able to apply ld unused dependency processing (-zignore or -zdiscard-unused=dependencies) to good effect in eliminating unnecessary dependencies, but that strategy is ineffective against these standard filters. The standard rule enforced by ld is that an external symbol is bound to the first dependency that offers it. Furthermore, convention, often enforced by the compilers that add arguments to the link, holds that libc is usually the last library on the link line. Hence, scenarios like this are common:

% cat test.c
#include <stdio.h>
#include <thread.h>

int
main(int argc, char **argv) 
{ 
        (void) printf("thr_self()=%d\n", thr_self());
        return (0);
}
% elfdump -d /lib/libthread.so.1 | grep FILTER
      [1]  FILTER       0xc8d    libc.so.1
% cc test.c -lthread -lc -zdiscard-unused=dependencies
% elfdump -d a.out | grep NEEDED
      [0]  NEEDED          0x12d      libthread.so.1
      [1]  NEEDED          0x145      libc.so.1

libthread is a standard filter on libc, so you might expect that it would be discarded by the above, but that's not what happens. Any libthread symbols will be bound to libthread, as it comes first in the link line. This makes libthread appear used, and therefore not eligible for removal.

It is tempting to simply have ld substitute the filtee for any standard filter named on the link line (e.g. libc for libthread), but this is not completely safe:

  • Although the filter does offer libc APIs, it only offers a subset. Replacing it with libc at the same spot could alter bindings to the symbols from the libraries between the filter and its filtee.
  • There can be a one-to-many relationship from filter to filtees, and per-symbol filtering means that the filtee can be different for each symbol.
  • If the filter is to something other than libc, it is possible that the link line doesn't contain that other library, and simply removing the filter will leave undefined symbols.
  • Sometimes, the filter serves other purposes, such as pulling in other dependencies. Removing it can break the application, often silently, in subtle ways only observed at runtime.

Clearly it would be fine in the vast majority of cases for ld to ignore bindings to a filter if the filtee is also present on the link line, but this requires a degree of human judgment. Weak filters provide a solution to this problem. Weak filters are identical to standard filters at runtime. The runtime linker considers them to be one and the same. They are also identical at link time, unless unused dependency checking (-zignore or -zdiscard-unused=dependencies) is enabled. However, when unused dependency checking is enabled, ld will allow symbols bound to a weak filter to be overridden by the symbol from the filtee. Using the previous example, and a version of libthread built as a weak filter:

% elfdump -d /lib/libthread.so.1 | grep FILTER
      [1]  FILTER       0xcb3       libc.so.1
     [14]  FLAGS_1      0x20021000  [ NODUMP NODIRECT WEAKFILTER ]
% cc test.c -lthread -lc -zdiscard-unused=dependencies
% elfdump -d a.out | grep NEEDED
      [0]  NEEDED          0x12d      libc.so.1

In Solaris 11 Update 4, all of the libc filters discussed above are delivered as weak filters, which ld can discard when -zignore or -zdiscard-unused=dependencies is used.
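The difference between the two link results can be illustrated with a toy model of the binding rule. This is a sketch and not how ld is implemented: dep_t, bind_symbol() and mark_used() are hypothetical names, each library is reduced to a list of symbol names, and unused dependency processing is assumed to be enabled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8

/* One dependency on the link line, reduced to what this model needs. */
typedef struct {
        const char *name;            /* e.g. "libthread.so.1" */
        const char *syms[MAX_SYMS];  /* NULL-terminated list of offered symbols */
        const char *filtee;          /* filtee name, or NULL if not a filter */
        int weak;                    /* nonzero if this is a weak filter */
} dep_t;

/* Return nonzero if dep offers a definition of sym. */
static int
offers(const dep_t *dep, const char *sym)
{
        for (size_t i = 0; i < MAX_SYMS && dep->syms[i] != NULL; i++) {
                if (strcmp(dep->syms[i], sym) == 0)
                        return (1);
        }
        return (0);
}

/*
 * Return the index of the dependency that sym binds to, or -1 if none.
 * The first dependency offering the symbol wins, except that a binding
 * to a weak filter is overridden by its filtee when the filtee is also
 * on the link line and offers the same symbol.
 */
int
bind_symbol(const dep_t *deps, size_t ndeps, const char *sym)
{
        for (size_t i = 0; i < ndeps; i++) {
                if (!offers(&deps[i], sym))
                        continue;
                if (deps[i].weak && deps[i].filtee != NULL) {
                        for (size_t j = 0; j < ndeps; j++) {
                                if (strcmp(deps[j].name, deps[i].filtee) == 0 &&
                                    offers(&deps[j], sym))
                                        return ((int)j);
                        }
                }
                return ((int)i);
        }
        return (-1);
}

/* Set used[i] to 1 if at least one undefined symbol binds to deps[i]. */
void
mark_used(const dep_t *deps, size_t ndeps, const char **undefs,
    size_t nundefs, int *used)
{
        for (size_t i = 0; i < ndeps; i++)
                used[i] = 0;
        for (size_t k = 0; k < nundefs; k++) {
                int b = bind_symbol(deps, ndeps, undefs[k]);

                if (b >= 0)
                        used[b] = 1;
        }
}
```

With a link line of libthread followed by libc and undefined symbols printf and thr_self, the model marks both libraries as used when libthread is a standard filter, and only libc when libthread is marked weak. Dependencies left unmarked are the ones that unused dependency processing would drop.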

Weak Filter Implementation

Weak filters are essentially standard filters, with a flag that informs the link-editor that they can be treated specially. As shown in the above example, that flag is the DF_1_WEAKFILTER bit of the DT_FLAGS_1 dynamic entry.
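As a sketch of what testing that bit looks like, the following checks a DT_FLAGS_1 value for the weak filter flag. The numeric values are assumptions inferred from the elfdump output above (0x20021000 decoded as NODUMP, NODIRECT and WEAKFILTER). Real code should take the DF_1_ definitions from the system's <sys/link.h> rather than the SKETCH_ constants used here.

```c
#include <stdint.h>

/*
 * Assumed bit values, inferred from the elfdump output shown earlier:
 * 0x20021000 = NODUMP | NODIRECT | WEAKFILTER.
 * Verify against <sys/link.h> before relying on them.
 */
#define SKETCH_DF_1_NODUMP      0x00001000u
#define SKETCH_DF_1_NODIRECT    0x00020000u
#define SKETCH_DF_1_WEAKFILTER  0x20000000u

/* Return nonzero if a DT_FLAGS_1 value marks the object as a weak filter. */
int
is_weak_filter(uint64_t flags_1)
{
        return ((flags_1 & SKETCH_DF_1_WEAKFILTER) != 0);
}
```

Passing the FLAGS_1 value from the weak libthread above (0x20021000) returns nonzero, while the same value without the top bit (0x00021000) returns zero.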

Filters can be created at both the object level and the per-symbol level. To indicate that a per-symbol filter is weak, the new SYMINFO_FLG_WEAKFILTER flag is set in conjunction with SYMINFO_FLG_FILTER. For instance, the symbol inet_aton from libresolv.so.2 is currently a per-symbol filter to libc, and will become a weak per-symbol filter:

    % elfdump -y /lib/libresolv.so.2 | grep inet_aton
      [144]  FW        [9] libc.so.1         inet_aton

To allow filters to be completely specified from a mapfile, as well as to allow for the creation of weak object level filters, the version 2 mapfile language gains a new top level directive:

    FILTER {
            FILTEE = soname;
            TYPE = filter_type;
    };

where filter_type can be one of STANDARD, WEAK, or AUXILIARY. For example, this is the definition used to create the weak filter version of libthread shown above:

    # The content that used to be in this library now resides in libc.so.1.
    # Make this library a weak filter so that ld can eliminate it as a
    # dependency when -z discard-unused=dependencies is used.
    FILTER {
            FILTEE = libc.so.1;
            TYPE = WEAK;
    };

To create a per-symbol weak filter, the FILTER symbol attribute is used:

    SYMBOL_xxx {
...
            symbol_name {
...
                    FILTER {
                            FILTEE = soname;
                            TYPE = filter_type;
                    };
...
            };
    };

The Numbers: Applying Weak Filters to Userland

The Userland consolidation, which can be found at https://github.com/oracle/solaris-userland, was the primary target for this project.

Clearly, weak filters are not a cure-all. Most projects are still better off manually removing these unnecessary libc filters from their Makefiles. The place where weak filters pay off is when building large quantities of FOSS (Free and Open Source Software), where changing the configuration scripts is intractable. The link-editor LD_OPTIONS environment variable can be used to pass extra options to ld without having to modify the Makefiles for such software. Userland is a prime example of an environment in which this technique works well.

Prior to integration, in order to gauge the effectiveness of weak filters on the FOSS code in the Userland consolidation, I built Userland using a build machine running bits that contained the linker support for weak filters, and which had the system filters listed above (libaio, etc) converted to weak filter form. Each Userland component creates a build subdirectory with its build artifacts and proto area. I searched these 'build' subdirectories for ELF objects, and compared their NEEDED records against those from a userland workspace built with stock bits. A large number of NEEDED records are removed, showing the value of the approach:

Library            Before   After   Removed   Percent

libcrypt.so.1         314     263        51       16%
libdoor.so.1            2       0         2      100%
libintl.so.1           31       0        31      100%
libnsl.so.1           687      33       654       96%
libpthread.so.1      2744    1973       771       28%
libresolv.so.2        311     307         4        1%
librt.so.1            627     271       356       57%
libsocket.so.1        698     232       466       67%
libthread.so.1       2266      56      2210       98%

In principle, all of these filters could be completely eliminated. The gain for Userland is impressive, but it does not quite achieve 100%. There are 2 main reasons for this:

  1. Some libraries have per-symbol filters, but also provide other content directly that may be legitimately used by a program linking to it (e.g. libresolv).
  2. Although an attempt is made within Userland to apply the -z discard-unused=dependencies option widely, the Makefiles for many of the components are not amenable to this, and others have disabled it for their own reasons. As such, not all of the components use it.

Even so, the win is easy, and large.
