The Stub Proto: Not Just For Stub Objects Anymore

One of the great pleasures of programming is to invent something for a narrow purpose, and then to realize that it is a general solution to a broader problem. In hindsight, these things seem perfectly natural and obvious. The stub proto area used to build the core Solaris consolidation has turned out to be one of those things.

As discussed in an earlier article, the stub proto area was invented as part of the effort to use stub objects to build the core ON consolidation. Its purpose was merely as a place to hold stub objects. However, we keep finding other uses for it. It turns out that the stub proto should be more properly thought of as an auxiliary place to put things that we would like to put into the proto to help us build the product, but which we do not wish to package or deliver to the end user. Stub objects are one example, but private lint libraries, header files, archives, and relocatable objects, are all examples of things that might profitably go into the stub proto.

Without a stub proto, these items were handled in a variety of ad hoc ways:

  • If one part of the workspace needed private header files, libraries, or other such items, it might modify its Makefile to reach up and over to the place in the workspace where those things live and use them from there. There are several problems with this:

    • Each component invents its own approach, meaning that programmers maintaining the system have to invest extra effort to understand what things mean. In the past, this has created makefile ghettos in which only the person who wrote the makefiles feels confident to modify them, while everyone else ignores them. This causes many difficulties and benefits no one.

    • These interdependencies are not obvious to the make, utility, and can lead to races.

    • They are not obvious to the human reader, who may therefore not realize that they exist, and break them.

  • Our policy in ON is not to deliver files into the proto unless those files are intended to be packaged and delivered to the end user. However, sometimes non-shipping files were copied into the proto anyway, causing a different set of problems:

    • It requires a long list of exceptions to silence our normal unused proto item error checking.

    • In the past, we have accidentally shipped files that we did not intend to deliver to the end user.

    • Mixing cruft with valuable items makes it hard to discern which is which.

The stub proto area offers a convenient and robust solution. Files needed to build the workspace that are not delivered to the end user can instead be installed into the stub proto. No special exceptions or custom make rules are needed, and the intent is always clear.

We are already accessing some private lint libraries and compilation symlinks in this manner. Ultimately, I'd like to see all of the files in the proto that have a packaging exception delivered to the stub proto instead, and for the elimination of all existing special case makefile rules. This would include shared objects, header files, and lint libraries. I don't expect this to happen overnight — it will be a long term case by case project, but the overall trend is clear.

The Stub Proto, -z assert_deflib, And The End Of Accidental System Object Linking

We recently used the stub proto to solve an annoying build issue that goes back to the earliest days of Solaris: How to ensure that we're linking to the OS bits we're building instead of to those from the running system.

The Solaris product is made up of objects and files from a number of different consolidations, each of which is built separately from the others from an independent code base called a gate. The core Solaris OS consolidation is ON, which stands for "Operating System and Networking". You will frequently also see ON called the OSnet. There are consolidations for X11 graphics, the desktop environment, open source utilities, compilers and development tools, and many others. The collection of consolidations that make up Solaris is known as the "Wad Of Stuff", usually referred to simply as the WOS. None of these consolidations is self contained. Even the core ON consolidation has some dependencies on libraries that come from other consolidations.

The build server used to build the OSnet must be running a relatively recent version of Solaris, which means that its objects will be very similar to the new ones being built. However, it is necessarily true that the build system objects will always be a little behind, and that incompatible differences may exist.

The objects built by the OSnet link to other objects. Some of these dependencies come from the OSnet, while others come from other consolidations. The objects from other consolidations are provided by the standard library directories on the build system (/lib, /usr/lib). The objects from the OSnet itself are supposed to come from the proto areas in the workspace, and not from the build server. In order to achieve this, we make use of the -L command line option to the link-editor.

The link-editor finds dependencies by looking in the directories specified by the caller using the -L command line option. If the desired dependency is not found in one of these locations, ld will then fall back to looking at the default locations (/lib, /usr/lib). In order to use OSnet objects from the workspace instead of the system, while still accessing non-OSnet objects from the system, our Makefiles set -L link-editor options that point at the workspace proto areas. In general, this works well and dependencies are found in the right places.

However, there have always been failures:

  1. Building objects in the wrong order might mean that an OSnet dependency hasn't been built before an object that needs it. If so, the dependency will not be seen in the proto, and the link-editor will silently fall back to the one on the build server.

  2. Errors in the makefiles can wipe out the -L options that our top level makefiles establish to cause ld to look at the workspace proto first. In this case, all objects will be found on the build server.
These failures were rarely if ever caught. As I mentioned earlier, the objects on the build server are generally quite close to the objects built in the workspace. If they offer compatible linking interfaces, then the objects that link to them will behave properly, and no issue will ever be seen. However, if they do not offer compatible linking interfaces, the failure modes can be puzzling and hard to pin down. Either way, there won't be a compile-time warning or error.

The advent of the stub proto eliminated the first type of failure. With stub objects, there is no dependency ordering, and the necessary stub object dependency will always be in place for any OSnet object that needs it. However, makefile errors do still occur, and so, the second form of error was still possible.

While working on the stub object project, we realized that the stub proto was also the key to solving the second form of failure caused by makefile errors:

  1. Due to the way we set the -L options to point at our workspace proto areas, any valid object from the OSnet should be found via a path specified by -L, and not from the default locations (/lib, /usr/lib). Any OSnet object found via the default locations means that we've linked to the build server, which is an error we'd like to catch.

  2. Non-OSnet objects don't exist in the proto areas, and so are found via the default paths. However, if we were to create a symlink in the stub proto pointing at each non-OSnet dependency that we require, then the non-OSnet objects would also be found via the paths specified by -L, and not from the link-editor defaults.

  3. Given the above, we should not find any dependency objects from the link-editor defaults. Any dependency found via the link-editor defaults means that we have a Makefile error, and that we are linking to the build server inappropriately. All we need to make use of this fact is a linker option to produce a warning when it happens.

  4. Although warnings are nice, we in the OSnet have a zero tolerance policy for build noise. The -z fatal-warnings option that was recently introduced with -z guidance can be used to turn the warnings into fatal build errors, forcing the programmer to fix them.

This was too easy to resist. I integrated

7021198 ld option to warn when link accesses a library via default path
PSARC/2011/068 ld -z assert-deflib option
into snv_161 (February 2011), shortly after the stub proto was introduced into ON. This putback introduced the -z assert-deflib option to the link-editor:
-z assert-deflib=[libname]
Enables warning messages for libraries specified with the -l command line option that are found by examining the default search paths provided by the link-editor. If a libname value is provided, the default library warning feature is enabled, and the specified library is added to a list of libraries for which no warnings will be issued. Multiple -z assert-deflib options can be specified in order to specify multiple libraries for which warnings should not be issued.

The libname value should be the name of the library file, as found by the link-editor, without any path components. For example, the following enables default library warnings, and excludes the standard C library.

           ld ... -z assert-deflib=libc.so ...
-z assert-deflib is a specialized option, primarily of interest in build environments where multiple objects with the same name exist and tight control over the library used is required. If is not intended for general use.
Note that the definition of -z assert-deflib allows for exceptions to be specified as arguments to the option. In general, the idea of using a symlink from the stub proto is superior because it does not clutter up the link command with a long list of objects. When building the OSnet, we usually use the plain from of -z deflib, and make symlinks for the non-OSnet dependencies. The exception to this are dependencies supplied by the compiler itself, which are usually found at whatever arbitrary location the compiler happens to be installed at. To handle these special cases, the command line version works better.

Following the integration of the link-editor change, I made use of -z assert-deflib in OSnet builds with

7021896 Prevent OSnet from accidentally linking to build system
which integrated into snv_162 (March 2011). Turning on -z assert-deflib exposed between 10 and 20 existing errors in our Makefiles, which were all fixed in the same putback. The errors we found in our Makefiles underscore how difficult they can be prevent without an automatic system in place to catch them.

Conclusions

The stub proto is proving to be a generally useful construct for ON builds that goes beyond serving as a place to hold stub objects. Although invented to hold stub objects, it has already allowed us to simplify a number of previously difficult situations in our makefiles and builds. I expect that we'll find uses for it beyond those described here as we go forward.
Comments:

Post a Comment:
Comments are closed for this entry.
About

I work in the core Solaris OS group on the Solaris linkers. These blogs discuss various aspects of linking and the ELF object file format.

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
Feeds