What do dynamic linking and communism have in common?

It is simple enough, really -- both were ideas that sounded nice in theory but came crashing down during their first serious brush with reality. And since there's no shortage of experts trying to explain why communism wasn't meant to be, I'm going to focus on the other one. But before I do, I want to make it extra clear that this article is about dynamic linking and not dynamic loading. The latter consists of dlopen(3)/dlsym(3)/dlclose(3) (sketched below) and is a fine idea. Dynamic linking, on the other hand, is all about magic that makes your application work even though bits and pieces of it might be in places you've never heard of. And, of course, like any magic it promises a lot. Among the biggest claims of dynamic linking (as it is currently implemented in UNIX and similar OSes) are the following three:
  1. all applications are capable of sharing common code at runtime, thus reducing the total memory footprint of the entire system
  2. all applications can reference common code without actually storing it as part of their ELF (or similar) file image, thus reducing the total storage footprint of the entire system
  3. you can fix problems in common code once, thus benefiting all of the applications on your system at the same time
Maybe there are others, but these three are the ones most commonly cited to justify the mind-boggling complexity of modern dynamic linkers (and if you don't believe me about how complex they are -- try asking our resident Solaris linker guru) and the even bigger complexity of how what I refer to as "common code" is supposed to be packaged and delivered in order for the magic to work. Of course, given the price we pay in complexity, I would expect the dividends to be quite significant. Unfortunately, they are not.
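Before I take these claims apart, it is worth showing for contrast what dynamic loading -- the part I called a fine idea above -- looks like in practice. A minimal sketch (the plugin name and its symbol are invented for illustration): here the application itself, not ld.so, decides what gets pulled in, from where, and when.

/* loader_demo.c -- dynamic loading, not dynamic linking.
 * "libplugin.so" and plugin_init() are made up for illustration.
 * Build: cc loader_demo.c -o loader_demo -ldl
 */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* The application -- not ld.so -- picks the library and the moment. */
    void *handle = dlopen("./libplugin.so", RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Pull in exactly one symbol, on our own terms. */
    int (*plugin_init)(void) = (int (*)(void))dlsym(handle, "plugin_init");
    if (plugin_init == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    printf("plugin_init() returned %d\n", plugin_init());
    dlclose(handle);    /* and we decide when it goes away */
    return 0;
}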
The rest of this article discusses why dynamic linking does not deliver on any of its promises and why, just like communism, it might be an idea that only works in an ideal world (as a curious footnote I must add that just as communism done right seems to be doing quite well in one particular eastern country, dynamic linking within the binary compatibility guarantee of one particular OS gets as close to being true to its promises as one can get).
The goal of this article is not to present an alternative model (I still don't have a 100% satisfactory one even for dynamic linking, not to mention communism) but merely to make the reader question whether static linking is, in fact, long dead and buried -- or whether maybe the people who try very hard to make us think that way have just spent too much time in an ivory tower and haven't seen the real world in a while.
With that, let me start by tackling the last purported benefit of dynamic linking (the ability to fix problems in common code), not only because it is the easiest to knock down, but also because, once knocked down, it virtually eliminates the first two benefits completely. The ability to fix a problem in common code once, instead of as many times as you have applications sharing that code, sounds really nice until you ask yourself: what is a "problem"? What is a bug? And could it be that one application's problem is something a second application depends upon in order to work properly? The answer to the last question is a resounding YES, and there's no better example than a very prominent C++ compiler vendor who had to leave a pretty nasty C++ ABI bug unfixed for a number of years simply because any possible fix would break all previously compiled applications. And of course, since the C++ runtime library is dynamically linked into every application written in C++, that was unacceptable.

You see, in the real world programs have bugs. Worse yet -- the line between a bug and a feature sometimes gets quite blurry. That is especially true for common code. Why? For two obvious reasons. First of all, since more likely than not you didn't write the code shared by different applications yourself, you have no way of knowing whether your usage patterns of that common code do indeed trigger a bug, or whether they are just an example of the GIGO principle. Second, and most importantly -- you very likely have no control over the common code, and even if you can prove that the problem is indeed a bug, you'd rather work around it than wait for a vendor to issue a patch. These two issues combined create a very unpleasant situation where problems in common code become unfixable, not because we can't fix them for good, but because the old buggy behavior is now something that quite a few applications depend upon. This is the classic "damned if you do, damned if you don't" principle at work.

But where does that leave us as far as dynamic linking goes? In a mess! And a big one at that. All of a sudden we have a system where half of the applications want that piece of common code fixed and the other half wants it broken. All of a sudden we have to make sure that we CAN isolate applications that still depend on the old buggy behavior, and the magic of dynamic linking starts getting blacker and blacker, with abominations like LD_LIBRARY_PATH and DSO symbol versioning. What we've got on our hands is a situation where common code becomes segmented, in the sense that it is common to only a subset of applications. And that is the point where dynamic linking just breaks. There's no way for my application to be sure that the common code I tested it with is the same common code actually in use. And for any serious software vendor that is simply unacceptable. You see, serious software vendors care about their customers, and they don't play finger-pointing games, saying things like: it is all your fault, you should not have upgraded that shared library over there. What do they do instead? Well, just try
find . -name \*.so
to see for yourself. If you do that with any commercial piece of software (or even a large free one like OpenOffice), don't be surprised to see things like private versions of glibc.so being repackaged and redelivered. It is much safer for them to do that instead of constantly dreading the next ugly upgrade of /lib/libc.so.
But wait! Hasn't that just annulled the first and second claims of dynamic linking? It sure has. There's no sharing possible between /lib/libc.so and /opt/bigapp/lib/glibc.so. None. Memory gets wasted just as much as disk space does. It might as well be static linking at that point.
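Incidentally, here is what the DSO symbol versioning I called an abomination above looks like in practice. A minimal sketch assuming the GNU toolchain, with entirely hypothetical names -- one shared object condemned to carry both the bug and the fix forever:

/* libfrob.c -- a hypothetical DSO carrying a buggy and a fixed frob()
 * side by side via GNU symbol versioning.
 *
 * frob.map:
 *     FROB_1.0 { };
 *     FROB_2.0 { } FROB_1.0;
 *
 * Build (GNU toolchain):
 *     cc -shared -fPIC -Wl,--version-script=frob.map libfrob.c -o libfrob.so.1
 */
int frob_old(int x) { return x; }      /* old, buggy behavior half the apps rely on */
int frob_new(int x) { return x + 1; }  /* the fix the other half wants */

/* Applications linked long ago stay bound to frob@FROB_1.0 -- bug and all.
 * Newly linked ones pick up frob@@FROB_2.0, the default version. */
__asm__(".symver frob_old, frob@FROB_1.0");
__asm__(".symver frob_new, frob@@FROB_2.0");

All that machinery buys you is the coexistence of incompatible behaviors inside one file.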
In fact, static linking would be quite beneficial for the application, since, done right with a smart compiler, it enables things like not wasting precious CPU cycles on position-independent code (if you think PIC is free, watch the Performance Analyzer in action), interprocedural optimization, cross-file inlining and template elimination -- among a few others. And unlike dynamic linking, you can be dead certain that the very same code you tested will be running at your customer's site. Not only that -- when you do need to fix it, your fix won't break anybody else.
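To make those benefits concrete, here is a toy two-file sketch; the flags are illustrative GCC-style ones, so check your own compiler's documentation:

/* helper.c */
int helper(int x) { return x * 2; }

/* main.c */
extern int helper(int x);
int main(void) { return helper(21); }

/* Build statically, with cross-file optimization:
 *
 *     cc -O2 -flto -static main.c helper.c -o demo
 *
 * With -flto the compiler is free to inline helper() straight into main()
 * and fold the result to a constant -- something it cannot do across a DSO
 * boundary, where every call goes through the PLT and the code must stay
 * position-independent. */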
Ain't this the magic?
Comments:

I've got to disagree. Without dynamic linking, it would not be possible to offer the Solaris binary compatibility guarantee. It would be impossible to make those bug fixes at all, and the interface with the kernel would have to be static and completely backwards compatible forever.

Posted by Brian Utterback on April 02, 2007 at 02:57 AM PDT #

If you look closely at my footnote, I do actually refer to Solaris as one shining example where dynamic linking gets as close to actually working as one can get. There is, however, a price you pay for that. The binary compatibility guarantee doesn't come for free -- far from it. As I pointed out in my article, we do have to be bug-compatible in order to preserve it. Not a good thing, but it works. There's yet another price you pay: you can't innovate as fast (or as recklessly) as Linux does. And finally, the binary compatibility guarantee can only exist as long as Sun is the only 800lb gorilla actually productizing Solaris. As soon as IBM creates their own distro based on OpenSolaris, my bet is the guarantee will be out of the window. In Linux and many other UNIXes you have all of the above, and dynamic linking simply doesn't work there. Now, to some other points you've made in your comment: "Without dynamic linking, it would not be possible to offer the Solaris binary compatibility guarantee." This is simply far from true -- it is possible to do that with or without dynamic linking. And your last point is, in fact, true: the interface with the kernel IS static and compatible forever. At least with Solaris it is. Because, you see, /lib/libc.so is NOT the only way I can do a system call on Solaris. And finally -- I do NOT, by any means, say that dynamic linking should be abolished. What I do say is that we have to rethink the blind trust we have in its abilities to deliver. I hate to see young programmers poisoned by the kind of thinking Ulrich Drepper puts forward, since it is simply too narrow.
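To illustrate the system call point: a system call is ultimately a trap into the kernel, not a service owned by /lib/libc.so. A minimal sketch, with Linux/x86-64 chosen purely for concreteness (Solaris has its own trap conventions):

/* raw_write.c -- write(2) without going through libc's wrapper.
 * Linux/x86-64 ABI assumed purely for illustration. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L),        /* SYS_write on Linux/x86-64 */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

int main(void)
{
    raw_write(1, "no libc wrapper needed\n", 23);
    return 0;
}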

Posted by Roman Shaposhnik on April 02, 2007 at 03:08 AM PDT #

Well, even if Drepper's point is too narrow (I'd say too aggressive), it still has strong arguments behind it.
I'd say that by putting together both Drepper's "never use static linking" and your "dynamic linking does not work" we can get a resolution for this problem! It's like marrying communism with capitalism -- some say that's what Russia currently is, but I envision it would be more like buddhism -- "do not link, it does not work" is a mantra almost as good as "Om Mani Padme Hum"! :)

Posted by Fedor Sergeev on April 05, 2007 at 06:41 PM PDT #

To: Fedor
I agree. The point of the article was to expose some of the arguments people take as a given when they talk about dynamic linking. Of course, I wanted to "do my homework", but the more articles like Drepper's I read, the angrier I got (I was lucky I didn't have Hulk syndrome ;-)). In the end, mine ended up tilted the other way. It was also intended as the first in a series of articles on things for Open Source projects to consider when they make decisions on how to structure their builds and deployment configurations. Most of the time those decisions are based on practices that are more of a historical artifact than a necessity -- so stay tuned, and let me know what you think about them.

Posted by Roman Shaposhnik on April 17, 2007 at 11:42 AM PDT #

My comment was rather large so I posted it at my own site: http://www.freeswitch.org/node/56 Our project has a lot to say about this topic.

Posted by Anthony C Minessale II on May 05, 2007 at 01:53 AM PDT #

I agree with this analysis. It's also worth noting that dynamic linking is always presented as a good way of fixing bugs in all applications at once, but the converse is noted far less often: bugs (or unpleasantness) can be introduced into all applications at once, too. In my own tests years ago I found that a competent library implementation, statically linked, often used a much smaller overall working set than the shared-library implementations that were supposedly saving so much (because the dynamic shared library had absolutely everything in it).

Posted by Charles Forsyth on July 01, 2007 at 10:39 AM PDT #

While I do see the points made, and do agree that dynamic linking can be a problem, you are forgetting the easy fix to the problems you talk about: simply have more than one library. If you depend on a bug being present, ask for the library that contains that bug. If it's not there -- tough luck, the admin has to install it. If half your programs use one version of a library and the other half doesn't, chances are at least half of the bugged apps can use the other library, meaning you have only a small percentage of apps running bugged code.

Posted by Frederik Hertzum on July 18, 2007 at 05:30 AM PDT #

To: Frederik Hertzum
If I understand your suggestion correctly, I believe it is much closer to what I call "dynamic loading", which, as I have pointed out, I have no problem with. I consider the line between linking and loading to be where ld.so starts implementing policies rather than mechanisms -- and personally, I'm always against that kind of design.
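In fact, your suggestion expressed as dynamic loading would look roughly like this (a sketch; the library name is entirely hypothetical):

/* pick_version.c -- hypothetical names throughout: the application asks
 * for exactly the library revision it was tested against. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Not "libfrob.so", but the fully versioned name we certified. */
    void *h = dlopen("libfrob.so.1.2.3", RTLD_NOW);
    if (h == NULL) {
        fprintf(stderr, "need libfrob 1.2.3, bug and all: %s\n", dlerror());
        return 1;   /* tough luck -- the admin has to install it */
    }
    /* ... dlsym() the entry points and proceed as usual ... */
    dlclose(h);
    return 0;
}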
Thanks,
Roman.

P.S. I really wanted to comment on your blog, on the post you've made about Plan9. How do I register?

Posted by Roman Shaposhnik on July 19, 2007 at 07:48 AM PDT #

So, it is NOT true that the kernel<->libc interface is static on Solaris. In fact, since S10 it most certainly is not, and it is always critical to have a matched kernel and libc across any changes to that interface.

Posted by Nicolas Williams on June 10, 2009 at 05:16 AM PDT #

To: Nicolas Williams

The Solaris binary compatibility guarantee as expressed here (http://www.sun.com/software/solaris/guarantee.jsp) makes me believe that a binary statically compiled on Solaris 8 is guaranteed to run on Solaris 10. Please let me know if that's not the case.

Posted by Roman Shaposhnik on June 10, 2009 at 10:23 AM PDT #

@Roman:

That is, indeed, true. However, you'll note that there is no libc.a on S10, and if you look carefully (at PSARC/2002/117, the ARC case that led to the removal of all static libraries in S10, and at ONNV putback/push history and flag day messages), you might notice that Solaris core engineering considers itself free to change all kernel<->libc interfaces other than those captured in the S8 and S9 libc.a, and at any time. Eventually S8 and S9 will no longer be supported releases, so there is an end in sight even for those kernel<->libc interfaces that are frozen. Moreover, S10 and Nevada/OpenSolaris have added a great many kernel<->user-land interfaces, so the proportion of frozen ones is smaller than you might think. Nor is the binary compatibility guarantee without exceptions -- features can be EOFed, and binary-incompatible changes are allowed at certain boundaries (check the ARC interface stability and release taxonomy best practice documents).

Since S10 it has always been required to update the kernel, libc.so.1, and even ld.so.1 together. Not only that, but private interfaces between various kernel components and user-land libraries can also change this way, therefore one must always update the kernel and user-land together. If you look at on10 and onnv flag days you'll often see flag day messages saying that one must use BFU, upgrade, or live upgrade (and soon, image-update) to get past the flag day, not cap-I install -- such flag day messages are a symptom of what I just described above.

Sadly, PSARC/2002/117 (Solaris Process Model Unification) is not available on arc.opensolaris.org. But you can see it within SWAN by wandering over to http://sac.sfbay/PSARC/2002/117/.

Posted by Nico on June 10, 2009 at 04:01 PM PDT #

Oh, and looking at PSARC/2002/117 I see that the binary compatibility guarantee was not, at the time, extended to statically linked binaries. In fact, the appcert tool at the time already specifically complained about statically linked executables. Which means that we can, in fact, remove from S10/Nevada/OpenSolaris all kernel<->libc private interfaces captured by libc.a on S8 and S9.

Posted by Nico on June 11, 2009 at 04:38 AM PDT #

You say that vendors don't want to get into "fingerpointing", but I can't see how saying "Our tested/supported configurations are libc versions X1.Y1.Z1, X2.Y2.Z2 and X3.Y3.Z3" is substantially different from saying "The only supported configuration is on Solaris version X.Y", which is standard practice.

If we're going to just ship a big statically linked blob, then we might as well go the whole way and just ship a VM image which includes the OS and app all bundled up together.

Posted by caf on October 22, 2009 at 09:33 AM PDT #

There is a simple counterexample to your argument: operating systems. If dynamic linking were really as hopeless as you say, it would not be possible to write an operating system, since by definition an operating system is nothing but shared code. Dynamic linking is not *inherently* bad, but it does demand careful management (and versioning) of the interfaces. If you don't do that, you are indeed hosed. But if you do, you're not.

Posted by Ron Garret on October 22, 2009 at 11:58 AM PDT #

Looks like you're just discovering life. Which is fine. But does it merit such a long rant for such light content, already very, very widely known in the field? Not so sure.

As for the issue, well, real men and women use real free software that you can fix, so you don't have to rely on retarded backward compatibility with evil proprietary software that you can't fix. And BTW, versioning DSO symbols is NOT supposed to be used to let buggy pieces of code live on. So if you do that, you're just doing it wrong.

And for environments indeed made up of lots of proprietary software, where the issues you describe really are issues, well, everybody redistributes all the .dlls alongside their own binary blobs. For example, MS even recommends redistributing their libc in every installer (the very specific version you used during compilation; don't rely on their version number, they use the same one for different incompatible binaries...)

Code sharing via DSOs does exist and does work in some environments. The fact that you're in another one does not mean DSOs are bad. The fact that you're invoking communism, though, does mean you're a troll.

Posted by xilun on July 26, 2010 at 10:45 PM PDT #
