Why do I love multiple versioned symbols with the same name.

When I was a postgraduate student at St. Petersburg State University I came across a writeup by Tom Duff (yes, of Duff's device fame!) where he stated that shared libraries are pure evil and one true sign that the apocalypse is at hand. At the time I didn't give it much thought, but now that I've worked for Sun for some odd number of years I tend to agree with him.

I believe that the main gripe I now have with shared libraries (AKA Dynamic Shared Objects -- DSOs) is that they aim at solving two mutually exclusive problems: giving vendors the flexibility to patch systems "live" and protecting end users from failures of unsuspecting applications which don't want to be patched.

One of the tools for protecting the end users is, of course, versioning of the symbols in DSOs, introduced by Sun more than 10 years ago. Even what Sun did was somewhat of an overkill, but the GNU crowd went the whole nine yards as far as complexity is concerned when they decided to "augment" Sun's versioning strategy with a couple of things of their own.

Of course the best of it is: "The second GNU extension is to allow multiple versions of the same function to appear in a given shared library."

Why do I care? Well, primarily because the following doesn't really work as expected on Linux:
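#define _GNU_SOURCE   /* for RTLD_NEXT */
#include <dlfcn.h>
#include <pthread.h>
int pthread_cond_signal(pthread_cond_t *cond) {
   /* Snitch on pthread_cond_signal, then forward to whatever dlsym() finds */
   int (*sym)(pthread_cond_t *) = (int (*)(pthread_cond_t *)) dlsym(RTLD_NEXT, "pthread_cond_signal");
   return sym(cond);
}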
In fact it breaks. Horribly! Why? Well, because pthread_cond_signal happens to be a versioned symbol with the previous version still available in glibc (and in libpthread.so, but that's a different story):
$ nm /lib/libc.so.6 | grep pthread_cond_signal
000cb780 t __pthread_cond_signal
000cb780 t __pthread_cond_signal_2_0
000cb780 T pthread_cond_signal@GLIBC_2.0
000cb780 T pthread_cond_signal@@GLIBC_2.3.2
And regardless of the fact that the default one is supposed to be the GLIBC_2.3.2 one, when I call dlsym() I get the older guy. Of course the older guy now has problems working with a condition variable initialized by the unintercepted (2.3.2) pthread_cond_init, and the whole thing goes kaboom.

Which means that in order for my code to work not only do I have to now version my symbols in order to intercept only what's needed but I also have to do a funny dance around dl[v]sym.
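The dl[v]sym part of that dance can be sketched roughly as follows. This is a minimal sketch, not the exact code I ended up with: it assumes glibc's dlvsym() and the version string GLIBC_2.3.2 taken from the nm output above (other architectures use different version names, hence the fallback to plain dlsym()). The helper name real_cond_signal is made up for illustration.

```c
#define _GNU_SOURCE              /* for RTLD_NEXT and dlvsym() */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

typedef int (*cond_signal_fn)(pthread_cond_t *);

/* Hypothetical helper: look up the next pthread_cond_signal in search
 * order, asking for one specific version instead of whatever dlsym()
 * happens to hand back. */
static cond_signal_fn real_cond_signal(void)
{
    void *p = NULL;
#ifdef __GLIBC__
    /* Version string copied from the nm output; an assumption that
     * holds for that libc but not for every architecture. */
    p = dlvsym(RTLD_NEXT, "pthread_cond_signal", "GLIBC_2.3.2");
#endif
    if (p == NULL)               /* no such version here: plain lookup */
        p = dlsym(RTLD_NEXT, "pthread_cond_signal");
    return (cond_signal_fn)p;
}

int pthread_cond_signal(pthread_cond_t *cond)
{
    /* Snitch on pthread_cond_signal here, then forward the call */
    cond_signal_fn fn = real_cond_signal();
    return fn != NULL ? fn(cond) : ENOSYS;
}
```

Built as a shared object and loaded with LD_PRELOAD, this should forward to the same version that the rest of the process uses for pthread_cond_init. Older glibc releases also need -ldl on the link line.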

Versioning my own symbols was a bit of a challenge as well. Don't get me wrong -- the Sun way of writing linker map files worked quite nicely, but I really wanted to experience some of that magical world of GNU asm:
__asm__(".symver old_foo,foo@@VERS_2.0");
Suffice it to say that the following example broke:
$ cat test.c
void old_foo() {}
__asm__(".symver old_foo,foo@@VERS_2.0");
$ gcc -shared -fPIC -o test.so test.c
      test.so: undefined versioned symbol name foo@@VERS_2.0
/usr/lib/gcc/i586-suse-linux/bin/ld: failed to set dynamic 
                                     section sizes: Bad value
collect2: ld returned 1 exit status
and it took me a while to realize that the claim they make, "This was done mainly to reduce the burden on the library maintainer.", is a bit further from reality than I expected -- you still need the mapfile!
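For completeness, a variant that should link looks roughly like the sketch below. The assumptions: a map file called test.map (the name is made up) that declares every version node the .symver directives mention, passed to the linker with -Wl,--version-script. The second function, the return values and the VERS_1.0 node are added purely for illustration.

```c
/*
 * test.map (hypothetical file name), declaring both version nodes:
 *
 *   VERS_1.0 { global: foo; local: *; };
 *   VERS_2.0 { global: foo; } VERS_1.0;
 *
 * Build: gcc -shared -fPIC -Wl,--version-script=test.map -o test.so test.c
 */

/* Two implementations that end up exported under the same name "foo" */
int old_foo(void) { return 1; }
int new_foo(void) { return 2; }

/* foo@VERS_1.0: non-default, kept for binaries linked against the old one */
__asm__(".symver old_foo,foo@VERS_1.0");

/* foo@@VERS_2.0: the default version that new links are bound to */
__asm__(".symver new_foo,foo@@VERS_2.0");
```

The single "@" marks a non-default version and the double "@@" marks the default, which is the same distinction the nm output for pthread_cond_signal shows.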

Oh well, yet another day, taming glibc.

The Linux crowd solved the problem of multiple variants of the same library routine by adding the aliasing functionality. I think Sun solves the same kinds of problems by using #pragma redefine_extname. This performs the aliasing inside the compiler instead of inside the linker. It also removes all the aliasing cruft from the libraries, since it is a compile-time-only artifact. This means you won't see problems with dlsym() on Solaris. Of course, the problems with dlsym should be fixable on Linux: they just need to make dlsym resolve the same way it would have if that symbol were referenced by the compiler from inside the shared library that called dlsym().

Posted by Chris Quenelle on July 23, 2006 at 01:40 PM PDT #

Hi Chris!
Nice to see you reading my stuff ;-)
Anyway, the problem with the GNU world is not so much that they do it the other way, but that they have "extended" the Solaris idea of versioning, which added quite a few layers of additional complexity but didn't provide much in return.
Consider this -- in glibc it is possible to have two versions associated with the *same* symbol 'foo'. Something that Solaris doesn't allow. This actually means that I now have to force my application to stay in a particular "version space" once I get there. It's a bit like keeping track of matching parentheses: if for some reason I end up in foo_constructor@NON_DEFAULT_VERSION I have to make sure that foo_destructor@NON_DEFAULT_VERSION gets called (and not foo_destructor@DEFAULT_VERSION) when it's time to clean things up.
Maybe it's just me but I despise complexity when it doesn't come with huge benefits. And in this particular case I see none. Especially given the fact that everybody recompiles everything against the system libc anyway.
So what exactly do symbols with the same name but different versions in the same .so buy us?

Posted by Roman Shaposhnik on July 26, 2006 at 01:03 PM PDT #
