ELF files are full of things we need to keep track of for later access: Names, addresses, sizes, and intended purpose. Without this information, an ELF file would not be very useful. We would have no way to make sense of the impenetrable mass of octal or hexidecimal numbers.
Consider: When you write a program in any language above direct machine code, you give symbolic names to functions and data. The compiler turns these things into code. At the machine level, they are known only by their address (offset within the file) and their size. There are no names in this machine code. How then, can a linker combine multiple object files, or a symbolic debugger know what name to use for a given address? How do we make sense of these files?
Symbols are the way we manage this information. Compilers generate symbol information along with code. Linkers manipulate symbols, reading them in, matching them up, and writing them out. Almost everything a linker does is driven by symbols. Finally, debuggers use them to figure out what they are looking at and to provide you with a human readable view of that information.
It is therefore a rare ELF file that doesn't have a symbol table. However, most programmers have only an abstract knowledge that symbol tables exist, and that they loosely correspond to their functions and data, and some "other stuff". Protected by the abstractions of compiler, linker, and debugger, we don't usually need to know too much about the details of how a symbol table is organized. I've recently completed a project that required me to learn about symbol tables in great detail. Today, I'm going to write about the symbol tables used by the linker.
Sharable objects and dynamic executables usually have 2 distinct symbol tables, one named ".symtab", and the other ".dynsym". (To make this easier to read, I am going to refer to these without the quotes or leading dot from here on.)
The dynsym is a smaller version of the symtab that only contains global symbols. The information found in the dynsym is therefore also found in the symtab, while the reverse is not necessarily true. You are almost certainly wondering why we complicate the world with two symbol tables. Won't one table do? Yes, it would, but at the cost of using more memory than necessary in the running process.
To understand how this works, we need to understand the difference between allocable and a non-allocable ELF sections. ELF files contain some sections (e.g. code and data) needed at runtime by the process that uses them. These sections are marked as being allocable. There are many other sections that are needed by linkers, debuggers, and other such tools, but which are not needed by the running program. These are said to be non-allocable. When a linker builds an ELF file, it gathers all of the allocable sections together in one part of the file, and all of the non-allocable sections are placed elsewhere. When the operating system loads the resulting file, only the allocable part is mapped into memory. The non-allocable part remains in the file, but is not visible in memory. strip(1) can be used to remove certain non-allocable sections from a file. This reduces file size by throwing away information. The program is still runnable, but debuggers may be hampered in their ability to tell you what the program is doing.
The full symbol table contains a large amount of data needed to link or debug our files, but not needed at runtime. In fact, in the days before sharable libraries and dynamic linking, none of it was needed at runtime. There was a single, non-allocable symbol table (reasonably named "symtab"). When dynamic linking was added to the system, the original designers faced a choice: Make the symtab allocable, or provide a second smaller allocable copy. The symbols needed at runtime are a small subset of the total, so a second symbol table saves virtual memory in the running process. This is an important consideration. Hence, a second symbol table was invented for dynamic linking, and consequently named "dynsym".
And so, we have two symbol tables. The symtab contains everything, but it is non-allocable, can be stripped, and has no runtime cost. The dynsym is allocable, and contains the symbols needed to support runtime operation. This division has served us well over the years.
Given how long symbols have been around, there are surprisingly few types:
- Used when we don't know what a symbol is, or to indicate the absence of a symbol.
- STT_OBJECT / STT_COMMON
These are both used to represent data. (The word OBJECT in this context should not interpreted as having anything to do with object orientation. STT_DATA might have been a better name.)
STT_OBJECT is used for normal variable definitions, while STT_COMMON is used for tentative definitions. See my earlier blog entry about tentative symbols for more information on the differences between them.
- A function, or other executable code.
- When I first started learning about ELF, and someone would say something about "section symbols", I thought they meant a symbol from some given section. That's not it though: A section symbol is a symbol that is used to refer to the section itself. They are used mainly when performing relocations, which are often specified in the form of "modify the value at offset XXX relative to the start of section YYY".
- The name of a file, either of an input file used to construct the ELF file, or of the ELF file itself.
- A third type of data symbol, used for thread local data. A thread local variable is a variable that is unique to each thread. For instance, if I declare the variable "foo" to be thread local, then every thread has a separate foo variable of their own, and they do not see or share values from the other threads. Thread local variables are created for each thread when the thread is created. As such, their number (one per thread) and addresses (depends on when the thread is created, and how many threads there are) are unknown until runtime. An ELF file cannot contain an address for them. Instead, a STT_TLS symbol is used. The value of a STT_TLS symbol is an offset, which is used to calculate a TLS offset relative to the thread pointer. You can read more about TLS in the Linker And Libraries Guide.
- The Sparc architecture has a concept known as a "register symbol". These symbols are used to validate symbol/register usage, and can also be used to initialize global registers. Other architectures don't use these.
In addition to symbol type, each symbols has other attributes:
The exact meaning for some of these attributes depends on the type of symbol involved. For more details, consult the Solaris Linker and Libraries Guide, which is available in PDF form online.
The symbols in a symbol table are written in the following order:
What would happen if we ignored these rules and reordered things in some other way (e.g. sorted by address)? There is no way to answer this question with 100% certainty. It would probably confuse existing tools that manipulate ELF files. In particular, it seems clear that the local and global symbols must remain separate. For years and years, arbitrary software has been free to assume the above layout. We can't possibly know how much software has been written, or how dependent on layout it is. The only safe move is to maintain the well known layout described above.
One of the big advantages of Solaris relative to other operating systems is the extensive support for observability: The ability to easily look inside a running program and see what it is doing, in detail. To do that well requires symbols. The symbols in the dynsym may not be enough to do a really good job. For example, to produce a stack trace, we need to take each function address and match it up to its name. If we are looking at a stripped file, or referencing the file from within the process using it via dladdr(3C), we won't have any way to find names for the non-global functions, and will have to resort to displaying hex addresses. This is better than nothing, but not by much. The standard files in a Solaris distribution are not stripped for exactly this reason. However, many files found in production are stripped, and in-process inspection is still limited to the dynsym.
Machines are much larger than they used to be. The memory saved by the symtab/dynsym division is still a good thing, but there are times when we wish that the dynsym contained a bit more data. This is harder than it sounds. The layout of dynsym interacts with the rest of an ELF file in ways that are set in stone by years of existing practice. Backward compatibility is a critical feature of Solaris. We try extremely hard to keep those old programs running. And yet, the needs of observability, spearheaded by important new features like DTrace, put pressure on us in the other direction.
This discussion is prelude to work I recently did to augment the dynsym to contain local symbols, while preserving full backward compatibility with older versions Solaris. I plan to cover that in a future blog entry. ELF is old, and much of how it works cannot be changed. Its original designers (our "Founding Fathers", as Rod calls them) anticipated that this would be the case, based no doubt on hard experience with earlier systems. The ELF design is therefore uniquely flexible, which explains why it has survived as long as it has. There is always a way to add something new. Sometimes, it takes several tries to find the best way.
[ This article is permanently archived at http://www.linker-aliens.org/blogs/ali/entry/inside_elf_symbol_tables/ ]