DWARF and relocations

Note: while the "canonical" dwarfdump is a self-contained utility (linked statically with libdwarf), the variant shipped with Sun Studio depends on the dynamic library libdwarf.so, which actually does most of the job. So when I write dwarfdump I usually mean libdwarf[.a|.so].

To explain why relocation processing is important when reading DWARF info, a brief introduction to the DWARF2 format is necessary.

A little something about DWARF

DWARF is a widely used debugging data format. It is suitable for virtually any source language, and in Sun Studio it is used to represent debugging information for C, C++ and Fortran. There exist three versions of the standard: 1.1, 2.0 and the most recent, 3.0. Sun Studio compilers generate version 2.0, commonly referred to as DWARF2.

DWARF2 information is stored in a table of records (contained in the .debug_info ELF section) called Debug Information Entries (DIEs). Each DIE has:

  • type (for example, DW_TAG_compile_unit),
  • several attribute/value pairs that describe the entry (for example, DW_AT_language=DW_LANG_C_plus_plus).

In order to save space, attribute names are not stored in the DIE itself. Instead, each DIE type has a corresponding entry in another table (stored in the .debug_abbrev ELF section), which lists all attributes used for that DIE type in a compilation unit (CU), along with their forms. For example:

DW_TAG_compile_unit DW_children_yes
  DW_AT_name                 DW_FORM_string
  DW_AT_language             DW_FORM_data1
  DW_AT_low_pc               DW_FORM_addr
  DW_AT_high_pc              DW_FORM_addr

This entry describes a CU, which contains a name (string), a source language attribute (1 byte of data), the addresses of its beginning and end (4 or 8 bytes each, depending on the memory model) and other fields.

So the DIE type is just an index into that table of abbreviations:

  CU abbrev_offset=0
  DW_TAG_compile_unit (code=1) ----+
  ...                              |
  ...                              |
                                   | (points to debug_abbrev entry #1)  
.debug_abbrev                      |
  DW_TAG_compile_unit (code=1)  <--+
    DW_AT_producer           DW_FORM_strp
    DW_AT_language           DW_FORM_data1
  DW_TAG_variable (code=2)
    DW_AT_name               DW_FORM_strp
    DW_AT_decl_file          DW_FORM_data1
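
To make the lookup concrete, here is a minimal Python sketch of how a reader might decode one CU's part of .debug_abbrev into a table keyed by abbreviation code. The function names and the sample bytes are made up for illustration; the numeric values are the standard DWARF constants for the tags, attributes and forms shown in the diagram above. This is not how libdwarf is written, just the same idea in a few lines.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def parse_abbrev_table(debug_abbrev, abbrev_offset):
    """Return {code: (tag, has_children, [(attr, form), ...])} for one CU."""
    table = {}
    pos = abbrev_offset
    while True:
        code, pos = read_uleb128(debug_abbrev, pos)
        if code == 0:  # a zero code terminates this CU's table
            return table
        tag, pos = read_uleb128(debug_abbrev, pos)
        has_children = debug_abbrev[pos] != 0
        pos += 1
        attrs = []
        while True:
            attr, pos = read_uleb128(debug_abbrev, pos)
            form, pos = read_uleb128(debug_abbrev, pos)
            if attr == 0 and form == 0:  # a (0, 0) pair ends the attribute list
                break
            attrs.append((attr, form))
        table[code] = (tag, has_children, attrs)


# Hypothetical sample: the two abbreviations from the diagram above.
SAMPLE_ABBREV = bytes([
    1, 0x11, 1,            # code=1, DW_TAG_compile_unit, has children
    0x25, 0x0E,            # DW_AT_producer, DW_FORM_strp
    0x13, 0x0B,            # DW_AT_language, DW_FORM_data1
    0, 0,
    2, 0x34, 0,            # code=2, DW_TAG_variable, no children
    0x03, 0x0E,            # DW_AT_name, DW_FORM_strp
    0x3A, 0x0B,            # DW_AT_decl_file, DW_FORM_data1
    0, 0,
    0,                     # end of this CU's table
])

table = parse_abbrev_table(SAMPLE_ABBREV, 0)
```

Note that the parser starts at abbrev_offset and trusts whatever it finds there. If that offset is wrong, it will happily decode somebody else's abbreviations.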

Relocations that affect DWARF data

When several object files are linked together with

$ ld -r file1.o file2.o -o combined.o

to produce the relocatable file combined.o, all .debug_abbrev sections are glued together by the linker into one section, which invalidates the abbreviation table indexes for all but the first object file.

For example, if the second file, file2.o, contained the description of DW_TAG_variable in its own .debug_abbrev table at index 4, and the first file, file1.o, had, say, DW_TAG_typedef at that index, dwarfdump for combined.o would look in the .debug_abbrev table at index 4 thinking that it describes a variable, while this entry actually describes a typedef. There's no way to validate such a reference. Results vary from wrong data being printed to a crash.

In order to solve this problem, the debug info header of every compilation unit has an "abbrev offset" field, which points to the beginning of that compilation unit's part of the abbrev table. This field is always 0 for .o files produced from one source file: since there's only one compilation unit, its abbreviations table starts at byte 0 of the .debug_abbrev section. The abbrev_offset field is updated by a corresponding relocation record when object files are linked together.

When the linker is asked to generate an executable or a shared library, it applies this kind of relocation, and the resulting load object has the correct abbrev_offset for each CU. When the -r linker option is in effect, the linker is supposed to generate a file that has all relocations intact, so ld copies (updated versions of) the relocations from the input files into the output file.
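
As an illustration, here is a small Python sketch that pulls abbrev_offset out of a CU header. It assumes the header starts with a length, then a 2-byte version, then the abbrev offset. The 32-bit layout is the one from the DWARF2 standard; the 64-bit layout (a 0xffffffff escape followed by an 8-byte length) is the one defined in DWARF 3, and whether a particular compiler emits exactly that layout is an assumption here. The function name and the sample headers are made up.

```python
import struct


def read_abbrev_offset(debug_info, cu_start, big_endian=True):
    """Return (abbrev_offset, position of the field) for the CU at cu_start."""
    endian = ">" if big_endian else "<"
    (initial,) = struct.unpack_from(endian + "I", debug_info, cu_start)
    if initial == 0xFFFFFFFF:
        # 64-bit DWARF: 4-byte escape + 8-byte length + 2-byte version
        field_pos = cu_start + 4 + 8 + 2
        (value,) = struct.unpack_from(endian + "Q", debug_info, field_pos)
    else:
        # 32-bit DWARF: 4-byte length + 2-byte version
        field_pos = cu_start + 4 + 2
        (value,) = struct.unpack_from(endian + "I", debug_info, field_pos)
    return value, field_pos


# Hypothetical headers: length=100, version=2, abbrev_offset=59, address size.
header32 = struct.pack(">IHIB", 100, 2, 59, 4)
header64 = struct.pack(">IQHQB", 0xFFFFFFFF, 100, 2, 59, 8)

result32 = read_abbrev_offset(header32, 0)
result64 = read_abbrev_offset(header64, 0)
```

With these layouts the field sits at byte 6 of a 32-bit header and at byte 14 of a 64-bit one.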

Let's take a look at this relocation record. On Solaris, for a sparcv9 (64-bit) object file, it looks like this:

$ readelf -r file2.o

Relocation section '.rela.debug_info' at offset 0x4a8 contains 3 entries:
  Offset          Info           Type          Sym. Value    Sym. Name + Addend
00000000000e  000300000036 R_SPARC_UA64     0000000000000000 .debug_abbrev + 0

There are more relocation records, but only one refers to section .debug_abbrev, which gives a good hint: after all, only one field in debug_info depends on knowing the "address" of debug_abbrev section. More thorough examination (or rather, calculation) involving offset = 14 (0xe) leads to the same conclusion: this relocation record updates abbrev_offset.
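
The Info column can be taken apart by hand too. Here is a tiny Python sketch that splits a 64-bit r_info value the way the generic ELF64_R_SYM and ELF64_R_TYPE macros do (symbol index in the upper 32 bits, type in the lower 32 bits). The function name is made up; 54 is the value of R_SPARC_UA64 in elf.h.

```python
R_SPARC_UA64 = 54  # 0x36, "direct 64-bit unaligned"


def decode_r_info(r_info):
    """Split an ELF64 r_info value into (symbol index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF


# The Info value printed by readelf for file2.o above.
sym_index, rel_type = decode_r_info(0x000300000036)
```

So the record says: take symbol #3 (here, the section symbol for .debug_abbrev), add the addend, and store the result as an unaligned 64-bit value at offset 0xe of .debug_info.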

After file1.o and file2.o are linked together with the -r linker option, combined.o has two relocation records relative to the .debug_abbrev section:

$ readelf -r combined.o | grep
00000000000e  000c00000036 R_SPARC_UA64     0000000000000000 .debug_abbrev + 0
000000000136  000c00000036 R_SPARC_UA64     0000000000000000 .debug_abbrev + 3b

The first one is obviously for the first file, as its offset is too small to be anything else, and the second one is intended to update the abbrev offset for the second file. Note that it is a RELA-type relocation, that is, a relocation with an addend, which in this case is 0x3b, or 59. It means that the abbreviations table of the second file starts 59 bytes into the .debug_abbrev section. It also means that the location this record is supposed to update probably contains zero (for REL-type relocations, it would contain the addend, 59 in this case).

Here's how it looks from DWARF point of view (combined.o):

.debug_info section
  CU file1.o, abbrev_offset=0
    DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry #1)
  CU file2.o, abbrev_offset=59
    DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry #59+1=60)

.debug_abbrev section      

1:  DW_TAG_compile_unit (code=1) <-- part of table for file1.o starts from here
    DW_AT_producer             DW_FORM_strp
    DW_AT_language             DW_FORM_data1
2:  DW_TAG_variable (code=2)
    DW_AT_name                 DW_FORM_strp
    DW_AT_decl_file            DW_FORM_data1
60: DW_TAG_compile_unit (code=1)  <-- part of table for file2.o starts from here
    DW_AT_producer             DW_FORM_strp
    DW_AT_language             DW_FORM_data1

On x86, the relocation record is of type REL, which means that the addend is supposed to be in the location to be modified; in other words, in the abbrev_offset field. Therefore, on x86 the linker writes the correct offset into the debug info header, making the relocation entry for debug_abbrev redundant, at least for dwarfdump. This is why dwarfdump will always work on x86 and sparcv8.

On SPARCv9 (as well as on x64), the relocation record is of type RELA, meaning that the addend is stored in the relocation entry itself. So when producing a relocatable object file (ld -r), the linker does not touch the abbrev_offset field in the section. Instead, it changes the relocation record for the second compilation unit (file2.o) and puts the correct offset into that record. In order to obtain the right value of abbrev_offset, one has to perform the relocation first.
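
The difference between the two flavors can be shown with a short Python sketch. It is only an illustration under simplified assumptions: the helper names are made up, the section symbol value is taken to be 0 (as in the readelf output above), and the x86 field position of 6 assumes a 32-bit DWARF header.

```python
import struct


def apply_rela(section, offset, sym_value, addend, fmt=">Q"):
    """RELA: the addend comes from the relocation entry; old contents are ignored."""
    struct.pack_into(fmt, section, offset, sym_value + addend)


def apply_rel(section, offset, sym_value, fmt="<I"):
    """REL: the addend is whatever is already stored at the target location."""
    (addend,) = struct.unpack_from(fmt, section, offset)
    struct.pack_into(fmt, section, offset, sym_value + addend)


def read_field(section, offset, fmt):
    return struct.unpack_from(fmt, section, offset)[0]


# RELA case (sparcv9-like): after ld -r the field still holds zero.
rela_info = bytearray(0x140)                      # stand-in for .debug_info
rela_before = read_field(rela_info, 0x136, ">Q")  # what a naive reader sees
apply_rela(rela_info, 0x136, sym_value=0, addend=0x3B)
rela_after = read_field(rela_info, 0x136, ">Q")

# REL case (x86-like): ld -r has already stored the addend in the field.
rel_info = bytearray(16)
struct.pack_into("<I", rel_info, 6, 0x3B)
rel_before = read_field(rel_info, 6, "<I")
apply_rel(rel_info, 6, sym_value=0)
rel_after = read_field(rel_info, 6, "<I")
```

In the REL case the value is the same before and after, so a reader that ignores relocations still gets 59. In the RELA case it reads 0 until the relocation is applied, which sends it to the first file's abbreviations.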

Recent versions of dwarfdump have built-in relocation processing for x64, sparcv9 and MIPS.


  1. DWARF standard.
  2. David Anderson's page, the source of dwarfdump and libdwarf.
