Debugging on AMD64 - Part One

The amd64 port of Solaris has been available (internally) for about a month and a half, and the rest of the group is starting to realize what those of us on the project team have known for a while: debugging on amd64 is a royal pain. The difficulty comes not from processor changes, but from design choices made in the AMD64 ABI. The ABI was designed primarily with performance in mind - debuggability and observability were largely an afterthought. There are two features of the ABI that really hurt debuggability. In this post I'll cover the less annoying of the two - look for another followup soon.

Frame Pointers

In the i386 ABI, you almost always have to establish a frame pointer for the current function (leaf routines being the exception). This gives you the familiar opening function sequence:

        pushl   %ebp
        movl    %esp, %ebp

And your frame ends up looking like this:

        ...
        arg1
        arg0
        return PC
        previous frame   <- %ebp
        current frame    <- %esp

This is a restriction of the ABI, not the processor. You can cheat by using the -fomit-frame-pointer flag to gcc, but this is not ABI compliant (although some people still think it's a great idea).
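To make that layout concrete, here is a minimal sketch in C of why it is so convenient: every frame starts with the caller's saved frame pointer followed by the return PC, so the stack is effectively a linked list. The struct frame type, walk_frames() and demo_walk() are names made up for illustration, and the chain is simulated in ordinary memory rather than read from a live stack, so treat it as a model and not as how any real debugger is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the two words at the base of each frame after
 * "pushl %ebp; movl %esp, %ebp": the caller's saved frame pointer, with
 * the return PC pushed by the call instruction just above it.
 */
struct frame {
    const struct frame *prev;   /* saved frame pointer of the caller */
    uintptr_t return_pc;        /* address this function returns to */
};

/*
 * Follow the saved frame pointers, recording up to max return PCs.
 * A NULL saved frame pointer marks the top of the stack in this model.
 */
static size_t
walk_frames(const struct frame *fp, uintptr_t *pcs, size_t max)
{
    size_t n = 0;

    assert(pcs != NULL);
    while (fp != NULL && n < max) {
        pcs[n++] = fp->return_pc;   /* record this frame */
        fp = fp->prev;              /* step to the caller's frame */
    }
    return (n);
}

/* Build a simulated three-deep call chain (made-up PCs) and walk it. */
static size_t
demo_walk(uintptr_t *pcs, size_t max)
{
    struct frame outer = { NULL, 0x1000 };
    struct frame middle = { &outer, 0x2000 };
    struct frame inner = { &middle, 0x3000 };

    return (walk_frames(&inner, pcs, max));
}
```

Calling demo_walk() with room for eight entries records the three return PCs from innermost to outermost. The point is that each step is a single load with no lookup tables involved, which is what makes this style of walk usable from almost any context.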

The problem

With amd64, you would think that they would just keep this convention. At first glance it seems that way, until you find this little footnote in section 3.3.2:

The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available.

On amd64, the frame pointer is explicitly optional. To make debugging somewhat easier, they provide a .eh_frame ELF section that gives enough information (in the form of a binary search table) to traverse a stack from any point. This is slightly better than DWARF, but still requires a lot of processing. The problem with this is that it unnecessarily restricts the context from which you can gather a backtrace. On i386, your stack walking function is something like:

      frame = %ebp
      while (not at top of stack)
             process frame
             frame = *frame

Simple and straightforward. This omits a few nasty details like signal frames and #gp faults, but it's largely correct. On amd64, you now have to load the .eh_frame section, process it, and keep it someplace where you have easy access to it. While this doesn't sound so bad for gdb, it becomes a huge nightmare for something like DTrace. If you read a little bit of the technical details behind DTrace, you'll understand that probes execute in arbitrary context. You may be in the middle of handling an interrupt, in dispatcher or VM code, or processing a trap (although on SPARC, DTracing code that executes at TL > 0 is strictly verboten). This means that the set of possible actions is severely limited, not to mention performance-critical. In order to process a stack() directive on amd64, we would now have to do something like:

        frame = %rbp
        while (not at top of stack)
                process frame
                for (each module in the system)
                        next = binary search in .eh_frame
                        if (next)
                                frame = next
                if (frame not found)
                        frame = *frame

Of course, you could maintain a merged lookup table for all modules on the system, but this is considerably more difficult and a maintenance nightmare. The real show stopper comes with the ustack() action. It is impossible, from arbitrary context within the kernel, to process the objects in userland and find the necessary debugging information. And unless we're using only the pid provider, there's no way to know a priori what processes we will need to examine via ustack(), so we can't even cache the information ahead of time.
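To give a feel for the per-frame work involved, here is a rough sketch in C of the kind of lookup a merged table would need: a binary search over entries sorted by starting PC. The struct unwind_entry layout, lookup_unwind() and demo_table are hypothetical and are only a simplified stand-in for the real .eh_frame data, which is encoded differently and still has to be interpreted once the right record is found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry, one per function, sorted by start_pc. */
struct unwind_entry {
    uintptr_t start_pc;     /* first PC covered by this entry */
    uintptr_t unwind_info;  /* stand-in for the unwind record's location */
};

/*
 * Return the entry with the greatest start_pc <= pc, or NULL if pc lies
 * below every entry. This model has no end addresses, so a PC past the
 * last function still matches the last entry.
 */
static const struct unwind_entry *
lookup_unwind(const struct unwind_entry *tab, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;  /* search the half-open range [lo, hi) */

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].start_pc <= pc)
            lo = mid + 1;   /* candidate is at mid or to its right */
        else
            hi = mid;       /* candidate is strictly left of mid */
    }
    return (lo == 0 ? NULL : &tab[lo - 1]);
}

/* Made-up table for three functions. */
static const struct unwind_entry demo_table[] = {
    { 0x1000, 1 },
    { 0x2000, 2 },
    { 0x3000, 3 },
};
```

Looking up 0x2004 in demo_table finds the second entry, while 0x0fff finds nothing. Even in this toy form, every frame costs a logarithmic search plus whatever it takes to decode the record, and the table itself has to be resident and up to date before the probe fires.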

The solution

What do we do in Solaris? We punt. Our linkers will happily process .eh_frame sections correctly, but our debugging tools (DTrace, mdb, pstack, etc) will only understand executables that use a frame pointer. All of our code (kernel, libraries, binaries) is compiled with frame pointers, and hopefully our users will do so as well.

The amd64 ABI is still a work in progress, and the Solaris supplement is not yet finished. More language may be added to clarify the Solaris position on this "feature". It will probably be a non-issue as long as GCC defaults to having frame pointers on amd64 Solaris. I'm not completely sure how the latest GCC behaves - I believe that it defaults to using frame pointers, which is good. I just hope -fomit-frame-pointer never becomes common practice as we move to OpenSolaris and a larger development community.

Motivation

Why was this written into the amd64 ABI? It's a dubious optimization that severely hinders debuggability. Some research claims a substantial improvement, though their own data shows questionable gains. On i386, you at least had the advantage of increasing the number of usable registers by 20%. On amd64, adding a 17th general purpose register isn't going to open up a whole new world of compiler optimizations. You're just saving a pushl, movl, a series of operations that (for obvious reasons) is highly optimized on x86. And for leaf routines (which never establish a frame), this is a non-issue. Only in extreme circumstances does saving that cost (in processor time and I-cache footprint) translate to a tangible benefit - circumstances which usually resort to hand-coded assembly anyway. Given the small benefit and the relative cost of losing debuggability, this hardly seems worth it.

It may seem a moot point, since you've been able to use -fomit-frame-pointer on i386 for years. The difference here is that on i386, you were knowingly breaking ABI compatibility by using that option. Your application was no longer guaranteed to work properly, especially when it came to debugging. On amd64, this behavior has received official blessing, so that your application can be ABI compliant but completely opaque to DTrace and mdb. I'm not looking forward to "DTrace can't ustack() my gcc-compiled app" bugs (DTrace already has enough trouble dealing with curious gcc-isms as it is).

It's conceivable that we could add support for this functionality in our userland tools, but don't expect it any time soon. And it will never happen for DTrace. If you think saving a pushl, movl here or there is worth it, then you're obviously so performance-oriented that debuggability is the last thing on your mind. I can understand some of our HPC customers needing this; it's when people start compiling /usr/bin/* without frame pointers that it gets out of control. Just don't be surprised when you try to DTrace your highly tuned app and find out you can't get a proper stack trace...

Next post, I'll discuss register passing conventions, which is a much more visible (and annoying) problem.

Comments:

It gets even worse: I just took a look at the Intel Compiler (ICC for IA32) 8.1. It defaults to -fomit-frame-pointer for all optimization levels above -O0.

It seems Intel has focused solely on performance, violating its own ABI. Very sad...

Posted by Ralf on November 25, 2004 at 05:32 AM PST #

The use of a framepointer is NOT mandated by the i386 UNIX ABI. What the ABI mandates is that %ebp is to be treated like %ebx/%esi/%edi, aka a nonvolatile register to be preserved for the caller.

The ABI only _suggests_ that %ebp shall be used as a framepointer, giving function prologue/epilogue code samples that do it.

See figure 3.14 and page 3.11 in the i386 UNIX ABI supplement you quote above. It twice mentions using %ebp as framepointer is optional, but explicitly states it must be preserved for the caller.

The abovementioned link on gcc modifications indeed introduces an ABI violation - not because gcc doesn't use a framepointer when -fomit-frame-pointer is given, but instead because it modifies gcc so that it will no longer treat %ebp as nonvolatile.
In other words: framepointer-using code will do:

        pushl   %ebp
        movl    %esp, %ebp
        ...
        movl    %ebp, %esp
        popl    %ebp
        ret

gcc code with -fomit-frame-pointer will do:

        ...
        pushl   %ebp
        ...
        popl    %ebp
        ret

which is still perfectly ok with the i386 UNIX ABI, and on 32bit x86 really worth it for compute-intensive code given the scarcity of registers. Our own Sun Workshop compilers do eliminate the framepointer (but of course continue to preserve %ebp for the caller) with -xO4 and above. Which is the reason why we use -xO3. Serviceability is more than a good reason for adding five bytes of code to every function.

gcc code with the above-mentioned "fix" also eliminates the pushl %ebp / popl %ebp even if the given function uses %ebp. I.e. it'll make gcc treat %ebp like %eax/%ecx/%edx - and THAT is an ABI violation. It would, by the way, even be an ABI violation on AMD64, because there %rbp is also declared as nonvolatile, see 3.2.1 in the AMD64 UNIX ABI. The gcc folks seem to have developed a healthy hate against the ABI's restrictions on what you can do with %ebp/%rbp and what not ...

Posted by Frank Hofmann on December 12, 2004 at 07:36 PM PST #
