Debugging on AMD64 - Part One
By eschrock on Nov 19, 2004
The amd64 port of Solaris has been available (internally) for about a month and a half, and the rest of the group is starting to realize what those of us on the project team have known for a while: debugging on amd64 is a royal pain. The difficulty comes not from processor changes, but from design choices made in the AMD64 ABI. The ABI was designed primarily with performance in mind - debuggability and observability was largely an afterthought. There are two features of the ABI that really hurt debuggability. In this post I'll cover the less annoying of the two - look for another followup soon.
In the i386 ABI, you almost always have to establish a frame pointer for the current function (leaf routines being the exception). This gives you the familiar opening function sequence:
movl %esp, %ebp
And your frame ends up looking like this:
With amd64, you would think that they would just keep this convention. At first glance it seems that way, until you find this little footnote in section 3.3.2:
The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available.
On amd64, the frame pointer is explicitly optional. To make debugging somewhat easier, they provide a .eh_frame ELF section that gives enough information (in the form of a binary search table) to traverse a stack from any point. This is slightly better than DWARF, but still requires a lot of processing. The problem with this is that it unnecessarily restricts the context from which you can gather a backtrace. On i386, your stack walking function is something like:
frame = %ebp
while (not at top of stack)
frame = \*frame
Simple and straightforward. This omits a few nasty details like signal frames and #gp faults, but it's largely correct. On amd64, you now have to load the .eh_frame section, process it, and keep it someplace where you have easy access to it. While this doesn't sound so bad for gdb, it becomes a huge nightmare for something like DTrace. If you read a little bit of the technical details behind DTrace, you'll understand that probes execute in arbitrary context. You may be in the middle of handling an interrupt, in dispatcher or VM code, or processing a trap (although on SPARC, DTracing code that executes at TL > 0 is strictly verboten). This means that the set of possible actions is severely limited, not to mention performance-critical. In order to process a stack() directive on amd64, we would now have to do something like:
frame = %ebp
while (not at top of stack)
for (each module in the system)
next = binary search in .eh_frame
frame = next
if (frame not found)
frame = \*frame
Of course, you could maintain a merged lookup table for all modules on the system, but this is considerably more difficult and a maintenance nightmare. The real show stopper comes with the ustack() action. It is impossible, from arbitrary context within the kernel, to process the objects in userland and find the necessary debugging information. And unless we're using only the pid provider, there's no way to know a priori what processes we will need to examine via ustack(), so we can't even cache the information ahead of time.
What do we do in Solaris? We punt. Our linkers will happily process .eh_frame sections correctly, but our debugging tools (DTrace, mdb, pstack, etc) will only understand executables that use a frame pointer. All of our code (kernel, libraries, binaries) is compiled with frame pointers, and hopefully our users will do so as well.
The amd64 ABI is still a work in progress, and the Solaris supplement is not yet finished. More language may be added to clarify the Solaris position on this "feature". It will probably be a non-issue as long as GCC defaults to having frame pointers on amd64 Solaris. I'm not completely sure how the latest GCC behaves - I believe that it defaults to using frame pointers, which is good. I just hope -fomit-frame-pointer never becomes common practice as we move to OpenSolaris and a larger development community.
Why was this written into the amd64 ABI? It's a dubious optimization that severely hinders debuggability. Some research claims a substantial improvement, though their own data shows questionable gains. On i386, you at least had the advantage of increasing the number of usable registers by 20%. On amd64, adding a 17th general purpose register isn't going to open up a whole new world of compiler optimizations. You're just saving a pushl, movl, an series of operations that (for obvious reasons) is highly optimized on x86. And for leaf routines (which never establish a frame), this is a non-issue. Only in extreme circumstances does the cost (in processor time and I-cache footprint) translate to a tangible benefit - circumstances which usually resort to hand-coded assembly anyway. Given the benefit and the relative cost of losing debuggability, this hardly seems worth it.
It may seem a moot point, since you've been able to use -fomit-frame-pointer on i386 for years. The difference here is that on i386, you were knowingly breaking ABI compatibility by using that option. Your application was no longer guaranteed to work properly, especially when it came to debugging. On amd64, this behavior has received official blessing, so that your application can be ABI compliant but completely opaque to DTrace and mdb. I'm not looking forward to "DTrace can't ustack() my gcc-compiled app" bugs (DTrace already has enough trouble dealing with curious gcc-isms as it is).
It's conceivable that we could add support for this functionality in our userland tools, but don't expect it any time soon. And it will never happen for DTrace. If you think saving a pushl, movl here or there is worth it, then you're obviously so performance-oriented that debuggability is the last thing on your mind. I can understand some of our HPC customers needing this; it's when people start compiling /usr/bin/\* without frame pointers that it gets out of control. Just don't be suprised when you try to DTrace your highly tuned app and find out you can't get a proper stack trace...
Next post, I'll discuss register passing conventions, which is a much more visible (and annoying) problem.