Debugging on AMD64 - Part Two

In my last post I talked about one of the annoying features of the amd64 ABI - the optional frame pointer. Today, I'll examine the much more painful problem of argument passing on amd64. For the sake of discussion, I'll avoid structure passing and floating point - nasty little kinks in the problem.

Argument Passing on i386

On i386, all arguments are passed on the stack. Before establishing a frame, the caller pushes each argument to the function in reverse order. This gives you this stack layout:

        ...
        arg1
        arg0
        return PC
        previous frame     <- %ebp

        current frame

                           <- %esp

If you want to access the third argument, you simply reference 16(%ebp) (8 for the frame + 8 to skip the first two args). This makes debugging a breeze. For any given frame pointer (easy to find thanks to the i386 ABI), we can always find the initial arguments to the function. Another trick we use is that nearly every function call is followed by an addl x, %esp instruction. Using this information, we can figure out how many arguments were passed to the function, without relying on CTF or STABS data. Putting this all together, it's easy to get a meaningful stack trace:

        > a76de800::findstack -v
        stack pointer for thread a76de800: a77c5dd4
        [ a77c5dd4 0xfe81994d() ]
          a77c5dec swtch+0x1cb()
          a77c5e10 cv_wait_sig+0x12c(a78a79b0, a6c57028)
          a77c5e70 cte_get_event+0x4d()
          a77c5ea4 ctfs_endpoint_ioctl+0xc2()
          a77c5ec4 ctfs_bu_ioctl+0x2f()
          a77c5ee4 fop_ioctl+0x1e(a79a7980, 63746502, 80d3f48, 102001, a69daf08, a77c5f74)
          a77c5f80 ioctl+0x19b()
          a77c5fac sys_call+0x16e()
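
The two i386 tricks described above boil down to a little arithmetic. Here is a minimal sketch of it, assuming every argument occupies one 4-byte stack slot (which is why structures and floating point are left out). The function names are made up for illustration and are not part of mdb:

```python
import re

I386_WORD = 4  # bytes per stack slot on i386 (assumption: word-sized args only)


def i386_arg_offset(n):
    """Offset from %ebp of argument n (0-based).

    The saved frame pointer sits at 0(%ebp) and the return PC at 4(%ebp),
    so the arguments start at 8(%ebp).
    """
    if n < 0:
        raise ValueError("argument index must be non-negative")
    return 2 * I386_WORD + n * I386_WORD


def i386_arg_count(insn_after_call):
    """Guess the argument count from the instruction following a call.

    Returns None when that instruction is not an 'addl $x, %esp' stack
    cleanup, in which case this heuristic tells us nothing.
    """
    m = re.match(r"\s*addl\s+\$(0x[0-9a-fA-F]+|\d+)\s*,\s*%esp\s*$",
                 insn_after_call)
    if m is None:
        return None
    text = m.group(1)
    nbytes = int(text, 16) if text.lower().startswith("0x") else int(text, 10)
    return nbytes // I386_WORD
```

For example, i386_arg_offset(2) gives the 16 used above for the third argument, and an "addl $0x18,%esp" after the call suggests six arguments, as in the fop_ioctl() frame.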

Argument Passing on AMD64

Enter amd64. As previously mentioned, the amd64 ABI was designed primarily for performance, not debugging. The architects decided that pushing arguments on the stack was expensive, and that with 16 general purpose registers, we might as well use some of them to pass arguments. Specifically, we have:

        arg0    %rdi
        arg1    %rsi
        arg2    %rdx
        arg3    %rcx
        arg4    %r8
        arg5    %r9
        argN    8*(N-4)(%rbp)
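
As a rough sketch, the mapping above can be written as a small lookup. It only covers integer and pointer arguments, and the stack case assumes the callee has established a frame pointer. The function name is hypothetical:

```python
# Integer/pointer argument registers, in order, per the table above.
AMD64_ARG_REGS = ("%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9")


def amd64_arg_location(n):
    """Where argument n (0-based) lives on entry to a function.

    The first six arguments are in registers. The rest are on the stack
    above the saved frame pointer and return PC, so argument 6 is at
    16(%rbp), argument 7 at 24(%rbp), and so on.
    """
    if n < 0:
        raise ValueError("argument index must be non-negative")
    if n < len(AMD64_ARG_REGS):
        return AMD64_ARG_REGS[n]
    return "%d(%%rbp)" % (8 * (n - 4))
```

Note that only the stack case names a memory location. A register name says where the value was at the moment of the call, not where it is by the time you are looking at a stack trace.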

This is a disaster for debugging. Debugging tools that operate in-place (DTrace and truss) can get meaningful arguments, but cannot know how many there are. Tools which examine a stack trace (pstack, mdb) cannot get arguments for any frame. The arguments may or may not be pushed on the stack, or they could be lost completely. If we try to get a stack with arguments, we find:

        > ffffffff8af1c720::findstack -v
        stack pointer for thread ffffffff8af1c720: ffffffffb2a51af0
          ffffffffb2a51d00 vpanic()
          ffffffffb2a51d30 0xfffffffffe972ae3()
          ffffffffb2a51d60 exitlwps+0x1f1()
          ffffffffb2a51dd0 proc_exit+0x40()
          ffffffffb2a51de0 exit+9()
          ffffffffb2a51e40 psig+0x2bc()
          ffffffffb2a51ee0 post_syscall+0x7d5()
          ffffffffb2a51f00 syscall_exit+0x5d()
          ffffffffb2a51f10 sys_syscall32+0x1d8()

The solution

The solution, as envisioned by the amd64 ABI designers, is to rely on DWARF to get the necessary information. If you have ever read the DWARF spec, you know that it is a gigantic, ugly beast - an interpreted language that can be used to mine virtually any debugging data in an abstract manner. The problem here is that it requires significantly more work than on i386, and it requires debugging information to be present in the target object.

Implementing a DWARF interpreter is technically quite doable. We even had one brave soul go so far as to implement a limited DWARF disassembler capable of grabbing arguments for functions. But it turns out that the sheer amount of data we would have to add to the kernel to enable this was prohibitive. The bloat would have pushed us past the limit of the miniroot, not to mention the increased memory footprint and necessary changes to krtld and KMDB. That's not to say we won't support it in userland some day.

The lack of an argument count is less serious. DTrace doesn't need to know how many arguments there are. For the moment, truss simply shows the first 6 arguments always. truss could be enhanced to use CTF and/or DWARF data to determine the number of arguments to a given function, but it probably won't happen any time soon.

Workaround

Given that there will be no solution to this problem any time soon, you may ask how one can do any kind of debugging at all. The answer is "painfully". I'll walk through an example of finding the arguments to a function, using the following stack:

        > ffffffff8356c100::findstack -v
        stack pointer for thread ffffffff8356c100: ffffffffb2bbdb10
        [ ffffffffb2bbdb10 _resume_from_idle+0xe4() ]
          ffffffffb2bbdb40 swtch+0xc9()
          ffffffffb2bbdb90 cv_wait_sig+0x170()
          ffffffffb2bbdc50 cte_get_event+0xb0()
          ffffffffb2bbdc70 ctfs_endpoint_ioctl+0x7e()
          ffffffffb2bbdc80 ctfs_bu_ioctl+0x32()
          ffffffffb2bbdc90 fop_ioctl+0xb()
          ffffffffb2bbdd70 ioctl+0xac()
          ffffffffb2bbde00 dosyscall+0x12b()
          ffffffffb2bbdf00 trap+0x1308()
        >

Let's say that we want to know the first argument to fop_ioctl(), which is a vnode. The first step is to look at the caller and see where the argument came from:

        > ioctl+0xac::dis -n 6
------> ioctl+0x8e:                     movq   0x10(%r12),%rdi
        ioctl+0x93:                     movq   0x1a0(%rax),%r8
        ioctl+0x9a:                     leaq   -0xcc(%rbp),%r9
        ioctl+0xa1:                     movq   %r15,%rdx
        ioctl+0xa4:                     movl   %r13d,%esi
------> ioctl+0xa7:                     call   +0xeed99 <fop_ioctl>
        ioctl+0xac:                     testl  %eax,%eax
        ioctl+0xae:                     movl   %eax,%ebx
        ioctl+0xb0:                     jne    +0x74    <ioctl+0x124>
        ioctl+0xb2:                     cmpl   $0x8004667e,%r13d
        ioctl+0xb9:                     je     +0x27    <ioctl+0xe0>
        ioctl+0xbb:                     movl   %r14d,%edi
        ioctl+0xbe:                     call   -0x1408e <releasef>

We can see that %rdi (the first argument) came from %r12. Looks like we lucked out - %r12 must be preserved by the function being called. So we look at fop_ioctl():

        > fop_ioctl::dis
        fop_ioctl:                      movq   0x40(%rdi),%rax
        fop_ioctl+4:                    pushq  %rbp
        fop_ioctl+5:                    movq   %rsp,%rbp
        fop_ioctl+8:                    call   *0x28(%rax)
        fop_ioctl+0xb:                  leave
        fop_ioctl+0xc:                  ret

No dice. We can see that %r12 (as well as %rdi) is still active at this point. Let's keep looking:

        > ctfs_bu_ioctl::dis ! grep r12
        > ctfs_endpoint_ioctl::dis ! grep r12
        > cte_get_event::dis ! grep r12
        cte_get_event+0x13:             pushq  %r12
        cte_get_event+0x32:             movq   0x20(%rdi),%r12
        ...

Finally, we found a function that preserves %r12. Taking a closer look at cte_get_event():

        > cte_get_event::dis -n 8
        cte_get_event:                  pushq  %rbp
        cte_get_event+1:                movq   %rsp,%rbp
        cte_get_event+4:                pushq  %r15
        cte_get_event+6:                movl   %esi,%r15d
        cte_get_event+9:                pushq  %r14
        cte_get_event+0xb:              movq   %rcx,%r14
        cte_get_event+0xe:              pushq  %r13
        cte_get_event+0x10:             movl   %r9d,%r13d
        cte_get_event+0x13:             pushq  %r12

We can see that %r12 was pushed fourth after establishing the frame pointer. This would put it 32 bytes below %rbp for this frame. Remembering that what was really passed was 0x10(%r12), we can finally find our original argument:

        > ffffffffb2bbdc50-20/K
        0xffffffffb2bbdc30:             ffffffff8330ec88
        > ffffffff8330ec88+10/K
        0xffffffff8330ec98:             ffffffff83a5f600
        > ffffffff83a5f600::print vnode_t v_path
        v_path = 0xffffffff83978c40 "/system/contract/process/pbundle"

Whew. We can see that we have the proper vnode, since the path references a /system/contract file. And all it took was about 12 steps! You can see how this has become such a pain for us kernel developers. From the above example, you can see the approximate method is:

  1. Determine where the argument came from in the caller. Hopefully, you will find something that came from the stack, or one of the callee-saved registers (%r12-%r15). If not, look at the function and see if the argument was pushed on the stack or moved somewhere more permanent. This doesn't happen often, so it may be that your argument is lost forever.

  2. If the argument came from a callee-saved register, examine every function in the stack until you find one that saves the value.

  3. By this point, you've hopefully found a place where the value is stored relative to %rbp. Using the frame pointers displayed in the stack trace, fetch the value from the stack.
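
Steps 2 and 3 can be sketched in code. This is only an illustration of the arithmetic, not something mdb provides: the function names are made up, and it assumes each function saves registers with pushq immediately after setting up its frame, with no subq on %rsp in between.

```python
import re

PUSH_RE = re.compile(r"pushq\s+(%\w+)")

# Prologue of cte_get_event(), copied from the disassembly above.
CTE_GET_EVENT_PROLOGUE = [
    "cte_get_event:                  pushq  %rbp",
    "cte_get_event+1:                movq   %rsp,%rbp",
    "cte_get_event+4:                pushq  %r15",
    "cte_get_event+6:                movl   %esi,%r15d",
    "cte_get_event+9:                pushq  %r14",
    "cte_get_event+0xb:              movq   %rcx,%r14",
    "cte_get_event+0xe:              pushq  %r13",
    "cte_get_event+0x10:             movl   %r9d,%r13d",
    "cte_get_event+0x13:             pushq  %r12",
]


def saved_slot_offset(prologue, reg):
    """Bytes below %rbp where reg is saved, or None if we can't tell."""
    depth = None  # pushes seen since 'pushq %rbp'; None until we see it
    for line in prologue:
        if re.search(r"\bsubq\b.*%rsp", line):
            return None  # stack adjusted by hand; counting pushes is unsafe
        m = PUSH_RE.search(line)
        if m is None:
            continue
        if depth is None:
            if m.group(1) == "%rbp":
                depth = 0  # %rbp will point at this slot
            continue
        depth += 1
        if m.group(1) == reg:
            return 8 * depth
    return None


def find_saved_register(frames, reg):
    """Walk frames, starting just below the caller, looking for a save of reg.

    frames is a list of (name, frame_pointer, prologue_lines) tuples.
    Returns (name, address_of_saved_value), or None if nobody saved it.
    """
    for name, fp, prologue in frames:
        off = saved_slot_offset(prologue, reg)
        if off is not None:
            return name, fp - off
    return None
```

Fed the cte_get_event() prologue and its frame pointer ffffffffb2bbdc50 from the stack trace, this yields a saved slot at ffffffffb2bbdc30, the same address read by hand above. The result only means something if no frame between the caller and the saving function modified the register first, which is the part you still have to check by reading the disassembly.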

This is not always guaranteed to work, and is obviously a royal pain. In my next post, I'll go into some future ideas we have to make this (and other debugging) better.

Comments:

First I would like to agree with you that the omission of frame pointers is a Really Bad Idea(TM). I have run into this problem several times.

A prime example is the Linux kernel, which has at least had an option to enable compiling with frame pointers for about two years. Without frame pointers (and of course without any debug information) all you can do is guess that a certain address on the stack is a return address. So you cannot get reliable stack traces at all (not to mention the parameters). For more details look at 'ksymoops'.

The run time benefit of omitting frame pointers is somewhere between 0 and 10%, but in reality it is more likely to be around 2% or less. Decide for yourself...

In regard to the parameter passing convention, AMD64 is no different from most RISC processors: parameters are passed in registers in order to avoid memory traffic and reduce code size. I guess RISC machines are the main reason for the existence of the complex DWARF debug format. The AMD64 ABI is at least one main reason why AMD64 applications can be faster (>2%) than their i386 counterparts.

For example PPC: The PPC application binary interface (defined by IBM and Sun) states that the first 8 parameters are passed in registers. Anybody trying to port OpenSolaris/DTrace to PPC will experience the same problems you have described.

Another example is C++ applications: They have to unwind the stack to handle an exception (catch/try/throw). This is not very efficient either.

Summary:

Unwinding the stack on most RISC machines, including parameters, is complex and therefore not efficient, but it is possible if you have compiled with debug information. Seems like you have to include a DWARF interpreter at least in KMDB...

Posted by Ralf on November 21, 2004 at 01:19 AM PST #

Ralf-

Thanks for the info - this is all true. On SPARC we end up somewhere in the middle, because we usually have the 'in' and 'local' registers pushed somewhere on the stack as a frame pointer. Perhaps i386 was the exception to the convention, but it was a nice exception :-)

As for the performance benefit, I would argue that the majority of the performance benefit on amd64 comes from the increased register set. The compiler can do a whole lot more optimizations when you have 16 registers instead of 5. That being said, I won't argue that pushing arguments to the stack is faster than doing nothing.

It will be interesting to see what happens if/when an OpenSolaris PPC port comes along. Sounds like there are a lot of interesting roadblocks for the debugging tools. As to not including DWARF in the kernel, the reason is not because writing an interpreter is hard, but because including the necessary information is a colossal waste of space for a rarely used (by non-developers) feature. Perhaps we could include the info on DEBUG builds, but stay tuned for another theory that doesn't involve DWARF...

Posted by Eric Schrock on November 21, 2004 at 03:41 AM PST #

The act command tries to find function arguments when displaying amd64 stacks. It has not had a lot of testing yet, but a version is available in the CTEactx package available from http://cpre-emea.uk/tools/act.html. For amd64 the command is a shell script which runs mdb. For interactive debugging you can load the act.so from the package and use the act_thread dcmd to dump the stack for a thread:

        > *panic_thread::act_thread
        *** process id 197396 is halt -d, parent process is 194515
        uid is 0, gid is 0
        thread addr ffffffff86082a80, proc addr ffffffff8aa87968, lwp addr ffffffff867c3d50
        Thread bound to cpu id 0x1
        t_state is 0x4 - TS_ONPROC
        Scheduling info:
        t_pri is 0, t_epri is 0, t_cid is 0x1
        scheduling class is: TS
        t_disp_time: is 0xede9a6, 0t15591846
        last switched: 4 secs ago on cpu 0x1
        t_stk ffffffffb2961f10
        stack trace is:
        unix: panicsys+0x6e (0xffffffffb2961af0,vpanic+0x179,panic_stack+0x1f20, 0x822c2040)
        unix: vpanic+0x179 ()
        unix: fffffffffe8404b2 ()
        unix: panic (0xfffffffffea678b0)
        genunix: kadmin+0x4c1 (5,0,0,0xffffffff880d9440)
        genunix: uadmin+0x93 (5,0,?)
        genunix: dosyscall+0x12b ()
        unix: trap+0x115f (0xffffffffb2961f10,0xfeef9d41,1)
        unix: _cmntrap+0x1d2 ()
        > ::help act_thread

        NAME
          act_thread - print thread [-PCfFnt]

        SYNOPSIS
          ::act_thread

        DESCRIPTION
          Print a summary for a thread or all threads including stack trace
          by default threads are sorted by the cpu and time when they last ran
          [addr]::act_thread [-PCfFnt]
          -P do not print if ONPROC threads
          -C only print ONPROC threads
          -f print registers in stack trace
          -F print all active stack
          -n do not sort threads
          -t sort threads only by last switch time

        ATTRIBUTES
          Target: kvm
          Module: act
          Interface Stability: Unstable

Posted by Adrian Frost on November 22, 2004 at 02:11 AM PST #

Since Adrian's comments did not really do ACT justice due to the formatting, I have posted them here.

Posted by Chris Gerhard on November 23, 2004 at 06:56 AM PST #

You should read that spec again and see that despite the many registers, the way of passing parameters is brain-dead. Check what happens to the registers which hold the parameters during function calls. Regards, Friedrich

Posted by Friedrich on December 09, 2004 at 04:14 PM PST #
