Friday Jun 04, 2010

Obtaining AMD64 function arguments

Debugging AMD64 crash dumps is slightly trickier than debugging SPARC ones because AMD64 has no register windows. To determine what a value was when it was initially passed into a function, we cannot look at a register in the previous register window. We must use the stack instead.

This is a topic that comes up frequently and one that I don't get enough practise at (and therefore tend to forget), so it's a worthy blog entry. If you're really interested, all of this (and a lot more) is covered in Frank Hofmann's excellent book The Solaris Operating System on x86 Platforms: Crashdump Analysis, Operating System Internals, which I can't recommend highly enough.

> ffffffff9bc9cc60::findstack -v
stack pointer for thread ffffffff9bc9cc60: fffffe800145d890
[ fffffe800145d890 _resume_from_idle+0xf8() ]
fffffe800145d8c0 swtch+0x12a()
fffffe800145d8e0 cv_wait+0x68()
fffffe800145d910 pr_p_lock+0x79()
fffffe800145d960 pr_lookup_piddir+0x7e()
fffffe800145d9c0 prlookup+0xd4()
fffffe800145da10 fop_lookup+0x35()
fffffe800145dbe0 lookuppnvp+0x1bf()
fffffe800145dc50 lookuppnat+0xf9()
fffffe800145dd10 lookupnameat+0x86()
fffffe800145de40 vn_openat+0x2aa()
fffffe800145def0 copen+0x1e5()
fffffe800145df00 open+0x19()
fffffe800145df10 sys_syscall+0x17b()

In the above stack we are interested in finding the first argument to pr_lookup_piddir(), which is a vnode_t pointer. We know that prlookup() calls pr_lookup_piddir(), so it must load that argument into the register pr_lookup_piddir() reads it from. A callee expects to find its first argument (arg0) in register %rdi (this is part of the AMD64 ABI; more details are in Frank's book and in the Solaris 64-bit Developer's Guide: AMD64 ABI Features). Therefore, by disassembling the calling function we can check where %rdi comes from:

> prlookup+0xd4::dis
prlookup+0xae: orl %edx,%eax
prlookup+0xb0: testb $0x1,%al
prlookup+0xb2: jne +0xbf
prlookup+0xb8: cmpl $0x24,%r12d
prlookup+0xbc: je +0xb5
prlookup+0xc2: movl %r12d,%edx
prlookup+0xc5: xorl %eax,%eax
prlookup+0xc7: movq %r14,%rsi
prlookup+0xca: movq %rbx,%rdi
prlookup+0xcd: call *0xfffffffffbd0e460(,%rdx,8)
prlookup+0xd4: cmpq $0x1,%rax

At prlookup+0xca (just prior to calling pr_lookup_piddir()) we see that the contents of register %rbx are moved to the callee's input register, %rdi. We now know that at the time we enter pr_lookup_piddir() both %rdi and %rbx contain the same value (a vnode_t pointer). If pr_lookup_piddir() is to use %rbx for scratch it must save the value so it can subsequently restore it when it returns control to prlookup().
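As a rough sketch of that reasoning, the Python below scans the instructions leading up to the call and reports which register feeds each argument register. The argument register order is the one defined by the AMD64 ABI; the function name arg_sources and the regular expression are my own illustration, and it only understands simple register-to-register movq instructions (so the 32-bit movl into %edx is deliberately ignored).

```python
import re

# AMD64 ABI: the first six integer/pointer arguments are passed in these
# registers, in this order (arg0 is %rdi).
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

# Instructions just before the call, copied from the ::dis output above.
DISASSEMBLY = """\
prlookup+0xc2: movl %r12d,%edx
prlookup+0xc5: xorl %eax,%eax
prlookup+0xc7: movq %r14,%rsi
prlookup+0xca: movq %rbx,%rdi
"""

# Hypothetical helper pattern: matches "movq %src,%dst" (AT&T operand order).
MOVQ = re.compile(r"movq\s+%(\w+),%(\w+)")


def arg_sources(disassembly):
    """Map argument index -> source register for register-to-register movq."""
    sources = {}
    for line in disassembly.splitlines():
        match = MOVQ.search(line)
        if match and match.group(2) in ARG_REGS:
            # A later move to the same argument register overrides an earlier one.
            sources[ARG_REGS.index(match.group(2))] = match.group(1)
    return sources


if __name__ == "__main__":
    for index, reg in sorted(arg_sources(DISASSEMBLY).items()):
        print(f"arg{index} comes from %{reg}")
```

For this call site it reports that arg0 comes from %rbx and arg1 from %r14, which matches what we read off the disassembly by eye.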

We can disassemble pr_lookup_piddir() to get an idea of what it's doing (truncated for this example):

> pr_lookup_piddir::dis     
pr_lookup_piddir: pushq %rbp
pr_lookup_piddir+1: movq %rsp,%rbp
pr_lookup_piddir+4: pushq %r15
pr_lookup_piddir+6: movq %rdi,%r15
pr_lookup_piddir+9: pushq %r14
pr_lookup_piddir+0xb: xorl %r14d,%r14d
pr_lookup_piddir+0xe: pushq %r13
pr_lookup_piddir+0x10: movq %rsi,%r13
pr_lookup_piddir+0x13: pushq %r12
pr_lookup_piddir+0x15: pushq %rbx

Above we are saving the caller's frame pointer (pushq %rbp) and setting our frame pointer (movq %rsp,%rbp) before we begin to push the registers that we wish to reuse onto the stack (the remaining pushq instructions).

Of particular interest is pr_lookup_piddir+0x15 where we push %rbx onto the stack. From the top of the function this is the sixth pushq instruction and therefore the sixth register that we have stored to the stack. We can use this knowledge to find the vnode_t pointer we passed into pr_lookup_piddir().

Looking back at the ::findstack output we can see the function names on the right and the frame pointers on the left. pr_lookup_piddir() is the function that is pushing to the stack, so we'll start with pr_p_lock()'s fp (fffffe800145d910) and print down the stack, including pr_lookup_piddir()'s fp (fffffe800145d960):

> fffffe800145d910,10/naP             
0xfffffe800145d910: 0xfffffe800145d960
0xfffffe800145d918: pr_lookup_piddir+0x7e
0xfffffe800145d920: 0xffffffff80037008
0xfffffe800145d928: 0xc
0xfffffe800145d930: 0xffffffffb0939700
0xfffffe800145d938: 0xffffffffb297e240
0xfffffe800145d940: 2
0xfffffe800145d948: 0xffffffffb0939700
0xfffffe800145d950: 0xfffffe800145dab0
0xfffffe800145d958: 0xfffffe800145da88
0xfffffe800145d960: 0xfffffe800145d9c0
0xfffffe800145d968: prlookup+0xd4
0xfffffe800145d970: 0xfffffe800145d990
0xfffffe800145d978: 0x19c691b80
0xfffffe800145d980: 0xffffffff816b6440
0xfffffe800145d988: 5

At 0xfffffe800145d960 we have prlookup()'s fp, which was the first register that we pushed to the stack. Counting five values up the stack we get to 0xfffffe800145d938, which holds the sixth value pushed to pr_lookup_piddir()'s stack. This value, 0xffffffffb297e240, is the value of prlookup()'s %rbx register when pr_lookup_piddir() was called. As we've shown above, this is also the register we sourced %rdi from and is therefore a vnode_t pointer:

> 0xffffffffb297e240::print vnode_t v_path
v_path = 0xffffffff91155060 "/proc/21391"

> 0t21391::pid2proc|::ps -f
R 21391 21379 21350 21265 41311 0x4a004000 ffffffff836021a0
/app/common/java/jdk1.5.0_14/bin/amd64/java -server -Xms1g -Xmx1g -Duser.langua

Just as expected! Furthermore, since we were waiting for a CV we were able to determine from the vnode what path we were waiting on and, since this was in /proc, we could even look up the process.
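The slot counting above is just pointer arithmetic, and can be sketched in a few lines of Python. The helper name pushed_slot and the PUSH_ORDER list are mine, not anything mdb provides, and the sketch assumes the prologue contains nothing but 8-byte pushq instructions between the frame pointer setup and the push we care about (true for the disassembly shown here, but not guaranteed for every function).

```python
WORD = 8  # bytes stored by each pushq on AMD64

# Address holding the saved caller %rbp, i.e. pr_lookup_piddir()'s frame
# pointer as reported by ::findstack.
FRAME_POINTER = 0xfffffe800145d960

# Order of the pushq instructions in pr_lookup_piddir()'s prologue.
PUSH_ORDER = ["rbp", "r15", "r14", "r13", "r12", "rbx"]


def pushed_slot(frame_pointer, push_number):
    """Stack address of the Nth pushed value (1 is the saved %rbp)."""
    return frame_pointer - WORD * (push_number - 1)


if __name__ == "__main__":
    # %rbx is the sixth push, so it sits five words below the frame pointer.
    slot = pushed_slot(FRAME_POINTER, PUSH_ORDER.index("rbx") + 1)
    print(hex(slot))
```

This lands on 0xfffffe800145d938, the same address we found by counting up the dump by hand.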

Monday Jan 11, 2010

mdb: biowait(buf_t *bp) to (s)sd softstate

How to get the sd_lun structure from a buf (e.g. in biowait()).

> 0x2a1002efca0::findstack -v
stack pointer for thread 2a1002efca0: 2a1002ee1a1
[ 000002a1002ee1a1 sema_p+0x138() ]
000002a1002ee251 biowait+0x6c(46bb44a9d00, 0, 18bac00, 30024b12000, 1a, 46bb44a9d00)
000002a1002ee301 default_physio+0x388(12ebf74, 24, 0, 46bb44a9d40, 12ddc10, 46bb44a9d38)
000002a1002ee431 scsi_uscsi_handle_cmd+0x1b8(2000000010, 1, 338a8de2c50, 12ebf74, 46bb44a9d00, 3005a3d1d70)
000002a1002ee521 sd_send_scsi_cmd+0x114(2000000010, 1970800, 3005a3d1d70, 1, 3000111ecc0, 2a1002eeeb0)
000002a1002ee5e1 sd_send_scsi_MODE_SENSE+0x110(3000240be40, 6, 3389353b680, 24, 4, 1)
000002a1002ee701 sd_get_physical_geometry+0x9c(3389353b680, 2a1002ef06c, 43d5bd5, 200, 1, 3000111ecc0)
000002a1002ee7b1 sd_resync_geom_caches+0xb4(3000111ecc0, 43d5bd5, 200, 1, 3ec1, ff)
000002a1002ee881 sd_validate_geometry+0xb4(3000111ecc0, 1, 60, 1, 7, fa000050)
000002a1002ee941 sd_ready_and_valid+0x2d4(3000111ecc0, 2a1002efca0, 0, 3000240be40, 3000240be40, c1)
000002a1002eea51 sdopen+0x248(1, 3000111ecc0, 0, 1978108, 3000111eda0, 0)
000002a1002eeb01 spec_open+0x4f8(2a1002ef528, 224, 3000410be48, a21, 430043ae440, 0)
000002a1002eebc1 fop_open+0x78(2a1002ef528, 2, 3000410be48, 40000003, 301ac8b1ec0, 301ac8b1ec0)
000002a1002eec71 dev_lopen+0x34(2a1002ef5e0, 3, 4, 3000410be48, ffffffff, ffffffffffffffff)
000002a1002eed31 md_layered_open+0x120(13, 2a1002ef6c8, 3, 30003e9d580, 2000000010, 3000410be48)
000002a1002eedf1 stripe_open_all_devs+0x188(58, 3, 0, 0, 0, dc)
000002a1002eeed1 stripe_open+0xa0(dc, 3, 4, 3000113a628, 30003e88b70, 3)
000002a1002eef81 md_layered_open+0xb8(0, 2a1002ef908, 3, 3000113a628, dc, 3000410be48)
000002a1002ef041 mirror_probe_dev+0x98(3000113a078, 19be608, 0, 1, 30003e8b2b0, 0)
000002a1002ef111 md_probe_one+0x84(49976e5ee40, 3000113a078, 0, 68c965b92c0, 14, 7ba0730c)
000002a1002ef1c1 md_daemon+0x21c(0, 19bf478, 33864030100, 19bf478, 2a1002efa88, 19bf4a0)
000002a1002ef291 thread_start+4(19bf478, 0, 0, 0, 0, 0)

The first argument to biowait() is a pointer to a buf_t structure.

> 46bb44a9d00::print -t buf_t
int b_flags = 0x200067
struct buf *b_forw = 0
struct buf *b_back = 0
struct buf *av_forw = 0
struct buf *av_back = 0
o_dev_t b_dev = 0
size_t b_bcount = 0x24
union b_un = {
caddr_t b_addr = 0x3389353b680
struct fs *b_fs = 0x3389353b680
struct cg *b_cg = 0x3389353b680
struct dinode *b_dino = 0x3389353b680
daddr32_t *b_daddr = 0x3389353b680
lldaddr_t _b_blkno = {
longlong_t _f = 0
struct _p = {
int32_t _u = 0
int32_t _l = 0
char b_obs1 = '\0'
size_t b_resid = 0x24
clock_t b_start = 0
struct proc *b_proc = 0
struct page *b_pages = 0
clock_t b_obs2 = 0
size_t b_bufsize = 0
int (*)() b_iodone = 0
struct vnode *b_vp = 0
struct buf *b_chain = 0
int b_obs3 = 0
int b_error = 0x5
void *b_private = 0x3005a3d1d70
dev_t b_edev = 0x2000000010
ksema_t b_sem = {
void * [2] _opaque = [ 0, 0 ]
ksema_t b_io = {
void * [2] _opaque = [ 0x2a1002efca0, 0 ]
struct buf *b_list = 0
struct page **b_shadow = 0x338a79b81c0
void *b_dip = 0x30003961b90
struct vnode *b_file = 0
offset_t b_offset = 0xffffffffffffffff

We are interested in getting the sd_lun so we'll take the b_edev which is a dev_t. The DDI getminor(dev_t dev) and getmajor(dev_t dev) functions allow us to extract the major and minor numbers from a dev_t.

So to get the major number we shift right by NBITSMINOR64 (32) on 64-bit or NBITSMINOR (18) on 32-bit. We then AND with MAXMAJ64 (0xffffffff) on 64-bit or MAXMAJ (MAXMAJ32, 0x3fff) on 32-bit:

> (0x2000000010>>0t32)&0xffffffff=D

And for the minor number we AND with MAXMIN64 (0xffffffff) on 64-bit or MAXMIN (MAXMIN32, 0x3ffff) on 32-bit:

> 0x2000000010&0xffffffff=D

Alternatively if your genunix module provides the ::devt dcmd, this can be used:

> 0x2000000010::devt
32 16
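For reference, the same shift-and-mask arithmetic can be sketched in Python. This covers only the 64-bit dev_t layout described above; the constant names mirror the kernel's, but major64 and minor64 are names I've made up for this example and are not the DDI functions themselves.

```python
# 64-bit dev_t layout: major number in the upper 32 bits, minor in the lower 32.
NBITSMINOR64 = 32
MAXMAJ64 = 0xffffffff
MAXMIN64 = 0xffffffff


def major64(dev):
    """Major number of a 64-bit dev_t (what getmajor() would return)."""
    return (dev >> NBITSMINOR64) & MAXMAJ64


def minor64(dev):
    """Minor number of a 64-bit dev_t (what getminor() would return)."""
    return dev & MAXMIN64


if __name__ == "__main__":
    b_edev = 0x2000000010  # b_edev from the buf_t above
    print(major64(b_edev), minor64(b_edev))
```

Run against our b_edev this gives major 32 and minor 16, agreeing with ::devt.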

The ::major2name dcmd converts the major number to a name. Alternatively we could check in /etc/name_to_major from an explorer or on the host itself.

> 0t32::major2name

In this case the device is sd. If it had returned ssd, every sd in the commands that follow should be replaced with ssd.

Converting the minor number to an sd instance is slightly more tricky. The driver's DDI getinfo(9E) function is called, in the case of sd this is sdinfo(9E). The SDUNIT(dev_t dev) macro is called:

#define SDUNIT(dev) (getminor((dev)) >> SDUNIT_SHIFT)

So we need to shift the minor number right by SDUNIT_SHIFT (3 on my system):

> 0t16>>0t3=D

We now know that this thread is waiting on a buffer which is being serviced by sd2.
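Putting the two steps together, here is a minimal sketch that goes from a 64-bit dev_t straight to the sd instance number. It assumes SDUNIT_SHIFT is 3, as on my system, and the function name sd_unit is purely illustrative; check the value of SDUNIT_SHIFT for your own platform before trusting the result.

```python
MAXMIN64 = 0xffffffff  # mask for the minor number in a 64-bit dev_t
SDUNIT_SHIFT = 3       # assumption: the value seen on the system in this post


def sd_unit(dev, shift=SDUNIT_SHIFT):
    """Instance number, mirroring SDUNIT(dev): getminor(dev) >> SDUNIT_SHIFT."""
    minor = dev & MAXMIN64
    return minor >> shift


if __name__ == "__main__":
    print(f"sd{sd_unit(0x2000000010)}")
```

With a minor number of 16 this prints sd2, the instance we worked out above.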

The next stage is to get this sd instance's sd_lun structure. These are held in an array pointed to by sd's DDI softstate pointer, sd_state (or ssd_state for ssd). For more information see ddi_soft_state(9F) in the Solaris 10 man page collection.

> *sd_state::print -t struct i_ddi_soft_state
void **array = 0x3000111a640
kmutex_t lock = {
void * [1] _opaque = [ 0 ]
size_t size = 0x558
size_t n_items = 0x40
struct i_ddi_soft_state *next = 0x30003b7f900

We're after sd instance 2, so that will be the third entry in the array, at index 2 (remember that sd0 and sd1 come before sd2). There are a number of different ways to do this so I'll cover a few.

Getting the softstate #1, without any helper dcmds:

The array is a list of pointers, so we'll need to know the size of a uintptr_t. We then multiply this by the sd instance number that we want and add it to the address of the array, e.g.:

> ::sizeof uintptr_t
sizeof (uintptr_t) = 8
> 0x3000111a640+8*2/J
0x3000111a650: 3000111ecc0 <-- sd2

Getting the softstate #2, walking the array up to the state we want:

Here we tell mdb to print out the address (a) and the contents (P) of the first 3 (,3) values from array:

> 0x3000111a640,3/naP
0x3000111a640: 0x3000111e680 <-- sd0
0x3000111a648: 0x30003ea80c0 <-- sd1
0x3000111a650: 0x3000111ecc0 <-- sd2
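Both of the approaches so far come down to the same array indexing, which can be sketched in Python as follows. The MEMORY dictionary simply replays the three values printed above in place of real kernel memory, and soft_state_slot and soft_state are hypothetical names for this illustration (the real lookup is done by ddi_get_soft_state(9F)).

```python
POINTER_SIZE = 8            # ::sizeof uintptr_t on this 64-bit kernel
ARRAY_BASE = 0x3000111a640  # "array" from struct i_ddi_soft_state
N_ITEMS = 0x40              # "n_items" from struct i_ddi_soft_state

# Stand-in for kernel memory: the three slots printed by the /naP command.
MEMORY = {
    0x3000111a640: 0x3000111e680,  # sd0
    0x3000111a648: 0x30003ea80c0,  # sd1
    0x3000111a650: 0x3000111ecc0,  # sd2
}


def soft_state_slot(base, instance, pointer_size=POINTER_SIZE):
    """Address of the array slot holding the pointer for this instance."""
    return base + pointer_size * instance


def soft_state(base, instance, n_items, memory):
    """Pointer stored for the instance, or None if out of range or unknown."""
    if not 0 <= instance < n_items:
        return None
    return memory.get(soft_state_slot(base, instance))


if __name__ == "__main__":
    print(hex(soft_state_slot(ARRAY_BASE, 2)))
    print(hex(soft_state(ARRAY_BASE, 2, N_ITEMS, MEMORY)))
```

For instance 2 the slot is 0x3000111a650 and the pointer stored there is 0x3000111ecc0, matching the mdb output.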

Getting the softstate #3, using the ::array dcmd (probably the worst way but the one I somehow always try and use):

We get the first three (0t3) elements from the start of the array. We specify that each element of the array is a uintptr_t. This might make more sense if the array was not an array of pointers but of a real structure.

> 0x3000111a640::array uintptr_t 0t3
> 3000111a650/J
0x3000111a650: 3000111ecc0 <-- sd2

Getting the softstate #4, with the helpful ::softstate dcmd (definitely the best way):

> *sd_state::softstate 0t2

Getting all softstates using the softstate walker:

> *sd_state::walk softstate

We're now done. We have the sd_lun pointer for sd2 and we can do whatever we want with it.

Below are a few helpful things that can be dumped. This is in no way exhaustive.

> *sd_state::softstate 0t2|::print -t struct sd_lun
struct scsi_device *un_sd = 0x3000240be40
struct buf *un_rqs_bp = 0x300010a1340
struct scsi_pkt *un_rqs_pktp = 0x300039a7e90
int un_sense_isbusy = 0
int un_buf_chain_type = 0x1
int un_uscsi_chain_type = 0x8
int un_direct_chain_type = 0x8
int un_priority_chain_type = 0x9
struct buf *un_waitq_headp = 0
struct buf *un_waitq_tailp = 0
struct buf *un_retry_bp = 0
int (*)() un_retry_statp = 0
void *un_xbuf_attr = 0x30003ba0200
uint32_t un_sys_blocksize = 0x200
uint32_t un_tgt_blocksize = 0x200
uint64_t un_blockcount = 0x43d5bd5
uchar_t un_ctype = 0x2
char *un_node_type = 0x1972e48 "ddi_block:channel"

> 0x3000240be40::print -t struct scsi_device
struct scsi_address sd_address = {
struct scsi_hba_tran *a_hba_tran = 0x300024290c0
ushort_t a_target = 0x2
uchar_t a_lun = 0
uchar_t a_sublun = 0
dev_info_t *sd_dev = 0x30003961b90
kmutex_t sd_mutex = {
void * [1] _opaque = [ 0 ]
void *sd_reserved = 0x300024290c0
struct scsi_inquiry *sd_inq = 0x3000393d788
struct scsi_extended_sense *sd_sense = 0
caddr_t sd_private = 0x3000111ecc0

> 0x30003961b90::devinfo
30003961b90 sd, instance #2
System properties at 30003b59810:
name='lun' type=int items=1
name='target' type=int items=1
name='class_prop' type=string items=1
name='class' type=string items=1
Driver properties at 30003b594f0:
name='ddi-no-autodetach' type=int items=1
name='inquiry-serial-no' type=string items=1
value='00N0A2TH '
name='pm-components' type=string items=3
value='NAME=spindle-motor' + '0=off' + '1=on'
name='pm-hardware-state' type=string items=1
name='ddi-failfast-supported' type=any items=0
name='ddi-kernel-ioctl' type=any items=0
name='device-nblocks' type=int64 items=1
Hardware properties at 30003b59518:
name='devid' type=string items=1
name='inquiry-revision-id' type=string items=1
name='inquiry-product-id' type=string items=1
value='MAP3367N SUN36G'
name='inquiry-vendor-id' type=string items=1
name='inquiry-device-type' type=int items=1

I hope this was helpful. I'm going to try and put up a few more posts in a similar style. I'm happy to take requests but can't guarantee results!

