multithreaded processes and mdb
By Alan Hargreaves-Oracle on May 04, 2009
Today I had to look at a gcore of devfsadm. Specifically, I wanted to have a look at what the threads in cond_wait() were doing. I haven't done a lot with such stuff in userland before, so I thought it would make a good short blog topic on things that can be done.
First off, we run up mdb:
# mdb /usr/sbin/devfsadm devfsadm.gcore
Loading modules: [ libsysevent.so.1 libnvpair.so.1 libc.so.1 libavl.so.1 libuutil.so.1 ld.so.1 ]
>
Great, we got all the modules. So, what lwps have we got?
> $L
lwpids 1, 2, 3, 4, 5 and 6 are in core of process 135.
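If you want to script the per-lwp steps rather than type them, the status line above is easy to pull apart. The following is only a minimal sketch: `parse_lwpids` is a hypothetical helper name, and it assumes the message has the same wording as the one shown here (lwp ids first, then "in core of process" and the pid).

```python
import re

def parse_lwpids(line):
    """Return the lwp ids named in a '$L' status line as a list of ints."""
    # Assumption: the ids come before the words "in core"; the pid that
    # follows "of process" must not be picked up as an lwp id.
    head = line.split(" in core", 1)[0]
    return [int(n) for n in re.findall(r"\d+", head)]

line = "lwpids 1, 2, 3, 4, 5 and 6 are in core of process 135."
print(parse_lwpids(line))
```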
So we have six threads. Let's have a look at the registers in the first one (note that this is on SPARC).
> 1::regs
%g0 = 0x00000000                      %l0 = 0x00000000
%g1 = 0x0000001d                      %l1 = 0x00043748
%g2 = 0x0003cb2c                      %l2 = 0xffbff8ac
%g3 = 0x00038000                      %l3 = 0x00000001
%g4 = 0x0003cb2c                      %l4 = 0x00000000
%g5 = 0x00000000                      %l5 = 0x00000000
%g6 = 0x00000000                      %l6 = 0x00000000
%g7 = 0xff342a00                      %l7 = 0x00000001
%o0 = 0xff342c40                      %i0 = 0x00000001
%o1 = 0xff13b90c libc.so.1`pause+0x50 %i1 = 0x0003a2a4
%o2 = 0xff1c3800 libc.so.1`_uberdata  %i2 = 0xff342a00
%o3 = 0x00000000                      %i3 = 0x00039954
%o4 = 0xff342a00                      %i4 = 0x00016964
%o5 = 0x00000000                      %i5 = 0x00000000
%o6 = 0xffbff850                      %i6 = 0xffbff8b0
%o7 = 0xff13b914 libc.so.1`pause+0x58 %i7 = 0x00015ce4

 %psr = 0x00000044 impl=0x0 ver=0x0 icc=nzvc
                   ec=0 ef=0 pil=0 s=0 ps=64 et=0 cwp=0x4
   %y = 0x00000000
  %pc = 0xff14c160 libc.so.1`_pause+4
 %npc = 0xff14c164 libc.so.1`_pause+8
  %sp = 0xffbff850
  %fp = 0xffbff8b0
 %wim = 0x00000082
 %tbr = 0x00000000
Now to have a look at the stack we simply find the %sp value and use it with the stack dcmd.
> 0xffbff850::stack
0x15ce4(0, 43b48, 39db4, 4, 2276c, 38000)
main+0x358(0, 39f2c, ffbffdec, 398e4, 1, 38000)
_start+0x108(0, 0, 0, 0, 0, 0)
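The "find %sp, then feed it to the stack dcmd" step is mechanical enough to sketch in a few lines. This is just an illustration: `stack_command` is a made-up helper, and it assumes you have saved the text of the regs output somewhere and that it contains a `%sp = 0x...` line as in the dump shown earlier.

```python
import re

def stack_command(regs_output):
    """Build the '<sp>::stack' command from the text of a regs dump."""
    # Assumption: the dump contains a line of the form "%sp = 0x<hex>".
    match = re.search(r"%sp\s*=\s*(0x[0-9a-fA-F]+)", regs_output)
    if match is None:
        raise ValueError("no %sp line found in regs output")
    return match.group(1) + "::stack"

# A fragment of the register dump for lwp 1.
sample = """\
  %pc = 0xff14c160 libc.so.1`_pause+4
 %npc = 0xff14c164 libc.so.1`_pause+8
  %sp = 0xffbff850
  %fp = 0xffbff8b0
"""
print(stack_command(sample))
```

The printed line is what you would then type at the mdb prompt.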
Note that this gives the stack frames above the current one, not the current frame itself. From the value of %pc above we can see where we are executing in the current frame. You can also see that the caller (0x15ce4) does not have an entry in the symbol table. Unfortunately, on Solaris 10, devfsadm has a lot of functions and variables declared as static, which really does make debugging a pain. Fortunately this is not the case in Nevada/OpenSolaris.
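When a stack is longer than this one, it can help to pick out the frames that came back as bare addresses, since those are the ones you will need to match against the source by hand. A rough sketch, assuming the stack output has been saved as text and that an unresolved frame is printed as a plain `0x...` address in front of the argument list, as it is above (`unresolved_frames` is a hypothetical name):

```python
import re

def unresolved_frames(stack_output):
    """Return the frames of a stack listing that have no symbol name."""
    frames = []
    for line in stack_output.splitlines():
        # Everything before the first "(" is the frame's location.
        location = line.strip().split("(", 1)[0]
        # Assumption: unresolved frames are a bare 0x-prefixed address,
        # while resolved ones look like "symbol+0xoffset".
        if re.fullmatch(r"0x[0-9a-fA-F]+", location):
            frames.append(location)
    return frames

stack = """\
0x15ce4(0, 43b48, 39db4, 4, 2276c, 38000)
main+0x358(0, 39f2c, ffbffdec, 398e4, 1, 38000)
_start+0x108(0, 0, 0, 0, 0, 0)
"""
print(unresolved_frames(stack))
```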
Looking at the other lwps is as simple as putting the lwp id in front of the regs dcmd and repeating what we just did. I won't go into how I worked out which of the static routines the other lwps in cond_wait() were executing in. Suffice it to say that there are only a couple of places in the code that make that call, and matching up the assembly around those locations to the source (especially looking at called functions) makes this not too difficult.