mdb(1) background, intro, and cheatsheet

In the Solaris kernel group, we take our crash dumps seriously. Historically, the two main tools for analyzing crash dumps on UNIX were adb(1) and crash(1M). More recently, mdb(1) has replaced them both as the debugger of choice in Solaris.

adb(1)

adb(1) is a venerable tool in the UNIX tool chest -- 7th Edition UNIX (from 1979) had a version of it. Its syntax is quite quirky (as you'd expect from such an old tool), and one thing to keep in mind is that adb(1) is an assembly-level debugger. Generally, it deals directly with register values and assembly instructions; the only symbolic information it gives you is access to the symbol table. That said, it has a reasonably powerful macro/scripting facility.

During the development of SunOS, a large number of adb macros were written to dump out various bits of kernel state. In SunOS 3.5 (released in 1988), kadb(1) (an interactive kernel debugger version of adb(1)) already existed, as did 50-odd adb scripts, mostly generated with adbgen(1M). Solaris 2.0/SunOS 5.x continued the tradition, and by Solaris 9, there are over 890 scripts in /usr/lib/adb/sparcv9 (compared to 507 in Solaris 8).[1]

crash(1M)

crash(1M) is a bit more recent; it appeared sometime between 7th Edition UNIX and SysV R3, and while SunOS 3.5 did not have it, SunOS 4.x did. While adb(1) is a reasonably generic debugger with scripting facilities, crash(1M) takes an almost diametrically opposed approach: it uses compiled C code that knows how to traverse and interpret various kernel structures to dump out information of interest. This makes crash(1M) much more powerful than adb(1) (you can do complicated things like virtual-to-physical address translation), while simultaneously making it much less flexible (if something wasn't already written into crash(1M), you had to write it yourself, or do without).

This meant that adb(1) and crash(1M) were quite complementary: during any given debugging session, each might be used for its particular strengths.[2]

mdb(1)

mdb(1), the Solaris "Modular Debugger", is the brainchild of Michael Shapiro and Bryan Cantrill. Upon their arrival in the Solaris Kernel Group, they took one look at adb and crash and decided that both were exceedingly long in the tooth. Together, they created mdb(1) to replace them. It is designed to embody the best features of its predecessors while introducing a new framework for building debugging support, for live and post-mortem debugging alike.

Because of the sheer number of existing adb macros, and the finger memory of hundreds of people, mdb(1) is almost completely backwards compatible with adb(1).[3]

mdb(1) is extensible through "Debugger Modules" (dmods), which can provide "debugger commands" (dcmds) and "walkers". dcmds are similar to the commands of crash, while walkers iterate over the elements of a particular data set, such as all of the threads in a process. Both dcmds and walkers are written in C using the interface defined by the MDB module API, which is documented in the Modular Debugger Guide.
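To make the walker idea concrete, here is a minimal standalone sketch of the init/step pattern a walker follows: initialization positions the walk at the first element, and each step hands the current element to a callback and then advances. Everything in it (toy_thread_t, toy_walk_init(), toy_walk_step(), TOY_WALK_NEXT, and so on) is hypothetical and only loosely modeled on the real MDB module API; it does not include <sys/mdb_modapi.h>, does not read target memory, and cannot be loaded as a dmod. The Modular Debugger Guide describes the actual interfaces.

```c
#include <stddef.h>

/* Hypothetical toy data structure: a singly linked list of "threads". */
typedef struct toy_thread {
	unsigned long t_id;
	const struct toy_thread *t_next;
} toy_thread_t;

/* Step results, loosely modeled on MDB's WALK_NEXT and WALK_DONE. */
typedef enum { TOY_WALK_NEXT, TOY_WALK_DONE } toy_walk_status_t;

/* Per-element callback: the consumer of the walker's output. */
typedef toy_walk_status_t (*toy_walk_cb_t)(const toy_thread_t *, void *);

/* State carried from one step to the next. */
typedef struct toy_walk_state {
	const toy_thread_t *ws_addr;	/* next element to visit; NULL at end */
	toy_walk_cb_t ws_callback;
	void *ws_cbdata;
} toy_walk_state_t;

/* init: position the walk at the head of the list. */
void
toy_walk_init(toy_walk_state_t *wsp, const toy_thread_t *head,
    toy_walk_cb_t cb, void *cbdata)
{
	wsp->ws_addr = head;
	wsp->ws_callback = cb;
	wsp->ws_cbdata = cbdata;
}

/* step: pass the current element to the callback, then advance. */
toy_walk_status_t
toy_walk_step(toy_walk_state_t *wsp)
{
	const toy_thread_t *cur = wsp->ws_addr;

	if (cur == NULL)
		return (TOY_WALK_DONE);
	wsp->ws_addr = cur->t_next;
	return (wsp->ws_callback(cur, wsp->ws_cbdata));
}

/* Drive a walk to completion, as the debugger framework would. */
void
toy_walk(const toy_thread_t *head, toy_walk_cb_t cb, void *cbdata)
{
	toy_walk_state_t ws;

	toy_walk_init(&ws, head, cb, cbdata);
	while (toy_walk_step(&ws) == TOY_WALK_NEXT)
		continue;
	/* A real walker's fini routine would free state allocated in init. */
}

/* Example callback: count elements, stopping once a limit is reached. */
typedef struct toy_count {
	size_t tc_seen;
	size_t tc_limit;
} toy_count_t;

toy_walk_status_t
toy_count_cb(const toy_thread_t *tp, void *arg)
{
	toy_count_t *cp = arg;

	(void) tp;
	cp->tc_seen++;
	return (cp->tc_seen < cp->tc_limit ? TOY_WALK_NEXT : TOY_WALK_DONE);
}
```

With a three-element list and a generous limit, toy_walk() invokes the callback three times; a callback that returns TOY_WALK_DONE ends the walk early. Keeping the walk state in a separate structure is what lets the framework, rather than the walker, decide when to call the next step.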

Using the MDB module API, kernel engineers (Mike, Bryan, Dan, and others) have built up a huge library of dcmds and walkers for exploring Solaris -- my desktop (a Solaris 10 system) has 196 generic walkers and 368 dcmds defined, not counting the ~200 auto-generated walkers for the various kmem caches on the system.

The neat thing about walkers and dcmds is that they can build on each other to do more powerful things. This happens both within their implementations and when the user explicitly places them into an mdb "pipeline".

To give you a taste of the power of pipelines, here's an example, running against the live kernel on my desktop: the ::pgrep dcmd allows you to find all processes matching a pattern, the thread walker walks all of the threads in a process, and the ::findstack dcmd gets a stack trace for a given thread. Connecting them into a pipeline, you get:

# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs sd ip sctp usba
s1394 fctl nca audiosup logindmux ptm cpc random fcip nfs lofs ipc ]
> ::pgrep sshd
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R 100174      1 100174 100174      0 0x42000000 0000030009216790 sshd
R 276948 100174 100174 100174      0 0x42010000 000003002d9a9860 sshd
R 276617 100174 100174 100174      0 0x42010000 0000030013943010 sshd
> ::pgrep sshd | ::walk thread
3000c4f0c80
311967e9660
30f2ff2c340
> ::pgrep sshd | ::walk thread | ::findstack
stack pointer for thread 3000c4f0c80: 2a10099d071
[ 000002a10099d071 cv_wait_sig_swap+0x130() ]
  000002a10099d121 poll_common+0x530()
  000002a10099d211 pollsys+0xf8()
  000002a10099d2f1 syscall_trap32+0x1e8()
stack pointer for thread 311967e9660: 2a100897071
[ 000002a100897071 cv_wait_sig_swap+0x130() ]
stack pointer for thread 30f2ff2c340: 2a100693071
[ 000002a100693071 cv_wait_sig_swap+0x130() ]
  000002a100693121 poll_common+0x530()
  000002a100693211 pollsys+0xf8()
  000002a1006932f1 syscall_trap32+0x1e8()
>
This yields the stack traces of all sshd threads on the system (note that the middle one is swapped out). mdb pipelines are quite similar to standard UNIX pipelines, and give those using the debugger a similar level of power and flexibility.
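The data flow of the pipeline above can be mimicked in plain C: each stage receives one item from the stage before it and forwards zero or more items downstream. In this hypothetical sketch, toy_pgrep() stands in for ::pgrep (using an exact name match rather than a pattern), stage_walk_thread() for the thread walker, and stage_record() for ::findstack (it merely logs thread IDs instead of printing stacks). None of these names are part of mdb, and the toy structures are not the real proc_t or kthread_t.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a process and its threads. */
typedef struct toy_thread {
	unsigned long t_id;
	const struct toy_thread *t_next;
} toy_thread_t;

typedef struct toy_proc {
	const char *p_name;
	const toy_thread_t *p_tlist;
} toy_proc_t;

/*
 * One pipeline stage: receives an item and may forward items to s_next,
 * much as each mdb pipeline stage consumes what the stage before it emits.
 */
typedef struct stage {
	void (*s_fn)(const struct stage *self, const void *item);
	const struct stage *s_next;
	void *s_arg;			/* stage-private data */
} stage_t;

/* Forward one item to the next stage, if there is one. */
void
stage_emit(const stage_t *self, const void *item)
{
	if (self->s_next != NULL)
		self->s_next->s_fn(self->s_next, item);
}

/* Like the thread walker: expand one process into each of its threads. */
void
stage_walk_thread(const stage_t *self, const void *item)
{
	const toy_proc_t *p = item;

	for (const toy_thread_t *t = p->p_tlist; t != NULL; t = t->t_next)
		stage_emit(self, t);
}

/* Stand-in for ::findstack: just log each thread's ID. */
typedef struct id_log {
	unsigned long il_ids[16];
	size_t il_n;
} id_log_t;

void
stage_record(const stage_t *self, const void *item)
{
	id_log_t *lp = self->s_arg;
	const toy_thread_t *t = item;

	if (lp->il_n < sizeof (lp->il_ids) / sizeof (lp->il_ids[0]))
		lp->il_ids[lp->il_n++] = t->t_id;
}

/* Like ::pgrep, but with an exact name match: this starts the pipeline. */
void
toy_pgrep(const toy_proc_t *procs, size_t nprocs, const char *name,
    const stage_t *next)
{
	for (size_t i = 0; i < nprocs; i++) {
		if (strcmp(procs[i].p_name, name) == 0)
			next->s_fn(next, &procs[i]);
	}
}
```

Chaining toy_pgrep() into a stage_walk_thread stage whose s_next is a stage_record stage reproduces the shape of "::pgrep sshd | ::walk thread | ::findstack": one matching process can fan out into several threads, and every thread reaches the final stage.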

An mdb(1) cheat sheet

Because of its backwards compatibility with adb, mdb can have a bit of a learning curve. A while back, I put together an mdb(1) cheat sheet (in PostScript and PDF) to reference during late-night post-mortem debugging sessions, and it has become a pretty popular reference in the Kernel Group. It is designed to be printed double-sided: the front covers the full mdb syntax, while the back is a set of commonly used kernel dcmds and walkers, with short descriptions.

That's it for a quick history and tour -- I plan to talk more about mdb later, along with libumem(3lib) (my current claim to fame), smf(5), and userland and kernel debugging in general.

Footnotes:
[1] The introduction of mdb(1) in Solaris 8 and of CTF (compact ANSI-C type format) in Solaris 9 has significantly slowed this trend -- Solaris 10 will have only about 16 new adb scripts over Solaris 9.
[2] I have little direct experience with crash(1M) -- by the time I joined Sun, it had been EOLed.
[3] Invoking adb on Solaris 9 and later just invokes mdb in backwards-compatibility mode.


Comments:

I've updated the text on mdb a bit.

Posted by Jonathan Adams on October 07, 2004 at 09:05 AM PDT #

Any chance of sharing some of the cooler mdb dmods and dcmds you've stashed in your warchest?

Posted by Shaun on October 07, 2004 at 04:35 PM PDT #

Very interesting. I think there aren't many places where one can see how mdb is used. The MDB documentation doesn't have many examples in it (from what I remember), especially compared to the DTrace documentation, which has lots of examples.

Posted by Vlad Grama on October 07, 2004 at 10:39 PM PDT #

Hey Vlad,
Mike and I are principally responsible for the mdb documentation, and with each release we promise ourselves to work on it more, bring it up to date, and add more tutorial examples. Sadly, these past few months we've been focused almost entirely on getting the DTrace docs in shape. I'm glad to hear you like the DTrace docs; we'll be working on the mdb docs for a Solaris 10 update.

Posted by Adam Leventhal on October 08, 2004 at 03:13 AM PDT #

Is there any combination of dcmds and/or walkers that would allow you to dump the afsr/afar registers for each CPU? On a recent core analysis, a Sun service engineer stated that he ran a script against a core file and was able to determine which bits were set in the afsr/afar on every CPU in that system. Just wondering if this is possible via mdb.

Posted by Dan on October 08, 2004 at 03:53 AM PDT #

Dan, here's some information from an expert:
On Solaris Express build 56 or later, the best way to view error report information is via fmdump(1M); after the panic and reboot, all CPU and memory error information is stored in the Fault Management error log, and can be viewed by issuing:
fmdump -eV
Among other things, the afar and afsr registers are captured for all memory errors and some CPU errors.
For a recent Solaris 9, I believe the following will get the AFARs of recent error events, but I'm far from an expert in this sort of thing:
> ::walk cpu | ::print unix`cpu_t cpu_m.cpu_private | ::print cheetah_private_t
and look for the "chpr_mue_afars" array. This is for UltraSPARC III/III+; for UltraSPARC II, I don't see any way to get this information. (Note that these fields are not stable interfaces, and can and will change without notice, in a patch, etc. The information provided by fmdump(1M) is more stable than these random kernel structures can ever be.)

Posted by Jonathan Adams on October 08, 2004 at 12:15 PM PDT #

I'm being silly for Solaris 9 -- you probably just want the message buffer, which you can get by running:
> $<msgbuf
(or, if you want slightly cleaner output:
> $<msgbuf ! sed 's/^0x[^:]*:   //'
)

Posted by Jonathan Adams on October 10, 2004 at 02:13 PM PDT #

MDB is indeed powerful and much more flexible than adb and crash combined. While it is powerful for analyzing crash dumps, a source-level kernel debugger would make it easier to debug a live kernel. Linux KGDB and HP-UX KWDB (also based on KGDB) are both very good source-level kernel debuggers based on open source. I thought Solaris used to have one (ksld ??). Is there any rationale for letting it go? Is there a plan to bring it back or switch to an open-source one like kgdb? Of course, I do not mean to replace MDB in any way.

Posted by Tho Tran on December 20, 2004 at 01:35 AM PST #

complement != compliment

Posted by sdasdasd on May 18, 2009 at 12:19 PM PDT #

About

jwadams
