mdb(1) background, intro, and cheatsheet
By jwadams-Oracle on Oct 07, 2004
In the Solaris kernel group, we take our crash dumps seriously. Historically, the two main tools for analyzing crash dumps on UNIX were adb(1) and crash(1M). More recently, mdb(1) has replaced them both as the debugger of choice in Solaris.
adb(1) is a venerable tool in the UNIX tool chest -- 7th Edition UNIX (from 1979) had a version of it. Its syntax is quite quirky (as you'd expect from such an old tool), and one thing to keep in mind is that adb(1) is an assembly-level debugger. Generally, it deals directly with register values and assembly instructions -- the only symbolic information it gives you is access to the symbol table. That said, it has a reasonably powerful macro/scripting facility.
During the development of SunOS, a large number of adb macros were written to dump out various bits of kernel state. In SunOS 3.5 (released in 1988), kadb(1) (an interactive kernel debugger version of adb(1)) already existed, as did 50-odd adb scripts, mostly generated with adbgen(1M). Solaris 2.0/SunOS 5.x continued the tradition, and by Solaris 9, there are over 890 scripts in /usr/lib/adb/sparcv9 (compared to 507 in Solaris 8).1
crash(1M) is a bit more recent; it appeared sometime between 7th Edition UNIX and SysV R3, and while SunOS 3.5 did not have it, SunOS 4.x did. While adb(1) is a reasonably generic debugger with scripting facilities, crash(1M) takes an almost diametrically opposed approach: it uses compiled C code which knows how to traverse and understand various structures in the kernel to dump out information of interest. This makes crash(1M) much more powerful than adb(1) (since you can do complicated things like virtual-to-physical address translation), while simultaneously making it much less flexible (if it wasn't already written into crash(1M), you're going to have to write it yourself, or do without).
This means that adb(1) and crash(1M) were quite complementary. During any given debugging session, each might be used for its different strengths.2
mdb(1), the Solaris "Modular Debugger", is the brain-child of Michael Shapiro and Bryan Cantrill. Upon their arrival in the Solaris Kernel Group, they took one look at adb and crash, and decided that both were exceedingly long in the tooth. Together, they created mdb(1) to replace them. It's designed to embody the best features of both, while introducing a new framework for building debugging support, live and post-mortem.
Because of the sheer number of existing adb macros, and the finger-memory of hundreds of people, mdb(1) is almost completely backwards compatible with adb(1).3
mdb(1) allows for extensibility in the form of "Debugger Modules" (dmods) which can provide "debugger commands" (dcmds) and "walkers". dcmds are similar to the commands of crash, while walkers walk a particular dataset. Both dcmds and walkers are written in C using the interface defined by the MDB module API, which is documented in the Modular Debugger Guide.
Using the mdb module API, kernel engineers (Mike, Bryan and Dan, and others) have built up a huge library of dcmds and walkers to explore Solaris -- my desktop (a Solaris 10 system) has 196 generic walkers and 368 dcmds defined. (There are also ~200 auto-generated walkers for the various kmem caches on the system, which I'm not counting here.)
The neat thing about walkers and dcmds is that they can build on each other, combining to do more powerful things. This occurs both in their implementation and by the user's explicit action of placing them into an mdb "pipeline".
To give you a taste of the power of pipelines, here's an example, running against the live kernel on my desktop: the ::pgrep dcmd allows you to find all processes matching a pattern, the thread walker walks all of the threads in a process, and the ::findstack dcmd gets a stack trace for a given thread. Connecting them into a pipeline, you get:
    # mdb -k
    Loading modules: [ unix krtld genunix specfs dtrace ufs sd ip sctp usba s1394 fctl nca audiosup logindmux ptm cpc random fcip nfs lofs ipc ]
    > ::pgrep sshd
    S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
    R 100174      1 100174 100174      0 0x42000000 0000030009216790 sshd
    R 276948 100174 100174 100174      0 0x42010000 000003002d9a9860 sshd
    R 276617 100174 100174 100174      0 0x42010000 0000030013943010 sshd
    > ::pgrep sshd | ::walk thread
    3000c4f0c80
    311967e9660
    30f2ff2c340
    > ::pgrep sshd | ::walk thread | ::findstack
    stack pointer for thread 3000c4f0c80: 2a10099d071
    [ 000002a10099d071 cv_wait_sig_swap+0x130() ]
      000002a10099d121 poll_common+0x530()
      000002a10099d211 pollsys+0xf8()
      000002a10099d2f1 syscall_trap32+0x1e8()
    stack pointer for thread 311967e9660: 2a100897071
    [ 000002a100897071 cv_wait_sig_swap+0x130() ]
    stack pointer for thread 30f2ff2c340: 2a100693071
    [ 000002a100693071 cv_wait_sig_swap+0x130() ]
      000002a100693121 poll_common+0x530()
      000002a100693211 pollsys+0xf8()
      000002a1006932f1 syscall_trap32+0x1e8()
    >

Yielding the stack traces of all sshd threads on the system (note that the middle one is swapped out). mdb pipelines are quite similar to standard UNIX pipelines, and allow those using the debugger a similar level of power and flexibility.
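To make the data flow in a pipeline concrete, here is a toy model in Python. It is only a sketch of the idea, not how mdb is implemented: real dcmds and walkers are written in C against the MDB module API. The process table, addresses, and stack contents are all made up, and this pgrep matches substrings rather than mdb's patterns. Each stage takes a stream of addresses and emits a new stream for the next stage.

```python
from typing import Callable, Iterable, Iterator

# Made-up "kernel" state for illustration only: process address -> name, threads.
PROCS = {
    0x1000: {"name": "sshd", "threads": [0xA0, 0xA1]},
    0x2000: {"name": "init", "threads": [0xB0]},
    0x3000: {"name": "sshd", "threads": [0xC0]},
}

# Made-up stacks: thread address -> frames, innermost first.
STACKS = {
    0xA0: ["cv_wait_sig_swap", "poll_common", "pollsys"],
    0xA1: ["cv_wait_sig_swap"],
    0xB0: ["cv_wait", "pause"],
    0xC0: ["cv_wait_sig_swap", "poll_common", "pollsys"],
}

Stage = Callable[[Iterable[int]], Iterator]


def pgrep(pattern: str) -> Stage:
    """dcmd-like stage: ignore the input, emit addresses of matching processes."""
    def stage(_addrs: Iterable[int]) -> Iterator[int]:
        for addr, proc in PROCS.items():
            if pattern in proc["name"]:  # substring match, a simplification
                yield addr
    return stage


def walk_thread(addrs: Iterable[int]) -> Iterator[int]:
    """walker-like stage: for each process address, emit its thread addresses."""
    for addr in addrs:
        yield from PROCS[addr]["threads"]


def findstack(addrs: Iterable[int]) -> Iterator[str]:
    """dcmd-like stage: for each thread address, emit a one-line stack summary."""
    for addr in addrs:
        yield f"thread {addr:x}: " + " <- ".join(STACKS[addr])


def pipeline(*stages: Stage) -> list:
    """Chain stages left to right, like 'a | b | c'; the first gets no input."""
    data: Iterator = iter(())
    for stage in stages:
        data = stage(data)
    return list(data)


if __name__ == "__main__":
    for line in pipeline(pgrep("sshd"), walk_thread, findstack):
        print(line)
```

Because every stage only consumes and produces a stream, any walker can feed any dcmd that accepts the kind of address it emits, which is the property that makes the three-stage sshd pipeline above possible without any of the three pieces knowing about the others.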
An mdb(1) cheat sheet
Because of its backwards compatibility with adb, mdb can have a bit of a learning curve. A while back, I put together an mdb(1) cheatsheet [ps pdf] to reference during late-night post-mortem debugging sessions, and it has become a pretty popular reference in the Kernel Group. It's designed to print out double-sided; the front covers the full mdb syntax, while the back is a set of commonly-used kernel dcmds and walkers, with short descriptions.
That's it for a quick history and tour -- I should be talking more about mdb later, along with libumem(3lib) (my current claim to fame), smf(5), and userland and kernel debugging in general.
1 The introduction of mdb(1) in Solaris 8 and of CTF (compact ANSI-C type format) in Solaris 9 have started to slow down this trend significantly -- Solaris 10 will only have about 16 new adb scripts over Solaris 9.
2 I have little direct experience with crash(1M) -- by the time I joined Sun, it had been EOLed.
3 Invoking adb on Solaris 9 and later just invokes mdb in backwards-compatibility mode.