Diagnosing kernel hangs/panics with kmdb and moddebug
By user12614486 on Jun 15, 2005
If you experience hangs or panics during Solaris boot, whether it's during installation or after you've already installed, using the kernel debugger can be a big help in collecting the first set of "what happened" information.
The kernel debugger is named "kmdb" in Solaris 10 and later, and is invoked by supplying the '-k' switch in the kernel boot arguments. So a common request from a kernel engineer starting to examine a problem is "try booting with kmdb".
Sometimes it's useful to either set a breakpoint to pause the kernel startup and examine something, or to just set a kernel variable to enable or disable a feature, or enable debugging output. If you use -k to invoke kmdb, but also supply the '-d' switch, the debugger will be entered before the kernel really starts to do anything of consequence, so that you can set kernel variables or breakpoints.
So "booting with the -kd flags" is the key to "booting under the kernel debugger". Now, how do we do that?
Kernel debugging with GRUB-boot systems
On modern Solaris and OpenSolaris systems, GRUB is used to boot; to enable the kernel debugger, you add -kd arguments to the "kernel" (or "kernel$") line in the GRUB menu entry. When presented with the GRUB menu, hit 'e' to edit the entry, highlight the kernel line, and hit 'e' again to edit it; add the -kd arguments just after the /platform/i86pc/kernel/$ISADIR/unix argument, so that it says
    kernel$ /platform/i86pc/kernel/$ISADIR/unix -kd
and then hit 'b' to boot that edited menu entry. '-k' means "start the debugger"; '-d' means "immediately enter the debugger after loading the kernel". After some booting status, you'll see the kernel debugger announce itself with a prompt like this:
    [0]>
(The number in square brackets is the CPU that is running the kernel debugger; that number might change for later entries into the debugger.)
Now we're in the kernel debugger
There are two good reasons to run under the kernel debugger:
- If we panic, the panic can be examined before reboot; you can get stack backtraces and get some idea of which section of code might be at fault.
- Now we can set kernel variables, set breakpoints, etc. to affect the kernel run.
- For investigating hangs: try turning on module debugging output. You can set the value
of a kernel variable by using the '/W' command ("write a 32-bit value"). Here's how you set
moddebug to 0x80000000, and then continue execution of the kernel:
    [0]> moddebug/W 80000000
    [0]> :c
That will give you debug output for each kernel module that loads. (See /usr/include/sys/modctl.h, near the bottom, for moddebug flag information. I find 0x80000000 is the only one I ever really use.)
- To collect information about panics: when the kernel panics, it will drop into the debugger, and print
some interesting information; however, usually the most interesting thing, first, is the stack
backtrace; this shows, in reverse order, all the functions that were active at the time of
panic. To generate a stack backtrace, use
    $c
A few other very useful informational commands during a panic are
    ::msgbuf
which will show you the last things the kernel printed onscreen, and
    ::status
which shows a summary of the state of the machine in panic.
- If you're running the kernel while the kernel debugger is active, and you
experience a hang, you may be able to break into the debugger to examine the
system state; you can do this by pressing the <F1> and <A> keys at the
same time (a sort of "F1-shifted-A" keypress). (On SPARC systems, this key
sequence is <Stop>-<A>.) This should give you the same
debugger prompt as above, although on a multi-CPU system you may see the CPU number
in the prompt is something other than 0.
Once in the kernel debugger, you can get a stack backtrace as above; you can also
use ::switch to change the CPU and get stack backtraces on a different CPU,
which might shed more light on the hang. For instance, if you break into
the debugger on CPU 1, you could switch to CPU 0 with
    0::switch
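As a rough recap, the commands from the three scenarios above can be collected into one annotated sketch. The prompts are left out, the `//` comments use mdb's own comment syntax, and the `moddebug/X` line (print the current 32-bit value in hex) is an extra sanity check that the steps above don't require. Treat this as a cheat sheet rather than a transcript; actual output will differ from machine to machine.

```mdb
// Scenario 1: first prompt after booting with -kd.
// Optionally look at the current value (assumed extra step), then
// turn on module load messages and let the kernel continue.
moddebug/X
moddebug/W 80000000
:c

// Scenario 2: the kernel has panicked and dropped into kmdb.
// Stack backtrace first, then recent console messages, then a summary.
$c
::msgbuf
::status

// Scenario 3: you broke into a hung system and landed on CPU 1.
// Backtrace on the current CPU, switch to CPU 0, backtrace there too.
$c
0::switch
$c
```

On a machine with more CPUs, the same pattern (a CPU number followed by ::switch, then $c) can be repeated for each one.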
There's obviously a lot more you can do with the kernel debugger, but these small tips will
sometimes help you get from "I have no idea what to do" to "I have a few ideas to try that might
let me continue to boot or install", which can make all the difference.