Knowing your tools..
By sommerfeld on Jun 12, 2004
Debugging from a crash dump is something like forensic analysis: the dump is the "scene of the crime," often containing many clues to the cause of the fault. (If you don't mind an analogy from Hollywood, it brings to mind a scene from a recent rerun of Crossing Jordan, in which a junior medical examiner misses a number of subtle clues at a murder scene that Jordan spots after a careful once-over; those clues give them a running start on the investigation.)
On most Unix systems, you get the register contents at the time of the fault, and from the program counter you know exactly which instruction was executing. Source-level debuggers can map that back to a source line, but the mapping loses information: when a line is ambiguous, it's often better to work backwards from the instruction than to guess which part of a complex expression the processor tripped over. And if you didn't build with -g, a source-level debugger will throw up its hands and point vaguely in the direction of the function at fault.
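To see what the kernel actually hands over at the moment of a fault, here's a minimal sketch for a Unix-like system. It assumes POSIX `sigaction` with `SA_SIGINFO`; reading the program counter out of the saved registers is shown for Linux/x86-64 only, because the register layout is platform specific. The names `struct node`, `chain_val`, and `probe_chain` are hypothetical. The point is the "complex expression" problem: `p->next->val` has two dereferences on one source line, but because `next` and `val` sit at different offsets, the faulting data address (`si_addr`) alone tells you which one failed.

```c
#define _GNU_SOURCE               /* exposes REG_RIP in glibc's <ucontext.h> */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#if defined(__linux__) && defined(__x86_64__)
#include <ucontext.h>
#endif

/* Hypothetical list node; pad keeps both fields at nonzero offsets. */
struct node {
    long pad;
    struct node *next;
    int val;
};

static sigjmp_buf probe_env;
static void *volatile fault_addr; /* data address the access tripped over */
static void *volatile fault_pc;   /* faulting instruction, where we know how */

static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    fault_addr = si->si_addr;
#if defined(__linux__) && defined(__x86_64__)
    /* The saved register set: RIP holds the instruction that faulted. */
    fault_pc = (void *)((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RIP];
#else
    (void)ctx;                    /* register layout is platform specific */
    fault_pc = NULL;
#endif
    siglongjmp(probe_env, 1);     /* demo only; a real program should dump core */
}

/* Two dereferences on one source line. */
int chain_val(const struct node *p)
{
    return p->next->val;
}

/* Runs chain_val(p); returns 0 on success, -1 if it faulted. */
int probe_chain(const struct node *p, int *out)
{
    struct sigaction sa, old;
    int rc = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    fault_addr = NULL;
    fault_pc = NULL;
    if (sigsetjmp(probe_env, 1) == 0) /* 1: restore the signal mask on jump */
        *out = chain_val(p);
    else
        rc = -1;

    sigaction(SIGSEGV, &old, NULL);
    return rc;
}
```

Passing a NULL `p` faults at address `offsetof(struct node, next)`, while passing a node whose `next` is NULL faults at `offsetof(struct node, val)`. The `siglongjmp` out of the handler is only there to keep the demo self-contained; in production you'd let the process dump core and read the same registers and addresses back out of the dump with your debugger.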
So what do you do when the program crashes in production? Go back, rebuild with -g, and try to reproduce it? If it's a once-a-month, hard-to-reproduce race condition, you may end up waiting a while before it happens again.
Even if you can't prove anything conclusively from a crash, a careful analysis will often give you a big head start on reproducing the problem and on knowing where to lay traps to catch a precursor to the crash.
Among the folks developing Solaris, MDB (the Modular Debugger) is the debugger of choice for this sort of analysis. It's an evolving, extensible debugger, with a published API you can use to add your own extensions. GNU's GDB is also good for low-level debugging.
If you have to "lie in wait" for the bug to happen again, DTrace can't be beat. (And, well, it's also great for any number of other observability tasks, but that's a topic for another day.)
How to learn this stuff? Nothing beats a little experimentation. Sit down with an architecture manual and a C compiler. Read enough of the front matter of the architecture manual to understand the register layout. Write some simple C code, compile it with -S, and look at the assembly output. Learn how your compiler tends to generate code. Learn how C-level identifiers appear in the low-level object symbol tables. Use your debugger to disassemble a "live" program, and compare it with the -S output. Single-step through the program an instruction at a time and watch the control flow and how the contents of registers and memory change. If you can't figure out what an instruction does, look it up.
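Here's one possible practice file for that exercise; the file name `sum.c` and every identifier in it are made up for illustration. It mixes a global, a file-static variable, a static helper, and a loop, so there's something to look for both in the `-S` output and in the symbol table. The `nm` letters in the comments are what GNU `nm` typically prints for an ELF object; other toolchains may differ.

```c
/* sum.c: a hypothetical practice file. Things to try:
 *   cc -O0 -S sum.c   unoptimized: locals live on the stack
 *   cc -O2 -S sum.c   optimized: locals live in registers, square() inlined
 *   cc -c sum.c && nm sum.o
 */
#include <stddef.h>

int calls;               /* global: external symbol (nm: B, or C if common) */
static int last_len;     /* file static: local symbol (nm: b) */

/* Static function: local symbol (nm: t); may vanish entirely at -O2. */
static int square(int x)
{
    return x * x;
}

/* Global function: external symbol (nm: T). */
int sum_squares(const int *a, size_t n)
{
    int total = 0;       /* automatic variable: no symbol at all */
    size_t i;

    calls++;
    last_len = (int)n;
    for (i = 0; i < n; i++)
        total += square(a[i]);
    return total;
}

/* Reads last_len so the compiler can't discard it as write-only. */
int last_length(void)
{
    return last_len;
}
```

Compare the `-O0` and `-O2` listings side by side: in the first you'll usually see `total` and `i` loaded from and stored to the stack on every iteration, while in the second they stay in registers and the call to `square` typically disappears. On some platforms (macOS, for example) C identifiers pick up a leading underscore in the symbol table, so `sum_squares` shows up as `_sum_squares`.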