Crashdump restructuring in Solaris
By User13277689-Oracle on Jul 24, 2014
In Solaris 11.2, the crashdump restructuring project changed how dump data are stored. Data that are not pure kernel pages now go into separate files. My colleague Sriman and I made it happen.
The first noticeable change is the layout of the crash directory. The files are now stored under the /var/crash/data/uuid/ directory. The long hexadecimal string (uuid) was added to align better with FMA: it is the uuid (universally unique ID) of the crash event, which can be found in fmadm faulty output. In fact, if you look at FMA panic events from earlier versions, you can see that the resource string for the event was already designed this way; this project merely materialized it.
For example, after 2 panic events the /var/crash directory will look like this:
0 -> data/404778fb-88da-4188-f222-8da174d44fa4
1 -> data/6e50417e-95fc-4ab4-e9a8-abbe80bc6b48
bounds
data/
    404778fb-88da-4188-f222-8da174d44fa4/
        vmcore-zfs.0
        vmcore.0
        vmdump-zfs.0
        vmdump.0
    6e50417e-95fc-4ab4-e9a8-abbe80bc6b48/
        vmdump-zfs.1
        vmdump.1
The 0, 1 symlinks maintain the sequential ordering of the old layout.
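Since each numbered symlink points at data/uuid, the uuid of dump N can be read straight from the link target. Here is a minimal sketch of that lookup. The function name crash_uuid is my own invention for illustration (it is not a Solaris command), and it assumes a readlink utility is available and that the link target has the data/uuid form shown above.

```shell
#!/bin/sh
# crash_uuid DIR N: print the uuid that the symlink DIR/N points to.
# Hypothetical helper; assumes the link target looks like data/<uuid>.
crash_uuid() {
    # readlink fails if DIR/N is missing or is not a symlink.
    target=$(readlink "$1/$2") || return 1
    # Strip the leading "data/" component, leaving only the uuid.
    basename "$target"
}

# Example (not run here): crash_uuid /var/crash 0
```

The printed string is the same uuid that the crash event carries in fmadm faulty output.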
The example reflects a configuration in which savecore is not run automatically after boot (i.e. dumpadm -n is in effect) and the administrator has extracted the first crash dump by hand (by running savecore 0 in the /var/crash/0/ directory). If you look at the console after the system reboots following a panic, it shows commands that you can copy and paste into a terminal to perform the extraction.
The other change visible in the example above is the new vmcore-zfs.N file. It is not the only new file that can appear. Depending on the dumpadm(1M) configuration, there can be files such as:
- vmcore.N - core kernel pages
- vmcore-zfs.N - ZFS metadata (ZIO buffers)
- vmcore-proc.N - process pages
- vmcore-other.N - other pages (ZFS data, free pages)
Splitting the dump into multiple files makes it possible to transfer just the vmcore.N file for analysis, quickly assess what caused the panic, and transfer the rest of the files later.
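Before shipping a dump around, it can help to see which of the section files are actually sitting in the directory. The sketch below only checks for the four file names listed above. The function name dump_sections is hypothetical, and a "missing" result cannot tell you whether the section was excluded by the dumpadm configuration or simply was not copied over.

```shell
#!/bin/sh
# dump_sections DIR N: report which section files of dump N exist in DIR.
# Hypothetical helper; the file names are the ones listed in this post.
dump_sections() {
    for name in vmcore vmcore-zfs vmcore-proc vmcore-other; do
        if [ -f "$1/$name.$2" ]; then
            echo "$name.$2 present"
        else
            echo "$name.$2 missing"
        fi
    done
}

# Example (not run here): dump_sections /var/crash/0 0
```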
If any of the "auxiliary" files is missing, mdb will report it:
root@va64-v20zl-prg06:/var/crash/0# mdb 0
mdb: failed to locate file ./vmcore-zfs.0. Contents of ZFS metadata (ZIO buffers) will not be available
Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp zvpsm scsi_vhci zfs mpt sd ip hook neti arp usba kssl sockfs lofs idm cpc nfs fcip fctl ufs logindmux ptm sppp ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from va64-v20zl-prg06
operating system: 5.12 dump.on12 (i86pc)
usr/src version: 23877:f2e76e2d0329:on12_51+48
usr/closed version: 1961:cad017e4c7e4:on12_51+4
image uuid: 404778fb-88da-4188-f222-8da174d44fa4
panic message: forced crash dump initiated at user request
complete: yes, all pages present as configured
dump content: kernel [LOADED,UNVERIFIED] (core kernel pages)
              zfs [MISSING] (ZFS metadata (ZIO buffers))
panicking PID: 101575 (not dumped)
Another option is not to dump some of the sections at all. For example, to dump only kernel pages and the pages of the process running at the time of the panic, but not ZFS metadata, the system can be configured with:
dumpadm -c curproc-zfs
Also, the unix.N file is no longer extracted automatically, since it is embedded in the vmcore.N file and mdb will find it there on its own. If you need the file for some reason, it can be extracted with the -u option of savecore(1M).
How do you load a dump into mdb with all these files around? The easiest way to access an extracted dump is to use just the suffix, i.e.
mdb N
which will pick up all the files based on metadata in the main vmcore.N file. This worked even before this change, except that there was just one file (two if counting unix.N).
It is still possible to specify the files by hand; just remember to put the main file (vmcore.N) as the first argument:
mdb vmcore.N vmcore-zfs.N ...
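If you do want to spell the files out, the ordering rule is easy to capture in a small helper. This is only a sketch: mdb_args is a made-up name, it knows only the auxiliary file names mentioned in this post, and it assumes the paths contain no whitespace.

```shell
#!/bin/sh
# mdb_args DIR N: print an argument list for mdb with vmcore.N first,
# followed by whichever auxiliary files exist in DIR.
# Hypothetical helper; assumes paths without whitespace.
mdb_args() {
    if [ ! -f "$1/vmcore.$2" ]; then
        echo "no main file vmcore.$2 in $1" >&2
        return 1
    fi
    # The main file must come first.
    args="$1/vmcore.$2"
    for name in vmcore-zfs vmcore-proc vmcore-other; do
        if [ -f "$1/$name.$2" ]; then
            args="$args $1/$name.$2"
        fi
    done
    echo "$args"
}

# Example (not run here): mdb $(mdb_args /var/crash/0 0)
```

In most cases plain suffix loading is less typing; a helper like this is mainly useful when the files were copied to another machine and you want to see exactly what will be loaded.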
The other change (hard to notice unless you deal with lots of crash dump files) is that we laid out the infrastructure in the kernel, mdb and libkvm to be properly backwards compatible with respect to the on-disk crash dump format. As a result, mdb will automatically load crash dump files produced by earlier Solaris versions. Currently it supports the 3 latest versions.