Crashdump restructuring in Solaris

In Solaris 11.2 the crashdump restructuring project changed how dump data are stored. Data that are not pure kernel pages now go into separate files. My colleague Sriman and I made it happen.

The first noticeable change is the layout of the crash directory: the files are now stored under the /var/crash/data/uuid/ directory. The long hexadecimal string (uuid) was added to align better with FMA. It is the UUID (universally unique ID) of the crash event, which can be found in the fmadm faulty output. In fact, if you look at FMA panic events from earlier versions, you can see that the resource string for the event was already designed this way; this project just materialized it.

For example, after two panic events the /var/crash directory will look like this:

0 -> data/404778fb-88da-4188-f222-8da174d44fa4
1 -> data/6e50417e-95fc-4ab4-e9a8-abbe80bc6b48
bounds
data/
  404778fb-88da-4188-f222-8da174d44fa4/
    vmcore-zfs.0
    vmcore.0
    vmdump-zfs.0
    vmdump.0
  6e50417e-95fc-4ab4-e9a8-abbe80bc6b48/
    vmdump-zfs.1
    vmdump.1

The 0, 1 symlinks maintain the sequential ordering of the old layout.
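Because each sequential number is just a symlink into data/, the UUID of a given crash can be recovered from the link target. The following is a minimal sketch, not part of the project itself: crash_uuid is a hypothetical helper name, and it assumes a readlink utility is available on the system.

```shell
# Print the UUID that a sequential crash number points to.
# Assumes the layout shown above: <crashdir>/N -> data/<uuid>
# (crash_uuid is a hypothetical helper, not a Solaris command).
crash_uuid() {
    crashdir=$1
    n=$2
    # readlink prints the symlink target, e.g. data/404778fb-...
    target=$(readlink "$crashdir/$n") || return 1
    # Drop the leading "data/" component to leave just the UUID
    basename "$target"
}
```

With the example layout, crash_uuid /var/crash 0 would print 404778fb-88da-4188-f222-8da174d44fa4, which is the string to look for in the fmadm faulty output.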

The example reflects a configuration in which savecore is not run automatically after boot (i.e. dumpadm -n is in effect) and the administrator has extracted the first crash dump by hand (by running savecore 0 in the /var/crash/0/ directory). If you look at the console after the system reboots following a panic, it shows commands that you can copy and paste into a terminal to perform the extraction.

The other change in the above example is the new vmcore-zfs.N file. It is not the only new file that can appear. Depending on the dumpadm(1M) configuration, there can be files like:

  • vmcore.N - core kernel pages
  • vmcore-zfs.N - ZFS metadata (ZIO buffers)
  • vmcore-proc.N - process pages
  • vmcore-other.N - other pages (ZFS data, free pages)

Because the dump is split into multiple files, it is possible to transfer just the vmcore.N file first, to quickly assess what caused the panic, and to transfer the rest of the files later on.
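To see which files could be deferred in such a two-step transfer, the auxiliary files for a given crash number can be listed. This is only a sketch: aux_dump_files is a hypothetical helper, and the section names it checks are taken from the list above.

```shell
# List the auxiliary dump files present for crash number N.
# The section names (zfs, proc, other) come from the list above;
# aux_dump_files itself is a hypothetical helper.
aux_dump_files() {
    dir=$1
    n=$2
    for section in zfs proc other; do
        f="$dir/vmcore-$section.$n"
        # Print only the files that were actually extracted
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

The main vmcore.N file is deliberately left out of the output, since it is the one to send first.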

If any of the "auxiliary" files is missing, mdb will report it:

root@va64-v20zl-prg06:/var/crash/0# mdb 0
mdb: failed to locate file ./vmcore-zfs.0. Contents of ZFS metadata (ZIO buffers) 
will not be available
Loading modules: [ unix genunix specfs dtrace mac cpu.generic 
cpu_ms.AuthenticAMD.15 uppc pcplusmp zvpsm scsi_vhci zfs mpt sd ip hook neti arp
 usba kssl sockfs lofs idm cpc nfs fcip fctl ufs logindmux ptm sppp ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from va64-v20zl-prg06
operating system: 5.12 dump.on12 (i86pc)
usr/src version: 23877:f2e76e2d0329:on12_51+48
usr/closed version: 1961:cad017e4c7e4:on12_51+4
image uuid: 404778fb-88da-4188-f222-8da174d44fa4
panic message: forced crash dump initiated at user request
complete: yes, all pages present as configured
dump content: kernel [LOADED,UNVERIFIED] (core kernel pages)
              zfs [MISSING] (ZFS metadata (ZIO buffers))
panicking PID: 101575 (not dumped)

Another choice is not to dump some of the sections at all. For example, to dump only the kernel pages and the pages of the process that was running at the time of the panic, but not ZFS metadata, the system can be configured as:

dumpadm -c curproc-zfs

Also, the unix.N file is no longer extracted automatically, since it is embedded in the vmcore.N file and mdb will find it there. If you need the file for some reason, it can still be extracted with the -u option of savecore(1M).

How do you load a dump into mdb with all these files around? The easiest way to access an extracted dump is to use just the suffix, i.e.

  mdb N

which will pick up all the files based on metadata in the main vmcore.N file. This worked even before this change, except there was just one file (2 if counting unix.N).

It is still possible to specify the files by hand, just remember to put the main file (vmcore.N) as the first argument:

  mdb vmcore.N vmcore-zfs.N ...
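If the file names come from a glob or a script, their order may not put the main file first. The sketch below reorders a list of names accordingly; main_first is a hypothetical helper, and it assumes file names without whitespace.

```shell
# Reorder dump file names so that the main vmcore.N file comes first.
# All other names keep their relative order.
# (main_first is a hypothetical helper, not part of mdb.)
main_first() {
    main=
    rest=
    for f in "$@"; do
        case $(basename "$f") in
            # vmcore.<digits> is the main file; vmcore-zfs.N etc. do not match
            vmcore.[0-9]*) main=$f ;;
            *) rest="$rest $f" ;;
        esac
    done
    # Unquoted on purpose: word splitting turns the list back into
    # separate words, so names containing whitespace are not supported.
    echo $main $rest
}
```

It could then be used as mdb $(main_first vmcore-zfs.N vmcore.N), although for an extracted dump the plain mdb N form is simpler.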

The other change (which is hard to notice unless you're dealing with lots of crash dump files) is that we laid out the infrastructure in kernel/mdb/libkvm to be properly backwards compatible with respect to the on-disk crash dump format. As a result, mdb will automatically load crash dump files produced by earlier Solaris versions. Currently it supports the 3 latest versions.
