Capturing corefiles

One of the things that I run into occasionally is kernel panics. Kernels panic for one reason and one reason only: TO PROTECT YOUR DATA! It seems like a real annoyance to have your big box crash, but it's not nearly as bad as having it keep running and cheerfully scribble all over your company's payroll database.

So what is a panic? Well, basically it's the kernel throwing up its hands and saying "I give up!" due to some error that it can't recover from in a way that guarantees integrity. It stops whatever it's doing, attempts some cleanup by flushing filesystem buffers, and then dumps all of the kernel-owned memory to disk (dumping core) for a human to look at and figure out what went wrong.

A panic can be caused by a software issue: a buffer overrun, a null pointer dereference, or a mangled data structure. Really, anything that doesn't make sense to the kernel will cause a panic.

The panic can also be caused by a hardware issue. If a page in memory takes an uncorrectable ECC error (two bits getting flipped, for example), the OS can't correct the flips because the ECC algorithm doesn't carry enough information to determine which bits are incorrect. (A single bit flip would be OK, but correcting more than one bit flip requires more information and more work than is practical to implement.) Since the system now holds data that it knows is questionable, it really can't continue running.

Actually, whether it panics on an uncorrectable ECC error depends on where that page was. If the page was in kernel space, we need to panic. The kernel knows that it can't trust itself (for lack of a better word), so it's going to dump core and reboot.

However, if the page with the uncorrectable error is in user space, in other words owned by a process (say Mozilla, or Oracle), then we can do something a little different. The kernel knows that it itself is OK; only that one process can't continue, so the kernel will kill that process. Obviously, if this is a big database running month-end batch jobs, simply killing Oracle and going on our merry way isn't a good thing, so the system will then do a graceful reboot. This of course assumes that the system administrator has those processes configured to automatically start up and recover on a system boot.

In any case, no matter the cause, if we're going to panic, the system is going down. It has to write its kernel corefile out somewhere where it won't be messed with and where writing it won't risk corrupting other data.

The place the corefile is written is called the dumpdevice. Most systems have their dumpdevice set to their swap device. Swap provides a nice place to write it because it's stable (on disk), and you're not going to corrupt anything by writing to it during the crash because swap is re-initialized whenever it's used.
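
You can see which devices are providing swap (and so which are candidates for the dumpdevice) with "swap -l". Here's a sketch of its output; the device name and sizes are made up:

# swap -l
swapfile             dev  swaplo blocks    free
/dev/dsk/c0t0d0s1   32,1      16 2097632 2097632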

With Solaris 2.6 and earlier, the dumpdevice was always set to the first swap device that was added to the system. Unfortunately, with those earlier releases of Solaris, once the dumpdevice was set, it stayed set to that device until the system was rebooted and it was set again.

With Solaris 7 and later, the dumpdevice is set according to whatever the administrator defines via the "dumpadm" command. By default, this is set to "swap", which gives the same behavior as Solaris 2.6 systems.
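
Running "dumpadm" with no arguments shows the current configuration. Here's roughly what that looks like on a hypothetical Solaris 8/9 box (the device and hostname are made up):

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/dsk/c0t0d0s1 (swap)
Savecore directory: /var/crash/dazzle
  Savecore enabled: yes

To point the dump at a different device, use "dumpadm -d":

# dumpadm -d /dev/dsk/c0t1d0s1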

A good dumpdevice is a physical device as opposed to a pseudo device. This means that you'll want to use a single physical disk slice if at all possible. Mirrored devices don't necessarily make good candidates and often will not work as a dump device. Examples would be mirroring swap with Veritas Volume Manager (VxVM) or Solaris Logical Volume Manager (a.k.a. DiskSuite). With Solaris 8 and greater, since DiskSuite is integrated into the OS, the metadevice for swap is an acceptable dump device.

If all of your swap is mirrored, which it should be if your boot disk is mirrored, then you'll want to set your dumpdevice to the physical slice of one side of the mirror. Normally, accessing one of the underlying slices of a mirror directly is a bad thing and can corrupt data. However, since swap always writes data before reading it, this is an OK thing to do.
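
As a sketch, if your swap mirror were built on top of two hypothetical slices, you'd hand dumpadm just one of them. Note that you point it at the raw slice, not at the VxVM volume or metadevice built on top of it (the slice name here is made up):

# dumpadm -d /dev/dsk/c0t0d0s1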

Okay, so at this point, we know we have to panic. We also know where we're going to dump our corefile to. So how do we do it?

Well, we start by writing memory out at the end of the dump device and work our way toward the beginning. We start at the end to give us some reasonable assurance that the corefile will not be overwritten by swapping when the system starts up, before we can copy it to the filesystem. This was a very real problem early on with systems that did not have a lot of memory. Obviously, a system with several gigabytes of memory is unlikely to swap at boot anymore, if it ever swaps at all.

So we write out the contents of physical memory to the dumpdevice. With Solaris 8 and 9, this dump is compressed on the fly to save disk space. Once the dump is completed, we reboot.

Upon reboot, the savecore program is run from /etc/rc2.d/S75savecore. The default on both Solaris 8 and 9 is to run savecore at every boot to check for a new panic dump to save. With Solaris 7 and earlier, you had to manually configure savecore to run, which caused many missed corefiles.
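
You can check whether this is enabled by looking for "Savecore enabled: yes" in the dumpadm output shown earlier. If someone has switched it off, "dumpadm -y" re-enables the automatic savecore run on reboot ("dumpadm -n" disables it):

# dumpadm -y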

Savecore will read from the dumpdevice, looking for the magic numbers and dump headers that indicate there is a new corefile to be written out. Upon finding one, it will save it out, by default to /var/crash/<hostname>/vmcore.X, where X is determined by the contents of the "bounds" file. The bounds file contains a number that determines what X is, in order to prevent new corefiles from overwriting older ones. Savecore will also write out a file called unix.X, which is the namelist for the kernel. This namelist is used to make sense of the different modules and structures that appear in the corefile. Both are very important, and they should be sent in together for analysis. One is almost worthless without the other.
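
After a couple of saved panics, the crash directory on our hypothetical box would look something like this (the hostname and numbering are just for illustration):

# ls /var/crash/dazzle
bounds   unix.0   unix.1   vmcore.0   vmcore.1
# cat bounds
2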

The final thing that savecore does is update the dump header for the dump on the dumpdevice to indicate that it has been written out. This is necessary to prevent savecore from writing out the same corefile over and over again upon each reboot.

Contrary to popular belief, the corefile written to the dumpdevice is not overwritten or erased by savecore. It'll stay intact until something overwrites it (usually by being swapped over). It can stay there for days, weeks, or even years. Thus you can still get it even if savecore fails to write it out to a file due to a lack of filesystem space. Just run "savecore" and it will attempt to save it to the default location. Running "savecore /directory/with/alot/of/space" will save the corefile to whatever location you wish.

If a corefile has already been written out, you can still retrieve it. I've had many cases in which /var/crash was getting full and the sysadmin accidentally deleted the corefile he wanted to send out. You can use "savecore -d", which tells savecore to disregard the dump headers. Some say the -d stands for "dangit!" (they actually said something else that starts with a "d", but this is a family show). For example, here's a box that I crashed a while ago myself by doing bad things to it. The corefile had already been saved off to disk and erased, but the dump still lived on the dump device.

pwags@dazzle # savecore
pwags@dazzle # savecore -d
System dump time: Thu Aug  5 14:28:05 2004
Constructing namelist /var/crash/dazzle/unix.2
Constructing corefile /var/crash/dazzle/vmcore.2
100% done: 73282 of 73282 pages saved

The first savecore didn't work because the corefile had already been written out. The -d option was then needed.

So, after you have the corefiles written to /var/crash/<hostname>, you'll need to tar up vmcore.X and unix.X together and compress the tarball. PLEASE compress the tarball. Corefiles have gotten very large lately (several gigabytes), and transferring them up to the Internet and then down internally within Sun for analysis can take hours or days. Compressing them is essential.
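
As a minimal sketch, assuming gzip is available on the box and using the filenames from the example above, something like this will do:

# cd /var/crash/dazzle
# tar cf - unix.2 vmcore.2 | gzip -c > dazzle.vmcore.2.tar.gz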

That's my take on capturing corefiles. Hope that you found it a bit informative.

Comments:

> Hope that you found it a bit informative.

Absolutely. I'm working on a page of solution support tactics. Too often, the facilities for capturing core files aren't available. In other words, when we go to do the autopsy, there's no corpse.

Thanks.

Take care,
brad

Posted by Brad Blumenthal on October 01, 2004 at 07:25 AM CDT #
