Fun facts about corefiles

All you developers out there are probably well acquainted with corefiles. Every CS student has had a program try to dereference a NULL pointer at least once, and frequent use of assert(3C) causes your program to abort(3C) in exceptional circumstances. Unfortunately, too many developers know little else about corefiles, and often use them for nothing more than a stack backtrace or a quick session with dbx. We in the Solaris group take corefiles very seriously - they get the same amount of attention as crash dumps. Over the past few releases, we've added corefile features to Solaris that not everyone may be familiar with, including some really great stuff in Solaris 10. Here is a short list of some of the things that can make your life easier as a developer, especially when servicing problems from the field.
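To make the assert-to-abort path concrete, here is a minimal sketch in C. The helper names (first_byte, signal_from_failed_assert) are made up for illustration. It runs a failing assertion in a child process so the parent can report which signal killed it. Whether a corefile actually lands on disk depends on the system's core settings.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: insists on a non-NULL pointer before using it. */
static int first_byte(const char *p)
{
    assert(p != NULL);  /* on failure, assert() calls abort(), which raises SIGABRT */
    return p[0];
}

/*
 * Run the failing assertion in a child process.
 * Returns the signal that terminated the child, 0 if it exited normally,
 * or -1 if fork() or waitpid() failed.
 */
int signal_from_failed_assert(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        (void) first_byte(NULL);
        _exit(0);  /* only reached if assertions are compiled out (NDEBUG) */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Built without NDEBUG, the child should die from SIGABRT, which is one of the signals whose default action is to dump core.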

gcore

The gcore(1) command will generate a corefile from a running program - essentially a snapshot at that point in time. The process will continue on as if nothing has happened, which is crucial when an app is misbehaving in a non-fatal way and you don't want to resort to SIGABRT. Rather than making you reproduce the problem or get access to the system while it is running, the customer can simply gcore the process and forward the corefile.
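The same trick can be driven from inside a program that notices it is in a bad but non-fatal state. What follows is only a sketch: the function names are hypothetical, it assumes gcore is on the PATH and accepts "-o" to set the output name (to which the process ID is appended), and it assumes the platform lets a process be snapshotted by a child it spawned.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build the command line "gcore -o <prefix> <pid>" into buf.
 * The prefix is not shell-quoted, so it must not contain spaces or
 * shell metacharacters.
 * Returns 0 on success, -1 if buf is too small.
 */
int build_gcore_command(char *buf, size_t len, const char *prefix, pid_t pid)
{
    int n = snprintf(buf, len, "gcore -o %s %ld", prefix, (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Ask gcore to snapshot the calling process, which keeps running afterwards.
 * Returns whatever system() returns, or -1 if the command did not fit.
 */
int snapshot_self(const char *prefix)
{
    char cmd[512];

    if (build_gcore_command(cmd, sizeof cmd, prefix, getpid()) != 0)
        return -1;
    return system(cmd);
}
```

Under those assumptions, calling snapshot_self("/tmp/myapp") from an error path would leave a corefile named after the prefix and the process ID, ready to forward, while the application carries on.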

coreadm

This is a command that system administrators and developers alike should be familiar with. If you run a non-development server, processes should never coredump. Unless it's intentional (like sending SIGABRT or mucking about in /proc), every corefile produced is a bug. Admins can log all corefiles to a central location, so they know whom to blame when something goes wrong (usually us). Developers can have fine-grained control over the content of the corefile and where it gets saved. Having dozens of files named 'core' in every directory usually isn't the most helpful thing in the world.
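A central location usually means a name pattern with embedded variables, set with something like "coreadm -g /var/core/core.%f.%p -e global". To show what such a pattern turns into, here is a small illustration in C that expands a handful of the tokens. It is not how coreadm itself is implemented, the function name is made up, and only %f, %p, %u and %% are handled; see coreadm(1M) for the full list.

```c
#include <stdio.h>
#include <string.h>

/*
 * Expand a small subset of coreadm-style tokens into out:
 *   %f = executable name, %p = process ID,
 *   %u = effective user ID, %% = a literal '%'.
 * Unknown tokens are copied through unchanged. Each expanded piece is
 * limited to 63 characters, which is plenty for an illustration.
 * Returns 0 on success, -1 if out is too small.
 */
int expand_core_pattern(const char *pattern, const char *fname,
                        long pid, long uid, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[64];

        if (*p == '%' && p[1] != '\0') {
            p++;  /* step onto the token character */
            switch (*p) {
            case 'f': snprintf(piece, sizeof piece, "%s", fname); break;
            case 'p': snprintf(piece, sizeof piece, "%ld", pid);  break;
            case 'u': snprintf(piece, sizeof piece, "%ld", uid);  break;
            case '%': snprintf(piece, sizeof piece, "%%");        break;
            default:  snprintf(piece, sizeof piece, "%%%c", *p);  break;
            }
        } else {
            piece[0] = *p;  /* ordinary character, copied as-is */
            piece[1] = '\0';
        }

        size_t n = strlen(piece);
        if (used + n >= outlen)
            return -1;
        memcpy(out + used, piece, n + 1);  /* includes the terminating NUL */
        used += n;
    }
    return 0;
}
```

With the pattern above, a hypothetical process named httpd with process ID 1234 would produce /var/core/core.httpd.1234, which beats a pile of files all named 'core'.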

corefile content

Starting in Solaris 10, we have fine-grained control over the exact content of every corefile generated on the system (many thanks to Adam Leventhal for this). Read up on coreadm(1M) for all the gory details, but the most important thing is that the corefile can now carry the library text segments along with it. It used to be that if you got a core from a customer, you would need to find a matching version of every library they linked to in order to decipher what was going on, which made debugging complicated customer problems extremely difficult.

CTF data for libraries

We have supported a special form of debugging information known as CTF ("Compact Type Format") in the kernel since Solaris 9. We take the debugging information generated by the '-g' compiler flag, and strip out everything but the type information. It is stored in a compact format so it is suitable for shipping with production binaries. This information is enormously useful, so we added userland support for it in Solaris 10. MDB consumes this information, so you can do interesting things like ::print a socket structure from a core generated by your production app. Unfortunately, the tools used to convert this information from STABS are not publicly available yet, so you cannot add CTF data to your own application. We're working on it.
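To get a feel for what "just the type information" means, the sketch below spells out by hand the sort of facts a debugger needs in order to ::print a structure: the name, type, offset and size of each member. This is neither the CTF encoding nor the libctf API, and struct conn is a hypothetical stand-in for something like a socket structure.

```c
#include <stddef.h>

/* Hypothetical structure standing in for something like a socket. */
struct conn {
    int   fd;
    short port;
    char  state;
    long  bytes;
};

/* One row of type information: enough to locate and label a member. */
struct member_info {
    const char *name;
    const char *type;
    size_t      offset;  /* byte offset within struct conn */
    size_t      size;    /* size of the member in bytes */
};

static const struct member_info conn_members[] = {
    { "fd",    "int",   offsetof(struct conn, fd),    sizeof(int)   },
    { "port",  "short", offsetof(struct conn, port),  sizeof(short) },
    { "state", "char",  offsetof(struct conn, state), sizeof(char)  },
    { "bytes", "long",  offsetof(struct conn, bytes), sizeof(long)  },
};

/* Return the member table and store the number of rows in *count. */
const struct member_info *conn_layout(size_t *count)
{
    *count = sizeof conn_members / sizeof conn_members[0];
    return conn_members;
}
```

Given a table like this and the raw bytes from a corefile, a debugger can label every field without the full '-g' output, which is why the compact form is cheap enough to ship with production binaries.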

Comments:

As far as I know, CTF extraction is limited to "C" code. Any chance that feature will be available for C++ code in the future?

Posted by Fred on June 22, 2004 at 06:02 AM PDT #

CTF was originally designed for the kernel, which is written entirely in C. We've taken the first step towards having it in userland, which is support in most libraries. We are looking at several next steps, including shipping the utilities to do the conversion, publicizing the libctf API, adding line number support, as well as a neutral API to access both STABS and CTF. I don't know if anyone's investigated C++ support, but I can certainly find out.

Posted by Eric Schrock on June 22, 2004 at 06:09 AM PDT #

We are indeed working on getting CTF to play well with C++. At first this will probably mean that CTF data only encapsulates a subset of the C++ types but that should be enough to make it useful for mdb(1) and DTrace.

Posted by Adam Leventhal on June 22, 2004 at 08:23 AM PDT #

You may also want to have a look at an infodoc that I wrote some months back that discusses how to use coreadm to capture the entire text segment (or parts thereof) when you get a coredump.

Posted by Alan Hargreaves on June 22, 2004 at 08:24 AM PDT #

[Trackback] Eric Schrock talks about corefiles in his blog. A few months back I wrote infodoc 72790 on how to use coreadm to capture the whole text segment within a corefile. Usually when you get a core dump and you ask Sun to have a look at it, we will as...

Posted by Alan Hargreaves' Weblog on June 22, 2004 at 08:33 AM PDT #

Regarding C++ support, I had Sun Cluster in mind. Sun Cluster kernel modules are written in C++ (it uses an ORB for the communication), so that would be nice to have...

Posted by Fred on June 22, 2004 at 10:20 AM PDT #

Do you by any chance have any details on the inner workings of gcore? It would be interesting to see how the memory footprint for a process is captured. Thanks, - Ryan

Posted by Ryan on July 28, 2004 at 01:01 PM PDT #
