Fun facts about corefiles
By eschrock on Jun 22, 2004
All you developers out there are probably well acquainted with corefiles. Every CS student has had a program try to dereference a NULL pointer at least once, and frequent use of assert(3c) will cause your program to abort(3c) in exceptional circumstances. Unfortunately, too many developers know little else about corefiles. They are often used for nothing more than a stack backtrace or a quick session with dbx. We in the Solaris group take corefiles very seriously: they get the same amount of attention as crash dumps. Over the past few releases, we've added features to Solaris relating to corefiles that not everyone may be familiar with, including some really great stuff in Solaris 10. Here is a short list of some of the things that can make your life easier as a developer, especially when servicing problems from the field.
The gcore(1) command generates a corefile from a running process, essentially a snapshot at that point in time. The process continues as if nothing had happened, which is crucial when an app is misbehaving in a non-fatal way and you don't want to resort to SIGABRT. Rather than making you reproduce the problem or get access to the system while it is running, the customer can simply gcore the process and forward the corefile.
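For contrast, here is a minimal sketch of the fatal route that gcore(1) lets you avoid: a child process fails an assert(3c), abort(3c) raises SIGABRT, and the parent inspects the wait status to see what happened. The function name signal_from_failed_assert and the NULL "config" pointer are made up for illustration, and the core-dump flag is only reported on systems that provide WCOREDUMP.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child whose assertion fails, then return the
 * signal that terminated it (0 if it exited normally, -1 on error).
 * If dumped is non-NULL, *dumped is set to 1 when the wait status says a
 * corefile was written, and 0 otherwise.
 */
int
signal_from_failed_assert(int *dumped)
{
    int status;
    pid_t pid;

    if (dumped != NULL)
        *dumped = 0;

    pid = fork();
    if (pid == -1)
        return (-1);

    if (pid == 0) {
        const char *config = NULL;  /* stands in for a real bug */

        assert(config != NULL);     /* fails, which calls abort() */
        _exit(0);                   /* not reached unless NDEBUG is set */
    }

    /* Parent: collect the child's exit status. */
    if (waitpid(pid, &status, 0) != pid)
        return (-1);
    if (!WIFSIGNALED(status))
        return (0);

#ifdef WCOREDUMP
    /* WCOREDUMP is not available everywhere, hence the guard. */
    if (dumped != NULL && WCOREDUMP(status))
        *dumped = 1;
#endif
    return (WTERMSIG(status));
}
```

Calling signal_from_failed_assert(&dumped) should report SIGABRT. Whether dumped comes back as 1 depends on the system's core settings, such as the core size resource limit.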
The coreadm(1M) command is one that system administrators and developers alike should be familiar with. On a non-development server, processes should never dump core. Unless it's intentional (like sending SIGABRT or mucking about in /proc), every corefile produced is a bug. Admins can log all corefiles to a central location, so they know whom to blame when something goes wrong (usually us). Developers can have fine-grained control over the content of the corefile and where it gets saved. Having dozens of files named 'core' in every directory usually isn't the most helpful thing in the world.
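To make the naming idea concrete: coreadm(1M) lets you give corefiles a name pattern containing tokens such as %f (executable name), %p (process ID) and %u (user ID), which the system fills in when the core is written. The sketch below is only a user-space approximation of that expansion for a small subset of tokens. The function expand_core_pattern and the /var/cores directory are hypothetical, and this is not how Solaris itself implements it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any Solaris API: expand a subset of
 * coreadm(1M)-style tokens from pattern into out.
 *   %f  executable name      %p  process ID
 *   %u  user ID              %%  literal percent sign
 * Any other token is copied through unchanged.
 * Returns 0 on success, -1 if the result does not fit in outlen bytes.
 */
int
expand_core_pattern(const char *pattern, const char *fname, long pid,
    long uid, char *out, size_t outlen)
{
    size_t used = 0;
    const char *p;

    assert(pattern != NULL && fname != NULL && out != NULL);
    if (outlen == 0)
        return (-1);
    out[0] = '\0';

    for (p = pattern; *p != '\0'; p++) {
        char piece[32];
        const char *src = piece;
        size_t len;

        if (p[0] != '%' || p[1] == '\0') {
            /* Ordinary character, or a lone '%' at the very end. */
            piece[0] = *p;
            piece[1] = '\0';
        } else {
            p++;    /* step onto the token letter */
            switch (*p) {
            case 'f':
                src = fname;
                break;
            case 'p':
                (void) snprintf(piece, sizeof (piece), "%ld", pid);
                break;
            case 'u':
                (void) snprintf(piece, sizeof (piece), "%ld", uid);
                break;
            case '%':
                piece[0] = '%';
                piece[1] = '\0';
                break;
            default:
                /* Token outside this sketch's subset: keep it as-is. */
                piece[0] = '%';
                piece[1] = *p;
                piece[2] = '\0';
                break;
            }
        }

        len = strlen(src);
        if (used + len >= outlen)
            return (-1);    /* would overflow the caller's buffer */
        memcpy(out + used, src, len + 1);
        used += len;
    }
    return (0);
}
```

With this sketch, the pattern "/var/cores/core.%f.%p" for a process named httpd with pid 1234 expands to /var/cores/core.httpd.1234, which is a lot easier to track down than yet another file named 'core'.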
Starting in Solaris 10, we have fine-grained control over the exact content of every corefile generated on the system (many thanks to Adam Leventhal for this). Read up on coreadm(1M) for all the gory details, but the most important thing is that library text segments are now included in the corefile. It used to be that if you got a core from a customer, you would need to find a matching version of every library they linked to in order to decipher what was going on. This made debugging complicated customer problems extremely difficult. With the text segments in the corefile itself, that hunt is no longer necessary.
CTF data for libraries
We have supported a special form of debugging information known as CTF ("Compact Type Format") in the kernel since Solaris 9. We take the debugging information generated by the '-g' compiler flag and strip out everything but the type information. The result is stored in a compact format, which makes it suitable for shipping with production binaries. This information is enormously useful, so we added userland support for it in Solaris 10. MDB consumes this information, so you can do interesting things like ::print a socket structure from a core generated by your production app. Unfortunately, the tools used to convert this information from STABS are not publicly available yet, so you cannot add CTF data to your own application. We're working on it.