Debugging programs that catch SEGV
By user12625760 on Apr 22, 2005
This is just plain bizarre. I spent an hour or so this morning discussing a problem with a colleague via IM (and no Two Ronnies moments) where init was in a loop taking a SEGV. This is strange, as init has been around for a while, is generally well behaved and does not normally do this. Then this afternoon another colleague came around to my desk (face to face contact without any technology to help, amazing) and asked me how to debug a problem where, you guessed it, init is in a loop taking a SEGV. The second case looks less interesting, as there is a race condition that can result in this failure and there are patches for it.
The first customer, however, already has the patch, and their init is now multithreaded, which threw us for a while. It looks like it becomes multithreaded when the name service switch pulls in some libraries that are themselves multithreaded. Nice.
Anyway, back on topic. Some programs catch SIGSEGV, as if they were going to be able to recover. (I know there are cases where catching SEGV is the right thing to do, but they are few and far between and prone to not producing the desired results.) The top tip for debugging them is this:
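Use “truss -S SEGV -t !all -p PID” to get the process to stop when it gets the signal (quote the !all if your shell does history expansion). Because the program catches the signal, it never dumps core on its own; with -S, truss leaves the process stopped at the point the signal arrives. Then use gcore to collect your core file and use that to work out what has gone wrong.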
I'm now waiting for the third question about init so that I can say “Init problems are like buses. None for months, then three come along together”.