By Hema on Nov 26, 2007
Last week I managed to replicate a problem reported by one of our customers in Japan: Policy Agent 2.2 for Apache 2.2 crashing on 32-bit Red Hat Enterprise Linux ES release 4 (Nahant Update 4).
I downloaded the Apache source code from http://httpd.apache.org/download.cgi and built it on the system where I was trying to replicate the problem.
I downloaded the policy agent from http://www.sun.com/download/products.xml?id=471909fc
and before I installed the agent, I verified that I could start and stop the apache server correctly.
I followed the install steps outlined in the install document (http://docs.sun.com/app/docs/doc/820-3288) and the installation was successful.
I then tried to start the Apache server with the policy agent installed, and it crashed with a segmentation fault. Hurray, I could reproduce the problem!
# /opt/apache226/bin/apachectl -k start
/opt/apache226/bin/apachectl: line 78: 18504 Segmentation fault (core dumped) $HTTPD $ARGV
I opened the corefile in gdb and the backtrace looked as follows:
#0 0x003a9194 in spin ()
#1 0x024dedc1 in PR_Select () from /usr/lib/libnspr4.so
#2 0x00b96371 in start_thread () from /lib/tls/libpthread.so.0
#3 0x009fdffe in clone () from /lib/tls/libc.so.6
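A backtrace like the one above can also be pulled from a core file without an interactive session. The following is only a minimal sketch using gdb's batch mode. The helper name dump_core_backtrace is made up for illustration, and the core file name in the usage line is an assumption, since where the core lands and what it is called depend on the system's core settings.

```shell
# dump_core_backtrace BINARY COREFILE
# Hypothetical helper: prints a backtrace for every thread in the core,
# followed by the shared libraries gdb was able to map.
dump_core_backtrace() {
    binary=$1
    corefile=$2
    if [ ! -r "$corefile" ]; then
        echo "core file not readable: $corefile" >&2
        return 1
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 1
    fi
    # -batch runs the given commands and exits; each -ex is one gdb command.
    gdb -batch \
        -ex "thread apply all bt" \
        -ex "info sharedlibrary" \
        "$binary" "$corefile"
}
```

It would be called as: dump_core_backtrace /opt/apache226/bin/httpd core.18504 (core file name assumed). Listing the shared libraries alongside the backtrace can help when a frame such as spin() is shown without a library name.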
Now that I could replicate the problem consistently on my lab system, I thought I'd do some live debugging with gdb for that extra bit of fun.
I ran the program in gdb with a breakpoint set in spin() and, to my surprise, the process did not crash. The program exited normally. I thought this might be due to the breakpoint I had set, so I re-ran the program without the breakpoint, and it ran normally again in gdb.
(gdb) file /opt/apache226/bin/httpd
(gdb) set args -k start
Breakpoint 1 at 0x80618bd: file main.c, line 438.
Starting program: /opt/apache226/bin/httpd -k start
[Thread debugging using libthread_db enabled]
[New Thread 0xb7ee06c0 (LWP 28225)]
[Switching to Thread 0xb7ee06c0 (LWP 28225)]
main (argc=3, argv=0xbfeb32a4) at main.c:438
(gdb) break spin
Function "spin" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (spin) pending.
Breakpoint 3 at 0x3a9326
Pending breakpoint "spin" resolved
[New Thread 0x19a5bb0 (LWP 28229)]
[Switching to Thread 0x19a5bb0 (LWP 28229)]
Breakpoint 3, 0x003a9326 in spin ()
warning: Temporarily disabling breakpoints for unloaded shared library "/hema/opensso/web_agents/apache22_agent/lib/libamapc22.so"
Breakpoint 3 at 0x4b6326
Program exited normally.
I then ran the server under strace and ltrace to get more information and, again to my surprise, Apache didn't crash under those monitoring tools either.
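For reference, here is a minimal sketch of how such a traced run could be started. The exact options used in the runs described here are not recorded, so the -f and -o flags, the helper name trace_httpd, and the output file name in the usage line are all assumptions for illustration.

```shell
# Path taken from the session above; override HTTPD to point elsewhere.
HTTPD=${HTTPD:-/opt/apache226/bin/httpd}

# trace_httpd TOOL OUTFILE
# Hypothetical helper: starts httpd under strace or ltrace.
trace_httpd() {
    tool=$1
    outfile=$2
    case "$tool" in
        strace|ltrace) ;;
        *)
            echo "usage: trace_httpd strace|ltrace OUTFILE" >&2
            return 2
            ;;
    esac
    # -f follows child processes created by the traced program, which matters
    # because httpd forks; -o writes the trace to a file instead of stderr.
    "$tool" -f -o "$outfile" "$HTTPD" -k start
}
```

It would be called as: trace_httpd strace /tmp/httpd.strace (output path assumed). Tracing slows the process down considerably, so a run under either tool is not timed like a normal start.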
I suspect this is due to some sort of timing issue. How do I debug a program if its behavior changes when it is attached to a debugger or run under monitoring tools?
I gave up on live debugging and finally went back to analyzing the core file in gdb.