Program's behaviour changing when attached to a debugger or when using monitoring tools

Last week I managed to replicate a problem reported by one of our customers in Japan: Policy Agent 2.2 for Apache 2.2 was crashing on 32-bit Red Hat Enterprise Linux ES release 4 (Nahant Update 4).

I downloaded the Apache source code from http://httpd.apache.org/download.cgi and built it on the system on which I was trying to replicate the problem.

I downloaded the policy agent from http://www.sun.com/download/products.xml?id=471909fc
and, before installing the agent, verified that I could start and stop the Apache server correctly.

I followed the steps outlined in the installation document (http://docs.sun.com/app/docs/doc/820-3288), and the installation was successful.

I then tried to start the Apache server with the policy agent installed, and it crashed with a segmentation fault. Hurrah, I could reproduce the problem!

# /opt/apache226/bin/apachectl -k start
/opt/apache226/bin/apachectl: line 78: 18504 Segmentation fault      (core dumped) $HTTPD $ARGV

I opened the core file in gdb and the backtrace looked as follows:

(gdb) bt
#0  0x003a9194 in spin ()
   from /web_agents/apache22_agent/lib/libamapc22.so
#1  0x024dedc1 in PR_Select () from /usr/lib/libnspr4.so
#2  0x00b96371 in start_thread () from /lib/tls/libpthread.so.0
#3  0x009fdffe in clone () from /lib/tls/libc.so.6

Now that I could replicate the problem consistently on my lab system, I thought I would do some live debugging with gdb for that extra bit of fun.

I ran the program in gdb with a breakpoint set in spin() and, to my surprise, the process did not crash; the program exited normally. I thought this might be due to the breakpoint I had set, so I re-ran the program without the breakpoint, and it ran normally again in gdb.

(gdb) file /opt/apache226/bin/httpd
(gdb) set args -k start
(gdb) start
Breakpoint 1 at 0x80618bd: file main.c, line 438.
Starting program: /opt/apache226/bin/httpd -k start
[Thread debugging using libthread_db enabled]
[New Thread 0xb7ee06c0 (LWP 28225)]
[Switching to Thread 0xb7ee06c0 (LWP 28225)]
main (argc=3, argv=0xbfeb32a4) at main.c:438
438     {
(gdb) break spin
Function "spin" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (spin) pending.
(gdb) continue
Continuing.
Breakpoint 3 at 0x3a9326
Pending breakpoint "spin" resolved
[New Thread 0x19a5bb0 (LWP 28229)]
[Switching to Thread 0x19a5bb0 (LWP 28229)]
Breakpoint 3, 0x003a9326 in spin ()
   from /web_agents/apache22_agent/lib/libamapc22.so
(gdb) continue
Continuing.
warning: Temporarily disabling breakpoints for unloaded shared library "/hema/opensso/web_agents/apache22_agent/lib/libamapc22.so"
Breakpoint 3 at 0x4b6326
Program exited normally.

I then ran the server under strace and ltrace to get more information and, again to my surprise, Apache didn't crash when run under those monitoring tools either.

I suspect this is due to some sort of timing issue. How do I debug a program if its behaviour changes when it is attached to a debugger or run under monitoring tools?

I gave up on live debugging and finally had to go back to analysing the core dump in gdb.
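
For what it's worth, the core dump analysis can be scripted so that gdb never has to run alongside the live process. The following is only a minimal sketch, not the exact session I used: the httpd path is the one from this post, but the core file name (core.18504) and the command file name (postmortem.gdb) are assumptions for illustration, and the actual core file name depends on how the kernel is configured to name core files.

```shell
#!/bin/sh
# Sketch only. HTTPD is the binary path used in this post; CORE and CMDS
# are assumed names chosen for illustration.
HTTPD=${HTTPD:-/opt/apache226/bin/httpd}
CORE=${CORE:-core.18504}
CMDS=${CMDS:-postmortem.gdb}

# Commands gdb will run non-interactively against the core file:
# loaded libraries, a backtrace of every thread, the registers, and the
# instruction at the faulting program counter.
cat > "$CMDS" <<'EOF'
info sharedlibrary
thread apply all bt
info registers
x/i $pc
EOF

# Run gdb only if it is installed and both files are readable.
if command -v gdb >/dev/null 2>&1 && [ -r "$HTTPD" ] && [ -r "$CORE" ]; then
    gdb -batch -x "$CMDS" "$HTTPD" "$CORE"
else
    echo "skipping gdb: need gdb, $HTTPD and $CORE"
fi
```

Because the process is already dead when gdb reads the core file, this kind of post-mortem run cannot disturb the timing the way a live gdb, strace or ltrace session apparently did.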

About

Hema
