Monday Dec 17, 2007

why is the stacktrace in gdb different to the one in dbx ?

After I posted my last blog, another idea struck me, how about I try live debugging with dbx ?

I downloaded Sun Studio 12 and installed on my lab system running Red Hat Enterprise Linux ES release 4 (Nahant Update 4). When I started apache in dbx, it crashed the first time around but subsequent starts did not crash. Even when it crashed, I couldn't get the symbols working correctly in dbx. What's the point in debugging live if I couldn't read the symbols correctly ?

I decided not to spend time on live debugging and focus instead on crash analysis.  But I was in for another surprise that is -- the stacktrace output in  dbx and gdb looked very different. I just couldn't figure out why the stacktrace output on gdb and dbx looked so different ?

The backtrace output in gdb looked as follows:

(gdb) bt
#0  0x003a9194 in spin () from /root/web_agents/apache22_agent/lib/libamapc22.so
#1  0x024dedc1 in PR_Select () from /usr/lib/libnspr4.so
#2  0x00b96371 in start_thread () from /lib/tls/libpthread.so.0
#3  0x009fdffe in clone () from /lib/tls/libc.so.6 

 
Now, compare that with the stacktrace in dbx ?

(dbx)

t@null (l@1) terminated by signal SEGV (Segmentation fault)
0x00929b7d: __mmap+0x000d: movl %edx,%ebx
Current function is apr_dso_load
139 os_handle = dlopen(path, flags);


(dbx) where
[1] __mmap(0xcbf000, 0x71d0, 0x5, 0x802, 0xa, 0x0), at 0x929b7d
[2] _dl_map_object_from_fd(0xbfe57e80, 0x90b9528, 0x90b9200, 0x2, 0x80000000, 0xbfe57e78), at 0x91d809
[3] _dl_map_object(0x2, 0x0, 0x80000000, 0x0, 0xbfe583b4, 0x44fd66), at 0x91f0bc
[4] openaux(0xbfe583b4, 0x80000000, 0x0, 0xbfe583b4, 0xb7f3d6bc, 0x922d50), at 0x922d98
[5] _dl_catch_error(0xbfe583b4, 0x24, 0xbfe584b0, 0xbfe58188, 0x92dfd4, 0xbfe583e8), at 0x9240fe
[6] _dl_map_object_deps(0x0, 0x80000000, 0x90000102, 0x0, 0xa5cff4, 0x0), at 0x92303a
[7] dl_open_worker(0xbfe585b0, 0x0, 0x0, 0x0, 0xb7f3d6bc, 0xa31fe0), at 0xa32159
[8] _dl_catch_error(0xbfe585b0, 0x90, 0xffffffff, 0x0, 0x970f98, 0xbfe58b40), at 0x9240fe
[9] __GI__dl_open(0xfffffffe, 0x0, 0x0, 0x92dfd4, 0xbfe58750, 0x92dca0), at 0xa32cb8
[10] dlopen_doit(0xbfe58750, 0x0, 0x0, 0xbfe58c44, 0xb7f3d6bc, 0xa88c70), at 0xa88cb8
[11] _dl_catch_error(0xbfe58750, 0xa88f50, 0x48, 0xe3736c, 0x905b0a8, 0x90c973a), at 0x9240fe
[12] _dlerror_run(0x0, 0x0, 0x31, 0x0, 0x90c9740, 0x102), at 0xa892bb
[13] __dlopen_check(0x90c9740, 0x102, 0x90c9739, 0xe23650, 0x905b0a8, 0x1), at 0xa88d11
=>[14] apr_dso_load(res_handle = <value not available>, path = <value not available>, pool = <value not available>), line 139 in "dso.c"
[15] load_module(cmd = <value not available>, dummy = <value not available>, modname = <value not available>, filename = <value not available>), line 238 in "mod_so.c"
[16] invoke_cmd(cmd = (nil), parms = (nil), mconfig = <value not available>, args = <value not available>), line 890 in "config.c"
[17] ap_build_config_sub(p = <value not available>, temp_pool = <value not available>, l = <value not available>, parms = <value not available>, current = <value not available>, curr_parent = <value not available>, conftree = <value not available>), line 1436 in "config.c"
[18] ap_build_config(parms = (nil), p = <value not available>, temp_pool = <value not available>, conftree = (nil)), line 1219 in "config.c"
[19] process_resource_config_nofnmatch(s = <value not available>, fname = (nil), conftree = <value not available>, p = <value not available>, ptemp = <value not available>, depth = <value not available>), line 1629 in "config.c"
[20] ap_process_resource_config(s = <value not available>, fname = <value not available>, conftree = <value not available>, p = <value not available>, ptemp = <value not available>), line 1661 in "config.c"
[21] include_config(cmd = (nil), dummy = <value not available>, name = (nil)), line 2584 in "core.c"
[22] invoke_cmd(cmd = (nil), parms = (nil), mconfig = <value not available>, args = <value not available>), line 890 in "config.c"
[23] ap_build_config_sub(p = <value not available>, temp_pool = <value not available>, l = <value not available>, parms = <value not available>, current = <value not available>, curr_parent = <value not available>, conftree = <value not available>), line 1436 in "config.c"
[24] ap_build_config(parms = (nil), p = <value not available>, temp_pool = <value not available>, conftree = (nil)), line 1219 in "config.c"
[25] process_resource_config_nofnmatch(s = <value not available>, fname = (nil), conftree = <value not available>, p = <value not available>, ptemp = <value not available>, depth = <value not available>), line 1629 in "config.c"
[26] ap_process_resource_config(s = <value not available>, fname = <value not available>, conftree = <value not available>, p = <value not available>, ptemp = <value not available>), line 1661 in "config.c"
[27] ap_read_config(process = <value not available>, ptemp = <value not available>, filename = <value not available>, conftree = <value not available>), line 2021 in "config.c"
[28] main(argc = -1075478912, argv = 0x90b9528), line 694 in "main.c"

There is hardly any resemblance in the stacktrace output produced by the two debug tools.  I posted this question to one of our internal aliases and I received a response from a colleague of mine in Singapore saying,  I was running backtrace against the wrong thread in gdb !!.   

When I ran the command "bt"  in gdb, I did not choose any particular thread,  all I did was open the core file in gdb and run the command "bt".  I always thought that by default when you open a core file in a debugger and run the command to print the stacktrace ( or backtrace),  it prints the trace of the thread that resulted this crash.   I now know that might always be the case. Finally, I managed to print the stacktrace of the thread that resulted in this crash in gdb.

(gdb) thread apply all bt
Thread 2 (process 7160):
#0 0x00929b7d in mmap () from /lib/ld-linux.so.2
#1 0x0091d809 in _dl_map_object_from_fd () from /lib/ld-linux.so.2
#2 0x0091f0bc in _dl_map_object () from /lib/ld-linux.so.2
#3 0x00922d98 in openaux () from /lib/ld-linux.so.2
#4 0x009240fe in _dl_catch_error () from /lib/ld-linux.so.2
#5 0x0092303a in _dl_map_object_deps () from /lib/ld-linux.so.2
#6 0x00a32159 in dl_open_worker () from /lib/tls/libc.so.6
#7 0x009240fe in _dl_catch_error () from /lib/ld-linux.so.2
#8 0x00a32cb8 in _dl_open () from /lib/tls/libc.so.6
#9 0x00a88cb8 in dlopen_doit () from /lib/libdl.so.2
#10 0x009240fe in _dl_catch_error () from /lib/ld-linux.so.2
#11 0x00a892bb in _dlerror_run () from /lib/libdl.so.2
#12 0x00a88d11 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#13 0x00e25222 in apr_dso_load (res_handle=0xbfe587ec,path=0x90c9740 "/root/web_agents/apache22_agent/lib/libamapc22.so",pool=0x905b0a8) at dso/unix/dso.c:139
#14 0x0809631e in load_module (cmd=0xbfe58bd0, dummy=0xbfe58a70,modname=0x90c96f8 "dsame_module", filename=0x90c9708 "/root/web_agents/apache22_agent/lib/libamapc22.so") at mod_so.c:238
#15 0x0807280f in invoke_cmd (cmd=0x80adde0, parms=0xbfe58bd0,mconfig=0xbfe58a70,args=0x90b6248 "") at config.c:890
#16 0x08072d3b in ap_build_config_sub (p=0x905b0a8,temp_pool=<value optimized out>, l=0x90b61f8 "LoadModule dsame_module /root/web_agents/apache22_agent/lib/libamapc22.so", parms=0xbfe58bd0, current=0xbfe58ab4, curr_parent=0xbfe58ab8, conftree=0xbfe58d74) at config.c:1436
#17 0x08073163 in ap_build_config (parms=0xbfe58bd0, p=0x905b0a8, temp_pool=0x90610c0, conftree=0xbfe58d74) at config.c:1219
#18 0x08073678 in process_resource_config_nofnmatch (s=0x905fc80,fname=0x90c8530 "/root/web_agents/apache22_agent/Agent_001/config/dsame.conf", conftree=0xbfe58d74, p=0x905b0a8, ptemp=0x90610c0, depth=0) at config.c:1629
#19 0x08073844 in ap_process_resource_config (s=0x905fc80,fname=0x90c8530 "/root/web_agents/apache22_agent/Agent_001/config/dsame.conf", conftree=0xbfe58d74, p=0x905b0a8, ptemp=0x90610c0) at config.c:1661
#20 0x0806bba1 in include_config (cmd=0xbfe59050, dummy=0xbfe58ef0,name=0x90c8468 "/root/web_agents/apache22_agent/Agent_001/config/dsame.conf") at core.c:2584
#21 0x0807280f in invoke_cmd (cmd=0x80a3ab8, parms=0xbfe59050,mconfig=0xbfe58ef0, args=0x90a2203 "") at config.c:890
#22 0x08072d3b in ap_build_config_sub (p=0x905b0a8,temp_pool=<value optimized out>,l=0x90a21c0 "include /root/web_agents/apache22_agent/Agent_001/config/dsame.conf", parms=0xbfe59050, current=0xbfe58f34, curr_parent=0xbfe58f38,conftree=0x80b43a8) at config.c:1436
#23 0x08073163 in ap_build_config (parms=0xbfe59050, p=0x905b0a8, temp_pool=0x90610c0, conftree=0x80b43a8) at config.c:1
#24 0x08073678 in process_resource_config_nofnmatch (s=0x905fc80,fname=0x90ab290 "/opt/apache226/conf/httpd.conf", conftree=0x80b43a8,p=0x905b0a8, ptemp=0x90610c0, depth=0) at config.c:1629
#25 0x08073844 in ap_process_resource_config (s=0x905fc80,fname=0x90ab290 "/opt/apache226/conf/httpd.conf", conftree=0x80b43a8,p=0x905b0a8, ptemp=0x90610c0) at config.c:1661
#26 0x0807418f in ap_read_config (process=0x9059120, ptemp=0x90610c0 filename=0x80a01e5 "conf/httpd.conf", conftree=0x80b43a8) at config.c:2021
#27 0x08061ee9 in main (argc=3, argv=0xbfe59354) at main.c:694

The backtrace against the correct thread in gdb made much more sense and gdb could resolve the symbols correctly too.  From gdb's backtrace it was clear that the crash was immediately after loading the dsame module, libamapc22.so which is our Policy Agent.

I filed a bug  6637032  against Access Manager Policy Agent and engineering has provided a fix to resolve this.

About

Hema

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today