Why is the stacktrace in gdb different from the one in dbx?

After I posted my last blog entry, another idea struck me: how about trying live debugging with dbx?

I downloaded Sun Studio 12 and installed it on my lab system running Red Hat Enterprise Linux ES release 4 (Nahant Update 4). When I started Apache in dbx, it crashed the first time around, but subsequent starts did not crash. Even when it crashed, I couldn't get the symbols working correctly in dbx. What's the point of debugging live if I can't read the symbols correctly?

I decided not to spend more time on live debugging and to focus instead on crash analysis. But I was in for another surprise: the stacktrace output in dbx and gdb looked very different, and I couldn't figure out why.

The backtrace output in gdb looked as follows:

(gdb) bt
#0  0x003a9194 in spin () from /root/web_agents/apache22_agent/lib/libamapc22.so
#1  0x024dedc1 in PR_Select () from /usr/lib/libnspr4.so
#2  0x00b96371 in start_thread () from /lib/tls/libpthread.so.0
#3  0x009fdffe in clone () from /lib/tls/libc.so.6 

 
Now, compare that with the stacktrace in dbx:

(dbx)

t@null (l@1) terminated by signal SEGV (Segmentation fault)
0x00929b7d: __mmap+0x000d: movl %edx,%ebx
Current function is apr_dso_load
139 os_handle = dlopen(path, flags);


(dbx) where
[1] __mmap(0xcbf000, 0x71d0, 0x5, 0x802, 0xa, 0x0), at 0x929b7d
[2] _dl_map_object_from_fd(0xbfe57e80, 0x90b9528, 0x90b9200, 0x2, 0x80000000, 0xbfe57e78), at 0x91d809
[3] _dl_map_object(0x2, 0x0, 0x80000000, 0x0, 0xbfe583b4, 0x44fd66), at 0x91f0bc
[4] openaux(0xbfe583b4, 0x80000000, 0x0, 0xbfe583b4, 0xb7f3d6bc, 0x922d50), at 0x922d98
[5] _dl_catch_error(0xbfe583b4, 0x24, 0xbfe584b0, 0xbfe58188, 0x92dfd4, 0xbfe583e8), at 0x9240fe
[6] _dl_map_object_deps(0x0, 0x80000000, 0x90000102, 0x0, 0xa5cff4, 0x0), at 0x92303a
[7] dl_open_worker(0xbfe585b0, 0x0, 0x0, 0x0, 0xb7f3d6bc, 0xa31fe0), at 0xa32159
[8] _dl_catch_error(0xbfe585b0, 0x90, 0xffffffff, 0x0, 0x970f98, 0xbfe58b40), at 0x9240fe
[9] __GI__dl_open(0xfffffffe, 0x0, 0x0, 0x92dfd4, 0xbfe58750, 0x92dca0), at 0xa32cb8
[10] dlopen_doit(0xbfe58750, 0x0, 0x0, 0xbfe58c44, 0xb7f3d6bc, 0xa88c70), at 0xa88cb8
[11] _dl_catch_error(0xbfe58750, 0xa88f50, 0x48, 0xe3736c, 0x905b0a8, 0x90c973a), at 0x9240fe
[12] _dlerror_run(0x0, 0x0, 0x31, 0x0, 0x90c9740, 0x102), at 0xa892bb
[13] __dlopen_check(0x90c9740, 0x102, 0x90c9739, 0xe23650, 0x905b0a8, 0x1), at 0xa88d11
=>[14] apr_dso_load(res_handle = <value not available>, path = <value not available>, pool = <value not available>), line 139 in "dso.c"
[15] load_module(cmd = <value not available>, dummy = <value not available>, modname = <value not available>, filename = <value not available>), line 238 in "mod_so.c"
[16] invoke_cmd(cmd = (nil), parms = (nil), mconfig = <value not available>, args = <value not available>), line 890 in "config.c"
[17] ap_build_config_sub(p = <value not available>, temp_pool = <value not available>, l = <value not available>, parms = <value not available>, current = <value not available>, curr_parent = <value not available>, conftree = <value not available>), line 1436 in "config.c"
[18] ap_build_config(parms = (nil), p = <value not available>, temp_pool = <value not available>, conftree = (nil)), line 1219 in "config.c"
[19] process_resource_config_nofnmatch(s = <value not available>, fname = (nil), conftree = <value not available>, p = <value not available>, ptemp = <value not available>, depth = <value not available>), line 1629 in "config.c"
[20] ap_process_resource_config(s = <value not available>, fname = <value not available>, conftree = <value not available>, p = <value not available>, ptemp = <value not available>), line 1661 in "config.c"
[21] include_config(cmd = (nil), dummy = <value not available>, name = (nil)), line 2584 in "core.c"
[22] invoke_cmd(cmd = (nil), parms = (nil), mconfig = <value not available>, args = <value not available>), line 890 in "config.c"
[23] ap_build_config_sub(p = <value not available>, temp_pool = <value not available>, l = <value not available>, parms = <value not available>, current = <value not available>, curr_parent = <value not available>, conftree = <value not available>), line 1436 in "config.c"
[24] ap_build_config(parms = (nil), p = <value not available>, temp_pool = <value not available>, conftree = (nil)), line 1219 in "config.c"
[25] process_resource_config_nofnmatch(s = <value not available>, fname = (nil), conftree = <value not available>, p = <value not available>, ptemp = <value not available>, depth = <value not available>), line 1629 in "config.c"
[26] ap_process_resource_config(s = <value not available>, fname = <value not available>, conftree = <value not available>, p = <value not available>, ptemp = <value not available>), line 1661 in "config.c"
[27] ap_read_config(process = <value not available>, ptemp = <value not available>, filename = <value not available>, conftree = <value not available>), line 2021 in "config.c"
[28] main(argc = -1075478912, argv = 0x90b9528), line 694 in "main.c"

There is hardly any resemblance between the stacktraces produced by the two debuggers. I posted this question to one of our internal aliases, and a colleague of mine in Singapore replied that I was running the backtrace against the wrong thread in gdb!

When I ran "bt" in gdb, I did not choose any particular thread; all I did was open the core file in gdb and run "bt". I had always thought that when you open a core file in a debugger and run the command to print the stacktrace (or backtrace), it prints the trace of the thread that caused the crash by default. I now know that this might not always be the case. Finally, I managed to print the stacktrace of the thread that caused this crash in gdb:

(gdb) thread apply all bt
Thread 2 (process 7160):
#0 0x00929b7d in mmap () from /lib/ld-linux.so.2
#1 0x0091d809 in _dl_map_object_from_fd () from /lib/ld-linux.so.2
#2 0x0091f0bc in _dl_map_object () from /lib/ld-linux.so.2
#3 0x00922d98 in openaux () from /lib/ld-linux.so.2
#4 0x009240fe in _dl_catch_error () from /lib/ld-linux.so.2
#5 0x0092303a in _dl_map_object_deps () from /lib/ld-linux.so.2
#6 0x00a32159 in dl_open_worker () from /lib/tls/libc.so.6
#7 0x009240fe in _dl_catch_error () from /lib/ld-linux.so.2
#8 0x00a32cb8 in _dl_open () from /lib/tls/libc.so.6
#9 0x00a88cb8 in dlopen_doit () from /lib/libdl.so.2
#10 0x009240fe in _dl_catch_error () from /lib/ld-linux.so.2
#11 0x00a892bb in _dlerror_run () from /lib/libdl.so.2
#12 0x00a88d11 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#13 0x00e25222 in apr_dso_load (res_handle=0xbfe587ec,path=0x90c9740 "/root/web_agents/apache22_agent/lib/libamapc22.so",pool=0x905b0a8) at dso/unix/dso.c:139
#14 0x0809631e in load_module (cmd=0xbfe58bd0, dummy=0xbfe58a70,modname=0x90c96f8 "dsame_module", filename=0x90c9708 "/root/web_agents/apache22_agent/lib/libamapc22.so") at mod_so.c:238
#15 0x0807280f in invoke_cmd (cmd=0x80adde0, parms=0xbfe58bd0,mconfig=0xbfe58a70,args=0x90b6248 "") at config.c:890
#16 0x08072d3b in ap_build_config_sub (p=0x905b0a8,temp_pool=<value optimized out>, l=0x90b61f8 "LoadModule dsame_module /root/web_agents/apache22_agent/lib/libamapc22.so", parms=0xbfe58bd0, current=0xbfe58ab4, curr_parent=0xbfe58ab8, conftree=0xbfe58d74) at config.c:1436
#17 0x08073163 in ap_build_config (parms=0xbfe58bd0, p=0x905b0a8, temp_pool=0x90610c0, conftree=0xbfe58d74) at config.c:1219
#18 0x08073678 in process_resource_config_nofnmatch (s=0x905fc80,fname=0x90c8530 "/root/web_agents/apache22_agent/Agent_001/config/dsame.conf", conftree=0xbfe58d74, p=0x905b0a8, ptemp=0x90610c0, depth=0) at config.c:1629
#19 0x08073844 in ap_process_resource_config (s=0x905fc80,fname=0x90c8530 "/root/web_agents/apache22_agent/Agent_001/config/dsame.conf", conftree=0xbfe58d74, p=0x905b0a8, ptemp=0x90610c0) at config.c:1661
#20 0x0806bba1 in include_config (cmd=0xbfe59050, dummy=0xbfe58ef0,name=0x90c8468 "/root/web_agents/apache22_agent/Agent_001/config/dsame.conf") at core.c:2584
#21 0x0807280f in invoke_cmd (cmd=0x80a3ab8, parms=0xbfe59050,mconfig=0xbfe58ef0, args=0x90a2203 "") at config.c:890
#22 0x08072d3b in ap_build_config_sub (p=0x905b0a8,temp_pool=<value optimized out>,l=0x90a21c0 "include /root/web_agents/apache22_agent/Agent_001/config/dsame.conf", parms=0xbfe59050, current=0xbfe58f34, curr_parent=0xbfe58f38,conftree=0x80b43a8) at config.c:1436
#23 0x08073163 in ap_build_config (parms=0xbfe59050, p=0x905b0a8, temp_pool=0x90610c0, conftree=0x80b43a8) at config.c:1
#24 0x08073678 in process_resource_config_nofnmatch (s=0x905fc80,fname=0x90ab290 "/opt/apache226/conf/httpd.conf", conftree=0x80b43a8,p=0x905b0a8, ptemp=0x90610c0, depth=0) at config.c:1629
#25 0x08073844 in ap_process_resource_config (s=0x905fc80,fname=0x90ab290 "/opt/apache226/conf/httpd.conf", conftree=0x80b43a8,p=0x905b0a8, ptemp=0x90610c0) at config.c:1661
#26 0x0807418f in ap_read_config (process=0x9059120, ptemp=0x90610c0, filename=0x80a01e5 "conf/httpd.conf", conftree=0x80b43a8) at config.c:2021
#27 0x08061ee9 in main (argc=3, argv=0xbfe59354) at main.c:694

The backtrace against the correct thread in gdb made much more sense, and gdb could resolve the symbols correctly too. From gdb's backtrace it was clear that the crash occurred while Apache was loading the dsame module, libamapc22.so, which is our Policy Agent: the faulting frames sit inside the dlopen call made from apr_dso_load.
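To make the "check every thread, not just the default one" lesson concrete, here is a minimal Python sketch that groups the text printed by "thread apply all bt" by thread and reports which threads contain a given function. It is only an illustration: the helper names (parse_backtraces, threads_calling) are made up for this example, the regular expressions assume the output layout shown above, and the "Thread 1" header in the sample (including its process id) is hypothetical, since the original session did not show it.

```python
import re

# Thread header as printed by gdb, e.g. "Thread 2 (process 7160):"
THREAD_RE = re.compile(r"^Thread (\d+) \((.*)\):\s*$")
# Frame line, e.g. "#13 0x00e25222 in apr_dso_load (...) at dso/unix/dso.c:139"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def parse_backtraces(text):
    """Return {thread_id: [function names, innermost frame first]}."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return threads


def threads_calling(threads, function):
    """Return the sorted ids of threads whose stack contains `function`."""
    return sorted(tid for tid, frames in threads.items() if function in frames)


# Abbreviated sample. The "Thread 1" header and its process id are
# hypothetical; the frames themselves are taken from the output above.
SAMPLE = """\
Thread 2 (process 7160):
#0 0x00929b7d in mmap () from /lib/ld-linux.so.2
#12 0x00a88d11 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#13 0x00e25222 in apr_dso_load (res_handle=0xbfe587ec, pool=0x905b0a8) at dso/unix/dso.c:139
Thread 1 (process 7161):
#0  0x003a9194 in spin () from /root/web_agents/apache22_agent/lib/libamapc22.so
#1  0x024dedc1 in PR_Select () from /usr/lib/libnspr4.so
"""

if __name__ == "__main__":
    stacks = parse_backtraces(SAMPLE)
    # Which thread was inside apr_dso_load when the core was written?
    print(threads_calling(stacks, "apr_dso_load"))
```

The input text could be captured by running gdb non-interactively, for instance with gdb -batch -ex "thread apply all bt" followed by the program and core file. Searching for a function you expect near the crash (here apr_dso_load) is just one heuristic for spotting the interesting thread; it does not replace reading the full traces.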

I filed bug 6637032 against the Access Manager Policy Agent, and engineering has provided a fix to resolve it.

Comments:

There's an interesting point here. The gdb program did not really resolve the symbol values more correctly than dbx; it resolved them accidentally. When dbx prints <value not available>, it is saying that the value it has may or may not be the right one. This is true for gdb as well. However, gdb treats the value as if it is correct and prints it anyway. While this is arguably misleading, it is also very useful if the user is aware that the value may not be correct, which gdb does not tell the user. Perhaps something in between, where the data is made available with some notation that it is not reliable, would be better than either approach.

Posted by Brian Utterback on December 17, 2007 at 07:02 PM EST #

Hi Brian,

I wasn't aware of the fact that gdb couldn't resolve symbols better than dbx. If this is the case, then it might be a good idea for gdb to inform users that the symbols that it has resolved may not be correct.

At the same time, it would be good if dbx printed what it has resolved along with a warning message. Printing "value not available" is not of much use to the end user anyway.

But luckily for me, this time around, gdb did resolve the symbols correctly which made my job a lot easier.

--Hema

Posted by Hema Huchegowda on December 18, 2007 at 04:03 AM EST #
