DTrace and the case of the slow login...

An interesting little problem came my way earlier on today which yet again let me show just how useful DTrace is. Way back when I did a bit of work with a customer around a small webapp they migrated to Solaris from Linux. Anyway I got a mail from them earlier today, they split out the webserver onto a seperate box a while back, and recently the initial connections to the database tier have gotten very slow, for no obvious reasons.

They were looking into things like large logs and slow writes etc, but nothing obvious was popping up. So we went through a couple of initial steps to see if anything obvious jumped out - the various \*stat tools showed everything running normally so time to reach for DTrace. A quick oneliner showed up exactly where the problem was, and while its a simple fix I thought I'd share it as an example here. The key point here is that this isn't a resource starvation issue, which is when people generally reach for DTrace, rather its a nasty side effect of a simple change that doesn't manifest itself in an obvious manner. Right, enough typing, the oneliner was...

dtrace -n 'syscall:::entry { @sc[ustack()] = count(); } tick-5sec { printa(@sc); }'

Looking at the output the first thing that I spotted were some stacks similar to

			  libc.so.1`_so_send+0x7
              libresolv.so.2`send_dg+0xe5
              libresolv.so.2`res_nsend+0x45b
              libresolv.so.2`res_nquery+0xc9
              libresolv.so.2`ho_byaddr+0x379
              libresolv.so.2`ho_byaddr+0x80
              libresolv.so.2`gethostbyaddr_p+0x8e
              libresolv.so.2`res_gethostbyaddr+0x26
              nss_dns.so.1`_gethostbyaddr+0x21
              nss_dns.so.1`__nss_dns_getbyaddr+0x91
              nss_dns.so.1`getbyaddr+0x1b
              libc.so.1`nss_search+0x17d
              libnsl.so.1`_switch_gethostbyaddr_r+0x71
              libnsl.so.1`_uncached_gethostbyaddr_r+0x2a
              nscd`gethost_lookup+0x3dc
              nscd`0x80541da
and
              libc.so.1`stat64+0x7
              nss_files.so.1`getbygid+0x2e
              libc.so.1`nss_search+0x17d
              libc.so.1`_uncached_getgrgid_r+0x63
              nscd`getgr_lookup+0x3b2
              nscd`0x8054852
              libc.so.1`_thr_setup+0x4e
              libc.so.1`_lwp_start
Hmmm, so nscd is busy. The next question was "have there been any network changes?", they weren't aware of any but on some digging there was some reconfiguration work done recently on the DNS server they use. Anyway the temporary solution that they have put in place until the DNS issues are resolved is to add the new webserver machine directly to /etc/hosts.

Without DTrace the root cause would have been a lot more painful to find, instead this exchange was a couple of e-mails in the background while I got on with the day job. Gotta love DTrace ;).

Technorati Tag(s) : ,
Comments:

Post a Comment:
Comments are closed for this entry.
About

fintanr

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today