DTrace solution for mission-critical applications

Please read Frederic's blog first for a detailed introduction and for how the solution to this tricky problem was identified with DTrace.

In summary, a telephone application of the service and hosting provider Portrix had a problem: after a couple of days of running the application, system time went up and eventually became so high that voice quality suffered. Restarting the application was not a solution; the machine had to be rebooted.

Our first suggestion, and the traditional approach, is to use the Sun profiler. We wanted to get three profiles: a first one at the beginning, a second after some normal load, and a last one when the system time is high. All three profiles can then be loaded into the Sun Analyzer for easy comparison.
This would have given us two things: first, a timeline that would let us see where the unexpectedly high system time is spent, and second, a performance profile that can be used to tune the application.

The latest version of the Sun profiler has an option to attach to a running process and record a profile:

%collect -P <pid> -t 180

This uses dbx to attach, collects data for 180 seconds, and then detaches and finishes the experiment.
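For the three-experiment plan described above, the workflow could look roughly like this (a sketch; the -o experiment names are only illustrative):

% collect -P <pid> -t 180 -o start.er      # shortly after startup
% collect -P <pid> -t 180 -o loaded.er     # after some normal load
% collect -P <pid> -t 180 -o slow.er       # when the system time is high
% analyzer start.er loaded.er slow.er      # compare all three in the Analyzer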

If you are using an older version of the profiler, you can do this through the debugger directly:

dbx
  attach <pid>
  collector enable
  cont
  (leave it running for as long as you want, then stop it with Ctrl-C)
  ^C

Either of these approaches uses the debugger to attach to a running process. The debugger has to stop the process to do this (which is why you see the continue after the attach command). Furthermore, there is another risk: the debugger and profiler may catch or delay signals that are important to the application. For a mission-critical application serving real users, this is completely unacceptable. At best, the application can handle the interruptions and users experience a small delay; at worst, the telephone conversation is cut off.

How can you possibly debug or profile an application without even being able to use a debugger? Well, Solaris has a solution: over 30,000 probes come with the kernel and are waiting to be listened to. The D scripting language is used to enable these probes and to get formatted, readable output.
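A quick way to see how many probes your system actually offers is to list them all and count the lines (the first line of the listing is a header, and the exact count depends on which kernel modules are loaded):

#dtrace -l | wc -l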

The most general way to query a running system is with these dtrace command lines. The first grabs all functions that are called, while the second dives into the call stacks.

#dtrace -n 'profile-10ms{@a[func(arg0)]=count()} END{trunc(@a,100)}'
#dtrace -n 'profile-10ms{@a[stack()]=count()} END{trunc(@a,100)}' 
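Both one-liners run until you interrupt them. If you prefer a fixed-length run, a self-stopping variant (a sketch) could replace the END clause with a tick probe that exits after one minute:

#dtrace -n 'profile-10ms{@a[func(arg0)]=count()} tick-60sec{trunc(@a,100); exit(0)}'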

Start one of these probes, let it run for about a minute, and stop it with Ctrl-C. Firing up the first one, we see the following output:

 unix`mutex_vector_enter                                         629
 unix`default_lock_delay                                         770
 SUNW,UltraSPARC-T1`cpu_mutex_delay                              889
 unix`cas_delay                                                 1214
 zaptel`process_timers                                          6242
 unix`cpu_halt                                                 40277 

The second one looks into the call stacks of the functions; at the end of its output we see:

zaptel`__zt_process_getaudio_chunk+0xfbc
zaptel`__zt_transmit_chunk+0x70
zaptel`zt_receive+0xd7c
ztdummy`ztdummy_timer+0x30
genunix`cyclic_softint+0xa4
unix`cbe_level1+0x8
unix`intr_thread+0x168
60

SUNW,UltraSPARC-T1`bcopy+0x68c
zaptel`__zt_getbuf_chunk+0x20c
zaptel`__zt_transmit_chunk+0x28
zaptel`zt_receive+0xd7c
ztdummy`ztdummy_timer+0x30
genunix`cyclic_softint+0xa4
unix`cbe_level1+0x8
unix`intr_thread+0x168
96

From this listing you can already see that the ztdummy driver is doing strange things and that it is the culprit behind the high system time. This also explains why restarting the application did not help: the driver was not restarted.
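To dig a little deeper into the suspect driver, one option (a sketch; the module name ztdummy is taken from the stacks above) is to count how often each of its functions is entered, using the fbt provider:

#dtrace -n 'fbt:ztdummy::entry{@[probefunc]=count()}'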

The problem with the dtrace command lines above is that they sample across the entire system, in both user and kernel space, with no restriction to a PID, program, user, or CPU. When the profile probe fires, either arg0 or arg1 is set: arg0 is non-zero when the sampled code is in the kernel, arg1 when it is in user space. Typically you want to narrow the sampling down to a PID or at least an execname (pid == $1, execname == "myprog", etc.). You can do this by adding a predicate such as /cpu==6/ or /pid==1234/ before the action block, and by passing either arg0 or arg1 to the func() calls.

dtrace -n 'profile-10ms/cpu==6/{@a[func(arg0)]=count()} END{trunc(@a,100)}'
dtrace -n 'profile-10ms/pid==1234/{@a[func(arg0)]=count()} END{trunc(@a,100)}'
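The same narrowing also works by program name; a user-space variant could look like this (a sketch; "myprog" is a placeholder for your executable's name):

dtrace -n 'profile-10ms/execname=="myprog" && arg1/{@a[ufunc(arg1)]=count()} END{trunc(@a,100)}'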

Or you can use the following script (kindly provided by Jim Fiori), which restricts the profiling to a single PID.

# cat uprof.d
#!/usr/sbin/dtrace -qs

BEGIN
{
        interval = $2 - 1;
}

profile-997
/arg1 && pid == $1/
{
        @s1[ustack(5)] = count();
        @s2[ufunc(arg1)] = count();
        s1total++;
}

tick-1sec
/--interval <= 0/
{
        printf("\nHottest stacks...\n");
        trunc(@s1, 30);
        printa(@s1);
        printf("\nHottest functions...\n");
        trunc(@s2, 30);
        printa(@s2);
        printf("\nTOTAL samples %d\n", s1total);
        exit(0);
}

# chmod 755 uprof.d
# ./uprof.d <pid> <#seconds>

There is another tool in the Sun Studio warchest, project D-Light, that allows you to attach to a running process and observe whatever attributes you want. Under the covers it uses DTrace, so anything you can collect with DTrace can be shown in D-Light. Since D-Light is graphical (think of the timeline view that the Analyzer has), the best DTrace data to display has time on the X-axis and the data you want to observe on the Y-axis. Here is a whitepaper that gives an overview of this.
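For example, a simple per-second time series of this kind (a sketch; it just counts system calls across the whole system each second) is the sort of data that plots naturally with time on the X-axis:

#dtrace -qn 'syscall:::entry{@n = count()} tick-1sec{printa("%@d\n", @n); clear(@n)}'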
