Friday Jan 23, 2009

DTrace at its best!

Using DTrace's destructive actions, you can do things to your operating system you never thought of before, such as backing up files with a ZFS snapshot right before a user deletes them, or halting any process just before it exits.

With the help of the files in the directory /usr/demo/dtrace, the DTrace Toolkit, and my colleagues, who showed me the best probe for intervening before a process really ends, I wrote and tested the following short DTrace script to halt certain processes just before they exit. It can be very useful if you encounter a large number of short-lived processes that you could not analyze otherwise. In the following example, we are looking only for processes running with user ID 4 (user name adm) and with the executable name date. I saved it as stopper.d.

#!/usr/sbin/dtrace -ws

syscall::rexit:entry
/(execname == "date") && uid == 4/
{
   printf ("%d(%d): %d %d %d %d, %s, >%s<: %Y", pid, ppid, uid,
     curpsinfo->pr_projid, curpsinfo->pr_zoneid,
     curpsinfo->pr_dmodel, cwd, curpsinfo->pr_psargs, walltimestamp);
/*   stack(); */
/*   ustack(); */
/*   system ("pmap -x %d", pid); */
   printf ("\nStopping Process %d ...", pid);
   stop();
   printf (" done.");
   system ("ps -eo user,pid,ppid,s,zone,projid,pri,class,nice,args | nawk '$2==\"%d\"{print}'", pid);
}

Be warned! Adapt the filter rules carefully on a test system before using the script on the system on which you want to halt processes! Use the script at your own risk; I cannot guarantee anything!

For listing the stopped processes, you can use the following command:

$ ps -eo user,pid,ppid,s,zone,projid,pri,class,nice,args | \
   nawk '$4=="T" && /date/{print}'

And for ending these stopped processes, you can use this one:

$ ps -eo user,pid,ppid,s,zone,projid,pri,class,nice,args | \
   nawk '$4=="T" && /date/{system ("kill -9 "$2)}'
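
Note that the /date/ pattern in both pipelines matches anywhere on the line, so a stopped process owned by a user named, say, update, or one that merely has date in an argument, would be selected as well. A slightly stricter filter could look like the sketch below. It assumes the column order of the ps -eo call above (state in field 4, command in field 10) and a POSIX shell; the function name stopped_pids is made up for illustration, and on Solaris 10 you would substitute nawk for awk.

```shell
# stopped_pids NAME
# Reads the output of
#   ps -eo user,pid,ppid,s,zone,projid,pri,class,nice,args
# on stdin and prints the PID of every process in state T whose command
# (field 10, with any directory part removed) is exactly NAME.
stopped_pids() {
    awk -v name="$1" '
        $4 == "T" {
            cmd = $10
            sub(/.*\//, "", cmd)      # /usr/bin/date -> date
            if (cmd == name) print $2
        }'
}
```

Piping the ps output through stopped_pids date lists only the matching PIDs, which you can review before handing them to kill.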

For testing, I created a script named start-50-date-processes.ksh to start 50 date processes at roughly the same time. I first started the DTrace script above as user root:

./stopper.d

and afterwards started the test script as user adm:

./start-50-date-processes.ksh

Sample output looks like this:

$ ./stopper.d 
dtrace: script './stopper.d' matched 1 probe
dtrace: allowing destructive actions
CPU     ID  FUNCTION:NAME
  1   3413  rexit:entry 21922(5058): 4 3 0 1, /var/adm/bin, >date<: 2009 Jan 23 13:47:37
Stopping Process 21922 ... done.
     adm 21922  5058 T   global     3  57   IA 24 date

  1   3413  rexit:entry 22005(5058): 4 3 0 1, /var/adm/bin, >date<: 2009 Jan 23 13:47:41
Stopping Process 22005 ... done.
     adm 22005  5058 T   global     3  47   IA 24 date

  0   3413  rexit:entry 22090(22089): 4 3 0 1, /var/adm/bin, >date<: 2009 Jan 23 13:47:56
Stopping Process 22090 ... done.
     adm 22090     1 T   global     3  44   IA 20 date
... (some more lines)

To stop the DTrace script, just press Ctrl-C in the window where you started it. Stopping the script does not let the halted processes continue: they remain in the "T" (stopped/traced) state until they are killed, or resumed with prun(1).

The DTrace script will run faster if you comment out the final system() call, which executes the ps command for each stopped process.

And here's the test script (for starting 50 date processes) which I executed as user adm:

$ cat start-50-date-processes.ksh
#!/bin/ksh
i=50
while [[ i -gt 0 ]]; do
   date &
   (( i = i - 1 ))
# or (( i-- )) with ksh93 on Solaris 10, or OpenSolaris
done
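
To vary the load, the count and the command can be made parameters. The following is only a sketch, not the script that produced the output above: the function name start_n is hypothetical, a POSIX shell such as ksh or bash is assumed, and the children's output is discarded so that only their PIDs are printed, which makes it easy to compare them with the PIDs reported by stopper.d. It deliberately does not call wait, because processes halted by stop() do not exit on their own.

```shell
# start_n COUNT COMMAND [ARGS...]
# Starts COUNT copies of COMMAND in the background and prints one PID per line.
start_n() {
    n=$1
    shift
    while [ "$n" -gt 0 ]; do
        "$@" >/dev/null 2>&1 &    # run the command in the background
        echo "$!"                 # PID of the process just started
        n=$((n - 1))
    done
}
```

Run as user adm, start_n 50 date would correspond to the test above.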

Wednesday Dec 31, 2008

A great start to the new year 2009!

I am sure you agree - at least after watching the video on Brendan's blog entry about I/O performance analysis on the Sun Storage 7000 series. The direct link is here.

Happy New Year to everyone!

About

blogfinger
