Thursday Nov 12, 2009

The Kings of Computing use dtrace!

I've said many times that dtrace is not just a wonderful tool for developers and performance gurus. The Kings of Computing, which are of course System Admins, also find it really useful.

There is an ancient version of make called Parallel make that occasionally suffers from a bug (1223984) where it gets into a loop like this:

waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED)	Err#10 ECHILD
alarm(0)					= 30
alarm(30)					= 0
waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED)	Err#10 ECHILD
alarm(0)					= 30
alarm(30)					= 0
waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED)	Err#10 ECHILD

This will then consume a CPU and the users CPU shares. The application is never going to be fixed so the normal advice is not to use it. However since it can be NFS mounted from anywhere I can't reliably delete all copies of it so occasionally we will see run away processes on our build server.

It turns out this is a snip to fix with dtrace. Simply look for cases where the wait system call returns an error and errno is set to ECHILD (10) and if that happens 10 times in a row for the same process and that process does not call fork then stop the process.

The script is simple enough for me to just do it on the command line:


# dtrace -wn 'syscall::waitsys:return / arg1 <= 0 && 
execname == "make.bin" && errno == 10  && waitcount[pid]++ > 20 / {

	stop();

	printf("uid %d pid %d", uid, pid) }

syscall::forksys:return / arg1 > 0 / { waitcount[pid] = 0 }'
dtrace: description 'syscall::waitsys:return ' matched 2 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  2  20588                   waitsys:return uid 36580 pid 29252
  3  20588                   waitsys:return uid 36580 pid 2522
  5  20588                   waitsys:return uid 36580 pid 28663
  7  20588                   waitsys:return uid 36580 pid 29884
 10  20588                   waitsys:return uid 36580 pid 941
 15  20588                   waitsys:return uid 36580 pid 1098

This was way easier then messing around with prstat, truss and pstop!

Wednesday Jan 28, 2009

Cardiff System Administration Mash up

Big thank you Clive for arranging and Cardiff University for hosting the System Admin Mash up today. I was paritcularly pleased to have 100% of the Cardiff OpenSolaris User group present and to hear of people exploring COMSTAR on Thumpers to produce high performance OpenStorage at a very affordable price.

Lewis gave an excellent overview of the VSCAN & CIFS server with some brave demostrations using VirtualBox to host servers and clients. If you can persuade him to repeat it then do so.

However good though that was, for me, that was not the highlight of the day. That goes to Gwent Police Crime Forensics Unit who gave a excellent and and informative presentation about the challenges and successes of investigating issues around computer forensics. As storage devices get bigger the problems can only increase.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today