a new view into software

As Bryan has observed the past, software has a quality unique to engineering disciplines in that you can build it, but you can't see it. DTrace changes that by opening windows into parts of the system that were previously unobservable and it does so in a way that minimally changes what you're attempting to observe -- this software "uncertainty principle" has limited the utility of previous observability tools. One of the darkest areas of debugging in user-land has been around lock contention.

In multi-threaded programs synchronization primitives -- mutexes, R/W locks, semaphores, etc. -- are required to coordinate each thread's efforts and make sure shared data is accessed safely. If many threads are kept waiting while another thread owns a sychronization primitive, the program is said to suffer from lock contention. In the kernel, we've had lockstat(1m) for many years, but in user-land, the techniqes for observing lock behavior and sorting out the cause or even the presence have been very ad hoc

the plockstat provider

I just finished work on the plockstat provider for DTrace as well as a new plockstat(1m) command for observing user-land synchronization objects. If you're unfamiliar with DTrace, you might want to take a quick look at the Solaris Dynamic Tracing Guide (look through it for some examples); that will help ground some of this explanation.

The plockstat provider has these probes:

mutex-acquirefires when a mutex is acquired
mutex-releasefires when a mutex is released
mutex-blockfires when a thread blocks waiting for a mutex
mutex-spinfires when a thread spins waiting for a mutex
rw-acquirefires when an R/W lock is acquired
rw-releasefires when an R/W lock is released
rw-blockfires when a thread blocks waiting for an R/W lock

It's possible with other tools to observe these points, but -- as anyone who's tried it can attest -- other tools can alter the effects you're trying to observe. Traditional debuggers can effectively serialize your parallel program removing any trace of the lock contention you'd see during a normal run. DTrace and the plockstat provider avoid eliminate this problem.

With the plockstat provider you can answer questions that were previously very difficult to solve, such as "where is my program blocked on mutexes":

bash-2.05b# dtrace -n plockstat1173:::mutex-block'{ @[ustack()] = count() }'
dtrace: description 'plockstat1173:::mutex-block' matched 2 probes
\^C
              libc.so.1`mutex_lock_queue+0xa9
              libc.so.1`slow_lock+0x3d
              libc.so.1`mutex_lock_impl+0xec
              libc.so.1`mutex_lock+0x38
              libnspr4.so`PR_Lock+0x1a
              libnspr4.so`PR_EnterMonitor+0x35
              libxpcom.so`__1cGnsPipePGetWriteSegment6MrpcrI_I_+0x3e
              libxpcom.so`__1cSnsPipeOutputStreamNWriteSegments6MpFpnPnsIOutputStream_pvpcIIpI_I3I5_I_+0x4f
              c4654d3c
              libxpcom.so`__1cUnsThreadPoolRunnableDRun6M_I_+0xb0
              libxpcom.so`__1cInsThreadEMain6Fpv_v_+0x32
              c4ec1d6a
              libc.so.1`_thr_setup+0x50
              libc.so.1`_lwp_start
                1

(any guesses as to what program this might be?)

Not just a new view for DTrace, but a new view for user-land.

the plockstat(1m) command

DTrace is an incredibly powerful tool, but some tasks are so common that we want to make it as easy as possible to use DTrace's facilities without knowing anything about DTrace. The plockstat(1m) command wraps up a bunch of knowledge about lock contention in a neat and easy to use package:

# plockstat -s 10 -A -p `pgrep locker`
\^C
Mutex block

-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   13 22040260 locker`lock1                 locker`go_lock+0x47

      nsec ---- Time Distribution --- count Stack
     65536 |@@@@@@@@@@@@@@          |     8 libc.so.1`mutex_lock+0x38
    131072 |                        |     0 locker`go_lock+0x47
    262144 |@@@@@                   |     3 libc.so.1`_thr_setup+0x50
    524288 |                        |     0 libc.so.1`_lwp_start
   1048576 |                        |     0 
   2097152 |                        |     0 
   4194304 |                        |     0 
   8388608 |                        |     0 
  16777216 |@                       |     1 
  33554432 |                        |     0 
  67108864 |                        |     0 
 134217728 |                        |     0 
 268435456 |@                       |     1 
  ...

This has been a bit of a teaser. I only integrated plockstat into Solaris 10 yesterday and it will be a few weeks before you can access plockstat as part of the Solaris Express program, but keep an eye on the DTrace Solaris Express Schedule.

Comments:

I can't wait to use it! Great work!

Posted by Robert on August 31, 2004 at 05:57 AM PDT #

[Trackback] Ich kenne das Problem, in Programmen nach Fehler zu suchen, ja auch schon zur Gen�ge. Ich sitze beim Kunden, eine vom Kunden pr�ferierte Version eines Opensource-Tools funktioniert nicht richtig, und dann steht man erstmal da. Das Fehlersuchen ist dann...

Posted by c0t0d0s0.org on September 06, 2004 at 09:03 PM PDT #

Post a Comment:
Comments are closed for this entry.
About

Adam Leventhal, Fishworks engineer

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today