X

a new view into software

Guest Author

As Bryan has observed the past, software has a quality unique to engineering disciplines in that you can build it, but you can't see it. DTrace changes that by opening windows into parts of the system that were previously unobservable and it does so in a way that minimally changes what you're attempting to observe -- this software "uncertainty principle" has limited the utility of previous observability tools. One of the darkest areas of debugging in user-land has been around lock contention.

In multi-threaded programs synchronization primitives -- mutexes, R/W locks, semaphores, etc. -- are required to coordinate each thread's efforts and make sure shared data is accessed safely. If many threads are kept waiting while another thread owns a sychronization primitive, the program is said to suffer from lock contention. In the kernel, we've had lockstat(1m) for many years, but in user-land, the techniqes for observing lock behavior and sorting out the cause or even the presence have been very ad hoc

the plockstat provider

I just finished work on the plockstat provider for DTrace as well as a new plockstat(1m) command for observing user-land synchronization objects. If you're unfamiliar with DTrace, you might want to take a quick look at the Solaris Dynamic Tracing Guide (look through it for some examples); that will help ground some of this explanation.

The plockstat provider has these probes:
























mutex-acquirefires when a mutex is acquired
mutex-releasefires when a mutex is released
mutex-blockfires when a thread blocks waiting for a mutex
mutex-spinfires when a thread spins waiting for a mutex
rw-acquirefires when an R/W lock is acquired
rw-releasefires when an R/W lock is released
rw-blockfires when a thread blocks waiting for an R/W lock

It's possible with other tools to observe these points, but -- as anyone who's tried it can attest -- other tools can alter the effects you're trying to observe. Traditional debuggers can effectively serialize your parallel program removing any trace of the lock contention you'd see during a normal run. DTrace and the plockstat provider avoid eliminate this problem.

With the plockstat provider you can answer questions that were previously very difficult to solve, such as "where is my program blocked on mutexes":

bash-2.05b# dtrace -n plockstat1173:::mutex-block'{ @[ustack()] = count() }'
dtrace: description 'plockstat1173:::mutex-block' matched 2 probes
\^C
libc.so.1`mutex_lock_queue+0xa9
libc.so.1`slow_lock+0x3d
libc.so.1`mutex_lock_impl+0xec
libc.so.1`mutex_lock+0x38
libnspr4.so`PR_Lock+0x1a
libnspr4.so`PR_EnterMonitor+0x35
libxpcom.so`__1cGnsPipePGetWriteSegment6MrpcrI_I_+0x3e
libxpcom.so`__1cSnsPipeOutputStreamNWriteSegments6MpFpnPnsIOutputStream_pvpcIIpI_I3I5_I_+0x4f
c4654d3c
libxpcom.so`__1cUnsThreadPoolRunnableDRun6M_I_+0xb0
libxpcom.so`__1cInsThreadEMain6Fpv_v_+0x32
c4ec1d6a
libc.so.1`_thr_setup+0x50
libc.so.1`_lwp_start
1


(any guesses as to what program this might be?)

Not just a new view for DTrace, but a new view for user-land.

the plockstat(1m) command

DTrace is an incredibly powerful tool, but some tasks are so common that we want to make it as easy as possible to use DTrace's facilities without knowing anything about DTrace. The plockstat(1m) command wraps up a bunch of knowledge about lock contention in a neat and easy to use package:

# plockstat -s 10 -A -p `pgrep locker`
\^C
Mutex block
-------------------------------------------------------------------------------
Count nsec Lock Caller
13 22040260 locker`lock1 locker`go_lock+0x47
nsec ---- Time Distribution --- count Stack
65536 |@@@@@@@@@@@@@@ | 8 libc.so.1`mutex_lock+0x38
131072 | | 0 locker`go_lock+0x47
262144 |@@@@@ | 3 libc.so.1`_thr_setup+0x50
524288 | | 0 libc.so.1`_lwp_start
1048576 | | 0
2097152 | | 0
4194304 | | 0
8388608 | | 0
16777216 |@ | 1
33554432 | | 0
67108864 | | 0
134217728 | | 0
268435456 |@ | 1
...

This has been a bit of a teaser. I only integrated plockstat into Solaris 10 yesterday and it will be a few weeks before you can access plockstat as part of the Solaris Express program, but keep an eye on the DTrace Solaris Express Schedule.

Join the discussion

Comments ( 2 )
  • Robert Tuesday, August 31, 2004
    I can't wait to use it!
    Great work!
  • c0t0d0s0.org Tuesday, September 7, 2004
    [Trackback] Ich kenne das Problem, in Programmen nach Fehler zu suchen, ja auch schon zur Gen�ge. Ich sitze beim Kunden, eine vom Kunden pr�ferierte Version eines Opensource-Tools funktioniert nicht richtig, und dann steht man erstmal da. Das Fehlersuchen ist dann...
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services