Deadman Walking


Within Sun we consider the concept of observability to be of the utmost importance, as can be seen by the plethora of tools bundled within OpenSolaris and noted in the Observability Community on opensolaris.org. This belief is best summed up on the observability page with the phrase "If you cannot see the problem, you cannot fix it".

This belief in observability extends to postmortem analysis as well (see dumpadm(1M)). Unfortunately, at times development builds can "just hang" [tm], leaving you in somewhat of a quandary as to what you should do. Thankfully, Solaris provides a way to still get a crash dump from this kind of situation via Deadman.

Enabling Deadman

To enable deadman, first ensure you are capturing crash images with dumpadm(1M), and then just add
set snooping=1
to /etc/system. Since /etc/system is read at boot, the setting takes effect after the next reboot. Note that any zones on your system will inherit the deadman setting as well.

Slightly More Gory Details

Deadman is a high-level cyclic (see the cyclics subsystem) that monitors the kernel lbolt variable, which in turn is incremented every time the clock interrupt fires. The deadman code is executed once a second, and once it notices that lbolt hasn't changed for snoop_interval / MICROSEC consecutive checks (the count is set up in deadman_init), it causes a panic.
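
To make the countdown a little more concrete, here is a rough user-space model of the logic described above. It is only a sketch in Python, not the kernel code: the Deadman class, its check method and the first_panic helper are names made up for illustration, and the three second interval is picked purely to keep the example short.

```python
MICROSEC = 1_000_000  # microseconds per second


class Deadman:
    """Hypothetical user-space model of the deadman countdown (not kernel code)."""

    def __init__(self, snoop_interval):
        # Number of once-a-second checks that lbolt may stay unchanged.
        self.deadman_seconds = snoop_interval // MICROSEC
        self.countdown = self.deadman_seconds
        self.last_lbolt = None

    def check(self, lbolt):
        """Run one check; return True if this check would trigger a panic."""
        if lbolt != self.last_lbolt:
            # The clock interrupt has fired since the last check: start over.
            self.last_lbolt = lbolt
            self.countdown = self.deadman_seconds
            return False
        # lbolt is stuck: use up one more second of the allowance.
        self.countdown -= 1
        return self.countdown <= 0


def first_panic(samples, snoop_interval):
    """Return the index of the check that would panic, or None if none does."""
    deadman = Deadman(snoop_interval)
    for second, lbolt in enumerate(samples):
        if deadman.check(lbolt):
            return second
    return None


if __name__ == "__main__":
    # lbolt stops advancing after the third sample (illustrative values).
    hung = [100, 200, 300, 300, 300, 300]
    print(first_panic(hung, 3 * MICROSEC))
```

In this model the countdown is reset whenever lbolt moves, so a system that stalls briefly and then recovers never reaches the panic; only an unbroken run of unchanged samples does.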
About

fintanr
