The DTrace deadman mechanism

In the last few weeks I've been asked about the DTrace deadman mechanism a few times, so I thought I'd put up a bit of an explanation of what's going on here. Actually, I'm not usually asked about the mechanism directly but rather about the end state that the consumer ends up in. If you've seen the following from a consumer then you've encountered the deadman mechanism:

# dtrace -n 'syscall:::entry{@[probefunc] = count();}'
dtrace: description 'syscall:::entry' matched 232 probes
dtrace: processing aborted: Abort due to systemic unresponsiveness
Firstly, why do we have such a mechanism? Well, whilst not a common occurrence, it is possible for a consumer to construct enablings which would cause the system to become unresponsive. In such a situation, rather than leave the system unusable, we would rather abort the consumer that looks to have induced the condition we are in. DTrace has two separate criteria that it uses as indicators of system instability:
  • A per-consumer in-kernel timer, which should fire once a second, must have fired at least once in the last 10 seconds.
  • A DTrace consumer, which usually performs a status check every second, must have checked in within the last 30 seconds.
If we hit either of these conditions then no further processing is done on the consumer and it is aborted. So, what can be done if you are, for whatever reason, activating the deadman mechanism? Well, the first thing to point out is that it may not be DTrace's fault that this is happening. If the system is very heavily loaded or if there is a bug out there that is inducing one of the above conditions then you may get aborted when your enablings are fine. Still, if you are hitting the deadman and you are reasonably confident that you haven't done anything too crazy in your D code, you can disable it simply by enabling destructive actions. This can be done in two ways:
  • Use the #pragma D option destructive directive in your D script.
  • Specify the -w flag to dtrace(1M).
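For instance, the one-liner from the top of this post could be run with the deadman disabled by folding the pragma into a script; a sketch:

```d
#!/usr/sbin/dtrace -s

#pragma D option destructive

syscall:::entry
{
        @[probefunc] = count();
}
```

The command-line equivalent is dtrace -w -n 'syscall:::entry{@[probefunc] = count();}'. Bear in mind that enabling destructive actions to sidestep the deadman also permits genuinely destructive actions in your D code, so use it with care.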
You can also try tuning out the deadman by altering the kernel variables which control the mechanism. These are:
  • dtrace_deadman_timeout - Specifies the time period in which the per-consumer in-kernel timer must fire. The default value is 10000000000 nanoseconds (10 secs). Tuning up this variable with mdb(1) is your best starter for ten.
  • dtrace_deadman_user - specifies the maximum period we allow to pass without the user-land consumer checking-in. The default value is 30000000000 nanoseconds (30 secs).
  • dtrace_deadman_interval - specifies the frequency of the timer where we evaluate and update state used to determine whether our two conditions specified above have been satisfied. The default value is 1000000000 nanoseconds (1 sec).
Changes to dtrace_deadman_interval must be done before a consumer is started but changes to dtrace_deadman_timeout and dtrace_deadman_user can be done whilst a consumer is active.
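As a purely illustrative sketch (the value here is my own choice, not a recommendation), the deadman timeout could be raised to 60 seconds on a live system with mdb(1) in kernel write mode; /Z writes an 8-byte value, so this assumes a 64-bit kernel:

```shell
# Raise dtrace_deadman_timeout to 60 seconds (0xdf8475800 == 60000000000 ns).
echo "dtrace_deadman_timeout/Z 0xdf8475800" | mdb -kw
```

As noted above, this particular variable can be changed whilst a consumer is active, so you can experiment without restarting your enablings.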

How do you know which condition you are hitting when you invoke the deadman mechanism? Well, that's a tough one as there isn't a particularly nice mechanism in place to do that. To do this we need to use mdb(1) and have a knowledge of the source in this area... When a consumer first starts we set up a low level cyclic that fires every dtrace_deadman_interval nanoseconds (this is done when the consumer's state is activated in dtrace_state_go()). So, by default, we end up calling the dtrace_state_deadman() function once a second. A slightly reduced version of this function is:
static void
dtrace_state_deadman(dtrace_state_t *state)
{
        hrtime_t now;

        now = dtrace_gethrtime();

        if (state != dtrace_anon.dta_state &&
            now - state->dts_laststatus >= dtrace_deadman_user)
                return;

        /*
         * Store INT64_MAX first, with a barrier between the stores, so
         * that dts_alive never appears to be less than its true value
         * to a racing observer.
         */
        state->dts_alive = INT64_MAX;
        dtrace_membar_producer();
        state->dts_alive = now;
}
In the above we first check whether this is a non-anonymous consumer whose state->dts_laststatus hasn't been updated in the last dtrace_deadman_user nanoseconds; if so, we return without touching dts_alive. The consumer's dts_laststatus state member is updated once a second (by default with dtrace(1M)) when the consumer calls dtrace_work(), which in turn asks for the status of the consumer by calling dtrace_status(). If the consumer has checked in during the last 30 seconds we update the dts_alive state member with the current nanosecond timestamp. This is important...

When the consumer fires a probe we get punted into dtrace_probe() which is where the action really happens as we execute the following sequence:
                if (now - state->dts_alive > dtrace_deadman_timeout) {
                        /*
                         * We seem to be dead.  Unless we (a) have kernel
                         * destructive permissions (b) have explicitly enabled
                         * destructive actions and (c) destructive actions have
                         * not been disabled, we're going to transition into
                         * the KILLED state, from which no further processing
                         * on this state will be performed.
                         */
                        if (!dtrace_priv_kernel_destructive(state) ||
                            !state->dts_cred.dcr_destructive ||
                            dtrace_destructive_disallow) {
                                void *activity = &state->dts_activity;
                                dtrace_activity_t current;

                                do {
                                        current = state->dts_activity;
                                } while (dtrace_cas32(activity, current,
                                    DTRACE_ACTIVITY_KILLED) != current);
                        }
                }

This is the code that makes the actual decision to abort the consumer. Notice that the decision is based solely upon the dts_alive state member having not been updated in the previous dtrace_deadman_timeout nanoseconds (10 secs by default). This can happen either because the once-a-second cyclic hasn't fired at all for 10 secs or because the consumer hasn't checked in for the last 30 seconds. The astute reader may note that for the case where the consumer hasn't checked its status we actually end up waiting a time period of dtrace_deadman_user+dtrace_deadman_timeout before we abort the client (which would be 40 secs'ish by default from when the consumer last checked in). We can use the above knowledge to possibly figure out what is going on. Using mdb(1) we can inspect the dts_laststatus, dts_alive and dts_activity members of the consumer's dtrace_state structure. A simple script to do this, which assumes you only have a single consumer running, is:

while true
do
        date
        sleep 1
        echo "::walk dtrace_state | ::print dtrace_state_t dts_laststatus dts_alive dts_activity" | mdb -k
done
If dts_laststatus is not increasing then the consumer isn't running. If dts_alive is not increasing then either the cyclic isn't running or the consumer hasn't checked in within the last dtrace_deadman_user nanoseconds - in the latter case both dts_laststatus and dts_alive would be constant. When dts_alive hasn't been updated in dtrace_deadman_timeout nanoseconds the dts_activity member is transitioned to the DTRACE_ACTIVITY_KILLED state. Here is an example of these fields when the consumer is transitioned into the KILLED state owing to a consumer that's not checking in as it should do:
Mon May 21 11:39:29 BST 2007
dts_laststatus = 0x15521d81431
dts_alive = 0x15558b8b842
dts_activity = 2 (DTRACE_ACTIVITY_ACTIVE)
Mon May 21 11:39:32 BST 2007
dts_laststatus = 0x15521d81431
dts_alive = 0x15558b8b842
dts_activity = 6 (DTRACE_ACTIVITY_KILLED)
Finally, you will have noted in the block of code above the checks that are done to see whether or not we should actually carry on and abort the consumer. The three checks carried out are:
  • Does the consumer have the privilege to bypass the deadman? Only a consumer run by the root user can avoid the deadman timer.
  • Have destructive actions been explicitly enabled?
  • Have destructive actions been globally disabled by setting the dtrace_destructive_disallow variable to 1?

There may well be some bits that I've missed out of the above so let me know if I have. Also, if you are seeing behaviour that doesn't fit the above then feel free to contact me directly or just mail the mailing list.

