Wednesday Jul 09, 2008

Getting dtrace to exit with the same status as the traced program

Today's dtrace one-liner is how do you get dtrace to exit with the same exit code as the program you are tracing so you can use it in a script?

There are going to be many ways to solve this. Mine is simply to include a probe for the exit system call and then pass arg0 of the exit system call to the dtrace exit() function:

while dtrace -n 'syscall::\*exit:entry /pid == $target / { trace(arg0); exit(arg0) } fbt:autofs::return / pid == $target / { stack(5); trace(arg1) }' -c 'ls -d /home/cjg'
do
umountall -F nfs
svcadm restart autofs
done

Simple and allows the test to run and run until the problem is seen.

Thursday Jul 03, 2008

Dtracing gethostbyname_r

Today's dtrace “one liner” is to dig into why a process that is failing when a call to gethostbyname_r(). Now if the application reported all that was going on this would not be needed, meanwhile in the real world we are lucky enough to have dtrace. To test it out I'm tracing the getent(1) command. In real life it would use the -p flag to dtrace and the process ID of the daemon.

/usr/sbin/dtrace -n 'pid$target::gethostbyname_r:entry {
        self->name = arg0;
}
pid$target::gethostbyname_r:return / arg1 == 0 / {
        ustack(5);
}
pid$target::gethostbyname_r:return / self->name / {
        printf("%s", copyinstr(self->name));
        self->name = 0
}' -c "getent hosts xxxxxdredd"
dtrace: description 'pid$target::gethostbyname_r:entry ' matched 3 probes
dtrace: pid 4748 has exited
CPU     ID                    FUNCTION:NAME
  2  48759           gethostbyname_r:return 
              libnsl.so.1`gethostbyname_r+0xc4
              getent`dogethost+0x54
              getent`main+0x7c
              getent`_start+0x108

  2  48759           gethostbyname_r:return xxxxxdredd

It would be nice to be able to get the h_errno value as well but so far I've not managed that.

Update:

Jon Haslem kindly explained to me the subtleties of copyin() so that I can get the h_errno value.

/usr/sbin/dtrace -Zn 'pid$target::gethostbyname_r:entry {
        self->name = arg0; 
        self->errno = arg4; 
}
pid$target::gethostbyname_r:return / arg1 == 0 / {
        ustack(5);
        printf("%d %s h_errno %x", pid,
                copyinstr(self->name), 
        \*(int \*)copyin(self->errno,sizeof(int)));
}' -c "getent hosts xxxxxdredd"
dtrace: description 'pid$target::gethostbyname_r:entry ' matched 2 probes
dtrace: pid 5087 has exited
CPU     ID                    FUNCTION:NAME
  2  48764            gethostbyname_r:entry errno: d4220008
  2  48765           gethostbyname_r:return 
              libnsl.so.1`gethostbyname_r+0xc4
              getent`dogethost+0x54
              getent`main+0x7c
              getent`_start+0x108
5087 xxxxxdredd h_errno 1

# 

Wednesday Jul 02, 2008

What is the maximum number of commands queued to a LUN

This is not quite a one liner as I'm reusing the code from a previous post to print out the devices in a human readable form other wise it is just a one liner and was when I typed it in.

The question posed here was what is the maximum number of commands sent to a LUN at any one time? Clearly this will max out at the throttle for the device however what was interesting, since the customer had already tuned the throttle down and the problem had gone away was what was their configuration capable of sending to the LUN:

#!/usr/sbin/dtrace -qCs

#define SD_TO_DEVINFO(un) ((struct dev_info \*)((un)->un_sd->sd_dev))

#define DEV_NAME(un) \\
        stringof(`devnamesp[SD_TO_DEVINFO(un)->devi_major].dn_name) /\* ` \*/

#define DEV_INST(un) (SD_TO_DEVINFO(un)->devi_instance)

fbt:\*sd:\*sd_start_cmds:entry  { @[DEV_NAME(args[0]),DEV_INST(args[0])] = max(arg
s[0]->un_ncmds_in_driver) }
END {
        printa("%s%d %@d\\n", @);
}


This produces a nice list of disk devices and the maximum number of commands that have been sent to them at anyone time:

# dtrace -qCs  /var/tmp/max_sd.d -n 'tick-5sec { exit(0) }'
sd2 1
sd0 70

# 

Combine that with the dscript from the latency bubble posting earlier and you can drill down on where your IO is waiting.

Tuesday Jul 01, 2008

Today's Dtrace one liner

Todays dtrace one liner is part of a case investigating why messages are not making it into the messages file. Using the divide and concur priciple the first question you need to answer is: “Is the process that is supposed to generate the messages calling into syslog?”

$ dtrace -n 'pid$target::syslog:entry { printf("%d %s", arg0, copyinstr(arg1)) }' -p $(pgrep xxxx)
dtrace: description 'pid$target::syslog:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  1  43227                     syslog:entry 5 %s
\^C

That is enough to answer the first question. You can get all flash and pull out the string passed in as the second argument but then it is not a one liner and is answering a different question. However it is neat so here it is:

$ dtrace -qn 'pid$target::syslog:entry { printf("%d %s", arg0, (this->arg1 = copy
instr(arg1))) }
pid$target::syslog:entry / this->arg1 == "%s"/ { printf(" %s\\n", copyinstr(arg2))
}' -p $(pgrep xxxx)

How about varargs and vsprintf for dtrace....

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today