Tuesday Nov 18, 2008

Even More DTrace Lab Answers (9 & 10)

OK, let's get the rest out.

Exercise 9

List the processes that are connecting to a specific port.

OK, we'll take that to mean we want to know the name and the process ID of connections to a particular port. This requires just a little network programming knowledge. We can see that the second argument (arg1) to connect(3SOCKET) is a sockaddr. The extra knowledge that we need is that to get a port, we need to cast it to a sockaddr_in and look at the port structure element. This bit of code expects the command line argument 1 to be a port number.

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::connect:entry {
	this->sock = (struct sockaddr_in \*)copyin(arg1, arg2);
	self->port = this->sock->sin_port;
syscall::connect:entry /self->port == $1/ {
	 printf("%5d %s\\n", pid, execname); }

syscall::connect:entry { self->port = 0; }

Exercise 10

Write a script to show where a specific system call, rename for example, is blocking and how long it is blocked for.

This one was a bit of fun to write, especially since the output format specifier for a stack is not documented and I had to find it in dt_printf.c.

We'll use the sched provider in this. While it would be nice to use the wakeup probe, it's not fired from the thread context so we won't have the thread local variable that we need to check and use. So, we just use on-cpu, which will tell us how long we were off cpu since we were told to block. Note that this also means that we are not tracking normal scheduling where it may have been pre-empted by another thread for whatever reason. This is what we want.

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN { printf("Collecting...  \^C to continue\\n"); }

syscall::rename:entry {self->interest = 1; }
sched:::sleep /self->interest/ { self->blocktime = timestamp; }
sched:::on-cpu /self->blocktime/ {
	this->taken = timestamp - self->blocktime; 
	@[curthread,stack()] = quantize(this->taken);
	self->blocktime = 0;
syscall::rename:return { self->interest = 0; }

END { printa("Thread 0x%p %k %@d\\n", @); }

This leaves us with only Exercise 11 to go. I'll try to get that up tomorrow.

Saturday Nov 15, 2008

More DTrace Lab Answers

OK, I promised more answers so here we go, ... (sorry, where files have copyright notices, I have to leave them there).

Exercise 2

Restrict the iosnoop.d to trace a specific process.

This solution will allow you to either specify a pid by appending '-p {pid}' or to run a command by appending '-c "command args ..." to the command line. eg dtrace -s iosnoop.d -p 1234

 \* The contents of this file are subject to the terms of the
 \* Common Development and Distribution License, Version 1.0 only
 \* (the "License").  You may not use this file except in compliance
 \* with the License.
 \* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 \* or http://www.opensolaris.org/os/licensing.
 \* See the License for the specific language governing permissions
 \* and limitations under the License.
 \* When distributing Covered Code, include this CDDL HEADER in each
 \* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 \* If applicable, add the following below this CDDL HEADER, with the
 \* fields enclosed by brackets "[]" replaced with your own identifying
 \* information: Portions Copyright [yyyy] [name of copyright owner]
 \* Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
 \* Use is subject to license terms.
 \* This D script is used as an example in the Solaris Dynamic Tracing Guide
 \* wiki in the "io Provider" Chapter.
 \* The full text of the this chapter may be found here:
 \*   http://wikis.sun.com/display/DTrace/io+Provider
 \* On machines that have DTrace installed, this script is available as
 \* iosnoop.d in /usr/demo/dtrace, a directory that contains all D scripts
 \* used in the Solaris Dynamic Tracing Guide.  A table of the scripts and their
 \* corresponding chapters may be found here:
 \*   file:///usr/demo/dtrace/index.html

#pragma D option quiet

	printf("%10s %58s %2s\\n", "DEVICE", "FILE", "RW");

/pid == $target/
	printf("%10s %58s %2s\\n", args[1]->dev_statname,
	    args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W");

Exercise 3

Display the arguments for the rename(2) system call along with its return code.

This is longer than it needs to be as I like the idea of a single line of output for each time the we have a rename() system call. I save the two arguments in thread local variables (self->source and self->destination). Note that the two arguments are strings and they are in user space when this probe fires, so we use copyinstr() to both get them into kernel context and make them printable strings. Note also that we clean up after ourselves in the return probe.

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::rename:entry {
	self->source = copyinstr(arg0);
	self->destination = copyinstr(arg1);
syscall::rename:return {
	printf("%s: %s -> %s returned %d\\n", probefunc, self->source,
	    self->destination, arg1);
	self->source = self->destination = 0;

Exercise 4

Display the real and sys times (see timex(1)) for the syscall

I'm not completely happy with this as I saw some anomalous real time values running this on my notebook, but it should be about right. Note that we record a walltimestamp (seconds since epoch) and the vtimestamp (time on cpu in nanoseconds) and simply compare them in the return probe.

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall:::entry {
	self->seconds = walltimestamp;
	self->sys = vtimestamp;
syscall:::return /self->sys/ {
	printf("%s: %d real seconds, %dns system\\n",
            walltimestamp - self->seconds,
            vtimestamp - self->sys);
	self->seconds = self->sys = 0;

Exercise 5

Show the kernel function name that triggers the io:::start probe.

Many folks did this using stack(). There is a better way. This makes use of the fact that in the io provider, the probe\* values are defined.

$ pfexec dtrace -qn 'io:::start {printf("iostart:::probefunc called from %s\\n",probefunc);}'
iostart:::probefunc called from bdev_strategy

You could also simply use the default action without -q as that will print the probe specification each time it fires, which will include the probefunc.

Exercise 6

Show the flow of kernel functions for a write system call.

Most folks put a -F in the arguments on line 1. The clearer (and more correct) way to do this in a script is to use the pragma (highlighted).

#!/usr/sbin/dtrace -s

#pragma D option flowindent

syscall::write:entry { self->interest = 1; }
fbt::: /self->interest/ {}
syscall::write:return { self->interest = 0; }

Exercise 7

Show all lock events for mutexes that occur during two second.

The question hints that we may want to use an aggregation and summary rather than printing them as they happen.

#!/usr/sbin/dtrace -s3

#pragma D option quiet

lockstat:::adaptive\* {@[probename] = count();}
tick-2s {exit(0);}

Exercise 8

Find file to which most IO is being done.

I'm going to assume that the question means real IO, not IOs that hit the cache. In which case, we can use the answer to exercise 2 as a hint. Unfortunately, pretty much all of the IO that this tracks is being done by sched to flush buffers and we don't have any idea of the filename. I'll try to come up with a better solution.

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN { printf("Collecting data ... \^C to finish.\\n"); }
io:::start { @[args[2]->fi_pathname] = count(); }
END { trunc(@,1); printa(@); }

That's it for the moment, I'll finish them off later.

Friday Nov 14, 2008

DTrace Lab in the performance track at CEC

We have just been helping out in the DTrace lab for the performance track at CEC. On having a look at exercise 1, we suggested to the attendees that they attempt the other exercises and come back to this one. The exercise was:

Enhance /usr/demo/dtrace/iosnoop to be able to detect reads that are satisfied by the filesystem cache (UFS: pagecache).

We then went off to write our own versions of an answer to the problem.I started from scratch, and (referring back to a previous blog where I did some digging into this code), came up with the following:

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN { printf("Collecting ...\\n"); }

syscall::read\*:entry {
        self->interest = 1;
        self->file = fds[arg0].fi_name;
        self->phys = 0;

/\* we only increment this kstat if we have to go to disk, from a filesystem \*/

sysinfo:::bread /self->interest/ { self->phys = 1;}

syscall::read\*:return /self->interest/ {
        this->str = self->phys
            ? "Physical reads"
            : "Cache Reads";
        @[this->str, self->file] = count();
        self->interest = 0;

END { printa("%@6d %s %s\\n", @); }

On running this you get output like:

   209 Cache Reads jaxrpc-impl.jar
   211 Cache Reads appserv-admin.jar
   311 Cache Reads xalan.jar
   354 Cache Reads ttysrch
   372 Cache Reads jaxr-impl.jar
   373 Cache Reads auxv
   500 Cache Reads appserv-assemblytool.jar
  1216 Cache Reads 
  1324 Cache Reads appserv-rt.jar
  1392 Cache Reads clone@0:ptm
 83181 Cache Reads psinfo

Of course by the time I finished writing this, pretty much everything in the only ufs filesystem on the box had been cached :)

Thursday Nov 13, 2008

CEC '08 Saturday/Sunday

OK, I've been slack. Not only have I not blogged since September, I've been at CEC2008 since Saturday and also haven't written anything.

The trip out

Caught a shuttle to the airport, grabbed some lunch after clearing customs and caught a United flight to Los Angeles. Unfortunately I was stuck in a window seat at the back of the aircraft for the 13 hours in the air which was less than comfortable, sigh.

As I was near the back of the aircraft I was one of the last off and at the rear of the line to clear US Immigration in Los Angeles. It was great to see the officers actually smiling and laughing with the passengers. Picked up my luggage, cleared customs and rechecked it. Walked out to terminal 7 to get the flight to Las Vegas. Cleared security ok, but unfortunately one of my colleagues was behind someone who they decided to thoroughly check the bags of, and as a result he missed the flight and had to catch another.

I arrived in Las Vegas and went looking for my checked luggage, ...

This is the first time that this has happened to me. Apparently my luggage was still in LA, and might arrive on the next flight. As it turns out it wasn't the next flight, they got it to the Casino after I was asleep about 11:30pm, so I got it the next morning.

Met up with Chris Gerhard on the bus and headed out to the Paris Casino. Quite a few of us joined up for a quiet drink in Napoleon's Piano Bar.


Didn't wake up until around 10:30am and then spent the morning putting finishing touches on my presentation.

I noticed that the certification room was open, so I decided to give the Solaris 10 Security Administrator exam a go. This exam has a pass mark of 52% and the course notes and trial exams did not give me a lot of hope for getting through first go.

Imagine my surprise on scoring 71%! Wonderful I now posses the Admin, Nework and Security Certifications!

Attended the Welcome reception and hit the sack again. More later.


* - Solaris and Network Domain, Technical Support Centre

Alan is a kernel and performance engineer based in Australia who tends to have the nasty calls gravitate towards him


« July 2016

No bookmarks in folder

Sun Folk

No bookmarks in folder

Non-Sun Folk
Non-Sun Folks

No bookmarks in folder