Sunday Jul 29, 2007

DTracing off-CPU time

In this entry I'll demonstrate DTracing off-CPU time.

On my way back from OSCON 2007, I found myself in Portland airport with a laptop, no Internet and half an hour to kill. In fact, the same laptop that I used during the SVOSUG talk in April. It was a chance to finish the demo that I started back then.

For those that were there or listened in, the talk had an unfortunate start with over an hour of audiovisual issues - including problems with the laptop video driver, laptop boot process, the data projector, lights and microphone. Alan DuBoff did a good job of fixing things while I began plan B (presenting from a SunRay), and he fixed the video driver such that it works better with data projectors than any other Acer Ferrari I've tried.

One issue we didn't fix that night was a 20 second wait when starting up certain applications, such as gnome-terminal. It was to make a great live DTrace demo to finish with, but I ran out of time to do it properly (we finished at around 11:30pm). Here it is as a blog entry instead.

The problem

When running gnome-terminal, there is a 20 second wait before the terminal appears on the screen.

Initial check

During the 20 second wait,
  • vmstat 1 showed idle CPUs
  • prstat -m showed that gnome-terminal was not on-CPU

DTrace investigation

The gnome-terminal application is off-CPU for some reason, and the CPUs are otherwise idle. There are numerous different ways to begin the investigation with DTrace, including,
  • using the sched provider to trace off-CPU to on-CPU event time
  • using the syscall provider to trace elapsed time for syscalls
  • using the pid provider to trace elapsed time for application binary and library calls
I'll start with the sched provider (the syscall provider would also make a good starting point). I'll keep restarting gnome-terminal, so that I can DTrace all the events (this is harder if you are trying to DTrace something that is already off-CPU).

The following one-liner runs gnome-terminal and measures the time from that process leaving the CPU to when it returns, and only prints user stack traces if that time was over 1 second,

# dtrace -n 'sched:::off-cpu /pid == $target/ { self->start = timestamp; }
  sched:::on-cpu /self->start && ((timestamp - self->start) > 1000000000)/
  { printf("waited: %d ms\\n", (timestamp - self->start) / 1000000); ustack(); }
' -c gnome-terminal
dtrace: description 'sched:::off-cpu ' matched 6 probes
CPU     ID                    FUNCTION:NAME
  0  48718                    resume:on-cpu waited: 20034 ms

              libc.so.1`door_call+0x1a
              libc.so.1`_nsc_trydoorcall+0x213
              libnsl.so.1`_door_getipnodebyname_r+0x8f
              libnsl.so.1`_get_hostserv_inetnetdir_byname+0xb62
              libnsl.so.1`getipnodebyname+0xdf
              libsocket.so.1`get_addr+0x126
              libsocket.so.1`_getaddrinfo+0x414
              libsocket.so.1`getaddrinfo+0x19
              libORBit-2.so.0.1.0`get_netid+0x91
              libORBit-2.so.0.1.0`link_get_local_hostname+0x35
              libORBit-2.so.0.1.0`link_server_setup+0x56
              libORBit-2.so.0.1.0`giop_server_new+0x5e
              libORBit-2.so.0.1.0`ORBit_ORB_start_servers+0x1d8
              libORBit-2.so.0.1.0`IOP_generate_profiles+0x67
              libORBit-2.so.0.1.0`ORBit_marshal_object+0x8e
              libORBit-2.so.0.1.0`ORBit_marshal_value+0x391
              libORBit-2.so.0.1.0`orbit_small_marshal+0xf7
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub+0x11c
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub_n+0x43
              libORBit-2.so.0.1.0`ORBit_c_stub_invoke+0x132

\^C
Hmm - looks like that user stack trace was truncated. Time to boost the ustackframes value,
# dtrace -x ustackframes=64 -n '
  sched:::off-cpu /pid == $target/ { self->start = timestamp; }
  sched:::on-cpu /self->start && ((timestamp - self->start) > 1000000000)/
  { printf("waited: %d ms\\n", (timestamp - self->start) / 1000000); ustack(); }
' -c gnome-terminal
dtrace: description 'sched:::off-cpu ' matched 6 probes
dtrace: pid 101415 has exited
CPU     ID                    FUNCTION:NAME
  0  48720                    resume:on-cpu waited: 20038 ms

              libc.so.1`door_call+0x1a
              libc.so.1`_nsc_trydoorcall+0x213
              libnsl.so.1`_door_getipnodebyname_r+0x8f
              libnsl.so.1`_get_hostserv_inetnetdir_byname+0xb62
              libnsl.so.1`getipnodebyname+0xdf
              libsocket.so.1`get_addr+0x126
              libsocket.so.1`_getaddrinfo+0x414
              libsocket.so.1`getaddrinfo+0x19
              libORBit-2.so.0.1.0`get_netid+0x91
              libORBit-2.so.0.1.0`link_get_local_hostname+0x35
              libORBit-2.so.0.1.0`link_server_setup+0x56
              libORBit-2.so.0.1.0`giop_server_new+0x5e
              libORBit-2.so.0.1.0`ORBit_ORB_start_servers+0x1d8
              libORBit-2.so.0.1.0`IOP_generate_profiles+0x67
              libORBit-2.so.0.1.0`ORBit_marshal_object+0x8e
              libORBit-2.so.0.1.0`ORBit_marshal_value+0x391
              libORBit-2.so.0.1.0`orbit_small_marshal+0xf7
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub+0x11c
              libORBit-2.so.0.1.0`ORBit_small_invoke_stub_n+0x43
              libORBit-2.so.0.1.0`ORBit_c_stub_invoke+0x132
              libgconf-2.so.4.1.0`ConfigServer_add_client+0x4f
              libgconf-2.so.4.1.0`gconf_get_config_server+0xb9
              libgconf-2.so.4.1.0`gconf_engine_connect+0x24f
              libgconf-2.so.4.1.0`gconf_engine_get_default+0x4c
              libgconf-2.so.4.1.0`gconf_client_get_default+0x2a
              libgnomeui-2.so.0.1401.0`libgnomeui_post_args_parse+0x187
              libgnome-2.so.0.1401.0`gnome_program_postinit+0x61
              libgnome-2.so.0.1401.0`gnome_program_init_common+0x37b
              libgnome-2.so.0.1401.0`gnome_program_initv+0x4d
              libgnome-2.so.0.1401.0`gnome_program_init+0x56
              gnome-terminal`main+0x2d5
              gnome-terminal`_start+0x7a
Good. This stack trace is likely to have led to the process leaving the CPU for the measured 20038 ms (the system was otherwise idle, so it is unlikely to have been kicked off due to scheduling). Reading through the lines, it looks like it is resolving a hostname - a common source of latencies when DNS is misconfigured.

There are a number of lines we could begin studying to confirm what is happening (and arrive at the same answer); I'll start with libnsl.so.1`getipnodebyname+0xdf, which should have a man page entry,

# man getipnodebyname
[...]
     struct hostent \*getipnodebyname(const char  \*name,  int  af,
     int flags, int \*error_num);
[...]
Now to check what hostname is being resolved,
# dtrace -n 'pid$target:libnsl:getipnodebyname:entry { trace(copyinstr(arg0)); }'
  -c gnome-terminal
dtrace: description 'pid$target:libnsl:getipnodebyname:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  50488            getipnodebyname:entry   marlin
\^C
dtrace: pid 101758 terminated by SIGINT
Oh - "marlin" is the hostname of this laptop, which should resolve just fine. Checking related configuration files,
# grep hosts /etc/nsswitch.conf
hosts:      files dns
#
# cat -n /etc/hosts
     1  #
     2  # Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
     3  # Use is subject to license terms.
     4  #
     5  #ident  "@(#)hosts      1.1     06/08/04 SMI"
     6  #
     7  127.0.0.1       localhost
     8  192.168.1.166   marlin marlin.sf.fw.jpn.com
Hmm - the name service switch file should cause /etc/hosts (/etc/inet/hosts) to be read first, which contains a valid entry for "marlin".

Lets take a closer look at the getipnodebyname() call, and trace the address family and flags arguments,

# dtrace -n 'pid$target:libnsl:getipnodebyname:entry { printf("%s af=%d flags=%d",
  copyinstr(arg0), arg1, arg2); }' -c gnome-terminal
dtrace: description 'pid$target:libnsl:getipnodebyname:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  50488            getipnodebyname:entry marlin af=26 flags=19
\^C
dtrace: pid 101821 terminated by SIGINT
Ok, address family 26 is,
# grep 26 /usr/include/sys/socket.h
#define AF_INET6        26              /\* Internet Protocol, Version 6 \*/
IPv6! Should have checked earlier,
# grep ipnodes /etc/nsswitch.conf
ipnodes:    files dns
#
# cat -n /etc/inet/ipnodes
     1  #
     2  # Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
     3  # Use is subject to license terms.
     4  #
     5  #ident  "@(#)ipnodes    1.1     06/08/04 SMI"
     6  #
     7  ::1             localhost
     8  127.0.0.1       localhost
Adding an entry to the ipnodes file for "marlin" fixed the issue.

Take 2

Was using the sched provider and then pid to trace getipnodebyname() the best path to take? It's hard to say - there are many different ways DTrace can help you solve problems. Lets try some other paths.

Here I'll trace elapsed times from the syscall provider - since if we have an off-CPU issue on an idle system, it will almost certainly be visible at the syscall layer.

The procsystime tool from the DTraceToolkit performs different types of syscall time analysis. I'm using it as it saves a minute or so of typing. Here it runs the gnome-terminal command and produces a report of elapsed times by syscall,

# /opt/DTT/procsystime -e gnome-terminal

Elapsed Times for command gnome-terminal,

         SYSCALL          TIME (ns)
           gtime               1882
      sigpending               1921
     priocntlsys               4200
          sysi86               4354
          getgid               5152
       sysconfig               5606
     getsockname               5904
      systeminfo               7040
      setcontext               7342
          fxstat               7943
     getpeername               9393
       getrlimit              11717
          getuid              13927
           uname              14661
       sigaction              14698
          getpid              19241
      setsockopt              23743
          getcwd              33784
            fsat              41700
          stat64              43024
           readv              45937
          listen              49874
          llseek              52341
           mkdir              52633
            pipe              64612
           chmod              65684
           fcntl              70864
           utime              76677
         fstat64             109420
           ioctl             133301
          access             157803
          unlink             165703
          open64             221120
          accept             276863
            bind             281026
          writev             343976
             brk             344410
         memcntl             347475
           write             358764
      getdents64             422206
          munmap             486137
         connect             588166
     resolvepath             686768
       so_socket             748039
           close             846782
            open            1127761
            read            1191102
            mmap            1592314
           xstat            8521257
         pollsys            9817661
          doorfs        20035299808
The doorfs() call takes the door descriptor as the first argument,
# dtrace -n 'syscall::doorfs:entry /pid == $target/ { trace(arg0); }' -c gnome-terminal
dtrace: description 'syscall::doorfs:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    572                     doorfs:entry                 3
  0    572                     doorfs:entry                 3
  0    572                     doorfs:entry                 3
  0    572                     doorfs:entry                 3
\^C
dtrace: pid 101833 terminated by SIGINT
#
# dtrace -n 'syscall::doorfs:entry /pid == $target/ { trace(fds[arg0].fi_pathname); }'
  -c gnome-terminal
dtrace: description 'syscall::doorfs:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    572                     doorfs:entry   /var/run/name_service_door
  0    572                     doorfs:entry   /var/run/name_service_door
  0    572                     doorfs:entry   /var/run/name_service_door
  0    572                     doorfs:entry   /var/run/name_service_door
\^C
dtrace: pid 101835 terminated by SIGINT
The door calls were to /var/run/name_service_door, pointing to name resolution and the nscd process (which can be confirmed with more DTrace).

Running procsystime on nscd produces,

# /opt/DTT/procsystime -e -n nscd
Hit Ctrl-C to stop sampling...
\^C

Elapsed Times for processes nscd,

         SYSCALL          TIME (ns)
           ioctl               2452
          llseek               2615
         fstat64               3996
       sysconfig              13933
            open              15416
            read              26336
           gtime              52980
       so_socket             323066
           close             349885
           xstat             430465
            send             454944
         connect             795246
          doorfs          360127347
        lwp_park        10009952745
         pollsys        20032365315
       nanosleep       153088010532
nscd is multi-threaded, with many threads sleeping as they wait for work - meaning that large off-CPU elapsed times may have nothing to do with gnome-terminal. The time for pollsys() is interesting, as at 20 seconds it matches the measured time in gnome-terminal.

Analysing pollsys() further (the poll() manpage is similar),

# man poll
[...]
     int poll(struct pollfd fds[], nfds_t nfds, int timeout);
[...]
The first argument is an array with length specified by the second argument. Looping over an array is difficult from DTrace (due to lack of loops), but this doesn't really present a difficulty when troubleshooting. Here I'll dump the struct in hex, and the other values as ints,
# dtrace -n 'syscall::pollsys:entry /execname == "nscd"/ { printf("nfds=%d", arg1);
  tracemem(copyin(arg0, 16), 16); }' -c gnome-terminal
dtrace: description 'syscall::pollsys:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........

  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00  ....@...........

  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 42 fd 0b 7b ab 46 f0 d7 c0 0d  ....@.B..{.F....

  0    542                    pollsys:entry nfds=1
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 05 00 00 00 40 00 00 00 0b 7b ab 46 f0 d7 c0 0d  ....@....{.F....

dtrace: pid 102041 has exited
Don't worry - DTrace lets you cast variables as structs if you like; I'm just dumping the data in hex as this is a short one-liner.

The nfds value shows that there is only ever one file descriptor in the array; and the hex dump (on this little endian architecture) shows that the file descriptor number is "5" (the first member of struct pollfd).

Now DTrace is used to print the pathname for that file descriptor,

# dtrace -n 'syscall::pollsys:entry /execname == "nscd"/ 
  { trace(fds[\*(uint32_t \*)copyin(arg0, 4)].fi_pathname); }' -c gnome-terminal
dtrace: description 'syscall::pollsys:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    542                    pollsys:entry   <unknown>
  0    542                    pollsys:entry   <unknown>
\^C
dtrace: pid 102063 terminated by SIGINT
Well, that didn't work. Not all file descriptors have pathnames, of course. It will save some time to borrow pfiles for the next step,
# pfiles `pgrep -x nscd`
101917: /usr/sbin/nscd
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR
      /devices/pseudo/mm@0:null
   1: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR
      /devices/pseudo/mm@0:null
   2: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDWR
      /devices/pseudo/mm@0:null
   3: S_IFDOOR mode:0777 dev:279,0 ino:0 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC  door to nscd[101917]
   4: S_IFSOCK mode:0666 dev:277,0 ino:17874 uid:0 gid:0 size:0
      O_RDWR
        SOCK_RAW
        SO_SNDBUF(8192),SO_RCVBUF(8192)
        sockname: AF_ROUTE
        peername: AF_ROUTE
   5: S_IFSOCK mode:0666 dev:277,0 ino:53988 uid:0 gid:0 size:0
      O_RDWR
        SOCK_DGRAM
        SO_DGRAM_ERRIND,SO_SNDBUF(57344),SO_RCVBUF(57344)
        sockname: AF_INET 192.168.1.78  port: 51697
        peername: AF_INET 192.168.1.5  port: 53
File descriptor 5 was a socket to the remote host 192.168.1.5 port 53 (DNS). This shows that nscd is waiting for 20 seconds on a DNS socket, and we know that gnome-terminal has a 20 second wait on the /var/run/name_service_door file. They are probably related.

Proving that they are related, if needed, is simply more DTrace. This time I'll borrow a script from /usr/demo/dtrace, which shows who is waiting for nscd and how long for,

# dtrace -s /usr/demo/dtrace/nscd.d -c gnome-terminal
dtrace: script '/usr/demo/dtrace/nscd.d' matched 27 probes
dtrace: pid 101594 has exited

  nscd                                                gnome-session
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           65536 |                                         0

  nscd                                                gnome-terminal
           value  ------------- Distribution ------------- count
           16384 |                                         0
           32768 |@@@@@@@@@@                               1
           65536 |@@@@@@@@@@@@@@@@@@@@                     2
          131072 |                                         0
          262144 |                                         0
          524288 |                                         0
         1048576 |                                         0
         2097152 |                                         0
         4194304 |                                         0
         8388608 |                                         0
        16777216 |                                         0
        33554432 |                                         0
        67108864 |                                         0
       134217728 |                                         0
       268435456 |                                         0
       536870912 |                                         0
      1073741824 |                                         0
      2147483648 |                                         0
      4294967296 |                                         0
      8589934592 |                                         0
     17179869184 |@@@@@@@@@@                               1
     34359738368 |                                         0
In the 17.1 to 34.3 second bucket is gnome-terminal waiting for nscd - our 20 second wait.

The /usr/demo/dtrace/nscd.d script works by tracing who is nscd is waking up via the sched:::wakeup probe, after having traced the time when threads sleep (see the "sched" chapter in the Dynamic Tracing Guide on docs.sun.com).

The wrap

DTrace provides many ways to solve performance issues or for troubleshooting. Above were only a few techniques for analysing off-CPU time, but there are more available.

Some techniques can get very complex, and require much systems knowledge. Don't worry: if you only figure out one way to solve your problem - you've still solved your problem. (Systems knowledge should help you solve system problems faster).

Tuesday Feb 27, 2007

DTrace on SDNtv

I was recently interviewed about DTrace for SDNtv, Sun's Developer Network Channel. You can watch the video in a browser or download it and watch it later.

For the interview we discussed what DTrace is, what DTrace can do, and I ran the colortrace demo. It was a short interview aimed at introducing people to DTrace for the first time.

I found SDNtv itself quite interesting: the host I had, Mark Herring, was technically savvy and both understood DTrace and knew what best to ask. It was also filmed in one take - that was a big suprise - I've always assumed such interviews are cut together creatively, but this was a real interview filmed in one shot.

But what is especially interesting is the idea of videoing Sun engineers as they discuss their own products, and making those videos publically available. Not that long ago (6 years?), I remember being a Sun customer in Australia who was trying to learn more about Solaris and the kernel, and in particular (being a code geek and a Unix history geek) I wanted to know who wrote what and why. There was very little public information about Sun's engineers - it seemed that the very names of the engineers were confidential. And, growing up in Australia, there were no conferences to visit where Sun engineers gave presentations, like there are in the US.

When Bryan first emailed me in early 2004 suggesting I check out something new called DTrace, I had no idea who Bryan was. I remember staring at his sig and thinking "I wish I knew who this guy was". A quick Internet search didn't help much (try that now!).

Times have changed. Engineers now have blogs, and discuss source code on OpenSolaris mailing lists. And SDNtv lets you see and hear them speak!

Sunday Jan 28, 2007

colortrace

I was recently asked to give a very short but important DTrace demonstration to a wide audience, where I would only have time to run and discuss one script. It is a tough choice - there are so many to pick from, including the 100+ scripts and oneliners in the DTraceToolkit. Also, a short text-based demo may not convey the incredible power of DTrace - if you aren't paying attention it may look like just another tool; DTrace is far more than that.

What a great excuse to use some color escape sequences (yes, I've wanted to do this since I started writing the DTraceToolkit). With a colorized output it won't look like just another tool, and I can draw appropriate attention to a key DTrace feature.

I wrote colortrace.d (or colourtrace.d, for readers from the Commonwealth), which prints a flow indented trace of function entrys and returns for the entire softwark stack. Different layers of the softwark stack have been colorized:

  • blue - application functions
  • green - library functions
  • violet - system calls
  • red - kernel functions

I added a column to print delta times in nanoseconds, to illustrate that yes - we can measure whatever times we want. When Adam previously performed a similar demonstration, he added a color for a Java function layer.

Here I run colortrace.d on a bash shell, and while colortrace.d is running I press the letter "t" on the bash shell:

# ./colortrace.d -p 101695
DELTA(ns)                TYPE    FUNCTION
120805042             syscall   <- read
34461               libc.so.1   <- _read
13474               libc.so.1   <- read
8387                     bash   <- rl_getc
12899                    bash   <- rl_read_key
10044                    bash   -> _rl_dispatch
11760                    bash    -> _rl_dispatch_subseq
7189                     bash     -> rl_insert
7020                     bash      -> _rl_insert_char
6777                     bash       -> _rl_any_typein
8922                     bash       <- _rl_any_typein
7489                     bash       -> rl_insert_text
9230                libc.so.1        -> strlen
12040               libc.so.1        <- strlen
7362                libc.so.1        -> strncpy
6956                libc.so.1        <- strncpy
6996                     bash       <- rl_insert_text
7095                     bash      <- _rl_insert_char
6069                     bash     <- rl_insert
8117                     bash    <- _rl_dispatch_subseq
6484                     bash   <- _rl_dispatch
6992                     bash   -> rl_redisplay
5764                libc.so.1    -> strlen
6577                libc.so.1    <- strlen
5356                libc.so.1    -> strncpy
6249                libc.so.1    <- strncpy
6992                libc.so.1    -> memset
7572                libc.so.1    <- memset
7749                libc.so.1    -> strlen
6118                libc.so.1    <- strlen
6627                     bash    -> _rl_move_cursor_relative
6573                     bash    <- _rl_move_cursor_relative
7533                     bash    -> _rl_output_some_chars
7458                libc.so.1     -> fwrite
6335                libc.so.1      -> _fwrite_unlocked
7085                libc.so.1       -> _realbufend
6167                libc.so.1        -> getxfdat
6250                libc.so.1        <- getxfdat
6490                libc.so.1       <- _realbufend
6306                libc.so.1      <- _fwrite_unlocked
6115                libc.so.1     <- fwrite
6488                     bash    <- _rl_output_some_chars
7628                libc.so.1    -> fflush
6317                libc.so.1     -> _fflush_u
6407                libc.so.1      -> _xflsbuf
5046                libc.so.1       -> _realbufend
52132               libc.so.1        -> getxfdat
5178                libc.so.1        <- getxfdat
5480                libc.so.1       <- _realbufend
7046                libc.so.1       -> _write
29263                 syscall        -> write
12044                  kernel         -> write32
9820                   kernel          -> write
5170                   kernel           -> getf
5514                   kernel            -> set_active_fd
5643                   kernel            <- set_active_fd
10803                  kernel           <- getf
7063                   kernel           -> nbl_need_check
5365                   kernel           <- nbl_need_check
6075                   kernel           -> fop_rwlock
5619                   kernel            -> fs_rwlock
4939                   kernel            <- fs_rwlock
5117                   kernel           <- fop_rwlock
5819                   kernel           -> fop_write
5389                   kernel            -> spec_write
6382                   kernel             -> smark
5471                   kernel              -> gethrestime_sec
5174                   kernel               -> gethrestime
5159                   kernel                -> pc_gethrestime
4724                   kernel                 -> gethrtime
4923                   kernel                  -> tsc_gethrtime
4831                   kernel                  <- tsc_gethrtime
4875                   kernel                 <- gethrtime
5240                   kernel                <- pc_gethrestime
5215                   kernel               <- gethrestime
5368                   kernel              <- gethrestime_sec
5710                   kernel             <- smark
5656                   kernel             -> strwrite
5529                   kernel              -> strwrite_common
5543                   kernel               -> straccess
4976                   kernel                -> mutex_owned
4761                   kernel                <- mutex_owned
5611                   kernel               <- straccess
6066                   kernel               -> strput
4713                   kernel                -> mutex_owned
4705                   kernel                <- mutex_owned
5836                   kernel                -> canputnext
6037                   kernel                <- canputnext
6160                   kernel                -> strmakedata
5542                   kernel                 -> allocb_cred
5513                   kernel                  -> allocb
4982                   kernel                   -> kmem_cache_alloc
5522                   kernel                    -> kmem_cache_alloc_debug
5185                   kernel                     -> verify_and_copy_pattern
5054                   kernel                     <- verify_and_copy_pattern
6464                   kernel                     -> dblk_constructor
4853                   kernel                      -> kmem_cache_alloc
4696                   kernel                       -> kmem_cache_alloc_debug
5188                   kernel                        -> verify_and_copy_pattern
4707                   kernel                        <- verify_and_copy_pattern
5153                   kernel                        -> gethrtime
4843                   kernel                         -> tsc_gethrtime
4587                   kernel                         <- tsc_gethrtime
4836                   kernel                        <- gethrtime
5133                   kernel                        -> getpcstack
4742                   kernel                        <- getpcstack
4849                   kernel                        -> kmem_log_enter
5100                   kernel                        <- kmem_log_enter
5047                   kernel                       <- kmem_cache_alloc_debug
5262                   kernel                      <- kmem_cache_alloc
5773                   kernel                     <- dblk_constructor
4970                   kernel                     -> gethrtime
4610                   kernel                      -> tsc_gethrtime
4504                   kernel                      <- tsc_gethrtime
4975                   kernel                     <- gethrtime
4906                   kernel                     -> getpcstack
4534                   kernel                     <- getpcstack
4991                   kernel                     -> kmem_log_enter
4760                   kernel                     <- kmem_log_enter
4740                   kernel                    <- kmem_cache_alloc_debug
5060                   kernel                   <- kmem_cache_alloc
5642                   kernel                  <- allocb
5394                   kernel                  -> crhold
5467                   kernel                  <- crhold
5435                   kernel                 <- allocb_cred
5531                   kernel                 -> uiomove
5663                   kernel                  -> xcopyin_nta
4871                   kernel                  <- kcopy
5012                   kernel                 <- uiomove
5939                   kernel                <- strmakedata
5853                   kernel                -> stream_willservice
5280                   kernel                <- stream_willservice
6012                   kernel                -> putnext
4937                   kernel                 -> mutex_owned
4680                   kernel                 <- mutex_owned
5381                   kernel                 -> mutex_owned
4694                   kernel                 <- mutex_owned
6160                   kernel                 -> ttcompatwput
5016                   kernel                  -> putnext
4776                   kernel                   -> mutex_owned
4523                   kernel                   <- mutex_owned
4981                   kernel                   -> mutex_owned
4661                   kernel                   <- mutex_owned
5683                   kernel                   -> ldtermwput
5704                   kernel                    -> bcanputnext
5815                   kernel                     -> claimstr
5410                   kernel                     <- claimstr
5598                   kernel                     -> bcanput
6099                   kernel                     <- bcanput
5602                   kernel                     -> releasestr
5013                   kernel                      -> cv_broadcast
5708                   kernel                       -> getpcstack
4690                   kernel                       <- getpcstack
5167                   kernel                       -> apic_intr_enter
5135                   kernel                        -> psm_get_cpu_id
4733                   kernel                        <- psm_get_cpu_id
5540                   kernel                        -> psm_get_cpu_id
4531                   kernel                        <- psm_get_cpu_id
5295                   kernel                        -> psm_get_cpu_id
4503                   kernel                        <- psm_get_cpu_id
4917                   kernel                       <- apic_intr_enter
5296                   kernel                       -> hilevel_intr_prolog
10448                  kernel                        -> apic_intr_exit
5041                   kernel                         -> psm_get_cpu_id
4481                   kernel                         <- psm_get_cpu_id
5119                   kernel                        <- apic_intr_exit
5333                   kernel                       <- hilevel_intr_epilog
5143                   kernel                      <- cv_broadcast
5124                   kernel                     <- releasestr
5350                   kernel                    <- bcanputnext
5851                   kernel                    -> ldtermwmsg
6826                   kernel                     -> msgdsize
5427                   kernel                     <- msgdsize
5652                   kernel                     -> ldterm_output_msg
4981                   kernel                      -> allocb
4817                   kernel                       -> kmem_cache_alloc
5224                   kernel                        -> kmem_cache_alloc_debug
5270                   kernel                         -> verify_and_copy_pattern
5033                   kernel                         <- verify_and_copy_pattern
5303                   kernel                         -> dblk_constructor
4692                   kernel                          -> kmem_cache_alloc
4928                   kernel                           -> kmem_cache_alloc_debug
5243                   kernel                            -> getpcstack
4569                   kernel                            <- getpcstack
5069                   kernel                            -> apic_intr_enter
4830                   kernel                            <- apic_intr_enter
4972                   kernel                            -> hilevel_intr_prolog
40463                  kernel                             -> apic_intr_exit
5107                   kernel                              -> psm_get_cpu_id
4549                   kernel                              <- psm_get_cpu_id
4995                   kernel                             <- apic_intr_exit
5123                   kernel                            <- hilevel_intr_epilog
5633                   kernel                            -> verify_and_copy_pattern
4867                   kernel                            <- verify_and_copy_pattern
5360                   kernel                            -> gethrtime
4633                   kernel                             -> tsc_gethrtime
4566                   kernel                             <- tsc_gethrtime
4999                   kernel                            <- gethrtime
4856                   kernel                            -> getpcstack
4739                   kernel                            <- getpcstack
5058                   kernel                            -> kmem_log_enter
4818                   kernel                            <- kmem_log_enter
4786                   kernel                           <- kmem_cache_alloc_debug
5189                   kernel                          <- kmem_cache_alloc
5063                   kernel                         <- dblk_constructor
4939                   kernel                         -> gethrtime
4694                   kernel                          -> tsc_gethrtime
4706                   kernel                          <- tsc_gethrtime
4749                   kernel                         <- gethrtime
4852                   kernel                         -> getpcstack
4780                   kernel                         <- getpcstack
4838                   kernel                         -> kmem_log_enter
4669                   kernel                         <- kmem_log_enter
5068                   kernel                        <- kmem_cache_alloc_debug
4970                   kernel                       <- kmem_cache_alloc
4925                   kernel                      <- allocb
6118                   kernel                      -> movtuc
6115                   kernel                      <- movtuc
5246                   kernel                      -> freeb
5098                   kernel                       -> dblk_lastfree
5180                   kernel                        -> crfree
5155                   kernel                        <- crfree
5292                   kernel                        -> kmem_cache_free
4946                   kernel                         -> kmem_cache_free_debug
4814                   kernel                          -> kmem_log_enter
4880                   kernel                          <- kmem_log_enter
4998                   kernel                          -> gethrtime
4598                   kernel                           -> tsc_gethrtime
4653                   kernel                           <- tsc_gethrtime
4949                   kernel                          <- gethrtime
4841                   kernel                          -> getpcstack
4515                   kernel                          <- getpcstack
5073                   kernel                          -> kmem_log_enter
4595                   kernel                          <- kmem_log_enter
5352                   kernel                          -> dblk_destructor
5061                   kernel                           -> kmem_cache_free
4670                   kernel                            -> kmem_cache_free_debug
4721                   kernel                             -> kmem_log_enter
4863                   kernel                             <- kmem_log_enter
4914                   kernel                             -> gethrtime
4555                   kernel                              -> tsc_gethrtime
4853                   kernel                              <- tsc_gethrtime
4761                   kernel                             <- gethrtime
4825                   kernel                             -> getpcstack
4714                   kernel                             <- getpcstack
4888                   kernel                             -> kmem_log_enter
4580                   kernel                             <- kmem_log_enter
5219                   kernel                             -> copy_pattern
4980                   kernel                             <- copy_pattern
4978                   kernel                            <- kmem_cache_free_debug
5806                   kernel                           <- kmem_cache_free
4964                   kernel                          <- dblk_destructor
4893                   kernel                          -> copy_pattern
4530                   kernel                          <- copy_pattern
5010                   kernel                         <- kmem_cache_free_debug
4931                   kernel                        <- kmem_cache_free
5002                   kernel                       <- dblk_lastfree
5293                   kernel                      <- freeb
5397                   kernel                     <- ldterm_output_msg
5285                   kernel                     -> msgdsize
4840                   kernel                     <- msgdsize
6236                   kernel                     -> drv_setparm
5415                   kernel                     <- drv_setparm
5231                   kernel                     -> putnext
4820                   kernel                      -> mutex_owned
4617                   kernel                      <- mutex_owned
5286                   kernel                      -> mutex_owned
4526                   kernel                      <- mutex_owned
5851                   kernel                      -> ptemwput
5325                   kernel                       -> bcanputnext
4759                   kernel                        -> claimstr
4458                   kernel                        <- claimstr
5060                   kernel                        -> bcanput
5074                   kernel                        <- bcanput
4841                   kernel                        -> releasestr
4663                   kernel                         -> cv_broadcast
4771                   kernel                         <- cv_broadcast
4773                   kernel                        <- releasestr
4758                   kernel                       <- bcanputnext
5838                   kernel                       -> ptemwmsg
5242                   kernel                        -> putnext
4584                   kernel                         -> mutex_owned
4811                   kernel                         <- mutex_owned
5169                   kernel                         -> mutex_owned
4473                   kernel                         <- mutex_owned
6277                   kernel                         -> ptswput
5691                   kernel                          -> putq
6413                   kernel                           -> qenable_locked
5001                   kernel                            -> mutex_owned
4522                   kernel                            <- mutex_owned
5765                   kernel                           <- qenable_locked
5307                   kernel                           -> mutex_owned
4553                   kernel                           <- mutex_owned
5277                   kernel                          <- putq
5303                   kernel                          -> cv_broadcast
4546                   kernel                          <- cv_broadcast
5789                   kernel                         <- ptswput
4863                   kernel                         -> mutex_owned
4681                   kernel                         <- mutex_owned
4865                   kernel                         -> mutex_owned
4482                   kernel                         <- mutex_owned
5909                   kernel                        <- putnext
5438                   kernel                       <- ptemwmsg
5338                   kernel                      <- ptemwput
5258                   kernel                      -> mutex_owned
4592                   kernel                      <- mutex_owned
4857                   kernel                      -> mutex_owned
4719                   kernel                      <- mutex_owned
4820                   kernel                     <- putnext
4941                   kernel                    <- ldtermwmsg
5600                   kernel                   <- ldtermwput
4921                   kernel                   -> mutex_owned
4483                   kernel                   <- mutex_owned
5049                   kernel                   -> mutex_owned
4495                   kernel                   <- mutex_owned
4763                   kernel                  <- putnext
5565                   kernel                 <- ttcompatwput
4934                   kernel                 -> mutex_owned
4543                   kernel                 <- mutex_owned
4859                   kernel                 -> mutex_owned
4682                   kernel                 <- mutex_owned
4805                   kernel                <- putnext
5706                   kernel                -> stream_runservice
5130                   kernel                 -> mutex_owned
4505                   kernel                 <- mutex_owned
6436                   kernel                 -> queue_service
5501                   kernel                  -> runservice
5329                   kernel                   -> entersq
5686                   kernel                   <- entersq
6490                   kernel                   -> ptswsrv
4974                   kernel                    -> getq
4688                   kernel                     -> getq_noenab
4922                   kernel                     <- getq_noenab
5330                   kernel                    <- getq
4888                   kernel                    -> bcanputnext
5246                   kernel                     -> getpcstack
4641                   kernel                     <- getpcstack
5019                   kernel                     -> apic_intr_enter
5023                   kernel                     <- apic_intr_enter
4991                   kernel                     -> hilevel_intr_prolog
40252                  kernel                      -> apic_intr_exit
4984                   kernel                       -> psm_get_cpu_id
4831                   kernel                       <- psm_get_cpu_id
5005                   kernel                      <- apic_intr_exit
4903                   kernel                     <- hilevel_intr_epilog
5265                   kernel                     -> claimstr
5086                   kernel                     <- claimstr
5119                   kernel                     -> bcanput
5071                   kernel                     <- bcanput
4889                   kernel                     -> releasestr
4679                   kernel                      -> cv_broadcast
4965                   kernel                      <- cv_broadcast
4878                   kernel                     <- releasestr
4769                   kernel                    <- bcanputnext
5142                   kernel                    -> putnext
4871                   kernel                     -> mutex_owned
4567                   kernel                     <- mutex_owned
5209                   kernel                     -> mutex_owned
4526                   kernel                     <- mutex_owned
6066                   kernel                     -> strrput
6220                   kernel                      -> qclaimed
5323                   kernel                      <- qclaimed
5297                   kernel                      -> mutex_owned
4483                   kernel                      <- mutex_owned
5265                   kernel                      -> putq
4667                   kernel                       -> qenable_locked
4592                   kernel                        -> mutex_owned
4739                   kernel                        <- mutex_owned
4880                   kernel                       <- qenable_locked
4880                   kernel                       -> mutex_owned
4659                   kernel                       <- mutex_owned
4864                   kernel                      <- putq
4883                   kernel                      -> mutex_owned
4690                   kernel                      <- mutex_owned
5938                   kernel                      -> pollwakeup
6412                   kernel                       -> pollnotify
4819                   kernel                        -> mutex_owned
4517                   kernel                        <- mutex_owned
5862                   kernel                        -> cv_signal
5412                   kernel                         -> disp_lock_enter
5260                   kernel                          -> lock_set_spl
4898                   kernel                          <- splx
5298                   kernel                         <- lock_set_spl
5513                   kernel                        <- disp_lock_enter
6097                   kernel                        -> sleepq_wakeone_chan
5878                   kernel                         -> sleepq_unlink
5882                   kernel                         <- sleepq_unlink
6150                   kernel                         -> ts_wakeup
6835                   kernel                          -> setbackdq
4878                   kernel                           -> gethrtime_unscaled
4767                   kernel                            -> tsc_gethrtimeunscaled
4768                   kernel                            <- tsc_gethrtimeunscaled
5305                   kernel                           <- gethrtime_unscaled
5345                   kernel                           -> cpu_update_pct
4993                   kernel                            -> scalehrtime
5250                   kernel                             -> tsc_scalehrtime
4980                   kernel                             <- tsc_scalehrtime
5317                   kernel                            <- scalehrtime
5715                   kernel                            -> cpu_decay
5451                   kernel                             -> exp_x
4760                   kernel                             <- exp_x
5225                   kernel                            <- cpu_decay
5253                   kernel                           <- cpu_update_pct
6435                   kernel                           -> cpu_choose
5793                   kernel                            -> disp_lowpri_cpu
6383                   kernel                            <- disp_lowpri_cpu
5624                   kernel                           <- cpu_choose
6285                   kernel                           -> disp_lock_enter_high
5746                   kernel                           <- disp_lock_enter_high
6425                   kernel                           -> cpu_resched
6065                   kernel                            -> poke_cpu
5701                   kernel                             -> apic_send_ipi
5626                   kernel                              -> get_apic_cmd1
5355                   kernel                              <- get_apic_cmd1
5742                   kernel                             <- apic_send_ipi
5167                   kernel                            <- poke_cpu
5254                   kernel                           <- cpu_resched
6656                   kernel                           -> cpu_wakeup
5605                   kernel                           <- cpu_wakeup
5194                   kernel                          <- setbackdq
5575                   kernel                         <- ts_wakeup
5068                   kernel                         -> disp_lock_exit_high
4899                   kernel                         <- disp_lock_exit_high
5750                   kernel                        <- sleepq_wakeone_chan
5206                   kernel                        -> disp_lock_exit
4893                   kernel                        <- splx
5125                   kernel                       <- disp_lock_exit
5716                   kernel                      <- cv_signal
5366                   kernel                     <- pollnotify
5885                   kernel                    <- pollwakeup
5684                   kernel                   <- strrput
5051                   kernel                   -> mutex_owned
4614                   kernel                   <- mutex_owned
5096                   kernel                   -> mutex_owned
4530                   kernel                   <- mutex_owned
4840                   kernel                  <- putnext
5045                   kernel                  -> getq
4681                   kernel                   -> getq_noenab
4647                   kernel                   <- getq_noenab
5013                   kernel                  <- getq
5021                   kernel                  -> cv_broadcast
4747                   kernel                  <- cv_broadcast
5865                   kernel                 <- ptswsrv
6120                   kernel                 -> leavesq
5334                   kernel                 <- leavesq
5146                   kernel                 -> cv_broadcast
4552                   kernel                 <- cv_broadcast
6149                   kernel                <- runservice
5560                   kernel               <- queue_service
5922                   kernel              <- stream_runservice
5748                   kernel             <- strput
6336                   kernel            <- strwrite_common
5398                   kernel           <- strwrite
5481                   kernel          <- spec_write
5895                   kernel         <- fop_write
6266                   kernel         -> fop_rwunlock
5432                   kernel          -> fs_rwunlock
5318                   kernel          <- fs_rwunlock
5455                   kernel         <- fop_rwunlock
5430                   kernel         -> releasef
5303                   kernel          -> clear_active_fd
5447                   kernel          <- clear_active_fd
5098                   kernel          -> cv_broadcast
4594                   kernel          <- cv_broadcast
5455                   kernel         <- releasef
5460                   kernel        <- write
6000                   kernel       <- write32
6729                  syscall      <- write
28571               libc.so.1     <- _write
10001               libc.so.1    <- _xflsbuf
6926                libc.so.1   <- _fflush_u
6176                libc.so.1   <- fflush
7718                     bash   <- rl_redisplay
54421                    bash   <- readline_internal_char
8667                     bash   -> readline_internal_char
11003               libc.so.1    -> sigsetjmp
11960               libc.so.1     -> __csigsetjmp
6691                libc.so.1     <- __csigsetjmp
6367                libc.so.1    <- sigsetjmp
6954                     bash    -> _rl_init_argument
7834                     bash    <- _rl_init_argument
7598                     bash    -> rl_read_key
7065                     bash     -> _rl_next_macro_key
6371                     bash     <- _rl_next_macro_key
6299                     bash     -> rl_getc
6818                libc.so.1      -> fileno
6908                libc.so.1      <- fileno
6850                libc.so.1      -> read
7184                libc.so.1       -> _read
28550                 syscall        -> read
9637                   kernel         -> read32
6043                   kernel          -> read
5360                   kernel           -> getf
5147                   kernel            -> set_active_fd
4994                   kernel            <- set_active_fd
6298                   kernel           <- getf
5840                   kernel           -> nbl_need_check
4779                   kernel           <- nbl_need_check
5503                   kernel           -> fop_rwlock
5009                   kernel            -> fs_rwlock
4558                   kernel            <- fs_rwlock
4867                   kernel           <- fop_rwlock
5894                   kernel           -> fop_read
5403                   kernel            -> spec_read
5469                   kernel             -> smark
5361                   kernel              -> gethrestime_sec
4852                   kernel               -> gethrestime
5152                   kernel                -> pc_gethrestime
4890                   kernel                 -> gethrtime
4720                   kernel                  -> tsc_gethrtime
4807                   kernel                  <- tsc_gethrtime
5195                   kernel                 <- gethrtime
5355                   kernel                <- pc_gethrestime
5085                   kernel               <- gethrestime
4978                   kernel              <- gethrestime_sec
5017                   kernel             <- smark
5537                   kernel             -> strread
5804                   kernel              -> straccess
4817                   kernel               -> mutex_owned
4734                   kernel               <- mutex_owned
5419                   kernel              <- straccess
5344                   kernel              -> mutex_owned
4534                   kernel              <- mutex_owned
5591                   kernel              -> strget
4831                   kernel               -> mutex_owned
4535                   kernel               <- mutex_owned
4874                   kernel               -> mutex_owned
4675                   kernel               <- mutex_owned
5018                   kernel               -> getq_noenab
4821                   kernel               <- getq_noenab
5847                   kernel              <- strget
4908                   kernel              -> mutex_owned
4584                   kernel              <- mutex_owned
5557                   kernel              -> qbackenable
4978                   kernel              <- qbackenable
6433                   kernel              -> strwaitq
4917                   kernel               -> mutex_owned
4469                   kernel               <- mutex_owned
6029                   kernel               -> allocb_wait
5397                   kernel                -> allocb
4903                   kernel                 -> kmem_cache_alloc
5236                   kernel                  -> kmem_cache_alloc_debug
5242                   kernel                   -> verify_and_copy_pattern
5294                   kernel                   <- verify_and_copy_pattern
5744                   kernel                   -> dblk_constructor
... 200 lines truncated ...
\^C

There is a lot of output as flow passes from the user program to the kernel and back again, all of which is traceable.

If you were familiar with previous state of the art tracing and debugging tools, then DTrace has just raised the bar. Previously, truss (or strace) by default prints some of the lines in violet, apptrace and truss -u the lines in green, truss -ua.out the lines in blue, software debuggers the lines in blue and green, and TNF could print a small nuber of the red lines. DTrace sees all.

DTrace can also print user level instructions, which I could have printed above in orange. If I had, the output would have been be many times longer. Perhaps more useful would be to print the higher level kernel events which DTrace can observe, which can instrument events within functions. Other possible enhancements would be to print function entry and return arguments, and to walk user and kernel data structures to print members of interest.

The colortrace.d script which generated the above output is as follows:

#!/usr/sbin/dtrace -s
/\*
 \* colortrace.d - Trace function, syscall and kernel events, in color.
 \*                Written for DTrace (Solaris 10 3/05).
 \*
 \* 26-Jan-2005, ver 1.00
 \*
 \* USAGE:       colortrace.d -p PID
 \*
 \* This script demonstrates some key DTrace features: the capability
 \* to observe function calls across the entire software stack, and
 \* to measure timestamps for any event. colortrace.d shows the function
 \* flow of a process, along with delta times between lines of output.
 \*
 \* NOTES: Run this on something small, such as a shell. If you are on
 \* a multi-CPU server, pbind the process to a single CPU else the output
 \* may by shuffled.
 \*/

#pragma D option quiet
#pragma D option switchrate=10

/\* see "man -s5 dtterm" for a color list \*/
inline string RED       = "\\033[31;1m";
inline string BLUE      = "\\033[34;1m";
inline string GREEN     = "\\033[32;1m";
inline string VIOLET    = "\\033[35;1m";
inline string OFF       = "\\033[0m";

dtrace:::BEGIN
{
        last = timestamp;
        printf("%-12s %16s    %s\\n", "DELTA(ns)", "TYPE", "FUNCTION");
}

syscall:::entry
/pid == $target/
{
        depth++;
        printf("%-12d %16s %s%\*s -> %s%s\\n", timestamp - last, probeprov,
            VIOLET, depth, " ", probefunc, OFF);
        self->insyscall = 1;
        last = timestamp;
}

syscall:::return
/pid == $target/
{
        printf("%-12d %16s %s%\*s <- %s%s\\n", timestamp - last, probeprov,
            VIOLET, depth, " ", probefunc, OFF);
        depth = depth > 0 ? --depth : 0;
        self->insyscall = 0;
        last = timestamp;
}

fbt:::entry
/self->insyscall/
{
        depth++;
        printf("%-12d %16s %s%\*s -> %s%s\\n", timestamp - last, "kernel",
            RED, depth, " ", probefunc, OFF);
        last = timestamp;
}

fbt:::return
/self->insyscall/
{
        printf("%-12d %16s %s%\*s <- %s%s\\n", timestamp - last, "kernel",
            RED, depth, " ", probefunc, OFF);
        depth = depth > 0 ? --depth : 0;
        last = timestamp;
}

pid$target:a.out::entry
{
        depth++;
        printf("%-12d %16s %s%\*s -> %s%s\\n", timestamp - last, probemod,
            BLUE, depth, " ", probefunc, OFF);
        last = timestamp;
}

pid$target:a.out::return
{
        printf("%-12d %16s %s%\*s <- %s%s\\n", timestamp - last, probemod,
            BLUE, depth, " ", probefunc, OFF);
        depth = depth > 0 ? --depth : 0;
        last = timestamp;
}

pid$target:lib\*::entry
{
        depth++;
        printf("%-12d %16s %s%\*s -> %s%s\\n", timestamp - last, probemod,
            GREEN, depth, " ", probefunc, OFF);
        last = timestamp;
}

pid$target:lib\*::return
{
        printf("%-12d %16s %s%\*s <- %s%s\\n", timestamp - last, probemod,
            GREEN, depth, " ", probefunc, OFF);
        depth = depth > 0 ? --depth : 0;
        last = timestamp;
}

Once you pick up the syntax of DTrace programming (which is straightforward and based on C), you'll see that this is a trivial script. The version I demonstrated took minutes to write; I've spent a little more time on this version to make the output neater. A great place to start learning the syntax is the DTrace Guide.

To capture output for this blog entry, I changed the color escape sequences to print HTML font tags. Since the entire output of DTrace can be programmed, we could generate XML if needed.

The times printed in the left column are delta times from the previous event - which will help locate the source of latencys. It can take a little thought to comprehend what they really mean - a large value does not always mean that the named function to the right was slow. If needed, how these time measurements are taken can be customized by modifying the script.

While colortrace.d serves its purpose as a demonstration of DTrace observability, it doesn't actually represent a common use of DTrace. In practice, we use the filtering features of DTrace to only trace events of interest, and the aggregation features of DTrace to summarize data in useful ways.

Friday Nov 03, 2006

DTracing vim latency

I've just decreased my vim start up time by 340x, which makes for an interesting example of using DTrace:

I noticed that the vim text editor was taking longer and longer to fire up, and it was now taking up to 14 seconds for vim to initialise. Something is definitely wrong - and there should be some big game for DTrace to hunt down.

Syscall analysis is a good starting point, and so I run procsystime from the DTraceToolkit to show syscall timing information and counts:

# /opt/DTT/procsystime -a -n vim
Hit Ctrl-C to stop sampling...
\^C

Elapsed Times for processes vim,

         SYSCALL          TIME (ns)
     sigaltstack               1866
      sigpending               2046
[...]
           close           96958757
         pollsys         1694151254
       nanosleep        12098745228
          TOTAL:        14219906941

CPU Times for processes vim,

         SYSCALL          TIME (ns)
     sigaltstack                382
      sigpending                420
[...]
         connect            2644571
           xstat            3809476
            open            9690091
          TOTAL:           26863724

Syscall Counts for processes vim,

         SYSCALL              COUNT
          access                  1
           chmod                  1
[...]
            mmap                 67
             brk                188
           xstat                213
          TOTAL:               1041

This output has been cut to fit. Reading from bottom up:

  • Syscall Counts show nothing obvious - 1041 syscalls isn't a lot; what is more interesting is their elapsed and on-CPU times -- for example, those 213 xstat()s may have blocked on a slow NFS share.
  • CPU Times in total are 26 ms - so they don't explain our long start up times.
  • Elapsed Times show a total of 14 seconds, 12 seconds of which are during nanosleep. This corresponds to the long delay experienced as vim loaded.

Next is to examine why vim is calling nanosleep. We can start by pulling out user stack traces when nanosleep is called, either from a DTrace one-liner or using dtruss from the DTraceToolkit:

# dtrace -n 'syscall::nanosleep:entry /execname == "vim"/ { @[ustack()] = count(); }'
dtrace: description 'syscall::nanosleep:entry ' matched 1 probe
\^C


              libc.so.1`__nanosleep+0x7
              libX11.so.4`_X11TransConnectDisplay+0x6b8
              libX11.so.4`XOpenDisplay+0xbd
              libXt.so.4`XtOpenDisplay+0xa0
              vim`setup_term_clip+0x233
              vim`main+0xb5f
              vim`_start+0x80
               12

# /opt/DTT/dtruss -st nanosleep -n vim
PID/LWP   SYSCALL(args)                  = return
849553/1:  nanosleep(0x8046C50, 0x8046C58, 0xD9213FA8)           = 0 0

              libc.so.1`__nanosleep+0x7
              libX11.so.4`_X11TransConnectDisplay+0x6b8
              libX11.so.4`XOpenDisplay+0xbd
              libXt.so.4`XtOpenDisplay+0xa0
              vim`setup_term_clip+0x233
              vim`main+0xb5f
              vim`_start+0x80
[...]

This shows how nanosleep was being called by vim. It was being called from libX11 in functions such as _X11TransConnectDisplay() -- huh? I'm using vim in text mode, why would it be talking to X11?

Well, if I have an X11 issue, let me start by checking the DISPLAY variable:

$ echo $DISPLAY
:11.0

Ahh - that doesn't seem right. Maybe I'm connected via ssh, and we are doing X11 port forwarding?:

$ ptree $$
369710 screen
  369715 /bin/bash
    851597 ptree 369715

Rather than showing sshd in our ancestory, this has shown screen. The screen software provides text based multiple session recovery. When I initialised screen, I did so via an ssh session; now I'm connecting via the console, and that ssh session is long gone - as is the DISPLAY :11.0.

Setting DISPLAY to :0.0 fixes the issue, and now procsystime shows the following syscall elapsed times:

# /opt/DTT/procsystime -Te -n vim
Hit Ctrl-C to stop sampling...
\^C

Elapsed Times for processes vim,

         SYSCALL          TIME (ns)
     sigaltstack               1980
      systeminfo               2221
[...]
          access            4190720
           xstat            7960249
            open           16273686
          TOTAL:           42198388

A total elapsed time of 42 ms is 340 times faster than it was previously!

... I didn't answer why vim was talking to X11 in the first place: in the previous stack trace, the vim function that calls X is setup_term_clip(). There are numerous things we can do with DTrace to learn more about this, but first I'll check the vim source to see if a comment explains what is going on. This is what I found:

   /\* Start using the X clipboard, unless the GUI was started. \*/
[...]
      setup_term_clip()

Ahh - vim was trying to use the X clipboard, which was failing as our DISPLAY variable was corrupt.

Thursday Oct 05, 2006

DTrace TCP Provider at CEC 2006

Yesterday at the CEC 2006 conference, Bryan Cantrill and I gave a demonstration of a prototype DTrace tcp provider, which is one of a suggested collection of DTrace network providers. I've just put together a website to list the demonstrations performed - CEC 2006 DTrace TCP.

Below are a couple of examples from the website. tcpio2.d traces TCP traffic and provides IP address, TCP port, TCP payload size and TCP flag information:

# ./tcpio2.d
           LADDR:PORT                 RADDR:PORT   BYTES FLAGS
   192.168.1.185:79     <-    192.168.1.109:56145      0 (SYN)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (SYN|ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      2 (PUSH|ACK)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     ->    192.168.1.109:56145    126 (PUSH|ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (FIN|ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (FIN|ACK)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (ACK)
\^C

And tcpaccept2.d aggregates inbound connections by hostname and service name,

# ./tcpaccept2.d
dtrace: script './tcpaccept2.d' matched 1 probe
Tracing... Hit Ctrl-C to end.
\^C
   HOSTNAME                   PORT        COUNT
   marlin6                    finger          1
   bass                       login           2
   marlin                     ssh             2
   marlin                     telnet          2
   marlin                     finger          3

The source to these scripts and numerous other demonstrations are on the website. Check DTrace discuss for discussions about this prototype provider.

Monday Sep 18, 2006

DTrace meets JavaScript

I've recently been writing a DTrace provider for JavaScript, using the user statically defined tracing interface (USDT). The codename for this project is Helper Monkey.

This was achieved by adding USDT probes to the Spider Monkey JavaScript Engine, which is used by the Mozilla family of browsers including Firefox. Since Solaris Nevada ships with Firefox, the DTrace JavaScript provider can be used by running a script that preloads a DTrace enhanced version of libmozjs.so, and runs the shipped version of Firefox.

So far I've added several probes to observe JavaScript function calls, object creation and destruction (garbage collection), and script execution. I've also written several DTrace scripts to report probe activity. While I have more probes planned, this provider has already reached a point where it may be useful to people, and so we are making the first (very alpha) version available.

DTrace meeting JavaScript may sound like an odd combination, especially if you grew impressions of JavaScript when it was first used to make buttons change colour, or to make temperature converters. But now JavaScript, as part of AJAX, is driving new and serious Web 2.0 applications. Google Maps is one of what may be many new (and very cool) applications to embrace these technologies.

Ways already exist to fetch debug information from JavaScript, such as running a debug build of the browser. However with DTrace people can run a normal browser build, with USDT probes, and dynamically enable debug information as needed. Without stopping or starting the browser. The following demonstrations should convey some of the advantages of DTracing JavaScript.


JavaScript: clock.html

The following is the source to clock.html, a simple website with some JavaScript code to print a clock with the format HH:MM:SS.

<HTML>
<HEAD>
<TITLE>Clock, JavaScript</TITLE>
<SCRIPT type="text/javascript">
function padZero(i)
{
        if (i < 10) return "0" + i
        return i
}

function startTime()
{
        var now = new Date
        var time = padZero(now.getHours()) + ":" +
            padZero(now.getMinutes()) + ":" +
            padZero(now.getSeconds());
        document.getElementById('clock').innerHTML = "time: " + time + "<br>"
        var timeout = setTimeout('startTime()', 1000)
}
</SCRIPT>
</HEAD>
<BODY onload="startTime()">
<DIV id="clock"></DIV>
</BODY>
</HTML>

While this clock is running in my browser, the following DTrace scripts are used for analysis.


DTrace/JavaScript: Function Calls

The js_funccalls.d DTrace script counts JavaScript function calls as it is running. Here I run it for 9 seconds then hit Ctrl-C,

# ./js_funccalls.d
Tracing... Hit Ctrl-C to end.
\^C
 FILE                             FUNC                                    CALLS
 clock.html                       getElementById                              9
 clock.html                       getHours                                    9
 clock.html                       getMinutes                                  9
 clock.html                       getSeconds                                  9
 clock.html                       setTimeout                                  9
 clock.html                       startTime                                   9
 clock.html                       padZero                                    27

The FUNC column lists JavaScript functions, and the FILE column is where they were called from (their context). DTrace captured 9 updates to the clock, with most of the functions called 9 times. padZero() was called 27 times, as it is called three times during each clock update.

The source to js_funccalls.d is straightforward,

#!/usr/sbin/dtrace -Zs
/\*
 \* js_funccalls.d - measure JavaScript function calls using DTrace.
 \*                  Written for the Solaris Nevada DTrace JavaScript provider.
 \*
 \* 12-Sep-2006, ver 0.90
 \*
 \* USAGE: js_calls.d            # hit Ctrl-C to end
 \*
 \* FIELDS:
 \*              FILE            Filename that contained the function
 \*              FUNC            JavaScript function name
 \*              CALLS           Function calls during this sample
 \*/

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\\n");
}

javascript\*:::function-entry
{
        @funcs[basename(copyinstr(arg0)), copyinstr(arg2)] = count();
}

dtrace:::END
{
        printf(" %-32s %-36s %8s\\n", "FILE", "FUNC", "CALLS");
        printa(" %-32s %-36s %@8d\\n", @funcs);
}

the probe javascript\*:::function-entry fires when a JavaScript function begins to execute. There is another probe, javascript\*:::function-return for when the function completes. The probe arguments are still being designed, for now arg0 is a the file pathname, arg1 is a classname (if available), and arg2 is the function name.


DTrace/JavaScript: Function Flow

Now we will use both the entry and return probes to trace the flow of the JavaScript, with indentation as we enter and exit functions. The js_funcflow.d script does this.

# ./js_funcflow.d
C TIME                  FILE                     -- FUNC
1 2006 Sep 18 12:03:28  clock.html               -> startTime
1 2006 Sep 18 12:03:28  clock.html                 -> getHours
1 2006 Sep 18 12:03:28  clock.html                 <- getHours
1 2006 Sep 18 12:03:28  clock.html                 -> padZero
1 2006 Sep 18 12:03:28  clock.html                 <- padZero
1 2006 Sep 18 12:03:28  clock.html                 -> getMinutes
1 2006 Sep 18 12:03:28  clock.html                 <- getMinutes
1 2006 Sep 18 12:03:28  clock.html                 -> padZero
1 2006 Sep 18 12:03:28  clock.html                 <- padZero
1 2006 Sep 18 12:03:28  clock.html                 -> getSeconds
1 2006 Sep 18 12:03:28  clock.html                 <- getSeconds
1 2006 Sep 18 12:03:28  clock.html                 -> padZero
1 2006 Sep 18 12:03:28  clock.html                 <- padZero
1 2006 Sep 18 12:03:28  clock.html                 -> getElementById
1 2006 Sep 18 12:03:28  clock.html                 <- getElementById
1 2006 Sep 18 12:03:28  clock.html                 -> setTimeout
1 2006 Sep 18 12:03:28  clock.html                 <- setTimeout
1 2006 Sep 18 12:03:28  clock.html               <- startTime
...

The above output is repeated every second, as the clock is updated. We can see that each of the functions is called within startTime, which was called by the timeout. The arrows denote as we enter (->) or leave (<-) functions.

While this output makes for a great demo, it may not be that practical for larger JavaScript applications, where the output can run into thosands of lines. The output can also be shuffled on multi-CPU clients if the JavaScript engine skips between CPUs; for this reason the first column has been printed, C for CPU-id, if this changes then the output may be out of order.

DTrace/JavaScript: Object Creation

The following DTrace script tallys newly created objects,

# ./js_objnew.d
Tracing... Hit Ctrl-C to end.
\^C
 FILE                     LINENO  CLASS                                  COUNT
 clock.html               13      Date                                       5

I ran the script for five seconds then hit Ctrl-C. The output shows that 5 Date objects were created, caused by line 13 of clock.html. If you count to line 13 on clock.html listed earlier, you will see that it was "var now = new Date". Great!

DTrace/JavaScript: Object Leakage

Apart from trace object creation, we can also trace object destruction from the garbage collector. The js_objgcx.d script does this, providing tallys of the objects in memory and their origin, every five seconds.

# ./js_objgcx.d
Tracing... Hit Ctrl-C to end.

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                        5

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                       10

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                       15

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                       20
...

As objects are created, their tally is increased; as they are garbage collected, their tally is decreased.

Every five seconds we find that there is an additional 5 Date objects in memory, and another summary is printed out. At the end of this output there were 20 Date objects in memory. If I were to leave this running for a long time, eventually the garbage collector would kick in and trim the number down (to, say, 1000) - which we could see happen in our output. For more complex JavaScript which creates objects faster, this will naturally happen sooner.

If our script created objects that were not garbage collected, their count would continue increasing, and the browser would consume more and more RAM.

DTrace/JavaScript: Function Time

Since we can trace function entry and return, we can also measure the elapsed time for functions to execute. This can be useful to identify if there is a slow function whose code could be improved. The js_functime.d script provides several experimental metrics for function calls,

# ./js_functime.d 
Tracing... Hit Ctrl-C to end.
\^C
ELAPSED TIME DISTRIBUTION,

  getSeconds                                        
           value  ------------- Distribution ------------- count    
            4096 |                                         0        
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           16384 |                                         0        

  padZero                                           
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@               23       
            8192 |@@@@@@@@@@@@@@                           13       
           16384 |                                         0        

  getElementById                                    
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           65536 |                                         0        

  getHours                                          
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           65536 |                                         0        

  getMinutes                                        
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           65536 |                                         0        

  setTimeout                                        
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@                               3        
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           9        
          131072 |                                         0        

  startTime                                         
           value  ------------- Distribution ------------- count    
          524288 |                                         0        
         1048576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
         2097152 |                                         0        

___ OVERLAP TIMES: ___                                     ______ ELAPSED _____
FILE                     FUNCTION                 COUNT    AVG(us)      SUM(us)
clock.html               getSeconds                  12          8          104
clock.html               padZero                     36          7          275
clock.html               getMinutes                  12         45          548
clock.html               getElementById              12         53          640
clock.html               getHours                    12         59          714
clock.html               setTimeout                  12         67          807
clock.html               startTime                   12       1219        14637

These function times are overlapping, such that the time fore startTime() contains both the startTime() instructions, and any sub functions called - such as padZero(), etc.

What I found interesting from the output was that getHours() (average 59 us) was much slower than getSeconds() (average 8 us); wouldn't they have equivalent times? Some digging with DTrace shows what other userland library functions occured during getHours() and getSeconds():

getHours()

DTracing PID: 278562, "firefox-bin"
CPU FUNCTION                                 
  0  -> js_Interpret                            getHours                         
  0    -> js_Invoke                             libmozjs.so.1                    
  0      -> JS_GetPrivate                       libmozjs.so.1                    
  0      <- JS_GetPrivate                       libmozjs.so.1                    
  0      -> js_ComputeThis                      libmozjs.so.1                    
  0      <- js_ComputeThis                      libmozjs.so.1                    
  0      -> date_getHours                       libmozjs.so.1                    
  0        -> date_getProlog                    libmozjs.so.1                    
  0          -> JS_InstanceOf                   libmozjs.so.1                    
  0          <- JS_InstanceOf                   libmozjs.so.1                    
  0        <- date_getProlog                    libmozjs.so.1                    
  0        -> DaylightSavingTA                  libmozjs.so.1                    
  0  -> PRMJ_DSTOffset                          libmozjs.so.1                    
  0    -> jsll_udivmod                          libmozjs.so.1                    
  0      -> CountLeadingZeros                   libmozjs.so.1                    
  0      <- CountLeadingZeros                   libmozjs.so.1                    
  0      -> norm_udivmod32                      libmozjs.so.1                    
  0      <- norm_udivmod32                      libmozjs.so.1                    
  0    <- jsll_udivmod                          libmozjs.so.1                    
  0    -> PRMJ_basetime                         libmozjs.so.1                    
  0      -> PRMJ_ToExtendedTime                 libmozjs.so.1                    
  0        -> PRMJ_LocalGMTDifference           libmozjs.so.1                    
  0          -> memset                          libc.so.1                        
  0          <- memset                          libc.so.1                        
  0          -> mktime                          libc.so.1                        
  0            -> ___errno                      libc.so.1                        
  0            <- ___errno                      libc.so.1                        
  0            -> getsystemTZ                   libc.so.1                        
  0              -> assert_no_libc_locks_held   libc.so.1                        
  0              <- assert_no_libc_locks_held   libc.so.1                        
  0              -> getenv                      libc.so.1                        
  0                -> findenv                   libc.so.1                  
...
[175 lines in total]

getSeconds()

DTracing PID: 278562, "firefox-bin"
CPU FUNCTION                                 
  0  -> js_Interpret                            getSeconds                       
  0    -> js_Invoke                             libmozjs.so.1                    
  0      -> JS_GetPrivate                       libmozjs.so.1                    
  0  <- JS_GetPrivate                           libmozjs.so.1                    
  0  -> js_ComputeThis                          libmozjs.so.1                    
  0  <- js_ComputeThis                          libmozjs.so.1                    
  0  -> date_getUTCSeconds                      libmozjs.so.1                    
  0    -> date_getProlog                        libmozjs.so.1                    
  0      -> JS_InstanceOf                       libmozjs.so.1                    
  0      <- JS_InstanceOf                       libmozjs.so.1                    
  0    <- date_getProlog                        libmozjs.so.1                    
  0    -> SecFromTime                           libmozjs.so.1                    
  0      -> floor                               libm.so.2                        
  0      <- floor                               libm.so.2                        
  0      -> fmod                                libm.so.2                        
  0      <- fmod                                libm.so.2                        
  0    <- SecFromTime                           libmozjs.so.1                    
  0    -> js_NewNumberValue                     libmozjs.so.1                    
  0    <- js_NewNumberValue                     libmozjs.so.1                    
  0  <- date_getUTCSeconds                      libmozjs.so.1                    
  0  -> PR_AtomicIncrement                      libnspr4.so                      
  0  <- PR_AtomicIncrement                      libnspr4.so                      
  0  -> memset                                  libc.so.1                        
  0  <- memset                                  libc.so.1                        
  0  <- js_Invoke                               libmozjs.so.1                    
  0  -> JS_GetPrivate                           libmozjs.so.1                    
  0  <- JS_GetPrivate                           libmozjs.so.1                    
  0  -> JS_GetStringBytes                       libmozjs.so.1                    
  0    -> js_GetStringBytes                     libmozjs.so.1                    
  0      -> PR_Lock                             libnspr4.so                      
  0        -> mutex_lock                        libc.so.1             
...
[59 lines in total]

Tracing getSeconds() produced 59 lines of output (from about 28 function calls), however getHours() produced 175 lines of output (from about 86 function calls). The output has been trimmed, but enough has been included to see the difference - getHours() looks up the timezone, getSeconds() does not!

Probes and Arguments so far

  • javascript\*:::function-entry (filename, classname, funcname)
  • javascript\*:::function-return (filename, classname, funcname)
  • javascript\*:::object-create-start (filename, classname)
  • javascript\*:::object-create (filename, classname, \*object, lineno)
  • javascript\*:::object-create-done (filename, classname)
  • javascript\*:::object-finalize (NULL, classname, \*object)
  • javascript\*:::execute-start (filename, lineno)
  • javascript\*:::execute-done (filename, lineno)

More probes will be added, and the arguments are likely to change (improve).

What's next

The DTrace scripts, the modified Spider Monkey source (for Firefox 1.5.0.4), and a precompiled Firefox 1.5.0.4 libmozjs.so for Solaris Nevada x86, are available in this tar file. This is very much an alpha release for the super-keen (you'll need the latest Solaris Nevada at least, and compilation skills); it even contains code comment tags for debugging that will be dropped ("BDG"). I'll be spending time finding out how best deliver this, such as making it easier for people to get running, and asking the Mozilla folks for help on how best to distrubite it.

About

Brendan Gregg, Fishworks engineer

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today