DTrace and PHP, demonstrated

If you were at my presentation at OSCON yesterday, I apologize for the brisk pace -- there was a ton of material to cover, and forty-five minutes isn't much time. Now that I've got a little more time, I'd like to get into the details of the PHP demonstration that I did during the presentation. Here is the (very) simple PHP program that I was using:
<?php
function blastoff()
{
        echo "Blastoff!\\n\\n";
}

function one()
{
        echo "One...\\n";
        blastoff();
}

function two()
{
        echo "Two...\\n";
        one();
}

function three()
{
        echo "Three...\\n";
        two();
}

function launch()
{
        three();
}

while (1)
{
        launch();
        sleep(1);
}
?>
Running this in a window just repeats this output:
% php ./blastoff.php
php ./blastoff.php
Content-type: text/html
X-Powered-By: PHP/5.1.0-dev

Three...
Two...
One...
Blastoff!

Three...
Two...
One...
Blastoff!
...
Now, because I have specified Wez Furlong's new dtrace.so as an extension in my php.ini file, when I run the above, two probes show up:
# dtrace -l | grep php
  4   php806       dtrace.so          php_dtrace_execute function-return
  5   php806       dtrace.so          php_dtrace_execute function-entry
The function-entry and function-return probes have three arguments:
  • arg0 is the name of the function (a pointer to a string within PHP)

  • arg1 is the name of the file containing the call site (also a pointer to a string within PHP)

  • arg2 is the line number of the call site a pointer to a string within PHP)

So for starters, let's just get an idea of which functions are being called in my PHP program:

# dtrace -n function-entry'{printf("called %s() in %s at line %d\\n", \\
    copyinstr(arg0), copyinstr(arg1), arg2)}' -q
called launch() in /export/php/blastoff.php at line 32
called three() in /export/php/blastoff.php at line 27
called two() in /export/php/blastoff.php at line 22
called one() in /export/php/blastoff.php at line 16
called blastoff() in /export/php/blastoff.php at line 10
called launch() in /export/php/blastoff.php at line 32
called three() in /export/php/blastoff.php at line 27
called two() in /export/php/blastoff.php at line 22
called one() in /export/php/blastoff.php at line 16
called blastoff() in /export/php/blastoff.php at line 10
\^C
If you're new to DTrace, note that you have the power to trace an arbitrary D expression in your action. For example, instead of printing out the file and line number of the call site, we could trace the walltimestamp:

# dtrace -n function-entry'{printf("called %s() at %Y\\n", \\
    copyinstr(arg0), walltimestamp)}' -q
called launch() at 2005 Aug  5 08:08:24
called three() at 2005 Aug  5 08:08:24
called two() at 2005 Aug  5 08:08:24
called one() at 2005 Aug  5 08:08:24
called blastoff() at 2005 Aug  5 08:08:24
called launch() at 2005 Aug  5 08:08:25
called three() at 2005 Aug  5 08:08:25
called two() at 2005 Aug  5 08:08:25
called one() at 2005 Aug  5 08:08:25
called blastoff() at 2005 Aug  5 08:08:25
\^C
Note, too, that (unless I direct it not to) this will aggregate across PHP instances. So, for example:

#!/usr/sbin/dtrace -s

#pragma D option quiet

php\*:::function-entry
{
        @bypid[pid] = count();
        @byfunc[copyinstr(arg0)] = count();
        @bypidandfunc[pid, copyinstr(arg0)] = count();
}

END
{
        printf("By pid:\\n\\n");
        printa("  %-40d %@d\\n", @bypid);

        printf("\\nBy function:\\n\\n");
        printa("  %-40s %@d\\n", @byfunc);

        printf("\\nBy pid and function:\\n\\n");
        printa("  %-9d %-30s %@d\\n", @bypidandfunc);
}
If I run three instances of blastoff.php and then the above script, I see the following after I \^C:

By pid:

  806                                      30
  875                                      30
  889                                      30

By function:

  launch                                   18
  three                                    18
  two                                      18
  one                                      18
  blastoff                                 18

By pid and function:

  875       two                            6
  875       three                          6
  875       launch                         6
  875       blastoff                       6
  889       blastoff                       6
  806       launch                         6
  889       one                            6
  806       three                          6
  889       two                            6
  806       two                            6
  889       three                          6
  806       one                            6
  889       launch                         6
  806       blastoff                       6
  875       one                            6
The point is that DTrace allows you to aggregate across PHP instances, allowing you to understand not just what a particular PHP is doing, but what PHP is doing more generally.

If we're interested in better understanding the code flow in a particular PHP instance, we can write a script that uses a thread-local variable to follow the code flow:

#pragma D option quiet

self int indent;

php$target:::function-entry
/copyinstr(arg0) == "launch"/
{
        self->follow = 1;
}

php$target:::function-entry
/self->follow/
{
        printf("%\*s", self->indent, "");
        printf("-> %s\\n", copyinstr(arg0));
        self->indent += 2;
}

php$target:::function-return
/self->follow/
{
        self->indent -= 2;
        printf("%\*s", self->indent, "");
        printf("<- %s\\n", copyinstr(arg0));
}

php$target:::function-return
/copyinstr(arg0) == "launch"/
{
        self->follow = 0;
        exit(0);
}
Running the above requires giving the script a particular PHP process; assuming the above script is named blast.d, it might look like this:

# dtrace -s ./blast.d -p `pgrep -n php`
-> launch
  -> three
    -> two
      -> one
        -> blastoff
        <- blastoff
      <- one
    <- two
  <- three
<- launch

This shows us all of the PHP functions that were called from launch(), but it doesn't tell the full story -- we know that our PHP functions are calling into native code to do some of their work. To add this to the mix, we'll write a slightly longer script:

#pragma D option quiet

self int indent;

php$target:::function-entry
/copyinstr(arg0) == "launch"/
{
        self->follow = 1;
}

php$target:::function-entry
/self->follow/
{
        printf("%\*s", self->indent, "");
        printf("-> %-20s %\*sphp\\n", copyinstr(arg0),
            46 - self->indent, "");
        self->indent += 2;
}

php$target:::function-return
/self->follow/
{
        self->indent -= 2;
        printf("%\*s", self->indent, "");
        printf("<- %-20s %\*sphp\\n", copyinstr(arg0),
            46 - self->indent, "");
}

pid$target:libc.so.1::entry
/self->follow/
{
        printf("%\*s", self->indent, "");
        printf("-> %-20s %\*s%s\\n", probefunc, 46 - self->indent, "",
            probemod);
        self->indent += 2;
}

pid$target:libc.so.1::return
/self->follow/
{
        self->indent -= 2;
        printf("%\*s", self->indent, "");
        printf("<- %-20s %\*s%s\\n", probefunc, 46 - self->indent, "",
            probemod);
}

php$target:::function-return
/copyinstr(arg0) == "launch"/
{
        self->follow = 0;
        exit(0);
}

Running the above yields the following output:

-> launch                                          php
  -> three                                         php
    -> write                                       libc.so.1
      -> _save_nv_regs                             libc.so.1
      <- _save_nv_regs                             libc.so.1
      -> _write                                    libc.so.1
      <- _write                                    libc.so.1
    <- write                                       libc.so.1
    -> two                                         php
      -> write                                     libc.so.1
        -> _save_nv_regs                           libc.so.1
        <- _save_nv_regs                           libc.so.1
        -> _write                                  libc.so.1
        <- _write                                  libc.so.1
      <- write                                     libc.so.1
      -> one                                       php
        -> write                                   libc.so.1
          -> _save_nv_regs                         libc.so.1
          <- _save_nv_regs                         libc.so.1
          -> _write                                libc.so.1
          <- _write                                libc.so.1
        <- write                                   libc.so.1
        -> blastoff                                php
          -> write                                 libc.so.1
            -> _save_nv_regs                       libc.so.1
            <- _save_nv_regs                       libc.so.1
            -> _write                              libc.so.1
            <- _write                              libc.so.1
          <- write                                 libc.so.1
        <- blastoff                                php
      <- one                                       php
    <- two                                         php
  <- three                                         php
<- launch                                          php

This shows us what's really going on -- or at least more of what's really going on: our PHP functions are inducing work from libc. This is much more information, but of course, we're still only seeing what's going on at user-level. To add the kernel into the mix, add the following to the script:

fbt:genunix::entry
/self->follow/
{
        printf("%\*s", self->indent, "");
        printf("-> %-20s %\*skernel\\n", probefunc, 46 - self->indent, "");
        self->indent += 2;
}

fbt:genunix::return
/self->follow/
{
        self->indent -= 2;
        printf("%\*s", self->indent, "");
        printf("<- %-20s %\*skernel\\n", probefunc, 46 - self->indent, "");
}

Running this new script generates much more output:

-> launch                                          php
  -> three                                         php
    -> write                                       libc.so.1
      -> _save_nv_regs                             libc.so.1
      <- _save_nv_regs                             libc.so.1
      -> _write                                    libc.so.1
        -> syscall_mstate                          kernel
          -> gethrtime_unscaled                    kernel
          <- gethrtime_unscaled                    kernel
        <- syscall_mstate                          kernel
        -> syscall_entry                           kernel
          -> pre_syscall                           kernel
          <- pre_syscall                           kernel
          -> copyin_args32                         kernel
            -> copyin_nowatch                      kernel
              -> watch_disable_addr                kernel
              <- watch_disable_addr                kernel
            <- copyin_nowatch                      kernel
          <- copyin_args32                         kernel
        <- syscall_entry                           kernel
        -> write32                                 kernel
          -> write                                 kernel
            -> getf                                kernel
              -> set_active_fd                     kernel
              <- set_active_fd                     kernel
            <- getf                                kernel
            ...
            -> releasef                            kernel
              -> cv_broadcast                      kernel
              <- cv_broadcast                      kernel
            <- releasef                            kernel
          <- write                                 kernel
        <- write32                                 kernel
        ...
        -> syscall_mstate                          kernel
          -> gethrtime_unscaled                    kernel
          <- gethrtime_unscaled                    kernel
        <- syscall_mstate                          kernel
      <- _write                                    libc.so.1
    <- write                                       libc.so.1
    -> two                                         php
      -> write                                     libc.so.1
        -> _save_nv_regs                           libc.so.1
        <- _save_nv_regs                           libc.so.1
        -> _write                                  libc.so.1
          -> syscall_mstate                        kernel
            -> gethrtime_unscaled                  kernel
            <- gethrtime_unscaled                  kernel
          <- syscall_mstate                        kernel
          ...

Now we're seeing everything that our PHP program is doing, from the PHP through the native code, into the kernel and back. So using DTrace on PHP has a number of unique advantages: you can look at the entire system, and you can look at the entire stack -- and you can do it all in production. Thanks again to Wez for putting together the PHP provider -- and if you're a PHP developer, bon appetit!

Comments:

Great presentation Bryan. I had no opinion on OpenSolaris prior to this presentation, but had heard lots about dtrace. After your presentation, my coworkers and I were super stoked about opensolaris/dtrace. I can't wait to install it, get our product (Sophos PureMessage) building on it and run dtrace on it! As our project is largely perl based, we'd probably be interested in any effort to add probes to perl. Great job! Your passion and excitement about dtrace were well evident from your presentation! It was one of my highlights from OSCON. Luke

Posted by Luke Closs on August 05, 2005 at 04:45 AM PDT #

Anyway Sun could add PHP to the Solaris 10 Apache2 package? Redhat ships with PHP, but for some reason Sun doesn't. :( - Ryan

Posted by Matty on August 05, 2005 at 05:50 AM PDT #

How big a deal is this?

The reason I ask is that after reading your post I started thinking about doing this sort of thing for Perl. Naturally, I started poking around the web to see if anyone else was already doing this.

That lead me to this interview at opensolaris.org. In that interview you say:

DTrace can't currently instrument languages that have dynamic program text: Java, Python, Perl, PHP, etc. This is not an easy problem to solve: these languages have not been designed or implemented with dynamic instrumentation in mind, and the techniques required to instrument them are often very specific to a particular language and its run-time environment. Solving this problem still lies in the indefinite future for us--the next year or so will be spent evolving our existing functionality--but we very much intend to solve it.

and

Both. Instrumenting JITted code is difficult because the virtual machine needs to know what you're up to at some level--even if only enough to know to leave your instrumentation alone. Once the code is instrumented, the granularity of operations makes it difficult to tie that instrumentation back to something meaningful. To give you a simple, concrete example: Perl 5 doesn't associate line numbers with parse tree nodes unless it is explicitly started in an explicit debugging mode. By discarding that information, Perl makes it very difficult to [connect] Perl-induced system activity (I/O, CPU utilization, network activity, etc.) to the specific body of Perl that is inducing it. Programs written in C and C++ generally discard their debugging information too, but these languages are so much closer to the operating system and the underlying microprocessor that we can generally still draw meaningful inferences. Not so for Perl, Java, Python, PHP, etc.

That was less than two months ago.

I confess that I haven't, yet, been able to use DTrace in the field. My exposure to it has been through information that I've read on the web, and sitting through a couple of Sun technical demonstrations about it in the UK. Those demos played up the ease with which it's possible to write a DTrace provider, and walked through creating one. Without wishing to denigrate Wez's work, I got the impression that simply instrumenting function entry points like this with a dynamic language like PHP or Perl is no great shakes.

Have a got the wrong end of the stick?

Posted by Nik Clayton on August 05, 2005 at 10:14 AM PDT #

Perl is possible, at least to the same level as the PHP stuff Wez has done, i.e. subroutine name, source line and file, see http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2005-08/msg00140.html. Tracing individual opcode dispatch is also possible, if required.

Posted by Alan Burlison on August 05, 2005 at 11:22 AM PDT #

Nik, My comments a couple of months ago reflect the difficulty of truly dynamic instrumentation. That is, doing to VM-based languages what we can do with user-level C/C++. What we have discovered since is that imperfect solutions can be such a huge lurch forward for these environments that they are very much worth the time and trouble. The PHP solution, for example, doesn't instrument every opcode. (And from talking to Marcus Boerger and Wez after my presentation, this is as hard as I claimed it to be.) But it's such a huge leap, that it's an effective solution -- and it may even be "good enough" to obviate the need to solve the (much) harder problem of true dynamic instrumentation.

Posted by Bryan Cantrill on August 05, 2005 at 11:43 AM PDT #

[Trackback] Following a link in a comment from Alan Burlinson from Bryan's blog about creating a php provider , it appears that Alan is very well along the path to having a DTrace provider for perl. To read a bit more about this, have a look at the mail...

Posted by Alan Hargreaves' Weblog on August 06, 2005 at 04:35 PM PDT #

Hi Bryan, The PHP Dtrace extension would really help me with a problem I got... I have a segfaulting PHP web application and can't seem to narrow down the problem. So this extension would probably help me alot... However I have some probs getting this thing to work: # pear install dtrace ... the only warning I can see is when linking: \*\*\* Warning: Linking the shared library dtrace.la against the non-libtool \*\*\* objects php.o is not portable! ... other than that the build seems to have gone through fine, even though it shouldn't be more than 10k. Build system: Solaris Express build 20, on x86 Sun JDS CBE 1.2 tools, GCC 3.4.3 Apache 2.0.54 Latest PHP dev sources. The output file gets huge: 204274 16416 -rwxr-xr-x 1 root root 16797924 Oct 11 21:53 /var/tmp/pear-build-defaultuser/install-dtrace-1.0.3/opt/php-STABLE-200510100832-gcc/lib/php/extensions/no-debug-non-zts-20041030/dtrace.so when loading it from php.ini: PHP Warning: PHP Startup: Unable to load dynamic library '/opt/php-STABLE-200510100832-gcc/lib/php/extensions/no-debug-non-zts-20041030/dtrace.so' - ld.so.1: httpd: fatal: relocation error: file /opt/php-STABLE-200510100832-gcc/lib/php/extensions/no-debug-non-zts-20041030/dtrace.so: symbol zend_execute: referenced symbol not found in Unknown on line 0 nm output: [59] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_execute [57] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_execute_internal [60] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_get_executed_filename [76] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_get_executed_lineno Any help would be greatly appreciated!

Posted by Magnus Forsberg on October 11, 2005 at 06:03 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

bmc

Search

Top Tags
Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today