X

DTrace and PHP, demonstrated

Guest Author
If you were at
my presentation at OSCON yesterday,
I apologize for the brisk
pace -- there was a ton of material to cover, and forty-five minutes
isn't much time. Now that I've got a little more time,
I'd like to get into the
details of the PHP demonstration that I did during the presentation.
Here is the (very) simple PHP program that I was using:
<?php
function blastoff()
{
echo "Blastoff!\\n\\n";
}
function one()
{
echo "One...\\n";
blastoff();
}
function two()
{
echo "Two...\\n";
one();
}
function three()
{
echo "Three...\\n";
two();
}
function launch()
{
three();
}
while (1)
{
launch();
sleep(1);
}
?>

Running this in a window just repeats this output:
% php ./blastoff.php
php ./blastoff.php
Content-type: text/html
X-Powered-By: PHP/5.1.0-dev
Three...
Two...
One...
Blastoff!
Three...
Two...
One...
Blastoff!
...

Now, because I have specified
Wez Furlong's
new
dtrace.so
as an extension in my
php.ini
file, when I run the above, two probes show up:
# dtrace -l | grep php
4 php806 dtrace.so php_dtrace_execute function-return
5 php806 dtrace.so php_dtrace_execute function-entry

The function-entry and function-return probes
have three arguments:
  • arg0 is the name of the function (a pointer to a string
    within PHP)

  • arg1 is the name of the file containing the call site (also
    a pointer to a string within PHP)

  • arg2 is the line number of the call site
    a pointer to a string within PHP)


So for starters, let's just get an idea of which functions are being called
in my PHP program:

# dtrace -n function-entry'{printf("called %s() in %s at line %d\\n", \\
copyinstr(arg0), copyinstr(arg1), arg2)}' -q
called launch() in /export/php/blastoff.php at line 32
called three() in /export/php/blastoff.php at line 27
called two() in /export/php/blastoff.php at line 22
called one() in /export/php/blastoff.php at line 16
called blastoff() in /export/php/blastoff.php at line 10
called launch() in /export/php/blastoff.php at line 32
called three() in /export/php/blastoff.php at line 27
called two() in /export/php/blastoff.php at line 22
called one() in /export/php/blastoff.php at line 16
called blastoff() in /export/php/blastoff.php at line 10
\^C

If you're new to DTrace, note that you have the power to trace an
arbitrary D expression in your action. For example, instead of printing
out the file and line number of the call site,
we could trace the walltimestamp:

# dtrace -n function-entry'{printf("called %s() at %Y\\n", \\
copyinstr(arg0), walltimestamp)}' -q
called launch() at 2005 Aug 5 08:08:24
called three() at 2005 Aug 5 08:08:24
called two() at 2005 Aug 5 08:08:24
called one() at 2005 Aug 5 08:08:24
called blastoff() at 2005 Aug 5 08:08:24
called launch() at 2005 Aug 5 08:08:25
called three() at 2005 Aug 5 08:08:25
called two() at 2005 Aug 5 08:08:25
called one() at 2005 Aug 5 08:08:25
called blastoff() at 2005 Aug 5 08:08:25
\^C

Note, too, that (unless I direct it not to) this will aggregate across
PHP instances. So, for example:

#!/usr/sbin/dtrace -s
#pragma D option quiet
php\*:::function-entry
{
@bypid[pid] = count();
@byfunc[copyinstr(arg0)] = count();
@bypidandfunc[pid, copyinstr(arg0)] = count();
}
END
{
printf("By pid:\\n\\n");
printa(" %-40d %@d\\n", @bypid);
printf("\\nBy function:\\n\\n");
printa(" %-40s %@d\\n", @byfunc);
printf("\\nBy pid and function:\\n\\n");
printa(" %-9d %-30s %@d\\n", @bypidandfunc);
}

If I run three instances of blastoff.php and then the above script,
I see the following after I \^C:

By pid:
806 30
875 30
889 30
By function:
launch 18
three 18
two 18
one 18
blastoff 18
By pid and function:
875 two 6
875 three 6
875 launch 6
875 blastoff 6
889 blastoff 6
806 launch 6
889 one 6
806 three 6
889 two 6
806 two 6
889 three 6
806 one 6
889 launch 6
806 blastoff 6
875 one 6

The point is that DTrace allows you to aggregate across PHP
instances
, allowing you to understand not just what a particular
PHP is doing, but what PHP is doing more generally.

If we're interested in better understanding the code flow in a particular
PHP instance, we can write a script that uses a thread-local variable
to follow the code flow:

#pragma D option quiet
self int indent;
php$target:::function-entry
/copyinstr(arg0) == "launch"/
{
self->follow = 1;
}
php$target:::function-entry
/self->follow/
{
printf("%\*s", self->indent, "");
printf("-> %s\\n", copyinstr(arg0));
self->indent += 2;
}
php$target:::function-return
/self->follow/
{
self->indent -= 2;
printf("%\*s", self->indent, "");
printf("<- %s\\n", copyinstr(arg0));
}
php$target:::function-return
/copyinstr(arg0) == "launch"/
{
self->follow = 0;
exit(0);
}

Running the above requires giving the script a particular PHP process;
assuming the above script is named blast.d, it might look like
this:

# dtrace -s ./blast.d -p `pgrep -n php`
-> launch
-> three
-> two
-> one
-> blastoff<- blastoff<- one<- two<- three
<- launch

This shows us all of the PHP functions that were called from
launch(), but it doesn't tell the full story -- we know that
our PHP functions are calling into native code to do some of their work.
To add this to the mix, we'll write a slightly longer script:

#pragma D option quiet
self int indent;
php$target:::function-entry
/copyinstr(arg0) == "launch"/
{
self->follow = 1;
}
php$target:::function-entry
/self->follow/
{
printf("%\*s", self->indent, "");
printf("-> %-20s %\*sphp\\n", copyinstr(arg0),
46 - self->indent, "");
self->indent += 2;
}
php$target:::function-return
/self->follow/
{
self->indent -= 2;
printf("%\*s", self->indent, "");
printf("<- %-20s %\*sphp\\n", copyinstr(arg0),
46 - self->indent, "");
}
pid$target:libc.so.1::entry
/self->follow/
{
printf("%\*s", self->indent, "");
printf("-> %-20s %\*s%s\\n", probefunc, 46 - self->indent, "",
probemod);
self->indent += 2;
}
pid$target:libc.so.1::return
/self->follow/
{
self->indent -= 2;
printf("%\*s", self->indent, "");
printf("<- %-20s %\*s%s\\n", probefunc, 46 - self->indent, "",
probemod);
}
php$target:::function-return
/copyinstr(arg0) == "launch"/
{
self->follow = 0;
exit(0);
}

Running the above yields the following output:

-> launch                                          php
-> three php
-> write libc.so.1
-> _save_nv_regs libc.so.1<- _save_nv_regs libc.so.1
-> _write libc.so.1<- _write libc.so.1<- write libc.so.1
-> two php
-> write libc.so.1
-> _save_nv_regs libc.so.1<- _save_nv_regs libc.so.1
-> _write libc.so.1<- _write libc.so.1<- write libc.so.1
-> one php
-> write libc.so.1
-> _save_nv_regs libc.so.1<- _save_nv_regs libc.so.1
-> _write libc.so.1<- _write libc.so.1<- write libc.so.1
-> blastoff php
-> write libc.so.1
-> _save_nv_regs libc.so.1<- _save_nv_regs libc.so.1
-> _write libc.so.1<- _write libc.so.1<- write libc.so.1<- blastoff php<- one php<- two php<- three php
<- launch php

This shows us what's really going on -- or at least more of what's
really
going on: our PHP functions are inducing work from libc. This is much
more information, but of course, we're still only seeing what's going
on at user-level. To add the kernel into the mix, add the following
to the script:

fbt:genunix::entry
/self->follow/
{
printf("%\*s", self->indent, "");
printf("-> %-20s %\*skernel\\n", probefunc, 46 - self->indent, "");
self->indent += 2;
}
fbt:genunix::return
/self->follow/
{
self->indent -= 2;
printf("%\*s", self->indent, "");
printf("<- %-20s %\*skernel\\n", probefunc, 46 - self->indent, "");
}

Running this new script generates much more output:

-> launch                                          php
-> three php
-> write libc.so.1
-> _save_nv_regs libc.so.1<- _save_nv_regs libc.so.1
-> _write libc.so.1
-> syscall_mstate kernel
-> gethrtime_unscaled kernel<- gethrtime_unscaled kernel<- syscall_mstate kernel
-> syscall_entry kernel
-> pre_syscall kernel<- pre_syscall kernel
-> copyin_args32 kernel
-> copyin_nowatch kernel
-> watch_disable_addr kernel<- watch_disable_addr kernel<- copyin_nowatch kernel<- copyin_args32 kernel<- syscall_entry kernel
-> write32 kernel
-> write kernel
-> getf kernel
-> set_active_fd kernel<- set_active_fd kernel<- getf kernel
...
-> releasef kernel
-> cv_broadcast kernel<- cv_broadcast kernel<- releasef kernel<- write kernel<- write32 kernel
...
-> syscall_mstate kernel
-> gethrtime_unscaled kernel<- gethrtime_unscaled kernel<- syscall_mstate kernel<- _write libc.so.1<- write libc.so.1
-> two php
-> write libc.so.1
-> _save_nv_regs libc.so.1<- _save_nv_regs libc.so.1
-> _write libc.so.1
-> syscall_mstate kernel
-> gethrtime_unscaled kernel<- gethrtime_unscaled kernel<- syscall_mstate kernel
...

Now we're seeing everything that our PHP program is doing, from
the PHP through the native code, into the kernel and back. So using DTrace
on PHP has a number of unique advantages: you can look at the entire
system, and you can look at the entire stack -- and you can do it all
in production. Thanks again to Wez for putting together the PHP provider --
and if you're a PHP developer, bon appetit!

Join the discussion

Comments ( 7 )
  • Luke Closs Friday, August 5, 2005
    Great presentation Bryan. I had no opinion on OpenSolaris prior to this presentation, but had heard lots about dtrace. After your presentation, my coworkers and I were super stoked about opensolaris/dtrace. I can't wait to install it, get our product (Sophos PureMessage) building on it and run dtrace on it!
    As our project is largely perl based, we'd probably be interested in any effort to add probes to perl.
    Great job! Your passion and excitement about dtrace were well evident from your presentation! It was one of my highlights from OSCON.
    Luke
  • Matty Friday, August 5, 2005
    Anyway Sun could add PHP to the Solaris 10 Apache2 package? Redhat ships with PHP, but for some reason Sun doesn't. :(
    - Ryan
  • Nik Clayton Friday, August 5, 2005

    How big a deal is this?

    The reason I ask is that after reading your post I started thinking about doing this sort of thing for Perl. Naturally, I started poking around the web to see if anyone else was already doing this.

    That lead me to this interview at opensolaris.org. In that interview you say:

    DTrace can't currently instrument languages that have dynamic program text: Java, Python, Perl, PHP, etc. This is not an easy problem to solve: these languages have not been designed or implemented with dynamic instrumentation in mind, and the techniques required to instrument them are often very specific to a particular language and its run-time environment. Solving this problem still lies in the indefinite future for us--the next year or so will be spent evolving our existing functionality--but we very much intend to solve it.

    and

    Both. Instrumenting JITted code is difficult because the virtual machine needs to know what you're up to at some level--even if only enough to know to leave your instrumentation alone. Once the code is instrumented, the granularity of operations makes it difficult to tie that instrumentation back to something meaningful. To give you a simple, concrete example: Perl 5 doesn't associate line numbers with parse tree nodes unless it is explicitly started in an explicit debugging mode. By discarding that information, Perl makes it very difficult to [connect] Perl-induced system activity (I/O, CPU utilization, network activity, etc.) to the specific body of Perl that is inducing it. Programs written in C and C++ generally discard their debugging information too, but these languages are so much closer to the operating system and the underlying microprocessor that we can generally still draw meaningful inferences. Not so for Perl, Java, Python, PHP, etc.

    That was less than two months ago.

    I confess that I haven't, yet, been able to use DTrace in the field. My exposure to it has been through information that I've read on the web, and sitting through a couple of Sun technical demonstrations about it in the UK. Those demos played up the ease with which it's possible to write a DTrace provider, and walked through creating one. Without wishing to denigrate Wez's work, I got the impression that simply instrumenting function entry points like this with a dynamic language like PHP or Perl is no great shakes.

    Have a got the wrong end of the stick?

  • Alan Burlison Friday, August 5, 2005
    Perl is possible, at least to the same level as the PHP stuff Wez has done, i.e. subroutine name, source line and file, see http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2005-08/msg00140.html.
    Tracing individual opcode dispatch is also possible, if required.
  • Bryan Cantrill Friday, August 5, 2005
    Nik, My comments a couple of months ago reflect the difficulty of truly dynamic instrumentation. That is, doing to VM-based languages what we can do with user-level C/C++. What we have discovered since is that imperfect solutions can be such a huge lurch forward for these environments that they are very much worth the time and trouble. The PHP solution, for example, doesn't instrument every opcode. (And from talking to Marcus Boerger and Wez after my presentation, this is as hard as I claimed it to be.) But it's such a huge leap, that it's an effective solution -- and it may even be "good enough" to obviate the need to solve the (much) harder problem of true dynamic instrumentation.
  • Alan Hargreaves' Weblog Saturday, August 6, 2005
    [Trackback] Following a link in a comment from Alan Burlinson from Bryan's blog about creating a php provider , it appears that Alan is very well along the path to having a DTrace provider for perl.
    To read a bit more about this, have a look at the mail...
  • Magnus Forsberg Tuesday, October 11, 2005
    Hi Bryan,
    The PHP Dtrace extension would really help me with a problem I got... I have a segfaulting PHP web application and can't seem to narrow down the problem. So this extension would probably help me alot...
    However I have some probs getting this thing to work:
    # pear install dtrace
    ...
    the only warning I can see is when linking:
    \*\*\* Warning: Linking the shared library dtrace.la against the non-libtool
    \*\*\* objects php.o is not portable!
    ... other than that the build seems to have gone through fine, even though it shouldn't be more than 10k.
    Build system:
    Solaris Express build 20, on x86
    Sun JDS CBE 1.2 tools, GCC 3.4.3
    Apache 2.0.54
    Latest PHP dev sources.
    The output file gets huge:
    204274 16416 -rwxr-xr-x 1 root root 16797924 Oct 11 21:53 /var/tmp/pear-build-defaultuser/install-dtrace-1.0.3/opt/php-STABLE-200510100832-gcc/lib/php/extensions/no-debug-non-zts-20041030/dtrace.so
    when loading it from php.ini:
    PHP Warning: PHP Startup: Unable to load dynamic library '/opt/php-STABLE-200510100832-gcc/lib/php/extensions/no-debug-non-zts-20041030/dtrace.so' - ld.so.1: httpd: fatal: relocation error: file /opt/php-STABLE-200510100832-gcc/lib/php/extensions/no-debug-non-zts-20041030/dtrace.so: symbol zend_execute: referenced symbol not found in Unknown on line 0
    nm output:
    [59] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_execute
    [57] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_execute_internal
    [60] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_get_executed_filename
    [76] | 0| 0|NOTY |GLOB |0 |UNDEF |zend_get_executed_lineno
    Any help would be greatly appreciated!
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha