Making yourself indispensible with DTrace

DTrace technology and tools appeared in 2005, and despite this, it is still not very well known among majority of developers and system administrators. It is especially surprising given that for five and a half years since the release of Solaris 10, where DTrace was first implemented, there was no better technology made for analyzing of operating systems and applications.

DTrace was developed by Sun Microsystems, and all of its components were open sourced in 2005. Later DTrace has been ported to Mac OS X, QNX, and FreeBSD, and remained available in all the distributions that have inherited from Solaris: Belenix, Korona, Nexenta, OpenIndiana and Milax. DTrace is a core part of Solaris, and new DTrace probes were introduced in Solaris 11 Express, announced in November 2010.

DTrace is an instrument for system administrators as well as for developers, and DTrace providers enable probes for many languages, including scripting ones such as PHP.

DTrace has several unique features, which can not be found being together in other tools, and in particular DTrace:
1. allows to collect information in the system operating under maximum load in production - with low overhead;
2. collects any information from any part of the system, allowing you to observe applications and the kernel as well;
3. can show which arguments are passed from one function to another, even if one does not have source code for the functions;
4. harvests function calls execution time info, calculates a percentage of time spent to execute each of them, shows how many times each of the functions was called, etc.;
5. filters information in a specified way - for example, lets you restrict the observation scope by an application, a thread, a particular system call, or another specified execution unit;
6. may react to certain events (I/O, call of the given function, the completion of programs, starting a new thread, etc.).
7. has high-level and low-level observation scopes - from observing an internal functioning of a device driver to monitoring certain events in PHP scripts execution or method calls in Java applications.
8. allows call trace, with tracking any of options - a run-time arguments passed, etc.

The main component of DTrace is a kernel module, servicing the whole DTrace sybsystem, and the main tool is dtrace, which accepts scripts written in D, DTrace-specific language with C-like syntax. Actually, the syntax is something between AWK and C syntaxes, and those who like both languages will enjoy most.

In this short article we are not going to dig into details of DTrace implementation; those who are interested in them, can google more details, or read DTrace User Guide at If you read Russian, you may wish to download Chapter 27 and 28 of the book "Operating System Solaris" (second edition), written by Yevgeny Ilyin and me in 2009. The chapters are available in .pdf in the Internet.

Let's look at some DTrace usage practices. We'll see how the desire to simplify one's life and indulge the laziness influences the system administrator or developer to learn and apply new tools, and how it improves their productivity in times, and applications preformance exponentially.

Once upon a time at OSDevCon conference in Dresden, Chad Mynhier led a master class on DTrace. He has worked for Forsythe, and the company has earned a lot doing banking applications optimization. He told us a story of a customer he triumphantly tuned applications for, speeding them up in one thousand times, and breaking the record of fastest earning a honorarium.

The bank was satisfied with an important application before, but it significantly slowed down with load growth. Bank was planning bigger increase of the load, and situation started to be dangerous, as losses were unacceptable. An analysis of the application with DTrace did not reveal anything special initially, but function calls frequency distribution looked suspicious: gettimeofday() call, returning the current system time, was the most demanded call. Databases often call this function, because they often record a time at which the transaction occurred, but this function is somehow dominated the calls ... Chad located a particular portion of the code in the application using DTrace, where the function was called exteremely often, and it turned out that it was called by developer's mistake in a cycle, which, in turn, was involved in almost every application's transaction.

Probably, gettimeofday() call remained from some sort of debug output, or has been put in a loop by mistake, but anyway, the error was not revealed in the testing phase and it affected the performance when the application's load exceeded a certain level only. Before that, a powerful server was able to handle tens of thousands of gettimeofday() calls in a second.

DTrace is necessary not only to find problems or monitor performance, and I often use DTrace when I'm too lazy to search and read the documentation, or when the documentation is not complete, or documentation and source code are not available, or I need to understand how the application works.

In D scripting language threre is always following syntax:

/ predicate /

Provider is a kernel module or an application module, which registers its DTrace probes in the system. For example, the syscall provider registers the probes located in system calls, mysql<mysql-server-PID> provider - do it for probes embedded into MySQL server.
Module is a name of the module or library, for example, libc.
Function is function name, a where probe is located, for example, fopen.

Probe is a probe name (in many cases, the probe is called "entry" or "return").

If the predicate evaluates to true, DTrace performs actions. For example, if it is necessary to do it when the probe was reached while running an application "top" only, the predicate would be

/execname == top /

Actions are the list of statements used to harvest information, display/print it, or do aggregations.

A simple example of how I use DTrace in everyday life: after moving user settings from one computer to another one, a printer stopped working. It was clear that a configuration file has wrong settings for some reason. But which file exactly? There are many of them. I started to look for a suspicious file in  /etc/printers and /etc/lp/*, but it did not give any clue. Then I moved to DTrace. Simple script allows you to look inside of lpstat (the most innocuous of programs working with a printer) and see which files it tries to open:

scheduler is running
system default printer: eaqvap21
aqvap21: unknown printer
aqvap21: unknown printer

To run the program and give its PID to a DTrace script one should use -c:

# dtrace-n 'pid $ target:: fopen: entry {printf ("% s", copyinstr (arg0 ));}'-c "lpstat-s"
dtrace: description 'pid$target::fopen:entry ' matched 1 probe
scheduler is running
system default printer: eaqvap21
aqvap21: unknown printer
aqvap21: unknown printer
dtrace: pid 11156 has exited
0 59882 fopen:entry /etc/default/init
0 59882 fopen:entry /etc/lp/ppd/eaqvap21.ppd
0 59882 fopen:entry /export/home/filip/.printers
0 59882 fopen:entry /export/home/filip/.printers
0 59882 fopen:entry /export/home/filip/.printers
0 59882 fopen:entry /etc/printers.conf
0 59882 fopen:entry /export/home/filip/.printers
0 59882 fopen:entry /etc/printers.conf
1 59882 fopen:entry /etc/nsswitch.conf
1 59882 fopen:entry /export/home/filip/.printers
1 59882 fopen:entry /export/home/filip/.printers
1 59882 fopen:entry /etc/printers.conf

That's all: I forgot that I had to remove .printers file in my home directory:

rm /export/home/filip/.printers

done! Everything works now!

Note: you have to give lpstat's PID to the script because pid provider requires PID of the process we examine, and fopen function can be checked with pid provider only.

Sometimes I use another modification of the script:

pfexec dtrace -n 'syscall:: open *: entry {printf ("% s \ n", copyinstr (arg0 ));}'

This script brings a bunch of data from all system calls that begin with "open", which is inconvenient, but it is easier to remember it and simply to use with already running applications, and information received may be sufficient for analysis.

Finally, for those who debugs Web applications, a script catching SQL-operators before the server executes them can be useful. It allows us to find out exactly which operators are to be executed; it helps if it appears that the script presumably works with the database correctly, but the database returns an unexpected result: it may be that either incorrect SQL expression, or connection is made with wrong database, or a parameter is passed to a DB server with improper syntax (for example, without quotes or with extra quotation marks).

Here is the script:

#!/usr/sbin/dtrace -s
#pragma D option quiet
printf("%-20s %-20s %-40s %-9s\n", "Who", "Database", "Query", "Time(ms)");
self->query = copyinstr(arg0);
self->connid = arg1;
self->db = copyinstr(arg2);
self->who = strjoin(copyinstr(arg3),strjoin("@",copyinstr(arg4)));
self->querystart = timestamp;

printf("%-20s %-20s %-40s %-9d\n",self->who,self->db,self->query,
(timestamp - self->querystart) / 1000000);

Running it, we can observe what the server is doing (be careful! There can be a lot of information if the server is heavily loaded with requests!):

$ pfexec ./mysql.d
Who Database Query Time(ms)
root@localhost data SELECT DATABASE() 0
root@localhost test show tables 0
root@localhost test SELECT DATABASE() 0
root@localhost mysql select * from users 0
root@localhost mysql select * from host 0

A time taken for a SQL statement execution can be measured as a difference between triggering query-start and query-done, and DTrace allows to plot the distribution of time among statements. One can find the database calls that take a lot of time, affecting performance of an application. The great advantage of DTrace is that all these measurements can be performed in production, because the overhead of DTrace is minimal and will not interfere with the server.

These examples show that DTrace probes can be put in Solaris applications everywhere. Please, refer to manual for operating system (man) and the documentation for specific applications (such as MySQL server on site to learn which arguments associated with probes can be used as shown above.

In case you wish to familarize with more examples of D-scripts please take a look at:
/opt/DTT/ (scripts are there by default in Solaris 11 Express)

Before you start playing with dtrace please note that you must have one or more DTrace privileges in the operating system in order to run dtrace, regular users can not use dtrace for security reasons. A system administrator can give a right to use dtrace to a particular user, but it is not done by default. Root has administrative privileges, which is enough to run dtrace.


DTrace is not less known to the masses as Solaris at large. If DTrace would be widely available for example on Linux, it'll get the deserved appreciation. Alas, DTrace port on Linux is very far from mature.

Posted by shmerl on July 04, 2011 at 04:49 PM GMT #

Post a Comment:
  • HTML Syntax: NOT allowed

The Observatory is a blog for users of Oracle Solaris. Tune in here for tips, tricks and more as we explore the Solaris operating system from Oracle.


« June 2016