This blog entry was provided by Ruud van der Pas.

Introduction

DTrace is a dynamic tracing tool that can be used to analyze what a system is doing in real time. It is easy to use and has been designed to have low overhead. It is also safe to use on production systems. This blog presents an overview of DTrace.

What is DTrace?

DTrace is used to examine the behavior of user programs, libraries, the operating system kernel, or any combination of these simultaneously. DTrace provides a holistic view and may be used to help understand how the system works, to track down performance problems, or to find the cause of an outlier in the behavior.

The typical use of DTrace is to follow, or better said, trace, activities of interest. The tracing may be at the operating system level, but could also start from an application, go into one or more system libraries, and possibly all the way down into the operating system kernel.

The technical term for such an activity is a probe. On every system where DTrace is installed, a set of probes is available. This is the set that you will be working with, and there are easily thousands of probes, if not more, to choose from.

DTrace is fully programmable using the D language. This provides full flexibility as to what to trace and what the output should look like. More on the D language can be found below in the section titled The D language.

A brief history

DTrace has a long history and, as a result, is battle-hardened. This has made it robust and feature-rich.

The initial version was released as part of the Solaris 10 Operating System, which was announced in 2004 and became generally available in early 2005. In 2010, the port to Oracle Linux started, and the first version was available in 2011. Several updates have been released since then.

Over time, one thing increasingly became an issue: kernel patches were required to install and use DTrace. This led to the decision to re-architect and re-implement the underlying engine to use eBPF (extended Berkeley Packet Filter), plus some other Linux tracing facilities.

This combines the best of both worlds. DTrace has quite a high abstraction level, making it easy to use and relieving the user of the low-level details that come with the use of eBPF. Additionally, availability of eBPF is not an issue, because it is a core part of the Linux kernel.

The eBPF-based version of DTrace has been available for several years now. It is available for Intel, AMD, and Arm-based systems and continues to be developed and extended. The installation is much easier than before, because DTrace is a userland application and installs like any other userland package. See the section Installation on Oracle Linux for how to install the most recent version of DTrace.

The D language

The D language is specific to DTrace, but has been inspired by several commonly used programming languages like awk, Perl, and C/C++. For example, the C/C++ printf() function to print variables is also supported in DTrace. This is why anybody with some basic programming experience will find it easy to learn the D language.

A D script, or program, consists of a sequence of blocks. Each block starts with a probe description, followed by an optional filter expression (called a predicate) and then a clause. The clause is a block of code enclosed in curly braces ({ and }).

Below we show the general structure of such a block:

1 provider:module:function:name
2 / predicate /
3 {
4   D statement(s)
5 }

Line 1 has the mandatory reference to a probe. There are four fields, where a missing field acts the same as a * wildcard.

The next line is optional and contains the predicate that acts as a filter. The expression between the two forward slashes (/) evaluates to an integer, where zero means false and any nonzero value means true.

The actions are specified in the clause (lines 3-5). Without a predicate, the clause is executed each time the probe fires. If a predicate is present, the clause is executed only when the predicate evaluates to true. Otherwise, no action is taken.

DTrace recognizes a form of shorthand when referencing probes. By convention, if not all the fields of a probe description are specified, DTrace can match a request to all the probes with matching values in the parts of the name that are specified. In addition to this, wildcard characters from the set {*,?,!,[]} may be used within a field. These symbols are also commonly used in Linux shell commands.
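
As an illustration of this shorthand, here is a minimal sketch with two probe descriptions that leave the module field empty. It assumes that the syscall provider is available on the system and uses the built-in variable probefunc, neither of which is covered in this blog. The aggregation names are made up for this example.

```d
/* Module field left empty: matches the read system call entry probe in any module. */
syscall::read:entry
/ execname != "dtrace" /
{
  @reads = count();
}

/* The * wildcard matches every system call whose name starts with "open". */
syscall::open*:entry
/ execname != "dtrace" /
{
  /* probefunc holds the function field of the probe that fired. */
  @opens[probefunc] = count();
}
```

When such a script is started, the line that dtrace prints at startup reports how many probes matched, which is a quick way to check that a wildcard selects what you expect.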

Below is an example of a probe with a predicate and clause:

1 sched:vmlinux::on-cpu
2 / execname != "dtrace" /
3 {
4   @count_on_cpu = count();
5 }

The probe description is at line 1. It uses the sched provider and the vmlinux module. The function field is left empty, and the name field is on-cpu. This probe fires each time that a process is scheduled to run on a CPU.

Since we do not want to include the dtrace command in the output, at line 2, a predicate to filter this name out is used. As a result, the clause at lines 3-5 is not executed in case the dtrace command is scheduled to run.

The clause spans lines 3-5 and contains one statement at line 4. This statement uses the count() aggregation function. The count is incremented by 1 each time the probe fires and the updated value is stored in a structure called an aggregation. Here we have given it the name @count_on_cpu. An aggregation is very similar to an associative array. It is used to store the result of an aggregation function, and count() is only one of several such functions. As an aside, the D language also supports associative arrays.

You may wonder why we do not print any results. While you have full control over what is printed and when, in this example we rely on a convenient feature: any aggregation that is not explicitly printed is printed upon termination of the script. This is particularly useful when developing a D script and all you want to see is the contents of aggregations. In a later phase, formatted print statements can be used to control the layout.

What this all means is that this single statement at line 4 counts how often any process, other than dtrace itself, is scheduled to run on a CPU. The result, the count, is automatically printed when the script finishes.

The code shown above is stored in a file called on-cpu.d and executed under control of the dtrace command:

$ sudo dtrace -s on-cpu.d
dtrace: script 'on-cpu.d' matched 1 probe
^C
              582
$

Under the hood, eBPF code is generated and executed. This is why DTrace scripts have to be run as root, or using the sudo command as is done here. The -s option is needed, because we read the script from a file.

The output starts with a line that echoes the name of the script and reports the number of probes that matched. This is part of the default output from the dtrace command. The script does not finish until we stop it with Ctrl-C. When we do this, we see that the number 582 is printed. This is the total number of times that a process other than dtrace was scheduled to run on a CPU while the script was running.

Regarding this example:

  • The default output from the dtrace command can be suppressed either through a command line option, or a pragma in the script.
  • It is not always necessary to stop a script with Ctrl-C. For example, you can trace a process that is started from the command line, in which case the tracing stops automatically when the process completes. It is also possible to add a probe that fires after a certain time has passed and to call the exit() function in its clause to terminate the tracing. In general, the exit() function can be called from any probe and stops the tracing.
  • There are various print functions that can be used to explicitly control what needs to be printed, including aggregations, and define the layout.

These topics are however outside of the scope of this blog. We refer to the user guide for the details.
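
For a first impression only, the three points above can be combined in a variation of the on-cpu.d script. This is a sketch and not a reference: it assumes the quiet option, the tick-10s probe of the profile provider, and the printa() function as described in the user guide. The 10 second interval is an arbitrary choice.

```d
/* Suppress the default output of the dtrace command. */
#pragma D option quiet

sched:::on-cpu
/ execname != "dtrace" /
{
  @count_on_cpu = count();
}

/* Assumed timer probe: fires once every 10 seconds. */
tick-10s
{
  /* Print the aggregation explicitly, with our own layout. */
  printa("Times a process was scheduled on a CPU: %@d\n", @count_on_cpu);
  /* Stop the tracing the first time this probe fires. */
  exit(0);
}
```

With this version there is no need to press Ctrl-C. The script ends by itself when the timer probe fires for the first time.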

There are several other things that can be recorded as part of this probe. For example, the name(s) of the processes that were scheduled on the CPU, and also the time when that happened. Also, the sched provider includes other probes to trace scheduler activities. For example, there is a probe that fires when a process is scheduled to be taken off the CPU.
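
As a small sketch of the first suggestion, the aggregation can be given a key, so that a separate count is kept for each process name. The built-in variable execname is the same one used in the predicate. The aggregation name is made up for this example.

```d
sched:::on-cpu
/ execname != "dtrace" /
{
  /* One counter per process name, instead of a single total. */
  @on_cpu_by_name[execname] = count();
}
```

As before, the aggregation is not printed explicitly, so its contents are printed when the script terminates.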

Three commonly asked questions

Those new to DTrace often have similar questions. They are common enough to warrant coverage in this blog.

Why do we need DTrace?

There are several tracing tools on Linux already. What makes DTrace unique is the programmability. In combination with the vast number of probes that are supported, this makes for a powerful tool that can be adapted to the tracing requirements.

The probes let you trace many activities in the system and, even more importantly, they can be monitored simultaneously. This means that correlations between different events are much easier to detect. Other tools that provide tracing are often restricted to a specific area or part of the system (e.g. kernel only), and cannot relate events to one another. Combining different tools requires separate experiments to be conducted, making it much harder to find correlations.

On top of this, thanks to the programmability, any combination of events can be traced. In addition, the output can be fully customized to the task at hand. This eliminates the need to write a post-processing tool to parse the results.

DTrace or eBPF?

Some developers use eBPF directly in their applications, but this requires attention to many low-level details. Although DTrace generates eBPF code under the hood, this is transparent to the user. The generated code is compiled and executed when invoking the dtrace command.

Thanks to the D language, the user is freed from the low-level details of eBPF and can instead focus on a high-level description of the probes and actions to be taken. This not only makes DTrace much easier to use, but also provides more flexibility when developing a script.

Another advantage is that DTrace continues to evolve and expand. Over time, new providers and probes have been added. The D language also continues to be extended with new features.

How does DTrace compare to application profiling tools?

A frequently asked question is whether DTrace replaces tools like perf or gprofng. These are application profiling tools and while DTrace has some profiling capabilities, there is very little overlap. On the contrary, they complement each other.

DTrace is a troubleshooting tool that is used to find the root cause of outliers in the behavior of the system. For example, it may expose that an internal buffer is not drained quickly enough, causing delays higher up in the call chain.

Profiling tools excel at finding where the time is spent in an application. Nothing more, nothing less. The timing information is obtained at the function, source line and instruction level. While DTrace has support for timing events, this is at the granularity of an event. This could be a function call, but also an operating system activity for example. As far as the timings go, it doesn’t go any deeper than that.
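
To make the granularity of an event concrete, the sketch below times one kind of event, the read system call, from entry to return. It assumes the syscall provider, thread-local variables (self->), the built-in variable timestamp, and the quantize() aggregation function, none of which are covered in this blog. The variable and aggregation names are made up for this example.

```d
syscall::read:entry
/ execname != "dtrace" /
{
  /* Remember, per thread, when the call started (in nanoseconds). */
  self->start = timestamp;
}

syscall::read:return
/ self->start /
{
  /* Record the elapsed time of this call in a power-of-two histogram. */
  @read_time_ns = quantize(timestamp - self->start);
  /* Reset the thread-local variable for the next call. */
  self->start = 0;
}
```

The result is a distribution of the time taken per read call. It says nothing about where the time is spent inside the call, which is the kind of question a profiling tool answers.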

As an example of how these tools complement each other, consider the following scenario. The profiling tool shows that not all threads of a multithreaded application arrive at a barrier at more or less the same time. A common explanation is that the algorithm has a load imbalance, but that can be ruled out here.

With DTrace, one can then check what the scheduler is doing. Do all threads get their fair share of the available cores, or not? Another thing to look at is the NUMA behavior. Could it be that some threads execute on a remote node and need more time to get the data they need? These kinds of questions are easy to answer with DTrace, but much harder, if possible at all, to answer with a profiling tool.
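
A first step toward answering the scheduling question could look like the sketch below. The process name "myapp" is hypothetical, and the sketch assumes that the built-in variables tid (thread ID) and cpu (ID of the current CPU) are available in the installed version. It does not address the NUMA question.

```d
sched:::on-cpu
/ execname == "myapp" /   /* hypothetical name of the application under study */
{
  /* Count how often each thread is scheduled to run, and on which CPU. */
  @runs[tid, cpu] = count();
}
```

Threads that are scheduled far less often than the others, or that move between many CPUs, are natural candidates for a closer look.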

Installation on Oracle Linux

On Oracle Linux 8, 9, or 10, DTrace can be installed with yum or dnf. On aarch64 platforms, the packages are in the baseos repository and no additional commands are needed prior to installing DTrace. On an x86 platform, an additional repository needs to be enabled through the following command:

sudo dnf config-manager --enable <repository>

The name of the repository depends on the version of Oracle Linux:

  • Oracle Linux 8: ol8_UEKR7
  • Oracle Linux 9: ol9_UEKR7
  • Oracle Linux 10: ol10_UEKR8

Once the repository has been enabled, this command installs DTrace:

sudo dnf install -y dtrace

The dtrace command is installed in /usr/sbin. It is good practice to verify the installation by running the following command, which prints the version number:

sudo dtrace -V

Summary

DTrace is a tool to trace activities in a system. The events to be monitored can be in an application, a library, the Linux kernel, or any combination of these. Thanks to this, correlations between events are easier to detect, compared to using tools that have a specific focus area.

The D language makes it easy to leverage the power of DTrace. The programmability offers the flexibility to define what needs to be traced and what the output should look like.

Pointers to more information

DTrace is very well documented and there are many examples on the internet. Below we list a small selection of all the material that is available. These could be a starting point to learn more, but by no means is this list exhaustive.

  • The DTrace training videos – The Oracle Linux Training Station provides easy access to a variety of free Linux trainings, including one on DTrace. It is straightforward to navigate through the choices, but you can also go directly to the playlist with all the DTrace trainings through this link: DTrace trainings playlist.
  • The DTrace user guide – This extensive guide covers all of the features and has many examples to learn from. It can be viewed, or downloaded here.
  • The DTrace repository on GitHub – The DTrace source code is on github.com to make it easier to access the source. This repository is also used to work with developers in the Linux community.