Introduction
Let’s take a look at what has happened recently in the DTrace world. Many improvements and additions have made the tool even more powerful, and also easier to learn, thanks to the addition of many examples and new training videos.
DTrace is a system observability tool that is:
- powerful:
  - it has knowledge of kernel structures
  - you can study the system, user applications and kernel, specific processes or libraries
  - it allows full stack tracing: both kernel and userspace in one session
  - you can use simple programming syntax to handle run-time events and the data you collect
  - probes are provided for profiling, kernel function calls and tracepoints, and user programs
- easy to use:
  - you can study programs that are already running
  - D syntax is similar to other common programming languages like C and awk
- safe:
  - DTrace is designed to be safe to run on a production system, by default
  - a -w option allows certain, limited actions, such as writing to a file
DTrace for Linux is implemented on top of eBPF, thereby achieving kernel-level observability through an exclusively user-space tool. The eBPF code verifier offers an additional level of safety.
Here, we review what is new in DTrace 2.0.4 and 2.0.5.
Increased availability in Linux distributions
DTrace is available in Oracle Linux, Gentoo Linux, and Void Linux, and was recently added to Debian as well. It can also be built from source on most distributions.
Error injection with DTrace
While safety is important in DTrace, error injection can be useful for studying kernel behavior. Therefore, DTrace has introduced the
void return(int);
action, which forces a kernel function to return the specified int. The action must appear in a rawfbt probe for a function that the kernel lists in /sys/kernel/debug/error_injection/list.
Consider this example program, which simply reports getpid() several times:
#include <stdio.h>
#include <unistd.h>
int main(int c, char **v) {
int i;
for (i = 0; i < 5; i++)
printf("C prog gets %d\n", getpid());
return 0;
}
Further, consider the D script:
BEGIN { printf("pid should be %d\n", $target); n = 0; }
rawfbt:vmlinux:__*_sys_getpid:entry
/pid == $target/
{ printf("entry "); n++; } // report every entry, increment n
rawfbt:vmlinux:__*_sys_getpid:entry
/pid == $target && n == 3/
{ printf("inject "); return(77777777); } // on one entry, inject an error
rawfbt:vmlinux:__*_sys_getpid:entry // otherwise, just pad the output
/pid == $target && n != 3/
{ printf(" "); }
rawfbt:vmlinux:__*_sys_getpid:return
/pid == $target/
{ printf("return %d\n", arg1); } // report each return value
Each probe has the predicate /pid == $target/ so that we do not modify the getpid() return value for every process on the system! Note that return appears both as a probe name and as an action. Because return is a destructive action, we must run DTrace either with the -w option or with #pragma D option destructive in the script.
We compile with gcc and run the D script, here called D.d:
$ sudo /usr/sbin/dtrace -c ./a.out -qws D.d
C prog gets 26115854
C prog gets 26115854
C prog gets 77777777
C prog gets 26115854
C prog gets 26115854
pid should be 26115854
entry return 26115854
entry return 26115854
entry inject return 77777777
entry return 26115854
entry return 26115854
Normally, each call enters and returns from getpid() with the same pid value. In one iteration, however, the D entry probe injects a bogus return value, which we see when the corresponding return probe fires (reporting its arg1 value) and when the C program reports the value.
Note that the kernel must have been configured with CONFIG_BPF_KPROBE_OVERRIDE on an architecture having the CONFIG_FUNCTION_ERROR_INJECTION option.
New providers
With the DTrace on Linux 2.0.4 release, there are now also tcp and udp providers. These providers place probes at semantically important points in the Linux kernel; other examples include the io, ip, lockstat, proc, and sched providers.
The tcp provider provides probes that fire at different stages of TCP processing:
- accept-established
- accept-refused
- connect-request
- connect-established
- connect-refused
- send
- receive
- state-change
Information about the underlying packet is available via probe arguments that are structs:
- pktinfo_t *: packet
- csinfo_t *: connection state
- ipinfo_t *: common IP info
- tcpinfo_t *: TCP header fields
- tcpsinfo_t *: stable TCP details from tcp_t
- tcplsinfo_t *: old tcp state for state-change
For udp, there is a simpler set of probes:
- send
- receive
Information about the underlying packet in the udp case is available via probe arguments that are structs:
- pktinfo_t *: packet
- csinfo_t *: connection state
- ipinfo_t *: common IP info
- udpinfo_t *: UDP header fields
- udpsinfo_t *: stable UDP details from udp_t
USDT Provider
The USDT provider is not new, but it is worth highlighting nonetheless. It is for user-level, statically defined tracing. That is, user-space code, whether a program or a shared library, can contain probe points at semantically important places. A user who is unfamiliar with the implementation but has a feel for the important steps along the way can then use those probes.
The User Guide has a nice USDT example. A file defines the USDT provider and probes:
$ cat myproviders.d
provider myprov
{
probe my__put(int, int);
probe my__get();
};
A simple user program using these probes could look like this:
$ cat func.c
#include <stdio.h>
#include <unistd.h>
#include "myproviders.h"
int bar(int in)
{
printf("bar evaluates %d\n", in);
return 3 * in;
}
void foo(void)
{
if (MYPROV_MY_PUT_ENABLED()) {
int arg0, arg1;
arg0 = bar(1111);
arg1 = bar(2222);
MYPROV_MY_PUT(arg0, arg1);
}
MYPROV_MY_GET();
}
int main(int c, char **v)
{
while (1) {
usleep(1000 * 1000);
foo();
}
return 0;
}
The whole program might be built like this:
$ /usr/sbin/dtrace -h -s myproviders.d
$ gcc -I/usr/lib64/dtrace/include -c func.c
$ /usr/sbin/dtrace -G -s myproviders.d func.o
$ gcc -Wl,--export-dynamic,--strip-all myproviders.o func.o
Finally, you could run the example as follows:
$ sudo /usr/sbin/dtrace -c ./a.out -q -n '
myprov$target:::my-put { printf("put %d %d\n", arg0, arg1); }
myprov$target:::my-get { printf("get\n"); }
tick-5sec {exit(0)}'
bar evaluates 1111
bar evaluates 2222
put 3333 6666
get
bar evaluates 1111
bar evaluates 2222
put 3333 6666
get
bar evaluates 1111
bar evaluates 2222
put 3333 6666
get
bar evaluates 1111
bar evaluates 2222
put 3333 6666
get
The documentation discusses a number of details that are skipped over here.
(Note: With the DTrace 2.0.5 release, you could get the error "DRTI: Ioctl failed" because of a seccomp jail issue. A fix is queued for the next release on the github devel branch.)
Compatibility with SystemTap style tracepoints
There is now also a stapsdt provider. While you could add stapsdt probes to your programs or libraries, stapsdt probes do not dynamically register themselves with DTrace, making them less powerful than DTrace-based USDT probes. The provider is useful, however, because it allows DTrace tracing of programs and libraries that contain static probes that were added via stapsdt ELF notes, whether for use with SystemTap or DTrace. For example, stapsdt probes were added to Python as far back as Python 3.6, provided Python is built with --with-dtrace; see the blog Tracing Python with DTrace. See also the documentation on the stapsdt provider.
Variables Can Have stack() and ustack() Values
With DTrace 2.0.4, the D language has been expanded to allow use of stack() and ustack() as functions. Traditionally, stack() and ustack() have been actions — in effect, statements in D clauses that have resulted in call stacks being written to the output buffer. However, it can be useful to store a call stack to a variable for future retrieval. For example, when we free memory, what was the call stack that had malloced it? Or, when there is lock contention, what was the call stack that had acquired the lock? Now, stack() and ustack() can be used in expressions. An exception is added to allow them to retain their original behaviour as data recording actions as well. A new internal type dt_stack_t holds call stack data.
For example, a D clause might have directed a probe to write the kernel call stack to the output buffer:
{
stack();
}
This still works, but perhaps you want to capture the call stack in one probe and (conditionally) report that call stack later in a different probe. One might write:
{
myvar = stack();
}
for the first probe and:
{
printf("%k", myvar);
}
for the second. One can store call stacks to global, thread-local, or clause-local variables or use them as keys to associative arrays.
The documentation discusses both stack and ustack.
Bug fixes
Sadly, there are sometimes bugs.
Happily, there are sometimes fixes.
There were circumstances in which dynamic variables could overwrite one another. Dynamic variables include associative array elements and thread-local variables (self->). A technical discussion can be found in commit 4e8c23df7a on github.
A regression in DTrace 2.0.3 was that fbt probes were not listed by default. While dtrace -lP fbt would list fbt probes, dtrace -l would not. This is fixed in 2.0.4.
Another DTrace 2.0.3 regression was that pid*:::return probes were unreliable if there was more than one function. This is fixed in 2.0.4.
On large systems with nonconsecutive CPU numbering, some output could be lost. This is fixed in 2.0.5.
The proc:::exec args[0] argument was incorrect when the probe was firing based on syscall:vmlinux:execveat:entry. Again, this is now fixed.
A significant memory leak was fixed in the provider and probe management code. Details are in commit 23cdaf54ce on github.
Compilation information associated with a probe is now freed once the associated BPF program has been loaded into the kernel, freeing up considerable memory when many probes are used. Details are in commit 1ceca59160 on github.
Performance and scalability improvements
In DTrace 2.0.5, user-space probes (probes provided by the pid, usdt, and stapsdt providers) are now implemented by means of PID-specific uprobes.
User-space probes were previously implemented using system-wide uprobes. Those are kernel-level uprobes that are inserted in memory pages with executable code that is shared by all processes that use it. This resulted in a performance impact for all processes using that shared executable code: the probe would be triggered for each of them, and it was left to the BPF tracing program to determine whether the user was interested in this probe event. Probes placed in common shared libraries (e.g. the C library) had a significant impact on the entire system (up to ~10x slowdown compared to PID-specific uprobes).
The new mechanism uses kernel-level uprobes that are inserted in a process-specific copy of the executable code (just the page that contains the probe). Therefore, such probes do not affect any other processes on the system, even if they use the same executable code. This greatly reduces the impact of tracing common code.
Technical details are in commit 3fbcca750a on github.
Other recent improvements include:
- Internally, DTrace stores temporary strings. Improvements have been made in managing and reusing such tstrings to accommodate deeply nested ternary expressions involving strings. Details are in commit 5b20226bc9 on github.
- Aggregation map IDs have been cached in order to speed up aggregation truncations on many-CPU systems. Details are in commit c4b6043d36 on github.
Easier access to documentation and help
Documentation is now bundled with DTrace itself and is installed alongside the DTrace program.
The User Guide has been converted to Markdown format. You can access this documentation both on github at its index file and on your system where DTrace is installed at /usr/share/doc/dtrace-2.0.5/userguide/index.md.
The Tutorial is now similarly available, for example on github at its index file and on your system where DTrace is installed at /usr/share/doc/dtrace-2.0.5/tutorial/index.md.
There are now also examples distributed with DTrace. On github, you can navigate through the examples via their README file. On your system where DTrace is installed, check:
- /usr/share/doc/dtrace-2.0.5/examples/README.md: Overview.
- /usr/share/doc/dtrace-2.0.5/examples/*.d: Examples of varying difficulty.
- /usr/share/doc/dtrace-2.0.5/examples/language_features/*: Tiny examples to illustrate specific language features.
AI integration
Increasingly, AI is used to explore unfamiliar tools, write scripts, and perform other tasks. The DTrace documentation is therefore available as LLM context files: structured context packs (llms-txt format) for use with large language models such as GPT-4, GPT-5, or Claude. An LLM performs better with more information but also faces context window limitations, so DTrace offers LLM context files in different sizes. You can navigate to these files on github via the LLM README file and on your system where DTrace is installed at /usr/share/doc/dtrace-2.0.5/llm/*. Also, this blog discusses using LLMs to explore DTrace. Work is ongoing to improve these files for better AI results.