X

News, tips, partners, and perspectives for the Oracle Linux operating system and upstream Linux kernel work

DTrace for the Application Developer - Counting Function Calls

This blog entry was provided by Ruud van der Pas

 

Introduction

DTrace is often positioned as an operating system analysis tool for the system administrators, but it has a wider use than this. In particular the application developer may find some features useful when trying to understand a performance problem.

In this article we show how DTrace can be used to print a list of the user-defined functions that are called by the target executable. We also show how often these functions are called. Our solution presented below works for a multithreaded application and the function call counts for each thread are given.

Motivation

There are several reasons why it may be helpful to know how often functions are called:

  • Identify candidates for compiler-based inlining. With inlining, the function call is replaced by the source code of that function. This eliminates the overhead associated with calling a function and also provides additional opportunities for the compiler to better optimize the code. The downsides are an increase in the usage of registers and potentially a reduced benefit from an instruction cache. This is why inlining works best on small functions called very often.
  • Test coverage. Although much more sophisticated tools exist for this, for example gcov, function call counts can be useful to quickly verify if a function is called at all. Note that gcov requires the executable to be instrumented and the source has to be compiled with the appropriate options.
  • In case the function call counts vary across the threads of a multithreaded program, there may be a load imbalance. The counts can also be used to verify which functions are executed by a single thread only.

 

Target Audience

No background in DTrace is assumed. All DTrace features and constructs used are explained. It is expected the reader has some familiarity with developing applications, knows how to execute an executable, and has some basic understanding of shell scripts.

The DTrace Basics

DTrace provides dynamic tracing of both the operating system kernel and user processes. Kernel and process activities can be observed across all processes running, or be restricted to a specific process, command, or executable. There is no need to recompile or have access to the source code of the process(es) that are monitored.

A probe is a key concept in DTrace. Probes define the events that are available to the user to trace. For example, a probe can be used to trace the entry to a specific system call. The user needs to specify the probe(s) to monitor. The simple D language is available to program the action(s) to be taken in case an event occurs.

DTrace is safe, unintrusive, and supports kernel as well as application observability.

DTrace probes are organized in sets called providers. The name of a provider is used in the definition of a probe. The user can bind one or more tracing actions to any of the probes that have been provided. A list of all of the available probes on the system is obtained using the -l option on the dtrace command that is used to invoke DTrace.

Below an example is shown, but only snippets of the output are listed, because on this system there are over 110,000 probes.

# dtrace -l

     ID   PROVIDER            MODULE                          FUNCTION NAME
      1     dtrace                                                     BEGIN
      2     dtrace                                                     END
      3     dtrace                                                     ERROR

                     <lines deleted>

     16    profile                                                     tick-1000
     17    profile                                                     tick-5000
     18    syscall           vmlinux                              read entry
     19    syscall           vmlinux                              read return
     20    syscall           vmlinux                             write entry
     21    syscall           vmlinux                             write return

                     <lines deleted>

    656       perf           vmlinux               syscall_trace_enter sys_enter
    657       perf           vmlinux            syscall_slow_exit_work sys_exit
    658       perf           vmlinux                  emulate_vsyscall emulate_vsyscall
    659   lockstat           vmlinux       intel_put_event_constraints spin-release
    660   lockstat           vmlinux             intel_stop_scheduling spin-release
    661   lockstat           vmlinux           uncore_pcibus_to_physid spin-release

                     <lines deleted>

  1023      sched           vmlinux              __sched_setscheduler dequeue
  1024   lockstat           vmlinux              tg_set_cfs_bandwidth spin-release
  1025      sched           vmlinux                     activate_task enqueue
  1026      sched           vmlinux                   deactivate_task dequeue
  1027       perf           vmlinux                    ttwu_do_wakeup sched_wakeup
  1028      sched           vmlinux               do_set_cpus_allowed enqueue

                 <many more lines deleted>

155184        fbt        xt_comment                        comment_mt return
155185        fbt        xt_comment                   comment_mt_exit entry
155186        fbt        xt_comment                   comment_mt_exit return
163711    profile                                                     profile-99
163712    profile                                                     profile-1003
#

Each probe in this output is identified by a system-dependent numeric identifier and four fields with unique values:

 

  • provider - The name of the DTrace provider that is publishing this probe.
  • module - If this probe corresponds to a specific program location, the name of the kernel module, library, or user-space program in which the probe is located.
  • function - If this probe corresponds to a specific program location, the name of the kernel, library, or executable function in which the probe is located.
  • name - A name that provides some idea of the probe's semantic meaning, such as BEGIN, END, entry, return, enqueue, or dequeue.

 

All probes have a provider name and a probe name, but some probes, such as the BEGIN, END, ERROR, and profile probes, do not specify a module and function field. This type of probe does not instrument any specific program function or location. Instead, these probes refer to a more abstract concept. For example, the BEGIN probe always triggers at the start of the tracing process.

Wild cards in probe descriptions are supported. An empty field in the probe description is equivalent to * and therefore matches any possible value for that field.

For example, to trace the entry to the malloc() function in libc.so.6 in a process with PID 365, the pid365:libc.so.6:malloc:entry probe can be used. To probe the malloc() function in this process regardless of the specific library it is part of, either the pid365::malloc:entry or pid365:*:malloc:entry probe can be used.

 

Upon invocation of DTrace, probe descriptions are matched to determine which probes should have an action associated with them and need to be enabled. A probe is said to fire when the event it represents is triggered.

 

The user defines the actions to be taken in case a probe fires. These need to be written in the D language, which is specific to DTrace, but readers with some programming experience will find it easy to learn. Different actions may be specified for different probe descriptions. While these actions can be specified at the command line, in this article we put all the probes and associated actions in a file. This D program, or script, by convention has the extension ".d".

Aggregations are important in DTrace. Since they play a key role in this article we add a brief explanation here.

The syntax for an aggregation is @user_defined_name[keys] = aggregation_function(). An example of an aggregation function is sum(arg). It takes a scalar expression as an argument and returns the total value of the specified expressions.

For those readers who like to learn more about aggregations in particular we recommend to read this section on aggregations from the Oracle Linux DTrace Guide. This section also includes a list of the available aggregation functions.

Testing Environment and Installation Instructions

The experiments reported upon here have been conducted in an Oracle Cloud Infrastructure ("OCI") instance running Oracle Linux. The following kernel has been used:

$ uname -srvo
Linux 4.14.35-1902.3.1.el7uek.x86_64 #2 SMP Mon Jun 24 21:25:29 PDT 2019 GNU/Linux
$

The 1.6.4 version of the D language and the 1.2.1 version of DTrace have been used:

$ sudo dtrace -Vv
dtrace: Sun D 1.6.4
This is DTrace 1.2.1
dtrace(1) version-control ID: e543f3507d366df6ffe3d4cff4beba2d75fdb79c
libdtrace version-control ID: e543f3507d366df6ffe3d4cff4beba2d75fdb79c
$

DTrace is available on Oracle Linux and can be installed through the following yum command:

$ sudo yum install dtrace-utils

After the installation has completed, please check your search path! DTrace is invoked through the dtrace command in /usr/sbin. Unfortunately there is a different tool with the same name in /usr/bin. You can check the path is correct through the following command:

$ which dtrace
/usr/sbin/dtrace
$

 

Oracle Linux is not the only operating system that supports DTrace. It actually has its roots in the Oracle Solaris operating system, but it is also available on macOS and Windows. DTrace is also supported on other Linux based operating systems. For example, this blog article outlines how DTrace could be used on Fedora.

Counting Function Calls

In this section we show how DTrace can be used to count function calls. Various D programs are shown, successively refining the functionality.

The Test Program

In the experiments below, a multithreaded version of the multiplication of a matrix with a vector is used. The program is written in C and the algorithm has been parallelized using the Pthreads API. This is a relatively simple test program and makes it easy to verify the call counts are correct.

Below is an example of a job that multiplies a 1000x500 matrix with a vector of length 500 using 4 threads. The output echoes the matrix sizes, the number of threads used, and the time it took to perform the multiplication:

$ ./mxv.par.exe -m 1000 -n 500 -t 4
Rows = 1000 columns = 500 threads = 4 time mxv = 510 (us)
$

 

A First DTrace Program

The D program below lists all functions that are called when executing the target executable. It also shows how often these functions have been executed. Line numbers have been added for ease of reference:

 1 #!/usr/sbin/dtrace -s
 2
 3 #pragma D option quiet
 4
 5 BEGIN {
 6  printf("\n======================================================================\n");
 7  printf("                    Function Call Count Statistics\n");
 8  printf("======================================================================\n");
 9 }
10 pid$target:::entry
11 {
12  @all_calls[probefunc,probemod] = count();
13 }
14 END {
15  printa(@all_calls);
16 }

The first line invokes DTrace and uses the -s option to indicate the D program is to follow. At line 3, a pragma is used to supress some information DTrace prints by default.

The BEGIN probe spans lines 5-9. This probe is executed once at the start of the tracing and is ideally suited to initialize variables and, as in this case, print a banner.

At line 10 we use the pid provider to enable tracing of a user process. The target process is either specified using a particular process id (e.g. pid365), or through the $target macro variable that expands to the process id of the command specified at the command line. The latter form is used here. The pid provider offers the flexibility to trace any command, which in this case is the execution of the matrix-vector multiplication executable.

We use wild cards for the module name and function. The probe name is entry and this means that this probe fires upon entering any function of the target process.

Lines 11 and 13 contain the mandatory curly braces that enclose the actions taken. In this case there is only one action and it is at line 12. Here, the count() aggregation function is used. It returns how often it has been called. Note that this is on a per-probe basis, so this line counts how often each probe fires. The result is stored in an aggregation with the name @all_calls. Since this is an aggregation, the name has to start with the "@" symbol.

The aggregation is indexed through the probefunc and probemod built-in DTrace variables. They expand to the function name that caused the probe to trigger and the module this function is part of. This means that line 12 counts how many times each function of the parent process is executed and the library or exectuable this function is part of.

The END probe spans lines 14-16. Recall this probe is executed upon termination of the tracing. Although aggregations are automatically printed upon termination, we explicitly print the aggregation using the printa function. The function and module name(s), plus the respective counts, are printed.

Below is the output of a run using the matrix-vector program. It is assumed that the D program shown above is stored in a file with the name fcalls.d. Note that root privileges are needed to use DTrace. This is why we use the sudo tool to execute the D program. By default the DTrace output is mixed with the program output. The -o option is used to store the DTrace output in a separate file.

The -c option is used to specifiy the command or executable that needs to be traced. Since we use options on the executable, quotes are needed to delimit the full command.

Since the full output contains 149 lines, only some snippets are shown here:

 

$ sudo ./fcalls.d -c "./mxv.par.exe -m 1000 -n 500 -t 4" -o fcalls.out
$ cat fcalls.out

======================================================================
                    Function Call Count Statistics
======================================================================

_Exit                                           libc.so.6         1
_IO_cleanup                                     libc.so.6         1
_IO_default_finish                              libc.so.6         1
_IO_default_setbuf                              libc.so.6         1
_IO_file_close                                  libc.so.6         1

                     <many more lines deleted>

init_data                                     mxv.par.exe         1
main                                          mxv.par.exe         1

                     <many more lines deleted>

driver_mxv                                    mxv.par.exe         4
getopt                                          libc.so.6         4
madvise                                         libc.so.6         4
mempcpy                              ld-linux-x86-64.so.2         4
mprotect                                        libc.so.6         4
mxv_core                                      mxv.par.exe         4
pthread_create@@GLIBC_2.2.5               libpthread.so.0         4

                     <many more lines deleted>

_int_free                                       libc.so.6      1007
malloc                                          libc.so.6      1009
_int_malloc                                     libc.so.6      1012
cfree                                           libc.so.6      1015
strcmp                               ld-linux-x86-64.so.2      1205
__drand48_iterate                               libc.so.6    500000
drand48                                         libc.so.6    500000
erand48_r                                       libc.so.6    500000
$

 

The output lists every function that is part of the dynamic call tree of this program, the module it is part of, and how many times the function is called. The list is sorted by default with respect to the function call count.

The functions from module mxv.par.exe are part of the user source code. The other functions are from shared libraries. We know that some of these, e.g. drand48(), are called directly by the application, but the majority of these library functions are called indirectly. To make things a little more complicated, a function like malloc() is called directly by the application, but may also be executed by library functions deeper in the call tree. From the above output we cannot make such a distinction.

Note that the DTrace functions stack() and/or ustack() could be used to get callstacks to see the execution path(s) where the calls originate from. In many cases this feature is used to zoom in on a specific part of the execution flow and therefore restricted to a limited set of probes.

A Refined DTrace Program

While the D program shown above is correct, the list with all functions that are called is quite long, even for this simple application. Another drawback is that there are many probes that trigger, slowing down program execution.

In the second version of our D program, we'd like to restrict the list to user functions called from the executable mxv.par.exe. We also want to format the output, print a header and display the function list in alphabetical order. The modified version of the D program is shown below:

 1 #!/usr/sbin/dtrace -s
 2
 3 #pragma D option quiet
 4 #pragma D option aggsortkey=1
 5 #pragma D option aggsortkeypos=0
 6
 7 BEGIN {
 8  printf("\n======================================================================\n");
 9  printf("                    Function Call Count Statistics\n");
10  printf("======================================================================\n");
11 }
12 pid$target:a.out::entry
13 {
14  @call_counts_per_function[probefunc] = count();
15 }
16 END {
17  printf("%-40s %12s\n\n", "Function name", "Count");
18  printa("%-40s %@12lu\n", @call_counts_per_function);
19 }

Two additional pragmas appear at lines 4-5. The pragma at line 4 enables sorting the aggregations by a key and the next one sets the key to the first field, the name of the function that triggered the probe.

The BEGIN probe is unchanged, but the probe spanning lines 12-15 has two important differences compared to the similar probe used in the first version of our D program. At line 12, we use a.out for the name of the module. This is an alias for the module name in the pid probe. It is replaced with the name of the target executable, or command, to be traced. In this way, the D program does not rely on a specific name for the target.

The second change is at line 14, where the use of the probemod built-in variable has been removed because it is no longer needed. By design, only functions from the target executable trigger this probe now.

The END probe has also been modified. At line 17, a statement has been added to print the header. The printa statement at line 18 has been extended with a format string to control the layout. This string is optional, but ideally suitable to print (a selection of) the fields of an aggregation. We know the first field is a string and the result is a 64 bit unsigned integer number, hence the use of the %s and %lu formats. The thing that is different compared to a regular printf format string in C/C++ is the use of the "@" symbol. This is required when printing the result of an aggregation function.

Below is the output using the modified D program. The command to invoke this script is exactly the same as before.

======================================================================
                    Function Call Count Statistics
======================================================================
Function name                                   Count

allocate_data                                       1
check_results                                       1
determine_work_per_thread                           4
driver_mxv                                          4
get_user_options                                    1
get_workload_stats                                  1
init_data                                           1
main                                                1
mxv_core                                            4
my_timer                                            2
print_all_results                                   1

The first thing to note is that with 11 entries, the list is much shorter. By design, the list is alphabetically sorted with respect to the function name. Since we no longer trace every function called, the tracing overhead has also been reduced substantially.

A DTrace Program with Support for Multithreading

With the above D program one can easily see how often our functions are executed. Although our goal of counting user function calls has been achieved, we'd like to go a little further. In particular, to provide statistics on the multithreading characteristics of the target application:

 

  • Print the name of the executable that has been traced, as well as the total number of calls to user defined functions.
  • Print how many function calls each thread executed. This shows whether all threads approximately execute the same number of function calls.
  • Print a function list with the call counts for each thread. This allows us to identify those functions executed sequentially and also provides a detailed comparison to verify load balancing at the level of the individual functions.

 

The D program that implements this additional functionality is shown below.

 1 #!/usr/sbin/dtrace -s
 2
 3 #pragma D option quiet
 4 #pragma D option aggsortkey=1
 5 #pragma D option aggsortkeypos=0
 6
 7 BEGIN {
 8  printf("\n======================================================================\n");
 9  printf("                    Function Call Count Statistics\n");
10  printf("======================================================================\n");
11 }
12 pid$target:a.out:main:return
13 {
14  executable_name = execname;
15 }
16 pid$target:a.out::entry
17 {
18  @total_call_counts                               = count();
19  @call_counts_per_function[probefunc]             = count();
20  @call_counts_per_thr[tid]                        = count();
21  @call_counts_per_function_and_thr[probefunc,tid] = count();
22 }
23 END {
24  printf("\n============================================================\n");
25  printf("Name of the executable     : %s\n" , executable_name);
26  printa("Total function call counts : %@lu\n", @total_call_counts);
27
28  printf("\n============================================================\n");
29  printf("               Aggregated Function Call Counts\n");
30  printf("============================================================\n");
31  printf("%-40s %12s\n\n", "Function name", "Count");
32  printa("%-40s %@12lu\n", @call_counts_per_function);
33
34  printf("\n============================================================\n");
35  printf("               Function Call Counts Per Thread\n");
36  printf("============================================================\n");
37  printf("%6s %12s\n\n", "TID", "Count");
38  printa("%6d %@12lu\n", @call_counts_per_thr);
39
40  printf("\n============================================================\n");
41  printf("               Thread Level Function Call Counts\n");
42  printf("============================================================\n");
43  printf("%-40s %6s %10s\n\n", "Function name", "TID", "Count");
44  printa("%-40s %6d %@10lu\n", @call_counts_per_function_and_thr);
45 }

The first 11 lines are unchanged. Lines 12-15 define an additional probe that looks remarkably similar to the probe we have used so far, but there is an important difference. The wild card for the function name is gone and instead we specify main explicitly. That means this probe only fires upon entry of the main program.

This is exactly what we want here, because this probe is only used to capture the name of the executable. It is available through the built-in variable execname. Another minor difference is that this probe triggers upon the return from this function. This is purely for demonstration purposes, because the same result would be returned if the trigger was on the entry to this function.

One may wonder why we do not capture the name of the executable in the BEGIN probe. After all, it fires at the start of the tracing process and only once. The issue is that at this point in the tracing, execname does not return the name of the executable, but the file name of the D program.

The probe used in the previous version of the D program has been extended to gather more statistics. There are now four aggregations at lines 18-21:

 

  • At line 18 we simply increment the counter each time this probe triggers. In other words, aggregation @total_call_counts contains the total number of function calls.
  • The statement at line 19 is identical to what was used in the previous version of this probe.
  • At line 20, the tid built-in variable is used as the key into an aggregation called @call_counts_per_thr. This variable contains the integer id of the thread triggering the probe. The count() aggregation function is used as the value. Therefore this statement counts how many function calls a specific thread has executed.
  • Another aggregation called @call_counts_per_function_and_thr is used at line 21. Here we use both the probefunc and tid built-in variables as a key. Again the count() aggregation function is used as the value. In this way we break down the number of calls from the function(s) triggering this probe by the thread id.

 

The END probe is more extensive than before and spans lines 23-45. There are no new features or constructs though. The aggregations are printed in a similar way and the "@" symbol is used in the format string to print the results of the aggregations.

The results of this D program are shown below.
======================================================================
                    Function Call Count Statistics
======================================================================

============================================================
Name of the executable     : mxv.par.exe
Total function call counts : 21

============================================================
               Aggregated Function Call Counts
============================================================
Function name                                   Count

allocate_data                                       1
check_results                                       1
determine_work_per_thread                           4
driver_mxv                                          4
get_user_options                                    1
get_workload_stats                                  1
init_data                                           1
main                                                1
mxv_core                                            4
my_timer                                            2
print_all_results                                   1

============================================================
               Function Call Counts Per Thread
============================================================
   TID        Count

 20679           13
 20680            2
 20681            2
 20682            2
 20683            2

============================================================
               Thread Level Function Call Counts
============================================================
Function name                               TID      Count

allocate_data                             20679          1
check_results                             20679          1
determine_work_per_thread                 20679          4
driver_mxv                                20680          1
driver_mxv                                20681          1
driver_mxv                                20682          1
driver_mxv                                20683          1
get_user_options                          20679          1
get_workload_stats                        20679          1
init_data                                 20679          1
main                                      20679          1
mxv_core                                  20680          1
mxv_core                                  20681          1
mxv_core                                  20682          1
mxv_core                                  20683          1
my_timer                                  20679          2
print_all_results                         20679          1

Right below the header, the name of the executable (mxv.par.exe) and the total number of function calls (21) are printed. This is followed by the same table we saw before.

The second table is titled "Function Call Counts Per Thread". The data confirms that 5 threads have been active. There is one master thread and it creates the other four threads. The thread ids are in the range 20679-20683. Note that these numbers are not fixed. A subsequent run most likely shows different numbers. What is presumably the main thread executes 13 function calls. The other four threads execute two function calls each.

These numbers don't tell us much about what is really going on under the hood and this is why we generate a third table titled "Thread Level Function Call Counts". The data is sorted with respect to the function names.

What we see in this table is that the main thread executes all functions, other than driver_mxv and mxv_core. These two functions are executed by the four threads that have been created. We also see that function determine_work_per_thread is called four times by the main thread. This function is used to compute the amount of work to be executed by each thread. In a more scalable design, this should be handled by the individual threads. Function my_timer is executed twice by the main thread. That is because this function is called at the start and end of the matrix-vector multiplication.

While this table shows the respective thread ids, it is not immediately clear which function(s) each thread executes. It is not difficult to create a table that shows the sorted thread ids in the first column and the function names, as well as the respective counts, next to the ids. This is left as an exercise to the reader.

There is one more thing we would like to mention. While the focus has been on the user written functions, there is no reason why other functions cannot be included. For example, we know this program uses the Pthreads library libpthreads.so. In case functions from this library should be counted as well, a one line addition to the main probe is sufficient:

 1 pid$target:a.out::entry,
 2 pid$target:libpthread.so:pthread_*:entry
 3 {
 4  @total_call_counts                               = count();
 5  @call_counts_per_function[probefunc]             = count();
 6  @call_counts_per_thr[tid]                        = count();
 7  @call_counts_per_function_and_thr[probefunc,tid] = count();
 8 }

The differences are in lines 1-2. Since we want to use the same actions for both probes, we simply place them back to back, separated by a comma. The second probe specifies the module (libpthread.so), but instead of tracing all functions from this library, for demonstration purposes we use a wild card to only select function names starting with pthread_.

Additional Reading Material

The above examples, plus the high level coverage of the DTrace concepts and terminology, are hopefully sufficient to get started. More details are beyond the scope of this article, but luckily, DTrace is very well documented.

For example, the Oracle Linux DTrace Guide, covers DTrace in detail and includes many short code fragments. In case more information is needed, there are many other references and examples. Regarding the latter, the Oracle DTrace Tutorial contains a variety of example programs.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.