In this post from Alan Maguire, he shows how to use DTrace to inspect common network-related system calls. DTrace is a powerful tracing tool for understanding user and kernel processes.

We’ve been steadily adding DTrace support for networking in Oracle Linux. Let’s see how we can use these probes to better understand network behaviour. Let’s do a quick tour to see what sort of support is available for dynamic tracing the network stack.

Before diving into the details though, it’s worth asking why use DTrace at all? Don’t we have enough tools in the networking space? The reasons I use it are:

  • DTrace is open-ended while being production-safe. So in other words if I want to find something out, there’s usually a way of doing it, and most importantly I can do that without panic’ing a system. The hard part of course is forming those questions into queries that DTrace understands. I hope the next few blog posts will provide some useful hints on how to do so.
  • DTrace allows me to reach across multiple subsystems, e.g. how effectively are interrupts being coalesced when receiving packets? Which locks are hot on the network receive path? In cloud environments especially, it’s critical to be able to connect the dots from network behaviour to other subsystems and abstractions. People love to blame “the network”. DTrace allows us to see where the costs are really incurred.

Socket layer: syscall provider – available since UEK2

As on other operating systems, the syscall provider is avaiable and we can trace network-related system calls – sendto, sendmsg, sendmmsg, recvfrom, recvmsg, recvmmsg.

Here’s a simple example – trace bytes sent on a per-executable basis. Hit Ctrl+C when you’re done collecting info.

 # dtrace -n 'syscall::sendto:entry { @c[execname] = sum(arg2); }'
dtrace: description 'syscall::sendto:entry ' matched 1 probe
^C
 
  DNS Resolver #6                                                  65
  ping                                                            192
  Socket Thread                                                  2006
  vpnagentd                                                      3872


This is what we call a DTrace one-liner. Let’s examine it piece-by-piece.

  • The -n option says we are directly specifying our probes rather than using a DTrace script (if we wanted to do that, we would use “dtrace -s <scriptname>”).
  • The probe is a 4-element description, with each element separated by a “:”. The first element is the provider – the DTrace subsystem which we are using. In this case it is “syscall”. The second is the module, which is relevant for other DTrace providers but not here (for Linux it is always “vmlinux”, so we can leave it blank). The third is the function – in the case of the syscall provider, that’s the system call name. And finally we have the probe name. For each system call there are two probes, entry and return, corresponding to system call request/completion.
  • Next is the action,contained in the curly brackets. DTrace code takes the form “probe /predicate/ { action }”. In this case we have no predicate, but if we did it would determine if the body was executed or not. Next let’s describe the action.
  • Here we are using an aggregation variable “c” – the aggregation is signified by the “@” symbol. Aggregations can specify multiple keys in between the square brackets, and the associated values for the various sets of keys are computed using one of the aggregating functions sum(), count(), avg() etc.  Aggregations are collected efficientl in-kernel. Here we are sum()ing arg2 – the third argument to sendto(), which if you look at the manual page you will see is the size.

 

So to summarize, our one-liner says “show me the sum total of bytes sent via sendto() system calls, and collect the sums for each program name” (“execname” is a built in variable that references the current process executable name).

Now of course these are requests, and some may fail.  Again note that the probe we used was syscall::sendto:entry.  There is also a corresponding syscall::sendto:return. A common pattern in DTrace is to measure how long it takes from :entry to :return.

 

Layer 4 (Transport layer): TCP, UDP providers – available since UEK4QU5

In Linux – as in Solaris – we have transport-level providers for TCP and UDP. These trace transmit/receive events, and for TCP we also trace TCP state machine transitions and connection establishment milestones.

 # dtrace -l -P udp
   ID   PROVIDER            MODULE                          FUNCTION NAME
 1836        udp           vmlinux                      udp_send_skb send
 1838        udp           vmlinux                       udp_recvmsg receive
 1840        udp           vmlinux                 udp_queue_rcv_skb receive
 1868        udp           vmlinux                   udp_v6_send_skb send
 1869        udp           vmlinux                     udpv6_recvmsg receive
 1871        udp           vmlinux               udpv6_queue_rcv_skb receive
 
# dtrace -l -P tcp
   ID   PROVIDER            MODULE                          FUNCTION NAME
 1818        tcp           vmlinux                     tcp_set_state state-change
 1819        tcp           vmlinux                  tcp_conn_request state-change
 1820        tcp           vmlinux                tcp_finish_connect connect-established
 1821        tcp           vmlinux             tcp_rcv_state_process accept-established
 1822        tcp           vmlinux             tcp_rcv_state_process connect-refused
 1823        tcp           vmlinux                  tcp_transmit_skb send
 1824        tcp           vmlinux                  tcp_transmit_skb connect-request
 1825        tcp           vmlinux                       tcp_connect state-change
 1826        tcp           vmlinux                   tcp_v4_send_ack send
 1827        tcp           vmlinux                 tcp_v4_send_reset send
 1828        tcp           vmlinux                 tcp_v4_send_reset accept-refused
 1829        tcp           vmlinux                tcp_v4_send_synack send
 1830        tcp           vmlinux                        tcp_v4_rcv receive
 1831        tcp           vmlinux                     tcp_time_wait state-change
 1882        tcp           vmlinux              tcp_v6_send_response send
 1883        tcp           vmlinux              tcp_v6_send_response accept-refused
 1884        tcp           vmlinux                tcp_v6_send_synack send
 1885        tcp           vmlinux                        tcp_v6_rcv receive

 

Here’s a quick example – what systems (remote IP) are we refusing connections to, and which ports are they trying to connect to?

 # dtrace -n 'tcp:::accept-refused { @c[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }'
dtrace: description 'tcp:::accept-refused ' matched 2 probes
^C
 
  127.0.0.1                                              7                1


I telnet’ed port 7 locally in another window to generate this. Port 7 was closed so we see 1 connection being refused. The TCP, UDP and IP providers are all static-defined tracing (SDT) providers. The idea of these is that rather than trying to write probes that sit on ever-changing function entry or return to gather information from data structures that are also changing, we define a set of events of interest, and each place in the code that these events occur we place a static probe. So for example we can see that the tcp:::send probe is in tcp_v4_send_reset, tcp_v4_send_ack, etc. Importantly, these probes also define stable translators which take the OS-specific structures and translate them into stable arguments that DTrace consumers can rely on.  To see the relevant deinifitions for Oracle Linux, you can look in /usr/lib64/dtrace/4.1/tcp.d, udp.d, ip.d.

 

For the most part we have preserved compatibility with Solaris. If there’s something missing you really need, let us know! Here’s how the various arguments look. We use args[0] instead of arg0 to let DTrace know we’re trying to access the stable, translated argument. The raw arguments are still available via arg0, arg1 etc.

args[0] - pktinfo_t * [packet information]
args[1] - csinfo_t * [ connection information]
args[2] - ipinfo_t * [IP protocol information]
args[3] - tcpsinfo_t * for TCP, udpsinfo_t * for UDP [TCP/UDP state information]
args[4] - tcpinfo_t * for TCP, udpinfo_t * for UDP [TCP/UDP header information]


Layer 3 (IP provider) – available since UEK4QU4

The IP provider traces send, receive, and inbound/outbound drops.

 # dtrace -l -P ip
   ID   PROVIDER            MODULE                          FUNCTION NAME
 1839         ip           vmlinux                     ip_rcv_finish drop-in
 1840         ip           vmlinux                  ip_local_deliver receive
 1841         ip           vmlinux                            ip_rcv drop-in
 1842         ip           vmlinux                  __ip_append_data drop-out
 1843         ip           vmlinux                 __ip_local_out_sk send
 1844         ip           vmlinux                    ip_append_page drop-out
 1845         ip           vmlinux                       ip_send_skb drop-out
 1861         ip           vmlinux                       raw_sendmsg send
 1878         ip           vmlinux        __ip6_flush_pending_frames drop-out
 1879         ip           vmlinux                          ip6_xmit send
 1880         ip           vmlinux                          ip6_xmit drop-out
 1881         ip           vmlinux                ip6_finish_output2 drop-out
 1882         ip           vmlinux                 __ip6_append_data drop-out
 1883         ip           vmlinux                       ip6_forward drop-out
 1884         ip           vmlinux                        ip6_output drop-out
 1885         ip           vmlinux                      ip6_send_skb drop-out
 1886         ip           vmlinux                  ip6_input_finish drop-in
 1887         ip           vmlinux                          ipv6_rcv drop-in
 1888         ip           vmlinux                         ip6_input receive
 1895         ip           vmlinux                    ndisc_send_skb send
 1900         ip           vmlinux                     rawv6_sendmsg send
 1903         ip           vmlinux                      mld_sendpack send
 1904         ip           vmlinux                      mld_sendpack drop-out
 1905         ip           vmlinux                        igmp6_send send
 1906         ip           vmlinux                        igmp6_send drop-out
 1923         ip           vmlinux                __ip6_local_out_sk send


For IP the arguments are

args[0] - pktinfo_t * [packet information]
args[1] - csinfo_t * [connection information]
args[2] - ipinfo_t * [IP protocol information]
args[3] - ifinfo_t * [IP interface info]
args[4] - ipv4info_t * [IPv6 header info if IPv6 packet]
args[5] - ipv6info_t * [IPv6 header info if IPv6 packet]

 

Other

DTrace on Oracle Linux includes a perf provider, which traces perf events. And of course function boundary tracing (fbt) is available too. Here’s an example of using function boundary tracing to get counts of socket buffer (struct sk_buff) allocation sizes. sk_buff is the key structure for network data, and here we’re examining the counts in each power-of-two bucket 0-1, 1-2, 2-4, etc – this is the quantize() aggregating action. We see for example that the majority were in the 2048-4096 range.

# dtrace -n 'fbt::__alloc_skb:entry { @size = quantize(arg0); }'
dtrace: description 'fbt::__alloc_skb:entry ' matched 1 probe
^C
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |                                         80       
               1 |                                         3        
               2 |                                         0        
               4 |                                         137      
               8 |                                         988      
              16 |@@                                       5719     
              32 |@@@@@                                    10977    
              64 |                                         779      
             128 |@@@                                      6805     
             256 |@@@@@@@@@@@                              25462    
             512 |                                         508      
            1024 |@@                                       5108     
            2048 |@@@@@@@@@@@@@@@@                         39011    
            4096 |                                         8        
            8192 |                                         0        
           16384 |                                         16       
           32768 |                                         0        



We’ll describe some more examples using fbt and perf providers in a future blog post!