Sunday Jun 20, 2010

DTrace TCP, UDP providers

We recently integrated DTrace TCP and UDP providers into Solaris/OpenSolaris - they will appear in build 142. I should mention at the outset that Brendan did the lion's share of this work - I just had to merge his changes with the latest Solaris Nevada builds and do the paperwork mostly.

The first thing to note is that there is a great advantage in tracing network events above IP in the networking stack - at that point both the inbound and outbound data has been mapped to a connection, which in turn is tied to a processs, a zone etc. This allows us to represent that information (along with an identifier that uniquely represents that connection for its lifetime) at the various probe points via a csinfo_t (Connection State INFO Type) probe argument:

typedef struct csinfo {
        uintptr_t cs_addr;
	uint64_t cs_cid;     /\* connection ID \*/
	pid_t cs_pid;        /\* associated PID \*/
	zoneid_t cs_zoneid;  /\* associated zoneid \*/
} csinfo_t;

There is great utility in being able to tie TCP and UDP events to a particular process, as illustrated by the tcptop.d script in /usr/demo/dtrace, which is essentially Brendan's DTraceToolkit tcptop using the stable tcp provider. Here's some sample output - we see the top processes as measured by TCP payload bytes (when 0s appear, they signify connections sending control segments like ACKs which have no payload).

# dtrace -s /usr/demo/dtrace/tcptop.d
Sampling... Please wait.
2010 Jun  4 07:56:56,  load: 0.36,  TCPin:      0 Kb,  TCPout:      0 Kb

  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0 102238 127.0.0.1       49668 127.0.0.1        9010         0
     0 102238 ::1             50795 ::1              9010         0

2010 Jun  4 07:57:01,  load: 0.35,  TCPin:      0 Kb,  TCPout:      0 Kb

  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0 102238 127.0.0.1       41433 127.0.0.1        9010         0
     0 102238 ::1             36782 ::1              9010         0
     0 132388 127.0.0.1       53480 127.0.0.1        9010         0
     0 132388 127.0.0.1       62234 127.0.0.1        9010         0
     0 132388 ::1             50266 ::1              9010         0
     0 132388 ::1             63145 ::1              9010         0

2010 Jun  4 07:57:06,  load: 0.33,  TCPin:      0 Kb,  TCPout:      0 Kb

  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0 132388 127.0.0.1       47516 127.0.0.1        9010         0
     0 132388 127.0.0.1       48726 127.0.0.1        9010         0
     0 132388 ::1             57467 ::1              9010         0
     0 132388 ::1             62543 ::1              9010         0

2010 Jun  4 07:57:11,  load: 0.32,  TCPin:      0 Kb,  TCPout:      0 Kb

  ZONE    PID LADDR           LPORT RADDR           RPORT      SIZE
     0 132388 129.150.120.230 56142 141.146.118.10    993       247

\^C

These events are collected via the tcp:::send and tcp:::receive probe firing, and since we are interested in valid pids, we exclude TCP segments which have no associated process (such as RST|ACK segments generated in response to inbound connection requests for ports where no TCP service is listening). Here we collect the details in the @out aggregation:

tcp:::send
/ args[1]->cs_pid != -1 /
{
 	@out[args[1]->cs_zoneid, args[1]->cs_pid, args[2]->ip_saddr,
            args[4]->tcp_sport, args[2]->ip_daddr, args[4]->tcp_dport] =
            sum(args[2]->ip_plength - args[4]->tcp_offset);
}

As a quick way to see which processes/zones are busy, this is pretty useful. To drill down to individual segments, tcpsnoop can be used:

# dtrace -s /usr/demo/dtrace/tcpsnoop.d
  TIME          PID           LADDR:PORT                 RADDR:PORT   BYTES FLAGS
735575877102 102238       127.0.0.1:45518  ->        127.0.0.1:9010       0 (SYN)
735575877139     -1       127.0.0.1:9010   <-        127.0.0.1:45518      0 (SYN)
735575877168     -1       127.0.0.1:9010   ->        127.0.0.1:45518      0 (RST|ACK)
735575877230 102238       127.0.0.1:45518  <-        127.0.0.1:9010       0 (RST|ACK)
735581534280 132408 129.150.120.230:38737  ->   216.129.112.94:80       712 (PUSH|ACK)
735581725757 132408 129.150.120.230:38737  <-   216.129.112.94:80         0 (ACK)
735581735388 132408 129.150.120.230:38737  <-   216.129.112.94:80       356 (PUSH|ACK)
735581850134 132408 129.150.120.230:38737  ->   216.129.112.94:80         0 (ACK)
\^C

As mentioned above, note how RST|ACKs sent specify a PID of -1 (NOPID) - this indicates that they are generated by the kernel protocol stack and have no associated process. Notice also that they are sent and received locally, so they appear twice, as does all localhost traffic.

Similar udptop.d and udpsnoop.d demo scripts are also available in /usr/demo/dtrace.

From the above, I learned a general lesson about DTrace providers - while it is clearly of great benefit to add a stable provider to allow probing into a subsystem, it is of even greater benefit if that provider can tie the events it probes to other system abstractions (in this case, we link network protocol processing to user processes and system zones).

The other thing to emphasize is that the above DTrace probes are not intended as replacements for snoop or wireshark - those tools do packet sniffing on the wire (often in promiscuous mode), whereas the above scripts monitor traffic that reaches IP and is fanned out to TCP or UDP connections on the inbound path, or has not yet reached IP (on the outbound path). So, in other words, a snoop trace could look quite different. The primary goal of this work (at least to me) is not to replace those packet-sniffing tools, but rather to augment the kinds of analysis we can do - in particular we can view TCP and UDP processing events in a wider system context (Which process did this traffic originate from? Which zone is receiving the most TCP traffic? Which ports are busy?).

In the next entry, I'm going to talk about how we can carry out TCP latency measurements using the tcp provider.

About

user12820842

Search

Archives
« June 2010 »
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
21
24
26
27
28
29
30
   
       
Today