Diagnose networking problems on Solaris

One of the most basic yet most useful tools for diagnosing networking problems on the host side is netstat(1M). The tool is well known among networking people, but in talking to even savvy Solaris/UNIX developers I have often found how little they know about using netstat effectively. So I will attempt to shed some light here.

The output of netstat -s is often the first thing to examine when you encounter a networking problem, especially a performance-related one. For example, have there been any checksum errors? (Believe it or not, checksum problems have been among the most common causes of performance problems in development environments with new hardware or software.) Is there a large number of TCP retransmissions, duplicate segments, out-of-order packets, TCP resets, or connections aborted due to timeout?
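Because the counters are cumulative since boot, "a large number" only means something relative to the traffic sent over the same interval. The following is a minimal sketch of that arithmetic, assuming you have captured two netstat -s snapshots and pulled out tcpOutDataSegs and tcpRetransSegs yourself. The struct and function names are hypothetical, the counters are assumed to be 32 bits wide, and the ratio is only a rough indicator.

```c
#include <stdint.h>

/* Two counters copied by hand from one "netstat -s" snapshot. */
typedef struct tcp_snap {
	uint32_t out_data_segs;	/* tcpOutDataSegs */
	uint32_t retrans_segs;	/* tcpRetransSegs */
} tcp_snap_t;

/*
 * Retransmitted segments per 1000 data segments sent between two
 * snapshots. Unsigned subtraction keeps the deltas correct across a
 * single 32-bit counter wrap. Returns 0 when no data was sent.
 */
uint32_t
retrans_per_mille(const tcp_snap_t *before, const tcp_snap_t *after)
{
	uint32_t sent = after->out_data_segs - before->out_data_segs;
	uint32_t rexmit = after->retrans_segs - before->retrans_segs;

	if (sent == 0)
		return (0);
	return ((uint32_t)(((uint64_t)rexmit * 1000) / sent));
}
```

Comparing the value across intervals, rather than reading the raw totals, makes it easier to tell a long-running healthy host from one that has just started dropping packets.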

Understanding the meaning of these counters can be a challenge. Some are described in the various MIB standards, such as RFC 4022, which defines the TCP MIB, and are also documented in the mibiisa(1M) man page. For example, tcpInErrs records how many TCP packets failed the TCP checksum test and were discarded.
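To make "fail the TCP checksum test" concrete, here is a minimal userland sketch of the Internet checksum in the style of RFC 1071. It is not the Solaris kernel code: the function names are hypothetical, and the real TCP checksum also covers a pseudo-header built from the IP addresses, protocol, and length, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum over 16-bit big-endian words, complemented.
 * A trailing odd byte is padded on the right with a zero byte.
 * The 32-bit accumulator is sufficient for packet-sized buffers.
 */
uint16_t
inet_cksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return ((uint16_t)~sum);
}

/*
 * A receiver sums the data together with the transmitted checksum
 * field. The result is 0 when no corruption is detected.
 */
int
cksum_ok(const uint8_t *data, size_t len)
{
	return (inet_cksum(data, len) == 0);
}
```

A segment for which the equivalent of cksum_ok() returns false is the kind of packet that would be counted and dropped rather than delivered.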

Others are Sun's extensions and can be obscure to anyone who doesn't work on the Solaris networking stack. Fortunately, thanks to OpenSolaris, this need not be the case anymore. For example, to find out what tcpTimRetransDrop means, you need go no further than the online source code browser. Type in the symbol, and it takes you directly to the place where tcpTimRetransDrop is incremented. From the code below, it is clear that the connection has exceeded its abort timer threshold and is therefore being dropped.

        if ((ms = tcp->tcp_ms_we_have_waited) > second_threshold) {
                /*
                 * For zero window probe, we need to send indefinitely,
                 * unless we have not heard from the other side for some
                 * time...
                 */
                if ((tcp->tcp_zero_win_probe == 0) ||
                    (TICK_TO_MSEC(lbolt - tcp->tcp_last_recv_time) >
                    second_threshold)) {
                        BUMP_MIB(&tcp_mib, tcpTimRetransDrop);
                        /*
                         * If TCP is in SYN_RCVD state, send back a
                         * RST|ACK as BSD does.  Note that tcp_zero_win_probe
                         * should be zero in TCPS_SYN_RCVD state.
                         */
                        if (tcp->tcp_state == TCPS_SYN_RCVD) {
                                tcp_xmit_ctl("tcp_timer: RST sent on timeout "
                                    "in SYN_RCVD",
                                    tcp, tcp->tcp_snxt,
                                    tcp->tcp_rnxt, TH_RST | TH_ACK);
                        }
                        (void) tcp_clean_death(tcp,
                            tcp->tcp_client_errno ?
                            tcp->tcp_client_errno : ETIMEDOUT, 25);
                        return;
                } else {

Solaris 10 also added extensive kstat(1M) support in the protocol stack to improve observability. Run "kstat tcp" or "kstat ip", and you'll find many useful statistics. These kstats provide much finer-grained detail about the operation of the protocol stack, and we Solaris kernel engineers often use them for debugging. For example, we relied heavily on the various cksum-related kstats to debug TCP/UDP/IPv4-header checksum offload bugs, in both software and hardware.

Again, you'll need to go to the source code to understand the precise meaning of many kstats. Besides tcp and ip, kstats are also available for udp, sctp, icmp, and ipsecah.

Both netstat(1M) and kstat(1M) show how many times an interesting event has occurred so far. How do you capture an event right at the time it happens? DTrace in Solaris 10 includes a MIB provider that can be used for exactly this purpose. For example, to debug a mysterious connection reset problem that has plagued your client-server application, fire off the following DTrace command:

dtrace -n mib:::tcpEstabResets

The probe will fire right at the time when the MIB event tcpEstabResets occurs.

You may want to glean more information about the event in question. For example, what are the TCP port numbers and the connection/socket states of the connection that got reset?

Unfortunately, the MIB provider as it currently stands in Solaris 10 does not supply useful arguments to answer detailed questions like these. Until we, or perhaps some of you, fix this shortcoming, you can use the DTrace FBT provider once you locate the right function to trace.

If you run

dtrace -n 'mib:::tcpEstabResets{stack(10);}'

you'll see tcp_clean_death at the top of the stack. In the source code you'll see that arg0 is a pointer to a tcp_t, the Solaris equivalent of the TCP control block, which gives you access to all the connection information you'll ever need. You can even traverse data structures up to the socket layer if you understand the code. You'll also notice that you need to check err == ECONNRESET to focus only on RST errors.

Further down the stack you'll see a function called tcp_rput_data. That is a more general kernel function that most inbound TCP packets have to pass through. In the source code you'll see that arg0 points to a conn_t (struct conn_s), from which you can reach the TCP control block through conn_tcp. Arg1 is the mblk containing the packet, with b_rptr pointing to the beginning of the IP header.
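Since b_rptr points at the IP header, the TCP header has to be located before any of its fields can be read. A fixed offset of 20 bytes is only right when the IPv4 header carries no options. The following userland sketch shows the general calculation under stated assumptions: it handles IPv4 only, the struct and function names are hypothetical, and it reads fields byte by byte so it does not depend on the host's endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* TCP header fields of interest, converted to host byte order. */
typedef struct tcp_fields {
	uint16_t sport;
	uint16_t dport;
	uint32_t seq;
	uint32_t ack;
	uint16_t win;
	uint8_t flags;
} tcp_fields_t;

/* Read big-endian (network order) values one byte at a time. */
uint16_t
rd16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

uint32_t
rd32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | p[3]);
}

/*
 * rptr points at the start of an IPv4 header and len is the number of
 * bytes available. Returns 0 on success, or -1 if the packet is not
 * IPv4/TCP or is too short to hold a minimal TCP header.
 */
int
parse_tcp(const uint8_t *rptr, size_t len, tcp_fields_t *out)
{
	const uint8_t *tcph;
	size_t ihl;

	if (len < 20 || (rptr[0] >> 4) != 4)
		return (-1);
	ihl = (size_t)(rptr[0] & 0x0f) * 4;	/* IP header length, bytes */
	if (ihl < 20 || rptr[9] != 6 || len < ihl + 20)
		return (-1);

	tcph = rptr + ihl;
	out->sport = rd16(tcph);
	out->dport = rd16(tcph + 2);
	out->seq = rd32(tcph + 4);
	out->ack = rd32(tcph + 8);
	out->flags = tcph[13];
	out->win = rd16(tcph + 14);
	return (0);
}
```

The same offsets apply when reading the header from a D script, where the header-length nibble can be used in place of a hard-coded 20 if IP options are a possibility.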

With DTrace on fbt:tcp:tcp_rput_data:entry you now have a very powerful diagnostic tool with access to both the packets from the wire and the internal TCP connection state. This is much more powerful than traditional network tracing tools like snoop(1M) or tcpdump, which have no access to the connection state maintained by the host stack. I have used this probe point to diagnose some very difficult performance problems in the past.

A word of caution: functions like tcp_clean_death or tcp_rput_data are implementation details and can change across Solaris releases without warning. DScripts that rely on private kernel functions like these may have to evolve with the kernel from release to release.

The following is a simple DScript that monitors ACKs from the remote end during a TCP throughput test to detect transmit-side stalls caused by a completely filled TCP pipeline.

#!/usr/sbin/dtrace -qs

/* Note: the script assumes big endian and no window scaler */

fbt:tcp:tcp_rput_data:entry
/((conn_t *)arg0)->conn_tcp->tcp_unsent != 0 &&
*(int *)&(((tcph_t *)(args[1]->b_rptr + 20))->th_ack[0]) +
*(ushort *)&(((tcph_t *)(args[1]->b_rptr + 20))->th_win[0])
- ((conn_t *)arg0)->conn_tcp->tcp_snxt <= 0/
{
printf("pipeline full or window closed %d\n",
	((conn_t *)arg0)->conn_tcp->tcp_unsent);
}
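The predicate in the script above boils down to one piece of sequence-number arithmetic: the ACK number plus the advertised window gives the right edge of the send window, and subtracting tcp_snxt gives the room left to send. Here is a minimal sketch of that check in plain C, assuming the fields have already been converted to host byte order and that no window scaling is in effect. The function names are hypothetical, and the signed cast relies on the usual two's-complement conversion.

```c
#include <stdint.h>

/*
 * Bytes the sender may still transmit after processing this ACK.
 * (ack + win) is the right edge of the peer's advertised window and
 * snxt is the next sequence number to send. Doing the arithmetic in
 * 32 bits and casting the result to a signed type keeps the answer
 * correct when the sequence space wraps.
 */
int32_t
usable_window(uint32_t ack, uint16_t win, uint32_t snxt)
{
	return ((int32_t)(ack + win - snxt));
}

/* Stalled: data is queued to send but the window leaves no room. */
int
tx_stalled(uint32_t unsent, uint32_t ack, uint16_t win, uint32_t snxt)
{
	return (unsent != 0 && usable_window(ack, win, snxt) <= 0);
}
```

For example, with ack = 1000, win = 500, and snxt = 1500 the usable window is 0, so any unsent data means the transmit side is waiting on the receiver.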
Comments:

Very cool! Thanks for the awesome writeup!

Posted by Matty on June 16, 2005 at 02:50 AM PDT #

Very good case! I wonder if you have any experience solving performance problems with devices? I have run into a difficult one: we use a device that implements the lower layer of ATM and transmits messages. It does not scale when the load increases, and I don't know how to investigate the problem.

Posted by Quentin on September 22, 2006 at 06:07 AM PDT #

About

hkchu
