Wednesday Jun 02, 2010

netcat and network stack debugging

One of the options of the netcat program (/usr/bin/nc) available in OpenSolaris (if you don't have it installed, simply run pkg install SUNWnetcat) is the -D (debugging) option. Only recently I realized that not everyone knows how to capture the debugging data once this option is set, since the procedure is pretty Solaris specific. The option sets a flag inside the ip kernel module on the connection structure (conn_t) of the given connection. netcat does this by calling the setsockopt() system call with the SO_DEBUG option for the given socket. As a result, the conn_debug flag is set in the conn_t structure associated with the socket. This flag is then consulted in various functions that manipulate the structure.
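For illustration, here is a minimal userland sketch of the setsockopt() call described above. It is not taken from the netcat sources; the helper names enable_so_debug() and so_debug_is_set() are made up for this example, and only the standard sockets API is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: turn SO_DEBUG on for a socket.
 * Returns 0 on success, or the errno value on failure
 * (some systems require extra privileges to set this option).
 */
int
enable_so_debug(int fd)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_DEBUG, &on, sizeof (on)) == -1)
		return (errno);
	return (0);
}

/*
 * Hypothetical helper: read the option back.
 * Returns 1 if SO_DEBUG is set, 0 if it is clear, -1 on error.
 */
int
so_debug_is_set(int fd)
{
	int val = 0;
	socklen_t len = sizeof (val);

	if (getsockopt(fd, SOL_SOCKET, SO_DEBUG, &val, &len) == -1)
		return (-1);
	return (val != 0);
}
```

A caller would create the socket first, call enable_so_debug() on it, and only then bind() or connect(), so that the flag is already in place when the interesting events happen.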

When an interesting event occurs and conn_debug is set, the function calls the strlog() kernel function to record the data. For example, here's a snippet of tcp_input_listener() from usr/src/uts/common/inet/tcp/tcp_input.c:

	if (listener->tcp_conn_req_cnt_q >= listener->tcp_conn_req_max) {
		mutex_exit(&listener->tcp_eager_lock);
		TCP_STAT(tcps, tcp_listendrop);
		TCPS_BUMP_MIB(tcps, tcpListenDrop);
		if (lconnp->conn_debug) {
			(void) strlog(TCP_MOD_ID, 0, 1, SL_TRACE|SL_ERROR,
			    "tcp_input_listener: listen backlog (max=%d) "
			    "overflow (%d pending) on %s",
			    listener->tcp_conn_req_max,
			    listener->tcp_conn_req_cnt_q,
			    tcp_display(listener, NULL, DISP_PORT_ONLY));
		}
		goto error2;
	}

To capture the data logged via strlog() it's necessary to know the STREAMS module ID, which in our case is TCP_MOD_ID, defined in usr/src/uts/common/inet/tcp_impl.h as 5105.

To read the data one can use either the strace(1M) command line tool or the strerr(1M) daemon; both produce text logs. To read everything, one can use this command (it needs read access to /dev/log, so it has to run as root):

# strace 5105 all all

Here are two examples using netcat. First we try to bind to a port for which we don't have privileges:

$ nc -D -l -4 -p 23
nc: Permission denied

which produces the following entry from strace:

000004 12:11:04 19c7ff6a  1 ..E 5105 0 ip: [ID 234745 kern.debug] tcp_bind: no priv for port 23

Next we try to bind to an already occupied port:

$ nc -D -l -4 -p 4444
nc: Address already in use

which produces the following entry from strace:

000005 12:15:33 19c86878  1 ..E 5105 0 ip: [ID 326978 kern.debug] tcp_bind: requested addr busy

This is of course traceable via normal tools such as ppriv(1) or truss(1), but the point is that much more fine-grained detail can be captured from the network modules. The format of the log entries is explained in the strace(1M) man page.
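To make the entry layout a bit more concrete, here is a small sketch that splits one strace line into its fields. The field names (sequence number, time, ticks, level, flags, module ID, sub-ID, text) follow my reading of strace(1M); the struct and the function name parse_strace_line() are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed strace(1M) entry; the names are illustrative only. */
struct strace_entry {
	unsigned long seq;	/* sequence number */
	char time[9];		/* hh:mm:ss */
	unsigned long ticks;	/* machine ticks (hex in the log) */
	int level;		/* tracing level */
	char flags[4];		/* e.g. "..E" */
	int mid;		/* STREAMS module ID, 5105 for TCP */
	int sid;		/* sub-ID */
	const char *text;	/* message, points into the input line */
};

/* Returns 0 on success, -1 if the line does not match the layout. */
int
parse_strace_line(const char *line, struct strace_entry *e)
{
	int off = -1;
	int n;

	assert(line != NULL && e != NULL);
	n = sscanf(line, "%lu %8s %lx %d %3s %d %d %n",
	    &e->seq, e->time, &e->ticks, &e->level, e->flags,
	    &e->mid, &e->sid, &off);
	if (n != 7 || off < 0)
		return (-1);
	e->text = line + off;
	return (0);
}
```

With the first log line above as input, mid comes out as 5105 and text starts at "ip: [ID 234745 kern.debug] ...", which makes it easy to filter entries by module ID in a test script.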

Thursday Jan 17, 2008

Adding dtrace SDT probes

It seems that many developers and dtrace users have found themselves in a position where they wanted to add some SDT probes to a module to get more insight into what's going on, but they had to pause and think: "Okay, more probes. But where to put them? Do I really need the additional probes when I already have the fbt ones?" A systematic approach is needed in order not to overdo or underdo it. I will use KSSL (Solaris kernel SSL proxy [1]) for illustration.

With CR 6556447, tens of SDT probes were introduced into the KSSL module and other modules which interface with it. In addition to the new SDT probes, we got rid of the KSSL_DEBUG macros in KSSL, which were compiled only into DEBUG kernels, and substituted them with SDT probes. As a result, much better observability and error detection was achieved with both debug and non-debug kernels. The other option would have been to create a KSSL dtrace provider, but that would be too big a gun for what needs to be achieved.

Generically, the following interesting data points for data gathering/observation can be identified in code:

  • data paths
    When there is more than one path by which data can flow into a subsystem. E.g. for TCP there are a couple of ways SSL data can reach the KSSL input queue. To identify exactly where tcp_kssl_input() was called from, we use SDT probes:
    	if (tcp->tcp_listener || tcp->tcp_hard_binding) {
    		if (tcp->tcp_kssl_pending) {
    			    mblk_t *, mp);
    			tcp_kssl_input(tcp, mp);
    		} else {
    			tcp_rcv_enqueue(tcp, mp, seg_len);
    	} else {
    		/* Does this need SSL processing first? */
    			if ((tcp->tcp_kssl_ctx != NULL) &&
    			    (DB_TYPE(mp) == M_DATA)) {
    				    mblk_t *, mp);
    				tcp_kssl_input(tcp, mp);
    			} else {
    				putnext(tcp->tcp_rq, mp);
    				if (!canputnext(tcp->tcp_rq))
    					tcp->tcp_rwnd -= seg_len;
  • data processed in while/for cycles
    To observe what happens in each iteration of the cycle. Can be used in code like this:
    while (mp != NULL) {
      DTRACE_PROBE1(kssl_mblk__handle_record_cycle, mblk_t *, mp);
      /* process the data */
      mp = mp->b_cont;
    }
  • switch statements
    If significant/non-trivial processing happens inside switch it may be useful to add SDT probes there too. E.g.:
      content_type = (SSL3ContentType)mp->b_rptr[0];
      switch (content_type) {
        /* select processing according to type */
        case content_alert:
           DTRACE_PROBE1(kssl_mblk__content_alert, mblk_t *, mp);
        case content_change_cipher_spec:
           DTRACE_PROBE1(kssl_mblk__change_cipher_spec, mblk_t *, mp);
           DTRACE_PROBE1(kssl_mblk__unexpected_msg, mblk_t *, mp);
  • labels which cannot be (easily) identified in another way
    Useful if the code which follows the label is generic (assignments, no function calls), e.g.:
    	/*
    	 * Give this session a chance to fall back to
    	 * userland SSL
    	 */
    	if (ctxmp == NULL)
    		goto no_can_do;
    	...
    no_can_do:
    	DTRACE_PROBE1(kssl_no_can_do, tcp_t *, tcp);
    	listener = tcp->tcp_listener;
    	ind_mp = tcp->tcp_conn.tcp_eager_conn_ind;
    	ASSERT(ind_mp != NULL);

You've surely noticed that some of the probe definitions above have a common prefix (kssl_mblk-). This is one of the things which make SDT probes so attractive. With prefixes it is possible to do e.g. the following:

sdt:::kssl_err-*
{
  printf("hit error in %s\n", probefunc);
  stack(); ustack();
}

The important part is that we do not specify the module or function name. The implicit wildcard (module and function name left out) combined with the explicit wildcard (prefix + asterisk) leads to all KSSL error probes being activated regardless of the module or function in which they are defined. This is obviously very useful for technologies which span multiple Solaris subsystems or modules (such as KSSL).
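The relation between the names used in the DTRACE_PROBE macros and the names matched by such a probeset can be sketched in plain C. Two consecutive underscores in the macro name show up as a dash in the probe name, so kssl_err__bad_record_size becomes kssl_err-bad_record_size. The helper names below are made up, and fnmatch(3) is only a stand-in for dtrace's own pattern matching, which is similar but not identical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert the name given to a DTRACE_PROBE macro
 * into the probe name seen by dtrace by turning each "__" into "-".
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int
sdt_probe_name(const char *macro_name, char *out, size_t outlen)
{
	size_t o = 0;

	assert(macro_name != NULL && out != NULL);
	if (outlen == 0)
		return (-1);
	for (size_t i = 0; macro_name[i] != '\0'; i++) {
		char c = macro_name[i];

		if (c == '_' && macro_name[i + 1] == '_') {
			c = '-';
			i++;	/* skip the second underscore */
		}
		if (o + 1 >= outlen)
			return (-1);	/* keep room for the terminator */
		out[o++] = c;
	}
	out[o] = '\0';
	return (0);
}

/*
 * Hypothetical helper: would a probe defined with macro_name be part of
 * the probeset selected by pattern (e.g. "kssl_err-*")? Returns 1 or 0.
 */
int
probe_in_set(const char *macro_name, const char *pattern)
{
	char name[128];

	if (sdt_probe_name(macro_name, name, sizeof (name)) != 0)
		return (0);
	return (fnmatch(pattern, name, 0) == 0);
}
```

For example, probe_in_set("kssl_mblk__content_alert", "kssl_mblk-*") is true, while kssl_no_can_do carries no prefix and therefore falls outside both the kssl_mblk-* and the kssl_err-* probeset.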

The nice thing about the error probes is that they can be leveraged in test suites. For each test case we can first run a dtrace script with the above probeset covering all KSSL errors in the background and, after the test completes, just check whether it produced any data. If it did, the test case can be considered failed. No need to check kstat(1M) (and other counters), log files, etc.
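The check itself can be trivial. A minimal sketch, assuming the dtrace script runs in quiet mode (-q) with its output redirected to a file (-o), so that any non-whitespace content means an error probe fired. The function name trace_has_output() is made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper for a test harness: returns 1 if the captured
 * dtrace output contains anything other than whitespace, 0 otherwise.
 * The stream position is left at the end of the file.
 */
int
trace_has_output(FILE *fp)
{
	int c;
	int found = 0;

	assert(fp != NULL);
	rewind(fp);
	while ((c = fgetc(fp)) != EOF) {
		if (!isspace(c)) {
			found = 1;
			break;
		}
	}
	(void) fseek(fp, 0L, SEEK_END);
	return (found);
}
```

The test case then fails exactly when trace_has_output() returns 1 for the file written during its run.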

Also, thanks to the way dtrace probes are activated, we can have both a generic probeset (using this term for lack of a better one) as above and a probe-specific action, e.g.:

/* probeset of all KSSL error probes */
sdt:::kssl_err-*
{
  printf("hit error in %s\n", probefunc);
}

/*
  the probe definition is:
         uint16_t, rec_sz, int, spec->cipher_bsize);
*/
sdt:::kssl_err-bad_record_size
{
  tracemem(arg0, 32);
  printf("rec_sz = %d , cipher_bsize = %d\n", arg1, arg2);
}

If the probe kssl_err-bad_record_size gets activated, the generic clause will be activated (and fire) too because the probeset contains the probe.

Similarly to the error prefix, we can have a data-specific prefix. For KSSL it is the kssl_mblk- prefix, which can be used for tracing all mblks (msgb(9S)) as they flow through the TCP/IP, STREAMS and KSSL modules. With such probes it is possible to do e.g. the following:

/* how many bytes from a mblk to dump */
#define DUMP_SIZE       48

/* use macros from  */
#define MBLKL(mp)       ((mp)->b_wptr - (mp)->b_rptr)
#define DB_FLAGS(mp)    ((mp)->b_datap->db_flags)

#define PRINT_MBLK_INFO(mp)                                             \
        printf("mblk = 0x%p\n", mp);                                    \
        printf("mblk size = %d\n", MBLKL((mblk_t *)mp));

#define PRINT_MBLK(mp)                                                  \
                trace(timestamp);                                       \
                printf("\n");                                           \
                PRINT_MBLK_INFO(mp);                                    \
                printf("DB_FLAGS = 0x%x", DB_FLAGS((mblk_t *)mp));      \
                tracemem(((mblk_t *)mp)->b_rptr, DUMP_SIZE);            \
                tracemem(((mblk_t *)mp)->b_wptr - DUMP_SIZE,            \
                    DUMP_SIZE);

This is actually an excerpt from my (currently internal) KSSL debugging suite.
An example of output from such a probe can be seen in my Coloring dtrace output post.

For more complex projects it would be a waste to stop here. Prefixes could be further structured. However, this has some drawbacks. In particular, I was thinking about having kssl_mblk- and kssl_err- prefixes. Now what to do for places where an error condition occurred _and_ we would like to see the associated mblk? Using something like kssl_mblk_err-* comes to mind. However, there is a problem with that: what about the singleton cases (only mblk, only err)? Sure, using multiple wildcards in dtrace is possible (e.g. syscall::*read*:), but this would make things ugly and complicated, and given the number of mblk+err cases (it's probably safe to assume that the number of such cases will be low) it's simply not worth the hassle. Rather, I went with 2 probes.
To conclude, using structured prefixes is highly beneficial only for sets of probes where the categories/sub-prefixes create non-intersecting sets (e.g. data type and debug level).

Of course, all of the above applies not only to the kernel but also to custom userland probes!

[1] High-level description of KSSL can be found in blueprint 819-5782.

