Friday Apr 24, 2015

Deferred dump in Solaris

Kernels of even most robust operating systems do crash from time to time and getting crash dump image of kernel and process memory is usually indispensable for root causing the problem. With growing memory sizes (the M6-32 SPARC server can have up to 32TB of RAM) the amount of data which needs to be written to disk in order to store a crash dump can be pretty big - for larger systems uncompressed crash dumps with tens of gigabytes in size are not exceptional. Even with crash dump restructured into pieces (see my last entry about crash dump restructuring for more info) the kernel portion can be still significant. In Solaris, a ZFS zvol is dedicated as a dump device and during panic processing the crash dump is stored in compressed form to the dump device and extracted from there by savecore(1M) into crash directory once the system comes up again. This is all managed by dumpadm(1M). The problems with dump device in the context of large systems are mutlifold - the dump device can be too small to hold the dump (usually it resides on the root pool which is designed to store just the operating system), it can reside on iSCSI volume (where it is not possible to write the dump to during panic as the kernel is very limited in what it can do during this time) or it can be simply too slow to write the big dump (kernel I/O is restricted to polling operation during panic processing) and dedicating multi-gigabyte SSD to dump device would be waste of resources.

To overcome this limitation I've been working with my colleagues Sriman, Brian, Nick and Chris to store dump in memory, reboot the system and extract the dump to disk. This technology is called Deferred dump. It appears in Solaris starting with Oracle Solaris (Oracle Solaris 11.2 SRU 8.4.0, you can read more about this SRU in My Oracle Support document Doc ID 1672221.1 or generically about SRUs on Gerry's blog about Solaris 11 lifecycle management and SRUs).

In reality there are few hints which give away the fact that dump device was not used for storing the dump, here's a capture of a panic triggered in LDOM:

root@s11:~# reboot -d
Apr 24 06:40:02 s11 reboot: initiated by root on /dev/console

panic[cpu7]/thread=c4003b9d7320: forced crash dump initiated at user request

000002a102519930 genunix:kadmin+650 (fc, 0, c400308f5e38, 4, 5, 1)
  %l0-3: 0000000020895800 00000000102be000 0000000000000004 0000000000000004
  %l4-7: 0000000000000600 0000000000000010 0000000000000004 0000000000000004
000002a102519a00 genunix:uadmin+1d0 (1, c4003bd72460, 0, 6d7000, ff00, 5)
  %l0-3: 000000000000852e 000003000000c000 0000000000000004 0000000000000000
  %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

syncing file systems... done
Preserving kernel image in RAM, content: kernel sections: proc, zfs
 0:07  91% done (kernel)
 0:08  97% done (proc)
 0:08 100% done (zfs)
100% done: 331865 (kernel) + 21023 (proc) + 10523 (zfs) pages dumped, dump succeeded
NOTICE: Entering OpenBoot.
NOTICE: Fetching Guest MD from HV.
NOTICE: Starting additional cpus.
NOTICE: Initializing LDC services.
NOTICE: Probing PCI devices.
NOTICE: Finished PCI probing.

SPARC T5-8, No Keyboard
Copyright (c) 1998, 2015, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.37.0-nightly_01.13.2015, 23.2500 GB memory available, Serial #83489682.
Ethernet address 0:14:4f:f9:f3:92, Host ID: 84f9f392.

Boot device: /virtual-devices@100/channel-devices@200/disk@0:a  File and args: 
SunOS Release 5.11 Version 11.2 64-bit
Copyright (c) 1983, 2015, Oracle and/or its affiliates. All rights reserved.
NOTICE: Verified previous kernel image
Reconciling deferred dump: 99%..100% done.
Hostname: s11
Apr 24 06:40:58 s11 savecore: Until crashdump is saved to disk the multi-user-server milestone is delayed.
Apr 24 06:40:58 s11 savecore: Saving decompressed system crash dump files in directory /var/crash/data/61a392f8-3336-4d88-a0bb-d4ea84c11e60
Storing crash dump section kernel: 0:00   5% done (16956 pages)
Storing crash dump section kernel: 0:51 100% done (331865 pages)
Storing crash dump section proc: 0:08 100% done (21023 pages)
Storing crash dump section zfs: 0:00 100% done (10523 pages)

There are couple of things to notice. First the message about preserving system image in RAM. Then the reconcile phase where deferred dump pieces are reunited with VM and most importantly the message about delayed multi-user-server milestone which is necessary to prevent memory hungry apps from fighting with deferred dump while it is still residing in memory.

Now, you may ask: how can this work given that the data which is dumped reside in the same memory we are dumping into ? Well, this is a bit like feet switching when climbing walls - for a bit of time the hands take the weight of the body until the feet are switched on a single step. For deferred dump, the aid is preallocated memory (this is what you see in mdb(1) in the output of ::memstat d-command as Defdump prealloc), careful page allocation and compression. The technique to preserve the memory across reboot is different on SPARC and x86. On x86 fastreboot is the necessary pre-requisite while on SPARC this is a result of Hypervisor and OBP cooperation.

In overall, deferred dump should be faster than classic on-disk dump. This is because storing the dump to crash directory happens when the system can use system capabilities fully (namely DMA and full ZFS). Of course, the crash directory can reside on network volume too since the system is almost fully initialized at that point.
Sometimes, the system can perform 2 successive reboots - one to get from the panicked kernel to new kernel and another to clear the residue in case the new kernel deems that memory is fragmented enough to cause performance issues. This happens on x86 systems only.

There are a few limitations which can prevent the system from not performing deferred dump, namely:

  • fast reboot support on x86 - the system has to support switching kernels without having to go through BIOS
  • uptime on x86 - by default the system has to be up for at least 10 minutes
  • memory size - there is minimal system memory required, currently it is cca 9.5GB
  • firmware version on SPARC has to be sufficiently new to support deferred dump (9.4.x on T5/M6 and above, 8.7.x on T4)

Deferred dump is enabled by default so it will become de-facto standard way of dumping on most systems.

Thursday Jul 24, 2014

Crashdump restructuring in Solaris

In Solaris 11.2 the crashdump restructuring project changed the way how dump data are stored. Data which are not pure kernel pages now go into separate files. Together with my colleague Sriman we made it to happen.

The first noticeable change was a change in the layout of the crash directory. The files are stored under /var/crash/data/uuid/ directory. The long hexadecimal string (uuid) was added to better align with FMA - it's actually uuid (universally unique ID) of the crash event which can be found in fmadm faulty output. Actually, if you look at FMA panic events from earlier versions you can see that the resource string for the event was already designed this way, it's just materialized with this project.

For example, after 2 panic events the /var/crash directory will look like this:

0 -> data/404778fb-88da-4188-f222-8da174d44fa4
1 -> data/6e50417e-95fc-4ab4-e9a8-abbe80bc6b48

The 0, 1 symlinks maintain the sequential ordering of the old layout.

The example reflects a configuration when savecore is not automatically run after boot (i.e. dumpadm -n is in effect) and the administrator has extracted the first crash dump by hand (running savecore 0 in /var/crash/0/ directory). If you take a look at the console after the system rebooted after panic there are commands which you can copy-n-paste to the terminal to perform the extraction.

The other change in the above example is that there is new vmcore-zfs.N file. This is not the only new file which can appear. Depending on dumpadm(1M) configuration there can be files like:

  • vmcore.N - core kernel pages
  • vmcore-zfs.N - ZFS metadata (ZIO buffers)
  • vmcore-proc.N - process pages
  • vmcore-other.N - other pages (ZFS data, free pages)

By splitting the dump into multiple files it is possible to transfer just vmcore.N file for analysis to quickly assess what caused the panic and transfer the rest of the files later on.

If any of the "auxiliary" files is missing, mdb will report it:

root@va64-v20zl-prg06:/var/crash/0# mdb 0
mdb: failed to locate file ./vmcore-zfs.0. Contents of ZFS metadata (ZIO buffers) 
will not be available
Loading modules: [ unix genunix specfs dtrace mac cpu.generic 
cpu_ms.AuthenticAMD.15 uppc pcplusmp zvpsm scsi_vhci zfs mpt sd ip hook neti arp
 usba kssl sockfs lofs idm cpc nfs fcip fctl ufs logindmux ptm sppp ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from va64-v20zl-prg06
operating system: 5.12 dump.on12 (i86pc)
usr/src version: 23877:f2e76e2d0329:on12_51+48
usr/closed version: 1961:cad017e4c7e4:on12_51+4
image uuid: 404778fb-88da-4188-f222-8da174d44fa4
panic message: forced crash dump initiated at user request
complete: yes, all pages present as configured
dump content: kernel [LOADED,UNVERIFIED] (core kernel pages)
              zfs [MISSING] (ZFS metadata (ZIO buffers))
panicking PID: 101575 (not dumped)

Another choice is not to dump some of the sections at all. E.g. to dump only pages kernel and currently running process at the time of panic but not ZFS metadata the system can be configured as:

dumpadm -c curproc-zfs

Also, the unix.N file is no longer extracted automatically (it can be done with -u option for savecore(1M) if you need the file for some reason) since it is embedded in vmcore.N file; mdb will find it automatically.

How to load the files into mdb with all these files around ? The easiest way how to access an extracted dump is to use just the suffix, i.e.

  mdb N

which will pick up all the files based on metadata in the main vmcore.N file. This worked even before this change, except there was just one file (2 if counting unix.N).

It is still possible to specify the files by hand, just remember to put the main file (vmcore.N) as the first argument:

  mdb vmcore.N vmcore-zfs.N ...

The other change (which is hard to notice unless you're dealing with lots of crash dump files) was that we laid out the infrastructure in kernel/mdb/libkvm to be properly backwards compatible w.r.t. on-disk crash dump format. As a result mdb will automatically load crash dump files produced in earlier Solaris versions. Currently it supports 3 latest versions.

Monday Nov 28, 2011

Netcat I/O enhancements

When Netcat integrated into OpenSolaris it was already clear that there will be couple of enhancements needed. The biggest set of the changes made after Solaris 11 Express was released brings various I/O enhancements to netcat shipped with Solaris 11. Also, since Solaris 11, the netcat package is installed by default in all distribution forms (live CD, text install, ...).

Now, let's take a look at the new functionality:

/usr/bin/netcat alternative program name (symlink)
-b bufsize I/O buffer size
-E use exclusive bind for the listening socket
-e program program to execute
-F no network close upon EOF on stdin
-i timeout extension of timeout specification
-L timeout linger on close timeout
-l -p port addr previously not allowed usage
-m byte_count Quit after receiving byte_count bytes
-N file pattern for UDP scanning
-I bufsize size of input socket buffer
-O bufsize size of output socket buffer
-R redir_spec port redirection
addr/port[/{tcp,udp}] syntax of redir_spec
-Z bypass zone boundaries
-q timeout timeout after EOF on stdin

Obviously, the Swiss army knife of networking tools just got a bit thicker.

While by themselves the options are pretty self explanatory, their combination together with other options, context of use or boundary values of option arguments make it possible to construct small but powerful tools. For example:

  • the port redirector allows to convert TCP stream to UDP datagrams.
  • the buffer size specification makes it possible to send one byte TCP segments or to produce IP fragments easily.
  • the socket linger option can be used to produce TCP RST segments by setting the timeout to 0
  • execute option makes it possible to simulate TCP/UDP servers or clients with shell/python/Perl/whatever script
  • etc.

If you find some other helpful ways use please share via comments.

Manual page nc(1) contains more details, along with examples on how to use some of these new options.

Friday Mar 18, 2011

Blog moving to

This blog will soon move to infrastructure.

Wednesday Jun 02, 2010

netcat and network stack debugging

One of the options of the netcat program (/usr/bin/nc) available in OpenSolaris (if you don't have it installed simply run pkg install SUNWnetcat) is the -D (aka debugging) option. Only recently I realized that not everyone out there knows how to capture the debugging data once this option is set since this is pretty Solaris specific. What this option does is basically trigger a flag inside the ip kernel module specific for given connection structure (conn_t). netcat does this by calling setsockopt() system call with SO_DEBUG option set for given socket. As a result conn_debug flag is set inside conn_t structure associated with the socket. This flag is then consulted in various functions manipulating the structure.

When there is an interesting event and the conn_debug is set the function calls strlog() kernel function to record the data. For example, here's a snippet of usr/src/uts/common/inet/tcp/tcp_input.c:tcp_input_listener():

   1366 	if (listener->tcp_conn_req_cnt_q >= listener->tcp_conn_req_max) {
   1367 		mutex_exit(&listener->tcp_eager_lock);
   1368 		TCP_STAT(tcps, tcp_listendrop);
   1369 		TCPS_BUMP_MIB(tcps, tcpListenDrop);
   1370 		if (lconnp->conn_debug) {
   1371 			(void) strlog(TCP_MOD_ID, 0, 1, SL_TRACE|SL_ERROR,
   1372 			    "tcp_input_listener: listen backlog (max=%d) "
   1373 			    "overflow (%d pending) on %s",
   1374 			    listener->tcp_conn_req_max,
   1375 			    listener->tcp_conn_req_cnt_q,
   1376 			    tcp_display(listener, NULL, DISP_PORT_ONLY));
   1377 		}
   1378 		goto error2;
   1379 	}

To capture the data logged via strlog it's necessary to know the STREAMS module ID, which in our case is TCP_MOD_ID which is defined in usr/src/uts/common/inet/tcp_impl.h as 5105.

To read the data one can use either strace(1M) command line tool or strerr(1M) daemon which both produce text logs. To read everything, one can use this command (needs read access to /dev/log so has to run under root):

# strace 5105 all all

Here are two examples using netcat: at first we try to bind to a port which we don't have privileges for:

$ nc -D -l -4 -p 23
nc: Permission denied

which produces the following entry from strace:

000004 12:11:04 19c7ff6a  1 ..E 5105 0 ip: [ID 234745 kern.debug] tcp_bind: no priv for port 23

next we try to bind to already occupied port:

$ nc -D -l -4 -p 4444
nc: Address already in use

which produces the following entry from strace:

000005 12:15:33 19c86878  1 ..E 5105 0 ip: [ID 326978 kern.debug] tcp_bind: requested addr busy

This is of course traceable via normal tools such as ppriv(1) or truss(1) but the point is that much fine grained details can be captured from the network modules. The format of the log entries is explained in the strace(1M) man page.

Thursday Dec 10, 2009

ZFS likes to have ECC RAM

I have been using custom built {ZFS,OpenSolaris}-based NAS at home for more than a year. The machine was built partly from second hand components (e.g. motherboard), from in-house unused iron and from minority of brand new stuff (more on that in a separate entry). The machine has been running constantly and serving data occasionally with very light load. One day I needed to perform some administrative task and realized it's not possible to SSH into the machine. Console login revealed the uptime is just couple of days, both pools (root pool and data pool) contain staggering number of checksum errors. In the /var/crash/ directory there was couple of crash dumps. Some of them were corrupted and mdb(1) refused to load them in or reported garbage. The times of the crashes corresponded to the Sunday night scrubbing for each of the pool. At least two of the dumps contained interesting and fairly obvious stack trace. I no longer have the file so here's just the entry from the log:

Nov  1 02:27:20 chiba \^Mpanic[cpu0]/thread=ffffff0190914040: 
Nov  1 02:27:20 chiba genunix: [ID 683410 kern.notice] BAD TRAP: type=d (#gp General protection) rp=ffffff0006822380 addr=488b
Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice] 
Nov  1 02:27:20 chiba unix: [ID 839527 kern.notice] sh: 
Nov  1 02:27:20 chiba unix: [ID 753105 kern.notice] #gp General protection
Nov  1 02:27:20 chiba unix: [ID 358286 kern.notice] addr=0x488b
Nov  1 02:27:20 chiba unix: [ID 243837 kern.notice] pid=740, pc=0xfffffffffba0373a, sp=0xffffff0006822470, eflags=0x10206
Nov  1 02:27:20 chiba unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6f8
Nov  1 02:27:20 chiba unix: [ID 624947 kern.notice] cr2: fee86fa8
Nov  1 02:27:20 chiba unix: [ID 625075 kern.notice] cr3: b96a0000
Nov  1 02:27:20 chiba unix: [ID 625715 kern.notice] cr8: c
Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice] 
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]     rdi: ffffff018b1e1c98 rsi: ffffff01a032dfb8 rdx: ffffff0190914040
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]     rcx: ffffff018ef054b0  r8:                c  r9:                b
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]     rax: ffffff01a032dfb8 rbx:                0 rbp: ffffff00068224a0
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]     r10:                0 r11:                0 r12: ffbbff01a032d740
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]     r13: ffffff01a032dfb8 r14: ffffff018b1e1c98 r15:             488b
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]     fsb:                0 gsb: fffffffffbc30400  ds:               4b
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]      es:               4b  fs:                0  gs:              1c3
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]     trp:                d err:                0 rip: fffffffffba0373a
Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]      cs:               30 rfl:            10206 rsp: ffffff0006822470
Nov  1 02:27:20 chiba unix: [ID 266532 kern.notice]      ss:               38
Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice] 
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822260 unix:die+10f ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822370 unix:trap+43e ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822380 unix:_cmntrap+e6 ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068224a0 genunix:kmem_slab_alloc_impl+3a ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068224f0 genunix:kmem_slab_alloc+a1 ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822550 genunix:kmem_cache_alloc+130 ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068225c0 zfs:dbuf_create+4e ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068225e0 zfs:dbuf_create_bonus+2a ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822630 zfs:dmu_bonus_hold+7e ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068226c0 zfs:zfs_zget+5a ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822780 zfs:zfs_dirent_lock+3fc ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822820 zfs:zfs_dirlook+d9 ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068228a0 zfs:zfs_lookup+25f ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822940 genunix:fop_lookup+ed ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822b80 genunix:lookuppnvp+3a3 ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822c20 genunix:lookuppnatcred+11b ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822c90 genunix:lookuppn+5c ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822e90 genunix:exec_common+1ac ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822ec0 genunix:exece+1f ()
Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822f10 unix:brand_sys_syscall32+19d ()
Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice] 
Nov  1 02:27:20 chiba genunix: [ID 672855 kern.notice] syncing file systems...

Also, next to the messages on the console I found some entries in /var/adm/messages like this one:

Nov  2 12:15:01 chiba genunix: [ID 647144 kern.notice] ksh93: Cannot read /lib/amd64/

Later on, the condition of the machine worsened and it was not even possible to execute some commands due to I/O errors up to the point when the machine had to be halted.

The panic occurring in kmem routines, loads of checksum errors on both mirrored pools (same number of errors for each disk in the mirror) and the fact that the system was running with the same build for couple of months without a problem lead me to try memtest:

The errors started appearing on the screen in the first couple of seconds of the run. It turned out one of the 3 1GB DDR2 chips went bad. In case you're wondering, the DIMMS were bought as new 1 year ago, were branded (all of them from the same brand known for gaming/overclocking equipment, same type) and had aluminium heat sink on it, so no low quality stuff.

I was able to recover the data from past snapshots and replaced the RAM with ECC DIMMS (which required new motherboard+CPU combo). This is nice case of semi-silent data corruption detection. Without checksums the machine would be happily panicking and corrupting data without giving clear indication what is going on (e.g. which files were corrupted). So, even for home NAS solution ECC RAM is good (if not essential) to have.

FMA should do the right thing if one of the ECC modules goes bad which means it will not allow the bad pages to be used (the pages will be retired). The list of retired pages is persistent across reboots. More on FMA and ECC RAM can be found e.g. in this discussion on fm-discuss or in the FMA and DIMM serial numbers blog entry in Rob Johnston's blog or in the Eversholt rules for AMD in usr/src/cmd/fm/eversholt/files/i386/i86pc/amd64.esc.

Friday Oct 30, 2009

signal() versus sigaction() on Solaris

This entry is mostly for newcomers to Solaris/OpenSolaris from UNIX-like systems. When I had been taught about signal() and sigaction() my understanding was that sigaction() is just a superset of signal() and also POSIX conformant but otherwise they accomplish the same thing. This is indeed the case for some of UNIX-like operating systems. In Solaris, as I only recently discovered (to my dismay :)), it's different.

Consider the following code (please ignore the fact it's not strictly checking return values and that the signal handler is not safe):

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

void sig_handler(int s) {
	printf("Got signal! Sleeping.\\n");


        printf("returning from signal handler\\n");

int main(void) {
        struct sigaction s_action;

	printf("Setting signal handler: ");
	(void) sigemptyset(&s_action.sa_mask);
	s_action.sa_handler = sig_handler;
        s_action.sa_flags = 0;
	(void) sigaction(SIGHUP, &s_action, (struct sigaction \*) NULL);
	signal(SIGHUP, sig_handler);

	printf("Waiting for signal\\n");

	return (0);

Now try to compile and run with and without the -DPOSIX_SIGNALS and send 2 SIGHUP signals to the process within the 10 seconds window (so the second signal is received while the signal handler is still running). With sigaction(), the signal will be caught by the handler in both of the cases. With signal() however, the second signal will cause the process to exit. This is because kernel will reset the signal handler to default upon receiving the signal for the first time. This is described in the signal(3C) man page in a somewhat hidden sentence inside the second paragraph (it really pays out to read man pages slowly and with attention to detail):

     If signal()  is
     used,  disp  is  the address of a signal handler, and sig is
     not  SIGILL, SIGTRAP, or  SIGPWR, the system first sets  the
     signal's disposition to  SIG_DFL before executing the signal

The sigaction(2) man page has this section:

     SA_RESETHAND    If set and the signal is caught, the  dispo-
                     sition of the signal is reset to SIG_DFL and
                     the signal will not be blocked on  entry  to
                     the  signal  handler  (SIGILL,  SIGTRAP, and
                     SIGPWR cannot be  automatically  reset  when
                     delivered; the system silently enforces this

sigaction() does not set the flag by default which results in the different behavior. I found out that this behavior has been present since Solaris 2.0 or so.

In fact, signal() routine from libc is implemented via sigaction(). From $SRC/lib/libc/port/sys/signal.c:

     58 /\*
     59  \* SVr3.x signal compatibility routines. They are now
     60  \* implemented as library routines instead of system
     61  \* calls.
     62  \*/
     64 void(\*
     65 signal(int sig, void(\*func)(int)))(int)
     66 {
     67 	struct sigaction nact;
     68 	struct sigaction oact;
     70 	CHECK_SIG(sig, SIG_ERR);
     72 	nact.sa_handler = func;
     73 	nact.sa_flags = SA_RESETHAND|SA_NODEFER;
     74 	(void) sigemptyset(&nact.sa_mask);
     76 	/\*
     77 	 \* Pay special attention if sig is SIGCHLD and
     78 	 \* the disposition is SIG_IGN, per sysV signal man page.
     79 	 \*/
     80 	if (sig == SIGCHLD) {
     81 		nact.sa_flags |= SA_NOCLDSTOP;
     82 		if (func == SIG_IGN)
     83 			nact.sa_flags |= SA_NOCLDWAIT;
     84 	}
     86 	if (STOPDEFAULT(sig))
     87 		nact.sa_flags |= SA_RESTART;
     89 	if (sigaction(sig, &nact, &oact) < 0)
     90 		return (SIG_ERR);
     92 	return (oact.sa_handler);
     93 }

I am pretty sure that the SA_RESETHAND flag is set in signal() in order to preserve backwards compatibility.

This means that to solve this problem with signal(), one should set the signal handler again in the signal handler itself. However, this is not a complete solution since there is still a window where the signal can be delivered and the handler is set to SIG_DFL - the default handler which is exit in case of SIGHUP as the signal.h(3HEAD) man page explains in really useful table:

          Name        Value   Default                    Event
     SIGHUP           1       Exit       Hangup (see termio(7I))

Now let's look at FreeBSD. Its SIGNAL(3) man page contains this separate paragraph:

     The handled signal is unblocked when the function returns and the process
     continues from where it left off when the signal occurred.  Unlike previ-
     ous signal facilities, the handler func() remains installed after a sig-
     nal has been delivered.

The second sentence is actually printed in bold letters. I also tried on Linux and NetBSD and the behavior is the same as in FreeBSD.

So, to conclude all of the above: using signal() is really not portable.

Tuesday Sep 01, 2009

Netcat as small packet factory

Recently I needed to test a bug fix in in.iked(1M) (should say with which in.iked is linked) after which the daemon should respond to IKEv2 requests with Notification message telling the peer to fall back to IKEv1 (previously it did not respond to IKEv2 packets at all). This can be tested by:

  • installing a OS instance which supports IKEv2 and initiating from there
  • writing a simple program (C/Perl/etc.) which will construct the UDP payload

Surely, there should be easier way how to send a UDP paket with arbitrary (in my case ISAKMP) payload. It turns out this is very easy to do just from command line with nc(1) which is available in OpenSolaris (install it via 'pkg install SUNWnetcat'). Let's try to send some garbage first to see if it works:

perl -e 'print "\\x41\\x41";' | nc -u rpe-foo.czech 500

Yep, tshark(1) (in OpenSolaris shipped by default with Wireshark) reports an IKE packet, malformed one (which is not surprising):

Capturing on eri0
  0.000000 -> ISAKMP [Malformed Packet]

0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
0010  00 1e 26 98 40 00 ff 11 20 fb 0a 12 90 0c 0a 12   ..&.@... .......
0020  90 0b e2 66 01 f4 00 0a 34 57 41 41               ...f....4WAA

Our two A's are there just after the UDP header (Ethernet header 14 bytes, IP header 20 bytes, UDP 8 bytes, in sum 42 bytes and our 2 bytes are just after first 8 bytes on 3rd line).

With that we can go and construct IKEv1 packet first to see if the daemon will react upon it. We will need to construct the payload which is a IKEv1 header. IKEv1 is defined in RFC 2409 (The Internet Key Exchange (IKE)). IKEv1 uses ISAKMP header definition so we need to look into RFC 2408 (Internet Security Association and Key Management Protocol (ISAKMP)) for the actual header definition. It's there in section 3.1:

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    !                          Initiator                            !
    !                            Cookie                             !
    !                          Responder                            !
    !                            Cookie                             !
    !  Next Payload ! MjVer ! MnVer ! Exchange Type !     Flags     !
    !                          Message ID                           !
    !                            Length                             !

I'd like to construct a packet which resembles first packet sent by IKEv1 Initiator. So, our packet code (similar to shell code) will look like this (without thinking too much of what should the values look like):

  • Initiator's cookie, must not be zero
  • Responder's cookie, must be zero in the initial packet from Initiator
  • next payload, let's try 0 first
  • Major and Minor Version (4 bits each)
  • Exchange Type
                                Exchange Type      Value
                             NONE                    0
                             Base                    1
                             Identity Protection     2
                             Authentication Only     3
                             Aggressive              4
                             Informational           5
                             ISAKMP Future Use     6 - 31
                             DOI Specific Use     32 - 239
                             Private Use         240 - 255
    So let's try Base first:
  • Flags (Initiator)
  • Message ID
  • Length

We need to massage our packet code into command line. The code:


We want source port to be 500 as well because of section 2.5.1 in RFC 2408 so use the -p option (this requires the net_privaddr privilege so either become root or use pfexec(1)). Also, we do not need to wait for the response so use -w option:

perl -e 'print "\\x11\\x22\\x33\\x44\\x55\\x66\\x77\\x88\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x10\\x01\\x00\\x66\\x66\\x66\\x66\\x28";' \\
    | nc -w 1 -p 500 -u rpe-foo.czech 500

The packet was received but there was no reply and tshark still considers this as Malformed Packet. Let's check the header again - oh yeah, the Length field has 4 bytes, not just one. Let's try again:

perl -e 'print "\\x11\\x22\\x33\\x44\\x55\\x66\\x77\\x88\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x10\\x01\\x00\\x66\\x66\\x66\\x66\\x28\\x00\\x00\\x00";' \\
    | nc -w 1 -p 500 -u rpe-foo.czech 500

Okay, this is our Base exchange but still not response:

294.029154 -> ISAKMP Base

0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
0010  00 38 26 a7 40 00 ff 11 20 d2 0a 12 90 0c 0a 12   .8&.@... .......
0020  90 0b 01 f4 01 f4 00 24 34 71 11 22 33 44 55 66   .......$4q."3DUf
0030  77 88 00 00 00 00 00 00 00 00 00 10 01 00 66 66   w.............ff
0040  66 66 28 00 00 00                    

Let's try something more provocative and set the Exchange type to Identity protection:

perl -e 'print "\\x11\\x22\\x33\\x44\\x55\\x66\\x77\\x88\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x10\\x02\\x00\\x66\\x66\\x66\\x66\\x28\\x00\\x00\\x00";' \\
    | nc -w 1 -p 500 -u rpe-foo.czech 500

Oh yeah, this finally deserved a response:

383.050874 -> ISAKMP Identity Protection (Main Mode)

0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
0010  00 38 26 a8 40 00 ff 11 20 d1 0a 12 90 0c 0a 12   .8&.@... .......
0020  90 0b 01 f4 01 f4 00 24 34 71 11 22 33 44 55 66   .......$4q."3DUf
0030  77 88 00 00 00 00 00 00 00 00 00 10 02 00 66 66   w.............ff
0040  66 66 28 00 00 00                                 ff(...

383.051672 -> ISAKMP Informational

0000  00 03 ba 4e 3d 38 00 0a e4 2f 61 eb 08 00 45 00   ...N=8.../a...E.
0010  00 99 d3 8b 40 00 ff 11 73 8c 0a 12 90 0b 0a 12   ....@...s.......
0020  90 0c 01 f4 01 f4 00 85 ed 05 11 22 33 44 55 66   ..........."3DUf
0030  77 88 85 75 8e 0f fa a5 5d de 0b 10 05 00 69 a5   w..u....].....i.
0040  63 e4 00 00 00 7d 00 00 00 61 00 00 00 01 01 10   c....}...a......
0050  00 1e 11 22 33 44 55 66 77 88 85 75 8e 0f fa a5   ..."3DUfw..u....
0060  5d de 80 0c 00 01 00 06 00 39 55 44 50 20 50 61   ]........9UDP Pa
0070  63 6b 65 74 20 64 6f 65 73 20 6e 6f 74 20 63 6f   cket does not co
0080  6e 74 61 69 6e 20 65 6e 6f 75 67 68 20 64 61 74   ntain enough dat
0090  61 20 66 6f 72 20 49 53 41 4b 4d 50 20 70 61 63   a for ISAKMP pac
00a0  6b 65 74 80 08 00 00                              ket....

Now that we proved to ourselves that we can construct semi-valid packet it's time to try IKEv2. IKEv2 header is defined in RFC 4306, section 3.1:

                           1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      !                       IKE_SA Initiator's SPI                  !
      !                                                               !
      !                       IKE_SA Responder's SPI                  !
      !                                                               !
      !  Next Payload ! MjVer ! MnVer ! Exchange Type !     Flags     !
      !                          Message ID                           !
      !                            Length                             !

On a first sight, it looks the same (for backward compatibility). However, some of the values are different. For IKE header, the main differences are in the Exchange Type and Flags:

                       Exchange Type            Value

                       RESERVED                 0-33
                       IKE_SA_INIT              34
                       IKE_AUTH                 35
                       CREATE_CHILD_SA          36
                       INFORMATIONAL            37
                       RESERVED TO IANA         38-239
                       Reserved for private use 240-255

IKE_SA_INIT is our guy ('echo 0t34=x | mdb' produces 0x22).

The flags are now used to indicate the exchange. Set 3rd bit to say we are the Initiator. We will retain the source port even though IKEv2 supports ports other than 500 and 4500 because we're dealing with IKEv1 implementation. Now slightly change our packet code (don't forget to change the Version field to 2.0):

perl -e 'print "\\x11\\x22\\x33\\x44\\x55\\x66\\x77\\x88\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x20\\x22\\x08\\x66\\x66\\x66\\x66\\x28\\x00\\x00\\x00";' \\
    | nc -w 1 -p 500 -u rpe-foo.czech 500

And we got a nice response (since the responder runs recent version

1013.190867 -> ISAKMP IKE_SA_INIT

0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
0010  00 38 26 aa 40 00 ff 11 20 cf 0a 12 90 0c 0a 12   .8&.@... .......
0020  90 0b 01 f4 01 f4 00 24 34 71 11 22 33 44 55 66   .......$4q."3DUf
0030  77 88 00 00 00 00 00 00 00 00 00 20 22 08 66 66   w.......... ".ff
0040  66 66 28 00 00 00                                 ff(...

1013.192005 -> ISAKMP Informational

0000  00 03 ba 4e 3d 38 00 0a e4 2f 61 eb 08 00 45 00   ...N=8.../a...E.
0010  00 83 d3 8d 40 00 ff 11 73 a0 0a 12 90 0b 0a 12   ....@...s.......
0020  90 0c 01 f4 01 f4 00 6f 66 da 11 22 33 44 55 66   .......of.."3DUf
0030  77 88 5c 36 e3 75 a2 7b 8e fe 0b 10 05 00 87 03   w.\\6.u.{........
0040  0c f5 00 00 00 67 00 00 00 4b 00 00 00 01 01 10   .....g...K......
0050  00 05 11 22 33 44 55 66 77 88 5c 36 e3 75 a2 7b   ..."3DUfw.\\6.u.{
0060  8e fe 80 0c 00 01 00 06 00 23 49 6e 76 61 6c 69   .........#Invali
0070  64 20 49 53 41 4b 4d 50 20 6d 61 6a 6f 72 20 76   d ISAKMP major v
0080  65 72 73 69 6f 6e 20 6e 75 6d 62 65 72 80 08 00   ersion number...
0090  00   

The only thing which is not nice is our terminal since nc(1) dumped the binary packet to it. Let's try again with some post-processing:

# perl -e 'print "\\x11\\x22\\x33\\x44\\x55\\x66\\x77\\x88\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x20\\x22\\x08\\x66\\x66\\x66\\x66\\x28\\x00\\x00\\x00";'     
| nc -w 1 -p 500 -u rpe-foo.czech 500 | od -c
0000000 021   "   3   D   U   f   w 210 237   y 254 264 351 333 007 344
0000020 013 020 005  \\0   [ 251 244   j  \\0  \\0  \\0   g  \\0  \\0  \\0   K
0000040  \\0  \\0  \\0 001 001 020  \\0 005 021   "   3   D   U   f   w 210
0000060 237   y 254 264 351 333 007 344 200  \\f  \\0 001  \\0 006  \\0   #
0000100   I   n   v   a   l   i   d       I   S   A   K   M   P       m
0000120   a   j   o   r       v   e   r   s   i   o   n       n   u   m
0000140   b   e   r 200  \\b  \\0  \\0

The fix is obviously in place.

Monday Aug 17, 2009

OpenSSL PKCS#11 engine presentation

Some light intro first: OpenSSL has a concept of plugins/add-ons called 'engines' which can supply alternative implementation of crypto operations (digests, symmetric and asymmetric ciphers and random data generation). The main reason for the existence of the engines is the ability to offload crypto ops to hardware. (Open)Solaris ships with an engine called PKCS#11 engine which provides access to Solaris Cryptographic Framework which in turn can provide access to HW crypto.

I spent some time fixing bugs in OpenSSL PKCS#11 engine in Solaris so I got quite intimate with its internals. Recently while discussing an upcoming feature with Jan he asked me why one particular detail in the engine is done one way and not the other (it's the fork() detection not done via atfork handlers; for the curious). It took me some thinking to find the answer (I focused on the other changes at that time) which made us realize that it would be good to summarize the design choices behind the engine and also to document the internals so that others can quickly see what's going on inside and also be able to do changes in the engine without reverse engineer the thoughts behind it. The outcome is a set of slides which I hope succinctly describe both the overall picture and the gritty details.

The presentation can be downloaded here.

Wednesday Jun 03, 2009

KSSL project is now open

I have spent some time fixing bugs in KSSL (kernel SSL proxy) implementation in Solaris and got familar with it (and the KSSL development team) so with delight I can co-announce that the KSSL project has been opened on

To me, KSSL is one of the unique projects in the (Open)Solaris security land in a sense it is tightly integrated into the system and is a consumer of several major subsystems (networking, crypto framework) which makes it interesting for study and also for extending it in creative ways.

We will start adding more content to the pages, including design documentation and description of KSSL internals. Also, this marks major milestone in a way how KSSL team does its job. From now on all non-confidential discussions, reviews etc. will happen in the open. Feel free to join the project and participate if you're interested ! (become an observer and join the mailing list)

Monday May 04, 2009

External contributions to testing community

I have just integrated couple of changes which I believe are the first contributed externally to the Testing community as an open-source contribution. The changes add couple of new tests to the nc test suite to cover the enhancement described in PSARC/2008/680 (which is present in Nevada since build 106). This is the stuff which allows you to run nc(1) in client mode with complex portlist specifications. Previously it was possible only to use simple port ranges like 22-80, with this change one can connect to e.g. 22,24,50-80,66,1024-2048. Little example how it might be useful:

$ nc -v -z grok.czech 22,25,80-88,8080
Connection to grok.czech 22 port [tcp/ssh] succeeded!
nc: connect to port 25 [host grok.czech] (tcp) failed: Connection refused
Connection to grok.czech 80 port [tcp/\*] succeeded!
nc: connect to port 81 [host grok.czech] (tcp) failed: Connection refused
nc: connect to port 82 [host grok.czech] (tcp) failed: Connection refused
nc: connect to port 83 [host grok.czech] (tcp) failed: Connection refused
nc: connect to port 84 [host grok.czech] (tcp) failed: Connection refused
nc: connect to port 85 [host grok.czech] (tcp) failed: Connection refused
nc: connect to port 86 [host grok.czech] (tcp) failed: Connection refused
nc: connect to port 87 [host grok.czech] (tcp) failed: Connection refused
nc: connect to port 88 [host grok.czech] (tcp) failed: Connection refused
Connection to grok.czech 8080 port [tcp/\*] succeeded!

Back to the testing part. The putback (yes, stcnv-gate is still using Teamware) log for this change looks like this (I have modified Erik's e-mail a bit):

6786859 portranges_complex_spec is missing the listener
6754842 extended port list specification needs to be tested
Code contributed by Erik Trauschke <erik.trauschke AT>

I think this is really nice example of the ideal state - the contributor not only did the feature part but also the testing part. It shows a great degree of responsibility - not just throwing some code "over the fence" but fully participating in the process to ensure the quality even in the long term.

The tests are both positive and negative. Each purpose in portranges directory is numbered and the following numbers match the test purpose numbers:

  • 5-12 ensure nc treats ports 0 and 65536 as invalid
    Previously, it was possible to listen on ports 0 and 65536, with Erik's changes this is no longer true so we need to add regression tests for both cases (client/server) and both ports.
  • 13-19 see if various malformed port list specifications are considered as invalid
    Each purpose needs not only positive tests which make sure the functionality actually works but also negative tests which ensure it does not misbehave. In this case, invalid port list specifications are thrown at nc(1) to see it reacts accordingly (with error, that is).
  • 20-25 test the functionality of complex port lists
    This is the bunch of test which see if the functionality actually works.
  • 26 tests reallocation
    Since the internal representation of the port list is now dynamically allocated and there is a default pre-allocated value which is reallocated if needed we need to test the case of reallocation.

To be able to do such integration there is now a Test development process. It's similar to the process used in ON community but it's more lightweight. The main difference is that the request-sponsor part is done informally via the testing-discuss mailing list and there is no list of bugs to pick up from. But don't be shy, whether you're adding new functionality or completely new program, the Testing community is here to help you.

Tuesday Apr 07, 2009

Collateral salutariness

Each build of (Open)Solaris is tested with a variety of test suites on variety of platforms and I wanted nc test suite to participate in these runs.
Eoin Hughes from PIT team (which runs those tests) was kind enough to workaround couple of bugs (which are fixed now) in the test suite so it can be run in PIT environment. Later on, I got a report from Eoin that as a result of nc test suite run CR 6793191 (watchmalloc triggers system panic on sockfs copyin) was caught. This bug is manifested by a panic:

Panic message (this particular panic is on a DomU, although this happens across the board):
panic[cpu0]/thread=ffffff0150ce1540: copyin_noerr: argument not in kernel address space

ffffff000416dcf0 unix:bcopy_ck_size+102 ()
ffffff000416ddb0 genunix:watch_xcopyin+151 ()
ffffff000416dde0 genunix:watch_copyin+1d ()
ffffff000416de50 sockfs:copyin_name+91 ()
ffffff000416deb0 sockfs:bind+90 ()
ffffff000416df00 unix:brand_sys_syscall32+328 ()

The bug is actually a regression caused by CR 6292199 (bcopy and kcopy should'nt use rep, smov) and was fixed by an engineer from Intel in OpenSolaris/Nevada code base.

This is instance of an event which I like so much - unintended positive consequence elsewhere. In contrast with so called collateral damage this is something which is beneficial in other areas. I've written nc test suite to test primarily nc(1) command but here it proved to be useful for testing other areas of the system as well. In this case it was thanks to the fact that the test suite is run with memory leak checking by default (see NC_PRELOADS variable in src/suites/net/nc/include/vars file).

And yes, CR 6793191 is fixed by now.

Tuesday Mar 31, 2009

Command line history and editing for IPsec utilities

Since the days when John Beck added command line editing to zonecfg Mark Phalan did similar thing to Kerberos utilities and Huie-Ying Lee to sftp. IPsec utilities (ipseckey(1M) and ikeadm(1M)) offered the ability to enter commands in interactive mode for a long time but only since Nevada build 112, the commands support command line editing and history too. Again, thanks to libtecla (shipped with Solaris/OpenSolaris).

Lessons learned:

  • adding full-blown command line editing support is hard.
    Adding the initial support is quite easy. However, more advanced features could require substantial work. This is especially true for tab completion. For sftp Huie-Ying decided to add tab completion in the future phase because of the ambiguities when completing names of files (when to complete local files versus remote files).
    I did the same with tab completion for IPsec utilities - the integration only delivers basic command line editing support, without tab completion. The problem with ipseckey(1M) and ikeadm(1M) is that their grammar is quite bifurcated and has contexts. For example, you cannot use encr_alg with AH SAs in ipseckey. Or, it would be erroneous to tab complete a valid command in the middle of entering a key if the key hex sequence contained sub-string of a valid command. The hardest part is I think offering the right tables of valid commands in given context. E.g. in our case a command line in our case must start with top-level command. Each top-level command offers several valid sub-commands and we do not offer invalid sub-commands for given top-level command so there is a necessity to track the state of the finite state machine describing the grammar contexts.
    Also, after the user entered src we do not want to allow him to enter it again on the same command line. Also, if the user already entered say add esp spi we are expecting SPI number, not a command name.
    Ideally, to solve this problem in nice way there should be a meta library (or additional API in libtecla) which would offer the ability to link command tables and set the contexts.
  • interruptible cycles in command line mode
    ipseckey's monitor command reads from a PF_KEY socket in a loop. The loop is normally interruptible by SIGINT. To do so in libtecla environment (we do not want to exit the command line upon SIGINT and yet still need to interrupt the cycle), something like this is needed:
         static void
         monitor_catch(int signal)
         if (!interactive)
                 errx(signal, gettext("Bailing on signal %d."), signal);
         /\* Catch \^C. \*/
         newsig.sa_handler = monitor_catch;
         newsig.sa_flags = 0;
         (void) sigemptyset(&newsig.sa_mask);
         (void) sigaddset(&newsig.sa_mask, SIGINT);
         (void) sigaction(SIGINT, &newsig, &oldsig);
         for (; ; ) {
                 rc = read(keysock, samsg, sizeof (get_buffer));
                 /\* handle the data \*/
         /\* restore old behavior \*/
         if (interactive)
                 (void) sigaction(SIGINT, &oldsig, NULL);
  • interaction with SMF
    While it's fine to bail out in interactive mode with error, due to the nature of IPsec commands (they can read the config files using the same routines as for interactive mode and they are used as SMF services to bring up IPsec policy and keys after boot) we need to distinguish the interactive and non-interactive mode.
  • maximum command line history value
    It seems that the second parameter to new_GetLine() - histlen is commonly misunderstood. This variable does not express the number of maximum lines in the history but instead maximum size of the history buffer in bytes. If the buffer becomes full, libtecla does not trim the last line but shifts instead.
    Given the first parameter to new_GetLine() expresses maximum command line size (in bytes) one needs to do some calculations and estimates on what will be needed too avoid too big buffer - ipseckey is used to enter key material so the line could become quite long. Say we wanted to keep 1024 lines. If the maximum length of the line is 1024 this will give us 1 megabyte buffer which seems too much for a simple application. Thus I did some guessing and set the buffer size accordingly:
    For "common" ipseckey configuration commands (think moderately bifurcated 'add') it's cca 300 characters. Mostly however, the users enter query commands like 'flush esp', 'dump ah' and the like so this is somewhere around say 30 characters. Say 30% of the commands are configuration and the rest is queries. To hold 100 such commands only cca 10K memory is required. In the end I chose 64K to be able to hold 15 of the biggies (4K) commands.

Thursday Nov 20, 2008

Testing netcat

After multiple rounds of code review the netcat (or nc) test suite is now finally in the onnv-stc2gate. The test suite has its home in the OpenSolaris Networking community (see the networking tests page for the list of networking test suites).
The source code is present in the src/suites/net/nc/ directory and SUNWstc-netcat packages can be downloaded from OpenSolaris Download center.

Before I go further, this is how it looks like when the test suite is run (the output is trimmed a bit):

vk:honeymooners:/opt/SUNWstc-nc$ run_test nc
Validating Arguments...
New TET_ROOT for this run : /var/tmp/honeymooners_27828
The results will be available in /var/tmp/results.27828
tcc: journal file is /var/tmp/results.27828/testlog
12:45:57  Execute /tests/dflag/tc_dflag
12:46:04  Execute /tests/hflag/tc_hflag
12:46:05  Execute /tests/kflag/tc_kflag
12:46:11  Execute /tests/nflag/tc_nflag
12:46:15  Execute /tests/portranges/tc_portranges
12:46:23  Execute /tests/pflag/tc_pflag
12:46:26  Execute /tests/sflag/tc_sflag
12:46:35  Execute /tests/Uflag/tc_Uflag
12:46:36  Execute /tests/vflag/tc_vflag
12:46:43  Execute /tests/zflag/tc_zflag
12:46:46  Execute /tests/iflag/tc_iflag
12:46:59  Execute /tests/lflag/tc_lflag
12:47:29  Execute /tests/rflag/tc_rflag
12:48:16  Execute /tests/Tflag/tc_Tflag
12:48:33  Execute /tests/uflag/tc_uflag
12:48:50  Execute /tests/wflag/tc_wflag
TC /tests/dflag/tc_dflag

TP 1 tc_dflag PASS
TC /tests/hflag/tc_hflag

TP 1 tc_hflag PASS


Number of Tests : 50

PASS            : 50
FAIL            : 0
OTHER           : 0

Test Logs are at /var/tmp/results.27828, Journal File = /var/tmp/results.27828/testlog 


It's been almost a year since I started developing the test suite last Christmas (see the initial blog entry about nc-tet). Since then, I have lost part of the source code in hard drive crash, had to redo the source tree structure, fix ksh style, fix numerous bugs in test suite code and make the test suite more robust. One might ask whether having test suite for such a simple program like nc(1) was worth the hassle. I have only one answer to that: absolutely. First, it gives a confidence of not breaking (most of; see below) existing things when changing/adding functionality and second it helped me (and I hope the others participating/observing the code review on testing-discuss too) to explore what it takes to write a test suite from scratch (I will not go here into details whether I prefer CTI-TET over STF and vice versa).

The Beautiful code book (which I really recommend for anyone tinkering with any source code) contains a chapter called Beautiful tests by Alberto Savoia. I hope that at least some of the test purposes in nc test suite have some degree of beautifulness of at least one of the ways highlighted by Alberto (1. simplicity/efficiency, 2. help making the software being tested better in terms of quality and testability, 3. breadth/thoroughness).

One of the important questions for a test suite is code coverage level. Obviously, for software adhering to the OpenSolaris interface taxonomy model it is important that the test suite exercises all of the Committed interfaces and execution paths around those interfaces. For nc(1) this means a subset of the command line options and their arguments (see PSARC 2007/389 for the actual list). The key is certainly to test the features which are likely to break with an intrusive code change.

Very crude view of test coverage for nc(1) test suite (counting test purposes gives only very remote idea about real coverage but at least provides visual image) looks like this:

       rflag: +
       Tflag: +++++---
       pflag: +
       iflag: +-
       vflag: ++
       kflag: +
       Uflag: +-
       dflag: +
       uflag: ++-
       sflag: +-
       hflag: +
       nflag: +-
       wflag: +
  portranges: +---
       lflag: ++++++++----------

One plus character stands for one positive test purpose, minus is negative test purpose.

Side note: the above ASCII graph was produced using script (which presumes certain naming scheme for test purpose files). Just pipe a file listing into the script with test purpose filenames compliant to the scheme used in ontest-stc2 gate and it will spew out graph similar to the above.

In the above musing about code coverage I left out an important piece - why some of the features are not tested. For nc(1) the yet untested part is the SOCKS protocol support. Basically, this is because test suite environment does not contain SOCKS server to test against. There might not be many people using the -x/-X options but from my own experience nothing is more frustrating than discovering some old dusty corner which had to be fixed long time ago or removed completely. So for now, on my workstation which sits behind SOCKS proxy I have the following in ~/.ssh/config for a server outside corporate network which hosts my personal mailbox so it is accessed every day:

Host bar
  User foo
  # NOTE: for nc(1) testing
  ProxyCommand /usr/bin/nc -x %p
  ForwardAgent no
  ForwardX11 no

This ensures (along with upgrades of the workstation to recent Nevada builds periodically) that SOCKS support gets tested as well. And yes, ssh-socks5-proxy-connect(1) and ssh-http-proxy-connect(1) are not really needed.

Now with the test suite in place, anybody modifying nc(1) (there are some RFEs for nc in the oss-bit-size list and other bugfixes or features are also welcome) can have pretty high confidence that his change will not break things. Yes, this means that more nc(1) features are coming.

Friday Nov 07, 2008

Automatic webrev upload

I will start this one a little bit generically..

Part of standard OpenSolaris/Solaris development process is code review. To facilitate a review, a so-called webrev is needed. A webrev is set of HTML/text/PDF pages and documents which display all the changes between local repository containing the changes and its parent repository. To produce a webrev, simply switch to a repository and run the webrev script (it is part of SUNWonbld package, which can be downloaded from OpenSolaris download center.):

$ cd /local/Source/bugfix-314159.onnv
$ webrev

Assuming /opt/onbld/bin is present in your PATH a webrev will be generated under /local/Source/bugfix-314159.onnv/webrev/ directory.

For OpenSolaris changes, the webrev is usually uploaded to (every OpenSolaris member has an account automatically created for him) which serves it under (where OSol_username is your OpenSolaris username) and a request for review with a link to the webrev is sent to one of the mailing lists relevant to the change.
Dan Price has written a script which produces RSS feed out of recently uploaded webrevs which is pretty handy substitute for feeds from news/headlines/magazines :)

For a long time I was basically doing the following:

$ cd /local/Source/bugfix-foo.onnv && webrev
$ scp -r webrev

This had two flaws: first it was slow (because of rcp protocol over SSH channel) and second I had to delete it via separate command (use sftp(1) and rename the old webrev to .trash directory) before uploading new version of the webrev (otherwise couple of permissions errors would follow).

To solve the first problem, rsync (with SSH transport) can be used which makes the upload nearly instantaneous. Second problem can be worked around by using incremental webrevs. Still, this does not seem good enough for code reviews with many iterations.

So, the change made in CR 6752000 introduces the following command line options for automatic webrev upload:

  • -U uploads the webrev
  • -n suppresses webrev generation
  • -t allows to specify custom upload target

webrev.1 man page has been updated to explain the usage. For common OpenSolaris code reviews the usage will probably mostly look like this:

$ webrev -O -U

This will upload the webrev to under directory named according to local repository name. Further invocations will replace the remote webrev with fresh version.
But it is possible to get more advanced. After the initial webrev is posted, an incremental webrev can be both generated and posted. Assuming you're switched to the repository (via bldenv) and we're dealing with 4th round of code review the following command will perform the task:

webrev_name=`basename $CODEMGR_WS`
webrev -O -U -o $CODEMGR_WS/${webrev_name}.rd4 \\
    -p $CODEMGR_WS/${webrev_name}.rd3

The above commands hide maybe not-so-obvious behavior so I'll try to explain it in the table:

| command                   | local webrev directory | remote webrev directory                             |
| webrev -O -U              | $CODEMGR_WS/webrev/    |`basename $CODEMGR_WS` |
| webrev -O -o \\            | $CODEMGR_WS/my_webrev/ |              |
|   $CODEMGR_WS/my_webrev   |                        |                                                     |
| webrev -O \\               | $CODEMGR_WS/fix.rd2/   |                |
|  -p $CODEMGR_WS/fix.rd1 \\ |                        |                                                     |
|  -o $CODEMGR_WS/fix.rd2   |                        |                                                     |

Basically, without the -o flag webrev will generate the webrev to local directory named 'webrev' but it will upload it to the directory named after basename of local repository. With the -o flag webrev will use the name of root directory of the repository it is called from for both local and remote storage. This is done to keep the default behavior of generating local webrev to directory named 'webrev'. At the same time, uploading different webrevs to the same remote directory named 'webrev' does not make sense.

NOTE: This behavior is also valid in the case when not enclosed in a repository via ws or bldenv, I have just used $CODEMGR_WS to express root directory of a workspace/repository.

Also, now it is possible to call webrev from within Cadmium Mercurial plugin, so all webrev commands can be prefixed with hg.

All in all, it was fun dealing with webrevs of webrev. I am looking forward to more entries in the RSS feed :)

NOTE: It will take some time before the changes appear in SUNWonbld packages offered by the download center so it's better to update the sources from the ssh:// repository and build and upgrade the SUNWonbld package from there.


blog about security and various tools in Solaris


« October 2016