
Solaris serviceability and nifty tools

Recent Posts

Solaris

Crashdump restructuring in Solaris

In Solaris 11.2 the crashdump restructuring project changed the way dump data are stored: data which are not pure kernel pages now go into separate files. Together with my colleague Sriman we made it happen.

The first noticeable change is a new layout of the crash directory. The files are now stored under the /var/crash/data/uuid/ directory. The long hexadecimal string (uuid) was added to better align with FMA - it is actually the UUID (universally unique ID) of the crash event, which can be found in fmadm faulty output. In fact, if you look at FMA panic events from earlier versions you can see that the resource string for the event was already designed this way; it is just materialized with this project. For example, after 2 panic events the /var/crash directory will look like this:

    0 -> data/404778fb-88da-4188-f222-8da174d44fa4
    1 -> data/6e50417e-95fc-4ab4-e9a8-abbe80bc6b48
    bounds
    data/
        404778fb-88da-4188-f222-8da174d44fa4/
            vmcore-zfs.0
            vmcore.0
            vmdump-zfs.0
            vmdump.0
        6e50417e-95fc-4ab4-e9a8-abbe80bc6b48/
            vmdump-zfs.1
            vmdump.1

The 0, 1 symlinks maintain the sequential ordering of the old layout. The example reflects a configuration where savecore is not automatically run after boot (i.e. dumpadm -n is in effect) and the administrator has extracted the first crash dump by hand (running savecore 0 in the /var/crash/0/ directory). If you take a look at the console after the system reboots from a panic, there are commands you can copy-and-paste to the terminal to perform the extraction.

The other change visible in the above example is the new vmcore-zfs.N file. This is not the only new file which can appear. Depending on the dumpadm(1M) configuration there can be files like:

- vmcore.N - core kernel pages
- vmcore-zfs.N - ZFS metadata (ZIO buffers)
- vmcore-proc.N - process pages
- vmcore-other.N - other pages (ZFS data, free pages)

By splitting the dump into multiple files it is possible to transfer just the vmcore.N file for analysis, to quickly assess what caused the panic, and transfer the rest of the files later on. If any of the "auxiliary" files is missing, mdb will report it:

    root@va64-v20zl-prg06:/var/crash/0# mdb 0
    mdb: failed to locate file ./vmcore-zfs.0. Contents of ZFS metadata (ZIO buffers) will not be available
    Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp zvpsm scsi_vhci zfs mpt sd ip hook neti arp usba kssl sockfs lofs idm cpc nfs fcip fctl ufs logindmux ptm sppp ipc ]
    > ::status
    debugging crash dump vmcore.0 (64-bit) from va64-v20zl-prg06
    operating system: 5.12 dump.on12 (i86pc)
    usr/src version: 23877:f2e76e2d0329:on12_51+48
    usr/closed version: 1961:cad017e4c7e4:on12_51+4
    image uuid: 404778fb-88da-4188-f222-8da174d44fa4
    panic message: forced crash dump initiated at user request
    complete: yes, all pages present as configured
    dump content: kernel [LOADED,UNVERIFIED] (core kernel pages)
                  zfs [MISSING] (ZFS metadata (ZIO buffers))
    panicking PID: 101575 (not dumped)

Another choice is not to dump some of the sections at all. E.g. to dump only kernel pages and the pages of the process running at the time of the panic, but not ZFS metadata, the system can be configured as:

    dumpadm -c curproc-zfs

Also, the unix.N file is no longer extracted automatically (it can be done with the -u option of savecore(1M) if you need the file for some reason) since it is embedded in the vmcore.N file; mdb will find it automatically.

How do you load the files into mdb with all these files around? The easiest way to access an extracted dump is to use just the suffix, i.e.:

    mdb N

which will pick up all the files based on metadata in the main vmcore.N file. This worked even before this change, except there was just one file (2 if counting unix.N). It is still possible to specify the files by hand, just remember to put the main file (vmcore.N) as the first argument:

    mdb vmcore.N vmcore-zfs.N ...

The other change (which is hard to notice unless you're dealing with lots of crash dump files) is that we laid out the infrastructure in kernel/mdb/libkvm to be properly backwards compatible with respect to the on-disk crash dump format. As a result mdb will automatically load crash dump files produced in earlier Solaris versions. Currently it supports the 3 latest versions.
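Putting the pieces together, here is a minimal workflow sketch using the commands described above (the bounds number 0 and the dump content value are illustrative; see dumpadm(1M) and savecore(1M) for the exact operand syntax):

    # configure dump content: kernel pages plus the panicking process, no ZFS metadata
    dumpadm -c curproc-zfs
    # after a panic with dumpadm -n in effect, extract the dump by hand
    cd /var/crash/0 && savecore 0
    # load the dump by suffix; auxiliary files are found via vmcore.0 metadata
    mdb 0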


Personal

OpenGrok 0.11 setup on Solaris 11

OpenGrok 0.11 has just been released (see Lubos' post with release notes). This is a nice version number coincidence to try it on Solaris 11. In case you are wondering what OpenGrok is, it is a blindingly fast source code search and cross reference engine accessible over the web. It is written in Java. It is also behind the source code browser on src.opensolaris.org, albeit running an older version. For more information about the project take a look at its project page.

Now, how to get OpenGrok running for your source code base on Solaris 11. I will illustrate this on source code coming from three different Source Code Management systems (for complete support see the full list). The complete setup on a freshly installed Solaris 11 has 6 main steps:

1. Install the prerequisites first:

   - install a couple of Source Code Management systems (depending on your needs):
     - Mercurial: pkg install developer/versioning/mercurial
     - CVS: pkg install developer/versioning/cvs
     - git: pkg install developer/versioning/git
   - download, compile and install exuberant ctags:

         pkg install developer/gcc-45
         pkg install system/header
         wget http://prdownloads.sourceforge.net/ctags/ctags-5.8.tar.gz
         tar xfz ctags-5.8.tar.gz
         cd ctags-5.8
         ./configure && make && make install

   - install Tomcat 6: pkg install web/java-servlet/tomcat

2. Download and install the OpenGrok package:

       location=http://hub.opensolaris.org/bin/download/Project+opengrok/files/
       pkgadd -d $location/OSOLopengrok-0.11.pkg OSOLopengrok

3. Mirror some source code as the webservd user (note that OpenGrok by itself does not synchronize or mirror the source code, this has to be done separately):

       cd /var/opengrok/src/
       cvs -d anonymous@cvs.openssl.org:/openssl-cvs co -rOpenSSL_1_0_0-stable openssl
       hg clone ssh://anon@hg.opensolaris.org/hg/opengrok/trunk opengrok-dev
       git clone http://git.samba.org/samba.git

   Run the following first (as root) to make sure history indexing does not prompt to confirm the identity when consulting remote repositories (CVS):

       # store the pubkeys
       ssh-keyscan -t rsa,dsa cvs.openssl.org >> /etc/ssh/known_hosts
       ssh-keyscan -t rsa,dsa hg.opensolaris.org >> /etc/ssh/known_hosts

4. Deploy and start the web application (as root):

       EXUBERANT_CTAGS=/usr/local/bin/ctags \
           /usr/opengrok/bin/OpenGrok deploy && \
           svcadm enable tomcat6

5. Index the source code and send the configuration to the running instance (as the webservd user):

       EXUBERANT_CTAGS=/usr/local/bin/ctags \
           /usr/opengrok/bin/OpenGrok index

6. Enable the service (as root):

       svcadm enable opengrok

OpenGrok is now accessible at http://SERVER_HOSTNAME:8080/source/ (where SERVER_HOSTNAME is the hostname of the server on which the above setup was done). Except for the part with ctags it is a pretty streamlined, no-brainer process. Hopefully the exuberant-ctags package will be available again from the standard Oracle pkg repositories. And here is the result:
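To verify the setup, a quick sanity check might look like this (a sketch: it assumes curl is installed and that the services carry the names used in the steps above):

    # both services should be online
    svcs tomcat6 opengrok
    # the web application should respond
    curl -I http://localhost:8080/source/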


Solaris

Netcat I/O enhancements

When Netcat was integrated into OpenSolaris it was already clear that a couple of enhancements would be needed. The biggest set of changes made after Solaris 11 Express was released brings various I/O enhancements to the netcat shipped with Solaris 11. Also, since Solaris 11, the netcat package is installed by default in all distribution forms (live CD, text install, ...).

Now, let's take a look at the new functionality:

    /usr/bin/netcat    alternative program name (symlink)
    -b bufsize         I/O buffer size
    -E                 use exclusive bind for the listening socket
    -e program         program to execute
    -F                 no network close upon EOF on stdin
    -i timeout         extension of timeout specification
    -L timeout         linger on close timeout
    -l -p port addr    previously not allowed usage
    -m byte_count      quit after receiving byte_count bytes
    -N file            pattern for UDP scanning
    -I bufsize         size of input socket buffer
    -O bufsize         size of output socket buffer
    -R redir_spec      port redirection; redir_spec syntax is addr/port[/{tcp,udp}]
    -Z                 bypass zone boundaries
    -q timeout         timeout after EOF on stdin

Obviously, the Swiss army knife of networking tools just got a bit thicker. While by themselves the options are pretty self-explanatory, their combination with other options, the context of use, or boundary values of option arguments make it possible to construct small but powerful tools. For example:

- the port redirector allows converting a TCP stream to UDP datagrams
- the buffer size specification makes it possible to send one-byte TCP segments or to produce IP fragments easily
- the socket linger option can be used to produce TCP RST segments by setting the timeout to 0
- the execute option makes it possible to simulate TCP/UDP servers or clients with a shell/Python/Perl/whatever script
- etc.

If you find some other helpful uses please share via comments. The manual page nc(1) contains more details, along with examples on how to use some of these new options.
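To illustrate a couple of the combinations above, some hedged command sketches (hosts, ports and paths are made up; consult nc(1) for the exact argument forms, especially for redir_spec):

    # produce a TCP RST on close by setting the linger timeout to 0
    nc -L 0 server.example.com 80
    # force one-byte TCP segments via a tiny I/O buffer
    nc -b 1 server.example.com 9000 < /etc/motd
    # fake a trivial TCP server with a script handling the connection
    nc -l -p 7777 -e /path/to/handler.sh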


Solaris

netcat and network stack debugging

One of the options of the netcat program (/usr/bin/nc) available in OpenSolaris (if you don't have it installed simply run pkg install SUNWnetcat) is the -D (aka debugging) option. Only recently I realized that not everyone out there knows how to capture the debugging data once this option is set, since this is pretty Solaris specific.

What this option does is basically trigger a flag inside the ip kernel module specific to the given connection structure (conn_t). netcat does this by calling the setsockopt() system call with the SO_DEBUG option set for the given socket. As a result the conn_debug flag is set inside the conn_t structure associated with the socket. This flag is then consulted in various functions manipulating the structure. When there is an interesting event and conn_debug is set, the function calls the strlog() kernel function to record the data. For example, here's a snippet of tcp_input_listener() from usr/src/uts/common/inet/tcp/tcp_input.c:

    1366         if (listener->tcp_conn_req_cnt_q >= listener->tcp_conn_req_max) {
    1367                 mutex_exit(&listener->tcp_eager_lock);
    1368                 TCP_STAT(tcps, tcp_listendrop);
    1369                 TCPS_BUMP_MIB(tcps, tcpListenDrop);
    1370                 if (lconnp->conn_debug) {
    1371                         (void) strlog(TCP_MOD_ID, 0, 1, SL_TRACE|SL_ERROR,
    1372                             "tcp_input_listener: listen backlog (max=%d) "
    1373                             "overflow (%d pending) on %s",
    1374                             listener->tcp_conn_req_max,
    1375                             listener->tcp_conn_req_cnt_q,
    1376                             tcp_display(listener, NULL, DISP_PORT_ONLY));
    1377                 }
    1378                 goto error2;
    1379         }

To capture the data logged via strlog() it is necessary to know the STREAMS module ID, which in our case is TCP_MOD_ID, defined in usr/src/uts/common/inet/tcp_impl.h as 5105. To read the data one can use either the strace(1M) command line tool or the strerr(1M) daemon, both of which produce text logs. To read everything, one can use this command (it needs read access to /dev/log so it has to run under root):

    # strace 5105 all all

Here are two examples using netcat. First we try to bind to a port for which we don't have privileges:

    $ nc -D -l -4 -p 23
    nc: Permission denied

which produces the following entry from strace:

    000004 12:11:04 19c7ff6a  1 ..E 5105 0 ip: [ID 234745 kern.debug] tcp_bind: no priv for port 23

Next we try to bind to an already occupied port:

    $ nc -D -l -4 -p 4444
    nc: Address already in use

which produces the following entry from strace:

    000005 12:15:33 19c86878  1 ..E 5105 0 ip: [ID 326978 kern.debug] tcp_bind: requested addr busy

This is of course traceable via normal tools such as ppriv(1) or truss(1), but the point is that much finer grained details can be captured from the network modules. The format of the log entries is explained in the strace(1M) man page.
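For illustration, this is roughly what nc does under the hood when -D is given - a minimal standalone sketch, not the actual nc source:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    int
    main(void)
    {
    	int s = socket(AF_INET, SOCK_STREAM, 0);
    	int on = 1;

    	if (s == -1) {
    		perror("socket");
    		return (1);
    	}
    	/* ask the kernel to log debug events for this connection */
    	if (setsockopt(s, SOL_SOCKET, SO_DEBUG, &on, sizeof (on)) == -1)
    		perror("setsockopt(SO_DEBUG)");
    	return (0);
    }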


Solaris

ZFS likes to have ECC RAM

I have been using a custom built {ZFS,OpenSolaris}-based NAS at home for more than a year. The machine was built partly from second hand components (e.g. motherboard), from in-house unused iron and from a minority of brand new stuff (more on that in a separate entry). The machine has been running constantly and serving data occasionally, with very light load. One day I needed to perform some administrative task and realized it was not possible to SSH into the machine. Console login revealed that the uptime was just a couple of days and both pools (root pool and data pool) contained a staggering number of checksum errors. In the /var/crash/ directory there were a couple of crash dumps. Some of them were corrupted and mdb(1) refused to load them or reported garbage. The times of the crashes corresponded to the Sunday night scrubbing of each of the pools. At least two of the dumps contained an interesting and fairly obvious stack trace. I no longer have the file so here's just the entry from the log:

    Nov  1 02:27:20 chiba ^Mpanic[cpu0]/thread=ffffff0190914040:
    Nov  1 02:27:20 chiba genunix: [ID 683410 kern.notice] BAD TRAP: type=d (#gp General protection) rp=ffffff0006822380 addr=488b
    Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice]
    Nov  1 02:27:20 chiba unix: [ID 839527 kern.notice] sh:
    Nov  1 02:27:20 chiba unix: [ID 753105 kern.notice] #gp General protection
    Nov  1 02:27:20 chiba unix: [ID 358286 kern.notice] addr=0x488b
    Nov  1 02:27:20 chiba unix: [ID 243837 kern.notice] pid=740, pc=0xfffffffffba0373a, sp=0xffffff0006822470, eflags=0x10206
    Nov  1 02:27:20 chiba unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6f8
    Nov  1 02:27:20 chiba unix: [ID 624947 kern.notice] cr2: fee86fa8
    Nov  1 02:27:20 chiba unix: [ID 625075 kern.notice] cr3: b96a0000
    Nov  1 02:27:20 chiba unix: [ID 625715 kern.notice] cr8: c
    Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice]
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice] rdi: ffffff018b1e1c98 rsi: ffffff01a032dfb8 rdx: ffffff0190914040
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice] rcx: ffffff018ef054b0  r8: c  r9: b
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice] rax: ffffff01a032dfb8 rbx: 0 rbp: ffffff00068224a0
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice] r10: 0 r11: 0 r12: ffbbff01a032d740
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice] r13: ffffff01a032dfb8 r14: ffffff018b1e1c98 r15: 488b
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice] fsb: 0 gsb: fffffffffbc30400  ds: 4b
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]  es: 4b  fs: 0  gs: 1c3
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice] trp: d err: 0 rip: fffffffffba0373a
    Nov  1 02:27:20 chiba unix: [ID 592667 kern.notice]  cs: 30 rfl: 10206 rsp: ffffff0006822470
    Nov  1 02:27:20 chiba unix: [ID 266532 kern.notice]  ss: 38
    Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice]
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822260 unix:die+10f ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822370 unix:trap+43e ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822380 unix:_cmntrap+e6 ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068224a0 genunix:kmem_slab_alloc_impl+3a ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068224f0 genunix:kmem_slab_alloc+a1 ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822550 genunix:kmem_cache_alloc+130 ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068225c0 zfs:dbuf_create+4e ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068225e0 zfs:dbuf_create_bonus+2a ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822630 zfs:dmu_bonus_hold+7e ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068226c0 zfs:zfs_zget+5a ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822780 zfs:zfs_dirent_lock+3fc ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822820 zfs:zfs_dirlook+d9 ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff00068228a0 zfs:zfs_lookup+25f ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822940 genunix:fop_lookup+ed ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822b80 genunix:lookuppnvp+3a3 ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822c20 genunix:lookuppnatcred+11b ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822c90 genunix:lookuppn+5c ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822e90 genunix:exec_common+1ac ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822ec0 genunix:exece+1f ()
    Nov  1 02:27:20 chiba genunix: [ID 655072 kern.notice] ffffff0006822f10 unix:brand_sys_syscall32+19d ()
    Nov  1 02:27:20 chiba unix: [ID 100000 kern.notice]
    Nov  1 02:27:20 chiba genunix: [ID 672855 kern.notice] syncing file systems...

Also, next to the messages on the console I found some entries in /var/adm/messages like this one:

    Nov  2 12:15:01 chiba genunix: [ID 647144 kern.notice] ksh93: Cannot read /lib/amd64/ld.so.1

Later on, the condition of the machine worsened and it was not even possible to execute some commands due to I/O errors, up to the point when the machine had to be halted. The panic occurring in kmem routines, the load of checksum errors on both mirrored pools (the same number of errors for each disk in the mirror) and the fact that the system had been running the same build for a couple of months without a problem led me to try memtest. The errors started appearing on the screen in the first couple of seconds of the run. It turned out one of the three 1GB DDR2 chips had gone bad. In case you're wondering, the DIMMs were bought new a year ago, were branded (all of them from the same brand, known for gaming/overclocking equipment, same type) and had aluminium heat sinks on them, so no low quality stuff.

I was able to recover the data from past snapshots and replaced the RAM with ECC DIMMs (which required a new motherboard+CPU combo). This is a nice case of semi-silent data corruption detection. Without checksums the machine would be happily panicking and corrupting data without giving a clear indication of what is going on (e.g. which files were corrupted). So, even for a home NAS solution, ECC RAM is good (if not essential) to have. FMA should do the right thing if one of the ECC modules goes bad, which means it will not allow the bad pages to be used (the pages will be retired). The list of retired pages is persistent across reboots. More on FMA and ECC RAM can be found e.g. in this discussion on fm-discuss, in the FMA and DIMM serial numbers entry on Rob Johnston's blog, or in the Eversholt rules for AMD in usr/src/cmd/fm/eversholt/files/i386/i86pc/amd64.esc.


Solaris

signal() versus sigaction() on Solaris

This entry is mostly for newcomers to Solaris/OpenSolaris from other UNIX-like systems. When I was taught about signal() and sigaction(), my understanding was that sigaction() is just a superset of signal(), and also POSIX conformant, but that otherwise they accomplish the same thing. This is indeed the case on some UNIX-like operating systems. In Solaris, as I only recently discovered (to my dismay :)), it's different. Consider the following code (please ignore the fact that it's not strictly checking return values and that the signal handler is not safe):

    #include <stdio.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/types.h>

    void sig_handler(int s) {
    	printf("Got signal! Sleeping.\n");
    	sleep(10);
    	printf("returning from signal handler\n");
    }

    int main(void) {
    	struct sigaction s_action;

    	printf("Setting signal handler: ");
    #ifdef POSIX_SIGNALS
    	printf("sigaction\n");
    	(void) sigemptyset(&s_action.sa_mask);
    	s_action.sa_handler = sig_handler;
    	s_action.sa_flags = 0;
    	(void) sigaction(SIGHUP, &s_action, (struct sigaction *) NULL);
    #else
    	printf("signal\n");
    	signal(SIGHUP, sig_handler);
    #endif
    	printf("Waiting for signal\n");
    	while (1)
    		pause();
    	return (0);
    }

Now try to compile and run with and without -DPOSIX_SIGNALS and send 2 SIGHUP signals to the process within the 10 second window (so the second signal is received while the signal handler is still running). With sigaction(), the signal will be caught by the handler in both cases. With signal(), however, the second signal will cause the process to exit. This is because the kernel resets the signal handler to the default upon receiving the signal for the first time. This is described in the signal(3C) man page in a somewhat hidden sentence inside the second paragraph (it really pays off to read man pages slowly and with attention to detail):

    If signal() is used, disp is the address of a signal handler, and sig
    is not SIGILL, SIGTRAP, or SIGPWR, the system first sets the signal's
    disposition to SIG_DFL before executing the signal handler.

The sigaction(2) man page has this section:

    SA_RESETHAND    If set and the signal is caught, the disposition of
                    the signal is reset to SIG_DFL and the signal will
                    not be blocked on entry to the signal handler
                    (SIGILL, SIGTRAP, and SIGPWR cannot be automatically
                    reset when delivered; the system silently enforces
                    this restriction).

sigaction() does not set the flag by default, which results in the different behavior. I found out that this behavior has been present since Solaris 2.0 or so. In fact, the signal() routine in libc is implemented via sigaction(). From $SRC/lib/libc/port/sys/signal.c:

    58 /*
    59  * SVr3.x signal compatibility routines. They are now
    60  * implemented as library routines instead of system
    61  * calls.
    62  */
    63
    64 void(*
    65 signal(int sig, void(*func)(int)))(int)
    66 {
    67 	struct sigaction nact;
    68 	struct sigaction oact;
    69
    70 	CHECK_SIG(sig, SIG_ERR);
    71
    72 	nact.sa_handler = func;
    73 	nact.sa_flags = SA_RESETHAND|SA_NODEFER;
    74 	(void) sigemptyset(&nact.sa_mask);
    75
    76 	/*
    77 	 * Pay special attention if sig is SIGCHLD and
    78 	 * the disposition is SIG_IGN, per sysV signal man page.
    79 	 */
    80 	if (sig == SIGCHLD) {
    81 		nact.sa_flags |= SA_NOCLDSTOP;
    82 		if (func == SIG_IGN)
    83 			nact.sa_flags |= SA_NOCLDWAIT;
    84 	}
    85
    86 	if (STOPDEFAULT(sig))
    87 		nact.sa_flags |= SA_RESTART;
    88
    89 	if (sigaction(sig, &nact, &oact) < 0)
    90 		return (SIG_ERR);
    91
    92 	return (oact.sa_handler);
    93 }

I am pretty sure that the SA_RESETHAND flag is set in signal() in order to preserve backwards compatibility. This means that to solve this problem with signal(), one should set the signal handler again in the signal handler itself. However, this is not a complete solution, since there is still a window where the signal can be delivered while the handler is set to SIG_DFL - the default handler, which is Exit in the case of SIGHUP, as the signal.h(3HEAD) man page explains in a really useful table:

    Name     Value   Default   Event
    SIGHUP   1       Exit      Hangup (see termio(7I))
    ...

Now let's look at FreeBSD. Its SIGNAL(3) man page contains this separate paragraph:

    The handled signal is unblocked when the function returns and the
    process continues from where it left off when the signal occurred.
    Unlike previous signal facilities, the handler func() remains
    installed after a signal has been delivered.

The second sentence is actually printed in bold letters. I also tried Linux and NetBSD and the behavior is the same as on FreeBSD. So, to conclude all of the above: using signal() is really not portable.
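To reproduce the difference yourself, a small driver sketch (it assumes the example above is saved as sig.c; the sleeps are just there to land the second signal inside the 10 second window):

    # build both variants
    cc -o sig-old sig.c
    cc -DPOSIX_SIGNALS -o sig-posix sig.c
    # signal(): the second SIGHUP within the window kills the process
    ./sig-old & sleep 1; kill -HUP $!; sleep 2; kill -HUP $!
    # sigaction(): both signals are caught by the handler
    ./sig-posix & sleep 1; kill -HUP $!; sleep 2; kill -HUP $!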


Solaris

Netcat as small packet factory

Recently I needed to test a bug fix in in.iked(1M) (more precisely in libike.so, with which in.iked is linked) after which the daemon should respond to IKEv2 requests with a Notification message telling the peer to fall back to IKEv1 (previously it did not respond to IKEv2 packets at all). This can be tested by:

- installing an OS instance which supports IKEv2 and initiating from there, or
- writing a simple program (C/Perl/etc.) which will construct the UDP payload.

Surely, there should be an easier way to send a UDP packet with an arbitrary (in my case ISAKMP) payload. It turns out this is very easy to do just from the command line with nc(1), which is available in OpenSolaris (install it via 'pkg install SUNWnetcat'). Let's try to send some garbage first to see if it works:

    perl -e 'print "\x41\x41";' | nc -u rpe-foo.czech 500

Yep, tshark(1) (in OpenSolaris shipped by default with Wireshark) reports an IKE packet, a malformed one (which is not surprising):

    Capturing on eri0
      0.000000 10.18.144.12 -> 10.18.144.11 ISAKMP [Malformed Packet]

    0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
    0010  00 1e 26 98 40 00 ff 11 20 fb 0a 12 90 0c 0a 12   ..&.@... .......
    0020  90 0b e2 66 01 f4 00 0a 34 57 41 41               ...f....4WAA

Our two A's are there, just after the UDP header (Ethernet header 14 bytes, IP header 20 bytes, UDP header 8 bytes, 42 bytes in sum, so our 2 bytes are just after the first 8 bytes on the 3rd line).

With that we can go and construct an IKEv1 packet first, to see if the daemon will react to it. We will need to construct the payload, which is an IKEv1 header. IKEv1 is defined in RFC 2409 (The Internet Key Exchange (IKE)). IKEv1 uses the ISAKMP header definition, so we need to look into RFC 2408 (Internet Security Association and Key Management Protocol (ISAKMP)) for the actual header definition. It's there in section 3.1:

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                          Initiator                            !
    !                            Cookie                             !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                          Responder                            !
    !                            Cookie                             !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !  Next Payload ! MjVer ! MnVer ! Exchange Type !     Flags     !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                          Message ID                           !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                            Length                             !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

I'd like to construct a packet which resembles the first packet sent by an IKEv1 Initiator. So, our packet code (similar to shell code) will look like this (without thinking too much about what the values should look like):

- Initiator's cookie, must not be zero: \x11\x22\x33\x44\x55\x66\x77\x88
- Responder's cookie, must be zero in the initial packet from the Initiator: \x00\x00\x00\x00\x00\x00\x00\x00
- Next Payload, let's try 0 first: \x00
- Major and Minor Version (4 bits each): \x10
- Exchange Type; the defined values are:

      Exchange Type          Value
      NONE                   0
      Base                   1
      Identity Protection    2
      Authentication Only    3
      Aggressive             4
      Informational          5
      ISAKMP Future Use      6 - 31
      DOI Specific Use       32 - 239
      Private Use            240 - 255

  so let's try Base first: \x01
- Flags (Initiator): \x00
- Message ID: \x66\x66\x66\x66
- Length: \x28

We need to massage our packet code into the command line. The code:

    \x11\x22\x33\x44\x55\x66\x77\x88
    \x00\x00\x00\x00\x00\x00\x00\x00
    \x00
    \x10
    \x01
    \x00
    \x66\x66\x66\x66
    \x28

We want the source port to be 500 as well because of section 2.5.1 in RFC 2408, so use the -p option (this requires the net_privaddr privilege, so either become root or use pfexec(1)). Also, we do not need to wait for the response, so use the -w option:

    perl -e 'print "\x11\x22\x33\x44\x55\x66\x77\x88\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x01\x00\x66\x66\x66\x66\x28";' \
        | nc -w 1 -p 500 -u rpe-foo.czech 500

The packet was received but there was no reply, and tshark still considers this a Malformed Packet. Let's check the header again - oh yeah, the Length field has 4 bytes, not just one. Let's try again:

    perl -e 'print "\x11\x22\x33\x44\x55\x66\x77\x88\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x01\x00\x66\x66\x66\x66\x28\x00\x00\x00";' \
        | nc -w 1 -p 500 -u rpe-foo.czech 500

Okay, this is our Base exchange, but still no response:

    294.029154 10.18.144.12 -> 10.18.144.11 ISAKMP Base

    0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
    0010  00 38 26 a7 40 00 ff 11 20 d2 0a 12 90 0c 0a 12   .8&.@... .......
    0020  90 0b 01 f4 01 f4 00 24 34 71 11 22 33 44 55 66   .......$4q."3DUf
    0030  77 88 00 00 00 00 00 00 00 00 00 10 01 00 66 66   w.............ff
    0040  66 66 28 00 00 00                                 ff(...

Let's try something more provocative and set the Exchange Type to Identity Protection:

    perl -e 'print "\x11\x22\x33\x44\x55\x66\x77\x88\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x02\x00\x66\x66\x66\x66\x28\x00\x00\x00";' \
        | nc -w 1 -p 500 -u rpe-foo.czech 500

Oh yeah, this finally deserved a response:

    383.050874 10.18.144.12 -> 10.18.144.11 ISAKMP Identity Protection (Main Mode)

    0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
    0010  00 38 26 a8 40 00 ff 11 20 d1 0a 12 90 0c 0a 12   .8&.@... .......
    0020  90 0b 01 f4 01 f4 00 24 34 71 11 22 33 44 55 66   .......$4q."3DUf
    0030  77 88 00 00 00 00 00 00 00 00 00 10 02 00 66 66   w.............ff
    0040  66 66 28 00 00 00                                 ff(...

    383.051672 10.18.144.11 -> 10.18.144.12 ISAKMP Informational

    0000  00 03 ba 4e 3d 38 00 0a e4 2f 61 eb 08 00 45 00   ...N=8.../a...E.
    0010  00 99 d3 8b 40 00 ff 11 73 8c 0a 12 90 0b 0a 12   ....@...s.......
    0020  90 0c 01 f4 01 f4 00 85 ed 05 11 22 33 44 55 66   ..........."3DUf
    0030  77 88 85 75 8e 0f fa a5 5d de 0b 10 05 00 69 a5   w..u....].....i.
    0040  63 e4 00 00 00 7d 00 00 00 61 00 00 00 01 01 10   c....}...a......
    0050  00 1e 11 22 33 44 55 66 77 88 85 75 8e 0f fa a5   ..."3DUfw..u....
    0060  5d de 80 0c 00 01 00 06 00 39 55 44 50 20 50 61   ]........9UDP Pa
    0070  63 6b 65 74 20 64 6f 65 73 20 6e 6f 74 20 63 6f   cket does not co
    0080  6e 74 61 69 6e 20 65 6e 6f 75 67 68 20 64 61 74   ntain enough dat
    0090  61 20 66 6f 72 20 49 53 41 4b 4d 50 20 70 61 63   a for ISAKMP pac
    00a0  6b 65 74 80 08 00 00                              ket....

Now that we proved to ourselves that we can construct a semi-valid packet, it's time to try IKEv2. The IKEv2 header is defined in RFC 4306, section 3.1:

                         1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                       IKE_SA Initiator's SPI                  !
    !                                                               !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                       IKE_SA Responder's SPI                  !
    !                                                               !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !  Next Payload ! MjVer ! MnVer ! Exchange Type !     Flags     !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                          Message ID                           !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    !                            Length                             !
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

At first sight, it looks the same (for backward compatibility). However, some of the values are different. For the IKE header, the main differences are in the Exchange Type and Flags:

    Exchange Type             Value
    RESERVED                  0-33
    IKE_SA_INIT               34
    IKE_AUTH                  35
    CREATE_CHILD_SA           36
    INFORMATIONAL             37
    RESERVED TO IANA          38-239
    Reserved for private use  240-255

IKE_SA_INIT is our guy ('echo 0t34=x | mdb' produces 0x22). The Flags are now used to indicate the exchange; set the 3rd bit to say we are the Initiator. We will retain the source port even though IKEv2 supports ports other than 500 and 4500, because we're dealing with an IKEv1 implementation. Now slightly change our packet code (don't forget to change the Version field to 2.0):

    perl -e 'print "\x11\x22\x33\x44\x55\x66\x77\x88\x00\x00\x00\x00\x00\x00\x00\x00\x00\x20\x22\x08\x66\x66\x66\x66\x28\x00\x00\x00";' \
        | nc -w 1 -p 500 -u rpe-foo.czech 500

And we got a nice response (since the responder runs a recent version of libike.so):

    1013.190867 10.18.144.12 -> 10.18.144.11 ISAKMP IKE_SA_INIT

    0000  00 0a e4 2f 61 eb 00 03 ba 4e 3d 38 08 00 45 00   .../a....N=8..E.
    0010  00 38 26 aa 40 00 ff 11 20 cf 0a 12 90 0c 0a 12   .8&.@... .......
    0020  90 0b 01 f4 01 f4 00 24 34 71 11 22 33 44 55 66   .......$4q."3DUf
    0030  77 88 00 00 00 00 00 00 00 00 00 20 22 08 66 66   w.......... ".ff
    0040  66 66 28 00 00 00                                 ff(...

    1013.192005 10.18.144.11 -> 10.18.144.12 ISAKMP Informational

    0000  00 03 ba 4e 3d 38 00 0a e4 2f 61 eb 08 00 45 00   ...N=8.../a...E.
    0010  00 83 d3 8d 40 00 ff 11 73 a0 0a 12 90 0b 0a 12   ....@...s.......
    0020  90 0c 01 f4 01 f4 00 6f 66 da 11 22 33 44 55 66   .......of.."3DUf
    0030  77 88 5c 36 e3 75 a2 7b 8e fe 0b 10 05 00 87 03   w.\6.u.{........
    0040  0c f5 00 00 00 67 00 00 00 4b 00 00 00 01 01 10   .....g...K......
    0050  00 05 11 22 33 44 55 66 77 88 5c 36 e3 75 a2 7b   ..."3DUfw.\6.u.{
    0060  8e fe 80 0c 00 01 00 06 00 23 49 6e 76 61 6c 69   .........#Invali
    0070  64 20 49 53 41 4b 4d 50 20 6d 61 6a 6f 72 20 76   d ISAKMP major v
    0080  65 72 73 69 6f 6e 20 6e 75 6d 62 65 72 80 08 00   ersion number...
    0090  00

The only thing which is not nice is our terminal, since nc(1) dumped the binary packet to it. Let's try again with some post-processing:

    # perl -e 'print "\x11\x22\x33\x44\x55\x66\x77\x88\x00\x00\x00\x00\x00\x00\x00\x00\x00\x20\x22\x08\x66\x66\x66\x66\x28\x00\x00\x00";' \
        | nc -w 1 -p 500 -u rpe-foo.czech 500 | od -c
    0000000  021   "   3   D   U   f   w 210 237   y 254 264 351 333 007 344
    0000020  013 020 005  \0   [ 251 244   j  \0  \0  \0   g  \0  \0  \0   K
    0000040   \0  \0  \0 001 001 020  \0 005 021   "   3   D   U   f   w 210
    0000060  237   y 254 264 351 333 007 344 200  \f  \0 001  \0 006  \0   #
    0000100    I   n   v   a   l   i   d       I   S   A   K   M   P       m
    0000120    a   j   o   r       v   e   r   s   i   o   n       n   u   m
    0000140    b   e   r 200  \b  \0  \0
    0000147

The fix is obviously in place.
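As a side note, the perl-pipe pattern used throughout is easy to fold into a tiny reusable helper; a sketch (the function name and the byte list format are my own invention, the host follows the examples above):

    # send_payload HOST PORT HEXBYTE... - fire one UDP datagram from source port 500
    send_payload() {
        host="$1"; port="$2"; shift 2
        perl -e 'print map { chr(hex($_)) } @ARGV' "$@" \
            | nc -w 1 -p 500 -u "$host" "$port"
    }
    # the IKEv2 IKE_SA_INIT header from the example above
    send_payload rpe-foo.czech 500 11 22 33 44 55 66 77 88 \
        00 00 00 00 00 00 00 00 00 20 22 08 66 66 66 66 28 00 00 00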


Solaris

OpenSSL PKCS#11 engine presentation

Some light intro first: OpenSSL has a concept of plugins/add-ons called 'engines' which can supply alternative implementations of crypto operations (digests, symmetric and asymmetric ciphers and random data generation). The main reason for the existence of the engines is the ability to offload crypto ops to hardware. (Open)Solaris ships with an engine called the PKCS#11 engine which provides access to the Solaris Cryptographic Framework, which in turn can provide access to HW crypto.

I spent some time fixing bugs in the OpenSSL PKCS#11 engine in Solaris, so I got quite intimate with its internals. Recently, while discussing an upcoming feature with Jan, he asked me why one particular detail in the engine is done one way and not the other (it's the fork() detection not being done via atfork handlers, for the curious). It took me some thinking to find the answer (I focused on the other changes at that time), which made us realize that it would be good to summarize the design choices behind the engine and also to document the internals, so that others can quickly see what's going on inside and be able to make changes in the engine without reverse engineering the thoughts behind it. The outcome is a set of slides which I hope succinctly describe both the overall picture and the gritty details. The presentation can be downloaded here.
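For readers who have not used engines before, a hedged sketch of exercising one from the command line (pk11 is, if memory serves, the id the Solaris PKCS#11 engine registers under; substitute whatever 'openssl engine' lists on your system):

    # list the engines OpenSSL knows about
    openssl engine
    # run a benchmark through the PKCS#11 engine
    openssl speed -engine pk11 aes-128-cbc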


Solaris

External contributions to testing community

I have just integrated a couple of changes which I believe are the first contributed externally to the Testing community as an open-source contribution. The changes add a couple of new tests to the nc test suite to cover the enhancement described in PSARC/2008/680 (which is present in Nevada since build 106). This is the stuff which allows you to run nc(1) in client mode with complex portlist specifications. Previously it was possible only to use simple port ranges like 22-80; with this change one can connect to e.g. 22,24,50-80,66,1024-2048. A little example of how it might be useful:

    $ nc -v -z grok.czech 22,25,80-88,8080
    Connection to grok.czech 22 port [tcp/ssh] succeeded!
    nc: connect to 129.157.71.49 port 25 [host grok.czech] (tcp) failed: Connection refused
    Connection to grok.czech 80 port [tcp/*] succeeded!
    nc: connect to 129.157.71.49 port 81 [host grok.czech] (tcp) failed: Connection refused
    nc: connect to 129.157.71.49 port 82 [host grok.czech] (tcp) failed: Connection refused
    nc: connect to 129.157.71.49 port 83 [host grok.czech] (tcp) failed: Connection refused
    nc: connect to 129.157.71.49 port 84 [host grok.czech] (tcp) failed: Connection refused
    nc: connect to 129.157.71.49 port 85 [host grok.czech] (tcp) failed: Connection refused
    nc: connect to 129.157.71.49 port 86 [host grok.czech] (tcp) failed: Connection refused
    nc: connect to 129.157.71.49 port 87 [host grok.czech] (tcp) failed: Connection refused
    nc: connect to 129.157.71.49 port 88 [host grok.czech] (tcp) failed: Connection refused
    Connection to grok.czech 8080 port [tcp/*] succeeded!

Back to the testing part. The putback (yes, stcnv-gate is still using Teamware) log for this change looks like this (I have modified Erik's e-mail a bit):

    6786859 portranges_complex_spec is missing the listener
    6754842 extended port list specification needs to be tested
    Code contributed by Erik Trauschke <erik.trauschke AT freenet.de>

I think this is a really nice example of the ideal state - the contributor not only did the feature part but also the testing part. It shows a great degree of responsibility - not just throwing some code "over the fence" but fully participating in the process to ensure the quality even in the long term.

The tests are both positive and negative. Each purpose in the portranges directory is numbered, and the following numbers match the test purpose numbers:

- 5-12: ensure nc treats ports 0 and 65536 as invalid. Previously it was possible to listen on ports 0 and 65536; with Erik's changes this is no longer true, so we need regression tests for both cases (client/server) and both ports.
- 13-19: see if various malformed port list specifications are considered invalid. Each purpose needs not only positive tests which make sure the functionality actually works but also negative tests which ensure it does not misbehave. In this case, invalid port list specifications are thrown at nc(1) to see that it reacts accordingly (with an error, that is).
- 20-25: test the functionality of complex port lists. This is the bunch of tests which see if the functionality actually works.
- 26: tests reallocation. Since the internal representation of the port list is now dynamically allocated, and there is a default pre-allocated value which is reallocated if needed, we need to test the case of reallocation.

To be able to do such an integration there is now a Test development process. It's similar to the process used in the ON community, but it's more lightweight. The main difference is that the request-sponsor part is done informally via the testing-discuss mailing list and there is no list of bugs to pick from. But don't be shy; whether you're adding new functionality or a completely new program, the Testing community is here to help you.


Solaris

Collateral salutariness

Each build of (Open)Solaris is tested with a variety of test suites on a variety of platforms, and I wanted the nc test suite to participate in these runs. Eoin Hughes from the PIT team (which runs those tests) was kind enough to work around a couple of bugs (which are fixed now) in the test suite so it could be run in the PIT environment. Later on, I got a report from Eoin that as a result of a nc test suite run, CR 6793191 (watchmalloc triggers system panic on sockfs copyin) was caught. This bug is manifested by a panic (this particular panic is on a DomU, although this happens across the board):

    panic[cpu0]/thread=ffffff0150ce1540: copyin_noerr: argument not in kernel address space

    ffffff000416dcf0 unix:bcopy_ck_size+102 ()
    ffffff000416ddb0 genunix:watch_xcopyin+151 ()
    ffffff000416dde0 genunix:watch_copyin+1d ()
    ffffff000416de50 sockfs:copyin_name+91 ()
    ffffff000416deb0 sockfs:bind+90 ()
    ffffff000416df00 unix:brand_sys_syscall32+328 ()

The bug is actually a regression caused by CR 6292199 (bcopy and kcopy should'nt use rep, smov) and was fixed by an engineer from Intel in the OpenSolaris/Nevada code base.

This is an instance of an event which I like so much - an unintended positive consequence elsewhere. In contrast with so-called collateral damage, this is something which is beneficial in other areas. I wrote the nc test suite to test primarily the nc(1) command, but here it proved to be useful for testing other areas of the system as well. In this case it was thanks to the fact that the test suite is run with memory leak checking by default (see the NC_PRELOADS variable in the src/suites/net/nc/include/vars file). And yes, CR 6793191 is fixed by now.


Solaris

Command line history and editing for IPsec utilities

Since the days when John Beck added command line editing to zonecfg, Mark Phalan has done a similar thing for the Kerberos utilities and Huie-Ying Lee for sftp. The IPsec utilities (ipseckey(1M) and ikeadm(1M)) have offered the ability to enter commands in interactive mode for a long time, but only since Nevada build 112 do the commands support command line editing and history too. Again, thanks to libtecla (shipped with Solaris/OpenSolaris).

Lessons learned:

- Adding full-blown command line editing support is hard. Adding the initial support is quite easy; however, more advanced features can require substantial work. This is especially true for tab completion. For sftp, Huie-Ying decided to add tab completion in a future phase because of the ambiguities when completing names of files (when to complete local files versus remote files). I did the same with tab completion for the IPsec utilities - the integration only delivers basic command line editing support, without tab completion. The problem with ipseckey(1M) and ikeadm(1M) is that their grammar is quite bifurcated and has contexts. For example, you cannot use encr_alg with AH SAs in ipseckey. Or, it would be erroneous to tab complete a valid command in the middle of entering a key if the key hex sequence contained a sub-string of a valid command. The hardest part is, I think, offering the right tables of valid commands in a given context. E.g. in our case a command line must start with a top-level command. Each top-level command offers several valid sub-commands, and we do not offer invalid sub-commands for a given top-level command, so there is a necessity to track the state of the finite state machine describing the grammar contexts. Also, after the user has entered src we do not want to allow him to enter it again on the same command line. And if the user has already entered, say, add esp spi, we are expecting an SPI number, not a command name. Ideally, to solve this problem in a nice way there should be a meta library (or an additional API in libtecla) which would offer the ability to link command tables and set the contexts.

- Interruptible cycles in command line mode. ipseckey's monitor command reads from a PF_KEY socket in a loop. The loop is normally interruptible by SIGINT. To do so in a libtecla environment (we do not want to exit the command line upon SIGINT and yet still need to interrupt the cycle), something like this is needed:

      static void
      monitor_catch(int signal)
      {
              if (!interactive)
                      errx(signal, gettext("Bailing on signal %d."), signal);
      }

      void
      doreadcycle(void)
      {
              ...
              /* Catch ^C. */
              newsig.sa_handler = monitor_catch;
              newsig.sa_flags = 0;
              (void) sigemptyset(&newsig.sa_mask);
              (void) sigaddset(&newsig.sa_mask, SIGINT);
              (void) sigaction(SIGINT, &newsig, &oldsig);

              for (; ; ) {
                      rc = read(keysock, samsg, sizeof (get_buffer));
                      /* handle the data */
              }

              /* restore old behavior */
              if (interactive)
                      (void) sigaction(SIGINT, &oldsig, NULL);
      }

- Interaction with SMF. While it's fine to bail out of interactive mode with an error, due to the nature of the IPsec commands (they can read the config files using the same routines as in interactive mode, and they are used as SMF services to bring up IPsec policy and keys after boot), we need to distinguish the interactive and non-interactive modes.

- Maximum command line history value. It seems that the second parameter to new_GetLine() - histlen - is commonly misunderstood (a minimal usage sketch follows at the end of this entry). This variable does not express the maximum number of lines in the history but instead the maximum size of the history buffer in bytes. If the buffer becomes full, libtecla does not trim the last line but shifts instead. Given that the first parameter to new_GetLine() expresses the maximum command line size (in bytes), one needs to do some calculations and estimates of what will be needed, to avoid a needlessly big buffer - ipseckey is used to enter key material, so the line can become quite long. Say we wanted to keep 1024 lines: if the maximum length of a line is 1024, this gives us a 1 megabyte buffer, which seems too much for a simple application. Thus I did some guessing and set the buffer size accordingly. For "common" ipseckey configuration commands (think moderately bifurcated 'add') it's circa 300 characters. Mostly, however, users enter query commands like 'flush esp', 'dump ah' and the like, so this is somewhere around, say, 30 characters. Say 30% of the commands are configuration and the rest are queries; to hold 100 such commands only about 10K of memory is required. In the end I chose 64K, to be able to hold 15 of the biggest (4K) commands.
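To make the histlen point concrete, a minimal libtecla usage sketch (the prompt string and sizes are illustrative; error handling is trimmed; link with -ltecla):

    #include <stdio.h>
    #include <libtecla.h>

    #define MAX_LINE_LEN   1024    /* first new_GetLine() parameter: bytes per line */
    #define HIST_BUF_SIZE  65536   /* second parameter: history buffer size in
                                      BYTES, not a count of lines */

    int
    main(void)
    {
            GetLine *gl = new_GetLine(MAX_LINE_LEN, HIST_BUF_SIZE);
            char *line;

            if (gl == NULL)
                    return (1);
            /* read lines with editing and history until EOF */
            while ((line = gl_get_line(gl, "ipseckey> ", NULL, -1)) != NULL)
                    printf("got: %s", line);
            (void) del_GetLine(gl);
            return (0);
    }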


Solaris

Testing netcat

After multiple rounds of code review the netcat (or nc) test suite is now finally in the onnv-stc2 gate. The test suite has its home in the OpenSolaris Networking community (see the networking tests page for the list of networking test suites). The source code is present in the src/suites/net/nc/ directory and SUNWstc-netcat packages can be downloaded from the OpenSolaris Download center. Before I go further, this is how it looks like when the test suite is run (the output is trimmed a bit):

    vk:honeymooners:/opt/SUNWstc-nc$ run_test nc
    Validating Arguments...
    New TET_ROOT for this run : /var/tmp/honeymooners_27828
    The results will be available in /var/tmp/results.27828
    tcc: journal file is /var/tmp/results.27828/testlog
    12:45:57  Execute /tests/dflag/tc_dflag
    12:46:04  Execute /tests/hflag/tc_hflag
    12:46:05  Execute /tests/kflag/tc_kflag
    12:46:11  Execute /tests/nflag/tc_nflag
    12:46:15  Execute /tests/portranges/tc_portranges
    12:46:23  Execute /tests/pflag/tc_pflag
    12:46:26  Execute /tests/sflag/tc_sflag
    12:46:35  Execute /tests/Uflag/tc_Uflag
    12:46:36  Execute /tests/vflag/tc_vflag
    12:46:43  Execute /tests/zflag/tc_zflag
    12:46:46  Execute /tests/iflag/tc_iflag
    12:46:59  Execute /tests/lflag/tc_lflag
    12:47:29  Execute /tests/rflag/tc_rflag
    12:48:16  Execute /tests/Tflag/tc_Tflag
    12:48:33  Execute /tests/uflag/tc_uflag
    12:48:50  Execute /tests/wflag/tc_wflag
    ##################################################
    TC /tests/dflag/tc_dflag
    TP 1 tc_dflag PASS
    ##################################################
    TC /tests/hflag/tc_hflag
    TP 1 tc_hflag PASS
    ...
    ##################################################
    SUMMARY
    =======
    Number of Tests : 50
    PASS        : 50
    FAIL        : 0
    UNRESOLVED  : 0
    UNINITIATED : 0
    OTHER       : 0
    ##################################################
    Test Logs are at /var/tmp/results.27828, Journal File = /var/tmp/results.27828/testlog
    vk:honeymooners:/opt/SUNWstc-nc$

It's been almost a year since I started developing the test suite last Christmas (see the initial blog entry about nc-tet). Since then, I have lost part of the source code in a hard drive crash, had to redo the source tree structure, fix ksh style, fix numerous bugs in the test suite code and make the test suite more robust.

One might ask whether having a test suite for such a simple program as nc(1) was worth the hassle. I have only one answer to that: absolutely. First, it gives confidence of not breaking (most of; see below) existing things when changing/adding functionality, and second it helped me (and, I hope, the others participating in/observing the code review on testing-discuss too) to explore what it takes to write a test suite from scratch (I will not go into details here of whether I prefer CTI-TET over STF and vice versa). The Beautiful Code book (which I really recommend to anyone tinkering with any source code) contains a chapter called Beautiful Tests by Alberto Savoia. I hope that at least some of the test purposes in the nc test suite have some degree of beautifulness in at least one of the ways highlighted by Alberto (1. simplicity/efficiency, 2. helping to make the software being tested better in terms of quality and testability, 3. breadth/thoroughness).

One of the important questions for a test suite is the code coverage level. Obviously, for software adhering to the OpenSolaris interface taxonomy model it is important that the test suite exercises all of the Committed interfaces and the execution paths around those interfaces. For nc(1) this means a subset of the command line options and their arguments (see PSARC 2007/389 for the actual list). The key is certainly to test the features which are likely to break with an intrusive code change. A very crude view of the test coverage for the nc(1) test suite (counting test purposes gives only a very remote idea about real coverage, but at least provides a visual image) looks like this:

    rflag:      +
    Tflag:      +++++---
    pflag:      +
    iflag:      +-
    vflag:      ++
    kflag:      +
    Uflag:      +-
    dflag:      +
    uflag:      ++-
    sflag:      +-
    hflag:      +
    nflag:      +-
    wflag:      +
    portranges: +---
    lflag:      ++++++++----------

One plus character stands for one positive test purpose, a minus for a negative one. Side note: the above ASCII graph was produced using the test-coverage-graph.sh script (which presumes a certain naming scheme for test purpose files; a rough re-implementation sketch closes this entry). Just pipe a file listing with test purpose filenames compliant with the scheme used in the ontest-stc2 gate into the script and it will spew out a graph similar to the above.

In the above musing about code coverage I left out an important piece - why some of the features are not tested. For nc(1) the yet untested part is the SOCKS protocol support. Basically, this is because the test suite environment does not contain a SOCKS server to test against. There might not be many people using the -x/-X options, but from my own experience nothing is more frustrating than discovering some old dusty corner which should have been fixed long ago or removed completely. So for now, on my workstation which sits behind a SOCKS proxy, I have the following in ~/.ssh/config for a server outside the corporate network which hosts my personal mailbox, so it is accessed every day:

    Host bar
        User foo
        Hostname outside.kewl.org
        # NOTE: for nc(1) testing
        ProxyCommand /usr/bin/nc -x socks-proxy.foothere.bar outside.kewl.org %p
        ForwardAgent no
        ForwardX11 no

This ensures (along with periodic upgrades of the workstation to recent Nevada builds) that SOCKS support gets tested as well. And yes, ssh-socks5-proxy-connect(1) and ssh-http-proxy-connect(1) are not really needed.

Now, with the test suite in place, anybody modifying nc(1) (there are some RFEs for nc in the oss-bit-size list, and other bugfixes or features are also welcome) can have pretty high confidence that his change will not break things. Yes, this means that more nc(1) features are coming.
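As promised above, a rough re-implementation sketch of such a graphing script - note that the naming scheme (directory per option, 'pos'/'neg' markers in test purpose filenames) is my assumption here, not necessarily the actual convention of the gate:

    #!/bin/sh
    # read test purpose paths on stdin, e.g. tests/lflag/tp_lflag_pos_001,
    # and print one +/- coverage line per test area (output order is unsorted)
    awk -F/ '{
            area = $(NF - 1)                  # directory name, e.g. "lflag"
            if ($NF ~ /neg/) neg[area]++; else pos[area]++
            seen[area] = 1
    }
    END {
            for (a in seen) {
                    printf("%-12s", a ":")
                    for (i = 0; i < pos[a]; i++) printf("+")
                    for (i = 0; i < neg[a]; i++) printf("-")
                    printf("\n")
            }
    }'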


Solaris

Automatic webrev upload

I will start this one a little bit generically. Part of the standard OpenSolaris/Solaris development process is code review. To facilitate a review, a so-called webrev is needed. A webrev is a set of HTML/text/PDF pages and documents which display all the changes between a local repository containing the changes and its parent repository. To produce a webrev, simply switch to a repository and run the webrev script (it is part of the SUNWonbld package, which can be downloaded from the OpenSolaris download center):

    $ cd /local/Source/bugfix-314159.onnv
    $ webrev

Assuming /opt/onbld/bin is present in your PATH, a webrev will be generated under the /local/Source/bugfix-314159.onnv/webrev/ directory.

For OpenSolaris changes, the webrev is usually uploaded to cr.opensolaris.org (every OpenSolaris member has an account automatically created for him), which serves it under http://cr.opensolaris.org/~OSol_username/ (where OSol_username is your OpenSolaris username), and a request for review with a link to the webrev is sent to one of the mailing lists relevant to the change. Dan Price has written a script which produces an RSS feed out of recently uploaded webrevs, which is a pretty handy substitute for feeds from news/headlines/magazines :)

For a long time I was basically doing the following:

    $ cd /local/Source/bugfix-foo.onnv && webrev
    $ scp -r webrev cr.opensolaris.org:bugfix-foo.onnv

This had two flaws: first, it was slow (because of the rcp protocol over an SSH channel), and second, I had to delete the old webrev via a separate command (using sftp(1) to rename the old webrev to a .trash directory) before uploading a new version of the webrev (otherwise a couple of permission errors would follow). To solve the first problem, rsync (with SSH transport) can be used, which makes the upload nearly instantaneous. The second problem can be worked around by using incremental webrevs. Still, this does not seem good enough for code reviews with many iterations.

So, the change made in CR 6752000 introduces the following command line options for automatic webrev upload:

- -U: uploads the webrev
- -n: suppresses webrev generation
- -t: allows specifying a custom upload target

The webrev.1 man page has been updated to explain the usage. For common OpenSolaris code reviews the usage will probably mostly look like this:

    $ webrev -O -U

This will upload the webrev to cr.opensolaris.org under a directory named according to the local repository name. Further invocations will replace the remote webrev with a fresh version. But it is possible to get more advanced. After the initial webrev is posted, an incremental webrev can be both generated and posted. Assuming you're switched to the repository (via bldenv) and we're dealing with the 4th round of code review, the following commands will perform the task:

    webrev_name=`basename $CODEMGR_WS`
    webrev -O -U -o $CODEMGR_WS/${webrev_name}.rd4 \
        -p $CODEMGR_WS/${webrev_name}.rd3

The above commands hide maybe not-so-obvious behavior, so I'll try to explain it in a table:

    +---------------------------+------------------------+-----------------------------------------------------+
    | command                   | local webrev directory | remote webrev directory                             |
    +---------------------------+------------------------+-----------------------------------------------------+
    | webrev -O -U              | $CODEMGR_WS/webrev/    | cr.opensolaris.org/~OSOLuser/`basename $CODEMGR_WS` |
    +---------------------------+------------------------+-----------------------------------------------------+
    | webrev -O -o \            | $CODEMGR_WS/my_webrev/ | cr.opensolaris.org/~OSOLuser/my_webrev              |
    |   $CODEMGR_WS/my_webrev   |                        |                                                     |
    +---------------------------+------------------------+-----------------------------------------------------+
    | webrev -O \               | $CODEMGR_WS/fix.rd2/   | cr.opensolaris.org/~OSOLuser/fix.rd2                |
    |   -p $CODEMGR_WS/fix.rd1 \|                        |                                                     |
    |   -o $CODEMGR_WS/fix.rd2  |                        |                                                     |
    +---------------------------+------------------------+-----------------------------------------------------+

Basically, without the -o flag webrev will generate the webrev into a local directory named 'webrev' but will upload it to the directory named after the basename of the local repository. With the -o flag, webrev will use the name of the root directory of the repository it is called from for both local and remote storage. This is done to keep the default behavior of generating the local webrev to a directory named 'webrev'; at the same time, uploading different webrevs to the same remote directory named 'webrev' does not make sense.

NOTE: This behavior is also valid in the case when not enclosed in a repository via ws or bldenv; I have just used $CODEMGR_WS to express the root directory of a workspace/repository.

Also, it is now possible to call webrev from within the Cadmium Mercurial plugin, so all webrev commands can be prefixed with hg. All in all, it was fun dealing with webrevs of webrev. I am looking forward to more entries in the RSS feed :)

NOTE: It will take some time before the changes appear in the SUNWonbld packages offered by the download center, so it's better to update the sources from the ssh://anon@hg.opensolaris.org/hg/onnv/onnv-gate repository and build and upgrade the SUNWonbld package from there.


Solaris

strsep() in libc

As of today, the strsep() function lives in Nevada's libc (tracked by CR 4383867 and PSARC 2008/305). This constitutes another step in the quest for a more feature-full (in terms of compatibility) libc in OpenSolaris. In binary form, the changes will be available in build 99. The documentation will be part of the string(3C) man page. Here's a small example of how to use it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int
parse(const char *str)
{
	char *p = NULL;
	char *inputstring, *origstr;
	int ret = 1;

	if (str == NULL)
		errx(1, "NULL string");

	/*
	 * We have to remember the original pointer because strsep()
	 * will change the 'inputstring' pointer.
	 */
	if ((origstr = inputstring = strdup(str)) == NULL)
		errx(1, "strdup() failed");

	printf("=== parsing '%s'\n", inputstring);
	for ((p = strsep(&inputstring, ",")); p != NULL;
	    (p = strsep(&inputstring, ","))) {
		if (p != NULL && *p != '\0')
			printf("%s\n", p);
		else if (p != NULL) {
			warnx("syntax error");
			ret = 0;
			goto bad;
		}
	}
bad:
	printf("=== finished parsing\n");
	free(origstr);
	return (ret);
}

int
main(int argc, char *argv[])
{
	if (argc != 2)
		errx(1, "usage: prog <string>");

	if (!parse(argv[1]))
		exit(1);

	return (0);
}

This example was actually used as a unit test (use e.g. "1,22,33,44" and "1,22,,44,33" as the input string) and it also nicely illustrates important properties of strsep()'s behavior: while searching for tokens, strsep() modifies the original string. This property is shared with strtok(). Unlike strtok(), strsep() is able to detect empty fields. There is a function in Solaris' libc which can do token splitting and does not modify the original string - strcspn(). The other notable property of strsep() is that (unlike strtok()) it does not conform to ANSI C. Time to draw a table:

 function     ISO C90   modifies input   detects empty fields
-------------+---------+----------------+---------------------
 strsep()     No        Yes              Yes
 strtok()     Yes       Yes              No
 strcspn()    Yes       No               Sort of

None of the above functions is bullet-proof. The bottom line is that the user should decide which is the most suitable for a given task and use it with its properties in mind.
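For completeness, this is roughly how the unit test above can be built and exercised with the two sample inputs mentioned; the source file name is my choice, and the output below is what the printf()/warnx() calls in the program should produce:

$ cc -o parse parse.c
$ ./parse "1,22,33,44"
=== parsing '1,22,33,44'
1
22
33
44
=== finished parsing
$ ./parse "1,22,,44,33"
=== parsing '1,22,,44,33'
1
22
parse: syntax error
=== finished parsing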


Solaris

Customizing Mercurial outgoing output

Part of the transition to Mercurial in OpenSolaris are changes in the integration processes. Every RTI has to contain the output of hg outgoing -v so the CRT advocates can better see the impact of the changes in terms of changed files. However, the default output is not very readable:

$ hg outgoing -v
comparing with /local/ws-mirrors/onnv-clone.hg
searching for changes
changeset:   7248:225922d15fe6
user:        Vladimir Kotal
date:        2008-08-06 23:39 +0200
modified:    usr/src/cmd/ldap/ns_ldap/ldapaddent.c usr/src/cmd/sendmail/db/config.h
usr/src/cmd/ssh/include/config.h usr/src/cmd/ssh/include/openbsd-compat.h
usr/src/cmd/ssh/include/strsep.h usr/src/cmd/ssh/libopenbsd-compat/Makefile.com
usr/src/cmd/ssh/libopenbsd-compat/common/llib-lopenbsd-compat
usr/src/cmd/ssh/libopenbsd-compat/common/strsep.c usr/src/cmd/ssh/libssh/common/llib-lssh
usr/src/common/util/string.c usr/src/head/string.h usr/src/lib/libc/amd64/Makefile
usr/src/lib/libc/i386/Makefile.com usr/src/lib/libc/port/gen/strsep.c
usr/src/lib/libc/port/llib-lc usr/src/lib/libc/port/mapfile-vers
usr/src/lib/libc/sparc/Makefile usr/src/lib/libc/sparcv9/Makefile
usr/src/lib/passwdutil/Makefile.com usr/src/lib/passwdutil/bsd-strsep.c
usr/src/lib/passwdutil/passwdutil.h usr/src/lib/smbsrv/libsmb/common/mapfile-vers
usr/src/lib/smbsrv/libsmb/common/smb_util.c
added:       usr/src/lib/libc/port/gen/strsep.c
deleted:     usr/src/cmd/ssh/include/strsep.h
usr/src/cmd/ssh/libopenbsd-compat/common/strsep.c usr/src/lib/passwdutil/bsd-strsep.c
log:
PSARC 2008/305 strsep() in libc
4383867 need strsep() in libc
--------------------------------------------------------------------

In the above case, the list of modified files spans a single line, which makes the web form used for RTI go really wild in terms of width (I had to wrap the lines manually in the above example, otherwise this page would suffer from the same problem).
The following steps can be used to make the output a bit nicer:

Create ~/bin/Mercurial/outproc.py with the following contents:

from mercurial import templatefilters

def newlines(text):
    return text.replace(' ', '\n')

def outgoing_hook(ui, repo, **kwargs):
    templatefilters.filters["newlines"] = newlines

Hook into the outgoing command in ~/.hgrc by adding the following lines into the [hooks] and [extensions] sections so it looks like this:

[extensions]
outproc=~/bin/Mercurial/outproc.py

[hooks]
pre-outgoing=python:outproc.outgoing_hook

Create ~/bin/Mercurial/style.outgoing with the following contents:

changeset = outgoing.template

Create ~/bin/Mercurial/outgoing.template with the following contents (the file can be downloaded here):

changeset:{rev}:{node|short}
user:{author}
date:{date|isodate}
modified:{files|stringify|newlines}
added:{file_adds|stringify|newlines}
deleted:{file_dels|stringify|newlines}
log:
{desc}
------------------------------------------------------------------------

Add the following into your ~/.bashrc (or to the .rc file of the shell of your choice):

alias outgoing='hg outgoing --style ~/bin/Mercurial/style.outgoing'

After that it works like this:

$ outgoing
comparing with /local/ws-mirrors/onnv-clone.hg
searching for changes
changeset:7248:225922d15fe6
user:Vladimir Kotal
date:2008-08-06 23:39 +0200
modified:usr/src/cmd/ldap/ns_ldap/ldapaddent.c
usr/src/cmd/sendmail/db/config.h
usr/src/cmd/ssh/include/config.h
usr/src/cmd/ssh/include/openbsd-compat.h
usr/src/cmd/ssh/include/strsep.h
usr/src/cmd/ssh/libopenbsd-compat/Makefile.com
usr/src/cmd/ssh/libopenbsd-compat/common/llib-lopenbsd-compat
usr/src/cmd/ssh/libopenbsd-compat/common/strsep.c
usr/src/cmd/ssh/libssh/common/llib-lssh
usr/src/common/util/string.c
usr/src/head/string.h
usr/src/lib/libc/amd64/Makefile
usr/src/lib/libc/i386/Makefile.com
usr/src/lib/libc/port/gen/strsep.c
usr/src/lib/libc/port/llib-lc
usr/src/lib/libc/port/mapfile-vers
usr/src/lib/libc/sparc/Makefile
usr/src/lib/libc/sparcv9/Makefile
usr/src/lib/passwdutil/Makefile.com
usr/src/lib/passwdutil/bsd-strsep.c
usr/src/lib/passwdutil/passwdutil.h
usr/src/lib/smbsrv/libsmb/common/mapfile-vers
usr/src/lib/smbsrv/libsmb/common/smb_util.c
added:usr/src/lib/libc/port/gen/strsep.c
deleted:usr/src/cmd/ssh/include/strsep.h
usr/src/cmd/ssh/libopenbsd-compat/common/strsep.c
usr/src/lib/passwdutil/bsd-strsep.c
log:
PSARC 2008/305 strsep() in libc
4383867 need strsep() in libc
------------------------------------------------------------------------

I asked Richard Lowe (who has been very helpful with getting the transition process done) whether the next Mercurial version could have the newlines function already included and whether there could be an outgoing template similar to logtemplate in hgrc(5). In the meantime I will be using the above for my RTIs.
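To verify that the extension and the hook are wired up correctly, hg can print its effective configuration. This check is my addition, not from the original setup, and the exact output format may differ between Mercurial versions; it should look roughly like this:

$ hg showconfig extensions hooks
extensions.outproc=~/bin/Mercurial/outproc.py
hooks.pre-outgoing=python:outproc.outgoing_hook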


Solaris

Test suite for netcat

In the OpenSolaris world we very much care about correctness and hate regressions (of any kind). If I loosely paraphrase Bryan Cantrill, the degree of devotion should be obvious: "Have you tested your change in every way you know of? If not, do not go any further with the integration unless you do so." This implies that an ordinary bug fix should have a unit test accompanying it. But unit tests are cumbersome when performed by hand and do not mean much if they are not accumulated over time.

For the integration of Netcat into OpenSolaris I developed a number of unit tests (basically at least one for each command line option) and a couple more after spotting some bugs in nc(1). This means that nc(1) is ripe for having a test suite so the tests can be performed automatically. This is tracked by RFE 6646967. The test suite will live in the onnv-stc2 gate which is hosted and maintained by the OpenSolaris Testing community. To create a test suite one can choose between two frameworks: STF and CTI-TET. I have chosen the latter because I wanted to try something new and also because CTI-TET seems to be the recommended framework these days.

The work on the nc test suite started during the Christmas break of 2007 and, after recovery from lost data, it is now in a pretty stable state and ready for code review. This is actually somewhat exciting because the nc test suite is supposed to be the first OpenSolaris test suite developed in the open. A fresh webrev is always stored on cr.opensolaris.org in the nc-tet.onnv-stc2 directory. Everybody is invited to participate in the code review. Code review should be performed via the testing-discuss at opensolaris.org mailing list (subscribe via Testing / Discussions). It has a web interface in the form of the testing-discuss forum. So, if you're familiar with ksh scripting or the CTI-TET framework (both not necessary) you have a unique chance to bash (not bash) my code! Watch for the official code review announcement on the mailing list in the next couple of days.

Lastly, another piece of philosophical food for thought: test suites are sets of programs and scripts which serve mainly one purpose - they should prevent bugs from happening in the software they test. But test suites are software too. The presence of bugs in test suites is an annoying phenomenon. How do we get rid of that one?


Solaris

poll(2) and POLLHUP with pipes in Solaris

During nc(1) preintegration testing, a short time before it went back, I found that 'cat /etc/passwd | nc localhost 4444' produced an endless loop with 100% CPU utilization, looping in calls doing poll(2) (I still remember my laptop suddenly getting much warmer than it should be and the CPU fan cranking up). 'nc localhost 4444 < /etc/passwd' was not exhibiting that behavior. The cause was a difference between the poll(2) implementations on BSD and Solaris. Since I am working on Netcat in Solaris again (adding more features, stay tuned), it's time to take a look back and maybe even help people porting similar software from BSD to Solaris.

The issue appears because POLLHUP is set in the read events bitfield for stdin after the pipe is closed (or, to be more precise, after the producer/write end is done) on Solaris. poll.c (which resembles the readwrite() function from nc) illustrates the issue:

01  #include <stdio.h>
02  #include <poll.h>
03  #include <err.h>
04  #include <unistd.h>
05  #define LEN 1024
06  int main(void) {
07      int timeout = -1;
08      int n;
09      char buf[LEN];
10      int plen = LEN;
11
12      struct pollfd pfd;
13
14      pfd.fd = fileno(stdin);
15      pfd.events = POLLIN;
16
17      while (pfd.fd != -1) {
18          if ((n = poll(&pfd, 1, timeout)) < 0) {
19              err(1, "Polling Error");
20          }
21          fprintf(stderr, "revents = 0x%x [ %s %s ]\n",
22              pfd.revents,
23              pfd.revents & POLLIN ? "POLLIN" : "",
24              pfd.revents & POLLHUP ? "POLLHUP" : "");
25
26          if (pfd.revents & (POLLIN|POLLHUP)) {
27              if ((n = read(fileno(stdin), buf, plen)) < 0) {
28                  fprintf(stderr,
29                      "read() returned neg. val (%d)\n", n);
30                  return (1);
31              } else if (n == 0) {
32                  fprintf(stderr, "read() returned 0\n");
33                  pfd.fd = -1;
34                  pfd.events = 0;
35              } else {
36                  fprintf(stderr, "read: %d bytes\n", n);
37              }
38          }
39      }
40  }

Running it on NetBSD (chosen because my personal non-work mailbox is hosted on a machine running it) produces the following:

otaku[~]% ( od -N 512 -X -v /dev/zero | sed 's/ [ \t]*/ /g'; sleep 3 ) | ./poll
revents = 0x1 [ POLLIN  ]
read: 1024 bytes
revents = 0x1 [ POLLIN  ]
read: 392 bytes
revents = 0x11 [ POLLIN POLLHUP ]
read() returned 0

I had to post-process the output of od(1) (because of a difference between the output of od(1) on NetBSD and Solaris) and slow the execution down a bit (via sleep) in order to make things more visible (try to run the command without the sleep and the pipe will be closed too quickly). On OpenSolaris the same program produces a different pattern:

moose:~$ ( od -N 512 -X -v /dev/zero | sed 's/ [ \t]*/ /g' ; sleep 3 ) | ./poll
revents = 0x1 [ POLLIN  ]
read: 1024 bytes
revents = 0x1 [ POLLIN  ]
read: 392 bytes
revents = 0x10 [  POLLHUP ]
read() returned 0

So, the program is now obviously correct. Had the statement on line 26 checked only POLLIN, the command above (with or without the sleep) would go into an endless loop on Solaris:

revents = 0x11 [ POLLIN POLLHUP ]
read: 1024 bytes
revents = 0x11 [ POLLIN POLLHUP ]
read: 392 bytes
revents = 0x10 [  POLLHUP ]
revents = 0x10 [  POLLHUP ]
revents = 0x10 [  POLLHUP ]
...

Both OSes set POLLHUP after the pipe is closed.
The difference is that while BSD always indicates POLLIN (even if there is nothing to read), Solaris strips it after the data stream has ended. So, which one is correct? The poll() function as described by the OpenGroup says that "POLLHUP and POLLIN are not mutually exclusive". This means both implementations seem to conform to the IEEE Std 1003.1, 2004 Edition standard (part of POSIX) in this respect. However, the POSIX standard also says:

    In each pollfd structure, poll() shall clear the revents member, except
    that where the application requested a report on a condition by setting
    one of the bits of events listed above, poll() shall set the
    corresponding bit in revents if the requested condition is true. In
    addition, poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
    revents if the condition is true, even if the application did not set
    the corresponding bit in events.

This might still be OK, even though the POLLIN flag remains set in NetBSD's poll() after no data are available for reading (try to comment out lines 33,34 and run as above), because the standard says about the POLLIN flag:

    For STREAMS, this flag is set in revents even if the message is of zero
    length.

Without further reading it is hard to tell how exactly a POSIX compliant poll() should look. On the Austin group mailing list there was a thread about poll() behavior w.r.t. POLLHUP suggesting this is a fuzzy area. Anyway, to see where exactly POLLHUP is set for pipes in OpenSolaris, go to fifo_poll(). The function _sets_ the revents bit field to POLLHUP, so the POLLIN flag is wiped off after that. fifo_poll() is part of the fifofs kernel module which has been around in Solaris since the late eighties (I was still in elementary school the year fifovnops.c appeared in the SunOS code base :)). NetBSD has fifofs too, but the POLLHUP flag gets set via a bit logic operation in pipe_poll() which is part of the syscall processing code. The difference between the OpenSolaris and NetBSD (whoa, the NetBSD project uses OpenGrok!) POLLHUP attitudes (respectively) is now clear:


Solaris

ZFS is going to save my laptop data next time

The flashback is still alive even weeks after: the day before my presentation at the FIRST Technical Colloquium in Prague I brought my 2 year old laptop with the work-in-progress slides to the office. Since I wanted to finish the slides in the evening, a live-upgrade process was fired off on the laptop to get a fresh Nevada version (of course, to show off during the presentation ;)). LU is a very I/O intensive process and the red Ferrari notebooks tend to get _very_ hot. In the afternoon I noticed that the process failed. To my astonishment, the I/O operations started to fail. After a couple of reboots (and zpool status / fmadm faulty commands) it was obvious that the disk could not be trusted anymore. I was able to rescue some data from the ZFS pool which was spanning the biggest slice of the internal disk, but not all data. (ZFS is not willing to hand out corrupted data.) My slides were lost, as well as other data.

After some time I stumbled upon James Gosling's blog entry about ZFS mirroring on a laptop. This got me started (or, more precisely, I was astonished and wondered how it was possible that this idea had escaped me, because at that time ZFS had been in Nevada for a long time) and I discovered several similar and more in-depth blog entries about the topic. After some experiments with a borrowed USB disk it was time to make it reality on a new laptop.

The process was a multi-step one. First I had to extend the free slice #7 on the internal disk so it spans the remaining space on the disk, because it was trimmed after the experiments. In the end the slices look like this in format(1) output:

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       3 -  1277        9.77GB    (1275/0/0)   20482875
  1 unassigned    wm    1278 -  2552        9.77GB    (1275/0/0)   20482875
  2     backup    wm       0 - 19442      148.94GB    (19443/0/0) 312351795
  3       swap    wu    2553 -  3124        4.38GB    (572/0/0)     9189180
  4 unassigned    wu       0                0         (0/0/0)             0
  5 unassigned    wu       0                0         (0/0/0)             0
  6 unassigned    wu       0                0         (0/0/0)             0
  7       home    wm    3125 - 19442      125.00GB    (16318/0/0) 262148670
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 alternates    wu       1 -     2       15.69MB    (2/0/0)         32130

Then the USB drive was connected to the system and recognized via format(1):

AVAILABLE DISK SELECTIONS:
       0. c0d0
          /pci@0,0/pci-ide@12/ide@0/cmdk@0,0
       1. c5t0d0
          /pci@0,0/pci1025,10a@13,2/storage@4/disk@0,0

The live upgrade boot environment which was not active was deleted via ludelete(1M) and the slice was commented out in /etc/vfstab. This was needed to make zpool(1M) happy. A ZFS pool was created out of the slice on the internal disk (c0d0s7) and the external USB disk (c5t0d0).
I had to force it because zpool(1M) complained about the overlap of c0d0s2 (the slice spanning the whole disk) and c0d0s7:

# zpool create -f data mirror c0d0s7 c5t0d0

For a while I struggled with finding a name for the pool (everybody seems either to stick to the 'tank' name or come up with some double-cool-stylish name, which I wanted to avoid because of the likely degradation of the excitement from that name) but then chose the ordinary data (it's what it is, after all). I verified that it is possible to disconnect the USB disk and safely connect it while an I/O operation is in progress:

root:moose:/data# mkfile 10g /data/test &
[1] 10933
root:moose:/data# zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d0s7  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0

errors: No known data errors

It survived it without a hitch (okay, I had to wait for the zpool command to complete a little bit longer due to the still ongoing I/O, but that was it) and resynced the contents automatically after the USB disk was reconnected:

root:moose:/data# zpool status
  pool: data
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: resilver in progress for 0h0m, 3.22% done, 0h5m to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c0d0s7  ONLINE       0     0     0
            c5t0d0  FAULTED      0     0     0  too many errors

errors: No known data errors

Also, with heavy I/O it is needed to mark the zpool as clear after the resilver completes, via zpool clear data, because the USB drive is marked as faulted. Normally this will not happen (unless the drive really failed) because I will be connecting and disconnecting the drive only when powering on or shutting down the laptop, respectively.

After that I used Mark Shellenbaum's blog entry about ZFS delegated administration (it was Mark who did the integration) and the ZFS Delegated Administration chapter from the OpenSolaris ZFS Administration Guide, created a permission set for my local user, and assigned those permissions to the ZFS pool 'data' and the user:

# chmod A+user:vk:add_subdirectory:fd:allow /data
# zfs allow -s @major_perms clone,create,destroy,mount,snapshot data
# zfs allow -s @major_perms send,receive,share,rename,rollback,promote data
# zfs allow -s @major_props copies,compression,quota,reservation data
# zfs allow -s @major_props snapdir,sharenfs data
# zfs allow vk @major_perms,@major_props data

All of the commands had to be done under root. Now the user is able to create a home directory for himself:

$ zfs create data/vk

Time to set up the environment of data sets and prepare it for data. I have separated the data sets according to a 'service level'. Some data are very important to me (e.g. presentations ;)) so I want them multiplied via the ditto blocks mechanism, so they are actually present 4 times when the copies dataset property is set to 2. Also, documents are not usually accompanied by executable code, so the exec property was set to off, which will prevent running scripts or programs from that dataset. Some data are volatile and in high quantity, so they do not need any additional protection and it is a good idea to compress them with a better compression algorithm to save some space.
The following table summarizes the setup:

 dataset             properties                     comment
+-------------------+------------------------------+------------------------+
| data/vk/Documents | copies=2                     | presentations          |
| data/vk/Saved     | compression=on exec=off      | stuff from web         |
| data/vk/DVDs      | compression=gzip-8 exec=off  | Nevada ISOs for LU     |
| data/vk/CRs       | compression=on copies=2      | precious source code ! |
+-------------------+------------------------------+------------------------+

So the commands will be (the full set is sketched at the end of this entry):

$ zfs create -o copies=2 data/vk/Documents
$ zfs create -o compression=gzip-3 -o exec=off data/vk/Saved
...

Now it is possible to migrate all the data, change the home directory of the user to /data/vk (e.g. via /usr/ucb/vipw) and relogin. However, this is not the end of it, just the beginning. There are many things to make the setup even better, to name a few:

- Set up some sort of automatic snapshots for selected datasets. The set of scripts and SMF service for doing ZFS snapshots and backup (see ZFS Automatic For The People and related blog entries) made by Tim Foster could be used for this task.
- Make zpool scrub run periodically.
- Detect failures of the disks. This would be ideal to see in the Gnome panel or Gnome system monitor.
- Set up off-site backup via SunSSH + zfs send. This could be done using the hooks provided by Tim's scripts (see above).
- Set quotas and reservations for some of the datasets.
- Install ZFS scripts for Gnome nautilus so I will be able to browse, perform and destroy snapshots in nautilus. Now which set of scripts to use? Chris Gerhard's or Tim Foster's? Or should I just wait for the official ZFS support for nautilus to be integrated?
- Find out how exactly the recovery scenario (in case of laptop drive failure) will look. Importing the ZFS pool from the USB disk should suffice, but during my experiments I was not able to complete it successfully.

With all the above, the data should be safe from disk failure (after all, disks are often called "spinning rust", so they are going to fail sooner or later) and also from the event of loss of both laptop and USB disk.

Lastly, a philosophical thought: one of my colleagues considers hardware to be a necessary (and very faulty) layer which is only needed to make it possible to express ideas in software. This might seem extreme, but come to think of it... ZFS is special in this sense - being software which provides that bridge, its core idea is to isolate the hardware faults.
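For completeness, the remaining datasets can be created the same way; a sketch of the full command set, with the property values taken straight from the table above:

$ zfs create -o copies=2 data/vk/Documents
$ zfs create -o compression=on -o exec=off data/vk/Saved
$ zfs create -o compression=gzip-8 -o exec=off data/vk/DVDs
$ zfs create -o compression=on -o copies=2 data/vk/CRs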


Personal

FIRST Technical Colloquium in Prague

Two weeks ago (yeah, I am a slacker) the FIRST technical colloquium was held in Prague and we (me and Sasha) were given the opportunity to attend (the fact that Derrick serves as FIRST chair in the steering committee has of course something to do with it). I only attended one day of the technical colloquium (Tuesday the 29th). The day was filled with various talks and presentations. Most of them were given by members of various CERT teams from around the world. This was because this event was a joint meeting of FIRST and TF-CSIRT. It was definitely interesting to see very different approaches to the shared problem set (dealing with incidents, setting up honey pots, building forensic analysis labs, etc.). Not only did these differences stem from the sizes of the networks and organizations, but also (and that was kind of funny) from nationalities.

In the morning I talked about the integration of Netcat into Solaris, describing the process, current features and planned enhancements and extensions. The most anticipated talk was by Adam Laurie, an entertaining guy involved in many hacker-like activities (see e.g. the "A hacker games the hotel" article by Wired) directed at proving insecurities in many publicly used systems. Adam (brother of Ben Laurie, author of Apache-SSL and OpenSSL contributor) first started with an intro about satellite scanning and insecure hotel safes (with backdoors installed by the manufacturers which can be overcome by a screwdriver). Then he proceeded to talk about RFID chips, mainly about cloning. Also, at the "social event" in the evening I had the pleasure to share a table with Ken van Wyk, who is an overall cool fellow and the author of the Secure Coding and Incident Response books from O'Reilly. All in all, it was interesting to see so many security types in one room and get to know some of them.


Solaris

Grepping dtrace logs

I have been working on a tough bug for some non-trivial time. The bug is a combination of a race condition and data consistency issues. To debug this I am using a multi-threaded apache process and dtrace heavily. The logs produced by dtrace are huge and contain mostly dumps of internal data structures. An excerpt from such a log looks e.g. like this:

  1  50244  pk11_return_session:return  128  8155.698
  1  50223  pk11_RSA_verify:entry       128  8155.7069
  1  50224  pk11_get_session:entry      128  8155.7199
  1  50234  pk11_get_session:return     128  8155.7266
PK11_SESSION:
  pid = 1802
  session handle = 0x00273998
  rsa_pub_key handle -> 0x00184e70
  rsa_priv_key handle -> 0x00274428
  rsa_pub = 0x00186248
  rsa_priv = 0x001865f8
  1  50224  pk11_get_session:entry      128  8155.7199
  1  50244  pk11_return_session:return  128  8155.698

Side note: this is a post-processed log (probes together with their bodies are timestamp sorted, time stamps are converted to milliseconds - see the Coloring dtrace output entry for details and scripts).

Increasingly, I was asking myself questions which resembled this one: "when was function foo() called with data which contained value bar?" This quickly led to a script which does the tedious job for me. dtrace-grep.pl accepts 2 or 3 parameters. The first two are a probe pattern and a data pattern, respectively. The third, which is optional, is the input file (if not supplied, stdin will be used). Example use on the above pasted file looks like this:

~/bin$ ./dtrace-grep.pl pk11_get_session 0x00186248 /tmp/test
  1  50234  pk11_get_session:return     128  8155.7266
PK11_SESSION:
  pid = 1802
  session handle = 0x00273998
  rsa_pub_key handle -> 0x00184e70
  rsa_priv_key handle -> 0x00274428
  rsa_pub = 0x00186248
  rsa_priv = 0x001865f8
~/bin$

Now I have to get back to work, to do some more pattern matching. Oh, and the script is here.
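Since the third parameter is optional, the same search can be done with the log arriving on stdin, which is handy at the end of a post-processing pipeline; a trivial variation of the example above:

$ cat /tmp/test | ./dtrace-grep.pl pk11_get_session 0x00186248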


Personal

Adding dtrace SDT probes

It seems that many developers and dtrace users have found themselves in a position where they wanted to add some SDT probes to a module to get more insight into what's going on, but then had to pause, thinking "okay, more probes. But where to put them? Do I really need the additional probes when I already have the fbt ones?" To do this, a systematic approach is needed in order not to over-do or under-do it.

I will use KSSL (the Solaris kernel SSL proxy [1]) for illustration. With CR 6556447, tens of SDT probes were introduced into the KSSL module and other modules which interface with it. Also, in addition to the new SDT probes, in KSSL we got rid of the KSSL_DEBUG macros compiled only in DEBUG kernels and substituted them with SDT probes. As a result, much better observability and error detection was achieved with both debug and non-debug kernels. The other option would be to create a KSSL dtrace provider, but that would be too big a gun for what is needed to achieve.

Generically, the following interesting data points for data gathering/observation can be identified in code:

data paths - when there is more than one path by which data can flow into a subsystem. E.g. for TCP we have a couple of cases where SSL data could reach the KSSL input queue. To identify where exactly tcp_kssl_input() was called from, we use SDT probes:

if (tcp->tcp_listener || tcp->tcp_hard_binding) {
...
	if (tcp->tcp_kssl_pending) {
		DTRACE_PROBE1(kssl_mblk__ksslinput_pending,
		    mblk_t *, mp);
		tcp_kssl_input(tcp, mp);
	} else {
		tcp_rcv_enqueue(tcp, mp, seg_len);
	}
} else {
...
	/* Does this need SSL processing first? */
	if ((tcp->tcp_kssl_ctx != NULL) && (DB_TYPE(mp) == M_DATA)) {
		DTRACE_PROBE1(kssl_mblk__ksslinput_data1,
		    mblk_t *, mp);
		tcp_kssl_input(tcp, mp);
	} else {
		putnext(tcp->tcp_rq, mp);
		if (!canputnext(tcp->tcp_rq))
			tcp->tcp_rwnd -= seg_len;
	}
...

data processed in while/for cycles - to observe what happens in each iteration of the cycle. Can be used in code like this:

while (mp != NULL) {
	DTRACE_PROBE1(kssl_mblk__handle_record_cycle, mblk_t *, mp);
	/* process the data */
	...
	mp = mp->b_cont;
}

switch statements - if significant/non-trivial processing happens inside a switch, it may be useful to add SDT probes there too. E.g.:

content_type = (SSL3ContentType)mp->b_rptr[0];
switch (content_type) {
/* select processing according to type */
case content_alert:
	DTRACE_PROBE1(kssl_mblk__content_alert, mblk_t *, mp);
	...
	break;
case content_change_cipher_spec:
	DTRACE_PROBE1(kssl_mblk__change_cipher_spec, mblk_t *, mp);
	...
	break;
default:
	DTRACE_PROBE1(kssl_mblk__unexpected_msg, mblk_t *, mp);
	break;
}

labels which cannot be (easily) identified in another way - useful if the code which follows the label is generic (assignments, no function calls), e.g.:

	/*
	 * Give this session a chance to fall back to
	 * userland SSL
	 */
	if (ctxmp == NULL)
		goto no_can_do;
...
no_can_do:
	DTRACE_PROBE1(kssl_no_can_do, tcp_t *, tcp);
	listener = tcp->tcp_listener;
	ind_mp = tcp->tcp_conn.tcp_eager_conn_ind;
	ASSERT(ind_mp != NULL);

You've surely noticed that some of the probe definitions above have a common prefix (kssl_mblk-). This is one of the things which make SDT probes so attractive. With prefixes it is possible to do e.g. the following:

sdt:::kssl_err-*
{
	trace(timestamp);
	printf("hit error in %s\n", probefunc);
	stack();
	ustack();
}

The important part is that we do not specify a module or function name.
The implicit wildcard (funcname/probename left out) combined with the explicit wildcard (prefix + asterisk) will lead to all KSSL error probes being activated, regardless of which module or function they are defined in. This is obviously very useful for technologies which span multiple Solaris subsystems or modules (such as KSSL).

The nice thing about the error probes is that they could be leveraged in test suites. For each test case we can first run a dtrace script with the above probeset covering all KSSL errors in the background, and after the test completes just check whether it produced some data. If it did, then the test case can be considered failed. No need to check kstat(1M) (and other counters), log files, etc.

Also, thanks to the way dtrace probes are activated, we can have both a generic probeset (using this term for lack of a better one) as above and a probe specific action, e.g.:

/* probeset of all KSSL error probes */
sdt:::kssl_err-*
{
	trace(timestamp);
	printf("hit error in %s\n", probefunc);
}

/*
 * the probe definition is:
 * DTRACE_PROBE2(kssl_err__bad_record_size,
 *     uint16_t, rec_sz, int, spec->cipher_bsize);
 */
sdt:kssl:kssl_handle_record:kssl_err-bad_record_size
{
	trace(timestamp);
	tracemem(arg0, 32);
	printf("rec_sz = %d , cipher_bsize = %d\n", arg1, arg2);
}

If the kssl_err-bad_record_size probe gets activated, the generic probe will be activated (and fires) too, because the probeset contains the probe.

Similarly to the error prefix, we can have a data specific prefix. For KSSL it is the kssl_mblk- prefix, which could be used for tracing all mblks (msgb(9S)) as they flow through the TCP/IP, STREAMS and KSSL modules. With such probes it is possible to do e.g. the following:

/* how many bytes from a mblk to dump */
#define DUMP_SIZE	48

/* use macros from <sys/strsun.h> */
#define MBLKL(mp)	((mp)->b_wptr - (mp)->b_rptr)
#define DB_FLAGS(mp)	((mp)->b_datap->db_flags)

#define PRINT_MBLK_INFO(mp)					\
	printf("mblk = 0x%p\n", mp);				\
	printf("mblk size = %d\n", MBLKL((mblk_t *)mp));	\
	PRINT_MBLK_PTRS(mp);

#define PRINT_MBLK(mp)						\
	trace(timestamp);					\
	printf("\n");						\
	PRINT_MBLK_INFO(mp);					\
	printf("DB_FLAGS = 0x%x", DB_FLAGS((mblk_t *)mp));	\
	tracemem(((mblk_t *)mp)->b_rptr, DUMP_SIZE);		\
	tracemem(((mblk_t *)mp)->b_wptr - DUMP_SIZE,		\
	    DUMP_SIZE);

sdt:::kssl_mblk-*
{
	trace(timestamp);
	printf("\n");
	PRINT_MBLK(arg0)
}

This is actually an excerpt from my (currently internal) KSSL debugging suite. An example of output from such a probe can be seen in my Coloring dtrace output post.

For more complex projects it would be a waste to stop here. Prefixes could be further structured. However, this has some drawbacks. In particular, I was thinking about having kssl_mblk- and kssl_err- prefixes. Now what to do for places where an error condition occurred _and_ we would like to see the associated mblk? Using something like kssl_mblk_err-* comes to one's mind. However, there is a problem with that - what about the singleton cases (only mblk, only err)? Sure, using multiple wildcards in dtrace is possible (e.g. syscall::*read*:) but this will make it ugly and complicated given the number of mblk+err cases (it's probably safe to assume that the number of such cases will be low). Simply, it's not worth the hassle. Rather, I went with 2 probes. To conclude, using structured prefixes is highly beneficial only for a set of probes where categories/sub-prefixes create non-intersecting sets (e.g.
data type and debug level). Of course, all of the above is valid not only for the kernel but also for custom userland probes!

[1] A high-level description of KSSL can be found in blueprint 819-5782.
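One practical note: once probes like these are integrated, dtrace(1M) can list everything matching a prefix without enabling anything, which is a cheap sanity check that the probes actually landed where intended. Illustrative invocations (the resulting probe list will of course depend on the module):

# dtrace -l -n 'sdt:::kssl_mblk-*'
# dtrace -l -n 'sdt:::kssl_err-*'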


Solaris

Netcat in Solaris

CR 4664622 has been integrated into Nevada and will be part of build 80 (which means it will not be part of the next SXDE release, but I can live with that :)). During the course of getting the process done I stumbled upon several interesting obstacles. For example, during the ingress Open Source Review I was asked by our open-source caretaker what the "support model" for Netcat will be once it is integrated. I was puzzled. Because, for Netcat, support is not really needed since it has been around for ages (ok, since 1996 according to Wikipedia) and is a pretty stable piece of code which is basically no longer developed. Nonetheless, this brings some interesting challenges with the move to a community model where more and more projects are integrated by people outside Sun (e.g. the ksh93 project).

The nc(1) man page will be delivered in build 81. In the meantime you can read the Netcat review blog entry which contains the link to the updated man page. The older version of the man page is contained in the mail communication for PSARC 2007/389. Note: I have realized that the ARC case directory does not have to include the most up-to-date man page at the time of integration. Only when something _architectural_ changes does the man page have to be updated (which was not the case with Netcat, since we only added a new section describing how to set up nc(1) with RBAC). Thanks to Jim for the explanation.

I have some ideas how to make Netcat in Solaris even better and will work to get them done over time. In particular, there are the following RFEs: 6515928, 6573202. However, this does not mean that there is only a single person who can work on nc(1). Since it is now part of ONNV, anyone is free to hack it. So, I'd like to invite everyone to participate - if you have an idea how to extend Netcat or what features to add, it is sitting in ONNV waiting for new RFEs (or bugs) and sponsor requests (be sure to read Jayakara Kini's explanation of how to contribute if you're not an OpenSolaris contributor yet). Also, if you're a Netcat user and use Netcat in a cool way, I want to hear about it!


Personal

Chemical plants, Linux and dtrace

When my friend from Usti nad Labem (the 9th biggest city in the Czech Republic) asked me to present about OpenSolaris at a local Linux conference, I got hooked. First, Usti is an interesting city (it is surrounded by beautiful countryside, yet has chemical plants in the vicinity of the city center) and I hadn't been there for a long time, and second, having the opportunity to present about OpenSolaris technologies to Linux folks is unique. When we (Sasha agreed to go present with me) arrived in Usti we were greeted by slightly apocalyptic weather (see pictures below). The environment where all the presentations took place compensated for that, fortunately.

40 minutes to present about something which most people in the room are not very aware of is challenging. The fact that OpenSolaris is open source and a home for several disruptive technologies makes that both easier and harder. We greatly leveraged Bryan Cantrill's Dtrace review video, taped at Google, for the second part of the presentation where we demo'ed dtrace. I have even borrowed some of his quotes. I am pretty sure he wouldn't object since his presentations were perused in the past. To make the list of attributions complete, we took substantial material for the first part from Lukas' past presentations about the OpenSolaris project.

It's much better to demo the technology than just talk about how great it is. (I remember a funny moment in Bryan's presentation where a dtrace probe didn't "want" to fire, and Bryan jokingly said "bad demo!" to the screen. I nearly fell off my chair at that moment.) "So, I have finally seen dtrace in action!" was one of the great things to hear after the presentation. The "OpenSolaris technologies" presentation can be downloaded here.


Personal

Netcat package and code review

As you might know, a Netcat implementation is going to be part of OpenSolaris. The initial Netcat integration is based on a reimplementation from OpenBSD (here's why). As Jeff Bonwick said, open sourced code is nothing compared to the fact that all design discussions and decisions suddenly happen in the public (loosely paraphrased). This is a great wave to ride and I jumped on it when it was not really small, so I at least posted the webrev pointer for the initial Netcat integration (CR 4664622) to the opensolaris-code mailing list (which is roughly the equivalent of freebsd-hackers, openbsd-tech or similar mailing lists) to get some code review comments.

Since then a couple of things have changed. Thanks to Dan Price and others it's now possible to upload webrevs to cr.opensolaris.org. I have started using the service, so the new and official place for the Netcat webrev is cr.opensolaris.org/~vkotal/netcat-webrev/. The webrev has moved location, but what I said in the opensolaris-code post still holds true: any constructive notes regarding the code are welcome. (I am mainly looking for functional/logic errors, packaging flaws or parts of code which could mismatch the PSARC case.)

The following things could help any potential code reviewer:

- A summary of the changes done to the original implementation in order to adapt it to the Solaris environment.
- The PSARC 2007/389 case covering interfaces delivered by this project. For more information about ARCs see the Architecture Process and Tools community pages.
- The SUNWnetcat package (x86) which contains the /usr/bin/nc binary.
- A webrev of the differences between my version of Netcat and the one which is currently in OpenBSD. Only the *.[ch] files matter, of course. (This is a very easy thing to do with a distributed SCM since it only requires one to reparent and regenerate the webrev against the new parent workspace.)
- The updated manual page. This is slightly different from the man page in the PSARC materials because it contains a new section about using nc with privileges and an associated set of examples in the EXAMPLES section. The man page in the PSARC materials will not be updated because, after a case is approved, the man page is updated only in case some architectural changes were needed. In the case of privileges, it is only an addition describing specific usage, no architectural changes.

The conclusion for non code reviewers? I hope it is clear that in (Open)Solaris land we value quality and transparency. Peer reviews and architectural reviews are just (albeit crucial) pieces which help to achieve that goal.
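For reviewers who want to poke at the binary itself rather than just read the diffs, the package from the materials above can be dropped onto a scratch box with the standard SVR4 packaging tools; a minimal sketch, assuming the SUNWnetcat package sits in the current directory:

# pkgadd -d . SUNWnetcat
# /usr/bin/nc -h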


Solaris

Getting code into libc

In my previous entry about the BSD compatibility gap closure process I promised to provide a guide on how to get new code into libc. I will use the changes done via CR 6495220 to illustrate the process with examples. The process related and technical changes which are usually needed:

- Get the PSARC case done.
- File a CR to create a manual page according to the man page draft supplied with the PSARC case. You will probably need to go through the functions being added and assign them an MT-Level according to the attributes(5) man page (if this was not done prior to filing the PSARC case).
- Actually add the code into libc. This includes moving/introducing files from the SCM point of view and doing the necessary changes to the Makefiles. In terms of symbols, the functions need to be delivered twice: once as an underscored (strong) symbol and once as a WEAK alias to the strong symbol. This allows libraries to use their own private implementation of the functions. (This is because the weak symbol is silently overridden by the private symbol in the runtime linker.)
- Add entries to c_synonyms.h and synonyms.h. synonyms.h is used in libc for symbol alias construction (see above). c_synonyms.h provides access to underscored symbols for other (non-libc) libraries. This provides a way to call the underscored symbols directly without risking namespace clashes/pollution. This step is actually to be used in conjunction with the previous step. nm(1) can be used to check this worked as expected:

$ nm -x /usr/lib/libc.so.1 | grep '|_\?err$'
[5783]  |0x00049c40|0x00000030|FUNC |GLOB |0  |13 |_err
[6552]  |0x00049c40|0x00000030|FUNC |WEAK |0  |13 |err

- Do the necessary packaging changes. If you're adding a new header file, change SUNWhea's prototype_* files (most probably just prototype_com). If the file was previously installed into the proto area during build, it needs to be removed from the exception files (for i386 and sparc).
- Modify the lib's mapfile. This is needed for the symbols to become visible and versioned. Public symbols belong to the latest SUNW section. After you have compiled the changes you can check this via a command similar to the following:

$ pvs -dsv -N SUNW_1.23 /usr/lib/libc.so.1 \
    | sed -n '1,/SUNW.*:/p' | egrep '((v?errx?)|(v?warnx?));'
        vwarnx;
        ...

If you're adding private (underscored) symbols, do not forget to add them to the SUNWprivate section. This is usually the case because the strong symbols are accompanied by weak symbols. Weak symbols go to the global part of the most recent SUNW section and strong symbols go to the global part of the SUNWprivate section.

- Update libc's lint library. If you are adding private symbols then add them as well. See the entries _vwarnfp et al. for an example. After you're done it's time to run nightly with lint checks and fix the noise. (see below)
- Add per-symbol filters. If you are moving stuff from a library to libc, you will probably want to preserve the existing interfaces. To accomplish this, per-symbol filters can be added to the library you're moving from. So, if symbol foo is moved from libbar to libc then change the line in the global section of libbar's mapfile to look like this:

foo = FUNCTION FILTER libc.so.1;

This was done with the *fp functions in libipsecutils' mapfile.
The specialty in that case was that the *fp functions were renamed to underscored variants while moving them, via redefines in errfp.h.

- Fix build/lint noise introduced by the changes. There could be the following kinds of noise:

build noise: can be caused by a symbol type clash (there is a symbol of the same name defined in libc as FUNC and in $SRC/cmd as OBJT), which is not harmful because ld(1) will do due diligence and prefer the non-libc symbol. This can be fixed by renaming the local symbol. There could also be a #define clash caused by inclusion of c_synonyms.h. Fixed via renaming as well.

lint noise: in the 3rd pass of the lint checks an inconsistency in function declarations can be found, such as this:

"/builds/.../usr/include/err.h", line 43: warning: function argument declared
inconsistently: warn(arg 1) in utils.c(62) char * and llib-lc:err.h(43)
const char * (E_INCONS_ARG_DECL2)

The problem with this output is that there are cca 23 files named utils.c in ONNV. CR 6585621 is waiting for someone to provide a remedy for that via adding the -F flag to LINTFLAGS in $SRC/lib and $SRC/cmd. After the right file(s) are found, the fix is usually renaming again. Where the renaming is not possible, -erroff=E_FUNC_DECL_VAR_ARG2 can be passed to lint(1).

- Make sure there are no duplicate symbols in libc after the changes. This is necessary because it might confuse debugging tools (mdb, dtrace). For the err/warn stuff there was one such occurrence:

[6583]  |   301316|      37|FUNC |GLOB |0  |13 |_warnx
[1925]  |   320000|      96|FUNC |LOCL |0  |13 |_warnx

This can usually be solved by renaming the local variant.

- Test thoroughly.
  - Test with different compilers. SunStudio does different things than gcc, so it is a good idea to test the code with both.
  - Try to compile different consolidations (e.g. Companion CD, SFW) on top of the changes. For the err/warn project a bug was filed to get the RPM build fixed.
  - Test if the WEAK symbols actually work.
  - Test the programs in ONNV affected by the changes, e.g. the programs which needed to be modified because of the build/lint noise.
- Produce an annotated putback list explaining the changes. This is handy for a CRT advocate and saves time.
- If the change requires some sort of action from fellow gatelings, send a heads-up, e.g. like the heads-up for err/warn.
- If you are actually adding code to libc (this includes moving code from other libraries to libc), send an e-mail similar to the heads-up e-mail to the opensolaris-code mailing list, e.g. like this message about err/warn.
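Put together, two quick checks catch most symbol and mapfile mistakes after a build: one that the private strong symbol is versioned under the SUNWprivate section, and one that the public name really is a WEAK alias. A sketch using the err/warn symbols from above (SUNWprivate_1.1 is used here as the private version name for illustration; check the actual mapfile for the current one):

$ pvs -dsv -N SUNWprivate_1.1 /usr/lib/libc.so.1 | grep '_warnx'
$ nm -x /usr/lib/libc.so.1 | awk -F'|' '$5 ~ /WEAK/ && $8 ~ /warnx/'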


Solaris

Closing BSD compatibility gap

I do not get to meet customers very often, but I clearly remember the last time I participated in a pseudo-technical session with a new customer. The engineers were keen on learning details about all the features which make Solaris outstanding, but they were also bewildered by the lack of common functions such as daemon() (see CR 4471189). Yes, there are a number of private implementations in various places in ONNV; however, this is not very useful. Until recently, this was also the case for the err/warn function family. With the putback of CR 6495220 there are now the following functions living in libc:

void err(int, const char *, ...);
void verr(int, const char *, va_list);
void errx(int, const char *, ...);
void verrx(int, const char *, va_list);
void warn(const char *, ...);
void vwarn(const char *, va_list);
void warnx(const char *, ...);
void vwarnx(const char *, va_list);

These functions have been present in BSD systems for a long time (they've been in FreeBSD since 1994). The configure scripts of various pieces of software contain checks for the presence and functionality of the err/warn functions in libc (setting the HAVE_ERR_H define). For Solaris, those checks have now become enabled too. The err(3C) man page covering these functions will be delivered in the same build as the changes, that is, build 72. The change is covered by the PSARC 2006/662 architectural review and the stability level for all the functions is Committed (see Architecture Process and Tools for more details on how this works). Unfortunately, the case is still closed. Hopefully it will be opened soon.

Update 09-28-2007: the PSARC/2006/662 case is now open, including the onepager document and e-mail discussion. Thanks to John Plocher, Darren Reed and Bill Sommerfeld.

As I prompted in the err/warn heads-up, there is now time to stop creating new private implementations and to look at purging duplicates (there are many of them; however, not all can be gotten rid of in favour of err/warn from libc). I will write about how to get code into libc in general from a more detailed perspective next time.

However, this does not mean there is nothing left to do in this area. Specifically, FreeBSD's err(3) contains the functions err_set_exit(), err_set_file() and errc(), warnc(), verrc() and vwarnc(). These functions could be ported over too. Also, there is __progname or getprogname(3). Moreover, the careful (code) reader has realized that err.c contains private implementations of functions with the fp suffix. This function (sub)family could be made Committed too. So, there is still a lot of work which could be done. Anyone is free to work on any of these. (See Jayakara Kini's blog entry on how to become an OpenSolaris contributor if you're not already.)
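The kind of check those configure scripts run can be reproduced by hand in a few lines; a sketch (the file name is mine, and the message is just an arbitrary string - warnx prefixes it with the program name on stderr):

$ cat > errtest.c <<'EOF'
#include <err.h>

int
main(void)
{
	warnx("warnx() from libc works");
	return (0);
}
EOF
$ cc -o errtest errtest.c && ./errtest
errtest: warnx() from libc works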


Solaris

Coloring dtrace output

Currently I am working on a subtle bug in KSSL (the SSL kernel proxy). In order to diagnose the root cause of the bug I use a set of dtrace scripts to gather data in various probes. One of the dtrace scripts I am using looks like this:

#!/usr/sbin/dtrace -Cs

/*
 * trace mblk which caused activation of the
 * kssl-i-kssl_handle_any_record_recszerr probe
 *
 * NOTE: this presumes that every mblk seen by TCP/KSSL is
 * reasonably sized (cca 100 bytes)
 */

/* how many bytes from a mblk to dump */
#define DUMP_SIZE	48

/* generic kssl mblk dump probe (covers both input path and special cases) */
sdt:::kssl_mblk-*
{
	trace(timestamp);
	printf("\nmblk size = %d",
	    ((mblk_t *)arg0)->b_wptr - ((mblk_t *)arg0)->b_rptr);
	tracemem(((mblk_t *)arg0)->b_rptr, DUMP_SIZE);
	tracemem(((mblk_t *)arg0)->b_wptr - DUMP_SIZE, DUMP_SIZE);
}

The scripts usually collect big chunks of data from the various SDT probes I have put into the kssl kernel module. After the data are collected I usually spend big chunks of time sifting through it. At one point I got a suspicion that the problem is actually a race condition of sorts. In order to shed some light on what's going on I used less(1), which provides highlighting of data when searching. While this is sufficient when searching for a single pattern, it does not scale when more patterns are used. This is when I got the idea to color the output from the dtrace scripts with a simple Perl script, to see the correlations (or lack thereof) between the events. Example of the output colored by the script:

[screenshot: colored dtrace output]

This looks slightly more useful than plain black output in a terminal, but even with a 19" display the big picture is missing. So, I changed the dtrace-coloring script to be able to strip the data parts for probes and print just the headers:

[screenshot: colored headers-only output]

This is done via the '-n' command line option (the default is to print everything). The output containing just the colored headers is especially nice for tracking down race conditions and other time-sensitive misbehaviors. You can download the script for dtrace log coloring here: dtrace-coloring.pl

The colors can be assigned to regular expressions in the hash 'coloring' in the script itself. For the example above I have used the following assignments:

my %coloring = (
    '.*kssl_mblk-i-handle_record_cycle.*' => '4b6983',  # dark blue
    '.*kssl_mblk-enqueue_mp.*'            => '6b7f0d',  # dark green
    '.*kssl_mblk-getnextrecord_retmp.*'   => 'a11c10',  # dark red
);

In the outputs above you might have noticed that a timestamp is printed when a probe fires. This is useful for pre-processing of the log file. dtrace(1) (or libdtrace, to be more precise) does not sort events as they come from the kernel (see CR 6496550 for more details). In cases when hunting down a race condition on a multiprocessor machine, having the output sorted is crucial. So, in order to get a consistent image suitable for race condition investigation, a sort script is needed. You might use a crude script of mine or you can write yours :)
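For viewing, the ANSI escape sequences the coloring script emits need a pager that passes them through; an invocation might look like this (the log file name is made up, and less needs -R to print raw control characters):

$ ./dtrace-coloring.pl kssl.log | less -R
$ ./dtrace-coloring.pl -n kssl.log | less -R    # headers only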


Solaris

Sound Blaster Live! on Solaris

During the stressful days of the libmd backport to Solaris 10 update 4 I managed to break my iPod mini by sliding on my chair backwards and running to get my next coffee cup. (The catch was that I still had my headphones connected to the iPod, so the iPod fell down on the floor. It now only reports the unhappy face of Macintosh.) Only after that did I realize how important music is for my day-to-day tasks. I cannot simply continue at the same pace as before without a steady flow of rhythm into my ears.

Before buying another iPod I wanted to get some interim relief. This is when I entered the not-so-happy world of Solaris audio drivers. My workstation is an Acer Aspire E300 which came with an integrated TV card and whatnot, but also with an integrated sound card. The sound card is supported in Solaris, but the level of noise coming from it was unbearable (and I am no HiFi snob, listening mostly to 32kbps radio Akropolis streams). After some googling I realized that there is a driver for the Sound Blaster Live! 5.1 PCI card which I had in my Solaris test workstation at home. The driver was ported by Jürgen Keil (a frequent OpenSolaris contributor) from NetBSD, among other drivers. The relevant part of prtconf -v output looks like this:

pci1102,8027 (driver not attached)
    Hardware properties:
        name='assigned-addresses' type=int items=5
            value=81024810.00000000.00009400.00000000.00000020
        name='reg' type=int items=10
            value=00024800.00000000.00000000.00000000.00000000.01024810.00000000.00000000.00000000.00000020
        name='compatible' type=string items=7
            value='pci1102,2.1102.8027.7' + 'pci1102,2.1102.8027' + 'pci1102,8027' + 'pci1102,2.7' + 'pci1102,2' + 'pciclass,040100' + 'pciclass,0401'
        name='model' type=string items=1
            value='Audio device'

It's quite easy to get it working:

- Disable the on-board sound card in the BIOS.
- Get the sources and extract them:

  /usr/sfw/bin/wget http://www.tools.de/files/opensource/solaris/audio/beta/audio-1.9beta.tar.bz2
  bzcat audio-1.9beta.tar.bz2 | tar -xf -

- Compile (set the PATH if needed to get access to gcc/ar):

  export PATH=/usr/bin:/usr/sbin:/usr/sfw/bin:/usr/sfw/sbin:/usr/ccs/bin
  cd audio-1.9beta && make

- Install the driver (audioemu) and its dependency (audiohlp) on x86:

  cp drv/emu/audioemu /platform/i86pc/kernel/drv
  cp drv/emu/amd64/audioemu /platform/i86pc/kernel/drv/amd64
  cp drv/emu/audioemu.conf /platform/i86pc/kernel/drv
  cp misc/audiohlp /kernel/misc
  cp misc/amd64/audiohlp /kernel/misc/amd64
  cp misc/audiohlp.conf /kernel/misc

- Attach the driver (see instructions in drv/emu/Makefile):

  add_drv -i '"pci1102,2" "pci1102,4"' audioemu

- Reboot.

Yes, I could have used the infrastructure provided in the Makefiles to create the package and install it, but I wanted to have a minimalistic install with just the things which are really needed. After the reboot you should be able to play sound via e.g. xmms (installed e.g. from Blastwave). Check for audioemu in modinfo(1M) output and look for error messages in dmesg(1M) output if something goes wrong. So far it has been working for me rather flawlessly (no kernel panics ;)).

During the search for the driver I discovered a number of complaints from users trying OpenSolaris for the first time that their Sound Blaster Live! was not recognized. Looking into the Device drivers community, not much is going on about sound card drivers. I wonder how hard it would be to get the audioemu NetBSD-based driver into ON...
The CR number for this is 6539690.

Update 2007-08-08: After asking on the opensolaris-discuss mailing list I realized that there is a project underway which will deliver OSS into (Open)Solaris. Some info can be found in PSARC/2007/238; however, there is no project page at opensolaris.org (yet). Hopefully, RFE 6539690 will be closed after OSS integrates into ONNV.
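As for the post-reboot sanity checks mentioned above, they boil down to two commands; once the driver is attached, the first should print a line for audioemu:

$ modinfo | grep audioemu
$ dmesg | grep -i audio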


Solaris

Simple Solaris Installation

OpenSolaris was already out before I joined Sun, so I had a chance to play with Solaris Express Community releases for a couple of months and actually look into the source code online thanks to Chandan's OpenGrok. Still, with all these goodies it required a fair amount of exploration before I figured things out. Being a BSD person, some of them were not surprising, whereas some of them slowed me down substantially. I don't remember now the origin of the thought to summarize the installation steps and the basic steps to make the system more usable after installation; it is not important anyway. The point is that slides containing all of this information (and more) were made.

The slides were originally meant for a CZOSUG Bootcamp where people brought their laptops and installed Solaris on them. I created the slides together with Jan Pechanec. After watching both external people and new-hires struggle with basic steps after completing a Solaris installation, I think they could also be used to ease those post-install how-do-I-set-this steps. They are not perfect and could contain errors (please do report them), but here you go: Simple Solaris Installation slides.

Also, do not forget to look at the recently founded Immigrants community at opensolaris.org, which contains other goodies such as links to Ben Rockwood's Accelerated Introduction to Solaris. And do not forget to subscribe to the Immigrants mailing list.
