Wednesday Sep 03, 2008

strsep() in libc

As of today, the strsep() function lives in Nevada's libc (tracked by CR 4383867 and PSARC 2008/305). This is another step in the quest for a more feature-complete (in terms of compatibility) libc in OpenSolaris. In binary form, the changes will be available in build 99. The documentation will be part of the string(3C) man page.

Here's a small example of how to use it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int parse(const char *str) {
        char *p = NULL;
        char *inputstring, *origstr;
        int ret = 1;

        if (str == NULL)
                errx(1, "NULL string");

        /*
         * We have to remember the original pointer because strsep()
         * will change the 'inputstring' pointer.
         */
        if ((origstr = inputstring = strdup(str)) == NULL)
                errx(1, "strdup() failed");

        printf("=== parsing '%s'\n", inputstring);
        for (p = strsep(&inputstring, ","); p != NULL;
            p = strsep(&inputstring, ",")) {
                if (*p != '\0') {
                        printf("%s\n", p);
                } else {
                        /* empty field, e.g. the middle of "22,,44" */
                        warnx("syntax error");
                        ret = 0;
                        goto bad;
                }
        }
        printf("=== finished parsing\n");
bad:
        /* free the original pointer, not the one strsep() advanced */
        free(origstr);
        return (ret);
}

int main(int argc, char *argv[]) {
        if (argc != 2)
                errx(1, "usage: prog ");

        if (!parse(argv[1]))
                return (1);

        return (0);
}

This example was actually used as a unit test (use e.g. "1,22,33,44" and "1,22,,44,33" as input strings), and it also nicely illustrates important properties of strsep() behavior:

  • While searching for tokens, strsep() modifies the original string. This is a property it shares with strtok().
  • Unlike strtok(), strsep() is able to detect empty fields.

There is a function in Solaris' libc which can do token splitting without modifying the original string: strcspn(). The other notable property of strsep() is that (unlike strtok()) it does not conform to ANSI C. Time to draw a table:

 function(s)   ISO C90   modifies input   detects empty fields
 strsep()      No        Yes              Yes
 strtok()      Yes       Yes              No
 strcspn()     Yes       No               Sort of

None of the above functions is bullet-proof. The bottom line is that the user should decide which one is most suitable for a given task and use it with its properties in mind.

Thursday Aug 07, 2008

Customizing Mercurial outgoing output

Part of the transition to Mercurial in OpenSolaris is a change in the integration processes. Every RTI has to contain the output of hg outgoing -v so the CRT advocates can better see the impact of the changes in terms of changed files. However, the default output is not very readable:

    $ hg outgoing -v
    comparing with /local/ws-mirrors/onnv-clone.hg
    searching for changes
    changeset:   7248:225922d15fe6
    user:        Vladimir Kotal 
    date:        2008-08-06 23:39 +0200
    modified:    usr/src/cmd/ldap/ns_ldap/ldapaddent.c usr/src/cmd/sendmail/db/config.h usr/src/cmd/ssh/include/config.h usr/src
    /cmd/ssh/include/openbsd-compat.h usr/src/cmd/ssh/include/strsep.h usr/src/cmd/ssh/libopenbsd-compat/ usr/src
    /cmd/ssh/libopenbsd-compat/common/llib-lopenbsd-compat usr/src/cmd/ssh/libopenbsd-compat/common/strsep.c usr/src/cmd/ssh/libssh
    /common/llib-lssh usr/src/common/util/string.c usr/src/head/string.h usr/src/lib/libc/amd64/Makefile usr/src/lib/libc
    /i386/ usr/src/lib/libc/port/gen/strsep.c usr/src/lib/libc/port/llib-lc usr/src/lib/libc/port/mapfile-vers usr/src
    /lib/libc/sparc/Makefile usr/src/lib/libc/sparcv9/Makefile usr/src/lib/passwdutil/ usr/src/lib/passwdutil
    /bsd-strsep.c usr/src/lib/passwdutil/passwdutil.h usr/src/lib/smbsrv/libsmb/common/mapfile-vers usr/src/lib/smbsrv/libsmb
    added:       usr/src/lib/libc/port/gen/strsep.c
    deleted:     usr/src/cmd/ssh/include/strsep.h usr/src/cmd/ssh/libopenbsd-compat/common/strsep.c usr/src/lib/passwdutil/bsd-strsep.c
    log:PSARC 2008/305 strsep() in libc
    4383867 need strsep() in libc

In the above case, the list of modified files spans a single line, which makes the web form used for RTIs go really wild in terms of width (I had to wrap the lines manually in the above example; otherwise this page would suffer from the same problem). The following steps can be used to make the output a bit nicer:

  1. create ~/bin/Mercurial/ with the following contents:
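    from mercurial import templatefilters
    # template filter: put each space-separated file name on its own line
    def newlines(text):
        return text.replace(' ', '\n')
    # hook: register the filter before 'hg outgoing' renders the template
    def outgoing_hook(ui, repo, **kwargs):
        templatefilters.filters["newlines"] = newlines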
  2. hook into the outgoing command in ~/.hgrc by adding the following lines to the [hooks] and [extensions] sections so it looks like this:
  3. create ~/bin/Mercurial/style.outgoing with the following contents:
    changeset = outgoing.template
  4. create ~/bin/Mercurial/outgoing.template with the following contents (the file can be downloaded here):
    changeset:	{rev}:{node|short}
    user:		{author}
    date:		{date|isodate}
  5. add the following to your ~/.bashrc (or to the rc file of your shell of choice):
    alias outgoing='hg outgoing --style ~/bin/Mercurial/style.outgoing'

After that it works like this:

    $ outgoing
    comparing with /local/ws-mirrors/onnv-clone.hg
    searching for changes
    changeset:	7248:225922d15fe6
    user:		Vladimir Kotal 
    date:		2008-08-06 23:39 +0200
    PSARC 2008/305 strsep() in libc
    4383867 need strsep() in libc

I asked Richard Lowe (who has been very helpful in getting the transition process done) whether the next Mercurial version could include the newlines function and whether there could be an outgoingtemplate similar to logtemplate in hgrc(5).
In the meantime I will be using the above for my RTIs.

Sunday Apr 27, 2008

Test suite for netcat

In the OpenSolaris world we very much care about correctness and hate regressions (of any kind). To loosely paraphrase Bryan Cantrill, the degree of devotion should be obvious:

"Have you tested your change in every way you know of? If not, do not go any further with the integration unless you do so."

This implies that an ordinary bug fix should have a unit test accompanying it. But unit tests are cumbersome when performed by hand and do not mean much if they are not accumulated over time.

For the integration of Netcat into OpenSolaris I developed a number of unit tests (basically at least one for each command line option) and a couple more after spotting some bugs in nc(1). This means that nc(1) is ripe for a test suite, so that the tests can be performed automatically. This is tracked by RFE 6646967. The test suite will live in the onnv-stc2 gate, which is hosted and maintained by the OpenSolaris Testing community.

To create a test suite one can choose between two frameworks: STF and CTI-TET. I have chosen the latter because I wanted to try something new and also because CTI-TET seems to be the recommended framework these days.

Work on the nc test suite started during the 2007 Christmas break, and after recovery from lost data it is now in a pretty stable state and ready for code review. This is actually somewhat exciting, because the nc test suite is supposed to be the first OpenSolaris test suite developed in the open.

A fresh webrev is always stored in the nc-tet.onnv-stc2 directory. Everybody is invited to participate in the code review.

Code review should be performed via the testing-discuss at mailing list (subscribe via Testing / Discussions). It has a web interface in the form of the testing-discuss forum.

So, if you're familiar with ksh scripting or the CTI-TET framework (neither is necessary), you have a unique chance to bash (not bash) my code! Watch for the official code review announcement on the mailing list in the next couple of days.

Lastly, another philosophical food for thought: test suites are sets of programs and scripts which mainly serve one purpose, namely to prevent bugs from happening in the software they test. But test suites are software too, and the presence of bugs in test suites is an annoying phenomenon. How do we get rid of that one?

Sunday Apr 13, 2008

poll(2) and POLLHUP with pipes in Solaris

During nc(1) pre-integration testing, shortly before it was putback, I found that 'cat /etc/passwd | nc localhost 4444' produced an endless loop with 100% CPU utilization, looping in poll(2) calls (I still remember my laptop suddenly getting much warmer than it should and the CPU fan cranking up). 'nc localhost 4444 < /etc/passwd' did not exhibit that behavior.
The cause was a difference between the poll(2) implementations on BSD and Solaris. Since I am working on Netcat in Solaris again (adding more features, stay tuned), it's time to take a look back and maybe even help people porting similar software from BSD to Solaris.

The issue appears because, on Solaris, POLLHUP is set in the returned events bitfield (revents) for stdin after the pipe is closed (or, to be more precise, after the producer/write end is done). poll.c (which resembles the readwrite() function from nc) illustrates the issue:

01 #include <stdio.h>
02 #include <poll.h>
03 #include <err.h>
04 #include <unistd.h>
05 #define LEN  1024
06 int main(void) {
07      int timeout = -1;
08      int n;
09      char buf[LEN];
10      int plen = LEN;
11
12      struct pollfd pfd;
13
14      pfd.fd = fileno(stdin);
15      pfd.events = POLLIN;
16
17      while (pfd.fd != -1) {
18              if ((n = poll(&pfd, 1, timeout)) < 0) {
19                      err(1, "Polling Error");
20              }
21              fprintf(stderr, "revents = 0x%x [ %s %s ]\n",
22                  pfd.revents,
23                  pfd.revents & POLLIN ? "POLLIN" : "",
24                  pfd.revents & POLLHUP ? "POLLHUP" : "");
25
26              if (pfd.revents & (POLLIN|POLLHUP)) {
27                      if ((n = read(fileno(stdin), buf, plen)) < 0) {
28                              fprintf(stderr,
29                                  "read() returned neg. val (%d)\n", n);
30                              return (1);
31                      } else if (n == 0) {
32                              fprintf(stderr, "read() returned 0\n");
33                              pfd.fd = -1;
34                              pfd.events = 0;
35                      } else {
36                              fprintf(stderr, "read: %d bytes\n", n);
37                      }
38              }
39      }
40 }

Running it on NetBSD (chosen because my personal non-work mailbox is hosted on a machine running it) produces the following:

otaku[~]% ( od -N 512 -X -v /dev/zero | sed 's/ [ \t]*/ /g'; sleep 3 ) | ./poll
revents = 0x1 [ POLLIN  ]
read: 1024 bytes
revents = 0x1 [ POLLIN  ]
read: 392 bytes
revents = 0x11 [ POLLIN POLLHUP ]
read() returned 0

I had to post-process the output of od(1) (because of differences between the od(1) output on NetBSD and Solaris) and slow the execution down a bit (via sleep) in order to make things more visible (try running the command without the sleep and the pipe will be closed too quickly). On OpenSolaris the same program produces a different pattern:

moose:~$ ( od -N 512 -X -v /dev/zero | sed 's/ [ \t]*/ /g' ; sleep 3 ) | ./poll 
revents = 0x1 [ POLLIN  ]
read: 1024 bytes
revents = 0x1 [ POLLIN  ]
read: 392 bytes
revents = 0x10 [  POLLHUP ]
read() returned 0

So, the program behaves correctly on both systems. Had the statement on line 26 checked only POLLIN, the command above (with or without the sleep) would go into an endless loop on Solaris:

revents = 0x11 [ POLLIN POLLHUP ]
read: 1024 bytes
revents = 0x11 [ POLLIN POLLHUP ]
read: 392 bytes
revents = 0x10 [  POLLHUP ]
revents = 0x10 [  POLLHUP ]
revents = 0x10 [  POLLHUP ]

Both OSes set POLLHUP after the pipe is closed. The difference is that while BSD always indicates POLLIN (even if there is nothing to read), Solaris strips it after the data stream has ended. So, which one is correct? The poll() function as described by The Open Group says that "POLLHUP and POLLIN are not mutually exclusive". This means both implementations seem to conform to the IEEE Std 1003.1, 2004 Edition standard (part of POSIX) in this respect.

However, the POSIX standard also says:

    In each pollfd structure, poll() shall clear the revents member, except that where the application requested a report on a condition by setting one of the bits of events listed above, poll() shall set the corresponding bit in revents if the requested condition is true. In addition, poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in revents if the condition is true, even if the application did not set the corresponding bit in events.

This might still be OK, even though the POLLIN flag remains set in NetBSD's poll() after no data are available for reading (try commenting out lines 33 and 34 and running as above), because the standard says about the POLLIN flag: "For STREAMS, this flag is set in revents even if the message is of zero length."

Without further reading it is hard to tell exactly how a POSIX-compliant poll() should behave. On the Austin Group mailing list there was a thread about poll() behavior w.r.t. POLLHUP suggesting this is a fuzzy area.

Anyway, to see where exactly POLLHUP is set for pipes in OpenSolaris, look at fifo_poll(). The function _sets_ the revents bit field to POLLHUP, so the POLLIN flag is wiped off after that. fifo_poll() is part of the fifofs kernel module, which has been around in Solaris since the late eighties (I was still in elementary school the year fifovnops.c appeared in the SunOS code base :)). NetBSD has fifofs too, but there the POLLHUP flag gets set via a bitwise OR in pipe_poll(), which is part of the syscall processing code. The difference between the OpenSolaris and NetBSD (whoa, the NetBSD project uses OpenGrok!) attitudes to POLLHUP is now clear.

Thursday Apr 03, 2008

ZFS is going to save my laptop data next time

The flashback is still vivid even weeks later: the day before my presentation at the FIRST Technical Colloquium in Prague, I brought my 2-year-old laptop with the work-in-progress slides to the office. Since I wanted to finish the slides in the evening, I fired off a Live Upgrade process on the laptop to get a fresh Nevada build (of course, to show off during the presentation ;)).
LU is a very I/O-intensive process, and the red Ferrari notebooks tend to get _very_ hot. In the afternoon I noticed that the process had failed. To my astonishment, I/O operations had started to fail. After a couple of reboots (and zpool status / fmadm faulty commands) it was obvious that the disk could not be trusted anymore. I was able to rescue some data from the ZFS pool, which spanned the biggest slice of the internal disk, but not all of it (ZFS is not willing to hand out corrupted data). My slides were lost, along with other data.

After some time I stumbled upon James Gosling's blog entry about ZFS mirroring on a laptop. This got me started (or, more precisely, I was astonished and wondered how this idea could have escaped me, since by then ZFS had been in Nevada for a long time), and I discovered several similar and more in-depth blog entries on the topic.
After some experiments with a borrowed USB disk, it was time to make it a reality on a new laptop.

The process was a multi-step one:

  1. First I had to extend the free slice #7 on the internal disk so that it spans the remaining space on the disk, because it had been trimmed after the experiments. In the end, the slices looked like this in format(1) output:
    Part      Tag    Flag     Cylinders         Size            Blocks
      0       root    wm       3 -  1277        9.77GB    (1275/0/0)   20482875
      1 unassigned    wm    1278 -  2552        9.77GB    (1275/0/0)   20482875
      2     backup    wm       0 - 19442      148.94GB    (19443/0/0) 312351795
      3       swap    wu    2553 -  3124        4.38GB    (572/0/0)     9189180
      4 unassigned    wu       0                0         (0/0/0)             0
      5 unassigned    wu       0                0         (0/0/0)             0
      6 unassigned    wu       0                0         (0/0/0)             0
      7       home    wm    3125 - 19442      125.00GB    (16318/0/0) 262148670
      8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
      9 alternates    wu       1 -     2       15.69MB    (2/0/0)         32130
  2. Then the USB drive was connected to the system and recognized via format(1):
           0. c0d0 
           1. c5t0d0 
  3. The inactive Live Upgrade boot environment was deleted via ludelete(1M) and its slice was commented out in /etc/vfstab. This was needed to make zpool(1M) happy.
  4. A ZFS pool was created out of the slice on the internal disk (c0d0s7) and the external USB disk (c5t0d0). I had to force it because zpool(1M) complained about the overlap of c0d0s2 (the slice spanning the whole disk) and c0d0s7:
    # zpool create -f data mirror c0d0s7 c5t0d0
    For a while I struggled to find a name for the pool (everybody seems to either stick to 'tank' or come up with some double-cool-stylish name, which I wanted to avoid because the excitement from such a name is likely to wear off) but then chose the ordinary 'data' (it's what it is, after all).
  5. I verified that it is possible to safely disconnect and reconnect the USB disk while an I/O operation is in progress:
    root:moose:/data# mkfile 10g /data/test &
    [1] 10933
    root:moose:/data# zpool status
      pool: data
     state: ONLINE
     scrub: none requested
    	data        ONLINE       0     0     0
    	  mirror    ONLINE       0     0     0
    	    c0d0s7  ONLINE       0     0     0
    	    c5t0d0  ONLINE       0     0     0
    errors: No known data errors
    It survived without a hitch (okay, I had to wait a little longer for the zpool command to complete due to the ongoing I/O, but that was it) and resynced the contents automatically after the USB disk was reconnected:
    root:moose:/data# zpool status
      pool: data
     state: DEGRADED
    status: One or more devices are faulted in response to persistent errors.
    	Sufficient replicas exist for the pool to continue functioning in a
    	degraded state.
    action: Replace the faulted device, or use 'zpool clear' to mark the device
     scrub: resilver in progress for 0h0m, 3.22% done, 0h5m to go
    	data        DEGRADED     0     0     0
    	  mirror    DEGRADED     0     0     0
    	    c0d0s7  ONLINE       0     0     0
    	    c5t0d0  FAULTED      0     0     0  too many errors
    errors: No known data errors
    Also, with heavy I/O it is necessary to clear the pool state via zpool clear data after the resilver completes, because the USB drive gets marked as faulted. Normally this will not happen (unless the drive has really failed) because I will be connecting and disconnecting the drive only when powering on or shutting down the laptop, respectively.
  6. After that I used Mark Shellenbaum's blog entry about ZFS delegated administration (it was Mark who did the integration) and the ZFS Delegated Administration chapter of the OpenSolaris ZFS Administration Guide to create permission sets and assign them on the ZFS pool 'data' to my local user:
      # chmod A+user:vk:add_subdirectory:fd:allow /data
      # zfs allow -s @major_perms clone,create,destroy,mount,snapshot data
      # zfs allow -s @major_perms send,receive,share,rename,rollback,promote data
      # zfs allow -s @major_props copies,compression,quota,reservation data
      # zfs allow -s @major_props snapdir,sharenfs data
      # zfs allow vk @major_perms,@major_props data
    All of the commands had to be run as root.
  7. Now the user is able to create a home directory for himself:
      $ zfs create data/vk
  8. Time to set up the datasets and prepare them for data. I separated the datasets according to a 'service level'. Some data are very important to me (e.g. presentations ;)), so I want them multiplied via the ditto blocks mechanism: with the copies dataset property set to 2 they are actually present 4 times (2 copies on each side of the mirror). Also, documents are not usually accompanied by executable code, so the exec property was set to off, which prevents running scripts or programs from that dataset.
    Some data are volatile and in high quantity, so they do not need any additional protection, and it is a good idea to compress them with a better compression algorithm to save some space. The following table summarizes the setup:
       dataset             properties                       comment
     | data/vk/Documents | copies=2                       | presentations          |
     | data/vk/Saved     | compression=on exec=off        | stuff from web         |
     | data/vk/DVDs      | compression=gzip-8 exec=off    | Nevada ISOs for LU     |
     | data/vk/CRs       | compression=on copies=2        | precious source code ! |
    So the commands will be:
      $ zfs create -o copies=2 data/vk/Documents
      $ zfs create -o compression=gzip-3 -o exec=off data/vk/Saved
  9. Now it is possible to migrate all the data, change the home directory of the user to /data/vk (e.g. via /usr/ucb/vipw) and log in again.

However, this is not the end of it but just the beginning. There are many things that could make the setup even better, to name a few:

  • set up some sort of automatic snapshots for selected datasets
    The set of scripts and SMF service for doing ZFS snapshots and backup (see ZFS Automatic For The People and related blog entries) made by Tim Foster could be used for this task.
  • make zpool scrub run periodically
  • detect failures of the disks
    This would be ideal to see in Gnome panel or Gnome system monitor.
  • set up off-site backup via SunSSH + zfs send
    This could be done using the hooks provided by Tim's scripts (see above).
  • Set quotas and reservations for some of the datasets.
  • Install ZFS scripts for Gnome nautilus so I will be able to browse, take and destroy snapshots in nautilus. Now which set of scripts to use? Chris Gerhard's or Tim Foster's? Or should I just wait for the official ZFS support for nautilus to be integrated?
  • Find out exactly what the recovery scenario (in case of laptop drive failure) will look like.
    Importing the ZFS pool from the USB disk should suffice, but during my experiments I was not able to complete it successfully.

With all the above, the data should be safe from disk failure (after all, disks are often called "spinning rust", so they are going to fail sooner or later) and also from the loss of both the laptop and the USB disk.

Lastly, a philosophical thought: one of my colleagues considers hardware a necessary (and very faulty) layer which is only needed to make it possible to express ideas in software. This might seem extreme, but come to think of it: ZFS is special in this sense. Being software that provides that bridge, its core idea is to isolate hardware faults.

Monday Feb 11, 2008

Grepping dtrace logs

I have been working on a tough bug for quite some time. The bug is a combination of a race condition and data consistency issues. To debug it I am using a multi-threaded Apache process and dtrace heavily. The logs produced by dtrace are huge and contain mostly dumps of internal data structures.

An excerpt from such a log looks like this:

  1  50244       pk11_return_session:return       128   8155.698
  1  50223            pk11_RSA_verify:entry       128   8155.7069
  1  50224           pk11_get_session:entry       128   8155.7199
  1  50234          pk11_get_session:return       128   8155.7266
  pid = 1802
  session handle = 0x00273998
  rsa_pub_key handle -> 0x00184e70
  rsa_priv_key handle -> 0x00274428
  rsa_pub = 0x00186248
  rsa_priv = 0x001865f8

  1  50224           pk11_get_session:entry       128   8155.7199

  1  50244       pk11_return_session:return       128   8155.698

Side note: this is a post-processed log (probes together with their bodies are sorted by timestamp, and the timestamps are converted to milliseconds; see the Coloring dtrace output entry for details and scripts).

Increasingly, I was asking myself questions resembling this one: "When was function foo() called with data which contained the value bar?"

This quickly led to a script which does the tedious job for me. The script accepts 2 or 3 parameters. The first 2 are the probe pattern and the data pattern, respectively. The third, which is optional, is the input file (if not supplied, stdin is used). An example of use on the log pasted above looks like this:

~/bin$ ./ pk11_get_session 0x00186248 /tmp/test
  1  50234          pk11_get_session:return       128   8155.7266
  pid = 1802
  session handle = 0x00273998
  rsa_pub_key handle -> 0x00184e70
  rsa_priv_key handle -> 0x00274428
  rsa_pub = 0x00186248
  rsa_priv = 0x001865f8


Now I have to get back to work, to do some more pattern matching. Oh, and the script is here.

Saturday Dec 01, 2007

Netcat in Solaris

CR 4664622 has been integrated into Nevada and will be part of build 80 (which means it will not be part of the next SXDE release, but I can live with that :)).

While getting the process done I stumbled upon several interesting obstacles. For example, during the ingress Open Source Review I was asked by our open-source caretaker what the "support model" for Netcat would be once it is integrated. I was puzzled, because for Netcat support is not really needed: it has been around for ages (OK, since 1996 according to Wikipedia) and is a pretty stable piece of code which is basically no longer developed. Nonetheless, this brings some interesting challenges with the move to a community model, where more and more projects are integrated by people outside Sun (e.g. the ksh93 project).

The nc(1) man page will be delivered in build 81. In the meantime you can read the Netcat review blog entry, which contains the link to the updated man page. The older version of the man page is contained in the mail communication for PSARC 2007/389.
Note: I have realized that the ARC case directory does not have to include the most up-to-date man page at the time of integration. The man page has to be updated only when something _architectural_ changes (which was not the case with Netcat, since we only added a new section describing how to set up nc(1) with RBAC). Thanks to Jim for the explanation.

I have some ideas on how to make Netcat in Solaris even better and will work to get them done over time. In particular, there are the following RFEs: 6515928, 6573202. However, this does not mean that there is only a single person who can work on nc(1). Since it is now part of ONNV, anyone is free to hack on it.

So, I'd like to invite everyone to participate: if you have an idea how to extend Netcat or what features to add, it is sitting in ONNV waiting for new RFEs (or bugs) and sponsor requests (be sure to read Jayakara Kini's explanation of how to contribute if you're not an OpenSolaris contributor yet).

Also, if you're a Netcat user and use Netcat in a cool way, I want to hear about it!

Tuesday Aug 28, 2007

Getting code into libc

In my previous entry about the BSD compatibility gap closure process I promised to provide a guide on how to get new code into libc. I will use the changes done via CR 6495220 to illustrate the process with examples.

Process-related and technical changes that are usually needed:

  • get PSARC case done
  • File a CR to create a manual page according to the man page draft supplied with the PSARC case. You will probably need to go through the functions being added and assign them MT-Level according to attributes(5) man page (if this was not done prior to filing the PSARC case).
  • actually add the code into libc
    This includes moving/introducing files from the SCM point of view and doing necessary changes to the Makefiles.
    In terms of symbols, the functions need to be delivered twice: once as an underscored (strong) symbol and again as a WEAK alias to the strong symbol. This allows libraries to use their own private implementations of the functions (because the weak symbol is silently overridden by the private symbol in the runtime linker).
  • add entries to c_synonyms.h and synonyms.h
    synonyms.h is used in libc for symbol alias construction (see above). c_synonyms.h provides access to underscored symbols for other (non-libc) libraries. This provides a way to call the underscored symbols directly without risking namespace clashes/pollution.
    This step needs to be done in conjunction with the previous one. nm(1) can be used to check that it worked as expected:
    $ nm -x /usr/lib/ | grep '|_\?err$'
    [5783] |0x00049c40|0x00000030|FUNC |GLOB |0  |13 |_err
    [6552] |0x00049c40|0x00000030|FUNC |WEAK |0  |13 |err
  • Do the necessary packaging changes
    If you're adding a new header file, change SUNWhea's prototype_* files (most probably just prototype_com).
    If the file was previously installed into the proto area during the build, it needs to be removed from the exception files (for i386 and sparc).
  • modify lib's mapfile
    This is needed for the symbols to become visible and versioned. Public symbols belong to the latest SUNW section. After you have compiled the changes you can check this via command similar to the following:
    pvs -dsv -N SUNW_1.23 /usr/lib/ \
      | sed -n '1,/SUNW.*:/p' | egrep '((v?errx?)|(v?warnx?));'
    If you're adding private (underscored) symbols do not forget to add them to the SUNWprivate section. This is usually the case because the strong symbols are accompanied by weak symbols. Weak symbols go to the global part of the most recent SUNW section and strong symbols go to global part of SUNWprivate section.
  • update libc's lint library
    If you are adding private symbols then add them as well. See the entries _vwarnfp et al. for example.
    After you're done, it's time to run nightly with lint checks and fix the noise (see below).
  • Add per-symbol filters
    If you are moving stuff from a library to libc, you will probably want to preserve the existing interfaces. To accomplish this, per-symbol filters can be added to the library you're moving from. So, if symbol foo is moved from libbar to libc, change the line in the global section of libbar's mapfile to look like this:
    This was done with the *fp functions in libipsecutils' mapfile. The special aspect in that case was that the *fp functions were renamed to underscored variants (via redefines in errfp.h) while being moved.
  • Fix build/lint noise introduced by the changes
    The following kinds of noise can appear:
    • build noise
      Can be caused by a symbol type clash (a symbol of the same name defined in libc as FUNC and in $SRC/cmd as OBJT), which is not harmful because ld(1) will do due diligence and prefer the non-libc symbol. This can be fixed by renaming the local symbol. There could also be a #define clash caused by the inclusion of c_synonyms.h, fixed via renaming as well.
    • lint noise
      In the 3rd pass of the lint checks, an inconsistency in function declarations can be found, such as this:
      "/builds/.../usr/include/err.h", line 43: warning: function argument declared
      inconsistently: warn(arg 1) in utils.c(62) char * and llib-lc:err.h(43) const char *
      The problem with this output is that there are about 23 files named utils.c in ONNV. CR 6585621 is waiting for someone to provide a remedy for that by adding the -F flag to LINTFLAGS in $SRC/lib and $SRC/cmd.
      After the right file(s) are found the fix is usually renaming again. Where the renaming is not possible -erroff=E_FUNC_DECL_VAR_ARG2 can be passed to lint(1).
  • Make sure there are no duplicate symbols in libc after the changes
    This is necessary because duplicates might confuse debugging tools (mdb, dtrace). For the err/warn stuff there was one such occurrence:
    [6583] | 301316| 37|FUNC |GLOB |0 |13 |_warnx
    [1925] | 320000| 96|FUNC |LOCL |0 |13 |_warnx
    This can usually be solved by renaming the local variant.
  • Test thoroughly
    • test with different compilers
      SunStudio and gcc handle some things differently, so it is a good idea to test the code with both.
    • Try to compile different consolidations (e.g. Companion CD, SFW) on top of the changes. For the err/warn project, a bug was filed to get the RPM build fixed.
    • Test if the WEAK symbols actually work
    • Test the programs in ONNV affected by the changes
      e.g. the programs which needed to be modified because of the build/lint noise.
  • Produce annotated putback list explaining the changes
    This is handy for a CRT advocate and saves time.
  • If the change requires some sort of action from fellow gatelings, send a heads-up, e.g. like the heads-up for err/warn.
  • If you are actually adding code to libc (this includes moving code from other libraries to libc), send an e-mail similar to the heads-up to the opensolaris-code mailing list, e.g. like this message about err/warn.
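
The renaming fix suggested in the checklist for both build and lint noise might look like the following minimal sketch. It assumes a private helper in some utils.c that used to be declared as warn(char *fmt, ...); the new name utils_warn() is hypothetical, and the helper simply formats the message and hands it to the warnx() that now lives in libc.

```c
#include <assert.h>
#include <err.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Formerly a private "warn(char *fmt, ...)" in some utils.c, which
 * collided with warn(const char *, ...) from <err.h>/libc.  Renamed to
 * the hypothetical utils_warn() and made const-correct, so the local
 * symbol no longer clashes with libc's warn() (build noise) and lint
 * no longer sees two inconsistent declarations of it (lint noise).
 * Returns the length of the formatted message, as vsnprintf() does.
 */
static int
utils_warn(const char *fmt, ...)
{
	char buf[256];
	va_list ap;
	int len;

	assert(fmt != NULL);
	va_start(ap, fmt);
	len = vsnprintf(buf, sizeof (buf), fmt, ap);
	va_end(ap);

	/* Let libc print it, with the usual "progname: " prefix. */
	warnx("%s", buf);
	return (len);
}
```

Callers only need their warn(...) calls changed to utils_warn(...); messages longer than the buffer are truncated, but the return value still reports the full length.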

Thursday Aug 23, 2007

Closing BSD compatibility gap

I do not get to meet customers very often, but I clearly remember the last time I participated in a pseudo-technical session with a new customer. The engineers were keen on learning the details of all the features which make Solaris outstanding, but they were also bewildered by the lack of common functions such as daemon() (see CR 4471189). Yes, there are a number of private implementations in various places in ONNV; however, this is not very useful. Until recently, this was also the case for the err/warn function family.

With the putback of CR 6495220 there are now the following functions living in libc:

  void err(int, const char *, ...);
  void verr(int, const char *, va_list);
  void errx(int, const char *, ...);
  void verrx(int, const char *, va_list);
  void warn(const char *, ...);
  void vwarn(const char *, va_list);
  void warnx(const char *, ...);
  void vwarnx(const char *, va_list);

These functions have been present in BSD systems for a long time (they have been in FreeBSD since 1994). The configure scripts of various pieces of software check for the presence and functionality of the err/warn functions in libc (and set the HAVE_ERR_H define accordingly). On Solaris, those checks will now succeed as well.

The err(3C) man page covering these functions will be delivered in the same build as the changes, that is build 72.

The change is covered by the PSARC 2006/662 architectural review, and the stability level for all the functions is Committed (see Architecture Process and Tools for more details on how this works). Unfortunately, the case is still closed. Hopefully it will be opened soon.
Update 09-28-2007: the PSARC/2006/662 case is now open, including the one-pager document and the e-mail discussion. Thanks to John Plocher, Darren Reed and Bill Sommerfeld.

As I urged in the err/warn heads-up, it is now time to stop creating new private implementations and to look at purging the duplicates (there are many of them, although not all can be replaced by err/warn from libc).

Next time I will write in more detail about how to get code into libc in general.

However, this does not mean there is nothing left to do in this area. Specifically, FreeBSD's err(3) contains the functions err_set_exit(), err_set_file(), errc(), warnc(), verrc() and vwarnc(). These functions could be ported over too. There is also __progname, or getprogname(3). Moreover, a careful (code) reader will have noticed that err.c contains private implementations of functions with the fp suffix. This function (sub)family could be made Committed too. So there is still a lot of work that could be done, and anyone is free to work on any of it (see Jayakara Kini's blog entry on how to become an OpenSolaris contributor if you are not one already).

Thursday Jun 14, 2007

Coloring dtrace output

I am currently working on a subtle bug in KSSL (the SSL kernel proxy). To diagnose its root cause, I use a set of dtrace scripts that gather data at various probes. One of them looks like this:

#!/usr/sbin/dtrace -Cs

/*
 * trace mbuf which caused activation of
 * kssl-i-kssl_handle_any_record_recszerr probe
 *
 * NOTE: this presumes that every mblk seen by TCP/KSSL
 *       is reasonably sized (cca 100 bytes)
 */

/* how many bytes from a mblk to dump */
#define DUMP_SIZE       48

/*
 * generic kssl mblk dump probe
 * (covers both input path and special cases)
 */
  printf("\nmblk size = %d",
      ((mblk_t *)arg0)->b_wptr - ((mblk_t *)arg0)->b_rptr);
  tracemem(((mblk_t *)arg0)->b_rptr, DUMP_SIZE);
  tracemem(((mblk_t *)arg0)->b_wptr - DUMP_SIZE, DUMP_SIZE);

The scripts usually collect big chunks of data from the various SDT probes I have put into the kssl kernel module. After the data are collected, I usually spend big chunks of time sifting through them. At one point I began to suspect that the problem was actually a race condition of sorts. To shed some light on what was going on, I used less(1), which highlights matches when searching. While this is sufficient when searching for a single pattern, it does not scale to multiple patterns. This is when I got the idea to color the output of the dtrace scripts with a simple Perl script, to see the correlations (or lack of them) between the events. Example of the output colored by the script:

colored dtrace output (data)

This looks slightly more useful than plain black output in a terminal, but even on a 19" display the big picture is missing. So I changed the dtrace-coloring script so that it can strip the data parts of the probes and print just the headers:

colored dtrace output (headers)

This is done via the '-n' command line option (the default is to print everything). The output containing just the colored headers is especially nice for tracking down race conditions and other time-sensitive misbehavior. You can download the script for dtrace log coloring here:

Colors are assigned to regular expressions in the %coloring hash in the script itself. For the example above I used the following assignments:

my %coloring = (
  '.*kssl_mblk-i-handle_record_cycle.*' => '4b6983',    # dark blue
  '.*kssl_mblk-enqueue_mp.*'    => '6b7f0d',    # dark green
  '.*kssl_mblk-getnextrecord_retmp.*'   => 'a11c10',    # dark red
);
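
The core of the coloring idea can be sketched in C as follows. This is not the Perl script itself: it assumes HTML output (the six-digit hex values look like HTML colors), uses POSIX regcomp()/regexec(), and the names line_color(), put_escaped() and print_colored() are hypothetical. Since regexec() searches anywhere in the line, the leading and trailing .* of the Perl patterns are dropped, and the first matching entry in the table wins.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Pattern -> colour table mirroring %coloring (hex RGB, no '#'). */
struct coloring {
	const char *pattern;	/* POSIX extended regular expression */
	const char *rgb;
};

static const struct coloring colors[] = {
	{ "kssl_mblk-i-handle_record_cycle", "4b6983" },	/* dark blue */
	{ "kssl_mblk-enqueue_mp", "6b7f0d" },			/* dark green */
	{ "kssl_mblk-getnextrecord_retmp", "a11c10" },		/* dark red */
};

/* Colour of the first pattern matching 'line', or NULL if none does. */
static const char *
line_color(const char *line)
{
	size_t i;

	for (i = 0; i < sizeof (colors) / sizeof (colors[0]); i++) {
		regex_t re;
		int matched;

		/* Recompiled per call for brevity; cache these in real use. */
		if (regcomp(&re, colors[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
			continue;
		matched = (regexec(&re, line, 0, NULL, 0) == 0);
		regfree(&re);
		if (matched)
			return (colors[i].rgb);
	}
	return (NULL);
}

/* Write 's' with the HTML metacharacters <, > and & escaped. */
static void
put_escaped(const char *s, FILE *out)
{
	for (; *s != '\0'; s++) {
		switch (*s) {
		case '<':
			(void) fputs("&lt;", out);
			break;
		case '>':
			(void) fputs("&gt;", out);
			break;
		case '&':
			(void) fputs("&amp;", out);
			break;
		default:
			(void) fputc(*s, out);
			break;
		}
	}
}

/* Print one log line, wrapped in a coloured <span> if it matches. */
static void
print_colored(const char *line, FILE *out)
{
	const char *rgb = line_color(line);

	if (rgb != NULL)
		(void) fprintf(out, "<span style=\"color:#%s\">", rgb);
	put_escaped(line, out);
	(void) fputs(rgb != NULL ? "</span>\n" : "\n", out);
}
```

Lines that match no pattern are passed through uncolored, so the data parts of each probe stay readable next to the colored headers.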

In the outputs above you might have noticed that a timestamp is printed when a probe fires. This is useful for pre-processing the log file: dtrace(1) (or, more precisely, libdtrace) does not sort events as they come from the kernel, because the per-CPU buffers are consumed one CPU at a time (see CR 6496550 for more details). When hunting down a race condition on a multiprocessor machine, having the output sorted is crucial. So, to get a consistent picture suitable for race condition investigation, a sort script is needed. You might use a crude script of mine or you can write your own :)
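
If you write your own, the core is just a sort keyed on the timestamp. A minimal C sketch, assuming each event has already been reduced to a single line that starts with its numeric timestamp (for example the header-only output); sort_by_timestamp() is a hypothetical name, not the script mentioned above. Because qsort() is not stable, equal timestamps are ordered by their original position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One event: a log line plus the timestamp parsed from its start. */
struct event {
	unsigned long long ts;
	size_t seq;		/* original position, used to break ties */
	const char *line;
};

static int
event_cmp(const void *a, const void *b)
{
	const struct event *x = a;
	const struct event *y = b;

	if (x->ts != y->ts)
		return (x->ts < y->ts ? -1 : 1);
	/* qsort() is not stable, so keep the input order for equal stamps. */
	if (x->seq != y->seq)
		return (x->seq < y->seq ? -1 : 1);
	return (0);
}

/*
 * Reorder 'n' log lines in place by the number at the start of each
 * line (lines without one get timestamp 0).  Returns 0 on success,
 * -1 if memory could not be allocated (lines are then left as-is).
 */
static int
sort_by_timestamp(const char **lines, size_t n)
{
	struct event *ev;
	size_t i;

	assert(n == 0 || lines != NULL);
	if (n == 0)
		return (0);
	if ((ev = malloc(n * sizeof (*ev))) == NULL)
		return (-1);
	for (i = 0; i < n; i++) {
		ev[i].ts = strtoull(lines[i], NULL, 10);
		ev[i].seq = i;
		ev[i].line = lines[i];
	}
	qsort(ev, n, sizeof (*ev), event_cmp);
	for (i = 0; i < n; i++)
		lines[i] = ev[i].line;
	free(ev);
	return (0);
}
```

Multi-line records (such as the tracemem() dumps) would first have to be glued to their header line, otherwise the sort tears them apart.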


Tuesday Mar 27, 2007

Sound Blaster Live! on Solaris

During the stressful days of the libmd backport to Solaris 10 update 4, I managed to break my iPod mini by sliding backwards on my chair and running to get the next cup of coffee. (The catch was that I still had my headphones connected to the iPod, so it fell on the floor. It now only shows the sad Mac face.)

Only then did I realize how important music is for my day-to-day tasks. I simply cannot keep the same pace as before without a steady flow of rhythm into my ears.

Before buying another iPod I wanted to get some interim relief. This is when I entered the not-so-happy world of Solaris audio drivers. My workstation is an Acer Aspire E300, which came with an integrated TV card and whatnot, and also with an integrated sound card. The sound card is supported in Solaris, but the level of noise coming from it was unbearable (and I am no HiFi snob, listening mostly to 32kbps radio Akropolis streams). After some googling I found out that there is a driver for the Sound Blaster Live! 5.1 PCI card, which I had in my Solaris test workstation at home. The driver (among others) was ported from NetBSD by Jürgen Keil (a frequent OpenSolaris contributor).

The relevant part of the prtconf -v output looks like this:

            pci1102,8027 (driver not attached)
                Hardware properties:
                    name='assigned-addresses' type=int items=5
                    name='reg' type=int items=10
                    name='compatible' type=string items=7
                        value='pci1102,2.1102.8027.7' + 'pci1102,2.1102.8027' + 'pci1102,8027' + 'pci1102,2.7' + 'pci1102,2' + 'pciclass,040100' + 'pciclass,0401'
                    name='model' type=string items=1
                        value='Audio device'

It's quite easy to get it working:

  1. disable the on-board sound card in BIOS
  2. get the sources and extract them
      bzcat audio-1.9beta.tar.bz2 | tar -xf -
  3. compile (set the PATH if needed to get access to gcc/ar)
      export PATH=/usr/bin:/usr/sbin:/usr/sfw/bin:/usr/sfw/sbin:/usr/ccs/bin
      cd audio-1.9beta && make
  4. install the driver (audioemu) and its dependency (audiohlp) on x86:
      cp drv/emu/audioemu /platform/i86pc/kernel/drv
      cp drv/emu/amd64/audioemu /platform/i86pc/kernel/drv/amd64
      cp drv/emu/audioemu.conf /platform/i86pc/kernel/drv
      cp misc/audiohlp /kernel/misc
      cp misc/amd64/audiohlp /kernel/misc/amd64
      cp misc/audiohlp.conf /kernel/misc
  5. attach the driver (see instructions in drv/emu/Makefile)
      add_drv -i '"pci1102,2" "pci1102,4"' audioemu
  6. reboot
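
Step 5 works because of how a driver gets bound to the device: roughly, the system walks the entries of the device's compatible property (the list in the prtconf output above) from the most specific to the most generic and uses the first name that has a driver alias. Presumably nothing claims the first four entries on a stock system, so 'pci1102,2', registered by add_drv -i, is the one that matches. A simplified C model of that matching; first_compatible_match() is a hypothetical helper, not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of compatible-property binding: walk the device's
 * "compatible" names from most to least specific and return the index
 * of the first one listed among the driver's aliases, or -1 if none is.
 */
static int
first_compatible_match(const char *const *compat, size_t ncompat,
    const char *const *aliases, size_t naliases)
{
	size_t i, j;

	assert(ncompat == 0 || compat != NULL);
	assert(naliases == 0 || aliases != NULL);
	for (i = 0; i < ncompat; i++) {
		for (j = 0; j < naliases; j++) {
			if (strcmp(compat[i], aliases[j]) == 0)
				return ((int)i);
		}
	}
	return (-1);
}
```

Fed with the seven compatible names above and the two aliases from step 5, it returns 4, i.e. the 'pci1102,2' entry.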

Yes, I could have used the infrastructure provided in the Makefiles to create a package and install it, but I wanted a minimal install containing only the things that are really needed.

After the reboot you should be able to play sound, e.g. via xmms (installed e.g. from Blastwave). If something goes wrong, check for audioemu in the modinfo(1M) output and look for error messages in the dmesg(1M) output. So far it has been working rather flawlessly for me (no kernel panics ;)).

While searching for the driver, I came across a number of complaints from users trying OpenSolaris for the first time that their Sound Blaster Live! was not recognized. Looking into the Device Drivers community, not much is going on regarding sound card drivers.
I wonder how hard it would be to get the NetBSD-based audioemu driver into ON. The CR number is 6539690.

2007-08-08 update: After asking on the opensolaris-discuss mailing list, I have learned that there is a project underway which will deliver OSS into (Open)Solaris. Some info can be found in PSARC/2007/238; however, there is no project page (yet). Hopefully, RFE 6539690 will be closed after OSS integrates into ONNV.

Monday Nov 13, 2006

Simple Solaris Installation

OpenSolaris was already out before I joined Sun, so I had a chance to play with the Solaris Express Community release for a couple of months and actually look into the source code online thanks to Chandan's OpenGrok. Still, even with all these goodies it took a fair amount of exploration before I figured things out. Being a BSD person, I was not surprised by some of them, whereas others slowed me down substantially.

I no longer remember where the idea came from to summarize the installation steps and the basic post-installation steps that make the system more usable; it is not important. Anyway, the slides containing all of this information (and more) were made.
The slides were originally meant for the CZOSUG Bootcamp, where people brought their laptops and installed Solaris on them. I created the slides together with Jan Pechanec. After watching both external people and new hires struggle with the basic steps after completing a Solaris installation, I think the slides could also be used to ease those post-install how-do-I-set-this-up steps.

They are not perfect and could contain errors (please do report them), but here you go:

Also, do not forget to look at the recently founded Immigrants community, which contains other goodies such as links to Ben Rockwood's Accelerated introduction to Solaris, and to subscribe to the Immigrants mailing list.

Friday Nov 03, 2006

Cleartext for now !

I am Vlad, working as a "Revenue Product Engineer" in the security "technology-management" team. This means that my main job is to fix security bugs in Solaris (both vulnerabilities and bugs in the security technologies themselves).

My fixes usually have to do with the following technologies/products: IPsec, SSH, OpenSSL and crypto. In this blog I will try to present not only security technologies but also other Solaris/OpenSolaris-related stuff.


blog about security and various tools in Solaris

