More on Bootchart for Solaris

So Eric let the cat out of the bag on our reworked BootChart for Solaris. Eric posted one image; here is another: the bootup of a single Solaris Zone. I'm pretty happy with this, as the zone boots in only about 7 seconds.

This was an interesting experience because Eric and I had not previously worked together very closely. I had a great time doing this, and because Eric immediately stuck everything into a Teamware workspace, we were able to work simultaneously. Eric and I both worked on the D scripting, and somehow a joke about "wouldn't it be nice if the script output XML?" turned into our strategy. This turned out to be a good decision, as I hate writing parsers; instead we just let SAX do the work. We were able to maintain a clean split: boot-log generation in one self-contained component, and log processing in another. Because the XML logs are in an easily parsed format (as opposed to the output of top and iostat, which must be scraped), they should be useful to anyone doing boot analysis on Solaris. We've already had some such requests. I'm sure Eric will have more to say about the implementation, so I'll leave it to him, except to say that some of the visual design changes can be blamed on me, taking inspiration from Edward Tufte's work.
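To make the "let SAX do the work" point concrete, here is a minimal sketch of a SAX-style consumer. Note that the actual bootchart log schema isn't shown in this post, so the element and attribute names below (`bootlog`, `sample`, `process`, and their attributes) are invented for illustration; the real format will differ.

```python
# Hypothetical sketch: a SAX handler pulls per-process CPU samples out of
# an XML boot log, with no hand-written parser required. The schema here
# is made up for illustration only.
import xml.sax

SAMPLE_LOG = b"""<bootlog>
  <sample time="0.10">
    <process pid="1" name="init" cpu="0.02"/>
  </sample>
  <sample time="0.20">
    <process pid="99" name="ifconfig" cpu="0.95"/>
  </sample>
</bootlog>"""

class BootLogHandler(xml.sax.ContentHandler):
    """Collect (time, pid, name, cpu) tuples from <process> elements."""
    def __init__(self):
        super().__init__()
        self.samples = []
        self._time = None

    def startElement(self, tag, attrs):
        if tag == "sample":
            self._time = float(attrs.getValue("time"))
        elif tag == "process":
            self.samples.append((self._time,
                                 int(attrs.getValue("pid")),
                                 attrs.getValue("name"),
                                 float(attrs.getValue("cpu"))))

handler = BootLogHandler()
xml.sax.parseString(SAMPLE_LOG, handler)
for t, pid, name, cpu in handler.samples:
    print(f"{t:.2f}s pid={pid} {name} cpu={cpu}")
```

The nice property is that the log writer and the chart renderer only have to agree on the schema; neither side ever scrapes text output.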

Something else which fell out of this experience is that it's easy to use the log gatherer on any collection of processes which start up (as we saw in the zones example, above). We hope that this will be helpful in highlighting performance wins in the startup of complex ISV programs.

Following the experience of Linux developers, we've also found a series of bugs in Solaris with this tool. Let's start with an easy one, and the first one I found. The bug is visible in this chart from xanadu, my Shuttle SN45G4. Because we don't have support (sadly) for the on-board ethernet on this box, I had inserted another network card (I won't name the vendor, as I don't want to put them on the spot). If you look carefully at this bootchart, you'll see that the ifconfig process is spending a lot of time on the CPU. What's up with that? A brief investigation with DTrace made it clear that the driver had code like the following (shown here reduced as pseudo-code) in the attach(9E) codepath (attach is the process by which a driver begins controlling a hardware device):

    for (i := 0 to auto_negotiation_timeout) {
        if auto_negotiation_complete()
            return success;
        wait_milliseconds(100);
    }
    return failure;
This all looks fine, except that wait_milliseconds() (a function defined by the driver) is a wrapper around drv_usecwait(9F) ("busy-wait for specified interval"). Busy is of course the problem. drv_usecwait() is really meant for waiting out short intervals while data becomes ready in hardware registers. Busy-waiting 100 milliseconds at a time is practically forever, and ties up the CPU just spinning in a loop. The authors almost certainly meant to use delay(9F), which sleeps instead of spinning. I filed a bug, and hopefully we'll have it fixed soon (since this driver comes to us from a third party, they've asked that we let them make the code changes). Fun, eh?
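If the busy-wait vs. sleep distinction isn't intuitive, here's a user-space analogy in Python (not the driver code itself): a spin loop consumes CPU time for the whole interval, while a sleep consumes almost none. The 0.2-second interval is arbitrary, chosen just to make the difference measurable.

```python
# User-space analogy of the driver bug: busy_wait() behaves like
# drv_usecwait(9F) (spins, burning CPU for the whole interval), while
# time.sleep() behaves like delay(9F) (blocks, yielding the CPU).
import time

def busy_wait(seconds):
    """Spin until the interval has elapsed -- the driver's mistake."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # burns a full CPU the entire time

def cpu_seconds(fn, *args):
    """Return the CPU time consumed by fn(*args)."""
    start = time.process_time()
    fn(*args)
    return time.process_time() - start

spin_cpu = cpu_seconds(busy_wait, 0.2)
sleep_cpu = cpu_seconds(time.sleep, 0.2)
print(f"busy-wait used {spin_cpu:.3f}s of CPU; sleep used {sleep_cpu:.3f}s")
```

Multiply that by an auto-negotiation timeout of several seconds and you get exactly the ifconfig CPU bar visible in the bootchart.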

Two other issues we spotted concern inetd, which has been rewritten from scratch in Solaris 10; it is now a delegated restarter, which basically means that it takes some of its direction from the system's master restarter (svc.startd). The behavior we noticed sticks out on any of the boot charts, including the zone boot chart mentioned above: inetd is starting a lot of very short-lived ksh processes. Why? When I first spotted this, I used DTrace to work out the answer, as follows:

# dtrace -n 'proc:::create/execname=="inetd"/{ustack();}'
restart inetd
  0  12188                     cfork:create 
Ahh, so we stumble upon an (embarrassing) implementation artifact of libc: it uses ksh to help implement the libc routine wordexp(3C). So, every time inetd needs to wordexp() something, we wind up running a ksh. We can also see that this is not severely impacting performance, but we would like to get it fixed. Personally, I'd like to see wordexp() fixed to not rely upon ksh at all.

Another, somewhat more subtle, issue is something that SMF engineers like Liane are still looking at. It's also visible in the zone boot chart. It appears that some services (such as nfs/client) are delayed in coming online because it is taking the startd/inetd cabal a while to mark the services they depend on as online, even though that shouldn't really entail much work. We can see this as follows:

$ svcs -l nfs/client
fmri         svc:/network/nfs/client:default
name         NFS client service
dependency   optional_all/none svc:/network/rpc/keyserv (online)
dependency   optional_all/none svc:/network/rpc/gss (online)
dependency   require_all/refresh svc:/milestone/name-services (online)

$ svcs -l network/rpc/gss
fmri         svc:/network/rpc/gss:default
name         Generic Security Service
enabled      true
state        online
next_state   none
state_time   Fri Jan 07 18:22:32 2005
restarter    svc:/network/inetd:default
dependency   require_all/restart svc:/network/rpc/bind (online)
dependency   optional_all/none svc:/network/rpc/keyserv (online)
Since nfs/client depends upon GSS, it can't be started by startd until GSS is online. Liane and Jonathan have offered up some theories about why this is happening, but we've all been engaged on higher-priority work, and so we haven't had much time yet to dig deeper. This serialization point appears to be costing us roughly one full second of boot time, so it's something we need to look at further.

Have a great weekend, folks!

