Speeding to a Halt

On Sunday I committed changes into Nevada build 110 (and hence, into OpenSolaris 2009.next) to improve the speed at which OpenSolaris systems shutdown.  On a generic x86 test system, it used to take about 41 seconds to shut the system down starting from the time you typed init 6, shutdown(1m), or pressed the shutdown button in GNOME.  This form of shutdown is known as gentle shutdown because SMF takes care to stop each software service in dependency order.  In contrast, when you use reboot, halt, or poweroff, you're using what I'll call violent shutdown.  In the latter, the boot archive is brought up to date and that's about it.  It has traditionally been much, much faster than gentle shutdown.

This relatively long shutdown time has had an interesting effect: Solaris developers almost universally cheat, and use violent shutdown.  Typing reboot is terribly ingrained in my fingertips.  This is bad, because it means less test coverage for the shutdown method which actually matters to customers.  I recently began to be bothered by this because the GNOME gui uses init 6, and so shutting down via the nice shiny "Shut Down" button is also terribly slow.


On something of a whim, I dusted off our variant of the bootchart project which we ported in-house several years ago to get some initial information on what was happening during system shutdown.  Click here to see a graphical representation of a typical system shutting down (note: big image).  To read the image, note that time is on the X axis, from left to right.  Processes are represented as horizontal bars representing their duration.  At the rightmost side of the chart, the system has stopped.

In the image I've highlighted a couple of points of interest:

  • The pppd shutdown script seems to sleep 1, always, even if you aren't using ppp; since pppd isn't converted to SMF (bug 6310547), we will try to stop it on all systems on every shutdown.
  • The wbem service seems to sleep 1 while shutting down, and the the webconsole service takes a while to shutdown.  However, these services are present only on Nevada, and not on OpenSolaris, so I chose not to pursue trying to fix them.
  • The deferred patching script, installupdates is really slow.  And needlessly so-- it can run in a few milliseconds with a simple fix; I filed a bug.
  • There are some long calls to /usr/bin/sleep.  In the chart linked above, you can see svc-autofs, rpc-bind, and svc-syseventd each taking five seconds to stop.  Five seconds is a really long time!
  • There's a call to something called killall near the end of shutdown.  Then, 5 seconds later, another.  Then, 10 seconds later, things proceed again.  I wondered what the killall was all about?  Did it really need 15 seconds to do its work?

After a bit of effort (ok, a lot of effort), I've cleaned up these, and some other problems I spotted along the way.  It turns out that the five second delays are from some poor quality shell code in /lib/share/smf_include.sh:

smf_kill_contract() {
        ...
        # Kill contract.
        /usr/bin/pkill -$2 -c $1
        ...
        # If contract does not empty, keep killing the contract to catch
        # any child processes missed because they were forking
        /usr/bin/sleep 5
        /usr/bin/pgrep -c $1 > /dev/null 2>&1
        while [ $? -eq 0 ] ; do
...

Ugh. So this shell function rather bizarrely always waits five seconds, even if the contract empties out in 1/10th of a second!  I fixed this to have a smarter algorithm, and to keep checking at more frequent intervals (every 0.2 seconds).

I discovered that the calls to killall(1m) were really ancient history, and probably did not need to occupy 15 seconds worth of system shutdown.  I have shortened the interval substantially.

Another problem we faced was that, in the last moments before shutdown, startd runs some commands using system(3c).  This can be a problem if one of those commands, for some reason, wedges up.  So, I've replaced the calls to system with calls which timeout after a set number of seconds.  This is some nice insurance that the system continues to make progress as it shuts down.  Since I wound up with so much extra time available at shutdown, I've taken the chance to add a call to lockfs(1m) in an effort to get as much on-disk UFS consistency as possible.

So, here is the intermediate picture.  I've slightly revised the shutdown messaging, as well, to include a datestamp and a measurement of how long the shutdown took:

    svc.startd: The system is coming down.  Please wait.
    svc.startd: 83 system services are now being stopped.
    Mar  5 19:43:34 soe-x4100-3 syslogd: going down on signal 15
    svc.startd: Killing user processes.
    Mar  5 19:43:40 The system is down.  Shutdown took 17 seconds.

But wait, there's more!  On OpenSolaris, we don't have the time consuming wbem or webconsole services.  So, we can disable those and try again.  And, we use ZFS, for which the time consuming lockfs call at the end of shutdown is a no-op (on UFS, it takes at least two seconds).  This slimmed down stack results in an impressive shutdown time:

    Mar  6 02:51:51 The system is down.  Shutdown took 7 seconds.

And here is what it looks like.  If you want to see the complete set of changes, the codereview is also available.  As you can see, revising the way we kill off processes at the end of the life of the system is the big unrealized win.  And doing so would likely shave about 3 more seconds off, for a gentle shutdown of 4-5 seconds.  I ran out of time to do that this time around.

Some caveats:

  • You mileage may vary: you might run a different mix of services on your system, and perhaps one of those has a slow shutdown method which will gum up the works.  If you want to test how long a service takes to stop, try ptime svcadm disable -s <servicename>.
  • Your performance improvement is likely to be less dramatic on systems with less available parallelism.  Most of my test systems have two or four CPUs.

I should add a coda here: this work is greatly improved by recent bootadm performance work by Enrico.  While building the boot archive is still sometimes triggered on shutdown, it takes a lot less time than it did previously.

I had a good time working on this project; I hope you'll enjoy using it.

Comments:

Post a Comment:
Comments are closed for this entry.
About

Kernel Gardening.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today