Monday Apr 04, 2005

gopherblog

I'm conflicted. On one hand, George Hotelling's gopherblog1 brings me a nostalgic chuckle. On the other, I lament this fine, structured protocol being usurped in the name of a weblog. I have hazy recollections of days when http sites carried only esoteric physics information and pictures of people's cats, but gopher sites were useful. Caring about any of this certainly relegates me to a niche that shouldn't be offering opinions.


1If your browser is confused by the protocol, try Firefox.

Thursday Mar 31, 2005

smf and webmin

While it's far from breaking news, webmin (1.190 or later) now groks smf(5), thanks to Sun's own Alan Maguire. Particularly cool is the tantalizing "Create New Service" button. It leads you through a set of questions which help you create a simple daemon-style service. Two hints: it drops the manifest in /etc/webmin/smf/manifest.xml, and you need to use "Add" on each screen to get the changes to take ("Next" isn't sufficient). You can see my "foo" and "bar" services, which I quickly created with webmin, in the screenshot from my laptop. John Clingan wrote more about how to get the new version up and running. Alan says he's happy to get feedback about the smf(5) functionality he's added to webmin: you can mail him at firstname.lastname@sun.com, just like the rest of us.
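
If you'd like to sanity-check the manifest webmin generates, svccfg(1M) can validate it for you. A minimal sketch, assuming the manifest path mentioned above:

   $ svccfg validate /etc/webmin/smf/manifest.xml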

Monday Mar 28, 2005

smf(5) presentation

I've posted the current slide set we're using for smf(5) presentations here at the mediacast.sun.com site. The presentation is currently focused on using and administering a Solaris system with smf(5). There are only a few developer slides, but I expect to post a more comprehensive presentation about smf(5) development eventually.

If you're near the San Francisco bay area and are interested in seeing me talk to some variant of these slides live and in person, plan to attend the inaugural meeting of the OpenSolaris User Group on April 26th, 2005 at the Sun Santa Clara campus. I'll post more details as I get them, so watch this space.

Wednesday Mar 23, 2005

what services do I have?

Discovering services available on your system is really easy. A few interesting questions to ask:

  1. What services are enabled/running? svcs(1) with no options answers that easily:

    $ svcs
    ...
    online         Feb_04   svc:/network/ntp:default
    online         Feb_04   svc:/network/service:default
    online         Feb_04   svc:/application/x11/xfs:default
    online         Feb_04   svc:/application/font/stfsloader:default
    ...
    
  2. What services are available? Just ask svcs(1) to list all services, including the disabled ones:

    $ svcs -a
    disabled       Feb_04   svc:/system/metainit:default
    disabled       Feb_04   svc:/network/rpc/nisplus:default
    disabled       Feb_04   svc:/network/nis/server:default
    
  3. What do these available services do anyways? Again, just ask svcs(1). This time, get the service description too:

    $ svcs -a -o FMRI,DESC
    svc:/milestone/name-services:default               name services milestone
    svc:/platform/i86pc/kdmconfig:default              Display configuration
    svc:/system/cron:default                           clock daemon (cron)
    
  4. And how do I find out more about the service I'm interested in? svcs gives useful information with both the -x and -l options. The manpage references in svcs -x are particularly helpful. We'll be adding those to the -l output as well.

    $ svcs -x system-log
    svc:/system/system-log:default (system log)
     State: online since Fri Feb 04 19:30:11 2005
       See: syslogd(1M)
       See: /var/svc/log/system-system-log:default.log
    Impact: None.
    
    $ svcs -l system-log
    fmri         svc:/system/system-log:default
    name         system log
    enabled      true
    state        online
    next_state   none
    state_time   Fri Feb 04 19:30:11 2005
    logfile      /var/svc/log/system-system-log:default.log
    restarter    svc:/system/svc/restarter:default
    contract_id  51 
    dependency   require_all/none svc:/milestone/sysconfig (online)
    dependency   require_all/none svc:/system/filesystem/local (online)
    dependency   optional_all/none svc:/system/filesystem/autofs (online)
    dependency   require_all/none svc:/milestone/name-services (online)
    
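One more handy trick once you've found a service of interest: svcs -p also lists the processes associated with it. A sketch of the output form (the PID and times here are illustrative):

    $ svcs -p system-log
    STATE          STIME    FMRI
    online         Feb_04   svc:/system/system-log:default
                   Feb_04        217 syslogd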

Wednesday Mar 16, 2005

smf(5) and fault isolation on Solaris 10

In addition to trying to improve the service deployment and administration model in Solaris 10, smf(5) (also known as Solaris Service Manager) works hand-in-hand with the Solaris Fault Manager (also known as fmd(1M)) to isolate and recover from faults1. The Fault Manager handles detecting and predicting hardware faults, including retiring bad hardware when faults are predicted/detected. That's a very simplistic description of a very sophisticated suite of software, but hopefully it is enough for me to continue the smf(5) part of the discussion.

In earlier versions of Solaris, we could detect hardware faults, but not always recover from them. I'll focus on memory errors here, which can occur either in your physical memory, or in the cache that's part of the CPU module. Either way, memory can go bad. It can generate either a correctable error, or an uncorrectable one. Solaris has always recovered gracefully from correctable errors. They're handled by the kernel and never seen by a user process. But, uncorrectable ones mean that we can't find a good copy of the data. The error can occur either in the kernel's address space or in a user process's address space. An error in kernel address space means that we need to panic the kernel immediately. An error in user space can be dealt with more gracefully. As we know which process the error affected, we can kill it before it causes any more damage. However, what we didn't know in previous versions of Solaris were the relationships between user processes. As we didn't know whether the corrupted/absent memory in one process would cause corruption in another process cooperating closely with it, we had to gracefully (via init 6) take the entire system down.

In Solaris 10, fmd(1M) can take hardware that's about to have a failure offline in advance of that failure, or after that failure occurs. But, when a failure does slip through it is smf(5)'s job to know the relationships between processes/services on the system. There are two main types of relationships:

  1. processes part of the same service / fault boundary, and

  2. services which depend upon each other.

To track processes as part of the same service, the smf(5) restarters write process(4) contracts to be able to receive events on a group of related processes. Certain types of events can be classified as important:

  • empty - the last member of the contract exited

  • fork - a new process was added to the contract

  • exit - a member of the contract exited

  • core - a process dumped core

  • signal - a process received a fatal signal

  • hwerr - a process was killed due to an uncorrectable hardware error

Each of these events is detected by the kernel, and then passed on to the contract owner. In the specific case of hwerr, if an uncorrectable hardware error does occur in a user process, the kernel detects it and kills the process where the error occurs, just like in Solaris 9. What differs in Solaris 10 is that we no longer need to restart the system -- with smf(5) and contracts, we can just restart the "associated processes".
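
If you'd like to watch these events arrive yourself, ctwatch(1) subscribes to a contract's event queue. A rough sketch; the event ID and summary below are made up for illustration:

   $ ctwatch 513
   CTID     EVID     CRIT     ACK  CTTYPE   SUMMARY
   513      23       crit     no   process  process 18678 exited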

I was planning on a separate post about contracts, but it seems like much of the background information is necessary to explain the architecture here. Bear with me. Maybe I'll write a followup post which gives only the higher-level view of our fault isolation features in Solaris 10.

Contracts are written with three types of event sets: informative, critical, and fatal. Informative and critical events really differ only in the guarantees about event delivery. Fatal means we kill off all processes in the contract if a fatal event is received. smf(5) puts the hwerr event into the critical event set. A few things to look at here. First, I can find out about contract and process relationships using:

   $ ptree -c `pgrep sendmail`
   [process contract 1]
     1     /sbin/init
       [process contract 4]
         7     /lib/svc/bin/svc.startd
           [process contract 513]
             18676 /usr/lib/sendmail -Ac -q15m
             18678 /usr/lib/sendmail -bd -q15m

You can see that sendmail is in contract 513. Using that information, you can look at the terms of the contract:

   $ ctstat -vi 513
      CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME   
      513     0       process owned   7       0       -       -       
              cookie:                0x20
              informative event set: none
              critical event set:    hwerr empty
              fatal event set:       none
              parameter set:         inherit regent
              member processes:      18676 18678
              inherited contracts:   none

That output confirms what I described: hwerr is in the critical event set. If there's a hwerr in either of the sendmail processes, the contract owner (7, svc.startd as you see above) will get a critical event. svc.startd then responds to the error by stopping the service, and restarting it if possible. Thus, when an uncorrectable memory error occurs in a process managed as an smf(5) service, smf(5) detects the error and repairs the service by restarting it2. That handles the first relationship type I described above -- processes related as part of the same service / fault boundary. So, how about service relationships?

Service relationships are managed by smf(5) dependencies. Most dependencies are used to specify startup order, by using grouping=require_all and restart_on=none. However, you can also specify that a service is restarted if its dependency experiences any type of error (hardware error, core dump, etc.). You do this by using restart_on=error as opposed to none. Then when the dependency is restarted due to that error, your dependent service will be too. Pretty simple.
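
In manifest terms, restart_on is just an attribute on your dependency element. A sketch of what such a dependency might look like; the name and service FMRI here are made up for illustration:

   <dependency
       name='database'
       grouping='require_all'
       restart_on='error'
       type='service'>
       <service_fmri value='svc:/application/mydb:default' />
   </dependency>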

Astute observers will note that I haven't described how those nasty uncorrectable errors are handled for processes that aren't explicitly part of an smf(5) service. How does Solaris know what to do if you didn't write a service manifest to describe how faults should be handled?

All processes are part of a process contract. If no software creates a new contract, the process is in the same contract as its parent. The default terms for a contract are not the same as what svc.startd uses. Instead, the default process contract is written such that hardware errors are fatal. Remember, that means all processes in the contract are killed if any process sees an uncorrectable memory error. svc.startd also helpfully puts each legacy-run service in its own contract. Thus, if any processes launched out of a legacy-run service (e.g. vold or dtlogin) fall victim to an uncorrectable memory error, all processes in the contract will be killed.

   $ ptree -c `pgrep vold`
   [process contract 81]
     481   /usr/sbin/vold
   $ ctstat -vi 81 
   CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME   
   81      0       process orphan  -       0       -       -       
           cookie:                0
           informative event set: core signal
           critical event set:    hwerr empty
           fatal event set:       hwerr
           parameter set:         none
           member processes:      481
           inherited contracts:   none

Note that for vold's process, hwerr is in the fatal event set. But, since there's no service manifest to tell Solaris how to deal with the legacy-run service, we can't restart it. That's one of the reasons why even though we do provide compatibility for legacy services, we strongly suggest folks take the time to do a quick conversion of their service to smf(5).

Finally, what does this mean for hardware faults inside zones? As a zone doesn't have a kernel of its own, an uncorrectable memory error in the kernel still means that the entire system goes down. However, each zone has its own copy of smf(5) inside, completely separate from the other zones on the system. As smf(5) runs inside the zone as well, faults are handled inside the local zone the same way as they are in the global zone. There's no need to isolate the fault to the zone because we isolate the fault to a finer granularity -- the service. smf(5) and zones are highly complementary technologies.


1Mike covers the topic of self-healing systems and Solaris' approach to self-healing in greater detail in his ACM Queue article. Read it if you want a more comprehensive architectural view, rather than the smf(5) implementation/day-to-day use view I provide here.

2If you've specified the following with your service manifest, you've told smf(5) that you don't care about what happens to the processes that your start method starts up.

   <property_group name='startd' type='framework'>
      <propval name='duration' type='astring' value='transient' />
   </property_group>

We provided this functionality for configuration services which need to tell smf(5) that they don't have processes that need to be restarted if they fail. Basically, having no processes in the contract isn't an error. But, this has also (understandably) been abused to shoehorn into smf(5) legacy services which may or may not have processes running when their start method exits. Even Sun is guilty of some of these. svc:/network/initial may start up a number of daemons on your Solaris 10 system, but you don't see them under svcs -p. That's because the duration property is set to transient. You can see this with:

   $ svcprop -p startd/duration network/initial
   transient

svc.startd believes there are no important processes to worry about restarting, so it doesn't track them under svcs -p, and won't restart the service if one of the processes is killed due to an uncorrectable memory error. We're properly ashamed of the partial conversion that was done with network/initial and a few other services, and are working on fixing them. But, if you want the processes in your service to be restarted on failure, don't set startd/duration to transient.

Thursday Mar 10, 2005

Maltby on error handling in Solaris

Gavin's talking about Solaris and its error handling in his blog. Go, read.

His most recent entry talks about the philosophy behind error handling, but his previous post is nice too (though, I'm perhaps just narcissistic and happy to see SMF get a mention), and I'm already looking forward to future posts. If his internal communication is anything to go by, I'll often be reading while thinking "hey, that's a much clearer description of what I was trying to convey".

Tuesday Mar 08, 2005

new mnttab entries in Solaris 10

I'm still delinquent on a number of smf(5) entries. But, here's a quick one that's at least smf(5)-related. Based on a few pieces of internal mail, it seems like lots of folks out there are asking about the new filesystems that appear on Solaris 10 when they type mount. An excerpt from one of my systems:

/ on /dev/dsk/c1d0s0 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=1980040 ...
/devices on /devices read/write/setuid/devices/dev=4380000 on Fri Feb  4 19:29:50 2005
/system/contract on ctfs read/write/setuid/devices/dev=43c0001 on Fri Feb  4 19:29:50 2005
/proc on proc read/write/setuid/devices/dev=4400000 on Fri Feb  4 19:29:50 2005
/etc/mnttab on mnttab read/write/setuid/devices/dev=4440001 on Fri Feb  4 19:29:50 2005
/etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=4480001 on Fri Feb  4 19:29:50 2005
/system/object on objfs read/write/setuid/devices/dev=44c0001 on Fri Feb  4 19:29:50 2005
/lib/libc.so.1 on /usr/lib/libc/libc_hwcap2.so.1 read/write/setuid/devices/dev=1980040 on Fri ...
/dev/fd on fd read/write/setuid/devices/dev=4680001 on Fri Feb  4 19:29:58 2005
/tmp on swap read/write/setuid/devices/xattr/dev=4480002 on Fri Feb  4 19:29:59 2005
/var/run on swap read/write/setuid/devices/xattr/dev=4480003 on Fri Feb  4 19:29:59 2005

Most of the new mounts we've added this release are dynamic filesystems which reflect kernel state. These include ctfs(7FS) (used extensively by smf(5)) and objfs(7FS). Like procfs, they're truly dynamic and are generated by the kernel on each boot. There's no need to include them in system backups.

The libc loopback mount is pretty nifty. Once Solaris has booted far enough, it looks at the hardware capabilities of the system, including what instruction sets it supports. Then it loopback mounts a customized libc which can take advantage of all the performance of the specific chip we're using (if such a customized library is available). Right now, we use this on a set of x86 and x64 systems. See moe(1) for more info on the $HWCAP capabilities. Darren also talks about this in more detail.
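
If you're curious which variant the system would pick for you, moe(1) can show the optimal expansion of a $HWCAP path. A sketch; the resulting path depends on your hardware, so the output here is illustrative:

   $ moe -32 '/usr/lib/libc/$HWCAP'
   /usr/lib/libc/libc_hwcap2.so.1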

devfs(7FS) makes the /devices namespace fully dynamic. If you aren't doing things like using chmod(1) to change device permissions, there's no need to back it up either. Recovery to a new system will be easier if you don't back it up. You can use /etc/minor_perm to specify device permissions without using chmod. See add_drv(1M) for more details.
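
For reference, each /etc/minor_perm entry takes the form driver:minor-name mode owner group. A sketch; the driver and mode here are purely illustrative:

   sd:* 0640 root sys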

Thursday Feb 24, 2005

CEC2005

There was a bit of an internal push for folks to externally blog about CEC2005, the Sun internal Customer Engineering Conference happening this weekend. I don't engineer customers, but I hope I engineer (I use the term loosely here -- software developers are not Professional Engineers) for customers. Still, the sales and service engineers who run the conference will allow me the opportunity to speak about SMF at some really early hour Monday morning.

Still, with all of the technical-content talks that happen at CEC, it's too bad we don't have an external conference to focus on Sun-specific, or Solaris-specific technical information. Would people go? Certainly JavaOne is a raging success -- but the product I work on isn't Java-specific or even Java-focused. I'd be a fish out of water there. In the meantime, we'll keep pushing to attend and submit papers for the usual non-Sun conferences. Hopefully a few SMF developers will be able to attend USENIX and LISA this year.

Regardless, it's good that some CEC2005 attendees will be taking time to blog about the information presented there. Keep an eye out for posts on blogs.sun.com.

Thursday Feb 17, 2005

the Liane community

I hadn't logged into my Orkut account for a while, but it was time to accept connections of a few more friends over lunch. Poking around at my scraps revealed that there is an entire community there of "Liane"s. In Portuguese. Seems that what I thought was an uncommon name is not entirely rare among Brazilian women. Nobody will ever mistake me for Brazilian (too bad, really), it's been a few years since I visited Brazil (highly recommended, by the way -- Rio is a gorgeous, friendly, vibrant city), and my Portuguese is nonexistent. Still, maybe I'll pull out the Portuguese-English dictionary again and join the forum.

We'll never compete in numbers with the vast Dave conspiracy1, but the idea of a Liane-specific support network is still amusing.


1If you work in tech in the US, there's always at least one Dav[e|id] in close proximity. Usually multiple. I currently work regularly with six of them, rendering the usual technique for differentiating between two of them useless. Only one of them blogs, further validating the hypothesis of a vast underground Dave network. Clearly they're not ready to go public yet.

Wednesday Feb 16, 2005

powder 8s and a touch of altitude sickness

After 2 years of burn-the-candle-at-both-ends effort on smf, I'm now trying to catch up on at least a little of my neglected vacation time. I had a few days in London (prior to some internal Solaris 10 training), which I'll try to remember to blog about once all my pictures are back. This weekend, my boyfriend and I made a short hop to Salt Lake City for my first time skiing in Utah.

Sunday was beautiful and sunny at Alta and Snowbird, even if the snow was a little heavy. Monday was a bit windy and clouded over, but with the reward of consistent fresh snow. Despite some altitude sickness, we spent the day making figure eights in the powder at Snowbird. Nowhere near good enough to play competitively, but a pretty nice valentine's day nonetheless. I wanted to take a few pictures of our tracks from the lift, but the snow was consistent enough to cover them up every time. A small price to pay for fresh tracks nearly every run.

I'll catch up with a new smf entry here soon.

Friday Feb 11, 2005

exercise in machismo

Some time ago I had to stop using my beloved 1982 Fiat Spider as a daily driver. It is an aging Italian sportscar, so a minor tantrum was to be expected from it once a year or so. But, getting to work was becoming an increasingly stressful endeavor. So, a hunt for a new car began.

Many hours were spent talking me out of waiting for the less-than-reasonable new Elise about to be released in the US. Eventually, I had to cede to the logic of a compromise car. Practical, reliable, maintainable, and even used. I've never been a fan of the Miata's handling, and while the Honda S2000 had the sweetest little engine you've ever heard, its lackluster steering feel would have always left me wanting. I've always loved the BMW M-coupe's looks, and a test drive confirmed everything I'd read about its performance. Lots of power and perfectly predictable to handle. But, no convertible. That was a non-starter.

After a month or two of foot-dragging, I managed to swallow my pride and climb into an M-roadster. Ok, it can probably be forgiven for looking like the Z3. All the power of the M-coupe, but plenty of body twist just waiting to jump out and bite you at the most inopportune moments. What a brute! Still, that's a lot of the appeal, and there were a few reasonably priced low-mileage examples to be found. So, we jumped in and bought one of those examples.

What prompted this useless anecdote? A friend sent along a link to a review from the Car Talk guys. They've, as usual, got it pretty spot on. Fortunately, I haven't run afoul of the law with the beast yet.

Thursday Feb 10, 2005

contributed service manifests

Folks are starting to write service manifests for various pieces of software. Figured it might be useful to collect a few here. If I've missed something useful (and I'm certain I have), feel free to drop me a line and it'll be included.

So far, one of my favorites is an article by Peter Tribble and Geoff Gibbs about creating an smf service for postfix. Coolest part is that they actually noticed what we intended and created postfix as an instance of the svc:/network/smtp service. It contains a great step-by-step set of instructions in addition to the manifests and method files. Nice.

Chris Gerhard posted this description for postgres.

Trevor Watson's done some work with squid. [2/11/2005] Wow, and he's already done another one for samba.

Michael Hunter contributed an entire package for freeradius to blastwave, including a service manifest for those running it on Solaris 10. If you're a Solaris user who hasn't heard of blastwave, check it out. It's an invaluable resource.

Wednesday Feb 09, 2005

what's with these "legacy" services anyways?

Most of the questions I get about smf(5)'s continued execution of init.d scripts are along the lines of "for how long will you continue to support them?". The answer to that common question is: for the foreseeable future. We haven't yet, and have no near-term plans to, file the EOF that would be necessary to remove support for init.d startup scripts. But, one reader (reader? Who am I? Dear Abby? Sheesh, sorry guys.) sent me mail to ask:

> With all of the goodness in smf, why do we still have "legacy"
> Solaris services?
> 
> I can understand having both systems so that there can be a
> transition from the old to the new for those who have invested a
> lot of work into their init scripts.
> 
> I'm afraid I must be missing something.

Not really. We did keep around compatibility for the legacy init.d scripts so that Solaris customers and ISVs don't have to do a bunch of work (it really should be no work, given the Solaris compatibility guarantees) to have their software work on Solaris 10. Those who want to realize the benefits of smf(5) can write a simple service description for their software. Those who can't yet fit it into their schedules don't have to. As mentioned in the quote, it eases the transition.

But, why isn't all of /etc/rc?.d empty for Solaris 10 as Sun delivers it? Honestly, because we just didn't have the time before Solaris 10 shipped. Our team was pretty small, and we didn't get the word out as well as we'd have liked within Sun. We did a bunch (well over 100) ourselves, but we also often need help from the people who own the specific services. Those of you on the Solaris Express train probably noticed smf(5) was a pretty late addition to the release. But, now that it is in the release, our job in selling the benefits and helping folks to convert is a lot easier.

That said, any of you out there under a support contract can certainly help us improve. If you've got a favorite Sun-delivered service that isn't under smf(5) control that you think would benefit, let us know through your standard channels! While I'm happy to file RFEs (that's Request For Enhancement) to get this work done, actual reports from customers carry (justifiably!) more weight than internal requests. You can also comment on this blog entry to let me know about any services in Solaris that you think are particularly important to get under the aegis of smf(5), and I'll see what I can do.

Tuesday Feb 08, 2005

smf repository design and implementation choices

A recurring concern about smf(5) is the configuration repository. To some folks, it resembles the Windows registry too much for comfort. Rather than trying to contrast with the Windows registry or other registries such as GConf, I thought I'd talk about the design choices we made when deciding how to implement the smf(5) repository. Below is the high-level list of design criteria. It may not be complete, but captures a reasonable amount of what we were thinking when designing the repository.

  1. Transactional.

    All of smf(5) is designed to be completely restartable from the ground up. Do you doubt? First, try killing all user processes (kill -9 -1) on a non-critical Solaris 9 system -- one that nobody's using, please:

          wands console login: root
          Password:
          Feb  7 13:26:30 wands login: ROOT LOGIN /dev/console
          Last login: Tue Feb  1 14:44:40 on console
          Sun Microsystems Inc.   SunOS 5.9       Generic January 2003
          # ptree
          59    /usr/lib/sysevent/syseventd
          73    /usr/lib/picl/picld
          130   /usr/sbin/in.routed
          149   /usr/sbin/rpcbind
          152   /usr/sbin/keyserv
          162   /usr/lib/netsvc/yp/ypbind -broadcast
          178   /usr/sbin/inetd -s
          199   /usr/lib/nfs/lockd
          201   /usr/lib/nfs/statd
          202   /usr/lib/autofs/automountd
          214   /usr/sbin/syslogd
          222   /usr/sbin/cron
          227   /usr/sbin/nscd
          240   /usr/lib/power/powerd
          251   /usr/lib/utmpd
          263   /usr/sadm/lib/smc/bin/smcboot
            270   /usr/sadm/lib/smc/bin/smcboot
            271   /usr/sadm/lib/smc/bin/smcboot
          268   /usr/lib/im/htt -port 9010 -syslog -message_locale C
            275   htt_server -port 9010 -syslog -message_locale C
          285   /usr/lib/sendmail -bd -q15m
          286   /usr/lib/sendmail -Ac -q15m
          311   /usr/dt/bin/dtlogin -daemon
          312   /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
            315   mibiisa -r -p 15488
          323   /usr/lib/dmi/dmispd
          324   /usr/lib/dmi/snmpXdmid -s wands
          329   /usr/sbin/vold
          336   /usr/lib/saf/sac -t 300
            339   /usr/lib/saf/ttymon
          337   -sh
           1234  ptree
          340   /usr/lib/ssh/sshd
          # kill -9 -1
    
          wands console login: root
          Password:
          Last login: Mon Feb  7 13:26:30 on console
          Sun Microsystems Inc.   SunOS 5.9       Generic January 2003
          # ptree
          1235  /usr/lib/saf/sac -t 300
            1238  /usr/lib/saf/ttymon
          1236  -sh
            1242  ptree
          # 
          

    You've got very little chance of recovering your system without a reboot. Note that even init(1M) has disappeared. Restarting it manually won't even do the trick, as it doesn't maintain its process table in a persistent place in Solaris 9.

    Now try the same kill -9 -1 on a non-critical Solaris 10 system. Again, all user processes are killed, including init, svc.startd, svc.configd, inetd, and everything else. You'll be logged out, but log back in and poke around. You'll see that nearly the entire system comes back (services started by their legacy init.d scripts won't, though). Having just done this experiment, I'll be filing bugs against a few services, but the experiment is generally successful. All of the core daemons I mentioned dying have returned (we kill them individually as part of our standard testing), and we give restarting all services a good college try.

    In order to implement restart from the kernel up completely, we needed a transactional backing store for all of our service information, including things like service state. If any of our daemons die halfway through an operation, they need to pick up where they left off when they return. Thus, the repository must be transactional to allow us to implement recoverability.

  2. Typed.

    We want to be able to validate that configuration information is at least of the appropriate form (see the svccfg sketch after this list). In the future, we expect to be able to do even further validation than just on the type.

  3. Single point of access.

    We wanted all configuration and runtime data access through a single API that can be maintained across release boundaries. Flat file administration tools usually allow access through multiple mechanisms -- e.g. editing the file directly or using an admin GUI which edits the file. This type of access reduces the ability to write event-based APIs -- e.g. "tell me if this service has changed configuration or state". While we don't have many of those APIs yet, they're coming. If we allowed vi as a tool to manipulate the repository, there's no precise way to provide the notification API. A single API also decreases the time to write layered administrative tools.

  4. Access control.

    Allow a subset of configuration changes to be safely delegated to non-root users, without requiring that those users be allowed to make all configuration changes. We, however, didn't include provisions for configuration data to be hidden from unprivileged users or applications. While modification is protected, reading is not.

  5. Layerable.

    Our configuration store must be designed to support a mix of configuration (with overrides) at the network level (shared among many machines) as well as at the local machine. This isn't here in Solaris 10, but we've designed for configuration to span multiple machines eventually. It was important that our initial implementation didn't impede that goal. It's easy to imagine that our underlying storage format for local data might not be the same as that for network data.

  6. Service/instance model specific.

    This is more of a non-goal, but we didn't want to design for a general data storage model. We wanted to constrain ourselves to the general service/instance schema that we've designed for smf(5). That isn't to say the schema couldn't be abused for other data, but we didn't design to make it easy. I realize that perhaps nobody's done the service/instance split blog entry yet. I mentioned it in passing in the service developer intro, but will try to write a dedicated entry later.

  7. Rollback.

    Allow administrators to easily revert to previous configuration versions. This is sometimes solved manually with a revision control system (e.g. SCCS) and flat text files.

  8. Checkable consistency.

    We should be able to confirm on startup that at least the format of the system's configuration data looks sane. Obvious filesystem corruption should be flagged explicitly rather than parsed as lack of or incorrect configuration.

  9. Fast.

    It's pretty tricky to implement a structured, typed, and transactional common store as flat files that's still quick enough for the state-change updates we need to do. Binary-format files are usually the way to go. Some other projects doing parallel startup only use a binary cache of plain text files, but that doesn't handle the other design criteria we had. I'm sure there will be comments pointing me at projects that have solved this problem using plain text files, but the ability to leverage existing code can decrease development time.

  10. Endian-neutral export.

    Allow export of all configuration data in an endian-neutral format, so that configuration can easily be moved from machine to machine, regardless of architecture. An easy way to marshal the data out of a machine-specific format and into a standard format (e.g. XML) was considered sufficient.

  11. Embeddable.

    Any open-source solution we use must be embeddable in a commercial product without licensing/royalty issues. Obviously, writing something ourselves easily gets around this constraint, but that would be a pretty significant additional investment over the implementation we already needed to do for smf(5).
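
Here's the svccfg sketch promised in the "Typed." criterion above. svccfg(1M) requires an explicit type when setting a property, and rejects values that don't parse as that type; the property change below is purely illustrative (don't try it on a repository you care about):

   $ svccfg -s system-log setprop start/timeout_seconds = count: 60
   $ svccfg -s system-log listprop start/timeout_seconds
   start/timeout_seconds  count    60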

Based on this set of design considerations, we decided on a four-part scheme.

  1. Service Manifests: XML service descriptions provide a transportable way to deliver individual services. Knowledge of neither the underlying data format nor the full service creation API is necessary for simple service delivery.

  2. libscf: A library provides the fundamental API which all tools can build on. In addition to providing transactional create/change/delete semantics, this also allows us to write tools which dump the repository in a standard format. Try svccfg archive (redirect the output to a file; see the sketch after this list) to dump the existing configuration of all services and instances in our standard XML manifest format. While it doesn't contain things like snapshot information, it does provide all the information that's necessary to restore a system to its current configuration.

  3. svc.configd: A daemon to manage the data store, providing a single point of access to underlying data for security, layering, etc.

  4. Repository/SQLite: A back-end transactional database to provide file-level storage for smf(5) configuration data. To be precise, we've got two backing databases. One is for persistent property groups which, well, persist across system restart. It's located in /etc/svc/repository.db. The other holds non-persistent properties, such as states, which don't need to be kept across system restart. The non-persistent database is kept in /etc/svc/volatile/svc_nonpersist.db.
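
And here's the svccfg archive invocation promised in item 2; the destination file name is, of course, up to you:

   $ svccfg archive > /var/tmp/all-services.xml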

We decided to use SQLite for the local-repository implementation because there was simply no need to re-invent the wheel and implement a transactional database ourselves. SQLite fit all of our other design criteria. However, we haven't exposed that implementation in the interfaces. If SQLite no longer fulfills all of our requirements, we'll change to using a different underlying implementation. Existing code based on libscf(3LIB) or svcprop(1) will continue to work unmodified. That's the nice thing about hiding the data format behind a set of standard interfaces.

By the way, now that I've pulled back the covers on our implementation, I should give the warning: direct access to the underlying repository is completely unsupported. If you scrog your repository using direct (sqlite) access, you're on your own. If you'd like to take a copy of repository.db and poke around in it, go for it! But, don't muck with the running copy lest you end up with a 'repository corrupt' message.

Friday Feb 04, 2005

smf milestones, runlevels, and system maintenance

A number of questions about smf(5) milestones have been surfacing lately, so I'll try to give a summary of the topic and answer a few common questions here.

An smf(5) milestone is really nothing more than a service which aggregates a bunch of service dependencies. Usually, a milestone does nothing useful itself, but declares a specific state of system-readiness which other services can depend upon. One example is the name-services milestone. It simply depends upon the possible name services you might be running:

   $ svcs -d name-services
   STATE          STIME    FMRI
   disabled       Jan_04   svc:/network/rpc/nisplus:default
   disabled       Jan_04   svc:/network/dns/client:default
   disabled       Jan_04   svc:/network/ldap/client:default
   online         Jan_04   svc:/network/nis/client:default

and has no useful actions to perform during the start or stop method:

   $ svcprop -p start name-services
   start/exec astring :true
   start/timeout_seconds count 3
   start/type astring method

   $ svcprop -p stop name-services 
   stop/exec astring :true
   stop/timeout_seconds count 3
   stop/type astring method

The name-services milestone is considered online as long as any enabled name services are running. There's also nothing different about these milestones to smf(5); it just sees them as yet another service.

We've implemented standard Unix system run-levels in smf(5) using milestones. The single-user, multi-user, and multi-user-server milestones correspond to run-levels S, 2, and 3, respectively. In addition to the runlevel milestones, there are the all and none keywords. These aren't actual services, but shorthand for either the graph with no services, or the graph with all services. This set of five special milestones can either be booted directly to (boot -m milestone=) or reached by running svcadm milestone. As mentioned in a previous entry, the way we reach a limited milestone (any special milestone but all) is to temporarily disable all services which aren't part of the milestone's subgraph.

A common question is why the console-login service is disabled if you boot to a milestone that isn't all. This can easily be determined by looking at console-login's dependents.

   $ svcs -D console-login
   STATE          STIME    FMRI

As there are no milestones which have console-login as one of their dependencies, it won't be started as part of any milestone but all. Fortunately, we'll always start an sulogin(1M) prompt if a login service can't be reached.

So, why are milestones useful then? The most useful milestone is none, for the recovery/exploration scenario I described here. The other use is when doing service development. You can use svcadm milestone to transition to a limited milestone and then back up, without rebooting the system.
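
For example, here's a quick sketch of dropping to a limited milestone and coming back up, run as root (only try this on a scratch system -- services outside the milestone's subgraph are temporarily disabled, as described above):

   # svcadm milestone svc:/milestone/multi-user:default
   # svcadm milestone all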

There's a large omission in my description of milestone use above. I don't mention system maintenance or patching anywhere. A very common question is: Should I stop using init s, boot -s, and my other standard procedures to change runlevels and perform standard system maintenance? Emphatically, no! Your old favorite commands continue to work as they always have. There's no need to change procedures. There's no reason to retrain your fingers with a much longer-to-type command when init s works just fine. The init invocations will work just like they always have, whereas svcadm milestone won't. For example, running svcadm milestone svc:/milestone/single-user:default won't change the run-level of the system (as described by who -r). Running init s will.
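
You can verify that with who(1) yourself. A sketch; the output below just illustrates the format, and the run-level field would read S after init s:

   # who -r
      .       run-level 3  Feb  4 19:30     3      0  S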
