Solaris Deployment and Kernel Development with Diskless Clients
By tpm on Aug 09, 2005
The Kernel Bringup Cycle
One of the things that characterizes bringup projects (particularly ones based on printf-debugging(!)) is a repeated cycle of fix-test-debug. This is different from incremental development. You're not working on one particular subsystem or tricky bug in a 99% working system; instead, you're continually moving from subsystem to subsystem as the bringup progresses.
If you do disk-based bringup, then to install the next version of the system you have to boot that system using a working kernel, install the bits, and then reboot to try them out. This can be quite a pain on a completely new machine, where a working kernel may not exist at all, in which case you're continually recabling the disk. Even if you can boot it, you quickly tire of listening to the disk clank and whirr, and of watching the BIOS chug through its tests to announce the same things it has (re-)detected over and over again.
Diskless bringup using NFS is a lot faster and (once you tweak the configuration correctly) simpler. You just place the new kernel bits you want to test onto an NFS server, then boot the client machine. Booting diskless domains under Xen is simpler still, because there's no BIOS involved at all: Xen's domain builder is vastly simpler and faster.
Once I/O happens over the network, you can easily observe what the client kernel is actually doing via snoop(1M), from the first RARP and ARP attempts through to fully fledged NFS traffic between client and server.
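To make that progression concrete, here is a minimal Python sketch (not snoop itself, and not something from the original post) that labels raw Ethernet II frames by the stage they belong to: RARP (EtherType 0x8035), ARP (0x0806), and NFS, which is assumed here to run over UDP or TCP on its conventional port 2049. The names classify_frame, boot_stages, make_frame and make_udp_ip are hypothetical. Anything that isn't RARP, ARP or NFS is lumped into "ip" or "other".

```python
import struct

# Well-known EtherType values.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_RARP = 0x8035
# Assumption: NFS traffic uses the conventional port 2049.
NFS_PORT = 2049


def classify_frame(frame: bytes) -> str:
    """Label a raw Ethernet II frame as 'rarp', 'arp', 'nfs', 'ip', 'other' or 'runt'."""
    if len(frame) < 14:  # shorter than an Ethernet header
        return "runt"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_RARP:
        return "rarp"
    if ethertype == ETHERTYPE_ARP:
        return "arp"
    if ethertype != ETHERTYPE_IPV4:
        return "other"
    ip = frame[14:]
    if len(ip) < 20:  # truncated IPv4 header
        return "other"
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    proto = ip[9]             # 6 = TCP, 17 = UDP
    if proto not in (6, 17) or len(ip) < ihl + 4:
        return "ip"
    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    return "nfs" if NFS_PORT in (src_port, dst_port) else "ip"


def boot_stages(frames):
    """Collapse a capture into the ordered boot stages seen, ignoring repeats."""
    stages = []
    for frame in frames:
        label = classify_frame(frame)
        if label in ("rarp", "arp", "nfs") and (not stages or stages[-1] != label):
            stages.append(label)
    return stages


def make_frame(ethertype: int, payload: bytes = b"") -> bytes:
    """Build a broadcast Ethernet II frame from a made-up source MAC (test helper)."""
    dst = b"\xff" * 6
    src = b"\x08\x00\x20\x00\x00\x01"
    return dst + src + struct.pack("!H", ethertype) + payload


def make_udp_ip(dst_port: int) -> bytes:
    """Minimal IPv4 (IHL 5, protocol UDP) + UDP header; other fields zeroed (test helper)."""
    ip_hdr = bytes([0x45, 0]) + b"\x00" * 7 + bytes([17]) + b"\x00" * 10
    udp_hdr = struct.pack("!HHHH", 1023, dst_port, 8, 0)
    return ip_hdr + udp_hdr


if __name__ == "__main__":
    capture = [
        make_frame(ETHERTYPE_RARP),
        make_frame(ETHERTYPE_RARP),  # retries collapse into one stage
        make_frame(ETHERTYPE_ARP),
        make_frame(ETHERTYPE_IPV4, make_udp_ip(NFS_PORT)),
    ]
    print(boot_stages(capture))  # ['rarp', 'arp', 'nfs']
```

In practice you would feed frames from a real capture rather than synthesized ones. The point is only that each stage is distinguishable from the headers alone, which is what makes watching a diskless boot on the wire so informative.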
Finally, of course, it's easier to work on the disk driver this way too, with a fully functional diskless system around you. It also helps that your boot image isn't placed at quite the same level of risk as when you test a prototype driver from a local disk.
One of the key things that Xen can do is transparent workload migration, that is, the ability to move a running pure-virtual domain from machine to machine with almost imperceptible downtime. Diskless operation is a natural environment for exploring domain migration across a pool of machine resources in a data center, both because of the general advantages of file-based protocols and because the domain's state already lives in storage across the network rather than on a local disk.
Diskless operation is also a means of managing the multiple OS images and patch levels for all the virtual machine environments that you might want to create on a pool of hardware resources. That is one of the biggest problems with large-scale virtual machine deployments, and one that OS virtualization technologies like Solaris Zones neatly avoid.
While we're on this topic, I thought people might be interested in this little vignette. I attended a virtualization BOF at the Ottawa Linux Symposium a week or two ago. People from Red Hat, VMware and IBM spoke, and I was quite surprised when one of the IBM VM technologists stood up and said that Solaris Zones solved the problem of managing multiple OS images really well, that their customers were asking for it, and that he wished the Linux community could extend projects like vservers to solve those problems in a similar way. Since IBM has been working on virtualization technologies for many, many years, and I have a lot of respect for their technology and experience in this area, I took that as quite a compliment to what we built in Solaris 10.
There is no "one-size-fits-all" virtualization technology; each has its own advantages and disadvantages. Eric Schrock wrote a great explanation of the relationship between the technologies, i.e., where OS virtualization technology like Zones is useful, and where hardware virtualization technology like Xen can help. I strongly believe the two technologies are complementary, and together will allow customers to balance utilization and isolation while keeping a lid on complexity in large-scale deployments.