ZFS, Live Upgrade and Flash Archive - happy together at last
By Jsavit-Oracle on Sep 25, 2009
Nothing really controversial today - just some pleasant experiences with relatively recent enhancements to Solaris.
One of the really great features of Solaris 10 is ZFS, described in many places, such as http://blogs.oracle.com/. ZFS really is a tremendous advance in filesystems - an elegant solution providing long-needed improvements for data integrity, usability, and performance. (Repeat after me: "No more fsck. No more fsck. No more fsck.")
ZFS Boot and Live Upgrade
Since last fall, with Solaris 10 10/08, it's been possible to use ZFS as a boot file system, too. If ever there's a filesystem you want to have immunized against failures, it's the boot environment. Now, you can use a mirrored ZFS boot environment to protect against media failures, enable compression to save disk space, take snapshots to preserve the point-in-time image of your system as a safeguard against "Ooops!", and other benefits.
You can also use ZFS with Live Upgrade, which lets you update system software on an alternate boot environment (ABE) separate from the currently-running system. This lets you safely apply system changes, but with UFS filesystems you have the aggravation of having to plan for and allocate a disk slice for each ABE.
With a ZFS boot filesystem, you just specify the name of the ZFS pool when you create the ABE. Live Upgrade then takes a ZFS snapshot and builds the environment on a clone of it. Because ZFS uses a "copy on write" architecture, you don't waste disk space on duplicate copies of unchanged disk contents in each boot environment. Only changed bytes are stored. The result is that you save a tremendous amount of disk space, creating a boot environment is much faster, and you can fit many, many more of them on the same disk.
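As a sketch of what Live Upgrade is doing under the covers (the dataset names here mirror the example below but are otherwise illustrative), the snapshot-and-clone sequence looks like this:

```shell
# Snapshot the current root dataset - instantaneous, no data is copied.
zfs snapshot rpool/ROOT/s10s_u7wos_08@s10u7patched

# Clone the snapshot into a new dataset for the alternate boot environment.
# The clone shares all unchanged blocks with the original root.
zfs clone rpool/ROOT/s10s_u7wos_08@s10u7patched rpool/ROOT/s10u7patched

# The clone initially consumes almost nothing; only new writes use space.
zfs list -o name,used,refer rpool/ROOT/s10u7patched
```

Because only modified blocks are ever written, each additional boot environment costs roughly the size of its changes, not a full copy of the OS.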
For example, see the creation of an alternate boot environment below. It only took a few minutes, and the disk footprint is just a few hundred KB for metadata, instead of several GB.
# lucreate -c s10u7 -n s10u7patched -p rpool
Analyzing system configuration.
No name for current boot environment.
Current boot environment is named <s10u7>.
Creating initial configuration for primary boot environment <s10u7>.
The device </dev/dsk/c0t0d0s0> is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name <s10u7> PBE Boot Device </dev/dsk/c0t0d0s0>.
Comparing source boot environment <s10u7> file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <s10u7patched>.
Source boot environment is <s10u7>.
Creating boot environment <s10u7patched>.
Cloning file systems from boot environment <s10u7> to create boot environment <s10u7patched>.
Creating snapshot for <rpool/ROOT/s10s_u7wos_08> on <rpool/ROOT/s10s_u7wos_08@s10u7patched>.
Creating clone for <rpool/ROOT/s10s_u7wos_08@s10u7patched> on <rpool/ROOT/s10u7patched>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/s10u7patched>.
Population of boot environment <s10u7patched> successful.
Creation of boot environment <s10u7patched> successful.

# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
s10u7                      yes      yes    yes       no     -
s10u7patched               yes      no     no        yes    -

gilbert / # zfs list
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
rpool                                  20.2G  55.1G    94K  /rpool
rpool/ROOT                             3.38G  55.1G    18K  legacy
rpool/ROOT/s10s_u7wos_08               3.38G  55.1G  3.38G  /
rpool/ROOT/s10s_u7wos_08@s10u7patched  69.5K      -  3.38G  -
rpool/ROOT/s10u7patched                 111K  55.1G  3.38G  /
...snip for brevity...
Note the tiny disk space needed for the snapshot (the object with the "@" in its name) and the new boot environment. Now, I can go off and apply patches or install software to the alternate boot environment without affecting the running environment. When I'm done, I activate the alternate boot environment, reboot, and I'm running in the updated world. If I'm unhappy with the results, or if this was just a test period for the new OS level, I can fall back easily and safely to the unaltered original software environment.
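The patch-activate-fallback cycle just described uses the standard Live Upgrade commands; the patch directory path below is hypothetical:

```shell
# Apply patches to the alternate BE without touching the running system.
luupgrade -t -n s10u7patched -s /var/tmp/patches

# Mark the patched BE as the one to boot next, then reboot into it.
# Use 'init 6' rather than 'reboot' so Live Upgrade's shutdown
# processing runs and the boot environment switch completes cleanly.
luactivate s10u7patched
init 6

# If the new environment misbehaves, fall back to the original BE.
luactivate s10u7
init 6
```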
Did I mention that you can also put the zoneroot of a Solaris Container on ZFS now? That leverages ZFS snapshots too, so cloning a zone (complete with customization) only takes a few seconds, and a new Container has a tiny disk footprint. Life is good.
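With a zoneroot on ZFS, cloning a configured zone is a snapshot-backed operation. A rough sketch (zone names and the configuration-copying step are illustrative):

```shell
# Create a configuration for the new zone based on the template zone's,
# rewriting the zone name and zonepath as it goes.
zonecfg -z template-zone export | sed 's/template-zone/web-zone/g' | zonecfg -z web-zone

# Clone the installed zone. On a ZFS zoneroot this is a ZFS clone,
# so it completes in seconds and consumes almost no disk space.
zoneadm -z web-zone clone template-zone

zoneadm -z web-zone boot
```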
Parenthetically, I just have to say that after years working in datacenters, I've always been queasy at the thought of modifying the One Good Copy of the OS on a system that you know is working. If the new system code is bad, or Murphy's Law hits, you have a lot of trouble on your hands. Live Upgrade is a Good Thing.
Something's missing - where's the flash?
Solaris administrators have long enjoyed the "Flash Archive" feature to create many cloned environments from a single "golden image". An administrator could configure a system, install software and fixes, test it, then save all or part of its contents in a "flash archive" (usually called a "flar"), and then use Jumpstart to install as many machines with that system image as needed. That makes provisioning identical machines an almost completely hands-off activity. You can also make "differential flash archives" that only include differences (files added, removed, and changed) from a base (full) archive. With that, you can install a system from a full archive, and then quickly customize it for individual servers or apply additional changes. This saves a lot of disk space, as you don't need to keep full archives for each variation. This is very nicely described at Joerg Moellenkamp's excellent Less Known Solaris Features - Jumpstart pages.
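A differential archive is made by pointing flarcreate at the base image with -A; the archive names and paths below are illustrative:

```shell
# Create the full (base) archive from the golden system.
flarcreate -n base-image /flar/base-image.flar

# Later, after customizing the system, capture only the differences
# relative to the base archive (-A names the base image to diff against).
flarcreate -n webserver-delta -A /flar/base-image.flar /flar/webserver-delta.flar
```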
Unfortunately, Flash could not be used in ZFS boot environments until recently. If you tried to use it you would get an error message saying (essentially) that it wasn't supported. The functionality was so useful that my colleague Scott Dickson came up with ingenious ways to have a Flash-like capability with ZFS boot. He described that in blog entries: Flashless System Cloning with ZFS and A Much Better Way to use Flash and ZFS Boot. The first blog describes how to get the effects of a flash install, and the second is a very clever way to use a custom jumpstart profile to provision a ZFS boot environment from a flash archive. Very elegant, and solved the problem till an official solution arrived.
Now it can be done directly
Fortunately, there is now official support for Flash Archives when using ZFS boot. You have to apply particular patches to enable this feature:
- 119534-15: fixes to the /usr/sbin/flarcreate and /usr/sbin/flar commands
- 124630-26: updates to the install software
- 119535-15: fixes to the /usr/sbin/flarcreate and /usr/sbin/flar commands
- 124631-27: updates to the install software
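You can check whether the required patch revisions (or later ones) are already installed with showrev, and add any that are missing; the patch directory path is hypothetical:

```shell
# List installed patches and look for the Flash/install patches above.
# A revision equal to or higher than the ones listed is sufficient.
showrev -p | egrep '119534|119535|124630|124631'

# Apply a missing patch from a downloaded, unpacked patch directory.
patchadd /var/tmp/119534-15
```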
Once you have this software installed, you can proceed in a more direct manner:
# flarcreate -n s10u7patched /export/home/flar/s10u7patched.flar
Full Flash
Checking integrity...
Integrity OK.
Running precreation scripts...
Precreation scripts done.
Determining the size of the archive...
The archive will be approximately 16.97GB.
Creating the archive...
Archive creation complete.
Running postcreation scripts...
Postcreation scripts done.
Running pre-exit scripts...
Pre-exit scripts done.
That's a pretty enormous flash archive, isn't it? I was making a flash archive that contained a software repository, and multiple copies of Solaris virtual disks for Logical Domains (each of them taking up a few GB). I later did a custom jumpstart using the following profile, and I was off to the races:
install_type      flash_install
archive_location  nfs 192.168.2.4:/export/home/flar/s10u7patched.flar
partitioning      explicit
pool rpool auto auto auto c0t0d0s0
Voila! Once I jumpstarted the target machine, it inhaled the flash archive and the new system image was built from it without me having to type anything.
Here it is in all the gory detail:
System identification complete.
Starting Solaris installation program...
Searching for JumpStart directory...
Using rules.ok from 192.168.2.4:/jumpstart.
Checking rules.ok file...
Using profile: prescott_prof
Executing JumpStart preinstall phase...
Searching for SolStart directory...
Checking rules.ok file...
Using begin script: install_begin
Using finish script: patch_finish
Executing SolStart preinstall phase...
Executing begin script "install_begin"...
Begin script install_begin execution completed.
WARNING: Flash install: Pool name specified in profile will be ignored
Processing profile
        - Opening Flash archive
        - Validating Flash archive
        - Selecting all disks
        - Configuring boot device
        - Configuring / (c0t0d0s0)
Verifying disk configuration
Verifying space allocation
Preparing system for Flash install
Configuring disk (c0t0d0)
        - Creating Solaris disk label (VTOC)
        - Creating pool rpool
Beginning Flash archive processing
Predeployment processing
16 blocks
32 blocks
16 blocks
No local customization defined
Extracting archive: s10u7patched
        Extracted    0.00 MB (  0% of 17386.52 MB archive)
        Extracted    1.00 MB (  0% of 17386.52 MB archive)
        Extracted    2.00 MB (  0% of 17386.52 MB archive)
        ...
        Extracted 17386.52 MB (100% of 17386.52 MB archive)
        Extraction complete
Postdeployment processing
No local customization defined
        - Finishing post-flash pool setup for pool rpool
        - Creating swap zvol for pool rpool
        - Creating dump zvol for pool rpool
Customizing system files
        - Mount points table (/etc/vfstab)
        - Network host addresses (/etc/hosts)
        - Environment variables (/etc/default/init)
Cleaning devices
Customizing system devices
        - Physical devices (/devices)
        - Logical devices (/dev)
Installing boot information
        - Installing boot blocks (c0t0d0s0)
        - Installing boot blocks (/dev/rdsk/c0t0d0s0)
Installation log location
        - /a/var/sadm/system/logs/install_log (before reboot)
        - /var/sadm/system/logs/install_log (after reboot)
Flash installation complete
Executing JumpStart postinstall phase...
The begin script log 'begin.log' is located in /var/sadm/system/logs after reboot.
Creating boot_archive for /a
updating /a/platform/sun4v/boot_archive
15+0 records in
15+0 records out
syncing file systems... done
rebooting...
Pretty geeky stuff, huh? Note that while you use the -x option to exclude UFS directories from the flash archive, you use the -D option to exclude ZFS datasets.
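For example (the excluded paths and dataset names here are illustrative):

```shell
# UFS-style exclusion: skip a directory tree by path.
flarcreate -n s10u7patched -x /export/home/isos /flar/s10u7patched.flar

# ZFS-style exclusion: skip an entire dataset by name.
flarcreate -n s10u7patched -D rpool/export/isos /flar/s10u7patched.flar
```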
Life is good and flashy
If I wanted to, I could replicate this system on as many servers as I have, without hands-on work for each. I recently worked with a customer deploying 100 new Sun servers, and this capability is a life saver for knocking out cookie-cutter systems. Humans are just not good at doing the same thing many times in a row - we don't do "identical" and "fast" very well. Flash solves that problem for Solaris installs, and it now works with ZFS boot too, for the best of both worlds.
I should mention: Flash and directly installing systems is very Old School. That's okay by me - I'm Old School, too. But it still doesn't scale as far as we'd like, and can still be labor- and skill-intensive. Automation is the way to go for discovery, mass provisioning, monitoring and management. For that, I recommend you take a look at Ops Center.
An update: Solaris 10 10/09 is available now, and includes the patches needed for this support. Another edit Dec 4, 2012: clean up URLs (content otherwise unchanged).