Flashless System Cloning with ZFS

Ancient History

Gather round, kiddies, and let Grandpa tell you a tale of how we used to clone systems before we had Jumpstart and Flash, when we had to carry water in leaky buckets 3 miles through snow up to our knees, uphill both ways.

Long ago, a customer of mine needed to deploy 600(!) SPARCstation 5 desktops all running SunOS 4.1.4. Even then, this was an old operating system, since Solaris 2.6 had recently been released. But it was what their application required. And we only had a few days to build and deploy these systems.

Remember that Jumpstart did not exist for SunOS 4.1.4, and Flash did not exist for Solaris 2.6. So, our approach was to build a system, a golden image, the way we wanted it to be deployed and then use ufsdump to save the contents of the filesystems. Then, we were able to use Jumpstart from a Solaris 2.6 server to boot each of these workstations. Instead of having a Jumpstart profile, we only used a finish script that partitioned the disks and restored the ufsdump images. So Jumpstart just provided us a clean way to boot these systems and apply the scripts we wanted to them.

Solaris 10 10/08, ZFS, Jumpstart and Flash

Now, we have a bit of a similar situation. Solaris 10 10/08 introduces ZFS boot to Solaris, something that many of my customers have been anxiously awaiting for some time. A system can be deployed using Jumpstart and the ZFS boot environment created as a part of the Jumpstart process.

But. There's always a but, isn't there?

But, at present, Flash archives are not supported (and in fact do not work) as a way to install into a ZFS boot environment, either via Jumpstart or via Live Upgrade. Turns out, they use the same mechanism under the covers for this. This is CR 6690473.

So, how can I continue to use Jumpstart to deploy systems, and continue to use something akin to Flash archives to speed and simplify the process?

Turns out the lessons we learned years ago can be used, more or less. Combine the idea of the ufsdump with some of the ideas that Bob Netherton recently blogged about (Solaris and OpenSolaris coexistence in the same root zpool), and you can get to a workaround that might be useful enough to get you through until Flash really is supported with ZFS root.

Build a "Golden Image" System

The first step, as with Flash, is to construct a system that you want to replicate. The caveat here is that you use ZFS for the root of this system. For this example, I have left /var as part of the root filesystem rather than a separate dataset, though this process could certainly be tweaked to accommodate a separate /var.

Once the system to be cloned has been built, you save an image of the system. Rather than using flarcreate, you will create a ZFS send stream and capture this in a file. Then move that file to the jumpstart server, just as you would with a flash archive.

In this example, the ZFS bootfs has the default name - rpool/ROOT/s10s_u6wos_07.


golden# zfs snapshot rpool/ROOT/s10s_u6wos_07@flar
golden# zfs send -v rpool/ROOT/s10s_u6wos_07@flar > s10s_u6wos_07_flar.zfs
golden# scp s10s_u6wos_07_flar.zfs js-server:/flashdirectory
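
If the bootfs on your golden system does not have the default name, the dataset can be looked up rather than typed in. The following is only a sketch: parse_bootfs, archive_name and capture_golden are hypothetical helper names, not part of Solaris, and it assumes that zpool get prints its usual NAME / PROPERTY / VALUE / SOURCE columns.

```shell
# Read `zpool get bootfs <pool>` output on stdin and print the VALUE
# column of the bootfs row (assumes the usual four-column layout).
parse_bootfs() {
    awk '$2 == "bootfs" { print $3 }'
}

# Turn a dataset name into an archive file name, for example
# rpool/ROOT/s10s_u6wos_07 becomes s10s_u6wos_07_flar.zfs
archive_name() {
    name=`basename "$1"`
    echo "${name}_flar.zfs"
}

# Snapshot the current bootfs of the given pool and save the send
# stream to a file in the current directory. Needs ZFS, so it is
# only defined here, not called.
capture_golden() {
    pool=$1
    bootfs=`zpool get bootfs "${pool}" | parse_bootfs`
    zfs snapshot "${bootfs}@flar"
    zfs send -v "${bootfs}@flar" > "`archive_name "${bootfs}"`"
}
```

On the golden system, capture_golden rpool would then produce the same file as the commands above, ready to copy to the Jumpstart server.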

How do I get this on my new server?

Now, we have to figure out how to have this ZFS send stream restored on the new clone systems. We would like to take advantage of the fact that Jumpstart will create the root pool for us, along with the dump and swap volumes, and will set up all of the needed bits for the booting from ZFS. So, let's install the minimum Solaris set of packages just to get these side effects.

Then, we will use Jumpstart finish scripts to create a fresh ZFS dataset and restore our saved image into it. Since this new dataset will contain the old identity of the original system, we have to reset our system identity. But once we do that, we are good to go.

So, set up the cloned system as you would for a hands-free jumpstart. Be sure to specify the sysid_config and install_config bits in /etc/bootparams. The manual Solaris 10 10/08 Installation Guide: Custom JumpStart and Advanced Installations covers how to do this. We add to the rules file a finish script (I called mine loadzfs in this case) that will do the heavy lifting. Once Jumpstart installs Solaris according to the profile provided, it then runs the finish script to finish up the installation.
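
As a rough sketch of the rules file entry, assuming a single catch-all rule is enough for your site: write_rules is a hypothetical helper and zfs_profile is a made-up profile file name, while loadzfs is the finish script named above. The fields of a rule are the match keyword and value, the begin script, the profile, and the finish script, with "-" meaning none.

```shell
# write_rules DIR PROFILE FINISH
# Writes a one-line rules file into DIR: match any client, run no
# begin script, install with PROFILE, then run FINISH.
write_rules() {
    dir=$1
    profile=$2
    finish=$3
    echo "any - - ${profile} ${finish}" > "${dir}/rules"
}

# Example, run on the Jumpstart server from the Jumpstart directory:
#   write_rules . zfs_profile loadzfs
#   ./check     # validates rules and the profile, then writes rules.ok
```

A real rules file would more likely match on hostname or network so that only the intended clients get this treatment.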

Here is the Jumpstart profile I used. This is a basic profile that installs the base, required Solaris packages into a ZFS pool mirrored across two drives.


install_type    initial_install
cluster         SUNWCreq
system_type     standalone
pool            rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
bootenv         installbe bename s10u6_req

The finish script is a little more interesting since it has to create the new ZFS dataset, set the right properties, fill it up, reset the identity, etc. Below is the finish script that I used.


#!/bin/sh -x

# TBOOTFS is a temporary dataset used to receive the stream
TBOOTFS=rpool/ROOT/s10u6_rcv

# NBOOTFS is the final name for the new ZFS dataset
NBOOTFS=rpool/ROOT/s10u6f

MNT=/tmp/mntz
FLAR=s10s_u6wos_07_flar.zfs
NFS=serverIP:/export/solaris/Solaris10/flash

# Mount directory where archive (send stream) exists
mkdir ${MNT}
mount -o ro -F nfs ${NFS} ${MNT}

# Create file system to receive ZFS send stream &
# receive it.  This creates a new ZFS snapshot that
# needs to be promoted into a new filesystem
zfs create ${TBOOTFS}
zfs set canmount=noauto ${TBOOTFS}
zfs set compression=on ${TBOOTFS}
zfs receive -vF ${TBOOTFS} < ${MNT}/${FLAR}

# Create a writeable filesystem from the received snapshot
zfs clone ${TBOOTFS}@flar ${NBOOTFS}

# Make the new filesystem the top of the stack so it is not dependent
# on other filesystems or snapshots
zfs promote ${NBOOTFS}

# Don't automatically mount this new dataset, but allow it to be mounted
# so we can finalize our changes.
zfs set canmount=noauto ${NBOOTFS}
zfs set mountpoint=${MNT} ${NBOOTFS}

# Mount newly created replica filesystem and set up for
# sysidtool.  Remove old identity and provide new identity
umount ${MNT}
zfs mount ${NBOOTFS}

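# This section essentially forces sysidtool to reset system identity at
# the next boot.  During the install the root pool lives under the
# alternate root /a, so the dataset shows up at /a/${MNT}.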
touch /a/${MNT}/reconfigure
touch /a/${MNT}/etc/.UNCONFIGURED
rm /a/${MNT}/etc/nodename
rm /a/${MNT}/etc/.sysIDtool.state
cp ${SI_CONFIG_DIR}/sysidcfg /a/${MNT}/etc/sysidcfg

# Now that we have finished tweaking things, unmount the new filesystem
# and make it ready to become the new root.
zfs umount ${NBOOTFS}
zfs set mountpoint=/ ${NBOOTFS}
zpool set bootfs=${NBOOTFS} rpool

# Get rid of the leftovers
zfs destroy ${TBOOTFS}
zfs destroy ${NBOOTFS}@flar
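
The identity-reset portion of the script above could also be pulled out into a small function, which makes it easy to try against a scratch directory before trusting it in a real install. This is just a sketch: reset_identity is a hypothetical name, and it uses rm -f (unlike the script above) so that a missing file is not treated as an error.

```shell
# reset_identity ROOT SYSIDCFG
# ROOT is the directory where the new root dataset is mounted;
# SYSIDCFG is the sysidcfg file to install for the next boot.
reset_identity() {
    root=$1
    sysidcfg=$2

    # Ask for a reconfiguration boot and mark the system unconfigured
    touch "${root}/reconfigure"
    touch "${root}/etc/.UNCONFIGURED"

    # Remove the identity inherited from the golden image
    rm -f "${root}/etc/nodename" "${root}/etc/.sysIDtool.state"

    # Provide the new identity
    cp "${sysidcfg}" "${root}/etc/sysidcfg"
}
```

In the finish script, the call would be something like reset_identity /a${MNT} ${SI_CONFIG_DIR}/sysidcfg, placed after the new dataset has been mounted.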

When we jumpstart the system, Solaris is installed, but it really isn't used. Then, we load from the send stream a whole new OS dataset, make it bootable, set our identity in it, and use it. When the system is booted, Jumpstart still takes care of updating the boot archives in the new bootfs.

On the whole, this is a lot more work than Flash, and is really not as flexible or as complete. But hopefully, until Flash is supported with a ZFS root and Jumpstart, this might at least give you an idea of how you can replicate systems and do installations that do not have to revert to package-based installation.

Many people use Flash as a form of disaster recovery. I think that this same approach might be used there as well. Still not as clean or complete as Flash, but it might work in a pinch.

So, what do you think? I would love to hear comments on this as a stop-gap approach.

Comments:

Very, very interesting!

My question is, is zfs receive faster than a .flar (which is basically a cpio(1) archive)?
Does using a compressed ZFS root really work? I'll have to try that!

Will you please blog on this again when the Flash(TM) fix becomes available?

You wrote that this fix isn't as complete. What is missing, or let's rephrase that, under which scenarios would your solution be inadequate/inappropriate?

Posted by UX-admin on December 05, 2008 at 07:52 PM EST #

I suspect that ZFS receive might be faster. It took on average about 700 seconds to do the receive across a 100Mb network between incredibly slow machines (server and client were both Sun E220 servers w/ 2x450MHz CPUs).

No, using ZFS here is not as complete as flar. For example, since Flash is integrated into the installer, log files are preserved and copied over. While I could have added that, I didn't, so you have to find the logs in the original boot environment. My approach requires ZFS on both ends. Presumably, a full Flash solution would not - you could create a ZFS system from a UFS original since it uses cpio under the covers - but that remains to be seen.

As for when this would be inadequate: I think that the disaster recovery uses of Flash would be the most difficult to reproduce and to manage with this, since it is not integrated into the installer. It requires a lot more work to make it happen, and in a DR situation, things need to be more streamlined, I suspect.

Additionally, this has no provision for layered or differential archives at all. Flash does.

But, this could be the basis of a real solution. Flash has fields that could conceivably allow for different ways to encapsulate a file system, I believe.

Posted by Scott Dickson on December 08, 2008 at 02:12 AM EST #

This looks pretty cool!! (as a stopgap)
You should be able to use rcnet as your initial cluster, you're never using it anyway.
You've inspired me to do some testing....

Posted by Mike Ramchand on December 08, 2008 at 02:22 AM EST #

I thought about using SUNWCrnet, but as it turns out, it has gotten so large that there was little point. 512MB vs 551MB. Used to be SUNWCrnet was really small.

Even so, when I tried things manually, SUNWCrnet lacked some of the key pieces. I know that when you are doing the receive, you are not actually running in that environment, but, in the end since the size was so similar, it was really a wash.

Posted by Scott Dickson on December 08, 2008 at 02:25 AM EST #

As you say, necessity is the mother of invention. Going back to the "ancient" days and retrieving the same type of logic is a great idea. Thank you for the "wisdom of a veteran".

Posted by Walter Frantz on December 10, 2008 at 03:43 AM EST #

Nice piece of work, Scott. Thanks for sharing!

Posted by Bob Netherton on December 10, 2008 at 01:58 PM EST #

I've got a working prototype JET module that does this now, just a bit more tidying, testing and feature proofing, and your method above becomes potentially a lot more painless! (I'll blog when done)

Posted by Mike Ramchand on December 11, 2008 at 10:17 PM EST #

Hi Scott, thanks for this, really useful.

That's perhaps a stupid question, but as I can't find the answer...
Is it possible to use this method on systems with different disk size? I mean for example, create the ZFS snapshot of a 1TB disk root zpool and deploy it on a 500GB disk?

Posted by Pierre Jean on April 28, 2009 at 06:24 AM EDT #

Hi Scott,

Thanks for reminding us of the good old days and the nice, tested ways.

Although I'm probably not half as old as you, I did many interesting OS migrations
between hardwares and filesystems, so I appreciate your recipe and sharing it :)

I don't think this recipe becomes immediately extinct even with the integration
of Flash into ZFS Root and Jumpstart recently (119534-15 and 124630-26 for SPARC,
119535-15 and 124631-27 for x86). :)

Hi Pierre,

Yes it is possible to use this method on systems with different disk size (as
you've probably discovered in the past months since your post ;) ).

The snapshot occupies only as much space as its data, and expands to as much when
received. In other words, snapshots (and datasets in general) are not limited by
your hardware configuration and pool layout. They are only limited by the pool's
available space. And pools are the layer tied to the physical implementation.

Comparing performance to cpio flash archives, I'd probably look at the fact that
unpacking the cpio archive requires writing filesystem metadata upon creating each
file (which leads to mechanical seeks, among other things), while receiving a zfs
stream gets the whole chunk of the filesystem at once.

//Jim

Posted by Jim Klimov on July 08, 2009 at 11:09 PM EDT #
