A Much Better Way to use Flash and ZFS Boot

A Different Approach

A week or so ago, I wrote about a way to get around the current limitation of mixing flash and ZFS root in Solaris 10 10/08. Well, here's a much better approach.

I was visiting with a customer last week and they were very excited to move forward quickly with ZFS boot in their Solaris 10 environment, even to the point of using this as a reason to encourage people to upgrade. However, when they realized that it was impossible to use Flash with Jumpstart and ZFS boot, they were disappointed. Their entire deployment infrastructure is built around using not just Flash, but Secure WANboot. This means that they have no alternative to Flash; the images deployed via Secure WANboot are always flash archives. So, what to do?

It occurred to me that in general, the upgrade procedure from a pre-10/08 update of Solaris 10 to Solaris 10 10/08 with a ZFS root disk is a two-step process. First, you have to upgrade to Solaris 10 10/08 on UFS and then use lucreate to copy that environment to a new ZFS ABE. Why not use this approach in Jumpstart?
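For the manual, non-Jumpstart case, that second step boils down to a handful of LiveUpgrade commands. Just as a sketch, assuming the system is already running Solaris 10 10/08 on UFS and that c0t0d0s0 is free to hold the new pool (the pool and boot environment names here are only placeholders):

zpool create rpool c0t0d0s0                 # pool that will hold the ZFS root
lucreate -c s10-ufs -n s10-zfs -p rpool     # copy the running UFS BE into the pool
luactivate s10-zfs                          # boot from the ZFS BE next time
init 6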

Turns out that it works quite nicely. This is a framework for how to do that. You likely will want to expand on it, since one thing this does not do is give you any indication of progress once it starts the conversion. Here's the general approach:

  • Create your flash archive for Solaris 10 10/08 as you usually would. Make sure you include all the appropriate LiveUpgrade patches in the flash archive.
  • Use Jumpstart to deploy this flash archive to one disk in the target system.
  • Use a finish script to add a conversion program to run when the system reboots for the first time. It is necessary to make this script run once the system has rebooted so that the LU commands run within the context of the fully built new system.
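Those pieces get wired together by an entry in the Jumpstart rules file, along these lines (the rule keyword and file names here are placeholders for whatever your setup uses):

any    -    -    s10u6-zfs.profile    s10u6-zfs.finish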

Details of this approach

Our goal when complete is to have the flash archive installed as it always has been, but to have it running from a ZFS root pool, preferably a mirrored ZFS pool. The conversion script requires two phases to complete this conversion. The first phase creates the ZFS boot environment and the second phase mirrors the root pool. In this example, our flash archive is called s10u6s.flar. We will install the initial flash archive onto the disk c0t1d0 and build our initial root pool on c0t0d0.

Here is the Jumpstart profile used in this example:


install_type    flash_install
archive_location nfs nfsserver:/export/solaris/Solaris10/flash/s10u6s.flar
partitioning    explicit
filesys         c0t1d0s1        1024    swap
filesys         c0t1d0s0        free    /

We specify a simple finish script for this system to copy our conversion script into place:

cp ${SI_CONFIG_DIR}/S99xlu-phase1 /a/etc/rc2.d/S99xlu-phase1
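If you want to be a bit more thorough, the finish script can also make sure the copied script is executable and owned correctly. A slightly fuller sketch, assuming the conversion script sits in the Jumpstart configuration directory alongside the profile:

#!/bin/sh
# copy the phase 1 conversion script into the freshly installed system
cp ${SI_CONFIG_DIR}/S99xlu-phase1 /a/etc/rc2.d/S99xlu-phase1
chmod 744 /a/etc/rc2.d/S99xlu-phase1
chown root:sys /a/etc/rc2.d/S99xlu-phase1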

You see what we have done: We put a new script into place to run at the end of rc2 during the first boot. We name the script so that it is the last thing to run. The x in the name makes sure that this will run after other S99 scripts that might be in place. As it turns out, the luactivate that we will do puts its own S99 script in place, and we want to come after that. Naming ours S99x makes it happen later in the boot sequence.

So, what does this magic conversion script do? Let me outline it for you:

  • Create a new ZFS pool that will become our root pool
  • Create a new boot environment in that pool using lucreate
  • Activate the new boot environment
  • Add the script to be run during the second phase of the conversion
  • Clean up a bit and reboot

That's Phase 1. Phase 2 has its own script, run at the same point in the next boot, which finishes mirroring the root pool. If you are satisfied with a non-mirrored pool, you can stop here and leave Phase 2 out. Or you might prefer to make this step a manual process once the system is built. But here's what happens in Phase 2:

  • Delete the old boot environment
  • Add a boot block to the disk we just freed. This example is SPARC, so use installboot. For x86, you would do something similar with installgrub.
  • Attach the disk we freed from the old boot environment as a mirror of the device used to build the new root zpool.
  • Clean up and reboot.

I have been thinking it might be worthwhile to add a third phase to start a zpool scrub, which will force the newly attached drive to be resilvered when it reboots. The first time something goes to use this drive, it will notice that it has not been synced to the master drive and will resilver it, so this is sort of optional.
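If you decide you want it, that third phase is tiny. A sketch, assuming Phase 2 drops it into /etc/rc2.d/S99xlu-phase3 the same way Phase 1 drops in the Phase 2 script:

zpool scrub rpool               # verify the newly attached mirror right away
rm /etc/rc2.d/S99xlu-phase3     # run only once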

The reason we add bootability explicitly to this drive is that currently, when a mirror is attached to a root zpool, a boot block is not automatically installed. If the master drive were to fail and you were left with only the mirror, the system would be unbootable. By adding a boot block to it, you can boot from either drive.
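The script below uses installboot because this example is SPARC. On an x86 system, the corresponding step would use installgrub against the same slice, roughly like this:

installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0   # SPARC
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0                  # x86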

So, here's my simple little script that got installed as /etc/rc2.d/S99xlu-phase1. Just to make the code a little easier for me to follow, I first create the script for phase 2, then do the work of phase 1.


cat > /etc/rc2.d/S99xlu-phase2 << EOF
# Phase 2: runs on the boot after the ZFS BE has been activated
ludelete s10u6-ufs                      # delete the old UFS boot environment
# put a ZFS boot block on the freed disk so either half of the mirror can boot
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0
zpool attach -f rpool c0t0d0s0 c0t1d0s0 # attach the freed disk as a mirror
rm /etc/rc2.d/S99xlu-phase2             # run only once
init 6
EOF
# Phase 1: runs at the end of the first boot of the flash-installed system
dumpadm -d swap                         # point the dump device at swap
zpool create -f rpool c0t0d0s0          # create the pool that will hold the ZFS root
lucreate -c s10u6-ufs -n s10u6 -p rpool # copy the running UFS BE into the pool
luactivate s10u6                        # make the ZFS BE the one to boot next
rm /etc/rc2.d/S99xlu-phase1             # run only once
init 6
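One easy improvement, given that the script gives no indication of progress: have each phase log what it is doing. Something like the following at the top of each phase script would do, with the log location being just a suggestion:

exec >> /var/tmp/xlu-conversion.log 2>&1    # capture everything this script does
echo "starting conversion phase: `date`"
set -x                                      # trace each command as it runs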

I think that this is a much better approach than the one I offered before, using ZFS send. This approach uses standard tools to create the new environment and it allows you to continue to use Flash as a way to deploy archives. The dependency is that you must have two drives on the target system. I think that's not going to be a hardship, since most folks will use two drives anyway. You will have to keep them as separate drives rather than using hardware mirroring. The underlying assumption is that you previously used SVM or VxVM to mirror those drives.

So, what do you think? Better? Is this helpful? Hopefully, this is a little Christmas present for someone! Merry Christmas and Happy New Year!

Comments:

But why exactly don't Flash(TM) archives work with a ZFS root pool?
What is the technical reason?

And, please, pretty please with sugar on top, would you provide an update as to the status of Flash(TM) and ZFS root?

Your customer isn't the only one for whom this is a show stopper. More than one major customer, and myself as well, depend on JumpStart(TM) + Flash(TM).

Flash(TM) is just a killer Solaris feature, that no other operating system has been able to get right.

Posted by UX-admin on December 23, 2008 at 05:45 AM EST #

I will have to leave it to some of the folks in engineering to comment on how we got to the current state. I do know that the bits in the installer are kind of antique code and fraught with peril.

As to status, again, the folks in engineering could comment on this better than I. They are the ones working on the schedule. I think there has been some active discussion of all of this on the install-discuss@opensolaris.org list. Dave Miner may have updated the status there. I am not certain. But my suggestion is to track things there.

I absolutely agree that Flash is way cool. Virtually every customer I talk to who has more than 3 or 4 systems lives by Flash.

And I am certain that the message of how important this is to so many customers has come through loud and clear. Now, it's a matter of getting the code done and out. That takes time and has to line up with release schedules already in play, unfortunately. I would love for this to be out tomorrow, but the physics of development keep that from happening.

Posted by Scott Dickson on December 23, 2008 at 07:14 AM EST #

This seems useful, but better handling of possible pre-existing disk labels needs to be considered, based on my testing:

A. If the mirror boot disk previously had an EFI label written to it, the label probably needs to be destroyed and recreated as an SMI label, right?

B. If slice 0 on the SMI labeled mirror disk was smaller than s2, then you get errors about the mirror disk being too small. I guess you could just use s2 instead of s0 for all of the disks, since we're using slices anyway. Or I could try comparing s0's size to s2's, and wipe s0 and replace it with the size of s2 with fmthard if they're not identical.

Posted by Gordon Marler on January 22, 2009 at 03:57 AM EST #

Absolutely agree. This was just a quick and dirty prototype to prove that it could be done. There are probably a lot more edge conditions and such that you would want to be on the lookout for.

Thanks for your interest!

Posted by Scott Dickson on January 22, 2009 at 07:05 AM EST #

Great article. I was able to make this work in our environment with systems that have the same hardware configuration.

Is it possible to use on a system that has a different disk configuration? For example the original flash archive may have disks c0t0d0 and c0t1d0 while the system you wish to deploy the archive on has c1t0d0 and c1t1d0.

Thanks again.

Posted by Josh Albright on April 29, 2009 at 06:20 AM EDT #

Josh,

Sure. There's no requirement that the drives on the parent and the child system be numbered the same or sized the same. The only thing about this approach is that it pretty much requires that you know what the drives you are going to use on the child system are called. The script that you use to complete everything has to know what disks to use for the installboot and for the creation of the ABE. But those don't have to be the same as the disks used on the parent at all.
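If you wanted to avoid hard-coding the disk names in the conversion script, you could have Phase 1 work out which disk the flash archive landed on from the running system and take the pool disk from a per-machine setting. Just a rough, untested sketch:

ROOTSLICE=`df -k / | awk 'NR==2 {print $1}'`         # e.g. /dev/dsk/c1t0d0s0
ROOTDISK=`basename $ROOTSLICE | sed 's/s[0-7]$//'`   # e.g. c1t0d0
POOLDISK=c1t1d0                                      # still has to come from your config
echo "flash archive is on $ROOTDISK, root pool will go on $POOLDISK"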

Posted by Scott Dickson on April 30, 2009 at 04:07 AM EDT #

Hi everyone

Regarding this comment

Flash(TM) is just a killer Solaris feature, that no other operating system has been able to get right.

miles away from the truth. AIX and HP-UX have mksysb and Ignite-UX respectively, which both do the job as well as, if not better than, a flash archive.

So, this is a workaround, similar to something that we would expect from the likes of Ubuntu or another open source Linux distribution. As soon as Sun comes up with a proper and supported approach, they will be catching up with the competition again. ZFS, by the way, is a superb and flexible filesystem, tonnes better than any LVM implementation on Linux, HP-UX or AIX.

Steve

Posted by Steve Burgess on May 06, 2009 at 04:52 AM EDT #

Okay, now that 124630-26 and 119534-15 are out, can someone please tell me if there's a way to get luupgrade -f to work with a ZFS root and a flash archive? This is something that would be really useful to me.

Of course, I can't say why without lending my firstborn to the public relations department here ... :-)

thanks again for your ingenious suggestions to date!

ttfn
mm
mark.o.michael@boeing.com

Posted by Mark Michael on July 09, 2009 at 12:50 PM EDT #

Hi Scott, is there a typo in your S99 script or has the syntax for the "ludelete" and "luactivate" commands changed? I'm using Solaris 10 5/09 s10s_u7wos_08 SPARC and both the "ludelete" and "luactivate" commands do NOT have a "-n" option.

Please advise,

Dwight...

Posted by Dwight Victor on August 19, 2009 at 07:20 AM EDT #
