Sunday Aug 04, 2013

Shrink a ZFS Root Pool, Solaris 11.1, SPARC

Revision info

  • Update III - 5-Aug-2013 1015am EDT - Further clarification on VARSHARE has been added.
  • Update II - 4-Aug-2013 6pm EDT -
    - A clarification has been added under the goals section.
    - A kind reviewer points out that I forgot about the new file system VARSHARE. An update was added to discuss it.
  • Update I - 4-Aug-2013 10am EDT - A kind reader points out an opportunity for jealousy, which has been addressed.

Summary

A root pool cannot be shrunk in a single operation, but it can be copied to a smaller partition. The Solaris 11 beadm command makes the task easier than it used to be. In this example, the pool is copied, and then mirrored.

  1. Use format to create a smaller partition on a new device, say c0tXs0
  2. # zpool create -f newpool c0tXs0
  3. # beadm create -a -d "smaller s11.1" -p newpool solaris-SRUnn
  4. Use {ok} probe-scsi-all and {ok} devalias to identify the new disk
  5. {ok} setenv boot-device diskNN
  6. Boot new system, and clean up or copy (zfs send/receive) other file systems from the old device (e.g. /export, /export/home, perhaps also swap, dump, and VARSHARE)
  7. Use zpool export - or use zpool destroy - to hide or destroy the original
  8. Use format to create the mirror partition, say c0tYs0
  9. # zpool attach -f newpool c0tXs0 c0tYs0
  10. Allow the resilver to complete
  11. At OBP, hunt down c0tY and boot the mirror

A detailed example follows.

Contents

1. Goal: shrink a root pool on a SPARC system.

a. Sidebar: Why?

b. A long-discussed feature, and a unicorn

c. Web Resources

(i) Other bloggers

(ii) Solaris 11.1 Documentation Library

2. Initial state: one very large rpool, mostly empty

3. Create newpool and new swap/dump

a. Delete old swap, use the new

b. Delete old dump, use the new

4. The actual copy

a. Let beadm do the work!

b. Thank you, beadm, for automatically taking care of:

(i) activation, (ii) bootfs, (iii) bootloader, and (iv) menu.lst

Update: Beadm missed one item....VARSHARE

5. Boot the new system (after a little OBP hunting)

6. Cleanup

a. Copy additional file systems

b. Hide - or delete - the original

7. Mirror the newpool

8. Final verification

Thank you

1. Goal: shrink a root pool on a SPARC system.

A large root pool occupies most of a disk. I would like to make it much smaller.

a. Sidebar: Why?

Why do I want to shrink? Because the desired configuration is:

  • Mirrored root pool
  • Large swap partition, not mirrored

That is, the current configuration is:

diskX                 
  c0tXs0 rpool        
    rpool/ROOT/solaris
    rpool/swap        

I do not want to just add a mirror, like this:

diskX                           diskY
  c0tXs0 rpool                    c0tYs0 rpool
    rpool/ROOT/solaris              solaris (copy)
    rpool/swap                      swap (copy)

Instead, the goal is to have twice as much swap, like so:

diskX                           diskY
  c0tXs0 rpool                    c0tYs0 rpool
    rpool/ROOT/solaris              solaris (copy)
  c0tXs1                          c0tYs1 
    swap                            more swap

Clarification: Bytes of disk vs. bytes of memory. At least one reader seemed to want a clarification of the point of the above. To be explicit:

  • A 2-way mirrored rpool with a ZFS swap volume of size N spends 2 x N bytes of disk space to provide backing store to N bytes of memory.
  • Two swap partitions, each of size N, spend 2 x N bytes of disk space and provide backing store to 2 x N bytes of memory.

As it happens, due to the planned workload, this particular system is going to need a lot of swap space. Therefore, I prefer to avoid mirrored swap.
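
To put concrete numbers on it, using the 200 GB slices created later in this walkthrough: a mirrored 200 GB swap zvol consumes 400 GB of disk but backs only 200 GB of memory, while two unmirrored 200 GB swap slices consume the same 400 GB of disk and back 400 GB of memory.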

b. A long-discussed feature, and a unicorn

The word "shrink" does not appear in the ZFS admin guide.

Discussions at an archived ZFS discussion group assert that the feature was under active development in 2007, but by 2010 the short summary was "it's hiding behind the unicorn".

Apparently, the feature is difficult, and demand simply has not been high enough.

c. Web Resources

Well, if there is no shrink feature, surely it can be done by other methods, right? Well....

(i) Other bloggers

If one uses Google to search for "shrink rpool", the top two blog entries that are returned appear to be relevant.

Both of the above are old, written well prior to the release of Solaris 11. Both also use x86 volumes and conventions, not SPARC.

Nevertheless, they contain some useful clues.

(ii) Solaris 11.1 Documentation Library

Since the above blog entries are dated, contemporary documentation from the Oracle Solaris 11.1 Documentation Library was also consulted, along with the corresponding man pages.

2. Initial state: one very large rpool, mostly empty

Here are the initial pools, file systems, and boot environments. Note that there is a large 556 GB rpool, and it is not mirrored.

# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

     NAME                       STATE     READ WRITE CKSUM
     rpool                      ONLINE       0     0     0
       c0t5000CCA0224D6354d0s0  ONLINE       0     0     0
#

# zpool list
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  556G  77.0G  479G  13%  1.00x  ONLINE  -
#

# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     79.2G   468G  73.5K  /rpool
rpool/ROOT                5.60G   468G    31K  legacy
rpool/ROOT/solaris        42.6M   468G  3.78G  /
rpool/ROOT/solaris-1      5.56G   468G  3.79G  /
rpool/ROOT/solaris-1/var   657M   468G   521M  /var
rpool/ROOT/solaris/var    38.9M   468G   221M  /var
rpool/VARSHARE            83.5K   468G    58K  /var/share
rpool/dump                66.0G   470G  64.0G  -
rpool/export              3.43G   468G    32K  /export
rpool/export/home         3.43G   468G  3.43G  /export/home
rpool/swap                4.13G   468G  4.00G  -

# beadm list
BE        Active Mountpoint Space  Policy Created          
--        ------ ---------- -----  ------ -------          
solaris   -      -          81.58M static 2013-07-10 17:19 
solaris-1 NR     /          6.88G  static 2013-07-31 12:27 
#

The partitions on the original boot disk are:

# format
...
partition> p
Volume:  solaris
Current partition table (original):
Total disk cylinders available: 64986 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 - 64985      558.89GB    (64986/0/0) 1172087496
  1 unassigned    wm       0                0         (0/0/0)              0
  2     backup    wu       0 - 64985      558.89GB    (64986/0/0) 1172087496
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm       0                0         (0/0/0)              0
  5 unassigned    wm       0                0         (0/0/0)              0
  6 unassigned    wm       0                0         (0/0/0)              0
  7 unassigned    wm       0                0         (0/0/0)              0

3. Create newpool and new swap/dump

The format utility was used to create a new, smaller partition for the root pool. The first swap partition was also created.

# format
...
partition> p
Volume:  smallsys
Current partition table (unnamed):
Total disk cylinders available: 64986 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 - 11627      100.00GB    (11628/0/0)  209722608
  1       swap    wu   11628 - 34883      200.01GB    (23256/0/0)  419445216
  2     backup    wu       0 - 64985      558.89GB    (64986/0/0) 1172087496
  3 unassigned    wm       0                0         (0/0/0)              0 (*)
  4 unassigned    wm       0                0         (0/0/0)              0 (*)
  5 unassigned    wm       0                0         (0/0/0)              0 (*)
  6 unassigned    wm       0                0         (0/0/0)              0 (*)
  7 unassigned    wm       0                0         (0/0/0)              0 (*)

partition> label
Ready to label disk, continue? y

And the new pool was created with zpool create:

# zpool create -f newpool c0t5000CCA0224D62A0d0s0
# 
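
If desired, a quick sanity check at this point (just the command is shown here; the exact size reported will vary with disk geometry) confirms that the new pool is roughly the intended 100 GB:

# zpool list newpool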

(*) Note: one might ask, what about the other 250 GB available on the disk? Yes, this user has a plan in mind to use that space. It is not terribly relevant to the concerns covered in this particular blog, and so is left aside for now.

a. Delete old swap, use the new

As noted in the introduction, there will eventually be multiple swap partitions, and they will not be in the root pool. The first new swap partition was just created, above. Therefore, for my purposes, I might as well delete the originals now, if only because it would be useless to copy them. (Your needs may differ!)

In a separate window, a new vfstab was created, which removes the zvol swap and adds the new swap partition:

# cd /etc
# diff vfstab.orig vfstab.withnewswap 
12c12
< /dev/zvol/dsk/rpool/swap   -   -      swap    -       no      -
---
> /dev/dsk/c0t5000CCA0224D62A0d0s1 - - swap - no - 
# 

The commands below display the current swap device, add the new one, and display the result.

# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 285,2        8K     4.0G     4.0G
#

# /sbin/swapadd
# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 285,2        8K     4.0G     4.0G
/dev/dsk/c0t5000CCA0224D62A0d0s1 203,49       8K     200G     200G
#

Next, use swap -d to stop swapping on the old, and then destroy it.

# swap -d /dev/zvol/dsk/rpool/swap
# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/dsk/c0t5000CCA0224D62A0d0s1 203,49       8K     200G     200G
# 

# zfs destroy rpool/swap
#

b. Delete old dump, use the new

The largest part of the original pool is the dump device. Since we now have a large swap partition, we can use that instead:

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/crash
  Savecore enabled: yes
   Save compressed: on
#

# dumpadm -d swap
      Dump content: kernel pages
       Dump device: /dev/dsk/c0t5000CCA0224D62A0d0s1 (swap)
Savecore directory: /var/crash
  Savecore enabled: yes
   Save compressed: on
# 

And the volume can now be destroyed. Emphasis: your needs may differ. You may prefer to keep swap and dump volumes in the new pool.

# zfs destroy rpool/dump

4. The actual copy

Let's use pkg info to figure out a good name for the new BE. From the output below, it appears that this is Support Repository Update 9.5:

# pkg info entire (**)
          Name: entire
       Summary: entire incorporation including Support Repository 
                Update (Oracle Solaris 11.1.9.5.1).
     Publisher: solaris
Packaging Date: Thu Jul 04 03:10:15 2013
# 

(**) Output has been abbreviated for readability

At this point, I did something that I thought would be needed, based on previous blog entries, but as you will see in a moment, it was not needed yet.

# zfs snapshot -r rpool@orig_before_shrink

a. Let beadm do the work!

The previous blog entries at this point made use of zfs send and zfs receive. In a first attempt at this copy, so did I; but a more careful reading of the manpage indicated that beadm create would probably be a better idea. For the sake of brevity, the send/receive side track is omitted.
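
For reference, a minimal sketch of that omitted side track (using the recursive snapshot taken above; this path would still require the extra bootfs, bootloader, and menu.lst steps discussed later) might look like:

# zfs send -R rpool/ROOT@orig_before_shrink | zfs receive -ud newpool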

Here is the first attempt with beadm create:

# beadm create -a -d "smaller s11.1" -e rpool@orig_before_shrink \
> -p newpool s11.1-sru9.5
be_copy: failed to find zpool for BE (rpool)
Unable to create s11.1-sru9.5.   (oops)

Hmmm, it claims that it cannot find the snapshot that was just created a minute ago. ... Reading the manpage, I realize that a "beadm snapshot" is not exactly the same concept as a "zfs snapshot". OK.

The manpage also says that if -e is not provided, then it will clone the current environment. Sounds good to me.
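
(For completeness: a beadm snapshot is written as BEname@snapshot and is created by beadm itself, for example with a made-up name:

# beadm create solaris-1@shrink_src

After that, -e solaris-1@shrink_src would have been accepted. In this walkthrough, though, cloning the current environment is exactly what is wanted, so -e is simply dropped.)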

# beadm create -a -d "smaller s11.1" -p newpool s11.1-sru9.5
#

The above command took only a few minutes, probably because rpool did not have a lot of content. Here is the result:

# zfs list
NAME                            USED  AVAIL  REFER  MOUNTPOINT
newpool                        4.30G  93.6G  73.5K  /newpool
newpool/ROOT                   4.30G  93.6G    31K  legacy
newpool/ROOT/s11.1-sru9.5      4.30G  93.6G  3.79G  /
newpool/ROOT/s11.1-sru9.5/var   519M  93.6G   519M  /var

rpool                          9.04G   538G  73.5K  /rpool
rpool/ROOT                     5.60G   538G    31K  legacy
rpool/ROOT/solaris             42.6M   538G  3.78G  /
rpool/ROOT/solaris-1           5.56G   538G  3.79G  /
rpool/ROOT/solaris-1/var        659M   538G   519M  /var
rpool/ROOT/solaris/var         38.9M   538G   221M  /var
rpool/VARSHARE                 83.5K   538G    58K  /var/share
rpool/export                   3.43G   538G    32K  /export
rpool/export/home              3.43G   538G  3.43G  /export/home
#

Note that /export and /export/home were not copied. We will come back to these later.

b. Thank you, beadm, for automatically taking care of...

The older blogs mentioned several additional steps that had to be performed when copying root pools. As I checked into each of these topics, it turned out - repeatedly - that beadm create had already taken care of it.

(i) activation

The new Boot Environment will be active on reboot, as shown by code "R", below, because the above beadm create command included the -a switch.

# beadm list
BE           Active Mountpoint Space  Policy Created          
--           ------ ---------- -----  ------ -------          
s11.1-sru9.5 R      -          4.80G  static 2013-08-02 10:22 
solaris      -      -          81.58M static 2013-07-10 17:19 
solaris-1    NR     /          6.88G  static 2013-07-31 12:27 

(ii) bootfs

The older blogs (which used zfs send/recv) mentioned that the bootfs property needs to be set on the new pool. This is no longer needed: beadm create already set it automatically. Thank you, beadm.

# zpool list -o name,bootfs
NAME     BOOTFS
newpool  newpool/ROOT/s11.1-sru9.5
rpool    rpool/ROOT/solaris-1
# 

(iii) bootloader

The disk will need a bootloader. Here, some history may be of interest. A few years ago:

  • Frequently, system administrators needed to add bootloaders, for example, anytime a mirror was created.
  • The method differed by platform: installboot on SPARC, or something grub-ish on x86

Today,

  • The bootloader is added automatically when root pools are mirrored
  • And if, for some reason, you do need to add one by hand, the command is now bootadm install-bootloader, which in turn calls installboot on your behalf, or messes with grub on your behalf (a sketch follows this list).
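
That manual invocation, for the record, would be something along the lines of the following sketch, where the -P option names the target pool (it is not needed in this walkthrough, as shown next):

# bootadm install-bootloader -P newpool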

The question for the moment: has a bootloader been placed on the new disk?

Here is the original bootloader, on rpool - notice the -R / for path

# bootadm list-archive -R /
platform/SUNW,Netra-CP3060/kernel
platform/SUNW,Netra-CP3260/kernel
platform/SUNW,Netra-T2000/kernel
platform/SUNW,Netra-T5220/kernel
platform/SUNW,Netra-T5440/kernel
platform/SUNW,SPARC-Enterprise-T1000/kernel
platform/SUNW,SPARC-Enterprise-T2000/kernel
platform/SUNW,SPARC-Enterprise-T5120/kernel
platform/SUNW,SPARC-Enterprise-T5220/kernel
platform/SUNW,SPARC-Enterprise/kernel
platform/SUNW,Sun-Blade-T6300/kernel
platform/SUNW,Sun-Blade-T6320/kernel
platform/SUNW,Sun-Blade-T6340/kernel
platform/SUNW,Sun-Fire-T1000/kernel
platform/SUNW,Sun-Fire-T200/kernel
platform/SUNW,T5140/kernel
platform/SUNW,T5240/kernel
platform/SUNW,T5440/kernel
platform/SUNW,USBRDT-5240/kernel
platform/sun4v/kernel
etc/cluster/nodeid
etc/dacf.conf
etc/driver
etc/mach
kernel
# 

After mounting the newly created environment, it can be seen that it also has a bootloader. There is no need to use installboot or bootadm install-bootloader, because the beadm create command already took care of it. Thank you, beadm.

# beadm mount s11.1-sru9.5 /mnt
#

# bootadm list-archive -R /mnt
platform/SUNW,Netra-CP3060/kernel
platform/SUNW,Netra-CP3260/kernel
platform/SUNW,Netra-T2000/kernel
platform/SUNW,Netra-T5220/kernel
platform/SUNW,Netra-T5440/kernel
platform/SUNW,SPARC-Enterprise-T1000/kernel
platform/SUNW,SPARC-Enterprise-T2000/kernel
platform/SUNW,SPARC-Enterprise-T5120/kernel
platform/SUNW,SPARC-Enterprise-T5220/kernel
platform/SUNW,SPARC-Enterprise/kernel
platform/SUNW,Sun-Blade-T6300/kernel
platform/SUNW,Sun-Blade-T6320/kernel
platform/SUNW,Sun-Blade-T6340/kernel
platform/SUNW,Sun-Fire-T1000/kernel
platform/SUNW,Sun-Fire-T200/kernel
platform/SUNW,T5140/kernel
platform/SUNW,T5240/kernel
platform/SUNW,T5440/kernel
platform/SUNW,USBRDT-5240/kernel
platform/sun4u/kernel
platform/sun4v/kernel
etc/cluster/nodeid
etc/dacf.conf
etc/driver
etc/mach
kernel
# 

(iv) menu.lst

The first Google reference above includes this sentence:

Change all the references to [the new pool] in the menu.1st file.

That sounds GRUBish, for x86, and not very much like SPARC. As it turns out, though, yes, there is a menu.lst file for SPARC:

# cat /rpool/boot/menu.lst
title Oracle Solaris 11.1 SPARC
bootfs rpool/ROOT/solaris
title solaris-1
bootfs rpool/ROOT/solaris-1
# 

And, oh look at this, beadm create also made a new menu.lst on the new pool. Thank you, beadm.

# cat /newpool/boot/menu.lst 
title smaller s11.1
bootfs newpool/ROOT/s11.1-sru9.5
# 

Beadm missed one item....VARSHARE

Update (III): WHAT'S MISSING? An earlier update to this blog entry pointed out that I forgot about VARSHARE. It has been further clarified that the right time to worry about it is actually BEFORE the reboot. OK. So, if you are following this blog while working on a system of your own, run that zfs list command now, before rebooting. If rpool/VARSHARE is present, migrate it now; a sketch follows.
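
One possible way to do that migration, sketched here with a made-up snapshot name and not verified as part of this walkthrough (-p carries the dataset properties, such as the mountpoint, along with the stream, and -u keeps the copy from mounting over the live /var/share):

# zfs snapshot rpool/VARSHARE@migrate
# zfs send -p rpool/VARSHARE@migrate | zfs receive -u newpool/VARSHARE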

5. Boot the new system (after a little OBP hunting)

Attempt to boot the new pool. First, remind myself of the disk IDs, and then head off towards the OBP:

# zpool status (**)

           NAME                       STATE     READ WRITE CKSUM
           newpool                    ONLINE       0     0     0
        -->  c0t5000CCA0224D62A0d0s0  ONLINE       0     0     0

           rpool                      ONLINE       0     0     0
             c0t5000CCA0224D6354d0s0  ONLINE       0     0     0

# shutdown -y -g0 -i0
{0} ok probe-scsi-all (**)
/pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0   <--

Target a 
  Unit 0   Disk   HITACHI  H109060SESUN600G A31A    1172123568 Blocks, 600 GB
  SASDeviceName 5000cca0224d62a0  SASAddress 5000cca0224d62a1  PhyNum 1 
                ^^^^^^^^^^^^^^^^

It looks like the newly created pool is on

/pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0

at PhyNum 1. Note its SASDeviceName, 5000cca0224d62a0, which matches newpool's Solaris device c0t5000CCA0224D62A0d0s0.

Is there a device alias that also matches?

{0} ok devalias
screen   /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@7/display@0
disk7    /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/disk@p3
disk6    /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/disk@p2
disk5    /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/disk@p1   <--
disk4    /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/disk@p0
scsi1    /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0
net3     /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0,1
net2     /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@4/network@0
disk3    /pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/disk@p3
disk2    /pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/disk@p2
disk1    /pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/disk@p1
disk     /pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/disk@p0
disk0    /pci@300/pci@1/pci@0/pci@4/pci@0/pci@c/scsi@0/disk@p0
...

The OBP alias disk5 matches the desired disk.

Try the new disk, checking whether the boot -L listing includes the desired new BE:

{0} ok boot disk5 -L
Boot device: /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/disk@p1  
File and args: -L

1 smaller s11.1
Select environment to boot: [ 1 - 1 ]: 1

To boot the selected entry, invoke:
boot [] -Z newpool/ROOT/s11.1-sru9.5   <--

Yes, disk5 offers a choice that matches the new boot environment.

Point the OBP boot-device at it, and off we go.

{0} ok setenv boot-device disk5
{0} ok boot

6. Cleanup

The system booted successfully. After the first boot, note that - as mentioned earlier - /export and /export/home are in the original pool:

# zfs list -r rpool
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     9.04G   538G  73.5K  /rpool
rpool/ROOT                5.61G   538G    31K  legacy
rpool/ROOT/solaris        42.6M   538G  3.78G  /
rpool/ROOT/solaris-1      5.57G   538G  3.79G  /
rpool/ROOT/solaris-1/var   660M   538G   519M  /var
rpool/ROOT/solaris/var    38.9M   538G   221M  /var
rpool/VARSHARE             108K   538G  58.5K  /var/share
rpool/export              3.43G   538G    32K  /export
rpool/export/home         3.43G   538G  3.43G  /export/home
# 

# zfs list -r newpool
NAME                            USED  AVAIL  REFER  MOUNTPOINT
newpool                        4.33G  93.6G  73.5K  /newpool
newpool/ROOT                   4.33G  93.6G    31K  legacy
newpool/ROOT/s11.1-sru9.5      4.33G  93.6G  3.79G  /
newpool/ROOT/s11.1-sru9.5/var   524M  93.6G   519M  /var
newpool/VARSHARE                 43K  93.6G    43K  /var/share
# 

Update: VARSHARE. Mike Gerdts points out what I missed in my previous reading of the above: notice that rpool/VARSHARE contained some data that has not been migrated to newpool/VARSHARE. The VARSHARE file system provides a convenient place to store crash dumps, audit records, and similar data that can be shared across boot environments, as described under What's New with ZFS? in the updated ZFS Admin Guide.

Unfortunately, I missed my chance to migrate that data; it's gone. Fortunately, I didn't lose very much (about 108 KB, according to the above). If you are following this blog as you work on your own system, one hopes you noticed the note above about VARSHARE. If not, then this would be a good moment to review the status of VARSHARE on your system, potentially merging the previous content with whatever has accumulated since the reboot.
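
If the old pool is still around, one way to do that merge is to mount the old dataset somewhere out of the way and copy its contents into the live /var/share. This is only a sketch, with a made-up temporary mountpoint:

# zfs set mountpoint=/mnt/oldvarshare rpool/VARSHARE
# zfs mount rpool/VARSHARE
# cd /mnt/oldvarshare && find . -print | cpio -pdmu /var/share
# zfs unmount rpool/VARSHARE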

Swap/dump - as already discussed, swap and dump were intentionally not migrated, because they are handled elsewhere. Your needs may differ. If you are following this blog as you work on your own system, now would be a good moment to ensure that you have figured out what you want to do for your swap / dump volumes.
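
For readers who prefer to keep swap and dump as ZFS volumes in the new pool, a sketch of re-creating them (the sizes here are placeholders, not recommendations) would be:

# zfs create -V 4G newpool/swap
# swap -a /dev/zvol/dsk/newpool/swap
# zfs create -V 64G newpool/dump
# dumpadm -d /dev/zvol/dsk/newpool/dump

A matching /dev/zvol/dsk/newpool/swap entry in /etc/vfstab would also be needed for the swap volume to return after a reboot.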

a. Copy additional file systems

Earlier, a snapshot was created, using:

# zfs snapshot -r rpool@orig_before_shrink

That snapshot is used in a zfs send/receive command, which goes quickly, but it ends with an error:

# zfs send -vR rpool/export@orig_before_shrink | zfs receive -vFd newpool
sending from @ to rpool/export@before_shrink
receiving full stream of rpool/export@before_shrink 
   into newpool/export@before_shrink
sending from @before_shrink to rpool/export@orig_before_shrink
sending from @ to rpool/export/home@before_shrink
received 47.9KB stream in 1 seconds (47.9KB/sec)
receiving incremental stream of rpool/export@orig_before_shrink 
   into newpool/export@orig_before_shrink
received 8.11KB stream in 3 seconds (2.70KB/sec)
receiving full stream of rpool/export/home@before_shrink 
   into newpool/export/home@before_shrink
sending from @before_shrink to rpool/export/home@orig_before_shrink
received 3.46GB stream in 78 seconds (45.4MB/sec)
receiving incremental stream of rpool/export/home@orig_before_shrink 
   into newpool/export/home@orig_before_shrink
received 198KB stream in 2 seconds (99.1KB/sec)
cannot mount 'newpool/export' on '/export': directory is not empty
cannot mount 'newpool/export' on '/export': directory is not empty
cannot mount 'newpool/export/home' on '/export/home': 
    failure mounting parent dataset

The problem above is that both the old and the new file systems are trying to mount at the same mountpoints. A better solution - pointed out by a kind reviewer - would have been to use the zfs receive -u switch:

# zfs send -vR rpool/export@orig_before_shrink | zfs receive -vFdu newpool

The -u switch would have avoided the attempt to mount the newly created file systems.

b. Hide - or delete - the original

Warning: YMMV. Because I am nearly certain that I will soon be destroying the original rpool, my solution was to disqualify the old file systems from mounting. Your mileage may vary. For example, you might prefer to leave the old pool unchanged, in case it is needed later. In that case, you could skip directly to the export, described below.

Anyway, the following was satisfactory for my needs. I changed the canmount property:

# zfs list -o mounted,canmount,mountpoint,name -r rpool
MOUNTED  CANMOUNT  MOUNTPOINT    NAME
    yes        on  /rpool        rpool
     no       off  legacy        rpool/ROOT
     no    noauto  /             rpool/ROOT/solaris
     no    noauto  /             rpool/ROOT/solaris-1
     no    noauto  /var          rpool/ROOT/solaris-1/var
     no    noauto  /var          rpool/ROOT/solaris/var
     no    noauto  /var/share    rpool/VARSHARE
    yes        on  /export       rpool/export
    yes        on  /export/home  rpool/export/home
# 
# zfs list -o mounted,canmount,mountpoint,name -r newpool
MOUNTED  CANMOUNT  MOUNTPOINT    NAME
    yes        on  /newpool      newpool
     no       off  legacy        newpool/ROOT
    yes    noauto  /             newpool/ROOT/s11.1-sru9.5
    yes    noauto  /var          newpool/ROOT/s11.1-sru9.5/var
    yes    noauto  /var/share    newpool/VARSHARE
     no        on  /export       newpool/export
     no        on  /export/home  newpool/export/home
# 
# zfs set canmount=noauto rpool/export
# zfs set canmount=noauto rpool/export/home
# reboot

After the reboot, only one data set from the original pool is mounted:

# zfs list -r -o name,mounted,canmount,mountpoint 
NAME                           MOUNTED  CANMOUNT  MOUNTPOINT
newpool                            yes        on  /newpool
newpool/ROOT                        no       off  legacy
newpool/ROOT/s11.1-sru9.5          yes    noauto  /
newpool/ROOT/s11.1-sru9.5/var      yes    noauto  /var
newpool/VARSHARE                   yes    noauto  /var/share
newpool/export                     yes        on  /export
newpool/export/home                yes        on  /export/home

rpool                              yes        on  /rpool
rpool/ROOT                          no       off  legacy
rpool/ROOT/solaris                  no    noauto  /
rpool/ROOT/solaris-1                no    noauto  /
rpool/ROOT/solaris-1/var            no    noauto  /var
rpool/ROOT/solaris/var              no    noauto  /var
rpool/VARSHARE                      no    noauto  /var/share
rpool/export                        no    noauto  /export
rpool/export/home                   no    noauto  /export/home
# 

The canmount property could be set for that one too, but a better solution - as suggested by the kind reviewer - is to zpool export the old pool. The export ensures that none of it will be seen until/unless a later zpool import is done (which will not be done in this case, because I want to re-use the space for other purposes).

# zpool export rpool
# reboot
.
.
.
$ zpool status
  pool: newpool
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        newpool                    ONLINE       0     0     0
          c0t5000CCA0224D62A0d0s0  ONLINE       0     0     0

errors: No known data errors
$

$ zfs list
NAME                            USED  AVAIL  REFER  MOUNTPOINT
newpool                        7.80G  90.1G  73.5K  /newpool
newpool/ROOT                   4.37G  90.1G    31K  legacy
newpool/ROOT/s11.1-sru9.5      4.37G  90.1G  3.79G  /
newpool/ROOT/s11.1-sru9.5/var   530M  90.1G   521M  /var
newpool/VARSHARE               45.5K  90.1G  45.5K  /var/share
newpool/export                 3.43G  90.1G    32K  /export
newpool/export/home            3.43G  90.1G  3.43G  /export/home
$ 

7. Mirror the newpool

The new root pool has been created, it boots, it is the desired size, and it now has all the right data sets. Mirror it.

Set up the partitions on the mirror disk to match the disk that holds newpool:

partition> p
Volume:  mirror
Current partition table (unnamed):
Total disk cylinders available: 64986 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 - 11627      100.00GB    (11628/0/0)  209722608
  1       swap    wu   11628 - 34883      200.01GB    (23256/0/0)  419445216
  2     backup    wu       0 - 64985      558.89GB    (64986/0/0) 1172087496
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm       0                0         (0/0/0)              0
  5 unassigned    wm       0                0         (0/0/0)              0
  6 unassigned    wm       0                0         (0/0/0)              0
  7 unassigned    wm       0                0         (0/0/0)              0

partition> label
Ready to label disk, continue? y

Start the mirror operation:

# zpool status
  pool: newpool
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        newpool                    ONLINE       0     0     0
          c0t5000CCA0224D62A0d0s0  ONLINE       0     0     0

errors: No known data errors
#

# zpool attach -f newpool c0t5000CCA0224D62A0d0s0 c0t5000CCA0224D6A30d0s0
Make sure to wait until resilver is done before rebooting.
#
# zpool status
  pool: newpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
        Run 'zpool status -v' to see device specific details.
  scan: resilver in progress since Sat Aug  3 07:49:25 2013
    413M scanned out of 7.83G at 20.6M/s, 0h6m to go
    409M resilvered, 5.15% done
config:

        NAME                         STATE     READ WRITE CKSUM
        newpool                      DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c0t5000CCA0224D62A0d0s0  ONLINE       0     0     0
            c0t5000CCA0224D6A30d0s0  DEGRADED     0     0     0  (resilvering)

errors: No known data errors
# 

It says it will complete in 6 minutes. OK, wait that long and check again:

# sleep 360; zpool status
  pool: newpool
 state: ONLINE
  scan: resilvered 7.83G in 0h3m with 0 errors on Sat Aug  3 07:52:30 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        newpool                      ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c0t5000CCA0224D62A0d0s0  ONLINE       0     0     0
            c0t5000CCA0224D6A30d0s0  ONLINE       0     0     0

errors: No known data errors
# 

Recall that part of the goal was to have two swap partitions, not mirrored. Add the second one now.

# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/dsk/c0t5000CCA0224D62A0d0s1 203,49       8K     200G     200G
#
# echo "/dev/dsk/c0t5000CCA0224D6A30d0s1 - - swap - no - " >> /etc/vfstab
# /sbin/swapadd
# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/dsk/c0t5000CCA0224D62A0d0s1 203,49       8K     200G     200G
/dev/dsk/c0t5000CCA0224D6A30d0s1 203,41       8K     200G     200G
# 

8. Final verification

When the zpool attach command above was issued, the root pool was mirrored. As mentioned previously, in the days of our ancestors, one had to follow this up by adding the bootloader. Now, thanks to updated zpool attach, it happens automatically.

Verify the feature by booting the other side of the mirror:

# shutdown -y -g0 -i0
...
{0} ok printenv boot-device
boot-device =           disk5
{0} ok boot disk4
Boot device: /pci@4c0/pci@1/pci@0/pci@c/pci@0/pci@c/scsi@0/disk@p0  File and args: 
SunOS Release 5.11 Version 11.1 64-bit
Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.

Thank you

Thank you to bigal, and to Joe Mocker for their starting points. Thank you to Cloyce Spradling for review of drafts of this post.

Also, Michael Ramchand noted that the first post of this blog forgot to thank zpool attach in step 8, which has been fixed; that was important because, as Michael noted, "It might get jealous after all the thanks that beadm got."

(this space intentionally left blank)

 
