ZFS is going to save my laptop data next time

The flashback is still alive even weeks later: the day before my presentation at the FIRST Technical
Colloquium in Prague
I brought my two-year-old laptop with the work-in-progress slides to the office.
Since I wanted to finish the slides in the evening, a live-upgrade
process was fired off on the laptop to get a fresh Nevada
version (to show off during the presentation, of course ;)).


LU
is a very I/O-intensive process and the red Ferrari notebooks tend to get _very_ hot. In the afternoon I
noticed that the process had failed. To my astonishment, the I/O operations started to fail.
After a couple of reboots (and zpool status / fmadm faulty commands)
it was obvious that the disk could not be trusted anymore. I was able to rescue some data
from the ZFS pool which spanned the biggest slice of the internal disk, but not all of it.
(ZFS refuses to hand out corrupted data.) My slides were lost, as was other data.

After some time I stumbled upon James Gosling's
blog entry about ZFS mirroring on a laptop.
This got me started (or, more precisely, I was astonished and wondered how the idea
could have escaped me, given that ZFS had been in Nevada for a long time by then)
and I discovered several similar and more
in-depth
blog entries about the topic.


After some experiments with a borrowed USB disk it was time to make it a reality
on a new laptop.

The process was a multi-step one:



  1. First I had to extend the free slice #7 on the internal disk so that it spanned the remaining space
    on the disk, because it had been trimmed after the experiments. In the end the slices looked like this
    in format(1) output:
    Part         Tag    Flag     Cylinders         Size            Blocks
      0         root    wm       3 -  1277        9.77GB    (1275/0/0)    20482875
      1   unassigned    wm    1278 -  2552        9.77GB    (1275/0/0)    20482875
      2       backup    wm       0 - 19442      148.94GB   (19443/0/0)   312351795
      3         swap    wu    2553 -  3124        4.38GB     (572/0/0)     9189180
      4   unassigned    wu       0               0             (0/0/0)           0
      5   unassigned    wu       0               0             (0/0/0)           0
      6   unassigned    wu       0               0             (0/0/0)           0
      7         home    wm    3125 - 19442      125.00GB  (16318/0/0)    262148670
      8         boot    wu       0 -     0        7.84MB       (1/0/0)       16065
      9   alternates    wu       1 -     2       15.69MB       (2/0/0)       32130

  2. Then the USB drive was connected to the system and recognized via format(1):
    AVAILABLE DISK SELECTIONS:
           0. c0d0
              /pci@0,0/pci-ide@12/ide@0/cmdk@0,0
           1. c5t0d0
              /pci@0,0/pci1025,10a@13,2/storage@4/disk@0,0

  3. The Live Upgrade boot environment which was not active was deleted via ludelete(1M)
    and its slice was commented out in /etc/vfstab. This was needed to make
    zpool(1M) happy.

  4. A ZFS pool was created out of the slice on the internal disk (c0d0s7) and the external
    USB disk (c5t0d0). I had to force it because zpool(1M) complained about the overlap
    of c0d0s2 (the slice spanning the whole disk) and c0d0s7:
    # zpool create -f data mirror c0d0s7 c5t0d0

    For a while I struggled to find a name for the pool (everybody seems
    either to stick with the 'tank' name or to come up with some double-cool-stylish name,
    which I wanted to avoid because the excitement over such a name is bound to wear off)
    but then chose the ordinary 'data' (it's what it is, after all).
  5. I have verified that it is possible to disconnect the USB disk
    and safely reconnect it while an I/O operation is in progress:
    root:moose:/data# mkfile 10g /data/test &
    [1] 10933
    root:moose:/data# zpool status
      pool: data
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            data        ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c0d0s7  ONLINE       0     0     0
                c5t0d0  ONLINE       0     0     0

    errors: No known data errors

    It survived without a hitch (okay, I had to wait a little bit longer for the zpool command
    to complete due to the still ongoing I/O, but that was it) and the contents were resynced
    automatically after the USB disk was reconnected:
    root:moose:/data# zpool status
      pool: data
     state: DEGRADED
    status: One or more devices are faulted in response to persistent errors.
            Sufficient replicas exist for the pool to continue functioning in a
            degraded state.
    action: Replace the faulted device, or use 'zpool clear' to mark the device
            repaired.
     scrub: resilver in progress for 0h0m, 3.22% done, 0h5m to go
    config:

            NAME        STATE     READ WRITE CKSUM
            data        DEGRADED     0     0     0
              mirror    DEGRADED     0     0     0
                c0d0s7  ONLINE       0     0     0
                c5t0d0  FAULTED      0     0     0  too many errors

    errors: No known data errors

    Also, under heavy I/O the pool needs to be cleared after the resilver completes
    (via zpool clear data) because the USB drive gets marked as faulted. Normally this will
    not happen (unless the drive has really failed) because I will be connecting and disconnecting
    the drive only when powering on or shutting down the laptop, respectively.
  6. After that I used Mark Shellenbaum's blog entry about ZFS delegated administration
    (it was Mark who did the integration) and the ZFS Delegated Administration chapter
    from the OpenSolaris ZFS Administration Guide, and created permission sets for my
    local user and assigned those permissions to the ZFS pool 'data' and the user:
    # chmod A+user:vk:add_subdirectory:fd:allow /data
    # zfs allow -s @major_perms clone,create,destroy,mount,snapshot data
    # zfs allow -s @major_perms send,receive,share,rename,rollback,promote data
    # zfs allow -s @major_props copies,compression,quota,reservation data
    # zfs allow -s @major_props snapdir,sharenfs data
    # zfs allow vk @major_perms,@major_props data

    All of these commands had to be run as root (a quick way to double-check the resulting
    delegation is sketched right after this list).
  7. Now the user is able to create a home directory for himself:
    $ zfs create data/vk
  8. Time to set up the hierarchy of datasets and prepare it for the data.
    I have separated the datasets according to a 'service level'. Some data are very
    important to me (e.g. presentations ;)), so I want them multiplied via the ditto blocks
    mechanism: with the copies dataset property set to 2 they are actually present four
    times (two copies on each side of the mirror). Also, documents are not usually
    accompanied by executable code, so the exec property was set to off, which prevents
    running scripts or programs from that dataset.

    Some data are volatile and come in high quantity, so they do not need any additional
    protection and it is a good idea to compress them with a stronger compression
    algorithm to save some space. The following table summarizes the setup:
       dataset              properties                      comment
    +-------------------+------------------------------+------------------------+
    | data/vk/Documents | copies=2                     | presentations          |
    | data/vk/Saved     | compression=on exec=off      | stuff from web         |
    | data/vk/DVDs      | compression=gzip-8 exec=off  | Nevada ISOs for LU     |
    | data/vk/CRs       | compression=on copies=2      | precious source code ! |
    +-------------------+------------------------------+------------------------+

    So the commands will be along these lines (the resulting properties can be checked
    as sketched right after this list):
    $ zfs create -o copies=2 data/vk/Documents
    $ zfs create -o compression=gzip-3 -o exec=off data/vk/Saved
    ...

  9. Now it is possible to migrate all the data (one possible way is sketched right after
    this list), change the home directory of the user to /data/vk (e.g. via /usr/ucb/vipw)
    and log in again.
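
To double-check what was actually delegated in step 6, zfs allow can be run on the pool
without any other arguments; it prints the permission sets and the per-user delegations
defined on the dataset:

    $ zfs allow data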
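
Similarly, once the datasets from step 8 exist, the properties from the table can be
verified with zfs get; a minimal check could look like this:

    $ zfs get -r copies,compression,exec data/vk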
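
As for the data migration in step 9, a sketch assuming the old home directory lives in
/export/home/vk (the path is just an example) could use cpio in pass-through mode, which
preserves permissions and modification times:

    $ cd /export/home/vk
    $ find . -print | cpio -pdm /data/vk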

However, this is not the end of it but just beginning. There are many things to
make the setup even better, to name a few:

  • set up some sort of automatic snapshots for selected datasets

    The set of scripts and the SMF service for doing ZFS snapshots and backups
    (see ZFS Automatic For The People and related blog entries) made by Tim Foster
    could be used for this task; a bare-bones interim approach is sketched after this list.
  • make zpool scrub run periodically (e.g. from cron; see the sketch after this list)
  • detect failures of the disks

    It would be ideal to see this in the Gnome panel or the Gnome system monitor.
  • set up off-site backup via SunSSH + zfs send

    This could be done using the hooks provided by Tim's scripts (see above); the basic
    idea is sketched after this list.
  • set quotas and reservations for some of the datasets (see the sketch after this list).
  • install ZFS scripts for Gnome nautilus so that I will be able to browse, perform and destroy
    snapshots in nautilus. Now, which set of scripts to use? Chris Gerhard's or Tim Foster's? Or should I
    just wait for the official ZFS support for nautilus to be integrated?
  • find out how exactly the recovery scenario (in case of a laptop drive failure) will look;
    a first guess is sketched after this list.

    Importing the ZFS pool from the USB disk alone should suffice, but during my experiments
    I was not able to complete that successfully.
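
Until the snapshot service is set up properly, a bare-bones interim approach could be a
recursive snapshot taken from cron. The helper script path and the schedule below are made up:

    #!/bin/sh
    # /usr/local/bin/snap_vk.sh (hypothetical path): daily recursive snapshot of data/vk
    /usr/sbin/zfs snapshot -r data/vk@`date +%Y-%m-%d`

with a root crontab entry to run it every day at 6pm:

    0 18 * * * /usr/local/bin/snap_vk.sh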
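
The periodic scrub can be handled the same way; a root crontab entry like the following
would scrub the pool every Sunday at 3am (the schedule is arbitrary):

    0 3 * * 0 /usr/sbin/zpool scrub data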
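
The off-site backup would boil down to piping snapshot streams over SunSSH. The host name,
the receiving pool and the snapshot names below are made up; thanks to the delegated send
permission this should not require root locally (and, with a similar delegation on the
receiving side, not there either):

    $ # full initial copy
    $ zfs send data/vk/Documents@2008-06-01 | ssh backuphost /usr/sbin/zfs receive backup/vk/Documents
    $ # later on, send only the differences between the two snapshots
    $ zfs send -i 2008-06-01 data/vk/Documents@2008-06-08 | ssh backuphost /usr/sbin/zfs receive backup/vk/Documents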
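
Quotas and reservations are just dataset properties, and they are covered by the @major_props
set above, so the user can set them directly; the sizes are arbitrary examples:

    $ zfs set quota=10g data/vk/Saved
    $ zfs set reservation=5g data/vk/Documents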
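
As for the recovery scenario, what should work (and what I was not able to complete in my
experiments yet) is attaching the USB disk to another machine and importing the pool from
that half of the mirror alone; roughly:

    # zpool import           (lists the pools visible on the attached devices)
    # zpool import -f data   (forces the import; the pool should come up DEGRADED,
                              with the missing half of the mirror reported as UNAVAIL)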

With all of the above the data should be safe from a disk failure (after all, disks are
often called "spinning rust", so they are going to fail sooner or later) and, once the
off-site backup is in place, even from the loss of both the laptop and the USB disk.

Lastly, a philosophical thought: one of my colleagues considers hardware
to be a necessary (and very faulty) layer which is only needed to make it possible to
express ideas in software. This might seem extreme, but come to think of it,
ZFS is special in this sense: being software which provides that bridge, its core idea is
to isolate hardware faults.
