By User13277689-Oracle on Apr 03, 2008
The flashback is still vivid even weeks later: the day before my presentation at the FIRST Technical
Colloquium in Prague I brought my two-year-old laptop with the work-in-progress slides to the office.
Since I wanted to finish the slides in the evening, I fired off a Live Upgrade (LU)
on the laptop to get a fresh Nevada
build (to show off during the presentation, of course ;)).
LU is a very I/O-intensive process and the red Ferrari notebooks tend to get _very_ hot. In the afternoon I noticed that the process had failed. To my astonishment, I/O operations had started to fail. After a couple of reboots (and zpool status / fmadm faulty commands) it was obvious that the disk could not be trusted anymore. I was able to rescue some data from the ZFS pool, which spanned the biggest slice of the internal disk, but not all of it. (ZFS refuses to hand out corrupted data.) My slides were lost, along with other data.
After some time I stumbled upon James Gosling's
blog entry about ZFS mirroring on a laptop.
This got me started (or, more precisely, I was astonished and wondered how
this idea could have escaped me, given that ZFS had been in Nevada for a long time by then)
and I discovered several similar
blog entries about the topic.
After some experiments with a borrowed USB disk it was time to make it a reality on a new laptop.
The process was a multi-step one:
- First I had to extend the free slice #7 on the internal disk so that it spans the remaining space
on the disk, because it had been trimmed after the experiments. In the end the slices look like this
in format(1M) output:
Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       3 -  1277        9.77GB    (1275/0/0)    20482875
  1 unassigned    wm    1278 -  2552        9.77GB    (1275/0/0)    20482875
  2     backup    wm       0 - 19442      148.94GB    (19443/0/0)  312351795
  3       swap    wu    2553 -  3124        4.38GB    (572/0/0)      9189180
  4 unassigned    wu       0                0         (0/0/0)              0
  5 unassigned    wu       0                0         (0/0/0)              0
  6 unassigned    wu       0                0         (0/0/0)              0
  7       home    wm    3125 - 19442      125.00GB    (16318/0/0)  262148670
  8       boot    wu       0 -     0        7.84MB    (1/0/0)          16065
  9 alternates    wu       1 -     2       15.69MB    (2/0/0)          32130
- Then the USB drive was connected to the system and recognized via format(1M):
AVAILABLE DISK SELECTIONS:
       0. c0d0
          /pci@0,0/pci-ide@12/ide@0/cmdk@0,0
       1. c5t0d0
          /pci@0,0/pci1025,10a@13,2/storage@4/disk@0,0
- The inactive Live Upgrade boot environment was deleted via ludelete(1M) and its slice was commented out in /etc/vfstab. This was needed to make zpool(1M) happy.
- A ZFS pool was created from the slice on the internal disk (c0d0s7) and the external
USB disk (c5t0d0). I had to force it because zpool(1M) complained about the overlap
of c0d0s2 (the slice spanning the whole disk) and c0d0s7:
# zpool create -f data mirror c0d0s7 c5t0d0
For a while I struggled to find a name for the pool (everybody seems either to stick to the 'tank' name or to come up with some double-cool-stylish name, which I wanted to avoid because the excitement about such a name is likely to wear off) but then chose the ordinary data (it's what it is, after all).
- I verified that it is possible to disconnect the USB disk
and safely reconnect it while an I/O operation is in progress:
root:moose:/data# mkfile 10g /data/test &
10933
root:moose:/data# zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d0s7  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0

errors: No known data errors
The pool survived without a hitch (okay, I had to wait a little longer for the zpool command to complete because of the still ongoing I/O, but that was it) and resynced the contents automatically after the USB disk was reconnected:
root:moose:/data# zpool status
  pool: data
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: resilver in progress for 0h0m, 3.22% done, 0h5m to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c0d0s7  ONLINE       0     0     0
            c5t0d0  FAULTED      0     0     0  too many errors

errors: No known data errors
Also, under heavy I/O the USB drive gets marked as faulted, so the pool has to be cleared via zpool clear data after the resilver completes. Normally this will not happen (unless the drive has really failed) because I will be connecting and disconnecting the drive only when powering on or shutting down the laptop, respectively.
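The reconnect procedure can be sketched as a small shell helper. This is only a sketch under assumptions: the function names resilver_in_progress and clear_after_resilver are made up for illustration, the ZPOOL variable exists only so that the command can be swapped out for a dry run, the 30 second polling interval is arbitrary, and the check simply looks for the "resilver in progress" text that zpool status printed in the output shown here.

```shell
#!/bin/sh
# Sketch: wait for the resilver to finish, then clear the fault state.
# ZPOOL can be overridden (e.g. ZPOOL=echo) for a dry run; the function
# names are hypothetical helpers, not part of ZFS.
ZPOOL=${ZPOOL:-zpool}

# Succeeds if the `zpool status` text on stdin reports a running resilver.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Polls the pool every 30 seconds until no resilver is reported, then
# clears the error state so the USB disk is no longer marked as faulted.
clear_after_resilver() {
    pool=$1
    while "$ZPOOL" status "$pool" | resilver_in_progress; do
        sleep 30
    done
    "$ZPOOL" clear "$pool"
}
```

Run as root, it would be called as clear_after_resilver data once the USB disk is plugged back in.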
- After that I used [...] about ZFS delegated administration (it was Mark who did [...])
and the ZFS Delegated Administration
chapter from the OpenSolaris
ZFS Administration Guide, created permission sets for my local user and assigned
those permissions to the ZFS pool 'data' and the user:
# chmod A+user:vk:add_subdirectory:fd:allow /data
# zfs allow -s @major_perms clone,create,destroy,mount,snapshot data
# zfs allow -s @major_perms send,receive,share,rename,rollback,promote data
# zfs allow -s @major_props copies,compression,quota,reservation data
# zfs allow -s @major_props snapdir,sharenfs data
# zfs allow vk @major_perms,@major_props data
All of these commands had to be run as root.
- Now the user is able to create a home directory for himself:
$ zfs create data/vk
- Time to set up the environment of datasets and prepare it for data.
I have separated the datasets according to a 'service level'. Some data are very
important to me (e.g. presentations ;)), so I want them stored redundantly: with the
copies dataset property set to 2 on top of the two-way mirror they are actually
present 4 times. Also, documents are not usually accompanied by executable
code, so the exec property was set to off, which prevents
running scripts or programs from that dataset.
Some data are volatile and come in large quantities, so they do not need any additional protection and it is a good idea to compress them with a stronger compression algorithm to save some space. The following table summarizes the setup:
  dataset             properties                       comment
+-------------------+--------------------------------+------------------------+
| data/vk/Documents | copies=2                       | presentations          |
| data/vk/Saved     | compression=on exec=off        | stuff from web         |
| data/vk/DVDs      | compression=gzip-8 exec=off    | Nevada ISOs for LU     |
| data/vk/CRs       | compression=on copies=2        | precious source code ! |
+-------------------+--------------------------------+------------------------+
So the commands will be:
$ zfs create -o copies=2 data/vk/Documents
$ zfs create -o compression=gzip-3 -o exec=off data/vk/Saved
...
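Once the datasets exist, it may be worth checking that the properties really ended up as intended. A minimal sketch, assuming the helper name show_dataset_props (made up here) and a ZFS variable that exists only to allow a dry run; the underlying command is plain zfs get with -r to recurse into child datasets and -o to select the output columns:

```shell
#!/bin/sh
# Sketch: list the properties discussed above for a dataset and its children.
# ZFS can be overridden (e.g. ZFS=echo) to print the command instead of
# running it; `show_dataset_props` is a hypothetical helper name.
ZFS=${ZFS:-zfs}

show_dataset_props() {
    # -r recurses into child datasets, -o selects the output columns.
    "$ZFS" get -r -o name,property,value copies,compression,exec "$1"
}
```

Calling show_dataset_props data/vk would then print one line per dataset and property, which can be compared against the table by eye.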
- Now it is possible to migrate all the data, change the user's home directory to /data/vk (e.g. via /usr/ucb/vipw) and log in again.
However, this is not the end of it but just the beginning. There are many ways to make the setup even better, to name a few:
- set up some sort of automatic snapshots for selected datasets
The set of scripts and the SMF service for doing ZFS snapshots and backups made by Tim Foster (see ZFS Automatic For The People and related blog entries) could be used for this task.
- make zpool scrub run periodically
- detect disk failures
This would be ideal to see in the GNOME panel or the GNOME system monitor.
- set up off-site backup via SunSSH + zfs send
This could be done using the hooks provided by Tim's scripts (see above).
- Set quotas and reservations for some of the datasets.
- Install ZFS scripts for GNOME Nautilus so I will be able to browse, create and destroy snapshots in Nautilus. Now which set of scripts to use? Chris Gerhard's or Tim Foster's? Or should I just wait for the official ZFS support for Nautilus to be integrated?
- Find out exactly what the recovery scenario (in case of laptop drive failure) will look like.
Importing the ZFS pool from the USB disk should suffice, but during my experiments I was not able to complete it successfully.
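Three of the items above (failure detection, periodic scrub and off-site backup) can be sketched in a few lines of shell. This is an illustration under assumptions, not a finished solution and not how Tim's scripts do it: all function names, the snapshot name, the remote host and the target dataset are hypothetical; ZPOOL, ZFS and SSH are variables added only so that the commands can be replaced for a dry run; and the backup sends a full stream every time rather than an incremental one.

```shell
#!/bin/sh
# Sketch of three to-do items. ZPOOL, ZFS and SSH can be overridden for a
# dry run; the function names and all arguments are hypothetical examples.
ZPOOL=${ZPOOL:-zpool}
ZFS=${ZFS:-zfs}
SSH=${SSH:-ssh}

# Succeeds if the `zpool status -x` text on stdin reports no problems.
# `zpool status -x` prints "all pools are healthy" when nothing is wrong.
pools_healthy() {
    grep -q 'all pools are healthy'
}

# Starts a scrub of the given pool; meant to be called periodically.
scrub_pool() {
    "$ZPOOL" scrub "$1"
}

# Snapshots one dataset and streams the snapshot to a remote host over SSH.
# Arguments: dataset, snapshot name, remote host, dataset to receive into.
offsite_backup() {
    dataset=$1
    snap=$2
    remote=$3
    target=$4
    "$ZFS" snapshot "${dataset}@${snap}" &&
        "$ZFS" send "${dataset}@${snap}" | "$SSH" "$remote" zfs receive "$target"
}
```

For example, zpool status -x | pools_healthy || echo "check the pool" could drive a notification, and a root crontab entry along the lines of 0 3 * * 0 /usr/sbin/zpool scrub data (the schedule is only an example) would take care of the weekly scrub.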
With all of the above in place, the data should be safe from disk failure (after all, disks are often called "spinning rust", so they are going to fail sooner or later) and also in the event of losing both the laptop and the USB disk.
Lastly, a philosophical thought: one of my colleagues considers hardware a necessary (and very faulty) layer which is only needed to make it possible to express ideas in software. This might seem extreme, but think about it. ZFS is special in this sense: it is software which provides that bridge, and its core idea is to isolate hardware faults.