Tuesday Jun 12, 2007

ZFS on a laptop?

Sun is known for servers, not laptops. So a filesystem designed by Sun would surely be too powerful and too "heavy" for laptops, that the features of a "datacenter" filesystem wouldn't fit on a laptop. Right? Actually... not. As it turns out, ZFS is a great match for laptops.

Backup

One of the most important things a user needs to do on a laptop is to back his data up. Copying your data to DVD or an external drive is one way. ZFS snapshots with 'zfs send' and 'zfs recv' is a better way. Due to its architecture, snaphots in ZFS are very fast and only take up as much space as much data has changed. For a typical user, taking a snapshot every day, for example, will only take up a small amount of capacity.

So let's start off with a ZFS pool called 'swim' and two filesystems: 'Music' and 'Pictures':

fsh-mullet# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
swim            157K  9.60G    21K  /swim
swim/Music       18K  9.60G    18K  /swim/Music
swim/Pictures    19K  9.60G    19K  /swim/Pictures
fsh-mullet# ls /swim/Pictures 
bday.jpg        good_times.jpg

Taking a snapshot 'today' of Pictures is this easy:

fsh-mullet# zfs snapshot swim/Pictures@today

And now we can see the contents of snapshot 'today' via the '.zfs/snapshot' directory:

fsh-mullet# ls /swim/Pictures/.zfs/snapshot/today 
bday.jpg        good_times.jpg
fsh-mullet# 

If you want to take a snapshot of all your filesystems, then you can do:

fsh-mullet# zfs snapshot -r swim@today      
fsh-mullet# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
swim                  100M  9.50G    21K  /swim
swim@today               0      -    21K  -
swim/Music            100M  9.50G   100M  /swim/Music
swim/Music@today         0      -   100M  -
swim/Pictures          19K  9.50G    19K  /swim/Pictures
swim/Pictures@today      0      -    19K  -
fsh-mullet# 

Now that you have snapshots, you can use the built-in features of 'zfs send' and 'zfs recv' to backup your data - even to another machine.

fsh-mullet# zfs send swim/Pictures@today | ssh host2 zfs recv -d backupswim

After you've sent over the first snapshot via 'zfs send', you can then do incremental 'zfs send's:

fsh-mullet# zfs send -i swim/Pictures@today | ssh host2 zfs recv -d backupswim

Now let's look at the backup ZFS pool 'backupswim' on host 'host2':

host2# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
backupswim                  100M  9.50G    21K  /backupswim
backupswim/Music            100M  9.50G   100M  /backupswim/Music
backupswim/Music@today         0      -   100M  -
backupswim/Pictures          18K  9.50G    18K  /backupswim/Pictures
backupswim/Pictures@today      0      -    18K  -

What's really nice about using ZFS's snapshots is that you only need to send over (and store) the differences between snapshots. So if you're doing video editing on your laptop, and have a giant 10GB file, but only change, say, 1KB of data on this day, with ZFS you only have to send over 1KB of data - not the entire 10GB of the file. This also means you don't have to store multiple 10GB versions (one per snapshot) of the file on your backup device.

You can also backup with an external hard drive. Create a backup pool on the second hard drive, and just 'zfs send/recv' your nightly snapshots.

Reliability

Since laptops (typically) only have 1 disk, handling disk errors is very important. Bill introduced ditto blocks to handle partial disk failures. With typical filesystems, if part of the disk is corrupted/failing and that part of the disk stores your metadata, you're screwed. There's no way to access the data associated with the inaccessible metadata without backing up. With ditto blocks, ZFS stores multiple copies of the metadata in the pool. In the single disk case, we strategically store multiple copies of the metadata on different locations on disk (such as at the front and back of the disk). A subtle partial disk failure can make other filesystems useless, whereas ZFS can survive.

Matt took ditto blocks one step further and allowed the user to apply it to any filesystem's data. What this means is that you can make your more important data more reliable by stashing away multiple copies of your precious data (without muddying your namespace). Here's how you store two copies of your pictures:

fsh-mullet# zfs set copies=2 swim/Pictures
fsh-mullet# zfs get copies swim/Pictures
NAME           PROPERTY  VALUE          SOURCE
swim/Pictures  copies    2              local
fsh-mullet# 

Note, the number of copies property only affects future writes (not existing data). So i recommend you set this at filesystem creation time:

fsh-mullet# zfs create -o copies=2 swim/Music
fsh-mullet# zfs get copies swim/Music
NAME        PROPERTY  VALUE       SOURCE
swim/Music  copies    2           local
fsh-mullet# 

Built-in Compression

With ZFS, compression comes built-in. The current algorithms are lzjb (based on Lempel-Ziv) and gzip. Now its true that your jpegs and mp4s are already compressed quite nicely, but if you want to save capacity on other filesystems, all you have to do is:

fsh-mullet# zfs set compression=on swim/Documents
fsh-mullet# zfs get compression swim/Documents
NAME            PROPERTY     VALUE           SOURCE
swim/Documents  compression  on              local
fsh-mullet# 

The default compression algorithm is lzjb. If you want to use gzip, then do:

fsh-mullet# zfs set compression=gzip swim/Documents
fsh-mullet# zfs get compression swim/Documents     
NAME            PROPERTY     VALUE           SOURCE
swim/Documents  compression  gzip            local
fsh-mullet# 

That single disk stickiness

A major problem with laptops today is the single point of failure: the single disk. It makes complete sense today that laptops are designed this way given the physical space and power issues. But looking foward, as, say, flash gets cheaper and cheaper as well as more reliable, it becomes more and more of a possibility to replace the single disk in laptops. So now that you save physical space, you can actually fit more than one flash device in the laptop. Wouldn't it be really cool if you could then build RAID ontop of the multiple devices? Introducing some hardware RAID controller doesn't make any sense - but software RAID does.

ZFS allows you to do mirroring as well as RAID-Z (ZFS's unique form of RAID-5) - in software.

Creating a mirrored pool is easy:

diskmonster# zpool create swim mirror c7t0d0 c7t1d0
diskmonster# zpool status
  pool: swim
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        swim        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0

errors: No known data errors
diskmonster# 

Similarly, creating a RAID-Z is also easy:

diskmonster# zpool create swim raidz c7t0d0 c7t1d0 c7t2d0 c7t5d0
diskmonster# zpool status
  pool: swim
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        swim        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0

errors: No known data errors
diskmonster# 

With either of these configurations, your laptop can now handle a whole device failure.

ZFS on a laptop - a perfect fit.

About

erickustarz

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today