Solaris@Home Part 3: The joy of ZFS (and the first PVR based on it!)

Welcome to the third edition in this little Solaris@Home series. As you know, my W1100z called "Condorito" is now eco-responsibly taking care of a few jobs down in my basement.

Today we want to check out ZFS and see what it can do for the average home user. The nicest thing about ZFS is how easy it is to use. In fact, it's easier to set up a ZFS filesystem than to do it the old UFS way. That alone is reason enough to use ZFS, but it has much more to offer. I won't go into the details of data integrity, architecture or advanced features of ZFS; those have been covered elsewhere. Instead, let's focus on the basics that make life so enjoyable with ZFS. As a little bonus, I'd like to introduce you to the (probably) first ZFS-based PVR!

Like most home machines, mine has a patchwork of disks and partitions that came from different sources and were set up for different reasons. In fact, all four of my hard disks have different sizes, and some come from different vendors. Here's what the first disk looks like:

Total disk cylinders available: 10008 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm      67 -  1090        7.84GB    (1024/0/0)   16450560
  1       swap    wu       3 -    66      502.03MB    (64/0/0)      1028160
  2     backup    wm       0 - 10007       76.67GB    (10008/0/0) 160778520
  3 unassigned    wm    1091 -  9721       66.12GB    (8631/0/0)  138657015
  4 unassigned    wm    9722 - 10003        2.16GB    (282/0/0)     4530330
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm   10004 - 10007       31.38MB    (4/0/0)         64260
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 alternates    wu       1 -     2       15.69MB    (2/0/0)         32130

Slice 0 is obviously for booting the OS, slice 1 is the swap slice, and slice 2 spans the whole disk by definition. Slices 8 and 9 are reserved (probably for the boot mechanism and NewBoot, but I haven't checked yet). I set aside some blocks on slice 7 so I can store Solaris Volume Manager replicas in the future. That leaves us with slice 3 for storing real data, and slice 4, which just collects the remaining space. Why 8631 cylinders for slice 3? Simply because that's the biggest slice I could create that would fit onto all four disks so I can mirror them.
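The slice arithmetic above is easy to double-check in a shell. Using the geometry from the format output (slice 8 shows that one cylinder is 16065 blocks, and a block is 512 bytes), the numbers work out:

```shell
# One cylinder = 16065 blocks (see slice 8 above), one block = 512 bytes.
blocks_per_cyl=16065
cyls=8631   # cylinders in slice 3

blocks=$((cyls * blocks_per_cyl))
echo "slice 3 blocks: $blocks"   # 138657015, matching the table

# Size in GiB (integer arithmetic, so this rounds down)
echo "slice 3 GiB: $((blocks * 512 / 1024 / 1024 / 1024))"   # 66
```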

ZFS gives us two views of disk space: ZFS storage pools collect the disk space we have and assign it a level of availability. The ZFS filesystem view presents that storage space to users in a structured way and allows us to assign user-level policies to that space.

When should one use multiple pools, and when multiple ZFS filesystems? In our case, we're going to use pools to define levels of data availability, and one filesystem for each usage scenario. Let's have a look at my current pools as an example:

# zpool status
  pool: chuquicamata
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        chuquicamata  ONLINE       0     0     0
          c0d0s4    ONLINE       0     0     0
          c0d1s4    ONLINE       0     0     0
          c1d0s4    ONLINE       0     0     0

  pool: pelotillehue
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        pelotillehue  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d0s3  ONLINE       0     0     0
            c1d0s3  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d1s3  ONLINE       0     0     0
            c1d1s3  ONLINE       0     0     0

As you can see, I have two storage pools. "Pelotillehue" is a stripe across two mirrors, while "Chuquicamata" collects all the leftovers on my disks into a simple, non-protected pool. This gives us two qualities of storage to choose from: everything that needs to be protected goes into pelotillehue; everything that is less important goes into chuquicamata. "Pelotillehue" is the name of the town where the characters of the "Condorito" comic books live, so I thought that would be a nice name for the pool where my stuff should live. "Chuquicamata" is the largest open-pit copper mine in the world, located in Chile. Since I'm "mining" all the remainders of my disk cylinders in this pool, that makes a nice name too.
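I haven't reproduced my exact command history here, but a pool layout like the one above takes just one command per pool — no format, no newfs, no /etc/vfstab editing. A sketch, using the slice names from my format output:

```
# zpool create pelotillehue mirror c0d0s3 c1d0s3 mirror c0d1s3 c1d1s3
# zpool create chuquicamata c0d0s4 c0d1s4 c1d0s4
```

Listing two mirror vdevs in one zpool create is what produces the stripe of mirrors shown in the status output.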

Creating ZFS filesystems on top of these pools is now a matter of practical decisions. Obviously, home directories are important, so they go into pelotillehue, which is mirrored. I'm also fond of my collection of digital music, because it is 100% legal and I spent a lot of time making sure all the titles and other metadata are correct and even downloaded or scanned in all the cover images. So it deserves a place in pelotillehue as well.

On the other hand, I'm also going to store some video data. Video takes up a lot of space and is usually short-lived, so it can go into chuquicamata. BTW: Check out for some great free movies. It has some vintage episodes of Bugs Bunny, Flash Gordon and even movie classics such as Night of the Living Dead. But let's turn back to our subject. Here's what my ZFS filesystems look like:

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
chuquicamata          70.1G   199G   9.5K  /chuquicamata
chuquicamata/m740av   44.7G  5.28G  44.7G  /chuquicamata/m740av
chuquicamata/movies   25.3G   199G  25.3G  /chuquicamata/movies
pelotillehue          40.8G  90.1G   9.5K  /pelotillehue
pelotillehue/home     32.4M  90.1G  32.4M  /pelotillehue/home
pelotillehue/music    40.8G  90.1G  40.8G  /pelotillehue/music
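Creating these filesystems is just as painless. Again a sketch rather than my literal history:

```
# zfs create pelotillehue/home
# zfs create pelotillehue/music
# zfs create chuquicamata/movies
# zfs create chuquicamata/m740av
```

Each filesystem is mounted automatically under its pool's mountpoint, which is where the /pelotillehue/... and /chuquicamata/... paths in the listing come from.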

Now for some nice ZFS features: every ZFS filesystem comes with a set of attributes that can be turned on or off or assigned a value. For instance, the pelotillehue/home filesystem can benefit from compression, since typical user home data is nicely compressible:

# zfs set compression=on pelotillehue/home

On the other hand, there's no point in compressing digital music or video, since most modern codecs already apply an entropy-based compression algorithm at the end of encoding. Compression is off by default, so we don't have to change anything here.
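If you're curious how well compression is actually working, ZFS keeps score for you: the read-only compressratio property reports the ratio achieved so far, and zfs get reads it back alongside the setting itself:

```
# zfs get compression,compressratio pelotillehue/home
```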

Some nice security attributes are available as well: I can allow the users of pelotillehue/home to execute binaries from that filesystem, but I won't let them execute anything with setuid privileges. And the media-oriented filesystems should carry neither the exec nor the setuid attribute, since they're only supposed to store media data, not programs:

# zfs set setuid=off pelotillehue/home
# zfs set exec=off pelotillehue/music
# zfs set setuid=off pelotillehue/music
# zfs set exec=off chuquicamata/movies
# zfs set setuid=off chuquicamata/movies

You see: using ZFS, you can nicely apply user-driven policies to your filesystems with a very simple set of commands.

You're now probably wondering what the chuquicamata/m740av filesystem is for. That's our special for today: in Germany, DVB-T was officially introduced in all major cities in 2005. My DVB-T receiver is a Siemens Gigaset M740 AV. This little box isn't just a DVB-T tuner; it also comes with excellent PVR functionality. Think of it as a German, free-to-air version of the famous TiVo. In fact, it comes with an Ethernet connector and understands the SMB protocol, so it can record TV shows to any SMB server on the network. You've probably guessed it by now: chuquicamata/m740av is shared through Samba and serves as a storage facility for recording and playing back TV shows through my M740 AV. So this is probably the first PVR to use ZFS for storing video :). Again, ZFS helps a lot here: using the reservation and quota attributes, I can make sure there's some minimum amount of space for it to record at least a few TV shows and provide time-shifting of live shows, while also restricting its disk usage to a useful amount, such as 80GB. I don't want it to fill up my precious disk space too much:

# zfs set reservation=20G chuquicamata/m740av
# zfs set quota=80G chuquicamata/m740av
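The two attributes complement each other: the reservation guarantees the PVR 20GB even if the movies filesystem tries to fill the pool, while the quota caps it at 80GB no matter how much free space the pool has. To read the settings back:

```
# zfs get reservation,quota chuquicamata/m740av
```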

So, we've put ZFS to some good use today: we got organized across multiple slices on multiple disks. We sorted them into a nice, reliable, mirrored storage pool and one to keep all the rest. Then we created some filesystems on top of them with all the attributes they need to provide useful data services for us. And to top it off, we used Samba (I won't post yet another Samba tutorial here; the documentation is very useful already) to create the first ZFS-powered PVR (let me know if anyone else is doing something similar...).
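For completeness, the Samba side really is only a few lines of smb.conf. A minimal share definition might look like this — the share name and guest settings here are assumptions for illustration, not my actual config:

```
[m740av]
   path = /chuquicamata/m740av
   read only = no
   guest ok = yes
```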

Tune in again for the next episode in this series, where we'll have a closer look at music and how to stream digital music from Solaris to iTunes or other digital music players in your home!

Edit:I changed the formatting of the command outputs to better reflect what's seen in a terminal window. This brings out the hierarchy of the pool devices in a more clear way. Hope this helps.



So you've added individual slices to the mirror. I didn't know you could do that. Nice. And pelotillehue is made up of two separate mirrors? Suppose you added another single slice to pelotillehue, not part of a mirror. Then, you wouldn't know if data using pelotillehue was mirrored or not right?

Posted by Haik on January 03, 2006 at 01:48 PM CET #

Wow, fascinating stuff! I can't wait when I move to Germany (Munich) and setup ZFS as well! :)

Posted by vvs on January 03, 2006 at 04:32 PM CET #

What a tease, I was really hoping to see the word "MythTV" in this writeup, I was almost ready to jump and blow away my Linux.. but, no such luck yet.. hopefully someday soon there will be PVR-250 drivers and myth can just run.. I could use the Solaris box as a NFS and database server for myth.. but, no real performance problems right now that I need to fix, so it just waits. That and I need some asterisk testing on Solaris before I can afford to cutover.

Posted by Paul Greidanus on January 03, 2006 at 09:29 PM CET #

Hi all, thank you for your comments so far!

To Haik: <code>pelotillehue</code> is actually a stripe made of two mirrors. The <code>zpool status</code> output in the blog entry was a bit mis-formatted: the two lines below each mirror line should be indented to illustrate the hierarchy. So part 1 of the <code>pelotillehue</code> stripe is the mirror <code>c0d0s3/c1d0s3</code> (note that it's even mirrored across controllers), while part 2 is the mirror <code>c0d1s3/c1d1s3</code>. You can add whole mirrored vdevs to a pool with <code>zpool add</code>. Check out the zpool manpage.
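To illustrate the <code>zpool add</code> point (device names here are hypothetical): growing the stripe by a whole mirrored vdev is one command, and zpool refuses to add a plain, unmirrored slice to a mirrored pool unless you force it with -f, precisely because that would lower the pool's redundancy:

```
# zpool add pelotillehue mirror c0d2s3 c1d2s3
```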

To Paul: Yes MythTV is cool, but it relies on some card plugged into the server to do the capturing and encoding. Since my server is in the basement, that wasn't an option for me. The M740AV solution works so well because it's based on DVB-T, so it doesn't need to encode anything: It just grabs the MPEG-2 transport stream out of the air and records it. Much better quality and a nice fanless box next to my TV.

Posted by Constantin Gonzalez on January 04, 2006 at 03:40 AM CET #



