New ZFS Features

I've been meaning to get around to blogging about these features, which I putback a while ago, but have been caught up in a few too many things. In any case, the following new ZFS features were putback to build 48 of Nevada, and should be available in the next Solaris Express.

Create Time Properties

An old RFE has been to provide a way to specify properties at create time. For users, this simplifies administration by reducing the number of commands that need to be run, and it eliminates some race conditions. For example, to create a new dataset with a mountpoint of 'none', you previously had to create the dataset, which mounted it at the inherited mountpoint, and then undo that with 'zfs set mountpoint=none'.
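
The old two-step sequence looked something like this (the dataset name is invented for illustration):

        # zfs create tank/scratch
        # zfs set mountpoint=none tank/scratch

With create-time properties, the window between the two commands, during which the dataset is briefly mounted at its inherited mountpoint, simply goes away.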

From an implementation perspective, this allows us to unify our implementation of the 'volsize' and 'volblocksize' properties, and pave the way for future create-time only properties. Instead of having a separate ioctl() to create a volume and passing in the two size parameters, we simply pass them down as create-time options.

The end result is pretty straightforward:

        # zfs create -o compression=on tank/home
        # zfs create -o mountpoint=/export -o atime=off tank/export
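
The same applies to volumes. Since 'volsize' and 'volblocksize' are now just ordinary create-time properties, creating a volume with a specific block size might look like this (a sketch; the sizes here are arbitrary):

        # zfs create -V 1g -o volblocksize=8k tank/vol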

'canmount' property

The 'canmount' property allows you to create a ZFS dataset that serves solely as a mechanism for inheriting properties. When we first created the hierarchical dataset model, we had the notion of 'containers': filesystems with no associated data. Only these datasets could contain other datasets, and you had to make the decision at create time.

This turned out to be a bad idea for a number of reasons. It complicated the CLI, forced the user to make a create-time decision that could not be changed, and led to confusion when files were accidentally created on the underlying filesystem. So we made every filesystem able to have child filesystems, and all seemed well.

However, there is power in having a dataset that exists in the hierarchy but has no associated filesystem data (or effectively none, because it is prevented from being mounted). You can approximate this today by setting the 'mountpoint' property to 'none', but that property is inherited by child datasets, so the administrator loses the power of inherited mountpoints. In particular, some users have expressed a desire to have two sets of directories, belonging to different ZFS parents (or even to UFS filesystems), share the same inherited directory. With the new 'canmount' property, this becomes trivial:

        # zfs create -o mountpoint=/export -o canmount=off tank/accounting
        # zfs create -o mountpoint=/export -o canmount=off tank/engineering
        # zfs create tank/accounting/bob
        # zfs create tank/engineering/anne

Now both anne and bob have directories under '/export', but they inherit ZFS properties from different datasets in the hierarchy. The administrator can turn compression on for one group but not the other, set a quota to limit the amount of space consumed by a group, or simply view the total space consumed by each group without resorting to scripted du(1).
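
For example, an administrator might do something like the following (the quota value is arbitrary):

        # zfs set compression=on tank/engineering
        # zfs set quota=100g tank/accounting
        # zfs list -o name,used tank/accounting tank/engineering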

User Defined Properties

The last major RFE in this wad added the ability to set arbitrary properties on ZFS datasets. This provides a way for administrators to annotate their own filesystems, and for ISVs to layer intelligent software on top of ZFS, without having to modify the ZFS code to introduce a new property.

A user-defined property name is one that contains a colon (:). This guarantees a namespace that never overlaps with native ZFS properties. The convention is to use the colon to separate a module name from a property name, where the module name should be a reverse DNS name. For example, a theoretical Sun backup product might do:

        # zfs set com.sun.sunbackup:frequency=1hr tank/home

The property value is an arbitrary string, and no additional validation is done on it. These values are always inherited. A local administrator might do:

        # zfs set localhost:backedup=9/19/06 tank/home
        # zfs list -o name,localhost:backedup
        NAME            LOCALHOST:BACKEDUP
        tank            -
        tank/home       9/19/06
        tank/ws         9/10/06
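
Because these values are inherited, setting a property on a parent annotates every descendant, and 'zfs inherit' clears a locally set value so a dataset follows its parent again. A sketch, reusing the datasets above:

        # zfs set localhost:backedup=9/19/06 tank
        # zfs inherit localhost:backedup tank/home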

The hope is that this will serve as a basis for some innovative products and home-grown solutions that interact with ZFS datasets in a well-defined manner.

Comments:

This ZFS thing sounds pretty interesting. Unfortunately, I am a total n00b with ZFS. It seems the best, most useful, detailed information about it comes from user blogs. I think I need ZFS for a project I am going to be working on. I will be setting up a new Solaris 10 X4500 server with some large number of SATA II disks. I can't figure out if ZFS helps me meet one criterion. This will be a new home dir server for our masses. I need to be able to do a simple disk -> disk backup of everything. Two raidz pools? Or do I? Is there enough fault tolerance for multiple disk failures? I see some potential issues with faulted spares and faulted pool members. I also need access to be controlled by our MS Active Directory. Is ZFS compatible or simply transparent? Hmmm..

Posted by Joe Aiello on September 22, 2006 at 09:19 AM PDT #

You can also use RAIDZ2, which is double-parity protection, so two disks can fail in a RAID group. However, RAIDZ2 is not available in S10 yet, only in Solaris Express. You can also create many smaller RAIDZ groups in one bigger ZFS pool, or use RAID-10. When it comes to disk->disk backup, see 'zfs send' and 'zfs recv' in the zfs manual page. For more documentation see http://opensolaris.org/os/community/zfs/

Posted by Robert Milkowski on September 22, 2006 at 05:53 PM PDT #

Who gives a hoot about new ZFS features if it can't boot on a MacBook Pro?

Posted by Wayne on September 26, 2006 at 10:41 PM PDT #

One thing that I cannot see is whether ZFS has a capability similar to Microsoft's VSS, where snapshots are application-aware: applications are notified when a snapshot is desired and tell the infrastructure when they are in a consistent state to be snapped. Database- and application-aware snapshots are more critical than being crash-consistent, which is what most snapshots support.

Posted by Pat Lee on September 27, 2006 at 11:25 AM PDT #
