Comparing ZFS to the 4.4BSD LFS

Remember the 4.4BSD Log-Structured Filesystem? I do. I've been thinking recently about how it and ZFS compare. Let's take a look.

First, let's recap how LFS works. Open your trusty copy of "The Design and Implementation of the 4.4BSD Operating System" to chapter 8, section 3 -- or go look at the Wikipedia entry for LFS, or read any of the LFS papers referenced therein. Go ahead, take your time, this blog entry will wait for you.

As you can see, LFS is organized on disk as a sequence of large segments, each segment consisting of smaller chunks, the whole forming a log of sorts. Each small chunk consists of the data and meta-data blocks that were modified or added at the time the chunk was written. LFS, like ZFS, never overwrites data (superblocks excepted), so LFS is, by definition, copy-on-write, just like ZFS. In order to find inodes whose block locations have changed, LFS maintains a mapping of inode numbers to block addresses in a regular file called the "ifile."
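To make the ifile concrete, here's a minimal sketch in C of the mapping it maintains. The names and layout are mine, for illustration -- they are not the actual 4.4BSD structures:

    /* A sketch of the LFS "ifile" idea: a table, itself stored in a regular
     * file, mapping inode numbers to the disk address of each inode's
     * current copy.  Illustrative only -- not the 4.4BSD structures. */
    #include <stdint.h>
    #include <stdio.h>

    #define NINODES 16

    struct ifile_entry {
        uint64_t daddr;      /* disk address of the inode's latest copy */
        uint32_t version;    /* bumped when the inode number is reused */
    };

    static struct ifile_entry ifile[NINODES];  /* stands in for the ifile's data */

    /* COW an inode: the new copy goes at the head of the log, and the
     * ifile records its new location (the ifile's own blocks are in turn
     * COW-written later, like any other file's). */
    static void inode_relocated(uint64_t ino, uint64_t new_daddr)
    {
        ifile[ino].daddr = new_daddr;
    }

    static uint64_t inode_lookup(uint64_t ino)
    {
        return ifile[ino].daddr;
    }

    int main(void)
    {
        inode_relocated(5, 1024);  /* inode 5 first written at block 1024 */
        inode_relocated(5, 2048);  /* modified: new copy appended at 2048 */
        printf("inode 5 now lives at block %llu\n",
               (unsigned long long)inode_lookup(5));
        return 0;
    }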

That should be enough of a recap. As for ZFS, I'll assume the reader has been following the same ZFS blog entries I have. (By the way, I'm not a ZFS developer. I only looked at ZFS source code for the first time two days ago.)

So, let's compare the two filesystems, starting with generic aspects of transactional filesystems:

  • LFS writes data and meta-data blocks every time it needs to fsync()/sync()
  • Whereas ZFS need only write data blocks and entries in its intent log (the ZIL)

This is very important. The ZIL is a compression technique that allows ZFS to safely defer many writes that LFS could not. Most LFS meta-data writes are very redundant, after all: writing to one block of a file implies writing new indirect blocks, a new inode, a new data block for the ifile, new indirect blocks for the ifile, and a new ifile inode -- but all of these writes are easily summarized as "wrote block #X to replace block #Y of the file whose inode number is Z."

Of course, ZFS can't ward off meta-data block writes forever, but it can safely defer them with its ZIL, and in the process it stands a good chance of coalescing related ZIL entries.
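To make that concrete, here's a sketch in C of the intent-log idea. This is not the real ZIL record format -- the names are invented -- but it shows how one compact record can stand in for a whole cascade of meta-data writes, and how records touching the same block can coalesce:

    /* A sketch of an intent log: instead of rewriting the whole meta-data
     * path on every sync, log one compact record per modification and let
     * later records for the same block supersede earlier ones.
     * Illustrative only -- not the actual ZIL on-disk format. */
    #include <stdint.h>
    #include <stdio.h>

    struct log_write_record {   /* hypothetical record layout */
        uint64_t object;        /* which file (dnode number) */
        uint64_t blkno;         /* which logical block of that file */
        uint64_t daddr;         /* where the new data block landed */
    };

    #define LOGSZ 64
    static struct log_write_record records[LOGSZ];
    static int nrecords;

    /* Append a record; if an earlier record covers the same (object, blkno),
     * update it in place -- two writes to one block become a single replay
     * action, which LFS's eager meta-data writes cannot manage. */
    static void log_write(uint64_t object, uint64_t blkno, uint64_t daddr)
    {
        for (int i = 0; i < nrecords; i++) {
            if (records[i].object == object && records[i].blkno == blkno) {
                records[i].daddr = daddr;
                return;
            }
        }
        records[nrecords++] =
            (struct log_write_record){ object, blkno, daddr };
    }

    int main(void)
    {
        log_write(42, 0, 1000);  /* write block 0 of file 42 */
        log_write(42, 0, 1010);  /* rewrite it: coalesces with the first */
        log_write(42, 1, 1020);  /* a different block: new record */
        printf("%d records to replay\n", nrecords);  /* 2, not 3 */
        return 0;
    }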

  • LFS needs a copying garbage collector, a process that involves both searching for garbage and relocating the live data around it (sketched below)
  • Whereas where LFS sees garbage, ZFS sees snapshots and clones
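
Here's a rough sketch of what an LFS-style copying cleaner has to do. Liveness tracking and victim-selection policy are elided, and the names are mine, not the 4.4BSD cleaner's:

    /* A sketch of a copying cleaner: pick a victim segment, copy its
     * still-live blocks to the head of the log, then free the whole
     * segment.  Illustrative only. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define BLKS_PER_SEG 8

    struct segment {
        uint64_t blocks[BLKS_PER_SEG];
        bool     live[BLKS_PER_SEG];  /* in LFS, derived from segment summaries */
    };

    /* Copy live blocks out of 'victim', appending them at the log head,
     * and return how many blocks had to be relocated. */
    static int clean_segment(struct segment *victim,
                             void (*append_to_log)(uint64_t blk))
    {
        int copied = 0;
        for (int i = 0; i < BLKS_PER_SEG; i++) {
            if (victim->live[i]) {
                append_to_log(victim->blocks[i]);
                victim->live[i] = false;
                copied++;
            }
        }
        return copied;  /* the victim segment is now entirely free */
    }

    static void append_stub(uint64_t blk)
    {
        printf("relocating block %llu\n", (unsigned long long)blk);
    }

    int main(void)
    {
        struct segment s = {
            .blocks = { 1, 2, 3, 4, 5, 6, 7, 8 },
            .live   = { true, false, true, false, false, false, false, true },
        };
        printf("copied %d live blocks\n", clean_segment(&s, append_stub));
        return 0;
    }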

The Wikipedia LFS entry says this about LFS' lack of snapshots and clones: "LFS does not allow snapshotting or versioning, even though both features are trivial to implement in general on log-structured file systems." I'm not sure I'd say "trivial," but certainly easier than in more traditional filesystems.

  • LFS has an ifile to track inode-number-to-block-address mappings. It has to; how else could it COW inodes?
  • ZFS has a meta-dnode: every object in ZFS is modelled as a "dnode," so dnodes underlie inodes -- or rather, "znodes" -- and znode numbers are dnode numbers. Dnode-number-to-block-address mappings are kept in a ZFS filesystem's object set's meta-dnode, much as inode-number-to-block-address mappings are kept in the LFS ifile.

It's worth noting that ZFS uses dnodes for many purposes besides implementing regular files and directories; some of those uses do not require the meta-data items associated with regular files and directories (ctime/mtime/atime and so on). Thus "everything is a dnode" is more space-efficient than "everything is a file" (in LFS the ifile is a regular file).
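Here's a sketch of that distinction, with an invented layout -- the real dnode_phys_t is considerably more elaborate -- just to show why a bare dnode is cheaper than a full inode:

    /* "Everything is a dnode," sketched.  POSIX attributes live only in
     * the znodes that need them; other object types pay for none of it.
     * Illustrative layout, not ZFS's dnode_phys_t. */
    #include <stdint.h>
    #include <stdio.h>

    struct blkptr { uint64_t daddr; };  /* stand-in for a ZFS block pointer */

    struct dnode {                      /* every object gets one of these */
        uint8_t       type;             /* plain file, directory, ZIL, ... */
        struct blkptr blk[3];           /* points at the object's blocks */
    };

    struct znode_attrs {                /* only files/directories carry this */
        uint64_t atime, mtime, ctime;
        uint32_t uid, gid, mode;
    };

    /* The meta-dnode's "data" is an array of dnodes, so object number N
     * is found the way block N of an ordinary file is -- just as the LFS
     * ifile maps inode numbers to inode locations. */
    static struct dnode objset_meta[64];

    static struct dnode *object_lookup(uint64_t objnum)
    {
        return &objset_meta[objnum];
    }

    int main(void)
    {
        object_lookup(7)->type = 1;     /* pretend object 7 is a plain file */
        printf("bare dnode: %zu bytes; znode attributes add %zu more\n",
               sizeof(struct dnode), sizeof(struct znode_attrs));
        return 0;
    }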

  • LFS does COW (copy-on-write)
  • ZFS does COW too

  • LFS stops there
  • Whereas ZFS uses its COW nature to improve on RAID-5 by avoiding the read-modify-write cycle, so RAID-Z goes faster than RAID-5 (see the sketch below). Of course, ZFS also integrates volume management.
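
To see why COW helps parity RAID, here's a single-parity XOR sketch. Real RAID-Z adds variable stripe width, multiple parity levels, and more, so treat this as an illustration of the read-modify-write point only:

    /* RAID-5 updating one block in place must first read the old data and
     * old parity to compute the new parity (two reads, two writes).  A COW
     * filesystem writes a fresh full stripe, so parity is computed from
     * data already in hand -- no reads at all. */
    #include <stdint.h>
    #include <stdio.h>

    #define NDATA 4  /* data disks per stripe */

    /* RAID-5 style in-place update: parity' = parity ^ old ^ new. */
    static uint64_t rmw_parity(uint64_t old_parity, uint64_t old_data,
                               uint64_t new_data)
    {
        return old_parity ^ old_data ^ new_data;  /* needed 2 reads first */
    }

    /* Full-stripe write: XOR the new data we already have in memory. */
    static uint64_t full_stripe_parity(const uint64_t data[NDATA])
    {
        uint64_t p = 0;
        for (int i = 0; i < NDATA; i++)
            p ^= data[i];
        return p;                                 /* zero reads required */
    }

    int main(void)
    {
        uint64_t stripe[NDATA] = { 0x11, 0x22, 0x33, 0x44 };
        uint64_t old_p = full_stripe_parity(stripe);

        stripe[0] = 0x55;  /* modify one block of the stripe... */
        uint64_t via_rmw  = rmw_parity(old_p, 0x11, 0x55);
        uint64_t via_full = full_stripe_parity(stripe);

        /* Same parity either way; only the I/O needed to get it differs. */
        printf("rmw %llx == full-stripe %llx\n",
               (unsigned long long)via_rmw, (unsigned long long)via_full);
        return 0;
    }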

Besides the features associated with transactional filesystems, and besides volume management, ZFS provides many others, such as:

  • data and meta-data checksums to protect against bitrot
  • extended attributes
  • NFSv4-style ACLs
  • ease of use
  • and so on

