The End-to-end argument meets ZFS
By sommerfeld on Nov 16, 2005
I'm really a networking and security type at heart. Why am I excited about ZFS?
Back when I was studying for a degree in computer science, I took what was then (and probably still is) the best undergraduate course in MIT's CS department: Computer Systems Engineering, better known as "6.033" or just "'033".
A major part of the course was a series of case studies -- we would read an important paper on a system, write a short analysis, and then discuss the system in class.
One of the key papers presented was Saltzer, Reed, and Clark's "End-to-End Arguments in System Design".
I'll quote the abstract:
This paper presents a design principle that helps guide placement of
functions among the modules of a distributed computer system. This
principle, called the end-to-end argument, suggests that functions
placed at low levels of a system may be redundant or of little value
when compared with the cost of providing them at that low level.
Examples discussed in the paper include bit error recovery, security
using encryption, duplicate message suppression, recovery from system
crashes, and delivery acknowledgement. Low level mechanisms to
support these functions are justified only as performance enhancements.
The paper has spawned a lot of debate and more than a few followups
over the years, and interminable arguments about what counts as an end,
but overall I think it's held up pretty well.
Fast forward to a couple of years ago, when I first saw a high-level overview of the ZFS design. I immediately thought of this paper.
ZFS applies the end-to-end principle to filesystem design.
End-to-end is normally applied to distributed systems, where two distinct "ends" are communicating with each other, often in real time or with relatively short delays.
Here, the "ends" are separated mainly by time: one "end" writes data to the filesystem, and the other "end" expects to get the exact same data back in the future. (And the "middle" is the storage subsystem, which these days is itself a complex distributed system.)
By placing the functionality required for robustness at a relatively high layer within the storage stack, ZFS can perform these functions with reduced overall system cost; you can use a much simpler disk subsystem to get a desired level of performance, availability and reliability.
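To make the "ends separated by time" idea concrete, here is a minimal sketch of an end-to-end check done above the disks. It is only an illustration, not ZFS's actual on-disk format or code: `FlakyDisk` and `CheckedMirror` are hypothetical names, and the choice of SHA-256 is an assumption made for the example. The writer records a checksum in the upper layer; the reader, some time later, accepts a block only if it still matches, and falls back to the other copy if it does not.

```python
import hashlib


class FlakyDisk:
    """Hypothetical dumb block device that can silently corrupt data."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr, data):
        self.blocks[addr] = bytes(data)

    def read(self, addr):
        return self.blocks[addr]

    def corrupt(self, addr):
        # Flip the bits of the first byte to simulate silent corruption.
        data = bytearray(self.blocks[addr])
        data[0] ^= 0xFF
        self.blocks[addr] = bytes(data)


class CheckedMirror:
    """Hypothetical upper layer: keeps checksums itself, trusts no disk."""

    def __init__(self, disks):
        self.disks = disks
        self.checksums = {}  # held by the upper layer, not by the disks

    def write(self, addr, data):
        # The "writing end" records what it expects to read back later.
        self.checksums[addr] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk.write(addr, data)

    def read(self, addr):
        # The "reading end" verifies each copy against the stored checksum.
        expected = self.checksums[addr]
        for disk in self.disks:
            data = disk.read(addr)
            if hashlib.sha256(data).digest() == expected:
                # Found a good copy: rewrite any copy that differs from it.
                for other in self.disks:
                    if other.read(addr) != data:
                        other.write(addr, data)
                return data
        raise IOError("no copy of block %r matches its checksum" % (addr,))
```

The disks here do nothing clever at all; the detection and the repair both live in the layer that knows what the data is supposed to be.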
For instance, the filesystem knows for sure which disk blocks are in use. The disk doesn't. If you replace a disk in a mirror or RAID-Z group, ZFS only needs to copy the blocks that are currently in use to the new disk; when lower layers are responsible for redundancy, you have to copy the whole disk. With the upper layer responsible for redundancy, the repair takes less time, and your window of exposure to an additional failure can be significantly shorter.
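The difference in repair work can be sketched in a few lines. This is a toy model, not how ZFS actually walks its metadata: disks are plain Python lists, and `resilver_whole_disk`, `resilver_in_use`, and the `in_use` set are hypothetical names standing in for "what the lower layer can do" and "what the filesystem knows".

```python
def resilver_whole_disk(source, target):
    """Lower-layer rebuild: no allocation knowledge, so copy every block."""
    copied = 0
    for addr in range(len(source)):
        target[addr] = source[addr]
        copied += 1
    return copied


def resilver_in_use(source, target, in_use):
    """Upper-layer rebuild: copy only the blocks known to hold live data."""
    copied = 0
    for addr in sorted(in_use):
        target[addr] = source[addr]
        copied += 1
    return copied


# A 1000-block disk on which only blocks 0..99 hold live data.
DISK_BLOCKS = 1000
in_use = set(range(100))
surviving = [b"data" if addr in in_use else b"junk" for addr in range(DISK_BLOCKS)]

replacement_a = [None] * DISK_BLOCKS
replacement_b = [None] * DISK_BLOCKS

whole = resilver_whole_disk(surviving, replacement_a)
partial = resilver_in_use(surviving, replacement_b, in_use)
print(whole, partial)
```

On a disk that is 10% full, the second approach copies a tenth as many blocks, and every live block still ends up on the replacement.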
I'm hoping this leads to simpler (and cheaper) storage hardware in the long run -- JBODs seem to be ideal for ZFS, and you can take the battery-backed NVRAM out of the RAID controllers and give it to the lumberjacks.