Double-parity RAID, or RAID-6, is the de facto industry standard for
storage; when I started talking about triple-parity RAID for ZFS earlier
this year, the need wasn't always immediately obvious. Double-parity RAID, of
course, provides protection from up to two failures (whether data corruption or the
loss of a whole drive) within a RAID stripe. The necessity of triple-parity RAID
arises from the observation that while hard drive capacity has roughly followed
Kryder's law, doubling annually, hard drive throughput has improved far more
modestly. Accordingly, the time to populate a replacement drive in a RAID
stripe is increasing rapidly. Today, a 1TB SAS drive takes about 4 hours to fill at its
theoretical peak throughput; in a real-world environment that number can easily double,
and 2TB and 3TB drives expected this year and next won't move data much faster.
Those long periods spent in a degraded state increase the
exposure to the bit errors and other drive failures that would in turn
lead to data loss.
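To put rough numbers on that, here is a back-of-the-envelope sketch. The 70 MB/s figure is an assumption backed out of the "about 4 hours for 1TB" estimate above, not a measured or vendor-quoted rate, and the function name is purely illustrative. It also assumes throughput stays flat as capacity grows, which is the pessimistic reading of "won't move data much faster."

```python
# Back-of-the-envelope resilver time: capacity divided by sustained throughput.
# ASSUMPTION: 70 MB/s is inferred from "1TB in about 4 hours"; it is not a
# datasheet number. fill_time_hours is a hypothetical helper for illustration.

TB = 10**12  # decimal terabyte, as drive vendors count
MB = 10**6


def fill_time_hours(capacity_bytes, throughput_bytes_per_sec):
    """Hours needed to write an entire drive at a constant rate."""
    return capacity_bytes / throughput_bytes_per_sec / 3600.0


if __name__ == "__main__":
    assumed_peak = 70 * MB
    for terabytes in (1, 2, 3):
        best = fill_time_hours(terabytes * TB, assumed_peak)
        # The post notes the real-world number "can easily double".
        print(f"{terabytes}TB: ~{best:.1f}h at peak, ~{2 * best:.1f}h if it doubles")
```

If throughput does not improve, the time spent degraded scales linearly with capacity, so a 3TB drive would spend roughly three times as long exposed as a 1TB drive.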
The industry moved to double-parity RAID because one parity disk was insufficient; longer resilver times mean that we're spending more and more time back at single-parity.
From that it is obvious that double-parity will soon become
insufficient. (I'm working on an article that examines these phenomena
quantitatively so stay tuned... update Dec 21, 2009: you can find the article here)
Last week I integrated
triple-parity RAID into ZFS. You can take a look at the implementation and
the details of the algorithm,
but rather than describing the specifics, I wanted to describe its
genesis. For double-parity RAID-Z, we drew on the work of
Peter Anvin, which was also the basis of RAID-6 in Linux. This work was more or
less a tutorial for systems programmers, simplifying some of the more subtle
underlying mathematics with an eye towards optimization. While a systems
programmer by trade, I have a background in mathematics so was interested to
understand the foundational work. James S. Plank's paper
"A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like
Systems" describes a technique for generalized N+M RAID.
Not only was it simple to implement, but it could easily be made to perform
well. I struggled for far too long trying to make the code work before
discovering trivial flaws with the math itself. A bit more digging revealed
that the author himself had published
"Correction to the 1997 Tutorial on Reed-Solomon Coding"
8 years later, addressing those same flaws.
Predictably, the mathematically accurate version was far harder to optimize,
stifling my enthusiasm for the generalized case. My more serious concern was
that the double-parity RAID-Z code suffered some similar systemic flaw. This fear
was quickly assuaged as I verified that the RAID-6 algorithm was sound.
Further, from this investigation I was able to find a related method for doing
triple-parity RAID-Z that was nearly as simple as its double-parity cousin.
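For readers who want to see the shape of the double-parity math, here is a minimal sketch of the P/Q scheme in the style of the RAID-6 work mentioned above. It assumes the field GF(2^8) with reducing polynomial 0x11d and generator 2, works on a single byte per disk, and uses helper names (gf_mul, parity_pq, recover_two) that are mine rather than anything from the ZFS or Linux sources. It is deliberately unoptimized.

```python
# Sketch of double-parity (P/Q) arithmetic over GF(2^8).
# ASSUMPTIONS: polynomial 0x11d, generator 2, one byte per data disk.
# All function names are hypothetical; this is not the ZFS implementation.

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add, reduced by 0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Inverse of a nonzero element: a**254, since a**255 == 1."""
    return gf_pow(a, 254)


def parity_pq(data):
    """P is the plain XOR of the data; Q weights disk i by 2**i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data, x, y, p, q):
    """Rebuild data disks x and y (their entries in data are ignored)."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i in (x, y):
            continue
        pxy ^= d                       # leaves d_x ^ d_y
        qxy ^= gf_mul(gf_pow(2, i), d)  # leaves 2**x * d_x ^ 2**y * d_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx


if __name__ == "__main__":
    data = [0x12, 0x34, 0x56, 0x78, 0x9A]
    p, q = parity_pq(data)
    damaged = [0x12, None, 0x56, None, 0x9A]
    print(recover_two(damaged, 1, 3, p, q) == (0x34, 0x78))
```

The recovery works because the two surviving equations in the two unknowns are independent as long as 2**x and 2**y differ, which holds for any two distinct disks when there are fewer than 255 of them.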
The math is a bit dense, but the key observation was that, given that 3 is the
smallest factor of 255 (the largest value representable by an unsigned byte), it
was possible to find exactly 3 different seed or generator values,
after which there were collections of failures that formed uncorrectable
singularities. Using that technique I was able to implement a triple-parity
RAID-Z scheme that performed nearly as well as the double-parity version.
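One way to see why the factor of 3 matters is the following sketch. It is an illustration under my own assumptions, not the derivation from the implementation: I assume the field GF(2^8) with polynomial 0x11d, and that the candidate generators are successive powers of 2 (1, 2, 4, and then 8 as a hypothetical fourth). A parity row built from generator g gives data column i the coefficient g**i; if two columns ever share a coefficient, that row combined with plain XOR parity cannot tell those two columns apart.

```python
# Illustration: why generators 2 and 4 are safe across 255 columns but 8 is not.
# ASSUMPTIONS: GF(2^8) with polynomial 0x11d; generators are powers of 2.
# Function names are hypothetical and chosen for this example only.

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add, reduced by 0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def order(g):
    """Smallest n >= 1 such that g**n == 1 in the field."""
    n, x = 1, g
    while x != 1:
        x = gf_mul(x, g)
        n += 1
    return n


def first_collision(g, columns=255):
    """First pair of columns (a, b) with g**a == g**b, or None if all differ."""
    seen = {}
    x = 1  # g**0
    for col in range(columns):
        if x in seen:
            return seen[x], col
        seen[x] = col
        x = gf_mul(x, g)
    return None


if __name__ == "__main__":
    for g in (2, 4, 8):
        print(g, order(g), first_collision(g))
```

Under these assumptions 2 and 4 each cycle through all 255 nonzero values, while 8 = 2**3 repeats after 255 / 3 = 85 steps, so columns 0 and 85 would receive the same coefficient. If exactly those two columns failed and only XOR parity plus that fourth parity survived, the resulting pair of equations would be singular.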
As far as generic N-way RAID-Z goes, it's still something I'd like to
add to ZFS. Triple-parity will suffice for quite a while, but we may want
more parity sooner for a variety of reasons. Plank's revised algorithm is an
excellent start. The test will be whether it can be made to perform well enough or
whether some new clever algorithm will need to be devised.
Now, as for what to call these additional RAID levels, I'm not sure. RAID-7 or RAID-8 seem a bit ridiculous and RAID-TP and RAID-QP aren't any better. Fortunately, in ZFS triple-parity RAID is just raidz3.
A little over three years ago, I integrated double-parity RAID-Z
into ZFS, a feature expected of enterprise-class storage. This was in the
early days of Fishworks when much of our focus was on addressing functional
gaps. The move to triple-parity RAID-Z comes in the wake of a number of our
unique advancements to the state of the art, such as the
Hybrid Storage Pool, as the
Storage 7000 series
products meet and exceed the standards set by the industry. Triple-parity
RAID-Z will, of course, be a feature included in the next major software
update for the 7000 series (2009.Q3).