What is RAID-Z?

The mission of ZFS was to simplify storage and to construct enterprise-level quality from volume components by building smarter software; indeed, that notion is at the heart of the 7000 series. An important piece of that puzzle was eliminating the expensive RAID card used in traditional storage and replacing it with high-performance software RAID. To that end, Jeff invented RAID-Z; its key innovation over other software RAID techniques was to close the "RAID-5 write hole" by using variable-width stripes. RAID-Z, however, is definitely not RAID-5, despite that being the most common comparison.

RAID levels

Last year I wrote about the need for triple-parity RAID, and in that article I summarized the various RAID levels as enumerated by Gibson, Katz, and Patterson, along with Peter Chen, Edward Lee, and myself:

  • RAID-0 Data is striped across devices for maximal write performance. It is an outlier among the other RAID levels as it provides no actual data protection.
  • RAID-1 Disks are organized into mirrored pairs and data is duplicated on both halves of the mirror. This is typically the highest-performing RAID level, but at the expense of lower usable capacity.
  • RAID-2 Data is protected by memory-style ECC (error correcting codes). The number of parity disks required is proportional to the log of the number of data disks.
  • RAID-3 Protection is provided against the failure of any disk in a group of N+1 by carving up blocks and spreading them across the disks — bitwise parity. Parity resides on a single disk.
  • RAID-4 A group of N+1 disks is maintained such that the loss of any one disk would not result in data loss. A single disk is designated as the dedicated parity disk. Not all disks participate in reads (the dedicated parity disk is not read except in the case of a failure). Typically parity is computed simply as the bitwise XOR of the other blocks in the row.
  • RAID-5 N+1 redundancy as with RAID-4, but with distributed parity so that all disks participate equally in reads.
  • RAID-6 This is like RAID-5, but employs two parity blocks, P and Q, for each logical row of N+2 disk blocks.
  • RAID-7 Generalized M+N RAID with M data disks protected by N parity disks (without specifications regarding layout, parity distribution, etc).
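The XOR parity described for RAID-4 (and shared by RAID-5) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the block contents, the block size, and the helper name `xor_blocks` are made up for the example and are not taken from any real RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bitwise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One row of N = 3 hypothetical data blocks, protected by one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate the loss of the disk holding data[1]: XOR-ing the surviving
# data blocks with the parity block yields the missing block.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
assert recovered == data[1]
```

Because XOR is its own inverse, the same routine serves both to compute parity and to rebuild any single missing block in the row.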

RAID-Z: RAID-5 or RAID-3?

Initially, ZFS supported just one parity disk (raidz1), and later added two (raidz2) and then three (raidz3) parity disks. But raidz1 is not RAID-5, and raidz2 is not RAID-6. RAID-Z avoids the RAID-5 write hole by distributing each logical block among the disks, whereas RAID-5 aggregates unrelated blocks into fixed-width stripes protected by a parity block. This means that RAID-Z is actually far more similar to RAID-3, where blocks are carved up and distributed among the disks. Whereas RAID-5 puts a single block on a single disk, RAID-Z and RAID-3 must access all disks to read a single block, which reduces the effective IOPS.
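The write hole mentioned above can be illustrated with a toy model. This is a simplified sketch, not ZFS code: the one-byte "blocks", the simulated crash, and the helper `xor_blocks` are assumptions made for the example. It shows how an interrupted in-place update to one block of a fixed-width stripe can silently damage an unrelated block, and why a self-contained stripe per logical block does not have that exposure.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bitwise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# RAID-5 style: three unrelated blocks share one fixed-width stripe and parity.
stripe = [b"\x11", b"\x22", b"\x33"]
parity = xor_blocks(stripe)

# Block 0 is overwritten in place, but power is lost before the matching
# parity update reaches disk, so `parity` is now stale.
stripe[0] = b"\x44"

# Later the disk holding block 1 fails. Rebuilding it from the survivors and
# the stale parity returns wrong data for a block that was never written.
rebuilt_raid5 = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt_raid5 != b"\x22"

# RAID-Z style (simplified): each logical block is carved into chunks that
# form their own stripe with their own parity. An interrupted write of some
# other block never touches these chunks or this parity.
block_chunks = [b"\x22", b"\x23"]
block_parity = xor_blocks(block_chunks)
rebuilt_raidz = xor_blocks([block_chunks[0], block_parity])
assert rebuilt_raidz == block_chunks[1]
```

The model leaves out everything else ZFS does around a write (checksums, transactional updates); it only isolates the parity-sharing difference between the two layouts.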

RAID-Z takes a significant step forward by enabling software RAID, but at the cost of backtracking on the evolutionary hierarchy of RAID. Now with advances like flash pools and the Hybrid Storage Pool, the IOPS from a single disk may be of less importance. But a RAID variant that, like RAID-Z, shuns specialized hardware and yet, like RAID-5, is economical with disk IOPS would be a significant advancement for ZFS.
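To make the IOPS trade-off concrete, here is a back-of-the-envelope model of small random reads. It is a rough sketch under stated assumptions: the function name, the group size of 8 disks, and the figure of 150 IOPS per disk are illustrative values, and the model ignores caching, block size, and queueing effects.

```python
def random_read_iops(layout, disks, iops_per_disk):
    """Rough upper bound on small random-read IOPS for a single RAID group."""
    if layout == "raid5":
        # Each small block lives on one disk, so independent reads can be
        # serviced by different disks in parallel.
        return disks * iops_per_disk
    if layout == "raidz":
        # Each block is spread across the group, so every read keeps all of
        # the disks busy and the group behaves roughly like one disk.
        return iops_per_disk
    raise ValueError(f"unknown layout: {layout}")

# Hypothetical group: 8 disks at an assumed 150 random-read IOPS each.
raid5_estimate = random_read_iops("raid5", disks=8, iops_per_disk=150)
raidz_estimate = random_read_iops("raidz", disks=8, iops_per_disk=150)
print(raid5_estimate, raidz_estimate)
```

Under these assumptions the RAID-5 group is bounded by the sum of its disks, while the RAID-Z group is bounded by a single disk's worth of random reads, which is the gap that a fast cache or flash tier is meant to hide.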

Comments:

Hi Adam,

thx for this; I do believe a typo made it into the list, the definition of RAID-5 contains a reference to itself.

cheers

Posted by Michael Schuster on July 21, 2010 at 05:20 PM PDT #

@Michael: Thanks; that was supposed to read "RAID-4", and I've corrected it.

Posted by Adam Leventhal on July 21, 2010 at 06:00 PM PDT #

The definition of specialized hardware has changed since RAID-Z was conceived. It seems to me that DRAM-based SSDs are the mass-market equivalent of custom battery-backed NVRAM. So why invent a new RAID variant? Just use standard RAID-5 (or n-parity RAID) plus an SSD intent log.

Posted by Tom Shaw on July 21, 2010 at 06:02 PM PDT #

So basically RAID-Z has, so far, less performance than traditional HW RAID-5/6 implementations, unless we combine RAID-Z with a hybrid pool?

Posted by Bruno on July 22, 2010 at 12:28 AM PDT #

@Bruno RAID-Z is less capable specifically in the area of random read IOPS than RAID-5 or RAID-6.

@Tom Shaw It's true that SSDs have become prevalent, but they aren't yet at the same level of performance as traditional NV-DRAM cards. You're right that ZFS could just use RAID-5, but that would lose other benefits of RAID-Z such as resilver time proportional to the amount of data stored. A new RAID variant would have the IOPS of RAID-5, and an even greater dynamism in the stripe layout than RAID-Z; we're thinking carefully about this problem and considering having a fast intent log device as a required part of the system as you suggest.

Posted by Adam Leventhal on July 22, 2010 at 01:59 AM PDT #

Thanks Adam. I'm actually very excited that you folks are looking into this area. I've been very hesitant to recommend using RAID-Z because it puts the disks into lockstep and destroys IOPS. I've either recommended mirrored devices or no vdev-redundancy (relying on an existing array to provide RAID-5 "chunks").

I don't see the resilvering behaviour of RAID-Z as an advantage. It's often slower than a RAID-5 sequential rebuild, and it opens up a new class of leaky abstraction bugs (e.g. the interaction between resilvering and snapshots, which in the 7000-series leaked further into replication).

Finally, from the point of view of someone ex-Sun, independent for the past 2 years: in the end, I don't care if your solution requires some special sauce in the form of specialized hardware or license-locked software. Do what it takes to make money and keep customers confident in the future roadmap. That's what will drive adoption.

Posted by Tom Shaw on July 22, 2010 at 11:01 AM PDT #

You have a lot of nice features, but where are the basics?
Where is UPS support? (no power: shut down; power restored: start up)
Where can I specify one or more NetBIOS aliases? (typically you need this for server consolidation)
Why is there no ISO image to reinstall my nice 7110 if the system mirror fails?

Posted by Benedikt Esser on August 02, 2010 at 12:59 AM PDT #

@Benedikt For questions of support for the 7000 series please contact your service representative. Responding to support requests on unrelated blog posts is not a sustainable model.

Posted by Adam Leventhal on August 02, 2010 at 03:30 AM PDT #

About

Adam Leventhal, Fishworks engineer
