Wednesday Nov 12, 2008

Fishworks Storage Configuration

Since our initial product was going to be a NAS appliance, we knew early on that storage configuration would be a critical part of the initial Fishworks experience. Thanks to the power of ZFS storage pools, we have the ability to present a radically simplified interface, where the storage "just works" and the administrator doesn't need to worry about choosing RAID stripe widths or statically provisioning volumes. The first decision was to create a single storage pool (or really one per head in a cluster) [1], which means that the administrator only needs to make this decision once, and doesn't have to worry about it every time they create a filesystem or LUN.

Within a storage pool, we didn't want the user to be in charge of making decisions about RAID stripe widths, hot spares, or allocation of devices. This was primarily to avoid complexity, but it also reflects the fact that we (as designers of the system) know more about its characteristics than you. RAID stripe width affects performance in ways that are not immediately obvious. Allowing for JBOD failure requires careful selection of stripe widths. Allocation of devices can take into account environmental factors (balancing HBAs, fan groups, backplane distribution) that are unknown to the user. To make this easy for the user, we pick several different profiles that define parameters that are then applied to the current configuration to figure out how the ZFS pool should be laid out.
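To make the idea of "a profile applied to the current configuration" concrete, here is a minimal sketch in Python. It is not the appliance's actual code: the `Profile` fields, the `plan_layout` function, and the example numbers are all hypothetical, chosen only to show how a small set of parameters can be turned into a layout for whatever disks happen to be present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: parameters only, no knowledge of the hardware."""
    name: str
    data: int        # data disks per stripe
    parity: int      # parity disks per stripe
    min_spares: int  # hot spares to hold back before building stripes


def plan_layout(total_disks: int, profile: Profile) -> dict:
    """Apply a profile to a disk count and report how raw disks are used."""
    width = profile.data + profile.parity
    usable = total_disks - profile.min_spares
    if usable < width:
        raise ValueError("not enough disks for even one stripe")
    stripes = usable // width
    # Disks that do not fill a whole stripe are kept as extra spares.
    spares = total_disks - stripes * width
    return {
        "stripes": stripes,
        "data_disks": stripes * profile.data,
        "parity_disks": stripes * profile.parity,
        "spares": spares,
    }


# Hypothetical example: 24 disks, 9+2 double parity, at least 2 spares.
layout = plan_layout(24, Profile("double-parity", data=9, parity=2, min_spares=2))
print(layout)
```

The point of the split is that the profile stays the same while the disk count changes, so the same choice can be reapplied to a different hardware configuration.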

Before selecting a profile, we ask the user to verify the storage that they want to configure. On a standalone system, this is just a check to make sure nothing is broken. If there is a broken or missing disk, we don't let you proceed without explicit confirmation. The reason we do this is that once the storage pool is configured, there is no way to add those disks to the pool without changing the RAS and performance characteristics you specified during configuration. On a 7410 with multiple JBODs, this verification step is slightly more complicated, as we allow whole or half JBODs to be added. This step is where you can choose to allocate half or all of the JBOD to a pool, allowing you to split storage in a cluster or reserve unused storage for future clustering options.
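The rule for broken or missing disks can be sketched as a simple gate. This is only an illustration of the behavior described above; the slot labels, state names, and the `can_proceed` function are made up for the example.

```python
def can_proceed(disks: dict[str, str], confirmed: bool = False) -> bool:
    """Decide whether configuration may continue past verification.

    `disks` maps a (hypothetical) slot label to a state string:
    "ok", "faulted", or "missing". Any state other than "ok"
    blocks progress unless the user has explicitly confirmed.
    """
    problems = [slot for slot, state in disks.items() if state != "ok"]
    return not problems or confirmed


healthy = {"slot0": "ok", "slot1": "ok"}
degraded = {"slot0": "ok", "slot1": "missing"}

print(can_proceed(healthy))                   # nothing is broken
print(can_proceed(degraded))                  # blocked until confirmed
print(can_proceed(degraded, confirmed=True))  # user accepted the risk
```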

Fundamentally, the choice of redundancy is a business decision. There is a set of tradeoffs that express your tolerance of risk and relative cost. As Jarod told us very early on in the project: "fast, cheap, or reliable - pick two." We took this to heart, and our profiles are displayed in a table with qualitative ratings on performance, capacity, and availability. To further help make a decision, we provide a human-readable description of the layout, as well as a pie chart showing the way raw storage will be used (data, parity, spares, or reserved). The last profile parameter is called "NSPF," for "no single point of failure." If you are on a 7410 with multiple JBODs, some profiles can be applied across JBODs such that the loss of any one JBOD cannot cause data loss [2]. This often forces arbitrary stripe widths (with 6 JBODs your only choice is 10+2) and can result in less capacity, but with superior RAS characteristics.
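One way to see where a figure like 10+2 can come from is the following sketch. It rests on an assumption that is not spelled out above: a stripe with P parity disks survives the loss of any P disks, so to survive losing a whole JBOD it can place at most P of its disks in any one JBOD. The widest such stripe across J JBODs is then J × P disks. The function name is hypothetical, and this is an illustration of the arithmetic rather than the appliance's actual layout algorithm.

```python
def max_nspf_stripe(jbods: int, parity: int) -> tuple[int, int]:
    """Return (data_disks, parity_disks) for the widest NSPF stripe.

    Assumption: at most `parity` disks of any stripe may sit in a
    single JBOD, so that losing that JBOD costs the stripe no more
    disks than its parity can cover.
    """
    if jbods < 2 or parity < 1:
        raise ValueError("need at least 2 JBODs and 1 parity disk")
    width = jbods * parity
    return width - parity, parity


data, par = max_nspf_stripe(jbods=6, parity=2)
print(f"{data}+{par}")  # 10+2 for six JBODs with double parity
```

Under the same assumption, fewer JBODs force narrower stripes, which is one reason NSPF can cost capacity: a larger share of each stripe goes to parity.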

This configuration takes just two quick steps, and for the common case (where all the hardware is working and the user wants double parity RAID), it just requires clicking on the "DONE" button twice. We also support adding additional storage (on the 7410), as well as unconfiguring and importing storage. I'll leave a complete description of the storage configuration screen for a future entry.


[1] A common question we get is "why allow only one storage pool?" The actual implementation clearly allows it (as in the failed over active-active cluster), so it's purely an issue of complexity. There is never a reason to create multiple pools that share the same redundancy profile - this provides no additional value at the cost of significant complexity. We do acknowledge that mirroring and RAID-Z provide different performance characteristics, but we hope that with the ability to turn on and off readzilla and (eventually) logzilla usage on a per-share basis, this will be less of an issue. In the future, you may see support for multiple pools, but only in a limited fashion (i.e. enforcing different redundancy profiles).

[2] It's worth noting that all supported configurations of the 7410 have multiple paths to all JBODs across multiple HBAs. So even without NSPF, we have the ability to survive HBA, cable, and JBOD controller failure.
