Logzillas: to mirror or stripe?

The Hybrid Storage Pool integrates flash into the storage hierarchy in two specific ways: as a massive read cache and as fast log devices. For read cache devices, Readzillas, there's no need for redundant configurations; it's a clean cache so the data necessarily also resides on disk. For log devices, Logzillas, redundancy is essential, but how that translates to their configuration can be complicated. How to decide whether to stripe or mirror?

ZFS intent log devices

Logzillas are used as ZFS intent log devices (slogs in ZFS jargon). For certain synchronous write operations, data is written to the Logzilla so the operation can be acknowledged to the client quickly before the data is later streamed out to disk. Rather than the milliseconds of latency for disks, Logzillas respond in about 100μs. If there's a power failure or system crash before the data can be written to disk, the log will be replayed when the system comes back up; this is the only scenario in which Logzillas are read. Under normal operation they are effectively write-only. Unlike Readzillas, Logzillas are integral to data integrity: they are relied upon to preserve acknowledged writes in the case of a system failure.
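The write path described above can be illustrated with a toy model. This is only a sketch for intuition, not ZFS code: the class `ToyPool` and its methods are hypothetical names, and real intent log behavior is considerably more involved.

```python
class ToyPool:
    """Toy model of a pool with a separate intent log device.

    Hypothetical illustration only; none of these names come from ZFS.
    """

    def __init__(self):
        self.disk = {}    # data that has reached the main pool disks
        self.memory = {}  # pending writes held in volatile system memory
        self.slog = []    # records persisted on the log device

    def sync_write(self, key, value):
        # Persist to the log device first so the client can be acknowledged
        # quickly, and keep a copy in memory for the later write to disk.
        self.slog.append((key, value))
        self.memory[key] = value
        return "ack"

    def flush(self):
        # Stream pending data from memory out to disk. The log records are
        # discarded without ever having been read.
        self.disk.update(self.memory)
        self.memory.clear()
        self.slog.clear()

    def crash(self):
        # Volatile memory is lost; disk and log device contents survive.
        self.memory.clear()

    def replay(self):
        # On restart the log is read and applied to disk. This is the only
        # time the log device is read.
        for key, value in self.slog:
            self.disk[key] = value
        self.slog.clear()


pool = ToyPool()
pool.sync_write("block-1", "payload")  # acknowledged before reaching disk
pool.crash()                           # memory copy is gone
pool.replay()                          # log device restores the write
print(pool.disk)
```

In the normal case `flush()` runs instead of `crash()` and `replay()`, which is why the log device is effectively write-only.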

A common misconception is that a non-redundant Logzilla configuration introduces a single point of failure into the system. This is not the case, since the data contained on the log devices is also held in system memory. Though that memory is indeed volatile, data loss could only occur if both the Logzilla failed and the system failed within a fairly small time window.
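The reasoning can be spelled out by enumerating the failure combinations. This is a deliberately simplified sketch: the function name is hypothetical, and "failed" here means failed within the window before the data has been written to disk.

```python
from itertools import product


def unflushed_data_lost(logzilla_failed, system_failed):
    # Not-yet-flushed data lives in two places: the Logzilla and system
    # memory. It is lost only if both copies are gone before the flush.
    return logzilla_failed and system_failed


for logzilla_failed, system_failed in product([False, True], repeat=2):
    outcome = "lost" if unflushed_data_lost(logzilla_failed, system_failed) else "safe"
    print(f"logzilla_failed={logzilla_failed!s:5} system_failed={system_failed!s:5} -> {outcome}")
```

Only the last of the four combinations loses data, which is why a single Logzilla is not by itself a single point of failure.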

Logzilla configuration

While a Logzilla doesn't represent a single point of failure, redundant configurations are still desirable in many situations. The Sun Storage 7000 series implements the Hybrid Storage Pool, and offers several different redundant disk configurations. Some of those configurations add a single level of redundancy: mirroring and single-parity RAID. Others provide additional redundancy: triple-mirroring, double-parity RAID and triple-parity RAID. For disk configurations that provide double disk redundancy or better, the best practice is to mirror Logzillas to achieve a similar level of reliability. For singly redundant disk configurations, non-redundant Logzillas might suffice, but there are conditions, such as a critically damaged JBOD, that could affect both the Logzilla and the controller more or less simultaneously. Mirrored Logzillas add protection against such scenarios.
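As a rough sketch, the rule of thumb above can be written as a lookup. The profile names, the redundancy levels assigned to them, and the function `suggest_logzilla_layout` are illustrative assumptions, not an appliance API.

```python
# Number of simultaneous disk failures each profile tolerates.
# This is one reading of the text above, not a table from the product.
DISK_REDUNDANCY = {
    "mirror": 1,
    "single-parity RAID": 1,
    "triple mirror": 2,
    "double-parity RAID": 2,
    "triple-parity RAID": 3,
}


def suggest_logzilla_layout(disk_profile):
    """Return a starting-point suggestion for the Logzilla layout."""
    level = DISK_REDUNDANCY[disk_profile]
    if level >= 2:
        # Double disk redundancy or better: mirror to match reliability.
        return "mirrored"
    # Single redundancy: striping might suffice, but mirroring still
    # protects against events such as a critically damaged JBOD.
    return "striped (consider mirrored for JBOD-failure protection)"


for profile in DISK_REDUNDANCY:
    print(f"{profile}: {suggest_logzilla_layout(profile)}")
```

This captures only the default guidance; the value of the data being stored can justify overriding it in either direction.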

Note that the Logzilla configuration screen (pictured) includes a column for No Single Point of Failure (NSPF). Logzillas are never truly a single point of failure, as previously discussed; instead, this column refers to the arrangement of Logzillas in JBODs. A value of true indicates that the configuration is resilient against JBOD failure.

The most important factor to consider when deciding between mirrored and striped Logzillas is the consequence of potential data loss. In a failure of both Logzillas and controller, data will not be corrupted, but the last 5-30 seconds' worth of transactions could be lost. For example, while it typically makes sense to mirror Logzillas for triple-parity RAID configurations, it may be that the data stored is less important and the implications of data loss do not justify the cost of another Logzilla device. Conversely, while a mirrored or single-parity RAID disk configuration provides only a single level of redundancy, the implications of data loss might be such that the redundancy of volatile system memory is insufficient. Just as it's important to choose the appropriate disk configuration for the right balance of performance, capacity, and reliability, it's at least as important to take care and gather data to make an informed decision about Logzilla configurations.
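To get a feel for what a 5-30 second window means for a given workload, the exposure can be estimated with simple arithmetic. The 5 and 30 second bounds come from the text above; the synchronous write rate is a made-up figure for illustration, and the function name is hypothetical.

```python
# Bounds on the loss window, in seconds, as stated in the text.
LOSS_WINDOW_SECONDS = (5, 30)


def transactions_at_risk(sync_writes_per_second):
    """Return (low, high) counts of transactions that could be lost if a
    Logzilla and the controller fail together."""
    low, high = LOSS_WINDOW_SECONDS
    return sync_writes_per_second * low, sync_writes_per_second * high


# Hypothetical workload: 2,000 synchronous writes per second.
low, high = transactions_at_risk(2000)
print(f"Between {low} and {high} transactions could be lost")
```

Whether that exposure is acceptable depends on the data, which is the judgment the paragraph above asks for.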

Comments:

Hello Adam!
If I understood correctly what you are saying, you are drawing a parallel between the probability of two disk failures (in the same mirror) and two failures in sequence: slog and controller (7410 head). Is that right? Well, if that is right, whoever wants a triple-mirror configuration should mirror the slog devices.
So, my question is: does the 7410 storage have an SLA like 99.995%? In that calculation, are you accounting for the probability of the first scenario (the two disk failures)? Because it seems to me that two disk failures are more likely to happen than a failure of a controller after the slog. The latter I think is so unlikely that it is better to have the performance rather than the redundancy.

Posted by Marcelo Leal on December 09, 2009 at 06:53 AM PST #

Thank you for answering in detail a question we get asked
quite frequently. As you said, logzillas are integral to data
integrity so defining a ZFS Intent Log (ZIL) best practice is
critical.

For the high-end segment, we take this one step further and
recommend that the ZIL not be hard disk based. This avoids not
only the well known reliability issues of spinning disk but the
complexity of the RAID based solutions required to compensate.

Although we do believe NAND (flash) based SSDs offer very
tangible benefits over disk, the ultimate matching of technology
to purpose leads to a mirrored NVRAM based solution. By
nature of the self-contained PCIe card design, each offers a
completely independent controller in addition to storage.

Christopher George
Founder/CTO
DDRdrive LLC
www.ddrdrive.com

Posted by Christopher George on December 13, 2009 at 02:33 AM PST #

@Marcelo A weakness of this post was that it was too qualitative, but the quantitative data was not immediately available to me and I wanted to make some information available. Once the data has been collected and analyzed, I'll post another update that examines those issues in more depth.

@Chris Agreed! The requirements for a Logzilla -- small capacity, low latency -- match very well with NV-DRAM. Indeed, the SSD that we use in the Sun Storage 7000 series combines a fair bit of NV-DRAM with flash in order to achieve much lower latency than other SSDs. PCIe doesn't work for the 7310 and 7410 because for clustering we require Logzillas to be dual-attached so that both heads can see them. While it doesn't have the performance of PCIe, this is why we've chosen SAS as the interconnect for Logzillas today.

Posted by Adam Leventhal on December 13, 2009 at 02:04 PM PST #

Excellent point. A PCIe based SSD (DDRdrive X1) is not a solution for
dual-attached 7310/7410 clustering. With the inherent benefits of a
native PCIe implementation and recent standout products such as the
Sun F20 PCIe card, I wonder if such a limitation could be addressed
in the future?

Christopher George
Founder/CTO
DDRdrive LLC
www.ddrdrive.com

Posted by Christopher George on December 13, 2009 at 05:50 PM PST #

About

Adam Leventhal, Fishworks engineer
