X

News, tips, partners, and perspectives for the Oracle Solaris operating system

Understanding RAID-6 With Junior High Math

Hisao Tsujimura
Principal Software Engineer

In my previous post on RAID-5, I talked about how we can recover the data using simple additions and subtractions.
We will discuss RAID-6 in this post.  Unfortunately, you need a bit more than additions and subtractions to understand how we recover data in RAID-6.  Toward the end of this post, I will touch briefly upon the "finite field," but I will refrain from being too mathematical.

Index:

  1. Recap
  2. Why Do We Need RAID-6
  3. Is Single Parity Useless?
  4. Assign Numbers For Each Drive
  5. Recovery Scenarios Using RAID-5 Technique
  6. Recovery from a Failed P and a Data Drive
  7. Recovery from 2 Failed Data Drives
  8. No, We Don't Want Floating Points
  9. Galois Field (Finite Field) Without Knowing You Are Using
  10. One With "Triple Parity"
  11. References 

Recap

I mentioned in my last post that we have no way to tell if someone swapped the drives in RAID-5 array (mathematically).
When the equation (a) is true, the equation (b) will also be true.  Therefore, we are unable to find out the swapped content in the drive (x and y) from this equation.

1 + x + y = 6 ... (a)
1 + y + x = 6 ... (b) 

To find the values of two unknown variables, we need at least two equations that describe the relationship between the variables.  Yes, we need to use a "system of equation."  Now the question is, how does another equation look?

If (a) is true, can another equation be something like below?

x + y = 5 ... (c)

If you think about it, it won't help us identify the "location, " so we need some twist.

Why Do We Need RAID-6?

In the previous post, in RAID-5, we learned that we calculated the sum of data by using exclusive OR (XOR) and use it as a parity.  When a drive breaks, we can use the sum of the data from surviving drives and subtract from the parity to calculate missing data.  To finish recovery, we need to read all the data and parity we need, recalculate, and write to the new drive.  Because of this, the reconstruction of data takes time.

Any drives, regardless of HDDs or SSDs, have their lifetime.  It will eventually break.  We lose data if we lose another drive while recovering the data in RAID-5.  Therefore, we need protection that is tolerant of a "two-drive failure, " such as RAID-6 or RAID-6 equivalent mechanism.

 

Single Parity is Useless?

Having two drive failures and trying to recover data is like solving the equation below.

1 + x + y = 6 … (1) (Equation  (7) from the previous post)

This equation has two problems.

  • Swapping x and y still gives us a total of 6.
  • We can't solve x and y by using only this formula.

Having only parity P won't help us in this case.  We need another formula that allows us to solve x and y, while we can also identify the location of the data.
 

Assign Numbers For Each Drive

A byte of data in each drive is a number.  So what if we simplify, for now, assume that we only can store an integer between 0 to 9 in each device.  Also, suppose that we have 5 data drives and number them d0, d1, d2, d3, and d4.  We also have P and Q drives for parities.  Let's say, for the sake of explanation, P and Q can have any positive integer.  Because it is easier to calculate and explain later, I will place the drives in the following order: d4, d3, d2, d1, d0, P, and Q.  If you XOR from d4 through d0, P is equal to 1.


Figure 1: Normal Data with Calculated P

Now, if you keep looking at the five numbers from d4 through d0, do they start looking like a single number to you?  If so, then, how about we assign a "digit" for each drive?  If we calculate Q by adding each digit by using XOR, we end up calculating R.  So let's put weight for each digit, such as 1 for d0, 10 for d1, and so on, up to 10000 for d4.

The above example will produce the following Q.

5 x 10000 + 4 x 1000 + 3 x 100 + 2 x 10 + 1 = 54321 ... (2)

Now, let's replace 5, 4, 3, 2 and 1 with  d4 through d0.  This will make (2) into below.  Since 54321 is the value of Q:

Q = d4 x 10000 + d3 x 1000 + d2 x 100 + d1 x 10 + d0 ... (3)

If you have noticed, 10, 100, 1000, and 10000 are products of 10, and 10's power.  We don't write explicitly in our daily lives, but d0 is actually d0 x 1, and 1 is 100.  So if we write using 10's nth power, (3) will be:

Q = 104d4 + 103d3 + 102d2 + 101d1 + 1 x d0
    = 104d4 + 103d3 + 102d2 + 101d1 + 100d... (4)

We used "10" since we know the decimal number well.  But it does not have to be 10, but any generator g.  (4) can be rewritten in the general equation such as below.

Q = g4d4 + g3d3 + g2d2+ g1d1 + g0d0 ...(5)

It now looks a bit mathematical.  But remember that we are only doing addition and multiplication.  Let's use some "math" signs, so it looks even more "mathematical." :-)

    Q is the Sum of All Weighed Data in Drives

It looks hard on the surface, but it isn't. 
I changed the number of drives from "fixed" to m drives.  Since we numbered the drives from 0 (zero), the index would be between 0 and m-1 for the data drives.   Each data drive has data dk, and we multiply by gk.  Q is the sum of dk multiplied by gk.

If you prefer following the steps to calculate, it is below:

  1. Suppose we have m data drives.
  2. Assign a number from 0 for each drive.  So we end up assigning 0 to m-1 for each drive.
  3. Start with k=0 and Q=0.
  4. For each kth drive, calculate gk multiplied by the content of data dk in the kth drive.
  5. Add the answer to the previous Q.
  6. Add 1 to the value of k.
  7. Go to step 4, and repeat until k=m-1.

It may be tedious, but you can follow it.

You came all the way through adding, multiplying, and transforming equations from one form to another.    You calculated Q. (*1)(*2) 

Congratulations, Q is Reed-Solomon code! 

CDs, DVDs, and BlueRay discs use Reed-Solomon code for their error correction.

 

Well, let me ask you -- have you seen the equations that look like (4) or (5)?  That's right -- it is based on simple polynomials.

(*1) A little more precisely, they manipulate the number to be in a specific range for error correction.  They use a finite field or Galois field to accomplish this.   However, the basic idea is the same.
(*2) I walked through very detailed steps because I am not good at math. ;-)

Recovery Scenarios Using RAID-5 Technique

Let's deal with the simple scenarios first.

In RAID-6, we use different methods for different failure scenarios.  When I say "easy" in this post, it means that we need calculation as much as RAID-5 recovery.  In the following explanation, when I say "data drive," it means drives with actual data, not parities.

1 Data Drive Is Broken

We do the same thing as RAID-5 recovery.  We use information from surviving drives and use XOR to calculate missing drive data.

1 Data Drive and Q Is Broken

When a data drive and Q parity drive break, we recover the data drive first using RAID-5 recovery and then recalculate Q.

 

Recovery from a Failed P and a Data Drive

One of the RAID-6 unique recovery scenarios is when both the parity drive and data drive fail.  This means that you can't use P to recover the data.  In this case, you need to calculate a new Q or Qx from surviving data drives and subtract Qx from Q to calculate the data in the drive x (Dx).  Dx, however, will be weighed; therefore, we need to divide by the weight after we know Dx.

Data in drive x can be calculated from the Q and Q from the surviving drives

Data in drive x weighed, so it needs to be divided by g's x-th power.

In our example below, Q is 54321.  (Note that we can store any positive decimal number in P and Q for the sake of explanation.)

 

5 data drives with P and Q

Figure 2: 5 Data Drives with Calculated P and Q

Suppose if d3 is broken and we don't know the P.  Qx and weighed d3 are as follows.

Qx = 50000 + 0 + 300 + 20 + 1 = 50321
Q - Qx = 54321 (Q) - 50321 (Qx) = 4000 (d3 with weight)

Because we have weight of 10for d3, we need to divide the answer by 1000.  This will recover 4.  Then we can recalculate P to double-check.  It will be 1 after recovery of d3.

Recovery from 2 Failed Data Drives

Suppose there are two failed data drives.   Let's say, in order to represent the data in each failed drive, we call them Dx and Dy.  We can calculate P parity from the surviving drives.  Let's call this Pxy.  We can also calculate Q parity from the surviving drives.  Let's call this Qxy.

Now we have the following simultaneous equations.

Simultaneous Equation for 2 Data Drive Failure

By transforming the equation for P, we can calculate the sum of Dx and Dy.  (Equation (10))  This is just a sum, so with this equation, we would know neither values.  So, let's transform the equation for Q as well.  (Equation 11)

Transformed Simultaneous Equation for 2 Data Drive Failure

The equation 11 is telling us the difference between Q and Qxy is equal to the sum of the weighted data from Dx and Dy.

Let's recall Figure 2, and go through this with actual numbers.

5 data drives with P and Q

Figure 2: 5 Data Drives with Calculated P and Q

Suppose both d2 and d3 are broken.
Each Qxy and Q - Qxy are as follows.

Q = 54321
Qxy = 50000 + 0 + 0 + 20 + 1 = 50021
Q - Qxy = 430

Now, in order for us to come up with Dy, we can divide the number by weight for Dand get rid of the number below the floating point.

Dy = 4300 ÷ 1000 = 4.3 … (12) 

The number above the floating point is 4, so we have Dy recovered.
(By the way, Pxy is 6, and P xor Pxy is 7.  So, Dx xor Dy has to be 7.)


We can calculate Dx by subtracting Dy from the answer of Q-Qy.
Let's transform equation 11 so that it is easier to calculate.

Calculating gxds from q, qxy and gydy

The answer would be 300.  d2 had a weight of 100, so 300 divided by 100 makes 3.  Now we have the value in d2.
All looks good.

No, We Don't Want Floating Points

All looks good, doesn't it?
Unfortunately not.
There are two problems.

The first is the condition we set to explain the whole thing.   Each drive could have a decimal number between 0 and 9.  In the computer world, the information is handled by the units of bits and bytes.  A bit is a digit in the binary system and a byte is 8 bits.  This means that a byte is worth a number of between 0 and 255 in decimal.  It does not necessarily fit our favorite boundary of 10.  We need a different system without breaking the equations we discussed so far.

Another issue is "floating-point."  We don't have a floating-point in the binary system.  We simulate it in the code so that we can calculate numbers with floating points.  However, in the end, but we need to round the number to the boundary of 2's power.  So there can be divergence when we calculate the number that can't be divided cleanly in the binary system.  Getting the different numbers than ones we stored means wrong data unless we are aware.  This is critical for the storage systems because we take it granted that storage stores data correctly and replays correctly without any "rounded-up" or "rounded-down" numbers.

At the beginning of the discussion, we decided to use only integers but when we need to divide, we may get a non-integer.

Galois Field (Finite Field) Without Knowing You Are Using

To sum up, here are some conditions we need for calculating RAID-6 data.
We would like to be able to:

  • Use integers between 0 to 255 in decimal.
  • Add.
  • Subtract.
  • Multiply.
  • Divide.
  • Write polynomials without contradictions.
  • (Hopefully) use only integers.

In mathematics, there is something called "Number Theory."   It is the field of research about the set of integers and their characteristic. 

Within Number Theory, there is a concept called "field."  A "field" is a set of numbers that you can add, subtract, multiply and divide freely, other than divide by zero.  For instance, a set of real numbers is a field.   A set of integers and fractions forms a field.  When there is an infinite number of elements in the field, it is called an "infinite field."  You are using different infinite fields every day without knowing the name of them.

When the number of elements is limited, we call it a "finite" field.  We also call it Galois Field because it was found by Évariste Galois.  If we have a finite field that we can use integers between 0 and 255 in decimal, it would be convenient.

We use the binary system in computers.  Lucky for us, the smallest finite field is a binary digit, that they denote as F2.
(Furthermore, we can treat addition and subtraction in the same manner.  Also, multiplication and division can easily be done by bit-shifting.)

Thanks to the effort of countless researchers, we know that the below polynomial does not fail in the binary system.  (I am presenting the equation (5) again)

  Q = g4d4 + g3d3 + g2d2+ g1d1 + g0d0 ...(5)

In the real world, it looks that we use 28 for defining the field probably because the papers like A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems used it.  We use a finite field in Reed-Solomon Coding, so if you have ever used CDs, DVDs or BlueRay, you used Galois Field without knowing it.

One With "Triple Parity"

We recently hear triple parity.  (RAID-Z3 is triple parity equivalent.)  In case we have triple parity, we can recover from a "triple data drive" failure.  The idea is the same we discussed so far, but we have P, Q, and R parity to protect data in data drives.  Calculating R is not so difficult.  (I am presenting the equation (5) again)

  Q = g4d4 + g3d3 + g2d2+ g1d1 + g0d0 ...(5)

When g=4, it is the same equation calculates R.  I will not go through the steps to recovery from triple faults, but we know the equation is closed in the binary system.

Lastly, but not least, there are other means to calculate P/Q parities.  If you are interested, you should Google them with keywords like "Even-Odd" or "diagonal parity."

Thank you for reading my rather lengthy blog post.  If this post helps you in any way, it is my pleasure.  I appreciate corrections and comments.  The mistakes here are all mine.

<For Those Interested>
P, Q, and R represent the set whose generators are 1, 2 and 4 respectively.  When the generator is 1, it is equivalent to XOR, and 2 and 4 give us polynomials with linearly independent coefficients.  According to the llumos source code, there is no "g" other than 1, 2 and 4 with this property, therefore, a paper by James S. Plank, A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems, will fail.  Mr. Plank corrected his paper in 2003 with his co-author. 

If we can take only 1, 2, and 4 as a value of g, the same method can't be applied to calculate the 4th parity and beyond.  This also means that using this method will prevent us to make the protection mechanism when more than or equal to 4 drives fail at the same time.

References

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.