Quorum Device - Why Configure One?

Introduction

The quorum component of Solaris Cluster (SC) is used to guarantee that a cluster does not suffer from partitions, namely split brain and amnesia. Both types of partition can lead to data corruption in a cluster, and the quorum component prevents this from happening in an SC configuration. Split brain is a partition in space, where subclusters are up but unable to talk to each other. Amnesia is a partition in time, where a given cluster incarnation is not aware of the previous incarnation.

SC's quorum subsystem uses a voting mechanism to prevent partitions. Each node is assigned a vote, and a cluster needs to hold a majority of votes in order to stay up. For a greater-than-two-node cluster, it is straightforward to deduce why, with such a voting scheme, one cannot encounter either split brain or amnesia in the cluster. See the Solaris Cluster Concepts Guide for a detailed discussion of this concept.

For a two-node cluster, the voting mechanism requires an external tie-breaker, which is provided by a quorum device (QD). This is not required for greater-than-two-node clusters. However, configuring a QD results in greater availability for the cluster in the event of multiple node failures. In fact, if you configure a fully connected (i.e., connected to all nodes) QD in an N-node cluster, the cluster can survive the failure of (N-1) nodes. The lone surviving node will stay up and running, and assuming that capacity planning ensured it can handle the entire load of the system, this node will be able to service all client requests.
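
To make the vote arithmetic in the examples below concrete, here is a minimal, hypothetical model of the quorum math in Python. The function names and data structures are purely illustrative and are not part of any Solaris Cluster API; this is a sketch of the counting rules described above, nothing more.

    def majority(total_votes):
        """Votes needed for quorum: more than half of the total configured votes."""
        return total_votes // 2 + 1

    def has_quorum(node_votes, qd_connections, surviving_nodes):
        """Check whether a surviving partition can keep the cluster up.

        node_votes:      dict mapping node name -> vote count (normally 1 per node).
        qd_connections:  list of sets; each set names the nodes a QD is connected to.
                         A QD contributes (number of connected nodes - 1) votes.
        surviving_nodes: set of nodes that are still up; a QD is assumed acquirable
                         if at least one surviving node is connected to it.
        """
        total = sum(node_votes.values()) + sum(len(c) - 1 for c in qd_connections)
        # Votes the survivors can claim: their own, plus every QD they can acquire.
        claimed = sum(node_votes[n] for n in surviving_nodes)
        claimed += sum(len(c) - 1 for c in qd_connections if c & surviving_nodes)
        return claimed >= majority(total)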

Examples

This is best illustrated through simple examples. Note that SC guarantees protection from single points of failure, so in all cases, the cluster will stay up if one of the nodes dies.

Example 1: 3 Node Cluster, No Quorum Device

Consider a 3 node cluster, with nodes A, B and C. Each has 1 vote.
Total votecount = 3
Majority votecount = 2
That is, 2 of the nodes need to be up for the cluster to stay up. So the cluster can survive single-node failures only; it cannot survive the failure of more than one node.
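
Running the hypothetical helper from the sketch above through this example:

    nodes = {"A": 1, "B": 1, "C": 1}
    print(has_quorum(nodes, [], {"A", "B"}))  # True  -- one node down, 2 of 3 votes
    print(has_quorum(nodes, [], {"A"}))       # False -- two nodes down, 1 of 3 votes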

Example 2: 3 Node Cluster, 1 Quorum Device

Consider a 3 node cluster, with nodes A, B and C, and a quorum device QD connected to all three nodes. Each node has a votecount of 1. The QD is assigned a votecount of (# connected votes - 1) = 2.
Total votecount = 3 + 2 = 5
Majority votecount = 3
Here, if two of the nodes die, the cluster has the survivor node's vote. But it also has the QD's 2 votes, bringing the cluster votecount to 3. This is sufficient to keep the cluster up!
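
The same check with the hypothetical helper, this time with a fully connected QD:

    nodes = {"A": 1, "B": 1, "C": 1}
    fully_connected_qd = [{"A", "B", "C"}]               # contributes 3 - 1 = 2 votes
    print(has_quorum(nodes, fully_connected_qd, {"A"}))  # True -- 1 + 2 = 3 of 5 votes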

Example 3: 3 Node Cluster, 2 Quorum Devices with Restricted Connections

Consider a 3 node cluster, with nodes A, B and C, with a QD configured between nodes A and B, and another QD configured between nodes A and C. Each QD is assigned (# connected votes - 1) = 1 vote.
Total votecount = 3 + 1 + 1 = 5
Majority votecount = 3.
Here, if nodes B and C were to die, node A can continue as a cluster, since it can count its own vote as well as those of the two QDs connected to it, totaling 3 votes, the minimum required. However, if A dies along with B (or C), the sole survivor C (or B) can count only its own vote and that of the QD connected to it, totaling 2 votes, which is lower than the minimum required (3). Hence, the cluster goes down.
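
With the hypothetical helper, the restricted connectivity shows up in the per-QD connection sets:

    nodes = {"A": 1, "B": 1, "C": 1}
    qds = [{"A", "B"}, {"A", "C"}]        # each contributes 2 - 1 = 1 vote
    print(has_quorum(nodes, qds, {"A"}))  # True  -- A counts 1 + 1 + 1 = 3 of 5 votes
    print(has_quorum(nodes, qds, {"C"}))  # False -- C counts 1 + 1 = 2 of 5 votes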

From these examples, the cluster in Example 2 has the best availability characteristics (they also happen to be the maximum possible), the cluster in Example 1 has the worst of the three, and the cluster in Example 3 has better availability than the cluster in Example 1 because of the QDs configured in it.

Recommendation

Since a QD can also be used to store shared data (a dedicated LUN is not required for it), it is good practice to configure QDs in a cluster to improve the availability of the system.

Word of Caution

When configuring QDs in the system, be careful not to overconfigure them, as that would make the cluster vulnerable to QD failures and thus lower its availability characteristics. Again, this is best illustrated through a simple example.

Example 4: 3 Node Cluster, 2 Quorum Devices

Consider a 3 node cluster, with nodes A, B and C, and two fully connected QDs. Each QD is assigned (# connected votes - 1) = 2 votes.
Total votecount = 3 + 2 + 2 = 7
Majority votecount = 4
Here, if the two QDs failed and there was a cluster reconfiguration, the cluster would go down even if all the nodes were fine. This is because the nodes by themselves can contribute only 3 votes, which does not constitute a majority.
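
The hypothetical helper above assumes a QD is reachable whenever a connected node survives, so it cannot model QD failures directly; spelling the arithmetic out instead:

    total = 3 + 2 + 2                         # three nodes plus two fully connected QDs
    survivor_votes = 3                        # all three nodes up, both QDs failed
    print(survivor_votes >= majority(total))  # False -- 3 < 4, the cluster loses quorum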

Rule of Thumb

Therefore, when configuring QDs in the cluster, remember: total number of QD votes < total number of node votes.
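
The rule of thumb is easy to express in the same hypothetical model:

    def qd_votes_ok(node_votes, qd_connections):
        """Rule of thumb: total QD votes must stay below total node votes."""
        node_total = sum(node_votes.values())
        qd_total = sum(len(c) - 1 for c in qd_connections)
        return qd_total < node_total

    three_nodes = {"A": 1, "B": 1, "C": 1}
    print(qd_votes_ok(three_nodes, [{"A", "B", "C"}]))                   # True  (Example 2)
    print(qd_votes_ok(three_nodes, [{"A", "B", "C"}, {"A", "B", "C"}]))  # False (Example 4)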

A Nifty Tool

Richard Elling at Sun has written a nice spreadsheet that lets you specify nodes and QD connections in your cluster, and get the resulting cluster failure modes of the system in the event of node failures. That tool can be found here.


Ira Pramanick
Richard Elling

Comments:

As you write, "a QD can also be used to share data", but is it preferable to use a data device for quorum or is it better to use a separate device if you've got the extra lun?
In a dual site setup (let's assume 2 node cluster, storage at both sites and using software mirroring), would that change the recommendation?

Posted by Mads on October 17, 2007 at 06:41 PM PDT #

- It does not really matter whether data is stored on the QD or not. The main point here is that a separate LUN is not required for a QD.
- Recommendations with regard to putting data on the QD in the case of campus clusters are the same as for standard clusters, whether you use hardware or software mirroring.
- However, when you use *data replication* in a campus cluster between the sites, say using Hitachi TrueCopy, then you cannot have the QD on a replicated LUN (which you would use for your data). So the QD goes on a separate unit.
- We recommend that in a 2-room campus cluster, the QD be placed in a 3rd room to prevent the cluster going down (due to potential loss of quorum) when an entire room is impacted (because the surviving room without the QD will not have sufficient votes to stay up by itself).

Hope this helps,
Ira

Posted by Ira Pramanick on October 19, 2007 at 12:21 AM PDT #

Hi Ira Pramanick,

We have a 2 node cluster with a QD. However, whenever the 2 nodes are rebooted, we notice that the node that boots up earlier waits for the other node before starting up the cluster services.

Is that the correct behavior? I thought that once a node has acquired the QD, it should boot up properly into cluster mode without needing to wait for the other node. Is that correct?

Thanks in advance.

Regards
Paul

Posted by Paul Liong on January 24, 2008 at 05:12 PM PST #

Hi Paul,

I need to understand exactly what you mean by the node waiting for the other node to start up the cluster services. Do you mean that the first node to come up waits for the cluster to come up before it can acquire quorum, or that it waits for some data service to start on the second node? I am assuming that you are referring to the first case, from what you wrote in the 2nd para.

In a 2 node cluster with 1 QD, as long as the node that boots up first (call it node1, and the other node node2) can acquire the QD, it has enough votes to boot up into cluster mode. So under *normal* circumstances, you are correct that node1 should be able to form a cluster by itself, since it would have acquired the QD and can count its votes.

There are a couple of scenarios when node1 will not be able to acquire the QD and will need to wait for node2 to come up.

The first is amnesia. If the last incarnation of the cluster had node2 as its sole member (i.e., node1 was rebooted while node2 was up and had established itself as the only cluster member), then only node2's key will be on the QD and node1 will not be able to acquire it. This is by design, to prevent amnesia (the condition where node2 might have changed some cluster configuration that node1 would not know about). I am not sure you are hitting this case, because you have not indicated that your node1 is the one that was brought down first. Is it? In any case, do you see any messages on the console, or can you look in /var/adm/messages for any messages from the cmm/quorum module?

The second scenario is that the QD might have gone bad, and neither node (or at least not node1) is able to acquire the QD. In this case, both nodes need to be up for the votes to reach the required number and for the cluster to form. Again, in this case, you should have seen error messages on the console (or please check /var/adm/messages).

If you can send me excerpts of the console msgs or /var/adm/messages, I can try to isolate the problem for you.

Hope this helps,
Ira

Posted by Ira Pramanick on January 25, 2008 at 09:19 AM PST #
