Saturday Dec 19, 2009

Quorum server patches and Sun Cluster 3.2

This is an early notification because problems have been reported with the following Sun Cluster 3.2 quorum server patches:
127404-03 Sun Cluster 3.2: Quorum Server Patch for Solaris 9
127405-04 Sun Cluster 3.2: Quorum Server patch for Solaris 10
127406-04 Sun Cluster 3.2: Quorum Server patch for Solaris 10_x86
All of these patches are part of the Sun Cluster 3.2 11/09 Update3 release, and they are also available on My Oracle Support.

These patches deliver new features which require attention when upgrading or patching. Installing any of the mentioned patches on a Sun Cluster 3.2 quorum server can lead to a panic of all Sun Cluster 3.2 nodes which use this quorum server. The panic on the Sun Cluster 3.2 nodes looks as follows:
...
Dec 4 16:43:57 node1 ^Mpanic[cpu18]/thread=300041f0700:
Dec 4 16:43:57 node1 unix: [ID 265925 kern.notice] CMM: Cluster lost operational quorum; aborting.
...

Update 11.Jan.2010:
More details are available in Alert 1021769.1: Sun Cluster 3.2 Quorum Server Patches Cause All Cluster Nodes to Panic with "Cluster lost operational quorum"

General workaround before patching:
If the cluster uses the quorum server as its only quorum device, temporarily add a second quorum device.
For example:
1) Configure a second temporary quorum device on each cluster node which uses the quorum server (use a disk if available).
# clq add d13 (or use clsetup)
2) Un-configure the quorum server from each cluster node that uses the quorum server.
# clq remove QuorumServer1 (or use clsetup)
3) Verify that the quorum server no longer serves any cluster.
on the Sun Cluster nodes # clq status
on the Sun Cluster quorum server # clqs show +
4) Install the Sun Cluster 3.2 Quorum Server patch
# patchadd 12740x-0x
5) Reboot and start the quorum server if not already started
# init 6
6) From a cluster node, configure the patched quorum server again as a quorum device.
# clq add -t quorum_server -p qshost=129.152.200.5 -p port=9000 QuorumServer1
7) Un-configure the temporary quorum device
# clq remove d13
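The cluster-node side of the steps above can be sketched as a small dry-run script. This is only an illustration, not a tested procedure: the variable names, the DRY_RUN switch and the run() helper are assumptions made for this example, and the device names and address are the example values from the steps. Patching and rebooting the quorum server itself (steps 4 and 5) still happens on the quorum server host between the two phases.

```shell
# Dry-run sketch of the node-side workaround steps (run on ONE cluster node).
# All names below are illustrative; adjust them to your configuration.
TMP_QD=${TMP_QD:-d13}                 # temporary quorum device (example value)
QS_NAME=${QS_NAME:-QuorumServer1}     # quorum server device name (example value)
QS_HOST=${QS_HOST:-129.152.200.5}     # quorum server address (example value)
QS_PORT=${QS_PORT:-9000}              # quorum server port (example value)
DRY_RUN=${DRY_RUN:-1}                 # 1 = only print commands, 0 = execute them

# Print the command; execute it only when DRY_RUN=0 and stop on the first failure.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

# Steps 1 to 3: add the temporary device, remove the quorum server, verify.
before_patch() {
    run clq add "$TMP_QD"
    run clq remove "$QS_NAME"
    run clq status
}

# Steps 6 and 7: re-add the patched quorum server, remove the temporary device.
after_patch() {
    run clq add -t quorum_server -p qshost="$QS_HOST" -p port="$QS_PORT" "$QS_NAME"
    run clq remove "$TMP_QD"
}

case "${1:-}" in
    before) before_patch ;;
    after)  after_patch ;;
esac
```

Called with "before" or "after" as its argument, the script prints the commands of that phase; with DRY_RUN=0 it would also execute them. Stopping on the first failure matters here, because removing the quorum server without a working second quorum device is exactly the situation to avoid.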


Workaround if quorum server is already patched:
If the patch is already installed on the Sun Cluster 3.2 quorum server (and the server has been rebooted) but the quorum server is not working correctly, the following messages appear on the Sun Cluster nodes:
...
Dec 3 17:24:24 node1 cl_runtime: [ID 868277 kern.warning] WARNING: CMM: Erstwhile online quorum device QuorumServer1 (qid 1) is inaccessible now.
Dec 3 17:29:20 node1 cl_runtime: [ID 237999 kern.notice] NOTICE: CMM: Erstwhile inaccessible quorum device QuorumServer1 (qid 1) is online now.
Dec 3 17:29:24 node1 cl_runtime: [ID 868277 kern.warning] WARNING: CMM: Erstwhile online quorum device QuorumServer1 (qid 1) is inaccessible now.
Dec 3 17:32:58 node1 cl_runtime: [ID 237999 kern.notice] NOTICE: CMM: Erstwhile inaccessible quorum device QuorumServer1 (qid 1) is online now.
...
DO NOT TRY to remove the quorum server from the Sun Cluster 3.2 nodes right away because this can end up in a panic loop.
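To check whether a node shows this pattern, the messages can be counted in the system log. The following is a minimal sketch: the helper name count_msgs and the LOG and QS_NAME variables are made up for this example, and it assumes the messages are logged to /var/adm/messages in the format shown above.

```shell
# Count CMM state-change messages for one quorum device in a log file.
LOG=${LOG:-/var/adm/messages}          # assumed log location
QS_NAME=${QS_NAME:-QuorumServer1}      # example quorum device name

# count_msgs LOGFILE DEVICE STATE   (STATE is "inaccessible" or "online")
count_msgs() {
    grep "quorum device $2 (qid" "$1" | grep -c "is $3 now"
}

if [ -r "$LOG" ]; then
    echo "$QS_NAME went inaccessible: `count_msgs "$LOG" "$QS_NAME" inaccessible` times"
    echo "$QS_NAME came back online:  `count_msgs "$LOG" "$QS_NAME" online` times"
fi
```

Repeated alternation between the two messages within a few minutes, as in the excerpt above, points to the problem described here rather than to a single network interruption.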

Do:
1) Clear the configuration on the quorum server
# clqs clear <clustername> <quorumname>
or # clqs clear +

2) Re-add the quorum server to the Sun Cluster nodes
# cluster set -p installmode=enabled
# clq remove QuorumServer1
# clq add -t quorum_server -p qshost=129.152.200.5 -p port=9000 QuorumServer1
# cluster set -p installmode=disabled


This problem is reported in Bug 6907934.

As stated in my last blog post, the following note from the Special Install Instructions of the Sun Cluster 3.2 core patch -38 and higher is very important:
NOTE 17: Quorum server patch 127406-04 (or greater) needs to be installed on quorum server host first, before installing 126107-37 (or greater) Core Patch on cluster nodes.
This means that if you use a Sun Cluster 3.2 quorum server, you must upgrade the quorum server before upgrading the Sun Cluster 3.2 nodes which use it to Sun Cluster 3.2 11/09 update3. The same applies to patching. If you install the Sun Cluster core patch -38 or higher (-38 is part of Sun Cluster 3.2 11/09 update3)
126106-38 Sun Cluster 3.2: CORE patch for Solaris 10
126107-38 Sun Cluster 3.2: CORE patch for Solaris 10_x86
126105-38 Sun Cluster 3.2: CORE patch for Solaris 9
then the same rule applies: first update the quorum server, then the Sun Cluster nodes. Refer to the details above for how to do it.
For upgrades, also refer to the document: How to upgrade quorum server software
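Before installing the core patch on the nodes, it can help to confirm the patch revision on the quorum server host. The sketch below is one possible way to do that. The helper names installed_rev and rev_at_least are invented for this example, and it assumes that the installed patches are listed by showrev -p with lines of the form "Patch: 127405-04 ...". It needs a POSIX shell and a POSIX awk (on Solaris for example nawk or /usr/xpg4/bin/awk).

```shell
# installed_rev BASE
# Reads a patch listing on stdin (assumed format: "Patch: <base>-<rev> ...")
# and prints the highest installed revision of patch BASE, or nothing if absent.
installed_rev() {
    awk -v base="$1" '
        $1 == "Patch:" {
            split($2, id, "-")
            if (id[1] == base && (!found || id[2] + 0 > max)) {
                max = id[2] + 0
                found = 1
            }
        }
        END { if (found) print max }
    '
}

# rev_at_least REV MIN
# Succeeds only if REV is non-empty and numerically >= MIN.
rev_at_least() {
    [ -n "$1" ] && [ "$1" -ge "$2" ]
}

# Example use on a Solaris 10 SPARC quorum server (not executed here):
#   rev=`showrev -p | installed_rev 127405`
#   if rev_at_least "$rev" 4; then
#       echo "quorum server patch is at -04 or higher"
#   else
#       echo "patch the quorum server first"
#   fi
```

The patch base number differs per platform (127404 for Solaris 9, 127405 for Solaris 10, 127406 for Solaris 10_x86), so the example value has to be adapted to the quorum server host.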

Keep in mind: Fresh installations with Sun Cluster 3.2 11/09 update3 on the Sun Cluster nodes and on the quorum server are NOT affected!
