Monday Jun 30, 2008

prevent reservation conflict panic if using active/passive storage controller

Reservation conflicts can happen in a Sun Cluster environment when using active/passive storage controllers, e.g. SE6540, SE6140, FLX380.

First of all, you should always consider disabling the auto-failback flag if you use MPxIO on shared devices. This can also prevent reservation conflict panics.

Change the auto-failback value in /kernel/drv/scsi_vhci.conf to "disable".
Example from /kernel/drv/scsi_vhci.conf:
...
# Automatic failback configuration
# possible values are auto-failback="enable" or auto-failback="disable"
auto-failback="disable";
...
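
To confirm which value is actually active, the file can be checked from the shell. The following is only a minimal sketch: the function name get_auto_failback is made up for illustration, and it assumes a POSIX-conforming grep and sed and the one-line setting format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris): print the active, uncommented
# auto-failback value from a scsi_vhci.conf-style file.
# Usage: get_auto_failback [file]
get_auto_failback() {
    conf="${1:-/kernel/drv/scsi_vhci.conf}"
    # Drop comment lines, then extract the quoted value of auto-failback.
    # If the setting appears more than once, report the last occurrence.
    grep -v '^[[:space:]]*#' "$conf" |
        sed -n 's/^[[:space:]]*auto-failback[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' |
        tail -1
}
```

Calling get_auto_failback without an argument reads /kernel/drv/scsi_vhci.conf; on a node configured as described here it should print disable.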


Furthermore, the reservation conflict panic was seen when one cluster node was down and the shared storage array performed several (at least 2 or 3) failovers between the active/passive controllers. The exact behavior depends on the design of the storage array controller.

Two workarounds are available at the moment:

1.) With Sun Cluster 3.2, force the cluster to use scsi3 reservations even in 2-node cluster configurations. If you have 3 or more nodes, the cluster should use scsi3 reservations anyway.

Be aware of Alert 1019005.1. For SE6540/SE6140/FLX380, use firmware 6.60.11.xx (which is part of CAM 6.1) or higher. To avoid trouble, update the firmware before enabling scsi3 reservations.

To force Sun Cluster 3.2 to use scsi3 reservations, run the command:
# cluster set -p global_fencing=prefer3

Verify the setting using:
# cluster show | grep -i scsi
   Type:                       scsi
   Access Mode:        scsi3
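
The same check can be scripted. This is a rough sketch, assuming the output looks like the sample above; the function name all_access_modes_scsi3 is hypothetical and not part of Sun Cluster.

```shell
#!/bin/sh
# Hypothetical helper: read `cluster show` output on stdin and succeed only
# if at least one "Access Mode:" line is present and all of them say scsi3.
all_access_modes_scsi3() {
    awk '
        /Access Mode:/ { seen = 1; if ($NF != "scsi3") bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}
```

Typical use would be: cluster show | all_access_modes_scsi3 && echo "scsi3 in use"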


2.) Allow Reservation on Unowned LUNs in SE6540/SE6140. Workaround #1 is preferred, but with Sun Cluster 3.1 you cannot force the scsi3 reservation mechanism for 2-node clusters, so scsi2 reservations have to be used.

The bit "Allow Reservation on Unowned LUNs" determines the controller response to Reservation/Release commands that are received for LUNs that are not owned by the controller. The value needs to be changed from 0x01 to 0x00. Beware: this setting will be lost after an NVSRAM update!

Using the CAM management software, do the following:
# cd /opt/SUNWsefms/bin/

For 6540/FLX380/FLX240/FLX280 run:
# ./service -d -c set -q nvsram region=0xf2 offset=0x19 value=0x00 host=0x02

For 6140 and 6130 run:
# ./service -d -c set -q nvsram region=0xf2 offset=0x19 value=0x00 host=0x00
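
Since the two commands differ only in the host= value, the choice can be captured in a small helper so that the wrong value is not typed by accident. This is only a sketch: the function names are invented for illustration, the model-to-value mapping is taken directly from the two commands above, and the command is printed for review rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: map an array model to the host= value used above.
nvsram_host_for_model() {
    case "$1" in
        6540|FLX380|FLX240|FLX280) echo 0x02 ;;
        6140|6130)                 echo 0x00 ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the service command for the given model.
print_nvsram_cmd() {
    host=$(nvsram_host_for_model "$1") || return 1
    echo "./service -d -c set -q nvsram region=0xf2 offset=0x19 value=0x00 host=$host"
}
```

For example, print_nvsram_cmd 6140 prints the second command shown above; an unknown model name yields an error instead of a guessed value.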

Reboot both controllers, one after the other, to make the change active:
# ./service -d -c reset -t a

Wait at least 5 minutes until controller A is up again, then reset controller B:
# ./service -d -c reset -t b


Why did this not happen before? With patch 125081-14 (sparc) or 125082-14 (x86), Sun delivered a new driver for MPxIO. These changes make it possible to trigger the problem.
