Tuesday Aug 19, 2008

SAS/SATA HBA 375-3487 for Sun Storage J4200/J4400

The Sun Storage J4x00 arrays were released a few days ago.

At the moment, the following SAS/SATA HBA is required for the J4200 and J4400:
Option: SG-XPCIE8SAS-E-Z
Part number: 375-3487
Codename: Pandora
- Details in the System Handbook
- Sun StorageTek PCI Express SAS 8-Channel Internal HBA Installation Guide

IMPORTANT: Revision -02 of the Pandora HBA is required.
Use the 375-3487-02 with the J4200/J4400; the 375-3487-01 can only be used with the ST2530.
See also the Sun Storage J4200/J4400 Array Release Notes.

How do you identify the revision?
1.) Hardware: Check the part number label on the HBA itself.
2.) Solaris: Run
# prtpicl -v > /tmp/prtpicl.out
# vi /tmp/prtpicl.out
Search for the node with
subsystem-id 0x3150
and verify that
device-id is 0x58 and
vendor-id is 0x1000.
The revision-id then identifies the part:
revision-id of 0x2 == 375-3487-01
revision-id of 0x8 == 375-3487-02



Sample output for a 375-3487-01 in an X4600 running Solaris 10 5/08 x86:
...
    pci1000,3150 (obp-device, a40000098c)
     :DeviceID 0
     :UnitAddress 4,4
     :pcie-capid-reg 0x1
     :pcie-capid-pointer 0x68
     :pci-msi-capid-pointer 0x98
     :pci-msix-capid-pointer 0xb0
     :device-id 0x58
     :vendor-id 0x1000
     :revision-id 0x2
     :class-code 0x10000
     :unit-address 0
     :subsystem-id 0x3150
     :subsystem-vendor-id 0x1000
     :interrupts 0x1
     :devsel-speed 0
     :power-consumption 01 00 00 00 01 00 00 00
      :model SCSI bus controller
      :compatible (a4000009adTBL)
      | pciex1000,58.1000.3150.2 |
      | pciex1000,58.1000.3150 |
      | pciex1000,58.2 |
      | pciex1000,58 |
      | pciexclass,010000 |
      | pciexclass,0100 |
      | pci1000,58.1000.3150.2 |
      | pci1000,58.1000.3150 |
      | pci1000,3150 |
      | pci1000,58.2 |
      | pci1000,58 |
      | pciclass,010000 |
      | pciclass,0100 |
...
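If you have to check many hosts, the search above can be scripted. The following is a minimal sketch, assuming the `prtpicl -v` layout shown in the sample (a node header such as `pci1000,3150 (obp-device, a40000098c)` followed by `:property value` lines). It matches subsystem-id 0x3150, device-id 0x58 and vendor-id 0x1000 and maps the revision-id to the part number. The function names `parse_nodes` and `identify_pandora` are hypothetical.

```python
import re

# revision-id -> part number, as listed in the post
REVISION_TO_PART = {0x2: "375-3487-01", 0x8: "375-3487-02"}

# Node header, e.g. "pci1000,3150 (obp-device, a40000098c)"
NODE_RE = re.compile(r"^\s*(\S+)\s+\((\S+),\s*([0-9a-fA-F]+)\)\s*$")
# Property line, e.g. ":revision-id 0x2"
PROP_RE = re.compile(r"^\s*:(\S+)\s*(.*?)\s*$")


def parse_nodes(text):
    """Split `prtpicl -v` output into (node_name, {property: value}) pairs."""
    nodes = []
    props = None
    for line in text.splitlines():
        node = NODE_RE.match(line)
        if node:
            props = {}
            nodes.append((node.group(1), props))
            continue
        prop = PROP_RE.match(line)
        if prop and props is not None:
            # keep the first value seen for a property within a node
            props.setdefault(prop.group(1), prop.group(2))
    return nodes


def identify_pandora(text):
    """Return one part number (or 'unknown revision ...') per matching HBA."""
    found = []
    for _name, props in parse_nodes(text):
        if (props.get("subsystem-id") == "0x3150"
                and props.get("device-id") == "0x58"
                and props.get("vendor-id") == "0x1000"):
            rev_text = props.get("revision-id")
            # int(..., 16) accepts the "0x" prefix used by prtpicl
            rev = int(rev_text, 16) if rev_text else None
            found.append(REVISION_TO_PART.get(rev, "unknown revision %s" % rev_text))
    return found
```

Pass the contents of /tmp/prtpicl.out to `identify_pandora`; an empty list means no Pandora HBA was found.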


Patch 125081-16 (SPARC) or 125082-16 (x86) is required. These patches are included in Solaris 10 5/08 (Update 5).
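On an older, patched Solaris 10 installation you can look for the patch in the output of `showrev -p`. This sketch assumes each installed patch appears on a line starting with `Patch: <id>-<rev>` and that a higher revision of the same patch also satisfies the requirement; both are assumptions, so check the patch README if in doubt. `REQUIRED`, `installed_revision` and `patch_ok` are hypothetical names.

```python
import re

# Assumed `showrev -p` line format: "Patch: 125081-16 Obsoletes: ..."
PATCH_RE = re.compile(r"^Patch:\s*(\d{6})-(\d{2})\b")

# Minimum revisions from the post: 125081-16 (SPARC), 125082-16 (x86)
REQUIRED = {"sparc": ("125081", 16), "x86": ("125082", 16)}


def installed_revision(showrev_text, patch_id):
    """Return the highest installed revision of patch_id, or None."""
    best = None
    for line in showrev_text.splitlines():
        m = PATCH_RE.match(line.strip())
        if m and m.group(1) == patch_id:
            rev = int(m.group(2))
            best = rev if best is None else max(best, rev)
    return best


def patch_ok(showrev_text, arch):
    """True if the required patch revision (or newer) is listed."""
    patch_id, min_rev = REQUIRED[arch]
    rev = installed_revision(showrev_text, patch_id)
    return rev is not None and rev >= min_rev
```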

Monday Jun 30, 2008

prevent reservation conflict panic when using active/passive storage controllers

Reservation conflicts can occur in a Sun Cluster environment that uses active/passive storage controllers, e.g. SE6540, SE6140, FLX380.

First of all, you should always consider disabling the auto-failback flag when using MPxIO on shared devices. This can also help prevent reservation conflict panics.

Set the auto-failback value in /kernel/drv/scsi_vhci.conf to "disable".
Example from /kernel/drv/scsi_vhci.conf:
...
# Automatic failback configuration
# possible values are auto-failback="enable" or auto-failback="disable"
auto-failback="disable";
...
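To confirm the setting on each node, a small sketch can read the file and report the active value while skipping comments. This matters because the comment in the excerpt above itself contains `auto-failback="enable"`. The sketch assumes that `#` starts a comment and that, if the property is set more than once, the last assignment counts; both are assumptions for illustration, and `auto_failback_setting` is a hypothetical name.

```python
import re

SETTING_RE = re.compile(r'auto-failback\s*=\s*"([^"]*)"')


def auto_failback_setting(conf_text):
    """Return the last auto-failback value set outside comments, or None."""
    value = None
    for line in conf_text.splitlines():
        # Drop everything after '#' (assumed comment marker) before matching
        code = line.split("#", 1)[0]
        for m in SETTING_RE.finditer(code):
            value = m.group(1)
    return value
```

For example, `auto_failback_setting(open("/kernel/drv/scsi_vhci.conf").read())` should return `"disable"` once the file has been changed as shown.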


Furthermore, the reservation conflict panic has been seen when one cluster node is down and the shared storage array performs several (at least 2 or 3) failovers between the active/passive controllers. The exact behavior depends on the design of the storage array controller.

Two workarounds are available at the moment:

1.) With Sun Cluster 3.2, force the cluster to use SCSI3 reservations even in two-node configurations. In clusters with three or more nodes, the cluster should use SCSI3 reservations anyway.

Be aware of Alert 1019005.1: for the SE6540/SE6140/FLX380, use firmware 6.60.11.xx (part of CAM 6.1) or higher. To avoid trouble, update the firmware before enabling SCSI3 reservations.

To force Sun Cluster 3.2 to use SCSI3 reservations, run:
# cluster set -p global_fencing=prefer3

Verify the setting with:
# cluster show | grep -i scsi
   Type:                       scsi
   Access Mode:        scsi3
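If you prefer a scripted check to eyeballing the grep output, this sketch collects every `Access Mode:` value from `cluster show` output and reports whether all of them are `scsi3`. The line format is taken from the sample above; other sections of `cluster show` may look different, and `access_modes` and `all_scsi3` are hypothetical names.

```python
def access_modes(cluster_show_text):
    """Collect the value of every 'Access Mode:' line."""
    modes = []
    for line in cluster_show_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "Access Mode":
            modes.append(value.strip())
    return modes


def all_scsi3(cluster_show_text):
    """True only if at least one Access Mode line exists and all say scsi3."""
    modes = access_modes(cluster_show_text)
    return bool(modes) and all(m == "scsi3" for m in modes)
```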


2.) Change the "Allow Reservation on Unowned LUNs" setting in the SE6540/SE6140. Workaround #1 is preferred, but Sun Cluster 3.1 cannot force the SCSI3 reservation mechanism on two-node clusters, so SCSI2 reservations must be used there.

The bit "Allow Reservation on Unowned LUNs" determines the controller response to Reservation/Release commands that are received for LUNs that are not owned by the controller. The value needs to be changed from 0x01 to 0x00. Beware: this setting is lost after an NVSRAM update!

Using the CAM management software, do the following:
# cd /opt/SUNWsefms/bin/

For 6540/FLX380/FLX240/FLX280 run:
# ./service -d -c set -q nvsram region=0xf2 offset=0x19 value=0x00 host=0x02

For 6140 and 6130 run:
# ./service -d -c set -q nvsram region=0xf2 offset=0x19 value=0x00 host=0x00

Reboot both controllers to activate the change:
# ./service -d -c reset -t a

Wait at least 5 minutes until controller A is up again, then reset controller B:
# ./service -d -c reset -t b
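The model-dependent `host=` value and the order of the resets are easy to get wrong, so here is a dry-run sketch that only builds and prints the steps; nothing is executed. The commands are copied verbatim from the steps above (including `-d` exactly as shown), so check the `service` usage on your CAM host before running them. `HOST_BY_MODEL`, `nvsram_plan` and `render` are hypothetical names.

```python
# host= value per array model, as listed in the post
HOST_BY_MODEL = {
    "6540": "0x02", "FLX380": "0x02", "FLX240": "0x02", "FLX280": "0x02",
    "6140": "0x00", "6130": "0x00",
}
WAIT_AFTER_A_SECONDS = 5 * 60  # "wait at least 5 minutes" between resets


def nvsram_plan(model):
    """Ordered steps for clearing 'Allow Reservation on Unowned LUNs'.

    Each step is ("run", argv) or ("wait", seconds). The commands are meant
    to be run from /opt/SUNWsefms/bin.
    """
    try:
        host = HOST_BY_MODEL[model.upper()]
    except KeyError:
        raise ValueError("no host value known for model %r" % model)
    return [
        ("run", ["./service", "-d", "-c", "set", "-q", "nvsram",
                 "region=0xf2", "offset=0x19", "value=0x00", "host=" + host]),
        ("run", ["./service", "-d", "-c", "reset", "-t", "a"]),
        ("wait", WAIT_AFTER_A_SECONDS),
        ("run", ["./service", "-d", "-c", "reset", "-t", "b"]),
    ]


def render(plan):
    """Turn a plan into printable lines (dry run only)."""
    lines = []
    for kind, arg in plan:
        if kind == "run":
            lines.append(" ".join(arg))
        else:
            lines.append("(wait at least %d seconds)" % arg)
    return lines
```

For example, `for line in render(nvsram_plan("6140")): print(line)` prints the set command with `host=0x00`, the reset of controller A, the wait, and the reset of controller B.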


Why did this not happen before? Patch 125081-14 (SPARC) or 125082-14 (x86) delivers a new MPxIO driver, and these changes can trigger the problem.

About

I'm still mostly blogging about Solaris Cluster and support, whether for Sun Microsystems or Oracle. :-)
