Kernel patch 141444-09 or 141445-09 with Sun Cluster 3.2

As stated in my last blog entry, the following kernel patches are included in Solaris 10 10/09 Update8:
141444-09 SunOS 5.10: kernel patch or
141445-09 SunOS 5.10_x86: kernel patch

Update 10.Dec.2009:
Support for Solaris 10 10/09 Update8 with Sun Cluster 3.2 1/09 Update2 has now been announced. The recommendation is to use Sun Cluster core patch 126106-39 (sparc) / 126107-39 (x86) with Solaris 10 10/09 Update8. Note: The -39 Sun Cluster core patch is a feature patch because the -38 Sun Cluster core patch is part of Sun Cluster 3.2 11/09 Update3, which has already been released.
For new installations/upgrades with Solaris 10 10/09 Update8 use:
* Sun Cluster 3.2 11/09 Update3 with Sun Cluster core patch -39 (fix problem 1)
* Use the patches 142900-02 / 142901-02 (fix problem 2)
* Add "set nautopush=64" to /etc/system (workaround for problem 3)

For patch updates to 141444-09/141445-09 use:
* Sun Cluster core patch -39 (fix problem 1)
* Also use patches 142900-02 / 142901-02 (fix problem 2)
* Add "set nautopush=64" to /etc/system (workaround for problem 3)
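
One way to check a node against the patch levels listed above is to compare the output of showrev -p with the required minimum revisions. The following is only a minimal sketch in Python. The function names and the sample text are made up for illustration, and it assumes that each line of showrev -p output starts with "Patch: <id>-<rev>". It does not know whether a required patch was obsoleted by a later one, so treat a reported gap as a hint to look closer.

```python
import re

# Minimum revisions recommended in this post for SPARC.
# For x86 the corresponding IDs are 126107 and 142901.
REQUIRED_SPARC = {"126106": 39, "142900": 2}

# Assumed line format of `showrev -p`: "Patch: 126106-39 Obsoletes: ..."
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b")


def installed_revisions(showrev_output):
    """Return {patch_id: highest installed revision} from showrev -p style text."""
    revisions = {}
    for line in showrev_output.splitlines():
        match = PATCH_RE.match(line.strip())
        if match:
            base, rev = match.group(1), int(match.group(2))
            revisions[base] = max(rev, revisions.get(base, 0))
    return revisions


def missing_patches(showrev_output, required):
    """List 'id-rev' strings for required patches that are absent or too old."""
    have = installed_revisions(showrev_output)
    return [
        f"{base}-{rev:02d}"
        for base, rev in sorted(required.items())
        if have.get(base, -1) < rev
    ]


# Hypothetical sample, not taken from a real system.
SAMPLE = """\
Patch: 141444-09 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu
Patch: 126106-35 Obsoletes: Requires: Incompatibles: Packages: SUNWscr
"""

print(missing_patches(SAMPLE, REQUIRED_SPARC))  # ['126106-39', '142900-02']
```

On a real node the text would come from running showrev -p and passing its output to missing_patches().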


It is time to point out that there are some issues with these kernel patches in combination with Sun Cluster 3.2:

1.) The patch breaks the zpool cachefile feature if using SUNW.HAStoragePlus

a.) If the kernel patch 141444-09 (sparc) / 141445-09 (x86) is installed on a Sun Cluster 3.2 system where the Sun Cluster core patch 126106-33 (sparc) / 126107-33 (x86) is already installed, then hastorageplus_prenet_start will fail with the following error message:
...
Oct 26 17:51:45 nodeA SC[,SUNW.HAStoragePlus:6,rg1,rs1,hastorageplus_prenet_start]: Started searching for devices in '/dev/dsk' to find the importable pools.
Oct 26 17:51:53 nodeA SC[,SUNW.HAStoragePlus:6,rg1,rs1,hastorageplus_prenet_start]: Completed searching the devices in '/dev/dsk' to find the importable pools.
Oct 26 17:51:54 nodeA zfs: [ID 427000 kern.warning] WARNING: pool 'zpool1' could not be loaded as it was last accessed by another system (host: nodeB hostid: 0x8516ced4). See: http://www.sun.com/msg/ZFS-8000-EY
...


b.) If the kernel patch 141444-09 (sparc) / 141445-09 (x86) is installed on a Sun Cluster 3.2 system where the Sun Cluster core patch 126106-35 (sparc) / 126107-35 (x86) is already installed, then hastorageplus_prenet_start will work, but the zpool cachefile feature of SUNW.HAStoragePlus is disabled. Without the zpool cachefile feature, zpool import takes longer because the import scans all available disks. The messages look like:
...
Oct 30 15:37:45 nodeA SC[,SUNW.HAStoragePlus:8,nfs-rg,zpool1-rs,hastorageplus_validate]: [ID 148650 daemon.notice] Started searching for devices in '/dev/dsk' to find the importable pools.
Oct 30 15:37:45 nodeA SC[,SUNW.HAStoragePlus:8,nfs-rg,zpool1-rs,hastorageplus_validate]: [ID 148650 daemon.notice] Started searching for devices in '/dev/dsk' to find the importable pools.
Oct 30 15:37:49 nodeA SC[,SUNW.HAStoragePlus:8,nfs-rg,zpool1-rs,hastorageplus_validate]: [ID 547433 daemon.notice] Completed searching the devices in '/dev/dsk' to find the importable pools.
Oct 30 15:37:49 nodeA SC[,SUNW.HAStoragePlus:8,nfs-rg,zpool1-rs,hastorageplus_validate]: [ID 547433 daemon.notice] Completed searching the devices in '/dev/dsk' to find the importable pools.
Oct 30 15:37:49 nodeA SC[,SUNW.HAStoragePlus:8,nfs-rg,zpool1-rs,hastorageplus_validate]: [ID 792255 daemon.warning] Failed to update the cachefile contents in /var/cluster/run/HAStoragePlus/zfs/zpool1.cachefile to CCR table zpool1.cachefile for pool zpool1 : file /var/cluster/run/HAStoragePlus/zfs/zpool1.cachefile open failed: No such file or directory.
Oct 30 15:37:49 nodeA SC[,SUNW.HAStoragePlus:8,nfs-rg,zpool1-rs,hastorageplus_validate]: [ID 792255 daemon.warning] Failed to update the cachefile contents in /var/cluster/run/HAStoragePlus/zfs/zpool1.cachefile to CCR table zpool1.cachefile for pool zpool1 : file /var/cluster/run/HAStoragePlus/zfs/zpool1.cachefile open failed: No such file or directory.
Oct 30 15:37:49 nodeA SC[,SUNW.HAStoragePlus:8,nfs-rg,zpool1-rs,hastorageplus_validate]: [ID 205754 daemon.info] All specified device services validated successfully.
...


If the ZFS cachefile feature is not required AND the above kernel patches are installed, problem a.) is resolved by installing Sun Cluster core patch 126106-35 (sparc) / 126107-35 (x86).
Solution for a) and b):
126106-39 Sun Cluster 3.2: CORE patch for Solaris 10
126107-39 Sun Cluster 3.2: CORE patch for Solaris 10_x86

Alert 1021629.1: A Solaris Kernel Change Stops Sun Cluster Using "zpool.cachefiles" to Import zpools Resulting in ZFS pool Import Performance Degradation or Failure to Import the zpools
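
To find out which of the two cases a node is showing, the system log can be searched for the message fragments quoted in a.) and b.). This is a rough sketch in Python, not an official tool. The function name and the labels are my own choice, and the search strings are simply copied from the log excerpts in this section.

```python
# Message fragments taken from the log excerpts for case a.) and case b.).
SYMPTOMS = {
    "a": "could not be loaded as it was last accessed by another system",
    "b": "Failed to update the cachefile contents",
}


def cachefile_symptoms(log_lines):
    """Return the sorted list of case labels ('a', 'b') found in the given lines."""
    found = set()
    for line in log_lines:
        for label, needle in SYMPTOMS.items():
            if needle in line:
                found.add(label)
    return sorted(found)


# Shortened sample lines based on the excerpts in this section.
SAMPLE_LINES = [
    "Oct 26 17:51:54 nodeA zfs: [ID 427000 kern.warning] WARNING: pool 'zpool1' "
    "could not be loaded as it was last accessed by another system",
    "Oct 30 15:37:49 nodeA SC[...]: [ID 205754 daemon.info] "
    "All specified device services validated successfully.",
]

print(cachefile_symptoms(SAMPLE_LINES))  # ['a']
```

On a cluster node the lines would typically be read from /var/adm/messages.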

2.) The patch breaks probe-based IPMP if more than one interface is in the same IPMP group

After installing the kernel patch mentioned above:
141444-09 SunOS 5.10: kernel patch or
141445-09 SunOS 5.10_x86: kernel patch
the probe-based IPMP feature is broken if the system uses more than one interface in the same IPMP group. This means that all Solaris 10 systems using more than one interface in the same probe-based IPMP group are affected!

After installing this kernel patch the following errors will be sent to the system console after a reboot:
...
nodeA console login: Oct 26 19:34:41 in.mpathd[210]: NIC failure detected on bge0 of group ipmp0
Oct 26 19:34:41 in.mpathd[210]: Successfully failed over from NIC bge0 to NIC e1000g0
...
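
The interface and group names can be pulled out of such in.mpathd messages with a small script. The sketch below is Python and only illustrative. The function name is hypothetical and the pattern is derived from the single message shown above. Keep in mind that a real NIC failure produces the same message, so a match alone does not prove that the kernel patch is the cause.

```python
import re

# Pattern derived from the console message quoted above.
MPATHD_RE = re.compile(
    r"in\.mpathd\[\d+\]: NIC failure detected on (\S+) of group (\S+)"
)


def ipmp_failures(log_lines):
    """Return (interface, group) pairs for every in.mpathd NIC failure message."""
    return [m.groups() for m in map(MPATHD_RE.search, log_lines) if m]


SAMPLE_LINES = [
    "nodeA console login: Oct 26 19:34:41 in.mpathd[210]: "
    "NIC failure detected on bge0 of group ipmp0",
    "Oct 26 19:34:41 in.mpathd[210]: "
    "Successfully failed over from NIC bge0 to NIC e1000g0",
]

print(ipmp_failures(SAMPLE_LINES))  # [('bge0', 'ipmp0')]
```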

Workarounds:
a) Use link-based IPMP instead of probe-based IPMP
b) Use only one interface in the same IPMP group if using probe-based IPMP
See the blog "Tips to configure IPMP with Sun Cluster 3.x" for more details if you would like to change the configuration.
c) Do not install the kernel patch listed above. Note: A fix is already in progress and can be requested via a service request. I will update this blog when the general fix is available.

Solution:
142900-02 SunOS 5.10: kernel patch
142901-02 SunOS 5.10_x86: kernel patch

Alert 1021262.1 : Solaris 10 Kernel Patches 141444-09 and 141445-09 May Cause Interface Failure in IP Multipathing (IPMP)
This is reported in Bug 6888928

3.) When applying the patch Sun Cluster can hang on reboot

After installing one of the following patches:
141444-09 SunOS 5.10: kernel patch or
141511-05 SunOS 5.10_x86: ehci, ohci, uhci patch
the Sun Cluster nodes can hang during boot because they have exhausted the default number of autopush structures. When the clhbsndr module is loaded, it causes many more autopushes to occur than would happen on a non-clustered system. By default, only nautopush=32 of these structures are allocated.

Workarounds:
a) Do not use the mentioned kernel patch with Sun Cluster
b) Boot in non-cluster mode and add the following line to /etc/system:
set nautopush=64
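
Before rebooting back into cluster mode it can be worth verifying that the entry is really active and not commented out. The following Python sketch shows one way to do that. The function name is made up, and it assumes the value is written as a plain decimal number as in the line above.

```python
import re

# Matches an active "set nautopush=<decimal>" line. Comment lines in
# /etc/system start with '*', so they never match the anchored pattern.
SET_RE = re.compile(r"^\s*set\s+nautopush\s*=\s*(\d+)\s*$")


def nautopush_values(etc_system_text):
    """Return every nautopush value set in /etc/system style text, in file order."""
    values = []
    for line in etc_system_text.splitlines():
        match = SET_RE.match(line)
        if match:
            values.append(int(match.group(1)))
    return values


# Hypothetical file content for illustration.
SAMPLE = """\
* workaround for the autopush hang during boot
set nautopush=64
"""

print(nautopush_values(SAMPLE))  # [64]
```

An empty list means the default of 32 is still in effect. More than one entry is a sign that the file should be cleaned up by hand.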

Solution:
126106-42 Sun Cluster 3.2: CORE patch for Solaris 10
126107-42 Sun Cluster 3.2: CORE patch for Solaris 10_x86
For Sun Cluster 3.1 the issue is fixed in:
120500-26 Sun Cluster 3.1: Core Patch for Solaris 10

Alert 1021684.1: Solaris autopush(1M) Changes (with patches 141444-09/141511-04) May Cause Sun Cluster 3.1 and 3.2 Nodes to Hang During Boot
This is reported in Bug 6879232

Comments:

Excellent info, please keep it coming. Sun Cluster is a critical component in today's enterprise, and it'll be even more critical in the enterprise of tomorrow.

Posted by UX-admin on November 04, 2009 at 12:26 PM CET #

Juergen, I found out that this patch does not impact non-ZFS clusters, but 141879-08 (iSCSI patch) breaks my SC 3.2 setup based on iSCSI and SVM.

With that patch applied the did device goes into fail status at boot and even if you remove everything and recreate it, it will fail again at next boot.

Moreover, I found out that if you have a working setup and apply that patch, the did devices will be OK at the reconfiguration boot, but will fail from the 2nd boot onwards!! Weird....

Keep up the good work here!

Rick

Posted by Riccardo Pizzi on November 05, 2009 at 03:24 PM CET #

will the patch 141444-09 affect veritas volume manager & veritas cluster???

Posted by Aneesh M on August 03, 2010 at 10:10 AM CEST #

also i want to know if patch id 141444-09 is compatible with oracle 11g R2???

Posted by Aneesh M on August 03, 2010 at 10:16 AM CEST #

Issue 2.) also affects VxVM/VCS if probe-based IPMP is used.
Yes, the patch can be used with Oracle 11gR2. The minimum release for 11gR2 is Solaris 10 10/08 update6. The 141444-09 is the KU of Solaris 10 10/09 Update8.
See http://blogs.sun.com/js/entry/solaris_10_kernel_patches
At this time Oracle 11gR2 can be used with Solaris Cluster 3.2 11/09 update3 and Solaris 10 10/09 Update8, but without ASM. Support for ASM is coming soon...

Posted by Juergen Schleich on August 05, 2010 at 11:18 AM CEST #

I just attempted to apply 126106-40 with 142900-14. Received the dependency error with the loaddid service and cluster would not boot. I was able to back out the patch and come back online but haven't got 126106-40 to work yet.

Posted by William Griffin on September 02, 2010 at 01:28 AM CEST #
