Wednesday Jan 28, 2009

private interconnect and patch 138888/138889


In specific Sun Cluster 3.x configurations a cluster node cannot join the cluster. Most of the time this issue comes up after installing one of the following kernel update patches:
138888-01 through 139555-08 or higher SunOS 5.10: Kernel Patch OR
138889-01 through 139556-08 or higher SunOS 5.10_x86: Kernel Patch
AND
Sun Cluster 3.x using an Ethernet switch (with VLAN) for the private interconnect
AND
Sun Cluster 3.x using e1000g, nxge, bge or ixgb (GLDv3) interfaces for the private interconnect.

The issue shows up as messages similar to the following during boot of the cluster node.
...
Jan 25 15:46:14 node1 genunix: [ID 279084 kern.notice] NOTICE: CMM: node reconfiguration #2 completed.
Jan 25 15:46:15 node1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g1 constructed
Jan 25 15:46:15 node1 ip: [ID 856290 kern.notice] ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
Jan 25 15:46:16 node1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g3 constructed
Jan 25 15:47:15 node1 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path node1:e1000g1 - node2:e1000g1 errors during initiation
Jan 25 15:47:15 node1 genunix: [ID 618107 kern.warning] WARNING: Path node1:e1000g1 - node2:e1000g1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
Jan 25 15:47:16 node1 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path node1:e1000g3 - node2:e1000g3 errors during initiation
Jan 25 15:47:16 node1 genunix: [ID 618107 kern.warning] WARNING: Path node1:e1000g3 - node2:e1000g3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
...
Jan 25 16:33:51 node1 genunix: [ID 224783 kern.notice] NOTICE: clcomm: Path node1:e1000g1 - node2:e1000g1 has been deleted
Jan 25 16:33:51 node1 genunix: [ID 638544 kern.notice] NOTICE: clcomm: Adapter e1000g1 has been disabled
Jan 25 16:33:51 node1 genunix: [ID 224783 kern.notice] NOTICE: clcomm: Path node1:e1000g3 - node2:e1000g3 has been deleted
Jan 25 16:33:51 node1 genunix: [ID 638544 kern.notice] NOTICE: clcomm: Adapter e1000g3 has been disabled
Jan 25 16:33:51 node1 ip: [ID 856290 kern.notice] ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
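
To check a node quickly for this symptom, the path initiation warnings can be pulled out of the system log with a small script. The following is only a sketch: the regular expression is derived from the example messages above, and the function name failed_paths is made up for illustration. Other releases may word the message differently.

```python
import re

# Pattern built from the clcomm warning shown in the example messages above.
# Assumption: the message wording is identical on the system being checked.
PATH_ERROR = re.compile(
    r"WARNING: Path (?P<local>\S+) - (?P<remote>\S+) "
    r"initiation encountered errors, errno = (?P<errno>\d+)"
)


def failed_paths(lines):
    """Return sorted unique (local, remote, errno) tuples for failed paths."""
    hits = set()
    for line in lines:
        match = PATH_ERROR.search(line)
        if match:
            hits.add(
                (match.group("local"), match.group("remote"), int(match.group("errno")))
            )
    return sorted(hits)
```

In practice the function would be fed the lines of the messages file of the node, e.g. failed_paths(open("/var/adm/messages")). If all private interconnect paths show up here with errno = 62, the node is a candidate for the issue described in this post.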

Update 6.Mar.2009:
Available now:
Alert 1020193.1 Kernel Patches/Changes may Stop Sun Cluster Nodes From Joining the Cluster

Update 26.Jun.2009:
The issue is fixed in the following patches:
141414-01 or higher SunOS 5.10: kernel patch OR
137104-02 or higher SunOS 5.10_x86: dls patch

Both patches require the 13955[56]-08 kernel update patch, which is included in Solaris 10 5/09 Update 7. When using Solaris 10 5/09 Update 7, Sun Cluster 3.2 requires the Sun Cluster core patch in revision -33 or higher. So, to get this issue fixed, it's recommended to use Solaris 10 5/09 Update 7 (patch 13955[56]-08 or higher and 141414-01 (SPARC) or 137104-02 (x86)) with the Sun Cluster 3.2 core patch -33 or higher.
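
The patch requirement can be expressed as a small check against a list of installed patch IDs. This is a minimal sketch: the names parse_patch, FIX_PATCHES and has_fix are hypothetical, the list of installed patches has to be collected separately (for example from the patch listing of the node), and the check does not know about later patches that may obsolete the ones named here.

```python
def parse_patch(patch):
    """Split a patch ID of the form 'NNNNNN-RR' into (base, revision) integers."""
    base, rev = patch.strip().split("-")
    return int(base), int(rev)


# Minimum revisions of the fix patches named in this post.
# 141414 is the SPARC kernel patch, 137104 the x86 dls patch.
FIX_PATCHES = {141414: 1, 137104: 2}


def has_fix(installed):
    """Return True if any installed patch meets a fix revision listed above."""
    for patch in installed:
        base, rev = parse_patch(patch)
        if base in FIX_PATCHES and rev >= FIX_PATCHES[base]:
            return True
    return False
```

For example, has_fix(["139555-08", "141414-01"]) is true, while a system with only 139555-08 installed still lacks the fix.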


Choose one of the following corrective actions (if you do not install the patch with the fix):
  • Before installing the mentioned patches, configure VLAN tagging on the Sun interface and on the switch. This makes VLAN-tagged packets expected and prevents drops. The interface name then changes to e.g. e1000g810000. After changing the configuration to e.g. e1000g810000 it's recommended to reboot the Sun Cluster hosts. Configuration details.

  • If using the above-mentioned kernel update patch, enable QoS (Quality of Service) on the Ethernet switch. The switch should be able to handle priority tagging. Please refer to the switch documentation because each switch is different.

  • Do not install the above-mentioned kernel update patch if using VLAN in the Sun Cluster 3.x private interconnect.
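
The VLAN interface name in the first action follows the Solaris naming rule for tagged interfaces: the number after the driver name is VLAN ID * 1000 + device instance. So e1000g810000 is instance 0 of e1000g on VLAN 810. A minimal sketch of that calculation follows; the function name vlan_link_name is made up for illustration, and VLAN 810 is just the value from the example above.

```python
def vlan_link_name(driver, instance, vlan_id):
    """Build the name of a VLAN-tagged interface as driver + (VLAN ID * 1000 + instance)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= instance <= 999:
        raise ValueError("device instance must be between 0 and 999")
    return f"{driver}{vlan_id * 1000 + instance}"
```

With this, vlan_link_name("e1000g", 0, 810) gives e1000g810000, and the second interconnect adapter e1000g1 on the same VLAN would be e1000g810001.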

The mentioned kernel update patch delivers some new features in the GLDv3 architecture. It makes packets 802.1q standard compliant by including priority tagging. Therefore the following Sun Cluster 3.x configurations should not be affected:
  • Sun Cluster 3.x which uses ce, ge, hme, qfe, ipge or ixge network interfaces.
  • Sun Cluster 3.x which has back-to-back connections for the private interconnect.
  • Sun Cluster 3.x on Solaris 8 or Solaris 9.
