Monday Sep 07, 2009

Entries in infrastructure file if using tagged VLAN for cluster interconnect

In some cases it is necessary to add a tagged VLAN ID to the cluster interconnect. This example shows how the cluster interconnect configuration differs with and without a tagged VLAN ID. The interface e1000g2 has a "normal" setup (no VLAN ID) and the interface e1000g1 has a VLAN ID of 2. The Ethernet switch must be configured with the tagged VLAN ID before the cluster interconnect can be configured. Use "clsetup" to assign a VLAN ID to the cluster interconnect.
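A quick way to check whether an adapter is already registered with a VLAN ID is to grep the adapter entries out of the infrastructure file. A minimal sketch, run here against an inline sample excerpt (on a real cluster node you would point INFRA at /etc/cluster/ccr/global/infrastructure instead):

```shell
# Sample excerpt of a CCR infrastructure file; on a cluster node use
# INFRA=/etc/cluster/ccr/global/infrastructure instead of the temp file.
INFRA=$(mktemp)
cat > "$INFRA" <<'EOF'
cluster.nodes.1.adapters.1.name e1000g2
cluster.nodes.1.adapters.2.name e1000g2001
cluster.nodes.1.adapters.2.properties.vlan_id 2
EOF

# List adapter names and any vlan_id properties.
vlan_entries=$(grep -E '\.adapters\.[0-9]+\.(name|properties\.vlan_id) ' "$INFRA")
echo "$vlan_entries"
rm -f "$INFRA"
```

An adapter without a `properties.vlan_id` line runs untagged.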

Entries for "normal" cluster interconnect interface in /etc/cluster/ccr/global/infrastructure - no tagged VLAN:
cluster.nodes.1.adapters.1.name e1000g2
cluster.nodes.1.adapters.1.properties.device_name e1000g
cluster.nodes.1.adapters.1.properties.device_instance 2
cluster.nodes.1.adapters.1.properties.transport_type dlpi
cluster.nodes.1.adapters.1.properties.lazy_free 1
cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_timeout 10000
cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_quantum 1000
cluster.nodes.1.adapters.1.properties.nw_bandwidth 80
cluster.nodes.1.adapters.1.properties.bandwidth 70
cluster.nodes.1.adapters.1.properties.ip_address 172.16.1.129
cluster.nodes.1.adapters.1.properties.netmask 255.255.255.128
cluster.nodes.1.adapters.1.state enabled
cluster.nodes.1.adapters.1.ports.1.name 0
cluster.nodes.1.adapters.1.ports.1.state enabled


Entries for cluster interconnect interface in /etc/cluster/ccr/global/infrastructure - with tagged VLAN:
cluster.nodes.1.adapters.2.name e1000g2001
cluster.nodes.1.adapters.2.properties.device_name e1000g
cluster.nodes.1.adapters.2.properties.device_instance 1
cluster.nodes.1.adapters.2.properties.vlan_id 2
cluster.nodes.1.adapters.2.properties.transport_type dlpi
cluster.nodes.1.adapters.2.properties.lazy_free 1
cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_timeout 10000
cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_quantum 1000
cluster.nodes.1.adapters.2.properties.nw_bandwidth 80
cluster.nodes.1.adapters.2.properties.bandwidth 70
cluster.nodes.1.adapters.2.properties.ip_address 172.16.2.1
cluster.nodes.1.adapters.2.properties.netmask 255.255.255.128
cluster.nodes.1.adapters.2.state enabled
cluster.nodes.1.adapters.2.ports.1.name 0
cluster.nodes.1.adapters.2.ports.1.state enabled

The tagged VLAN interface name combines the VLAN ID and the underlying network interface. In this example e1000g2001, the 2 after e1000g is the VLAN ID and the trailing 1 is the instance of the e1000g driver. Without VLAN tagging this would be the e1000g1 interface, but with VLAN ID 2 it becomes the interface e1000g2001.
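The naming rule can be sketched in a few lines of shell: the VLAN interface's instance number (PPA) is VLAN ID × 1000 + driver instance:

```shell
# Build a Solaris tagged-VLAN interface name: PPA = vlan_id * 1000 + instance.
driver=e1000g
vlan_id=2
instance=1
ppa=$((vlan_id * 1000 + instance))
vlan_if="${driver}${ppa}"
echo "$vlan_if"    # e1000g2001
```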

The ifconfig -a output of the above configuration:
# ifconfig -a
lo0: flags=20010008c9 mtu 8232 index 1
       inet 127.0.0.1 netmask ff000000
e1000g0: flags=9000843 mtu 1500 index 2
      inet 10.16.65.63 netmask fffff800 broadcast 10.16.55.255
      groupname sc_ipmp0
      ether 0:14:4f:20:6a:18
e1000g2: flags=201008843 mtu 1500 index 4
      inet 172.16.1.129 netmask ffffff80 broadcast 172.16.1.255
      ether 0:14:4f:20:6a:1a
e1000g2001: flags=201008843 mtu 1500 index 3
      inet 172.16.2.1 netmask ffffff80 broadcast 172.16.2.127
      ether 0:14:4f:20:6a:19

clprivnet0: flags=1009843 mtu 1500 index 5
      inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
      ether 0:0:0:0:0:1
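Note that ifconfig prints netmasks in hexadecimal: ffffff80 is 255.255.255.128 and fffffe00 is 255.255.254.0. A small helper sketch for the conversion (the function name hex2quad is made up for illustration):

```shell
# Convert a hex netmask as printed by ifconfig (e.g. ffffff80) to dotted quad.
hex2quad() {
    h=$1
    printf '%d.%d.%d.%d\n' \
        "0x$(echo "$h" | cut -c1-2)" \
        "0x$(echo "$h" | cut -c3-4)" \
        "0x$(echo "$h" | cut -c5-6)" \
        "0x$(echo "$h" | cut -c7-8)"
}
hex2quad ffffff80   # 255.255.255.128
hex2quad fffffe00   # 255.255.254.0
```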

Wednesday Jan 28, 2009

private interconnect and patch 138888/138889


In specific Sun Cluster 3.x configurations a cluster node cannot join the cluster. Most of the time this issue shows up after installing kernel update patch
138888-01 up to 139555-08 or higher (SunOS 5.10: Kernel Patch) OR
138889-01 up to 139556-08 or higher (SunOS 5.10_x86: Kernel Patch)
AND
Sun Cluster 3.x uses an Ethernet switch (with VLAN) for the private interconnect
AND
Sun Cluster 3.x uses e1000g, nxge, bge or ixgb (GLDv3) interfaces for the private interconnect.

The issue shows up as messages similar to the following during boot of the cluster node.
...
Jan 25 15:46:14 node1 genunix: [ID 279084 kern.notice] NOTICE: CMM: node reconfiguration #2 completed.
Jan 25 15:46:15 node1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g1 constructed
Jan 25 15:46:15 node1 ip: [ID 856290 kern.notice] ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
Jan 25 15:46:16 node1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g3 constructed
Jan 25 15:47:15 node1 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path node1:e1000g1 - node2:e1000g1 errors during initiation
Jan 25 15:47:15 node1 genunix: [ID 618107 kern.warning] WARNING: Path node1:e1000g1 - node2:e1000g1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
Jan 25 15:47:16 node1 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path node1:e1000g3 - node2:e1000g3 errors during initiation
Jan 25 15:47:16 node1 genunix: [ID 618107 kern.warning] WARNING: Path node1:e1000g3 - node2:e1000g3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
...
Jan 25 16:33:51 node1 genunix: [ID 224783 kern.notice] NOTICE: clcomm: Path node1:e1000g1 - node2:e1000g1 has been deleted
Jan 25 16:33:51 node1 genunix: [ID 638544 kern.notice] NOTICE: clcomm: Adapter e1000g1 has been disabled
Jan 25 16:33:51 node1 genunix: [ID 224783 kern.notice] NOTICE: clcomm: Path node1:e1000g3 - node2:e1000g3 has been deleted
Jan 25 16:33:51 node1 genunix: [ID 638544 kern.notice] NOTICE: clcomm: Adapter e1000g3 has been disabled
Jan 25 16:33:51 node1 ip: [ID 856290 kern.notice] ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast

Update 6.Mar.2009:
Available now:
Alert 1020193.1 Kernel Patches/Changes may Stop Sun Cluster Nodes From Joining the Cluster

Update 26.Jun.2009:
The issue is fixed in the patches
141414-01 or higher (SunOS 5.10: kernel patch) OR
137104-02 or higher (SunOS 5.10_x86: dls patch)

Both patches require the 13955[56]-08 kernel update patch, which is included in Solaris 10 5/09 (Update 7). When using Solaris 10 5/09 (Update 7), Sun Cluster 3.2 requires the Sun Cluster core patch in revision -33 or higher. So, to get this fixed, it is recommended to use Solaris 10 5/09 (Update 7) (patch 13955[56]-08 or higher plus 141414-01 (SPARC) or 137104-02 (x86)) together with Sun Cluster 3.2 core patch -33 or higher.
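On a cluster node the installed revisions can be listed with showrev -p; the decisive check is only whether the revision suffix of the matching patch ID meets the required minimum. A simplified sketch of that comparison (the helper name patch_at_least is illustrative, not a Solaris command):

```shell
# Check whether an installed patch revision (e.g. "141414-03") meets a
# required minimum (e.g. "141414-01"). Only same-base-ID comparisons count.
patch_at_least() {
    installed=$1 required=$2
    inst_base=${installed%-*} inst_rev=${installed#*-}
    req_base=${required%-*}   req_rev=${required#*-}
    # Strip a leading zero so "-03" compares numerically as 3.
    [ "$inst_base" = "$req_base" ] && [ "${inst_rev#0}" -ge "${req_rev#0}" ]
}

patch_at_least 141414-03 141414-01 && echo "fix present"
patch_at_least 137104-01 137104-02 || echo "fix missing"
```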


Choose one of the following corrective actions (if not installing the patches with the fix):
  • Before installing the mentioned patches, configure VLAN tagging on the Sun interface and on the switch. This makes VLAN-tagged packets expected and prevents drops. The interface name then changes, e.g. to e1000g810000. After such a configuration change it is recommended to reboot the Sun Cluster hosts. Configuration details.

  • If using the above-mentioned kernel update patch, enable QoS (Quality of Service) on the Ethernet switch. The switch must be able to handle priority tagging. Refer to the switch documentation, because each switch is different.

  • Do not install the above-mentioned kernel update patch if using VLANs in the Sun Cluster 3.x private interconnect.

The mentioned kernel update patch delivers some new features in the GLDv3 architecture. It makes packets 802.1q standard compliant by including priority tagging. Therefore the following Sun Cluster 3.x configurations should not be affected:
  • Sun Cluster 3.x using ce, ge, hme, qfe, ipge or ixge network interfaces.
  • Sun Cluster 3.x with back-to-back connections for the private interconnect.
  • Sun Cluster 3.x on Solaris 8 or Solaris 9.

About

I'm still mostly blogging about Solaris Cluster and support, whether for Sun Microsystems or Oracle. :-)
