Tuesday Feb 26, 2013

Solaris on Exalogic - Effect of VNIC over eoib0 & eoib1

There are many reasons for customers to create VNICs over eoib0 & eoib1 on a compute node running Solaris; two typical examples are:

1) the compute node needs to connect to a VLAN over the EoIB network

2) there are containers running on the compute node that require 10GbE connectivity

In a previous blog entry, we discussed why Transitive Probe-based Failure Detection is required; the focus there was on the link between the IB gateway and the customer's 10GbE infrastructure.

In fact, if there are VNICs created over eoib0 and eoib1, there is a chance that bond1 will not fail over even when the link between the compute node and the IB gateway goes down!

Here is a simple test to illustrate this scenario:

First of all, let's create a VNIC over eoib0 using the following command:

root@el01cn01:~#dladm create-vnic -l eoib0 vnic0

This is what the IPMP groups look like:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       --mb---   up        disabled  ok
eoib1       no      bond1       is-----   up        disabled  ok
bond0_0     yes     bond0       --mb---   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok

Then we take down the link between the compute node and the IB gateway where eoib0 is located; this is what we get:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       --mb---   up        disabled  ok
eoib1       no      bond1       is-----   up        disabled  ok
bond0_0     no      bond0       -------   down      disabled  failed
bond0_1     yes     bond0       -smb---   up        disabled  ok

Notice that bond0 has failed over, but bond1 has not. Even though the LINK status for eoib0 is still up, it has actually lost connectivity to the 10GbE network.

The reason behind this behavior is the vnic0 that we created over eoib0: from the operating system's point of view, the link between eoib0 and vnic0 is still up, so no failover of bond1 occurs.

This is another good reason why probe-based failure detection is required.
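The risk above can be spotted from `ipmpstat -i` output alone: any interface whose PROBE column shows `disabled` is relying purely on link state, which a VNIC can keep "up" artificially. Here is a minimal sketch; the here-document stands in for live `ipmpstat -i` output (on a real compute node you would pipe the command's output in instead), and the helper name `ipmpstat_output` is my own invention for illustration:

```shell
#!/bin/sh
# Sketch: flag IPMP interfaces that rely on link-based failure detection
# only. The here-document below substitutes for live `ipmpstat -i` output.
ipmpstat_output() {
cat <<'EOF'
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       --mb---   up        disabled  ok
eoib1       no      bond1       is-----   up        disabled  ok
bond0_0     yes     bond0       --mb---   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok
EOF
}

# Print a warning for every interface whose PROBE column is "disabled":
# its LINK state may stay "up" (e.g. because of a VNIC on top of it)
# even after upstream connectivity is lost.
ipmpstat_output | awk 'NR > 1 && $6 == "disabled" {
    printf "WARNING: %s (group %s) uses link-based detection only\n", $1, $3
}'
```

Running this against the sample output flags all four interfaces, which is exactly the configuration in which the silent-failure scenario above can occur.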




Solaris on Exalogic - Transitive Probe-based Failure Detection

On Exalogic, no matter which supported Operating System a compute node is running, it relies on the IB gateways (NM2-GW) to provide both internal (IPoIB) and external (EoIB) network connectivity. Each compute node is physically connected to two IB gateways by copper cables; each IB gateway in turn is connected to the customer's 10GbE infrastructure, typically a Layer 2 switch.

By default, only link-based failure detection is enabled for IPMP groups on compute nodes running Solaris. This default remains the same even if a compute node has been upgraded from Solaris 11 Express to Solaris 11.1 on X2-2 hardware, or on X3-2 hardware where Solaris 11.1 can be installed directly.

The limitation of link-based failure detection is that it cannot detect a failure of the link between the IB gateway and the customer's infrastructure. That means even if that link goes down, bond1 will not fail over, and 10GbE connectivity to the compute node is lost.

In fact, there is even a scenario where the link between the compute node and the IB gateway fails and bond1 still does not fail over, but that is a topic for another blog entry.

For customers running Solaris 11 Express, the solution is to enable Probe-based Failure Detection. The downside of this solution is that it requires two additional IP addresses for each IPMP group, which can be a challenge for customers running tight on IP addresses.

On Solaris 11.1, there is a better solution called Transitive Probe-based Failure Detection, which does not require additional IP addresses to be assigned to the IPMP group members.

To enable Transitive Probe-based Failure Detection, run the following commands on a compute node:

#svccfg -s svc:/network/ipmp setprop config/transitive-probing=true
#svcadm refresh svc:/network/ipmp:default

If a default gateway is already configured for bond1, it will be used as the target system; otherwise you will need to create a host route to a particular system that you would like to probe.

To check whether Transitive Probe-based Failure Detection is working, run the following command:

root@el01cn01:~# ipmpstat -t
INTERFACE   MODE       TESTADDR       TARGETS
eoib1       transitive <eoib1>        <eoib0>
eoib0       routes     el01cn01-pub   192.168.123.254
bond0_1     transitive <bond0_1>      <bond0_0>
bond0_0     multicast  el01cn01-priv  el01cn05-priv el01cn04-priv el01cn02-priv el01cn06-priv el01sn-priv

See the official Solaris documentation on how to specify a target system for probe-based failure detection.
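The `ipmpstat -t` output can also be checked mechanically: every interface should list at least one probe target. A minimal sketch follows; the here-document stands in for live `ipmpstat -t` output, the helper name `ipmpstat_t_output` is my own, and I am assuming `ipmpstat` prints `--` in the TARGETS column when no target is known:

```shell
#!/bin/sh
# Sketch: verify that every IPMP interface has at least one probe target
# after enabling transitive probing. The here-document substitutes for
# live `ipmpstat -t` output (abbreviated from the example above).
ipmpstat_t_output() {
cat <<'EOF'
INTERFACE   MODE       TESTADDR       TARGETS
eoib1       transitive <eoib1>        <eoib0>
eoib0       routes     el01cn01-pub   192.168.123.254
bond0_1    transitive <bond0_1>      <bond0_0>
bond0_0     multicast  el01cn01-priv  el01cn05-priv el01cn04-priv
EOF
}

# Report any interface with no probe target ("--" in the TARGETS column,
# an assumption about the tool's output format); otherwise confirm that
# probing is in place for every interface.
ipmpstat_t_output | awk 'NR > 1 {
    if ($4 == "--") { printf "NO TARGET: %s\n", $1; bad = 1 }
}
END { if (!bad) print "all interfaces have probe targets" }'
```

With the sample output above, every interface has a target, so the script prints a single confirmation line.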



Solaris on Exalogic - Reverse the active/passive interfaces of an IPMP group

During a customer engagement, I found that their bond0 and bond1 configurations looked like this:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       no      bond1       is-----   up        disabled  ok
eoib1       yes     bond1       --mb---   up        disabled  ok
bond0_0     yes     bond0       --mb---   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok

Notice that the active interface for bond1 is eoib1, while the active interface for bond0 is bond0_0.

Although it is perfectly fine for EoIB traffic and IPoIB traffic to go over different IB gateways, the customer would like to reconfigure it so that both types of traffic go through the same IB gateway.

First of all, we turn on the "standby" property for eoib1 and turn off the "standby" property for eoib0 with the following commands:

root@el01cn01:~# ipadm set-ifprop -p standby=on -m ip eoib1
root@el01cn01:~# ipadm set-ifprop -p standby=off -m ip eoib0

The status of the IPMP groups now looks like this:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       -------   up        disabled  ok
eoib1       yes     bond1       -smb---   up        disabled  ok
bond0_0     yes     bond0       --mb---   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok

Then we force a failover by detaching eoib1 from bond1 with the following command:

root@el01cn01:~# if_mpadm -d eoib1

The status of the IPMP groups now looks like this:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       --mb---   up        disabled  ok
eoib1       no      bond1       -s---d-   up        disabled  offline
bond0_0     yes     bond0       --mb---   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok

Then we re-attach eoib1 to bond1; because it is a standby interface, failback will not happen:

root@el01cn01:~# if_mpadm -r eoib1

This is how it looks after the reconfiguration:

root@el01cn01:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
eoib0       yes     bond1       --mb---   up        disabled  ok
eoib1       no      bond1       is-----   up        disabled  ok
bond0_0     yes     bond0       --mb---   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok
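The four-step sequence above can be collected into a small script. The sketch below only echoes the commands by default (a dry-run, via a `run` wrapper of my own) so it can be reviewed safely on any machine; on a real Solaris compute node you would set `DRYRUN=0` to execute them:

```shell
#!/bin/sh
# Sketch: swap the active/standby roles of eoib0 and eoib1 in bond1.
# DRYRUN=1 (the default) only prints the commands; set DRYRUN=0 on a
# real Solaris compute node to actually run them.
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Make eoib1 the standby interface and eoib0 the normal one.
run ipadm set-ifprop -p standby=on -m ip eoib1
run ipadm set-ifprop -p standby=off -m ip eoib0

# 2. Detach eoib1 to force a failover to eoib0 ...
run if_mpadm -d eoib1

# 3. ... then re-attach it; as a standby interface it will not fail back.
run if_mpadm -r eoib1
```

The dry-run default is deliberate: detaching the wrong interface on a live node interrupts traffic, so printing the plan first and verifying it against `ipmpstat -i` output is the safer workflow.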

Sunday Mar 11, 2012

Configure IPoIB on Solaris 10 branded zone

Compute nodes in Exalogic communicate with each other and mount their shares from the ZFS storage appliance over the bond0 interface. Bond0 is a highly available network interface over the InfiniBand fabric using a portability layer called IPoIB that allows compute nodes and the storage appliance to communicate using TCP/IP protocol over InfiniBand.

In a previous entry, we created a Solaris 10 branded zone; naturally, we would also like the branded zone to be able to mount shares from the ZFS storage appliance over IPoIB.

In this entry, I’ll show you the steps.


Create Solaris 10 Branded Zone on Exalogic

One of the reasons that customers choose to run Solaris 11 Express on Exalogic is the capability to create containers. A container is a form of operating-system virtualization that allows multiple operating system environments to coexist on a single system. Containers not only allow users to run the same version of Solaris; it is also possible to create a container, known as a branded zone, that runs a previous Solaris version, such as Solaris 10. This is a very useful feature that enables customers to continue running applications that are only certified on older OS versions.

In this entry, I will show you the steps to create a Solaris 10 branded zone on Exalogic.

About

The primary contributors to this blog are the Exalogic and Cloud Application Foundation contingent of Oracle's Fusion Middleware Architecture Team, fondly known as the A-Team. As part of the Oracle development organization, the A-Team supports some of Oracle's largest and most strategic customers worldwide. Our mission is to provide deep technical expertise to support the various Oracle field organizations and customers deploying Oracle Fusion Middleware related products, and to collect real-world feedback to continuously improve the products we support. In this blog, our experts and guest experts will focus on Exalogic, WebLogic, Coherence, Tuxedo/mainframe migration, Enterprise Manager, and JDK/JRockit performance tuning. It is our way to share some of our experiences with the Oracle community. We hope our readers take away something of value from our experiences. Thank you for visiting, and please come back soon.
