What is bondib1 used for on SPARC SuperCluster with InfiniBand, Solaris 11 networking & Oracle RAC?

A co-worker asked the following question about a SPARC SuperCluster InfiniBand network:

> On the database nodes, the RAC nodes communicate over the cluster_interconnect. This is the
> 192.168.10.0 network on bondib0 (according to the NETWORKS setting in ./crs/install/crsconfig_params).
> What is bondib1 used for? Is it an HA counterpart in case bondib0 dies?

This is my response:

Summary: In a SPARC SuperCluster installation, bondib0 and bondib1 are the InfiniBand links that are used for the private interconnect (usage includes global cache data blocks and heartbeat) and for communication to the Exadata storage cells. Because the database is currently idle, bondib1 is only being used for outbound cluster interconnect traffic at the moment.

Details:

bondib0 is the cluster_interconnect:

$ oifcfg getif           
bondeth0  10.129.184.0  global  public
bondib0  192.168.10.0  global  cluster_interconnect
ipmpapp0  192.168.30.0  global  public
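
The view from Clusterware can be cross-checked from the database side. As a rough sketch (assuming a locally authenticated SYSDBA session on a database node), gv$cluster_interconnects reports the interconnect address that each instance is actually using, which should trace back to bondib0, either directly or via an HAIP address layered on top of it, depending on the Grid Infrastructure version:

$ sqlplus -s / as sysdba <<'EOF'
column name format a10
column ip_address format a16
select inst_id, name, ip_address, is_public from gv$cluster_interconnects;
EOF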


bondib0 and bondib1 are on 192.168.10.1 and 192.168.10.2, respectively:

# ipadm show-addr | grep bondi
bondib0/v4static  static   ok           192.168.10.1/24
bondib1/v4static  static   ok           192.168.10.2/24
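
For reference, addresses like these are normally created on top of the IPMP group interfaces with ipadm; this is only a sketch of the equivalent commands, not something that needs to be re-run on a built system:

# ipadm create-addr -T static -a local=192.168.10.1/24 bondib0/v4static
# ipadm create-addr -T static -a local=192.168.10.2/24 bondib1/v4static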


This private network is also used to communicate with the Exadata Storage Cells. Notice that the network addresses of the Exadata Cell Disks are on the same subnet as the private interconnect:  

SQL> column path format a40
SQL> select path from v$asm_disk;

PATH
----------------------------------------
o/192.168.10.9/DATA_SSC_CD_00_ssc9es01  
o/192.168.10.9/DATA_SSC_CD_01_ssc9es01
...
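
On a typical Exadata or SuperCluster database node, the same cell addresses also appear in cellip.ora (paths assumed from the standard Exadata layout), which is another quick way to confirm that storage traffic rides on this subnet:

# cat /etc/oracle/cell/network-config/cellip.ora
# cat /etc/oracle/cell/network-config/cellinit.ora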

Hostnames tied to the IPs are node1-priv1 and node1-priv2:

# grep 192.168.10 /etc/hosts
192.168.10.1    node1-priv1.us.oracle.com   node1-priv1
192.168.10.2    node1-priv2.us.oracle.com   node1-priv2

For the four-compute-node RAC:

  • Each compute node has two IP addresses on the 192.168.10.0 private network.
  • Each IP address has an active InfiniBand link and a failover InfiniBand link.
  • Thus, the compute nodes use a total of 8 IP addresses and 16 InfiniBand links for this private network (a quick reachability check is sketched below).
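
A quick reachability check of all eight private addresses can be scripted from any one node. This is only a sketch, and it assumes the other nodes follow the same nodeN-privN naming pattern shown in /etc/hosts above:

# for n in 1 2 3 4; do for p in 1 2; do ping node${n}-priv${p}; done; done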

bondib1 isn't being used for the Virtual IP (VIP):

$ srvctl config vip -n node1
VIP exists: /node1-ib-vip/192.168.30.25/192.168.30.0/255.255.255.0/ipmpapp0, hosting node node1
VIP exists: /node1-vip/10.55.184.15/10.55.184.0/255.255.255.0/bondeth0, hosting node node1
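
The network resources that back these VIPs can also be listed directly; the exact output depends on the Grid Infrastructure version, so treat this as an aside:

$ srvctl config network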


bondib1 is on bondib1_0 and fails over to bondib1_1:

# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmpapp0    ipmpapp0    ok        --        ipmpapp_0 (ipmpapp_1)
bondeth0    bondeth0    degraded  --        net2 [net5]
bondib1     bondib1     ok        --        bondib1_0 (bondib1_1)
bondib0     bondib0     ok        --        bondib0_0 (bondib0_1)
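
ipmpstat can drill down further if needed: -i shows the active/standby role of each underlying interface, -a shows which interface each data address is currently bound to, and -t shows the targets used for failure detection:

# ipmpstat -i
# ipmpstat -a
# ipmpstat -t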


bondib1_0 goes over net24:

# dladm show-link | grep bond
LINK                CLASS     MTU    STATE    OVER
bondib0_0           part      65520  up       net21
bondib0_1           part      65520  up       net22
bondib1_0           part      65520  up       net24
bondib1_1           part      65520  up       net23
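
A companion view is dladm show-part, which lists each IB partition datalink together with its PKEY, underlying physical link, and state:

# dladm show-part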


net24 is IB partition FFFF:

# dladm show-ib
LINK         HCAGUID         PORTGUID        PORT STATE  PKEYS
net24        21280001A1868A  21280001A1868C  2    up     FFFF
net22        21280001CEBBDE  21280001CEBBE0  2    up     FFFF,8503
net23        21280001A1868A  21280001A1868B  1    up     FFFF,8503
net21        21280001CEBBDE  21280001CEBBDF  1    up     FFFF


net24 is on PCI Express Module 9, port 2:

# dladm show-phys -L
LINK              DEVICE       LOC
net21             ibp4         PCI-EM1/PORT1
net22             ibp5         PCI-EM1/PORT2
net23             ibp6         PCI-EM9/PORT1
net24             ibp7         PCI-EM9/PORT2


Outbound traffic on the 192.168.10.0 network is multiplexed between bondib0 and bondib1:

# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
192.168.10.0         192.168.10.2         U        16    6551834 bondib1  
192.168.10.0         192.168.10.1         U         9    5708924 bondib0  
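
To see the spreading in terms of actual traffic rather than route counters, per-datalink packet counters can be watched with dlstat; a sketch, using the partition links shown earlier:

# dlstat show-link bondib0_0
# dlstat show-link bondib1_0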


The database is currently idle, so there is no traffic to the Exadata Storage Cells at this moment, nor is there any traffic being generated by the global cache. Thus, only the heartbeat is active. There is more traffic on bondib0 than on bondib1:

# /bin/time snoop -I bondib0 -c 100 > /dev/null
Using device ipnet/bondib0 (promiscuous mode)
100 packets captured

real        4.3
user        0.0
sys         0.0


(100 packets in 4.3 seconds = 23.3 pkts/sec)

# /bin/time snoop -I bondib1 -c 100 > /dev/null
Using device ipnet/bondib1 (promiscuous mode)
100 packets captured

real       13.3
user        0.0
sys         0.0


(100 packets in 13.3 seconds = 7.5 pkts/sec)

About half of the packets on bondib0 are outbound (from this node). The remaining packets are split among the other nodes in the cluster:

# snoop -I bondib0 -c 100 | awk '{print $1}' | sort | uniq -c
Using device ipnet/bondib0 (promiscuous mode)
100 packets captured
  49 node1-priv1.us.oracle.com
  24 node2-priv1.us.oracle.com
  14 node3-priv1.us.oracle.com
  13 node4-priv1.us.oracle.com

100% of the packets on bondib1 are outbound (from this node), but the packet headers indicate that they are from the IP address associated with bondib0:

# snoop -I bondib1 -c 100 | awk '{print $1}' | sort | uniq -c
Using device ipnet/bondib1 (promiscuous mode)
100 packets captured
 100 node1-priv1.us.oracle.com

The destinations of the bondib1 outbound packets are split evenly between node3 and node4:

# snoop -I bondib1 -c 100 | awk '{print $3}' | sort | uniq -c
Using device ipnet/bondib1 (promiscuous mode)
100 packets captured
  51 node3-priv1.us.oracle.com
  49 node4-priv1.us.oracle.com
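
For longer observations it can be more convenient to capture to a file and analyze it offline; a small sketch (the file name is arbitrary):

# snoop -I bondib1 -o /tmp/bondib1.cap -c 1000
# snoop -i /tmp/bondib1.cap | more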

Conclusion: In a SPARC SuperCluster installation, bondib0 and bondib1 are the InfiniBand links that are used for the private interconnect (usage includes global cache data blocks and heartbeat) and for communication to the Exadata storage cells. Because the database is currently idle, bondib1 is only carrying outbound cluster interconnect traffic at the moment.

Comments:

Hi Jeff,

The article you posted, "What is bondib1 used for on SPARC SuperCluster with InfiniBand, Solaris 11 networking & Oracle RAC?", is very informative.

I have a query and I hope you will be able to answer it:

I have an environment similar to SPARC SuperCluster, but it consists of two SPARC T5-2 servers, two Sun Network QDR InfiniBand switches, and ZS3-2 storage. There is no Exadata Storage Server.

I want to create an Oracle RAC environment with InfiniBand as the private interconnect.

I want to dedicate PCIE2 and PCIE3 to the first Oracle RAC installation in an LDom.

The following commands were executed on the first server, named "s1", but one InfiniBand interface shows as failed:

Enable Transitive probing:

svccfg -s svc:/network/ipmp setprop config/transitive-probing=true
svcadm refresh svc:/network/ipmp:default

cat /etc/hosts

172.16.16.11 s1.ocslab.com s1
172.16.16.21 s2.ocslab.com s2

192.168.3.2 s1-priv1
192.168.3.3 s1-priv2

192.168.3.4 s2-priv1
192.168.3.5 s2-priv2

root@s1:~# dladm show-phys -L
LINK              DEVICE       LOC
net0              ixgbe0       /SYS/MB
net1              ixgbe1       /SYS/MB
net2              ixgbe4       /SYS/MB
net3              ixgbe5       /SYS/MB
net4              ixgbe2       PCIE1
net5              ixgbe3       PCIE1
net16             vsw0         --
net17             vsw1         --
net18             vsw2         --
net19             vsw3         --
net20             vsw4         --
net21             vsw5         --
net22             vsw6         --
net23             vsw7         --
net24             vsw8         --
net15             vsw9         --
net26             vsw10        --
net27             vsw11        --
net28             vsw12        --
net29             vsw13        --
net6              ibp0         PCIE7/PORT1
net7              ibp1         PCIE7/PORT2
net8              ibp6         PCIE8/PORT1
net9              ibp7         PCIE8/PORT2
net10             ibp4         PCIE2/PORT1
net11             ibp5         PCIE2/PORT2
net12             ibp2         PCIE3/PORT1
net13             ibp3         PCIE3/PORT2
net14             usbecm0      --

root@s1:~# dladm show-ib
LINK         HCAGUID         PORTGUID        PORT STATE  PKEYS
net9         10E000013332F4  10E000013332F6  2    up     FFFF
net11        10E0000134CC80  10E0000134CC82  2    up     8503,FFFF
net7         10E0000133B578  10E0000133B57A  2    up     8503,FFFF
net10        10E0000134CC80  10E0000134CC81  1    up     FFFF
net6         10E0000133B578  10E0000133B579  1    up     FFFF
net8         10E000013332F4  10E000013332F5  1    up     8503,FFFF
net12        10E0000134CC7C  10E0000134CC7D  1    up     8503,FFFF
net13        10E0000134CC7C  10E0000134CC7E  2    up     FFFF

root@s1:~# dladm create-part -l net10 -P 0xFFFF bondib0_0
root@s1:~# dladm create-part -l net11 -P 0x8503 bondib0_1
root@s1:~# dladm create-part -l net12 -P 0x8503 bondib1_0
root@s1:~# dladm create-part -l net13 -P 0xFFFF bondib1_1

root@s1:~# ipadm create-ip bondib0_0
root@s1:~# ipadm create-ip bondib0_1

root@s1:~# ipadm create-ip bondib1_0
root@s1:~# ipadm create-ip bondib1_1

root@s1:~# ipadm create-ipmp -i bondib0_0 -i bondib0_1 bondib0
root@s1:~# ipadm set-ifprop -p standby=on -m ip bondib0_1

root@s1:~# ipadm create-ipmp -i bondib1_0 -i bondib1_1 bondib1
root@s1:~# ipadm set-ifprop -p standby=on -m ip bondib1_1

root@s1:~# ipadm create-addr -T static -a local=s1-priv1/24 bondib0/v4
root@s1:~# ipadm create-addr -T static -a local=s1-priv2/24 bondib1/v4

root@s1:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
bondib0           ipmp       ok           --         --
bondib0/v4        static     ok           --         192.168.3.2/24
bondib1           ipmp       ok           --         --
bondib1/v4        static     ok           --         192.168.3.3/24
bondib0_0         ip         ok           bondib0    --
bondib0_1         ip         ok           bondib0    --
bondib1_0         ip         ok           bondib1    --
bondib1_1         ip         failed       bondib1    --
lo0               loopback   ok           --         --
lo0/v4            static     ok           --         127.0.0.1/8
lo0/v6            static     ok           --         ::1/128
net14             ip         ok           --         --
net14/v4          static     ok           --         169.254.182.77/24
net16             ip         ok           --         --
net16/v4          static     ok           --         172.16.16.11/24

Please let me know if I am making any mistake.

Best Regards,
Raghu

Posted by guest on June 11, 2014 at 09:22 AM EDT #

Hi Raghu,

I don't see any problem with the steps that you've taken. This type of problem is typically caused by hardware. You might want to disconnect and reconnect the cable on PCIE3/PORT2, make sure that the card is seated well in the PCIE slot, check the connection at the IB switch, and check the status of the connection as seen from the switch.

Maybe one of these will be helpful:

* Monitoring and Troubleshooting IB Devices
- http://docs.oracle.com/cd/E23824_01/html/821-1459/gjwwf.html
- ibping and/or ibstat

* Managing Oracle® Solaris 11.1 Network Performance
- Failure Detection in IPMP (Page 76)
- Monitoring IPMP Information (Page 102)
- http://docs.oracle.com/cd/E26502_01/pdf/E28993.pdf

* Check your failback mode

* Results returned from:
# ipmpstat -i
# ipmpstat -p
# dladm show-part
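
For example, a rough starting point (net13 is the link under the failed bondib1_1 in your listing):

# svcprop -p config/transitive-probing svc:/network/ipmp
# ipmpstat -t
# dladm show-ib net13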

Good luck and please let me know if you learn anything useful.

Posted by guest on June 12, 2014 at 01:10 PM EDT #
