Switches Inside Oracle's Engineered Systems

Continuing from my last blog about InfiniBand building blocks, let's now review the network switches used inside Oracle's Engineered Systems in a little more detail. This will help you understand the overall integration, network design, architecture and troubleshooting in later articles.

There are two categories of network switches used to build the computing environment inside the rack.

InfiniBand Switches - two models used depending on requirements
  1. Sun Oracle 36-port InfiniBand Switch
  2. Sun Oracle InfiniBand Gateway Switch

Ethernet Switch - primarily for management purposes

  1. Cisco Catalyst 4948

The following table will get you started quickly and save me a lot of writing.

  Switch                                  IB Ports      EoIB Ports                          Ethernet Ports
  Sun Oracle 36-port InfiniBand Switch    36            -                                   -
  Sun Oracle InfiniBand Gateway Switch    32 (36-4)     0A-ETH-[1 to 4], 1A-ETH-[1 to 4]    -
                                                        (10Gbps per port)
  Cisco Catalyst 4948                     -             -                                   48 [1-48]

Let me first give you some more insight on the InfiniBand switches, and then we will talk about the Cisco Catalyst 4948. The following picture shows the 36-port IB switch. The Gateway switch looks similar, with a slight difference for the EoIB ports on the extreme right.

Common information that applies to both of these InfiniBand switches
  1. Form Factor: One rack unit (1U) height
  2. Power Supplies: Two
  3. Cooling Fans: Five
  4. IB Subnet Management: Yes
  5. Firmware Upgradeable: Yes
  6. Command Line Access: Yes, via SSH and USB-serial access
  7. Web Based Management: Yes
  8. SNMP Access: Yes

As you might have figured out by now, the IB Gateway switch is almost a superset of the 36-port switch in terms of features and capabilities.

Differences between 36-port and Gateway InfiniBand switches

Compared to the Gateway switch, the 36-port switch has four additional usable IB ports. On the Gateway switch, these four ports are internally consumed to enable Ethernet over InfiniBand (EoIB) functionality. I am sure you are wondering how this is done. The simple explanation is that there are two additional hardware devices installed inside the IB Gateway switch. These are called Bridge-X, and each of them internally connects to the InfiniBand fabric via two IB ports. Hence the math of 36-4=32 in the table above. Towards the external world, they expose EoIB ports, labeled 0A-ETH and 1A-ETH, in QSFP+ form factor. But not all devices in the Ethernet world understand QSFP+, and 40Gbps Ethernet is not yet in common use, so each of these is split into four (4) SFP+ ports at a 10Gbps signalling rate each. That's why the final port labels on the EoIB side are 0A-ETH-[N] and 1A-ETH-[N], where N runs from 1 to 4.
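The port arithmetic above can be sketched in a few lines of Python. This is purely illustrative; the constant names are mine, not any switch API, and the numbers come straight from the text:

```python
# Gateway switch port arithmetic as described above: 36 physical IB ports,
# of which 2 Bridge-X devices consume 2 IB ports each; each Bridge-X then
# exposes a QSFP+ EoIB port split into four 10Gbps SFP+ ports.
TOTAL_IB_PORTS = 36
BRIDGEX_DEVICES = 2
IB_PORTS_PER_BRIDGEX = 2

usable_ib_ports = TOTAL_IB_PORTS - BRIDGEX_DEVICES * IB_PORTS_PER_BRIDGEX  # 36 - 4 = 32

# EoIB port labels: 0A-ETH-[1..4] and 1A-ETH-[1..4]
eoib_ports = [f"{bridge}A-ETH-{n}"
              for bridge in range(BRIDGEX_DEVICES)
              for n in range(1, 5)]

print(usable_ib_ports)  # 32
print(eoib_ports)       # ['0A-ETH-1', ..., '1A-ETH-4']
```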

Why do we have two Ethernet ports on the InfiniBand switches?

For those who have seen or will get their hands on these two InfiniBand switches, let me clarify something about the Ethernet management port. Visually, you will see two RJ45 ports on the switch, but there is only one target interface inside. A small bridge inside the switch connects to the management Ethernet interface and provides two connections to the outside world. No, this is not for redundancy or high availability. It is there to allow you to create a linear bus topology, if you need it. In simple terms, you can daisy-chain more than one such switch.
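A toy model makes the "linear bus, not redundancy" point concrete. The switch names below are hypothetical; the sketch just shows that in a daisy chain, every management interface is reachable while the chain is intact, and a broken link splits the bus rather than failing over:

```python
# Toy model of daisy-chained management ports: each switch has one
# management interface behind a small two-port bridge, so switches
# can be wired in a linear bus. Switch names are illustrative.
switch_chain = ["ib-switch-1", "ib-switch-2", "ib-switch-3"]

def reachable(chain, broken_link=None):
    """Switches reachable from the first switch; broken_link is an
    optional (i, i+1) index pair for a severed cable in the chain."""
    seen = {chain[0]}
    for i in range(len(chain) - 1):
        if (i, i + 1) == broken_link:
            break  # linear bus: no alternate path around a break
        seen.add(chain[i + 1])
    return seen

print(reachable(switch_chain))                      # all three switches
print(reachable(switch_chain, broken_link=(0, 1)))  # chain is split
```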

What about these Leaf and Spine switches?

Okay, now that I have talked about these two InfiniBand switches... let me introduce you to two keywords which you will be hearing a lot and this will set the ground for further discussions.

  1. Spine Switch
  2. Leaf Switch

These are roles of a switch in the topology or connectivity layout. I may write more about topologies later, but for now let's keep this blog short, concise and in the context of Oracle's Engineered Systems.

The switch where hosts are directly connected takes up the role of Leaf Switch.

The switch with no hosts directly attached, but with inter switch links (ISLs) to provide alternate paths or to expand the fabric, takes up the role of Spine Switch.

In Exadata and SuperCluster racks, both roles are provided by 36-port InfiniBand switches.

In Exalogic racks, Leaf role is provided by Gateway switches whereas Spine role is provided by a 36-port switch.
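The role definitions above are purely a function of what is attached to a switch, which can be sketched as follows. The link map is illustrative, not a real rack layout:

```python
# A switch's role follows from its neighbours: hosts directly
# attached -> leaf; only inter switch links -> spine.
links = {
    "leaf-1":  ["host-a", "host-b", "spine-1"],
    "leaf-2":  ["host-a", "host-b", "spine-1"],
    "spine-1": ["leaf-1", "leaf-2"],   # ISLs only, no hosts
}

def role(switch):
    has_hosts = any(n.startswith("host-") for n in links[switch])
    return "leaf" if has_hosts else "spine"

print({sw: role(sw) for sw in links})
```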

How is the InfiniBand connectivity and topology built out?

Consider all hosts with one dual-port HCA installed in their PCIe slots. Connect port-1 of each to the designated leaf switch-1 with an IB cable. When you are done, this completes a star topology. Now repeat the same with port-2, but this time use the designated leaf switch-2. Each host is now connected to two leaf switches via independent ports. This sets up your dual-star topology. But wait, we also need some inter switch links. Why? To ensure guaranteed communication in an asymmetric situation. For example, host-A may be using port-1 while host-B may have switched to port-2 for some reason.
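The wiring procedure above can be sketched as a few lines of Python. Host and switch names are illustrative; the point is that every host ends up on both leaves, with one ISL covering the asymmetric case:

```python
# Dual-star wiring: every host's HCA port-1 goes to leaf-1 and
# port-2 goes to leaf-2; one inter switch link (ISL) joins the leaves.
hosts = ["host-1", "host-2", "host-3"]

cables = []
for h in hosts:
    cables.append((f"{h}:port-1", "leaf-1"))   # first star
    cables.append((f"{h}:port-2", "leaf-2"))   # second star
cables.append(("leaf-1", "leaf-2"))            # ISL for the asymmetric case

def switches_of(host):
    """Leaf switches this host is cabled to."""
    return {sw for (end, sw) in cables if end.startswith(host + ":")}

# Every host reaches both leaves, so host-A on port-1 (leaf-1) can
# still talk to host-B that has failed over to port-2 (leaf-2).
print(all(switches_of(h) == {"leaf-1", "leaf-2"} for h in hosts))  # True
```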

Inter switch links may be as simple as cables between two leaf switches, or they may go through another switch, known as a Spine switch. I will not go into micro-level details here; you can read more about how ISLs are chosen in various rack configurations in the respective product guides.

Cisco Catalyst 4948

Each host and endpoint has a management network port, which is always Ethernet based. The Cisco 4948 switch aggregates all such management ports inside the rack. Everything is pre-wired; all you need to do is connect an uplink from this Cisco switch to your data center access switch. Be careful not to connect two uplinks to your data center access switch without planning for Spanning Tree Protocol. This switch is fully managed and also provides VLAN capability based on the 802.1Q specification. By default, all hosts inside the rack connected to this switch are on the same VLAN.
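For readers curious about what 802.1Q tagging actually adds to a frame, here is a sketch that builds the 4-byte tag by hand. This is an illustration of the tag format from the 802.1Q specification, not switch configuration:

```python
# An 802.1Q tag on the wire: a 16-bit TPID (0x8100) followed by a
# 16-bit TCI holding priority (PCP, 3 bits), drop eligible (DEI,
# 1 bit) and the 12-bit VLAN ID.
TPID = 0x8100

def vlan_tag(vlan_id, pcp=0, dei=0):
    assert 0 <= vlan_id < 4096, "VLAN ID is 12 bits"
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return TPID.to_bytes(2, "big") + tci.to_bytes(2, "big")

# VLAN 1 (a common default), best-effort priority:
print(vlan_tag(1).hex())  # 81000001
```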

Overall Network Design

At a very high level, we have the following setup:

  1. Ethernet based management network served through Cisco Catalyst 4948 switch
  2. InfiniBand internal network served through InfiniBand switches in redundant configuration for high availability
    • This network facilitates all the internal communications within the Engineered Systems framework
  3. Ethernet based external world connectivity
    • In Exadata and SuperCluster, this is achieved via physical 10Gbps Ethernet from individual hosts. There are dedicated 10Gbps NICs installed in hosts. Their switching environment is outside of the rack.
    • In Exalogic, this is achieved via virtual 10Gbps Ethernet from individual hosts, which we have been referring to as EoIB. From the hosts' view, there is no additional hardware or cable; the same IB media path carries this traffic as well.

Next time, I will talk more about the virtual networks that are carried over this physical network. Thanks for reading and I welcome all your comments and questions.


Keep up the great work, I am wondering if vLan tags are fully supported on the IB switches? and another question is about lacp support between two IB switches for fail over?

Posted by Eli Kleinman on February 13, 2012 at 06:45 AM PST #

Hi Eli, these topics are next on my agenda to publish. The short answer is that InfiniBand supports virtualization on LAN using partition keys (Pkeys) at layer 2. LACP in ethernet world is a little different but we do have alternate paths and fail over mechanisms in InfiniBand switches as well. More about it later. Thanks !

-Neeraj Gupta

Posted by Neeraj Gupta on February 13, 2012 at 09:22 AM PST #

HI Neeraj

Great blog by the way !!!

It's interesting you mentioned "Bridge X" for EoIB. Is this the same technology as the Mellanox BridgeX BX5020 ?

I was trying to find out if oracle solaris 11 would be compatible with the Mellanox gateway switch for EoIB but no one could say for sure ?

Regards, Daniel

Posted by Daniel on March 30, 2012 at 10:11 AM PDT #

Hi Neeraj,

I have a basic question about how the 4 GW switches in a full rack are used.

I understand that there are 2 IB ports per compute node, and 30 compute nodes in a full rack.

IB port-1 on each node --> 30 ports --> IB switch1
IB port-2 on each node --> 30 ports --> IB switch2

How are IB switches 3 and 4 in a full rack connected?

Posted by guest on July 15, 2012 at 09:44 AM PDT #

Great insights! Thank you very much for taking the time to share your knowledge & keep the Tech Community up to date

Posted by guest on October 22, 2013 at 07:48 AM PDT #

great blog

Posted by guest on August 11, 2014 at 11:24 AM PDT #



About Author: Hi, I am Neeraj Gupta at Oracle. I worked at Sun Microsystems for 11 years specializing in InfiniBand, Ethernet, Security, HA and Telecom Computing Platforms. Prior to joining Sun, I spent 5 years in Telecom industry focusing on Internet Services and GSM Cellular Networks.
At present, I am part of Oracle's Engineered Systems team focused on Networking and Maximum Availability Architectures.

