Oracle Cloud Infrastructure (OCI) is a public cloud designed to support mission critical applications that need a secure, performant, and scalable infrastructure. OCI offers cluster networks, which deliver demonstrable performance gains through unique engineering approaches for remote direct memory access (RDMA) networking. Cluster networking is the foundation for core OCI services that rely on RDMA, such as Oracle Autonomous Database and Oracle Exadata Cloud service, and enables high-performance computing (HPC) and graphics processing unit (GPU)-based machine learning (ML) workloads.
RDMA relies on direct memory access between Compute hosts spread across a network, using the network in a way that eliminates the CPU overhead associated with the traditional network communication and delivers a very low network latency. Most applications using RDMA are sensitive to network latency and jitter. RDMA communication is also ultra-sensitive to packet loss and typically requires lossless networking for good performance.
RDMA requirements pose challenges for multitenant networks. The most common implementations of RDMA involve using InfiniBand or proprietary protocols. OCI, on the other hand, uses battle-tested and widely-deployed ethernet and internet protocol (IP) technologies in the form of RDMA over converged ethernet (RoCE) delivered over NVIDIA ConnectX SmartNICs. This process required solving the scaling and performance issues that have traditionally been associated with RoCE.
OCI cluster networks rely on several innovations and the following key properties of our underlying network:
A purpose-built dedicated RDMA network: Instead of sharing the RDMA network with non-RDMA traffic, OCI invested in a high-performance network dedicated for RDMA traffic. Having a common shared network involved performance compromises for the RDMA traffic, such as sharing queues, buffers, and bandwidth with non-RDMA traffic. Using a dedicated RDMA network has higher networking costs for OCI but delivers the best performance for RDMA-dependent workloads.
Eliminate network blocking issues: RoCE traditionally uses priority flow control (PFC) to create a lossless network, but using PFC can lead to network blocking issues in certain kinds of congestion. OCI cluster networking restricts PFC to the edge of the network and eliminates such network blocking issues. Instead, OCI relies on explicit congestion notification (ECN) and data center quantized congestion notification (DC-QCN) for RDMA congestion control.
Multiple queues and buffer tuning for tailored experience: OCI cluster networks support multiple different classes of RDMA workloads, each aiming for a different tradeoff of latency versus throughput. OCI implemented multiple RDMA queues, each tailored to provide a differentiated performance experience
OCI Cluster Network provides ultra-high performance¹ for the most demanding RDMA workloads. OCI designed our Cluster Network to support multiple different types of RDMA workloads with a tailored performance experience to match the requirements of each workload. In the accompanying video blog we discuss in detail how we went about doing so.
The choices we made to develop Oracle Cloud Infrastructure are markedly different. The most demanding workloads and enterprise customers have pushed us to think differently about designing our cloud platform. We have more of these engineering deep dives as part of this First Principles series, hosted by Pradeep Vincent and other experienced engineers at Oracle.
¹ OCI Cluster Network achieves <=2us microseconds latency for single rack clusters and achieves <=6.5 microseconds latency for multi-rack clusters.
For more information, see the following resources:
First Principles posts
Jag Brar is Vice President and Distinguished Engineer at Oracle Cloud Infrastructure (OCI). He has over 25 years of experience in the field including key technical roles at Oracle, Amazon, Arista Networks, AT&T, Telus, Electric Lightwave and Terabeam Networks. Jag has been with OCI for almost eight years where he has played critical roles in Physical Networking, Virtual Networking, Compute, Database and Datacenter Engineering fields. He is an inventor or co-inventor in over 50 patent applications.
Pradeep Vincent is the Chief Technical Architect and Senior Vice President at Oracle Cloud Infrastructure (OCI). He is a technology and software architect with more than 20 years of experience in tech companies such as Oracle, AWS, and IBM. He has a deep understanding of Cloud Infrastructure, Compute, Storage and Networking. Pradeep has been with Oracle for more than eight years leading a team of architects and software engineers building Oracle’s Public Cloud. He also leads OCI’s Architecture and Engineering Community initiatives.