This blog post explores how to use Border Gateway Protocol (BGP) to provide resiliency and high availability for IP-based applications (those that don't rely on DNS) hosted across diverse Oracle Cloud Infrastructure regions. The scope is limited to IPv4 addresses, but the solution presented also works for IPv6 services with some additional configuration. Most of these applications fall in the IoT application domain.
With ubiquitous connectivity for the Internet of Things (IoT), devices like sensors and gateways communicate back to central processors hosted in cloud datacenters. I have used this solution to achieve resiliency between IoT endpoints and diverse Oracle Cloud Infrastructure regions.
Although Oracle Cloud Infrastructure provides computation and data storage resources for IoT workflows across regional availability domains, resiliency or high availability for the connectivity from sensor edge services to the Oracle Cloud Infrastructure regions is always a challenge. Usually, IoT devices use IPv6 while the computation applications in cloud datacenters are only IPv4 aware. Another limiting factor is that most sensors can't use DNS to reach the services running in cloud datacenters because of the low buffer space of the IoT devices. This rules out any DNS-based high-availability solution.
The following Oracle Cloud Infrastructure services and open-source software are used in this solution:
For information about configuring the VCNs, subnets, and other Oracle Cloud Infrastructure constructs needed for this solution, see the following resources:
This solution focuses on the following components:
Note: This solution excludes the details of IoT-workflow-related compute and storage handling of the data collectors and analytics applications. This solution also doesn’t examine the detailed architecture of the IoT edge services.
The IoT application for this use case comprises sensors installed at gas pumps to measure oil surface temperatures and to detect any significant spill. The data is uploaded to the edge services for normalization before transmitting to the Oracle Cloud Infrastructure region for processing, where the IoT processing and analytics applications are running.
The edge services can run on the customer’s on-premises datacenters, in a colocation datacenter, or in the Oracle IoT Cloud. The focus of this solution is how to design the connectivity from the customer’s on-premises or colocation datacenter to dual Oracle Cloud Infrastructure regions like Phoenix and Ashburn.
Connectivity methods from the edge services datacenters can be private, dedicated circuits including IPSec VPNs, or public connections using internet IPv4 space. A pair of SDN routers is used at the FastConnect colocation for IPv6-to-IPv4 translation or IPSec termination before peering with the FastConnect edge routers.
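The post doesn't say which IPv6-to-IPv4 translation mechanism the SDN routers use. As an illustration only, the following sketch assumes a NAT64-style stateless address mapping with the RFC 6052 well-known prefix `64:ff9b::/96`, where the IPv4 service address is embedded in the last 32 bits of the IPv6 address that the sensors target. The function names and the sample address `192.0.2.33` are hypothetical. The well-known prefix is meant for global IPv4 addresses, so a network-specific /96 prefix would be needed for RFC 1918 space.

```python
import ipaddress

# Assumption: NAT64-style mapping with the RFC 6052 well-known prefix.
# The actual mechanism on the SDN routers is not specified in this post.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def ipv4_to_nat64(ipv4: str, prefix: ipaddress.IPv6Network = NAT64_PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 IPv6 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def nat64_to_ipv4(ipv6: str, prefix: ipaddress.IPv6Network = NAT64_PREFIX) -> ipaddress.IPv4Address:
    """Recover the embedded IPv4 address from a mapped IPv6 address."""
    v6 = ipaddress.IPv6Address(ipv6)
    if v6 not in prefix:
        raise ValueError(f"{v6} is not inside {prefix}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = ipv4_to_nat64("192.0.2.33")   # hypothetical service address
    print(mapped)                          # 64:ff9b::c000:221
    print(nat64_to_ipv4(str(mapped)))      # 192.0.2.33
```

The mapping is reversible, so the translating router needs no per-flow state to work out which IPv4 service a sensor is addressing.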
Both regions are connected by means of Oracle Cloud Infrastructure inter-region backbones for disaster recovery (DR) replication, using a DRG at each end for remote peering. The DRGs are inherently highly available and configured in active-active mode at each regional end. The estimated throughput for each DRG per customer VCN is around 7 Gbps. If more bandwidth is required, multiple VCNs and DRGs can be deployed.
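As a rough sizing aid, the number of VCN/DRG pairs needed for a given replication bandwidth can be estimated by dividing by the per-DRG figure and rounding up. This is a minimal sketch that treats the 7 Gbps estimate quoted above as a fixed per-DRG ceiling. The function name is hypothetical, and real throughput depends on factors not covered here.

```python
import math

# Assumption: roughly 7 Gbps per DRG per customer VCN, the estimate quoted above.
DRG_THROUGHPUT_GBPS = 7.0


def drg_pairs_needed(required_gbps: float, per_drg_gbps: float = DRG_THROUGHPUT_GBPS) -> int:
    """Return how many VCN/DRG pairs cover the required inter-region bandwidth."""
    if required_gbps <= 0 or per_drg_gbps <= 0:
        raise ValueError("bandwidth values must be positive")
    return math.ceil(required_gbps / per_drg_gbps)


if __name__ == "__main__":
    for demand in (5, 7, 20):
        print(f"{demand} Gbps -> {drg_pairs_needed(demand)} VCN/DRG pair(s)")
```

For example, a 20 Gbps replication requirement works out to three VCN/DRG pairs under this assumption.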
The latency between regions over the backbone is around 60 ms. Customers can deploy traffic accelerators like Riverbed virtual appliances in their VCNs at either end for caching.
The logical view depicts the pair of redundant routers running in each of the Oracle Cloud Infrastructure PoPs. These routers are managed by the customer network teams or the Oracle Managed Cloud Services team. This is the control plane for the data path resiliency and high availability from the IoT sensors in the field to the IoT applications running across the Oracle Cloud Infrastructure regions.
Customers should provision dual circuits or IPSec VPNs using SDN routers at each of the transit PoPs. On the backend, the Oracle Cloud Infrastructure team establishes connectivity from the customer routers to the Oracle Cloud Infrastructure PoP routers by using cross-connects or peering points. Each transit PoP is connected to all three availability domains (datacenters) in the region.
There are multiple FastConnect transit PoPs (ingress/egress) for a region and multiple FastConnect routers per PoP. Each transit PoP has access to each of the availability domains.
All the connections from the PoPs to the availability domains (ADs) are provisioned and managed by Oracle Cloud Infrastructure teams. Apart from planning and ordering connections, the following are some of the follow-up tasks:
The next section discusses one of the two options for connecting the edge services to the Oracle Cloud Infrastructure regions.
In this scenario, the pair of SDN routers is placed in the same colocation facility that serves as the FastConnect PoP. The routers establish external BGP (eBGP) peer relationships with the other edge datacenter routers and the Oracle Cloud Infrastructure DRGs. For DRG configuration guidance, see https://docs.cloud.oracle.com/iaas/Content/Network/Tasks/managingDRGs.htm.
Information about BGP configuration is provided later in this post.
The customer routers are placed in the customer cage in the FastConnect colocation. Crossover cables are provisioned between the customer routers in the customer cage and the OCI equipment in the OCI FastConnect cage. Both sets of equipment are configured for high availability at layer 2 and layer 3.
The following graphic shows a logical view of the configuration:
FastConnect configuration information for setting up the circuit is located at https://docs.cloud.oracle.com/iaas/Content/Network/Concepts/fastconnectprovider.htm.
Oracle Cloud Infrastructure supports only IPv4 peering. Its regions support both public and private peering.
Public peering: Connect edge service resources via FastConnect to access public services in Oracle Cloud Infrastructure without using the internet (for example, Object Storage, the Oracle Cloud Infrastructure Console and APIs, or public load balancers in your VCN). Communication across the connection is with IPv4 public IP addresses. Without FastConnect, the traffic destined for public IP addresses would be routed over the internet. With FastConnect, that traffic goes over your private physical connection.
Private peering: Connect IoT edge services infrastructure to a VCN in Oracle Cloud Infrastructure. Communication across the connection is with IPv4 private addresses (typically RFC 1918).
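When planning the address space for the private connection, it can help to confirm that each prefix you intend to advertise falls inside one of the three RFC 1918 blocks. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the sample prefixes are illustrative only.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(cidr: str) -> bool:
    """Return True if the IPv4 prefix lies entirely within an RFC 1918 block."""
    net = ipaddress.ip_network(cidr, strict=True)
    # The version check short-circuits so IPv6 input never reaches subnet_of().
    return net.version == 4 and any(net.subnet_of(block) for block in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Hypothetical prefixes, for illustration only.
    for prefix in ("10.0.0.0/16", "172.32.0.0/16", "192.168.1.0/24"):
        print(prefix, is_rfc1918(prefix))
```

The explicit block list is used instead of `is_private` because that attribute also covers other reserved ranges, such as loopback and link-local addresses.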
The following is a sample BGP configuration. The scenario has been simplified by representing the customer router pair at the Oracle Cloud Infrastructure PoP (colocation) as a single router, focusing on eBGP for path resiliency.
As depicted in the picture, to add resiliency to the edge services in case of a region failure, use AS Path prepending. AS Path prepending artificially lengthens the AS Path that is advertised to a neighbor so that the path appears longer than it actually is. Because BGP prefers the shorter AS Path when other attributes are equal, the neighbor selects the path through the non-prepended (preferred) region.
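To make the effect of prepending concrete, here is a small Python model of the decision an edge router makes when it hears the same prefix through both PoPs. It is a sketch under stated assumptions, not a router configuration: only AS Path length is compared (real BGP evaluates attributes such as local preference first and applies further tie-breakers afterward), the colocation router ASNs 65010 and 65020 are hypothetical private ASNs, and `192.0.2.0/24` is a documentation prefix standing in for the service address. Only ASN 31898 comes from this post.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

ORACLE_ASN = 31898     # OCI ASN, as noted in this post
COLO_PHX_ASN = 65010   # hypothetical ASN of the customer router at the Phoenix PoP
COLO_IAD_ASN = 65020   # hypothetical ASN of the customer router at the Ashburn PoP


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    via: str  # label for the region this advertisement leads to


def prepend(route: Route, asn: int, times: int) -> Route:
    """Return a copy of the route with `asn` prepended `times` times."""
    return Route(route.prefix, (asn,) * times + route.as_path, route.via)


def best_path(routes: Iterable[Route]) -> Route:
    """Pick the route with the shortest AS Path (simplified best-path selection)."""
    return min(routes, key=lambda r: len(r.as_path))


# Both PoPs advertise the same service prefix toward the edge.
phx = Route("192.0.2.0/24", (COLO_PHX_ASN, ORACLE_ASN), "phoenix")
iad = Route("192.0.2.0/24", (COLO_IAD_ASN, ORACLE_ASN), "ashburn")

# The Ashburn-side router prepends its own ASN twice, making its path look longer.
iad_prepended = prepend(iad, COLO_IAD_ASN, 2)

if __name__ == "__main__":
    chosen = best_path([phx, iad_prepended])
    print(chosen.via, chosen.as_path)  # phoenix (65010, 31898)
```

Without the prepend, both paths would have the same length and the choice would fall to later tie-breakers. With it, Phoenix is deterministically preferred while Ashburn stays available as a backup.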
For step-by-step configuration guidance for colocated routers, see the following resources:
As a result of this configuration, if there is an outage of the first (preferred) region, the IoT sensor network or the edge network follows the next-best path advertised through BGP and reaches the second region.
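The failover behavior can be sketched as a tiny routing table that tracks one advertisement per neighbor and re-selects the best path when an advertisement is withdrawn. This is an illustrative model only: the class name, the neighbor labels, and the private ASNs 65010 and 65020 are hypothetical, and selection is reduced to shortest AS Path with the neighbor name as a deterministic tie-breaker.

```python
from typing import Dict, Optional, Sequence, Tuple


class EdgeRib:
    """Simplified view of the paths an edge router holds for one service prefix."""

    def __init__(self) -> None:
        self._paths: Dict[str, Tuple[int, ...]] = {}

    def advertise(self, neighbor: str, as_path: Sequence[int]) -> None:
        """Record (or replace) the AS Path learned from a neighbor."""
        self._paths[neighbor] = tuple(as_path)

    def withdraw(self, neighbor: str) -> None:
        """Remove a neighbor's path, for example after a region outage."""
        self._paths.pop(neighbor, None)

    def best_neighbor(self) -> Optional[str]:
        """Return the neighbor with the shortest AS Path, or None if unreachable."""
        if not self._paths:
            return None
        return min(self._paths, key=lambda n: (len(self._paths[n]), n))


if __name__ == "__main__":
    rib = EdgeRib()
    rib.advertise("phoenix", (65010, 31898))                # preferred region
    rib.advertise("ashburn", (65020, 65020, 65020, 31898))  # prepended backup
    print(rib.best_neighbor())   # phoenix

    rib.withdraw("phoenix")      # simulate an outage of the preferred region
    print(rib.best_neighbor())   # ashburn
```

In a real deployment the withdrawal happens when the eBGP session drops or the prefix stops being advertised, so convergence time depends on the BGP timers and any failure-detection mechanism in use.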
Note: All the IP addresses and ASNs mentioned here are for testing purposes only. Oracle Cloud Infrastructure uses the same ASN (31898) for all of its regions.