When using cloud architectures, consider how cloud service providers (CSPs) enable your own business continuity, disaster recovery, and solution availability. CSPs apply a region-based approach that utilizes multiple availability zones in a single region. However, because most CSPs deploy a single region in each country, you end up relying on other regions outside your country that might not meet your disaster recovery requirements.
This blog explores the benefits of the Oracle Cloud Infrastructure (OCI) two-region design strategy to provide resilience for your mission-critical workloads. We examine some of the reasons why we believe that a two-region design is a better approach than the alternatives, such as a single-region design with multiple availability zones. This blog examines the following broad areas:
Business continuity and service continuity
Latency and response times
Developing solutions for high availability/disaster recovery
At its top level, OCI’s region design consists of a realm, which is a logical collection of regions. A region in OCI is a single geographic location that contains one or more availability domains. An availability domain is one or more data centers that host the OCI cloud resources such as instances, volumes, and subnets. Inside an availability domain, Oracle has designed and deployed a technology called fault domains, which provide an optimized architecture for delivering high availability.
Fault domains are a means for Oracle to partition a single physical data center into three distinct logical data centers, which allow customers to implement an antiaffinity pattern to their high availability architecture. Fault domains allow for the isolation of OCI resources during hardware failure or unexpected software changes. So, a hardware failure or compute hardware maintenance event that affects one fault domain doesn’t affect the other two fault domains.
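To make the antiaffinity pattern concrete, here is a minimal sketch in plain Python. It doesn't call any OCI API: the fault domain labels, the instance names, and the round-robin policy are assumptions chosen for illustration. It spreads instances across three fault domains and shows which ones keep running when one fault domain is affected.

```python
from itertools import cycle

# Illustrative labels for the three fault domains in one availability domain.
FAULT_DOMAINS = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]


def place_instances(instance_names, fault_domains=FAULT_DOMAINS):
    """Assign instances to fault domains round-robin (a simple antiaffinity policy)."""
    return dict(zip(instance_names, cycle(fault_domains)))


def survivors(placement, failed_fault_domain):
    """Return the instances that keep running when one fault domain is affected."""
    return sorted(
        name for name, fd in placement.items() if fd != failed_fault_domain
    )


# Hypothetical instance names.
placement = place_instances(["web-1", "web-2", "web-3", "web-4"])
print(placement)
print(survivors(placement, "FAULT-DOMAIN-1"))
```

With four instances, the fourth wraps around to the first fault domain, so a failure there removes two instances while the other two continue to serve traffic.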
Consider the following key concepts:
Realm: A logical collection of regions. Realms are isolated from each other and don’t share any data. Your tenancy exists in a single realm and can access the regions in that realm.
Region: A collection of one or more availability domains located in a single geographic location.
Availability domain: One or more data centers located in a region. Availability domains are isolated from each other, fault-tolerant, and unlikely to fail simultaneously. They don’t share physical infrastructure or the internal availability domain network, so a failure that impacts one availability domain is unlikely to impact the others.
Fault domain: A grouping of hardware and infrastructure within an availability domain. Fault domains let you distribute your instances so that they aren’t on the same physical hardware within a single availability domain. As a result, an unexpected hardware failure or hardware maintenance that affects one fault domain doesn’t affect instances in other fault domains.
Backbone: OCI regions are interconnected by a dedicated, secure infrastructure backbone designed to give your workloads a high-performance, reliable, and scalable transport. The backbone network provides privately routed interregion connectivity with consistent interregion performance for bandwidth, latency, and jitter. This setup enables workloads including disaster recovery, real-time replication, clustering, and other scenarios.
A high-availability solution is one that has no single point of failure. You can design solutions to utilize multiple regions, multiple availability domains, or multiple fault domains, depending on the type of failures you want to protect against and the realm specification you require.
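One way to reason about single points of failure is to ask, for each failure scope, whether your replicas sit in more than one unit of that scope. The following sketch is an assumption-laden illustration, not an OCI tool: the `Placement` type, the scope names, and the region and domain labels are all hypothetical.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    region: str
    availability_domain: str
    fault_domain: str


# Failure scopes, from smallest to largest blast radius.
SCOPES = ("fault_domain", "availability_domain", "region")


def scope_key(p, scope):
    """Identify the failure unit of the given scope that hosts placement p."""
    if scope == "region":
        return (p.region,)
    if scope == "availability_domain":
        return (p.region, p.availability_domain)
    return (p.region, p.availability_domain, p.fault_domain)


def tolerated_scopes(placements):
    """Scopes where losing any single unit still leaves at least one replica."""
    return [s for s in SCOPES if len({scope_key(p, s) for p in placements}) > 1]


# Hypothetical deployments.
single_region = [
    Placement("region-a", "AD-1", "FD-1"),
    Placement("region-a", "AD-1", "FD-2"),
]
two_regions = single_region + [Placement("region-b", "AD-1", "FD-1")]

print(tolerated_scopes(single_region))
print(tolerated_scopes(two_regions))
```

In this simplified model, the single-region deployment tolerates only a fault domain failure, while adding a replica in a second region covers all three scopes.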
You can achieve high availability at many different levels, including the application and cloud infrastructure levels. Configuring high availability for applications requires similar effort and yields similar benefits whether you use availability domains or fault domains.
While availability domains and fault domains serve similar functions, fault domains are physically closer together than availability domains and therefore have lower network latency. The connectivity is also less susceptible to external risk factors. For latency-sensitive applications, this factor can be key in deciding which of the two better fits your high availability needs.
Resilient cloud architectures should include a backup of your data, applications, and operating environments to ensure continuity of your operations for when—not if—your region goes down because of configuration changes, network outages, equipment failures, or natural disasters. An application or solution architecture might store data in different services. For example, a backup strategy for an application should cover the following factors:
Data in storage services
Data in databases
Verification of backup integrity and processes
Validation of the backup security and encryption
Replication of data for disaster recovery
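As an example of verifying backup integrity, the sketch below records a SHA-256 digest when a backup is taken and checks restored data against it. It uses only the Python standard library and isn't tied to any OCI service. The function names and the sample payload are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a backup payload."""
    return hashlib.sha256(data).hexdigest()


def verify_restore(recorded_digest: str, restored: bytes) -> bool:
    """Check restored data against the digest recorded at backup time."""
    return sha256_of(restored) == recorded_digest


# Hypothetical payload standing in for a real backup file.
backup = b"customer-orders-snapshot"
recorded_digest = sha256_of(backup)

print(verify_restore(recorded_digest, backup))
print(verify_restore(recorded_digest, backup + b"-corrupted"))
```

A check like this catches silent corruption, but it doesn't replace periodically restoring a backup end to end to validate the process itself.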
Let’s consider replication of data for disaster recovery. OCI provides several approaches that make disaster recovery easier, and the right one depends on where your data resides. With two regions, you can use the following OCI services to replicate data from one region to the disaster recovery region:
Block Volume: The OCI Block Volume service enables you to perform ongoing automatic asynchronous replication of block volumes, boot volumes, and volume groups from the primary region to the secondary region.
Object Storage: The OCI Object Storage service supports replication that provides protection from regional outages, aids in disaster recovery efforts, and addresses data redundancy compliance requirements. You can use it to replicate the objects in one bucket to another bucket in a different region.
File Storage: The OCI File Storage service supports replication that provides protection from regional outages, aids in disaster recovery efforts, and addresses data redundancy compliance requirements. You can use it to replicate the data in one file system to another file system in a different region.
Oracle Database: OCI offers various options for database replication, depending on your needs and your choice of Oracle Database service.
Oracle Data Guard provides comprehensive data protection, high availability, and disaster recovery for Oracle Database by maintaining a synchronized physical replica (standby) of a production database in a different region.
Oracle GoldenGate is an advanced logical replication product that supports multimaster replication, hub-and-spoke deployment, and data transformation. GoldenGate provides you with flexible options to address the complete range of replication requirements, including heterogeneous hardware platforms.
OCI Full Stack Disaster Recovery: Information about the service is provided later in this blog.
Without a two-region approach, the backup is only available for as long as the region storing it is accessible.
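Because some of these replication options are asynchronous, it helps to think about which writes haven't yet reached the secondary region at the moment of a failure. The following sketch is a simplified model with a hypothetical timeline in seconds. It doesn't reflect the actual replication intervals of any OCI service.

```python
def unreplicated_writes(write_times, last_replicated_at):
    """Writes made after the last completed replication cycle.

    In this simplified model, these writes are missing from the
    secondary region if the primary fails before the next cycle.
    """
    return [t for t in write_times if t > last_replicated_at]


# Hypothetical timeline, in seconds since some starting point.
write_times = [10, 25, 40, 55]
last_replicated_at = 30
failure_time = 60

lost = unreplicated_writes(write_times, last_replicated_at)
print(lost)
print(failure_time - last_replicated_at)  # size of the data-loss window
```

The size of that window is what a recovery point objective constrains, so it's worth comparing against your requirements when you choose a replication option.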
Geography in the cloud is an important risk factor that you must consider for fault tolerance. A natural disaster, power outage, or other human-induced event can take out a data center, an availability domain, or an entire region. An architectural design that relies on multiple availability domains in a single region can’t tolerate events that affect the entire region.
Geographical redundancy and a supporting backup strategy are a crucial part of any business continuity and disaster recovery plan that enables organizations to recover from unplanned disruptions. Geographical redundancy consists of two (or more) regions whose data centers are in different geographic locations separated by a minimum distance. This geography allows for the distribution of mission-critical infrastructure and components. Consequently, applications that utilize a two-region design are more resilient than applications in a single region with a single availability domain.
Only a multiregion application architecture, which is an industry best practice, can survive an entire region being unavailable. Both Amazon Web Services (AWS) and Microsoft Azure have also expressed this opinion.
A cascading failure can cause the entire region to fail. If your cloud resources are limited to only a single affected region, downtime and loss of business continuity can occur. A cascading failure is a situation that can arise from failure of certain components that lead to the failure of other components, growing progressively and triggering a ripple effect. One example of a cascading failure was an issue in a single region that started with a CSP’s data streaming service, which then caused issues across several other dependent services. This issue had a consequential impact for thousands of online services using that cloud platform.
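The ripple effect of a cascading failure can be sketched as a walk over a service dependency graph. The service names and dependencies below are hypothetical, and the model assumes that a service fails as soon as any service it depends on fails, which is stricter than most real systems.

```python
from collections import deque


def cascade(depends_on, initial_failure):
    """Return every service that fails, directly or transitively."""
    # Invert the graph: for each service, find the services that depend on it.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    failed = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in failed:
                failed.add(service)
                queue.append(service)
    return failed


# Hypothetical services in a single region.
depends_on = {
    "streaming": [],
    "monitoring": ["streaming"],
    "autoscaling": ["monitoring"],
    "storage": [],
}

print(sorted(cascade(depends_on, "streaming")))
```

Here a failure in the streaming service takes monitoring and autoscaling with it, while storage, which has no dependency on them, is unaffected.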
OCI’s Full Stack Disaster Recovery (FSDR) is a disaster recovery orchestration and management service that enables you to automate disaster recovery seamlessly and quickly in a two-region environment. It provides comprehensive disaster recovery capabilities for all layers of an application stack, including infrastructure, middleware, database, and application. You can use the FSDR service to create dedicated disaster recovery configurations for each of your application stacks that requires disaster recovery protection. It generates, runs, and monitors disaster recovery plans for services and applications deployed in your tenancy.
With Full Stack Disaster Recovery, you don’t need to spend time reconfiguring systems to incorporate disaster recovery. This simplification significantly reduces the development effort, timeline, and budget by helping you implement robust disaster recovery at the beginning of the project instead of a post-production add-on. To find out more about Full Stack Disaster Recovery, visit the service page.
To help ensure the highest service uptime in OCI regions, all changes, including planned maintenance, are deployed sequentially across regional pairs within a realm. In a regional pair, one region is defined as the primary and the other as the secondary. The primary is always updated before the secondary, so only resources in one region of the pair are being actively changed at any point in time. The health of the resources in the primary region that are subject to the change is then monitored for a minimum validation period before the update is applied in the secondary region of the pair.
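The ordering described here can be sketched as follows. This isn't Oracle’s deployment tooling: the `roll_out` function, its callbacks, and the region labels are hypothetical, and the monitoring period is reduced to a single health check for brevity.

```python
def roll_out(change, pair, apply, is_healthy):
    """Apply a change to the primary, then to the secondary only if healthy.

    Returns the list of regions that received the change.
    """
    primary, secondary = pair
    updated = []

    apply(change, primary)
    updated.append(primary)

    # Stand-in for monitoring the primary over a minimum validation period.
    if not is_healthy(primary):
        return updated  # stop here; the secondary is left untouched

    apply(change, secondary)
    updated.append(secondary)
    return updated


# Hypothetical regional pair and callbacks.
applied = []
pair = ("region-a", "region-b")

ok = roll_out("patch-1", pair, lambda c, r: applied.append((c, r)), lambda r: True)
halted = roll_out("patch-2", pair, lambda c, r: applied.append((c, r)), lambda r: False)

print(ok)
print(halted)
```

When the health check fails, the rollout stops after the primary, so the secondary region keeps running the previous, known-good version.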
Oracle’s unique dual-region cloud strategy enables you to deploy resilient services in multiple geographically separated locations. To help you build true business continuity and disaster protection, Oracle plans to establish at least two cloud regions in almost every country where it operates, regardless of the OCI deployment type. For example, in the OCI commercial realm, the following countries already have two cloud regions: US, Canada, UK, France, South Korea, Japan, Brazil, India, United Arab Emirates, and Australia.
OCI offers the following deployment options:
Oracle public cloud: Oracle Cloud regions accessible to any customer. It offers 41 cloud regions in 22 countries around the globe, with more to come. In 10 countries and across the EU, OCI has two or more cloud regions that enable availability in the event of a disaster without data leaving the borders of these territories. Many organizations operating in these locations can run cloud workloads in-country to meet their data residency and availability requirements.
Oracle Sovereign Cloud: Realms that meet the needs of commercial and governmental organizations
Sovereign cloud regions for the European Union: Launching in 2023 with data centers in Spain and Germany, Oracle’s new EU sovereign cloud regions are logically and physically separate from the existing Oracle commercial cloud regions in the EU. Both private companies and public sector organizations can use these new regions to host data and applications that are sensitive, regulated, or of strategic regional importance.
Dedicated Region Cloud@Customer: Designed for customers looking for a complete OCI region in their own data center with the agility, scalability, and economics of OCI public cloud.
Oracle Cloud Isolated Region: Regions secured to the highest government classification standards that are air-gapped with no internet connection. Oracle Cloud Isolated Regions deliver identical services and tools available in public Oracle Cloud regions to enable customers to take advantage of a continuous pace of innovation. These regions are supported by in-country, government-cleared personnel and only accessible from your secure government networks. Oracle Cloud Isolated Regions meet the most demanding data sovereignty and security requirements and are designed to ensure mission continuity and embrace industry standards with a two-region architecture model that provides both high availability and disaster recovery.
Oracle’s dual-region sovereign UK Government cloud: Oracle’s sovereign cloud, designed to reflect the requirements of the UK government, consists of two internet-connected cloud regions operated by UK citizens who hold SC-level security clearance. This dedicated dual-region cloud is built to support official-sensitive workloads, providing a fully sovereign model where your information (customer or operational) never leaves the environment without your express permission.
US: A highly secure, enterprise-scale cloud ecosystem isolated from commercial customers and built to support regulatory-compliant, mission-critical US public sector workloads. Oracle Cloud’s US National Security Regions offer an air-gapped, isolated cloud to support secret and top secret classified data. Oracle Cloud for US Government is a FedRAMP High JAB–accredited cloud for US federal and civilian agencies and state and local offices. Oracle Cloud for the US Defense Department is Impact Level 5–accredited to support the Department of Defense’s most sensitive unclassified data.
OCI’s two-region design strategy, combined with our out-of-the-box tools and features for developers, offers you a superior balance of high availability and disaster recovery. Our two-region strategy makes it easy for you to implement your resiliency requirements for your mission-critical workloads.
We know that every use case is different. The only way to know if OCI is right for you is to try it. You can select either the Oracle Cloud Free Tier or a 30-day free trial in our commercial regions, which includes credits to get you started with a range of services, including compute, storage, and networking. If you prefer Oracle Cloud Isolated Region, Oracle Dedicated Region Cloud@Customer, or Oracle National Security Regions (ONSR), consult your Oracle sales representative for a proof of concept in the appropriate region.
For further information regarding Oracle Cloud Infrastructure’s two-region strategy, high availability, and disaster recovery, see the following resources:
Dominic Velden is a Senior Product Architect in OCI's Global Government Sector team with a focus on Sovereign Cloud. Dominic has over 20 years of experience working in the technology sector and specializes in the architecture, security, and privacy of cloud.