ODSA disaster recovery best practices: Exadata Database and Base Database services

January 16, 2023 | 10 minute read
Andrea Marchesini
Director of Product Management OCI Multicloud Platform

Oracle Database Service for Microsoft Azure (ODSA) is an Oracle-managed service that enables customers to easily provision, access, and operate enterprise-grade Oracle Database services in Oracle Cloud Infrastructure (OCI) with a familiar Azure-like experience. The new ODSA service builds on the OCI-Azure Interconnect to simplify the setup, management, and connectivity of Azure applications to databases running in OCI.

One of the most critical aspects of designing an enterprise-grade solution is ensuring high availability and business continuity, even in case of a disaster. A disaster is a sudden and unplanned event, such as an accident, a natural catastrophe, or a distributed network outage, that causes significant damage or loss across a vast geographic area. A well-architected disaster recovery solution helps reduce harm or disruption and lets you recover smoothly and as quickly as possible when a disaster leads to system failure.

This article is the first of a series that discusses disaster recovery best practices for the most common ODSA scenarios, including Oracle Autonomous Database, Exadata Database service, Base Database service, and regional and cross-region architectures. In this blog, we focus on ODSA’s best practices for disaster recovery across different cloud regions, with some guidelines to ensure that both the application stack and the database tier based on Exadata Database service or Base Database service continue serving you if a failover is triggered.

Disaster recovery considerations when using ODSA

Let’s review the main considerations for identifying a deployment strategy that meets your organization’s requirements when a disaster occurs, including the following examples:

  • Recovery time objective (RTO) and recovery point objective (RPO) expectations for both the database and the application layers

  • Latency between the primary and disaster recovery regions, and latency between those regions and the end users or consumers of your application

  • Data residency requirements and regulations for your application’s data, which determine the most appropriate failover cloud region

Identifying and selecting the most appropriate cloud regions’ locations is critical to meeting these requirements.

OCI and Azure have several interconnected regions across the globe: 12 at the time of publication, with new locations being planned. We recommend reviewing the documentation for the current list of regions. Although ODSA doesn’t come with a predefined failover region, table 1 shows the preferred failover regions based on these considerations and the latency data provided by the OCI interregion latency dashboard.

A graphic depicting the OCI regions across the globe.
Figure 1: OCI cloud regions

 

| Geographic region | Primary cloud regions | Preferred disaster recovery regions |
|---|---|---|
| Asia Pacific (APAC) | OCI Japan East (Tokyo)–Azure Tokyo | OCI Singapore (Singapore)–Azure Singapore; OCI South Korea Central (Seoul)–Azure Seoul |
| Asia Pacific (APAC) | OCI Singapore (Singapore)–Azure Singapore | OCI Japan East (Tokyo)–Azure Tokyo; OCI South Korea Central (Seoul)–Azure Seoul |
| Asia Pacific (APAC) | OCI South Korea Central (Seoul)–Azure Seoul | OCI Singapore (Singapore)–Azure Singapore; OCI Japan East (Tokyo)–Azure Tokyo |
| Europe, Middle East, Africa (EMEA) | OCI Germany Central (Frankfurt)–Azure Frankfurt 1 & 2 | OCI Netherlands Northwest (Amsterdam)–Azure Amsterdam2; OCI UK South (London)–Azure London |
| Europe, Middle East, Africa (EMEA) | OCI Netherlands Northwest (Amsterdam)–Azure Amsterdam2 | OCI Germany Central (Frankfurt)–Azure Frankfurt 1 & 2; OCI UK South (London)–Azure London |
| Europe, Middle East, Africa (EMEA) | OCI UK South (London)–Azure London | OCI Germany Central (Frankfurt)–Azure Frankfurt 1 & 2; OCI Netherlands Northwest (Amsterdam)–Azure Amsterdam2 |
| Europe, Middle East, Africa (EMEA) | OCI South Africa Central (Johannesburg)–Azure Johannesburg | OCI Germany Central (Frankfurt)–Azure Frankfurt 1 & 2; OCI UK South (London)–Azure London |
| Latin America (LATAM) | OCI Brazil Southeast (Vinhedo)–Azure Campinas | OCI US West (Phoenix)–Azure Phoenix; OCI US West (San Jose)–Azure Silicon Valley |
| North America (NA) | OCI Canada Southeast (Toronto)–Azure Canada Central | OCI US East (Ashburn)–Azure Washington DC 1 & 2; OCI US West (Phoenix)–Azure Phoenix |
| North America (NA) | OCI US East (Ashburn)–Azure Washington DC 1 & 2 | OCI US West (Phoenix)–Azure Phoenix; OCI US West (San Jose)–Azure Silicon Valley |
| North America (NA) | OCI US West (Phoenix)–Azure Phoenix | OCI US East (Ashburn)–Azure Washington DC 1 & 2; OCI US West (San Jose)–Azure Silicon Valley |
| North America (NA) | OCI US West (San Jose)–Azure Silicon Valley | OCI US West (Phoenix)–Azure Phoenix; OCI US East (Ashburn)–Azure Washington DC 1 & 2 |

Table 1: Preferred failover regions
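
As a rough illustration of how table 1 might be applied together with the data residency consideration above, the following Python sketch encodes the table as a lookup and picks the first listed region that is also permitted. The function name `pick_dr_region`, the `allowed` parameter, and the idea of treating the listed order as a ranking are assumptions made for this example only; the table itself doesn't state a ranking.

```python
# Preferred failover regions transcribed from table 1 (OCI region names only).
PREFERRED_DR = {
    "Japan East (Tokyo)": ["Singapore (Singapore)", "South Korea Central (Seoul)"],
    "Singapore (Singapore)": ["Japan East (Tokyo)", "South Korea Central (Seoul)"],
    "South Korea Central (Seoul)": ["Singapore (Singapore)", "Japan East (Tokyo)"],
    "Germany Central (Frankfurt)": ["Netherlands Northwest (Amsterdam)", "UK South (London)"],
    "Netherlands Northwest (Amsterdam)": ["Germany Central (Frankfurt)", "UK South (London)"],
    "UK South (London)": ["Germany Central (Frankfurt)", "Netherlands Northwest (Amsterdam)"],
    "South Africa Central (Johannesburg)": ["Germany Central (Frankfurt)", "UK South (London)"],
    "Brazil Southeast (Vinhedo)": ["US West (Phoenix)", "US West (San Jose)"],
    "Canada Southeast (Toronto)": ["US East (Ashburn)", "US West (Phoenix)"],
    "US East (Ashburn)": ["US West (Phoenix)", "US West (San Jose)"],
    "US West (Phoenix)": ["US East (Ashburn)", "US West (San Jose)"],
    "US West (San Jose)": ["US West (Phoenix)", "US East (Ashburn)"],
}


def pick_dr_region(primary, allowed=None):
    """Return the first table 1 candidate for `primary` that is in `allowed`.

    `allowed` is a hypothetical set of regions permitted by your data
    residency rules; None means no restriction. Returns None when no
    listed candidate qualifies.
    """
    for candidate in PREFERRED_DR[primary]:
        if allowed is None or candidate in allowed:
            return candidate
    return None


# Example: a London primary whose data may only be replicated to Amsterdam.
print(pick_dr_region("UK South (London)",
                     allowed={"Netherlands Northwest (Amsterdam)"}))
```

A lookup like this only narrows the candidates. RTO, RPO, and measured latency still need to be validated for the pair you choose.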

Cross-region disaster recovery design using ODSA

After the primary and disaster recovery regions have been identified and both the application layer and the database tier have been provisioned in the primary (production) environment, we can define the disaster recovery plan for our solution. A cross-region approach provides resiliency in the rare cases of either a disaster event that makes a whole region unavailable or a failure of the low-latency interconnection network link.

We discuss the proposed solution under the following assumptions:

  • Both the primary and disaster recovery environments are hosted in ODSA-enabled regions (see table 1).

  • Both the application layer running on Azure and the database layer running on OCI are deployed in the same geographical location at any time.

These assumptions are meant to provide an easy path to consistently design a disaster recovery architecture using ODSA capabilities and to always ensure the lowest latency between the application stack and the database tier, even in the event of a disaster. This architecture provides an effective solution for the most common scenarios. However, you can achieve more elaborate architectures by building directly on the OCI-Azure Interconnect.

Figure 2 shows the disaster recovery capability for a split-stack solution across regions between OCI and Azure.

 

A graphic depicting the architecture for a disaster recovery solution in OCI.
Figure 2: Disaster recovery capability for a split-stack solution across regions between OCI and Azure.

 

Application layer on Azure

  1. Connect applications using the appropriate TNS connection string. How connections are established determines how efficiently applications can reconnect to the failover destination after a failure. The following TNS connection string is recommended for all Oracle drivers 12.2 and later:

    ALIAS =
      (DESCRIPTION =
        (CONNECT_TIMEOUT=90)(RETRY_COUNT=20)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3)
        (ADDRESS_LIST =
          (LOAD_BALANCE=on)
          (ADDRESS = (PROTOCOL = TCP)(HOST=primary-scan)(PORT=1521)))
        (ADDRESS_LIST =
          (LOAD_BALANCE=on)
          (ADDRESS = (PROTOCOL = TCP)(HOST=secondary-scan)(PORT=1521)))
        (CONNECT_DATA = (SERVICE_NAME = gold-cloud)))

    You can tune the specific values, but the values quoted in this example are reasonable starting points. For more details, refer to the Application Checklist for Continuous Service for MAA Solutions.

  2. Replicate the application tier from the primary Azure region to the disaster recovery region using the Azure backbone network connectivity. The tools and processes used to keep the primary and disaster recovery regions in sync can vary depending on the application’s components and the Azure services and resources involved. For example, to migrate and synchronize Azure storage, you have several options to consider, including Robocopy or AzCopy, which support SMB Azure file shares, Linux-based rsync, or Azure Backup. For more details and a comprehensive analysis, refer to Disaster recovery and storage account failover on Azure.

  3. Set up Azure Traffic Manager or OCI DNS Traffic Management so that, with the help of automation, end users can connect seamlessly to a secondary or standby application configured in another Azure region. Set up an automated process, such as a script, to detect the application failover in Azure and initiate a database switchover in OCI.
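
The connect descriptor recommended in step 1 differs between environments only in its host names and service name, so it can be convenient to generate it rather than edit it by hand. The following Python sketch shows one way to do that. The function name `build_descriptor` and its parameters are hypothetical, and the default values simply repeat the starting points quoted in step 1.

```python
def build_descriptor(primary_scan, secondary_scan, service_name, port=1521,
                     connect_timeout=90, retry_count=20, retry_delay=3,
                     transport_connect_timeout=3):
    """Build a connect descriptor with one ADDRESS_LIST per region.

    The primary SCAN name is listed first so that clients try the primary
    region before the disaster recovery region.
    """
    def address_list(host):
        # One ADDRESS_LIST holding a single SCAN address for a region.
        return ("(ADDRESS_LIST=(LOAD_BALANCE=on)"
                f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))")

    return (
        "(DESCRIPTION="
        f"(CONNECT_TIMEOUT={connect_timeout})(RETRY_COUNT={retry_count})"
        f"(RETRY_DELAY={retry_delay})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        f"{address_list(primary_scan)}{address_list(secondary_scan)}"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Example using the placeholder names from step 1.
print(build_descriptor("primary-scan", "secondary-scan", "gold-cloud"))
```

The resulting string can be placed after an alias in tnsnames.ora or passed directly to a driver that accepts a full connect descriptor, for example as the `dsn` argument of python-oracledb's `connect()`.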

Database layer on OCI

  1. Through the ODSA console, provision the secondary database in the disaster recovery region.

  2. Log in to the Oracle Cloud Console and set up a private remote virtual cloud network (VCN) peering connection between the primary and disaster recovery regions. The traffic between OCI regions goes through the OCI backbone network connectivity.

  3. Manually set up a physical standby database to use with Oracle Data Guard to sync the primary database and standby database across OCI regions through the remote VCN peering.

  4. For Oracle Data Guard configurations, enable fast-start failover (FSFO) to allow the broker to automatically fail over to the standby database in the OCI disaster recovery region if the primary database is lost.

  5. FSFO can run custom actions before and after the automatic failover occurs, so you can use the post-callout script, which runs after the failover succeeds, to initiate a switchover of the application layer running on Azure.
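
The automated process mentioned for the application layer, a script that detects an application failover and then initiates the database switchover, could follow a pattern like the one below. This is a minimal Python sketch under stated assumptions, not a reference implementation: `probe` and `trigger_switchover` are hypothetical callables that you would supply (for example, an HTTP health check against the primary application and a wrapper around your own switchover procedure), and the threshold and interval values are arbitrary.

```python
import time
from typing import Callable, Optional


def watch_and_fail_over(probe: Callable[[], bool],
                        trigger_switchover: Callable[[], None],
                        failures_needed: int = 3,
                        interval_s: float = 10.0,
                        max_checks: Optional[int] = None,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call trigger_switchover() once after N consecutive failed probes.

    probe() returns True while the primary application is healthy.
    Returns True if the switchover was triggered, or False if max_checks
    probes completed without reaching the failure threshold.
    """
    consecutive = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            consecutive = 0  # a healthy probe resets the failure streak
        else:
            consecutive += 1
            if consecutive >= failures_needed:
                trigger_switchover()
                return True
        sleep(interval_s)
    return False
```

Requiring several consecutive failures before acting is a design choice that helps avoid switching regions on a single transient network error. The right threshold depends on your RTO.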

Conclusion

Preparing for a disaster isn’t an easy task. It requires a comprehensive approach that considers different business requirements and available architectures and turns them into an actionable disaster recovery plan. The guidelines we’ve described help you select the disaster recovery approach that best fits your application deployment, using a simple but effective failover and disaster recovery configuration across your Oracle Cloud Infrastructure and Azure environments.

