The Oracle Cloud Infrastructure Compute service provides bare metal and virtual machine (VM) instances, so you can deploy a server of any size that you need, from a small VM with a single core to a robust bare metal server with many cores and a large amount of RAM.
No matter the size of your compute instances, you need to ensure that they are highly available. To do so, consider the following design strategies when deploying your system:
Eliminate single points of failure, either by properly using an availability domain’s three fault domains or by deploying instances across multiple availability domains
Use floating IP addresses
Ensure that your design protects both the data availability and integrity of your compute instances.
This post describes the characteristics of high-availability systems and how to implement them in and across regions in Oracle Cloud Infrastructure (OCI).
When a computing environment is configured to provide nearly full-time availability, it’s a high-availability system. Such systems typically have redundant hardware and software that keep them available despite failures. When failures occur, the failover process moves the processing performed by the failed component to the backup component. The more transparent that failover is to users, the higher the availability of the system.
Let’s consider the following example to show how you can architect a high-availability system in OCI. Suppose that you have a single web server connected to a single database and, of course, you have issues with unplanned downtime.
Figure 1: Example system that needs high availability
You decide to move this workload to OCI and architect it for high availability. Based on the previous concepts, you consider the following key elements:
Redundancy: Multiple components can perform the same task. The problem of a single point of failure is eliminated because redundant components can take over a task performed by a component that has failed.
Monitoring: Check whether a component is working properly.
Failover: A secondary component becomes primary when the primary component fails.
The best practices introduced in this post focus on these key elements. Although high availability can be achieved at many different levels, including the application level, here we focus on the cloud infrastructure level.
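The three elements above can be illustrated with a minimal sketch. It is a simulation only: the `Component` class, the `web-1` and `web-2` names, and the in-memory `healthy` flag are assumptions made for illustration, and a real monitor would probe the component over the network.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    healthy: bool = True


def is_healthy(component: Component) -> bool:
    # Monitoring: a real check would probe the component, for example over HTTP.
    return component.healthy


def select_active(primary: Component, secondary: Component) -> Component:
    # Failover: the secondary takes over only when the primary fails its check.
    if is_healthy(primary):
        return primary
    if is_healthy(secondary):
        return secondary
    raise RuntimeError("no healthy component available")


# Redundancy: two components (hypothetical names) can perform the same task.
primary = Component("web-1")
secondary = Component("web-2")

active_before = select_active(primary, secondary).name
primary.healthy = False  # simulate a failure of the primary component
active_after = select_active(primary, secondary).name
print(active_before, active_after)
```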
The following diagram shows an example of how you could make that system highly available in OCI (using MySQL database as an example). The rest of this post explains how to use elements such as availability domains and load balancers to achieve this high availability.
Figure 2: Achieving high availability in OCI
An Oracle Cloud Infrastructure region is a localized geographic area composed of one or more availability domains. Each availability domain has three fault domains. The redundancy of fault domains within the availability domains ensures high availability.
Availability domain: One or more data centers located in a region. Availability domains are isolated from each other, fault-tolerant, and unlikely to fail simultaneously. They don’t share physical infrastructure or the internal availability domain network, so a failure that impacts one availability domain is unlikely to impact the others.
Fault domain: A grouping of hardware and infrastructure within an availability domain. Fault domains let you distribute your instances so that they aren’t on the same physical hardware within a single availability domain. As a result, an unexpected hardware failure or hardware maintenance that affects one fault domain doesn’t affect instances in other fault domains.
All the availability domains in a region are connected to each other by a low-latency, high-bandwidth network. This predictable, encrypted interconnection between availability domains provides the building blocks for both high availability and disaster recovery.
Some OCI resources, such as virtual cloud networks, are specific to a region. Others, such as compute instances, are specific to an availability domain. When you configure cloud services that are specific to an availability domain, use multiple availability domains or fault domains to ensure high availability and to protect against resource failure. For example, by creating redundant compute instances in other availability domains or fault domains, you can keep an issue that affects the primary instance or its domain from impacting your applications.
You can design solutions to have multiple regions, multiple availability domains, or multiple fault domains, depending on the class of failures you want to protect against.
A high-availability system has no single point of failure. A key principle for designing such a system in OCI is to distribute your instances across multiple fault domains.
In a single-availability-domain deployment, by properly using fault domains, you can increase the availability of applications running on OCI. Your application’s architecture determines whether you separate or group instances by using fault domains.
Highly available application architecture: In this scenario, you have a highly available application—for example, two web servers and a clustered database. You group one web server and one database node in one fault domain and the other half of each pair in another fault domain. This architecture ensures that a failure of any one fault domain doesn’t result in an outage for your application.
Single web server and database instance architecture: In this scenario, your application architecture is not highly available—for example, you have one web server and one database instance. The web server and the database instance must be placed in the same fault domain. This architecture ensures that your application is impacted by the failure of only that fault domain.
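The two placement scenarios above can be sketched as a small simulation. The node names and helper functions are hypothetical. The sketch only models which nodes remain when one fault domain fails and doesn't call any OCI API.

```python
from itertools import cycle

# Fault domain names as OCI labels them within an availability domain.
FAULT_DOMAINS = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]


def place_redundant_tiers(tiers, fault_domains=FAULT_DOMAINS):
    """Spread the nodes of each tier across fault domains, tier by tier."""
    placement = {}
    for nodes in tiers.values():
        for node, fault_domain in zip(nodes, cycle(fault_domains)):
            placement[node] = fault_domain
    return placement


def surviving_nodes(placement, failed_fault_domain):
    """Return the nodes that are not placed in the failed fault domain."""
    return sorted(
        node for node, fault_domain in placement.items()
        if fault_domain != failed_fault_domain
    )


# Highly available application: each pair is split across two fault domains.
ha_placement = place_redundant_tiers(
    {"web": ["web-1", "web-2"], "db": ["db-1", "db-2"]}
)
# Single web server and database: both go into the same fault domain.
single_placement = {"web-1": "FAULT-DOMAIN-1", "db-1": "FAULT-DOMAIN-1"}

print(surviving_nodes(ha_placement, "FAULT-DOMAIN-1"))
print(surviving_nodes(single_placement, "FAULT-DOMAIN-2"))
```

In the first case, one web server and one database node survive the failure of either fault domain in use. In the second case, the application is affected only when its own fault domain fails.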
Another approach to high availability is to deploy compute instances that perform the same tasks across multiple availability domains. This design removes a single point of failure by introducing redundancy.
The following diagram shows web server VMs deployed in two availability domains (AD1 and AD2) to implement redundancy:
Figure 3: Redundancy with multiple availability domains
Note: The architecture shows multiple availability domains. For a region that has a single availability domain, adjust the architecture to distribute your resources across the fault domains within the availability domain.
Depending on your system or application requirements, you can implement this architectural redundancy in either standby mode or active mode:
Standby mode: When the primary component fails, the standby component takes over. Standby mode is typically used for applications that must maintain their states.
Active mode: No components are designated as primary or standby. All components actively participate in performing the same tasks. When one of the components fails, the related tasks are distributed to another component. Active mode is typically used for stateless applications.
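To make active mode concrete, the following sketch spreads tasks over every healthy component and redistributes them when one component fails. The component names and the round-robin assignment are assumptions chosen for illustration, not a description of how any OCI service distributes work.

```python
def distribute(tasks, components):
    """Active mode: assign tasks round-robin to every healthy component."""
    healthy = [name for name, is_up in components.items() if is_up]
    if not healthy:
        raise RuntimeError("no healthy component available")
    assignment = {name: [] for name in healthy}
    for index, task in enumerate(tasks):
        assignment[healthy[index % len(healthy)]].append(task)
    return assignment


tasks = ["t1", "t2", "t3", "t4"]

# Both components (hypothetical names) participate while they are healthy.
before = distribute(tasks, {"web-ad1": True, "web-ad2": True})
# When one component fails, its tasks go to the remaining component.
after = distribute(tasks, {"web-ad1": False, "web-ad2": True})
print(before)
print(after)
```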
To provide highly available services on OCI, you need to use the Load Balancing service. Load Balancing improves resource use, facilitates scaling, and helps ensure high availability. It supports routing incoming requests to various backend sets based on virtual hostname, path route rules, or a combination of both. You can also create public and private load balancers, although a private load balancer is highly available only within an availability domain.
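The idea of routing by virtual hostname and path can be sketched as follows. This is a simplified model under stated assumptions: the `Rule` class, the prefix-only matching, the preference for host-specific rules, and all hostnames and backend set names are hypothetical. The Load Balancing service's actual match types and evaluation order are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    backend_set: str
    hostname: Optional[str] = None  # None matches any virtual hostname
    path_prefix: str = "/"          # "/" matches any path


def choose_backend_set(host: str, path: str, rules: List[Rule], default: str) -> str:
    """Pick a backend set by hostname and path prefix, or fall back to default."""
    matches = [
        rule for rule in rules
        if rule.hostname in (None, host) and path.startswith(rule.path_prefix)
    ]
    if not matches:
        return default
    # Assumption: prefer host-specific rules, then the longest path prefix.
    best = max(matches, key=lambda r: (r.hostname is not None, len(r.path_prefix)))
    return best.backend_set


rules = [
    Rule("api-backends", hostname="app.example.com", path_prefix="/api"),
    Rule("static-backends", path_prefix="/static"),
]
print(choose_backend_set("app.example.com", "/api/v1/users", rules, "web-backends"))
print(choose_backend_set("other.example.com", "/static/logo.png", rules, "web-backends"))
print(choose_backend_set("other.example.com", "/api/v1", rules, "web-backends"))
```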
Figure 4: High availability on OCI using load balancers
To create a highly available web service in one region, you must create at least two instances in two availability domains and then use load balancing to balance the traffic between them. You would perform the following high-level steps:
Create two instances in two availability domains within a single region.
To create these instances and an associated virtual cloud network (VCN), you can follow the steps in Setting Up a Virtual Cloud Network (VCN) in Oracle Cloud Infrastructure.
Deploy the web service on these instances.
To deploy a web server that can show the source IP address, destination IP address, hostname, and so on, you can follow the steps in this repository. However, you can use any web server.
Create a load balancer instance to load balance the web service between these two instances.
To load balance traffic between these two instances, use a public load balancer. The following diagram shows a high-level view of a simple public load-balancing system configuration. Far more sophisticated and complex configurations are common.
Figure 5: OCI public load balancer architecture
For more information about the Load Balancing service, see the documentation.
To create a load balancer, follow these detailed steps:
OCI uses the Traffic Management Steering Policies service to load balance traffic across regions. The service is a component of DNS. The service lets you configure policies to serve intelligent responses to DNS queries. As a result, different answers (endpoints) might be served for a query, depending on the logic that you define in the policy. Traffic Management Steering Policies can account for the health of answers to provide failover capabilities and the ability to load balance traffic across multiple resources. It can also account for the location where the query was initiated to provide a simple, flexible, and powerful mechanism to efficiently steer DNS traffic.
Note: Before you proceed, create an instance in another region. You can also use one of the instances that you created in the previous example. The goal is to have two instances running the same web server application in two regions.
The following components are used to build a traffic management steering policy:
Policy: A framework that defines the traffic management behavior for your zones. Steering policies contain rules that help to intelligently serve DNS answers.
Attachments: A way to link a steering policy to your zones. An attachment of a steering policy to a zone occludes all records at its domain that are of a covered record type, constructing DNS responses from its steering policy rather than from that domain’s records. A domain can have at most one attachment covering any given record type.
Rules: The guidelines that steering policies use to filter answers based on the properties of a DNS request, such as the request’s geolocation or the health of the endpoints.
Answers: The DNS record data and metadata to be processed in a steering policy.
Traffic Management Steering Policies has several types of policies, which are defined in the documentation. In this example, we’re using the Geolocation Steering policy, which distributes DNS traffic to different endpoints based on the location of the end-user. You can divide your global users into geographically defined regions (for example, state or province in North America, a country in the rest of the world) and steer customers to specified resources based on their location. This type of policy helps to ensure global, high-performing internet resolution and supports functions such as ring-fencing—for example, keeping traffic from Europe in Europe and blocking traffic outside of Europe from entering Europe.
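The behavior of a geolocation policy can be approximated with a short sketch. Everything in it is an assumption made for illustration: the pool names, the IP addresses (taken from documentation ranges), the location labels, and the fallback list used when no rule matches. It models the selection logic only and doesn't use the OCI API.

```python
from typing import Dict, List, Set

# Hypothetical answer pools: pool name -> endpoint IP addresses.
POOLS: Dict[str, List[str]] = {
    "frankfurt": ["192.0.2.10"],
    "ashburn": ["198.51.100.20"],
}

# Hypothetical rules: query location -> pool names in priority order.
RULES: Dict[str, List[str]] = {
    "Europe": ["frankfurt", "ashburn"],
    "North America": ["ashburn", "frankfurt"],
}

# Pools used when the query location matches none of the rules.
DEFAULT_POOLS: List[str] = ["ashburn", "frankfurt"]


def steer(location: str, healthy: Set[str]) -> List[str]:
    """Return the answers of the highest-priority pool that has a healthy endpoint."""
    for pool in RULES.get(location, DEFAULT_POOLS):
        answers = [ip for ip in POOLS[pool] if ip in healthy]
        if answers:
            return answers
    return []


all_healthy = {"192.0.2.10", "198.51.100.20"}
print(steer("Europe", all_healthy))        # European queries stay in Europe
print(steer("Europe", {"198.51.100.20"}))  # failover when Frankfurt is unhealthy
print(steer("Asia", all_healthy))          # no matching rule, so the default applies
```

Listing only one pool for a location would model ring-fencing, because queries from that location would then never be steered elsewhere.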
To create a geolocation steering load balancer, you need to have a DNS zone.
A zone holds the trusted DNS records that will reside on Oracle Cloud Infrastructure’s nameservers.
In the OCI Console, open the navigation menu, go to Networking, and then click DNS Zone Management.
Click Create Zone.
In the Create Zone dialog box, choose Manual as the method.
Enter the following values:
Click Create. The system creates and publishes the zone, complete with the necessary SOA and NS records. The following image shows an example.
Figure 11: DNS zone
You need to add the NS records to your domain’s nameserver list so that your DNS sends the incoming requests to Oracle’s DNS server. This step depends on your domain provider. In this example, we use name.com as our domain name provider. The following image shows an example of what it looks like when we add Oracle’s NS records to name.com.
Figure 12: NS record propagation
Note: It can take hours for these records to propagate to the root domains. Until that happens, your load balancer doesn’t work.
To create a Geolocation Steering policy, follow these steps:
In the OCI Console, open the navigation menu, go to Networking, and then click Traffic Management Steering Policies.
Click Create Traffic Management Steering Policy.
In the dialog box, select Geolocation Steering.
Enter a unique name that identifies the policy.
Specify the Time to Live (TTL) value for responses from the steering policy. If you don’t specify a value, the system sets a default value on the steering policy.
Define an answer pool, which contains a group of answers that are served in response to DNS queries.
Enter a user-friendly name for the answer pool, unique within the steering policy.
Enter a unique name to identify the answer.
Select the record type that will be provided as the answer. Choose A as the type of the record.
For RDATA, enter a valid IP address to add as an answer. Enter the public IP address of the instance you created in one of the regions, such as Ashburn.
Select the Eligible check box to indicate that the answer is available within the pool to be used in response to queries.
Select a location to use to distribute DNS traffic.
Select the priority in which the answers are served.
Add a global catch-all to specify answer pools for queries that don’t match any of the rules you have added. Click Add Global Catch-all and select the pool priorities.
Figure 14: Geolocation steering rules
In the Attach Health Check section, click Add New. Provide a health check name and keep the default values for Interval and Protocol (30 seconds and HTTP).
In the Attach Domains section, provide the name of the subdomain. This is the record that you will use to resolve your web service.
Choose the compartment and zone to connect this subdomain to.
Figure 15: Geolocation Subdomain
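The health check attached in the preceding steps probes each answer over HTTP at a fixed interval. The following is a minimal sketch of a single probe, useful for testing an endpoint by hand. The `probe` function, its timeout, and the rule that only a 2xx status counts as healthy are assumptions. The sketch doesn't show how the OCI health check service itself is implemented.

```python
import urllib.error
import urllib.request

CHECK_INTERVAL_SECONDS = 30  # the default interval kept in the steps above


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True when the endpoint answers an HTTP GET with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False
```

For example, `probe("http://192.0.2.10/")` (a placeholder address) would return `False` unless a web server answers at that address. A monitor could call `probe` for each answer every `CHECK_INTERVAL_SECONDS` seconds.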
After the policy is created, under Resources click Attached Domains, and then copy the domain name.
Open a browser and enter the domain name there to access the web service. If you are in the EU (as in this example), you see that the traffic is being served by the instances deployed in the EU (Frankfurt region).
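As an alternative to the browser, you could check which regional instance the domain resolves to from your location. The following sketch assumes hypothetical hostnames and documentation-range IP addresses, and it accepts a replaceable `resolver` argument so that the logic can be exercised without a live DNS zone.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that the hostname resolves to."""
    infos = resolver(hostname, None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def served_from(hostname, region_ips, resolver=socket.getaddrinfo):
    """Return the regions whose instance IP appears in the DNS answer."""
    addresses = resolve_ipv4(hostname, resolver)
    return sorted(region for region, ip in region_ips.items() if ip in addresses)


# Hypothetical mapping of region name to the public IP of its instance.
REGION_IPS = {"frankfurt": "192.0.2.10", "ashburn": "198.51.100.20"}

# Usage with a real attached domain (placeholder name shown here):
# print(served_from("web.example.com", REGION_IPS))
```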
Figure 16: Geolocation load balanced
The easiest way to verify that you can reach the US instance while you’re located in the EU is to configure a proxy server address in your laptop’s network configuration. The following image shows an example.
Figure 17: Geolocation reachability
This blog post provides an overview of Oracle Cloud Infrastructure’s high availability capabilities and how you can load balance traffic between two or more instances within a region and across regions. You can create two instances in two different availability domains within a region and load balance the traffic by using a public load balancer. You can also use a DNS zone and a traffic management steering policy to load balance instances across geographical locations.
Every use case is different. The only way to know if Oracle Cloud Infrastructure is right for you is to try it. You can select either the Oracle Cloud Free Tier or a 30-day free trial, which includes US$300 in credit to get you started with a range of services, including compute, storage, and networking.