A blog about Oracle's Database Cloud Service Technology

Oracle Database High Availability in the Cloud Era!

Bob Thome
Vice President of Product Management

The Cloud doesn’t change everything. Application-agnostic data protection can offer some level of availability to databases, but database-aware high availability and disaster recovery features are vital for mission-critical database deployments in the cloud. Database-aware techniques provide superior continual database service availability during planned and unplanned outages.

The introduction of the cloud has changed the way many IT solutions are built. Gone is the need to fully control every aspect of the process to get great security, performance, and availability. Customers today are demonstrating that, yes, it is possible to run even mission-critical applications in the cloud. However, not all techniques for deploying a mission-critical database in the cloud are created equal.

Due to infrastructure limitations, application-agnostic techniques such as storage-level snapshots, replication, and even virtual machine failover have become commonly accepted ways to improve availability for generic cloud deployments. These fit well within most cloud infrastructures, where shared storage and redundant networks are not common but replicated storage across availability domains exists. They may be acceptable for non-critical application deployments, but do these replication solutions really do as good a job at high availability as the sophisticated clustering and data protection solutions offered by Oracle?

On the surface, all may look good. If your VM fails, you simply restart the database and service after a blackout of a couple of minutes. As is common practice in many clouds, you can replicate the database to storage in the same or a different availability domain. This protects against both server and instance failure, as well as an availability domain becoming unavailable due to network or power-related failures. It appears simple, but is it superior to a cluster solution that offers both great scalability to distribute your workload and superior online failure protection? What happens if there is block corruption and storage faithfully replicates that block to the standby? And under the surface, the reality is that minutes of downtime for your database often result in hours of downtime while you restart your entire solution stack.
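
To make the block corruption question concrete, here is a toy Python model. It is only a sketch under stated assumptions: the block layout, the CRC checksum, and the function names (`storage_replicate`, `validating_replicate`) are all hypothetical and do not describe how Oracle or any storage product is implemented. It only contrasts a copy that is blind to block contents with one that checks the block before shipping it.

```python
import zlib

def make_block(payload: bytes) -> dict:
    """Build a hypothetical data block that carries its own checksum."""
    return {"data": payload, "crc": zlib.crc32(payload)}

def is_valid(block: dict) -> bool:
    """A content-aware check: does the payload still match its checksum?"""
    return zlib.crc32(block["data"]) == block["crc"]

def storage_replicate(block: dict) -> dict:
    """Application-agnostic copy: bytes are copied as-is, good or bad."""
    return dict(block)

def validating_replicate(block: dict) -> dict:
    """Content-aware copy (assumed behavior): refuse to ship a bad block."""
    if not is_valid(block):
        raise ValueError("corrupt block refused")
    return dict(block)

# Write a good block, then simulate corruption on the primary's storage.
primary = make_block(b"account=42;balance=100")
primary["data"] = b"account=42;balance=1?0"

# The storage-level copy faithfully carries the damage to the standby.
standby = storage_replicate(primary)
print("standby block valid:", is_valid(standby))

# The validating copy detects the problem before it reaches the standby.
try:
    validating_replicate(primary)
except ValueError as err:
    print("replication stopped:", err)
```

In this model the storage-level standby ends up holding the same damaged block as the primary, which is the risk the question above points at.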

Let us take clustering, for example. Traditional clustering solutions rely on cold failover, where the database is restarted on another node after a failure. These solutions provide simple high availability and don’t rely on replication. However, as with replication solutions, it can take time to restart the database instance and the entire solution stack above it after a failure, and the secondary resource sits idle, or cold, when not in use.

Real Application Clusters (RAC) builds upon cluster failover by running multiple active instances simultaneously against the same database files, providing both improved availability and scalability across the nodes in the cluster. Because the instance and database service are already running on the surviving servers, there is no need to restart the database and mount the database files, so continual database service availability is maintained. Recovery after a failover is fast, as existing connections stay connected and failed connections can be notified automatically and reconnect. More importantly, RAC provides a mechanism to maintain full database service availability during periodic software updates, which include critical security fixes, operating system updates, and database software updates (PSUs). Instead of taking up to 2 hours of downtime per month for software updates, RAC in the Oracle Cloud can enable zero-downtime software maintenance for almost all software updates.
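
To put the "up to 2 hours of downtime per month" figure in perspective, the short Python sketch below converts monthly maintenance downtime into an availability percentage. The 730-hour average month and the function name `availability_pct` are assumptions made for this illustration; only the 2-hour figure comes from the text above.

```python
# Average hours in a month, assuming a 365-day year (an assumption).
HOURS_PER_MONTH = 365 * 24 / 12  # 730.0

def availability_pct(downtime_hours_per_month: float) -> float:
    """Percentage of the month the database service is available."""
    return 100.0 * (1 - downtime_hours_per_month / HOURS_PER_MONTH)

# Two hours of planned downtime every month versus online patching.
print(f"2 h/month of maintenance: {availability_pct(2.0):.3f}% available")
print(f"zero-downtime maintenance: {availability_pct(0.0):.3f}% available")
print(f"planned downtime per year at 2 h/month: {2.0 * 12:.0f} hours")
```

Under these assumptions, planned maintenance alone caps availability at roughly 99.7 percent before any unplanned outage is counted.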

Figure 1:  A four-node RAC database cluster

Real Application Clusters has another benefit over storage replication and VM failover solutions that’s not strictly related to high availability: RAC enables workloads to scale out over multiple servers. The largest servers in the Oracle Cloud provide 36 cores, which is equivalent to 72 vCPUs. With 2-node RAC, you gain fast physical failover and online patching, plus compute capacity that now maxes out at 72 cores or 144 vCPUs. Adding more servers to the cluster increases processing capability even further. This extreme scalability eliminates the need to resort to other distributed database techniques to scale beyond a single server. Some of those techniques require custom application design, whereas RAC scales off-the-shelf applications across servers.
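
The core counts above follow from simple multiplication, sketched here in Python. The per-server figures come from the text; the function name `cluster_capacity` is hypothetical, and the result is raw capacity only, not a claim about delivered throughput.

```python
CORES_PER_SERVER = 36  # largest server size quoted in the article
VCPUS_PER_CORE = 2     # implied by "36 cores ... equivalent to 72 vCPUs"

def cluster_capacity(nodes: int) -> tuple:
    """Return (cores, vcpus) for a cluster of identical servers."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    cores = nodes * CORES_PER_SERVER
    return cores, cores * VCPUS_PER_CORE

# Single server, the 2-node case from the text, and the 4-node
# cluster shown in Figure 1.
for nodes in (1, 2, 4):
    cores, vcpus = cluster_capacity(nodes)
    print(f"{nodes}-node: {cores} cores, {vcpus} vCPUs")
```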

Real Application Clusters can also be combined with other Oracle database-integrated data protection techniques to provide a Maximum Availability Architecture (MAA). MAA extends the RAC benefit of elevated availability during local failures to larger systemic failures that affect the availability domain, while also protecting against data corruption. Using all these techniques together provides differentiated SLAs and high-performance database capabilities that will benefit and support the most demanding workloads.

In the next posting on this blog, we will discuss in detail best practices for using Oracle’s Maximum Availability Architecture to protect against disasters and data corruption while reducing downtime for major upgrades. We invite you to come back and learn what other cloud vendors may not be telling you.

