Whether it is a cloud, an on-premises, or a hybrid environment, the question asked most often is “What is the right solution to protect my data from failures, with the least amount of downtime and data loss?” Well, one choice that comes to mind is to replicate the data to another physical site. This means all the data changes are physically copied to a remote location using some application-agnostic replication technology. You can then activate the remote site in the event of a primary failure. So far so good: “replication” is the solution that fits the bill. The default replication choice is to just replicate all the changes happening to the data at the storage level. Simple enough. Right?
Not really. For most common deployments using generic storage-based replication, the replication simply stops at the storage level. It never gets deeper. When the replication happens at the block level, the storage system has no clue about the type of data it is replicating. It just replicates whatever you throw at the storage, faithfully. There is both a good and a really bad aspect to this method.
Peeling back some layers, you will find that you have opened a big can of worms. To start with, what if you accidentally delete a file? Guess what, that deletion gets replicated and the file is lost at the remote site too. What if there is a logical corruption in your production database? That too gets replicated faithfully. What if you want to offload your read-only analytical queries or reporting to the remote site while the replication is going on? Good luck with that, as mirrored copies of data are ‘dark’. The point is, while application-agnostic storage replication seems easy, it just adds complexity when replicating databases that store your mission-critical data.
To look at it another way, take the database server as an example. IO changes usually traverse several layers: from memory to the HBA, to a storage switch, to a storage controller, and finally to an actual disk. All pieces of the stack have their own firmware. A software bug or any other issue with any component in the stack can result in data corruption, which the storage is not aware of and does not care about. The corrupted data is replicated. I am not making this up; there are real-world cases of this. See the following:
This drives home the point that database-specific replication provides the best solution for detecting corruptions and offers true data protection with reduced downtime and data loss.
For Oracle databases, Oracle Data Guard and Oracle Active Data Guard offer the best data protection and data availability solution for mission-critical databases that are the lifeblood of businesses, large and small. It is important to note that Data Guard is not an island unto itself; it is one of many Oracle high availability technologies that, when integrated with each other, provide value that is greater than the sum of the parts. For example, the Flashback Database feature makes it possible to avoid rebuilding a failed primary database after a failover to its standby. Use of a fast recovery area (formerly called the flash recovery area) automates the management of archive logs on both primary and standby databases. Data Guard is also integrated with Oracle RAC, Automatic Storage Management (ASM), Oracle Recovery Manager (RMAN) and the Zero Data Loss Recovery Appliance (ZDLRA). These integrations are not an afterthought; they are by design. Oracle has methodically inventoried the many sources of planned and unplanned downtime from countless customer environments and is following a blueprint to address all possible causes of downtime using capabilities integrated with the Oracle Database. Taken together, these capabilities define the Oracle Maximum Availability Architecture (MAA).
Data Guard, an important MAA technology, operates on a simple principle: ship redo, and then apply redo. Redo includes all of the information needed by the Oracle Database to recover a database transaction. A production database, referred to as the primary database, transmits redo to one or more independent replicas referred to as standby databases. Data Guard standby databases are in a continuous state of recovery, validating and applying redo to maintain synchronization with the primary database. Data Guard will also automatically resynchronize a standby database that becomes temporarily disconnected from its primary database because of a network or standby outage. As shown in the following example, changes occurring in the production (primary) database are captured and sent directly from memory to the remote site, thus bypassing the whole IO stack before the data is replicated.
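To make the “ship redo, apply redo” idea a little more concrete, here is a toy sketch in Python. It is not how Data Guard is implemented; the `Primary` and `Standby` classes, the record layout, and the `fetch` helper are all hypothetical names invented for illustration. It only models two behaviours described above: the standby applies changes strictly in sequence, and after a temporary disconnect it fetches the redo it missed before applying anything newer.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: every change is recorded as a numbered redo record."""
    redo_log: list = field(default_factory=list)  # archive of all redo generated

    def commit(self, key, value):
        seq = len(self.redo_log) + 1
        record = (seq, key, value)
        self.redo_log.append(record)
        return record

    def fetch(self, first_seq, last_seq):
        """Return archived redo records first_seq..last_seq, inclusive."""
        return self.redo_log[first_seq - 1:last_seq]


@dataclass
class Standby:
    """Toy standby: applies redo strictly in sequence order."""
    data: dict = field(default_factory=dict)
    applied_seq: int = 0

    def receive(self, record, primary):
        seq = record[0]
        if seq <= self.applied_seq:
            return  # already applied, nothing to do
        if seq > self.applied_seq + 1:
            # Gap detected: fetch and apply the missing records first.
            for missing in primary.fetch(self.applied_seq + 1, seq - 1):
                self._apply(missing)
        self._apply(record)

    def _apply(self, record):
        seq, key, value = record
        self.data[key] = value
        self.applied_seq = seq


primary, standby = Primary(), Standby()
standby.receive(primary.commit("acct:1", 100), primary)  # shipped normally
primary.commit("acct:2", 250)                            # standby unreachable
primary.commit("acct:1", 80)                             # standby unreachable
standby.receive(primary.commit("acct:3", 40), primary)   # link restored
print(standby.applied_seq, standby.data)
```

After the link comes back, the standby notices that it last applied record 1 but has just received record 4, so it pulls records 2 and 3 from the primary's archive and ends up with the same data as the primary.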
Bypassing the IO layer not only improves performance (it avoids waiting for local disk writes); it also keeps corruption introduced anywhere in the primary's IO stack from being replicated to the standby sites. In addition, a transaction may trigger many physical file writes and changes to many blocks. Storage replication replicates ALL the blocks from ALL the files, thus consuming more IO bandwidth.
Data Guard replication not only dramatically reduces the amount of data transmitted, it also verifies the blocks before transmission to make sure they are logically correct. The data also gets verified at the remote site.
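The difference between a blind block copy and a content-aware copy can be illustrated with a small sketch. This is only an analogy: the validation Data Guard actually performs is internal to the Oracle Database, and the CRC32 checksum, the block layout, and the function names below are assumptions chosen to keep the example short.

```python
import zlib


def make_block(payload: bytes) -> dict:
    """Build a toy block that carries a CRC32 checksum of its payload."""
    return {"payload": payload, "checksum": zlib.crc32(payload)}


def is_valid(block: dict) -> bool:
    """A block is valid when its payload still matches its checksum."""
    return zlib.crc32(block["payload"]) == block["checksum"]


def storage_replicate(block: dict) -> dict:
    """Application-agnostic copy: ships whatever bytes it is given."""
    return dict(block)


def validating_replicate(block: dict) -> dict:
    """Content-aware copy: refuses to ship a block that fails its check."""
    if not is_valid(block):
        raise ValueError("corrupt block rejected before transmission")
    return dict(block)


good = make_block(b"balance=100")
# Simulate a fault in the IO stack: the payload changes, the checksum does not.
corrupt = {"payload": b"balance=1O0", "checksum": good["checksum"]}

remote_copy = storage_replicate(corrupt)  # the corruption reaches the remote site
print("blind copy is valid at remote:", is_valid(remote_copy))

try:
    validating_replicate(corrupt)
    rejected = False
except ValueError:
    rejected = True
print("validating copy rejected the block:", rejected)
```

The blind copy happily delivers the damaged block to the remote site, while the validating copy stops it at the source. A second check on arrival, as described above for the remote site, would catch damage that happens in transit.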
In addition to protecting data and making it available after an unplanned event, a major undertaking for most organizations is to drive planned downtime to zero. Data Guard provides many ways to minimize planned downtime associated with patching and upgrades.
Oracle Data Guard also offers Snapshot standby, a method of fully leveraging a physical standby database for QA testing and other activities that require a database that is independent of the primary and is open for both read and update operations.
Data Guard is a feature included with Oracle Database Enterprise Edition. It provides all the functionality for creating and maintaining standby databases, performing role transitions (switchover and failover), and using the standby for read-write testing in Snapshot Standby mode. Read-only capability at the remote site and certain other features of Data Guard require the Active Data Guard license. In the Oracle Cloud, Active Data Guard is bundled with Oracle Database Exadata Cloud Service and the Oracle Database Cloud Service – Extreme Performance package. You can deploy up to 30 standby databases for a single primary, and you can cascade them. The choice is up to you. Detailing the various use cases is beyond the scope of this blog post, but we will cover them in subsequent postings.
Data Guard and Active Data Guard provide many choices and great flexibility. If you require zero data loss, you can replicate the database in SYNC mode. If you are replicating over long distances (say you want to replicate between San Francisco and Boston), you can replicate using ASYNC mode.
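A quick back-of-the-envelope calculation shows why distance pushes you towards ASYNC. The sketch below is a simplified model, not a measurement: it assumes light in optical fiber covers roughly 200 km per millisecond, that San Francisco and Boston are about 4,300 km apart in a straight line, and that a SYNC commit cannot complete before the standby's acknowledgment has made the round trip. The local write time of 1 ms is a made-up figure, and real networks add routing and equipment delay on top.

```python
FIBER_KM_PER_MS = 200.0  # assumed: light in fiber covers roughly 200 km per ms


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time imposed by distance alone."""
    return 2 * distance_km / FIBER_KM_PER_MS


def commit_latency_ms(local_write_ms: float, distance_km: float, sync: bool) -> float:
    """Simplified model of commit latency.

    SYNC: the commit waits for the local write and the remote acknowledgment,
    assumed to proceed in parallel, so the slower of the two dominates.
    ASYNC: the commit waits for the local write only.
    """
    if sync:
        return max(local_write_ms, min_round_trip_ms(distance_km))
    return local_write_ms


SF_TO_BOSTON_KM = 4300.0  # approximate straight-line distance, assumed
METRO_KM = 50.0           # hypothetical nearby standby site
LOCAL_WRITE_MS = 1.0      # hypothetical local redo write time

for label, km in [("metro", METRO_KM), ("SF to Boston", SF_TO_BOSTON_KM)]:
    sync_ms = commit_latency_ms(LOCAL_WRITE_MS, km, sync=True)
    async_ms = commit_latency_ms(LOCAL_WRITE_MS, km, sync=False)
    print(f"{label}: SYNC >= {sync_ms} ms, ASYNC ~ {async_ms} ms per commit")
```

Under these assumptions a coast-to-coast SYNC commit can never be faster than about 43 ms, while a standby in the same metro area adds nothing beyond the local write. That is the trade-off behind choosing SYNC for zero data loss at short range and ASYNC over long distances.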
A Data Guard physical standby database licensed for the Active Data Guard option can be open for read-only queries and reporting while continuously applying updates received from the primary database. This can improve primary database performance and response time by offloading queries to an active standby database. In addition, applications can use global temporary tables on the standby to perform transitory writes during reporting operations. Active Data Guard can defer or eliminate new hardware and software purchases by using existing standby databases, previously idle, that are already in place. No other solution on the market offers the simplicity, transparency, and high performance of the Active Data Guard standby for maintaining a synchronized replica of a production database that is open read-only. In addition to using the standby database for read-only operations, Active Data Guard can also automatically repair disk corruptions by retrieving a good copy of the affected block from the standby database and automatically repairing the primary, all of which is transparent to the user. Other Active Data Guard features include fast incremental backups from a standby, zero data loss failovers to a remote standby, real-time cascading, automated Oracle Database rolling upgrades and the use of Application Continuity and Global Data Services.
If you have a scalable Real Application Clusters (RAC) configuration on your primary, you can choose to have the same number of nodes on your standby, or a reduced number. RAC and Data Guard complement each other well and provide high availability for both planned downtime (rolling updates, patching, upgrades) and unplanned downtime (disaster recovery).
I hope this article has provided a useful overview of Data Guard and explained why this approach is superior for Oracle Database protection when compared to storage-level replication.
For more details, refer to