In recent months, our team has fielded a range of customer questions about Copy Data Management (CDM) products and their suitability for Oracle databases, particularly for 1) large numbers of databases and database sizes of 100+ TB and 2) production recovery with respect to achievable Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). Recalling an article published some years ago entitled "Snapshots are NOT Backups", and given its relevance to CDM products today, I thought it was important to revisit the key architectural differences and main purposes of these products vs. operational backup/recovery solutions for the database, with a focus on Oracle's Zero Data Loss Recovery Appliance.
In Part 1, we highlight two key reasons why (CDM) snapshots are NOT backups, with regard to operational recovery procedures and database service levels:
1. Complex Recovery Procedure. CDM products are designed for, surprise, Copy Data Management, i.e. point-in-time data copies for test/dev/non-production use. For Oracle databases, these products use Recovery Manager (RMAN) image copies - typically written to NFS-mounted CDM storage - to establish the base image, followed by a storage snapshot operation. This second-tier storage is distinct from production storage and has lower IOPS and bandwidth characteristics. The next day, an incremental backup is performed and merged into the copy to create an updated base image. This is followed by another snapshot operation, then another incremental backup and merge, and so on, in the following cycle:
1. Image Copy + Snapshot
2. Incremental Backup + Merge into Copy
3. New Snapshot of Latest Copy
4. New Incremental Backup + Merge into Copy
While this method can be used with any disk destination, note that for databases with a high change rate, the merge procedure can be as time-consuming, and as resource-intensive for the database and storage, as a new full backup.
Snapshot copies are then restored as read-writable data files running on the same CDM storage - a process which some CDM vendors describe as “Instant Restore” - followed by the normal archived log recovery process to open the database (more on that in #2 below).
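To make the cycle above concrete, here is a minimal toy model in Python. It is only a sketch: the `CdmStore` class, the block dictionaries, and the daily workload are hypothetical illustrations, not any vendor's implementation or RMAN's internals. It shows two things from the text: the merge has to write every block changed since the previous backup, and only the snapshots (not the live image copy) still hold the older versions.

```python
from dataclasses import dataclass, field


@dataclass
class CdmStore:
    """Toy model of CDM storage: one mutable image copy plus frozen snapshots."""
    image_copy: dict = field(default_factory=dict)  # block id -> content version
    snapshots: list = field(default_factory=list)   # (label, frozen block map)

    def full_image_copy(self, datafile: dict) -> None:
        # Establish the base image from the production data file.
        self.image_copy = dict(datafile)

    def snapshot(self, label: str) -> None:
        # Freeze the current state of the image copy under a label.
        self.snapshots.append((label, dict(self.image_copy)))

    def merge_incremental(self, changed_blocks: dict) -> int:
        # Roll changed blocks into the copy; return how many blocks were written.
        self.image_copy.update(changed_blocks)
        return len(changed_blocks)


def incremental_since(previous: dict, current: dict) -> dict:
    """Blocks whose content differs from the previously backed-up state."""
    return {blk: val for blk, val in current.items() if previous.get(blk) != val}


# Production data file with 8 blocks, all at version 0 on day 0.
prod = {blk: 0 for blk in range(8)}
store = CdmStore()
store.full_image_copy(prod)
store.snapshot("day0")

merge_writes = []
for day in (1, 2):
    # Hypothetical workload: each day rewrites the first 2 * day blocks.
    for blk in range(2 * day):
        prod[blk] = day
    incr = incremental_since(store.image_copy, prod)
    merge_writes.append(store.merge_incremental(incr))
    store.snapshot(f"day{day}")

print("blocks written per merge:", merge_writes)
print("block 0 in day0 snapshot:", store.snapshots[0][1][0])
print("block 0 in current image copy:", store.image_copy[0])
```

In this model the merge cost grows with the number of changed blocks, which is why a high change rate pushes it toward the cost of a full backup.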
This sounds reasonable for test/dev copies, but let's look at the operational recovery use case. Note that RMAN is only aware of the latest image copy on the NFS mount, not the snapshots of older copy versions. To correctly achieve point-in-time recovery from application or human errors, the snapshot files must be restored to an alternate or the original storage location, then cataloged by RMAN for use in database restore operations. Archived log backups must also be restored and cataloged as needed for recovery. Executing these tasks requires careful coordination across storage, network, and DBA teams, which prolongs critical, time-sensitive recovery operations, particularly for large database volumes.
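The planning step behind that procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `plan_snapshot_pitr`, the tuple layouts, and the sample timestamps are all hypothetical, and each snapshot is treated as a clean starting point at its timestamp (a real online image copy also needs the redo generated while it was being taken). The point is that someone has to work out which snapshot and which archived logs are needed before any files can be restored and cataloged.

```python
from datetime import datetime


def plan_snapshot_pitr(snapshots, archived_logs, target):
    """Pick the newest snapshot at or before `target` and the logs needed after it.

    snapshots:     list of (name, taken_at)
    archived_logs: list of (sequence, first_time, next_time); each log holds
                   redo generated in the interval [first_time, next_time)
    """
    candidates = [snap for snap in snapshots if snap[1] <= target]
    if not candidates:
        raise ValueError("no snapshot is old enough for the requested time")
    base_name, base_time = max(candidates, key=lambda snap: snap[1])
    # A log is needed if any of its redo falls between the snapshot and the target.
    needed = [seq for seq, first, nxt in archived_logs
              if nxt > base_time and first < target]
    return base_name, sorted(needed)


# Hypothetical catalog: daily snapshots and 12-hour archived logs.
snaps = [
    ("snap_jan01", datetime(2024, 1, 1)),
    ("snap_jan02", datetime(2024, 1, 2)),
    ("snap_jan03", datetime(2024, 1, 3)),
]
logs = [
    (10, datetime(2024, 1, 1, 18), datetime(2024, 1, 2, 6)),
    (11, datetime(2024, 1, 2, 6), datetime(2024, 1, 2, 18)),
    (12, datetime(2024, 1, 2, 18), datetime(2024, 1, 3, 6)),
]

base, seqs = plan_snapshot_pitr(snaps, logs, datetime(2024, 1, 2, 12))
print(base, seqs)
```

Every item this plan returns is something that must then be located on CDM storage, copied, and cataloged by hand before RMAN can use it.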
Contrast this with Recovery Appliance, which offers true incremental forever and virtual full technology, where recoveries can be performed to any point-in-time with just the familiar RMAN 'restore' and 'recover' commands. This innovative approach also eliminates the daily incremental merge operation and resource overhead, freeing production databases to handle more critical business workloads.
2. Instant Restore is NOT Production Restore. While snapshot 'instant restore' sounds ideal for test/dev activities, it is not ideal for production restore scenarios. Instant restore involves redirecting storage block pointers on disk to support read-write operations on the now-live snapshot files, while continuing to support incremental merge write operations on the RMAN image copies. When this heavy I/O activity is coupled with second-tier CDM storage, it is clear that production-quality performance on the 'instantly restored' snapshot is just not realistic. Note also that for Exadata databases, mounting CDM storage via NFS or SCSI with RMAN 'switch to copy' for the purpose of running data files is not supported - see Using External Storage with Exadata (Doc ID 2663308.1) for details.
As most CDM vendors acknowledge, for databases to achieve the same read-write performance and availability levels as before, backups must be fully restored to the original production storage or to equivalent alternate storage. This involves restoring the last full backup or image copy, restoring the incremental backups (i.e. block changes) and merging them into the restored files, and finally restoring and applying archived logs to bring the restored files to a consistent state that is as current as possible - only at that point can the database be opened.
For example, consider a schedule in which a full backup is taken on Sunday, followed by incremental backups on Monday through Saturday, with archived log backups taken throughout the week.
Recovery would start with restoring the Sunday full backup, followed by restoring the daily incremental backups and merging them into the restored data files, and finally restoring and applying the archived log backups. The restore and merge of incremental backups must be performed in sequence, so for databases with a high daily change rate, this process can prolong recovery time and put the RTO at risk.
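The ordering constraint in this weekly schedule can be sketched in a few lines of Python. This is only an illustration; `restore_chain` and the step labels are hypothetical names, and the function assumes the schedule described above (full on Sunday, one incremental per day after that).

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def restore_chain(last_backup_day):
    """Ordered restore steps when the most recent backup was taken on `last_backup_day`."""
    idx = DAYS.index(last_backup_day)  # raises ValueError for an unknown day
    steps = ["restore full backup (Sun)"]
    # Each incremental depends on the previous one having been applied first.
    steps += [f"restore and merge incremental ({day})" for day in DAYS[1:idx + 1]]
    steps.append("restore and apply archived logs")
    return steps


for number, step in enumerate(restore_chain("Thu"), start=1):
    print(number, step)
```

A failure later in the week means a longer chain: the Thursday case above has four incremental steps between the full restore and log apply, while a Saturday failure would have six.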
Recovery Appliance speeds recovery as all incremental restore and merge operations - whether done before or during recovery - are completely removed, thanks to virtual full restore technology where all data file blocks for the requested full backup are directly retrieved. And when combined with high-speed 4x25GE connectivity, a single appliance can support restore rates up to 38 TB/hour. Net-net, faster production restore times are achieved while minimizing database overhead.
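As a rough back-of-the-envelope comparison, the sketch below estimates data-movement time with and without the incremental restore steps. The database size and daily change volume are hypothetical, the 38 TB/hour figure is the "up to" rate quoted above rather than a guaranteed number, and applying the same rate to both cases is a simplifying assumption made purely to isolate the effect of the extra incremental volume. Archived log apply time is excluded from both cases.

```python
def restore_hours(data_tb, rate_tb_per_hour):
    """Hours needed to move `data_tb` terabytes at a sustained rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_tb / rate_tb_per_hour


def conventional_restore_tb(full_tb, daily_change_tb, incrementals):
    """Full backup plus every incremental that must be restored and merged."""
    return full_tb + incrementals * daily_change_tb


def virtual_full_restore_tb(full_tb):
    """A virtual full already reflects the requested point, so no incrementals."""
    return full_tb


RATE_TB_PER_HOUR = 38.0  # "up to" figure from the text; actual rates will vary
FULL_TB = 100.0          # hypothetical database size
DAILY_CHANGE_TB = 5.0    # hypothetical daily incremental size

conventional = restore_hours(
    conventional_restore_tb(FULL_TB, DAILY_CHANGE_TB, incrementals=6),
    RATE_TB_PER_HOUR,
)
virtual = restore_hours(virtual_full_restore_tb(FULL_TB), RATE_TB_PER_HOUR)

print(f"full + 6 incrementals: {conventional:.2f} h")
print(f"virtual full:          {virtual:.2f} h")
```

This model understates the real gap, since merging incrementals into restored data files costs more than the raw transfer of their bytes.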
Stay tuned for Part 2 of this series, where we will present two additional key differences between snapshots and backups. In the meantime, see this insightful article on the fundamentals of RMAN incremental merge and how image copy-based recovery differs from operational point-in-time recovery.
As always, we welcome your questions and comments below!