By JoeMeeks on Nov 05, 2014
Welcome to the fourth installment in a series of blog posts describing MAA Best Practices that define four standard reference architectures for HA and data protection: BRONZE, SILVER, GOLD and PLATINUM. The objective of each reference architecture is to deploy an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost.
This article provides details for the Gold reference architecture.
Gold substantially raises the service level for business-critical applications that cannot accept vulnerability to single points of failure. Gold builds upon Silver by using database replication technology to eliminate single points of failure and provide a much higher level of data protection and HA against all types of unplanned and planned outages.
An overview of Gold is provided in the figure below.
Gold delivers substantially enhanced service levels using the following capabilities:
Oracle Active Data Guard replaces backups used by Bronze and Silver reference architectures as the first line of defense against an unrecoverable outage of the production database. Recovery time (RTO) for outages caused by data corruption, database failure, cluster failure, and site failures is reduced to seconds or minutes with an accompanying data loss exposure (RPO) of zero or near zero depending upon configuration. While backups are no longer used for availability, they are still included in the Gold reference architecture for archival purposes and as an additional level of data protection.
Active Data Guard uses simple physical replication to maintain one or more synchronized copies (standby databases) of the production database (primary database). If the primary becomes unavailable for any reason, production is quickly failed over to the standby and availability is restored.
Active Data Guard offers a unique set of capabilities for availability and Oracle data protection that exceed other alternatives based upon storage remote-mirroring or other methods of database replication. These capabilities include:
- Choice of zero data loss (sync) or near-zero data loss (async) disaster protection.
- Transmission of database changes (redo) directly from the log buffer of the primary database, providing strong isolation from lower-layer hardware and software faults.
- Use of intimate knowledge of Oracle data block and redo structures to perform continuous Oracle data validation at the standby, further isolating the standby database from corruptions that can impact a primary database.
- Native support for all Oracle data types and features combined with high performance capable of supporting all applications and workloads.
- Manual or automatic failover to quickly transfer production to the standby database should the primary become unavailable for any reason.
- Integrated application failover to quickly transition application connections to the new primary database after a failover has occurred.
- Database rolling maintenance to reduce downtime and risk during planned maintenance.
- High return on investment by offloading read-only workloads and backups to an Active Data Guard standby while it is being synchronized by the primary database.
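The difference between the sync and async options in the list above can be illustrated with a toy model. The Python sketch below is not Oracle code and does not use any Data Guard interface. The `Primary` and `Standby` classes, the `sync` flag, and the transaction names are all hypothetical, chosen only to show why acknowledging a commit after the standby has received the redo gives zero data loss, while acknowledging it before shipping leaves a small window of exposure.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy standby: stores the redo records it has received."""
    received: list = field(default_factory=list)


@dataclass
class Primary:
    """Toy primary. The 'sync' flag is hypothetical, not an Oracle parameter."""
    standby: Standby
    sync: bool
    committed: list = field(default_factory=list)  # commits acknowledged to the application
    in_flight: list = field(default_factory=list)  # redo queued but not yet shipped

    def commit(self, txn: str) -> None:
        if self.sync:
            # Sync: the standby receives the redo before the commit is acknowledged.
            self.standby.received.append(txn)
        else:
            # Async: acknowledge immediately and ship the redo later.
            self.in_flight.append(txn)
        self.committed.append(txn)

    def ship(self) -> None:
        """Drain queued redo to the standby (background transport in async mode)."""
        self.standby.received.extend(self.in_flight)
        self.in_flight.clear()


def lost_on_failover(primary: Primary) -> list:
    """Transactions acknowledged on the primary but missing from the standby."""
    return [t for t in primary.committed if t not in primary.standby.received]


def run(sync: bool) -> list:
    primary = Primary(standby=Standby(), sync=sync)
    primary.commit("t1")
    primary.commit("t2")
    primary.ship()        # background transport catches up
    primary.commit("t3")  # the primary fails right after this commit
    return lost_on_failover(primary)


if __name__ == "__main__":
    print("sync lost:", run(sync=True))
    print("async lost:", run(sync=False))
```

In this model the sync run loses nothing at failover, while the async run loses only the transaction that was committed after the last shipment, which is the "near-zero" exposure described above.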
Oracle GoldenGate logical replication is also included in the Gold reference architecture, either to complement Active Data Guard when performing planned maintenance or to use as an alternative replication mechanism for maintaining a synchronized copy (target database) of a production database (source database).
GoldenGate reads changes from disk at a source database, transforms the data into a platform independent file format, transmits the file to a target database, then transforms the data into SQL (updates, inserts, and deletes) native to a target database that is open read-write. The target database contains the same data, but is a different physical database from the source (for example, backups are not interchangeable). This enables GoldenGate to easily support heterogeneous environments across different hardware platforms and relational database management systems. This flexibility makes it ideal for a wide range of planned maintenance and other replication requirements. GoldenGate can:
- Efficiently replicate subsets of a source database to distribute data to other target databases. It can also be used to consolidate data into a single target database (for example, an Operational Data Store) from multiple source databases. This function of GoldenGate is relevant to each of the four MAA reference architectures and is complementary to the use of Active Data Guard.
- Perform maintenance and migrations in a rolling manner for use-cases that cannot be supported using Data Guard replication. For example, Oracle GoldenGate enables replication from a source database running on a big-endian platform to a target database running on a little-endian platform. This enables cross-platform migration with the additional advantage of being able to reverse replication for fast fallback to the prior version after cutover. When used in this fashion GoldenGate is complementary to Active Data Guard.
- Maintain a complete replica of a source database for high availability or disaster protection that is ready for immediate failover should the source database become unavailable. GoldenGate would be an alternative to Active Data Guard when used for this purpose. The primary use-case where you would use GoldenGate instead of Active Data Guard for complete database replication is when there is a requirement for the target database to be open read-write at all times (remember an Active Data Guard standby is open read-only).
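The capture, transform, and apply flow described above can be sketched in a few lines of Python. This is an illustration of the general idea of logical replication only. It does not use GoldenGate, and it does not reproduce GoldenGate's file format. JSON lines stand in for the platform-independent format, an in-memory SQLite database stands in for the target, and the `orders` table, the `id` key column, and all function names are assumptions made for the example.

```python
import json
import sqlite3


def capture(changes):
    """Serialize change records into a platform-independent text format (JSON lines here)."""
    return "\n".join(json.dumps(c, sort_keys=True) for c in changes)


def to_sql(change):
    """Turn one change record into a parameterized SQL statement for the target.

    Table and column names are interpolated directly, which is acceptable only
    because this toy example controls its own input.
    """
    table, op = change["table"], change["op"]
    if op == "insert":
        cols = list(change["row"])
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, [change["row"][c] for c in cols]
    if op == "update":
        cols = list(change["row"])
        assignments = ", ".join(f"{c} = ?" for c in cols)
        sql = f"UPDATE {table} SET {assignments} WHERE id = ?"
        return sql, [change["row"][c] for c in cols] + [change["key"]]
    if op == "delete":
        return f"DELETE FROM {table} WHERE id = ?", [change["key"]]
    raise ValueError(f"unknown op: {op}")


def apply_trail(trail, target):
    """Replay every serialized change against the target as native SQL."""
    for line in trail.splitlines():
        sql, params = to_sql(json.loads(line))
        target.execute(sql, params)
    target.commit()


def demo():
    # The target is an ordinary read-write database with its own physical layout.
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    changes = [
        {"table": "orders", "op": "insert", "row": {"id": 1, "status": "new"}},
        {"table": "orders", "op": "insert", "row": {"id": 2, "status": "new"}},
        {"table": "orders", "op": "update", "key": 1, "row": {"status": "shipped"}},
        {"table": "orders", "op": "delete", "key": 2},
    ]
    apply_trail(capture(changes), target)
    return target.execute("SELECT id, status FROM orders ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Because the target only ever sees SQL, it can run on a different platform or a different database product from the source, which is the property the use cases above rely on.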
Note that there are several trade-offs that must be accepted when using logical replication in place of Active Data Guard for data protection and availability:
- Logical replication has additional pre-requisites and operational complexity.
- Logical replication is inherently an asynchronous process and thus not able to provide zero data loss protection. Only Active Data Guard can provide zero data loss protection.
- A logical copy is not a physical replica of the source database. Rather than offload backups to a standby, you must back up both source and target. Logical replication also cannot support the advanced data protection features that come with Data Guard physical replication: lost-write detection and automatic block repair.
Oracle Site Guard is optional in the Gold tier but is useful to reduce administrative overhead and the potential for human error. Site Guard enables administrators to automate the orchestration of switchover (a planned event) and failover (in response to an unplanned outage) of their complete Oracle environment - multiple databases and applications - between a production site and a remote disaster recovery site. Oracle Site Guard is included with the Oracle Enterprise Manager Life-Cycle Management Pack.
Oracle Site Guard offers the following benefits:
- Reduction of errors due to prepared response to site failure. Recovery strategies are mapped out, tested, and rehearsed in prepared responses within the application. Once an administrator initiates a Site Guard operation for disaster recovery, human intervention is not required.
- Coordination across multiple applications, databases, and various replication technologies. Oracle Site Guard automatically handles dependencies between different targets while starting or stopping a site. Site Guard integrates with Oracle Active Data Guard to coordinate multiple concurrent database failovers. Site Guard also integrates with storage remote mirroring that may be used for data that resides outside of the Oracle Database.
- Faster recovery time. Oracle Site Guard automation minimizes time spent in the manual coordination of recovery activities.
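The dependency handling described above, starting and stopping targets in an order that respects what depends on what, is a topological ordering problem. The Python sketch below shows only that idea. It is not Site Guard's interface, and the target names and the `DEPENDS_ON` map are hypothetical. It assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical site: each target maps to the set of targets it depends on.
DEPENDS_ON = {
    "database": set(),
    "middleware": {"database"},
    "application": {"middleware"},
    "web_tier": {"application"},
}


def start_order(depends_on):
    """Order targets so that every dependency starts before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Stop in the reverse of the start order, so dependents stop first."""
    return list(reversed(start_order(depends_on)))


if __name__ == "__main__":
    print("start:", start_order(DEPENDS_ON))
    print("stop: ", stop_order(DEPENDS_ON))
```

A failover to the disaster recovery site would, in this simplified view, amount to running the stop order at the failed site where possible and the start order at the surviving site once the databases there have taken over the primary role.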
Gold = Better HA and Data Protection
Gold builds upon Silver by addressing all fault domains. Even in the worst cases of a complete cluster or site outage, database service can be resumed within seconds or minutes of a failure occurring. Gold eliminates the downtime and potential uncertainty of a restore from backup. Gold also protects every database transaction in real time, reducing data loss exposure to zero or near zero depending upon configuration.
Database-aware replication is key to achieving Gold service levels. It is network efficient. It enforces a high degree of isolation between replicated copies for optimal data protection and HA. It enables fast failover to an already synchronized and running copy of production. It achieves high ROI by enabling workloads to be offloaded from production to the replicated copy. As important as these tangible benefits are, there is the equally significant benefit of reducing risk. By running workloads at the replicated copy you are performing continuous application-level validation that it is ready for production when needed.
So what is left to address after Gold? There is a class of application where it is desirable to mask the effect of an outage from the end user. Imagine you are a customer in the process of making a purchase online: you don't want to be left in an uncertain state should there be a database outage. Did my purchase go through? Do I resubmit my payment, and if I do, will I be charged twice? From a data loss perspective, what if the DR site is hundreds or thousands of miles away? How do you guarantee zero data loss protection? Finally, this same class of application frequently cannot tolerate downtime for planned maintenance. How can you shrink maintenance windows to zero so that applications can be available at all times? To learn how to address this set of requirements, stay tuned for the final installment in this MAA series, when we cover the Platinum reference architecture.