Wednesday Nov 05, 2014

Oracle MAA Part 4: Gold HA Reference Architecture

Welcome to the fourth installment in a series of blog posts describing MAA Best Practices that define four standard reference architectures for HA and data protection: BRONZE, SILVER, GOLD and PLATINUM. The objective of each reference architecture is to deploy an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost.

This article provides details for the Gold reference architecture.

Gold substantially raises the service level for business critical applications that cannot accept vulnerability to single points of failure. Gold builds upon Silver by using database replication technology to eliminate single points of failure and to provide a much higher level of data protection and HA for all types of unplanned and planned outages.

An overview of Gold is provided in the figure below.

Gold delivers substantially enhanced service levels using the following capabilities:

Oracle Active Data Guard replaces backups used by Bronze and Silver reference architectures as the first line of defense against an unrecoverable outage of the production database. Recovery time (RTO) for outages caused by data corruption, database failure, cluster failure, and site failures is reduced to seconds or minutes with an accompanying data loss exposure (RPO) of zero or near zero depending upon configuration. While backups are no longer used for availability, they are still included in the Gold reference architecture for archival purposes and as an additional level of data protection.

Active Data Guard uses simple physical replication to maintain one or more synchronized copies (standby databases) of the production database (primary database). If the primary becomes unavailable for any reason, production is quickly failed over to the standby and availability is restored.

Active Data Guard offers a unique set of availability and data protection capabilities for Oracle Database that exceed alternatives based on storage remote mirroring or other methods of database replication. These capabilities include:

  • Choice of zero data loss (sync) or near-zero data loss (async) disaster protection.
  • Transmission of database changes (redo) directly from the log buffer of the primary database, providing strong isolation from lower-layer hardware and software faults.
  • Use of intimate knowledge of Oracle data block and redo structures to perform continuous Oracle data validation at the standby, further isolating the standby database from corruptions that can impact a primary database.
  • Native support for all Oracle data types and features combined with high performance capable of supporting all applications and workloads.
  • Manual or automatic failover to quickly transfer production to the standby database should the primary become unavailable for any reason.
  • Integrated application failover to quickly transition application connections to the new primary database after a failover has occurred.
  • Database rolling maintenance to reduce downtime and risk during planned maintenance.
  • High return on investment by offloading read-only workloads and backups to an Active Data Guard standby while it is being synchronized by the primary database.
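The first capability in the list, the choice between zero data loss (sync) and near-zero data loss (async), comes down to when a commit is acknowledged relative to the standby receiving the redo. The toy model below is a minimal sketch of that trade-off only. The `Primary`, `Standby` and `lost_on_failover` names are hypothetical, a plain integer stands in for a redo record, and none of this reflects how Data Guard redo transport is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Standby:
    """Toy standby: the change records (redo) it has received so far."""
    received: List[int] = field(default_factory=list)


@dataclass
class Primary:
    """Toy primary that ships every committed change to one standby.

    sync=True:  a commit is acknowledged only after the standby has the record.
    sync=False: the record is queued and shipped later, so the commit returns at once.
    """
    standby: Standby
    sync: bool
    committed: List[int] = field(default_factory=list)
    in_flight: List[int] = field(default_factory=list)

    def commit(self, seq: int) -> None:
        if self.sync:
            # Zero data loss: the standby has the record before the commit is acknowledged.
            self.standby.received.append(seq)
        else:
            # Near-zero data loss: acknowledge now, ship asynchronously.
            self.in_flight.append(seq)
        self.committed.append(seq)

    def ship_pending(self, n: int) -> None:
        """Ship up to n queued records, oldest first (async transport catching up)."""
        for _ in range(min(n, len(self.in_flight))):
            self.standby.received.append(self.in_flight.pop(0))


def lost_on_failover(primary: Primary) -> List[int]:
    """Commits acknowledged by the primary that the standby never received."""
    have = set(primary.standby.received)
    return [seq for seq in primary.committed if seq not in have]


if __name__ == "__main__":
    for sync in (True, False):
        p = Primary(standby=Standby(), sync=sync)
        for seq in range(1, 11):
            p.commit(seq)
            if seq % 4 == 0:
                p.ship_pending(4)  # no-op in sync mode; async transport catches up
        # The primary fails here and production fails over to the standby.
        print("sync " if sync else "async", lost_on_failover(p))
```

In this run the sync configuration loses nothing, while the async configuration loses the two commits (9 and 10) that were acknowledged but still queued when the primary failed. The exposure is bounded by how far transport lags behind, which is why async is described as near-zero rather than zero data loss.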

Oracle GoldenGate logical replication is also included in the Gold reference architecture, either to complement Active Data Guard when performing planned maintenance or to use as an alternative replication mechanism for maintaining a synchronized copy (target database) of a production database (source database).

GoldenGate reads changes from disk at a source database, transforms the data into a platform-independent file format, and transmits the file to a target database. It then transforms the data into SQL (updates, inserts, and deletes) native to a target database that is open read-write. The target database contains the same data but is a different physical database from the source (for example, backups are not interchangeable). This enables GoldenGate to easily support heterogeneous environments across different hardware platforms and relational database management systems. This flexibility makes it ideal for a wide range of planned maintenance and other replication requirements. GoldenGate can:

  • Efficiently replicate subsets of a source database to distribute data to other target databases. It can also be used to consolidate data into a single target database (for example, an Operational Data Store) from multiple source databases. This function of GoldenGate is relevant to each of the four MAA reference architectures and is complementary to the use of Active Data Guard.
  • Perform maintenance and migrations in a rolling manner for use-cases that cannot be supported using Data Guard replication. For example, Oracle GoldenGate enables replication from a source database running on a big-endian platform to a target database running on a little-endian platform. This enables cross-platform migration with the additional advantage of being able to reverse replication for fast fallback to the prior version after cutover. When used in this fashion GoldenGate is complementary to Active Data Guard.
  • Maintain a complete replica of a source database for high availability or disaster protection that is ready for immediate failover should the source database become unavailable. GoldenGate would be an alternative to Active Data Guard when used for this purpose. The primary use-case where you would use GoldenGate instead of Active Data Guard for complete database replication is when there is a requirement for the target database to be open read-write at all times (remember an Active Data Guard standby is open read-only).
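The apply side of logical replication described above turns each captured change into an ordinary INSERT, UPDATE or DELETE against the target. That is why the target can be a different physical database, or even a different platform. The sketch below shows only that final step under simplifying assumptions. `ChangeRecord` and `to_sql` are hypothetical names, the record layout is invented for illustration and is not GoldenGate's trail file format, and the `:name` bind placeholders are just notation inside strings, since nothing here executes SQL.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ChangeRecord:
    """Hypothetical platform-neutral change record (NOT GoldenGate's trail format)."""
    op: str                                    # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: Dict[str, object]                     # primary-key columns identifying the row
    after: Optional[Dict[str, object]] = None  # new non-key column values


def to_sql(rec: ChangeRecord) -> Tuple[str, Dict[str, object]]:
    """Turn one change record into a SQL statement plus bind values for the target.

    Assumes key columns are never updated, so key and non-key bind names don't clash.
    """
    where = " AND ".join(f"{c} = :{c}" for c in rec.key)
    if rec.op == "INSERT":
        cols = {**rec.key, **(rec.after or {})}
        names = ", ".join(cols)
        binds = ", ".join(f":{c}" for c in cols)
        return f"INSERT INTO {rec.table} ({names}) VALUES ({binds})", cols
    if rec.op == "UPDATE":
        sets = ", ".join(f"{c} = :{c}" for c in (rec.after or {}))
        return f"UPDATE {rec.table} SET {sets} WHERE {where}", {**(rec.after or {}), **rec.key}
    if rec.op == "DELETE":
        return f"DELETE FROM {rec.table} WHERE {where}", dict(rec.key)
    raise ValueError(f"unsupported operation: {rec.op}")
```

For example, `to_sql(ChangeRecord("UPDATE", "orders", {"id": 42}, {"status": "SHIPPED"}))` yields `"UPDATE orders SET status = :status WHERE id = :id"` with binds `{"status": "SHIPPED", "id": 42}`. Because only logical row changes cross the wire, the target's physical layout is independent of the source. This is the same property that rules out interchangeable backups, lost-write detection and automatic block repair.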

Note that there are several trade-offs that must be accepted when using logical replication in place of Active Data Guard for data protection and availability:

  • Logical replication has additional prerequisites and operational complexity.
  • Logical replication is inherently an asynchronous process and thus is not able to provide zero data loss protection. Only Active Data Guard can provide zero data loss protection.
  • A logical copy is not a physical replica of the source database. Rather than offloading backups to a standby, you must back up both source and target. Logical replication also cannot support the advanced data protection features that come with Data Guard physical replication: lost-write detection and automatic block repair.

Oracle Site Guard is optional in the Gold tier but is useful to reduce administrative overhead and the potential for human error. Site Guard enables administrators to automate the orchestration of switchover (a planned event) and failover (in response to an unplanned outage) of their complete Oracle environment - multiple databases and applications - between a production site and a remote disaster recovery site. Oracle Site Guard is included with the Oracle Enterprise Manager Life-Cycle Management Pack.

Oracle Site Guard offers the following benefits:

  • Reduction of errors due to prepared response to site failure. Recovery strategies are mapped out, tested, and rehearsed in prepared responses within the application. Once an administrator initiates a Site Guard operation for disaster recovery, human intervention is not required.
  • Coordination across multiple applications, databases, and various replication technologies. Oracle Site Guard automatically handles dependencies between different targets while starting or stopping a site. Site Guard integrates with Oracle Active Data Guard to coordinate multiple concurrent database failovers. Site Guard also integrates with storage remote mirroring that may be used for data that resides outside of the Oracle Database.
  • Faster recovery time. Oracle Site Guard automation minimizes time spent in the manual coordination of recovery activities. 
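Handling dependencies between targets while starting or stopping a site is essentially an ordering problem. A target should start only after everything it depends on is up, and it should stop before those dependencies go down. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9+). The target names, the `SITE` map and the `switchover_plan` helper are hypothetical. This is not Site Guard's configuration format or API, and the database role change is reduced to a single placeholder step.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical site definition: each target maps to the targets it depends on.
SITE: Dict[str, Set[str]] = {
    "app_server": {"db_sales", "db_hr", "app_files"},
    "db_sales": set(),   # database protected by Active Data Guard
    "db_hr": set(),      # database protected by Active Data Guard
    "app_files": set(),  # data outside the database, e.g. on remote-mirrored storage
}


def start_order(site: Dict[str, Set[str]]) -> List[str]:
    """Dependencies first, so nothing starts before what it relies on."""
    return list(TopologicalSorter(site).static_order())


def stop_order(site: Dict[str, Set[str]]) -> List[str]:
    """Reverse of the start order: dependents stop before their dependencies."""
    return list(reversed(start_order(site)))


def switchover_plan(site: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical planned switchover: stop production, switch roles, start at DR."""
    steps = [f"stop {t} at production" for t in stop_order(site)]
    steps.append("switch database roles (e.g. Data Guard switchover)")
    steps += [f"start {t} at DR site" for t in start_order(site)]
    return steps


if __name__ == "__main__":
    for step in switchover_plan(SITE):
        print(step)
```

Encoding the plan once and deriving the order mechanically is the point of a prepared response. The same definition drives every rehearsal and every real event, rather than relying on an administrator to remember the sequence under pressure.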

Gold = Better HA and Data Protection

Gold builds upon Silver by addressing all fault domains. Even in the worst cases of a complete cluster or site outage, database service can be resumed within seconds or minutes of a failure occurring. Gold eliminates the downtime and potential uncertainty of a restore from backup. Gold also eliminates data loss by protecting every database transaction in real-time.

Database-aware replication is key to achieving Gold service levels. It is network efficient. It enforces a high degree of isolation between replicated copies for optimal data protection and HA. It enables fast failover to an already synchronized and running copy of production. It achieves high ROI by enabling workloads to be offloaded from production to the replicated copy. As important as these tangible benefits are, there is the equally significant benefit of reducing risk. By running workloads at the replicated copy you are performing continuous application-level validation that it is ready for production when needed.

So what is left to address after Gold? There is a class of application where it is desirable to mask the effect of an outage from the end user. Imagine you are a customer in the middle of an online purchase: you don't want to be left in an uncertain state should there be a database outage. Did my purchase go through? Do I resubmit my payment, and if I do, will I be charged twice? From a data loss perspective, what if the DR site is hundreds or thousands of miles away: how do you guarantee zero data loss protection? Finally, this same class of application frequently cannot tolerate downtime for planned maintenance. How can you shrink maintenance windows to zero so that applications are available at all times? To learn how to address this set of requirements, stay tuned for the final installment in this MAA series, when we cover the Platinum reference architecture.

Thursday Sep 11, 2014

Oracle MAA Part 3: Silver HA Reference Architecture

This is the third installment in a series of blog posts describing Oracle Maximum Availability Architecture (Oracle MAA) best practices that define four standard reference architectures for data protection and high availability: BRONZE, SILVER, GOLD and PLATINUM.  Each reference architecture uses an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost.

This article provides details for the Silver reference architecture.

Silver builds upon Bronze by adding clustering technology - either Oracle RAC or RAC One Node. This enables automatic failover if there is an unrecoverable outage of a database instance or a complete failure of the server on which it runs. Oracle RAC also delivers substantial benefit by eliminating downtime for many types of planned maintenance. It does this by performing maintenance in a rolling manner across Oracle RAC nodes so that services remain available at all times. As in the case of Bronze, RMAN provides database-optimized backups to protect data and restore availability should an outage prevent the cluster from being able to restart. An overview of Silver is provided in the figure below.

Silver HA Reference Architecture

Oracle RAC

Oracle RAC is an active-active clustering solution that provides instantaneous failover should there be an outage of a database instance or of the server on which it runs. A quick review of how Oracle RAC functions helps to understand its many benefits. There are two major components to any Oracle RAC cluster: Oracle Database instances and the Oracle Database itself.

  • A database instance is defined as a set of server processes and memory structures running on a single node (or server) which make a particular database available to clients.
  • The database is a particular set of shared files (data files, index files, control files, and initialization files) that reside on persistent storage, and together can be opened and used to read and write data.
  • Oracle RAC uses an active-active architecture that enables multiple database instances, each running on different nodes, to simultaneously read and write to the same database.

Oracle RAC is the MAA best practice for server HA and provides a number of advantages:

  • Improved HA: If a server or database instance fails, connections to surviving instances are not affected; connections to the failed instance are quickly failed over to surviving instances that are already running and open on other servers in the cluster.
  • Scalability: Oracle RAC is ideal for applications with high workloads or consolidated environments where scalability and the ability to dynamically add or reprioritize capacity are required. Additional servers, database instances, and database services can be provisioned online. The ability to easily distribute workload across the cluster makes Oracle RAC an ideal solution when Oracle Multitenant is used for database consolidation.
  • Reliable performance: Oracle Quality of Service (QoS) can be used to allocate capacity for high priority database services to deliver consistent high performance in database consolidated environments. Capacity can be dynamically shifted between workloads to quickly respond to changing requirements.
  • HA during planned maintenance: High availability is maintained by implementing changes in a rolling manner across Oracle RAC nodes. This includes hardware, OS, or network maintenance that requires a server to be taken offline; software maintenance to patch the Oracle Grid Infrastructure or database; or if a database instance needs to be moved to another server to increase capacity or balance the workload.
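The "Improved HA" point above can be pictured as a simple rule. When one instance fails, sessions already on surviving instances are untouched, and only the failed instance's sessions move to instances that are already running. The following is a toy sketch of that rule, assuming a hypothetical `fail_instance` helper and a least-loaded placement policy chosen purely for illustration. It is not how Oracle RAC, its services, or client-side failover actually place connections.

```python
from typing import Dict, List


def fail_instance(conns: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Return a new instance -> sessions map after `failed` goes down.

    Sessions on surviving instances stay where they are; only the failed
    instance's sessions move, each to the currently least-loaded survivor.
    The input map is not modified.
    """
    survivors = {inst: list(sess) for inst, sess in conns.items() if inst != failed}
    if not survivors:
        # Losing the last instance means the whole cluster is down: outside Silver's fault domain.
        raise RuntimeError("no surviving instance: the cluster itself is down")
    for session in conns.get(failed, []):
        target = min(survivors, key=lambda inst: len(survivors[inst]))
        survivors[target].append(session)
    return survivors


if __name__ == "__main__":
    cluster = {"inst1": ["s1", "s2"], "inst2": ["s3"], "inst3": ["s4", "s5", "s6"]}
    print(fail_instance(cluster, "inst3"))
```

The `RuntimeError` branch mirrors the limit of the Silver tier. Clustering handles the loss of an instance or server, but if no instance can run, recovery falls back to restarting the cluster or restoring from backup.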

Oracle RAC One Node

RAC One Node provides an alternative to Oracle RAC when scalability and instant failover are not required. A RAC One Node license is half the price of an Oracle RAC license, providing a lower-cost option when an RTO of minutes is sufficient for server outages.

RAC One Node is an active-passive failover technology. During normal operation it only allows a single database instance to be open at one time. If the server hosting the open instance fails, RAC One Node automatically starts a new database instance on a second node to quickly resume service.

RAC One Node provides several advantages over alternative active-passive clustering technologies.

  • It automatically responds to both database instance and server failures.
  • Oracle Database HA Services, Grid Infrastructure, and database listeners are always running on the second node. At failover time only the database instance and database services need to start, reducing the time required to resume service, and enabling service to resume in minutes. 
  • It provides the same advantages for planned maintenance as Oracle RAC. RAC One Node allows two active database instances during periods of planned maintenance to allow graceful migration of users from one node to another with zero downtime; database services remain available to users at all times.

Silver = Better HA

Silver represents a significant increase in HA compared to the Bronze reference architecture and is very well suited to a broad range of application requirements.  Oracle RAC immediately responds to an instance or server outage and reconnects users to surviving instances.

While Silver has substantial benefits, it is only one step above Bronze - there is still a much broader fault domain beyond instance or server failure. This includes events that can impact the availability of an entire cluster - data corruptions, storage array failures, bugs, human error, site outages, etc. There is also a class of application where the impact of outages must be completely transparent to the user. Stay tuned for future installments when we address this expanded set of requirements with the Gold and Platinum reference architectures.

Wednesday Jul 16, 2014

Oracle MAA Part 2: Bronze HA Reference Architecture

In the first installment of this series we discussed how one size does not fit all when it comes to HA architecture. We described Oracle Maximum Availability Architecture (Oracle MAA) best practices that define four standard reference architectures for data protection and high availability: BRONZE, SILVER, GOLD and PLATINUM. Each reference architecture uses an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost. As you progress from one level to the next, each architecture expands upon the one that preceded it in order to handle an expanded fault domain and deliver a higher level of service.

This article provides details for the Bronze reference architecture.

Bronze is appropriate for databases where simple restart or restore from backup is ‘HA enough’. It uses single instance Oracle database (no cluster) to provide a very basic level of HA and data protection in exchange for reduced cost and implementation complexity. An overview is provided in the figure below.

Bronze Reference Architecture

When a database instance or the server on which it is running fails, the recovery time objective (RTO) is a function of how quickly the database can be restarted and resume service. If a database is unrecoverable, the RTO becomes a function of how quickly a backup can be restored. In the worst case of a complete site outage, additional time is required to provision new systems and perform these tasks at a secondary location; in some cases this can take days.

The potential data loss if there is an unrecoverable outage (recovery point objective, or RPO) is equal to the data generated since the last backup was taken. Copies of database backups are retained locally and at a remote location or in the cloud, for the dual purpose of archival and DR should a disaster strike the primary data center.
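The RPO rule in the paragraph above is simple arithmetic: the exposure is the time between the last backup and the failure. The sketch below works through it, assuming a hypothetical nightly backup schedule and assuming, as the worst case above does, that no redo generated after that backup survives the outage. The function name `bronze_data_loss` and the dates are illustrative only.

```python
from datetime import datetime, timedelta
from typing import List


def bronze_data_loss(backups: List[datetime], failure: datetime) -> timedelta:
    """Worst-case Bronze RPO: everything generated since the last backup before the failure.

    Assumes no redo generated after that backup survives the outage to roll forward.
    """
    prior = [b for b in backups if b <= failure]
    if not prior:
        raise ValueError("no backup precedes the failure; there is nothing to restore")
    return failure - max(prior)


if __name__ == "__main__":
    # Hypothetical schedule: a backup completes at 01:00 every night.
    nightly = [datetime(2014, 7, 15, 1, 0), datetime(2014, 7, 16, 1, 0)]
    print(bronze_data_loss(nightly, datetime(2014, 7, 16, 17, 30)))
```

With this schedule, a failure at 17:30 exposes 16 hours and 30 minutes of data (the script prints `16:30:00`). A failure just before the next nightly backup would expose close to a full day, so the backup interval sets the upper bound on Bronze data loss.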

Major components of the Bronze reference architecture and the service levels achieved include:

Oracle Database HA and Data Protection

  • Oracle Restart automatically restarts the database, the listener, and other Oracle components after a hardware or software failure or whenever a database host computer restarts.
  • Oracle corruption protection checks for physical corruption and logical intra-block corruptions. In-memory corruptions are detected and prevented from being written to disk and in many cases can be repaired automatically. For more details see Preventing, Detecting, and Repairing Block Corruption.
  • Automatic Storage Management (ASM) is an Oracle-integrated file system and volume manager that includes local mirroring to protect against disk failure.
  • Oracle Flashback Technologies provide fast error correction at a level of granularity that is appropriate to repair an individual transaction, a table, or the full database.
  • Oracle Recovery Manager (RMAN) enables low-cost, reliable backup and recovery optimized for the Oracle Database.
  • Online maintenance includes online redefinition and reorganization for database maintenance, online file movement, and online patching. 

Database Consolidation

  • Databases deployed using Bronze often include development and test databases and databases supporting smaller work group and departmental applications that are often the first candidates for database consolidation.
  • Oracle Multitenant is the MAA best practice for database consolidation from Oracle Database 12c onward.

Life Cycle Management

  • Oracle Enterprise Manager Cloud Control enables self-service deployment of IT resources for business users, along with resource pooling models that cater to various multitenant architectures. It supports Database as a Service (DBaaS), a paradigm in which end users (Database Administrators, Application Developers, Quality Assurance Engineers, Project Leads, and so on) can request database services, consume them for the lifetime of the project, and then have them automatically de-provisioned and returned to the resource pool.

Oracle Engineered Systems

  • Oracle Engineered Systems are an efficient deployment option for database consolidation and DBaaS. Oracle Engineered Systems reduce lifecycle cost by standardizing on a pre-integrated and optimized platform for Oracle Database that is completely supported by Oracle.

Bronze Summary: Data Protection, RTO, and RPO

Table 1 summarizes the data protection capabilities and service levels provided by the Bronze tier. The first column indicates when validations for physical and logical corruption are performed:

  • Manual checks are initiated by the administrator or at regular intervals by a scheduled job.
  • Runtime checks are automatically executed on a continuous basis by background processes while the database is open.
  • Background checks are run on a regularly scheduled interval, but only during periods when resources would otherwise be idle.
  • Each check is unique to Oracle Database using specific knowledge of Oracle data block and redo structures.

Table 1: Bronze - Data Protection

Type       | Capability        | Physical Block Corruption                                            | Logical Block Corruption
-----------|-------------------|----------------------------------------------------------------------|--------------------------------------------------------------
Manual     | Dbverify, Analyze | Physical block checks                                                | Logical checks for intra-block and inter-object consistency
Manual     | RMAN              | Physical block checks during backup and restore                      | Intra-block logical checks
Runtime    | Database          | In-memory block and redo checksum                                    | In-memory intra-block logical checks
Runtime    | ASM               | Automatic corruption detection and repair using local extent pairs  |
Runtime    | Exadata           | HARD checks on write                                                 | HARD checks on write
Background | Exadata           | Automatic Hard Disk Scrub and Repair                                 |

Note that HARD validation and Automatic Hard Disk Scrub and Repair (the last two rows of Table 1) are unique to Exadata storage. HARD validation ensures that Oracle Database does not write physically corrupt blocks to disk. Automatic Hard Disk Scrub and Repair periodically inspects hard disks when there are idle resources, and repairs damaged or worn-out disk sectors (clusters of storage) or other physical or logical defects.

Table 2 summarizes RTO and RPO for the Bronze tier for various unplanned and planned outages.

Table 2: Bronze - Recovery Time and Data Loss Potential

Type      | Event                                                                                           | Downtime           | Data Loss Potential
----------|-------------------------------------------------------------------------------------------------|--------------------|--------------------
Unplanned | Database instance failure                                                                       | Minutes            | Zero
Unplanned | Recoverable server failure                                                                      | Minutes to an hour | Zero
Unplanned | Data corruptions, unrecoverable server failure, database failures, or site failures            | Hours to days      | Since last backup
Planned   | Online file move, online reorganization and redefinition, online patching                       | Zero               | Zero
Planned   | Hardware or operating system maintenance and database patches that cannot be performed online  | Minutes to hours   | Zero
Planned   | Database upgrades: patch sets and full database releases                                        | Minutes to hours   | Zero
Planned   | Platform migrations                                                                             | Hours to a day     | Zero
Planned   | Application upgrades that modify back-end database objects                                     | Hours to days      | Zero
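One way to use Table 2 is to check it against a service level requirement and see which events Bronze cannot meet. The sketch below transcribes the table's rows as data and filters them. The `Outage` type, the `shortfalls` helper, and especially the `DOWNTIME_RANK` ordering of the table's downtime ranges (shortest to longest worst case) are assumptions made for illustration, not part of MAA guidance.

```python
from typing import List, NamedTuple


class Outage(NamedTuple):
    kind: str       # "Unplanned" or "Planned"
    event: str
    downtime: str   # as stated in Table 2
    data_loss: str  # as stated in Table 2


# Rows transcribed from Table 2 (Bronze).
BRONZE: List[Outage] = [
    Outage("Unplanned", "Database instance failure", "Minutes", "Zero"),
    Outage("Unplanned", "Recoverable server failure", "Minutes to an hour", "Zero"),
    Outage("Unplanned", "Data corruptions, unrecoverable server failure, database failures, "
           "or site failures", "Hours to days", "Since last backup"),
    Outage("Planned", "Online file move, online reorganization and redefinition, "
           "online patching", "Zero", "Zero"),
    Outage("Planned", "Hardware or operating system maintenance and database patches "
           "that cannot be performed online", "Minutes to hours", "Zero"),
    Outage("Planned", "Database upgrades: patch sets and full database releases",
           "Minutes to hours", "Zero"),
    Outage("Planned", "Platform migrations", "Hours to a day", "Zero"),
    Outage("Planned", "Application upgrades that modify back-end database objects",
           "Hours to days", "Zero"),
]

# Assumed ordering of Table 2's downtime ranges, shortest to longest worst case.
DOWNTIME_RANK = ["Zero", "Minutes", "Minutes to an hour", "Minutes to hours",
                 "Hours to a day", "Hours to days"]


def shortfalls(rows: List[Outage], max_downtime: str, require_zero_loss: bool) -> List[Outage]:
    """Events whose worst-case downtime or data loss exceeds the stated requirement."""
    limit = DOWNTIME_RANK.index(max_downtime)
    return [r for r in rows
            if DOWNTIME_RANK.index(r.downtime) > limit
            or (require_zero_loss and r.data_loss != "Zero")]


if __name__ == "__main__":
    for r in shortfalls(BRONZE, "Minutes", require_zero_loss=True):
        print(f"{r.kind:9} {r.downtime:18} {r.data_loss:17} {r.event}")
```

With a requirement of "minutes of downtime, zero data loss", six of the eight rows fall short; only instance failure and the online operations fit. That gap is what the higher tiers are built to close.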

So when would you use Bronze? Bronze is useful when users can wait for a backup to be restored if there is an unrecoverable outage, and can accept that any data generated since the last backup was taken will be lost. Oracle Database includes a number of capabilities, described above, that provide unique levels of data protection and availability for a low-cost environment based on the Bronze reference architecture.

But what if you can't accept this level of downtime or data loss potential? That is where the Silver, Gold and Platinum reference architectures come in. Bronze is only a starting point that establishes the foundation for subsequent HA reference architectures that provide a higher quality of service. Stay tuned for future blog posts that will dive into the details of each reference architecture.

Thursday May 29, 2014

Oracle MAA Part 1: When One Size Does Not Fit All

The good news is that Oracle Maximum Availability Architecture (MAA) best practices combined with Oracle Database 12c (see video) introduce first-in-the-industry database capabilities that truly make unplanned outages and planned maintenance transparent to users. The trouble with such good news is that Oracle’s enthusiasm in evangelizing its latest innovations may leave some to wonder if we’ve lost sight of the fact that not all database applications are created equal. After all, many databases don’t have the business requirements for high availability and data protection that require all of Oracle’s ‘stuff’. For many real-world applications, a controlled amount of downtime and/or data loss is OK if it saves money and effort.


Well, not to worry. Oracle knows that enterprises need solutions that address the full continuum of requirements for data protection and availability. Oracle MAA accomplishes this by defining four HA service level tiers: BRONZE, SILVER, GOLD and PLATINUM. The figure below shows the progression in service levels provided by each tier.




Each tier uses a different MAA reference architecture to deploy the optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost.  Each tier includes all of the capabilities of the previous tier and builds upon the architecture to handle an expanded fault domain.



  • Bronze is appropriate for databases where simple restart or restore from backup is ‘HA enough’. Bronze is based upon a single instance Oracle Database with MAA best practices that use the many capabilities for data protection and HA included with every Oracle Enterprise Edition license. Oracle-optimized backups using Oracle Recovery Manager (RMAN) provide data protection and are used to restore availability should an outage prevent the database from being able to restart.

  • Silver provides an additional level of HA for databases that require minimal or zero downtime in the event of database instance or server failure as well as many types of planned maintenance. Silver adds clustering technology - either Oracle RAC or RAC One Node. RMAN provides database-optimized backups to protect data and restore availability should an outage prevent the cluster from being able to restart.

  • Gold raises the game substantially for business critical applications that can’t accept vulnerability to single points-of-failure. Gold adds database-aware replication technologies, Active Data Guard and Oracle GoldenGate, which synchronize one or more replicas of the production database to provide real time data protection and availability. Database-aware replication greatly increases HA and data protection beyond what is possible with storage replication technologies. It also reduces cost while improving return on investment by actively utilizing all replicas at all times.

  • Platinum introduces all of the sexy new Oracle Database 12c capabilities that Oracle staff will gush over with great enthusiasm. These capabilities include Application Continuity for reliable replay of in-flight transactions that masks outages from users; Active Data Guard Far Sync for zero data loss protection at any distance; new Oracle GoldenGate enhancements for zero downtime upgrades and migrations; and Global Data Services for automated service management and workload balancing in replicated database environments. Each of these technologies requires additional effort to implement. But they deliver substantial value for your most critical applications where downtime and data loss are not an option.


The MAA reference architectures are inherently designed to address conflicting realities. On one hand, not every application has the same objectives for availability and data protection, hence the ‘One Size Does Not Fit All’ title of this blog post. On the other hand, standard infrastructure is an operational requirement and a business necessity in order to reduce complexity and cost.


MAA reference architectures address both realities by providing a standard infrastructure optimized for Oracle Database that enables you to dial-in the level of HA appropriate for different service level requirements. This makes it simple to move a database from one HA tier to the next should business requirements change, or from one hardware platform to another – whether it’s your favorite non-Oracle vendor or an Oracle Engineered System.


Please stay tuned for additional blog posts in this series that dive into the details of each MAA reference architecture.


Meanwhile, more information on Oracle HA solutions and the Maximum Availability Architecture can be found at:






About

Musings on Oracle's Maximum Availability Architecture (MAA), by members of Oracle Development team. Note that we may not have the bandwidth to answer generic questions on MAA.
