Friday Dec 19, 2014

Oracle Global Data Services (GDS): Part 2 – Load Balancing Use Cases

Oracle Database 12c Global Data Services improves the utilization of replicated database resources. It provides connect-time and run-time load balancing, routing and service failover across replicated databases situated in any data center in any geographical region. With GDS, customers can achieve these capabilities without having to integrate their High Availability stack with hardware load balancers or write custom homegrown connection managers. GDS is included with the Active Data Guard license and is also available to Oracle GoldenGate customers at no additional charge.

In this blog we follow up on the introduction to GDS from Part 1 and walk through a couple of use cases for workload balancing:

1. The first use case (shown below) is load balancing for reader farms:

Imagine a scenario where GDS is enabled for an Active Data Guard or GoldenGate reader farm with replicas located in both local and remote data centers. Let’s say a Read Write global service for Order Entry runs on the Primary database and the Read Only global services for Reporting run on the reader farm. Using GDS, client connections are automatically load balanced among the Read Only global services running on the reader farm (across data centers). This improves resource utilization, performance and scalability for Read Only workloads on Active Data Guard or Oracle GoldenGate reader farms.

2. Another use case (as shown below) is load balancing of Read Write services among multi-masters within and across regions:

Let’s take a scenario of active/active databases using Oracle GoldenGate in a GDS configuration. In this case the Read Write and Read Only global services are both configured to run on each of the masters. For this scenario, GDS automatically balances the workloads for the Read Only and Read Write global services in the GoldenGate multi-master configuration.

This wraps up our exploration of key Oracle Database 12c GDS load balancing use cases. In the next installment of the GDS blog series (Part 3), we will take a look at a few more interesting use cases where GDS can help mitigate planned and unplanned downtime for applications.

Wednesday Dec 17, 2014

Oracle GoldenGate Active-Active Part 3

This is the last of three posts on Active-Active replication for OGG. It covers the actual usage of the CDR resolution routines, with examples of how they are built. Part 1 is located here, and part 2, here. I’ll cover two use cases: the first is timestamp based and the second is trusted source. As a refresher, timestamp resolution lets the record with the lowest timestamp win (i.e. whichever record came in first), while trusted source assumes that one system always takes precedence over another.
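To make the two policies concrete before turning to the OGG parameters, here is a minimal Python sketch of the decision each one makes when an incoming change conflicts with a row that already exists on the target. It is only an illustration of the rules described above, not how OGG implements them; the Row class, the function names and the site names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Row:
    key: int
    balance: float
    update_time: datetime
    origin: str  # hypothetical label for the system that produced this version


def resolve_by_timestamp(existing: Row, incoming: Row) -> Row:
    """Timestamp policy: the version with the lowest update_time wins."""
    # On a tie the existing row is kept, so the target is left untouched.
    return incoming if incoming.update_time < existing.update_time else existing


def resolve_by_trusted_source(existing: Row, incoming: Row, trusted: str) -> Row:
    """Trusted-source policy: a conflicting change from the trusted system
    overwrites the target; a conflicting change from any other system is
    discarded."""
    return incoming if incoming.origin == trusted else existing


# The incoming change from site_a is older than the row already on the target.
existing = Row(1, 100.0, datetime(2014, 12, 17, 10, 0, 5), origin="site_b")
incoming = Row(1, 250.0, datetime(2014, 12, 17, 10, 0, 1), origin="site_a")

by_time = resolve_by_timestamp(existing, incoming)
by_trust = resolve_by_trusted_source(existing, incoming, trusted="site_b")
print("timestamp policy keeps the row from", by_time.origin)
print("trusted-source policy keeps the row from", by_trust.origin)
```

With the timestamp policy the older incoming change replaces the target row; with site_b as the trusted system, the same incoming change is discarded.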

For these examples, I’m going to use macros, which makes it so much easier and cleaner to read, and it dramatically reduces the amount of typing I have to do.

My macro file will be called cdr_macros.prm. I normally wouldn’t want to mix trusted source and timestamp in the same environment, but I’m doing it here just as an example. In this macro file, I have included every CDR function that I want to use, on all systems, for both extracts and replicats. This way, if I need to change my CDR rules, I can make the change in the macro file and it affects the entire server. Just make sure to make the same change in each OGG environment. Inside each macro, there is a short description of what the command is going to be used for.

*********************************************************************************************************

MACRO #ExtractCdrDate
BEGIN
COMMENT This is used to ensure that the key columns + the UPDATE_TIME
COMMENT column is always brought over as part of the trail file record
GETBEFORECOLS (ON UPDATE KEYINCLUDING(UPDATE_TIME), ON DELETE KEYINCLUDING(UPDATE_TIME)) , FETCHCOLS (*)
END;
COMMENT END TO ExtractCdrDate

MACRO #ExtractCdrAllColunms
BEGIN
COMMENT This is used when I want to ensure that ALL columns are in the
COMMENT trail file for each record. It has a higher overhead, so be
COMMENT careful on how frequently it is used.
GETBEFORECOLS (ON UPDATE ALL, ON DELETE ALL), FETCHCOLS(*)
END;
COMMENT END TO ExtractCdrAllColunms

MACRO #DateCompare
BEGIN
COMMENT This is used when doing a timestamp resolution where the lowest
COMMENT timestamp wins.
COMPARECOLS (ON UPDATE KEYINCLUDING (UPDATE_TIME),ON DELETE KEYINCLUDING (UPDATE_TIME)),
RESOLVECONFLICT (UPDATEROWEXISTS, (mon_resolution_method, USEMIN (UPDATE_TIME), COLS(*)), (DEFAULT, DISCARD)),
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, USEMIN (UPDATE_TIME) , COLS(*))),
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO DateCompare

MACRO #FromTrusted
BEGIN
COMMENT This resolution is used on the non-trusted environment to
COMMENT allow operations from the trusted server to overwrite the existing
COMMENT data when there is a conflict.
COMPARECOLS (ON UPDATE ALL,ON DELETE ALL),
RESOLVECONFLICT (UPDATEROWEXISTS, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO FromTrusted

MACRO #FromNonTrusted
BEGIN
COMMENT This resolution is used to discard the record any time there is a
COMMENT conflict, when the record comes from the non-trusted server
COMPARECOLS (ON UPDATE ALL,ON DELETE ALL),
RESOLVECONFLICT (UPDATEROWEXISTS, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO FromNonTrusted

*********************************************************************************************************

Now that all the hard work is done, and I've defined my rules for both Extract and Replicat in the Macro file, I can easily add those in. In the Extract I simply modify my TABLE statements to include the additional macro to tell OGG which columns to write to the trail file.  In this case, I'm using the #ExtractCdr macros from the first part of the file to instruct OGG which columns to include in the trail file.  This ensures that the resolution routines always have the data they need to perform the specified resolution. 

TABLE DEMO.VCRYPT_ACCOUNTS , #ExtractCdrDate();
TABLE DEMO.VCRYPT_ACCOUNTS_HIST , #ExtractCdrAllColunms();

The changes to the MAP statements in the Replicat parameter file are just as simple: add the macros that were defined above.

MAP DEMO.VCRYPT_ACCOUNTS, TARGET DEMO.VCRYPT_ACCOUNTS, #DateCompare();
MAP DEMO.VCRYPT_ACCOUNTS_HIST, TARGET DEMO.VCRYPT_ACCOUNTS_HIST , #FromTrusted();

Using the macro method, it’s easy to identify which objects are using each conflict detection and resolution routine, and if you need to make a change, you can make it once in the macro file and it will affect every parameter file. The methodologies and best practices in my last few blog postings on Active-Active Replication in GoldenGate, together with the white paper here: http://www.oracle.com/us/products/middleware/data-integration/golden-gate-active-active-1887519.pdf should help you implement robust Active-Active replication.

Friday Nov 14, 2014

Oracle Database 12c: RMAN Enhancements

Hello folks,



In this blog, I am going to provide an overview of new Oracle Recovery Manager (RMAN) features that were introduced in Oracle Database 12c. 


As an integral part of Oracle Database, RMAN offers a complete and integrated backup and recovery solution to address a variety of operations – from routine to very complex.  RMAN has steadily evolved over the past 16 years – mainly because Oracle has listened to our customers. We’ve continued to incorporate many valuable enhancements, including the following features that were introduced with Oracle Database 12.1.0.1.

 1. Fine grained recovery

With Oracle Database 12c, you can use a simple RECOVER TABLE command to perform a point-in-time recovery of a table/partition without having to go through a manual point-in-time recovery process. This command automatically performs the following steps: creation of the auxiliary instance, table recovery, exporting of the object, and importing it into the production database.

2. Support for multitenant databases

Oracle Database 12c introduces a consolidation feature called Oracle Multitenant. This capability simplifies database consolidation and management by enabling many individual pluggable databases (PDBs) to be plugged into and supported within a container database (CDB). Data protection is greatly simplified because you can perform backup and recovery at the CDB level, which includes and protects all the associated PDBs. For additional flexibility, you can still choose to perform backup and recovery for an individual PDB or a selected group of PDBs.

3. Improved RMAN duplication (cloning) performance

Duplicating an Oracle database can be performed in many ways. Today, customers use either Oracle features such as RMAN DUPLICATE or storage-based snapshot and cloning technologies. RMAN duplication can be performed by using an existing backup or by directly duplicating the database using ACTIVE DUPLICATE. Prior to Oracle Database 12c, the ACTIVE DUPLICATE process used production database processes to send image copies across the network. This could be time-consuming because duplication time is directly proportional to the database size. With 12c, the duplication process uses backup sets instead of image copies. As a result, less data is transferred, because RMAN skips unused blocks, committed undo blocks, etc. Plus, you can use compression and multi-section options for even faster duplication. Moreover, auxiliary channels at the destination site PULL the backups over the network, as opposed to the PUSH method used prior to 12c.

4. Faster recovery in a Data Guard or Active Data Guard environment

You may already be aware of some cool RMAN features that are supported with Active Data Guard, for example direct Block Media Recovery from the standby. However, in the event of either primary or standby datafile corruption (e.g. due to media errors), the traditional recovery process would be to copy the backup over the network and perform a restore/recovery. With Oracle Database 12c, the new RMAN clause FROM SERVICE lets you perform restores directly from the standby or from the primary (depending on which site has issues). The restore creates a backup set on the healthy site and streams it over the network, which dramatically reduces the overall recovery time.

5. Expansion of Multi-section support

Prior to Oracle Database 12c, parallelizing the backup of a single data file using multi-section was only supported with a level 0 backup or a full backup set. From 12c, multi-section is also supported with incremental backups and with image copy backups.
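The idea behind multi-section can be sketched in a few lines of Python: one large data file is cut into fixed-size sections, and the sections are handed out to several channels that work in parallel. This is only a conceptual illustration under simple assumptions (sizes in MB, round-robin assignment, made-up channel names); it is not RMAN's actual scheduling logic.

```python
from typing import Dict, List, Tuple

Section = Tuple[int, int]  # (start_mb, end_mb), end exclusive


def split_into_sections(file_size_mb: int, section_size_mb: int) -> List[Section]:
    """Cut one data file into consecutive sections of at most section_size_mb."""
    if section_size_mb <= 0:
        raise ValueError("section size must be positive")
    sections = []
    start = 0
    while start < file_size_mb:
        end = min(start + section_size_mb, file_size_mb)
        sections.append((start, end))
        start = end
    return sections


def assign_to_channels(sections: List[Section], channels: List[str]) -> Dict[str, List[Section]]:
    """Hand sections to channels round-robin so they can be processed in parallel."""
    return {ch: sections[i::len(channels)] for i, ch in enumerate(channels)}


# A 2500 MB data file, 1000 MB sections, two hypothetical channels.
sections = split_into_sections(2500, 1000)
plan = assign_to_channels(sections, ["ch1", "ch2"])
for channel, work in plan.items():
    print(channel, work)
```

Without sections, a single large file is handled by one channel from start to finish; with sections, the same file keeps every available channel busy.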

6. Simplified cross-platform migration

Migrating the database from one platform to another can be performed in many ways. Oracle supports both database-level migration and tablespace-level migration. Database-level migration requires the endian type to be the same on the source and destination platforms. Using tablespace migration, you can migrate across platforms and across endian formats. Oracle 12c introduces new keywords, FROM PLATFORM and TO PLATFORM. Using these keywords, RMAN takes care of converting the endianness, so the overall process is simplified. Depending on the availability requirements, tablespace migration can be performed with either a long downtime or a reduced downtime process.
    

a) When using a longer downtime model, you place the tablespace(s) in read-only mode, take the full backup, and restore at the destination. You also take the metadata export of the tablespace at the source and then apply at the destination. Once you’re done, the tablespaces are made readable/writable at the destination.

b) When using a reduced downtime model, you can keep your source database running for a longer time by doing incremental backups to the destination. Only the last step involves the procedure mentioned in (a).

7. Separation of Duty

A new administrative privilege, SYSBACKUP, is introduced to separate backup administration tasks from SYSDBA. You can use this privilege to perform backup and recovery operations from either RMAN or SQL*Plus.



8. SQL interface in RMAN


Beginning with Oracle Database 12c, you no longer have to switch between the SQL*Plus interface and the RMAN interface. The RMAN interface now supports SQL commands, so you can run them directly from within RMAN.



I covered these topics in my Oracle OpenWorld 2014 RMAN presentation. The cloud backup solution, Oracle Database Backup Service, as well as our newly introduced Zero Data Loss Recovery Appliance are also covered in this presentation.


For further details, refer to Oracle Documentation.

If you would like me to provide further technical content on any of the above RMAN features or have questions, please register your comments below.

Wednesday Nov 05, 2014

Oracle MAA Part 4: Gold HA Reference Architecture

Welcome to the fourth installment in a series of blog posts describing MAA Best Practices that define four standard reference architectures for HA and data protection: BRONZE, SILVER, GOLD and PLATINUM. The objective of each reference architecture is to deploy an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost.

This article provides details for the Gold reference architecture.

Gold substantially raises the service level for business critical applications that cannot accept vulnerability to single points of failure. Gold builds upon Silver by using database replication technology to eliminate single points of failure and provide a much higher level of data protection and HA for all types of unplanned and planned outages.

An overview of Gold is provided in the figure below.

Gold delivers substantially enhanced service levels using the following capabilities:

Oracle Active Data Guard replaces backups used by Bronze and Silver reference architectures as the first line of defense against an unrecoverable outage of the production database. Recovery time (RTO) for outages caused by data corruption, database failure, cluster failure, and site failures is reduced to seconds or minutes with an accompanying data loss exposure (RPO) of zero or near zero depending upon configuration. While backups are no longer used for availability, they are still included in the Gold reference architecture for archival purposes and as an additional level of data protection.

Active Data Guard uses simple physical replication to maintain one or more synchronized copies (standby databases) of the production database (primary database). If the primary becomes unavailable for any reason, production is quickly failed over to the standby and availability is restored.

Active Data Guard offers a unique set of capabilities for availability and Oracle data protection that exceed other alternatives based upon storage remote-mirroring or other methods of database replication. These capabilities include:

  • Choice of zero data loss (sync) or near-zero data loss (async) disaster protection.
  • Transmission of database changes (redo) directly from the log buffer of the primary database, providing strong isolation from lower layer hardware and software faults.
  • Use of intimate knowledge of Oracle data block and redo structures to perform continuous Oracle data validation at the standby, further isolating the standby database from corruptions that can impact a primary database.
  • Native support for all Oracle data types and features combined with high performance capable of supporting all applications and workloads.
  • Manual or automatic failover to quickly transfer production to the standby database should the primary become unavailable for any reason.
  • Integrated application failover to quickly transition application connections to the new primary database after a failover has occurred.
  • Database rolling maintenance to reduce downtime and risk during planned maintenance.
  • High return on investment by offloading read-only workloads and backups to an Active Data Guard standby while it is being synchronized by the primary database.

Oracle GoldenGate logical replication is also included in the Gold reference architecture, either to complement Active Data Guard when performing planned maintenance or to use as an alternative replication mechanism for maintaining a synchronized copy (target database) of a production database (source database).

GoldenGate reads changes from disk at a source database, transforms the data into a platform independent file format, transmits the file to a target database, then transforms the data into SQL (updates, inserts, and deletes) native to a target database that is open read-write. The target database contains the same data, but is a different physical database from the source (for example, backups are not interchangeable). This enables GoldenGate to easily support heterogeneous environments across different hardware platforms and relational database management systems. This flexibility makes it ideal for a wide range of planned maintenance and other replication requirements.  GoldenGate can:

  • Efficiently replicate subsets of a source database to distribute data to other target databases. It can also be used to consolidate data into a single target database (for example, an Operational Data Store) from multiple source databases. This function of GoldenGate is relevant to each of the four MAA reference architectures and is complementary to the use of Active Data Guard.
  • Perform maintenance and migrations in a rolling manner for use-cases that cannot be supported using Data Guard replication. For example, Oracle GoldenGate enables replication from a source database running on a big-endian platform to a target database running on a little-endian platform. This enables cross-platform migration with the additional advantage of being able to reverse replication for fast fallback to the prior version after cutover. When used in this fashion GoldenGate is complementary to Active Data Guard.
  • Maintain a complete replica of a source database for high availability or disaster protection that is ready for immediate failover should the source database become unavailable. GoldenGate would be an alternative to Active Data Guard when used for this purpose. The primary use-case where you would use GoldenGate instead of Active Data Guard for complete database replication is when there is a requirement for the target database to be open read-write at all times (remember an Active Data Guard standby is open read-only).

Note that there are several trade-offs that must be accepted when using logical replication in place of Active Data Guard for data protection and availability:

  • Logical replication has additional pre-requisites and operational complexity.
  • Logical replication is inherently an asynchronous process and thus not able to provide zero data loss protection. Only Active Data Guard can provide zero data loss protection.
  • A logical copy is not a physical replica of the source database. Rather than offload backups to a standby, you must back up both source and target. Logical replication also cannot support the advanced data protection features that come with Data Guard physical replication: lost-write detection and automatic block repair.

Oracle Site Guard is optional in the Gold tier but is useful to reduce administrative overhead and the potential for human error. Site Guard enables administrators to automate the orchestration of switchover (a planned event) and failover (in response to an unplanned outage) of their complete Oracle environment - multiple databases and applications - between a production site and a remote disaster recovery site. Oracle Site Guard is included with the Oracle Enterprise Manager Life-Cycle Management Pack.

Oracle Site Guard offers the following benefits:

  • Reduction of errors due to prepared response to site failure. Recovery strategies are mapped out, tested, and rehearsed in prepared responses within the application. Once an administrator initiates a Site Guard operation for disaster recovery, human intervention is not required.
  • Coordination across multiple applications, databases, and various replication technologies. Oracle Site Guard automatically handles dependencies between different targets while starting or stopping a site. Site Guard integrates with Oracle Active Data Guard to coordinate multiple concurrent database failovers. Site Guard also integrates with storage remote mirroring that may be used for data that resides outside of the Oracle Database.
  • Faster recovery time. Oracle Site Guard automation minimizes time spent in the manual coordination of recovery activities. 

Gold = Better HA and Data Protection

Gold builds upon Silver by addressing all fault domains. Even in the worst cases of a complete cluster or site outage, database service can be resumed within seconds or minutes of a failure occurring. Gold eliminates the downtime and potential uncertainty of a restore from backup. Gold also eliminates data loss by protecting every database transaction in real-time.

Database-aware replication is key to achieving Gold service levels. It is network efficient. It enforces a high degree of isolation between replicated copies for optimal data protection and HA. It enables fast failover to an already synchronized and running copy of production. It achieves high ROI by enabling workloads to be offloaded from production to the replicated copy. As important as these tangible benefits are, there is the equally significant benefit of reducing risk. By running workloads at the replicated copy you are performing continuous application-level validation that it is ready for production when needed.

So what is left to address after Gold? There is a class of application where it is desirable to mask the effect of an outage from the end-user. Imagine you are the customer in the process of making a purchase online - you don't want to be left in an uncertain state should there be a database outage. Did my purchase go through, do I resubmit my payment, and if I do will I be charged twice? From a data loss perspective, what if the DR site is hundreds or thousands of miles away? How do you guarantee zero data loss protection? Finally, this same class of application frequently cannot tolerate downtime for planned maintenance - how can you shrink maintenance windows to zero so that applications can be available at all times? To learn how to address this set of requirements, stay tuned for the final installment in this MAA series, when we cover the Platinum reference architecture.

Wednesday Oct 15, 2014

Oracle Database 12c Global Data Services: Part 1 – Automated Workload Management for Replicated Databases

Introduction

Global Data Services is a key offering within Oracle’s Maximum Availability Architecture. It’s really a must-have for organizations that are using Oracle high availability technologies such as Active Data Guard or Oracle GoldenGate to replicate data across multiple databases. With automated workload balancing and service failover capabilities, GDS improves performance, availability, scalability, and manageability for all databases that are replicated within a data center and across the globe. And GDS boosts resource utilization, which really improves the ROI of Active Data Guard and GoldenGate investments. It does this in an integrated, automated way that no other technology can match. Plus it’s included with the Active Data Guard license - and since GoldenGate customers have the right to use Active Data Guard, it’s available to them at no additional charge as well.

Customer Challenges

Enterprises typically deploy replication technologies for various business requirements – high availability and disaster recovery, content localization and caching, scalability, performance optimization for local clients or for compliance in accordance with local laws. Oracle customers use Active Data Guard and Oracle GoldenGate to address all of these business requirements. They use Active Data Guard to distribute their Read-Only workload and GoldenGate to distribute not only Read workloads but also Read Write workloads across their replicated databases.

However, when you’re trying to optimize workload management across multiple database replicas, you run into challenges that extend beyond the capabilities of replication technology. That’s because customers are unable to manage replicated databases with a unified framework and instead have to deal with database silos from an application and DBA perspective.

Let’s look at a couple of the main problems with database silos.

  • The first is under-utilized resources – for example, when one replica cannot be leveraged to shoulder the workload of another over-utilized database. This leads to suboptimal resource utilization, which can adversely affect performance, availability and of course cost.
  • The other problem with silos is the inability to automatically fail over a service across databases - let’s say a production application workload is running against a particular replica. If that replica goes down due to an unplanned event, customers don’t have a mechanism that automatically and transparently relocates the Service to another available replica. When a replica fails that can lead to application outages.

Until the introduction of Oracle Global Data Services (GDS), there really wasn’t a way for enterprises to achieve service failover and load balancing across replicas out of the box with the Oracle stack. To address this, some customers have built their own homegrown connection managers and others have integrated their HA stack with hardware load balancers. But these solutions still don’t address all of the issues:

  • Manual load balancing using homegrown connection managers, for example, incurs huge development costs and yet cannot optimize performance and availability for replicated systems.
  • Special purpose network load balancers can help, but they introduce additional cost and complexity, and they still can’t offer database service failover and centralized workload management.

Global Data Services Overview

Global Data Services delivers automated workload management, which addresses all of these key pain points. It eliminates the need for custom connection managers and load balancers for database workloads.

With a newly created concept called Global Service, Oracle Global Data Services extends the familiar Oracle RAC-style connect-time and run-time load balancing, service failover and management capabilities beyond a single clustered database. Capabilities that were so far applicable only to a single database can now be applied to a set of replicated databases that may reside within or across datacenters. Customers can achieve these capabilities by simply setting the pertinent attributes of the Global Service.

https://blogs.oracle.com/MAA/resource/GDS.png

GDS sits between the application tier and the database tiers of the stack. It orchestrates the Service high availability, Service level load balancing and routing. Global Services run on the databases but are managed by GDS. GDS algorithms take into account DB instance load, network latency between data centers and the workload management policies (region affinity, load balancing goals, DB cardinality, DB role, replication lag tolerance) that the customers can configure. These workload management policies are enabled via the attributes of a given Global Service.

What are the key capabilities that are really unique to GDS?

1. For performance optimization, there’s region-based workload routing, which automatically routes workloads to the database closest to the clients. For example, what if the customer has a requirement that all the clients/applications closer to the North American data center need to be routed to the database in the North American data center? Likewise, European clients may need to be routed to the European database. GDS addresses this problem by managing this workload routing automatically.

2. In addition, GDS provides connect-time load balancing and supports run-time load balancing – another key performance advantage.

3. For higher application availability, GDS enables inter-database service failover. If a replica goes down as a result of a planned or unplanned event, GDS fails over the service to another replica.

4. And it also offers role based global services. GDS will make sure that the global services are always started on those databases whose database role matches the role specified for the service. For example, if Data Guard undergoes role transitions, the global services are relocated accordingly, maintaining availability requirements.

5. For improved data quality, there’s also replication lag-based workload routing. This capability routes read workloads to a Data Guard standby whose replication lag is within a customer-specified threshold that’s based on business needs.

6. By managing all of the resources of the replicas efficiently, customers are able to maximize their ROI because there are no longer any under-utilized servers.
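To show how these policies combine, here is a small Python sketch of a routing decision that applies role matching, the replication lag threshold, region affinity and load in turn. It is a simplified model of the capabilities listed above, not the GDS algorithm itself; the Replica fields, the pick_replica function and the sample farm are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    region: str
    role: str           # e.g. "PRIMARY" or "PHYSICAL_STANDBY"
    lag_seconds: float  # replication lag behind the primary
    load: float         # 0.0 = idle, 1.0 = saturated
    up: bool = True


def pick_replica(replicas: List[Replica], client_region: str,
                 service_role: str, max_lag_seconds: float) -> Optional[Replica]:
    """Choose a database for a new connection to a global service."""
    # Role-based services and lag tolerance: keep only running databases
    # whose role matches the service and whose lag is within the threshold.
    eligible = [r for r in replicas
                if r.up and r.role == service_role
                and r.lag_seconds <= max_lag_seconds]
    if not eligible:
        return None
    # Region affinity: prefer databases in the client's region, and fall
    # back to other regions (inter-database failover) if none qualifies.
    local = [r for r in eligible if r.region == client_region]
    pool = local or eligible
    # Load balancing: pick the least loaded candidate.
    return min(pool, key=lambda r: r.load)


farm = [
    Replica("prim_east", "east", "PRIMARY", 0.0, 0.7),
    Replica("sb_east1", "east", "PHYSICAL_STANDBY", 2.0, 0.6),
    Replica("sb_east2", "east", "PHYSICAL_STANDBY", 45.0, 0.1),
    Replica("sb_west1", "west", "PHYSICAL_STANDBY", 3.0, 0.2),
]

choice = pick_replica(farm, "east", "PHYSICAL_STANDBY", max_lag_seconds=10)
print(choice.name)
```

In this sample farm, a read-only client in the east region is sent to sb_east1: sb_east2 is less loaded but exceeds the lag threshold. If sb_east1 is marked down, the same call falls back to sb_west1.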

This wraps up the introductory blog post on Oracle Database 12c GDS. We looked at the challenges of workload management for replicated databases and how GDS addresses those challenges. In the next blog, we will review some of the key capabilities of GDS and the tangible business benefits.

Monday Sep 29, 2014

Oracle Reinvents Database Protection with Zero Data Loss Recovery Appliance

Reinventing Database Protection

During his opening keynote at Oracle OpenWorld 2014, Oracle Executive Chairman and Chief Technology Officer Larry Ellison announced the general availability of Oracle’s Zero Data Loss Recovery Appliance, the world's first and only engineered system designed specifically for Oracle Database protection. This massively scalable appliance delivers unparalleled data protection, efficiency, and scalability.

Unparalleled Data Protection

Oracle's Zero Data Loss Recovery Appliance (Recovery Appliance) is the first appliance ever to deliver zero data loss protection for critical Oracle Databases. When using today’s solutions to restore a database, businesses typically lose all data generated since the last backup—often hours to days of critical data.
The Recovery Appliance dramatically reduces the impact of backups on production servers and networks, virtually eliminating the need for lengthy backup windows. The cloud-scale architecture enables a single Recovery Appliance to manage the data protection requirements of thousands of databases, avoiding the cost and complexity of disparate backup systems. 

Key Messages:

  • Eliminates data loss: Unique database integration enables continuous transport of redo data to the appliance, providing real-time protection for the most recent transactions so that databases can be restored without data loss.
  • Eliminates production impact: Backup algorithms integrated into Oracle Database send only changed data to the appliance, minimizing production database impact, I/O traffic, and network load. All expensive backup processing is offloaded to the Appliance.
  • Offloads tape archival: The Recovery Appliance can directly archive database backups to low-cost tape storage, offloading production database servers. Archival operations can run both day and night to improve tape drive utilization.
  • Enables restore to any point in time: The database change data stored on the Appliance can be used to create virtual full database copies at any desired point in time.
  • Delivers cloud-scale protection: A single Recovery Appliance can serve the data protection requirements of thousands of databases in a data center or region. Capacity expands seamlessly to petabytes of storage, with no downtime.
  • Protects data from disasters: The Recovery Appliance can replicate data in real time to a remote Recovery Appliance or to Oracle Database Backup Cloud Service to protect business data from site outages. Database blocks are continuously validated to eliminate data corruption at any stage of transmission or processing.

Want to learn more?

What else is happening at Oracle OpenWorld 2014 that includes the Zero Data Loss Recovery Appliance? Check out the following sessions:

  • Monday, Sept. 29:  2:45 - 3:30 (CON7684) - Moscone South - 307

Oracle Zero Data Loss Recovery Appliance: A New Era in Data Protection

  • Tuesday, Sept. 30:  3:45 - 4:30 (CON7686) - Moscone South - 305

Oracle Zero Data Loss Recovery Appliance: Deployment Best Practices 

And don't forget to see the Zero Data Loss Recovery Appliance in action. Visit the Engineered Systems Showcase at Moscone Center North and take a tour with one of our product experts.


Monday Sep 22, 2014

Oracle Database High Availability at Oracle OpenWorld 2014 - Download Your Schedule Now !

Learn how to maximize High Availability and Data Protection for your Oracle Databases at Oracle OpenWorld 2014. This year's conference offers 15 Oracle Database High Availability sessions that cover the entire spectrum of the Oracle Maximum Availability Architecture (MAA) - from database-integrated backup and recovery to zero data loss disaster protection to zero downtime maintenance. Discover how Oracle MAA and the latest HA enhancements with Oracle Database 12c can help you meet your availability service level objectives, build a resilient foundation for Oracle Multitenant and deliver Database-as-a-Service for Cloud deployments.

OpenWorld 2014 also offers opportunities for you to deepen your knowledge of Oracle Database HA technologies through 5 demo stations. You'll gain one-on-one access to Oracle Database HA experts who can provide best practices guidance and walk you through real-world scenarios that illustrate the full capabilities of Oracle Maximum Availability Architecture.

Don't miss this excellent opportunity to learn how to deliver the right levels of availability and data protection for Oracle Databases that meet your specific business needs. Download the Focus on Oracle Database High Availability Document to keep track of all the HA sessions and demos at Oracle OpenWorld 2014.

Thursday Sep 11, 2014

Oracle MAA Part 3: Silver HA Reference Architecture

This is the third installment in a series of blog posts describing Oracle Maximum Availability Architecture (Oracle MAA) best practices that define four standard reference architectures for data protection and high availability: BRONZE, SILVER, GOLD and PLATINUM.  Each reference architecture uses an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost.

This article provides details for the Silver reference architecture.

Silver builds upon Bronze by adding clustering technology - either Oracle RAC or RAC One Node. This enables automatic failover if there is an unrecoverable outage of a database instance or a complete failure of the server on which it runs. Oracle RAC also delivers substantial benefit by eliminating downtime for many types of planned maintenance. It does this by performing maintenance in a rolling manner across Oracle RAC nodes so that services remain available at all times. As in the case of Bronze, RMAN provides database-optimized backups to protect data and restore availability should an outage prevent the cluster from being able to restart. An overview of Silver is provided in the figure below.

Silver HA Reference Architecture

Oracle RAC

Oracle RAC is an active-active clustering solution that provides instantaneous failover should there be an outage of a database instance or of the server on which it runs. A quick review of how Oracle RAC functions helps to understand its many benefits. There are two major components to any Oracle RAC cluster: Oracle Database instances and the Oracle Database itself.

  • A database instance is defined as a set of server processes and memory structures running on a single node (or server) which make a particular database available to clients.
  • The database is a particular set of shared files (data files, index files, control files, and initialization files) that reside on persistent storage, and together can be opened and used to read and write data.
  • Oracle RAC uses an active-active architecture that enables multiple database instances, each running on different nodes, to simultaneously read and write to the same database.

Oracle RAC is the MAA best practice for server HA and provides a number of advantages:

  • Improved HA: If a server or database instance fails, connections to surviving instances are not affected; connections to the failed instance are quickly failed over to surviving instances that are already running and open on other servers in the cluster.
  • Scalability: Oracle RAC is ideal for applications with high workloads or consolidated environments where scalability and the ability to dynamically add or reprioritize capacity are required. Additional servers, database instances, and database services can be provisioned online. The ability to easily distribute workload across the cluster makes Oracle RAC an ideal solution when Oracle Multitenant is used for database consolidation.
  • Reliable performance: Oracle Quality of Service (QoS) can be used to allocate capacity for high priority database services to deliver consistent high performance in database consolidated environments. Capacity can be dynamically shifted between workloads to quickly respond to changing requirements.
  • HA during planned maintenance: High availability is maintained by implementing changes in a rolling manner across Oracle RAC nodes. This includes hardware, OS, or network maintenance that requires a server to be taken offline; software maintenance to patch the Oracle Grid Infrastructure or database; or if a database instance needs to be moved to another server to increase capacity or balance the workload.

Oracle RAC One Node

RAC One Node provides an alternative to Oracle RAC when scalability and instant failover are not required. The RAC One Node license is one-half the price of Oracle RAC, providing a lower cost option when an RTO of minutes is sufficient for server outages.

RAC One Node is an active-passive failover technology. During normal operation it only allows a single database instance to be open at one time. If the server hosting the open instance fails, RAC One Node automatically starts a new database instance on a second node to quickly resume service.

RAC One Node provides several advantages over alternative active-passive clustering technologies.

  • It automatically responds to both database instance and server failures.
  • Oracle Database HA Services, Grid Infrastructure, and database listeners are always running on the second node. At failover time only the database instance and database services need to start, reducing the time required to resume service, and enabling service to resume in minutes. 
  • It provides the same advantages for planned maintenance as Oracle RAC. RAC One Node allows two active database instances during periods of planned maintenance to allow graceful migration of users from one node to another with zero downtime; database services remain available to users at all times.

Silver = Better HA

Silver represents a significant increase in HA compared to the Bronze reference architecture and is very well suited to a broad range of application requirements.  Oracle RAC immediately responds to an instance or server outage and reconnects users to surviving instances.

While Silver has substantial benefits, it is only one step above Bronze - there is still a much broader fault domain beyond instance or server failure. This includes events that can impact the availability of an entire cluster - data corruptions, storage array failures, bugs, human error, site outages, etc. There is also a class of application where the impact of outages must be completely transparent to the user. Stay tuned for future installments when we address this expanded set of requirements with the Gold and Platinum reference architectures.

Wednesday Jul 30, 2014

Backup to Oracle Cloud - Introduction to Oracle Database Backup Service

Backup and recovery of application data is the fundamental protection strategy for maintaining enterprise business continuity. I would be extremely surprised to hear of any enterprise that has never backed up its mission critical or business critical data. Any such scenario is basically a ticking time bomb.

Depending on the specific RTO (recovery time objective) and RPO (recovery point objective) for each database, different Oracle Maximum Availability Architecture (MAA) strategies can be deployed by the enterprise.

From the backup and recovery perspective, the following are general practice guidelines that customers typically follow to address RTO and RPO requirements:

•    Local Fast Recovery Area (FRA): Typically stores backups for up to 7 days
•    External Storage (NAS): up to 30 days
•    Tape media (if available): 1 to 6 months
•    Tape vaulting (offsite storage): months to years
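The tiers above can be read as a simple age-based lookup. The following is only an illustrative sketch: the tier labels, the function name and the 180-day cutoff used for "6 months" are assumptions made for the example, not part of any Oracle tooling.

```python
from datetime import date

# Upper age limit (in days) for each tier, following the guidelines above.
# The 180-day value for "6 months" and the short tier labels are assumptions.
TIERS = [
    (7, "FRA"),
    (30, "NAS"),
    (180, "tape"),
]


def storage_tier(backup_date: date, today: date) -> str:
    """Return the tier where a backup of this age would typically be kept."""
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    for max_age_days, tier in TIERS:
        if age_days <= max_age_days:
            return tier
    # Anything older than the tape window is assumed to be vaulted offsite.
    return "tape vault"


if __name__ == "__main__":
    today = date(2014, 7, 30)
    for taken in (date(2014, 7, 28), date(2014, 7, 10), date(2014, 3, 1), date(2012, 1, 1)):
        print(taken, "->", storage_tier(taken, today))
```

Real retention policies are usually driven by RTO and RPO per database rather than by age alone, so treat the cutoffs as placeholders.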

In addition to the above backup storage tiers, sophisticated organizations take additional precautions to avoid single site failure and to reduce load from production resources. MAA best practices, for example, recommend that copies of the backup data be stored in an offsite location.

But consider the following complications:

  • Other than tape vaulting, there is no alternative that enables complete physical offsite storage for short- and long-term backups.
  • Many IT shops don’t have the tape infrastructure required for long term archival. Hence they are restricted to using local disk backups or expensive backup appliances.
  • Organizations with multiple databases that have various RTO/RPO requirements may have certain 2nd or 3rd tier databases that never get backed up.
  • Due to compliance requirements, customers now have to store backups for many years. Storing large volumes of data on local disks can become prohibitively expensive.
  • Many enterprises don’t have the CAPEX budget in place to implement these additional data protection steps.
  • And almost ALL enterprises want a solution that’s operational right away.


So what’s the answer?

Introducing Oracle Database Backup Service - A Cloud Storage Solution for your Oracle Database Backups



Oracle Database Backup Service addresses the above needs by providing a low cost alternative for storing backups in an offsite location.  It is an Oracle Public Cloud object-based storage offering that enables you to store your on-premises or cloud-deployed database backups. You can use Oracle Database Backup Service as the Primary backup for 2nd or 3rd tier databases, or use the cloud backup as a secondary copy for long term archival requirements.

If you are familiar with Oracle Recovery Manager (RMAN), it should take only a few minutes for you to start backing up your database to the cloud. Here’s all you need to do:

1. Subscribe to the Oracle Database Backup Service.

  • This offering is available as a month-to-month or longer-term subscription (1, 2, or 3 years).  Note that the subscription model is subject to change.

2. Download the Oracle Database Cloud Backup Module from the OTN site.

  • Unzip the opc_installer.zip file, which includes a detailed README describing the steps to execute.

3. Run the installation procedure.

  • Provide your Oracle Public Cloud credentials, which are securely stored in an Oracle wallet with your database. The installation script also sets up the required configuration files.

4. Configure RMAN

  • By using CONFIGURE (persistent), SET or even BACKUP commands, you can instruct RMAN to use the backup service module for backups.

5. Start enabling your backups and restores.

  • Use regular RMAN BACKUP and RESTORE commands for backups and restores. All backup and recovery operations that use BACKUP SET mode are supported.
  • You can also perform backups from FRA and other disk-based backup locations to the cloud.


How does this process work?

The Oracle Database Cloud Backup Module (ODCBM) receives backup blocks from RMAN, splits them into 20MB chunks, and transmits them to the Oracle cloud. During the restore process, the same module retrieves the data from the cloud. The module is configured in RMAN as an SBT (tape) device.
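To picture the chunking step, here is a minimal sketch of splitting a backup stream into fixed-size pieces and reassembling them on restore. It is not the module's actual implementation: the function names are hypothetical, and treating 20MB as 20 × 1024 × 1024 bytes is an assumption.

```python
import io
from typing import BinaryIO, Iterable, Iterator

# 20MB as mentioned above; the binary interpretation is an assumption.
CHUNK_SIZE = 20 * 1024 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until end of stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Concatenate chunks in order, as a restore would do after fetching them."""
    return b"".join(chunks)


if __name__ == "__main__":
    backup_piece = bytes(range(256)) * 10  # 2560 bytes standing in for a backup piece
    chunks = list(iter_chunks(io.BytesIO(backup_piece), chunk_size=1000))
    print([len(c) for c in chunks])
    print(reassemble(chunks) == backup_piece)
```

The demo uses a small chunk size so it runs instantly; in a real transfer each chunk would be uploaded as a separate object, which is what lets multiple RMAN channels work in parallel.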



What are some unique features that Oracle Database Backup Service offers?

To name a few:

  • End-to-end security (RMAN encryption is performed at backup time and data is securely transmitted over WAN).  And by the way, you don’t have to purchase the Advanced Security Option (ASO) to use RMAN encryption. You can use password-based, Transparent Data Encryption (TDE), or dual-mode encryption. Encryption is supported for EE, SE, and SE1 editions.
  • Backups can be compressed to reduce the volume of data being transmitted. For Oracle Database 10gR2 and 11gR1, you can use BASIC compression. For 11gR2 and above, you can choose from LOW, MEDIUM, BASIC, and HIGH.
  • There’s NO ADDITIONAL COST other than the subscription to Oracle Database Backup Service.
  • You can use any number of RMAN channels to parallelize your backup and restore operations.
  • There are NO new commands to learn. Use the familiar RMAN commands.
  • Because a large portfolio of applications is already available in Oracle Cloud, you can use your backup in the cloud to spin up a new instance or use it for your other PaaS or SaaS requirements.

So what are you waiting for?  Do you want to check the network throughput before you sign up? Start with a no-obligation one month trial by clicking “Try Now” from https://cloud.oracle.com/database_backup.

In future blogs on Oracle Database Backup Service, I will discuss some best practices when deploying cloud-based backups.

Welcome to the MAA Blog!!

Welcome to the MAA blog! This set of blogs is created and maintained by members of Oracle’s Maximum Availability Architecture (MAA) team within Oracle’s Server Technology Development group. The MAA team interacts with Oracle’s customers around the world on various critical high availability (HA) initiatives, and with this blog forum, we hope to bring you musings on some of the rich experiences we have gained to date. Our goal is to enrich the Oracle ecosystem with an interesting, informative and interactive conversation around Oracle MAA.

Please refer to the MAA website on OTN, http://www.oracle.com/goto/maa, for the latest collection of best practices for Oracle MAA.

Ashish Ray


Wednesday Jul 16, 2014

Oracle MAA Part 2: Bronze HA Reference Architecture

In the first installment of this series we discussed how one size does not fit all when it comes to HA architecture. We described Oracle Maximum Availability Architecture (Oracle MAA) best practices that define four standard reference architectures for data protection and high availability: BRONZE, SILVER, GOLD and PLATINUM.  Each reference architecture uses an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost. As you progress from one level to the next, each architecture expands upon the one that preceded it in order to handle an expanded fault domain and deliver a higher level of service.

This article provides details for the Bronze reference architecture.

Bronze is appropriate for databases where simple restart or restore from backup is ‘HA enough’. It uses single instance Oracle database (no cluster) to provide a very basic level of HA and data protection in exchange for reduced cost and implementation complexity. An overview is provided in the figure below.

Bronze Reference Architecture

When a database instance or the server on which it is running fails, the recovery time objective (RTO) is a function of how quickly the database can be restarted and resume service. If a database is unrecoverable, the RTO becomes a function of how quickly a backup can be restored. In a worst case scenario of a complete site outage, additional time is required to provision new systems and perform these tasks at a secondary location. In some cases this can take days.

The potential data loss if there is an unrecoverable outage (recovery point objective or RPO) is equal to the data generated since the last backup was taken. Copies of database backups are retained locally and at a remote location or on the Cloud for the dual purpose of archival and DR should a disaster strike the primary data center.

Major components of the Bronze reference architecture and the service levels achieved include:

Oracle Database HA and Data Protection

  • Oracle Restart automatically restarts the database, the listener, and other Oracle components after a hardware or software failure or whenever a database host computer restarts.
  • Oracle corruption protection checks for physical corruption and logical intra-block corruptions. In-memory corruptions are detected and prevented from being written to disk and in many cases can be repaired automatically. For more details see Preventing, Detecting, and Repairing Block Corruption.
  • Automatic Storage Management (ASM) is an Oracle-integrated file system and volume manager that includes local mirroring to protect against disk failure.
  • Oracle Flashback Technologies provide fast error correction at a level of granularity that is appropriate to repair an individual transaction, a table, or the full database.
  • Oracle Recovery Manager (RMAN) enables low-cost, reliable backup and recovery optimized for the Oracle Database.
  • Online maintenance includes online redefinition and reorganization for database maintenance, online file movement, and online patching. 

Database Consolidation

  • Databases deployed using Bronze often include development and test databases and databases supporting smaller work group and departmental applications that are often the first candidates for database consolidation.
  • Oracle Multitenant is the MAA best practice for database consolidation from Oracle Database 12c onward.

Life Cycle Management

  • Oracle Enterprise Manager Cloud Control enables self service deployment of IT resources for business users along with resource pooling models that cater to various multitenant architectures. It supports Database as a Service (DBaaS), a paradigm in which end users (Database Administrators, Application Developers, Quality Assurance Engineers, Project Leads, and so on) can request database services, consume them for the lifetime of the project, and then have them automatically de-provisioned and returned to the resource pool.

Oracle Engineered Systems

  • Oracle Engineered Systems are an efficient deployment option for database consolidation and DBaaS. Oracle Engineered Systems reduce lifecycle cost by standardizing on a pre-integrated and optimized platform for Oracle Database that is completely supported by Oracle.

Bronze Summary:  Data Protection, RTO, and RPO

Table 1 summarizes the data protection capabilities and service levels provided by the Bronze tier. The first column indicates when validations for physical and logical corruption are performed:

  • Manual checks are initiated by the administrator or at regular intervals by a scheduled job.
  • Runtime checks are automatically executed on a continuous basis by background processes while the database is open.
  • Background checks are run on a regularly scheduled interval, but only during periods when resources would otherwise be idle.
  • Each check is unique to Oracle Database using specific knowledge of Oracle data block and redo structures.

Table 1: Bronze - Data Protection

Type | Capability | Physical Block Corruption | Logical Block Corruption
Manual | Dbverify, Analyze | Physical block checks | Logical checks for intra-block and inter-object consistency
Manual | RMAN | Physical block checks during backup and restore | Intra-block logical checks
Runtime | Database | In-memory block and redo checksum | In-memory intra-block logical checks
Runtime | ASM | Automatic corruption detection and repair using local extent pairs |
Runtime | Exadata | HARD checks on write | HARD checks on write
Background | Exadata | Automatic HARD Disk Scrub and Repair |

Note that HARD validation and the Automatic Hard Disk Scrub and Repair (the last two rows of Table 1) are unique to Exadata storage. HARD validation ensures that Oracle Database does not write physically corrupt blocks to disk. Automatic Hard Disk Scrub and Repair inspects and repairs hard disks with damaged or worn out disk sectors (cluster of storage) or other physical or logical defects periodically when there are idle resources.

Table 2 summarizes RTO and RPO for the Bronze tier for various unplanned and planned outages.

Table 2: Bronze - Recovery Time and Data Loss Potential

Type | Event | Downtime | Data Loss Potential
Unplanned | Database instance failure | Minutes | Zero
Unplanned | Recoverable server failure | Minutes to an hour | Zero
Unplanned | Data corruptions, unrecoverable server failure, database failures, or site failures | Hours to days | Since last backup
Planned | Online file move, online reorganization and redefinition, online patching | Zero | Zero
Planned | Hardware or operating system maintenance and database patches that cannot be performed online | Minutes to hours | Zero
Planned | Database upgrades: patch sets and full database releases | Minutes to hours | Zero
Planned | Platform migrations | Hours to a day | Zero
Planned | Application upgrades that modify back-end database objects | Hours to days | Zero

So when would you use Bronze?  Bronze is useful when users can wait for a backup to be restored if there is an unrecoverable outage and accept that any data generated since the last backup was taken will be lost. The Oracle Database has a number of included capabilities described above that provide unique levels of data protection and availability for a low-cost environment based upon the Bronze reference architecture.

But what if you can't accept this level of downtime or data loss potential? That is where the Silver, Gold and Platinum reference architectures come in. Bronze is only a starting point that establishes the foundation for subsequent HA reference architectures that provide higher quality of service. Stay tuned for future blog posts that will dive into the details of each reference architecture.

Wednesday Jul 02, 2014

Oracle GoldenGate Active-Active Part 2

My last post (https://blogs.oracle.com/MAA/entry/oracle_goldengate_active_active_part) focused on whether or not an application's database structure was set up sufficiently to perform conflict detection and resolution in active-active GoldenGate environments. Assuming that your application structure is ready, I'll now explain how to actually prevent conflicts from happening in the first place. While this is ideal, I don't think conflict prevention is something we could ever guarantee... especially when a fault or hiccup occurs in either the database or GoldenGate itself.

Let's break up conflicts into 3 types, based on the DML: 

1. Inserts

2. Deletes

3. Updates 

1. Insert conflicts typically occur when two rows have the same primary key or when there are duplicate unique keys within a table. 

· Two rows with the same primary key: To address these cases we could have primary keys generated based on a sequence value, then set up something like alternating sequences. Depending on how many nodes or servers are in the environment, you could use an algorithm that starts with n and increments by N (where n is the node or server number and N is the total number of nodes or servers). For example, in a 2-way scenario, one side would have odd sequence values (start with 1 and increment by 2) and the other would have even sequence values (start with 2 and increment by 2).

· Duplicate unique keys: Avoiding conflicts in tables that have duplicate unique keys is a little trickier, and sometimes must be managed from the application perspective.  For example, let's say for a particular application that we have a table that contains login information for an account.  We would want the login name to be a unique value.  However it is possible that two people working on two different servers could attempt to obtain the same login name.  These kinds of operations can be eliminated if we restrict new account creation to a single server, thereby letting the database handle the uniqueness of a column. 
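The alternating-sequence idea (start with n, increment by N) is easy to check with a small sketch. This Python generator only models the arithmetic; in the database itself the same effect would normally come from a sequence on each node defined with a different START WITH value and a common INCREMENT BY. The function name is hypothetical.

```python
from itertools import count, islice
from typing import Iterator


def node_sequence(node_number: int, total_nodes: int) -> Iterator[int]:
    """Key generator for one node: starts at node_number, steps by total_nodes."""
    if total_nodes < 1 or not 1 <= node_number <= total_nodes:
        raise ValueError("node_number must be between 1 and total_nodes")
    return count(start=node_number, step=total_nodes)


if __name__ == "__main__":
    # 2-way scenario: node 1 generates odd keys, node 2 generates even keys.
    print(list(islice(node_sequence(1, 2), 5)))
    print(list(islice(node_sequence(2, 2), 5)))
```

Because every node walks a different residue class modulo N, two nodes never generate the same key, no matter how far replication lags. Adding a node later changes N, so it is worth leaving headroom (for example, stepping by a larger N than the current node count).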

2. Delete conflicts are usually nothing to worry about. In most cases, they occur when two people are attempting to delete the same record, or when someone tries to update a record that has already been deleted.  These conflicts can usually just be ignored.  However, I typically recommend that customers keep track of these types of conflicts in an exception table, just to make sure that nothing out of the ordinary is occurring. Once you’ve confirmed that things are running smoothly you can eliminate the exception mapping and just ignore the conflicts completely.

3. Update conflicts are definitely the most prevalent.  These conflicts occur when two people try to update the same logical record on two different servers.  A typical example is when a customer is on the phone with support to change something associated with his or her credit card. At the same time, the customer is also logged into the account and is trying to change his or her address.  If these activities occur on two different servers and the lag is high enough, it could cause a conflict. In order to reduce or eliminate these conflicts there are a few best practices to follow: 

1) Reduce the Oracle GoldenGate (OGG) lag to the lowest level possible.  There are a few knowledge tickets on this. The master note is Main Note - Oracle GoldenGate - Lag, Performance, Slow and Hung Processes (Doc ID 1304557.1)

2) Logically partition users based upon geographical regions or usernames.  For example, when all users in North America access one server, and users in Europe access a different server, the chance of two people updating the same logical record on two different machines is greatly reduced.  Another option is to split up the users based on their usernames. Even something as simple as setting up usernames A-M to log into one server and usernames N-Z to log into another server can help reduce conflicts.   The reason this helps is related to my next point...

3) Set up Session Persistence time. IP or Session Persistence is the ability of a load balancer or router to keep track of where a connection is sent. In the event that a connection is lost, disconnected, etc, and a user attempts to reconnect or log back in, the connection will be sent to the same server where it was originally connected.  Most sessions have a time value that can be associated with this persistence. For example, if I set my session persistence to 10 seconds, then any time a session is disconnected or killed, the user will be sent to the same server as long as he or she logs back in within 10 seconds.  This is ideal for Oracle GoldenGate environments, where there would be lag between the different databases. In an ideal situation you would set this session persistence time value to be twice the average lag or 20 seconds – whichever is higher.  This allows a user who is filling a shopping cart or booking a reservation to maintain a consistent view of the data, even in the event of a client or network failure. 
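Practices 2) and 3) can be sketched in a few lines. This is a sketch under stated assumptions rather than a load balancer configuration: the server labels and function names are hypothetical, and sending usernames that do not start with a letter to the second server is an arbitrary choice made for the example.

```python
def persistence_window_seconds(average_lag_seconds: float,
                               floor_seconds: float = 20.0) -> float:
    """Twice the average replication lag or 20 seconds, whichever is higher."""
    if average_lag_seconds < 0:
        raise ValueError("average_lag_seconds must not be negative")
    return max(2 * average_lag_seconds, floor_seconds)


def server_for_username(username: str) -> str:
    """Route usernames A-M to one server and everything else to the other."""
    first = username.strip()[:1].upper()
    if not first:
        raise ValueError("username must not be empty")
    # Hypothetical labels; non-letter first characters fall through to the second server.
    return "server_a_to_m" if "A" <= first <= "M" else "server_n_to_z"


if __name__ == "__main__":
    for lag in (4, 15):
        print(f"average lag {lag}s -> persistence {persistence_window_seconds(lag)}s")
    for name in ("alice", "Mallory", "zoe"):
        print(name, "->", server_for_username(name))
```

With an average lag of 4 seconds the 20-second floor applies; at 15 seconds the window grows to 30 seconds, so a reconnecting user lands on the server that already holds their latest changes.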

By using these methods, the number of conflicts that actually occur can be drastically reduced, leading to a happier end user experience.  But even with the best intentions and preparation, not every conflict can be avoided. In my next post I will cover how to resolve such unavoidable conflicts. 

Thursday Jun 12, 2014

Oracle Data Protection: How Do You Measure Up? - Part 1

This is the first installment in a blog series, which examines the results of a recent database protection survey conducted by Database Trends and Applications (DBTA) Magazine.

All Oracle IT professionals know that a sound, well-tested backup and recovery strategy plays a foundational role in protecting their Oracle database investments, which in many cases, represent the lifeblood of business operations. But just how common are the data protection strategies used and the challenges faced across various enterprises? In January 2014, Database Trends and Applications Magazine (DBTA), in partnership with Oracle, released the results of its “Oracle Database Management and Data Protection Survey”. Two hundred Oracle IT professionals were interviewed on various aspects of their database backup and recovery strategies, in order to identify the top organizational and operational challenges for protecting Oracle assets.
Here are some of the key findings from the survey:

  • The majority of respondents manage backups for tens to hundreds of databases, representing total data volume of 5 to 50TB (14% manage 50 to 200 TB and some up to 5 PB or more).
  • About half of the respondents (48%) use HA technologies such as RAC, Data Guard, or storage mirroring; however, these technologies are deployed on only 25% of their databases (or less).
  • This indicates that backups are still the predominant method for database protection among enterprises. Weekly full and daily incremental backups to disk were the most popular strategy, used by 27% of respondents, followed by daily full backups, which are used by 17%. Interestingly, over half of the respondents reported that 10% or less of their databases undergo regular backup testing.

 A few key backup and recovery challenges resonated across many of the respondents:

  • Poor performance and impact on productivity (see Figure 1)
    • 38% of respondents indicated that backups are too slow, resulting in prolonged backup windows.
    • In a similar vein, 23% complained that backups degrade the performance of production systems.
  • Lack of continuous protection (see Figure 2)
    • 35% revealed that less than 5% of Oracle data is protected in real-time.
  • Management complexity
    • 25% stated that recovery operations are too complex. (see Figure 1)
    • 31% reported that backups need constant management. (see Figure 1)
    • 45% changed their backup tools as a result of growing data volumes, while 29% changed tools due to the complexity of the tools themselves.

Figure 1: Current Challenges with Database Backup and Recovery

Figure 2: Percentage of Organization’s Data Backed Up in Real-Time or Near Real-Time

In future blogs, we will discuss each of these challenges in more detail and bring insight into how the backup technology industry has attempted to resolve them.

Oracle Flashback Technologies - Overview

Oracle Flashback Technologies - Introduction

In his May 29th 2014 blog, my colleague Joe Meeks introduced Oracle Maximum Availability Architecture (MAA) and discussed both planned and unplanned outages. Let’s take a closer look at unplanned outages. These can be caused by physical failures (e.g., server, storage, network, file deletion, physical corruption, site failures) or by logical failures – cases where all components and files are physically available, but data is incorrect or corrupt. These logical failures are usually caused by human errors or application logic errors. This blog series focuses on these logical errors – what causes them and how to address and recover from them using Oracle Database Flashback.

In this introductory blog post, I’ll provide an overview of the Oracle Database Flashback technologies and will discuss the features in detail in future blog posts. Let’s get started.

We are all human beings (unless a machine is reading this), and making mistakes is a part of what we do…often what we do best! We “fat finger”, we spill drinks on keyboards, unplug the wrong cables, etc. In addition, many of us, in our lives as DBAs or developers, have observed, caused, or corrected one or more of the following unpleasant events:
  • Accidentally updated a table with wrong values!!
  • Performed a batch update that went wrong, due to logical errors in the code!!
  • Dropped a table!!
How do DBAs typically recover from these types of errors? Traditionally, the data needs to be restored and recovered to a point in time just before the error occurred (incomplete or point-in-time recovery). Moreover, depending on the type of fault, it’s possible that some services (or even the entire database) would have to be taken down during the recovery process.

Apart from error conditions, there are other questions that need to be addressed as part of the investigation. For example, what did the data look like in the morning, prior to the error? What were the various changes to the row(s) between two timestamps? Who performed the transaction and how can it be reversed?  

Oracle Database includes built-in Flashback technologies, with features that address these challenges and questions, and enable you to perform faster, easier, and more convenient recovery from logical corruptions.

History

Flashback Query, the first Flashback Technology, was introduced in Oracle 9i. It provides a simple, powerful and completely non-disruptive mechanism for data verification and recovery from logical errors, and enables users to view the state of data at a previous point in time.

Flashback Technologies were further enhanced in Oracle 10g, to provide fast, easy recovery at the database, table, row, and even at a transaction level.

Oracle Database 11g introduced an innovative method to manage and query long-term historical data with Flashback Data Archive. The 11g release also introduced Flashback Transaction, which provides an easy, one-step operation to back out a transaction. Oracle Database versions 11.2.0.2 and beyond further enhanced the performance of these features. Note that all the features listed here work without requiring any kind of restore operation.

In addition, Flashback features are fully supported with the new multitenant capabilities introduced with Oracle Database 12c.

Flashback Features

Oracle Flashback Database enables point-in-time-recovery of the entire database without requiring a traditional restore and recovery operation. It rewinds the entire database to a specified point in time in the past by undoing all the changes that were made since that time.

Oracle Flashback Table enables an entire table or a set of tables to be recovered to a point in time in the past.

Oracle Flashback Drop enables accidentally dropped tables and all dependent objects to be restored.

Oracle Flashback Query enables data to be viewed at a point-in-time in the past. This feature can be used to view and reconstruct data that was lost due to unintentional change(s) or deletion(s). This feature can also be used to build self-service error correction into applications, empowering end-users to undo and correct their errors.

Oracle Flashback Version Query offers the ability to query the historical changes to data between two points in time or system change numbers (SCNs).

Oracle Flashback Transaction Query enables changes to be examined at the transaction level. This capability can be used to diagnose problems, perform analysis, audit transactions, and even revert a transaction by running the undo SQL that the query returns.

Oracle Flashback Transaction is a procedure used to back out a transaction and its dependent transactions.

Flashback technologies eliminate the need for a traditional restore and recovery process to fix logical corruptions or make enquiries. Using these technologies, you can recover from the error in the same amount of time it took to generate the error. All the Flashback features can be accessed either via the SQL command line or via Enterprise Manager.

Most of the Flashback technologies depend on the available UNDO to retrieve older data. The following table describes the various Flashback technologies: their purpose, dependencies and situations where each individual technology can be used.  



Example Syntax


Error investigation related:

The purpose is to investigate what went wrong and what the values were at certain points in time.

Flashback Queries  ( select .. as of SCN | Timestamp )
   - Helps to see the value of a row/set of rows at a point in time

Flashback Version Queries  ( select .. versions between SCN | Timestamp and SCN | Timestamp)
  - Helps determine how the value evolved between certain SCNs or between timestamps

Flashback Transaction Queries (select .. XID=)
   - Helps to understand how the transaction caused the changes.
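To make the idea behind these investigative queries concrete, here is a toy Python model of how undo information lets you look at data "as of" an earlier SCN. This is only an illustration of the concept, not how Oracle implements Flashback Query internally; the `ToyTable` class, its methods, and the single-value rows are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory table that keeps an undo record for every change."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.scn = 0  # toy system change number, bumped on every write
        # Each undo record: (SCN of the change, key, value before the change).
        # A "before" value of None means the row did not exist yet.
        self.undo: List[Tuple[int, int, Optional[str]]] = []

    def write(self, key: int, value: str) -> int:
        """Insert or update a row, remembering the old value as undo."""
        self.scn += 1
        self.undo.append((self.scn, key, self.rows.get(key)))
        self.rows[key] = value
        return self.scn

    def as_of(self, scn: int) -> Dict[int, str]:
        """Rebuild the table contents as they were at the given SCN."""
        snapshot = dict(self.rows)
        # Walk the undo records newest-first and reverse every later change.
        for change_scn, key, old_value in reversed(self.undo):
            if change_scn <= scn:
                break
            if old_value is None:
                snapshot.pop(key, None)
            else:
                snapshot[key] = old_value
        return snapshot


table = ToyTable()
morning = table.write(1, "salary=100")
table.write(1, "salary=999")       # the accidental update
print(table.as_of(morning))        # the row as it looked before the mistake
```

The same sketch also shows why these features depend on available undo: once an undo record is gone, the older version can no longer be rebuilt.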

Error correction related:

The purpose is to fix the error and correct the problems.

Flashback Table  (flashback table .. to SCN | Timestamp)
  - To rewind the table to a particular timestamp or SCN to reverse unwanted updates

Flashback Drop (flashback table ..  to before drop )
  - To undrop or undelete a table

Flashback Database (flashback database to SCN  | Restore Point )
  - This is the rewind button for Oracle databases. You can revert the entire database to a particular point in time. It is a fast way to perform a PITR (point-in-time recovery).

Flashback Transaction (DBMS_FLASHBACK.TRANSACTION_BACKOUT(XID..))
  - To reverse a transaction and its related transactions

Advanced use cases

Flashback technology is integrated into Oracle Recovery Manager (RMAN) and Oracle Data Guard. So, apart from the basic use cases mentioned above, the following use cases are addressed using Oracle Flashback.

  • Block Media recovery by RMAN - to perform block level recovery
  • Snapshot Standby - where the standby is temporarily converted to a read/write environment for testing, backup, or migration purposes
  • Reinstate the old primary in a Data Guard environment – this avoids the need to restore an old backup and perform a recovery to make it a new standby.
  • Guaranteed Restore Points - to bring back the entire database to an older point-in-time in a guaranteed way.

and so on.

I hope this introductory overview helps you understand how Flashback features can be used to investigate and recover from logical errors. As mentioned earlier, I will take a deeper dive into some of the critical Flashback features in my upcoming blogs and address common use cases.

Tuesday Jun 10, 2014

Oracle GoldenGate Active-Active Part 1

My name is Nick Wagner, and I'm a recent addition to the Oracle Maximum Availability Architecture (MAA) product management team. I've spent the last 15+ years working on database replication products, and I've spent the last 10 years working on the Oracle GoldenGate product. So most of my postings will probably be focused on OGG.


One question that comes up all the time is around active-active replication with Oracle GoldenGate.  How do I know if my application is a good fit for active-active replication with GoldenGate?   To answer that, it really comes down to how you plan on handling conflict resolution. 


I will delve into topology and deployment in a later blog, but here is a simple architecture:


Active-Active Architecture


The two most common resolution routines are host based resolution and timestamp based resolution.


Host based resolution is used less often, but works with the fewest application changes. Think of it like this: any transactions from SystemA always take precedence over any transactions from SystemB. If a change arriving from SystemA conflicts with a record on SystemB, then the record from SystemA will overwrite it. If a change arriving from SystemB conflicts with a record on SystemA, then the incoming change will be ignored. It is quite a bit less restrictive than timestamp based resolution, and in most cases, as long as all the tables have primary keys, host based resolution will work just fine.
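The precedence rule can be sketched in a few lines of Python. This is a simplified model, not GoldenGate configuration: the `Change` class, the `apply_host_based` function and the idea of detecting a conflict by comparing the row's current value against the "before" value the source system saw are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Change:
    """A replicated update arriving from the peer system (hypothetical shape)."""
    source: str             # system where the change originated
    key: int                # primary key of the affected row
    before: Optional[str]   # value the source saw before its update
    after: str              # value the source wrote


def apply_host_based(rows: Dict[int, str], change: Change,
                     winner: str = "SystemA") -> str:
    """Apply an incoming change, letting the winner system take precedence."""
    current = rows.get(change.key)
    if current == change.before:
        # No conflict: the target row still matches what the source saw.
        rows[change.key] = change.after
        return "applied"
    if change.source == winner:
        # Conflict, but the change comes from the system that always wins.
        rows[change.key] = change.after
        return "overwritten"
    # Conflict, and the change comes from the losing system: leave the row alone.
    return "ignored"


# Both systems updated order 1, which originally had the value "pending".
system_a = {1: "cancelled"}
system_b = {1: "shipped"}

print(apply_host_based(system_b, Change("SystemA", 1, "pending", "cancelled")))
print(apply_host_based(system_a, Change("SystemB", 1, "pending", "shipped")))
print(system_a, system_b)   # both sides end up with SystemA's value
```

Note that the only thing the sketch needs from the data model is a primary key to find the matching row, which is why this approach asks so little of the application.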


Timestamp based resolution, on the other hand, is a little trickier. In this case, you can decide which record is overwritten based on timestamps. For example, does the older record get overwritten with the newer record? Or vice-versa? This method not only requires primary keys on every table, but it also requires every table to have a timestamp/date column that is updated each time a record is inserted or updated on the table. Most homegrown applications can be customized to include these requirements, but it's a little more difficult with 3rd party applications, and might even be impossible for large ERP type applications.
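Here is a minimal Python sketch of that decision, again as an illustration rather than actual GoldenGate syntax. The `Row` class, the `last_updated` column name, the `newest_wins` switch and the tie-breaking choice (keep the local row) are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Row:
    key: int                 # primary key, required to match the two versions
    value: str
    last_updated: datetime   # column maintained on every insert and update


def resolve_by_timestamp(local: Row, incoming: Row,
                         newest_wins: bool = True) -> Row:
    """Return the version of a conflicting row that should survive."""
    if incoming.last_updated == local.last_updated:
        # Tie: keeping the local row is an arbitrary choice for this sketch.
        return local
    incoming_is_newer = incoming.last_updated > local.last_updated
    # With newest_wins the newer row survives; otherwise the older one does.
    return incoming if incoming_is_newer == newest_wins else local


local = Row(7, "address=Main St", datetime(2014, 6, 10, 9, 0))
incoming = Row(7, "address=Oak Ave", datetime(2014, 6, 10, 9, 5))

print(resolve_by_timestamp(local, incoming).value)                     # newer row kept
print(resolve_by_timestamp(local, incoming, newest_wins=False).value)  # older row kept
```

The sketch makes the extra requirement visible: without a reliably maintained timestamp column on every table, there is nothing to compare.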


If your database has these features - whether it’s primary keys for host based resolution, or primary keys and timestamp columns for timestamp based resolution - then your application could be a great candidate for active-active replication. But table structure is not the only requirement. The other consideration applies when there is a conflict; i.e., do I need to perform any notification or track down the user that had their data overwritten? In most cases, I don't think it's necessary, but if it is required, OGG can always create an exceptions table that contains all of the overwritten transactions so that people can be notified. It's a bit of extra work to implement this type of option, but if the business requires it, then it can be done. Unless someone is constantly monitoring this exceptions table or has an automated process for dealing with exceptions, there will be a delay in getting a response back to the end user.
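As a rough picture of what an exceptions table buys you, the Python sketch below records every overwritten value so that the affected users can be found later. It is a conceptual model only; the `Replica` and `ExceptionRecord` classes, their fields and the `users_to_notify` helper are hypothetical and do not reflect the layout of a real OGG exceptions table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExceptionRecord:
    key: int
    overwritten_value: str   # the data that lost the conflict
    winning_value: str       # the data that replaced it
    losing_user: str         # who should hear about it


@dataclass
class Replica:
    rows: Dict[int, str] = field(default_factory=dict)
    owners: Dict[int, str] = field(default_factory=dict)  # last user per row
    exceptions: List[ExceptionRecord] = field(default_factory=list)

    def overwrite(self, key: int, new_value: str, new_user: str) -> None:
        """Apply the winning value, logging whatever it replaces."""
        if key in self.rows and self.rows[key] != new_value:
            self.exceptions.append(ExceptionRecord(
                key, self.rows[key], new_value,
                self.owners.get(key, "unknown")))
        self.rows[key] = new_value
        self.owners[key] = new_user


def users_to_notify(replica: Replica) -> List[str]:
    """Distinct users whose changes were overwritten, in sorted order."""
    return sorted({record.losing_user for record in replica.exceptions})


replica = Replica()
replica.overwrite(1, "qty=5", "alice")   # first write, nothing is lost
replica.overwrite(1, "qty=9", "bob")     # conflict resolved in bob's favor
print(users_to_notify(replica))
```

Something still has to call the equivalent of `users_to_notify` on a schedule, which is the monitoring or automation step mentioned above.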


Ideally, when setting up active-active resolution we can include some simple procedural steps or configuration options that can reduce, or in some cases eliminate, the potential for conflicts. This makes the whole implementation that much easier and more foolproof. I'll cover these in my next blog.

