Thursday Jun 12, 2014

Oracle Data Protection: How Do You Measure Up? - Part 1

This is the first installment in a blog series, which examines the results of a recent database protection survey conducted by Database Trends and Applications (DBTA) Magazine.

All Oracle IT professionals know that a sound, well-tested backup and recovery strategy plays a foundational role in protecting their Oracle database investments, which in many cases, represent the lifeblood of business operations. But just how common are the data protection strategies used and the challenges faced across various enterprises? In January 2014, Database Trends and Applications Magazine (DBTA), in partnership with Oracle, released the results of its “Oracle Database Management and Data Protection Survey”. Two hundred Oracle IT professionals were interviewed on various aspects of their database backup and recovery strategies, in order to identify the top organizational and operational challenges for protecting Oracle assets.
Here are some of the key findings from the survey:

  • The majority of respondents manage backups for tens to hundreds of databases, representing total data volume of 5 to 50TB (14% manage 50 to 200 TB and some up to 5 PB or more).
  • About half of the respondents (48%) use HA technologies such as RAC, Data Guard, or storage mirroring, however these technologies are deployed on only 25% of their databases (or less).
  • This indicates that backups are still the predominant method for database protection among enterprises. Weekly full and daily incremental backups to disk were the most popular strategy, used by 27% of respondents, followed by daily full backups, which are used by 17%. Interestingly, over half of the respondents reported that 10% or less of their databases undergo regular backup testing.

 A few key backup and recovery challenges resonated across many of the respondents:

  • Poor performance and impact on productivity (see Figure 1)
    • 38% of respondents indicated that backups are too slow, resulting in prolonged backup windows.
    • In a similar vein, 23% complained that backups degrade the performance of production systems.
  • Lack of continuous protection (see Figure 2)
    • 35% revealed that less than 5% of Oracle data is protected in real-time.
  •  Management complexity
    • 25% stated that recovery operations are too complex. (see Figure 1)
    •  31% reported that backups need constant management. (see Figure 1)
    • 45% changed their backup tools as a result of growing data volumes, while 29% changed tools due to the complexity of the tools themselves.

Figure 1: Current Challenges with Database Backup and Recovery

Figure 2: Percentage of Organization’s Data Backed Up in Real-Time or Near Real-Time

In future blogs, we will discuss each of these challenges in more detail and bring insight into how the backup technology industry has attempted to resolve them.

Oracle Flashback Technologies - Overview

Oracle Flashback Technologies - Introduction

In his May 29th 2014 blog, my colleague Joe Meeks introduced Oracle Maximum Availability Architecture (MAA) and discussed both planned and unplanned outages. Let’s take a closer look at unplanned outages. These can be caused by physical failures (e.g., server, storage, network, file deletion, physical corruption, site failures) or by logical failures – cases where all components and files are physically available, but data is incorrect or corrupt. These logical failures are usually caused by human errors or application logic errors. This blog series focuses on these logical errors – what causes them and how to address and recover from them using Oracle Database Flashback.

In this introductory blog post, I’ll provide an overview of the Oracle Database Flashback technologies and will discuss the features in detail in future blog posts. Let’s get started.

We are all human beings (unless a machine is reading this), and making mistakes is a part of what we do…often what we do best!  We “fat finger”, we spill drinks on keyboards, unplug the wrong cables, etc.  In addition, many of us, in our lives as DBAs or developers, must have observed, caused, or corrected one or more of the following unpleasant events:
  • Accidentally updated a table with wrong values !!
  • Performed a batch update that went wrong - due to logical errors in the code !!
  • Dropped a table !!
How do DBAs typically recover from these types of errors? First, data needs to be restored and recovered to the point-in-time when the error occurred (incomplete or point-in-time recovery).  Moreover, depending on the type of fault, it’s possible that some services – or even the entire database – would have to be taken down during the recovery process.

Apart from error conditions, there are other questions that need to be addressed as part of the investigation. For example, what did the data look like in the morning, prior to the error? What were the various changes to the row(s) between two timestamps? Who performed the transaction and how can it be reversed?  

Oracle Database includes built-in Flashback technologies, with features that address these challenges and questions, and enable you to perform faster, easier, and convenient recovery from logical corruptions.


Flashback Query, the first Flashback Technology, was introduced in Oracle 9i. It provides a simple, powerful and completely non-disruptive mechanism for data verification and recovery from logical errors, and enables users to view the state of data at a previous point in time.

Flashback Technologies were further enhanced in Oracle 10g, to provide fast, easy recovery at the database, table, row, and even at a transaction level.

Oracle Database 11g introduced an innovative method to manage and query long-term historical data with Flashback Data Archive. The 11g release also introduced Flashback Transaction, which provides an easy, one-step operation to back out a transaction. Oracle Database versions and beyond further enhanced the performance of these features. Note that all the features listed here work without requiring any kind of restore operation.

In addition, Flashback features are fully supported with the new multi-tenant capabilities introduced with Oracle Database 12c,

Flashback Features

Oracle Flashback Database enables point-in-time-recovery of the entire database without requiring a traditional restore and recovery operation. It rewinds the entire database to a specified point in time in the past by undoing all the changes that were made since that time.

Oracle Flashback Table enables an entire table or a set of tables to be recovered to a point in time in the past.

Oracle Flashback Drop enables accidentally dropped tables and all dependent objects to be restored.

Oracle Flashback Query enables data to be viewed at a point-in-time in the past. This feature can be used to view and reconstruct data that was lost due to unintentional change(s) or deletion(s). This feature can also be used to build self-service error correction into applications, empowering end-users to undo and correct their errors.

Oracle Flashback Version Query offers the ability to query the historical changes to data between two points in time or system change numbers (SCN)

Oracle Flashback Transaction Query enables changes to be examined at the transaction level. This capability can be used to diagnose problems, perform analysis, audit transactions, and even revert the transaction by undoing SQL

Oracle Flashback Transaction is a procedure used to back-out a transaction and its dependent transactions.

Flashback technologies eliminate the need for a traditional restore and recovery process to fix logical corruptions or make enquiries. Using these technologies, you can recover from the error in the same amount of time it took to generate the error. All the Flashback features can be accessed either via SQL command line (or) via Enterprise Manager.  

Most of the Flashback technologies depend on the available UNDO to retrieve older data. The following table describes the various Flashback technologies: their purpose, dependencies and situations where each individual technology can be used.  

Example Syntax

Error investigation related:

The purpose is to investigate what went wrong and what the values were at certain points in time

Flashback Queries  ( select .. as of SCN | Timestamp )
   - Helps to see the value of a row/set of rows at a point in time

Flashback Version Queries  ( select .. versions between SCN | Timestamp and SCN | Timestamp)
  - Helps determine how the value evolved between certain SCNs or between timestamps

Flashback Transaction Queries (select .. XID=)
   - Helps to understand how the transaction caused the changes.

Error correction related:

The purpose is to fix the error and correct the problems,

Flashback Table  (flashback table .. to SCN | Timestamp)
  - To rewind the table to a particular timestamp or SCN to reverse unwanted updates

Flashback Drop (flashback table ..  to before drop )
  - To undrop or undelete a table

Flashback Database (flashback database to SCN  | Restore Point )
  - This is the rewind button for Oracle databases. You can revert the entire database to a particular point in time. It is a fast way to perform a PITR (point-in-time recovery).

  - To reverse a transaction and its related transactions

Advanced use cases

Flashback technology is integrated into Oracle Recovery Manager (RMAN) and Oracle Data Guard. So, apart from the basic use cases mentioned above, the following use cases are addressed using Oracle Flashback.

  • Block Media recovery by RMAN - to perform block level recovery
  • Snapshot Standby - where the standby is temporarily converted to a read/write environment for testing, backup, or migration purposes
  • Re-instate old primary in a Data Guard environment – this avoids the need to restore an old backup and perform a recovery to make it a new standby.
  • Guaranteed Restore Points - to bring back the entire database to an older point-in-time in a guaranteed way.

and so on..

I hope this introductory overview helps you understand how Flashback features can be used to investigate and recover from logical errors.  As mentioned earlier, I will take a deeper-dive into to some of the critical Flashback features in my upcoming blogs and address common use cases.

Tuesday Jun 10, 2014

Oracle GoldenGate Active-Active Part 1

My name is Nick Wagner, and I'm a recent addition to the Oracle Maximum Availability Architecture (MAA) product management team.  I've spent the last 15+ years working on database replication products, and I've spent the last 10 years working on the Oracle GoldenGate product.  So most of my posting will probably be focused on OGG. 

One question that comes up all the time is around active-active replication with Oracle GoldenGate.  How do I know if my application is a good fit for active-active replication with GoldenGate?   To answer that, it really comes down to how you plan on handling conflict resolution. 

I will delve into topology and deployment in a later blog, but here is a simple architecture:

Active-Active Architecture

The two most common resolution routines are host based resolution and timestamp based resolution.

Host based resolution is used less often, but works with the fewest application changes.  Think of it like this: any transactions from SystemA always take precedence over any transactions from SystemB.  If there is a conflict on SystemB, then the record from SystemA will overwrite it.  If there is a conflict on SystemA, then it will be ignored.  It is quite a bit less restrictive, and in most cases, as long as all the tables have primary keys, host based resolution will work just fine. 

Timestamp based resolution, on the other hand, is a little trickier. In this case, you can decide which record is overwritten based on timestamps. For example, does the older record get overwritten with the newer record?  Or vice-versa?  This method not only requires primary keys on every table, but it also requires every table to have a timestamp/date column that is updated each time a record is inserted or updated on the table.  Most homegrown applications can always be customized to include these requirements, but it's a little more difficult with 3rd party applications, and might even be impossible for large ERP type applications. 

If your database has these features - whether it’s primary keys for host based resolution, or primary keys and timestamp columns for timestamp based resolution - then your application could be a great candidate for active-active replication.  But table structure is not the only requirement.  The other consideration applies when there is a conflict; i.e., do I need to perform any notification or track down the user that had their data overwritten?  In most cases, I don't think it's necessary, but if it is required, OGG can always create an exceptions table that contains all of the overwritten transactions so that people can be notified. It's a bit of extra work to implement this type of option, but if the business requires it, then it can be done. Unless someone is constantly monitoring this exception table or has an automated process in dealing with exceptions, there will be a delay in getting a response back to the end user.

Ideally, when setting up active-active resolution we can include some simple procedural steps or configuration options that can reduce, or in some cases eliminate the potential for conflicts.  This makes the whole implementation that much easier and foolproof.  And I'll cover these in my next blog. 

Tuesday Jun 03, 2014

How to Achieve Real-Time Data Protection and Availabilty....For Real

There is a class of business and mission critical applications where downtime or data loss have substantial negative impact on revenue, customer service, reputation, cost, etc. Because the Oracle Database is used extensively to provide reliable performance and availability for this class of application, it also provides an integrated set of capabilities for real-time data protection and availability.

Active Data Guard, depicted in the figure below, is the cornerstone for accomplishing these objectives because it provides the absolute best real-time data protection and availability for the Oracle Database. This is a bold statement, but it is supported by the facts. It isn’t so much that alternative solutions are bad, it’s just that their architectures prevent them from achieving the same levels of data protection, availability, simplicity, and asset utilization provided by Active Data Guard. Let’s explore further.

Active Data Guard - Overview

Backups are the most popular method used to protect data and are an essential best practice for every database. Not surprisingly, Oracle Recovery Manager (RMAN) is one of the most commonly used features of the Oracle Database. But comparing Active Data Guard to backups is like comparing apples to motorcycles. Active Data Guard uses a hot (open read-only), synchronized copy of the production database to provide real-time data protection and HA. In contrast, a restore from backup takes time and often has many moving parts - people, processes, software and systems – that can create a level of uncertainty during an outage that critical applications can’t afford. This is why backups play a secondary role for your most critical databases by complementing real-time solutions that can provide both data protection and availability.

Before Data Guard, enterprises used storage remote-mirroring for real-time data protection and availability. Remote-mirroring is a sophisticated storage technology promoted as a generic infrastructure solution that makes a simple promise – whatever is written to a primary volume will also be written to the mirrored volume at a remote site. Keeping this promise is also what causes data loss and downtime when the data written to primary volumes is corrupt – the same corruption is faithfully mirrored to the remote volume making both copies unusable. This happens because remote-mirroring is a generic process. It has no  intrinsic knowledge of Oracle data structures to enable advanced protection, nor can it perform independent Oracle validation BEFORE changes are applied to the remote copy. There is also nothing to prevent human error (e.g. a storage admin accidentally deleting critical files) from also impacting the remote mirrored copy.

Remote-mirroring tricks users by creating a false impression that there are two separate copies of the Oracle Database. In truth; while remote-mirroring maintains two copies of the data on different volumes, both are part of a single closely coupled system. Not only will remote-mirroring propagate corruptions and administrative errors, but the changes applied to the mirrored volume are a result of the same Oracle code path that applied the change to the source volume. There is no isolation, either from a storage mirroring perspective or from an Oracle software perspective.  Bottom line, storage remote-mirroring lacks both the smarts and isolation level necessary to provide true data protection.

Active Data Guard offers much more than storage remote-mirroring when your objective is protecting your enterprise from downtime and data loss. Like remote-mirroring, an Active Data Guard replica is an exact block for block copy of the primary. Unlike remote-mirroring, an Active Data Guard replica is NOT a tightly coupled copy of the source volumes - it is a completely independent Oracle Database. Active Data Guard’s inherent knowledge of Oracle data block and redo structures enables a separate Oracle Database using a different Oracle code path than the primary to use the full complement of Oracle data validation methods before changes are applied to the synchronized copy. These include: physical check sum, logical intra-block checking, lost write validation, and automatic block repair. The figure below illustrates the stark difference between the knowledge that remote-mirroring can discern from an Oracle data block and what Active Data Guard can discern.

Oracle Block Structure

An Active Data Guard standby also provides a range of additional services enabled by the fact that it is a running Oracle Database - not just a mirrored copy of data files. An Active Data Guard standby database can be open read-only while it is synchronizing with the primary. This enables read-only workloads to be offloaded from the primary system and run on the active standby - boosting performance by utilizing all assets. An Active Data Guard standby can also be used to implement many types of system and database maintenance in rolling fashion. Maintenance and upgrades are first implemented on the standby while production runs unaffected at the primary. After the primary and standby are synchronized and all changes have been validated, the production workload is quickly switched to the standby. The only downtime is the time required for user connections to transfer from one system to the next. These capabilities further expand the expectations of availability offered by a data protection solution beyond what is possible to do using storage remote-mirroring.

So don’t be fooled by appearances.  Storage remote-mirroring and Active Data Guard replication may look similar on the surface - but the devil is in the details. Only Active Data Guard has the smarts, the isolation, and the simplicity, to provide the best data protection and availability for the Oracle Database.

Stay tuned for future blog posts that dive into the many differences between storage remote-mirroring and Active Data Guard along the dimensions of data protection, data availability, cost, asset utilization and return on investment.

For additional information on Active Data Guard, see:


Musings on Oracle's Maximum Availability Architecture (MAA), by members of Oracle Development team. Note that we may not have the bandwidth to answer generic questions on MAA.


« June 2014 »