Maximum Availability Architecture – Oracle’s industry-leading set of database high availability capabilities

  • September 22, 2012

To SYNC or not to SYNC – Part 4

Ashish Ray
Vice President

This is Part 4 of a multi-part blog article discussing various aspects of setting up Data Guard synchronous redo transport (SYNC). In Part 1 of this article, I debunked the myth that Data Guard SYNC is similar to a two-phase commit operation. In Part 2, I discussed the various ways that network latency may or may not impact a Data Guard SYNC configuration. In Part 3, I explained in detail why Data Guard SYNC is a good thing, and the distance implications you have to keep in mind.

In this final article of the series, I will talk about how you can nicely complement Data Guard SYNC with the ability to fail over in seconds.

Wait - Did I Say “Seconds”?

Did I just say that some customers do Data Guard failover in seconds? Yes, Virginia, there is a Santa Claus.

Data Guard has an automatic failover capability, aptly called Fast-Start Failover. Initially available with Oracle Database 10g Release 2 for Data Guard SYNC transport mode (and enhanced in Oracle Database 11g to support Data Guard ASYNC transport mode), this capability, managed by Data Guard Broker, lets your Data Guard configuration automatically fail over to a designated standby database. Yes, this means no human intervention is required to do the failover. This process is controlled by a low-footprint Data Guard Broker client called the Observer, which makes sure that the primary database and the designated standby database are behaving like good kids. If something bad were to happen to the primary database, the Observer, after a configurable threshold period, tells that standby, “Your time has come, you are the chosen one!” The standby dutifully follows the Observer's directive by assuming the role of the new primary database. The DBA or the Sys Admin doesn’t need to be involved.
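
For readers who like to see the idea in code, here is a toy Python model of the "configurable threshold period" described above. It is only a sketch of the decision rule (the primary must stay unreachable for the whole threshold before a failover is initiated), not how the Observer is actually implemented. The class name ObserverModel, the function first_failover_time, and the threshold_seconds parameter are hypothetical names made up for this illustration, and the check times are invented sample data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ObserverModel:
    """Toy model of the threshold rule (hypothetical; not Oracle code)."""

    threshold_seconds: float  # assumed stand-in for the configurable threshold
    unreachable_since: Optional[float] = None

    def record_check(self, now: float, primary_reachable: bool) -> bool:
        """Record one health check; return True when failover should start."""
        if primary_reachable:
            # Any successful check resets the outage clock.
            self.unreachable_since = None
            return False
        if self.unreachable_since is None:
            # First failed check of this outage: start the clock.
            self.unreachable_since = now
        return now - self.unreachable_since >= self.threshold_seconds


def first_failover_time(
    checks: Iterable[Tuple[float, bool]], threshold_seconds: float
) -> Optional[float]:
    """Return the time of the check that triggers failover, or None."""
    observer = ObserverModel(threshold_seconds)
    for now, reachable in checks:
        if observer.record_check(now, reachable):
            return now
    return None


# Invented sample data: (time in seconds, was the primary reachable?)
checks = [(0, True), (5, False), (10, True), (15, False), (25, False), (45, False)]
print(first_failover_time(checks, threshold_seconds=30))  # prints 45
```

Note how the brief blip at 5 seconds does not count: the primary answers again at 10 seconds, so the clock restarts at 15 seconds and the model only decides to fail over once the outage has lasted the full 30 seconds.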

And - in case you are following this discussion very closely, and are wondering … “Hmmm … what if the old primary is not really dead, but just network isolated from the Observer or the standby - won’t this lead to a split-brain situation?” The answer is No - It Doesn’t. With respect to why-it-doesn’t, I am sure there are some smart DBAs in the audience who can explain the technical reasons. Otherwise - that will be the material for a future blog post.

So - this combination of SYNC and Fast-Start Failover is the nirvana of lights-out, integrated HA and DR, as practiced by some of our advanced customers. They have observed failover times (with no data loss) ranging from single-digit seconds to tens of seconds. With this, they support operations in industry verticals such as manufacturing, retail, telecom, Internet, etc. that have the most demanding availability requirements.

One of our leading customers with massive cloud deployment initiatives tells us that they know about server failures only after Data Guard has automatically completed the failover process and the app is back up and running! Needless to mention, Data Guard Broker has the integration hooks for interfaces such as JDBC and OCI, or even for custom apps, to ensure the application gets automatically rerouted to the new primary database after the database level failover completes.

Net Net?

To sum up this multi-part blog article, Data Guard with SYNC redo transport mode, plus Fast-Start Failover, gives you the ideal triple-combo - that is, it gives you the assurance that for critical outages, you can fail over your Oracle databases:

  1. very fast
  2. without human intervention, and
  3. without losing any data.

In short, it takes the element of risk out of critical IT operations. It does require you to be more careful with your network and systems planning, but as far as HA is concerned, the benefits outweigh the investment costs.

So, this is what we in the MAA Development Team believe in. What do you think? How has your deployment experience been? We look forward to hearing from you!

Comments ( 3 )
  • cleto Sunday, April 21, 2013

    Thanks for the great explanation on To SYNC or not to SYNC 1-4

    We are at a critical decision to invest in exadata

    A. Primary Site [Prod] Oracle11gR2 RAC

    B. Primary Site [hot Active Standby] Oracle11gR2 Single no RAC [SYNC]

    cascade to DR with ASYNC

    C. DR Site [DR] Oracle11gR2 RAC Physical Standby [ASYNC]

    We want to achieve RPO=0 data loss.

    1. from A TO B active dg with SYNC max protection and B TO C with cascade ASYNC max Performance.

    A quick reply today to my email would be great of great help.

    As I have read old post saying you cannot cascade with RAC 11GR2?


  • Ashish Ray Thursday, July 18, 2013

    There are no restrictions with respect to RAC and cascading from Oracle Database 11.2.0.2 onwards. However, in Oracle Database 11g, if your Data Guard setup is like A -> B -> C, the B -> C redo shipping is like ARCH. Please look into Oracle Database 12c, where we have enabled B -> C redo shipping to be ASYNC. While you are reviewing the Database 12c documentation, please also refer to Data Guard Far Sync, which enables zero data loss for long-distance Data Guard deployments.


  • RobK Tuesday, March 6, 2018
    Dear Ashish!

    We are testing the network for SYNC Data Guard Setup.
    We use oratcptest as described in:
    Measuring Network Capacity using oratcptest (Doc ID 2064368.1)

    The num_conn parameter defaults to 1. This means that there is maximum 1 packet of redo information travelling forth or the ACKNOWLEDGE message travelling back through the network.

    Q: In the simplest Data Guard setup (1 Primary, 1 Standby) with SYNC transport is it possible that there is more than one redo packet (or ACK) travelling at the same time?
    So if I have
    - 10ms latency on the network
    - 3 commits issued on the primary in 3 sessions
    - 1ms delay between the commits

    Is it going to last for ~12ms or ~32ms?

    I am asking this as the note mentioned above does not discuss the number of connections under "Assessing Network Bandwidth for Data Guard SYNC Transport", which I see as a major problem.

    Thanks in advance,
    Rob