Maximum Availability Architecture – Oracle’s industry-leading set of database high availability capabilities


Maximum Availability Architecture

Standard Edition High Availability Released - See What's New

The "Standard Edition 2 – We Heard You! Announcing: Standard Edition High Availability" blog post, published approximately two months ago, generated a lot of interest in this new feature – thank you! It is therefore with great pleasure that I can announce the general availability of Standard Edition High Availability (SEHA) on Linux, Microsoft Windows and Solaris with Oracle Database 19c, Release Update (RU) 19.7. Support for additional operating systems is planned for later this year.

What's New

As the Oracle Database 19c New Features Guide states under the "RAC and Grid" section, Standard Edition High Availability provides cluster-based failover for single-instance Standard Edition Oracle Databases using Oracle Clusterware. It benefits from the cluster capabilities and storage solutions that are already part of Oracle Grid Infrastructure, such as Oracle Clusterware, Oracle Automatic Storage Management (Oracle ASM) and Oracle ASM Cluster File System (Oracle ACFS).

Standard Edition High Availability is fully integrated with Oracle Grid Infrastructure starting with Oracle Grid Infrastructure 19c, Release Update 19.7. The prerequisites for SEHA database systems are therefore largely the same as for all Grid Infrastructure-based database systems, as discussed under Requirements for Installing Standard Edition High Availability.

Standard Edition High Availability databases are not Real Application Clusters (RAC)-enabled. Oracle RAC One Node, by contrast, is a RAC-enabled database in that the RAC option must be enabled in the Oracle Database home from which the database runs; this is not the case for Standard Edition High Availability databases. While both solutions provide cluster-based failover for the Oracle Database, Oracle RAC One Node supports additional high availability features, not part of the Standard Edition High Availability offering, that further reduce downtime related to planned maintenance.
For more information on how Standard Edition High Availability, Oracle Restart, Oracle RAC and Oracle RAC One Node compare, see the High Availability Options Overview for Oracle Databases using Oracle Clusterware.

Standard Edition High Availability databases can be licensed using the "10-day failover rule", which is described in this document. This rule includes the right to run the licensed program(s) [here, the Standard Edition High Availability database] on an unlicensed spare computer in a failover environment for up to a total of ten separate days in any given calendar year. This right applies only when a number of machines are arranged in a cluster and share one disk array, which is the case for Standard Edition High Availability databases by default. In addition, SEHA databases are subject to all licensing regulations that generally apply to a Standard Edition 2 (SE2) single-instance Oracle Database. Note that SEHA databases are not subject to a per-cluster socket limitation, but they must adhere to the per-server socket limitation that applies to any Standard Edition 2 Oracle Database.

Standard Edition High Availability databases can either be freshly installed or configured using an existing single-instance Standard Edition Oracle Database. There is no direct upgrade path from either a single-instance or a pre-19c Oracle RAC Standard Edition 2 database. While the database itself can be upgraded, configuring it as a Standard Edition High Availability database requires additional manual steps, explained in the Managing Standard Edition High Availability with Oracle Databases section of the Oracle Database Administrator's Guide. Standard Edition 2 Oracle RAC databases need to be converted to single-instance Standard Edition Oracle Databases prior to upgrading to Oracle Database 19c, as described in My Oracle Support note 2504078.1.
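The configuration just described is managed through srvctl. The following is a minimal, hypothetical sketch only; the database name, Oracle home path and node names are assumptions, and the exact options for your release are in the Oracle Database Administrator's Guide:

```shell
# Register an existing single-instance SE2 database with Grid Infrastructure,
# listing the nodes it may run on (the first node is the preferred node).
srvctl add database -db SE2DB \
  -oraclehome /u01/app/oracle/product/19.0.0/dbhome_1 \
  -dbtype SINGLE -node node1,node2

# Start the database; Oracle Clusterware restarts or fails it over as needed.
srvctl start database -db SE2DB

# Planned relocation to the second node, e.g. before maintenance on node1.
srvctl relocate database -db SE2DB -node node2
```

The -node list is what distinguishes an SEHA registration from an ordinary single-instance one: it tells Clusterware which cluster members are failover candidates.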
In conclusion, Standard Edition High Availability provides fully integrated, cluster-based failover for Standard Edition Oracle Databases using Oracle Clusterware. Oracle's Standard Edition 2 (SE2) customers thereby benefit, free of charge, from the high availability capabilities and storage management solutions that are already part of Oracle Grid Infrastructure, such as Oracle Automatic Storage Management (ASM) and the Oracle ASM Cluster File System (ACFS). Start testing here: Deploying Standard Edition High Availability, and please let me know your feedback.



Standard Edition 2 – We Heard You! Announcing: Standard Edition High Availability

Based on your feedback, Oracle is planning to include a Standard Edition High Availability solution with Oracle Grid Infrastructure that provides cluster-based failover for Standard Edition Oracle Databases using Oracle Clusterware. Oracle's Standard Edition 2 (SE2) customers thereby benefit, free of charge, from the high availability capabilities and storage management solutions that are already part of Oracle Grid Infrastructure, such as Oracle Automatic Storage Management (ASM) and the Oracle ASM Cluster File System (ACFS). Standard Edition customers can use the maximum supported 16 CPU threads per instance during normal operations as well as after failover. Using Oracle Grid Infrastructure and today's hardware, the failover is expected to be faster than with any alternative solution. Oracle Database Appliance (ODA) will provide ODA-specific enhancements that further simplify, automate, and optimize Standard Edition High Availability.

Using integrated, shared, and concurrently mounted storage management solutions, such as Oracle ASM for database files and Oracle ACFS for database files as well as unstructured data, enables Oracle Grid Infrastructure to restart an Oracle Database on a failover node much faster than any cluster solution that relies on failing over and remounting volumes and file systems, which typically requires file system recovery prior to starting the database instance. Further, Oracle Grid Infrastructure Standalone Agents for Oracle Clusterware can be used in conjunction with Standard Edition High Availability to manage the failover of complete application stacks for certain applications, such as Oracle WebLogic and Apache.

The release of Standard Edition High Availability, in addition to the recently announced inclusion of the formerly extra-cost options Oracle Machine Learning, Oracle Spatial and Oracle Graph in Oracle Database Enterprise Edition, demonstrates Oracle's ongoing commitment to Oracle Database Standard Edition.
Extending the already compelling functionality of Oracle Database Standard Edition, these new features help ensure that SE2 continues to provide great value to Oracle’s customers. Update: Standard Edition High Availability has been officially released with Oracle Database 19c, Release Update (RU) 19.7 for Linux, Microsoft Windows and Solaris.  



Introduction to MAA and Overview

The availability of business-critical applications, including the databases serving those applications, is an important element of every IT strategy. Hence, Oracle constantly enhances the database's features and solutions, such as data protection, high availability and disaster recovery, to provide the highest enterprise value. As a result, Oracle has built upon its enterprise experience with tens of thousands of customers across every region and industry in the world to develop an all-encompassing set of high availability blueprints and validated solutions called Oracle Maximum Availability Architecture (MAA).

What is MAA?

Oracle MAA is a collection of architecture, configuration, and life cycle best practices and blueprints. It provides Oracle's customers with valuable insights and expert recommendations which have been validated and tested in work with enterprise customers. It is also an outcome of ongoing communication with the community of database architects, software engineers, and database strategists, which helps Oracle develop a deep and complete understanding of the various kinds of events that can affect availability or data integrity. Over the years, this led to the development and natural evolution of an array of availability reference architectures.

MAA Reference Architectures

MAA reference architectures are solutions designed to reduce downtime for planned and unplanned database outages. They are applicable to databases across all tiers of high availability service levels: from databases that can tolerate some data loss and higher recovery times, to the most demanding databases used by enterprise customers, which essentially require zero data loss and zero downtime. These validated reference architectures balance cost, efficiency and SLAs, and have been implemented in tens of thousands of Oracle customers' environments.
The capabilities of the reference architectures generally carry forward from one tier to the next, so that customers can upgrade to the next level of SLAs transparently. Only some of the advanced options in the Platinum Tier require customization of dependent applications. Refer to the below graphic for the different MAA tiers and their capabilities.

HA Features and Deployment Choices

These MAA reference architectures use a wide array of HA features, configurations and operational practices. Some of the key features help Oracle's customers achieve four primary goals:

Continuous Availability – ensures the database and storage are configured to tolerate failures and that applications can transparently fail over services to maximize availability

Data Protection – secures data integrity in the database and in backups by automatically preventing, detecting and repairing block corruptions (both logical and physical)

Active Replication – allows applications to leverage replicated sites in an Active-Active HA solution to provide disaster recovery protection and workload distribution

Scale-Out – enables applications to scale out by adding database compute nodes (RAC), storage (ASM and Exadata) and databases (Active Data Guard, GoldenGate or Sharding)

Oracle's customers can mitigate not only planned events – such as hardware elasticity, software updates, data schema changes, and major software upgrades – but also unplanned events, such as hardware failures, software crashes due to bugs, and disasters. Oracle's customers also have various deployment choices (as seen above) with which they can customize these HA solutions. The insights, recommendations, reference architectures, features, configurations, best practices, and deployment choices combine to form a holistic blueprint which allows customers to successfully achieve their high availability goals with differing degrees of built-in automation, depending on the platform.
Closing

Oracle has applied 20+ years of experience to solving some of the toughest HA problems for its customers. In reality, the work is far from done, as there is always scope for improvements and enhancements. This blog will be a platform for feedback on Oracle's HA products and will also keep Oracle's customers posted on the newest developments and features. In the meantime, for more information, please feel free to visit our Oracle High Availability and MAA home page or follow us on Twitter.

https://www.oracle.com/database/technologies/high-availability/maa.html
https://twitter.com/OracleMAA



Welcome the New Maximum Availability Architecture (MAA) Product Management Team

Enterprises use Information Technology (IT) to gain competitive advantages, reduce operating costs, enhance communication with customers, and increase management insights into their business. Thus, enterprises become increasingly dependent on their IT infrastructure and its continuous availability. In order to satisfy these ever-increasing high availability requirements, Oracle has invested in and provided the Oracle Maximum Availability Architecture (MAA), a set of best-practice blueprints for the integrated use of Oracle's High Availability (HA) and scalability technologies, which ensures the highest levels of availability for the Oracle Database on-premises, on Engineered Systems, as well as in the Oracle Cloud.

My name is Markus Michalewicz and I lead Oracle's Database High Availability & Scalability product management team. A very important part of my team's responsibility is driving and articulating MAA best practices for our worldwide customers, so that they can maximize the value of their Oracle investment. To that end, I am excited to announce three new members of our dedicated Oracle MAA PM team:

Glen Hawkins, MAA Product Management Lead

With over 22 years of experience, Glen has handled a wide range of both technical and business responsibilities in the world of high tech, focused on cloud (PaaS/SaaS), database & application system management, data integration (ETL/data replication), application server development, supply chain software development and APM, including product management & development, sales, marketing, and consulting. Before joining the Database MAA PM team, Glen led a variety of teams and products within the Oracle Enterprise Manager portfolio for over 10 years, including OEM Infrastructure, Database Lifecycle Management, Middleware Management, RUEI and various services within Oracle Management Cloud.
Saurav Das, MAA Product Manager

Having been with Oracle since 2012, Saurav recently joined the MAA PM team from the Oracle Fusion Applications Product Strategy team. Prior to joining Oracle, Saurav spent 6 years as a DBA for insurance and pharmaceutical customers at Cognizant and Infosys. During his time in Fusion Applications, Saurav helped resolve customer lifecycle management issues related to onboarding, provisioning and termination. Prior to this, Saurav was part of the PaaS Cloud Operations team, supporting the infrastructure for the Oracle Schema Cloud Service and various PaaS services hosted on it. He has also helped deploy various Oracle Cloud services such as Integration, Process, App Builder, and the Oracle Management Cloud.

Pieter Van Puymbroeck, Oracle Data Guard Product Manager

Pieter, who as a former Senior Oracle DBA gained a wealth of experience managing highly critical Oracle Databases, joined the Oracle Database High Availability & Scalability Product Management team as the Product Manager for Oracle Data Guard half a year ago. Data Guard is an integral part of Oracle's Maximum Availability Architecture, and so Pieter is now part of the MAA PM team. Pieter is well known in the Oracle community and has blogged about and presented on different Oracle MAA topics on various occasions. He had also recently become an Oracle ACE Director, a status he unfortunately lost when joining Oracle. Pieter is based in Belgium, and his focus is to help Oracle customers find the best high availability & disaster recovery solution for their needs.

Together with the rest of the High Availability & Scalability product management team, Glen and the MAA PM team are your point of contact for all questions related to making your Oracle Database run smoother.

http://www.oracle.com/goto/MAA

Oracle MAA Part 1: When One Size Does Not Fit All



Using Oracle GoldenGate for an Efficient & Flexible Oracle Cloud Migration

In this blog, we'll focus on why Oracle GoldenGate (OGG) is the perfect product for migrating your production databases to Oracle's cloud platforms, including Oracle Cloud Infrastructure, DBaaS, SaaS, and the Exadata Cloud Service and Cloud at Customer solutions. Using OGG to migrate to the cloud has many benefits:

Downtime can be eliminated in most cases, or at least drastically reduced. GoldenGate can be used to load the cloud database while the production database is still online.

Once the Oracle Cloud database is in use, GoldenGate can be used to replicate back to the on-premises environment, providing the ability to fail back to it should anything unforeseen occur in the new environment.

In addition to migrating Oracle databases, Oracle GoldenGate supports many different databases and can migrate from SQL Server, Sybase, Informix, MySQL, HP NonStop, and many flavors of DB2 to the Oracle Cloud.

Many companies are looking to move production databases to the Oracle Cloud, for many reasons. Oracle GoldenGate can address two concerns that are constantly brought up when moving systems to any new cloud provider: first, the amount of downtime it would take to move the data into the cloud, and second, how to fail back to the on-premises environment should anything go wrong.

Using Oracle GoldenGate to reduce the downtime of migrations is something customers have done thousands of times over the past decade. The key to the technology is logical replication, and the ability of OGG to allow the target database in the cloud to be built while normal activity continues on the production database. When the database in the cloud is ready for use, OGG can replicate from that new environment back to the old one, in case you need to fail back for any reason.
Figure 1 – Migration to Oracle Cloud

Oracle GoldenGate can do more than just migrate Oracle databases into the cloud (figure 1); it can also replicate between dissimilar environments, and even different databases. Whether the source database is Oracle, SQL Server, DB2 or another relational database, GoldenGate captures transaction log changes of that database to obtain details on the changes being performed. The database in the Oracle Cloud is then built in the background while Oracle GoldenGate captures the ongoing changes from the production database. So, whether it takes 3 minutes or 3 days to move the database, the production environment is not impacted.

Figure 1 (above) is a high-level diagram showing how OGG would migrate the data to the Oracle Cloud. In the source datacenter, the Oracle GoldenGate Capture process pulls data from the transaction logs of the source database. As transactions are committed, they are written to disk. These transactions are then sent into the Oracle Cloud using AES-256 encryption, over HTTPS, TLS/SSL, or via VPN. As the transactions are received in the cloud, the Oracle GoldenGate Delivery process applies them to the database, all in real time with minimal latency. If the source database is non-Oracle, or if datatype or character set conversions are required, the Delivery process handles this. If even more complex changes are needed, Oracle GoldenGate can be instructed to perform the transformation as it applies the data to the target database. For such transformations you can use dozens of built-in transformation routines, or even PL/SQL to manipulate the data.

Throughout the migration, you can stop the apply process at any time and begin testing on the new cloud server. The Oracle database has many features allowing you to test the new environment, including Oracle Flashback, Oracle Real Application Testing, and others.
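The Capture and Delivery processes described above are driven by GoldenGate parameter files. The following is a simplified, hypothetical sketch; the process names, credential aliases, trail path and schema are assumptions, and real configurations carry additional options:

```
-- Extract (Capture) on the source: read the transaction logs
-- and write committed changes to a local trail file.
EXTRACT ext1
USERIDALIAS ogg_src
EXTTRAIL ./dirdat/lt
TABLE hr.*;

-- Replicat (Delivery) in the cloud: read the trail shipped from
-- the source and apply the changes to the target database.
REPLICAT rep1
USERIDALIAS ogg_tgt
MAP hr.*, TARGET hr.*;
```

The MAP clause is also where datatype conversions or custom transformations would be declared when source and target schemas differ.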
Once testing has been completed on the new cloud server, and GoldenGate has finished applying the changes and is caught up, you can freely switch the users from the old production server to the new environment in the cloud.

Figure 2 – Fail Back Configuration

To enable fail-back capabilities (figure 2), simply configure GoldenGate to replicate from the new cloud server back to the old database; should anything unforeseen go wrong, you can quickly switch back to the old platform. You can learn more about using GoldenGate here.



Database-Oriented Storage Management

This is the second blog post in a series covering Oracle's Automatic Storage Management. This post discusses a remarkable new way of managing storage with ASM features introduced in Oracle Database 12c Release 2 and available in the latest 18c release. This new mode of storage management is called Database-Oriented Storage Management.

To fully understand the value of these features, it's worth going back and considering the original intent behind the invention of ASM. The ASM mission was to simplify the deployment and management of storage for Oracle databases. Before ASM, achieving optimal performance for mission-critical databases meant hand-crafting storage resources for the many tablespaces in a typical database. It was common to have dedicated storage resources and individual file systems for each individual database. For example, tablespaces belonging to one database might be placed on particular storage hardware and a particular file system, while another database had its own dedicated storage resources and file system. This strategy sort of worked for smaller environments, but as systems grew, administrators faced an enormous and ongoing challenge of upgrading and changing physical storage configurations and file systems to keep pace with changing needs. It was a game of whack-a-mole, and much of this work had to be done offline, meaning administrators worked late nights and weekends to minimize application impact.

ASM simplified these tasks by providing a few disk groups for consolidating all the customer's databases. The magic ingredient in ASM is wide-striping database files across all available storage resources. Additionally, through ASM, additional storage can be dynamically added to disk groups while the databases remain online. The result is that database administrators have fewer things to manage and worry about, by consolidating databases into a single or small number of disk groups.
This organizational model is described as Disk Group-Oriented Storage Management and is represented by the figure below. With this model, all database files are stored in a shared disk group. Files for database DB1 (pictured below) are not treated any differently than files for database DB2 or DB3.

Disk Group-Oriented Storage Management

While disk group-oriented storage management greatly simplifies storage management, it limits the flexibility for customers wanting to consolidate databases that have different requirements. To that end, we sometimes observed customers deploying many disk groups so they could separate databases with different requirements and priorities. For example, production databases might be isolated in disk groups separate from test and development databases, which had their own disk groups. While separating databases into distinct disk groups provides finer management granularity, it works against the objective of reducing management overhead.

Flex Disk Groups

In response to how we observed customers using ASM, Oracle introduced the concept of Database-Oriented Storage Management in Oracle Database 12c Release 2. This management mode introduces a new disk group type called a Flex Disk Group. With flex disk groups, all files belonging to an individual database are collectively identified with a new ASM object called a File Group. For the purposes of this blog, we refer to both non-multitenant databases and multitenant PDBs simply as databases; a file group works with all database representations.

A file group provides a way of referring to all files in a disk group belonging to a database with a single name. Additionally, there is a separate file group for each disk group when a database has files in multiple disk groups. This allows databases to span disk groups without leading to a file group name conflict. Typically, a file group has the same name as the database.
When a database is initially created, a file group with that name is also created. However, if a disk group already has an existing file group with the same name, then the existing file group is used for recording the file names associated with the database. The figure below shows this logical representation: in each of the two flex disk groups are file groups representing databases DB1, DB2, and DB3.

Database-Oriented Storage Management

One important benefit of flex disk groups is that different file groups within a flex disk group can have different redundancies. Furthermore, the redundancies can be changed as needs dictate. For example, a production database can use HIGH redundancy, providing three copies, while a test database in the same disk group can have MIRROR redundancy, providing two copies, or even be UNPROTECTED, having only one copy. If required, the redundancy of a database, i.e. of its file group, can be altered dynamically using the ALTER DISKGROUP command. When a file group's redundancy is changed, ASM invokes an operation similar to an ASM rebalance to effect the redundancy change in storage.

Quota Groups

From a storage management perspective, a critical requirement for consolidating databases is storage quota management. Without quota management, a single database could consume all the space in a disk group without consideration for other databases. To address this need, flex disk groups provide a new feature called quota groups. A quota group is a logical container specifying the amount of disk group space that one or more file groups are permitted to consume. As an example, in the figure below, quota group A contains file groups DB1 and DB2, whereas quota group B contains file group DB3. The databases in quota group A are then limited by the amount of space specified for that quota group.

Figure 3: Quota Space Management

Every flex disk group has a default quota group. If a file group, i.e. a database, is not explicitly assigned a quota group during creation, then the file group is assigned to the default quota group for that disk group. Furthermore, the sum of space represented by all the quota groups can exceed the total physical space available; consequently, quota groups represent a logical capacity constraint on available space. Changing quota groups requires ASM administrative privileges. An ASM administrator can create a set of quota groups to which subsequent databases are allocated. Quota groups facilitate consolidating many databases into a single flex disk group by preventing any single database from consuming more than its fair share of storage and inhibiting the correct operation of the other databases.

In conclusion, in Oracle Database 12c Release 2 Oracle introduced a new and powerful set of storage management features collectively referred to as database-oriented storage management. These features are available in all subsequent releases, including Oracle Database 18c. Database-oriented storage management improves database consolidation while preserving the ease of management that DBAs have come to expect from ASM. For more information, please refer to the ASM Administration Guide. For more information on all new ASM 18c features, please review the 18c ASM New Features White Paper.
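In SQL terms, the quota group and redundancy operations discussed above look roughly like the following. This is a sketch only; the disk group, file group and quota group names are assumptions, and the exact attribute syntax for your release is in the ASM Administration Guide:

```sql
-- Create a quota group limiting its member file groups to 1 TB in total.
ALTER DISKGROUP flexdg ADD QUOTAGROUP qgrp_prod SET 'quota' = '1T';

-- Assign database DB1's file group to that quota group.
ALTER DISKGROUP flexdg MODIFY FILEGROUP DB1 SET 'quota_group' = 'qgrp_prod';

-- Raise the production database's redundancy to HIGH (three copies);
-- ASM performs a rebalance-like operation to apply the change online.
ALTER DISKGROUP flexdg MODIFY FILEGROUP DB1 SET 'redundancy' = 'high';
```

Because these are file-group-level attributes, each database in the shared flex disk group can be given its own protection level and space budget without creating additional disk groups.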



Replication Done Right!

Larry Carpenter, guest author -- Whether it is a cloud, an on-premises, or a hybrid environment, the question asked most often is "What is the right solution to protect my data from failures – with the least amount of downtime and data loss?" Well, one choice that comes to mind is to replicate the data to another physical site. This means all the data changes are physically copied to a remote location using some application-agnostic replication technology. You can then activate the remote site in the event of a primary failure. So far so good, which means "replication" is the solution that fits the bill.

The default replication choice is to simply replicate all the changes happening to the data at the storage level. Simple enough, right? Not really. For most common deployments using generic storage-based replication, the replication simply stops at the storage level; it never gets deeper. When the replication happens at the block level, the storage system has no clue about the type of data it is replicating. It just replicates whatever you throw at the storage, faithfully. There is both a good and a really bad aspect to this method.

Peeling back some layers, you will find that you have opened a big can of worms. To start with, what if you accidentally delete a file? That deletion gets replicated, and the file is lost at the remote site too. What if there is a logical corruption in your production database? That too gets replicated faithfully. What if you want to offload your read-only analytical queries or reporting to the remote site while the replication is going on? Good luck with that, as mirrored copies of data are 'dark'. The point is, while application-agnostic storage replication seems easy, it just adds complexity when replicating databases that store your mission-critical data.

To look at it another way, take the database server as an example.
I/O changes usually traverse various layers on their way from memory: to the HBA, to a storage switch, to a storage controller, and finally to an actual disk. All pieces of the stack have their own firmware, and a software bug or an issue with any component in the stack can result in data corruption – which the storage is not aware of and does not care about either. The corrupted data is replicated. I am not making this up; there are real-world cases. See the following:

https://theconversation.com/server-down-what-caused-the-ato-systems-to-crash-70396
https://www.theregister.co.uk/2012/01/16/tieto_vnx5700/
http://www.computerworld.com/article/2514954/disaster-recovery/american-eagle-outfitters-learns-a-painful-service-provider-lesson.html
http://www.computerworld.com/article/2515163/disaster-recovery/update--virginia-s-it-outage-continues--3-agencies-still-affected.html

This drives home the point that database-specific replication provides the best solution for detecting corruptions and offers true data protection with reduced downtime and data loss. For Oracle databases, Oracle Data Guard and Oracle Active Data Guard offer the best data protection and data availability solution for mission-critical databases that are the life-blood of businesses, large and small.

It is important to note that Data Guard is not an island unto itself; it is one of many Oracle high availability technologies that, when integrated with each other, provide value greater than the sum of the parts. For example, the Flashback Database feature makes it possible to avoid rebuilding a failed primary database after a failover to its standby. Use of a flash recovery area automates the management of archive logs on both primary and standby databases. Data Guard is also integrated with Oracle RAC, Automatic Storage Management (ASM), Oracle Recovery Manager (RMAN) and the Zero Data Loss Recovery Appliance (ZDLRA). These integrations are not an afterthought – they are by design.
Oracle has methodically inventoried the many sources of planned and unplanned downtime from countless customer environments and is following a blueprint to address all possible causes of downtime using capabilities integrated with the Oracle Database. Taken together, these capabilities define the Oracle Maximum Availability Architecture (MAA). Data Guard, an important MAA technology, operates on a simple principle: ship redo, then apply redo. Redo includes all of the information needed by the Oracle Database to recover a database transaction. A production database, referred to as the primary database, transmits redo to one or more independent replicas referred to as standby databases. Data Guard standby databases are in a continuous state of recovery, validating and applying redo to maintain synchronization with the primary database. Data Guard also automatically resynchronizes a standby database that becomes temporarily disconnected from its primary database because of a network or standby outage. As shown in the following example, changes occurring in the production (primary) database are captured and sent directly from memory to the remote site, bypassing the whole I/O stack before the data is replicated. Bypassing the I/O layer not only improves performance (it avoids waiting for local disk writes); it also eliminates the chance of lower-layer errors being replicated to the standby sites. Secondly, a transaction may trigger many physical file writes and changes to many blocks. Storage replication replicates ALL the blocks from ALL the files – consuming more I/O bandwidth. Data Guard replication not only dramatically reduces the amount of data transmitted, it also verifies blocks before transmission to make sure they are logically correct, and the data is verified again at the remote site. In addition to protecting data and making it available after an unplanned event, a major undertaking for most organizations is to drive planned downtime to zero.
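To make the ship-redo/apply-redo model concrete, here is a minimal sketch of how a primary and standby are typically registered with the Data Guard broker (the database names `chicago` and `boston` and their connect identifiers are hypothetical; your environment will differ):

```
DGMGRL> CREATE CONFIGURATION 'maa' AS
          PRIMARY DATABASE IS 'chicago' CONNECT IDENTIFIER IS chicago;
DGMGRL> ADD DATABASE 'boston' AS CONNECT IDENTIFIER IS boston
          MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;
```

Once the configuration is enabled, the primary ships redo to `boston`, which remains in continuous managed recovery as described above.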
Data Guard provides many ways to minimize planned downtime associated with patching and upgrades. Oracle Data Guard also offers Snapshot Standby, a method of fully leveraging a physical standby database for QA testing and other activities that require a database that is independent of the primary and open for both read and update operations. Data Guard is included with Oracle Enterprise Edition and provides all the functionality for creating and maintaining standby databases, performing role transitions (switchover and failover), and using the standby for read-write testing in Snapshot Standby mode. Read-only capability at the remote site and certain other features of Data Guard require the Active Data Guard license. In the Oracle Cloud, Active Data Guard is bundled with Oracle Database Exadata Cloud Service and the Oracle Database Cloud Service – Extreme Performance package. You can deploy up to 32 standby databases per primary, and you can cascade them; the choice is up to you. Detailing the various use cases is beyond the scope of this blog post, but we will cover them in subsequent postings. Data Guard and Active Data Guard provide many choices and great flexibility. If you require zero data loss, you can replicate the database in SYNC mode. If you are replicating over long distances (say, between San Francisco and Boston), you can replicate using ASYNC mode. A Data Guard physical standby database licensed for the Active Data Guard option can be open for read-only queries and reporting while continuously applying updates received from the primary database. This can improve primary database performance and response time by offloading queries to an active standby database. In addition, applications can use global temporary tables on the standby to perform transitory writes during reporting operations.
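As a sketch of the choices above (the standby name `boston` is hypothetical), the transport mode and a snapshot standby conversion are typically controlled through broker commands along these lines:

```
-- Zero data loss: synchronous transport plus a matching protection mode
DGMGRL> EDIT DATABASE 'boston' SET PROPERTY LogXptMode = 'SYNC';
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;

-- Long distance: asynchronous transport (near-zero data loss)
DGMGRL> EDIT DATABASE 'boston' SET PROPERTY LogXptMode = 'ASYNC';

-- Open the standby read-write for testing, then convert it back
DGMGRL> CONVERT DATABASE 'boston' TO SNAPSHOT STANDBY;
DGMGRL> CONVERT DATABASE 'boston' TO PHYSICAL STANDBY;
```

While a snapshot standby is open read-write, it continues to receive (but not apply) redo from the primary, so data protection is preserved during testing.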
Active Data Guard can defer or eliminate new hardware and software purchases by putting existing, previously idle standby databases to productive use. No other solution on the market offers the simplicity, transparency, and high performance of the Active Data Guard standby for maintaining a synchronized replica of a production database that is open read-only. In addition to using the standby database for read-only operations, Active Data Guard can also automatically repair disk corruptions by retrieving a good copy of the affected block from the standby database and automatically repairing the primary, all of which is transparent to the user. Other Active Data Guard features include fast incremental backups from a standby, zero data loss failovers to a remote standby, real-time cascading, automated Oracle Database rolling upgrades, and the use of Application Continuity and Global Data Services. If you have a scalable Real Application Clusters (RAC) configuration on your primary, you can choose to have the same number of nodes on the standby, or a reduced number. RAC and Data Guard greatly complement each other and provide high availability for both planned downtime (rolling updates, patching, upgrades) and unplanned downtime (disaster recovery). I hope this article has provided a useful overview of Data Guard and explains why this approach is superior for Oracle Database protection compared to storage-level replication. For more details, refer to:

Oracle.com/goto/dataguard
Oracle.com/goto/maa

For demos, go to http://www.oracle.com/technetwork/database/features/availability/demonstrations-092317.html
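A hedged sketch of how the read-only standby (real-time query) is typically brought up on a physical standby, assuming the broker is not managing the apply state for you:

```sql
-- On the standby: pause redo apply, open read-only, then resume
-- real-time apply while queries run against the open standby
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;
```

With the standby open in this mode, reporting workloads can be redirected to it while it continues to apply redo from the primary.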


Oracle Sharding - Introductory Blog Post

Oracle Database 12c Release 2 has been available on Oracle Cloud since Nov 4, 2016. Today, we have announced Oracle Database 12c Release 2 for on-premises as well. Oracle Sharding is one of the marquee features of Database 12.2. We, from the Oracle Sharding Product Development team, will be publishing periodic blog posts on various Sharding topics. The topics will include: Oracle Sharding benefits, capabilities, methods, data modeling and application requirements, high availability architecture, replication, deployment automation, direct and proxy routing, life cycle management, benchmarking results, monitoring, patching and many others that you will find interesting. So, what is Oracle Sharding? It is a scalability and availability feature for custom-designed OLTP applications that enables distribution and replication of data across a pool of discrete Oracle databases that share no hardware or software. Each database in the pool is referred to as a shard. The pool of shards is presented to an application as a single logical Oracle database (a sharded database or SDB). Oracle Sharding distributes data across shards using horizontal partitioning. Horizontal partitioning splits a database table across shards so that each shard contains the table with the same columns but a different subset of rows. The number of shards and the distribution of data across them are completely transparent to database applications. SQL statements issued by an application do not refer to shards, nor are they dependent on the number of shards and their configuration. OLTP applications must be explicitly designed for a sharded database architecture in order to realize the benefits of scalability and availability. This is different from an HA architecture based upon Oracle Real Application Clusters (Oracle RAC), where scalability and availability are achieved transparently to an application.
Applications that use a sharded database must have a well-defined data model and data distribution strategy (consistent hash, range, list or composite) that primarily accesses data via a sharding key. Examples of a sharding key include customer_id, account_no, country_id, etc. Oracle Sharding also supports data placement policies (rack and geo awareness) and all deployment models: on-premises and public or hybrid clouds. Transactions that require high performance must be single-shard transactions – for example, lookup and update of a customer’s billing record, or lookup and update of a subscriber’s documents. There is no communication or coordination between shards for high-performance transactions. Multi-shard operations and non-sharding-key access are also supported; such transactions include simple aggregations, reporting, etc. In return for these design considerations, applications that run on a sharded database architecture can achieve even higher levels of scalability and availability. Performance scales linearly as shards are added to the pool because each shard is completely independent from other shards. Each shard typically uses local storage, flash, and memory, offering customers a further opportunity to optimize performance at relatively low cost. The first release of Oracle Sharding is designed to scale up to 1,000 shards. Isolation between shards also means that an outage or poor performance of one shard does not impact the availability or performance of transactions executing at other shards. High Availability (HA) for individual shards is provided by automatic deployment of database replication. Simple one-way Data Guard physical replication with automatic database failover is the default configuration. Active Data Guard (copies open read-only) or Oracle GoldenGate (bi-directional replication with all copies open read-write) may also be automatically deployed. Shards may be replicated within and across data centers.
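Stepping back to the data model described above, a sharded table might be declared as follows (table, column, and tablespace-set names are hypothetical; the syntax follows the Oracle Sharding DDL extensions in 12.2):

```sql
-- Each shard holds the same columns but a different subset of rows,
-- distributed by consistent hash on the sharding key cust_id
CREATE SHARDED TABLE customers
( cust_id    NUMBER        NOT NULL
, name       VARCHAR2(50)
, region     VARCHAR2(20)
, CONSTRAINT cust_pk PRIMARY KEY (cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
TABLESPACE SET ts1;
```

Single-shard transactions then access this table through the `cust_id` sharding key, which is what allows routing to a single shard with no cross-shard coordination.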
Replication is data-center and rack aware, using the data placement policies supported by Oracle Sharding. Optionally, Oracle RAC may be manually configured to provide shard HA. Shards are front-ended by a set of replicated listeners called shard directors that act as routers. Oracle clients (JDBC, OCI, and ODP.NET) and the Oracle Universal Connection Pool (UCP) have been enhanced to recognize sharding keys specified in a connection string and to ensure availability by controlling the maximum number of connections allowed per shard. A shard routing cache in the connection layer (populated by the initial request to a shard) is used to route requests directly to the shard where the data resides for optimal runtime performance. The shard routing cache is automatically refreshed if any change is made to the sharded database (e.g. automatic rebalancing or adding/deleting shards). In this post, we have introduced you to Oracle Sharding at a high level. In the next post, we will look at the benefits of Oracle Sharding.


Oracle Global Data Services (GDS): Part 2 – Load Balancing Use Cases

Oracle Database 12c Global Data Services galvanizes the asset utilization of replicated database resources. It allows connect-time and run-time load balancing, routing, and service failover across replicated databases situated in any data center, in any geographical region. With GDS, customers can achieve these capabilities without having to integrate their high availability stack with hardware load balancers or write custom homegrown connection managers. And remember that GDS comes with the Active Data Guard license and is available to Oracle GoldenGate customers at no additional charge as well. In this blog we follow up on the introduction to GDS from Part 1 and walk through a couple of use cases for workload balancing:

1. The first use case (shown below) is load balancing for reader farms. Imagine a scenario where GDS is enabled for an Active Data Guard or GoldenGate reader farm with physical standby replicas located in both local and remote data centers. Let’s say a read-write global service for Order Entry runs on the primary database and read-only global services for Reporting run on the reader farm. Using GDS, client connections are automatically load balanced among the read-only global services running on the reader farm (across data centers). This capability improves resource utilization, performance and scalability with read-only workload balancing on Active Data Guard or Oracle GoldenGate reader farms.

2. Another use case (as shown below) is load balancing of read-write services among multiple masters within and across regions. Let’s take a scenario of active/active databases using Oracle GoldenGate in a GDS configuration. In this case the read-write and read-only global services are both configured to run on each of the masters, and GDS automatically balances the workloads for both kinds of service in the GoldenGate multi-master configuration.
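As a rough sketch of how the reader-farm use case might be configured with GDSCTL (the pool, region, connect string, and service names here are hypothetical):

```
GDSCTL> add gdspool -gdspool sales
GDSCTL> add brokerconfig -connect chicago-host:1521/chicago
        -gdspool sales -region east
GDSCTL> add service -service reporting_srvc -gdspool sales -preferred_all
        -role PHYSICAL_STANDBY -clbgoal SHORT -rlbgoal SERVICE_TIME
GDSCTL> start service -service reporting_srvc -gdspool sales
```

The `-role PHYSICAL_STANDBY` attribute keeps the read-only service on the reader farm, while the load-balancing goals drive connect-time and run-time balancing across the replicas.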
This wraps up our exploration of key Oracle Database 12c GDS load balancing use cases. In the next installment of the GDS blog series (Part 3), we will take a look at a few more interesting use cases where GDS can help in mitigating planned and unplanned downtime for applications.


Oracle GoldenGate Active-Active Part 3

Here is the last (3 of 3) blog posting on Active-Active replication for OGG, and my post this time will cover the actual usage of the CDR resolution routines and examples of how they are built. Part 1 is located here, and part 2, here. I’ll cover two different use cases. The first will be timestamp based and the second will be trusted source. As a refresher, timestamp resolution has the record with the lowest timestamp win (i.e. whichever record came in first), and trusted source assumes that one system always takes precedence over another system. For these examples, I’m going to use macros, which makes the parameter files so much easier and cleaner to read, and dramatically reduces the amount of typing I have to do. My macro file will be called cdr_macros.prm. I normally wouldn’t want to mix trusted source and timestamp in the same environment, but I’m doing it here just as an example. In this macro file, I have included every CDR function that I want to use, on all systems, for both Extracts and Replicats. This way, if I need to make a change to my CDR rules, I can make the change in the macro file and it affects the entire server. Just make sure to make the same change in each OGG environment. Inside each macro, there is a short description of what the command is going to be used for.

*********************************************************************************************************
MACRO #ExtractCdrDate
BEGIN
COMMENT This is used to ensure that the key columns + the UPDATE_TIME
COMMENT column is always brought over as part of the trail file record
GETBEFORECOLS (ON UPDATE KEYINCLUDING (UPDATE_TIME), ON DELETE KEYINCLUDING (UPDATE_TIME)), FETCHCOLS (*)
END;
COMMENT END TO ExtractCdrDate

MACRO #ExtractCdrAllColunms
BEGIN
COMMENT This is used when I want to ensure that ALL columns are in the
COMMENT trail file for each record. It has a higher overhead, so be
COMMENT careful on how frequently it is used.
GETBEFORECOLS (ON UPDATE ALL, ON DELETE ALL), FETCHCOLS (*)
END;
COMMENT END TO ExtractCdrAllColunms

MACRO #DateCompare
BEGIN
COMMENT This is used when doing a timestamp resolution where the lowest
COMMENT timestamp wins.
COMPARECOLS (ON UPDATE KEYINCLUDING (UPDATE_TIME), ON DELETE KEYINCLUDING (UPDATE_TIME)),
RESOLVECONFLICT (UPDATEROWEXISTS, (mon_resolution_method, USEMIN (UPDATE_TIME), COLS (*)), (DEFAULT, DISCARD)),
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, USEMIN (UPDATE_TIME), COLS (*))),
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO DateCompare

MACRO #FromTrusted
BEGIN
COMMENT This resolution is used on the non-trusted environment to
COMMENT allow operations from the trusted server to overwrite the existing
COMMENT data when there is a conflict.
COMPARECOLS (ON UPDATE ALL, ON DELETE ALL),
RESOLVECONFLICT (UPDATEROWEXISTS, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO FromTrusted

MACRO #FromNonTrusted
BEGIN
COMMENT This resolution is used to discard the record any time there is a
COMMENT conflict, when the record comes from the non-trusted server
COMPARECOLS (ON UPDATE ALL, ON DELETE ALL),
RESOLVECONFLICT (UPDATEROWEXISTS, (DEFAULT, DISCARD)),
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, DISCARD)),
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, DISCARD)),
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, DISCARD)),
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO FromNonTrusted
*********************************************************************************************************

Now that all the hard work is done and I've defined my rules for both Extract and Replicat in the macro file, I can easily add those in. In the Extract, I simply modify my TABLE statements to include the additional macro that tells OGG which columns to write to the trail file. In this case, I'm using the #ExtractCdr macros from the first part of the file to instruct OGG which columns to include in the trail file. This ensures that the resolution routines always have the data they need to perform the specified resolution.

TABLE DEMO.VCRYPT_ACCOUNTS, #ExtractCdrDate();
TABLE DEMO.VCRYPT_ACCOUNTS_HIST, #ExtractCdrAllColunms();

The changes to the MAP statements in the Replicat parameter file are just as elegant and straightforward – I simply add the macros that were defined above.

MAP DEMO.VCRYPT_ACCOUNTS, TARGET DEMO.VCRYPT_ACCOUNTS, #DateCompare();
MAP DEMO.VCRYPT_ACCOUNTS_HIST, TARGET DEMO.VCRYPT_ACCOUNTS_HIST, #FromTrusted();

Using the macro method, it’s easy to identify which objects are using each conflict detection and resolution routine, and if you need to make a change, you can make it once in the macro file and it will affect every parameter file. The methodologies and best practices in the last few of my blog postings on Active-Active replication in GoldenGate, together with the white paper here: http://www.oracle.com/us/products/middleware/data-integration/golden-gate-active-active-1887519.pdf should help implement robust Active-Active replication.


Oracle MAA Part 4: Gold HA Reference Architecture

Welcome to the fourth installment in a series of blog posts describing MAA Best Practices that define four standard reference architectures for HA and data protection: BRONZE, SILVER, GOLD and PLATINUM. The objective of each reference architecture is to deploy an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost. This article provides details for the Gold reference architecture. Gold substantially raises the service level for business-critical applications that cannot accept vulnerability to single points of failure. Gold builds upon Silver by using database replication technology to eliminate single points of failure and provide a much higher level of data protection and HA from all types of unplanned and planned outages. An overview of Gold is provided in the figure below. Gold delivers substantially enhanced service levels using the following capabilities: Oracle Active Data Guard replaces the backups used by the Bronze and Silver reference architectures as the first line of defense against an unrecoverable outage of the production database. Recovery time (RTO) for outages caused by data corruption, database failure, cluster failure, and site failure is reduced to seconds or minutes, with an accompanying data loss exposure (RPO) of zero or near zero depending upon configuration. While backups are no longer used for availability, they are still included in the Gold reference architecture for archival purposes and as an additional level of data protection. Active Data Guard uses simple physical replication to maintain one or more synchronized copies (standby databases) of the production database (primary database). If the primary becomes unavailable for any reason, production is quickly failed over to the standby and availability is restored.
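The quick failover described above can also be automated. A minimal sketch using the Data Guard broker (the threshold value is illustrative, and fast-start failover assumes an observer process and a suitably configured standby):

```
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;
```

With fast-start failover enabled, the observer initiates an automatic failover to the standby once the primary has been unreachable for the configured threshold, with no administrator intervention.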
Active Data Guard offers a unique set of capabilities for availability and Oracle data protection that exceed other alternatives based upon storage remote-mirroring or other methods of database replication. These capabilities include:

- Choice of zero data loss (sync) or near-zero data loss (async) disaster protection.
- Direct transmission of database changes (redo) from the log buffer of the primary database, providing strong isolation from lower-layer hardware and software faults.
- Use of intimate knowledge of Oracle data block and redo structures to perform continuous Oracle data validation at the standby, further isolating the standby database from corruptions that can impact a primary database.
- Native support for all Oracle data types and features, combined with high performance capable of supporting all applications and workloads.
- Manual or automatic failover to quickly transfer production to the standby database should the primary become unavailable for any reason.
- Integrated application failover to quickly transition application connections to the new primary database after a failover has occurred.
- Database rolling maintenance to reduce downtime and risk during planned maintenance.
- High return on investment by offloading read-only workloads and backups to an Active Data Guard standby while it is being synchronized by the primary database.

Oracle GoldenGate logical replication is also included in the Gold reference architecture, either to complement Active Data Guard when performing planned maintenance or to serve as an alternative replication mechanism for maintaining a synchronized copy (target database) of a production database (source database). GoldenGate reads changes from disk at a source database, transforms the data into a platform-independent file format, transmits the file to a target database, then transforms the data into SQL (updates, inserts, and deletes) native to a target database that is open read-write.
The target database contains the same data but is a different physical database from the source (for example, backups are not interchangeable). This enables GoldenGate to easily support heterogeneous environments across different hardware platforms and relational database management systems. This flexibility makes it ideal for a wide range of planned maintenance and other replication requirements. GoldenGate can:

- Efficiently replicate subsets of a source database to distribute data to other target databases. It can also be used to consolidate data into a single target database (for example, an operational data store) from multiple source databases. This function of GoldenGate is relevant to each of the four MAA reference architectures and is complementary to the use of Active Data Guard.
- Perform maintenance and migrations in a rolling manner for use cases that cannot be supported using Data Guard replication. For example, Oracle GoldenGate enables replication from a source database running on a big-endian platform to a target database running on a little-endian platform. This enables cross-platform migration with the additional advantage of being able to reverse replication for fast fallback to the prior version after cutover. When used in this fashion, GoldenGate is complementary to Active Data Guard.
- Maintain a complete replica of a source database for high availability or disaster protection that is ready for immediate failover should the source database become unavailable. GoldenGate is an alternative to Active Data Guard when used for this purpose. The primary use case for choosing GoldenGate over Active Data Guard for complete database replication is a requirement for the target database to be open read-write at all times (remember, an Active Data Guard standby is open read-only).
Note that there are several trade-offs that must be accepted when using logical replication in place of Active Data Guard for data protection and availability:

- Logical replication has additional prerequisites and operational complexity.
- Logical replication is inherently an asynchronous process and thus unable to provide zero data loss protection; only Active Data Guard can provide zero data loss protection.
- A logical copy is not a physical replica of the source database, so rather than offloading backups to a standby you must back up both source and target.
- Logical replication cannot support advanced data protection features that come with Data Guard physical replication: lost-write detection and automatic block repair.

Oracle Site Guard is optional in the Gold tier but is useful to reduce administrative overhead and the potential for human error. Site Guard enables administrators to automate the orchestration of switchover (a planned event) and failover (in response to an unplanned outage) of their complete Oracle environment – multiple databases and applications – between a production site and a remote disaster recovery site. Oracle Site Guard is included with the Oracle Enterprise Manager Lifecycle Management Pack. Oracle Site Guard offers the following benefits:

- Reduction of errors due to a prepared response to site failure. Recovery strategies are mapped out, tested, and rehearsed in prepared responses within the application. Once an administrator initiates a Site Guard operation for disaster recovery, human intervention is not required.
- Coordination across multiple applications, databases, and various replication technologies. Oracle Site Guard automatically handles dependencies between different targets while starting or stopping a site. Site Guard integrates with Oracle Active Data Guard to coordinate multiple concurrent database failovers, and also integrates with storage remote mirroring that may be used for data that resides outside of the Oracle Database.
- Faster recovery time. Oracle Site Guard automation minimizes time spent on manual coordination of recovery activities.

Gold = Better HA and Data Protection

Gold builds upon Silver by addressing all fault domains. Even in the worst cases of a complete cluster or site outage, database service can be resumed within seconds or minutes of a failure occurring. Gold eliminates the downtime and potential uncertainty of a restore from backup. Gold also eliminates data loss by protecting every database transaction in real time. Database-aware replication is key to achieving Gold service levels. It is network efficient. It enforces a high degree of isolation between replicated copies for optimal data protection and HA. It enables fast failover to an already synchronized and running copy of production. It achieves high ROI by enabling workloads to be offloaded from production to the replicated copy. As important as these tangible benefits are, there is the equally significant benefit of reduced risk: by running workloads at the replicated copy, you are performing continuous application-level validation that it is ready for production when needed. So what is left to address after Gold? There is a class of application where it is desirable to mask the effect of an outage from the end user. Imagine you are a customer in the process of making a purchase online – you don't want to be left in an uncertain state should there be a database outage. Did my purchase go through? Do I resubmit my payment, and if I do, will I be charged twice? From a data loss perspective, what if the DR site is hundreds or thousands of miles away – how do you guarantee zero data loss protection?
Finally, this same class of application frequently cannot tolerate downtime for planned maintenance – how can you shrink maintenance windows to zero so that applications can be available at all times? To learn how to address this set of requirements, stay tuned for the final installment in this MAA series, when we cover the Platinum reference architecture.


Oracle Database 12c Global Data Services: Part 1 – Automated Workload Management for Replicated Databases

Introduction

Global Data Services is a key offering within Oracle’s Maximum Availability Architecture. It’s really a must-have for organizations that are using Oracle high availability technologies such as Active Data Guard or Oracle GoldenGate to replicate data across multiple databases. With automated workload balancing and service failover capabilities, GDS improves performance, availability, scalability, and manageability for all databases that are replicated within a data center and across the globe. And GDS boosts resource utilization, which really improves the ROI of Active Data Guard and GoldenGate investments. It does this in an integrated, automated way that no other technology can match. Plus, it’s included with the Active Data Guard license – and since GoldenGate customers have the right to use Active Data Guard, it’s available to them at no additional charge as well.

Customer Challenges

Enterprises typically deploy replication technologies for various business requirements – high availability and disaster recovery, content localization and caching, scalability, performance optimization for local clients, or compliance with local laws. Oracle customers use Active Data Guard and Oracle GoldenGate to address all of these business requirements. They use Active Data Guard to distribute their read-only workload, and GoldenGate to distribute not only read workloads but also read-write workloads across their replicated databases. However, when you’re trying to optimize workload management across multiple database replicas, you run into certain challenges that simply extend beyond the capabilities of replication technology. That’s because customers are unable to manage replicated databases with a unified framework and instead have to deal with database silos from an application and DBA perspective. Let’s look at a couple of the main problems with database silos.
The first is under-utilized resources – for example, when one replica cannot be leveraged to shoulder the workload of another, over-utilized database. This leads to suboptimal resource utilization, which can adversely affect performance, availability and, of course, cost. The other problem with silos is the inability to automatically fail over a service across databases. Let’s say a production application workload is running against a particular replica. If that replica goes down due to an unplanned event, customers don’t have a mechanism that automatically and transparently relocates the service to another available replica – and when a replica fails, that can lead to application outages. Until the introduction of Oracle Global Data Services (GDS), there really wasn’t a way for enterprises to achieve service failover and load balancing across replicas out of the Oracle stack. To address this, some customers have chosen to write their own homegrown connection managers, and others have integrated their HA stack with hardware load balancers. But these solutions still don’t address all of the issues:

- Manual load balancing using homegrown connection managers, for example, incurs huge development costs and still cannot optimize performance and availability for replicated systems.
- Special-purpose network load balancers can help, but they introduce additional cost and complexity – and they still can’t offer database service failover and centralized workload management.

Global Data Services Overview

Global Data Services delivers automated workload management, which addresses all of these key pain points. It eliminates the need for custom connection managers and load balancers for database workloads. With a newly created concept called the Global Service, Oracle Global Data Services extends the familiar Oracle RAC-style connect-time and run-time load balancing, service failover, and management capabilities beyond a single clustered database.
Capabilities that were so far applicable only to a single database can now be applied to a set of replicated databases that may reside within or across data centers. Customers achieve these capabilities by simply setting the pertinent attributes of the Global Service.

GDS sits between the application tier and the database tiers of the stack. It orchestrates service high availability, service-level load balancing and routing. Global services run on the databases but are managed by GDS. GDS algorithms take into account database instance load, network latency between data centers, and the workload management policies (region affinity, load balancing goals, database cardinality, database role, replication lag tolerance) that customers can configure. These workload management policies are enabled via the attributes of a given Global Service.

What are the key capabilities that are really unique to GDS?

1. For performance optimization, there's region-based workload routing, which automatically routes workloads to the database closest to the clients. For example, what if the customer requires that all clients/applications closer to the North American data center be routed to the database in the North American data center? Likewise, European clients may need to be routed to the European database. GDS manages this workload routing automatically.

2. In addition, GDS provides connect-time load balancing and supports run-time load balancing – another key performance advantage.

3. For higher application availability, GDS enables inter-database service failover. If a replica goes down as a result of a planned or unplanned event, GDS fails over the service to another replica.

4. It also offers role-based global services. GDS makes sure that global services are always started on those databases whose database role matches the role specified for the service.
For example, if Data Guard undergoes role transitions, the global services are relocated accordingly, maintaining availability requirements.

5. For improved data quality, there's also replication lag-based workload routing. This capability routes read workloads to a Data Guard standby whose replication lag is within a customer-specified threshold that's based on business needs.

6. By managing all of the resources of the replicas efficiently, customers are able to maximize their ROI because there are no longer any under-utilized servers.

This wraps up the introductory blog post on Oracle Database 12c GDS. We looked at the challenges of workload management for replicated databases and how GDS addresses those challenges. In the next blog, we will review some of the key capabilities of GDS and the tangible business benefits.
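To make the Global Service attributes concrete, here is a rough GDSCTL sketch of a lag-tolerant, role-based read service. The pool and service names are hypothetical, the pool and its databases are assumed to already be registered in the GDS catalog, and exact parameter names may vary by release:

```
GDSCTL> add service -service sales_reader -gdspool sales
          -preferred_all -clbgoal SHORT -rlbgoal SERVICE_TIME
          -role PHYSICAL_STANDBY -lag 30
GDSCTL> start service -service sales_reader -gdspool sales
```

With -role PHYSICAL_STANDBY and -lag 30, GDS starts the service only on standby databases whose replication lag is within 30 seconds, while the connect-time and run-time load balancing goals drive routing among them.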


Oracle Database High Availability at Oracle OpenWorld 2014 - Download Your Schedule Now!

Learn how to maximize High Availability and Data Protection for your Oracle Databases at Oracle OpenWorld 2014. This year's conference offers 15 Oracle Database High Availability sessions that cover the entire spectrum of the Oracle Maximum Availability Architecture (MAA) - from database-integrated backup and recovery to zero data loss disaster protection to zero downtime maintenance. Discover how Oracle MAA and the latest HA enhancements with Oracle Database 12c can help you meet your availability service level objectives, build a resilient foundation for Oracle Multitenant and deliver Database-as-a-Service for Cloud deployments. OpenWorld 2014 also offers opportunities for you to deepen your knowledge of Oracle Database HA technologies through 5 demo stations. You'll gain one-on-one access to Oracle Database HA experts who can provide best practices guidance and walk you through real-world scenarios that illustrate the full capabilities of Oracle Maximum Availability Architecture. Don't miss this excellent opportunity to learn how to deliver the right levels of availability and data protection for Oracle Databases that meet your specific business needs. Download the Focus on Oracle Database High Availability Document to keep track of all the HA sessions and demos at Oracle OpenWorld 2014.


Oracle MAA Part 3: Silver HA Reference Architecture

This is the third installment in a series of blog posts describing Oracle Maximum Availability Architecture (Oracle MAA) best practices that define four standard reference architectures for data protection and high availability: BRONZE, SILVER, GOLD and PLATINUM. Each reference architecture uses an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost. This article provides details for the Silver reference architecture. Silver builds upon Bronze by adding clustering technology - either Oracle RAC or RAC One Node. This enables automatic failover if there is an unrecoverable outage of a database instance or a complete failure of the server on which it runs. Oracle RAC also delivers substantial benefit by eliminating downtime for many types of planned maintenance. It does this by performing maintenance in a rolling manner across Oracle RAC nodes so that services remain available at all times. As in the case of Bronze, RMAN provides database-optimized backups to protect data and restore availability should an outage prevent the cluster from being able to restart. An overview of Silver is provided in the figure below.

Oracle RAC

Oracle RAC is an active-active clustering solution that provides instantaneous failover should there be an outage of a database instance or of the server on which it runs. A quick review of how Oracle RAC functions helps to understand its many benefits. There are two major components to any Oracle RAC cluster: Oracle Database instances and the Oracle Database itself. A database instance is defined as a set of server processes and memory structures running on a single node (or server) which make a particular database available to clients. The database is a particular set of shared files (data files, index files, control files, and initialization files) that reside on persistent storage, and together can be opened and used to read and write data.
Oracle RAC uses an active-active architecture that enables multiple database instances, each running on different nodes, to simultaneously read and write to the same database. Oracle RAC is the MAA best practice for server HA and provides a number of advantages:

Improved HA: If a server or database instance fails, connections to surviving instances are not affected; connections to the failed instance are quickly failed over to surviving instances that are already running and open on other servers in the cluster.

Scalability: Oracle RAC is ideal for applications with high workloads or consolidated environments where scalability and the ability to dynamically add or reprioritize capacity are required. Additional servers, database instances, and database services can be provisioned online. The ability to easily distribute workload across the cluster makes Oracle RAC an ideal solution when Oracle Multitenant is used for database consolidation.

Reliable performance: Oracle Quality of Service (QoS) can be used to allocate capacity for high-priority database services to deliver consistently high performance in consolidated database environments. Capacity can be dynamically shifted between workloads to quickly respond to changing requirements.

HA during planned maintenance: High availability is maintained by implementing changes in a rolling manner across Oracle RAC nodes. This includes hardware, OS, or network maintenance that requires a server to be taken offline; software maintenance to patch the Oracle Grid Infrastructure or database; or moving a database instance to another server to increase capacity or balance the workload.

Oracle RAC One Node

RAC One Node provides an alternative to Oracle RAC when scalability and instant failover are not required. The RAC One Node license is one-half the price of Oracle RAC, providing a lower-cost option when an RTO of minutes is sufficient for server outages. RAC One Node is an active-passive failover technology.
During normal operation it only allows a single database instance to be open at one time. If the server hosting the open instance fails, RAC One Node automatically starts a new database instance on a second node to quickly resume service. RAC One Node provides several advantages over alternative active-passive clustering technologies:

It automatically responds to both database instance and server failures. Oracle Database HA Services, Grid Infrastructure, and database listeners are always running on the second node. At failover time only the database instance and database services need to start, reducing the time required to resume service and enabling service to resume in minutes.

It provides the same advantages for planned maintenance as Oracle RAC. RAC One Node allows two active database instances during periods of planned maintenance to allow graceful migration of users from one node to another with zero downtime; database services remain available to users at all times.

Silver = Better HA

Silver represents a significant increase in HA compared to the Bronze reference architecture and is very well suited to a broad range of application requirements. Oracle RAC immediately responds to an instance or server outage and reconnects users to surviving instances. While Silver has substantial benefits, it is only one step above Bronze - there is still a much broader fault domain beyond instance or server failure. This includes events that can impact the availability of an entire cluster: data corruptions, storage array failures, bugs, human error, site outages, etc. There is also a class of application where the impact of outages must be completely transparent to the user. Stay tuned for future installments when we address this expanded set of requirements with the Gold and Platinum reference architectures.
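The graceful migration of a RAC One Node database during planned maintenance is performed as an online relocation. A minimal sketch, assuming a database named orcl and a target node node2 (both names are hypothetical, and srvctl option spellings vary slightly between releases):

```
# Relocate the RAC One Node database online; sessions migrate within the timeout (minutes)
srvctl relocate database -db orcl -node node2 -timeout 30

# Confirm which node now hosts the running instance
srvctl status database -db orcl
```

During the relocation window two instances are briefly active, which is what allows users to drain from the old node without downtime.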


Backup to Oracle Cloud - Introduction to Oracle Database Backup Service

Backup and recovery of application data is the fundamental protection strategy for maintaining enterprise business continuity. I would be extremely surprised to hear of any enterprise that has never backed up its mission-critical or business-critical data. Any such scenario is basically a ticking time bomb. Depending on the specific RTO (recovery time objective) and RPO (recovery point objective) for each database, different Oracle Maximum Availability Architecture (MAA) strategies can be deployed by the enterprise. From the backup and recovery perspective, the following are general practice guidelines that customers typically follow to address RTO and RPO requirements:

•    Local Fast Recovery Area (FRA): typically stores backups for up to 7 days
•    External storage (NAS): up to 30 days
•    Tape media (if available): 1 to 6 months
•    Tape vaulting (offsite storage): months to years

In addition to the above backup storage tiers, sophisticated organizations take additional precautions to avoid single-site failure and to reduce load on production resources. MAA best practices, for example, recommend that copies of the backup data be stored in an offsite location. But consider the following complications:

•    Other than tape vaulting, there is no alternative that enables complete physical offsite storage for short- and long-term backups. Many IT shops don't have the tape infrastructure required for long-term archival, and hence are restricted to using local disk backups or expensive backup appliances.
•    Organizations with multiple databases that have various RTO/RPO requirements may have certain 2nd- or 3rd-tier databases that never get backed up.
•    Due to compliance requirements, customers now have to store backups for many years. Storing large volumes of data on local disks can become prohibitively expensive.
•    Many enterprises don't have the CAPEX budget in place to implement these additional data protection steps.
And almost ALL enterprises want a solution that's operational right away. So what's the answer?

Introducing Oracle Database Backup Service - A Cloud Storage Solution for your Oracle Database Backups

Oracle Database Backup Service addresses the above needs by providing a low-cost alternative for storing backups in an offsite location. It is an Oracle Public Cloud object-based storage offering that enables you to store your on-premises or cloud-deployed database backups. You can use Oracle Database Backup Service as the primary backup for 2nd- or 3rd-tier databases, or use the cloud backup as a secondary copy for long-term archival requirements. If you are familiar with Oracle Recovery Manager (RMAN), it should take only a few minutes for you to start backing up your database to the cloud. Here's all you need to do:

1. Subscribe to the Oracle Database Backup Service. This offering is available as a month-to-month or longer-term subscription (1, 2, or 3 years). Note that the subscription model is subject to change.

2. Download the Oracle Database Cloud Backup Module from the OTN site. Unzip the opc_installer.zip file, which includes a detailed README describing the steps to execute.

3. Run the installation procedure. Provide your Oracle Public Cloud credentials, which are securely stored in an Oracle wallet with your database. The installation script also creates the necessary configuration files.

4. Configure RMAN. By using CONFIGURE (persistent), SET or even BACKUP commands, you can instruct RMAN to use the backup service module for backups.

5. Start your backups and restores. Use regular RMAN BACKUP or RESTORE commands. All operations involving the BACKUP SET mode of backup and recovery are supported. You can also back up from the FRA and other disk-based backup locations to the cloud.

How does this process work? The Oracle Database Cloud Backup Module (ODCBM) receives backup blocks from RMAN, chunks them into 20MB blocks, and transmits them to the Oracle cloud.
During the restore process, the same module retrieves data from the cloud. The Oracle Database Cloud Backup Module is configured as SBT (Tape). What are some unique features that Oracle Database Backup Service offers? To name a few:

•    End-to-end security: RMAN encryption is performed at backup time and data is securely transmitted over the WAN. And by the way, you don't have to purchase the Advanced Security Option (ASO) to use RMAN encryption here. You can use password-based encryption, Transparent Data Encryption (TDE), or dual-mode. Encryption is supported for the EE, SE, and SE1 editions.
•    Backups can be compressed to reduce the volume of data being transmitted. For Oracle Database 10gR2 and 11gR1, you can use BASIC compression. For 11gR2 and above, you can choose from LOW, MEDIUM, BASIC, and HIGH.
•    There's NO ADDITIONAL COST other than the subscription to Oracle Database Backup Service.
•    You can use any number of RMAN channels to parallelize your backup and restore operations.
•    There are NO new commands to learn. Use the familiar RMAN commands.
•    Because a large portfolio of applications is already available in Oracle Cloud, you can use your backup in the cloud to spin up a new instance or use it for your other PaaS or SaaS requirements.

So what are you waiting for? Do you want to check the network throughput before you sign up? Start with a no-obligation one-month trial by clicking "Try Now" from https://cloud.oracle.com/database_backup. For more information, see the Documentation, White Paper, Data Sheet and README. In future blogs on Oracle Database Backup Service, I will discuss some best practices when deploying cloud-based backups.
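Steps 4 and 5 can be sketched as an RMAN command file. The library path, parameter file, and password below are placeholders for illustration only; the installer in step 3 writes the actual locations:

```
# Persistently route SBT backups through the cloud backup module (paths are examples)
CONFIGURE CHANNEL DEVICE TYPE sbt
  PARMS 'SBT_LIBRARY=/home/oracle/lib/libopc.so,
         SBT_PARMS=(OPC_PFILE=/home/oracle/config/opcORCL.ora)';

# Cloud backups must be encrypted; password-based encryption is one option
SET ENCRYPTION ON IDENTIFIED BY "MyBackupPwd" ONLY;

# The familiar commands then send backup sets to the cloud via the SBT channel
BACKUP DEVICE TYPE sbt DATABASE PLUS ARCHIVELOG;
```

Restores work the same way: RESTORE reads the backup sets back through the same channel configuration.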


Oracle MAA Part 2: Bronze HA Reference Architecture

In the first installment of this series we discussed how one size does not fit all when it comes to HA architecture. We described Oracle Maximum Availability Architecture (Oracle MAA) best practices that define four standard reference architectures for data protection and high availability: BRONZE, SILVER, GOLD and PLATINUM. Each reference architecture uses an optimal set of Oracle HA capabilities that reliably achieve a given service level (SLA) at the lowest cost. As you progress from one level to the next, each architecture expands upon the one that preceded it in order to handle an expanded fault domain and deliver a higher level of service. This article provides details for the Bronze reference architecture. Bronze is appropriate for databases where simple restart or restore from backup is 'HA enough'. It uses a single-instance Oracle database (no cluster) to provide a very basic level of HA and data protection in exchange for reduced cost and implementation complexity. An overview is provided in the figure below. When a database instance or the server on which it is running fails, the recovery time objective (RTO) is a function of how quickly the database can be restarted to resume service. If a database is unrecoverable, the RTO becomes a function of how quickly a backup can be restored. In the worst-case scenario of a complete site outage, additional time is required to provision new systems and perform these tasks at a secondary location; in some cases this can take days. The potential data loss if there is an unrecoverable outage (recovery point objective, or RPO) is equal to the data generated since the last backup was taken. Copies of database backups are retained locally and at a remote location or in the Cloud for the dual purpose of archival and DR should a disaster strike the primary data center.
Major components of the Bronze reference architecture and the service levels achieved include:

Oracle Database HA and Data Protection

Oracle Restart automatically restarts the database, the listener, and other Oracle components after a hardware or software failure or whenever a database host computer restarts.

Oracle corruption protection checks for physical corruption and logical intra-block corruption. In-memory corruptions are detected and prevented from being written to disk, and in many cases can be repaired automatically. For more details see Preventing, Detecting, and Repairing Block Corruption.

Automatic Storage Management (ASM) is an Oracle-integrated file system and volume manager that includes local mirroring to protect against disk failure.

Oracle Flashback Technologies provide fast error correction at a level of granularity that is appropriate to repair an individual transaction, a table, or the full database.

Oracle Recovery Manager (RMAN) enables low-cost, reliable backup and recovery optimized for the Oracle Database.

Online maintenance includes online redefinition and reorganization for database maintenance, online file movement, and online patching.

Database Consolidation

Databases deployed using Bronze often include development and test databases, as well as databases supporting smaller workgroup and departmental applications, which are frequently the first candidates for database consolidation. Oracle Multitenant is the MAA best practice for database consolidation from Oracle Database 12c onward.

Life Cycle Management

Oracle Enterprise Manager Cloud Control enables self-service deployment of IT resources for business users, along with resource pooling models that cater to various multitenant architectures.
It supports Database as a Service (DBaaS), a paradigm in which end users (Database Administrators, Application Developers, Quality Assurance Engineers, Project Leads, and so on) can request database services, consume them for the lifetime of a project, and then have them automatically de-provisioned and returned to the resource pool.

Oracle Engineered Systems

Oracle Engineered Systems are an efficient deployment option for database consolidation and DBaaS. They reduce lifecycle cost by standardizing on a pre-integrated and optimized platform for Oracle Database that is completely supported by Oracle.

Bronze Summary: Data Protection, RTO, and RPO

Table 1 summarizes the data protection capabilities and service levels provided by the Bronze tier. The first column indicates when validations for physical and logical corruption are performed:

Manual checks are initiated by the administrator or at regular intervals by a scheduled job.

Runtime checks are automatically executed on a continuous basis by background processes while the database is open.

Background checks are run on a regularly scheduled interval, but only during periods when resources would otherwise be idle.

Each check is unique to Oracle Database, using specific knowledge of Oracle data block and redo structures.
Table 1: Bronze - Data Protection

Type       | Capability        | Physical Block Corruption                        | Logical Block Corruption
Manual     | Dbverify, Analyze | Physical block checks                            | Logical checks for intra-block and inter-object consistency
Manual     | RMAN              | Physical block checks during backup and restore  | Intra-block logical checks
Runtime    | Database          | In-memory block and redo checksum                | In-memory intra-block logical checks
Runtime    | ASM               | Automatic corruption detection and repair using local extent pairs |
Runtime    | Exadata           | HARD checks on write                             | HARD checks on write
Background | Exadata           | Automatic HARD Disk Scrub and Repair             |

Note that HARD validation and the Automatic Hard Disk Scrub and Repair (the last two rows of Table 1) are unique to Exadata storage. HARD validation ensures that Oracle Database does not write physically corrupt blocks to disk. Automatic Hard Disk Scrub and Repair periodically inspects and repairs hard disks with damaged or worn-out disk sectors or other physical or logical defects when there are idle resources.

Table 2 summarizes RTO and RPO for the Bronze tier for various unplanned and planned outages.

Table 2: Bronze - Recovery Time and Data Loss Potential

Type      | Event                                                                              | Downtime           | Data Loss Potential
Unplanned | Database instance failure                                                          | Minutes            | Zero
Unplanned | Recoverable server failure                                                         | Minutes to an hour | Zero
Unplanned | Data corruptions, unrecoverable server failure, database failures, or site failures | Hours to days      | Since last backup
Planned   | Online file move, online reorganization and redefinition, online patching          | Zero               | Zero
Planned   | Hardware or operating system maintenance and database patches that cannot be performed online | Minutes to hours | Zero
Planned   | Database upgrades: patch sets and full database releases                           | Minutes to hours   | Zero
Planned   | Platform migrations                                                                | Hours to a day     | Zero
Planned   | Application upgrades that modify back-end database objects                         | Hours to days      | Zero

So when would you use Bronze?
Bronze is useful when users can wait for a backup to be restored if there is an unrecoverable outage, and can accept that any data generated since the last backup was taken will be lost. The Oracle Database has a number of included capabilities, described above, that provide unique levels of data protection and availability for a low-cost environment based upon the Bronze reference architecture. But what if you can't accept this level of downtime or potential data loss? That is where the Silver, Gold and Platinum reference architectures come in. Bronze is only a starting point that establishes the foundation for subsequent HA reference architectures that provide higher quality of service. Stay tuned for future blog posts that will dive into the details of each reference architecture.
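To make the RMAN component concrete, a minimal Bronze-style backup and recovery cycle might look like the following RMAN sketch. The commands are standard RMAN; the recovery point shown (one hour ago) is purely illustrative:

```
# Routine backup to the configured destination (FRA, NAS, or cloud)
BACKUP DATABASE PLUS ARCHIVELOG;

# After an unrecoverable outage: point-in-time restore from the last good backup.
# Data created after the recovery point is lost - the Bronze RPO.
RUN {
  SET UNTIL TIME "SYSDATE - 1/24";
  RESTORE DATABASE;
  RECOVER DATABASE;
}
ALTER DATABASE OPEN RESETLOGS;
```

The elapsed time of the RESTORE and RECOVER steps is what drives the "hours to days" RTO shown in Table 2 for unrecoverable outages.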


Oracle GoldenGate Active-Active Part 2

My last post ( https://blogs.oracle.com/MAA/entry/oracle_goldengate_active_active_part ) focused on whether or not an application's database structure was set up sufficiently to perform conflict detection and resolution in active-active GoldenGate environments. Assuming that your application structure is ready, I'll now explain how to actually prevent conflicts from happening in the first place. While this is ideal, I don't think conflict prevention is something we could ever guarantee... especially when a fault or hiccup occurs in either the database or GoldenGate itself.

Let's break up conflicts into 3 types, based on the DML: 1. Inserts 2. Deletes 3. Updates

1. Insert conflicts typically occur when two rows have the same primary key or when there are duplicate unique keys within a table.

· Two rows with the same primary key: To address these cases we could have primary keys generated based on a sequence value, then set up something like alternating sequences. Depending on how many nodes or servers are in the environment, you could use an algorithm that starts with n and increments by N (where n is the node or server number and N is the total number of nodes or servers). For example, in a 2-way scenario, one side would have odd sequence values (start with 1 and increment by 2) and the other would have even sequence values (start with 2 and increment by 2).

· Duplicate unique keys: Avoiding conflicts in tables that have duplicate unique keys is a little trickier, and sometimes must be managed from the application perspective. For example, let's say a particular application has a table that contains login information for an account. We would want the login name to be a unique value. However, it is possible that two people working on two different servers could attempt to obtain the same login name. These kinds of conflicts can be eliminated if we restrict new account creation to a single server, thereby letting the database handle the uniqueness of the column.

2.
Delete conflicts are usually nothing to worry about. In most cases, these occur when two people attempt to delete the same record, or when someone tries to update a record that has already been deleted. These conflicts can usually just be ignored. However, I typically recommend that customers keep track of these types of conflicts in an exception table, just to make sure that nothing out of the ordinary is occurring. Once you've confirmed that things are running smoothly you can eliminate the exception mapping and just ignore the conflicts completely.

3. Update conflicts are definitely the most prevalent. These conflicts occur when two people try to update the same logical record on two different servers. A typical example is when a customer is on the phone with support to change something associated with his or her credit card. At the same time, the customer is also logged into the account and is trying to change his or her address. If these activities occur on two different servers and the lag is high enough, it could cause a conflict. In order to reduce or eliminate these conflicts there are a few best practices to follow:

1) Reduce the Oracle GoldenGate (OGG) lag to the lowest level possible. There are a few knowledge tickets on this; the master note is Main Note - Oracle GoldenGate - Lag, Performance, Slow and Hung Processes (Doc ID 1304557.1).

2) Logically partition users based upon geographical regions or usernames. For example, when all users in North America access one server, and users in Europe access a different server, the chance of two people updating the same logical record on two different machines is greatly reduced. Another option is to split up the users based on their usernames. Even something as simple as setting up usernames A-M to log into one server and usernames N-Z to log into another server can help reduce conflicts. The reason this helps is related to my next point...

3) Set up session persistence time.
IP or session persistence is the ability of a load balancer or router to keep track of where a connection is sent. In the event that a connection is lost, disconnected, etc., and a user attempts to reconnect or log back in, the connection will be sent to the same server where it was originally connected. Most sessions have a time value that can be associated with this persistence. For example, if I set my session persistence to 10 seconds, then any time a session is disconnected or killed, the user will be sent to the same server as long as he or she logs back in within 10 seconds. This is ideal for Oracle GoldenGate environments, where there would be lag between the different databases. In an ideal situation you would set this session persistence time value to twice the average lag or 20 seconds – whichever is higher. This allows a user who is filling a shopping cart or booking a reservation to maintain a consistent view of the data, even in the event of a client or network failure. By using these methods, the number of conflicts that actually occur can be drastically reduced, leading to a happier end-user experience. But even with the best intentions and preparation, not every conflict can be avoided. In my next post I will cover how to resolve such unavoidable conflicts.
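The alternating-sequence scheme for avoiding insert conflicts (type 1 above) can be sketched in SQL. The table and sequence names here are hypothetical; in an N-node environment, node n creates its sequence with START WITH n INCREMENT BY N:

```sql
-- Node 1 of a 2-node active-active setup: generates odd keys (1, 3, 5, ...)
CREATE SEQUENCE account_pk_seq START WITH 1 INCREMENT BY 2;

-- Node 2: generates even keys (2, 4, 6, ...)
CREATE SEQUENCE account_pk_seq START WITH 2 INCREMENT BY 2;

-- Inserts on the two nodes then draw from non-overlapping key ranges
INSERT INTO accounts (account_id, login_name)
VALUES (account_pk_seq.NEXTVAL, 'jsmith');
```

Because the two nodes can never generate the same value, replicated inserts cannot collide on the primary key, no matter how high the lag is.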


Oracle Flashback Technologies - Overview

Oracle Flashback Technologies - Introduction

In his May 29th 2014 blog, my colleague Joe Meeks introduced Oracle Maximum Availability Architecture (MAA) and discussed both planned and unplanned outages. Let's take a closer look at unplanned outages. These can be caused by physical failures (e.g., server, storage, network, file deletion, physical corruption, or site failures) or by logical failures – cases where all components and files are physically available, but data is incorrect or corrupt. These logical failures are usually caused by human errors or application logic errors. This blog series focuses on these logical errors – what causes them and how to address and recover from them using Oracle Database Flashback. In this introductory post, I'll provide an overview of the Oracle Database Flashback technologies and will discuss the features in detail in future blog posts. Let's get started.

We are all human beings (unless a machine is reading this), and making mistakes is a part of what we do... often what we do best! We "fat finger", we spill drinks on keyboards, we unplug the wrong cables, etc. In addition, many of us, in our lives as DBAs or developers, have observed, caused, or corrected one or more of the following unpleasant events:

Accidentally updated a table with wrong values!
Performed a batch update that went wrong, due to logical errors in the code!
Dropped a table!

How do DBAs typically recover from these types of errors? First, data needs to be restored and recovered to the point in time when the error occurred (incomplete or point-in-time recovery). Moreover, depending on the type of fault, it's possible that some services – or even the entire database – would have to be taken down during the recovery process. Apart from error conditions, there are other questions that need to be addressed as part of the investigation. For example, what did the data look like in the morning, prior to the error?
What were the various changes to the row(s) between two timestamps? Who performed the transaction, and how can it be reversed? Oracle Database includes built-in Flashback technologies, with features that address these challenges and questions, and enable you to perform faster, easier, and more convenient recovery from logical corruptions.

History

Flashback Query, the first Flashback technology, was introduced in Oracle 9i. It provides a simple, powerful and completely non-disruptive mechanism for data verification and recovery from logical errors, and enables users to view the state of data at a previous point in time. Flashback technologies were further enhanced in Oracle 10g to provide fast, easy recovery at the database, table, row, and even transaction level. Oracle Database 11g introduced an innovative method to manage and query long-term historical data with Flashback Data Archive. The 11g release also introduced Flashback Transaction, which provides an easy, one-step operation to back out a transaction. Later Oracle Database releases have further enhanced the performance of these features. Note that all the features listed here work without requiring any kind of restore operation. In addition, Flashback features are fully supported with the new multitenant capabilities introduced with Oracle Database 12c.

Flashback Features

Oracle Flashback Database enables point-in-time recovery of the entire database without requiring a traditional restore and recovery operation. It rewinds the entire database to a specified point in time in the past by undoing all the changes that were made since that time.

Oracle Flashback Table enables an entire table or a set of tables to be recovered to a point in time in the past.

Oracle Flashback Drop enables accidentally dropped tables and all dependent objects to be restored.

Oracle Flashback Query enables data to be viewed at a point in time in the past.
This feature can be used to view and reconstruct data that was lost due to unintentional changes or deletions. It can also be used to build self-service error correction into applications, empowering end users to undo and correct their own errors. Oracle Flashback Version Query offers the ability to query the historical changes to data between two points in time or system change numbers (SCNs). Oracle Flashback Transaction Query enables changes to be examined at the transaction level. This capability can be used to diagnose problems, perform analysis, audit transactions, and even revert a transaction by undoing its SQL. Oracle Flashback Transaction is a procedure used to back out a transaction and its dependent transactions.

Flashback technologies eliminate the need for a traditional restore and recovery process to fix logical corruptions or make inquiries. Using these technologies, you can recover from an error in roughly the same amount of time it took to make the error. All the Flashback features can be accessed either via the SQL command line or via Enterprise Manager. Most of the Flashback technologies depend on the available UNDO to retrieve older data. The following summarizes the various Flashback technologies: their purpose, example syntax, and the situations where each individual technology can be used.

Error investigation related (the purpose is to investigate what went wrong and what the values were at certain points in time):
- Flashback Query (SELECT ... AS OF SCN | TIMESTAMP) - helps to see the value of a row or set of rows at a point in time.
- Flashback Version Query (SELECT ... VERSIONS BETWEEN SCN | TIMESTAMP AND SCN | TIMESTAMP) - helps determine how a value evolved between two SCNs or timestamps.
- Flashback Transaction Query (SELECT ... WHERE XID = ...) - helps to understand how a transaction caused its changes.

Error correction related (the purpose is to fix the error and correct the problems):
- Flashback Table (FLASHBACK TABLE ... TO SCN | TIMESTAMP) - rewinds a table to a particular timestamp or SCN to reverse unwanted updates.
- Flashback Drop (FLASHBACK TABLE ... TO BEFORE DROP) - undrops a deleted table.
- Flashback Database (FLASHBACK DATABASE TO SCN | RESTORE POINT) - the rewind button for Oracle databases. You can revert the entire database to a particular point in time. It is a fast way to perform a point-in-time recovery (PITR).
- Flashback Transaction (DBMS_FLASHBACK.TRANSACTION_BACKOUT(XID ...)) - reverses a transaction and its dependent transactions.

Advanced use cases

Flashback technology is integrated into Oracle Recovery Manager (RMAN) and Oracle Data Guard, so apart from the basic use cases mentioned above, the following use cases are also addressed using Oracle Flashback:
- Block media recovery by RMAN - to perform block-level recovery.
- Snapshot Standby - where the standby is temporarily converted to a read/write environment for testing, backup, or migration purposes.
- Reinstating the old primary in a Data Guard environment - this avoids the need to restore an old backup and perform a recovery to make it a new standby.
- Guaranteed Restore Points - to bring the entire database back to an older point in time in a guaranteed way.

I hope this introductory overview helps you understand how Flashback features can be used to investigate and recover from logical errors. As mentioned earlier, I will take a deeper dive into some of the critical Flashback features in my upcoming blogs and address common use cases.
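The syntax summarized above can be illustrated with a few statements. This is a sketch using hypothetical object names (emp, before_batch_run); note that Flashback Table requires row movement to be enabled on the table, and Flashback Database requires flashback logging and a mounted (not open) database.

```sql
-- Flashback Query: view rows as they were 15 minutes ago
SELECT * FROM emp AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE)
WHERE deptno = 10;

-- Flashback Version Query: see how a row evolved over the last hour
SELECT versions_starttime, versions_operation, sal
FROM emp VERSIONS BETWEEN TIMESTAMP
     (SYSTIMESTAMP - INTERVAL '1' HOUR) AND SYSTIMESTAMP
WHERE empno = 7369;

-- Flashback Table: rewind the table (row movement must be enabled first)
ALTER TABLE emp ENABLE ROW MOVEMENT;
FLASHBACK TABLE emp TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);

-- Flashback Drop: recover a dropped table from the recycle bin
FLASHBACK TABLE emp TO BEFORE DROP;

-- Flashback Database: rewind the whole database to a restore point
-- (run while mounted, with flashback logging enabled)
FLASHBACK DATABASE TO RESTORE POINT before_batch_run;
```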


Oracle GoldenGate Active-Active Part 1

My name is Nick Wagner, and I'm a recent addition to the Oracle Maximum Availability Architecture (MAA) product management team. I've spent the last 15+ years working on database replication products, and the last 10 of those on Oracle GoldenGate, so most of my posts will probably be focused on OGG. One question that comes up all the time is around active-active replication with Oracle GoldenGate: how do I know if my application is a good fit for active-active replication with GoldenGate? To answer that, it really comes down to how you plan on handling conflict resolution. (I will delve into topology and deployment in a later blog.) The two most common resolution routines are host-based resolution and timestamp-based resolution.

Host-based resolution is used less often, but works with the fewest application changes. Think of it like this: any transaction from SystemA always takes precedence over any transaction from SystemB. If there is a conflict on SystemB, then the record from SystemA will overwrite it; if there is a conflict on SystemA, then the change from SystemB will be ignored. It is quite a bit less restrictive, and in most cases, as long as all the tables have primary keys, host-based resolution will work just fine.

Timestamp-based resolution, on the other hand, is a little trickier. In this case, you decide which record is overwritten based on timestamps. For example, does the older record get overwritten with the newer record, or vice versa? This method not only requires primary keys on every table, but it also requires every table to have a timestamp/date column that is updated each time a record is inserted or updated. Most homegrown applications can be customized to include these requirements, but it's more difficult with third-party applications, and might even be impossible for large ERP-type applications.
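As a sketch of what a "timestamp-ready" table looks like for this kind of resolution, the hypothetical table below has both a primary key and a last-modified timestamp; a trigger is one common way to maintain the column without changing the application (the table, column, and trigger names are illustrative, not from the original post):

```sql
-- Every replicated table needs a primary key and a last-modified
-- timestamp column maintained on every insert/update.
CREATE TABLE orders (
  order_id    NUMBER PRIMARY KEY,
  status      VARCHAR2(20),
  last_update TIMESTAMP NOT NULL
);

-- A trigger keeps the timestamp current without application changes.
CREATE OR REPLACE TRIGGER orders_ts
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW
BEGIN
  :NEW.last_update := SYSTIMESTAMP;
END;
/
```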
If your database has these features - primary keys for host-based resolution, or primary keys and timestamp columns for timestamp-based resolution - then your application could be a great candidate for active-active replication. But table structure is not the only requirement. The other consideration applies when there is a conflict: do I need to perform any notification, or track down the user whose data was overwritten? In most cases I don't think it's necessary, but if it is required, OGG can always populate an exceptions table that contains all of the overwritten transactions so that people can be notified. It's a bit of extra work to implement this type of option, but if the business requires it, then it can be done. Unless someone is constantly monitoring this exceptions table, or has an automated process for dealing with exceptions, there will be a delay in getting a response back to the end user. Ideally, when setting up active-active replication we can include some simple procedural steps or configuration options that reduce, or in some cases eliminate, the potential for conflicts. This makes the whole implementation that much easier and more foolproof. I'll cover these in my next blog. Links to Part 2 - https://blogs.oracle.com/MAA/entry/oracle_goldengate_active_active_part1 and Part 3 - https://blogs.oracle.com/MAA/entry/oracle_goldengate_active_active_part2


How to Achieve Real-Time Data Protection and Availability... For Real

There is a class of business and mission critical applications where downtime or data loss have substantial negative impact on revenue, customer service, reputation, cost, etc. Because the Oracle Database is used extensively to provide reliable performance and availability for this class of application, it also provides an integrated set of capabilities for real-time data protection and availability. Active Data Guard, depicted in the figure below, is the cornerstone for accomplishing these objectives because it provides the absolute best real-time data protection and availability for the Oracle Database. This is a bold statement, but it is supported by the facts. It isn’t so much that alternative solutions are bad, it’s just that their architectures prevent them from achieving the same levels of data protection, availability, simplicity, and asset utilization provided by Active Data Guard. Let’s explore further. Backups are the most popular method used to protect data and are an essential best practice for every database. Not surprisingly, Oracle Recovery Manager (RMAN) is one of the most commonly used features of the Oracle Database. But comparing Active Data Guard to backups is like comparing apples to motorcycles. Active Data Guard uses a hot (open read-only), synchronized copy of the production database to provide real-time data protection and HA. In contrast, a restore from backup takes time and often has many moving parts - people, processes, software and systems – that can create a level of uncertainty during an outage that critical applications can’t afford. This is why backups play a secondary role for your most critical databases by complementing real-time solutions that can provide both data protection and availability. Before Data Guard, enterprises used storage remote-mirroring for real-time data protection and availability. 
Remote-mirroring is a sophisticated storage technology promoted as a generic infrastructure solution that makes a simple promise - whatever is written to a primary volume will also be written to the mirrored volume at a remote site. Keeping this promise is also what causes data loss and downtime when the data written to primary volumes is corrupt - the same corruption is faithfully mirrored to the remote volume, making both copies unusable. This happens because remote-mirroring is a generic process. It has no intrinsic knowledge of Oracle data structures to enable advanced protection, nor can it perform independent Oracle validation BEFORE changes are applied to the remote copy. There is also nothing to prevent human error (e.g. a storage admin accidentally deleting critical files) from also impacting the remote mirrored copy. Remote-mirroring tricks users by creating a false impression that there are two separate copies of the Oracle Database. In truth, while remote-mirroring maintains two copies of the data on different volumes, both are part of a single, closely coupled system. Not only will remote-mirroring propagate corruptions and administrative errors, but the changes applied to the mirrored volume are a result of the same Oracle code path that applied the change to the source volume. There is no isolation, either from a storage mirroring perspective or from an Oracle software perspective. Bottom line: storage remote-mirroring lacks both the smarts and the isolation necessary to provide true data protection. Active Data Guard offers much more than storage remote-mirroring when your objective is protecting your enterprise from downtime and data loss. Like remote-mirroring, an Active Data Guard replica is an exact block-for-block copy of the primary. Unlike remote-mirroring, an Active Data Guard replica is NOT a tightly coupled copy of the source volumes - it is a completely independent Oracle Database.
Active Data Guard’s inherent knowledge of Oracle data block and redo structures enables a separate Oracle Database, using a different Oracle code path than the primary, to apply the full complement of Oracle data validation methods before changes reach the synchronized copy. These include: physical checksums, logical intra-block checking, lost-write validation, and automatic block repair. The figure below illustrates the stark difference between what remote-mirroring can discern from an Oracle data block and what Active Data Guard can discern. An Active Data Guard standby also provides a range of additional services enabled by the fact that it is a running Oracle Database - not just a mirrored copy of data files. An Active Data Guard standby database can be open read-only while it is synchronizing with the primary. This enables read-only workloads to be offloaded from the primary system and run on the active standby - boosting performance by utilizing all assets. An Active Data Guard standby can also be used to implement many types of system and database maintenance in rolling fashion. Maintenance and upgrades are first implemented on the standby while production runs unaffected at the primary. After the primary and standby are synchronized and all changes have been validated, the production workload is quickly switched to the standby. The only downtime is the time required for user connections to transfer from one system to the next. These capabilities further expand the expectations of availability offered by a data protection solution beyond what is possible using storage remote-mirroring. So don’t be fooled by appearances. Storage remote-mirroring and Active Data Guard replication may look similar on the surface - but the devil is in the details. Only Active Data Guard has the smarts, the isolation, and the simplicity to provide the best data protection and availability for the Oracle Database.
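As a hedged illustration of the validation and active-standby points above, the statements below use standard Oracle commands: initialization parameters commonly associated with corruption detection, and the sequence that opens a standby read-only with redo apply running (the Active Data Guard usage pattern on recent releases). Treat the parameter values as a sketch to check against your own MAA documentation, not a tuning recommendation.

```sql
-- Corruption detection settings often recommended alongside Active Data Guard
ALTER SYSTEM SET DB_BLOCK_CHECKSUM = FULL;
ALTER SYSTEM SET DB_LOST_WRITE_PROTECT = TYPICAL;

-- On the standby: open read-only, then restart redo apply so the copy
-- stays synchronized while serving read-only workloads.
ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```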
Stay tuned for future blog posts that dive into the many differences between storage remote-mirroring and Active Data Guard along the dimensions of data protection, data availability, cost, asset utilization and return on investment. For additional information on Active Data Guard, see: Active Data Guard Technical White Paper Active Data Guard vs Storage Remote-Mirroring Active Data Guard Home Page on the Oracle Technology Network


Oracle MAA Part 1: When One Size Does Not Fit All

The good news is that Oracle Maximum Availability Architecture (MAA) best practices combined with Oracle Database 12c (see video) introduce first-in-the-industry database capabilities that truly make unplanned outages and planned maintenance transparent to users. The trouble with such good news is that Oracle’s enthusiasm in evangelizing its latest innovations may leave some to wonder if we’ve lost sight of the fact that not all database applications are created equal. After all, many databases don’t have the business requirements for high availability and data protection that require all of Oracle’s ‘stuff’. For many real-world applications, a controlled amount of downtime and/or data loss is OK if it saves money and effort. Well, not to worry. Oracle knows that enterprises need solutions that address the full continuum of requirements for data protection and availability. Oracle MAA accomplishes this by defining four HA service-level tiers: BRONZE, SILVER, GOLD and PLATINUM. The figure below shows the progression in service levels provided by each tier. Each tier uses a different MAA reference architecture to deploy the optimal set of Oracle HA capabilities that reliably achieve a given service level agreement (SLA) at the lowest cost. Each tier includes all of the capabilities of the previous tier and builds upon the architecture to handle an expanded fault domain. Bronze is appropriate for databases where simple restart or restore from backup is ‘HA enough’. Bronze is based upon a single-instance Oracle Database with MAA best practices that use the many capabilities for data protection and HA included with every Oracle Enterprise Edition license. Oracle-optimized backups using Oracle Recovery Manager (RMAN) provide data protection and are used to restore availability should an outage prevent the database from being able to restart.
Silver provides an additional level of HA for databases that require minimal or zero downtime in the event of database instance or server failure, as well as for many types of planned maintenance. Silver adds clustering technology - either Oracle RAC or Oracle RAC One Node. RMAN provides database-optimized backups to protect data and restore availability should an outage prevent the cluster from being able to restart. Gold raises the game substantially for business-critical applications that can’t accept vulnerability to single points of failure. Gold adds database-aware replication technologies, Active Data Guard and Oracle GoldenGate, which synchronize one or more replicas of the production database to provide real-time data protection and availability. Database-aware replication greatly increases HA and data protection beyond what is possible with storage replication technologies. It also reduces cost while improving return on investment by actively utilizing all replicas at all times. Platinum introduces all of the sexy new Oracle Database 12c capabilities that Oracle staff will gush over with great enthusiasm. These capabilities include Application Continuity for reliable replay of in-flight transactions that masks outages from users; Active Data Guard Far Sync for zero data loss protection at any distance; new Oracle GoldenGate enhancements for zero-downtime upgrades and migrations; and Global Data Services for automated service management and workload balancing in replicated database environments. Each of these technologies requires additional effort to implement. But they deliver substantial value for your most critical applications where downtime and data loss are not an option. The MAA reference architectures are inherently designed to address conflicting realities. On one hand, not every application has the same objectives for availability and data protection - hence the When One Size Does Not Fit All title of this blog post.
On the other hand, standard infrastructure is an operational requirement and a business necessity in order to reduce complexity and cost. MAA reference architectures address both realities by providing a standard infrastructure optimized for Oracle Database that enables you to dial in the level of HA appropriate for different service-level requirements. This makes it simple to move a database from one HA tier to the next should business requirements change, or from one hardware platform to another - whether it’s your favorite non-Oracle vendor or an Oracle Engineered System. Please stay tuned for additional blog posts in this series that dive into the details of each MAA reference architecture. Meanwhile, more information on Oracle HA solutions and the Maximum Availability Architecture can be found at: Oracle Maximum Availability Architecture - Webcast Maximize Availability with Oracle Database 12c - Technical White Paper


CAP: Consistency and Availability except when Partitioned - Part 2

The previous post presented the CAP theorem as C and A except when P. For this formulation to be useful, partitions (or failures) must be uncommon and/or fast to recover from, i.e., the system must have liveness. Informally, liveness is a system's ability to eventually make progress, or be up most of the time ("something good eventually happens"). A good/useful system also has safety: a system's guarantee of correct behavior ("nothing bad happens"), e.g., to return correct results, maintain consistency, etc. This post explores the tradeoffs available, under the CAP theorem, to systems that meet both liveness and safety requirements; doing so economically is the main technical challenge of infrastructure-grade systems, including databases. Returning to the CAP theorem: rather than choose either consistency or availability, a good system can strive to maintain both in degrees, and/or suspend one or the other (perhaps on a per-operation basis) during some types of partitions or failures. A system may impose some restrictions on what operations it allows during a partition, to maintain availability and also the ability to restore consistency once it recovers (from a partition/failure). For example, a collaboration platform may restrict some operations when a user is updating a shared document locally, while disconnected from the document server (in effect, the system is partitioned with respect to that user's data). Restricting operations during a partition makes it easier to reconcile updates once the partition ends. In this instance, weakening availability (some operations are unavailable) enables restoring consistency (reconciling concurrent updates from disconnected users) as part of recovering from a failure/partition. As another example, where absolute strong consistency is required, as in an Oracle database, a partition may indeed result in an operational mode where only read-only operations are allowed. 
Google's Spanner is a good case study of CAP tradeoffs in a mission-critical, globally distributed system. In a subsequent post we will examine consistency and availability tradeoffs in the Oracle database ecosystem. Further Reading: Brewer and Gilbert and Lynch on the CAP Theorem; Vogels on Eventual Consistency, Hamilton on its limitations, and Bailis and Ghodsi on measuring it and more; and Sirer on the multiple meanings of consistency in Computer Science. The "Liveness manifestos" page has interesting definition variants for liveness and safety.


The CAP theorem: Consistency and Availability except when Partitioned

Some NoSQL databases present their implementations of eventual consistency (or other weak consistency variants) as an inevitable consequence of Brewer's CAP Theorem. The reasoning justifying such design choices goes more or less like this: My NoSQL database is distributed, possibly across continental distances (with the ensuing network latency), for scalability and for availability. Because it is a distributed system, I will get network partitions / partial failures. (I can't avoid Brewer's P). My NoSQL database cannot be completely unavailable every time there is a partition (I need Brewer's A). The CAP theorem says* that in a distributed system I can have only 2 of C, A, and P. I can't avoid P and want A, therefore I can't have C -- my NoSQL database will support only eventual or other weak consistency.  This reasoning, however, is flawed, because it relies on a simplistic interpretation (* above) of the CAP theorem. This phrasing is simplistic because the three properties of the CAP theorem are not fully orthogonal, nor is each a binary quantity, and each may have different values at different levels of a system. For example, the question of whether a system is currently partitioned, may not admit a single system-wide consistent answer. A better phrasing of CAP is: In a distributed system, you can have both Consistency and Availability, except when there is a Partition.  Relaxing the consistency requirements usually makes it easier to maintain availability, but the CAP theorem is not an excuse to give up strong consistency across the board. A well-designed system can balance both availability and consistency while tolerating partitions over a range of tradeoffs, where eventual consistency is just one possibility. The next post discusses these points in more detail.    


To SYNC or not to SYNC – Part 4

This is Part 4 of a multi-part blog article where we are discussing various aspects of setting up Data Guard synchronous redo transport (SYNC). In Part 1 of this article, I debunked the myth that Data Guard SYNC is similar to a two-phase commit operation. In Part 2, I discussed the various ways that network latency may or may not impact a Data Guard SYNC configuration. In Part 3, I talked in detail about why Data Guard SYNC is a good thing, and the distance implications you have to keep in mind. In this final article of the series, I will talk about how you can nicely complement Data Guard SYNC with the ability to failover in seconds.

Wait - Did I Say “Seconds”?

Did I just say that some customers do Data Guard failover in seconds? Yes, Virginia, there is a Santa Claus. Data Guard has an automatic failover capability, aptly called Fast-Start Failover. Initially available with Oracle Database 10g Release 2 for Data Guard SYNC transport mode (and enhanced in Oracle Database 11g to support Data Guard ASYNC transport mode), this capability, managed by Data Guard Broker, lets your Data Guard configuration automatically fail over to a designated standby database. Yes, this means no human intervention is required to do the failover. This process is controlled by a low-footprint Data Guard Broker client called the Observer, which makes sure that the primary database and the designated standby database are behaving like good kids. If something bad were to happen to the primary database, the Observer, after a configurable threshold period, tells that standby, “Your time has come, you are the chosen one!” The standby dutifully follows the Observer’s directives by assuming the role of the new primary database. The DBA or the sysadmin doesn’t need to be involved.
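For reference, Fast-Start Failover is configured through the Data Guard Broker command-line interface, DGMGRL. A minimal sketch, assuming a broker configuration is already in place (the 30-second threshold is illustrative; START OBSERVER runs in its own DGMGRL session, typically on a third host):

```
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;
DGMGRL> SHOW FAST_START FAILOVER;
```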
And - in case you are following this discussion very closely and are wondering, “Hmmm ... what if the old primary is not really dead, but just network-isolated from the Observer or the standby - won’t this lead to a split-brain situation?” - the answer is no, it doesn’t. As to why it doesn’t, I am sure there are some smart DBAs in the audience who can explain the technical reasons; otherwise, that will be the material for a future blog post.

So - this combination of SYNC and Fast-Start Failover is the nirvana of lights-out, integrated HA and DR, as practiced by some of our advanced customers. They have observed failover times (with no data loss) ranging from single-digit seconds to tens of seconds. With this, they support operations in industry verticals such as manufacturing, retail, telecom, Internet, etc. that have the most demanding availability requirements. One of our leading customers with massive cloud deployment initiatives tells us that they know about server failures only after Data Guard has automatically completed the failover process and the app is back up and running! Needless to mention, Data Guard Broker has the integration hooks for interfaces such as JDBC and OCI, or even for custom apps, to ensure the application gets automatically rerouted to the new primary database after the database-level failover completes.

Net Net?

To sum up this multi-part blog article, Data Guard with SYNC redo transport mode, plus Fast-Start Failover, gives you the ideal triple combo - that is, it gives you the assurance that for critical outages, you can fail over your Oracle databases:
- very fast,
- without human intervention, and
- without losing any data.

In short, it takes the element of risk out of critical IT operations. It does require you to be more careful with your network and systems planning, but as far as HA is concerned, the benefits outweigh the investment costs. So, this is what we in the MAA Development Team believe in. What do you think?
How has your deployment experience been? We look forward to hearing from you!


To SYNC or not to SYNC – Part 3

I can't believe it has been almost a year since my last blog post. I know, that's an absolute no-no in the blogosphere. And I know that "I have been busy" is not a good excuse. So - without trying to come up with an excuse - let me state this: my apologies for taking such a long time to write the next part. Without further ado, here goes.

This is Part 3 of a multi-part blog article where we are discussing various aspects of setting up Data Guard synchronous redo transport (SYNC). In Part 1 of this article, I debunked the myth that Data Guard SYNC is similar to a two-phase commit operation. In Part 2, I discussed the various ways that network latency may or may not impact a Data Guard SYNC configuration. In this article, I will talk in detail about why Data Guard SYNC is a good thing. I will also talk about distance implications for setting up such a configuration.

So, Why Good?

Why is Data Guard SYNC a good thing? Because, at the end of the day, this gives you the assurance of zero data loss - it doesn’t matter what outage may befall your primary system. Befall! Boy, that sounds theatrical. But seriously - think about this - it minimizes your data risks. That’s a big deal. Whether you have an outage due to bad disks, faulty hardware components, hardware/software bugs, physical data corruptions, power failures, lightning that takes out a significant part of your data center, fire that melts your assets, water leakage from the cooling system, or human errors such as accidental deletion of online redo log files - it doesn’t matter - you can have that “Om - peace” look on your face, and then you can fail over to the standby system without losing a single bit of data in your Oracle database. You will be a hero, as shown in this not-so-imaginary conversation:

IT Manager: Well, what’s the status?
You: John is doing the trace analysis on the storage array.
IT Manager: So? How long is that gonna take?
You: Well, he is stuck, waiting for a response from <insert your not-so-favorite storage vendor here>.
IT Manager: So, no root cause yet?
You: I told you, he is stuck. We have escalated with their Support, but you know how long these things take.
IT Manager: Darn it - the site is down!
You: Not really …
IT Manager: What do you mean?
You: John is stuck, but Sreeni has already done a failover to the Data Guard standby.
IT Manager: Whoa, whoa - wait! Failover means we lost some data. Why did you do this without letting the Business group know?
You: We didn’t lose any data. Remember, we had set up Data Guard with SYNC? So now, any problems on production - we just fail over. No data loss, and we are up and running in minutes. The Business guys don’t need to know.
IT Manager: Wow! Are we great or what!!
You: I guess …

Ok, so you get it - SYNC is good. But as my dear friend Larry Carpenter says, “TANSTAAFL”, or "There ain't no such thing as a free lunch". Yes, of course - investing in Data Guard SYNC means that you have to invest in a low-latency network, you have to monitor your applications and database especially in peak load conditions, and you cannot under-provision your standby systems. But all these are good and necessary things if you are supporting mission-critical apps that are supposed to be running 24x7. The peace of mind that this investment will give you is priceless, especially if you are serious about HA.

How Far Can We Go?

Someone may say at this point - well, I can’t use Data Guard SYNC over my coast-to-coast deployment. Most likely - true. So how far can you go? Well, we have customers who have deployed Data Guard SYNC over 300+ miles! Does this mean that you can also deploy over similar distances? Duh - no! I am going to say something here that most IT managers don’t like to hear - “It depends!” It depends on your application design, application response time / throughput requirements, network topology, etc.
However, because of the optimal way we do SYNC, customers have been able to stretch Data Guard SYNC deployments over longer distances compared to traditional, storage-centric ways of doing this. The MAA Oracle Database 10.2 best practices paper Data Guard Redo Transport & Network Configuration and the Oracle Database 11.2 High Availability Best Practices manual discuss some of these SYNC-related metrics. For example, a test deployment of Data Guard SYNC over 330 miles with 10 ms latency showed an impact of less than 5% for a busy OLTP application. Even if you can’t deploy Data Guard SYNC over your WAN distance, or if you already have an ASYNC standby located thousands of miles away, here’s another nifty way to boost your HA: have a local standby, configured SYNC. How local is “local”? Again - it depends. One customer runs a local SYNC standby across the campus. Another customer runs it across 15 miles in another data center. Both of these customers are running Data Guard SYNC as their HA standard. If a localized outage affects their primary system, no problem! They have all the data available on the standby, to which they can fail over. Very fast. In seconds. Wait - did I say “seconds”? Yes, Virginia, there is a Santa Claus. But you have to wait till the next blog article to find out more. I assure you, though, that this time you won’t have to wait another year for it.


To SYNC or not to SYNC – Part 2

It’s less than two weeks from Oracle OpenWorld! We are going to have an exciting set of sessions from the Oracle HA Development team. Needless to say, all of us are a wee bit busy these days. I think that’s just the perfect time for Part 2 of this multi-part blog article in which we are discussing various aspects of setting up Data Guard synchronous redo transport (SYNC). In Part 1 of this article, I debunked the myth that Data Guard SYNC is similar to a two-phase commit operation. In case you are wondering what the truth is, and don’t have time to read the previous article, the answer is - no, Data Guard synchronous redo transport is NOT the same as two-phase commit. Now, let’s look into how network latency may or may not impact a Data Guard SYNC configuration.

LATEncy

The network latency issue is a valid concern. That’s a simple law of physics. We have heard of the term “lightspeed” (remember Star Wars?), but still - as you know from your high school physics days, light takes time to travel. So the acknowledgement from RFS back to NSS will take some milliseconds to traverse the network, and that is typically proportional to the network distance.

Actually, it is both network latency and disk I/O latency. Why disk I/O latency? Remember, on the standby database, RFS writes the incoming redo blocks to disk-resident SRLs. This is governed by the AFFIRM attribute of the log_archive_dest parameter corresponding to the standby database. We had one customer whose SYNC performance on the primary was suffering because of an improperly tuned standby storage system. However, in most cases, network latency is likely to be the gating factor - for example, refer to this real-time network latency chart from AT&T: http://ipnetwork.bgtmo.ip.att.net/pws/network_delay.html. At the time of writing this blog, US coast-to-coast latency (SF - NY) is shown to be around 75 ms. Trans-Atlantic latency is shown to be around 80 ms, whereas Trans-Pacific latency is shown to be around 140 ms.
Of course, you can measure the latency between your own primary and standby servers using utilities such as “ping” and “traceroute”. Here is some good news - in Oracle Database 11g Release 2, the write to the local online redo logs (by LGWR) and the remote write through the network layer (by NSS) happen in parallel. So we do get some efficiency through these parallel local write and network send operations.

Still - you have to determine whether the commit operations issued by your application can tolerate the network latency. Remember - if you are testing this out, do it under peak load conditions. Obviously, latency will have minimal impact on a read-intensive application (which, by definition, does not generate redo).

There are also two elements of application impact - your application response time, and your overall application throughput. For example, your application may have a heavy interactive mode - especially if this interaction happens programmatically (e.g. a trading application accessing an authentication application which in turn is configured with Data Guard SYNC). In such cases, measuring the impact on the application response time is critical. However, if your application has enough parallelism built in, you may notice that overall throughput doesn’t degrade much with higher latencies. In the database layer, you can measure this by comparing the redo generation rate before and after configuring synchronous redo transport (using AWR).

Not all Latencies are Equal

The cool thing about configuring synchronous redo transport in the database layer is just that - we do it in the database layer, and we send only redo blocks. Imagine if you had configured it in the storage layer instead. All the usual database file structures - data files, online redo logs, archived redo logs, flashback logs, the control file - that get updated as part of normal database activity would have to be synchronously updated across the network.
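The effect of the parallel local write and remote send described above can be captured in a tiny model. This is an illustrative sketch only, not Oracle's actual implementation, and all the timing numbers in it are made-up assumptions:

```python
# Rough model of SYNC commit latency with parallel transport (as in
# Oracle Database 11g Release 2): the local LGWR redo write and the
# NSS network send proceed concurrently, so the commit waits for
# whichever path finishes last. All timings here are hypothetical.

def sync_commit_ms(local_write_ms: float, network_rtt_ms: float,
                   standby_write_ms: float) -> float:
    """Approximate commit latency: the max of the local redo write and
    the remote path (network round trip + standby SRL write)."""
    remote_path = network_rtt_ms + standby_write_ms
    return max(local_write_ms, remote_path)

# Over a WAN, the remote path usually dominates:
print(sync_commit_ms(2.0, 10.0, 1.0))  # -> 11.0 ms
# Over a fast campus link, the local write can dominate instead:
print(sync_commit_ms(2.0, 0.5, 1.0))   # -> 2.0 ms
```

The second case illustrates why a "local" SYNC standby can be nearly free from a latency standpoint: once the round trip drops below the local disk write time, SYNC adds essentially nothing to commit latency.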
You have to closely monitor the performance of database checkpointing in that case! We discuss these aspects in this OTN article.

So Why Bother?

So where are we? I stated that Data Guard synchronous redo transport does not have the overhead of two-phase commit - so that’s good. At the same time, I stated that you have to watch out for network latency impact because of simple laws of physics - so that’s not so good. So why bother, right?

This is why you have to bother - Data Guard synchronous redo transport, and hence the zero data loss assurance, is a good thing! But to appreciate fully why this is a good thing, you have to wait for the next blog article. It’s coming soon, I promise! For now, let me get back to my session presentation slides for Oracle OpenWorld! See you there!


To SYNC or not to SYNC – Part 1

Zero Data Loss – Nervously So?

As part of our Maximum Availability Architecture (MAA) conversations with customers, one issue that is often discussed is the capability of zero data loss in the event of a disaster. Naturally, this offers the best RPO (Recovery Point Objective) as far as disaster recovery (DR) is concerned. The Oracle solution that is a must-have for this is Oracle Data Guard, configured for synchronous redo transport. However, whenever the word “synchronous” is mentioned, the nervousness barometer rises. Some objections I have heard:

“Well, we don’t want our application to be impacted by network hiccups.”

“Well, what Data Guard does is two-phase commit, which is so expensive!”

“Well, our DR data center is on the other coast, so we can’t afford a synchronous network.”

And a few others. Some of these objections are valid, some are not. In this multi-part blog series, I will address these concerns, and more. In this particular blog, which is Part 1 of the series, I will debunk the myth that Data Guard synchronous redo transport is similar to two-phase commit.

SYNC != 2PC

Let’s be as clear as possible. Data Guard synchronous redo transport (SYNC) is NOT two-phase commit. Unlike distributed transactions, there is no coordinator node initiating the transaction, there are no participating nodes, and there are no prepare and commit phases working in tandem.

So what really happens with Data Guard SYNC? Let’s look under the covers. Upon every commit operation in the database, the LGWR process flushes the redo buffer to the local online redo logs - this is the standard way the Oracle database operates. With Data Guard SYNC, in addition, the LGWR process tells the NSS process on the primary database to make these redo blocks durable on the standby database disk as well. Until LGWR hears back from NSS that the redo blocks have been written successfully at the standby location, the commit operation is held up.
That’s what provides the zero data loss assurance. The local storage on the primary database gets damaged? No problem. The bits are available on the standby storage.

But how long should LGWR wait to hear back from NSS? That is governed by the NET_TIMEOUT attribute of the log_archive_dest parameter corresponding to the standby. Once LGWR hears back from NSS that life is good, the commit operation completes.

Now, let’s look into how the NSS process operates. Upon every commit, the NSS process on the primary database dutifully sends the committed redo blocks to the standby database, and then waits until the RFS process on the standby receives them, writes them to disk on the standby (in standby redo logs, or SRLs), and then sends the acknowledgement back to the NSS process. So - on the standby database, what’s happening is just disk I/O to write the incoming redo blocks into the SRLs. This should not be confused with two-phase commit, and naturally this process is much faster than a distributed transaction involving two-phase-commit coordination.

In case you are wondering what happens to these incoming redo blocks in the SRLs - they get picked up, asynchronously, by the Managed Recovery Process (MRP) as part of Redo Apply, and the changes get applied to the standby data files in a highly efficient manner. But Redo Apply is a completely separate process from Redo Transport - and that is an important thing to remember whenever these two-phase-commit questions come up.

Now that you are convinced that Data Guard SYNC is not the same as two-phase commit, in the next blog article I will talk about the impact of network latency on Data Guard SYNC redo transport.
