Friday Dec 19, 2014

Oracle Global Data Services (GDS): Part 2 – Load Balancing Use Cases

Oracle Database 12c Global Data Services galvanizes the asset utilization of replicated database resources. It allows connect-time and run-time load balancing, routing and service failover across replicated databases situated in any data center in any geographical region. With GDS, customers can now achieve these capabilities without the need to either integrate their High Availability stack with hardware load balancers or write custom homegrown connection managers. And remember that GDS comes with the Active Data Guard license and is also available to Oracle GoldenGate customers at no additional charge as well.

In this blog we follow up on the introduction to GDS from Part 1 and walk through a couple of use cases for workload balancing:

1. The first use case (shown below) is load balancing for reader farms:

Imagine a scenario where GDS is enabled for an Active Data Guard or GoldenGate reader farm with physical standby replicas located in both local and remote data centers. Let’s say a Read Write global service for Order Entry runs on the Primary database and the Read Only Global Services for Reporting run on the reader farm. Using GDS, the client connections are automatically load balanced among the Read Only global services running on the reader farm (across data centers). This capability improves resource utilization, performance and scalability with Read Only workload balancing on Active Data Guard or Oracle GoldenGate reader farms.

2. Another use case (as shown below) is load balancing of Read Write services among multi-masters within and across regions:

Let’s take a scenario of active/active databases using Oracle GoldenGate in a GDS configuration. In this case the Read Write and Read Only global services are both configured to run on each of the masters. For this scenario, GDS automatically balances the workloads for Read-Only and Read-Write Services in the GoldenGate multi-master configuration.

This wraps up our exploration of key Oracle Database 12c GDS load balancing use cases. In the next installment of the GDS blog series (Part 3), we will take a look at few more interesting use cases where GDS can help in mitigating planned and unplanned downtime for applications.

Wednesday Dec 17, 2014

Oracle GoldenGate Active-Active Part 3

Here is the last (3 of 3) blog posting on Active-Active replication for OGG, and my post this time will cover the actual usage of the CDR resolution routines and examples of how they are built. Part 1 is located here, and part 2, here. I’ll cover 2 different use cases. The first will be timestamp based and the second will be trusted source. As a refresher, timestamp is going to have the record with the lowest timestamp win (i.e. whichever record came in first) and the trusted source is going to assume that one system always takes precedence over another system.

For these examples, I’m going to use macros, which makes it so much easier and cleaner to read, and it dramatically reduces the amount of typing I have to do.

My macro file, will be called cdr_macros.prm. I normally wouldn’t want to mix trusted source and timestamp in the same environment, but I’m doing it here just as an example. In this macro file, I have included every CDR function that I want to use, on all systems, for both extracts and replicats. This way if I need to make a change to my CDR rules, I can make the change in the macro file and it effects the entire server. Just make sure to make the same change to each OGG environment. Inside each macro, there is a short description of what the command is going to be used for.

*********************************************************************************************************

MACRO #ExtractCdrDate
BEGIN
COMMENT This is used to ensure that the key columns + the UPDATE_TIME
COMMENT column is always brought over as part of the trail file record
GETBEFORECOLS (ON UPDATE KEYINCLUDING(UPDATE_TIME), ON DELETE KEYINCLUDING(UPDATE_TIME)) , FETCHCOLS (*)
END;
COMMENT END TO ExtractCdrDate

MACRO #ExtractCdrAllColunms
BEGIN
COMMENT This is used when I want to ensure that ALL columns are in the
COMMENT trail file for each record. It has a higher overhead, so be
COMMENT careful on how frequently it is used.
GETBEFORECOLS (ON UPDATE ALL, ON DELETE ALL), FETCHCOLS(*)
END;
COMMENT END TO ExtractCompAllColunms

MACRO #DateCompare
BEGIN
COMMENT This is used when doing a timestamp resolution where the lowest
COMMENT timestamp wins.
COMPARECOLS (ON UPDATE KEYINCLUDING (UPDATE_TIME),ON DELETE KEYINCLUDING (UPDATE_TIME)),
RESOLVECONFLICT (UPDATEROWEXISTS,(mon_resolution_method, USEMIN (UPDATE_TIME),COLS(*)) (DEFAULT, DISCARD)),
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, USEMIN (UPDATE_TIME) , COLS(*))),
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, OVERWRITE)),
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO DateCompare

MACRO #FromTrusted
BEGIN
COMMENT This resolution is used on the non-trusted environment to
COMMENT allow operations from the trusted server to overwrite the existing
COMMENT data when there is a conflict.
COMPARECOLS (ON UPDATE ALL,ON DELETE ALL),
RESOLVECONFLICT (UPDATEROWEXISTS, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, OVERWRITE)) ,
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO FromTrusted

MACRO #FromNoNTrusted
BEGIN
COMMENT This resolution is used to discard the record any time there is a
COMMENT conflict, when the record comes from the non-trusted server
COMPARECOLS (ON UPDATE ALL,ON DELETE ALL),
RESOLVECONFLICT (UPDATEROWEXISTS, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (DELETEROWEXISTS, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (UPDATEROWMISSING, (DEFAULT, DISCARD)) ,
RESOLVECONFLICT (DELETEROWMISSING, (DEFAULT, DISCARD))
END;
COMMENT END TO FromNonTrusted

*********************************************************************************************************

Now that all the hard work is done, and I've defined my rules for both Extract and Replicat in the Macro file, I can easily add those in. In the Extract I simply modify my TABLE statements to include the additional macro to tell OGG which columns to write to the trail file.  In this case, I'm using the #ExtractCdr macros from the first part of the file to instruct OGG which columns to include in the trail file.  This ensures that the resolution routines always have the data they need to perform the specified resolution. 

TABLE DEMO.VCRYPT_ACCOUNTS , #ExtractCdrDate();
TABLE DEMO.VCRYPT_ACCOUNTS_HIST , #ExtractCdrAllColunms();

The changes to the MAP statements in the Replicat parameter file itself is extremely elegant and simple. In the Replicat, the changes are also very straightforward, by simply adding the macros that were defined above. 

MAP DEMO. VCRYPT_ACCOUNTS, TARGET DEMO.VCRYPT_ACCOUNTS, #DateCompare();
MAP DEMO.VCRYPT_ACCOUNTS_HIST, TARGET DEMO.VCRYPT_ACCOUNTS_HIST , #FromTrusted();

Using the macro method, it’s easy to identify which objects are using each conflict detection and resolution routine, and if you need to make a change, you can make it once in the macro file and it will affect every parameter file. The methodologies and best practices in the last few of my blog postings on Active-Active Replication in GoldenGate and the white paper here: http://www.oracle.com/us/products/middleware/data-integration/golden-gate-active-active-1887519.pdf  should help implement robust Active-Active replication..

Wednesday Oct 15, 2014

Oracle Database 12c Global Data Services: Part 1 – Automated Workload Management for Replicated Databases

Introduction

Global Data Services is a key offering within Oracle’s Maximum Availability Architecture. It’s really a must-have for organizations that are using Oracle high availability technologies such as Active Data Guard or Oracle GoldenGate to replicate data across multiple databases. With automated workload balancing and service failover capabilities, GDS improves performance, availability, scalability, and manageability for all databases that are replicated within a data center and across the globe. And GDS boosts resource utilization, which really improves the ROI of Active Data Guard and GoldenGate investments. It does this in an integrated, automated way that no other technology can match. Plus it’s included with the Active Data Guard license - and since GoldenGate customers have the right to use Active Data Guard, it’s available to them at no additional charge as well.

Customer Challenges

Enterprises typically deploy replication technologies for various business requirements – high availability and disaster recovery, content localization and caching, scalability, performance optimization for local clients or for compliance in accordance with local laws. Oracle customers use Active Data Guard and Oracle GoldenGate to address all of these business requirements. They use Active Data Guard to distribute their Read-Only workload and GoldenGate to distribute not only Read workloads but also Read Write workloads across their replicated databases.

However when you’re trying to optimize workload management across multiple database replicas, you run into certain challenges that simply extend beyond the capabilities of replication technology. That’s because customers are unable to manage replicated databases with a unified framework and instead have to deal with database silos from an application and DBA perspective.

Let’s look at a couple of the main problems with database silos.

  • The first is under-utilized resources – for example, when one replica cannot be leveraged to shoulder the workload of another over-utilized database. This leads to suboptimal resource utilization, which can adversely affect performance, availability and of course cost.
  • The other problem with silos is the inability to automatically fail over a service across databases - let’s say a production application workload is running against a particular replica. If that replica goes down due to an unplanned event, customers don’t have a mechanism that automatically and transparently relocates the Service to another available replica. When a replica fails that can lead to application outages.

Until the introduction of Oracle Global Data Services (GDS), there really wasn’t a way for enterprises to achieve Service Failover and load balancing across replicas out of the Oracle Stack. To address this, some customers have chosen to compile their own homegrown connection managers and others have integrated their HA stack with hardware load balancers. But these solutions still don’t address all of the issues:

  • Manual load balancing using homegrown connection managers, for example, incurs huge development costs and yet cannot optimize performance and availability for replicated systems
  • Special purpose network load balancers can help but they introduce additional cost and complexity – and they still can’t offer database service failover and centralized workload management

Global Data Services Overview

Global Data Services delivers automated workload management, which addresses all of these key pain points. It eliminates the need for custom connection managers and load balancers for database workloads.

With a newly created concept called Global Service, Oracle Global Data Services extends the familiar Oracle RAC-style connect-time and run-time load balancing, service failover and management capabilities beyond a single clustered database. Capabilities that were so far applicable only to a single database can now be applied to a set of replicated databases that may reside within or across datacenters. Customers can achieve these capabilities by simply setting the pertinent attributes of the Global Service.

https://blogs.oracle.com/MAA/resource/GDS.png

GDS sits between the application tier and the database tiers of the stack. It orchestrates the Service high availability, Service level load balancing and routing. Global Services run on the databases but are managed by GDS. GDS algorithms take into account DB instance load, network latency between data centers and the workload management policies (region affinity, load balancing goals, DB cardinality, DB role, replication lag tolerance) that the customers can configure. These workload management policies are enabled via the attributes of a given Global Service.

What are the key capabilities that are really unique to GDS?

1. For performance optimization, there’s region-based workload routing, which automatically routes workloads to the database closest to the clients. For example, what if the customer has a requirement that all the clients/applications closer to the North American data center need to be routed to the database in the North American data center? Likewise, European clients may need to be routed to the European database. GDS addresses this problem by managing this workload routing automatically.

2. In addition, GDS provides connect time load balancing and supports run time load balancing – another key performance advantage.

3. For higher application availability, GDS enables inter-database service failover. If a replica goes down as a result of a planned or unplanned event, GDS fails over the service to another replica

4. And it also offers role based global services. GDS will make sure that the global services are always started on those databases whose database role matches the role specified for the service. For example, if Data Guard undergoes role transitions, the global services are relocated accordingly, maintaining availability requirements.

5. For improved data quality, there’s also replication lag-based workload routing. This capability routes read workloads to a Data Guard standby whose replication lag is within a customer-specified threshold that’s based on business needs

6. By managing all of the resources of the replicas efficiently, customers are able to maximize their ROI because there are no longer any under-utilized servers

This wraps up the introductory blog post on Oracle Database 12c GDS. We looked at the challenges of workload management for replicated databases and how GDS addresses those challenges. In the next blog, we will review some of the key capabilities of GDS and the tangible business benefits.

Wednesday Jul 02, 2014

Oracle GoldenGate Active-Active Part 2

My last post ( https://blogs.oracle.com/MAA/entry/oracle_goldengate_active_active_part )  focused on whether or not an application's database structure was set up sufficiently to perform conflict detection and resolution in active-active GoldenGate environments. Assuming that your application structure is ready, I'll now explain how to actually prevent conflicts from happening in the first place. While this is ideal, I don't think conflict prevention is something we could ever guarantee... especially when a fault or hiccup occurs in either the database or GoldenGate itself.  

Let's break up conflicts into 3 types, based on the DML: 

1. Inserts

2. Deletes

3. Updates 

1. Insert conflicts typically occur when two rows have the same primary key or when there are duplicate unique keys within a table. 

· Two rows with same primary key: To address these cases we could have primary keys generated based on a sequence value, then set up something like alternating sequences. Depending on how many nodes or servers are in the environment, you could use an algorithm that starts with n and increments by N (where n is the node or server number and N is the total number of nodes or servers). For example, in a 2-way scenario,  one  side  would  have  odd  sequence  values  (start with 1 and increment by 2) and the other would have even sequence values (start with 2 and increment by 2). 

· Duplicate unique keys: Avoiding conflicts in tables that have duplicate unique keys is a little trickier, and sometimes must be managed from the application perspective.  For example, let's say for a particular application that we have a table that contains login information for an account.  We would want the login name to be a unique value.  However it is possible that two people working on two different servers could attempt to obtain the same login name.  These kinds of operations can be eliminated if we restrict new account creation to a single server, thereby letting the database handle the uniqueness of a column. 

2. Delete conflicts are usually nothing to worry about. In most cases, this occurs when two people are attempting to delete the same record, or when someone tries to update a record that has already been deleted.  These conflicts can usually just be ignored.  However, I typically recommend that customers keep track of these types of conflicts in an exception table, just to make sure that nothing out of the ordinary is occurring. Once you’ve confirmed that things are running smoothly you can eliminate the exception mapping and just ignore the conflicts completely. 

3. Update conflicts are definitely the most prevalent.  These conflicts occur when two people try to update the same logical record on two different servers.  A typical example is when a customer is on the phone with support to change something associated with his or her credit card. At the same time, the customer is also logged into the account and is trying to change his or her address.  If these activities occur on two different servers and the lag is high enough, it could cause a conflict. In order to reduce or eliminate these conflicts there are a few best practices to follow: 

1) Reduce the Oracle GoldenGate (OGG) lag to the lowest level possible.  There are a few knowledge tickets on this. The master note is Main Note - Oracle GoldenGate - Lag, Performance, Slow and Hung Processes (Doc ID 1304557.1)

2) Logically partition users based upon geographical regions or usernames.  For example, when all users in North America access one server, and users in Europe access a different server, the chance of two people updating the same logical record on two different machines is greatly reduced.  Another option is to split up the users based on their usernames. Even something as simple as setting up usernames A-M to log into one server and usernames N-Z to log into another server can help reduce conflicts.   The reason this helps is related to my next point...

3) Set up Session Persistence time. IP or Session Persistence is the ability of a load balancer or router to keep track of where a connection is sent. In the event that a connection is lost, disconnected, etc, and a user attempts to reconnect or log back in, the connection will be sent to the same server where it was originally connected.  Most sessions have a time value that can be associated with this persistence. For example, if I set my session persistence to 10 seconds, then any time a session is disconnected or killed, the user will be sent to the same server as long as he or she logs back in within 10 seconds.  This is ideal for Oracle GoldenGate environments, where there would be lag between the different databases. In an ideal situation you would set this session persistence time value to be twice the average lag or 20 seconds – whichever is higher.  This allows a user who is filling a shopping cart or booking a reservation to maintain a consistent view of the data, even in the event of a client or network failure. 

By using these methods, the number of conflicts that actually occur can be drastically reduced, leading to a happier end user experience.  But even with the best intentions and preparation, not every conflict can be avoided. In my next post I will cover how to resolve such unavoidable conflicts. 

Tuesday Jun 10, 2014

Oracle GoldenGate Active-Active Part 1

My name is Nick Wagner, and I'm a recent addition to the Oracle Maximum Availability Architecture (MAA) product management team.  I've spent the last 15+ years working on database replication products, and I've spent the last 10 years working on the Oracle GoldenGate product.  So most of my posting will probably be focused on OGG. 


One question that comes up all the time is around active-active replication with Oracle GoldenGate.  How do I know if my application is a good fit for active-active replication with GoldenGate?   To answer that, it really comes down to how you plan on handling conflict resolution. 


I will delve into topology and deployment in a later blog, but here is a simple architecture:


Active-Active Architecture


The two most common resolution routines are host based resolution and timestamp based resolution.


Host based resolution is used less often, but works with the fewest application changes.  Think of it like this: any transactions from SystemA always take precedence over any transactions from SystemB.  If there is a conflict on SystemB, then the record from SystemA will overwrite it.  If there is a conflict on SystemA, then it will be ignored.  It is quite a bit less restrictive, and in most cases, as long as all the tables have primary keys, host based resolution will work just fine. 


Timestamp based resolution, on the other hand, is a little trickier. In this case, you can decide which record is overwritten based on timestamps. For example, does the older record get overwritten with the newer record?  Or vice-versa?  This method not only requires primary keys on every table, but it also requires every table to have a timestamp/date column that is updated each time a record is inserted or updated on the table.  Most homegrown applications can always be customized to include these requirements, but it's a little more difficult with 3rd party applications, and might even be impossible for large ERP type applications. 


If your database has these features - whether it’s primary keys for host based resolution, or primary keys and timestamp columns for timestamp based resolution - then your application could be a great candidate for active-active replication.  But table structure is not the only requirement.  The other consideration applies when there is a conflict; i.e., do I need to perform any notification or track down the user that had their data overwritten?  In most cases, I don't think it's necessary, but if it is required, OGG can always create an exceptions table that contains all of the overwritten transactions so that people can be notified. It's a bit of extra work to implement this type of option, but if the business requires it, then it can be done. Unless someone is constantly monitoring this exception table or has an automated process in dealing with exceptions, there will be a delay in getting a response back to the end user.


Ideally, when setting up active-active resolution we can include some simple procedural steps or configuration options that can reduce, or in some cases eliminate the potential for conflicts.  This makes the whole implementation that much easier and foolproof.  And I'll cover these in my next blog. 

About

Musings on Oracle's Maximum Availability Architecture (MAA), by members of Oracle Development team. Note that we may not have the bandwidth to answer generic questions on MAA.

Search

Categories
Archives
« February 2015
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
       
       
Today