In this article we shall demonstrate the use of the Oracle GoldenGate precise instantiation method to create a heterogeneous (Microsoft SQL Server to Oracle Database) replication implementation without taking an application or database outage.

Architecture overview
Architecture Overview

This article is valid only for Oracle GoldenGate Microservices Architecture for SQL Server release version 23.8 and above. Precise instantiation is not available in Oracle GoldenGate versions prior to 23.8. Please refer to the Oracle GoldenGate documentation for additional details regarding Microsoft SQL Server precise instantiation.

Prerequisites

Before we get started, there are several actions that must be completed which we will not demonstrate in this article.

  1. Install the 23.8 version of Oracle GoldenGate Microservices Architecture for SQL Server, or upgrade an existing 23ai installation as the replication source.
  2. Install the latest 23ai version of Oracle GoldenGate Microservices Architecture for Oracle Database as the replication target. (Note: The replication target must be of the same major release, i.e., 23ai, or higher, than the replication source.)
  3. Ensure your source and target databases are up to date with patch releases.
  4. Create source and target Oracle GoldenGate Microservices deployments.
  5. Create the Oracle GoldenGate database user with appropriate privileges.

Database Analysis

For this demonstration we will be using an OLTP (Order Entry simulation) database with real time activity occuring. Before creating the Oracle GoldenGate replication environment, it is a good practice to know something about the database and data being replicated as their metadata structure will affect Oracle GoldenGate replication efficiency. 

Likewise, for instantiation, it is good to take a snapshot of the number of rows per tables. The Oracle GoldenGate Initial Load Extract is a single threaded process that performs a “select *” operation for each table in alphabetical order. This select returns all rows, which are then moved downstream through the replication path to the target database. While effective, this is not very efficient. Knowing how many rows of data need to be replicated to the target during instantiation allows you to make informed decisions about the instantiation process as databases with a large number of tables and/or tables with a large number of rows will take a large amount of time to replicate.

The source database consists of nine tables. The query below was used to retrieve a snapshot of the number of rows per table, as depicted in Image 1.

USE tpcc;
go

SELECT cast(s.name as varchar(10)) AS SchemaName,
       cast(t.name as varchar(15)) AS TableName,
       cast(SUM(p.rows) as varchar(10)) AS TableRowCount
FROM sys.schemas AS s
JOIN sys.tables AS t ON t.schema_id = s.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id
WHERE s.name = 'dbo' GROUP BY s.name, t.name ORDER BY SchemaName, TableName;
go

 

SQL Server row count
Image 1. Snapshot of source row counts

Given the number of rows and tables returned from the query, this database is very small; one Initial Load Extract is all we will need. (We’ll discuss strategies for instantiation of large databases in a future article.)

For instantiation and replication to function efficiently, we also need to know if the target tables have primary keys, unique constraints, or foreign keys. Oracle GoldenGate for Oracle Database uses the following heirarchy to determine row uniqueness:

  1. Primary key
  2. Unique key
  3. Not Nullable Unique key
  4. All table columns

Refer to the Oracle GoldenGate documentation for more details about data uniqueness and selection criteria.

When working with Oracle databases, we also need to check for “NOVALIDATE” constraints. When a constraint is created, or enabled, “NOVALIDATE”, all new insert and update data is validated to ensure it does not violate the constraint; however, any existing data is not checked. Oracle GoldenGate will not use “NOVALIDATE” constraints when resolving table structures. Any table with a “NOVALIDATE” primary key, and no other unique constraint will be treated as if it does not have a primary key or unique constraint.

We can check the target database for these charateristics.

Check for Primary, Unique, and Foreign Key Constraints

The query we used to check for primary, unique, and foreign keys on the target Oracle Database tables is shown below.

set linesize 80
set pagesize 100
set trimout on

column table_name heading "Table|Name" format a20
column constraint_name heading "Constraint|Name" format a20
column constraint_type heading "Type" format a5
column status heading "Status" format a10
column validated heading "Validated" format a10

select table_name, 
       constraint_name, 
       constraint_type, 
       status, 
       validated 
from user_constraints 
where owner = 'TPCC';

Image 2 presents the data returned by the query.

Oracle tables with PK, UK, or FK
Image 2. Tables with PK, UK, or FK

Two of the nine tables have a primary key that are enabled and validated. None of tables have neither unique nor foreign key constraints.

Because the Oracle GoldenGate Initial Load Extract processes source tables in alphabetical order, it is highly recommended that FK constraints in the target database be disabled during the instantiation process to prevent abnormal Replicat termination (ABEND) due to FK constraint violations.

Check for Unique Indexes

The query we used to check for unique indexes on the target Oracle Database tables was:

set linesize 90
set pagesize 100
set trimout on

column table_name heading "Table|Name" format a20
column index_name heading "Index|Name" format a20
column index_type heading "Type" format a10
column uniqueness heading "Uniqueness" format a10
column status heading "Status" format a10
column visibility heading "Visibility" format a10

select table_name, 
       index_name, 
       index_type, 
       uniqueness, 
       status, 
       visibility 
from user_indexes 
where table_owner = 'TPCC';

Image 3 presents the data returned by the query.

Tables with unique index
Image 3. Tables with UI

Eight of the tables have unique indexes that are valid and visible.

Check for tables with neither PK, UK, nor UI

Check the target for tables that have neither primary key constraints, unique key constraints, nor unique indexes. These tables pose a challenge if they are Update and/or Delete intensive as these operations will result in full table scans within the database which not only slows down replication but also impacts database resources.

To minimize this impact, you should attempt to identify columns within the table data that make Update and Delete operations unique and specify the Oracle GoldenGate MAP parameter option KEYCOLS to create a pseudokey for replication. To learn more about KEYCOLS, refer to the Oracle GoldenGate documentation.

The query we used is shown below.

SELECT DISTINCT TABLE_NAME
  FROM (SELECT t.TABLE_NAME
        FROM USER_TABLES t
        WHERE t.TABLE_NAME NOT IN (SELECT DISTINCT u.TABLE_NAME
                                     FROM USER_CONSTRAINTS u
                                     WHERE u.CONSTRAINT_TYPE IN ('U', 'P')) AND
                                     t.TABLE_NAME NOT IN (SELECT i.TABLE_NAME
                                       FROM USER_INDEXES i
                                       WHERE i.UNIQUENESS = 'UNIQUE'))
  ORDER BY TABLE_NAME;

Image 4 presents the data returned by this query.

Tables without pk, uk, or ui
Image 4. Tables without PK,
UK, or UI

Source Side Instantiation Steps

Add Trandata

In the Oracle GoldenGate Microservices Administration Services WebUI, select DB Connections and then open the connection to the source database. Select Trandata as shown in Image 5.

Db connection add trandata
Image 5. Connect to source database

Select the + button in the field labeled TRANDATA Information. A window labeled Trandata will slide open. Enter either a single table name in the Table Name field or a wildcard for all tables. We’ll be using a wildcard.

Add trandata
Image 6. Add Trandata

Select the Submit button to execute. Open the Notifications window to see execution status. In the Trandata Information window, you can verify all tables have been enabled by doing a search. In the Search for Table field, enter dbo.* and select the magnifying glass.

Trandata search results
Image 7. Trandata search results

Querying the database, we see new tables have been created which show Microsoft CDC Replication for Oracle GoldenGate has been enabled in the database. To learn more about CDC Supplemental Logging refer to the Oracle GoldenGate documentation.

Here’s the query we used:

USE tpcc;
go

SELECT cast(s.name as varchar(10)) AS SchemaName,
       cast(t.name as varchar(15)) AS TableName,
       cast(SUM(p.rows) as varchar(10)) AS TableRowCount
FROM sys.schemas AS s
JOIN sys.tables AS t ON t.schema_id = s.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id
WHERE s.name != 'dbo' GROUP BY s.name, t.name ORDER BY SchemaName, TableName;
go

Image 8 presents the query results.

MSS CDC Replication enabled database
Image 8. Database with CDC Replication enabled

Enable Microsoft Snapshot Isolation

Oracle GoldenGate Precise Instantiation utilizes the Snaphot Isolation function of the Microsoft SQL Server Database. Per the Microsoft database documentation, when snapshot isolation is enabled:

“A unique transaction sequence number identifies each transaction, and these unique numbers are recorded for each row version. The transaction works with the most recent row versions having a sequence number before the sequence number of the transaction. Newer row versions created after the transaction has begun are ignored by the transaction.

The term “snapshot” reflects the fact that all queries in the transaction see the same version, or snapshot, of the database, based on the state of the database at the moment in time when the transaction begins. No locks are acquired on the underlying data rows or data pages in a snapshot transaction, which permits other transactions to execute without being blocked by a prior uncompleted transaction. Transactions that modify data do not block transactions that read data, and transactions that read data do not block transactions that write data, as they normally would under the default READ COMMITTED isolation level in SQL Server.”

To check if snapshot isolation is enabled in the database, we’ll execute the query: 

SELECT SNAPSHOT_ISOLATION_STATE_DESC FROM SYS.DATABASES WHERE NAME=’tpcc’;

SQL Server snapshot isolation status
Image 9. Snapshot Isolation status

To enable snapshot isolation in the database, we’ll execute the query:

ALTER DATABASE tpcc SET ALLOW_SNAPSHOT_ISOLATION ON;

Enable snapshot isolation
Image 10. Snapshot Isolation enabled

GoldenGate Microservices Purge Task

Data that is required by Extract to reconstruct transactions are stored in a series of SQL Server CDC system tables, along with Oracle GoldenGate objects that are used to track operation ordering within a transaction. These tables require routine purging to reduce data storage within the database. As part of enabling supplemental logging using TRANDATA, Microsoft SQL Server creates a Change Data Capture Cleanup job, however this job is unaware that an Oracle GoldenGate Extract may still require data from these CDC system tables and can remove that data before the Extract has a chance to capture it.

To remedy this situation, Oracle GoldenGate for SQL Server requires that a Purge Change Data task be created. This task will purge CDC staging data while ensuring that no data is purged that the Extract has yet to process.

In the Oracle GoldenGate Microservices Administration Services WebUI, select Tasks and then select Purge Change Data as shown in Image 11.

OGG purge tasks
Image 11. Tasks

In the Purge Change Data Tasks pane, select the + button to create a new task.

Purge change data tasks
Image 12. Purge Change Data Tasks

A new window labeled Create a new Purge Change Data task will slide out. Enter a name for the task, select the database access credentials, and provide the keep rules and purge frequency (we went with the defaults).

Create purge change data task
Image 13. Create Purge Change Data Task

Select the Submit button to create the task.

New purge change data task
Image 14. Purge Change Data Task

Initial Load Extract

In the Oracle GoldenGate Microservices Administration Services WebUI, select Extracts. Add an Initial Load Extract. 

The parameters we used for the source instantiation capture (commonly refered to as “initial load extract”) are shown below. It is a “best practice” to add comments in Oracle GoldenGate parameter files with information as to why the specified settings are enabled. Comments are denoted by two dashes.

EXTRACT iltpcc
USERIDALIAS mss2019tpcc_src, DOMAIN OracleGoldenGate

-- Create 2Gb initial load trail files.
-- Purge if the file exists
EXTFILE ip, MEGABYTES 2000, PURGE

-- Use precise instantiation
INITIALLOADOPTIONS USESNAPSHOT

-- Every 10 minutes, report the count of records 
-- processed since Extract start to the process 
-- report file.
-- Rate reports the number of operations per second
-- The report file also contains DELTA throughput which is
-- the number of records since the last report divided by 
-- the time since the last report
REPORTCOUNT EVERY 10 MINUTES, RATE

-- Select from all tables in the dbo schema.
TABLE dbo.*;

Start the Initial Load Extract. When it completes reading from every table it will automatically shutdown.

Open the Initial Load Extract report file and scroll to the last page to see the instantiation LSN and the number of rows read from each table. The instantiation LSN message will look similar to this:

INFO    OGG-05381  A consistent point is established in database ‘tpcc’ at LSN 0x0000016d:00006c9a:0002. Create or position a Change Data Capture Extract to LSN 0x0000016d:00006c9a:0002. Example: ADD EXTRACT <extract-name> TRANLOG LSN 0x0000016d:00006c9a:0002 or ALTER EXTRACT <extract-name> LSN 0x0000016d:00006c9a:0002.

Copy the LSN and save it, you will need this value when creating the CDC Extract.

Disable Microsoft Snapshot Isolation

At this point, Microsoft Snapshot Isolation is no longer required by Oracle GoldenGate. If it was disabled initially in the database, you may turn it off by executing the command:

ALTER DATABASE tpcc SET ALLOW_SNAPSHOT_ISOLATION OFF;

Disable snapshot isolation
Image 15. Disable Microsoft Snapshot Isolation

CDC Extract

Create a Change Data Capture (CDC) Extract to capture real time database operations. In Step 2 of the WebUI CDC Extract creation process, select Position in Log as the Extract Options starting point. Enter the Instantiation LSN we previously copied from the Initial Load Extract Report file into the field Position in Log as shown in Image 16.

Add CDC Extract using LSN
Image 16. Add CDC Extract using LSN

Below are the CDC Extract parameter settings we used.

EXTRACT CDCTPCC
USERIDALIAS mss2019tpcc_src, DOMAIN OracleGoldenGate
EXTTRAIL pc

-- Every 60 minutes, report the count of records 
-- processed since Extract start to the process 
-- report file.
-- Rate reports the number of operations per second
-- The report file also contains DELTA throughput which is
-- the number of records since the last report divided by 
-- the time since the last report
REPORTCOUNT EVERY 60 MINUTES, RATE

-- Select from all tables in the dbo schema.
TABLE dbo.*;

Start the CDC Extract, review the WebUI statistics pane to verify that data is being captured.

CDC Extract Statistics
Image 17. CDC Extract statistics

Distribution Path

Create and start two Distribution Paths, one for Instantiation Trail data and one for CDC Trail data, from the source Oracle GoldenGate Hub server to the target Oracle GoldenGate Hub server as specified in the Oracle GoldenGate documentation.

Target Oracle GoldenGate Hub System

If you are unfamiliar with how to create Replicats, please refer to the Oracle GoldenGate documentation.

Apply Initial Load Data

Before continuing, there are a couple of things we should understand about the data contained within Oracle GoldenGate Initial Load Trails and how it affects downstream processing.

  • Because the Initial Load Extract selects data direct from the source tables, there are no transaction records in the Initial Load Trail files.
  • Oracle GoldenGate does not have an Initial Load Replicat, data created by the Initial Load Extract is applied using our CDC Replicats.

Because CDC Replicat uses transaction boundary records from the Trail when creating its data apply transaction, a fake transaction id is generated for all of the data within the Initial Load Extract Trail. The transaction id is comprised of the Tral file sequence number, RBA, and timestamp of the first Trail record. The transaction end boundary is determined by the following:

  1. Replicat reaches end of file (EOF) in each Trail being processed, or
  2.  The record count read by Replicat reaches the number specified by the Oracle GoldenGate setting, MAXTRANSOPS, which defaults to 10,000,000 records.

Knowing this, we need to make adjustments to the Replicat used for Initial Load Data apply; especially if we’re using Parallel Replicat. If we do not adjust the Replicat apply characteristics, Replicat will appear to be “hung” as it will not report statistics and will not checkpoint until the large transaction commits.

Here are the Parallel Replicat settings we’ll use to apply the data from the Initial Load Trail:

REPLICAT ilcc
USERIDALIAS dc1ggeast DOMAIN OracleGoldenGate

-- Set minimum number of Appliers to 1 and spawn
-- up to 6 to improve initial load performance.
-- When everything has been applied, only 1 Applier
-- will be running, signaling initial load is complete.
MIN_APPLY_PARALLELISM 1
MAX_APPLY_PARALLELISM 6

-- Every 10 minutes, report the count of records 
-- processed since Extract start to the process 
-- report file.
-- Rate reports the number of operations per second
-- The report file also contains DELTA throughput which is
-- the number of records since the last report divided by 
-- the time since the last report
REPORTCOUNT EVERY 10 MINUTES, RATE

-- Organize similar SQL statements into arrays and apply 
-- them at an accelerated rate.
BATCHSQL

-- Large transactions should be broken into pieces of 
-- specified size and applied concurrently.
SPLIT_TRANS_RECS 10000

MAP dbo.*, TARGET pdb_east.tpcc.*;

Start the Replicat. Note, the Replicat will not stop when it has completed applying the Initial Load Data. There are a couple of ways to determine if instantiation is complete.

1. Compare the Replicat checkpoint with the file size.

If you have access to the Oracle GoldenGate Hub server, obtain the file information for the Trails read by Replicat. In the image below, we have three Trail files that contain our Initial Load Data. The last file created contains 1535 bytes of data (the fifth column in the output of the ls utility.)

Trail disk files
Image 18. Trail disk files

We compare this to the current checkpoint information reported by Replicat in the Administration Service WebUI. Image 19 shows the Replicat Checkpoint is Trail ip, sequence 2, offset 1535. This equates to the disk file ip000000002 and Replicat has processed everything in that Trail file up to relative byte offset (RBA) 1535.

Replicat Checkpoint
Image 19. Initial Load Replicat checkpoint

Comparing this to the disk file details in Image 18, we can ascertain that Replicat has applied all outstanding Initial Load Data.

2. Compare Initial Load Extract table statistics with Replicat statistics.

When the Initial Load Extract stopped, details about the number of records selected from each table where written to its report file.

Initial Load Extract final stats
Image 20. Initial Load Extract final statistics

Compare the Initial Load Extract table statistics with the total statistics of the Replicat.

Initial Load Replicat stats
Image 21. Initial Load Replicat statistics

If the record counts reported by Initial Load Extract and Replicat match, instantiation is complete.

Apply CDC Data

Once instantiation has completed, create a CDC Replicat and begin applying queued and real time data. The Parallel Replicat parameters settings we used are below.

REPLICAT cdccc
USERIDALIAS dc1ggeast DOMAIN OracleGoldenGate

-- Every 60 minutes, report the count of records processed since
-- Extract start to the process report file.
-- Rate reports the number of operations per second
-- The report file also contains DELTA throughput which is
-- the number of records since the last report divided by 
-- the time since the last report
REPORTCOUNT EVERY 60 MINUTES, RATE

-- Organize similar SQL statements into arrays and apply 
-- them at an accelerated rate.
BATCHSQL

-- Large transactions should be broken into pieces of 
-- specified size and applied concurrently.
SPLIT_TRANS_RECS 10000

MAP dbo.*, TARGET pdb_east.tpcc.*;

Start the Replicat. View the Replicat statistics to verify data is being applied to the target tables.

CDC Parallel Replicat stats
Image 22. CDC Replicat statistics

Post Instantiation Cleanup

We recommed the following to cleanup the Oracle GoldenGate installations post instantiation:

  • Stop and delete the Replicat used to apply instantiation data.
  • Stop and delete the Distribution Path used for transmitting instantiation data.
  • Delete the Initial Load Extract.
  • Create a Trail Purge Task in the Administration Service WedUI for each Oracle GoldenGate Hub to ensure disk space is freed when Oracle GoldenGate Trails are consumed.

Summary

In this article we presented detailed steps to perform precise instantiation of an Oracle target database from a source Microsoft SQL Server database using Oracle GoldenGate. To learn more about Oracle GoldenGate functionality, you may refer to: