In many organizations, Oracle databases power critical systems;  order management, billing, customer profiles, while MySQL drives e-commerce, content management, web-based, or healthcare applications. Bridging the two often means moving static data and data that changes in real time; this creates a perfect use case for Oracle GoldenGate.

That’s where Oracle GoldenGate comes in. 

Setting up replication between MySQL and Oracle starts with instantiation, capturing a consistent snapshot of the MySQL data to quickly populate the Oracle database with the static data, while preparing for real-time synchronization.

Whether replicating healthcare information to a data warehouse, syncing e-commerce data across regions, or migrating legacy apps, instantiation is the first critical step.

In this article we shall demonstrate using multiple Oracle GoldenGate Initial Load  processes to instantiate an Oracle Database from a MySQL Database to create a heterogeneous (MySQL to Oracle Database) replication implementation without taking an application outage.

Architecture Diagram
Image 1. Architecture Diagram

Prerequisites

Before starting the instantiation process, the source and target Oracle GoldenGate Hub machines and the source and target database environments need to be configured to enable replication. It is critical to complete all of these steps in their entirety, failure to do so will lead to setup failures, inconsistent data, or unsupported configuration errors.

  1. Convert MySQL tables to Oracle : Oracle GoldenGate does not provide database conversion tools. To convert your MySQL tables, views, stored procedures, and functions to Oracle you will need to select a utility to perform this task. There are numerous utilities available, such as Oracle SQL Developer and SQLines.
  2. Install OpenSSL on source OGG Hub : Oracle GoldenGate 23ai for MySQL requires that OpenSSL 1.1 are present on the Oracle GoldenGate server prior to creating a deployment. See the Oracle GoldenGate documentation for details.
  3. GoldenGate Installation : Install the latest patch release of Oracle GoldenGate 23ai (Microservices Architecture) for MySQL on the source Oracle GoldenGate Hub server and Oracle GoldenGate 23ai (Microservices Architecture) for Oracle on the target Oracle GoldenGate Hub server. See the Oracle GoldenGate documentation for details regarding software installation and patching.
  4. Create Deployments for MySQL and Oracle GoldenGate : Create the source and target Oracle GoldenGate deployments. See the Oracle GoldenGate documentation for more details.
  5. Database version : Ensure your source (MySQL) and target (Oracle) databases are up to date with patch releases.
  6. Database Configuration

Pre-Instantiation Steps

When setting up Oracle GoldenGate for replication, taking time to understand your table metadata isn’t just good practice, it’s essential for performance and reliability.

Oracle GoldenGate’s Initial Load Extract process is single-threaded and performs a fetch from each table (processed in alphabetical order). While this approach is straightforward, it can be inefficient for environments with large tables or high table counts. All rows are selected and streamed to the target, which can quickly become time-consuming without the right preparation.

To optimize this, it’s important to:

  • Capture the row count and size of each source table being replicated.
    • Knowing how much data needs to move helps you estimate the time and resources required for instantiation.
  • Check the source tables for Large Object data types.
    • Fetching Large Objects from the database is resource intensive and will impact Initial Load Extract performance.
  • Review target table key constraints.
    • Primary keys, unique keys, and foreign keys impact Oracle GoldenGate’s ability to replicate changes accurately and efficiently.
    • Tables without keys will require special handling for change data replication and initial load completion.
    • Foreign keys should be disabled for instantiation to prevent data apply errors.

Understanding this metadata upfront allows you to make informed decisions, identify potential bottlenecks, and tailor your instantiation strategy to your environment’s scale and complexity.

Source Table Row Count and Size

After updating table statistics, we used this SQL query to retrieve source table sizes:

SELECT
    LEFT(TABLE_NAME,10) AS `Table`,
    LEFT(TABLE_ROWS,10) AS `Rows`,
    LEFT(ROUND(((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024), 2),10) AS `Size (MB)`
FROM
    information_schema.tables
WHERE
    TABLE_SCHEMA = ‘[database name]’
ORDER BY
    (DATA_LENGTH + INDEX_LENGTH) DESC;
Source Table Size
Image 2. Source table size

In the above output, except for table_17, our source tables are very small in size and should be efficiently handled via a single Oracle GoldenGate Initial Load Extract.

We can create parallelism in the data fetch for table_17 by splitting it across multiple Initial Load Extract processes using the Oracle GoldenGate @RANGE function. For details on this function, refer to the Oracle GoldenGate documentation. For this demonstration we will use two Initial Load Extracts; however, further parallelization maybe used to split the workload across three, four, or any number of Initial Load Extracts.

When using @RANGE, data hashing and filtering occurs in the Initial Load Extract not the database. Each Initial Load Extract configured to fetch the table’s data must perform an initial read of the entire table to determine the correct data range of values it will fetch. Depending upon the size of the table, this initial read could take a significant amount of time.

Query Source for Large Objects

Here is the query we used to find MySQL tables with Large Object data types:

select 
    LEFT(tab.table_name,10) as `TABLE`,
    LEFT(col.column_name,10) as `COLUMN`,
    LEFT(col.data_type, 10) as `Data Type`
from information_schema.tables as tab
inner join information_schema.columns as col
    on col.table_schema = tab.table_schema
    and col.table_name = tab.table_name
where tab.table_schema = ‘[database name]’
    and tab.table_type = ‘BASE TABLE’
    and col.data_type in (‘blob’, ‘mediumblob’, ‘longblob’,
                         ‘text’,’mediumtext’,’longtext’)
order by tab.table_name,
    col.column_name;
Tables with LOB data
Image 3. Tables with large object data

The table, mylob, will not be part of our MySQL to Oracle implementation. However, tables with Large Objects will negatively impact Initial Load Extract performance.

Identify Target Tables Without Constraints

The following query will list all tables in the Oracle schema that do not have a Primary Key or Unique Key Constraint:

SELECT DISTINCT TABLE_NAME
  FROM (SELECT t.TABLE_NAME
          FROM USER_TABLES t
          WHERE t.TABLE_NAME NOT IN (SELECT DISTINCT u.TABLE_NAME
                                       FROM USER_CONSTRAINTS u
                                       WHERE u.CONSTRAINT_TYPE IN (‘U’, ‘P’)) AND
                t.TABLE_NAME NOT IN (SELECT i.TABLE_NAME
                                       FROM USER_INDEXES i
                                       WHERE i.UNIQUENESS = ‘UNIQUE’))
  ORDER BY TABLE_NAME;
Tables without constraints
Image 4. Tables without constraints

Tables without a Primary or Unique Key Constraint require additional research and may require special handling for instantiation. Using our demonstration table above, let’s walk through these additional steps.

1. Does the source table have a Primary or Unique Key? This MySQL query will list all constraints for a table:

SELECT
    CONSTRAINT_NAME,
    CONSTRAINT_TYPE
FROM
    INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE
    TABLE_SCHEMA = ‘[database name]’ AND TABLE_NAME = ‘[table name]’;
MySQL Constraints
Image 5. Table constraints

 

The source MySQL table does not have a Primary or Unique Key.

2. Are there any numeric or timestamp columns that can be used to create a unique pseudo-key?

If there are no constraints on the source MySQL table, check the table metadata for columns that could be used to create a pseudo-key; numeric and timestamp data works best for this. This query will return the columns with those data types:

SELECT 
  COLUMN_NAME, 
  DATA_TYPE
FROM 
  INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = ‘[database name]’
  AND TABLE_NAME = ‘[table name]’
  AND DATA_TYPE in (‘int’, ‘decimal’, ‘datetime’);
Columns
Image 6. Columns

3. Do any of the potential pseudo-key columns contain duplicate data?
For each column returned, check if it contains duplicate data by executing this query:

SELECT 
  [column name], 
  COUNT([column name]) AS duplicate_count
FROM 
  [table name]
GROUP BY [column name]
HAVING COUNT([column name]) > 1;
Duplicate rows
Image 7. Column with duplicates

4. Does the selected pseudo-key column(s) contain NULL data?
For the column(s) that do not contain duplicate data, check if they contain NULL.

select
  count(*)
from 
  [table name]
where 
  [column name] IS NULL;
Null data
Image 8. NULL count

For our demonstration table, table_25, this research determined that the INT data type column “id” meets the criteria for use as a pseudo-key; uniqueness and no NULL data.

If we are unable to identify a column, or columns, that can be used to create a pseudo-key, then the table cannot be instantiated using Oracle GoldenGate Initial Load.

Identify Target Table Constraints

The following query will identify the type of constraint on each table in our Oracle schema and its status.

SELECT
    acc.table_name,
    acc.column_name,
    ac.constraint_type,
    ac.validated,
    ac.status
FROM
    all_cons_columns acc
JOIN
    all_constraints ac ON acc.owner = ac.owner
    AND acc.constraint_name = ac.constraint_name
    AND acc.table_name = ac.table_name
WHERE
    ac.constraint_type in (‘P’,’U’,’R’)
    and acc.owner = ‘[schema]’
    order by table_name;
Oracle constraints
Image 9. Oracle table constraints

It is important to note that Oracle GoldenGate for Oracle Database will not use Primary Key or Unique Key constraints that are not enabled or that have been create as NOVALIDATE as there is no guarantee of uniqueness for the constraint column.

Any Referential Integrity (Foreign Key) constraints should be disabled prior to starting the Initial Load Apply processes to prevent data integrity failures. These constraints can be enabled once the Initial Load completes.

Source GoldenGate Processes

With the prerequisites and research out of the way, we can move on to setting up the Oracle GoldenGate for MySQL source deployment for Change Data Capture (CDC) and Initial Load extraction.

Important: We are only creating the Extracts, do not select “Create and Run”.

Change Data Capture Extract

Login to the Administration Service of the Oracle GoldenGate for MySQL Hub server and create a Change Data Capture Extract.

The parameter settings we used are:

EXTRACT edemo
USERIDALIAS [extract database alias] DOMAIN OracleGoldenGate
TRANLOGOPTIONS ALTLOGDEST REMOTE 
EXTTRAIL dm
— Every 60 minutes, report the count of records 
— processed since Extract start to the process 
— report file.
— Rate reports the number of operations per second
— The report file also contains DELTA throughput which is
— the number of records since the last report divided by 
— the time since the last report
REPORTCOUNT EVERY 60 MINUTES, RATE

— These two parameters are required because table_25
— does not have “not null” constraints enabled.
— Capture all table columns for update operations
NOCOMPRESSUPDATES
— Capture all table columns for delete operations
NOCOMPRESSDELETES

— “table_25” does not have a PK or UK, use the “id”
— column as the pseudo-key
TABLE demodb.table_25, KEYCOLS (id);
TABLE demodb.*;

Initial Load Extracts

Login to the Administration Service of the Oracle GoldenGate for MySQL Hub server and create three Initial Load Extracts; two for splitting the workload related to our large table, table_17, and one for all other tables.

Here is the parameter file for all tables, excluding table_17:

EXTRACT L1DEMO
USERIDALIAS [extract database alias] DOMAIN OracleGoldenGate
EXTFILE L1, MEGABYTES 2000, PURGE
— Every 5 minutes, report the count of records 
— processed since Extract start to the process 
— report file.
— Rate reports the number of operations per second
— The report file also contains DELTA throughput which is
— the number of records since the last report divided by 
— the time since the last report
REPORTCOUNT EVERY 5 MINUTES, RATE
— Exclude table_17
TABLEEXCLUDE demodb.table_17
TABLE demodb.*;

Here are the parameter files for table_17:

EXTRACT L2DEMO
USERIDALIAS [extract database alias] DOMAIN OracleGoldenGate
EXTFILE L2, MEGABYTES 2000, PURGE
— Every 5 minutes, report the count of records 
— processed since Extract start to the process 
— report file.
— Rate reports the number of operations per second
— The report file also contains DELTA throughput which is
— the number of records since the last report divided by 
— the time since the last report
REPORTCOUNT EVERY 5 MINUTES, RATE
— Split table_17 over 2 Extracts using the
— column “id” to create the hash split
TABLE demodb.table_17, FILTER (@RANGE (1,2,id));

EXTRACT L3DEMO
USERIDALIAS [extract database alias] DOMAIN OracleGoldenGate
EXTFILE L3, MEGABYTES 2000, PURGE
— Every 5 minutes, report the count of records 
— processed since Extract start to the process 
— report file.
— Rate reports the number of operations per second
— The report file also contains DELTA throughput which is
— the number of records since the last report divided by 
— the time since the last report
REPORTCOUNT EVERY 5 MINUTES, RATE
— Split table_17 over 2 Extracts using the
— column “id” to create the hash split
TABLE demodb.table_17, FILTER (@RANGE (2,2,id));

Distribution Path

Create Distribution Paths from the source Oracle GoldenGate Hub server to the target Oracle GoldenGate Hub server as specified in the Oracle GoldenGate documentation. Four Distribution Paths will be required, one for the Change Data Capture Extract and one for each Initial Load Extract.

Target GoldenGate Processes

If you are unfamiliar with how to create Replicats, please refer to the Oracle GoldenGate documentation.

Important: We are only creating the Replicats, do not select “Create and Run”.

Create Initial Load Replicats

Before proceeding, it’s important to understand how Oracle GoldenGate handles Initial Load Trail data and how it impacts downstream processing.

The Initial Load Extract reads data directly from the source tables, meaning no transactional metadata (like BEGIN/COMMIT records) is written to the Initial Load Trail files. Oracle GoldenGate does not use a dedicated Initial Load Replicat. Instead, the Change Data Replicats are responsible for applying the data captured during the Initial Load. Since Change Data Replicats rely on transaction boundaries in the trail files to group operations into transactions, Oracle GoldenGate generates a synthetic (fake) Transaction ID for the Initial Load data. This ID is derived from: The trail file sequence number, the relative byte address (RBA), and the timestamp of the first record in the trail

A synthetic transaction is considered complete under one of the following conditions:

  • End-of-file (EOF) is reached in each trail being processed, or
  • The number of records read by Replicat reaches the value set by the MAXTRANSOPS parameter (default: 10,000,000 records)

Because the entire Initial Load data may be treated as a single large transaction, it’s critical to adjust the Replicat configuration, especially when using Parallel Replicat. If not properly tuned:

  • Replicat may appear to be hung or stalled.
  • No statistics will be reported.
  • No checkpoints will be recorded until the synthetic transaction is committed.

To avoid these issues, we shall enable the setting SPLIT_TRANS_RECS in our Parallel Replicat.

  • SPLIT_TRANS_RECS is specific to Parallel Replicat and is used to break large transactions into smaller, concurrent chunks, improving checkpointing and performance.

Create Parallel Replicats to apply the Initial Load data. One Parallel Replicat is required for each source Initial Load Extract. For this demonstration we shall be creating three, P1LD, P2LD, and P3LD; all with the same parameter settings:

REPLICAT P1LD
USERIDALIAS [replicat database alias] DOMAIN OracleGoldenGate
— Set minimum number of Appliers to 1 and 
— spawn up to 6 to improve initial load 
— performance.
— When everything has been applied, only 1 
— Applier will be running, signalling 
— initial load is complete.
MIN_APPLY_PARALLELISM 1
MAX_APPLY_PARALLELISM 6

— Every 60 seconds, report the count of 
— records processed since Replicat starts 
— to the process report file.
— Rate reports the number of operations per 
— second.
— The report file also contains DELTA 
— throughput which is the number of records 
— since the last report divided by the time 
— since the last report.
REPORTCOUNT EVERY 60 SECONDS, RATE

— Large transactions should be broken into 
— pieces of specified size and applied 
— concurrently.
SPLIT_TRANS_RECS 10000
MAP demodb.*, TARGET pdb_east.demo.*;

 

Create Change Data Apply Replicat

Create a Parallel Replicat to apply both queued and real-time change data.

Because we are using Oracle GoldenGate Initial Load, we must specify a special parameter setting, HANDLECOLLISIONS. HANDLECOLLISIONS is only to be used for Oracle GoldenGate Initial Load and must be disabled for continuing production data apply. See the Oracle GoldenGate documentation for more details.

Due to the target table, table_25, not having a Primary or Unique key we need to set the pseudo-key to match the setting in the source Change Data Capture Extract.

REPLICAT PDEMO
USERIDALIAS [replicat database alias] DOMAIN OracleGoldenGate
— Every 60 minutes, report the count of 
— records processed since Replicat starts 
— to the process report file.
— Rate reports the number of operations per second.
— The report file also contains DELTA 
— throughput which is the number of records 
— since the last report divided by the time 
— since the last report.
REPORTCOUNT EVERY 60 MINUTES, RATE

— Enabled for initial load to handle duplicate and
— missing data fir a brief period. Must be disabled
— after initial load completes.
HANDLECOLLISIONS

— table_25 does not have a PK or UK, KEYCOLS specifies
— a pseudo-key for update and delete operation efficiency.
MAP demodb.table_25, TARGET pdb_east.tpcc.table_25, KEYCOLS (id);
MAP demodb.*, TARGET pdb_east.demodb.*;

Execute Initial Load

With everything configured, we are ready to execute the Initial Load.

  1. Start the Change Data Capture Extract.
    • If it has been more than 1 hour since Extract creation, you should alter the Extract to “begin now”. This will prevent unnecessary data from being captured and sent to the target Oracle GoldenGate Hub.
    • Verify the Change Data Capture is running and capturing data by viewing the Statistics pane in the WebUI.
  2. Check the MySQL database for long running transactions.
    • Transactions opened prior to the start time of the Change Data Capture Extract must either commit, or be rolled back, before starting the Initial Load Extracts. Failure to do this will result in missed data and an out-of-sync target database.
  3. Start the Initial Load Extracts.
  4. Start the Distribution Paths.
  5. Wait for the Initial Load Extracts to stop.
  6. Stop the Distribution Path associated with the Change Data Extract.
    • This will prevent further change data from being shipped downstream and provides a point for stopping the Change Data Apply Replicat for removal of the HANDLECOLLISIONS parameter.
  7. Start the Initial Load Replicats.
  8. Wait for the Initial Load Replicats to complete.
    • The Initial Load Replicat wills not stop when they have completed applying the Initial Load Data. There are a couple of ways to determine if instantiation is complete.
    • Check Replicat Checkpoint vs. Trail File Size
      • On the GoldenGate Hub Server, list the trail files used for initial load and compare the last trail sequence number and size with Replicat’s checkpoint sequence number and RBA (from the Web UI).
    • Compare Extract and Replicat Statistics
      • Review the Initial Load Extract report to see how many rows were fetched per table. Then, check Replicat statistics to verify the same row counts. Matching numbers indicate that the initial load completed successfully.
      • If the record counts reported by Initial Load Extract and Replicat match, instantiation is complete, you may stop the Initial Load Replicat.
  9. Start the Change Data Apply Replicat.
  10. Wait for the Change Data Apply Replicat to complete applying all queued data.
    • When the Checkpoint Lag reported in the WebUI shows “0”, view the Replicat Checkpoint information. After waiting 2-3 minutes and refreshing the page view, if the checkpoint has not increased all queued data has been applied.
  11. Stop the Change Data Apply Replicat.
  12. Remove the parameter “HANDLECOLLISIONS”.
  13. Start the Change Data Apply Replicat.
  14. Start the Distribution Path associated with the Change Data Extract.
  15. Verify the Change Data Apply Replicat is applying data to the target Oracle tables by viewing the WebUI Statistics pane.
  16. Initial Load and setup of ongoing Change Data Apply synchronization is now complete.

Post Instantiation Activities

There are several activities that should be performed post instantiation:

  1. Validate the target data
    • The target data should be validated for accuracy.
    • We encourage the use Oracle GoldenGate Veridata for this task. Oracle GoldenGate Veridata may be used for heterogeneous compare and repair with minimum impact to the database and application users as table are not locked during compare operations. Data comparisons can be performed against source and target databases with, or without, real-time data replication.
  2. Delete the Initial Load Replicats.
  3. Stop and delete the Initial Load Distribution Paths.
  4. Delete the Initial Load Extracts.
  5. Create a Trail Purge Task in the Administration Service WebUI for each Oracle GoldenGate Instance to ensure disk space is freed when Oracle GoldenGate Trails are consumed.

Summary

In this article we presented detailed steps to perform Oracle GoldenGate Initial Load of an Oracle target database from a source MySQL database, using multiple Oracle GoldenGate Initial Load processes, and enabled real-time replication without taking an application outage. To learn more about Oracle GoldenGate functionality, you may refer to: