Scaling GoldenGate SOURCEISTABLE Initial Load Extracts

Oracle GoldenGate initial loads are a foundational step in implementing Oracle GoldenGate, and this blog will review different methods of improving the performance of a specific type of initial load called a SOURCEISTABLE Extract. This type of extract reads directly from the table instead of from the transaction logs (TRANLOG Extract). These initial load extracts are used as part of the initial synchronization / instantiation process for heterogenous workloads or to resynchronize / add a new table to replication.

Before discussing ways to scale the SOURCEISTABLE Extract, I want to clear up a common misconception about the length of time that it takes to perform the initial load being important (when it is really not) which can lead teams down the wrong optimization path. During a SOURCEISTABLE initial load, the source database remains fully open for both read and write operations. This means that the initial load does not block application activity. Instead, the real concern should be the overhead the initial load places on the source system—CPU, I/O, memory, and overall database performance.

With that in mind, improving or scaling the “performance” of a SOURCEISTABLE extract is not just about finishing faster—it’s about controlling and distributing workload efficiently.

Rethinking Performance: Speed vs. Impact

Before diving into techniques it’s important to understand how a SOURCEISTABLE extract works. When the extract is started, it will have a single connection to the database that goes through each TABLE statement one-by-one, performing a select on the table and writing all that information to a trail file. This method is the least performant method, but it minimizes impact on the production database and is the most common method used. However, the goal of this blog is not to promote using the method of least impact, but to help scale the SOURCEISTABLE extract performance and make it run as fast as possible.

A baseline truth:

Faster initial loads = higher resource consumption
Slower, distributed loads = lower impact on the source database

Keep in mind, that while you can use these methods to make the initial load finish faster, it’s not always the best method, and is it very important to strike the right balance between completion time and impact to the production database.

There are multiple methods that can be used to help scale the SOURCEISTABLE initial load extracts. Most of these methods can be combined to further scale the performance of the phase.

General Performance Tips for SOURCEISTABLE initial loads

These are some general tips to ensure higher speed SOURCEISTABLE initial loads:

Disable foreign key constraints on the target database. While I don’t cover it in this blog, the optimal Replicat to use with trail files generated by SOURCEISTABLE extracts is Coordinated Replicat with the THREADRANGE setting for your MAP parameters. This allows the replicat to use multiple threads to insert the data into your target tables. Due to the way this works, you should disable any foreign key constraints on the target tables.
Remove all indexes except for the primary key index. Create the indexes after the initial load is completed.
Ensure the target database is following performance best practices, especially for write intensive workloads.

Splitting Loads with multiple SOURCEISTABLE Extracts

Due to the single threaded nature of initial load Extracts (each table listed is processed one at a time), a single Extract may not achieve the necessary performance and you might need to use multiple threads. This requires configuring multiple Extracts. A very easy option is to create multiple initial load Extracts and each one would process a subset of tables. However, sometimes even that isn’t going to provide the performance you need. If that’s the case, there are other options:

Splitting Loads with SQLPREDICATE

If you have such large tables that putting them in an Extract by themselves isn’t fast enough, you can take a single table and split it into multiple extracts through the use of SQLPREDICATE. This is also a good way to avoid snapshot too old errors. This allows you to parallelize data extraction by dividing a single table into logical segments. SQLPREDICATE adds the information between the double quotes onto the SELECT * FROM query that the Extract process performs. This reduces the result set coming back from the database and improves performance.

Example: Splitting a Table into Three Extracts

Suppose you have a large table ORDERS. You can split the initial load into three separate extracts:

Extract 1

TABLE ORDERS, SQLPREDICATE "WHERE ORDER_ID < 1000000";

Extract 2

TABLE ORDERS, SQLPREDICATE "WHERE ORDER_ID >= 1000000 AND ORDER_ID < 2000000";

Extract 3

TABLE ORDERS, SQLPREDICATE "WHERE ORDER_ID >= 2000000";

Each extract runs independently, allowing parallel reads and improved throughput.

Key Considerations

Ensure predicates are mutually exclusive and collectively exhaustive
Choose a column with good distribution and that is indexed (e.g., primary key)
Avoid skewed splits that create uneven workloads
NOTE: While using the @RANGE function in Extract can also be used to split the data, this is not recommended as the query to pull the data will still query every row, and then Extract is used to filter it.

While it is very easy to use generative AI to create hundreds of SOURCEISTABLE extracts, with each pulling a subset of data from the tables, you are still likely to cause I/O issues. To alleviate that, we can use the next method.

2. Addressing I/O Constraints with Materialized Views

As you increase the number of parallel extracts using SQLPREDICATE, you will eventually hit I/O bottlenecks on the source system.

At this point, adding more threads will not improve performance—in fact, it may degrade it.

Solution: Use Materialized Views on Separate Storage

To overcome I/O limitations:

Create materialized views that segment the data similarly to your SQLPREDICATE logic
Place each materialized view on separate storage devices / volumes / LUNs
Run SOURCEISTABLE extracts against these materialized views instead of or in addition to the base table

Benefits

Distributes I/O across multiple storage paths
Reduces contention on the base table
Improves scalability beyond what SQLPREDICATE alone can achieve

This approach is especially useful in environments where storage architecture can be leveraged for parallelism. If these methods still don’t provide the scalability required, you can run the initial load from multiple databases.

3. Running Initial Loads from Static Backups

Another effective strategy to reduce impact on the source database is to perform initial loads from one or more backups that have been opened for read/write activity. Something like an Oracle Snapshot Standby.

Instead of reading from a live, actively changing database:

Take a consistent backup (e.g., RMAN, snapshot, or storage-level clone)
Run SOURCEISTABLE extracts against the backup copy

Advantages

Eliminates load on the production database
Allows more aggressive parallelism without affecting live workloads
Reduces risk during peak business hours

This approach is particularly valuable for very large datasets or highly sensitive production environments.

4. Merging Back into TRANLOG Extract Processing

Splitting initial loads introduces an important operational challenge:

How do you transition each table back into normal transactional replication (TRANLOG extract) once its initial load is complete?

This step must be handled carefully to ensure data consistency and avoid gaps or duplication.

Recommended Approach

Use the methodology outlined in KB195939.

At a high level:

Track completion of each split initial load segment
Coordinate the point at which the table is fully loaded
Enable or align TRANLOG extract processing accordingly

Key Considerations

Ensure no data divergence between initial load and ongoing transactions
Maintain proper checkpoint coordination
Validate data consistency after cutover

This step is critical when using multiple parallel extracts, as each segment may complete at different times.

Final Thoughts

Improving performance in GoldenGate SOURCEISTABLE initial load extracts is not simply about making the process faster—it’s about managing resource consumption intelligently.

By:

Splitting the workloads across multiple SOURCEISTABLE Extracts
Splitting workloads even further with SQLPREDICATE
Scaling beyond I/O limits using materialized views
Offloading work to backups

…you can build an initial load strategy that is both efficient and minimally disruptive to your production systems.

Ultimately, the best approach depends on your environment, data size, and tolerance for system impact—but the principles remain the same: parallelize wisely, monitor constraints, and prioritize production customers over raw speed.

Scaling GoldenGate SOURCEISTABLE Initial Load Extracts

Rethinking Performance: Speed vs. Impact

General Performance Tips for SOURCEISTABLE initial loads

Splitting Loads with multiple SOURCEISTABLE Extracts

Splitting Loads with SQLPREDICATE

Example: Splitting a Table into Three Extracts

Key Considerations

2. Addressing I/O Constraints with Materialized Views

3. Running Initial Loads from Static Backups

4. Merging Back into TRANLOG Extract Processing

Final Thoughts

Nick Wagner

Senior Director, Product Management, Oracle

AIDP Private Workspace connectivity to Oracle Databases

OCI Data Flow integration with DDN Infinia

Scaling GoldenGate SOURCEISTABLE Initial Load Extracts

Authors

Nick Wagner

Senior Director, Product Management, Oracle

AIDP Private Workspace connectivity to Oracle Databases

OCI Data Flow integration with DDN Infinia