One of the key features of Parallel Replicat (PR) is the ability to split large transactions into multiple pieces and apply them in parallel while maintaining transaction consistency. Because Parallel Replicat can handle large transactions, it is useful not only for source databases with OLTP workloads but also for data warehouses and batch-oriented source databases.
As an example, the common use case of data loads in your source system is not a challenge for GoldenGate. In some high-performance environments, throughput rates of 1 million changes per second have been observed. This is five times higher than Integrated Replicat!
Parallel Replicat applying OLTP-like transaction:
Parallel Replicat applying a large transaction:
The Mapper processes of Parallel Replicat read the change records from the Trail Files and send them to the Main process, which calculates transaction dependencies and schedules the transactions for the Appliers. In general, each Applier applies a full transaction to the target database. When a large transaction is split into multiple pieces, the Appliers apply those pieces as virtual transactions. Because you control the parallelism of the Appliers, you can dramatically increase throughput, as each active Applier operates in parallel.
Not only does Parallel Replicat split those transactions, it also guarantees global data consistency by calculating the dependencies between the split pieces applied across multiple Appliers.
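This dependency calculation can be pictured with a toy model (a sketch of my own in Python; the function and data structures are illustrative and not GoldenGate internals): two pieces must be applied in order only when they modify at least one common row key, otherwise they are free to run in parallel.

```python
# Toy model of dependency tracking between split-transaction pieces.
# Illustrative only: names and data structures are invented for this
# sketch and do not reflect GoldenGate's actual implementation.

def compute_dependencies(pieces):
    """Map each piece to the set of earlier pieces it depends on.

    Two pieces conflict (and must be applied in order) when they
    modify at least one common row key."""
    deps = {i: set() for i in range(len(pieces))}
    for i, later in enumerate(pieces):
        for j in range(i):
            if pieces[j] & later:       # shared row keys -> dependency
                deps[i].add(j)
    return deps

# Each piece is modeled as the set of primary-key values it updates.
pieces = [{1, 2, 3}, {4, 5, 6}, {3, 7}]
deps = compute_dependencies(pieces)

print(deps)   # -> {0: set(), 1: set(), 2: {0}}  (pieces 0 and 2 share key 3)
```

In this toy run, piece 1 has no dependencies and may commit before or after the others, while piece 2 must wait for piece 0.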
The parameter SPLIT_TRANS_RECS controls the splitting of large transactions. It has no default value, meaning that by default no transactions are split. You use the parameter to split a large transaction into meaningful pieces, which are then applied by the individual Appliers. If there are no dependencies within the transaction, it is processed much faster, because multiple Appliers execute the database operations in parallel. Consider increasing the parallelism of the Appliers if most of them are actively processing and you are not hitting any resource limits on the host machine.
Assume, you have a transaction at the source database that updates a column for the full table:
SQL> UPDATE oe.orders SET status = UPPER(SUBSTR(status,1,1));
2,706,042 rows updated.
SQL> COMMIT;
Commit complete.
If you set SPLIT_TRANS_RECS to 100,000 and use a parallelism of 10, PR splits the transaction into 28 pieces, and the Appliers work through the 100k pieces in parallel, one after another, until the last Applier finishes with the remainder.
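The piece count follows directly from the row count and SPLIT_TRANS_RECS; a quick sanity check (plain Python, just arithmetic):

```python
import math

rows = 2_706_042            # rows updated in the source transaction
split_trans_recs = 100_000  # SPLIT_TRANS_RECS setting

pieces = math.ceil(rows / split_trans_recs)            # number of pieces
last_piece = rows - (pieces - 1) * split_trans_recs    # size of the remainder

print(pieces, last_piece)   # -> 28 6042
```

So the transaction yields 27 full pieces of 100,000 rows plus a final piece of 6,042 rows, matching the 28 splits shown in the statistics.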
REPLICAT reps
USERIDALIAS ggsouth DOMAIN OracleGoldenGate
MIN_APPLY_PARALLELISM 4 -- Default
MAX_APPLY_PARALLELISM 10
SPLIT_TRANS_RECS 100000
MAP oe.*, TARGET oe.*;
The status shows the efficiency of the parallelism:
ADMINCLIENT> SEND REPLICAT reps, STATUS
Current status: Processing data
Map Parallelism: 2
Min Apply Parallelism: 4
Current Apply Parallelism: 10
Max Apply Parallelism: 10
Active Appliers: 9
The statistics also show the efficiency of SPLIT_TRANS_RECS:
ADMINCLIENT> STATS REPLICAT reps
*** Latest statistics since ***
Total transactions 48.00
Split transactions 1.00
Average splits per transaction 28.00
Average size (rows) 2706042.00
Even though there was only one transaction at the source database, you can see that PR split that one transaction into 28 pieces, which were processed in parallel.
SPLIT_TRANS_RECS should be used with care: be aware that all large transactions (in terms of the number of changes) will be split into pieces, and the pieces can be applied in a different order.
So, if you are running a time-sensitive report on the target table while a large transaction is being applied, you may see parts of the large transaction. Depending on the dependencies between the split pieces and with other transactions, you may see the last piece as committed while the first piece is still yet to be applied (due to dependencies).
Parallel Replicat is also able to manage large transactions, which are flagged in the report file:
INFO OGG-06080 Large transaction processing has started.
XID: 383699985.3.28.5537, Seqno: 9, RBA: 10621286.
Size of first chunk: 1,000,000,776.
[...]
INFO OGG-06081 Large transaction has completed.
Total size: 1,216,688,958.
The parameter CHUNK_SIZE controls how large a transaction must be for Parallel Replicat to consider it large.
A transaction greater than CHUNK_SIZE (unit is bytes, default 1 GB) is serialized and applied as a barrier transaction: all preceding transactions must complete first, the barrier transaction is then applied on its own, and only afterwards does parallel apply resume.
This is analogous to EAGER_SIZE for Integrated Replicat.
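The barrier behavior described above can be sketched as a toy dispatcher (my own Python sketch; the function names `apply_parallel`, `apply_serial`, and `drain` are illustrative placeholders, not GoldenGate code):

```python
# Toy dispatcher illustrating barrier-transaction handling.
# Illustrative only: names are invented for this sketch.

CHUNK_SIZE = 2**30  # default: 1 GB, in bytes

def apply_transaction(txn_bytes, apply_parallel, apply_serial, drain):
    """Dispatch one transaction: oversized transactions become barriers."""
    if txn_bytes > CHUNK_SIZE:
        drain()           # wait for all in-flight transactions to finish
        apply_serial()    # barrier: applied alone, nothing runs concurrently
    else:
        apply_parallel()  # normal path: schedule across the Appliers

calls = []
apply_transaction(50 * 2**30,  # a 50 GB transaction
                  apply_parallel=lambda: calls.append("parallel"),
                  apply_serial=lambda: calls.append("serial"),
                  drain=lambda: calls.append("drain"))
print(calls)  # -> ['drain', 'serial']
```

The 50 GB transaction exceeds CHUNK_SIZE, so the dispatcher drains the pipeline and applies it serially instead of scheduling it in parallel.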
Here are two examples: the first shows a transaction of moderate size, while the second shows a large transaction.
Assume SPLIT_TRANS_RECS is set to 100,000 and CHUNK_SIZE is not set, meaning the default of 1 GB is in use.
A transaction contains 200k changes, and the size is 250MB.
When reading from the trail, Parallel Replicat splits the transaction into 2 pieces: (a) changes 1-100,000 and (b) changes 100,001-200,000. After dependency computation, the transactions and pieces are scheduled for the Appliers. If there are no dependencies between the two split pieces, they can be applied in parallel, so it is possible for piece (b) to be applied and committed before piece (a). If, however, there is a dependency between the two pieces, piece (a) must be applied before piece (b), and they may be committed as two separate transactions.
A transaction contains 500k changes, and its size is 50 GB (because those changes involve heavy LOB processing).
While reading this transaction, Parallel Replicat will split it into 5 pieces (each of 100k changes).
Assume the size of those pieces is 10 GB each. Then Parallel Replicat will still consider each of these pieces a large transaction and treat it as a barrier. While a barrier is being applied, no other transaction is applied.
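The arithmetic behind this second example can be checked quickly (plain Python, illustrative only):

```python
changes = 500_000            # changes in the transaction
split_trans_recs = 100_000   # SPLIT_TRANS_RECS setting
txn_bytes = 50 * 2**30       # 50 GB transaction
chunk_size = 2**30           # CHUNK_SIZE default: 1 GB

pieces = changes // split_trans_recs      # 5 pieces of 100k changes each
bytes_per_piece = txn_bytes // pieces     # 10 GB per piece

# Every piece still exceeds CHUNK_SIZE, so each one becomes a barrier.
all_barriers = all(bytes_per_piece > chunk_size for _ in range(pieces))
print(pieces, bytes_per_piece // 2**30, all_barriers)  # -> 5 10 True
```

Splitting alone does not help here: each 10 GB piece is still ten times the 1 GB CHUNK_SIZE, so all pieces are applied as barriers.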
Note that SPLIT_TRANS_RECS has a positive impact, as the large transaction is split into smaller pieces, making it less likely to hit the CHUNK_SIZE limit. Increasing CHUNK_SIZE can drastically increase the memory usage of Parallel Replicat, so it is best to leave it at its default (1 GB). Setting SPLIT_TRANS_RECS to a reasonable value, however, helps manage large transactions, as the pieces can be applied in parallel.
Volker is a Senior Principal Product Manager working in the GoldenGate Development group.
His primary focus is on the GoldenGate Core Product, mainly GoldenGate for Oracle. Key topics are Performance, High Availability, Security, and Resilience.
Volker has worked for more than 20 years in the field of database technology and data replication.
He has supported customers worldwide in different industries to develop & manage distributed database systems/applications and build Data Integration Solutions.