When troubleshooting Extract performance issues in Oracle GoldenGate, one theme I encounter again and again is Extract recovery, especially in environments with long-running uncommitted transactions.

Oracle GoldenGate’s Extract process buffers uncommitted transactions in memory and doesn’t write them to trail files until those transactions are committed. This in-memory storage, however, is not persistent—which means if the Extract process restarts (planned or unplanned), all uncommitted transactions in memory are lost.

To recover from this, Extract must rebuild its memory state by reading from the redo or archive logs, a process known as Extract recovery. Depending on the starting SCN (System Change Number) and the volume of uncommitted transactions, this phase can take a long time. During this period the Extract process appears to be running, but it does not resume writing data to trail files until recovery is fully complete. This can cause confusion, because the process seems active while no new data shows up in the trails. In such cases, users can verify that Extract is in recovery by monitoring its current SCN, which continues to advance even though the trail files remain unchanged.
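
The check described above can be sketched as a comparison of two observations taken a few minutes apart. This is a minimal illustration, not a GoldenGate API: the `Sample` type and its field names are hypothetical, and the values would have to be collected by hand from whatever checkpoint and trail information you monitor. Note that an idle source database can produce the same pattern, so treat the result as a hint rather than proof.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of an Extract (hypothetical structure)."""
    current_scn: int  # SCN Extract is currently reading in the redo/archive logs
    trail_rba: int    # byte offset of the last write to the trail file


def looks_like_recovery(earlier: Sample, later: Sample) -> bool:
    """True when the read position advanced but nothing was written to the trail."""
    scn_advanced = later.current_scn > earlier.current_scn
    trail_unchanged = later.trail_rba == earlier.trail_rba
    return scn_advanced and trail_unchanged


if __name__ == "__main__":
    first = Sample(current_scn=5_000_100, trail_rba=20_480)
    second = Sample(current_scn=5_250_000, trail_rba=20_480)
    print(looks_like_recovery(first, second))
```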

Extract in Recovery

Standard Recovery

Standard recovery is the foundational mechanism GoldenGate Extract uses to recover uncommitted transactions after a restart, particularly when bounded recovery files are not available or are invalid.

Extract maintains several internal checkpoints:

  • Startup Checkpoint

  • Recovery Checkpoint

  • Current Checkpoint

  • Write Checkpoint

The recovery checkpoint tracks the SCN of the oldest open transaction in the source database. Upon restart, if no valid bounded recovery files exist, Extract uses standard recovery by reading from the recovery checkpoint up to the current checkpoint in the redo/archive logs to rebuild its in-memory state. Only after this does Extract resume writing to the trail files.
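
To make the scan range concrete, here is a small sketch of how the recovery checkpoint determines how much redo a standard recovery has to re-read. The function and argument names are hypothetical, and the model assumes the recovery checkpoint is simply the lowest start SCN among the open transactions, as described above.

```python
def standard_recovery_range(open_txn_start_scns, current_scn):
    """Return the (start, end) SCN range a standard recovery would re-read.

    open_txn_start_scns: start SCNs of transactions still open at shutdown.
    current_scn: the current checkpoint, i.e. how far Extract had read.
    """
    # With no open transactions there is nothing to rebuild, so the
    # range collapses to the current checkpoint.
    recovery_scn = min(open_txn_start_scns, default=current_scn)
    return recovery_scn, current_scn


if __name__ == "__main__":
    start, end = standard_recovery_range([4_200_000, 4_950_000], 5_000_000)
    print(f"re-read redo from SCN {start} to SCN {end} ({end - start} SCNs)")
```

A single transaction left open for days pulls the start of the range far back, which is why one forgotten session can dominate restart time.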

Bounded Recovery

Bounded recovery is a performance optimization built on top of standard recovery, designed to reduce restart times by persisting long-running uncommitted transactions to disk. It is governed by the BR parameter, whose BRINTERVAL option sets the bounded recovery interval and defaults to 4 hours.

Here’s how it works:

  • At every bounded recovery checkpoint (e.g., every 4 hours), GoldenGate identifies transactions open longer than the configured interval.

  • These transactions are written to bounded recovery files.

  • A newly opened transaction is not included; it is persisted at a subsequent checkpoint only if it is still open and has by then exceeded the interval.

  • Upon restart, Extract checks for valid bounded recovery files to rebuild memory faster.

  • If bounded recovery files are missing or invalid (for example after a major binary upgrade or because of a file permissions problem), Extract gracefully falls back to standard recovery.

You can lower the interval (to as little as 20 minutes), but doing so increases I/O because bounded recovery checkpoints are written more often. For most setups, the default setting is a practical balance.
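
The behaviour described in this section can be approximated with a simplified model. This is an illustration under stated assumptions, not GoldenGate's actual implementation: the `Txn` type, both function names, and the rule used to pick the restart position are hypothetical, and times are plain hour counts.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    xid: str
    start_scn: int
    start_time: float  # hours since an arbitrary reference point


def persist_at_checkpoint(open_txns, now, interval_hours=4.0):
    """Transactions a bounded recovery checkpoint would write to disk."""
    return [t for t in open_txns if now - t.start_time > interval_hours]


def restart_scan_start(open_txns, persisted, br_checkpoint_scn, br_files_valid):
    """SCN where the redo scan begins after a restart (simplified model)."""
    if not br_files_valid:
        # Fallback to standard recovery: oldest open transaction.
        return min((t.start_scn for t in open_txns), default=br_checkpoint_scn)
    persisted_ids = {t.xid for t in persisted}
    # Persisted transactions are restored from disk; only the others
    # still have to be rebuilt from redo.
    unpersisted = [t.start_scn for t in open_txns if t.xid not in persisted_ids]
    return min(unpersisted + [br_checkpoint_scn])


if __name__ == "__main__":
    txns = [Txn("a", 100, 0.0), Txn("b", 900, 3.0)]
    saved = persist_at_checkpoint(txns, now=5.0)
    print([t.xid for t in saved])
    print(restart_scan_start(txns, saved, 1000, br_files_valid=True))
    print(restart_scan_start(txns, saved, 1000, br_files_valid=False))
```

In this model the long-running transaction `a` no longer drags the scan back to its start SCN once it has been persisted, while losing the bounded recovery files sends the scan back to the oldest open transaction.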

🛠 Best Practices

  • Bounded recovery is a performance enhancement, not a replacement for standard recovery.

  • Bounded recovery files may become invalid after a major upgrade, so be prepared for a fallback to standard recovery.

  • Always ensure archive logs are retained from the SCN of the oldest uncommitted transaction onward.

  • Never delete archive logs prematurely—it could lead to abends or data loss.
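
The retention rule can be expressed as a simple filter. The sketch below assumes each archive log is described by its first SCN and the first SCN of the following log, modeled on the FIRST_CHANGE# and NEXT_CHANGE# columns of V$ARCHIVED_LOG; the `ArchiveLog` type, the function name, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArchiveLog:
    name: str
    first_scn: int  # lowest SCN contained in this log
    next_scn: int   # first SCN of the following log (exclusive upper bound)


def logs_still_needed(logs, oldest_open_txn_scn):
    """Archive logs that must be kept so Extract can still recover."""
    # A log is needed if it contains any SCN at or after the start of
    # the oldest uncommitted transaction.
    return [log for log in logs if log.next_scn > oldest_open_txn_scn]


if __name__ == "__main__":
    logs = [
        ArchiveLog("arch_101.arc", 1000, 2000),
        ArchiveLog("arch_102.arc", 2000, 3000),
        ArchiveLog("arch_103.arc", 3000, 4000),
    ]
    print([log.name for log in logs_still_needed(logs, 2500)])
```

Any log the filter leaves out lies entirely before the oldest open transaction and is, in this model, no longer needed for Extract recovery.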

Standard and Bounded Recovery

Summary

Extract recovery plays a critical role in Oracle GoldenGate performance during restarts. Standard recovery is the baseline mechanism: Extract rebuilds its in-memory state by scanning redo and archive logs from the SCN of the oldest uncommitted transaction. To reduce downtime and improve recovery speed, GoldenGate also offers bounded recovery, an optimization that persists long-running uncommitted transactions to disk and minimizes redo log scans on restart.

Understanding how each recovery method works, how to tune the bounded recovery interval, and how to prepare for scenarios where recovery files may become invalid is essential to maintaining efficient, resilient replication pipelines.

Other blog posts by the author

https://blogs.oracle.com/dataintegration/post/oracle-goldengate-parallel-replicat-performance-tuning