Taking another step to reinforce Oracle’s open data strategy, the Oracle GoldenGate team is thrilled to announce the general availability of Oracle GoldenGate for Distributed Applications and Analytics (DAA) 23ai, which introduces new connections, enhancements, and fixes, as well as a focus on Apache Iceberg integration for real-time data analytics and AI use cases. 

GoldenGate captures operational data in real time and continuously replicates it for analytics and AI into the widely used Apache Iceberg table format found in data lakes, data lakehouses, data warehouses, and analytic engines. This is in stark contrast to batch ETL strategies, which cannot move source data in real time.

Apache Iceberg Support:

Apache Iceberg is an open-source table format designed for large-scale analytics on data lakes. Iceberg optimizes data storage, management, and query access in distributed storage systems like Oracle Cloud Infrastructure Object Storage, Azure Storage, Apache Hadoop, Amazon S3, and Google Cloud Storage. Apache Iceberg brings the reliability and simplicity of SQL tables to GoldenGate, while making it possible for engines like Autonomous AI Lakehouse, Spark, Trino, Flink, Presto, Hive, and Impala to work safely with the same tables at the same time.

An important innovation is that this new support works with or without an Iceberg-compatible SQL engine. A new engineless Apache Iceberg Replicat continuously writes changed data into Iceberg v2 tables within object storage buckets available across all cloud providers. Additional support includes writing to Iceberg tables managed within Databricks, Snowflake, and Google Cloud BigQuery.

 

[Figure: GoldenGate Iceberg]

GoldenGate for DAA Iceberg and Oracle Autonomous AI Lakehouse:

Oracle provides a comprehensive foundation for open lakehouse architectures, with native support for Apache Iceberg and Delta table formats and the flexibility to operate across multicloud environments. 

Oracle Autonomous AI Lakehouse combines Oracle Autonomous AI Database with Apache Iceberg to break down analytic silos, delivering an open, interoperable data platform that accelerates AI solution delivery on OCI, AWS, Microsoft Azure, and Google Cloud. Oracle Autonomous AI Lakehouse includes a native Iceberg catalog for OCI-hosted tables and interoperates with external catalogs—such as Databricks Unity Catalog, AWS Glue, and Snowflake Polaris—to access Iceberg tables across clouds without data movement.

The blog post "Iceberg Tables: A New Data Source for Oracle Autonomous AI Database" describes the Apache Iceberg configuration for Oracle Autonomous AI Lakehouse in detail. The demonstration "Oracle GoldenGate Integration with Apache Iceberg for AI-Powered Analytics Platforms" shows end-to-end ingestion into Apache Iceberg and how to consume Apache Iceberg tables with Autonomous AI Lakehouse.

Why Oracle GoldenGate is critical for open lakehouse workloads:

Oracle GoldenGate delivers real-time integration at scale using log-based change data capture (CDC) to minimize source-system overhead. It provides exactly-once delivery semantics, handles schema evolution, and performs idempotent writes to Apache Iceberg targets—ensuring consistent, reliable data for AI and machine learning.
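
As a rough illustration of why idempotent writes matter, the following Python sketch shows how tracking the last applied log position turns a redelivered change stream into an exactly-once result. It is not GoldenGate's implementation: the `Target` class, the `position` field, and the record layout are hypothetical names chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical in-memory stand-in for a target table keyed by primary key."""
    rows: dict = field(default_factory=dict)
    last_applied: int = -1  # highest source log position already applied

    def apply(self, change: dict) -> bool:
        """Apply one change record; return False if it was a replayed duplicate."""
        if change["position"] <= self.last_applied:
            return False  # already applied, so the replay has no effect
        if change["op"] == "delete":
            self.rows.pop(change["key"], None)
        else:
            # Inserts and updates are both treated as an upsert.
            self.rows[change["key"]] = change["row"]
        self.last_applied = change["position"]
        return True


changes = [
    {"position": 1, "op": "insert", "key": 10, "row": {"name": "a"}},
    {"position": 2, "op": "update", "key": 10, "row": {"name": "b"}},
    {"position": 3, "op": "delete", "key": 10, "row": None},
    {"position": 4, "op": "insert", "key": 11, "row": {"name": "c"}},
]

target = Target()
# Simulate a restart that redelivers the first two records before the full stream.
applied = [target.apply(c) for c in changes[:2] + changes]
print(applied)
print(target.rows)
```

The two replayed records are skipped, so the target ends in the same state it would reach if every change had been delivered exactly once.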

With broad technology coverage, Oracle GoldenGate continuously streams data into Oracle Autonomous AI Lakehouse from relational databases, document and NoSQL stores, streaming platforms, vector sources, and SaaS applications. This real-time pipeline makes live information immediately actionable for AI model training and inference.

GoldenGate also provides the enterprise controls teams expect: built-in monitoring and lag metrics, integration with OCI Logging and Monitoring for operational SLAs, and robust security with role-based access control and encryption in transit and at rest.

Oracle GoldenGate’s “Engineless” Apache Iceberg Ingestion:

The engineless Apache Iceberg Replicat supports a rich set of Apache Iceberg catalogs that provide data consumers access to various file formats stored in different storage services.

[Figure: Oracle GoldenGate Iceberg Support]

The engineless Replicat gains its efficiency from in-memory operation aggregation, automatic Iceberg table creation, and bulk Iceberg table inserts.
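
The idea behind in-memory operation aggregation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Replicat's actual logic: the `aggregate` function and the `(kind, key, row)` tuple layout are hypothetical, and every insert or update is assumed to carry a full row image.

```python
def aggregate(ops):
    """Collapse an ordered batch of change operations into one net operation per key.

    Each op is a tuple (kind, key, row), where kind is "I", "U", or "D".
    """
    net = {}
    for kind, key, row in ops:
        prev = net.get(key)
        if kind == "D":
            if prev is not None and prev[0] == "I":
                # Inserted and deleted within the same batch: nothing to write.
                del net[key]
            else:
                net[key] = ("D", None)
        elif kind == "I":
            # Re-insert after a delete of a pre-existing row is a net update.
            net[key] = ("U", row) if prev is not None and prev[0] == "D" else ("I", row)
        else:
            # An update to a row inserted in this batch is still a net insert.
            net[key] = ("I", row) if prev is not None and prev[0] == "I" else ("U", row)
    return net


batch = [
    ("I", 1, {"v": 1}),
    ("U", 1, {"v": 2}),
    ("I", 2, {"v": 5}),
    ("D", 2, None),
    ("U", 3, {"v": 9}),
    ("D", 3, None),
    ("D", 4, None),
    ("I", 4, {"v": 7}),
]
net = aggregate(batch)
print(net)
```

Eight source operations reduce to three net operations here, which is the kind of reduction that makes a single bulk write per batch worthwhile.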

For deletes, “equality deletes” let records be removed without looking up the position of the rows in the Iceberg data file. The write.update.mode configuration is always set to merge-on-read, so updates and deletes are handled quickly without rewriting large data files.
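
To make merge-on-read with equality deletes more concrete, here is a simplified Python sketch of how a reader might reconcile data files with equality delete records. It does not use the Iceberg library, and `read_table` and its tuple layouts are hypothetical. It follows the Iceberg v2 rule that an equality delete applies only to data files with a strictly lower sequence number, and it ignores partition scoping and multi-column deletes.

```python
def read_table(data_files, equality_deletes):
    """Return live rows by applying equality deletes at read time.

    data_files:        list of (sequence_number, rows), where rows are dicts
    equality_deletes:  list of (sequence_number, column, value)
    """
    live = []
    for data_seq, rows in data_files:
        for row in rows:
            # A delete only hides rows written in earlier commits.
            deleted = any(
                row[col] == value and data_seq < delete_seq
                for delete_seq, col, value in equality_deletes
            )
            if not deleted:
                live.append(row)
    return live


data_files = [
    (1, [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]),
    (3, [{"id": 1, "v": "new"}]),  # new row image written by the update commit
]
# The update of id=1 is recorded as a delete by value, with no row position needed.
equality_deletes = [(3, "id", 1)]

rows = read_table(data_files, equality_deletes)
print(rows)
```

The writer never had to locate or rewrite the original data file; the old row is filtered out only when the table is read.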

Snowflake Iceberg Replication:

Apache Iceberg tables for Snowflake combine the performance and query semantics of typical Snowflake tables with external cloud storage that you manage. They are ideal for existing data lakes that you cannot, or choose not to, store in Snowflake.

Using Snowflake Stage and Merge Replicat, GoldenGate can automatically create Snowflake Iceberg tables in an external cloud storage service. Currently, Polaris is the only external Iceberg catalog that Snowflake supports for synchronization. A single Snowflake Stage and Merge Replicat process can replicate to both Snowflake native and Iceberg tables.

GCP BigQuery Iceberg Replication:

BigQuery tables for Apache Iceberg provide the foundation for building open-format lakehouses on Google Cloud. Iceberg tables offer the same fully managed experience as BigQuery tables, but store data in customer-owned storage buckets using Parquet so that they are interoperable with the open Iceberg table format.

Using GCP BigQuery Stage and Merge Replicat, GoldenGate can automatically create Iceberg tables with external Google Cloud Storage buckets that you manage.

Databricks Iceberg Replication:

Databricks Unity Catalog is an enterprise-wide data governance framework that provides consistent access controls and security policies across all data assets. GoldenGate creates target tables in the Delta Lake Universal Format (UniForm), which any Iceberg client can read.

Other GoldenGate for Distributed Applications and Analytics 23ai Improvements:

The latest bundle patch for GoldenGate for Distributed Applications and Analytics 23ai includes many improvements and fixes. Refer to the release notes for all the latest information.

Find additional information here: