OCI GoldenGate Real-Time Ingestion for Oracle Cloud Data Lakehouse

October 25, 2022 | 7 minute read
Deniz Sendil
Senior Principal Product Manager

A data lakehouse is a modern, open architecture that enables you to store, understand, and analyze all your data. It combines the power and richness of data warehouses with the breadth and flexibility of the most popular open source data lake technologies.

As described in Nino Bice's blog post, a data lakehouse has three main layers: the ingestion layer, the data persistence/management layer, and the analysis layer. The data lakehouse reference architecture defines different services for the persistence layer, including Oracle Autonomous Data Warehouse, MySQL HeatWave, and OCI Object Storage.

In this blog post, I will focus on OCI (Oracle Cloud Infrastructure) Object Storage and how to ingest real-time data in different formats into OCI Object Storage with OCI GoldenGate. I will also take a deep dive into the configuration details.

OCI Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. The Object Storage service can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos.

OCI Object Storage is widely used by many Oracle customers for storing raw data in desired file formats and plays a key role in modern data lakehouse architectures.

Oracle Cloud Infrastructure GoldenGate is a fully managed, native cloud service that moves data in real time, at scale. OCI GoldenGate processes data as it moves from one or more data management systems to target databases. You can also design, run, orchestrate, and monitor data replication tasks without having to allocate or manage any compute environments. With its real-time data replication capability, OCI GoldenGate plays a key role in ingesting real-time data into OCI Object Storage.

OCI GoldenGate makes real-time data ingest into OCI Object Storage faster, better, and cheaper.

It is faster: with OCI GoldenGate it is much easier to start a deployment, create connections, and configure replications. Most of the processes are simplified and automated in OCI GoldenGate.

It is better: OCI GoldenGate is built on Oracle GoldenGate 21c and is 100% cloud native. Out-of-the-box cloud-native capabilities such as autoscaling, enhanced maintenance, automatic backups, and monitoring leave you more time to focus on your data.

It is cheaper: OCI GoldenGate pricing is based on actual usage, and deployments are sized to match the data being replicated.

Creating an OCI GoldenGate replication process is simple: first, create an OCI Object Storage connection; second, create the replication. This blog post focuses on creating the replication (Replicat). After you configure the connection, you can configure the Replicat in the OCI GoldenGate console.

To create an OCI GoldenGate connection for OCI Object Storage, click the Create connection button. Provide a Name for the connection, select OCI Object Storage as the Type, and click Next.

Select the Tenancy: you can use the current tenancy or specify another one by providing its tenancy OCID. Select the Region from the drop-down list. Select the User: you can use the current user or specify another user by providing the user OCID. Then upload the Private Key and paste the Public key fingerprint. Both can be obtained from the API Keys section under Resources in your User Settings.
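If you want to check the fingerprint before pasting it, you can compute it locally. The sketch below assumes the OCI API key convention of an MD5 digest taken over the DER-encoded public key and printed as colon-separated hex pairs, which is what `openssl rsa -pubout -outform DER -in oci_api_key.pem | openssl md5 -c` reports. The function name and key file names are hypothetical.

```python
import hashlib


def api_key_fingerprint(public_key_der: bytes) -> str:
    """Return the MD5 fingerprint of a DER-encoded public key as aa:bb:cc:... hex.

    Assumption: OCI fingerprints are MD5 over the DER (not PEM) bytes of the key.
    """
    digest = hashlib.md5(public_key_der).hexdigest()
    # Insert a colon between every pair of hex characters (16 pairs for MD5).
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

For example, `api_key_fingerprint(open("oci_api_key_public.der", "rb").read())` should match the fingerprint the console shows next to the uploaded key, provided the assumption above holds.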

Once the connection is created, assign it to an OCI GoldenGate Big Data deployment. When created, connections and deployments are detached from each other; assigning the connection to a deployment lets the Replicat access the connection details it needs to log in to the target. To assign a connection, open the connection details, go to Resources, select Assigned deployments, and click Assign deployments.

To start creating a Replicat in OCI GoldenGate console, click Add Replicat.

Select the Replicat Type. Two Replicat types are available: Classic and Coordinated. Classic Replicat is a single-threaded process, whereas Coordinated Replicat is multithreaded and applies transactions in parallel. Because Coordinated Replicat runs multiple threads, it creates multiple files in the target.

Next, provide the Replicat Options. Trail Name is the name of the trail file used for replication. You also need to select OCI Object Storage as the Target. Once it is selected, the available aliases are listed and you need to choose one. The Replicat then uses the credential details from the connection associated with the selected alias.

In the Parameter File, you can either specify a source-to-target mapping or leave it as-is with a wildcard selection. If you selected Coordinated Replicat as the Replicat Type, you need to provide an additional parameter: TARGETDB LIBFILE libggjava.so SET property=/u02/Deployment/etc/conf/ogg/your_replicat_name.properties
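The parameter file logic above can be sketched as a small helper that assembles the text you would paste into the console. The Replicat name `REPOCI`, the helper name, and the exact wildcard MAP statement are assumptions for illustration. The TARGETDB line is the one quoted above, with the placeholder replaced by the Replicat name; match the file name your deployment actually uses.

```python
def build_parameter_file(replicat_name: str, coordinated: bool,
                         mapping: str = "MAP *.*, TARGET *.*;") -> str:
    """Assemble a Replicat parameter file (hypothetical helper).

    `mapping` defaults to an assumed wildcard source-to-target mapping;
    replace it with explicit MAP statements if you need specific tables.
    """
    lines = [f"REPLICAT {replicat_name}"]
    if coordinated:
        # Extra parameter required for Coordinated Replicat: it points the
        # Java adapter library at this Replicat's properties file.
        lines.append(
            "TARGETDB LIBFILE libggjava.so SET property="
            f"/u02/Deployment/etc/conf/ogg/{replicat_name}.properties"
        )
    lines.append(mapping)
    return "\n".join(lines) + "\n"


print(build_parameter_file("REPOCI", coordinated=True))
```

For a Classic Replicat, calling the helper with `coordinated=False` simply omits the TARGETDB line.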

The last step is the Properties File configuration. On this screen, you will notice that some of the properties are pre-configured and populated.

  • gg.target=oci is the handler selection, and you must not modify it.
  • gg.handler.oci.connectionId is the OCID of the connection that you selected in the “Replicat Options” section. With this OCID, OCI GoldenGate knows which credentials to use.
  • gg.eventhandler.oci.compartmentID is the OCID of the compartment where your OCI Object Storage buckets exist or will be created. To find the compartment OCID, refer to the OCI documentation.
  • gg.eventhandler.oci.bucketMappingTemplate is the name of the OCI Object Storage bucket to use. If the bucket already exists, OCI GoldenGate loads the files into it. If it does not exist, OCI GoldenGate can create it automatically, depending on the configuration. Bucket names can be either static or dynamic, where a dynamic name is built from Template Keywords.

These are the minimum properties needed to start a replication into OCI Object Storage. If you click Create & Run, OCI GoldenGate loads the files into OCI Object Storage in the default file format, JSON.
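For reference, the minimum properties file described above has the shape produced by this sketch. The OCID placeholders and the bucket name `lakehouse-raw` are hypothetical; in the console these values are pre-populated from your selections, so the sketch only shows the key/value layout.

```python
def build_oci_properties(connection_ocid: str, compartment_ocid: str,
                         bucket_template: str) -> str:
    """Render the minimum OCI Object Storage properties (hypothetical helper)."""
    props = {
        # Handler selection; must not be modified.
        "gg.target": "oci",
        # OCID of the connection chosen under Replicat Options.
        "gg.handler.oci.connectionId": connection_ocid,
        # Compartment where the bucket(s) exist or will be created.
        "gg.eventhandler.oci.compartmentID": compartment_ocid,
        # Static bucket name, or a template built from Template Keywords
        # (for example ${tableName}; check the keyword list in the docs).
        "gg.eventhandler.oci.bucketMappingTemplate": bucket_template,
    }
    # Dicts keep insertion order, so the lines come out in the order above.
    return "".join(f"{key}={value}\n" for key, value in props.items())


print(build_oci_properties("<connection-ocid>", "<compartment-ocid>", "lakehouse-raw"))
```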

By default, OCI GoldenGate is configured to use the JSON format. However, it also supports the DelimitedText, Avro_OP, Avro_OCF, and XML file formats. To change the file format, set the gg.handler.oci.format property, for example: gg.handler.oci.format=delimitedtext
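If you keep a copy of the properties outside the console, switching formats amounts to setting one key. This hypothetical helper replaces an existing gg.handler.oci.format line or appends one, while leaving longer keys such as gg.handler.oci.format.fieldDelimiter untouched. The values json, delimitedtext, and xml follow the post; check the documentation for the exact Avro value names.

```python
def set_format(properties_text: str, fmt: str) -> str:
    """Set gg.handler.oci.format, replacing any existing value (hypothetical helper)."""
    key = "gg.handler.oci.format"
    out, found = [], False
    for line in properties_text.splitlines():
        # Compare the exact key only, so format.* sub-properties are kept as-is.
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={fmt}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={fmt}")
    return "\n".join(out) + "\n"


print(set_format("gg.target=oci\n", "delimitedtext"))
```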

Depending on the selected format, you can also apply Pluggable Formatters. The pluggable formatters are used to convert operations from the Oracle GoldenGate trail file into formatted messages that you can send to OCI Object Storage. 
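To make the idea concrete, here is a conceptual sketch of what a formatter does: it takes one change operation and emits either a JSON document or a delimited line. This is not GoldenGate's formatter code. The field names (table, op_type, op_ts, after) are loosely modeled on GoldenGate's JSON output and are assumptions, as is the sample HR.EMPLOYEES row.

```python
import json


def format_operation(op: dict, fmt: str, delimiter: str = ";") -> str:
    """Conceptual sketch: turn one change operation into a JSON or delimited line."""
    header = {"table": op["table"], "op_type": op["op_type"], "op_ts": op["op_ts"]}
    if fmt == "json":
        # One JSON document per operation: metadata fields plus the after-image.
        return json.dumps({**header, "after": op["after"]})
    if fmt == "delimitedtext":
        # Metadata fields first, then column values, joined by the delimiter.
        values = list(header.values()) + [str(v) for v in op["after"].values()]
        return delimiter.join(values)
    raise ValueError(f"unsupported format in this sketch: {fmt}")


op = {"table": "HR.EMPLOYEES", "op_type": "I",
      "op_ts": "2022-10-25 10:00:00.000000",
      "after": {"EMPLOYEE_ID": 100, "LAST_NAME": "King"}}
print(format_operation(op, "json"))
print(format_operation(op, "delimitedtext"))
```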

Different settings are available for different format types. For details on pluggable formatters, see these blog posts: JSON, DelimitedText, Avro OCF. Note that these posts were written for different connectors, but you can apply the same properties to OCI Object Storage replication as well.

For example, the DelimitedText post shows how to set the field delimiter with gg.handler.filewriter.format.fieldDelimiter=CDATA[;]. You can take the same approach for OCI Object Storage replication with the DelimitedText format: use gg.handler.oci.format.fieldDelimiter=CDATA[;].
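Because only the handler name in the key changes, the translation is mechanical. The hypothetical helper below renames gg.handler.filewriter.* keys to gg.handler.oci.*; it does not check whether each property is supported by the OCI Object Storage handler, so verify the result against the documentation.

```python
def retarget_handler(props: dict, source_handler: str, target_handler: str = "oci") -> dict:
    """Rename gg.handler.<source>.* keys to gg.handler.<target>.*, keeping values."""
    src = f"gg.handler.{source_handler}."
    dst = f"gg.handler.{target_handler}."
    # Keys that do not belong to the source handler are passed through unchanged.
    return {
        (dst + key[len(src):] if key.startswith(src) else key): value
        for key, value in props.items()
    }


copied = {"gg.handler.filewriter.format.fieldDelimiter": "CDATA[;]"}
print(retarget_handler(copied, "filewriter"))
```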

You can also use metacolumns while replicating into OCI Object Storage. Metacolumns let you select the metadata fields that you want to see in the generated output messages.
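Metacolumns are configured through a template property. The sketch below assumes the metaColumnsTemplate property used by other GoldenGate for Big Data handlers (here written as gg.handler.oci.format.metaColumnsTemplate) and its `${keyword[field_name]}` syntax. The keywords optype and timestamp and the output field names are illustrative, so confirm the supported keywords in the documentation.

```python
from typing import Iterable, Tuple


def meta_columns_template(columns: Iterable[Tuple[str, str]]) -> str:
    """Join (keyword, output field name) pairs into ${keyword[field]} entries."""
    # Doubled braces in the f-string emit literal { and } around the keyword.
    return ",".join(f"${{{keyword}[{field}]}}" for keyword, field in columns)


template = meta_columns_template([("optype", "op_type"), ("timestamp", "op_ts")])
# Hypothetical property line for the OCI handler; verify the key name in the docs.
print(f"gg.handler.oci.format.metaColumnsTemplate={template}")
```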

Conclusion:

OCI GoldenGate is an easy-to-use, flexible, and cost-effective solution for real-time data lakehouse ingestion. You can experience it yourself with Oracle LiveLabs!
 

For More Details:

OCI GoldenGate
OCI GoldenGate Quick Starts
Data Lakehouse Reference Architectures
 

 

Deniz Sendil is a Senior Principal Product Manager in the Oracle GoldenGate product group with a focus on GoldenGate for Big Data. Deniz has over 17 years of experience in the data management, analytics, and data integration domains.

