OCI GoldenGate is a fully managed, native cloud service that moves data in real time, at scale. OCI GoldenGate processes data as it moves from one or more data management systems to target systems. You can also design, run, orchestrate, and monitor data replication tasks without having to allocate or manage any compute environments.

OCI Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. The Object Storage service can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos.

You can use OCI GoldenGate for real-time data ingestion into OCI Object Storage in the format of your choice: OCI GoldenGate can publish JSON, Parquet, delimited text, Avro OCF, and XML files into OCI Object Storage. For the other formats, refer to the OCI GoldenGate Real-Time Ingestion for Oracle Cloud Data Lakehouse blog. In this blog, we focus on the Parquet format.

Creating an OCI GoldenGate replication process takes two steps: first, create an OCI Object Storage connection; second, create the replication (Replicat).

To create an OCI GoldenGate connection for OCI Object Storage, click Create connection. Enter a Name for the connection, select OCI Object Storage as the Type, and click Next.

Select the Tenancy: you can use the current tenancy or specify another one by providing its tenancy OCID. Select the Region from the drop-down list. Select the User: you can use the current user or specify another one by providing its user OCID. Upload the Private Key and paste the Public key fingerprint. You can obtain the private key and the public key fingerprint from User Settings > Resources > API Keys.
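
If you want to confirm that the fingerprint you paste belongs to your key pair, you can compute it locally. The following is a minimal Python sketch, assuming the fingerprint OCI displays for an API key is the MD5 digest of the DER-encoded public key written as colon-separated hex pairs (the value that openssl rsa -pubout -outform DER -in oci_api_key.pem | openssl md5 -c prints). The function names are illustrative, and only the standard library is used.

```python
import base64
import hashlib


def pem_to_der(pem: str) -> bytes:
    """Drop the -----BEGIN/END----- armor lines and base64-decode the body."""
    stripped = (line.strip() for line in pem.splitlines())
    body = "".join(s for s in stripped if s and not s.startswith("-----"))
    return base64.b64decode(body)


def api_key_fingerprint(public_key_pem: str) -> str:
    """Return the MD5 of the DER public key as aa:bb:cc:... hex pairs.

    Assumption: this matches the fingerprint format OCI shows under API Keys.
    """
    digest = hashlib.md5(pem_to_der(public_key_pem)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

For example, api_key_fingerprint(open("oci_api_key_public.pem").read()), where the file name is hypothetical, should return the same 16 hex pairs that the Console shows next to the key.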

Once the connection is created, assign it to your OCI GoldenGate Big Data deployment. Connections and deployments are created independently of each other; assigning the connection to a deployment lets the Replicat use the connection details to log in to the target. To assign it, open the connection details page, go to Resources > Assigned deployments, and click Assign deployments.

Replicating Parquet files into OCI Object Storage:

OCI GoldenGate replicates Parquet files into OCI Object Storage in three steps:

  1. Generating local files from the trail files (these local files are not accessible in OCI GoldenGate)
  2. Converting the local files to Parquet format
  3. Loading the Parquet files into an OCI Object Storage bucket
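
Step 1 does not pass a local file on to step 2 until that file is rolled (closed). In the properties file configured later in this post, two File Writer Handler settings control this: gg.handler.filewriter.fileRollInterval=7m closes a file a fixed time after it was created, and gg.handler.filewriter.inactivityRollInterval=30s closes it once nothing has been written to it for that long. The Python sketch below models that decision as we understand it; the class and function names are hypothetical, times are plain seconds, and it is not GoldenGate's implementation.

```python
from dataclasses import dataclass


@dataclass
class LocalFile:
    """Hypothetical model of one local file written by the File Writer Handler."""
    created_at: float     # seconds: when the file was opened
    last_write_at: float  # seconds: when the latest record was written


def should_roll(f: LocalFile, now: float,
                file_roll_interval: float = 7 * 60,
                inactivity_roll_interval: float = 30) -> bool:
    """Assumed rule: roll when the file is at least file_roll_interval old
    or has been idle for at least inactivity_roll_interval."""
    too_old = now - f.created_at >= file_roll_interval
    idle_too_long = now - f.last_write_at >= inactivity_roll_interval
    return too_old or idle_too_long


busy = LocalFile(created_at=0, last_write_at=415)
print(should_roll(busy, now=419))  # False: 6m59s old, idle for 4s
print(should_roll(busy, now=420))  # True: 7 minutes old
```

Under this model, a busy table gets a new file roughly every seven minutes, while a quiet table's file is closed 30 seconds after its last change. Shorter intervals make data land in the bucket sooner, at the cost of more and smaller Parquet objects.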

Parquet replication into OCI Object Storage requires changes to the Replicat properties. This part of the blog shows how to make them.

To start creating a Replicat in the OCI GoldenGate console, click Add Replicat.

Select the Replicat Type. Two Replicat types are available: Classic and Coordinated. Classic Replicat is a single-threaded process, whereas Coordinated Replicat is multithreaded and applies transactions in parallel. Because several threads write at the same time, Coordinated Replicat creates multiple files.
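
To see where the extra files come from, note that both fileNameMappingTemplate values in the properties file later in this post start with ${groupName}. Our understanding of the GoldenGate template keywords is that, with Coordinated Replicat, ${groupName} resolves to the Replicat name with the thread number appended, so each thread names its files differently. The Python sketch below illustrates this under that assumption; the three-digit thread suffix, the Replicat name RPARQ, the table SALES.ORDERS, and the TS timestamp placeholder are all hypothetical.

```python
def coordinated_file_names(replicat: str, table: str, timestamp: str,
                           threads: int) -> list[str]:
    """Expand ${groupName}_${fullyQualifiedTableName}_${currentTimestamp}.parquet
    once per thread, assuming each thread's group name is the Replicat name
    followed by a zero-padded three-digit thread number."""
    names = []
    for thread_id in range(1, threads + 1):
        group_name = f"{replicat}{thread_id:03d}"  # assumed naming, e.g. RPARQ001
        names.append(f"{group_name}_{table}_{timestamp}.parquet")
    return names


print(coordinated_file_names("RPARQ", "SALES.ORDERS", "TS", threads=2))
# ['RPARQ001_SALES.ORDERS_TS.parquet', 'RPARQ002_SALES.ORDERS_TS.parquet']
```

By the same reasoning, Classic Replicat has a single thread and therefore a single writer per table.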

Next, provide the Replicat Options. Trail Name is the name of the trail file used for replication. Select OCI Object Storage as the Target; the Available Aliases list is then populated, and you need to choose an alias. Once an alias is selected, the Replicat uses the credential details from the corresponding connection.

In the Parameter File, you can either specify a source-to-target mapping or leave the default wildcard selection as is. If you selected Coordinated as the Replicat Type, you also need to provide the following parameter:

TARGETDB LIBFILE libggjava.so SET property=/u02/Deployment/etc/conf/ogg/your_replicat_name.properties
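
As an illustration, the Python sketch below assembles a complete parameter file for a hypothetical Coordinated Replicat named RPARQ: a REPLICAT line, the TARGETDB line above with the properties path built from the Replicat name, and a MAP statement that defaults to the wildcard selection. The helper name is hypothetical, and the assumption that the properties file is named exactly after the Replicat should be checked against your deployment.

```python
# Directory taken from the TARGETDB example above.
PROPERTIES_DIR = "/u02/Deployment/etc/conf/ogg"


def replicat_param_file(name: str, source: str = "*.*", target: str = "*.*") -> str:
    """Build a Replicat parameter file with one MAP statement.

    A wildcard source and target replicate every table; pass explicit
    names such as "HR.EMPLOYEES" to map a single table instead.
    """
    return "\n".join([
        f"REPLICAT {name}",
        f"TARGETDB LIBFILE libggjava.so SET property={PROPERTIES_DIR}/{name}.properties",
        f"MAP {source}, TARGET {target};",
    ])


print(replicat_param_file("RPARQ"))
```

The printed text has three lines: REPLICAT RPARQ, the TARGETDB line ending in /u02/Deployment/etc/conf/ogg/RPARQ.properties, and MAP *.*, TARGET *.*;.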

The last step is the Properties File configuration. On this screen, some properties are already populated: these are the bare minimum needed to start replicating into OCI Object Storage. To load Parquet files, you need to change the properties file.

Follow these steps to customize the properties:

  1. Remove the following property: gg.target=oci
  2. Leave gg.eventhandler.oci.connectionId as is. Changing this property may cause the replication to fail.
  3. Provide gg.eventhandler.oci.compartmentID. This is the OCID of the compartment where your OCI Object Storage buckets exist or will be created. To find the compartment OCID, refer to the OCI documentation.
  4. Provide gg.eventhandler.oci.bucketMappingTemplate. This is the name of the OCI Object Storage bucket to use. If the bucket already exists, OCI GoldenGate loads the files into it; if it does not exist, OCI GoldenGate can create it automatically, depending on the configuration. The bucket name can be either a static name or a dynamic name built with template keywords.
  5. Update the gg.classpath property as follows: gg.classpath=$THIRD_PARTY_DIR/oci/*:$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*
  6. Add the following set of properties:

# The File Writer Handler: writes local files from the trail (step 1)
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileRollInterval=7m
gg.handler.filewriter.inactivityRollInterval=30s
gg.handler.filewriter.fileWriteActiveSuffix=.tmp
gg.handler.filewriter.finalizeAction=delete

# Avro OCF format for the local files, which the Parquet Event Handler converts
gg.handler.filewriter.format=avro_row_ocf
gg.handler.filewriter.fileNameMappingTemplate=${groupName}_${fullyQualifiedTableName}_${currentTimestamp}.avro
gg.handler.filewriter.format.pkUpdateHandling=delete-insert
gg.handler.filewriter.format.metaColumnsTemplate=${optype},${position}
gg.handler.filewriter.format.iso8601Format=false
gg.handler.filewriter.partitionByTable=true
gg.handler.filewriter.rollOnShutdown=true

# The Parquet Event Handler: converts the local files to Parquet (step 2)
gg.handler.filewriter.eventHandler=parquet
gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.pathMappingTemplate=./dirparquet
gg.eventhandler.parquet.fileNameMappingTemplate=${groupName}_${fullyQualifiedTableName}_${currentTimestamp}.parquet
gg.eventhandler.parquet.writeToHDFS=false
gg.eventhandler.parquet.finalizeAction=delete

# The OCI Event Handler: loads the Parquet files into OCI Object Storage (step 3)
gg.eventhandler.parquet.eventHandler=oci
gg.eventhandler.oci.type=oci
gg.eventhandler.oci.pathMappingTemplate=/ogg/${fullyQualifiedTableName}
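
Several of the properties above (gg.eventhandler.oci.bucketMappingTemplate, the pathMappingTemplate values, and the fileNameMappingTemplate values) accept either a static value or ${...} template keywords that OCI GoldenGate resolves at run time. The Python sketch below shows the idea with a plain substitution over a hypothetical dictionary of keyword values. It is not GoldenGate's resolver: it only handles simple ${keyword} placeholders, and the sample values are made up.

```python
import re

KEYWORD = re.compile(r"\$\{(\w+)\}")


def resolve_template(template: str, values: dict[str, str]) -> str:
    """Replace every ${keyword} in the template; unknown keywords raise KeyError."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for template keyword: {key}")
        return values[key]
    return KEYWORD.sub(substitute, template)


# Hypothetical keyword values for one file of table SALES.ORDERS.
sample = {
    "groupName": "RPARQ",
    "fullyQualifiedTableName": "SALES.ORDERS",
    "currentTimestamp": "TS",
}
print(resolve_template("/ogg/${fullyQualifiedTableName}", sample))
# /ogg/SALES.ORDERS
print(resolve_template("${groupName}_${fullyQualifiedTableName}_${currentTimestamp}.parquet", sample))
# RPARQ_SALES.ORDERS_TS.parquet
```

A static bucket name contains no placeholders and passes through unchanged, so every table lands in the same bucket; a bucket template built from table-level keywords would instead spread the tables across several buckets.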

Your properties file should now look like this:

[Screenshot: the completed Replicat properties file]
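
Before clicking Create & Run, it can be worth checking the edited file against steps 1 to 6 above. The Python sketch below is a hypothetical pre-flight check: it parses simple key=value lines (skipping blank lines and # or ! comments, but not handling every Java .properties feature, such as line continuations), then checks that gg.target is gone, that the compartment and bucket properties are set, that gg.classpath includes the oci, hadoop, and parquet directories, and that the handler chain runs from filewriter to parquet to oci. The compartment check assumes compartment OCIDs start with ocid1.compartment. Step 2 (leaving connectionId unchanged) cannot be verified this way.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # / ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def handler_chain(props: dict[str, str]) -> list[str]:
    """Follow gg.handlerlist and each eventHandler link to the last handler."""
    first = props["gg.handlerlist"].split(",")[0].strip()
    chain = [first]
    nxt = props.get(f"gg.handler.{first}.eventHandler")
    while nxt and nxt not in chain:  # stop at the end or on a loop
        chain.append(nxt)
        nxt = props.get(f"gg.eventhandler.{nxt}.eventHandler")
    return chain


def check_parquet_config(props: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if "gg.target" in props:
        problems.append("remove gg.target (step 1)")
    if not props.get("gg.eventhandler.oci.compartmentID", "").startswith("ocid1.compartment."):
        problems.append("set gg.eventhandler.oci.compartmentID to a compartment OCID (step 3)")
    if not props.get("gg.eventhandler.oci.bucketMappingTemplate"):
        problems.append("set gg.eventhandler.oci.bucketMappingTemplate (step 4)")
    classpath = props.get("gg.classpath", "")
    for lib in ("oci", "hadoop", "parquet"):
        if f"$THIRD_PARTY_DIR/{lib}/*" not in classpath:
            problems.append(f"add $THIRD_PARTY_DIR/{lib}/* to gg.classpath (step 5)")
    if "gg.handlerlist" not in props or handler_chain(props) != ["filewriter", "parquet", "oci"]:
        problems.append("handler chain should be filewriter -> parquet -> oci (step 6)")
    return problems
```

To use it, paste the properties text into parse_properties and print check_parquet_config of the result; an empty list means all of these checks passed.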

When you are ready, click Create & Run. When the Replicat starts successfully, it is shown in the running state with a green check mark.

Conclusion:

OCI GoldenGate is an easy-to-use, flexible, and cost-effective solution for real-time data lakehouse ingestion. You can try it yourself with the OCI GoldenGate Quickstarts.