OCI GoldenGate is a fully managed, native cloud service that moves data in real-time, at scale. OCI GoldenGate processes data as it moves from one or more data management systems to target systems. You can also design, run, orchestrate, and monitor data replication tasks without having to allocate or manage any compute environments.

Amazon Simple Storage Service (Amazon S3) is an object storage service which is used for various uses cases such as data lakes, cloud-native applications, and mobile apps. 

You can use OCI GoldenGate for real-time data ingestion into Amazon S3 in the format of your choice. OCI GoldenGate can publish the following file formats into Amazon S3:

  • JSON
  • Parquet
  • DelimitedText
  • Avro_OCF
  • XML 

In this blog, I will focus on how to ingest real-time data in different formats into Amazon S3 with OCI GoldenGate.

Apart from Parquet format, OCI GoldenGate uses a 2-step process for Amazon S3 replication. For Parquet, it uses a 3-step process as you’ll see later in this blog. 

  1. Generating local files from trail files (these local files are not accessible in OCI GoldenGate)
  2. Loading files into Amazon S3

There are different pieces working together in this process; but, these different pieces are automated and defaulted to make the process simpler. As you can see below, different properties can be configured for customizing the replication. 

As users, we need to create an OCI GoldenGate Connection for Amazon S3 and we need to create the Replicat

OCI GoldenGate Amazon S3 Connection: 

OCI GoldenGate Amazon S3 connection uses IAM access keys for accesing the S3 buckets. You need to create access keys before configuring the connection. 

To create an OCI GoldenGate Connection for Amazon S3, click Create connection button. Provide a Name for the connection and select the Amazon S3. Click Next.

OCI GoldenGate

Provide your Access Key ID and Secret Access Key and click Create

OCI GoldenGate

Once connection is created, assign connection to OCI GoldenGate Big Data deployment. When created, connections and deployments are detached from each other. Assigning the connection to a deployment enables replicate to access connection details to connect to the target. You can assign a connection in connection details, resources, assigned deployments, assign deployments.

OCI GoldenGate

 

Creating the replicat:

To start creating a Replicat in OCI GoldenGate console, click Add Replicat.

OCI GoldenGate

Select the Replicat Type. There are two different Replicat types available: Classic and Coordinated. Classic Replicat is a single threaded process whereas Coordinated Replicat is a multithreaded one that applies transactions in parallel. Coordinated Replicat will result in multiple files being created as there is a multithreaded process running.

Next, provide Replicat OptionsTrail Name is the name of the trail file that is used for replication. Also, you need to select Target as Amazon S3. Once selected, Available Aliases will be listed, and you will need to choose an alias. Once an alias is selected, the Replicat will be using the credential details from the connection that is selected.

OCI GoldenGate

In the Parameter File, you can either specify source to target mapping or leave it as-is with a wildcard selection. If Coordinated Replicat is selected as the Replicat Type, an additional parameter needs to be provided: TARGETDB LIBFILE libggjava.so SET property=/u02/Deployment/etc/conf/ogg/your_replicat_name.properties

OCI GoldenGate

Last step is the Properties File configuration. In this screen, you will notice that some of the properties are pre-configured and populated. These are the basic and bare minimum properties needed for starting a replication into Amazon S3. 

Please provide your target Amazon S3 region in gg.eventhandler.s3.region and your target Amazon S3 bucket name in gg.eventhandler.s3.bucketMappingTemplate property. If the target bucket already exists, OCI GoldenGate will load the files into that bucket. If the S3 bucket does not exist and if your access has the appropriate permissions, OCI GoldenGate will automatically create the bucket. For Amazon S3 bucket names, there are two possible options: using a static name or using a dynamic formatting which can be configured with Template Keywords. You can use any of the template keywords in bucket mapping template. For example, if you use ${tableName}, OCI GoldenGate will create the buckets with the table names coming from the source database. 

If you hit Create & Run, OCI GoldenGate will load the files in the default file format, which is JSON, into Amazon S3.

OCI GoldenGate

By default, OCI GoldenGate is configured to use the JSON format. However, OCI GoldenGate supports DelimitedText, Avro_OP, Avro_OCF and XML file formats as well. To change the file format, you need to apply gg.eventhandler.s3.format property. For example: gg.eventhandler.s3.format=delimitedtext

OCI GoldenGate

Depending on the selected format, you can also apply Pluggable Formatters. The pluggable formatters are used to convert operations from the Oracle GoldenGate trail file into formatted messages that you can send to Amazon S3. 

There are different settings available for different format types. For more details on specifics of pluggable formatters, you can check these blogs: JSONDelimitedTextAvro OCF. Please note that these blogs are written for different connectors; but you can apply the same properties in Amazon S3 replication as well.

For example, in the DelimitedText blog you see an example of adding column delimiters using gg.handler.filewriter.format.fieldDelimiter=CDATA[;]. You can apply a similar property for Amazon S3 replication with DelimitedText format: gg.eventhandler.abs.format.fieldDelimiter=CDATA[;].

You can also apply metacolumn functionality while replicating into Amazon S3. Metacolumns allow you to select the metadata fields that you want
to see in the generated output messages.

Replicating Parquet files into Amazon S3:

OCI GoldenGate uses a 3-step process for Amazon S3 Parquet replication:

  1. Generating local files from trail files (these local files are not accessible in OCI GoldenGate)
  2. Converting local files to Parquet format
  3. Loading files into Amazon S3

We need to update the Replicat Properties for Parquet replication into Amazon S3 and we’ll see how in this part. 

Adding the replicat, selecting Replicat Type, updating Replicat Options and Parameter File steps are the same as above. If you follow the above steps, you’ll see the Properties File as below.

OCI GoldenGate

You can follow the below steps for customizing the properties:

  1. As the first step, we need to remove the below property:
    • gg.target=s3
  2. Leave gg.eventhandler.s3.connectionId as is. Any changes in this property may lead into replication fail.
  3. Provide your AWS region in gg.eventhandler.s3.region
  4. Provide bucket name in gg.eventhandler.s3.bucketMappingTemplate
  5. Update gg.classpath property as below:
    • gg.classpath=$THIRD_PARTY_DIR/s3/*:$THIRD_PARTY_DIR/hadoop/*:$THIRD_PARTY_DIR/parquet/*
  6. Add the below set of properties:

#The File Writer Handler
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileRollInterval=7m
gg.handler.filewriter.inactivityRollInterval=30s
gg.handler.filewriter.fileWriteActiveSuffix=.tmp
gg.handler.filewriter.finalizeAction=delete

### Avro OCF
gg.handler.filewriter.format=avro_row_ocf
gg.handler.filewriter.fileNameMappingTemplate=${groupName}_${fullyQualifiedTableName}_${currentTimestamp}.avro
gg.handler.filewriter.format.pkUpdateHandling=delete-insert
gg.handler.filewriter.format.metaColumnsTemplate=${optype},${position}
gg.handler.filewriter.format.iso8601Format=false
gg.handler.filewriter.partitionByTable=true
gg.handler.filewriter.rollOnShutdown=true

#The Parquet Event Handler
gg.handler.filewriter.eventHandler=parquet
gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.pathMappingTemplate=./dirparquet
gg.eventhandler.parquet.fileNameMappingTemplate=${groupName}_${fullyQualifiedTableName}_${currentTimestamp}.parquet
gg.eventhandler.parquet.writeToHDFS=false
gg.eventhandler.parquet.finalizeAction=delete
#Selecting the S3  Event Handler
gg.eventhandler.parquet.eventHandler=s3
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.pathMappingTemplate=${fullyQualifiedTableName}

 

Your properties file should look like as below:

OCI GoldenGate

Once ready, click Create & Run. When replicat starts successfully, you’ll see it checked with a green mark in a running state.

You can customise parquet file replication into Amazon S3 by changing the above property settings. You can refer to following documents/ blogs for more details: File Writer HandlerAvro OCFParquet Event Handler

Conclusion:

OCI GoldenGate is an easy to use, flexible and cost-effective solution for real-time Amazon S3. You can experience it yourself with OCI GoldenGate Quickstarts.