Welcome, everyone.
In this blog we will set up GoldenGate to replicate data into Google Cloud Storage as Parquet files.
We are using GoldenGate for Big Data 21.9 installed on a VM server for this blog.
You can use the GoldenGate for Big Data Marketplace image in OCI, or install GoldenGate for Big Data on a server of your own, and set this up the same way.
Use Case:
In this setup, we need to replicate data from any source supported by GoldenGate and generate Parquet files in Google Cloud Storage. These files can then be consumed by end users, for example in Databricks.
Here the trail files have already been generated from the source database, which is an Oracle database. So we will use these existing trail files and only configure the replicat.
You can refer to the Oracle document below for more information.
Using the Google Cloud Storage Event Handler
Steps to follow for the setup:
Before setting up the GG replication, we need the library files that the replicat uses to generate the Parquet files and load them into Google Cloud Storage:
- GCS client libraries
- Hadoop libraries (only needed if you generate files in Parquet format)
You can download these with the dependency downloader scripts provided with the GoldenGate software.
You can find those scripts in the $GG_Home/opt/DependencyDownloader directory. Run the relevant script with the version of the library you need, ideally the latest.
For example:
For GCS client libraries: ./gcs.sh 1.113.9
For Hadoop libraries: ./hadoop.sh 3.3.6
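Once you run the above commands, the libraries are downloaded into the $GG_Home/opt/DependencyDownloader/dependencies directory. Note that the gg.classpath in the properties file later in this post also points to a parquet_1.13.1 directory there, so the Parquet libraries need to be present in the same location as well.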
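Before moving on, it can help to confirm that the downloaded directories really contain jar files and to assemble the classpath from them instead of typing it by hand. The following is only a minimal sketch: the function name build_classpath is hypothetical, and it assumes the directories are named <library>_<version> under the dependencies directory, as in the classpath used later in this post.

```python
from pathlib import Path


def build_classpath(gg_home, library_dirs):
    """Return a gg.classpath value for the given dependency directories.

    gg_home      -- GoldenGate for Big Data home directory
    library_dirs -- directory names under opt/DependencyDownloader/dependencies
                    (assumed naming: <library>_<version>, e.g. gcs_1.113.9)
    Raises FileNotFoundError if a directory is missing or holds no jar files.
    """
    deps = Path(gg_home) / "opt" / "DependencyDownloader" / "dependencies"
    entries = []
    for name in library_dirs:
        lib_dir = deps / name
        if not lib_dir.is_dir() or not any(lib_dir.glob("*.jar")):
            raise FileNotFoundError(f"no jar files found in {lib_dir}")
        # A trailing /* tells the JVM to load every jar in the directory.
        entries.append(f"{lib_dir.as_posix()}/*")
    return ":".join(entries)
```

For the server in this post, the call would be build_classpath("/u01/ogg/ggbd21_home", ["gcs_1.113.9", "parquet_1.13.1", "hadoop_3.3.6"]), and the returned string can be pasted into gg.classpath.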
Once the libraries are ready, you also need a GCP service account, because Oracle GoldenGate for Big Data uses a service account key to access the GCS service.
I created a service account with the relevant permissions, downloaded its key file in JSON format, and saved it in a directory on the GoldenGate for Big Data server.
Once these two requirements are in place, we can start creating the replicat. The replicat reads its handler configuration from a properties file, which for replicat RGCS looks like this:
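A truncated or wrong key file is an easy mistake to make when copying it to the server, so a quick sanity check of the JSON can save a failed replicat start. This is a rough sketch, not part of GoldenGate: the function name check_service_account_key is hypothetical, and the field list is an assumption based on what a downloaded GCP service account key normally contains.

```python
import json

# Fields a JSON service account key downloaded from GCP normally contains
# (assumption: this sketch checks only a subset of them).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text):
    """Return a list of problems found in the key file content (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not key.get(name)]
    # Other credential types (for example user credentials) are not key files
    # of a service account.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

To use it, read the file saved on the server (in this post /u01/downloads/GoogleServiceAccount.json) and pass its content to the function. An empty list means the basic structure looks right; it does not prove that the account has the required permissions on the bucket.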
# Properties file for Replicat RGCS
#File Writer Handler Template
gg.handlerlist=filewriter
gg.handler.filewriter.type=filewriter
gg.handler.filewriter.mode=op
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.fileNameMappingTemplate=${fullyQualifiedTableName}_${currentTimestamp}.avro
gg.handler.filewriter.finalizeAction=delete
gg.handler.filewriter.fileRollInterval=0
gg.handler.filewriter.inactivityRollInterval=0
gg.handler.filewriter.format=avro_row_ocf
gg.handler.filewriter.format.metaColumnsTemplate=${optype},${position}
gg.handler.filewriter.fileWriteActiveSuffix=.tmp
gg.handler.filewriter.partitionByTable=true
gg.handler.filewriter.rollOnShutdown=true
#Selecting the Parquet Event Handler
gg.handler.filewriter.eventHandler=parquet
#The Parquet Event Handler
gg.eventhandler.parquet.type=parquet
gg.eventhandler.parquet.pathMappingTemplate=./dirparquet
gg.eventhandler.parquet.writeToHDFS=false
gg.eventhandler.parquet.finalizeAction=delete
#Selecting the GCS Event Handler
gg.eventhandler.parquet.eventHandler=gcs
gg.eventhandler.parquet.fileNameMappingTemplate=${tableName}_${currentTimestamp}.parquet
#The GCS Event handler
gg.eventhandler.gcs.type=gcs
gg.eventhandler.gcs.pathMappingTemplate=${fullyQualifiedTableName}
#TODO: Edit the GCS bucket name
gg.eventhandler.gcs.bucketMappingTemplate=ggbucket14
#TODO: Edit the GCS credentialsFile
gg.eventhandler.gcs.credentialsFile=/u01/downloads/GoogleServiceAccount.json
gg.eventhandler.gcs.finalizeAction=none
#TODO: Edit to include the GCS Java SDK, Parquet and Hadoop libraries.
gg.classpath=/u01/ogg/ggbd21_home/opt/DependencyDownloader/dependencies/gcs_1.113.9/*:/u01/ogg/ggbd21_home/opt/DependencyDownloader/dependencies/parquet_1.13.1/*:/u01/ogg/ggbd21_home/opt/DependencyDownloader/dependencies/hadoop_3.3.6/*
jvm.bootoptions=-Xmx512m -Xms32m
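The properties above chain three components: the File Writer handler writes Avro files locally, hands them to the Parquet event handler, which in turn hands the Parquet files to the GCS event handler. Since a typo in one of the eventHandler lines breaks this chain, here is a small sketch that reads a properties file and prints the chain it finds. The function names are hypothetical, and the parser only handles simple key=value lines like the ones in this post.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def handler_chain(props):
    """Follow the eventHandler links from the first handler in gg.handlerlist."""
    first = props["gg.handlerlist"].split(",")[0].strip()
    chain = [first]
    nxt = props.get(f"gg.handler.{first}.eventHandler")
    # Stop when no further event handler is configured or a loop is detected.
    while nxt and nxt not in chain:
        chain.append(nxt)
        nxt = props.get(f"gg.eventhandler.{nxt}.eventHandler")
    return chain
```

Run against the properties file in this post, handler_chain(parse_properties(text)) should give filewriter, parquet, gcs in that order. A shorter list is a hint that one of the eventHandler properties is missing or misspelled.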
Conclusion
In this blog, we learned how to generate Parquet files in Google Cloud Storage using GoldenGate for Big Data 21c.
