DATA REPLICATION TO AWS KINESIS DATA STREAM USING ORACLE GOLDENGATE

Contributed by: Shrinidhi Kulkarni, Staff Solutions Engineer, Oracle

Use case: Replicate the data in trail files on an AWS EC2 Linux instance into an Amazon Kinesis data stream using Oracle GoldenGate for Big Data.

Architecture:

GoldenGate For Big Data: Oracle GoldenGate 12.3.2.1
AWS EC2 Instance: AMI Linux
Amazon Kinesis

Highlights:

How to configure GoldenGate for Big Data (12.3.2.1)
How to configure GoldenGate Big Data target handlers
How to create an AWS Kinesis data stream

Connecting to Your Linux Instance from Windows Using PuTTY

The following AWS guide explains how to connect to your instance using PuTTY and how to transfer files to your instance using WinSCP:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html

Download the GoldenGate for Big Data Binaries, Java (JDK or JRE) version 1.8 & Amazon Kinesis Java SDK

Download and install GoldenGate for Big Data 12.3.2.1.1 from: http://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html

Oracle GoldenGate for Big Data is certified for Java 1.8. Before installing and running Oracle GoldenGate for Big Data 12.3.2.1.1, you must install Java version 1.8 or later. Either the Java Runtime Environment (JRE) or the full Java Development Kit (JDK, which includes the JRE) may be used.

The Oracle GoldenGate Kinesis Streams Handler uses the AWS Kinesis Java SDK to push data to Amazon Kinesis. The Kinesis Streams Handler was designed and tested with AWS Kinesis Java SDK version 1.11.429. For details on the handler and on creating streams and shards, see:

https://docs.oracle.com/goldengate/bd123110/gg-bd/GADBD/using-kinesis-handler.htm#GADBD-GUID-4E4E0A52-9AE7-4E9F-84BA-32B18B63C4B0

Create a Kinesis data stream in your AWS account (Kinesis Data Streams is not included in the AWS Free Tier). For reference, see:

https://docs.aws.amazon.com/streams/latest/dev/learning-kinesis-module-one-create-stream.html
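
As an alternative to the console steps in the linked guide, the stream can also be created from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials. The stream name, region, and shard count are placeholder values that do not come from this article, and run is a hypothetical dry-run helper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from the article): replace with your own.
STREAM_NAME="${STREAM_NAME:-gg-demo-stream}"
AWS_REGION="${AWS_REGION:-us-east-1}"
SHARD_COUNT="${SHARD_COUNT:-1}"

# Hypothetical dry-run helper: prints the command by default,
# executes it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Create the stream, then wait until it is reported as existing and active.
create_stream() {
    run aws kinesis create-stream \
        --stream-name "$STREAM_NAME" \
        --shard-count "$SHARD_COUNT" \
        --region "$AWS_REGION"
    run aws kinesis wait stream-exists \
        --stream-name "$STREAM_NAME" \
        --region "$AWS_REGION"
}

create_stream
```

Review the printed commands first, then rerun with DRY_RUN=0 to create the stream for real. A single shard is enough for the small demo trail; the shard count drives cost and throughput, so size it for your own workload.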

It is strongly recommended that you do not use the AWS account root user or ec2-user for your everyday tasks, even the administrative ones. Instead, create a new user with its own access key and secret key, using the following link as a reference:

https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html

Attach the following policies to the newly created user to allow access to, and Get/Put operations on, the Kinesis data stream:

AWSLambdaKinesisExecutionRole (a predefined AWS managed policy)

An inline policy, attached as JSON:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "kinesis:*",
      "Resource": [
        "arn:aws:kinesis:<your-aws-region>:<aws-account-id>:stream/<kinesis-stream-name>"
      ]
    }
  ]
}

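The inline policy can also be generated from variables so that the stream ARN is filled in consistently. This is a minimal sketch, assuming the AWS CLI is used to attach the policies. The region, account ID, stream name, user name, policy name, and file name are all placeholder values chosen for illustration. The two attach commands are only printed, not executed.

```shell
#!/bin/sh
# Placeholder values (assumptions): substitute your own region, account, stream, and user.
AWS_REGION="${AWS_REGION:-us-east-1}"
AWS_ACCOUNT_ID="${AWS_ACCOUNT_ID:-123456789012}"
STREAM_NAME="${STREAM_NAME:-gg-demo-stream}"
IAM_USER="${IAM_USER:-gg-kinesis-user}"
POLICY_FILE="${POLICY_FILE:-kinesis-inline-policy.json}"

# Write the inline policy with the stream ARN filled in.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "kinesis:*",
      "Resource": [
        "arn:aws:kinesis:${AWS_REGION}:${AWS_ACCOUNT_ID}:stream/${STREAM_NAME}"
      ]
    }
  ]
}
EOF

# Commands to run once the file looks right (printed here, not executed).
echo "aws iam attach-user-policy --user-name $IAM_USER --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole"
echo "aws iam put-user-policy --user-name $IAM_USER --policy-name gg-kinesis-inline --policy-document file://$POLICY_FILE"
```

Scoping the Resource to a single stream ARN, as the policy does, keeps the new user from touching other streams in the account.
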
Unzip the GoldenGate for Big Data (12.3.2.1) zip file.
The unzipped download contains the GoldenGate 12.3.2.1.1 .tar file. Extract it using the “tar -xvf” command.
When the “tar -xvf” operation finishes, the Big Data target handlers have been extracted.
Look through the extracted directory structure and open the “AdapterExamples” directory to make sure the Kinesis Streams handler was extracted.

The Kinesis_Streams directory under big-data contains the Kinesis Replicat parameter file (kinesis.prm) and the Kinesis properties file (kinesis.props).
Before you start GGSCI, set JAVA_HOME and LD_LIBRARY_PATH to point to the Java 1.8 directory; otherwise an error is reported.
Export JAVA_HOME and LD_LIBRARY_PATH as shown below:

export JAVA_HOME=<path-to-your-Java-1.8>/jre1.8.0_181

export LD_LIBRARY_PATH=$JAVA_HOME/lib/amd64/server:$JAVA_HOME/lib
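
The location of libjvm.so differs between a standalone JRE and a full JDK, so it can help to locate it before exporting LD_LIBRARY_PATH. The following is a minimal sketch, assuming Java 1.8 on x86-64 Linux. The find_libjvm_dir helper is a hypothetical name introduced here for illustration and is not part of GoldenGate.

```shell
#!/bin/sh
# Print the directory under a Java home that contains libjvm.so, or fail.
# The two candidate paths cover a standalone JRE and a full JDK
# of Java 1.8 on x86-64 Linux.
find_libjvm_dir() {
    java_home="$1"
    for dir in "$java_home/lib/amd64/server" "$java_home/jre/lib/amd64/server"; do
        if [ -f "$dir/libjvm.so" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "libjvm.so not found under $java_home" >&2
    return 1
}

# Only export when JAVA_HOME is set and the library is found.
if [ -n "${JAVA_HOME:-}" ] && jvm_dir="$(find_libjvm_dir "$JAVA_HOME")"; then
    export LD_LIBRARY_PATH="$jvm_dir:$JAVA_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
fi
```

Source the script (rather than executing it) from the shell in which you start ./ggsci, so that the exported variable is visible to GGSCI.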

Once you’re done, start GGSCI from the GoldenGate directory with the ./ggsci command and issue the CREATE SUBDIRS command to create the GoldenGate working directories.
Configure the Manager parameter file and add an open port to it:

Example: edit param mgr

PORT 1080

Return to the GoldenGate directory, run ./ggsci, and add the Replicat to the GoldenGate instance using the following command:

add replicat kinesis, exttrail AdapterExamples/trail/tr

[NOTE: A demo trail is already present at the location: AdapterExamples/trail/tr]

Copy the Replicat parameter file (kinesis.prm, mentioned above) to the ./dirprm directory of the GoldenGate instance.
Make the desired changes to the properties file (kinesis.props) and copy it to the dirprm directory as well.

Replicat parameter file (kinesis.prm):

REPLICAT kinesis

-- Trail file for this example is located in "AdapterExamples/trail" directory

-- Command to add REPLICAT

-- add replicat kinesis, exttrail AdapterExamples/trail/tr

TARGETDB LIBFILE libggjava.so SET property=dirprm/kinesis.props

REPORTCOUNT EVERY 1 MINUTES, RATE

GROUPTRANSOPS 1

MAP QASOURCE.*, TARGET QASOURCE.*;

Kinesis properties file (kinesis.props):

gg.handlerlist=kinesis

gg.handler.kinesis.type=kinesis_streams

gg.handler.kinesis.mode=op

gg.handler.kinesis.format=json

gg.handler.kinesis.region=<your-aws-region>

#The following uses one fixed Kinesis stream name for all tables (the sample default, ${tableName}, resolves to the short table name)

gg.handler.kinesis.streamMappingTemplate=<Kinesis-stream-name>

#The following uses a fixed Kinesis partition key (the sample default, ${primaryKeys}, resolves to the concatenated primary keys)

gg.handler.kinesis.partitionMappingTemplate=QASOURCE

#QASOURCE is the schema name used in the sample trail file

gg.handler.kinesis.deferFlushAtTxCommit=true

gg.handler.kinesis.deferFlushOpCount=1000

gg.handler.kinesis.formatPerOp=true

#gg.handler.kinesis.proxyServer=www-proxy-hqdc.us.oracle.com

#gg.handler.kinesis.proxyPort=80

goldengate.userexit.writers=javawriter

javawriter.stats.display=TRUE

javawriter.stats.full=TRUE

gg.log=log4j

gg.log.level=DEBUG

gg.report.time=30sec

gg.classpath=<path-to-your-aws-java-sdk>/aws-java-sdk-1.11.429/lib/*:<path-to-your-aws-java-sdk>/aws-java-sdk-1.11.429/third-party/lib/*

##Use only ONE of the two javawriter.bootoptions lines below and keep the other commented out

##Option 1: access key ID and secret key configured elsewhere

#javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

##Option 2: access key ID and secret key configured here

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Daws.accessKeyId=<access-key-of-new-created-user> -Daws.secretKey=<secret-key-of-new-created-user>

Make sure the classpath, accessKeyId, and secretKey (of the newly created user) are set correctly.
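
A quick check before starting the Replicat can catch two easy editing mistakes in the properties file: a placeholder such as <your-aws-region> left in an active line, and javawriter.bootoptions being active more than once (or not at all). This is a minimal sketch using grep only. The check_props function is a hypothetical helper introduced for illustration and is not part of GoldenGate.

```shell
#!/bin/sh
# Check a GoldenGate properties file for two common editing mistakes.
# Returns non-zero if an active (non-comment) line still holds a <placeholder>
# or if javawriter.bootoptions is not active exactly once.
check_props() {
    props="$1"
    status=0
    # Active lines are those that do not start with '#'.
    if grep -v '^[[:space:]]*#' "$props" | grep '<[^>]*>' >&2; then
        echo "ERROR: unreplaced placeholder(s) in $props" >&2
        status=1
    fi
    count="$(grep -c '^[[:space:]]*javawriter\.bootoptions=' "$props" || true)"
    if [ "$count" -ne 1 ]; then
        echo "ERROR: expected 1 active javawriter.bootoptions line, found $count" >&2
        status=1
    fi
    return "$status"
}

# Example invocation against the file used in this article, if present.
if [ -f dirprm/kinesis.props ]; then
    check_props dirprm/kinesis.props || echo "Fix dirprm/kinesis.props before starting the Replicat"
fi
```

Run it from the GoldenGate directory. The check is purely textual, so it does not verify that the classpath directories exist or that the keys are valid.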
After making all the necessary changes, start the kinesis Replicat, which replicates the trail data to the Kinesis data stream.
Check the kinesis Replicat’s status, RBA, and stats.
Once you have the stats, you can view kinesis.log in the ./dirrpt directory, which gives information about the data sent to the Kinesis data stream and the operations performed.
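
The start-and-check sequence above can be scripted by feeding commands to GGSCI on standard input. The following is a minimal sketch, assuming the Manager has not been started yet (hence the extra start mgr line) and that GG_HOME, a variable introduced here for illustration, points at the GoldenGate directory. If ./ggsci is not found, the commands are only printed.

```shell
#!/bin/sh
# GG_HOME is an assumed variable pointing at the GoldenGate installation directory.
GG_HOME="${GG_HOME:-.}"

# Print the GGSCI commands that start the Replicat and report on it.
ggsci_commands() {
    cat <<'EOF'
start mgr
start replicat kinesis
info replicat kinesis
stats replicat kinesis
EOF
}

# Feed the commands to GGSCI when it is available; otherwise just show them.
if [ -x "$GG_HOME/ggsci" ]; then
    (cd "$GG_HOME" && ggsci_commands | ./ggsci)
else
    ggsci_commands
fi
```

The info command reports the Replicat's status and its RBA in the trail, and stats reports the operation counts. Because the Replicat starts asynchronously, the first info may still show it as starting, so rerun info and stats after a few seconds.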

You can also monitor the data that has been pushed into the Kinesis data stream through Amazon CloudWatch. Amazon Kinesis Data Streams and Amazon CloudWatch are integrated so that you can collect, view, and analyze CloudWatch metrics for your Kinesis data streams. For example, to track shard usage, you can monitor the following metrics:
- IncomingRecords: The number of records successfully put to the Kinesis stream over the specified time period.
- IncomingBytes: The number of bytes successfully put to the Kinesis stream over the specified time period.
- PutRecord.Bytes: The number of bytes put to the Kinesis stream using the PutRecord operation over the specified time period.
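
The same metrics can be queried from the command line. This is a minimal sketch, assuming the AWS CLI is installed and a date command that accepts -d @epoch (as GNU date on Linux does). The stream name and region are placeholder values, and metric_cmd is a hypothetical helper that only prints the command for each metric so it can be reviewed before running.

```shell
#!/bin/sh
# Placeholder values (assumptions): substitute your own stream and region.
STREAM_NAME="${STREAM_NAME:-gg-demo-stream}"
AWS_REGION="${AWS_REGION:-us-east-1}"

# Build the time window: the last hour, in UTC (ISO 8601).
now="$(date -u +%s)"
END_TIME="$(date -u -d "@$now" +%Y-%m-%dT%H:%M:%SZ)"
START_TIME="$(date -u -d "@$((now - 3600))" +%Y-%m-%dT%H:%M:%SZ)"

# Print the AWS CLI call that sums one Kinesis metric in 5-minute buckets.
metric_cmd() {
    echo "aws cloudwatch get-metric-statistics" \
        "--namespace AWS/Kinesis" \
        "--metric-name $1" \
        "--dimensions Name=StreamName,Value=$STREAM_NAME" \
        "--start-time $START_TIME --end-time $END_TIME" \
        "--period 300 --statistics Sum" \
        "--region $AWS_REGION"
}

for metric in IncomingRecords IncomingBytes PutRecord.Bytes; do
    metric_cmd "$metric"
done
```

Running one of the printed commands shortly after the Replicat has processed the demo trail should show a non-zero Sum for IncomingRecords if the data reached the stream.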


Thomas Vengal

Director, Product Management