
Welcome to All Things Data Integration: Announcements, Insights, Best Practices, Tips & Tricks, and Trend Related...

GoldenGate Data Replication to Azure BLOB Storage and Azure Data Lake Gen2

Thomas Vengal
Director, Product Management

Content contributed by Mr. Shrinidhi Kulkarni, Staff Solution Engineer, Solution Engineering NA Technology, Oracle

Use case: Real-time replication of transaction data from an on-premises database to Azure Blob Storage and Azure Data Lake Gen2 using Oracle GoldenGate and GoldenGate for Big Data.

Architecture:

Following are the configuration steps:

1. Spin up a Linux virtual machine on Azure Cloud.

2. Download the GoldenGate for Big Data 19.1 binaries and transfer them to the Azure VM:

https://www.oracle.com/middleware/technologies/goldengate-downloads.html

 

3. Oracle GoldenGate for Big Data requires Oracle Java 8 Java Runtime Environment (JRE) at a minimum, so download Java 1.8.0_202:

https://www.oracle.com/java/technologies/javase/javase8-archive-downloads.html

Download the file "Linux x64.tar.gz"

4. Download the Hadoop 3.2.0 binaries from http://hadoop.apache.org/releases.html or from the Apache archive:

https://archive.apache.org/dist/hadoop/common/

 

5. Extract the GoldenGate for Big Data, Java 1.8, and Hadoop 3.2.0 archives into separate new directories.
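The extraction pattern can be sketched as below. The archive and directory names are assumptions, and a small stand-in archive is created so the commands run as-is; with the real downloads you would extract the GoldenGate for Big Data zip, jdk-8u202-linux-x64.tar.gz, and hadoop-3.2.0.tar.gz the same way, each into its own directory.

```shell
set -e
work=$(mktemp -d)   # scratch area standing in for the VM home directory
cd "$work"

# Build a stand-in archive so the pattern is runnable here; substitute the
# real hadoop-3.2.0.tar.gz (and the Java and GoldenGate archives) in practice.
mkdir -p hadoop-3.2.0 && echo demo > hadoop-3.2.0/NOTICE.txt
tar -czf hadoop-3.2.0.tar.gz hadoop-3.2.0 && rm -r hadoop-3.2.0

# One new directory per product; extract each archive into its own directory.
mkdir -p "$work/hadoop"
tar -xzf hadoop-3.2.0.tar.gz -C "$work/hadoop"
ls "$work/hadoop/hadoop-3.2.0"
```

Repeating this with a `java` and a `ggbigdata` directory keeps the three installations cleanly separated, which makes the later JAVA_HOME and gg.classpath settings unambiguous.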

 

6. Change to the GoldenGate for Big Data directory and launch the GGSCI prompt. Edit the Manager parameter file to specify the Manager port, then save and close it:

EDIT PARAMS MGR

PORT 7809

Start the Manager:

START MGR

 

7. Set the required environment variables:

export JAVA_HOME=<Path_To_Java1.8_Directory>/jdk1.8.0_202

export HADOOP_OPTIONAL_TOOLS="hadoop-azure"

Next, complete the prerequisites on the Azure side.

 

8. In the Azure portal, select "Storage accounts" from the commands on the left and click Add to create a storage account. See the product documentation: https://docs.oracle.com/en/middleware/goldengate/big-data/19.1/gadbd/connecting-microsoft-azure-data-lake-2.html#GUID-048E5893-DCBA-4EFE-B9AF-1C78F7ECE0C0

 

9. If the target is Azure Data Lake Gen2, select General-purpose v2 as the Account kind and ensure the Hierarchical namespace option is set to Enabled. If the target is Azure Blob Storage, or Snowflake (which currently supports Blob Storage only), choose Blob Storage as the Account kind. Fill in the remaining details and click Create. Note that if you chose the Blob Storage account kind, Storage Explorer shows your containers as Blob Containers; for Azure Data Lake Gen2 they appear simply as Containers.

 

 

10. In the Azure portal, select All services from the list of commands on the left, type "app" into the filter box, and select App registrations from the filtered list of services. Create an App registration of type Web app/API.

 

11. Generate a Key for the App Registration. The generated key string is your client secret and is only available at the time the key is created. Therefore, ensure you document the generated key string.

12. Select the Access control (IAM) command to bring up the Access control (IAM) panel. Select the Role assignments tab and add a role assignment for the App registration you created: select Add role assignment, choose the Contributor role, and then select the principal you created earlier.

 

13. Edit the {hadoop install dir}/etc/hadoop/core-site.xml file and add the following configuration:

<configuration>
  <property>
    <name>fs.azure.account.auth.type</name>
    <value>OAuth</value>
  </property>
  <property>
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.endpoint</name>
    <value>https://login.microsoftonline.com/{insert the Azure instance id here}/oauth2/token</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value>{insert your client id here}</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.secret</name>
    <value>{insert your client secret here}</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>abfss://{insert your file system name here example-oracle}@{insert your storage account name here that you created}.dfs.core.windows.net</value>
  </property>
  <property>
    <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
    <value>true</value>
  </property>
</configuration>
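Before starting the replicat, it is worth checking that core-site.xml actually contains every required Azure property. The sketch below uses a scratch copy with placeholder values standing in for your real {hadoop install dir}/etc/hadoop/core-site.xml:

```shell
set -e
CORE_SITE=$(mktemp)   # substitute the real path to etc/hadoop/core-site.xml
cat > "$CORE_SITE" <<'EOF'
<configuration>
  <property><name>fs.azure.account.auth.type</name><value>OAuth</value></property>
  <property><name>fs.azure.account.oauth2.client.id</name><value>CLIENT_ID</value></property>
  <property><name>fs.azure.account.oauth2.client.secret</name><value>SECRET</value></property>
  <property><name>fs.azure.account.oauth2.client.endpoint</name><value>https://login.microsoftonline.com/TENANT/oauth2/token</value></property>
  <property><name>fs.defaultFS</name><value>abfss://container@account.dfs.core.windows.net</value></property>
</configuration>
EOF
# Report any required Azure OAuth property missing from the file.
for p in fs.azure.account.auth.type \
         fs.azure.account.oauth2.client.id \
         fs.azure.account.oauth2.client.secret \
         fs.azure.account.oauth2.client.endpoint \
         fs.defaultFS; do
  grep -q "<name>$p</name>" "$CORE_SITE" || echo "MISSING: $p"
done
echo "check complete"
```

A missing or misspelled property name here typically surfaces later as an opaque authentication failure when the handler first touches the file system, so the cheap grep saves a debugging round-trip.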

 

To use the non-secure scheme, change abfss to abfs in the fs.defaultFS value.

To obtain your Azure instance ID, go to the Microsoft Azure portal. Select Azure Active Directory from the list on the left to view the Azure Active Directory panel, then select Properties. The Azure instance ID is the field labeled Directory ID.

 

To obtain your Azure client ID and client secret, go to the Microsoft Azure portal. Select All services from the list on the left, type "app" into the filter box, and select App registrations from the listed services. Select the App registration you created to access Azure Storage. The Application ID displayed for the App registration is the client ID. The client secret is the key string generated when a new key is added; it is available only once, at the time the key is created. If you no longer have the generated key string, create another key and make sure you capture it.

 

14. Go back to the GoldenGate directory and traverse to AdapterExamples/big-data/hdfs.

Copy both the rhdfs.prm and hdfs.props files to the GoldenGate dirprm directory.

 

15. Open the hdfs.props file and change gg.classpath to the correct Hadoop installation directory path.

Also change the parameter gg.handler.hdfs.format to sequencefile.
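These two hdfs.props edits can also be scripted with sed. The sketch below uses a stand-in file with assumed contents, and the Hadoop install location is an example path, not a requirement:

```shell
set -e
PROPS=$(mktemp)   # stands in for the real dirprm/hdfs.props
cat > "$PROPS" <<'EOF'
gg.handlerlist=hdfs
gg.handler.hdfs.format=delimitedtext
gg.classpath=/old/hadoop/share/hadoop/common/*
EOF
# Point gg.classpath at the Hadoop installation (example path) and switch
# the handler output format to sequencefile.
HADOOP_DIR=/u01/hadoop-3.2.0   # assumed install location
sed -i "s|^gg.classpath=.*|gg.classpath=$HADOOP_DIR/share/hadoop/common/*:$HADOOP_DIR/share/hadoop/common/lib/*:$HADOOP_DIR/share/hadoop/hdfs/*:$HADOOP_DIR/etc/hadoop/|" "$PROPS"
sed -i 's|^gg.handler.hdfs.format=.*|gg.handler.hdfs.format=sequencefile|' "$PROPS"
grep -E '^gg\.(classpath|handler\.hdfs\.format)=' "$PROPS"
```

Keeping the trailing etc/hadoop/ entry on the classpath matters: that is the directory holding the core-site.xml edited above, and the handler reads the Azure credentials from it.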

 

16. Open the rhdfs.prm file and change the MAP/TARGET schema names to match the schema name in the source database.

 

17. Add the Replicat using the following command:

add replicat rhdfs, exttrail <Trail_file_path_from_Source>

 

18. For testing, you can use the sample trail file that ships with the GoldenGate binaries:

add replicat rhdfs, exttrail AdapterExamples/trail/tr

 

19. Launch GGSCI from the GoldenGate instance and start the Replicat:

./ggsci

START RHDFS


20. Go to Storage Explorer in the Azure portal to check whether the files were replicated to Azure Blob Storage. Traverse to the storage account you created, then to the container under it. (A container organizes a set of blobs, like a directory in a file system.)

 

21. You will see files created with names of the form "schemaname_tablename_replicatname_timestamp.txt".

 

 

Product Documentation Reference:

 

