In this post we will show how to install and configure Oracle GoldenGate for Big Data in order to be able to receive trail files from a GoldenGate for Oracle classic architecture. Once the trail files will arrive on the machine hosting Oracle GoldenGate for Big Data we will show how to configure BigQuery handler and how to create a replication process that writes the transactions in real time from an Oracle Database to a Google BigQuery platform. There is yet another API for streaming ingestion, i.e BigQuery Job REST API, but for now we will test the BigQuery Streaming API. In the second part of this blog we will also show the “Stage and Merge” functionality that uses the aforemoentioned API.

Below you can find the architecture of the solution we are going to demonstrate.

The bill of material you need to execute this architecture is the following:

One Oracle Database used as source. We recommend using a 19c+ version of an Oracle DB.
Two machines (VMs or Compute hosts), one for each GoldenGate installation ( classic and microservices)
One machine to install the source database ( optional, you can install the database on the same server with the source GoldenGate distribution)
One distribution of GoldenGate for Oracle classic architecture
One distribution of GoldenGate for BigData Microservices architecture
Oracle Database installation software (19c+)
One Google Cloud account with a BigQuery Instance active

To download the required distributions of oracle software use: https://edelivery.oracle.com

The activities required in order to setup this solution are as follows:

Source setup. Installation of Oracle Database used a s a source and the instantiation of the replicated schema ( this part is not covered by this guide but there are a lot of resources that show the creation and the population of an Oracle database).
Target setup. Provisioning and instantiation of Google Big Query platform
Replication software setup:
- Installation and configuration of Oracle GoldenGate core (for Oracle) – not covered in this guide. The installation guide can be found here:

https://docs.oracle.com/en/middleware/goldengate/core/21.3/installing/installing-oracle-goldengate-oracle-database.html

- Installation and configuration of Oracle GoldenGate for Big Data (+ BigQuery handler configuration)
- Configuration of extraction process
- Configuration of replication process
- Configuration of the pump process (optional)
Database instantiation. Initial load operation needed for the synchronization of all the transactions that exists in an Oracle Database before GoldenGate is installed.
CDC synchronization setup
- Extract creation
- Pump creation
- Replicat creation
- Schema creation (in GCP). For testing purposes we will only replicate one table but the replication can be extended to an entire schema.
Smoke test

Provisioning and instantiation of Google Big Query platform

First step is to create a Google cloud account or use your existing account to log in.

After login choose use the resource search bar to locate the required service (Big Query platform).

After the Big Query platform is open choose the already existing project or create a new one. I have create a project named “GGDemo”.

Open the project and create a new dataset in it. This dataset is the equivalent of an Oracle schema and will contain all the replicated tables.

Create the structures you need for the replication. In this case I have replicated the data coming from one table named “Orders”. The query can be found in the Appendices.

Installation of Oracle GoldenGate for Big Data

Download the binaries for Oracle GoldenGate for Bug Data the latest release from edelivery.oracle.com and transfer them to the machine you have prepared to host the product.

Login to the machine and unzip the files in the directory you have prepared, in this case I have pre-created the “/u01/Install”:

[opc@goldengate-ma Disk1]$ unzip V1018358-01.zip -d /u01/install

Navigate to /u01/install/ggs_Linux_x64_BigData_services_shiphome/Disk1 and locate the .runInstaller program.

Execute the silent install script:

[opc@goldengate-ma Disk1]$ ./runInstaller -silent -showProgress -waitforcompletion -responseFile /home/opc/oggcore_ggbd21.3_ma.rsp

The response file I have used can be found in the Appendices section

After the execution a success message should be displayed:

Starting Oracle Universal Installer…

[…]

Finish Setup successful.

The installation of Oracle GoldenGate Services for BigData was successful.

Please check ‘/u01/app/oraInventory/logs/silentInstall2022-01-17_10-18-59AM.log’ for more details.

As a root user, execute the following script(s):

/u01/app/oraInventory/orainstRoot.sh

The installation is now completed.

Configuration of Oracle GoldenGate for Big Data ( Microservices)

After the installation of the software, in microservices architecture we need to create the infrastructure of microservices that are responsible with the replication process.

The following will be created:

Service Manager on port 7500. You can use any port you want.
Administration service on port 7501
Distribution service on port 7502
Performance metrics service on port 7503
Receiver service on port 7504

Note : all the port used have to be opened in the Security List for incoming traffic if you are using an OCI VCN infrastructure for the networking part. Also the firewall at the OS level has to allow TCP?IP connection on these ports.

Due to the fact that the data is coming from a classic data pump process we will not enable the SSL for this deployment.

Before creating the deployment create some standard locations :

For service manager login to the station and execute from shell:

[opc@goldengate-ma Disk1]$ mkdir -p /u01/app/oracle/gg_deployments/svmbd_ma

For the services under the service manager

[opc@goldengate-ma Disk1]$ mkdir –p /u01/app/oracle/gg_deployments/oggbd_ma

To create the deployment with all the aforementioned services enabled create a response file like the one attached in the Appendices and execute this script:

Navigate to /u01/app/oracle/product/21.4/ggbd_ma/bin

[opc@goldengate-ma bin]$ cd /u01/app/oracle/product/21.4/ggbd_ma/bin

Using the response file execute the script below. The response file can be found in the Appendices:

[opc@goldengate-ma bin]$ ./oggca.sh -silent –responseFile /home/opc/oggcore_ggbd21.3_ma_dep01.rsp

If the execution has completed with no errors the following message should be received:

Successfully Setup Software.

In order to open the ports in the firewall of the station where the deployment has been instantiated use sudo command from your user if possible, otherwise run it without sudo from the root user itself:

[opc@goldengate-ma bin]$ sudo firewall-cmd –zone=public –add-port=7501-7505/tcp –permanent

[opc@goldengate-ma bin]$ sudo firewall-cmd –reload

If the hosts are in cloud open the ports also from the security lists of your virtual network.
Once created the service manager can be started and stopped using the followings scripts:

./startSM.sh

./ stopSM.sh

Scripts are located in the service manager home directory defined in the response file, under the bin subdirectory: /u01/app/oracle/gg_deployments/svmbd_ma/bin

Reverse proxy configuration

Reverse Proxy allows a single point of contact for various microservices associated with an Oracle GoldenGate Microservices Architecture deployment. You can configure a proxy server depending on your environment setup and network requirements.

You can configure Oracle GoldenGate Microservices Architecture to use a reverse proxy. Oracle GoldenGate MA includes a script called ReverseProxySettings that generates configuration file for only the NGINX reverse proxy server.

For example, the Administration Server is available on HTTPS://goldengate.example.com:9001 and the Distribution Server is on HTTPS://goldengate.example.com:9002. With reverse proxy, each of the microservices can simply be accessed from the single address. For example, https://goldengate.example.com/distsrvr for the Distribution Server.

To download and configure the NGINX follow the steps below. All the command are executed in shell script at the OS level of the host holding the GG for Big Data deployment:

Download the software

[opc@goldengate-ma bin]$ sudo yum -y install nginx

Navigate to the location of ./ReverseProxySettings script

[opc@goldengate-ma bin]$ cd /u01/app/oracle/product/21.4/ggbd_ma/lib/utl/reverseproxy

Execute the following script to create a configuration file names ogg.congf. The name is at choice. Do not forget to replace the password with your own.

[opc@goldengate-ma bin]$ ./ReverseProxySettings -u oggadmin -P <your password> ogg.conf https://localhost:7500

Place the configuration file in the NGINX conf.d directory:

[opc@goldengate-ma bin]$ sudo cp ogg.conf /etc/nginx/conf.d/nginx.conf

Generate a certificate for the proxy server

[opc@goldengate-ma bin]$ sudo sh /etc/ssl/certs/make-dummy-cert /etc/nginx/ogg.pem

Start or reload the NGINX server:

[opc@goldengate-ma bin]$ sudo nginx

[opc@goldengate-ma bin]$ sudo nginx –t

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok

nginx: configuration file /etc/nginx/nginx.conf test is successful

[opc@goldengate-ma bin]$ sudo nginx -s reload

If not open already open the port 443 in the firewall

[opc@goldengate-ma bin]$ sudo firewall-cmd –zone=public –add-port=443/tcp –permanent

[opc@goldengate-ma bin]$ sudo firewall-cmd –reload

You should be ready to connect to the newly deployed GoldenGate Microservices platform using the browser:

Error! Hyperlink reference not valid. or IP adress>/?root=account

The following login page should appear:

Enter the user and password you have defined in the response when you have created the deployment and click Sign in.

You should be able to see the home screen. From this screen you can visualize the services and their status and also navigate to each of the service you need.

Dependency configuration for Google Cloud Platform – Big Query handler

Starting with 21.3 Oracle GoldenGate comes with a dependency downloader that aims to simplify the complex task of installing libraries and solving dependencies for various technologies for which a handler will be configures in GoldenGate for Big Data.

There is a very nice blog post from Sydney Nurse on this topic:

https://blogs.oracle.com/dataintegration/post/handling-those-pesky-dependencies-with-the-goldengate-21c-for-bigdata-dependency-downloader

In order to install the libraries for Google Big Query follow the steps:

Navigate to {OGGBD install}/DependencyDownloader

[opc@goldengate-ma ~]$ cd /u01/app/oracle/product/21.4/ggbd_ma/opt/DependencyDownloader

Execute the ./bigquery.sh script followed by the version of the libraries you wish to download

[opc@goldengate-ma DependencyDownloader]$ ./bigquery.sh 1.135.4

If the installation fails check the java location or set the bash_profile as follows:

[opc@goldengate-ma ~]$ cat .bash_profile

# .bash_profile

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

. ~/.bashrc

# User specific environment and startup programs

export JAVA_HOME=/u01/app/oracle/product/21.4/ggbd_ma/jdk

export OGG_HOME=/u01/app/oracle/product/21.4/ggbd_ma

export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server

PATH=$JAVA_HOME/bin:$OGG_HOME:$PATH:$HOME/.local/bin:$HOME/bin

export PATH

After the execution of the script the output should render the result of the installation. An excerpt from the message I have received has been attached for example:

java version “1.8.0_291”

Java is installed.

Apache Maven 3.8.2 (ea98e05a04480131370aa0c110b8c54cf726c06f)

Maven is accessible.

Root Configuration Script

INFO: This is the Maven binary [../../ggjava/maven/bin/mvn].

INFO: This is the location of the settings.xml file [./docs/settings_np.xml].

INFO: This is the location of the toolchains.xml file [./docs/toolchains.xml].

INFO: The dependencies will be written to the following directory[../dependencies/bigquery_1.135.4].

INFO: The Maven coordinates are the following:

INFO: Group ID [com.google.cloud].

INFO: Artifact ID [google-cloud-bigquery].

INFO: Version [1.135.4]

[INFO] Scanning for projects…

[INFO] —————< oracle.goldengate:dependencyDownloader >—————

[INFO] Building dependencyDownloader 1.0

[INFO] ——————————–[ pom ]———————————

Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom

[INFO] Copying gax-1.66.0.jar to /u01/app/oracle/product/21.4/ggbd_ma/opt/DependencyDownloader/dependencies/bigquery_1.135.4/gax-1.66.0.jar

[INFO] Copying google-http-client-gson-1.39.2.jar to /u01/app/oracle/product/21.4/ggbd_ma/opt/DependencyDownloader/dependencies/bigquery_1.135.4/google-http-client-gson-1.39.2.jar

[INFO] ————————————————————————

[INFO] BUILD SUCCESS

[INFO] ————————————————————————

[INFO] Total time: 16.047 s

[INFO] Finished at: 2022-01-24T13:18:33Z

[INFO] ————————————————————————

The gg.classpath parameter that needs to be configured as part of the java adapter for big data repplication has to point to the location where all the dependencies have been downloaded. In this case:

gg.classpath=/u01/app/oracle/product/21.4/ggbd_ma/opt/DependencyDownloader/dependencies/bigquery_1.135.4/*

Replication process configuration ( Audit Log Mode = false)

To configure the replication process we can use the browser based UI, the AdminClient utility or a cli client that has permissions to access the GoldenGate microservices deployment (i.e the admin service).

To create the process using the UI console open the browser and navigate to the address of the GG Service Manager like indicated above. Access the Administration Service by clicking its port number:

Administration Server access

Once the Administration Service login window opens fill in the credentials and login.

Once the service UI opens click the plus sign in order to create a new Replicat:

Choose the classic Replicat or Coordinated replicat and click next:

Fill in the following details:

Process Name: RBCP -> this fiels is mandatory and has to have at most five characters

Description: this is optional and can be left blanc

Source: Trail

Trail Subdirectory: can be left default. It will take the trais from the location of OGG_DATA_HOME specified in the response file.

Begin: You can choose where to start reading the trail from . In this case choose “Position in Trail”

Trail Sequence Number: 0 -> this indicates that you start from the first SCN in the trail

Trail RBA Offset: 0 -> read from the first RBA. Set no offset

Target: Choose Google Big Query from the dropdown list

Optionally you can set also other details related to security and management options. Once you finished setting the parameters click Next.

Fill in the parameter file will all the required parameters. At minimum I have used a one to one mapping between table Orders coming from source schema Car_store and going to target data store DWH in Google Big Query.

REPLICAT repBQ

MAP car_store.orders, TARGET “DWH”.”Orders”;

Click Next to fill in the last page, the Properties file used by the Google Big Query handler.

First, a template for Google Big Query gets displayed. Fill in the details from the properties files attached in the Appendices.

Click Create to create the process. Do not run it yet as we still need to create the extractor and the pump process.

What is outstanding here is the method used for authenticating against Big Query API. The authentication is done using a JSON file we have to create and download from Google Cloud Platform. We will show the process in detail in the next section.

You can configure the BigQuery Handler in two ways:

Audit Log Mode = false

When the handler is configured to run with audit log mode false, the data is pushed into Google BigQuery using a unique row identification key. The Google BigQuery is able to merge different operations for the same row. However, the behavior is complex. The Google BigQuery maintains a finite deduplication period in which it will merge operations for a given row. Therefore, the results can be somewhat non-deterministic.

The trail source needs to have a full image of the records in order to merge correctly.

Example 1:

An insert operation is sent to BigQuery and before the deduplication period expires, an update operation for the same row is sent to BigQuery. The resultant is a single row in BigQuery for the update operation.

Example 2

An insert operation is sent to BigQuery and after the deduplication period expires, an update operation for the same row is sent to BigQuery. The resultant is that both the insert and the update operations show up in BigQuery.

This behavior has confounded many users, as this is the documented behavior when using the BigQuery SDK and a feature as opposed to a defect. The documented length of the deduplication period is at least one minute. However, Oracle testing has shown that the period is significantly longer. Therefore, unless users can guarantee that all operations for a give row occur within a very short period, it is likely there will be multiple entries for a given row in BigQuery. It is therefore just as important for users to configure meta columns with the optype and position so they can determine the latest state for a given row. To read more about audit log mode read the following Google BigQuery documentation:Streaming data into BigQuery.

Setting the Audit log mode to false should render the following behaviour.

As per my tests the behaviour is not totally true for updates. Another row is inserted for each update ( sometimes even two updates are merge if they are issued within the same deduplication period). My conclusion is that setting the audit log mode to false can render false results because of the API streaming interface used by BigQuery.

auditLogMode = false

insert – If the row does not already exist in Google BigQuery, then an insert operation is processed as an insert. If the row already exists in Google BigQuery, then an insert operation is processed as an update. The handler sets the deleted column to false.

update –If a row does not exist in Google BigQuery, then an update operation is processed as an insert. If the row already exists in Google BigQuery, then an update operation is processed as update. The handler sets the deleted column to false.

delete – If the row does not exist in Google BigQuery, then a delete operation has no effect. If the row exists in Google BigQuery, then a delete operation is processed as a delete. The handler sets the deleted column to true.

pkUpdate—When pkUpdateHandling property is configured as delete-insert, the handler sets the deleted column to true for the row whose primary key is updated. It is followed by a separate insert operation with the new primary key and the deleted column set to false for this row.

Audit Log Mode = true

gg.handler.name.auditLogMode=true

When the handler is configured to run with audit log mode true, the data is pushed into Google BigQuery without a unique row identification key. As a result, Google BigQuery is not able to merge different operations on the same row. For example, a source row with an insert operation, two update operations, and then a delete operation would show up in BigQuery as four rows, one for each operation.

Also, the order in which the audit log is displayed in the BigQuery data set is not deterministic.

To overcome these limitations, users should specify optype and postion in the meta columns template for the handler. This adds two columns of the same names in the schema for the table in Google BigQuery. For example: gg.handler.bigquery.metaColumnsTemplate = ${optype}, ${position}

The optype is important to determine the operation type for the row in the audit log.

Configuring Handler Authentication

You have to configure the BigQuery Handler authentication using the credentials in the JSON file downloaded from Google BigQuery. Follow the steps below to get the file from Google BigQuery:

Login into your Google account at cloud.google.com.
Click Console and then go to the Dashboard and select your project.
I have selected my project GGDemo:

Click API & Services and choose Big Query API:
Go to Credentials tab (figured by a small key) and click Create Credentials:

Choose Service Account:
Fill in the Name, Description and grant access to your project.
Once created click Keys and create a key:

Add a JSON Key

Download and store this key securely.

Configuring the extraction

Preparing the source database ( Oracle) for the extraction

If the database is already in ARCHIVELOG mode go to the next step. Otherwise run:

SQL> shutdown immediate

SQL> startup mount

SQL> alter database archivelog;

SQL> alter database open;

SQL> select name, db_unique_name,log_mode from v$database;

Check if the replication is active. In case the result is false, activate it

SQL > select * from v$parameter where name = ‘enable_goldengate_replication’;

SQL > alter system set enable_goldengate_replication = true scope=both;

If you are using integrated extract and integrated replicat size the memory needed for the logmining server (for extract) and inbound server (for replicat). By default, one integrated capture Extract requests the logmining server to run with MAX_SGA_SIZE of 1GB. Thus, if you are running three Extracts in integrated capture mode in the same database instance, you need at least 3 GB of memory allocated to the Streams pool. As a best practice, keep 25 percent of the Streams pool available. For example, if there are 3 Extracts in integrated capture mode, set STREAMS_POOL_SIZE for the database to the following value:

3 GB * 1.25 = 3.75 GB

Configure minimum supplemental logging at the database level:

SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

Create the user used by Oracle GoldenGate for extraction

SQL> create user C##OGGADMIN identified by <Your_password> default tablespace users;

Grant the following rights to this user:

SQL> grant create session, resource, alter system to C##OGGADMIN container=ALL;

[optional]SQL> GRANT SELECT ANY TRANSACTION TO C##OGGADMIN CONTAINER=ALL;

[optional]SQL > GRANT SELECT ANY DICTIONARY TO C##OGGADMIN CONTAINER=ALL;

[optional]SQL> GRANT FLASHBACK ANY TABLE TO C##OGGADMIN CONTAINER=ALL;

SQL > exec dbms_goldengate_auth.grant_admin_privilege(‘C##OGGADMIN’,container=>’ALL’);

Create the source table orders. The code can be found under paragraph one in the Appendices.

Creation of extraction process

Connect to the machine where Oracle GoldenGate Core ( for Oracle) resides. Log into the ggsci console and add the credential keys for the user created above in the source database.

GGSCI >> alter credentialstore add user <username>@<database_service> password <your_password> alias oggadmin domain OracleGoldenGate;

Example: alter credentialstore add user oggadmin@srcorcl password Demo12345# alias srcoggadmin domain OracleGoldenGate;

In order for the user to be able to connect you need to write the service details in the tnsnames.ora file from the Oracle client that resides on the GoldenGate machine. In my case I have added this service for example:

SRCORCL =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.6)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = ORCL)

)

GGSCI (ogg19cora) 1> dblogin useridalias srcoggadmin

Successfully logged into database.

GGSCI (ogg19cora as oggadmin@ORCL) 2> add trandata car_store.orders

Create the extract parameter file:

EXTRACT eorcl

USERIDALIAS srcoggadmin

LOGALLSUPCOLS

EXTTRAIL ./dirdat/es

–DDL INCLUDE ALL

TABLE car_store.orders;

GGSCI>> register extract eorcl, database

Add the extract to the manager and associate it with the trail file to write the transactions in

GGSCI>> add extract eorcl, integrated tranlog, begin now

GGSCI>> add exttrail ./dirdat/es, extract eorcl

Creation of the pump process

The pump process is responsible of transferring the trail files over the network, possibly encrypted, to the host where GoldenGate for Big Data will have the replication process reading them and writing the data to BigQuery.

Create the parameter file as follows:

EXTRACT dpgcp

RMTHOST 192.168.0.85, PORT 7503

RMTTRAIL dp

PASSTHRU

table car_store.orders;

The port specified is the port of the receiver service from the GoldenGate for BigData platform. Because the classic pump cannot use TLS/mTLS for the communication the port used is directly the port used by the service itself and not port 443 that we have used to configure the reversed proxy.

Add the pump process:

GGSCI>> add extract dpgcp, exttrailsource ./dirdat/es

Set the remote trail path for GoldenGate for Big data and associate it with the pump process:

GGSCI>> add rmttrail dp, extract dpgcp

Smoke Test

Scenario 1:

For the first test we have configured the audit_log_mode to true. Also two metacolumns have been included in the table definition, op_type and position ( as of position in trail). The configuration parameters used:

gg.handler.bigquery.auditLogMode=true
gg.handler.bigquery.enableAlter=false
gg.handler.bigquery.metaColumnsTemplate=${optype},${position}

We have inserted 17 rows using a script in the source database ( Oracle Database). Then we have issued a delete where order_id=17.

Oracle Database:

SQL> select * from orders order by order_id;

The output in Google Big Query is the following:

We have used also the unique identifier appended by Big Query for every row in our query:

SELECT *,row_number() over (order by order_id) as Entry_Key FROM `ggdemo-338508.DWH.Orders`

Row	optype	position	ORDER_ID	product_sku	quantity	amount	Entry_Key
1	I	00000000000000002075	1	17	8	23	1
2	I	00000000000000002245	2	8	1	12	2
3	I	00000000000000002414	3	7	10	22	3
4	I	00000000000000002581	4	23	4	4	4
5	I	00000000000000002748	5	2	5	17	5
6	I	00000000000000002915	6	20	1	18	6
7	I	00000000000000003082	7	11	2	25	7
8	I	00000000000000003249	8	27	3	28	8
9	I	00000000000000003417	9	8	2	21	9
10	I	00000000000000003583	10	30	2	9	10
11	I	00000000000000003750	11	7	3	27	11
12	I	00000000000000003919	12	6	8	13	12
13	I	00000000000000004085	13	23	9	5	13
14	I	00000000000000004252	14	14	7	28	14
15	I	00000000000000004419	15	20	10	5	15
16	I	00000000000000004586	16	29	9	26	16
17	I	00000000000000004753	17	26	6	16	17
18	D	00000000000000005088	17	26	6	16	18
19	I	00000000000000005088	17	26	6	20	19

The outcome reveals the fact that we have two operations with the same position in trail ( because the delete in the trail is one operation but in Big Query that is transferred to a insert and a delete. This can be problematic because the last version of the row cannot be found out unless we introduce a new metadata filed with the insertion timestamp:

$currenttimestampiso8601 [targettimestamp]

Scenario 2:

For the second test we have used another table in the source database ( Appendices script number 2) and a table for the target side ( Appendices script number 3). We have changed the following parameter :

gg.handler.bigquery.auditLogMode=false
gg.handler.bigquery.enableAlter=true
gg.handler.bigquery.includeDeletedColumn=true

we have eliminated the metacolumns:

gg.handler.bigquery.metaColumnsTemplate=${optype},${position}

We have created another replicat :

REPLICAT repgc
MAP gcp_src.product_hierarchy, TARGET “DWH”.”Product_hierarchy” keycols (product_code);

The difference from the first test is that we have set the audit log mode to false , we have enabled the capability of creating DDL on the target table and also we have enabled “Deleted colum” to be created on the target table for the processing of I/U/D opperations with audit log mode = false.

We have done the following operations on the source:

INSERT INTO product_hierarchy VALUES (1,NULL, 10,’Wood’);

INSERT INTO product_hierarchy VALUES (2,1, 100,’Wood for deck’);

INSERT INTO product_hierarchy VALUES (3,1, 300,’Wooden door’);

INSERT INTO product_hierarchy VALUES (100,NULL,NULL,’Steel’);

INSERT INTO product_hierarchy VALUES (4,100,50,’Steel Door’);

INSERT INTO product_hierarchy VALUES (5,100,500,’Armour coating’);

INSERT INTO product_hierarchy VALUES (6,100,10,’Double door’);

INSERT INTO product_hierarchy VALUES (7,100,10,’Test Update’);

UPDATE product_hierarchy SET product_description =’Real Description’ where product_code=7;

update product_hierarchy set PRODUCT_DESCRIPTION=’Test allcols 3′ where PRODUCT_CODE=6;

commit;

update product_hierarchy set PRODUCT_DESCRIPTION=’Test allcols 4′ where PRODUCT_CODE=6;

commit;

update product_hierarchy set PRODUCT_DESCRIPTION=’Test allcols 5′ where PRODUCT_CODE=6;

ALTER TABLE product_hierarchy add column Notes VARCHAR2(200);

INSERT INTO product_hierarchy VALUES (8,1,80,’Wooden cob’,’Not released’);

update product_hierarchy set Notes=’Released’ where PRODUCT_CODE=8;

commit;

update product_hierarchy set PRODUCT_DESCRIPTION=’Test allcols 4′ where PRODUCT_CODE=6;

commit;

On the source we have the following data set. We have filtered the amount_in_warehouse because we have used also a filter statement in the pump parameter file:

On the GCP target we have executed the following SQL to reveal the status of the table:

SELECT *, row_number() over (order by product_code) as Entry_Key FROM `ggdemo-338508.DWH.Product_hierarchy`

Row	product_code	parent_product_code	amount_in_warehouse	product_description	deleted	NOTES	Entry_Key
1	4	100	50	Steel Door	null	null	1
2	6	100	10	Double door	null	null	2
3	6	100	10	Test allcols	false	null	3
4	6	100	10	Test allcols 2	false	null	4
5	6	100	10	Test allcols 3	false	null	5
6	6	100	10	Test allcols 4	false	null	6
7	6	100	10	Test allcols 5	false	null	7
8	7	100	10	Test Update	null	null	8
9	7	100	10	Test allcols	null	null	9
10	7	100	10	Test allcols BQ	null	null	10
11	7	100	10	Test allcols BQ 2	null	null	11
12	7	100	10	Test allcols BQ 2	true	null	12
13	8	1	80	Wooden cob	false	Not released	13
14	8	1	80	Wooden cob	false	Released	14
14	8	1	80	Wooden cob	false	Version 2	14

We have tested chronologically a couple of inserts (the first product ID was inserted before the extract process has been started so we have not inserted in it the target) with id 4, 6 and 7. Then we have performed some updates on products 6 and 7. At a specific point we have added :

gg.handler.bigquery.enableAlter=true

together with

gg.handler.bigquery.includeDeletedColumn=true

After adding the below parameters a new column has been added “deleted”. For the rows inserted prior to the enablement this value is automatically set to null.

At a point in time we have deleted product 7. In Google BigQuery the product is not deleted but the “deteled” column is set to true.

All the updates have not actually been treated as updates. But they have been inserted as new rows with the modified column plus a deleted flag set to false. It is impossible to know the order of the operations can only be tracked using the row_number function.

Also in the second test we added a column “notes” that has been replicated successfully.

Conclusions:

Google BigQuery uses the streaming API to stream data in real time. Oracle GoldenGate uses Big Query handler to interact with this API. Note that using audit log mode false may render unexpected results due to deduplication period. Also there is no other way to track the order of the results rather than using row_number analytical function over the primary key of the table, logically declared as BQ does not support primary keys.

It seems like using audit log true together with metacolumns and local apply timestamp gives the best results with the possibility of using the timestamp to filter the latest version of a row based on operation type and position in trail file.

The DDL operations are supported unidirectional.

Oracle has just release Oracle GoldenGate for Big Data 21.5 ( see the announcement made by product management here). Among other great new features like Autonomous JSON Database (AJD) Delivery, Mongo DB Capture we have also Google BigQuery Stage & Merge.

BigQuery Stage & Merge uses stage and merge data flow to ingest data into Google Cloud Platform BigQuery. The changed data is staged in Google Cloud Storage (GCS) buckets temporarily and merged into target tables in micro batches.

In the second part of this blog we will see how stage and merge works and what are the differences between the ingestion with BigQuery Streaming API (presented in this first part) and BigQuery job API used by “Stage and Merge” data flow in Oracle GoldenGate for Big Data.

Real-time data replication to Google BigQuery using Oracle GoldenGate for Big Data

Scenario 1:

Conclusions:

Ionut Bruma

Principal Technology Specialist

Realtime Message Ingestion into Azure Service Bus with GoldenGate for Distributed Applications and Analytics

How to Replicate data to Azure Event Hub using Oracle GoldenGate for Big Data

Real-time data replication to Google BigQuery using Oracle GoldenGate for Big Data

Scenario 1:

Conclusions:

Authors

Ionut Bruma

Principal Technology Specialist

Realtime Message Ingestion into Azure Service Bus with GoldenGate for Distributed Applications and Analytics

How to Replicate data to Azure Event Hub using Oracle GoldenGate for Big Data