Recently I was asked by one of our HeatWave users about using Airbyte to synchronize data to HeatWave in Oracle Cloud Infrastructure (OCI), specifically I was asked to provide guidance to build a test environment to test out moving data from one source to a target in HeatWave on OCI. 

Based on my research on Airbyte, I chose to use the Airbyte cloud service from the following list of options availalbe:

  1. Self-hosted using Airbyte open source edition
  2. Self-hosted using Airbyte Enterprise edition
  3. Managed service hosted by Airbyte Cloud

In my test environment, I had built 2 HeatWave MySQL instances in one of the OCI region with one instance as the source and the other instance as the destination. Both these instances were exposed to public internet with OCI Load Balancer so that Airbyte ETL service to connect to both of these databases (detail of how you can use OCI load balancer to acces HeatWave). The following diagram illustrates my test environment:

HeatWave Airbyte Test Environment
Connecting Airbyte to HeatWave on OCI

Airbyte offers 14-day trial accout to test drive Airbyte after you have registered and successfully perform the initial synchronization using the following steps:

  1. Define source
  2. Define destination
  3. Select stream (select tables on the source to be streamed to the destination)
  4. Create connection (specify scheduling and execution of the data stream)

On Airbyte cloud, there are 3 tiers of Airbyte connectors available with different support levels that can be selected to define the source and destination

  • Airbyte connectors
  • Marketplace connectors
  • Custom connectors

In my source definition, I chose to use the MySQL connector from the Airbyte connectors list. The MySQL connector provided by Airbyte supports update method using changed data capture by reading the MySQL binanry log, and the following is the list of configuration parameters you need to configured in the your MySQL source database (default configuration file is my.cnf)

log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_expire_logs_seconds = 864000

binlog_expire_logs_seconds specifies the retention period of binary logs in the MySQL source, the binanry logs will be kept in the MySQL source for 24 days before they are purged in the example above. You can change this value according to your requirements. If your database is busy and generates lots of binanry logs, you may want to keep the binary logs longer so that Airbyte can extract and load to the desitation. 

Besides these parameters, you would also need to ensure the user account configured in your source definition has the privileges required to read the binary log as below:

GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO <user_name>;
 
MySQL Source
Define MySQL as Source

In my destination configuration, I could only use the MySQL connector from the Marketplace connectors list, hence there are 2 different MySQL connectors used in source defition and destination definition, one supported by Airbyte and ther other supported by the Airbyte community.  

MySQL Destination
Define MySQL as Destination

You need to specify the following values for both the source and destination:

  • Host (IP or domain name) of the database
  • User account and password of MySQL
  • Port number (TCPIP port to connect to MySQL, default is 3306)

Once the definition is completed, you should test the connections to make sure you can access to both your source and target databases.

One of the prerequisites to use Airbyte cloud is to allow a list of Airbyte servers to access both the source and destionation in HeatWave/OCI

Allow list for Airbyte servers
Allow list for Airbyte Servers

In my test environment, I was using OCI Load Balancer to connect to HeatWave instances for testing purpose, therefore I added these list of IPs in the Security List of Load Balancer 

OCI Security List to allow Airbyte access
OCI Security List to allow Airbyte access

The next step was to select the tables on the source to be synchronized to the destionation and the synchronization mode between source and destination. 

Data source seletion
Table selection and synchronization mode

The final step of the Airbyte stream is to create the connection to stream data from the source to the destination and attached scheduling and synchronization policies to the stream.

Create data stream connection
Create data stream to connect the source and the destination with scheduling policies

The connection is completed and I have all the data synchronized from HeatWave MySQL source to the other HeatWave MySQL destination!

Connection is live!
Data stream connection was created
Source synchronized successfully to Destination
Source synchronized successfully to Destionation

 

If you are interested to know more about HeatWave on OCI, please check out the following: