Change data capture (CDC) is a technique for reading changes to data at the source, usually a database, and converting them into events. The events are delivered to middleware, from which other services consume them and act on them.

This blog post discusses a CDC solution based on the Debezium MySQL Connector. The connector reads the MySQL binary log (binlog) to detect changes to the database state and translates them into events. These events are delivered to the Oracle Cloud Infrastructure (OCI) Streaming service, which is Kafka-compatible. For the MySQL system, we use a managed MySQL instance in OCI.
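To make "consume these events and act" concrete, here is a minimal sketch of a downstream consumer applying change events to an in-memory cache. It assumes the standard Debezium event envelope (an `op` code plus `before` and `after` row images, optionally wrapped in `payload`). The `apply_change` function, the `id` key column, and the sample rows are hypothetical and not part of the original solution.

```python
def apply_change(cache, event, key="id"):
    """Apply one Debezium-style change event to a dict cache keyed by primary key.

    `key` is a hypothetical primary-key column name used for illustration.
    """
    if event is None:
        # Tombstone record (null value) that may follow a delete: nothing to apply.
        return cache
    # With schemas enabled in the JSON converter, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: the "after" image holds the current row
        row = payload["after"]
        cache[row[key]] = row
    elif op == "d":
        # delete: only the "before" image is populated
        cache.pop(payload["before"][key], None)
    return cache


# Hypothetical sample events for a table with columns (id, name).
sample_events = [
    {"payload": {"op": "c", "before": None, "after": {"id": 1, "name": "Ada"}}},
    {"payload": {"op": "u", "before": {"id": 1, "name": "Ada"},
                 "after": {"id": 1, "name": "Ada L."}}},
]
```

A real consumer would read these events from the stream for the table and call `apply_change` for each record; the Kafka client wiring is omitted here.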

Solution overview

We used the following OCI components to build the solution:

  • Stream pools

  • Streams

  • Connect harness

  • MySQL database

  • Kafka Connect virtual machine (VM)

  • Virtual cloud network (VCN)

Within OCI, the Kafka Connect ecosystem uses the OCI Streaming Kafka Connect harness. The Connect harness provides the compacted streams for configuration, offset, and status topics. For more information on the OCI Streaming Kafka Connect harness, see Using Kafka Connect.
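As an illustration of how a Kafka Connect worker might be pointed at the Connect harness, the sketch below renders a set of worker properties. It assumes that the three compacted topics are named after the Connect harness OCID with `-config`, `-offset`, and `-status` suffixes, and that OCI Streaming authenticates over SASL_SSL with the PLAIN mechanism. Verify both against the Using Kafka Connect documentation before relying on them. The function name, the `group_id` default, and all argument values are hypothetical placeholders.

```python
def connect_worker_properties(bootstrap, harness_ocid, sasl_username, auth_token,
                              group_id="debezium-connect"):
    """Render Kafka Connect distributed-worker properties as text.

    Assumption: harness topics follow the '<harness OCID>-config/-offset/-status'
    naming convention. All inputs are placeholders supplied by the caller.
    """
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{sasl_username}" password="{auth_token}";'
    )
    props = {
        "bootstrap.servers": bootstrap,
        "group.id": group_id,
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "config.storage.topic": f"{harness_ocid}-config",
        "offset.storage.topic": f"{harness_ocid}-offset",
        "status.storage.topic": f"{harness_ocid}-status",
    }
    # The worker, and the producers and consumers it creates, each need the
    # security settings; Connect forwards "producer."/"consumer." prefixed keys.
    for prefix in ("", "producer.", "consumer."):
        props[prefix + "security.protocol"] = "SASL_SSL"
        props[prefix + "sasl.mechanism"] = "PLAIN"
        props[prefix + "sasl.jaas.config"] = jaas
    return "\n".join(f"{k}={v}" for k, v in props.items())
```

Writing the returned text to a properties file and passing that file to the Connect distributed worker start script is one way to use it; keep the auth token out of version control.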

The following graphic depicts the architecture for this solution:

Figure: The architecture for the Debezium setup.

This solution also uses the following details for its setup:

  • A VCN with one private subnet to host the MySQL database and one public subnet for the VM where the Kafka Connect instance runs.

  • A stream pool with public endpoints. The pool holds the streams created by the Debezium MySQL Connector, one stream per table, each recording the change events for that table.

  • An OCI Streaming Connect harness configuration, which provides the compacted topics for status, offset, and configuration that the Kafka Connect instance uses.

  • A public subnet hosts the VMs running the open source Apache Kafka Connect.

  • Users connect to the VMs through their assigned public IP addresses.
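The one-stream-per-table layout follows from how Debezium names its topics: each captured table gets a topic of the form `<topic prefix>.<database>.<table>`. The sketch below builds a connector registration payload and lists the topics it would produce. It uses Debezium 2.x property names (`topic.prefix`, `schema.history.internal.*`); 1.x releases use `database.server.name` and `database.history.*` instead, and the source does not say which version the stack deploys. The function names, the `server.id` value, and every argument are hypothetical.

```python
import json


def debezium_mysql_connector(name, host, user, password, topic_prefix, tables, bootstrap):
    """Build a Kafka Connect registration payload for the Debezium MySQL connector.

    `tables` holds fully qualified names such as "shop.orders" (hypothetical).
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "tasks.max": "1",
            "database.hostname": host,
            "database.port": "3306",
            "database.user": user,
            "database.password": password,
            # Must be unique among clients of the MySQL server; placeholder value.
            "database.server.id": "184054",
            "topic.prefix": topic_prefix,
            "table.include.list": ",".join(tables),
            "schema.history.internal.kafka.bootstrap.servers": bootstrap,
            "schema.history.internal.kafka.topic": f"{topic_prefix}.schema-history",
        },
    }


def change_topics(topic_prefix, tables):
    """Topics (streams) Debezium writes change events to, one per captured table."""
    return [f"{topic_prefix}.{table}" for table in tables]


def to_json(payload):
    """Serialize the payload for a POST to the Kafka Connect REST API."""
    return json.dumps(payload, indent=2)
```

Against a SASL_SSL endpoint such as OCI Streaming, the schema history producer and consumer probably need their own security properties as well, passed through with the `schema.history.internal.producer.` and `schema.history.internal.consumer.` prefixes.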

Try it yourself

You need quotas and permission to manage the following types of resources in your tenancy:

  • VCNs, internet gateways, route tables, and subnets

  • Compute instances

  • Stream pools

  • Streams

  • Connect harness

  • MySQL

If you don’t have the required permissions and quota, contact your tenancy administrator. For more details, see Policy Reference, Service Limits, and Compartment Quotas.

Deploy using Oracle Resource Manager

Use the following steps to deploy the architecture:

  1. Click the Deploy to Oracle Cloud button. If you aren’t already signed in, enter the tenancy and user credentials when prompted.

  2. Review and accept the terms and conditions.

  3. Select the region where you want to deploy the stack.

  4. Follow the on-screen prompts and instructions to create the stack.

  5. After creating the stack, click Terraform Actions, and select Plan.

  6. Wait for the job to be completed and review the plan. To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.

  7. If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
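Once the Apply job finishes, one way to check that the pipeline is alive is to query the Kafka Connect REST API on the VM for the connector status. The sketch below assumes the worker listens on the default REST port 8083 and that a connector has been registered. The connector name is not given in this post, so any name you pass is your own, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_status(name, base_url="http://localhost:8083"):
    """GET /connectors/<name>/status from a Kafka Connect worker.

    base_url assumes the default REST port; adjust it for your VM.
    """
    with urlopen(f"{base_url}/connectors/{name}/status") as resp:
        return json.load(resp)


def connector_is_running(status):
    """True when the connector and every one of its tasks report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    return connector_ok and bool(tasks) and all(
        task.get("state") == "RUNNING" for task in tasks
    )
```

A FAILED task in the status response usually carries a `trace` field with the stack trace, which is a good first place to look when events do not appear in the streams.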

Conclusion

The CDC approach simplifies integrating applications that have multiple components. It lets you focus on business logic and on recording state in a database, while the CDC pipeline takes care of the rest by publishing events to downstream systems. Those systems can then update a cache or take other appropriate action.

Try out the solution yourself! Sign up for the Oracle Cloud Free Tier or sign in to your account.