HeatWave is the only fully managed MySQL database service that combines transactions, analytics, machine learning, and GenAI services, without ETL duplication. HeatWave also includes HeatWave Lakehouse, allowing users to query data stored in object storage, MySQL databases, or a combination of both. Users can deploy HeatWave MySQL–powered apps on a choice of public clouds: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.
Are you migrating your MySQL workloads to OCI’s HeatWave service? If you are using Debezium Change Data Capture (CDC) to stream data changes from MySQL binary log to Kafka, you might be facing a puzzling problem: while INSERT and DELETE events are extracted correctly from binary log, UPDATE events are mysteriously missing and skipped by Debezium CDC. This can be very frustrating and totally unexpected, as Debezium worked flawlessly in your previous MySQL environments, making this issue difficult to diagnose and to resolve on HeatWave.
Let’s roll up our sleeves and get your Debezium CDC working perfectly with HeatWave MySQL.
The Root Cause: binlog_row_value_options=PARTIAL_JSON
According to Debezium CDC documentation (https://debezium.io/documentation/reference/stable/connectors/mysql.html), there are a few mandatory parameters that need to be configured on MySQL in order for Debezium CDC to extract data changes by reading change events recorded in MySQL’s binary log. Here is the list of binary log related parameters:
| Parameters | Standard MySQL | HeatWave MySQL |
|---|---|---|
| binlog_row_image | FULL | FULL |
| binlog_row_value_options | “” (empty) | PARTIAL_JSON |
In the standard MySQL, the binlog_row_image parameter (which defines how row images are written to the binary log) is often set to FULL by default. This ensures that for UPDATE events, the binlog records both the “before” and “after” images of the modified rows, containing all column values. Debezium CDC relies heavily on these before and after images to provide comprehensive change events, especially for UPDATE operations.
However, one of the related default binlog parameters, binlog_row_value_options, is set to PARTIAL_JSON in HeatWave MySQL to optimize binary log operations, instead of the default empty string (“”) in the standard MySQL. This setting, while potentially optimizing for certain internal HeatWave operations, affects how UPDATE events on JSON data type are written to the binary log.
When binlog_row_value_options is set to PARTIAL_JSON, HeatWave MySQL will be enabled to write only the modified parts of the JSON document to the after-image for the update in the binary log, rather than writing the full document for optimization purposes. This prevents Debezium from reconstructing the complete “before” and “after” state of the row as it expects. As a result, Debezium cannot properly parse these UPDATE events, leading to them being ignored (as in our observations as missing updates) in the CDC stream to Kafka. Although this setting only affects data rows with JSON data types, Debezium would choose to ignore UPDATE events because the before and after images of the updated data rows are not captured in full.
The Fix
To enable Debezium to correctly capture UPDATE events, you must explicitly set binlog_row_value_options="" (empty value)
You can modify this parameter through the OCI Console for your HeatWave MySQL DB System’s configuration. You will need to create a custom configuration for your DB System if you haven’t already.
Action: Set binlog_row_value_optios to an empty string
- Navigate to your MySQL DB System in the OCI Console.
- Go to Configuration.
- Create a new configuration from one of the existing configuration template. For example, MySQL.2
Create a new MySQL configuration based on existing template - Add the binlog_row_value_options, the initial value will be defaulted to PARTIAL_JSON
Add the binlog_row_value_options parameter to the new configuration - Clear its value so that the value is an empty string
Reset the binlog_row_value_options to empty string - Save the new configuration
- Navigate to MySQL DB System, select your DB System and edit the configuration
Edit the configuration of your MySQL DB System - Select the new configuration
Select the newly created custom configuration - Save changes to apply the new configuration with the new binlog_row_value_options to your DB System
Apply the new binlog_row_value_options to your MySQL DB System
Debunking the RELOAD Privilege Myth for Initial Snapshots
Many Debezium users, encountering issues, might consult Debezium’s documentation that lists the RELOAD privilege as a requirement for the MySQL user account configured for the Debezium CDC connector. This leads to confusion because HeatWave MySQL DB Systems generally do not grant the RELOAD privilege by default for security and operational reasons.
The Good News: The lack of the RELOAD privilege is not the real cause of the missing UPDATE events!
The primary purpose of RELOAD privilege as one of the requirements for Debezium CDC is for the initial data snapshot. HeatWave MySQL provides dynamic privileges to support Debezium’s snapshot mechanism: FLUSH_TABLES and LOCK TABLES (https://docs.oracle.com/en/database/mysql/heatwave-aws/mysql-server-default-privileges.html) These privileges allow Debezium to acquire the necessary global lock or tables locks required for a consistent initial snapshot without needing the broader RELOAD privilege.
Conclusion
Migrating your MySQL workloads to HeatWave is extremely beneficial because you would enjoy blazing fast performance on HeatWave and other advanced capabilities but default parameter changes can introduce unexpected challenges. By understanding and adjusting the binlog_row_value_options in your HeatWave MySQL DB system, and by recognizing that the RELOAD privilege isn’t a blocker, you can ensure your Debezium CDC connector seamlessly capturing all UPDATE transactions from HeatWave MySQL.
This simple adjustment can save you hours of debugging and unlock the full potential of migrating your workloads to HeatWave MySQL. As the result of this finding, I collaborated with RedHat to produce an enhancement request on Debezium CDC to support PARTIAL_JSON with binlog_row_value_options for optimized performance (https://issues.redhat.com/projects/DBZ/issues/DBZ-9123?filter=allissues).
