HeatWave is the only fully managed MySQL database service that combines transactions, analytics, machine learning, and GenAI services, without ETL duplication. HeatWave also includes HeatWave Lakehouse, allowing users to query data stored in object storage, MySQL databases, or a combination of both. Users can deploy HeatWave MySQL–powered apps on a choice of public clouds: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.
Any database administrator (DBA) will tell you how vital a reliable backup feature is to ensure data integrity and disaster recovery in their database systems (DB systems). Backups should be automated and run frequently, and in the event of failure, immediate alerts must be sent to the DBA before any data loss occurs.
The HeatWave Service in Oracle Cloud Infrastructure (OCI) provides a backup feature that meets these requirements, with backups scheduled every 24 hours or created on demand. While backups are designed to be reliable, proactive monitoring is necessary to ensure they complete successfully.
This blog explores how to monitor HeatWave backups using OCI’s Monitoring and Alarm services and includes details on how to set up alarms that instantly notify the user in case of backup failures, enabling quick action.
Overview of OCI Monitoring and Alarm Services
Oracle Cloud Infrastructure (OCI) Monitoring and Alarm services provide tools to track the health, performance, and availability of cloud resources. These services allow setting up alarms to notify users of critical issues, such as backup failures.
With OCI Monitoring, both active and passive monitoring of resources is possible through its Metrics and Alarms services, ensuring timely awareness of potential problems. The Monitoring service uses metrics to track resources, while the Alarms service triggers notifications based on specified conditions. The Alarms service ensures continuous awareness of resource status by sending notifications to the configured destination whenever an alarm-triggering condition is met. For more details, refer to the Overview of Monitoring.
Launch a HeatWave DB System and Enable Backups
To monitor HeatWave backups, first ensure that a HeatWave DB system is launched and backups are enabled. Backups can be created manually or scheduled automatically. Refer to the OCI documentation to launch a DB system. To create a manual backup, refer to the guide on creating a manual backup; to schedule automatic backups, refer to the OCI documentation on configuring automatic backups.
Backup Failures and Handling
In the event of a backup failure, the HeatWave service typically logs the failure and may attempt retries, depending on the configuration. However, it is critical for the user to be notified immediately of such failures to take corrective action.
For example, if a backup fails due to the disk being full, the database system will not have enough space to store the backup data, leading to repeated failures until storage is expanded. In such a case, increasing the DB system storage can help ensure there is sufficient space available for future backup operations to be successful.
For manual backups, the user can verify their status by checking the console immediately after initiating them. This allows the user to take quick action if something goes wrong.
Automatic backups, however, occur in the background without immediate oversight. Consequently, an automatic backup failure might go unnoticed until it’s too late, potentially compromising data integrity and recovery efforts.
This makes it crucial to have alerts in place that signal any automatic backup failures. Setting up alarms ensures quick action can be taken when needed.
Monitoring Backup Metrics and Configuring Notifications
Understanding and tracking the metrics emitted by HeatWave during backup operations is essential for ensuring the reliability of database systems.
- Backup Failure Metric: HeatWave Service emits the BackupFailure metric, which indicates whether a backup succeeded or failed.
- Alarms and Notifications: Setting up alarms based on this metric ensures that failures are promptly addressed.
Steps to set up Alarms for Backup Failures
To receive notifications for any backup failures, alarms can be configured based on the BackupFailure metric in OCI Monitoring. Here’s a simple way to get this done.
Start by accessing the OCI Console and navigating to your HeatWave DB system. From the Database section, click on DB systems under the HeatWave category. After locating the desired DB system, go to its details page as shown in Figure 1.

Figure 1. DB system Details Page
In the Monitoring section on the left-hand side, select Metrics as shown in Figure 2 to view the performance data related to the DB system.

Figure 2. Monitoring Section: Metrics Overview
The metric to focus on here is the BackupFailure metric. This metric indicates the success or failure of the backups: a value of 1 means the backup failed, while 0 means it succeeded. To create an alarm based on this metric, click on Options in the BackupFailure metric section and select Create an Alarm on this query as shown in Figure 3, which will lead to the alarm creation page.

Figure 3. Create an Alarm Option for BackupFailure Metric
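To make the 0/1 semantics of the metric concrete, here is a minimal Python sketch. It assumes a window of BackupFailure datapoints has already been collected into a list; the function name `any_backup_failed` and the sample values are hypothetical and only for illustration. Because each datapoint is either 0 or 1, taking the maximum over the window answers the question "did any backup in this window fail?".

```python
from typing import Sequence


def any_backup_failed(datapoints: Sequence[int]) -> bool:
    """Return True if any datapoint in the window reports a failed backup.

    Each datapoint is a BackupFailure value: 1 for a failed backup,
    0 for a successful one. An empty window is treated as "no failure".
    """
    # max() over 0/1 values is 1 as soon as a single failure is present.
    return max(datapoints, default=0) > 0


if __name__ == "__main__":
    print(any_backup_failed([0, 0, 0]))  # all backups succeeded
    print(any_backup_failed([0, 1, 0]))  # one backup failed
    print(any_backup_failed([]))         # no datapoints in the window
```

The same max-over-a-window idea is what an alarm condition on this metric can rely on.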
The alarm creation page will have most of the details pre-filled, as shown in Figure 4. Provide a descriptive name for the alarm, such as “Backup Failure Alarm for DBSystemName” to easily identify it later. Fill in the required metric description details.

Figure 4. Alarm Creation Page
The Trigger Rule defines the condition that activates the alarm. For a backup failure alarm on a specific DB system, consider the following condition template:
BackupFailure[Interval]{dbSystemId = "your_db_system_ocid"}.max() > Threshold
- Interval: Represents the duration to check for failures (e.g., 1m for one minute).
- Threshold: Sets the failure count threshold that will trigger the alarm (e.g., Threshold = 0 for any backup failure).
Example Condition: If you want to trigger an alarm for any detected backup failure within the last minute, set:
BackupFailure[1m]{dbSystemId = "your_db_system_ocid"}.max() > 0
This condition triggers the alarm if even a single backup failure occurs within the specified minute. Technically, the Monitoring service checks whether the maximum value of the metric over the interval exceeds 0. Because the metric is 1 for a failed backup and 0 for a successful one, the condition is met as soon as any failure is reported. Adjust the threshold, interval, and other values as needed to suit your monitoring requirements, as shown in Figure 5.

Figure 5. Alarm Trigger Rule Setup
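If the same trigger rule is needed for several DB systems, the condition string can be assembled programmatically instead of typed by hand. The following Python sketch fills in the template shown above. The helper name `build_backup_failure_query`, the loose interval check, and the example OCID are assumptions made for illustration; the query shape and the dbSystemId dimension are taken from the template in this post.

```python
import re


def build_backup_failure_query(db_system_ocid: str,
                               interval: str = "1m",
                               threshold: int = 0) -> str:
    """Fill in the BackupFailure alarm condition template for one DB system."""
    # Loose sanity check only: a number followed by m, h, or d (e.g. 1m, 5m, 1h).
    if not re.fullmatch(r"\d+[mhd]", interval):
        raise ValueError(f"unexpected interval: {interval!r}")
    # A double quote inside the OCID would break the quoted dimension value.
    if '"' in db_system_ocid:
        raise ValueError("OCID must not contain double quotes")
    return (
        f'BackupFailure[{interval}]'
        f'{{dbSystemId = "{db_system_ocid}"}}'
        f'.max() > {threshold}'
    )


if __name__ == "__main__":
    # Placeholder OCID, not a real resource.
    print(build_backup_failure_query("ocid1.mysqldbsystem.oc1..example"))
```

Note that the generated query uses straight double quotes around the OCID; curly quotes copied from a formatted page are likely to be rejected by the query editor.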
Notification settings can be configured according to preferred communication channels, such as email, SMS, or a custom endpoint. Either select an existing notification topic or create a new one directly during the alarm setup, as shown in Figure 6.

Figure 6. Notification Configuration for Alarm
Provide the rest of the information needed to create the alarm. For reference, see Creating a Basic Alarm.
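For teams that prefer to script alarm creation rather than click through the console, the settings collected in the steps above can be gathered into a single request body. The Python sketch below only builds and prints that body; it does not call OCI. The field names follow my understanding of the Monitoring API's alarm creation request and should be verified against the API reference. The helper name `build_alarm_payload`, the placeholder OCIDs, and the default namespace value `oci_mysql_database` are assumptions; use the namespace that appears pre-filled on the alarm creation page.

```python
import json


def build_alarm_payload(display_name: str,
                        compartment_id: str,
                        topic_ocid: str,
                        query: str,
                        namespace: str = "oci_mysql_database") -> dict:
    """Collect the alarm settings from the console steps into one dictionary."""
    return {
        "displayName": display_name,
        "compartmentId": compartment_id,
        # Assumes the metric lives in the same compartment as the alarm.
        "metricCompartmentId": compartment_id,
        "namespace": namespace,          # assumption, confirm in the console
        "query": query,                  # the trigger rule condition
        "severity": "CRITICAL",
        "destinations": [topic_ocid],    # notification topic to alert
        "isEnabled": True,
    }


if __name__ == "__main__":
    payload = build_alarm_payload(
        display_name="Backup Failure Alarm for DBSystemName",
        compartment_id="ocid1.compartment.oc1..example",   # placeholder
        topic_ocid="ocid1.onstopic.oc1..example",          # placeholder
        query='BackupFailure[1m]{dbSystemId = "your_db_system_ocid"}.max() > 0',
    )
    print(json.dumps(payload, indent=2))
```

Keeping the payload as plain data makes it easy to review in version control before handing it to whichever tool actually creates the alarm.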
Conclusion
Without a reliable backup and monitoring strategy, database failures can go unnoticed, putting critical data at risk and potentially leading to irreversible loss. Relying solely on backups without proactive monitoring means failures might be discovered when it is too late to take preventive action.
To mitigate this risk, OCI’s Monitoring and Alarm services provide a robust solution for tracking backup performance and instantly alerting administrators in case of failures. By setting up alarms for backup failures in HeatWave DB systems, potential issues can be detected early, ensuring timely corrective actions.
Immediate notifications of backup failures enable rapid corrective action, which supports data integrity and compliance. Combined with scheduled backups, OCI’s Monitoring and Alarm services make the HeatWave environment more reliable and help keep critical data securely backed up.
