MySQL HeatWave is the only fully managed MySQL database service that combines transactions, analytics, machine learning, and GenAI services, without ETL duplication. Also included is HeatWave Lakehouse, which enables users to query data in object storage, MySQL databases, or a combination of both. Users can deploy MySQL HeatWave–powered apps on a choice of public clouds: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure. 

 

Introduction

HeatWave Lakehouse provides the capability to store and query files in the OCI Object Store. Changes done to these files can be updated in HeatWave, but to do this, the user must issue a command. We have now made this task automatic, and this automation has been achieved by integrating with the OCI event mechanism. This post introduces this exciting new feature.

Analytics and timely business decision making require access to the latest data, whether from your database, object storage, or both. While MySQL HeatWave Lakehouse makes it easy to refresh one or multiple tables, even across schemas, with one simple command, the ability to automatically propagate changes from underlying data in object storage to the tables in HeatWave means fewer delays, avoiding stale data issues for frequently changing data, and with fewer operational overheads.

In this blog we will see how the automatic table refresh feature of HeatWave Lakehouse makes possible real-time analytics, simplifies data pipelines, and reduces maintenance overheads, while still providing unmatched query performance and price-performance. This feature is available with MySQL HeatWave release 9.4.1 and above.

In other words, life is simpler, and as fast as before, with the automatic refresh feature in HeatWave Lakehouse.

How to set it up

There are three requirements to enabling automatic refreshes in MySQL HeatWave Lakehouse (on OCI):

  1. The Object storage bucket where your table data is must emit events when data changes.
  2. An event stream that listens to these events when data is changed in your source bucket/folder.
  3. Telling your Lakehouse table to use this stream for automatic refreshes.

Let us look at these three steps in a little more detail. As always, you have the documentation for more details, including the syntax, etc… (link to documentation).

Note: you can watch this video, or follow the instructions below, or both.

 

Enabling Object events

Navigate to your bucket that contains the data you will be loading into HeatWave Lakehouse. Under the ‘Bucket Information’ tab, click the ‘Edit’ button to change this setting. By default, this option is disabled. Once you enable this option, any addition, deletion, or updation of objects in the bucket will emit an event (or events).

Bucket to enable events for
Enable bucket to emit object events

 

Checkbox to enable emitting object events
Checkbox to enable emitting object events

 

Enabling bucket for object events
Caption

The next step is for us to create an OCI Stream and then configure it to listen to events from this bucket by creating rules for the same.

2. Setting up a stream

From the Streams page, select the compartment⏤this is the compartment where your DB system and data is in. You can locate it by clicking the menu icon at the top left of your OCI page, clicking on ‘Analytics & AI’, and then selecting ‘Streaming’.

OCI Streaming
OCI Streaming

 

Click the ‘Create Stream’ button.

Creating a stream
Creating a stream

You will need to enter the following information on the next page. For more information on Streams, you can refer to the doc (link to doc).

 

Steps to create a stream
Steps to create a stream

 

  1. Name: a descriptive name
  2. Compartment
    • Verify the compartment is correct.
  3. Stream Pool
    • Stream pools are logical groupings for streams. Every stream must be a member of a stream pool. If you don’t create a stream pool, the Streaming service uses a default pool to contain your streams.
    • It is sufficient to use the default stream pool (link to Stream Pools doc)
  4. Stream Settings

Note that these settings cannot be changed once the stream is created. You will have to delete the stream and create a new one if you want to change these settings. The ‘Retention’ period is in hours.

    • Defines how long the messages are kept in the stream.
      • Number of partitions: it is recommended that you create at least 2 streams, one with a single partition and one with multiple partitions. Depending on how many events are created from the buckets, multiple partitions may be required to stay inside the write rate limit.
Setup
Stream created

 

Once the stream is created, it must be accessible to your MySQL HeatWave DB system. This means should assign the right policy to your event stream. For more information on setting up dynamic groups and policies for your HeatWave system, refer to this blog (link to blog).

Navigate to where you defined your policies and add a statement.

From ‘Identity & Security’, select ‘Policies’.

Test
Create or Edit Policies

Select the policy; in our case the policy is named ‘lakehouse-policy’. Click ‘Edit Policy Statements’.

Edit Policy statements
Edit policy statements

 

Click ‘Additional rule’ and then enter the new policy rule.

Edit policy statements
Edit policy statements

 

This is how the statement will look like: substitute the stream name, compartment, and dynamic group to match yours:

Allow dynamic-group some-group to use stream-pull in compartment my-compartment where any {target.stream.id = ' ocid1.stream.oc1.iad.amaaaaaaao2swpxxxxxxxxxxxxxxdjicz2pvq'}

 

If you need the stream OCID, simply go back to the stream you had created and click the ‘Copy’ link next to the OCID of your stream:

OCID for stream created
OCID for stream created

The final step is to set up a rule and its event filtering. This means creating a rule to send events from our bucket to this particular stream that we have just created.

Go to the Events Service’s Rules page (you can search for ‘Rules’ in the OCI Search box).

Rules
Creating a rule

 

Creating a rule
Creating a rule and editing it
  1. Enter a descriptive name for the rule.
  2. Select ‘Event Type’ as the condition.
    1. Select ‘Object Storage’ from the Service Name dropdown.
    2. Select and add all these: ‘Object – Create’, ‘Object – Update’, ‘Object – Delete’ from the Event Type dropdown.
  3. From the Actions section, select ‘Streaming’ as the Action Type.
    1. Select the Compartment where your HeatWave system is running, or will run.
    2. Select the stream you created from the Stream dropdown.

Now click ‘Another Condition’.

  1. Select ‘Attribute’ as the condition.
  2. Select ‘resourceName’ from the attribute name dropdown.

In the ‘Attribute Value’ box, enter the prefix string (or exact name). In our case, it is ‘autorefresh/*’ (without the quotes).

Click ‘Save Changes’.

Filtering for events from a specific folder
Filtering for events from a specific folder

Enabling auto-refresh for your table(s)

At this point, your object store bucket is emitting events, you have an event stream that is receiving these events, and you have a policy defined that lets your DB system (in the dynamic group you specify) receive these events.

The final step is to provide the stream OCID to the ENGINE_ATTRIBUTE section of your table definition to enable automatic refreshes.

You can provide this attribute when running Autopilot, or via the CREATE TABLE statement if manually creating the table.

ENGINE_ATTRIBUTE = '{"auto_refresh_event_source": " ocid1.stream.oc1.iad.amaaaaxxxxxxxxxxxxxxxxxxxxxjicz2pvq",
    
      ...}'

If, after creating a table (without automatic refresh enabled), you decide that you need a particular table to automatically refresh, you can do so by running the ALTER TABLE command and adding this engine attribute.

ALTER TABLE `schema`.`table` ENGINE_ATTRIBUTE = '{
    
        <existing contents>,
    
        "auto_refresh_event_source" = "<stream OCID>"
    
    }';

Once you have created your table and have the above steps implemented, you are all set. As data comes into your object storage bucket, or is modified, or deleted, your table will refresh automatically.

This feature is available with MySQL HeatWave release 9.4.1 and above and makes near real-time analytics a reality with data from object storage, with a few simple steps.

Do give this feature a try and let us know your feedback!

More Information

You can find more information about HeatWave at:

General Information

  • Try HeatWave on OCI (link)
  • Try HeatWave on AWS (link)
  • HeatWave Lakehouse website (link)
  • Technical brief (PDF)
  • Documentation (link)
  • Live Labs (link)

Blog posts

  • Automatically detect delimiters in CSV files (link)
  • Getting started with MySQL HeatWave Lakehouse (link)
  • Configuring Resource Principals for HeatWave Lakehouse (link)
  • Exporting to object storage with HeatWave (link)
  • MySQL HeatWave Lakehouse on AWS (link)
  • Oracle ClouldWorld Keynote: The Future of Scale-out Data Processing with HeatWave Lakehouse (link)

Videos and demos

  • CloudWorld Keynote with HeatWave Lakehouse (link)
  • HeatWave Lakehouse on AWS (link)
  • Data Warehouse and Lakehouse Analytics with HeatWave Lakehouse (link)