In-Database Machine Learning with MySQL HeatWave

MySQL HeatWave is a fully managed database service that lets users develop and deploy secure cloud-native applications using the world’s most popular open source database. MySQL HeatWave is the only MySQL service with a highly scalable, integrated query accelerator, which enables customers to run OLTP, OLAP, and mixed workloads on a single database without ETL and without changing their applications.

With this release of MySQL HeatWave, we have introduced a number of new capabilities:

In-database Machine Learning

Until now, customers who needed to apply machine learning (ML) to data in MySQL had to extract the data from the database (ETL) and then use third-party libraries or services to train a model or make predictions. In addition to being onerous, time-consuming, and expensive, this process can spread copies of the data outside the database, creating security and governance issues.

With HeatWave ML, MySQL HeatWave now has native support for in-database machine learning. Customers can now perform machine learning operations (training, inference, and explanation) inside the database using SQL. Neither the data nor the ML model leaves the database service, as shown below.

MySQL HeatWave in-database Machine Learning

HeatWave ML provides several advantages:

Fully Automated: HeatWave ML fully automates the training process and eliminates the need for a data scientist.
SQL Interface: All ML operations, including training, inference, and explanation, can be performed through a SQL interface that is familiar to database users.
Secure: The data and the model can be secured with the database's access control and security mechanisms.
Explanations: All models created by HeatWave ML can be explained. This encourages enterprises to adopt machine learning, because explanations make models more transparent, build trust, help demonstrate fairness, and help meet regulatory requirements.
Performance: HeatWave ML outperforms competing services: compared with Redshift ML, it is 25x faster with better accuracy at 1% of the cost, all without data or models leaving the database. HeatWave ML also scales with the size of the cluster. Customers can run the tests themselves using the fully transparent benchmark scripts and configurations.
No additional charge: HeatWave ML is available at no additional cost to MySQL HeatWave users.
Easy Upgrades: HeatWave ML is built on state-of-the-art open-source Python ML packages, so newer and improved versions can be adopted quickly and continually.

Real-time elasticity

HeatWave is highly scalable: its performance improves nearly linearly with the size of the cluster. As a result, many MySQL HeatWave users start with a cluster of a certain size and then scale it up or down based on the needs of their workload. Until now, customers had to resize the cluster manually, and for the duration of the operation the HeatWave cluster was unavailable for queries and DML, even though the MySQL database itself remained available.

MySQL HeatWave Real-time Elasticity

We are now introducing real-time elasticity: MySQL HeatWave can be resized to any number of nodes without any manual steps, and the HeatWave cluster remains fully available for all operations (queries, DML, and loads) throughout the resize. The resize involves no data movement among the HeatWave nodes, so query performance on the existing cluster is unaffected while the cluster is being scaled up. Furthermore, because data is loaded into the HeatWave nodes in parallel, the time it takes to resize is independent of the number of nodes being added. At the end of the resize operation, the data is fully balanced across all HeatWave nodes.
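The balancing and parallel-load claims above can be illustrated with a toy model. This Python sketch is a hypothetical illustration, not HeatWave's resize algorithm: it assumes the data is split into equal-size partitions that any node can load from shared storage, and the helper names `assign_partitions` and `parallel_load_seconds`, the partition size, and the per-node load rate are all invented for the example. Partitions are spread round-robin so per-node counts differ by at most one, and wall-clock load time is the slowest node's time, because nodes load concurrently.

```python
# Toy model of a balanced, parallel resize (hypothetical; not HeatWave's algorithm).
# Assumption: data is split into equal-size partitions that any node can load
# from shared storage, so a resize only decides which node owns which partition.


def assign_partitions(num_partitions: int, num_nodes: int) -> dict[int, list[int]]:
    """Round-robin partitions over nodes so per-node counts differ by at most one."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    owners: dict[int, list[int]] = {node: [] for node in range(num_nodes)}
    for partition in range(num_partitions):
        owners[partition % num_nodes].append(partition)
    return owners


def parallel_load_seconds(
    owners: dict[int, list[int]], partition_gb: float, node_gb_per_s: float
) -> float:
    """Nodes load concurrently, so wall-clock time is the slowest node's time."""
    return max(len(parts) * partition_gb / node_gb_per_s for parts in owners.values())


if __name__ == "__main__":
    owners = assign_partitions(num_partitions=60, num_nodes=6)
    print({node: len(parts) for node, parts in owners.items()})  # 10 partitions each
    print(parallel_load_seconds(owners, partition_gb=10.0, node_gb_per_s=1.0))  # 100.0
```

In this example, 600 GB spread over 6 nodes loads in 100 seconds of wall-clock time, whereas a single loader working through the same partitions one after another would need 600 seconds.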

More data per node

MySQL HeatWave is an in-memory system: the data being processed must reside in the aggregate memory of the HeatWave cluster, so the required cluster size depends on the amount of data. We have introduced two features, Bloom filters and compression, that nearly double the amount of data HeatWave can process per node. This results in nearly a 50% cost reduction for customers.

MySQL HeatWave Data Processed x2

The amount of data that can be processed per node depends on the data characteristics. For industry-standard benchmarks such as TPC-H, HeatWave can now process 820 GB of data per node, up from 410 GB. Because the amount of data that can be processed per node has doubled, the minimum cluster size required to run a workload is halved, which cuts the cost to the customer in half. We achieve this cost reduction without affecting HeatWave's price performance or the time it takes to load data into HeatWave memory.
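The sizing arithmetic behind the halving claim can be checked with a few lines of Python. This sketch assumes a cluster must hold the entire data set in aggregate memory, uses the TPC-H per-node figures quoted above (410 GB before, 820 GB now), and treats cost as proportional to node count; the 10 TB data size and the `min_nodes` helper are hypothetical. Because node counts round up, the saving comes out close to, but not always exactly, 50%.

```python
import math


def min_nodes(data_gb: float, gb_per_node: float) -> int:
    """Smallest node count whose aggregate memory can hold the whole data set."""
    return math.ceil(data_gb / gb_per_node)


data_gb = 10_000  # hypothetical 10 TB workload
before = min_nodes(data_gb, 410)  # previous TPC-H per-node figure
after = min_nodes(data_gb, 820)   # new TPC-H per-node figure
print(before, after, f"{after / before:.0%}")  # 25 13 52%
```

Here the cluster shrinks from 25 to 13 nodes, so under the proportional-cost assumption the new cluster costs 52% of the old one.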

Pause & resume HeatWave

For customers who use a HeatWave cluster sporadically, or who want to pause it while it is idle to reduce cost, we have introduced an efficient pause and resume capability. HeatWave maintains a continuously updated copy of the in-memory representation of the data, metadata, and MySQL Autopilot statistics in HeatWave storage, which resides in the object store. When the user pauses the cluster, the system stops it immediately. On resume, the system reloads the data, metadata, and MySQL Autopilot statistics into HeatWave memory at close to network bandwidth.

MySQL HeatWave Pause & Resume

HeatWave is a scale-out system in which all nodes load data in parallel, so the load time depends only on the amount of data per node and is independent of the overall data size. Since each node holds at most what fits in its memory, the resume operation completes in roughly constant time regardless of the total data size: around 5 minutes in a large OCI region. Because the MySQL Autopilot statistics and metadata are also saved and restored, users can pause the cluster without losing the workload history, which is used for various optimizations. Pause and resume is another way HeatWave reduces cost for customers without impacting performance.
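The reasoning that resume time stays roughly constant can be sketched as a simple model: if the cluster is sized so each node holds at most its per-node capacity, and all nodes reload their shares concurrently, wall-clock time is bounded by one node's capacity divided by its load rate, however large the data set grows. The `resume_minutes` helper and the 200 GB-per-minute per-node load rate are hypothetical values chosen for illustration; they are not measured HeatWave figures and are not meant to reproduce the 5-minute number quoted above.

```python
import math


def resume_minutes(total_gb: float, gb_per_node: float, node_gb_per_min: float) -> float:
    """Hypothetical model: every node reloads its balanced share concurrently."""
    if total_gb <= 0:
        return 0.0
    nodes = math.ceil(total_gb / gb_per_node)  # cluster sized to fit the data
    share_gb = total_gb / nodes                # data each node must reload
    return share_gb / node_gb_per_min          # parallel, so one node's time is the total


for total_gb in (1_000, 10_000, 100_000):
    # Time stays at or below 820 / 200 = 4.1 minutes as the data grows 100x.
    print(total_gb, round(resume_minutes(total_gb, 820, 200), 2))
```

The design point is that adding data adds nodes rather than adding per-node work, so the bound depends only on per-node capacity and load rate.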

Other New Features

Support for Views: MySQL HeatWave now supports views, so queries that reference views can be accelerated. Queries executed on views have the same offload prerequisites as queries executed on the base tables; for example, the base tables must be loaded into HeatWave.

More columns and wider columns: The number of columns supported in a base table loaded into HeatWave has increased from 473 to 1,017, which is also the InnoDB column limit. The VARCHAR/TEXT column width limit has increased from 8 KB to 64 KB.

Native IN-clause Support: HeatWave now executes IN clauses natively. During execution, the elements of the IN list are stored in a cache-efficient data structure, and each core in the HeatWave node streams the column data through it. This technique provides up to 100x acceleration for queries with a large number of elements in the IN list.
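A rough Python sketch of the IN-clause technique described above follows. HeatWave's actual data structure is not specified here, so this sketch assumes a hash set (`frozenset`) built once from the IN list, which turns each membership test into an average constant-time lookup instead of a scan of the whole list. The column is split into chunks, one per simulated core, and each worker streams its chunk through the lookup. The `in_list_filter` name and `num_cores` parameter are hypothetical. Python threads illustrate the structure of the scan, not HeatWave's speedup, since the interpreter lock limits CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def in_list_filter(column, in_list, num_cores=4):
    """Return positions of rows whose value appears in in_list (hypothetical sketch)."""
    lookup = frozenset(in_list)                         # built once per query
    chunk_size = max(1, -(-len(column) // num_cores))   # ceil(len / num_cores)
    chunks = [
        (start, column[start:start + chunk_size])
        for start in range(0, len(column), chunk_size)
    ]

    def scan(chunk):
        # Each worker streams its slice of the column through the lookup structure.
        start, values = chunk
        return [start + offset for offset, value in enumerate(values) if value in lookup]

    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        per_chunk = pool.map(scan, chunks)  # map preserves chunk order
        return [row for rows in per_chunk for row in rows]


print(in_list_filter([5, 3, 9, 3, 7, 1, 9], [9, 3]))  # [1, 2, 3, 6]
```

Building the set once and sharing it read-only across workers means the per-row cost no longer grows with the length of the IN list, which is where a naive list scan would slow down.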

Today we also released new industry-standard 10 TB TPC-DS* benchmark results demonstrating that other cloud database services are slower and more expensive than MySQL HeatWave. MySQL HeatWave is:

• 14.4 times better in price performance than Snowflake

• 4.8 times better in price performance than Amazon Redshift

• 14.9 times better in price performance than Azure Synapse

• 12.9 times better in price performance than Google BigQuery

MySQL HeatWave Price-Performance

Dig Deeper:

See product details: Oracle MySQL HeatWave
Get more technical insights on how to use Machine Learning in HeatWave:
Get US$300 in credit and try MySQL HeatWave for 30 days: Try MySQL HeatWave for Free
MySQL HeatWave Benchmarks
MySQL HeatWave Technical Brief
MySQL HeatWave ML technical brief

Visit Oracle.com/mysql and follow us on Twitter @MySQL, Facebook and LinkedIn.

* Benchmark queries are derived from the TPC-DS benchmark, but results are not comparable to published TPC-DS benchmark results since they do not comply with the TPC-DS specification.

Nipun Agarwal

Senior Vice President, MySQL HeatWave