Introducing Natural Language to SQL for MySQL HeatWave

MySQL HeatWave is the only fully managed MySQL database service that combines transactions, analytics, machine learning, and GenAI services, without ETL duplication. It also includes MySQL HeatWave Lakehouse, allowing users to query data stored in object storage, MySQL databases, or a combination of both. Users can deploy MySQL HeatWave–powered apps on a choice of public clouds: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.

We are excited to announce the release of a new Natural Language to SQL (NL2SQL) feature in version 9.4.1 for MySQL HeatWave on OCI, AWS, and Azure.

This innovative capability enables users to easily convert natural language queries into SQL statements, simplifying data analysis and accelerating insights.

With NL2SQL, both technical and non-technical users can interact with their MySQL HeatWave databases more intuitively, enhancing productivity and unlocking greater value from their data.

Feature Overview

Large Language Models (LLMs) are at the heart of the NL2SQL feature, enabling the translation of natural language queries into SQL. However, relying solely on LLMs is not enough to ensure precise and relevant SQL statements, especially when working with complex databases containing numerous tables and columns.

To address this, our solution automatically collects relevant database schema metadata—including table names, column names, and their relationships—for each Natural Language query.

This metadata is then formatted as additional textual context and seamlessly incorporated into the LLM prompt. By narrowing the scope to only the relevant parts of the schema, the model can more accurately interpret the user’s intent and generate SQL statements tailored to the database’s structure.

Features of the NL2SQL support in MySQL HeatWave

The process begins when a user submits a natural language question. The system selects and summarizes only the schema elements most relevant to the question, forming what’s known as an “Augmented Prompt.”

This prompt, containing both the user’s question and the schema details, is then passed to the LLM.

After the LLM generates a SQL statement, the system parses the output, validates its SQL syntax, and applies self-reflection—an additional review step to catch potential inconsistencies or errors.

Only after passing these validation stages is the SQL statement executed against the database, retrieving a result set that accurately answers the user’s original question.

This multi-layered approach ensures both ease of use and reliability, making data exploration intuitive and efficient for all users.

Using NL2SQL in MySQL HeatWave

The NL2SQL feature is accessed using the stored procedure

CALL sys.NL_SQL("What is the total number of bookings priced over $200?", @output, NULL);
    +---------------------------------------------------------------------------+
    | Executing generated SQL statement... |
    +---------------------------------------------------------------------------+
    | SELECT COUNT(`booking_id`) FROM `airportdb`.`booking` WHERE `price` > 200 |
    +---------------------------------------------------------------------------+
    
    +---------------------+
    | COUNT(`booking_id`) |
    +---------------------+
    | 32699080 |
    +---------------------+

This will automatically identify all relevant database objects and execute the generated SQL statement.
To restrict database schema metadata collection to a single database we can use the schemas parameter.

CALL sys.NL_SQL("What is the total number of bookings priced over $200?", @output, '{"schemas":["airportdb"]}');
    +---------------------------------------------------------------------------+
    | Executing generated SQL statement... |
    +---------------------------------------------------------------------------+
    | SELECT COUNT(`booking_id`) FROM `airportdb`.`booking` WHERE `price` > 200 |
    +---------------------------------------------------------------------------+
    
    +---------------------+
    | COUNT(`booking_id`) |
    +---------------------+
    | 32699080 |
    +---------------------+

We can also disable execution by setting execute to false.

CALL sys.NL_SQL("What is the total number of bookings priced over $200?", @output, '{"execute": false}');

Calling the stored procedure this way produces no output. We can still inspect `@output` object to retrieve the generated SQL statement and additional information

SELECT JSON_PRETTY(@output);
    {
    "tables": [
    "airportdb.booking",
    "airportdb.airplane_type",
    "airportdb.passengerdetails",
    "airportdb.airplane",
    "airportdb.flightschedule",
    "airportdb.passenger",
    "airportdb.airport_reachable",
    "airportdb.airline",
    "airportdb.airport_geo",
    "airportdb.employee",
    "airportdb.flight",
    "airportdb.airport",
    "airportdb.flight_log"
    ],
    "schemas": [
    "airportdb"
    ],
    "model_id": "meta.llama-3.3-70b-instruct",
    "sql_query": "SELECT COUNT(`booking_id`) FROM `airportdb`.`booking` WHERE `price` > 200",
    "is_sql_valid": 1
    }

This JSON object shows the schemas and tables used for metadata collection, the Large Language Model used for generating the SQL statement, the generated SQL statement and whether the SQL statement is syntactically valid.

We can also choose to use a different Large Language Model for generating SQL statements using the `model_id` parameter. For a list of available Large Language models please see the documentation.

CALL sys.NL_SQL("What is the total number of bookings priced over $200?", @output, '{"model_id": "llama3.1-8b-instruct-v1"}');
    +----------------------------------------------------------------+
    | Executing generated SQL statement... |
    +----------------------------------------------------------------+
    | SELECT COUNT(*) FROM `airportdb`.`booking` WHERE `price` > 200 |
    +----------------------------------------------------------------+
    
    +----------+
    | COUNT(*) |
    +----------+
    | 32699080 |
    +----------+

Best Practices and Limitations

To achieve optimal results when using the NL_SQL feature, consider following these best practices. First, restrict the focus of metadata collection by specifying relevant schemas or tables using the schemas and tables parameters, which helps the LLM generate more accurate SQL statements. Second, ensure that your tables, columns, and views have semantically meaningful names—this enables the LLM to better understand the structure and context of your database. Using descriptive views is especially helpful when working with complex JOIN operations, as it improves accuracy in query generation. Additionally, provide exact values within your natural language input to minimize errors related to literal filters.

While the NL_SQL feature offers powerful natural language querying capabilities, there are some important limitations to consider. The generated SQL queries may not always be fully valid, and their execution time can be unpredictable depending on query complexity. In some cases, the queries may produce large result sets, which could lead to unintended side effects such as excessive resource consumption. The feature does not maintain the state of previous calls, meaning each invocation operates independently and does not leverage previous responses. Additionally, accuracy may decline if the database contains a large amount of metadata context, as this can complicate prompt construction and interpretation. Finally, please note that temporary tables are not supported and cannot be specified in the tables parameter.

Summary

The new NL2SQL feature for MySQL HeatWave empowers users to seamlessly convert natural language questions into accurate SQL queries using advanced Large Language Models, guided by database schema metadata for contextual accuracy.

With flexible options for customization, best practices to maximize precision, and clear guidance on its limitations, NL2SQL makes data exploration more intuitive while enabling users to tailor the feature to their needs.

This functionality streamlines access to insights, unlocks the power of complex databases, and supports diverse analytic workflows within MySQL HeatWave.

Learn More

Read the documentation.

Try MySQL HeatWave for Free.

Try a self-paced, self-service workshop to get started with MySQL HeatWave.

Watch overview and deep-dive videos on the MySQL YouTube channel.

Introducing Natural Language to SQL for MySQL HeatWave

Feature Overview

Using NL2SQL in MySQL HeatWave

Best Practices and Limitations

Summary

Learn More

Jayant Sharma

Sr. Director, MySQL HeatWave Product Mgmt

Fotis Savva

Principal Member of Technical Staff

Onur Kocberber

Director of Development, HeatWave

Two Models, One World: Introducing JSON Relational Duality Views in MySQL HeatWave

Using MySQL HeatWave as a knowledge base with the OCI AI Agent Platform

Introducing Natural Language to SQL for MySQL HeatWave

Feature Overview

Using NL2SQL in MySQL HeatWave

Best Practices and Limitations

Summary

Learn More

Authors

Jayant Sharma

Sr. Director, MySQL HeatWave Product Mgmt

Fotis Savva

Principal Member of Technical Staff

Onur Kocberber

Director of Development, HeatWave

Two Models, One World: Introducing JSON Relational Duality Views in MySQL HeatWave

Using MySQL HeatWave as a knowledge base with the OCI AI Agent Platform