SQL Dialog: Everything you need to know about Training Data

January 16, 2023 | 6 minute read
Daniel Martins Teixeira
Principal Product Manager ODA
Text Size 100%:

Oracle Digital Assistant release 22.12 brings us a ground-breaking new capability – SQL Dialog – which can translate a user's natural language input into SQL queries, send the queries to a backend data source, and display the response. All of these without the need to create intents, utterances, or custom components!

If you read the first article in this series, you should have (or know how to do it) a SQL Dialog Skill. This article will focus on how can we improve the baseline model.

Train the Skill to Convert Natural Language Utterances into SQL

After importing the model, the initial step is to enhance the SQL Dialog skill's comprehension of natural language phrases and its ability to map them to the data structure.

We begin by examining the identified phrases - training corpus - that the skill is designed to handle. This analysis will reveal the various ways in which users refer to entities and attributes. Again, this is an exercise where the business / end-user input is critical.

Provide Training Data Through Names and Synonyms

What happens when the user uses “employee” in the query, but the table name is “emp”? Or if he asks to see an employee's “salary”, but the attribute name is “sal”?

That is why we need to start by compiling a list of the ways people refer to entities and attributes and choose the term that best represents the primary name of each entity and attribute.

Selecting terms that closely match the most commonly used phrases is best.

But that alone is not enough as there are many other ways to refer to the same names. To address that, we use synonyms!

employee


The primary names are also the names displayed in the result columns, so having a name that self-describes the content is good practice.

The same principle applies to all attributes, for which we can also provide synonyms.

attributes

You can also use other entities like value lists, regular expressions, and dynamic entities. This provides extra ability to recognize the user's input and match it to a certain attribute.

edit attribute

To facilitate, you can make use of the auto-generation feature that will create a value list entity with just one click.

location

LOV


It generates the value list entity based on the existing values in the data source! Quite handy right?

Obviously, this also should be extended with appropriate synonyms - For example: Add NYC, NY, and Big Apple as synonyms of New York.

After this exercise, the model will better understand the natural language queries and associate them with SQL.

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
can you show me all the employees in the big apple?

bigapple

Provide Custom Training Data

What happens when the skill is not able to convert the utterance into an OMRQL* query?

When that happens we can instruct the skill on how to accurately interpret and convert the utterance into OMRQL, by adding utterances with a mapping to an OMRQL query.

*If this is the first time hearing about OMRQL, please check this post first.

For example, "show me all the managers“ will not create a correct OMRQL as it does not filter by the job attribute.

custom data

We can manually correct the query.

corrected_query

Provide Autocomplete suggestions

How can the user know what kind of queries can he/she perform?

One way to assist users in understanding the types of database queries they can make is by offering to autocomplete suggestions. These suggestions give clues about the types of questions the logical model can answer.

autocomplete1

These are only visible while using the WebSDK.

autocomplete2

When the user starts typing, immediately the autocomplete suggestions appear.

Routing Data

If your skill has intents or is integrated into a DA, how can we make sure the skill directs SQL queries to the appropriate SQL Dialogs conversation?

The routing mechanism utilizes autocomplete suggestions, training data, generated and manually crafted routing phrases to learn how to identify SQL queries.

Accurate routing of utterances to the SQL conversation requires a lot of sample utterances with the translated query. On the Generated Routing Data tab, you can quickly generate 100 utterances that reflect questions the logical model can answer.

generated

Those generated utterances are based on templates (buckets) and can be used to filter the data.

bucket


Approved utterances are included in the overall routing information. If a phrase has been modified, it will be labeled as Refined in the Combined Routing Data tab. If it has not been altered, it will be labeled as Synthetic.

If there are valid SQL queries that the DA or skill is not routing to the SQL conversation, then you need to add those utterances to the routing data from the Combined Routing Data tab. these are called Handcrafted Routing Data.

Combined Routing Data

The combined routing data is an aggregation of all autocomplete suggestions, custom training data, and generated and handcrafted routing data.

combined

Take a look at the below video to see the entire process!

More resources

Introducing the new Oracle Digital Assistant SQL Dialog

SQL Dialog: Customize how to present data

You can also check the documentation where we have plenty of details about SQL Dialogs!

Daniel Martins Teixeira

Principal Product Manager ODA

Daniel Teixeira is a Principal Product Manager within Oracle's Digital Assistant development team responsible for technical enablement. With a strong background in consulting and pre-sales, in and out of Oracle, he brings those experience elements into creating and delivering enablement sessions for partners and customers. He actively writes about Oracle Digital Assistant and other Oracle technology on techtrantor.com.

Show more

Previous Post

Analytics and digital assistants with Peak Indicators Webcast January 24th 2023

Jürgen Kress | 2 min read

Next Post


TechExchange: Unlocking the Power of Multi-lingual Digital Assistants with OCI AI Services

Frank Nimphius | 1 min read
Oracle Chatbot
Disconnected