Overview
Large Language Models (LLM’s) and Generative AI systems are rapidly evolving. For the developer, a Generative AI system enables peer programming where the peer is an expert in programming languages, analytics, data modelling, documentation, unit testing and much more. This article explores the use of Oracle APEX and Large Language Models for Text to SQL translation, a common task for every APEX developer. Given a question (in text or spoken form), the Generative AI system is tasked to come up with an Oracle SQL statement that “implements” the question given by the developer. The developer might then want to validate and test or further modify the SQL statement for their needs.
APEX Assistant
The developer-facing user interface of the prototypical Generative AI system is called the “APEX Assistant” within the Oracle APEX application development platform. In the context of this research project, the APEX Assistant is enabled for every SQL code editor; wherever the developer would have previously been asked to write SQL code, the APEX Assistant can be used to generate SQL from a textual (or spoken) description of the desired result.
Example: Creating Charts
Let’s assume we want to create a new bar- or pie chart in an APEX application that shows employees, departments or other data. In the chart we want to show the average salaries of employees grouped by the employee department name. We can use the “Create Chart” wizard as shown below –
Rather than asking the developer to provide the SQL statement to collect the data to populate the chart, we provide an interface to allow the developer to ask the following question in the code editor text area:
“show the average salaries of employees by their department names“.
The developer can then click the “APEX Assistant” button to create the matching SQL statement. This sends the text to a Generative AI REST service and returns a result in the form of an Oracle SQL statement as shown in the example below –
The developer can then verify that the SQL statement gives the correct result, validate the statement in the code editor, and then finish the Create Chart or Page wizard. The result is a new page in the APEX application with the desired data given by the SQL statement.
Example: Create a new Page
The process of creating a new application page is an area where a natural language interface is also a great fit, as shown below.
In this example an application page will be created to reflect the user request. Please note that we are not generating any code for this. APEX uses the concept of Application Blueprints and we’re using this infrastructure to create APEX metadata that will be used to create the new page.
High Level Architecture
The diagram below illustrates the high-level architecture of our current solution.
The solution is made up of the following components
- APEX Assistant
APEX Assistant is a user interface for the APEX developer. It is embedded into APEX at many places where SQL input is required. It currently expects a question that can be translated into SQL. In the future we are planning to enhance the APEX Assistant to offer more capabilities, including the creation of data models or entire APEX applications based on natural language guidance.
- NL2SQL infrastructure
The NL2SQL infrastructure handles the requests from the APEX Asssistant. It is primarily responsible for prompt (user request) engineering. Retrieval Augmented Generation (RAG) embeds user questions and finds nearest embeddings in the vector database; for example the most relevant tables to be added into the prompt.
- Oracle Database
The Oracle Database stores the tables and other objects related to the APEX application. In addition, the vector database component of the Oracle Database stores, indexes and retrieves embeddings.
The architecture is designed with clear functional boundaries to give the developer a choice of using other Generative AI Services like the OCI Generative AI Service in the future.
Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) aims to improve the quality of the response by using an external datastore (like an Oracle Database with vector database functionality) at inference time to build a richer prompt that includes some combination of context, history and relevant “knowledge”. The diagram below illustrates where RAG is used for the APEX Assistant.
When a developer using the APEX Assistant asks for a response, the following sequence of steps is executed, including RAG:
- The developer creates a prompt in the form of a request. For example “Show me the average salary of employees by department name”
- The prompt provided by the developer is enriched with information retrieved from the external datastore- in our case the Oracle Database. This adds the following information –
– The table names used in the APEX application
– The attributes of all tables with their data type
– The primary key information for each table
– The foreign key relationships of the tables used in the APEX application - The query with the augmented prompt is sent to the LLM for inferencing
- The response is sent back to the APEX Assistant
For the use case we’re describing in this blog, RAG is used to augment the prompt with information about the tables used in the APEX application. This allows the LLM to make appropriate assumptions about the prompt provided by the developer and the table names and relationships within the APEX applicaiton.
| Text | Augmented prompt |
|---|---|
| employees | OEHR_EMPLOYEES |
| departments | OEHR_DEPARTMENTS |
| employees by department name | OEHR_EMPLOYEES.DEPARTMENT_ID = OEHR_DEPARTMENTS.DEPARTMENT_ID |
RAG integrates information retrieval (or searching) into a LLM and significantly improves the quality of the Large Language Model. RAG is useful when
- User or application context (table names, object structures) must be considered
- Factual information in the form of documents, sheets etc. must be considered
- Private data (invoices, sales contracts, but also data dictionary of app) must be considered
- Real-time context is required (location, weather)
Pre-trained LLMs do not learn over time, often hallucinate and may leak private data. Retrieval Augmented Generation is therefore a useful technique to provide more context to the prompts.
APEX Assistant Tests
We tested the APEX Assistant Text to SQL functionality from the APEX code editor using a range of test cases. The tests were performed on an HR data set that can be installed from APEX SQL Workshop –> Utilities –> Data Sets
| Prompt | Response | Comment |
|---|---|---|
| How many orders were made in 2015? What was their revenue? | select count(order_id),
sum(order_total)
from oehr_orders
where order_date between to_date('2015-01-01', 'YYYY-MM-DD')
and to_date('2016-01-01', 'YYYY-MM-DD') |
|
| Show the revenue of large orders by year | select extract (year from order_date) as year,
sum(order_total) as revenue
from oehr_orders
where order_total > 10000
group by year
order by year |
|
| Show the revenue of orders worth at least 50k by year, and the revenue of orders worth less than that | select extract(year from order_date) as year,
sum(order_total) as revenue)
from oehr_orders
where order_total >= 50000
group by year
union all
select extract(year from order_date) as year,
sum(order_total) as revenue
from oehr_orders
where order_total < 50000
group by year |
|
| Show the total revenue of orders for each customer, along with their name and country in a single column | select cust_first_name || ' '|| cust_last_name || ',' || country_name as customer_name_country,
sum(order_total) as revenue
from oehr_customers
join oehr_orders on oehr_customers.customer_id=oehr_orders.customer_id
join oehr_countries on oehr_customers.country_id = oehr_countries.country_id
group by customer_name_country
order by revenue desc |
|
| Show the total revenue of orders made in 2019 by country | select country_name,
sum(order_total) as revenue
from oehr_orders
join oehr_customers on oehr_orders.customer_id = oehr_customers.customer_id
join oehr_countries on oehr_customers.country_id = oehr_countries.country_id
where order_date >= '2019-01-01' and order_date <= '2019-12-31'
group by country_name
order by revenue desc |
|
Overall, the results are quite promising and we see a lot of value in text to SQL generation. We do recommend though that developers continue to carefully verify and test the results. The APEX Assistant is something an experienced developer can use to quickly get started or as a programming peer one can ask from time to time.
These tests are also model-dependent; any limitations can be addressed by fine-tuning the Large Language Model and further curation.
Acknowledgements
The work presented here is a collaborative effort between various Oracle research and development organizations. Thanks for invaluable input and passion about this project go out to
- Oracle Labs: Gojko Cutura, Trung-Dung Hoang, Sungho Park
- Oracle Machine Learning: Boriana Milenova
- Oracle APEX: Zsolt Angyal, Shakeeb Rahman, Ricardo Balam, Ronaldo Ruiz, Neil Fernandez
Outlook
The work presented here is just the beginning of bringing Generative AI capabilities to the no-code/low-code Oracle APEX platform. We’re planning to explore many more use cases in the following areas –
- Data model creation and authoring
A data model is the foundation of every APEX Application. We’re exploring the use of LLM’s for this task. The idea here is for a developer to say “I need a data model for inventory management” and then the LLM would return the SQL DDL for the tables used in Inventory Management (Products, Warehouses, Locations etc.). The developer can then author the model using a LLM. For example, the developer might want to add additional attributes to the tables or establish more relationships to existing tables etc.
- API generation
APEX applications often expose a (REST) API. The generation of the API might be performed by a LLM, e.g. a “Create an API for inventory management” request would create ORDS REST API’s with CRUD operations on the tables involved in inventory management.
- Business process creation
Last but not least, we’re exploring the creation of entire business processes or process blueprints using LLM’s. There is a lot of publicly available process knowledge available on the Internet, from simple approval processes for travel, requistions, and employee promotions to more complicated processes like Order-to-Cash, Procure-to-Pay or Patient Admission and Discharge. We want to explore the use of this knowledge to drive the creation of APEX Workflows.




