Large Language Models (LLM's) and Generative AI systems are rapidly evolving. For the developer, a Generative AI system enables peer programming where the peer is an expert in programming languages, analytics, data modelling, documentation, unit testing and much more. This article explores the use of Oracle APEX and Large Language Models for Text to SQL translation, a common task for every APEX developer. Given a question (in text or spoken form), the Generative AI system is tasked to come up with an Oracle SQL statement that "implements" the question given by the developer. The developer might then want to validate and test or further modify the SQL statement for their needs.
The developer-facing user interface of the prototypical Generative AI system is called the "APEX Assistant" within the Oracle APEX application development platform. In the context of this research project, the APEX Assistant is enabled for every SQL code editor; wherever the developer would have previously been asked to write SQL code, the APEX Assistant can be used to generate SQL from a textual (or spoken) description of the desired result.
Let's assume we want to create a new bar- or pie chart in an APEX application that shows employees, departments or other data. In the chart we want to show the average salaries of employees grouped by the employee department name. We can use the "Create Chart" wizard as shown below -
Rather than asking the developer to provide the SQL statement to collect the data to populate the chart, we provide an interface to allow the developer to ask the following question in the code editor text area:
"show the average salaries of employees by their department names".
The developer can then click the "APEX Assistant" button to create the matching SQL statement. This sends the text to a Generative AI REST service and returns a result in the form of an Oracle SQL statement as shown in the example below -
The developer can then verify that the SQL statement gives the correct result, validate the statement in the code editor, and then finish the Create Chart or Page wizard. The result is a new page in the APEX application with the desired data given by the SQL statement.
The process of creating a new application page is an area where a natural language interface is also a great fit, as shown below.
In this example an application page will be created to reflect the user request. Please note that we are not generating any code for this. APEX uses the concept of Application Blueprints and we're using this infrastructure to create APEX metadata that will be used to create the new page.
The diagram below illustrates the high-level architecture of our current solution.
The solution is made up of the following components
The architecture is designed with clear functional boundaries to give the developer a choice of using other Generative AI Services like the OCI Generative AI Service in the future.
Retrieval Augmented Generation (RAG) aims to improve the quality of the response by using an external datastore (like an Oracle Database with vector database functionality) at inference time to build a richer prompt that includes some combination of context, history and relevant "knowledge". The diagram below illustrates where RAG is used for the APEX Assistant.
When a developer using the APEX Assistant asks for a response, the following sequence of steps is executed, including RAG:
For the use case we're describing in this blog, RAG is used to augment the prompt with information about the tables used in the APEX application. This allows the LLM to make appropriate assumptions about the prompt provided by the developer and the table names and relationships within the APEX applicaiton.
Text |
Augmented prompt |
---|---|
employees | OEHR_EMPLOYEES |
departments | OEHR_DEPARTMENTS |
employees by department name | OEHR_EMPLOYEES.DEPARTMENT_ID = OEHR_DEPARTMENTS.DEPARTMENT_ID |
RAG integrates information retrieval (or searching) into a LLM and significantly improves the quality of the Large Language Model. RAG is useful when
Pre-trained LLMs do not learn over time, often hallucinate and may leak private data. Retrieval Augmented Generation is therefore a useful technique to provide more context to the prompts.
We tested the APEX Assistant Text to SQL functionality from the APEX code editor using a range of test cases. The tests were performed on an HR data set that can be installed from APEX SQL Workshop --> Utilities --> Data Sets
Prompt | Response | Comment |
---|---|---|
How many orders were made in 2015? What was their revenue? |
|
|
Show the revenue of large orders by year |
|
|
Show the revenue of orders worth at least 50k by year, and the revenue of orders worth less than that |
|
|
Show the total revenue of orders for each customer, along with their name and country in a single column |
|
|
Show the total revenue of orders made in 2019 by country |
|
|
Overall, the results are quite promising and we see a lot of value in text to SQL generation. We do recommend though that developers continue to carefully verify and test the results. The APEX Assistant is something an experienced developer can use to quickly get started or as a programming peer one can ask from time to time.
These tests are also model-dependent; any limitations can be addressed by fine-tuning the Large Language Model and further curation.
The work presented here is a collaborative effort between various Oracle research and development organizations. Thanks for invaluable input and passion about this project go out to
The work presented here is just the beginning of bringing Generative AI capabilities to the no-code/low-code Oracle APEX platform. We're planning to explore many more use cases in the following areas -
Ralf Mueller is an Architect in the Oracle APEX Development team focusing on Digital Process Automation. He works for Oracle for more than 26 years; 20+ years in the area of Integration and Process Automation. Ralf was involved in the OASIS standards for BPEL4People and WS-HumanTask and represents Oracle in OMG for BPMN 2.0 and CMMN 1.0.
Rhicheek Patra joined Oracle Labs after completing his Ph.D. at EPFL and is interested in the domain of machine learning (including graph learning), privacy & security, and recommender systems. His research work has led to publications in many premier conferences like ICML, VLDB, ICDE and others. While he likes to continue his research in the above-mentioned topics, he also likes to put them into practice.
Giulia Carocari joined Oracle Labs in May 2022 after completing her Master's degree in Computer Science at ETH Zurich. At Oracle Labs she contributes to natural language processing and automated machine learning projects. Giulia loves both natural and programming languages; even more so, she enjoys researching how to bridge the gap between the two.
Moein Owhadi Kareshk is a Senior Member of Technical Staff at Oracle Labs. He is passionate about Large Language Models (LLMs), generative models, Automated ML (AutoML), and scalable ML infrastructure design. Since he joined Oracle Labs in 2021, he has collaborated with the AutoMLx and Oracle machine Learning (OML) teams in their R&D projects to deliver novel and robust ML solutions to their customers.
Hesam earned his bachelor's and master's degrees from University of Waterloo and Stanford University. He has been with the Sun/Oracle Labs team since 2007, consistently contributing to cutting-edge research and development. Hesam's primary areas of expertise and passion lie in the fields of Machine Learning, Explainability, and Generative Artificial Intelligence.