Integrating OCI Generative AI with Select AI and APEX to query data using natural language

February 28, 2024 | 8 minute read
Rekha Mathew
Cloud Solution Architect | A-Team

Introduction

One remarkable capability of Large Language Models (LLMs) is their ability to generate code, including Structured Query Language (SQL) for databases. An LLM can understand a natural language question and generate a corresponding SQL query as output.

A large amount of enterprise data resides in databases, and the ability to query databases in natural language makes databases more accessible to users. Business users and data analysts can ask questions about data in natural language. For technical users, it reduces the time required to generate queries, simplifies query building, and helps to minimize or eliminate specialized SQL knowledge.

The Select AI feature allows Autonomous Database to use Large Language Models (LLMs) to convert a user's natural language input into Oracle SQL.

In this blog, we will see how to integrate an OCI Generative AI model with Select AI to query data stored in an Autonomous Database using natural language prompts.

Architecture

The following diagram depicts the reference architecture.

[Architecture diagram]

Components

Following are the components used in the reference architecture.

Autonomous Database is a fully automated database service that makes it easy to develop and deploy application workloads, including Data Warehousing, Transaction Processing, JSON-centric Applications, and Oracle APEX Low Code Development.

Select AI is a feature that allows Autonomous Database to use Large Language Models (LLMs) to convert a user's input text into Oracle SQL. The database augments the user-specified prompt with database metadata to mitigate hallucinations from the LLM. The augmented prompt is then sent to the user-specified LLM to produce the query. This metadata may include schema definitions, table and column comments, and content available from the data dictionary and catalog.

OCI Generative AI is a fully managed service that provides a set of customizable large language models (LLMs) that cover a wide range of use cases, including writing assistance, summarization, analysis, and chat.

APEX is a fully managed, low-code application development platform for building and deploying applications in Oracle Cloud. 

Prerequisites

  • Access to an Oracle Cloud Infrastructure account, an Autonomous Database instance, and the OCI Generative AI service.
  • Database ADMIN access and access to the schema where you want to use Select AI.

Steps

Following are the steps to integrate Select AI with OCI Generative AI service.

1. The DBMS_CLOUD_AI package in Autonomous Database enables integration with a user-specified LLM. This package works with AI providers such as OpenAI, Cohere, Azure OpenAI Service, and Oracle Cloud Infrastructure Generative AI.

    As the first step, grant the EXECUTE privilege on the DBMS_CLOUD_AI package to the schema where business data resides.

    Run the following command as ADMIN.

    GRANT EXECUTE ON DBMS_CLOUD_AI TO <<schema name>>;

2. To access OCI Generative AI service from Select AI, set the required IAM policies. An example is given below.

    allow group <your-group-name> to manage generative-ai-family in compartment <your-compartment-name>
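    This example grants access to an IAM user group. Because Select AI calls OCI Generative AI with the database's resource principal (enabled in the next step), you typically also need a policy for a dynamic group that matches your Autonomous Database instance. The group and compartment names below are placeholders; a hedged sketch:

    ```
    allow dynamic-group <adb-dynamic-group> to use generative-ai-family in compartment <your-compartment-name>
    ```

    The dynamic group would contain a matching rule that selects the database, for example on its OCID: resource.id = '<your-adb-ocid>'.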

3. Connect to the database as the ADMIN user and enable OCI resource principal authentication using the DBMS_CLOUD_ADMIN.ENABLE_PRINCIPAL_AUTH procedure.

    BEGIN
      DBMS_CLOUD_ADMIN.ENABLE_PRINCIPAL_AUTH(
        provider => 'OCI',
        username => '<<your schema name>>');
    END;
    /
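    Enabling resource principal authentication creates a credential named OCI$RESOURCE_PRINCIPAL. As a quick sanity check (a sketch, assuming you are connected as ADMIN), you can confirm that it exists:

    ```sql
    -- The credential should be listed for ADMIN and for the schema passed in "username"
    SELECT owner, credential_name
    FROM   dba_credentials
    WHERE  credential_name = 'OCI$RESOURCE_PRINCIPAL';
    ```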

4. Create a new AI profile in your business schema using the DBMS_CLOUD_AI.CREATE_PROFILE procedure.

    BEGIN
      DBMS_CLOUD_AI.CREATE_PROFILE(
        profile_name => 'OCI_GENAI',
        attributes   => '{
          "provider": "oci",
          "credential_name": "OCI$RESOURCE_PRINCIPAL",
          "object_list": [{"owner": "<<your schema name>>"}],
          "model": "cohere.command-light",
          "oci_runtimetype": "COHERE",
          "temperature": "0.4"
        }');
    END;
    /

     Attributes of an AI profile configure its behavior. For the full list of attributes, refer to the DBMS_CLOUD_AI documentation. The attributes used above are:

  • provider: The AI provider, here oci.
  • credential_name: The credential used to access the provider, here OCI$RESOURCE_PRINCIPAL.
  • object_list: An array of JSON objects specifying the owners and object names that are eligible for natural language translation to SQL. To include all objects of a given user, omit the "name" key and specify only the "owner" key in the JSON object.
  • model: The name of the AI model to use. All pretrained OCI Generative AI models are supported by Select AI; the default is cohere.command. Custom models can also be supplied using their full OCIDs.
  • oci_runtimetype: The runtime type of the provided model. This attribute is required when the model attribute is specified.
  • temperature: Tunes the degree of randomness in the generated text. Lower temperatures produce less random output.
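     To verify that the profile was created, you can query the AI profile dictionary views in the business schema (a sketch; the view name assumes a current Autonomous Database release):

     ```sql
     -- List AI profiles owned by the current user
     SELECT profile_name, status
     FROM   user_cloud_ai_profiles;
     ```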

5. Set an AI profile before running Select AI commands. The DBMS_CLOUD_AI.SET_PROFILE procedure sets the AI profile for the current session.

      EXEC DBMS_CLOUD_AI.set_profile('OCI_GENAI');

6. After setting an AI profile for the database session, any SQL statement with the prefix SELECT AI is considered a natural language prompt.  The AI keyword in a SELECT statement instructs the SQL execution engine to use the LLM identified in the active AI profile to process natural language and generate SQL. You cannot run PL/SQL statements, DDL statements, or DML statements using the AI keyword.

The syntax for running an AI prompt is:

SELECT AI action natural_language_prompt;

The action can be one of the following:

  • runsql: Runs the SQL generated from the natural language prompt and returns the result. This is the default action, so specifying it is optional.
  • showsql: Displays the SQL statement generated for the natural language prompt.
  • narrate: Explains the output of the prompt in natural language.
  • chat: General conversation with the LLM.

An example of using Select AI is shown below.

SELECT AI show all employees in Accounting department;
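The other actions follow the same pattern. For example:

```sql
-- Display the generated SQL without running it
SELECT AI showsql show all employees in Accounting department;

-- Explain the result in natural language
SELECT AI narrate show all employees in Accounting department;

-- General conversation with the LLM, without querying the schema
SELECT AI chat what is Oracle Autonomous Database;
```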

Select AI is not supported in Database Actions or the APEX Service; in those environments, use the DBMS_CLOUD_AI.GENERATE function instead.

    SELECT DBMS_CLOUD_AI.GENERATE(
             prompt       => 'show all employees in Accounting department',
             profile_name => 'OCI_GENAI')
    FROM   dual;
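    GENERATE also accepts an action parameter, so the showsql, narrate, and chat actions can be used programmatically as well (a sketch; runsql is not available through GENERATE):

    ```sql
    -- Ask the LLM to describe the result in natural language
    SELECT DBMS_CLOUD_AI.GENERATE(
             prompt       => 'show all employees in Accounting department',
             profile_name => 'OCI_GENAI',
             action       => 'narrate')
    FROM   dual;
    ```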


7. If you need a UI to run your prompts, one option is to use Oracle APEX.

    In the APEX workspace, create an Application with a page.

    Create a Page Item called PROMPT of type Textarea and a button to submit the prompt.

[APEX screenshot]

  Create a Classic Report region and set its Type to Function Body returning SQL Query. Enable the Use Generic Column Names property and enter the number of columns in Generic Column Count.

  Enter the PL/SQL Function Body as follows. DBMS_CLOUD_AI.GENERATE returns the SQL query for the prompt, which the report then executes.

  BEGIN
    IF :PROMPT IS NOT NULL THEN
      RETURN DBMS_CLOUD_AI.GENERATE(
               prompt       => :PROMPT,
               profile_name => 'OCI_GENAI');
    END IF;
  END;

[APEX screenshot]

In the Attributes tab of the report region, set Heading Type to Column Names.

[APEX screenshot]

When the page is run, you can ask a question and get a response from the database. The schema used in this example is the HR sample schema.

Entering the prompt "show all employees in Accounting department" returns the results below.

[APEX screenshot]

Entering the prompt "show all locations in the US" returns the results below.

[APEX screenshot]

Conclusion

The ability of LLMs to generate queries adds more value to enterprise data. It substantially enhances the user experience, making data access more efficient. However, it's important to remember that LLMs are susceptible to hallucinations, and their generated results may not always be accurate. There's a possibility that a specific natural language prompt may not produce the intended SQL query, or the generated SQL may not be executable. Despite these limitations, the integration of LLMs remains a valuable asset for data management, provided users remain aware of its potential inaccuracies and exercise caution in interpreting and implementing generated queries.

 
