A common challenge in government and enterprise environments is analyzing a complex subject area, along with performing cross-subject analysis across tables, schemas, and data types. As complexity grows, the need to simplify and streamline analysis and intelligence reporting becomes more urgent.

Now, with Oracle Database 23ai and the rapid evolution of tools for running LLMs locally, such as Ollama, we can run a variety of models and tune them to our requirements and computing constraints.

These challenges become even more difficult at the edge. This blog walks through a simple example of how you can experiment with Oracle Database 23ai, Ollama, and Oracle Cloud Roving Edge Infrastructure to address these concerns, all inside a private network.

Here is what’s currently open-source:

Model               Parameters  Size   Download
Llama 3             8B          4.7GB  ollama run llama3
Llama 3             70B         40GB   ollama run llama3:70b
Phi-3               3.8B        2.3GB  ollama run phi3
Mistral             7B          4.1GB  ollama run mistral
Neural Chat         7B          4.1GB  ollama run neural-chat
Starling            7B          4.1GB  ollama run starling-lm
Code Llama          7B          3.8GB  ollama run codellama
Llama 2 Uncensored  7B          3.8GB  ollama run llama2-uncensored
LLaVA               7B          4.5GB  ollama run llava
Gemma               2B          1.4GB  ollama run gemma:2b
Gemma               7B          4.8GB  ollama run gemma:7b
Solar               10.7B       6.1GB  ollama run solar

With the rapid pace of generative artificial intelligence, we at Oracle Cloud Infrastructure (OCI) are committed to showing our customers and partners how to incrementally adopt and innovate within their enterprise environments and empower their technical teams.

Enterprise and public sector organizations can now explore these capabilities further at the edge with Roving Edge Infrastructure tied to their Oracle Cloud Infrastructure tenancy.


Setting up an example environment

All lab code and references can be found on GitHub.


Infrastructure Deployment

Here is a quick start guide on how to integrate these technologies with Oracle Cloud Infrastructure (OCI) and Oracle Database 23ai on the OCI Base Database service:

  • Create an Oracle Database virtual machine
  • Provision a Linux VM with GPU drivers
  • Transfer all ONNX or LLM models to the object store (a download sketch follows this list)
    • Example CLI command: oci os object put -ns aigitops -bn mybucket --name example.onnx --file /Users/me/example.onnx --metadata '{"lab":"dataisfun","costCenter":"dataengineering257"}'
  • Request a Roving Edge Device (RED) node and attach the two created VMs (Oracle Database VM and Linux VM), along with the object bucket containing your LLM/ONNX models.
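
Later, on the database VM, the models can be pulled back out of the bucket with the mirror-image command. A minimal sketch, reusing the illustrative namespace, bucket, and object names from above:

    # Download a model object from the bucket onto the database VM
    # (namespace, bucket, and paths are the illustrative values from above)
    oci os object get -ns aigitops -bn mybucket --name example.onnx --file /u01/models/example.onnx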


Operating System and LLM API Setup

  • Provision a Linux VM with OL8/9 and Podman, or use an existing Oracle Linux (OL) VM.

  • For the database (DB) installation, install Podman and pull the Oracle Database image (Autonomous Database 23ai Free Container Image – atp-free), as sketched below.
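
A minimal pull sketch follows; the registry path and tag are assumptions based on the Autonomous Database Free container image, so verify the current name on the Oracle Container Registry:

    # Pull the Autonomous Database 23ai Free container image
    # (registry path and tag are assumptions; verify on container-registry.oracle.com)
    podman pull container-registry.oracle.com/database/adb-free:latest-23ai
    # Run options (ports, passwords, wallet setup) are documented with the image.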

  • For the LLM installation, run the following steps:


    # Step 1: Check GPU Devices
    # (Steps 1-3 assume an Ubuntu image; use the equivalent NVIDIA
    #  driver packages on other distributions.)
    ubuntu-drivers devices
    
    # Step 2: Install NVIDIA Library Packages
    sudo apt update
    sudo apt install libnvidia-common-535 libnvidia-gl-535 -y
    
    # Step 3: Install NVIDIA Driver
    sudo apt install nvidia-driver-535-server -y
    
    # Step 4: Reboot System
    sudo reboot
    
    # Step 5: Validate Driver Installation
    nvidia-smi
    
    # Step 6: Install Ollama AI Platform
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Step 7: Configure Ollama Service
    sudo tee -a /etc/systemd/system/ollama.service > /dev/null << EOF
    [Service]
    Environment=OLLAMA_HOST=0.0.0.0
    Environment=OLLAMA_KEEP_ALIVE=-1
    EOF
    
    # Step 8: Open Firewall Port
    sudo iptables -A INPUT -p tcp --dport 11434 -j ACCEPT
    sudo iptables-save -c
    
    # Step 9: Reload and Restart Services
    sudo systemctl daemon-reload
    sudo systemctl restart ollama.service
    
    # Step 10: Check Service Status
    netstat -tulpn
    
    # Step 11: Pull and Use a Model (Manual Step)
    ollama pull <model_name>

    # Keep the model resident in memory indefinitely
    curl http://IP:11434/api/generate -d '{"model": "<model_name>", "keep_alive": -1}'

Example output of Ollama with the model bound to the GPU

  • Test out the /generate endpoint (curl sketches for all three endpoints follow this list)

    Example curl to test the private /generate endpoint

  • Test out the /embeddings endpoint

    Test of the private endpoint for LLM embeddings

  • Test out the /chat endpoint

    Curl command to the private /chat endpoint
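
Here is a minimal sketch of the three tests, assuming Ollama's standard /api/generate, /api/embeddings, and /api/chat routes; the private IP (IP) and model name are placeholders:

    # /generate: one-shot completion (stream disabled for a single JSON reply)
    curl http://IP:11434/api/generate -d '{"model": "llama3", "prompt": "Say hello", "stream": false}'

    # /embeddings: returns {"embedding": [...]} for the given prompt
    curl http://IP:11434/api/embeddings -d '{"model": "llama3", "prompt": "hello world"}'

    # /chat: multi-turn interface that takes a messages array
    curl http://IP:11434/api/chat -d '{"model": "llama3", "messages": [{"role": "user", "content": "Say hello"}], "stream": false}'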

For an example bash script, click here.


Database User & Table Setup for Pre-Trained In-DB Model from HuggingFace

Starting with OML4Py 2.0, the client converts pre-trained transformer models from Hugging Face to the Open Neural Network Exchange (ONNX) format for Oracle Database 23ai AI Vector Search. Hugging Face is an open-source platform for deep learning, offering thousands of pre-trained models for natural language processing (NLP), computer vision, audio, and more. ONNX is an open format for representing various machine learning models, including transformers, classification, regression, and other types.

The OML4Py client streamlines the path to embedding generation in the database by downloading pre-trained models from the Hugging Face repository, augmenting them with pre- and post-processing steps, converting the augmented model to ONNX format, and loading it into an in-database model or exporting it to a file. Once loaded in ONNX format, you can seamlessly use the in-database ONNX Runtime to generate vector embeddings for Oracle Database 23ai AI Vector Search. The steps below assume you have created the ONNX models and placed them on the database host's operating system or in the Roving Edge object store.

As the SYS user, or a user with the necessary privileges, connect to the database and create a new user, in this case named "Analyst". This user will have access to the local in-database models in ONNX format and will make the database's calls to the LLM server.

Once the "Analyst" user is set up, you can utilize local in-database models in ONNX format to create vectors and perform further analysis or model training. Ensure that the necessary permissions and access controls are in place for the "Analyst" user to interact with the ONNX models and any relevant data within the database.
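
A minimal sketch of the user creation, run as SYS or another privileged account; the password and grant list are placeholders to adapt to your security policy:

    -- Create the analyst account (password is a placeholder)
    CREATE USER analyst IDENTIFIED BY "ChangeMe_2024#";

    -- Minimal privileges for connecting and creating the demo tables
    GRANT CREATE SESSION, CREATE TABLE TO analyst;
    GRANT UNLIMITED TABLESPACE TO analyst;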


  1. Create a database directory, or pull the models from object storage onto the DB VM. Here the directory is /u01/models, with the database directory name DM_DUMP.
  2. Grant read and write on the directory to analyst.
  3. Grant the database privileges that allow analyst to make data mining API calls (steps 1-3 are sketched in SQL after this list).
  4. Load the model as the analyst user with:
    1. exec dbms_vector.load_onnx_model('DM_DUMP', 'all-MiniLM-L6-v2.onnx', 'doc_model', JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input" : {"input": ["DATA"]}}'));
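
A minimal SQL sketch of steps 1 through 3, run as a privileged user; the directory path matches the example above, and the exact privilege set is an assumption to adapt as needed:

    -- Step 1: map the OS directory that holds the ONNX files
    CREATE OR REPLACE DIRECTORY DM_DUMP AS '/u01/models';

    -- Step 2: let analyst read and write the directory
    GRANT READ, WRITE ON DIRECTORY DM_DUMP TO analyst;

    -- Step 3: allow analyst to load and score in-database (mining) models
    GRANT CREATE MINING MODEL TO analyst;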


  1. Load documents or Word files into a database table (this example loads a single file).

    drop table demo_tab;
    create table demo_tab (id number, data blob);

    insert into demo_tab values (2, to_blob(bfilename('DM_DUMP', 'oracle-ai-vector-search-users-guide.pdf')));
    commit;

    -- Generate chunks and vectors using the in-database model.
    -- Note: bind variables cannot be referenced inside DDL, so the
    -- chunking and embedding parameters are inlined in the CTAS below.
    var chunk_params clob;
    exec :chunk_params := '{"by":"words", "max":"50", "split":"sentence", "normalize":"all"}';

    var embed_params clob;
    exec :embed_params := '{"provider":"database", "model":"doc_model"}';

    create table vectab as (
        select dt.id docid,
               et.embed_id chunk_id,
               et.embed_data chunk,
               to_vector(et.embed_vector) vector
        from demo_tab dt,
             dbms_vector_chain.utl_to_embeddings(
                 dbms_vector_chain.utl_to_chunks(
                     dbms_vector_chain.utl_to_text(dt.data),
                     json('{"normalize":"all"}')),
                 json('{"provider":"database", "model":"doc_model"}')) t,
             json_table(t.column_value, '$[*]' columns (
                 embed_id     number         path '$.embed_id',
                 embed_data   varchar2(4000) path '$.embed_data',
                 embed_vector clob           path '$.embed_vector')) et
    );
    

Created table for vectorized input


Table embedded with chunked document data

Create Index for Vectors

CREATE VECTOR INDEX vectab_vidx ON vectab (vector) ORGANIZATION NEIGHBOR PARTITIONS;
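
With the index in place, a query can ask for an approximate top-k by vector distance. A small sketch, with :query_vec standing in for a bound query vector:

    -- Approximate top-5 nearest chunks, served by the vector index
    SELECT chunk
    FROM vectab
    ORDER BY vector_distance(vector, :query_vec, COSINE)
    FETCH APPROX FIRST 5 ROWS ONLY;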

Below is a simple PL/SQL similarity search; the same pattern can be exposed through REST, APEX, and so on.


    declare
        vec vector;
        query varchar2(128) := 'embedding';
        ans varchar2(256);
    cursor c_t2 is
            select chunk
            from vectab
            order by vector_distance(vector, vec, cosine)
            fetch first 5 rows only;
    begin
        dbms_output.put_line('Similarity Search using PL/SQL and vector_embedding');
        dbms_output.put_line('====================================================');
        dbms_output.put_line('');
        select vector_embedding(doc_model using query as data) into vec from dual;
        dbms_output.put_line(to_char(vec));
        for item in c_t2
        loop
            dbms_output.put_line('======item======');
            dbms_output.put_line(item.chunk);
        end loop;
    end;
    

External database LLM service call to another VM with a GPU exposing the model endpoint



Now we call the RESTful endpoint of the LLM service over the private IP of the GPU VM attached to the RED device.


    SELECT
        dbms_vector.utl_to_embedding(
            'hello world',
            json('{
                "provider": "ollama",
                "host": "local",
                "url": "http://10.x.x.x:11434/api/embeddings",
                "model": "llama3"
            }')
        )
    FROM dual;
    

Testing the private LLM endpoint directly from SQL

Now we test the embedding call to generate vectors.


    -- How to create vectors and vector indexes using local LLM calls
    SELECT /*+ parallel 2 */
        JSON_VALUE(ch.COLUMN_VALUE, '$.chunk_id') embed_id,
        JSON_VALUE(ch.COLUMN_VALUE, '$.chunk_data') embed_data,
        dbms_vector_chain.utl_to_embedding(
            JSON_VALUE(ch.COLUMN_VALUE, '$.chunk_data'),
            json('{
                "provider": "ollama",
                "host": "local",
                "url": "http://10.x.x.x:11434/api/embeddings",
                "model": "llama3",
                "num_ctx":"4096"
            }')
        ) embed_vector
    FROM
        dbms_vector_chain.utl_to_chunks(
            dbms_vector_chain.utl_to_text(
                (select data from demo_tab)
            ),
            json('{"normalize":"all"}')
        ) ch;
    
    

Testing the private LLM embedding call directly from SQL

Using the same document as before, let's chunk it, create vectors, and store them in the database, this time using the LLM service endpoint instead of the in-database model. The statement below wraps the previous query in a CREATE TABLE AS SELECT; the table name is illustrative.


    -- Persist the chunks and vectors (table name vectab_ollama is illustrative)
    create table vectab_ollama as
    SELECT /*+ parallel 2 */
        JSON_VALUE(ch.COLUMN_VALUE, '$.chunk_id') embed_id,
        JSON_VALUE(ch.COLUMN_VALUE, '$.chunk_data') embed_data,
        dbms_vector_chain.utl_to_embedding(
            JSON_VALUE(ch.COLUMN_VALUE, '$.chunk_data'),
            json('{
                "provider": "ollama",
                "host": "local",
                "url": "http://10.x.x.x:11434/api/embeddings",
                "model": "llama3",
                "num_ctx":"4096"
            }')
        ) embed_vector
    FROM
        dbms_vector_chain.utl_to_chunks(
            dbms_vector_chain.utl_to_text(
                (select data from demo_tab)
            ),
            json('{"normalize":"all"}')
        ) ch;
    


Benchmarks

For this benchmark, we utilized a powerful RED system with 40 cores (80 virtual CPUs), 512 GB of RAM, and 61 TB of storage. Enhanced with high-speed networking and an embedded GPU, the system excels in AI processing capabilities. We configured a 2-core DB Node VM and a 2-core GPU VM to demonstrate the full potential of GPU passthrough technology. With this setup, our virtual machine gained exclusive access to the entire GPU, showcasing the RED system’s power and flexibility in providing resources for computational tasks, especially those leveraging GPU acceleration.

This setup also highlights the converged database approach (i.e., support for multi-model and multi-workload types) of Oracle Database 23ai, which can efficiently use in-database ONNX models or externally call GPU-optimized large language models, all at the tactical edge. Furthermore, with tools like Data Sync and GoldenGate, customers have flexible options for data movement, even in scenarios with intermittent communications.

LLM benchmark test of llama3 inside Oracle Roving Edge Infrastructure
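
To reproduce a similar throughput test, one simple option is the Ollama CLI's verbose mode, which prints load duration, token counts, and evaluation rates after each response; the model and prompt here are placeholders:

    # Print load duration, token counts, and tokens/sec after the reply
    ollama run llama3 --verbose "Summarize the benefits of vector search at the edge."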


Conclusion

Unlock the potential of your enterprise data with the next generation of analysis and intelligence powered by Oracle Cloud Infrastructure and a Roving Edge Device. By leveraging the capabilities of Oracle Cloud and extending it to the edge, you can bring the transformative power of Large Language Models (LLMs) directly to your edge devices.

Imagine the possibilities of having advanced language understanding, generation, and decision-making capabilities right where your data is stored and modified. With Oracle Cloud Infrastructure, Oracle Database 23ai, and a Roving Edge Device, you can harness the full potential of LLMs to drive innovation, gain insights, and make autonomous decisions at the edge. Speak with your Oracle representative today to learn more about how you can revolutionize your data analysis and unlock new intelligence with this combination of technologies, bringing the power of LLMs directly to your edge operations.