INTRODUCTION
You have documents, text, image, audio files and relational data stored in your Oracle Autonomous Database, and you need to compliment your business analysis with semantic search capabilities to analyze this data. Whether you are looking for usage anomalies related to fraud, monitoring news related to your business or simply tracking customer support calls related to broken ski helmet buckles, Autonomous Database offers semantic search capabilities today, in 19c, by invoking Oracle Infrastructure Cloud (OCI), AWS Lambda or Azure cloud functions as SQL functions to perform vector search in the Oracle Autonomous Database without requiring AI or ML skills.
Vector search is a method of information retrieval where texts, documents and queries are represented as vectors instead of plain text. In vector search, Large Language embedding Models (LLMs) generate vector representations of source inputs, which can be text, images, or other content. It is an efficient way to query and browse unstructured data while providing a semantic understanding of content. User-defined functions are used for vector conversion and can be written in PL/SQL (calling external REST APIs using the DBMS_CLOUD package) or Python code wrapped as a User-Defined OCI function.
So how do we accomplish this today in Oracle Autonomous Database? The use case we explore for this blog is a semantic search through aggregated news article titles to answer a very specific question ‘What is the future of crypto currency?’ demonstrating these two common methods – PL/SQL and Python.
Method 1: PL/SQL USER DEFINED FUNCTIONS
For PL/SQL, we create two user defined functions, one is for embedding text query to vector mapping, another is cosine similarity function to determine similarity between two vectors.
Vector Embedding Function:
Requires OPENAI_CRED described later in this page
CREATE OR REPLACE FUNCTION VECTOREMBED(QUERY in VARCHAR2)
RETURN CLOB
IS
l_headers VARCHAR2(4096);
l_req_payload JSON_OBJECT_T;
l_data_array JSON_ARRAY_T;
l_query JSON_ARRAY_T;
l_embedding JSON_ARRAY_T;
l_resp dbms_cloud_types.resp;
l_ans CLOB;
BEGIN
l_headers := '{''Content-Type'':''application/json''}';
l_query := new json_array_t;
l_query.append(QUERY);
l_req_payload := json_object_t.parse('{}');
l_req_payload.put('input', l_query);
l_req_payload.put('model', 'text-embedding-3-small');
l_req_payload.put('dimensions', 200);
l_resp := DBMS_CLOUD.SEND_REQUEST (
credential_name => 'OPENAI_CRED',
uri => 'bearer://api.openai.com/v1/embeddings',
headers => l_headers,
body => l_req_payload.to_blob,
method => DBMS_CLOUD.METHOD_POST
);
l_data_array := json_object_t(DBMS_CLOUD.get_response_text(l_resp)).get_array('data');
l_ans := json_object_t(l_data_array.get(0)).get_array('embedding').to_clob;
return l_ans;
END VECTOREMBED;
Cosine Function:
CREATE OR REPLACE FUNCTION COSINE_SIMILARITY(VECTOR1 in clob, VECTOR2 in clob)
RETURN NUMBER
IS
l_arr_1 JSON_ARRAY_T;
l_arr_2 JSON_ARRAY_T;
l_ans_1 NUMBER;
l_ans_2 NUMBER;
l_ans_3 NUMBER;
num_1 NUMBER;
num_2 NUMBER;
ans NUMBER;
BEGIN
l_arr_1 := JSON_ARRAY_T.parse(VECTOR1);
l_arr_2 := JSON_ARRAY_T.parse(VECTOR2);
l_ans_1 := 0;
l_ans_2 := 0;
l_ans_3 := 0;
for indx in 0 .. l_arr_1.get_size - 1
LOOP
num_1 := l_arr_1.get_number(indx);
num_2 := l_arr_2.get_number(indx);
l_ans_1 := l_ans_1 + num_1 * num_2;
l_ans_2 := l_ans_2 + num_1 * num_1;
l_ans_3 := l_ans_3 + num_2 * num_2;
END LOOP;
ans := l_ans_1 / sqrt(l_ans_3) / sqrt(l_ans_2);
RETURN ROUND(ans, 5);
END COSINE_SIMILARITY;
Method 2: OCI PYTHON USER DEFINED FUNCTIONS
Oracle Cloud Infrastructure (OCI) Functions is a serverless compute service based on the open source Fn Project that lets developers create, run and scale applications without managing any infrastructure. As with the previous PL/SQL example, we create two user defined functions: one is for the user-defined function for cosine similarity, func_cosine.py, to determine similarity between two vectors; the other function, func_embed.py is for embedding text query to vector mapping.
- Create and deploy OCI Functions – Steps to create and deploy OCI Functions
- Include required packages in function specific requirement.txt files located in the function specific foders created by OCI Functions init.
requirements.txt for func_cosine.py function:
fdk==0.1.60 numpy
Create User-Defined Function func_cosine.py
import io
import json
from fdk import response
import numpy as np
def cosine_distance(query_vector, text_vector):
query_vector = np.asarray(query_vector, dtype=float)
text_vector = np.asarray(text_vector, dtype=float)
# Use cosine as measurement of distance
dot_product = np.dot(query_vector, text_vector)
denominator = (np.linalg.norm(query_vector) * np.linalg.norm(text_vector))
return dot_product/denominator
def handler(ctx, data: io.BytesIO=None):
try:
body = json.loads(data.getvalue())
sql_query = body.get("QUERY")
text_vector = body.get("TEXT_VECTOR")
response_body = json.loads(sql_query)["RESPONSE_BODY"]
json_query = json.loads(response_body)["message"]
query_vector = list(json_query)
# Normalize text vector and embed query
norm_text = text_vector.strip('][').split(', ')
for i in range(len(norm_text)):
query_vector[i] = float(query_vector[i])
norm_text[i] = float(norm_text[i])
d = round(cosine_distance(query_vector, norm_text), 5)
except (Exception, ValueError) as ex:
print(str(ex), flush=True)
return response.Response(
ctx, response_data=json.dumps(
{"message": d}),
headers={"Content-Type": "application/json"}
)
Here, query_vector is embedded from a query like “What is the future of crypto currency?”, text_vector represents each vector row ‘embedded’ in the database. Cosine distance measures the distance or similarity between two vectors of an inner product space
Create OCI Function func_embed.py using OCI Function Development Kit (FDK) for Python. The function handler is the entry point to invoke the UDF.
requirements.txt for func_embed.py function:
fdk==0.1.60 langchain langchain_openai
NOTE: if Langchain package is required, fdk package has to be an older version, otherwise it will throw unexpected error)
func_embed.py
import io
import json
from langchain_openai import OpenAIEmbeddings
from fdk import response
OPENAI_API_KEY = "****"
EMBEDDINGS_OPENAI = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-3-small", dimensions=200)
def handler(ctx, data: io.BytesIO=None):
try:
body = json.loads(data.getvalue())
sql_query = body.get("QUERY")
# Normalize text vector and embed query
query_vector =EMBEDDINGS_OPENAI.embed_query(sql_query)
for i in range(len(query_vector)):
query_vector[i] = round(query_vector[i], 8)
except (Exception, ValueError) as ex:
print(str(ex), flush=True)
return response.Response(
ctx, response_data=json.dumps(
{"message": query_vector}),
headers={"Content-Type": "application/json"}
)
NOTE: Replace OPENAI_API_KEY = “****” with your actual OPENAI_API_KEY value
Above is the OCI cloud function handler func_embed.py created with FDK package. This function embeds the query into query_vector and calculates the distance between query vector and text vectors stored in database. Here we using the OpenAI Embedding LLM.
Embed and Insert
In this section, we will transform texts to vectors and store vectors in database tables using the two methods.
For the Python method we create two tables, UCI_NEWS_AGGREGATOR and NEWS_EMBEDDED. Load the news-aggregator-dataset in to the table UCI_NEWS_AGGREGATOR using the table description defined below. The NEWS_EMBEDDED table is used to embed and store the field TITLE data into vectors.
When using the PL/SQL method, we create the UCI_NEWS_AGGREGATOR table as in the Python method, but for the purposes of this blog we create a separate NEWS_EMBEDDED_SQL table so that we can compare the vector embedding between the two methods. The NEWS_EMBEDDED_SQL table for the PL/SQL Vector Transfer function described below, is identical to the Python NEWS_EMBEDDED table.
For this blog we use a public domain news aggregator dataset:
https://www.kaggle.com/datasets/uciml/news-aggregator-dataset

Create and insert data into the UCI_NEWS_AGGREGATOR table
This table has the following definition:
desc UCI_NEWS_AGGREGATOR
Name Null? Type
——— —– ————–
ID NUMBER
TITLE VARCHAR2(4000)
URL VARCHAR2(4000)
PUBLISHER VARCHAR2(4000)
CATEGORY VARCHAR2(4000)
STORY VARCHAR2(4000)
HOSTNAME VARCHAR2(4000)
TIMESTAMP NUMBER
NOTE: This table can be created as an external table eliminating the need to execute a load operation to populate the table. For more detail on creating external tables in Oracle Autonomous Database using DBMS_CLOUD please see DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
Create NEWS_EMBEDDED and NEWS_EMBEDDED_SQL tables with the following description
We need to embed field TITLE into vectors and store them in the NEWS_EMBEDDED table. This table has the following definition:
desc NEWS_EMBEDDED
Name Null? Type
———— —– ————–
TITLE VARCHAR2(4000)
URL VARCHAR2(4000)
CATEGORY VARCHAR2(4000)
TIMESTAMP NUMBER
TITLE_VECTOR CLOB
NOTE: as mentioned above, for PL/SQL our example uses the same table description for the NEWS_EMBEDDED_SQL for the TITLE embedding.
When using the PL/SQL method, create an OPENAI_CREDENTIAL and use the following PL/SQL to transfer and embed the TITLE vectors
Method 1: PL/SQL Embed and Insert
Create OPENAI_CREDENTIAL
BEGIN
DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
host => 'api.openai.com',
ace => xs$ace_type(privilege_list => xs$name_list('http'),
principal_name => 'ADMIN',
principal_type => xs_acl.ptype_db)
);
END;
EXEC DBMS_CLOUD.CREATE_CREDENTIAL('OPENAI_CRED', 'OPENAI', 'OPENAIKEY');
Use PL/SQL to perform TITLE embedding directly by DBMS_CLOUD.SEND_REQUEST API.
Create and execute the tranfer vector PL/SQL
-- Vector Transfer Using OPENAI Embedding LLM
-- Inserts in to table NEWS_EMBEDDED_SQL
DECLARE
type refcur is ref cursor;
l_cursor refcur;
l_sqlstmt clob;
l_insert_stmt clob;
l_headers VARCHAR2(4096);
l_title VARCHAR2(4096);
l_resp dbms_cloud_types.resp;
l_req_payload JSON_OBJECT_T;
l_resp_json JSON_OBJECT_T;
l_data_array JSON_ARRAY_T;
l_titles JSON_ARRAY_T;
l_embedding JSON_ARRAY_T;
BEGIN
l_headers := '{''Content-Type'':''application/json''}';
l_sqlstmt := 'select title from UCI_NEWS_AGGREGATOR';
l_titles := new json_array_t;
OPEN l_cursor FOR l_sqlstmt;
LOOP
FETCH l_cursor INTO l_title;
EXIT WHEN l_cursor%NOTFOUND;
BEGIN
l_titles.append(l_title);
END;
END LOOP;
l_req_payload := json_object_t.parse('{}');
l_req_payload.put('input', l_titles);
l_req_payload.put('model', 'text-embedding-3-small');
l_req_payload.put('dimensions', 200);
l_resp := DBMS_CLOUD.SEND_REQUEST (
credential_name => 'OPENAI_CRED',
uri => 'bearer://api.openai.com/v1/embeddings',
headers => l_headers,
body => l_req_payload.to_blob,
method => DBMS_CLOUD.METHOD_POST
);
l_resp_json := json_object_t(DBMS_CLOUD.get_response_text(l_resp));
l_data_array := l_resp_json.get_array('data');
FOR indx in 0 .. l_data_array.get_size - 1
LOOP
l_embedding := JSON_OBJECT_t(l_data_array.get(indx)).get_array('embedding');
l_insert_stmt := 'insert into NEWS_EMBEDDED_SQL(title, title_vector) values (:1, :2)';
EXECUTE IMMEDIATE l_insert_stmt USING l_titles.get(indx).to_string, l_embedding.to_clob;
END LOOP;
END;
Method 2: Python Embed and Insert
Create ReadAndInsert.py
For the Python method, we will use a python script to perform TITLE embedding. We will read the dataset using the Python oracledb package, embed texts of TITLE to vectors, and insert these vectors to NEWS_EMBEDDED table using the same Python oracledb package.
# Python script to embed TITLE vectors in the NEWS_EMBEDDED
# table using OPENAI embeddings LLM
from langchain_openai import OpenAIEmbeddings
import oracledb
OPENAI_API_KEY = "****"
EMBEDDINGS_OPENAI = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-3-small", dimensions=200)
SQL_QUERY = "select TITLE,URL,CATEGORY,TIMESTAMP from UCI_NEWS_AGGREGATOR"
def read_insert_data():
oracle_connection = oracledb.connect(user=USERNAME, password=PASSWORD, dsn=CONNECTION_STRING)
cursor = oracle_connection.cursor()
cursor.execute(SQL_QUERY)
rows = cursor.fetchall()
all_data = []
for row in rows:
all_data.append(list(row))
# Embed title into vector
embeddings = EMBEDDINGS_OPENAI.embed_documents([row[0] for row in all_data])
for x in embeddings:
for i in range(len(x)):
x[i] = round(x[i], 8)
# Insert embedded data into table
for i in range(len(all_data)):
cursor.execute("""insert into news_embedded values (:1, :2, :3, :4, :5)""", [all_data[i][0],
all_data[i][1],
all_data[i][2],
all_data[i][3],
str(embeddings[i])])
cursor.close()
oracle_connection.commit()
oracle_connection.close()
read_insert_data()
NOTE: Replace OPENAI_API_KEY=”****” with your OPEN_API_KEY
Invoke OCI functions as SQL functions
In this section, we will use DBMS_CLOUD_FUNCTION package to invoke the above OCI cloud function as a SQL function. Similar instructions can be found here OCI: oci user-defined function. The following is specific to the Python method and does not apply to PL/SQL method.
Create OCI Credential
Create credential, private_key and fingerprint generation: Step to get singing key and fingerprint
BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL (
credential_name => 'OCI_CRED2',
user_ocid => 'user_ocid',
tenancy_ocid => 'tenancy_ocid',
private_key => 'private_key',
fingerprint => 'fingerprint'
);
END;
/
Create function catalog OCI_DEMO_CATALOG
BEGIN
DBMS_CLOUD_FUNCTION.CREATE_CATALOG (
credential_name => 'OCI_CRED2',
catalog_name => 'OCI_DEMO_CATALOG',
service_provider => 'OCI',
cloud_params => '{"region_id":"us-phoenix-1",
"compartment_id":"comp_id"}'
);
END;
/
A catalog is a collection of wrapper functions that reference and call their respective cloud functions via their API endpoints. It provides a structured way to organize and manage functions, make it easier to find and access them. It is required for OCI Function use. Please see dbms-cloud-function documentation for details.
Create SQL functions VECTOREMBEDDED and VECTORSEARCH(cosine_similarity)
# SQL function for VECTOREMBEDDED and VECTORSEARCH(cosine_similarity)
BEGIN
DBMS_CLOUD_FUNCTION.CREATE_FUNCTION (
credential_name => 'OCI_CRED2',
catalog_name => 'OCI_DEMO_CATALOG',
function_name => 'vectorembedded',
function_id => 'ocid1.fnfunc.oc1.phx.aaaaaaaa3or4vaylyhyxpsvp4nr2ucpfrr5aspj6mqfak7r7cfuoq3gl37ga',
input_args => TO_CLOB('{"QUERY":"VARCHAR2"}')
);
END;
CREATE SQL FUNCTION VECTORSEARCH
# SQL Function for VECTORSEARCH cosine similarity
VAR function_args CLOB;
EXEC :function_args :=
TO_CLOB('{"QUERY":”VARCHAR2”,”TEXT_VECTOR”:”CLOB"}');
BEGIN
DBMS_CLOUD_FUNCTION.CREATE_FUNCTION (
credential_name => 'OCI_CRED',
catalog_name => 'OCI_DEMO_CATALOG',
function_name => 'vectorsearch',
function_id => 'ocid1.fnfunc.sample',
input_args => :function_args
);
END;
/
Vectorsearch function description
desc VECTORSEARCH
FUNCTION VECTORSEARCH RETURNS CLOB
Argument Name Type In/Out Default?
————- ——– —— ——–
QUERY VARCHAR2 IN unknown
TEXT_VECTOR CLOB IN unknown
VECTORSEARCH queries – PL/SQL and Python
PL/SQL method
In this step we use PL/SQL function VECTORSEARCH to perform a vector search to answer the question:
‘What is the future about crypto currency?’
Sql statement for PL/SQL function:
with queryvector as (select VECTOREMBED('What is the future of crypto currency?') as qvector from DUAL)
select title, COSINE_SIMILARITY(a.qvector, b.title_vector) as distance from queryvector a, NEWS_EMBEDDED_SQL b order by distance desc fetch first 5 rows only;
Here are the top 5 relevant rows with largest cosine similarity (distance) ordered by relevance:

BENEFITS OF EXECUTING QUERY IN SQL
When functions are executed within the database, it often leads to better performance compared to connecting to database through external Python scripts to fetch data. Executing SQL will minimize the data transformation between database and external application, it will also reduce latency due to network response. Additionally, running functions through external python scripts requires wallet and password configuration, which may lead to the security leakage of database credentials.
Python method
SQL statement for OCI Python cloud functions
with queryvector as (select VECTOREMBEDDED('What is the future of crypto currency?') as qvector from dual)
select b.title, JSON_VALUE(JSON_VALUE(VECTORSEARCH(a.qvector, b.TITLE_VECTOR), '$.RESPONSE_BODY'), '$.message') as distance from queryvector a, NEWS_EMBEDDED b order by distance desc fetch first 5 rows only;

As you can see, whether you use the PL/SQL method or the Python OCI function method the inference and cosign similarity is identical.
CONCLUSION
Leveraging OCI Functions and the inherent security and core functionality of the Oracle Autonomous Database, we illustrate two common methods where you can create embedded vectors on your target data set and execute vector search with cosine similarity in Oracle Autonomous Database 19c today. Oracle is committed to enriching the built-in AI and vector search functionality in the Oracle Autonomous Database but there is no need to wait! Get started today!
Contributing Author: Xiao Sai
