Hands-On Hybrid Vector Search with Oracle Database 26ai

Imagine a insurance specialist investigates water damage complaints. A semantic search for “water damage caused by heavy rain” should also find documents describing flooding, storms, leaking roofs, or weather-related property damage such as basement flooded etc., even if those exact words are not used. However, the specialist may also need documents that explicitly mention a specific policy clause, product code, or exclusion such as “flood exclusion”.

Pure semantic search return related claims but miss the exact policy. Traditional keyword search might find the clause but miss relevant claims that use different terminology. This is where Hybrid Vector Search comes into play. Hybrid search can address both components of such a query by running traditional full-text (lexical) search and vector search on the same data and then combining the two search results into a single result set. This leads to more relevant results and fewer false positives than either approach alone. If you’d like to learn more about the technology, we’ve included links to additional blog posts and documentation at the end of this article.

If you’ve already played with Similarity Search and are curious about what comes next, this tutorial is for you. We’ll focus on how Hybrid Vector Search combines semantic search with traditional keyword search to deliver more precise and relevant results. We recently used this demo during a customer workshop and thought it would be useful to share it with a wider audience.

What is the use case in this tutorial?
Consider a travel website where users search for “places for activities such as fishing, swimming, etc”. Semantic search can identify parks offering similar outdoor activities, even if the exact words are not present in the description. However, users may also want results that explicitly mention picnic areas or exclude locations where swimming is prohibited. This is a perfect example of where Hybrid Vector Search can improve the quality and precision of search results.

Tutorial Overview

Setup
To make it simple for you to try it out yourself, we use parts of the National Parks sample data set from the very well received Oracle LiveLab Getting Started with AI Vector Search built by the product management team AI for Data.
You can either run the exercises directly in the LiveLab environment inside the provided sandbox or in your own Oracle Database 26ai setup. The example uses the table PARKS and the embedding model MINILM_L12_V2 for the creation of the Hybrid Vector Index.
The table PARKS should look like this:

desc parks
Name            Null?    Type                
--------------- -------- ------------------- 
PARK_ID         NOT NULL VARCHAR2(36)        
PARK_CODE                VARCHAR2(4)         
URL                      VARCHAR2(34)        
NAME            NOT NULL VARCHAR2(51)        
DESCRIPTION              VARCHAR2(414)       
LATITUDE                 NUMBER(11,8)        
LONGITUDE                NUMBER(11,8)        
LOCATION                 PUBLIC.SDO_GEOMETRY 
DIRECTIONS_INFO          VARCHAR2(524)       
DIRECTIONS_URL           VARCHAR2(162)       
CONTACT_PHONE            VARCHAR2(13)        
EMAIL_ADDRESS            VARCHAR2(44)        
STREET                   VARCHAR2(95)        
CITY                     VARCHAR2(35)        
STATES                   VARCHAR2(2)         
POSTAL_CODE              VARCHAR2(10)        
COUNTRY_ID      NOT NULL VARCHAR2(3)         
DESC_VECTOR              VECTOR              

SELECT count(*) FROM parks;
 
 COUNT(*)
----------
       472

The sample table PARK dmp file can also be downloaded here.

If not already done you can import the model with the procedure DBMS_VECTOR.LOAD_ONNX_MODEL (see also documentation). The embedding model can be accessed here.

BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL(
directory  => 'DATA_PUMP_DIR',
file_name  => 'all_MiniLM_L12_v2.onnx',
model_name => 'minilm_l12_v2',
metadata   => JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input": {"input": ["DATA"]}}'));
END;
/

You can verify the available models in the database with the following query:

SELECT model_name, mining_function,algorithm, algorithm_type, model_size
FROM user_mining_models;

MODEL_NAME    MINING_FUN ALGORITHM  ALGORITHM_ MODEL_SIZE
------------- ---------- ---------- ---------- ----------
MINILM_L12_V2 EMBEDDING  ONNX       NATIVE      133322334

Similarity Search
Let’s start with a simple similarity search. Semantic search is great at finding information based on meaning. Search for “activities such as fishing and swimming” and you’ll find relevant parks even if those exact words never appear in the description. But what if the result must also contain the word “picnic”?

The following query generates an embedding for the search phrase “activities such as fishing, swimming, etc” and compares it with the embeddings stored in the DESC_VECTOR column. The parks name and description with the smallest vector distance are returned first.

SELECT name, description
FROM parks
ORDER BY VECTOR_DISTANCE(desc_vector,
VECTOR_EMBEDDING(minilm_l12_v2 using 'activities such as fishing, swimming, etc' AS data), COSINE)
FETCH EXACT FIRST 10 ROWS ONLY;

NAME                   DESCRIPTION                                                                                                                                                                                                                                                                                                                                                                                                                   
---------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Chickasaw              Springs,  streams,  lakes - whatever its form,  water is the attraction at Chickasaw National Recreation Area.                                                                                                                                                                                                                                                                                                                
Boston Harbor Islands  . . . where you can walk a Civil War-era fort,  view historic lighthouses,  explore tide pools,  hike lush trails,  camp under the stars,  or relax while fishing,  picnicking,  or swimming—all within reach of downtown Boston. Youth programs,  visitor services,  research,  wildlife management,  and more are coordinated on the park's 34 islands and peninsulas by the Boston Harbor Islands Partnership.            
Lake Mead              Experience outdoor thrills amongst striking landscapes and brilliant blue waters. Hike,  swim,  boat,  cycle,  camp,  fish,  volunteer,  and learn! This year-round national park spreads across 1.5 million acres of mountains,  canyons,  valleys and two vast lakes. Just a stone's throw from the dazzling lights of Las Vegas,  this extraordinary playground beckons with a symphony of possibilities.                  
Delaware Water Gap     Situated within the most densely populated region of the United States,  Delaware Water Gap National Recreation Area provides a unique opportunity to experience tranquil landscapes,  rich human history,  and striking scenery along 40 miles of the longest free-flowing river east of the Mississippi. The park offers year-round recreation including hiking,  paddling,  fishing,  and hunting.                         
...                           
10 rows selected.

The results include parks such as Chickasaw, Boston Harbor Islands, Lake Mead, and Delaware Water Gap—all locations which are related to our search intent.

To get a better overview, we can display only the park names:

SELECT name FROM parks ORDER BY VECTOR_DISTANCE(desc_vector, VECTOR_EMBEDDING(minilm_l12_v2 using 'activities such as fishing, swimming, etc' AS data), COSINE) FETCH EXACT FIRST 10 ROWS ONLY;

NAME                                               
---------------------------------------------------
Chickasaw
Boston Harbor Islands
Lake Mead
Delaware Water Gap
Cape Lookout
Pictured Rocks
Niobrara
Saint Croix
Curecanti
Cape Cod

10 rows selected.

Now let’s add keyword search capabilities by creating a Hybrid Vector Index.

Creating a Hybrid Vector Index and Running Queries
The Hybrid Vector Index combines Oracle Text and AI Vector Search into a single search structure. This allows us to run pure keyword searches, pure vector searches, or combine both approaches in a single query.

CREATE HYBRID VECTOR INDEX parks_hybrid_idx ON parks(description)
PARAMETERS('MODEL MINILM_L12_V2');

Our example shows the most simplest form of an hybrid vector index creation statement, we only specify the model. However, you can also specify vectorizer preferences e.g. for externally hosted vector embedding models or Text Search Preferences such as filter, lexer, storage etc. More details on the syntax can be found in the documentation.

If you’re curious, you can inspect the index metadata using data dictionary views such as USER_IND_COLUMNS, V$VECTOR_INDEX, and CTX_USER_INDEXES. These views provide information about the vector and text components that make up the hybrid index.

Now let’s execute some hybrid vector queries. The DBMS_HYBRID_VECTOR.SEARCH function provides a JSON-based interface for querying Hybrid Vector Indexes (see documentation).

The query can contain:

a vector clause for semantic search
a text clause for keyword search,
and a fusion strategy that defines how the results are combined.

In the following example, we search for parks that are semantically related to “activities such as fishing, swimming, etc.” while also containing the keyword “Picnic” in the park description.
The fusion strategy is INTERSECT, because it should only return results that satisfy both conditions. In other words, a park must be relevant to the semantic search and contain the specified keyword to be included in the final result set. The result is returned as JSON and includes the combined relevance score, the rowid of the matching row, and a text snippet (chunk_text) showing the matching content.
Note: You don’t need to refer to the table. All what you need is the index name (parks_hybrid_idx).

set long 10000

SELECT DBMS_HYBRID_VECTOR.SEARCH(JSON(
'{"hybrid_index_name": "parks_hybrid_idx", 
  "vector": {"search_text": "activities such as fishing, swimming, etc"},
  "text": {"contains": "Picnic"},                           
  "search_fusion": "INTERSECT",
  "return": {"values": ["score", "rowid", "chunk_text"],"topN": "10"}}')) result;

RESULT                                                                          
--------------------------------------------------------------------------------
[{"score":57.2,"rowid":"AAAmtuAAAAABeb/AAG","chunk_text":"Today the river valle
y attracts us for so many reasons. Take a solitary walk to enjoy nature's displa
y, raft leisurely through the rocky shoals with friends, fish the misty waters a
s the sun comes up, or have a picnic on a Sunday afternoon. Get Outdoors and exp
erience your Chattahoochee River National Recreation Area as you have never done
 before."},{"score":56.02,"rowid":"AAAmtuAAAAAB0OvAAF","chunk_text":"Experience 
a park so rich it supports 19 distinct ecosystems with over 2, 000 plant and ani
mal species. Go for a hike, enjoy a vista, have a picnic or learn about the cent
uries of overlapping history from California's indigenous cultures, Spanish colo
nialism, the Mexican Republic, US military expansion and the growth of San Franc
isco. All of this and more awaits you, so get out and find your park."}]

To identify the parks behind the returned rowids, we can query the PARKS table:

SELECT name FROM parks 
WHERE rowid IN ('AAAmtuAAAAABeb/AAG','AAAmtuAAAAAB0OvAAF' );

NAME                     
-------------------------
Chattahoochee River
Golden Gate

You usually want to work with relational data instead of JSON. Use the standard table function DBMS_HYBRID_VECTOR.SEARCHPIPELINE to return a pipeline of row records. The results is of type RECORD which contains fields such as doc_rowid, score, vector_score, text_score etc.
You can see an example here. We use DBMS_HYBRID_VECTOR.SEARCHPIPELINE and join the returned rowids back to the PARKS table to get in addition the park names.

SELECT
   p.name
   h.score,
   h.vector_score,
   h.text_score,
   h.vector_rank,
   h.text_rank,
   h.chunk_text
FROM parks p, 
DBMS_HYBRID_VECTOR.SEARCHPIPELINE(JSON(
'{"hybrid_index_name": "parks_hybrid_idx", 
  "vector": {"search_text": "activities such as fishing, swimming, etc"},
  "text": {"contains": "Picnic"},                              
  "search_fusion": "INTERSECT"
 }')) h
WHERE doc_rowid=p.rowid;

NAME                                                     SCORE VECTOR_SCORE TEXT_SCORE VECTOR_RANK  TEXT_RANK
--------------------------------------------------- ---------- ------------ ---------- ----------- ----------
CHUNK_TEXT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Chattahoochee River                                       57.2        61.92         10           6          1 
Today the river valley attracts us for so many reasons. Take a solitary walk to enjoy nature's display, raft leisurely through the rocky shoals with friends, fish the misty waters as the sun comes up, or have a picnic on a Sunday afternoon. Get Outdoors and experience your Chattahoochee River National Recreation Area as you have never done before.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

Golden Gate                                              56.02        60.62         10          12          1 
Experience a park so rich it supports 19 distinct ecosystems with over 2, 000 plant and animal species. Go for a hike, enjoy a vista, have a picnic or learn about the centuries of overlapping history from California's indigenous cultures, Spanish colonialism, the Mexican Republic, US military expansion and the growth of San Francisco. All of this and more awaits you, so get out and find your park.

If you do not want to use the table function DBMS_HYBRID_VECTOR.SEARCHPIPELINE(), the original SEARCH API can be wrapped in a JSON_TABLE specification. Using JSON_TABLE, we can project the JSON response into rows and join the returned rowids back to the PARKS table. Here is an example:

SELECT p.name, h.score, h.text_score, h.vector_score
FROM parks p,
JSON_TABLE (
(SELECT DBMS_HYBRID_VECTOR.SEARCH(JSON(
'{"hybrid_index_name": "parks_hybrid_idx", 
  "vector": {"search_text": "activities such as fishing, swimming, etc"},
  "text": {"contains": "Picnic"},                              
  "search_fusion": "INTERSECT",
  "return":{"values": ["score","text_score", "vector_score", "rowid"],"topN": "10"}
 }'))), '$[*]'
COLUMNS (
score        number(5,2) PATH '$."score"',
text_score   number(5,2) PATH '$."text_score"',
vector_score number(5,2) PATH '$."vector_score"',
row_id       varchar2(18) PATH '$."rowid"') ) h
WHERE p.rowid=h.row_id;

NAME                        SCORE TEXT_SCORE VECTOR_SCORE
---------------------- ---------- ---------- ------------
Chattahoochee River          57.2         10        61.92
Golden Gate                 56.02         10        60.62

You can experiment with other fusion strategies such as UNION, VECTOR_ONLY etc.

Using Oracle Text Features such as Fuzzy Search
One of the advantages of Hybrid Vector Search is that it gives you access to the powerful linguistic capabilities of Oracle Text. For example, users often misspell search terms. Let’s intentionally misspell Picnic and search for Piccnics using Oracle Text’s fuzzy matching:

"text": {
  "contains": "fuzzy(Piccnics)"
}

Even though the keyword is misspelled, Oracle Text still finds relevant matches. In our example, an additional park, Fort Dupont Park, is returned because its description contains the term Picnics.

SELECT p.name, h.score, h.text_score, h.vector_score
FROM parks p,
JSON_TABLE (
(SELECT DBMS_HYBRID_VECTOR.SEARCH(JSON(
'{"hybrid_index_name": "parks_hybrid_idx", 
  "vector": {"search_text": "activities such as fishing, swimming, etc"},  
  "text": {"contains": "fuzzy(Piccnics)"},                            
  "search_fusion": "INTERSECT",
  "return":{"values": ["score","text_score", "vector_score", "rowid"],"topN": 
  "10"}}'))), '$[*]'
COLUMNS (
score        number(5,2) PATH '$."score"',
text_score   number(5,2) PATH '$."text_score"',
vector_score number(5,2) PATH '$."vector_score"',
row_id       varchar2(18) PATH '$."rowid"') ) h
WHERE p.rowid=h.row_id;

NAME                                               SCORE TEXT_SCORE VECTOR_SCORE
--------------------------------------------- ---------- ---------- ------------
Chattahoochee River                                 57.2         10        61.92
Golden Gate                                        56.02         10        60.62
Fort Dupont Park                                    55.5         10        60.05

This illustrates the strength of Hybrid Vector Index Search: semantic similarity helps us find conceptually related content, while Oracle Text contributes linguistic intelligence such as fuzzy matching, stemming, and other advanced text-search capabilities. Try a few searches of your own and experiment with different fusion strategies such as UNION, INTERSECT, TEXT_ONLY, and VECTOR_ONLY to see how the ranking changes and other Oracle Text terms.

Further Readings

Oracle AI Vector Search (oracle.com)
Oracle AI Vector Search User’s Guide
DBMS_HYBRID_VECTOR Package
LiveLab Getting Started with AI Vector Search

Hands-On Hybrid Vector Search with Oracle Database 26ai

Ulrike Schwinn

Distinguished Product Manager

Validating Fusion–IAM Trust for API Integrations

Hands-On Hybrid Vector Search with Oracle Database 26ai

Authors

Ulrike Schwinn

Distinguished Product Manager

Validating Fusion–IAM Trust for API Integrations