Using Vectors with Oracle True Cache

A short developer guide to caching semantic search retrieval for RAG and chatbot applications

Why this matters

AI-powered applications, especially Retrieval-Augmented Generation (RAG) systems, generate a high volume of read requests as they retrieve relevant context before generating a response. As these applications scale, repeatedly querying the primary database for similar requests can increase latency and consume valuable database resources.

Many of these requests are not identical; they are simply different ways of asking the same question. Traditional caches rely on exact key matches, so semantically similar requests often miss the cache and trigger another vector search against the database. Semantic caching addresses this by retrieving cached results based on vector similarity rather than exact text matches. By serving semantically similar requests directly from the cache, applications can reduce repeated database lookups, improve response times, and keep the primary database focused on transactional workloads.

This is where Oracle AI Vector Search and Oracle True Cache complement each other. Oracle AI Vector Search enables semantic similarity searches, while Oracle True Cache keeps frequently accessed vector data in memory, allowing applications to satisfy many semantic retrieval requests directly from the cache.

Vectors in simple terms

A vector is a list of numbers that captures the meaning of text. Similar text typically produces nearby vectors, while unrelated text tends to produce vectors that are farther apart. Instead of asking the database for exact keyword matches, a vector query searches for similar meanings.

For example, “How do I route chatbot reads to True Cache?” and “Can semantic search use the cache?” may not share many keywords, but their embeddings may be close because they relate to a similar concept.
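To make "close" concrete, here is a minimal Java sketch of cosine distance, taken here as 1 minus cosine similarity, which is the usual definition of the metric the SQL in this guide refers to as cosine. The class and method names are hypothetical and exist only for illustration; in the actual flow the database computes the distance, not the application.

```java
/** Illustrative helper with a hypothetical name; not part of any Oracle API. */
final class CosineDistance {
    private CosineDistance() {}

    /** Returns 1 - cosine similarity for two equal-length, non-zero vectors. */
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction give a distance of 0, orthogonal vectors give 1, and opposite vectors give 2, so smaller values mean more similar text.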

Where True Cache fits

True Cache does not replace the embedding model, the large language model (LLM), or Oracle AI Vector Search. Instead, it can accelerate and offload the database read path.

A practical pattern is straightforward: keep ingestion, writes, corpus refreshes, and freshness-sensitive operations on the primary database. Route read-only vector retrieval to True Cache. The application continues to use SQL and JDBC, while repeated retrieval requests may be served from cache memory rather than repeatedly accessing the primary database.
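The routing rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: the class, the enum values, and the needsLatestData flag are hypothetical names, and a real application would map each target to its own data source or service name.

```java
/** Hypothetical routing sketch; all names here are illustrative only. */
final class RetrievalRouter {
    /** Where a request should be sent. */
    enum Target { PRIMARY, TRUE_CACHE }

    /** Coarse operation types taken from the pattern described above. */
    enum Operation { INGEST, CORPUS_REFRESH, VECTOR_SEARCH }

    private RetrievalRouter() {}

    /**
     * Sends read-only vector retrieval to True Cache unless the caller
     * needs the freshest data; everything else stays on the primary.
     */
    static Target route(Operation op, boolean needsLatestData) {
        if (op == Operation.VECTOR_SEARCH && !needsLatestData) {
            return Target.TRUE_CACHE;
        }
        return Target.PRIMARY;
    }
}
```

Keeping the decision in one place makes it easier to audit which workloads are allowed to read from the cache.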

A concrete use case

Consider a payments company launching a new dispute workflow. For several days, merchants repeatedly ask how to upload evidence, why a dispute is pending, what file formats are supported, and how long review takes.

The chatbot searches the same support corpus repeatedly. With True Cache, frequently accessed vector index and document blocks may remain cached close to the retrieval workload. This can help reduce read pressure on the primary database and improve response behavior during periods of high demand.

Developer flow

The implementation has four parts: a vector table for chunks and embeddings, a vector index for approximate similarity search, a primary database connection for writes and baseline testing, and a True Cache connection for read-only retrieval.

Store chunks and embeddings

create table CHATBOT_DOCUMENTS_IVF (
  doc_id      number primary key,
  title       varchar2(300),
  content     varchar2(4000),
  source_url  varchar2(1000),
  embedding   vector(16, float32),
  created_at  timestamp default systimestamp
);

Create a vector index

create vector index CHATBOT_DOCS_IVF_IDX
  on CHATBOT_DOCUMENTS_IVF (embedding)
  organization neighbor partitions
  distance cosine
  with target accuracy 90;

Run semantic retrieval

select doc_id,
       title,
       source_url,
       vector_distance(embedding, to_vector(:query_vector, 16, float32), cosine) as distance
from CHATBOT_DOCUMENTS_IVF
order by vector_distance(embedding, to_vector(:query_vector, 16, float32), cosine)
fetch approx first :top_k rows only;

Route retrieval to True Cache in Java

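// Requires java.sql.Connection, java.sql.PreparedStatement, and java.sql.ResultSet.
// sql holds the retrieval query; queryVector is the text form of the query
// embedding and topK is an int (both names are placeholders).
try (Connection conn = trueCacheDataSource.getConnection()) {
    conn.setReadOnly(true);
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        // :query_vector appears twice in the SQL, so it is bound at positions 1 and 2.
        ps.setString(1, queryVector);
        ps.setString(2, queryVector);
        // :top_k is the third bind and must also be set.
        ps.setInt(3, topK);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // Send top chunks to the LLM as grounding context.
            }
        }
    }
}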
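The snippet above binds the query vector as a string, so the application needs to turn the embedding into text first. The following is a minimal sketch of that step. It assumes the embedding arrives as a float[] and that a bracketed, comma-separated list is an acceptable input for to_vector; the class name and the expectedDimension parameter are hypothetical. Java's default float formatting can produce exponent notation for very small or very large values, so it is worth confirming that your database version accepts it.

```java
/** Hypothetical helper for building the text bound to :query_vector. */
final class VectorLiterals {
    private VectorLiterals() {}

    /**
     * Formats an embedding as "[v1,v2,...]" after checking that its length
     * matches the dimension declared on the VECTOR column (16 in this guide).
     */
    static String toVectorLiteral(float[] embedding, int expectedDimension) {
        if (embedding.length != expectedDimension) {
            throw new IllegalArgumentException(
                "Expected " + expectedDimension + " dimensions but got " + embedding.length);
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            float value = embedding[i];
            if (Float.isNaN(value) || Float.isInfinite(value)) {
                throw new IllegalArgumentException("Non-finite value at index " + i);
            }
            if (i > 0) {
                sb.append(',');
            }
            sb.append(Float.toString(value));
        }
        return sb.append(']').toString();
    }
}
```

Checking the dimension in the application surfaces a model or configuration mismatch before the query reaches the database.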

Benchmark guidance

Avoid drawing conclusions from a single request. Use the same prompt set, TOP_K value, vector index, and application code when comparing the primary database and True Cache. Capture p50, p95, and p99 latency, logical reads, physical reads, SQL*Net round trips, and connected service names.
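For the latency part of that list, a percentile helper can be as small as the sketch below. It uses the nearest-rank method, which is one common definition among several, and it assumes the samples are wall-clock latencies in milliseconds collected by your own test harness. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing latency samples from a benchmark run. */
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("No samples");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Run it once per target, for example once for the primary database samples and once for the True Cache samples, and compare p50, p95, and p99 side by side rather than relying on averages.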

In one laboratory test, the primary database completed retrieval in approximately 100 ms with physical reads, while True Cache completed retrieval in approximately 9 ms with no physical reads observed. Treat these figures as example measurements from a specific test environment rather than expected results for all deployments.

The broader observation is that repeated, read-heavy vector retrieval workloads may be good candidates for evaluation with True Cache.

Verify the service before comparing results

select sys_context('USERENV', 'SERVICE_NAME')  as service_name,
       sys_context('USERENV', 'INSTANCE_NAME') as instance_name
from dual;

Production checklist

Use embeddings generated by an approved model.
Match the VECTOR dimension to the selected embedding model.
Store model and chunking metadata.
Apply appropriate authorization filters.
Route only read-only retrieval workloads to True Cache.
Measure warm-cache and cold-cache behavior separately.

Takeaway

Vectors help applications retrieve information by meaning rather than exact keywords. Oracle True Cache can help read-heavy applications serve repeated database reads from memory while the primary database remains the system of record.

Together, Oracle AI Vector Search and Oracle True Cache can be an effective architecture pattern for chatbots and RAG applications that repeatedly search a relatively stable knowledge corpus.

Learn more

LiveLabs: https://livelabs.oracle.com/ords/r/dbpm/livelabs/view-workshop?wid=3933&clear=RR%2C180&session=103853267931988

True Cache Documentation: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/overview-oracle-true-cache.html

Connecting to True Cache: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/methods-connecting-true-cache.html

JDBC Thin sample for True Cache: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/sample-java-code-using-jdbc-thin-driver.html

DML redirection: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/enabling-dml-redirection.html

True Cache load balancing: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/best-practices-load-balancing-uniform-configuration.html

Oracle IVF vector index blog: https://blogs.oracle.com/database/using-ivf-vector-indexes

Nithin Thekkupadam Narayanan

Senior Principal Product Manager

Sambit Panda
