A short developer guide to caching semantic search retrieval for RAG and chatbot applications
Why this matters
AI chatbots and retrieval-augmented generation (RAG) applications often appear model-heavy, but one of their busiest components is database retrieval. For every user question, an application may create an embedding, search a vector table for the most relevant chunks, and pass those chunks to the model as grounding context.
That retrieval path is often read-heavy. During support events, launches, incidents, or onboarding waves, many users may ask similar questions against the same knowledge corpus. Oracle True Cache can help serve repeated read-only vector retrieval from an in-memory cache while the primary database remains the system of record.
This article assumes the use of generally available Oracle Database features, including Oracle True Cache and Oracle AI Vector Search.
Vectors in simple terms
A vector is a list of numbers that captures the meaning of text. Similar text typically produces nearby vectors, while unrelated text tends to produce vectors that are farther apart. Instead of asking the database for exact keyword matches, a vector query searches for similar meanings.
For example, “How do I route chatbot reads to True Cache?” and “Can semantic search use the cache?” may not share many keywords, but their embeddings may be close because they relate to a similar concept.
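To make "nearby" concrete, here is a minimal sketch of cosine distance, the same metric named in the index and query later in this guide. It is illustrative only: the class and method names are hypothetical, and in practice the database computes the distance through VECTOR_DISTANCE rather than application code.

```java
// Illustrative only: Oracle AI Vector Search computes distances inside the database.
// CosineDistanceSketch and cosineDistance are hypothetical names for this sketch.
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Returns 1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite. */
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A smaller distance means the two texts are closer in meaning according to the embedding model, which is why the retrieval query orders its results by ascending distance.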
Where True Cache fits
True Cache does not replace the embedding model, the large language model (LLM), or Oracle AI Vector Search. Instead, it can accelerate and offload the database read path.
A practical pattern is straightforward: keep ingestion, writes, corpus refreshes, and freshness-sensitive operations on the primary database. Route read-only vector retrieval to True Cache. The application continues to use SQL and JDBC, while repeated retrieval requests may be served from cache memory rather than repeatedly accessing the primary database.
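The routing rule above can be sketched as a small decision function. This is a hypothetical illustration, not part of any Oracle API: the operation categories and the ReadRoutingSketch name are assumptions made for the example, and a real application would map the result to its own primary and True Cache data sources.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: names and categories are assumptions for illustration.
final class ReadRoutingSketch {
    enum Operation { INGEST, CORPUS_REFRESH, FRESHNESS_SENSITIVE_READ, VECTOR_RETRIEVAL }
    enum Target { PRIMARY, TRUE_CACHE }

    // Only read-only vector retrieval is considered safe to serve from the cache.
    private static final Set<Operation> CACHEABLE = EnumSet.of(Operation.VECTOR_RETRIEVAL);

    private ReadRoutingSketch() {}

    /** Writes and freshness-sensitive work stay on the primary; cacheable reads go to True Cache. */
    static Target targetFor(Operation operation) {
        return CACHEABLE.contains(operation) ? Target.TRUE_CACHE : Target.PRIMARY;
    }
}
```

Defaulting everything that is not explicitly cacheable to the primary keeps the rule conservative: a newly added operation type is never routed to the cache by accident.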
A concrete use case
Consider a payments company launching a new dispute workflow. For several days, merchants repeatedly ask how to upload evidence, why a dispute is pending, what file formats are supported, and how long review takes.
The chatbot searches the same support corpus repeatedly. With True Cache, frequently accessed vector index and document blocks may remain cached close to the retrieval workload. This can help reduce read pressure on the primary database and improve response behavior during periods of high demand.
Developer flow
The implementation has four parts: a vector table for chunks and embeddings, a vector index for approximate similarity search, a primary database connection for writes and baseline testing, and a True Cache connection for read-only retrieval.
Store chunks and embeddings
create table CHATBOT_DOCUMENTS_IVF (
    doc_id     number primary key,
    title      varchar2(300),
    content    varchar2(4000),
    source_url varchar2(1000),
    embedding  vector(16, float32),
    created_at timestamp default systimestamp
);
Create a vector index
create vector index CHATBOT_DOCS_IVF_IDX
    on CHATBOT_DOCUMENTS_IVF (embedding)
    organization neighbor partitions
    distance cosine
    with target accuracy 90;
Run semantic retrieval
select doc_id,
       title,
       source_url,
       vector_distance(
           embedding,
           to_vector(:query_vector, 16, float32),
           cosine
       ) as distance
from CHATBOT_DOCUMENTS_IVF
order by vector_distance(
             embedding,
             to_vector(:query_vector, 16, float32),
             cosine
         )
fetch approx first :top_k rows only;
Route retrieval to True Cache in Java
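import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection conn = trueCacheDataSource.getConnection()) {
    // Mark the connection read-only before preparing the statement.
    conn.setReadOnly(true);
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        // Binds are set by position: :query_vector appears twice, then :top_k.
        ps.setString(1, queryVector);
        ps.setString(2, queryVector);
        ps.setInt(3, topK); // topK holds the TOP_K value chosen by the caller
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // Send top chunks to the LLM as grounding context.
            }
        }
    }
}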
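The snippet above binds queryVector as a string. One way to produce that string from a float array is sketched below. The helper name is hypothetical, and the bracketed, comma-separated text form is an assumption about what to_vector accepts in your database version. Float.toString can emit exponent notation for very small or very large values, so confirm that your target release parses it.

```java
import java.util.StringJoiner;

// Hypothetical helper: formats an embedding as text such as "[0.5,-2.0,1.0]".
final class VectorLiteralSketch {
    private VectorLiteralSketch() {}

    static String toVectorLiteral(float[] embedding, int expectedDimension) {
        // Fail early if the embedding does not match the VECTOR column dimension.
        if (embedding.length != expectedDimension) {
            throw new IllegalArgumentException(
                "Expected " + expectedDimension + " dimensions but got " + embedding.length);
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            if (Float.isNaN(value) || Float.isInfinite(value)) {
                throw new IllegalArgumentException("Embedding contains a non-finite value");
            }
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }
}
```

Checking the dimension in the application gives a clearer error than a failed conversion in the database, and it guards against silently switching to an embedding model with a different output size.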
Benchmark guidance
Avoid drawing conclusions from a single request. Use the same prompt set, TOP_K value, vector index, and application code when comparing the primary database and True Cache. Capture p50, p95, and p99 latency, logical reads, physical reads, SQL*Net round trips, and connected service names.
In one laboratory test, the primary database completed retrieval in approximately 100 ms with physical reads, while True Cache completed retrieval in approximately 9 ms with no physical reads observed. Treat these figures as example measurements from a specific test environment rather than expected results for all deployments.
The broader observation is that repeated, read-heavy vector retrieval workloads may be good candidates for evaluation with True Cache.
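To compare p50, p95, and p99 latency across the two connections, a benchmark needs some percentile definition. The sketch below uses the nearest-rank method, which is one common choice and not something the measurements in this article are known to have used. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: nearest-rank percentile over recorded latencies in milliseconds.
final class LatencyPercentilesSketch {
    private LatencyPercentilesSketch() {}

    /** Returns the smallest sample such that at least p percent of samples are <= it. */
    static double percentile(double[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("At least one sample is required");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("Percentile must be in (0, 100]");
        }
        double[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Record the samples for the primary database and for True Cache in separate arrays, and keep warm-cache and cold-cache runs apart, so that each percentile describes a single, comparable condition.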
Verify the service before comparing results
select sys_context('USERENV', 'SERVICE_NAME')  as service_name,
       sys_context('USERENV', 'INSTANCE_NAME') as instance_name
from dual;
Production checklist
- Use embeddings generated by an approved model.
- Match the VECTOR dimension to the selected embedding model.
- Store model and chunking metadata.
- Apply appropriate authorization filters.
- Route only read-only retrieval workloads to True Cache.
- Measure warm-cache and cold-cache behavior separately.
Takeaway
Vectors help applications retrieve information by meaning rather than exact keywords. Oracle True Cache can help read-heavy applications serve repeated database reads from memory while the primary database remains the system of record.
Together, Oracle AI Vector Search and Oracle True Cache can be an effective architecture pattern for chatbots and RAG applications that repeatedly search a relatively stable knowledge corpus.
Learn more
True Cache documentation: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/overview-oracle-true-cache.html
Connecting to True Cache: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/methods-connecting-true-cache.html
JDBC Thin sample for True Cache: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/sample-java-code-using-jdbc-thin-driver.html
DML redirection: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/enabling-dml-redirection.html
True Cache load balancing: https://docs.oracle.com/en/database/oracle/oracle-database/26/odbtc/best-practices-load-balancing-uniform-configuration.html
Oracle IVF vector index blog: https://blogs.oracle.com/database/using-ivf-vector-indexes

