Are you looking to run semantic search on data stored in Apache Iceberg tables but don’t want to duplicate or reformat that data? We have you covered!
With Oracle AI Database 26ai and AI Vector Search, you can run similarity search over vector embeddings stored externally—without first loading the embeddings or their source text chunks into the database. This reduces data duplication, shortens time-to-value, and keeps data closer to its existing governance controls.
Since Oracle Database 23.7, Oracle has supported creating external tables over files that include dense or sparse vector embeddings. This lets you query vectors where they reside using familiar interfaces such as SQL. You can point those external tables at common file formats such as CSV, Parquet, Avro, ORC, and more (see Vectors in External Tables for the full list).
Fast forward to Oracle AI Database 26ai (23.26.1), and external tables with vector embedding support now extend to include Apache Iceberg tables. This means you can generate vector embeddings with your preferred embedding model, store them in Iceberg tables in cloud object storage (for example, OCI Object Storage or AWS S3), and run semantic search from Oracle AI Database—without an extra “load everything first” step.
It’s simple. First, we create an external table on an Iceberg table as shown below
define oci_path= 'https://objectstorage.<region>.oraclecloud.com/n'
define oci_tenant= '<tenant-name>'
define oci_bucket= '/b/<bucket-name>/o'
define iceberg_table= '&oci_bucket/iceberg/<db-name>/<iceberg-table_name>'
define iceberg_path= '&oci_path/&oci_tenant/&iceberg_table'
CREATE TABLE IF NOT EXISTS ext_vect_table
(
id VARCHAR(10) PRIMARY KEY RELY DISABLE,
information VARCHAR2(100),
embedding VECTOR(384, float32)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_BIGDATA
DEFAULT DIRECTORY DATA_PUMP_DIR
ACCESS PARAMETERS
(
com.oracle.bigdata.credential.name=OCI_CRED
com.oracle.bigdata.fileformat=parquet
com.oracle.bigdata.access_protocol=iceberg
)
LOCATION ('iceberg:&iceberg_path/metadata/v1.metadata.json')
)
REJECT LIMIT UNLIMITED;
Figure 1: Creating an External Table on an Iceberg Table.
Once the external table is created, you can perform vector similarity search queries on data in Iceberg tables and embed the query-text on the fly (leveraging the ONNX runtime in the database), so you can search across Oracle and Iceberg data in real-time. Because the embedding model can run locally, this approach can also fit air-gapped deployments where calling a remote model endpoint isn’t desired or perhaps not even possible.
SELECT id,
information,
vector_distance(embedding,
vector_embedding(my_model using :INPUT_TERM as data) DIST
FROM ext_vect_table
ORDER BY DIST
FETCH FIRST 5 ROWS ONLY;
Figure 2: Query the External Table to display the 5 most similar entries to :INPUT_TERM.
And best of all, for large Apache Iceberg datasets, Oracle vector indexes can be created (as shown below) to speed up approximate nearest neighbor (ANN) search over Iceberg-resident vectors. You don’t need to copy Iceberg data into an Oracle AI Database to use them; the vector indexes are stored standalone in the database. These indexes can scale with Iceberg datasets and support capabilities such as transactional updates and consistent queries. If object storage access becomes intermittent, the index structures may still be available, helping your application degrade more gracefully while connectivity recovers.
CREATE VECTOR INDEX ext_vect_idx_ivf ON ext_vec_table (embedding)
ORGANIZATION NEIGHBOR PARTITIONS;
Figure 3: Create a Neighbor Partition (IVF_Flat) Vector Index on EXT_VECT_TABLE.
Together, with these capabilities you can:
- Query vectors in Iceberg where the data lives using Oracle’s SQL and AI Vector Search features.
- Improve performance at scale with Oracle vector indexes for faster ANN search—without copying Iceberg data into Oracle.
If you’re a DBA, this reduces new pipelines to manage. If you’re an executive sponsor, it accelerates delivery of semantic search and RAG-style experiences while keeping data in established platforms and storage.
How to Get Started
Learn how you can use vectors to perform similarity search with Oracle AI Database with this Oracle LiveLab: Getting Started with Oracle AI Vector Search