Oracle Machine Learning for Python (OML4Py) version 2.1.1 and OML4SQL introduce support for large ONNX embedding models through external data and in-memory sharing capabilities in Oracle AI Database 23.26.1. These capabilities enhance scalability by increasing embedding model size limits and optimizing memory consumption in concurrent user environments.
Previously, each database session loaded its own copy of the embedding model weights and parameters, or initializers, into memory, and ONNX serialization limited models to 2 GB unless the model initializers were split into external files. In environments with concurrent users, this could lead to higher memory consumption and limited scalability. OML4Py 2.1.1 removes the 2 GB ONNX model size limit by storing model initializers in separate external files. OML4SQL in Oracle AI Database 23.26.1 imports these initializers into the database and shares them across sessions, reducing memory consumption for concurrent workloads.
About ONNX and Embedding Models
ONNX (Open Neural Network Exchange) is an open standard format for representing machine learning models. Embedding models convert text, images, or other data into numerical vector representations for semantic similarity search.
Model initializers are the weights and parameters that the model learned during training. These numerical values enable the model to generate embeddings. For embedding models, initializers typically comprise 95% or more of the model’s total size. Larger embedding models (typically larger than 1 GB) generally provide more accurate semantic understanding and better search results, though they require more memory and computational resources. Learn more about ONNX and embedding models in the Oracle AI Vector Search User’s Guide.
Scalability Improvements
These capabilities improve scalability in two areas:
- Model Size: External data storage removes the previous 2 GB ONNX serialization limit, enabling larger embedding models to be imported.
- Memory Efficiency: In-memory sharing reduces memory consumption for concurrent workloads by loading model initializers once rather than per session.
Requirements
These capabilities require both OML4Py 2.1.1 and Oracle AI Database 23.26.1.
External Model Data and Shared Memory in OML4Py 2.1.1
OML4Py 2.1.1 and OML4SQL work together to enable these capabilities. External data refers to storing model initializers in separate files rather than in the main ONNX model file. Models larger than 1 GB automatically use external data, which enables in-memory sharing across sessions and removes the previous 2 GB import limitation.
The following table summarizes external data usage and in-memory sharing by model size:
| Model Size | External Data | Enabling In-Memory Sharing |
| --- | --- | --- |
| <1 GB | Optional | Set use_external_data = True, then enable with INMEMORY_ONNX_MODEL |
| >1 GB | Automatic | Enable with INMEMORY_ONNX_MODEL |
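The table's rule can be sketched as a small helper. This is a hypothetical illustration of the decision logic only; the function name and signature are not part of the OML4Py API.

```python
def uses_external_data(model_size_gb: float, use_external_data: bool = False) -> bool:
    """Mirror the table above: models larger than 1 GB always use
    external data; smaller models only when explicitly requested."""
    return model_size_gb > 1 or use_external_data

# bge-large-en-v1.5 (~1.3 GB): external data is automatic
print(uses_external_data(1.3))        # True
# A model under 1 GB: opt in via use_external_data
print(uses_external_data(0.4))        # False
print(uses_external_data(0.4, True))  # True
```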
Oracle Machine Learning supports importing ONNX format models to the database for inferencing as one of several bring-your-own-model options. For embedding models, the total model size largely comes from initializers. Previously, the 2 GB ONNX serialization limit prevented larger models from being imported. Additionally, each session loaded its own copy of model initializers, causing memory usage to scale linearly with the number of concurrent users. With shared memory support, OML4SQL loads model initializers once and shares them across sessions to reduce total memory usage for concurrent workloads.
For example, a 130 MB model like all-minilm-l12 would require approximately 13 GB of memory to support 100 concurrent sessions if each session loads its own copy. With in-memory sharing enabled, the same workload requires only 130 MB for the model initializers, demonstrating the benefit even for smaller models.
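The arithmetic behind that example is straightforward. This is a back-of-the-envelope sketch; the model size and session count are the illustrative figures from the paragraph above.

```python
model_mb = 130   # approximate size of a small model like all-minilm-l12
sessions = 100   # concurrent database sessions

per_session = model_mb * sessions   # every session holds its own copy
shared = model_mb                   # one shared copy serves all sessions

print(f"per-session loading: {per_session} MB (~{per_session / 1024:.0f} GB)")
print(f"in-memory sharing:   {shared} MB")
```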
OML4Py 2.1.1 enhances the ONNX pipeline to split model initializers into separate files rather than storing all data in a single ONNX file. OML4SQL introduces new capabilities in Oracle AI Database 23.26.1 that enable in-memory sharing of these models with external data, allowing all database sessions to access the same model initializers in shared memory.
Using External Model Data in OML4Py
OML4Py creates models with external data based on model size. For models larger than 1 GB, OML4Py automatically splits model initializers into external files. For example, the bge-large-en-v1.5 model (~1.3 GB) automatically uses external data.
from oml.utils import ONNXPipeline
pipeline = ONNXPipeline(model_name="BAAI/bge-large-en-v1.5")
The export2db function loads the ONNX model into a connected database and export2file exports the model to a zip file, which can then be imported into another instance using DBMS_VECTOR.LOAD_ONNX_MODEL.
# Option 1: Load to the database
import oml
oml.connect(user="username", password="password", dsn="myadb_medium")
pipeline.export2db("BGE_LARGE")
-- Verify the model was loaded
SELECT model_name, mining_function, algorithm
FROM user_mining_models
WHERE model_name = 'BGE_LARGE';
MODEL_NAME MINING_FUNCTION ALGORITHM
BGE_LARGE EMBEDDING ONNX
# Option 2: Export to zip file for importing into another database (no database connection needed)
pipeline.export2file("BGE_LARGE", output_dir="/path/to/dir")
For models smaller than 1 GB, you can manually enable external data. If you plan to use the model across many concurrent sessions, enabling external data is essential, because in-memory sharing in the database requires it to reduce memory consumption.
from oml.utils import ONNXPipelineConfig
config = ONNXPipelineConfig("intfloat/multilingual-e5-small")
config.use_external_data = True
pipeline = ONNXPipeline("intfloat/multilingual-e5-small", config)
pipeline.export2db("MULTILINGUAL_E5_SMALL")
Note: The export2db function is available for models where each individual initializer is under 1 GB. Models containing initializers larger than 1 GB must be exported using export2file and then loaded to the database using DBMS_VECTOR.LOAD_ONNX_MODEL.
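The note's rule of thumb can be expressed as a quick check. This is a hypothetical sketch; the helper name, and the assumption that individual initializer sizes are available in bytes, are mine rather than part of OML4Py.

```python
GB = 1024 ** 3  # 1 GB in bytes

def choose_export(initializer_sizes_bytes):
    """export2db is available only when every individual initializer is
    under 1 GB; otherwise the model must go through export2file and then
    DBMS_VECTOR.LOAD_ONNX_MODEL on the database side."""
    if any(size > GB for size in initializer_sizes_bytes):
        return "export2file"
    return "export2db"

print(choose_export([400_000_000, 900_000_000]))    # export2db
print(choose_export([1_500_000_000, 200_000_000]))  # export2file
```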
In-Memory Sharing of External Model Data
To use in-memory sharing, first verify that your model has external data:
SELECT EXTERNAL_DATA
FROM USER_MINING_MODELS
WHERE MODEL_NAME = 'BGE_LARGE';
EXT
---
YES
Once verified, enable in-memory sharing with the DBMS_DATA_MINING.INMEMORY_ONNX_MODEL procedure:
EXECUTE DBMS_DATA_MINING.INMEMORY_ONNX_MODEL('BGE_LARGE');
You can monitor memory usage and status through the V$IM_ONNX_MODEL view:
SELECT name, populate_status, pin_count
FROM V$IM_ONNX_MODEL
WHERE name = 'BGE_LARGE';
After a database restart, models marked INMEMORY = YES remain registered in the database, but they are not immediately loaded back into shared memory. In this state, POPULATE_STATUS = INIT indicates that the model is registered but not yet populated.
NAME POPULATE_STATUS PIN_COUNT
---------- ---------------- ----------
BGE_LARGE INIT 0
The populate_status column indicates whether the model is currently loaded into shared memory. The pin_count column shows how many database processes are currently using the shared model; it increases when sessions create ONNX runtime sessions for inference and tracks active model usage. This deferred loading behavior allows the database to start quickly without loading all in-memory models at startup.
The first scoring operation automatically loads the model into shared memory. Once loaded, multiple sessions can perform inference without duplicating model weights.
SELECT id, text,
VECTOR_EMBEDDING(BGE_LARGE USING text AS data) as embedding
FROM documents
WHERE category = 'product_reviews';
EMBEDDING
--------------------------------------------------------------------------------
[6.4764224E-002,-2.557832E-002,-3.38763669E-002,-8.18748921E-002,7.19875321E-002
,-4.26965654E-002,4.41656597E-002,4.36238833E-002,8.98259785E-003,-1.13873659E-0
02,5.90157695E-002,6.37062266E-003,1.75855495E-002, ...
After scoring, the model is in shared memory and actively in use, confirming that scoring is the trigger that transitions an INMEMORY-enabled ONNX model from registered to populated:
SELECT name, populate_status, pin_count
FROM V$IM_ONNX_MODEL
WHERE name = 'BGE_LARGE';
NAME POPULATE_STATUS PIN_COUNT
---------- ---------------- ----------
BGE_LARGE ENABLED 1
At this point:
- POPULATE_STATUS reflects that the model has transitioned from INIT to ENABLED, indicating that the model is now loaded into the database’s shared memory.
- PIN_COUNT has increased to 1, reflecting the number of active sessions using the model during inference.
This approach enables efficient model deployment by sharing initializers across sessions while removing the previous 2 GB size limitation.
Getting Started
To get started with large ONNX model support and in-memory sharing, download OML4Py 2.1.1 from OML4Py Downloads and follow the code examples above.
Start using these optimizations for your embedding models today.
Resources
OML4Py Downloads
Oracle AI Vector Search User’s Guide
Oracle Machine Learning for Python User’s Guide
