Artificial intelligence and machine learning are pivotal technologies today. With Oracle Database 23ai, we’re introducing multiple capabilities in Oracle Machine Learning. I’ll briefly summarize these features and point you to resources to help you get started.
Oracle Database 23ai focuses on three key areas: AI for Data, Dev for Data, and Mission Critical for Data. Aligned with these areas, Oracle Machine Learning continues to enable data science teams to explore and prepare data, build and evaluate models, and use SQL, Python, and R APIs to develop and augment applications easily with AI/ML functionality.
Importing text transformer models for AI Vector Search
A transformer, or embedding model, is a deep learning model that represents text, images, and other data as vectors in a high-dimensional space. This representation, called an embedding, captures the semantic relationships between inputs – a key aspect for semantic similarity search in vector databases.
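To make the idea of semantic similarity concrete, here is a minimal Python sketch that ranks texts by the cosine similarity of their embeddings. The three-dimensional vectors and the example phrases are made up for illustration; a real transformer produces vectors with hundreds of dimensions, and this is not how the database computes distances internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-written for this sketch only.
embeddings = {
    "invoice overdue": [0.9, 0.1, 0.0],
    "payment is late": [0.8, 0.2, 0.1],
    "holiday photos": [0.0, 0.2, 0.9],
}

def most_similar(query_vector, corpus):
    """Return the corpus keys ordered from most to least similar."""
    return sorted(
        corpus,
        key=lambda text: cosine_similarity(query_vector, corpus[text]),
        reverse=True,
    )

ranked = most_similar(embeddings["invoice overdue"], embeddings)
print(ranked)
```

The two billing-related phrases land next to each other in the ranking even though they share no words, which is the behavior a similarity search over embeddings relies on.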
One prominent Oracle Database 23ai feature is AI Vector Search. It is a powerful technology that enables you to perform semantic similarity search on your content by generating vectors using transformer models as well as storing and managing those vectors at scale in Oracle Database. With 23ai, Oracle Machine Learning enables loading transformer models in Oracle Database as first-class database objects for use with AI Vector Search.
Using the PL/SQL packages DBMS_DATA_MINING (from OML4SQL) or DBMS_VECTOR (from Oracle AI Vector Search), you can load text embedding models in ONNX format into the database. Open Neural Network Exchange (ONNX) is an open-source standard that defines a common file format and a set of operators for sharing and running a wide range of ML models. Once loaded, these ONNX-format models are deployed in the database.
A new related feature in Oracle Machine Learning for Python (OML4Py) makes it easy to access text transformers from Hugging Face, convert these to ONNX format with appropriate pre- and post-processing, and import the ONNX representation in the database for use by AI Vector Search.
Importing ONNX-format ML models in Oracle Database
The ONNX format also supports traditional machine learning techniques like classification, regression, and clustering. Oracle Machine Learning runs such models with the ONNX Runtime integrated with Oracle Database. Imported ONNX-format models behave similarly to native in-database models and can be used with the same prediction operators.
OML in-database algorithm enhancements
On the algorithm front, we’ve enhanced several in-database algorithms:
Extreme Gradient Boosting (XGBoost) now also supports survival analysis, which is particularly useful for use cases like predicting equipment failures and healthcare outcomes. XGBoost also supports feature interaction constraints and monotonic constraints so that data scientists can further limit variable interactions. Such enhancements help to increase model accuracy.
Expectation Maximization (EM) now supports anomaly detection in addition to clustering, complementing the One-class Support Vector Machine anomaly detection algorithm. This is valuable for several use cases, including fraud detection, predictive maintenance, and patient health conditions. EM uses an anomaly probability to classify whether an entity is normal or anomalous. EM estimates the probability density of a given data record, which is mapped to an anomaly probability.
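As a rough illustration of density-based anomaly detection, the Python sketch below scores a record by how its mixture density compares with the densities of the training records. The mixture parameters are assumed to have been fitted already (by EM or otherwise), and the rank-based mapping from density to anomaly probability is our own stand-in; the source does not describe the mapping the in-database algorithm uses.

```python
import math

def mixture_density(x, components):
    """Density of a 1-D Gaussian mixture; components are (weight, mean, std)."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in components
    )

def anomaly_probability(x, components, training):
    """Share of training records with higher density than x.

    This mapping is a hypothetical choice made for this sketch.
    """
    density = mixture_density(x, components)
    denser = sum(1 for t in training if mixture_density(t, components) > density)
    return denser / len(training)

# Assumed, pre-fitted mixture with two clusters centered at 0 and 10.
components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
training = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]

for record in (0.0, 5.0):
    p = anomaly_probability(record, components, training)
    label = "anomalous" if p > 0.5 else "normal"
    print(record, p, label)
```

A record at a cluster center sits in a dense region and scores low, while a record halfway between the clusters sits in a sparse region and scores high.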
Exponential Smoothing (ESM), which supports time series forecasting, now supports automated selection of model type, enabling you to produce better models without manual or exhaustive search.
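To show what automated model type selection saves you from doing by hand, here is a small Python sketch that compares two exponential smoothing variants and keeps the one with the lower one-step-ahead squared error. The two candidates, the fixed smoothing parameters, and the in-sample error criterion are assumptions made for this illustration; the source does not state which model types or selection criterion the in-database algorithm uses.

```python
def ses_forecasts(series, alpha=0.5):
    """One-step-ahead forecasts from simple exponential smoothing."""
    level = series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)  # forecast for this step is the previous level
        level = alpha * y + (1 - alpha) * level
    return forecasts

def holt_forecasts(series, alpha=0.5, beta=0.5):
    """One-step-ahead forecasts from Holt's additive-trend smoothing."""
    level, trend = series[0], series[1] - series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level + trend)
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return forecasts

CANDIDATES = {"simple": ses_forecasts, "holt": holt_forecasts}

def select_model(series):
    """Return the name of the candidate with the lowest squared forecast error."""
    def sse(name):
        forecasts = CANDIDATES[name](series)
        return sum((y - f) ** 2 for y, f in zip(series[1:], forecasts))
    return min(CANDIDATES, key=sse)

print(select_model([1, 2, 3, 4, 5, 6]))
print(select_model([10, 12, 9, 11, 10, 12, 9, 11]))
```

The steadily rising series is matched by the trend model, while the series that fluctuates around a constant level is better served by simple smoothing. A production implementation would typically also tune the smoothing parameters and penalize model complexity.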
ESM also supports Multiple Time Series forecasting in support of time series regression – a powerful technique that supports one or more time series as predictors, along with flag variables, like holidays or events. Use cases include energy demand, economic forecasting, sales and inventory management, public transportation usage, and more. Multiple time series forecasting conveniently generates backcasts and forecasts on one or more time series in the same table or view. The output of multiple time series serves as an input to other machine learning techniques, like regression.
For example, you could predict a meter reading where temperature is a predictor time series, and combine this with other structured data, such as flags for holidays or events like promotions.
Explicit Semantic Analysis (ESA), which supports text analytics feature extraction, now provides integrated text mining for in-database algorithms, meaning that the more powerful ESA algorithm performs the feature extraction to complement other structured data in the table before models are built. ESA also supports doc2vec dense projections with embeddings. This is geared toward enhancing the use of structured data with unstructured data, such as call center rep notes on customers or physician notes on patients.
Generalized Linear Model (GLM) for classification now supports additional link functions for enhanced support of binary targets, among other uses. These include probit, cloglog, and cauchit.
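For readers less familiar with link functions, the Python sketch below shows the standard textbook inverse of each link, which turns a linear predictor into a probability for a binary target. The function and dictionary names are ours, chosen for illustration, and logit is included only as the familiar baseline for comparison.

```python
import math

def inv_logit(eta):
    """Logistic inverse link."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Complementary log-log inverse link (asymmetric around zero)."""
    return 1.0 - math.exp(-math.exp(eta))

def inv_cauchit(eta):
    """Standard Cauchy CDF (heavier tails than logit or probit)."""
    return 0.5 + math.atan(eta) / math.pi

INVERSE_LINKS = {
    "logit": inv_logit,
    "probit": inv_probit,
    "cloglog": inv_cloglog,
    "cauchit": inv_cauchit,
}

def predicted_probability(link, eta):
    """Map a linear predictor eta to a probability under the given link."""
    return INVERSE_LINKS[link](eta)

for name in INVERSE_LINKS:
    print(name, [round(predicted_probability(name, eta), 3) for eta in (-2.0, 0.0, 2.0)])
```

Logit, probit, and cauchit all give 0.5 at a linear predictor of zero but differ in how quickly they approach 0 and 1, while cloglog is asymmetric, which can suit targets where one outcome is rare.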
Additional OML enhancements
Oracle Database 23ai introduces a Boolean data type, which is also supported by OML4SQL. You may have a Boolean target (e.g., whether the customer churned) or Boolean predictors (e.g., if the customer purchased a particular product).
The database table column limit has been expanded to 4K columns. This allows you to include even more features for in-database machine learning and analytics. Prior to 23ai, OML supported data with more than 1000 attributes using NESTED columns, but this 23ai feature enables support for much wider data sets using a standard table and view representation.
High-cardinality categorical features can pose performance issues. In 23ai, OML efficiently addresses large datasets with millions of categorical values by recoding categorical values to include only those with sufficient support. You can adjust the threshold or disable this feature, depending on solution requirements.
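The idea of keeping only categorical values with sufficient support can be sketched in a few lines of Python. The function name, the threshold parameter, and the choice to collapse rare values into a single placeholder label are assumptions for illustration; the source does not specify how the database represents the values it drops.

```python
from collections import Counter

def recode_rare_categories(values, min_support, other_label="OTHER"):
    """Keep values seen at least min_support times; replace the rest."""
    counts = Counter(values)
    return [v if counts[v] >= min_support else other_label for v in values]

# Hypothetical column of product codes.
product_codes = ["A", "A", "A", "B", "B", "C"]

print(recode_rare_categories(product_codes, min_support=2))
print(recode_rare_categories(product_codes, min_support=1))
```

Raising the threshold shrinks the number of distinct values the algorithm has to handle, and a threshold of 1 leaves the data unchanged, which mirrors the option to adjust or disable the behavior.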
OML in-database models now maintain data lineage. This includes the data query used when a model is built. Data lineage enhances traceability and transparency, supports compliance and auditing requirements, and aids model maintenance.
The in-database partitioned model feature automates the building of multiple sub-models based on partitions of the data. These partitions are determined by the column or columns specified. In 23ai, OML improves performance both when building models with a high number of partitions and when dropping individual model partitions.
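Conceptually, a partitioned model behaves like a collection of sub-models keyed by the partition column. The Python sketch below uses the per-partition mean of the target as a deliberately trivial stand-in for a real algorithm; the function names and the sample rows are hypothetical and do not reflect the in-database implementation.

```python
from collections import defaultdict

def build_partitioned_model(rows, partition_key, target):
    """Build one sub-model per distinct partition value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row[target])
    # Each "sub-model" here is just the mean target value of its partition.
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

def predict(model, row, partition_key):
    """Route the row to the sub-model for its partition."""
    return model[row[partition_key]]

def drop_partition(model, key):
    """Remove a single sub-model without rebuilding the others."""
    model.pop(key, None)

rows = [
    {"region": "EMEA", "sales": 10.0},
    {"region": "EMEA", "sales": 20.0},
    {"region": "APAC", "sales": 7.0},
]

model = build_partitioned_model(rows, partition_key="region", target="sales")
print(predict(model, {"region": "EMEA"}, "region"))

drop_partition(model, "APAC")
print(sorted(model))
```

Scoring picks the sub-model from the row's partition value, and dropping one partition leaves the remaining sub-models untouched.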
Try OML on Oracle Database 23ai today
You can try Oracle Database 23ai with the latest Oracle Machine Learning features now.
Learn more about Oracle Machine Learning with these resources: