Announcing the general availability of OCI Anomaly Detection features

February 22, 2023 | 5 minute read
Aparna Chaturvedi
Product Manager

At Oracle Cloud Infrastructure (OCI), we strive to continuously improve our Anomaly Detection service. OCI Anomaly Detection enables you to monitor and detect anomalies in your time-series data. Today, we’re excited to announce the general availability of the following new capabilities for the Anomaly Detection service:

  • Univariate anomaly detection

  • Multivariate anomaly detection improvements

  • Asynchronous detection

Univariate anomaly detection

Univariate anomaly detection (UAD) refers to the problem of identifying anomalies in a single time series. A single time series contains timestamped values for one signal, such as a metric or measure.

A graph showing the server health over time with spikes and anomalies caught by UAD.

With this release, we now fully support detecting three types of anomalies in univariate signals: point, collective, and contextual anomalies.

Point anomaly detection

Point anomaly detection finds single data points that are unusual compared to the rest of the data in the dataset.

A graphic depicting a line graph for network service usage training data and a line graph for network service usage: test data and service flagged anomalies.
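
As a rough illustration of the idea, and not the service's own algorithm, the following sketch learns a median and median absolute deviation (MAD) from training data and flags test points with a large robust z-score. The function names, the sample values, and the 3.5 threshold are assumptions made for this example.

```python
import statistics


def fit_robust_baseline(train):
    """Learn the median and median absolute deviation (MAD) from training data."""
    median = statistics.median(train)
    mad = statistics.median(abs(x - median) for x in train)
    return median, mad


def point_anomalies(test, median, mad, threshold=3.5):
    """Return indices of test points whose robust z-score exceeds the threshold."""
    # 1.4826 scales the MAD so it is comparable to a standard deviation
    # when the data is roughly normally distributed.
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("training data has no spread; cannot score points")
    return [i for i, x in enumerate(test) if abs(x - median) / scale > threshold]


# Hypothetical network usage values: training data is stable around 50.
train = [50, 52, 49, 51, 50, 53, 48, 50, 51, 49]
test = [50, 51, 95, 49, 52]

median, mad = fit_robust_baseline(train)
flagged = point_anomalies(test, median, mad)
print(flagged)  # indices of the test points treated as point anomalies
```

Only the spike at index 2 stands out from the learned baseline, so it is the only point flagged.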

Collective anomaly detection

Collective anomaly detection finds groups of related data points that are anomalous as a set when compared with the whole dataset, even when each point looks normal on its own.

A graphic showing two line graphs comparing the train original and test original datasets and their range anomalies.
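
One simple way to picture a collective anomaly is to compare sliding-window averages in the test data against the window averages seen in training. The sketch below does that. It is an illustrative stand-in, not the service's method, and the window size, threshold, and data are assumed values.

```python
import statistics


def window_means(series, window):
    """Mean of every sliding window of the given length."""
    return [
        statistics.fmean(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


def collective_anomalies(train, test, window, threshold=3.0):
    """Return (start, end) index ranges in test whose window mean is unusual."""
    train_means = window_means(train, window)
    center = statistics.fmean(train_means)
    spread = statistics.pstdev(train_means)
    if spread == 0:
        raise ValueError("training window means have no spread")

    flagged_starts = [
        i for i, m in enumerate(window_means(test, window))
        if abs(m - center) > threshold * spread
    ]

    # Merge overlapping or adjacent flagged windows into index ranges.
    ranges = []
    for start in flagged_starts:
        end = start + window - 1
        if ranges and start <= ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


train = [10, 12, 8, 10, 11, 9, 10, 12, 8, 10, 9, 11]
# Every test value stays inside the training range of 8 to 12,
# but the sustained run of 12s is unusual as a group.
test = [10, 9, 11, 10, 12, 12, 12, 12, 12, 10, 9, 11]

ranges = collective_anomalies(train, test, window=4)
print(ranges)
```

No single test value exceeds the training range, so a point-wise check would pass this data. The flagged range comes only from looking at the values together.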

Contextual anomaly detection

Contextual anomaly detection finds data points that are considered abnormal when viewed against contextual attributes associated with the data points. Viewing data in the context of time, or in the context of time-related concepts such as seasons, weekdays, and weekends, can reveal anomalous behavior directly correlated with such context.

A graphic depicting two line graphs of the train original and test original, showing the contextual anomalies.
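
To make the role of context concrete, the sketch below keeps a separate baseline for weekdays and weekends and scores each test value against the baseline for its own context. The weekday/weekend split, the helper names, and the data are assumptions chosen for illustration. The service's actual approach is not described here.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timedelta


def context_of(ts):
    """Map a timestamp to a coarse context label."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


def fit_context_baselines(train):
    """Learn a (mean, standard deviation) pair for each context."""
    groups = defaultdict(list)
    for ts, value in train:
        groups[context_of(ts)].append(value)
    return {
        ctx: (statistics.fmean(vals), statistics.pstdev(vals))
        for ctx, vals in groups.items()
    }


def contextual_anomalies(test, baselines, threshold=3.0):
    """Return timestamps whose value is unusual for their own context."""
    flagged = []
    for ts, value in test:
        mean, std = baselines[context_of(ts)]
        # Contexts with zero spread are skipped in this simple sketch.
        if std > 0 and abs(value - mean) > threshold * std:
            flagged.append(ts)
    return flagged


# Two weeks of hypothetical daily traffic: about 100 on weekdays, about 20 on weekends.
start = datetime(2023, 1, 2)  # a Monday
train = []
for day in range(14):
    ts = start + timedelta(days=day)
    if context_of(ts) == "weekend":
        value = 20 + (1 if day % 2 else -1)
    else:
        value = 100 + (2 if day % 2 else -2)
    train.append((ts, value))

test = [
    (datetime(2023, 1, 16), 101),  # Monday, typical weekday value
    (datetime(2023, 1, 21), 100),  # Saturday, weekday-like value
    (datetime(2023, 1, 22), 20),   # Sunday, typical weekend value
]

flagged = contextual_anomalies(test, fit_context_baselines(train))
print(flagged)
```

A value of 100 is ordinary across the dataset as a whole. It is flagged only because it occurs on a Saturday, where the learned baseline is around 20.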

Other improvements include the following examples:

  • Performance improvements for both training and evaluation

    • Now using NumExpr, a fast numerical expression evaluator for NumPy, instead of plain NumPy for algebraic and transcendental function evaluations

    • Intel Math Kernel Library (MKL) support to accelerate function evaluation

     

  • Faster scikit-learn (sklearn) pairwise distance metric calculation for improved detection: Implemented step-wise matrix multiplication to replace the loops used in the sklearn package

  • Efficient memory handling for large batch sizes (up to 10K): Implemented batch-based column processing to calculate the pairwise distance, which avoids the memory issues caused by storing a large matrix

  • Preprocessing improvements

    • Interquartile range (IQR)-based outlier detection and removal

    • Trend and seasonality decomposition: Seasonal trend decomposition using Loess (STL) or linear detrending

  • Kernel improvements for one-class support vector machines (OCSVM) using automatic hyperparameter tuning: Dynamic window size selection using periodicity detection, an autocorrelation function, and a heuristic-based frequency detector

  • Postprocessing improvements during detection: We prune excess anomalies by suppressing those that appear consecutively in groups larger than the window size, which avoids flagging more than a window's worth of consecutive data points.

  • User-specified tuning: A new sensitivity parameter in detection lets you adjust the number of anomalies flagged by selecting the appropriate threshold, without having to retrain.
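
The dynamic window size selection mentioned in the list above can be pictured with a small autocorrelation example. The sketch below estimates a dominant period from the autocorrelation function (ACF) and uses it as the window size, falling back to a default when no clear period is found. Treating the detected period as the window size, along with the function names, the 0.3 strength cutoff, and the default of 10, are all assumptions for illustration and do not describe the service's internal heuristics.

```python
import math
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    mean = statistics.fmean(series)
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    num = sum(centered[i] * centered[i + lag] for i in range(len(series) - lag))
    return num / denom


def estimate_period(series, max_lag, min_strength=0.3):
    """Return the lag of the strongest local ACF peak, or None if there is none."""
    # One extra lag is computed so the last candidate can be compared with its neighbor.
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 2)]
    peaks = [
        lag for lag in range(1, max_lag + 1)
        if acf[lag] > acf[lag - 1]
        and acf[lag] > acf[lag + 1]
        and acf[lag] >= min_strength
    ]
    if not peaks:
        return None
    return max(peaks, key=lambda lag: acf[lag])


def choose_window_size(series, max_lag, default=10):
    """Use the detected period as the window size, else fall back to a default."""
    period = estimate_period(series, max_lag)
    return period if period is not None else default


# Hypothetical signal: ten full cycles of a wave that repeats every 12 samples.
series = [math.sin(2 * math.pi * i / 12) for i in range(120)]
window = choose_window_size(series, max_lag=40)
print(window)
```

For a signal with no repeating pattern, such as a steady ramp, the ACF has no local peak and the function returns the default instead.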

Multivariate anomaly detection with MSET2

This release also introduces the MSET2 multivariate anomaly detection kernel, which supports large batch size calls with asynchronous detection and greatly improves detection accuracy, specifically for prognostics use cases. This capability helps surveillance-based anomaly detection use cases detect anomalies in the context of a historical state, with the following details:

  • Call the service with the explicit option for multivariate MSET through asynchronous detection.

  • The service computes states based on cumulative sums, using an appropriately large batch size internally.

  • This capability offers improved performance by retaining the historical context, resulting in a lower missed alarm rate compared with synchronous detection.

A graphic depicting an example use for MSET2.

Asynchronous detection

Customers can now use the Asynchronous Detection API on large to very large datasets (from 100 million to billions of data points). This API has the following capabilities:

  • Extends the existing Anomaly Detection service capabilities

    • Supports large datasets (from 30K data points to more than 100 million data points)

    • Supports training data with 1,000 or more signals (available on request)

    • Provides high model accuracy by enabling model training with better model characteristics, such as window size and memory vectors

  • Accepts input either inline or as a list of objects in OCI Object Storage. The different input modes give you flexibility before onboarding to the service.

  • Frees you from having to develop custom apps to perform anomaly detection on large datasets

  • Can be extended to provide more capabilities such as automated preprocessing, retraining, and ground truth integration for continuous model improvements.

Other asynchronous detection improvements include the following:

  • Encryption of intermediate data

  • Load balancing: IP Virtual Server (IPVS) for network routing and load balancing within the Kubernetes (K8s) cluster

  • Parallel request handling from database queries: Optimistic locking for database queries to handle parallel requests

  • Horizontal autoscaling for pods and autoscaling for clusters
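
For readers unfamiliar with optimistic locking, the general pattern looks like the sketch below: each row carries a version number, and an update succeeds only if the version is still the one the caller read. This example uses an in-memory SQLite database, and the `jobs` table, its columns, and the worker names are hypothetical. It shows the generic technique, not the service's actual schema or database.

```python
import sqlite3


def read_version(conn, job_id):
    """Read the current version of a job row."""
    row = conn.execute(
        "SELECT version FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return row[0]


def try_claim(conn, job_id, worker, expected_version):
    """Claim the job only if nobody changed it since expected_version was read."""
    cursor = conn.execute(
        "UPDATE jobs SET status = 'RUNNING', owner = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (worker, job_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when the version check failed, meaning another writer won.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, owner TEXT, version INTEGER)"
)
conn.execute("INSERT INTO jobs VALUES (1, 'PENDING', NULL, 0)")
conn.commit()

# Two workers read the same row before either one writes.
seen_by_a = read_version(conn, 1)
seen_by_b = read_version(conn, 1)

a_won = try_claim(conn, 1, "worker-a", seen_by_a)
b_won = try_claim(conn, 1, "worker-b", seen_by_b)  # stale version, update matches no row

owner = conn.execute("SELECT owner FROM jobs WHERE id = 1").fetchone()[0]
print(a_won, b_won, owner)
```

No lock is held between the read and the write, so parallel requests do not block each other. The losing writer simply sees that its update matched no rows and can retry or skip the job.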

Conclusion

For more information on Oracle Cloud Infrastructure Anomaly Detection service, see the following resources:
