Oracle AI & Data Science Blog
Learn AI, ML, and data science best practices

SAX and Matrix Profile Techniques for Root Cause Analysis

Supreet Oberoi
Vice President, IoT and Big Data Applications

For a long time now, Oracle applications have been used to plan and execute manufacturing, supply chain, logistics, and product lifecycle management at Fortune companies. As many of these industries go through the natural cycle of rebirth (perhaps catalyzed by transformational technologies such as IoT, big data, and visualization), executives are asking us how they can make their businesses more responsive to their customers, their markets, and their own operations.

Root Cause Analysis Use Case 

Let’s take a real-world example. This use case comes from one of our first Oracle IoT customers. The company is a leading manufacturer of vehicle sensors. Most of these are low-cost devices, so producing these products at scale—without having a large percentage of part rejects and machine downtime—is a key part of the manufacturer staying profitable. 

As background, a typical part goes through multiple production steps before it is finished. At each step, the machine operating on the part collects data about the product and the state of the machine at that moment. Products that fail the quality test are discarded from the line. However, some defects are discovered only after the product has been sold to the end customer. When such incidents are reported, a process engineer is tasked with answering the following questions:

  • Which production batch created this part, and which other parts came from that batch?

  • What caused the defect?

  • Could the same cause affect the manufacture of other products?

  • How do I predict and prevent such defects? 

To proceed, the process engineer has to collect all the operational data and develop a hypothesis about what may have gone wrong. Once the hypothesis is developed, the process engineer must simulate the check on historical data to ensure that it does not produce false positives. Finally, the process engineer has to deploy the check in operations so that failures are identified in real time.


Figure 1: Root Cause Analysis is expensive in all ways

This is an incredibly time-consuming and expensive process. Breaking down the operational data silos, aggregating and correlating the data, and using domain knowledge to form hunches is no easy matter. The analytical tools available for such roles are also severely limiting: they can handle neither the scale nor the complexity of the data. Determining whether two patterns are similar is computationally very expensive, which makes it hard to do in real time. Traditional techniques for approximating the data, such as dimensionality reduction, often lead to false positives. No bueno.

Detecting such trends visually through an oscilloscope-like interface is not useful either. Many events are either too small to be visible to the human eye or unfold over such a long span of time that they are, again, undetectable. What is required is a real-time method to track patterns in the shapes of time series data over long time ranges.

Introduction To SAX

Time series data processed in real time is high-dimensional, noisy, repetitive, and high-throughput. Before any meaningful analytics can be done on it, we need to simplify the data set. The purpose of simplification is twofold: first, to reduce the volume so that we can mine the data efficiently; second, to simplify the data structure so that we can apply existing clustering and classification techniques to it.

Symbolic Aggregate approXimation (SAX) is a well-known and tested technique invented by Dr. Eamonn Keogh and his research team at the University of California, Riverside. SAX reduces a time series of arbitrary length n (the data set can be as big as you want) to a string of arbitrary length w, typically much smaller than n (so the data can be reduced to a fixed, predictable size), without losing the accuracy required for comparison searches.

SAX achieves dimensionality reduction using a well-known technique called Piecewise Aggregate Approximation (PAA).


Figure 2: A time series chart with 128 readings

To understand how it works, consider the chart in Figure 2. To reduce the number of points in the chart without losing its distinguishing features, we do the following:

Normalize the Data

We normalize every time series data set to have a mean of 0 and a standard deviation of 1 (z-normalization). This lets us compare series that have similar shapes even when their offsets and amplitudes differ greatly.
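To make the normalization step concrete, here is a minimal Python sketch using only the standard library. The function name `z_normalize`, the use of the population standard deviation, and the `eps` guard for flat series are our own illustrative choices, not details of the Oracle implementation.

```python
import math
import statistics


def z_normalize(series, eps=1e-8):
    """Rescale a series to mean 0 and (population) standard deviation 1.

    eps is an assumed guard: a (nearly) flat series is only mean-centered,
    so we never divide by a near-zero standard deviation.
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std < eps:
        return [x - mean for x in series]
    return [(x - mean) / std for x in series]


# Same shape, very different offset and amplitude.
a = z_normalize([10, 12, 14, 12, 10])
b = z_normalize([100, 120, 140, 120, 100])
print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)))  # True
```

Both series map to the same normalized values, which is what makes the later shape comparisons meaningful.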

Discretize the Y-Axis

In other words, we take the readings on the y-axis and assign each one to one of a fixed number of buckets. The number of buckets is determined by the number of letters we plan to use to label the y-axis, called the alphabet size. The alphabet size is usually between 6 and 8. To pick the bucket boundaries on the y-axis, we use a breakpoint table. This is because normalized measurement values tend to follow a Gaussian distribution (i.e., most measurements are concentrated in the middle), so equal-width buckets would be unevenly populated. The breakpoint table gives boundaries that divide the Gaussian curve into regions of equal probability, so that y-values are spread uniformly across the buckets.


Figure 3: Breakpoint table for Gaussian distribution and number of frames
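The breakpoint table does not have to be maintained by hand: for an alphabet of size a, the breakpoints are the quantiles of the standard normal distribution at 1/a, 2/a, ..., (a-1)/a, so every bucket is equally likely. A minimal sketch using Python's `statistics.NormalDist`; the helper name `gaussian_breakpoints` is hypothetical.

```python
from statistics import NormalDist


def gaussian_breakpoints(alphabet_size):
    """Return the alphabet_size - 1 cut points that split N(0, 1) into equiprobable regions."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # The i-th breakpoint is the i/a quantile of the standard normal distribution.
    return [std_normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


print([round(b, 2) for b in gaussian_breakpoints(8)])
# [-1.15, -0.67, -0.32, 0.0, 0.32, 0.67, 1.15]
```

For an alphabet of size 8, this reproduces the rounded values of the standard Gaussian breakpoint table.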

Discretize the X-Values 

We divide the x-axis into w equal-sized frames, where w is the word size, and replace all the measurements in each frame with their mean value. This is the PAA step.
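A minimal sketch of the PAA step in Python. For simplicity it assumes the series length is a multiple of the word size (for Figure 2's 128 readings and a word size of 8, each frame averages 16 readings); the function name `paa` is illustrative.

```python
def paa(series, word_size):
    """Piecewise Aggregate Approximation: replace each of word_size equal frames by its mean.

    Simplifying assumption: len(series) must be a positive multiple of word_size.
    """
    n = len(series)
    if word_size <= 0 or n == 0 or n % word_size != 0:
        raise ValueError("len(series) must be a positive multiple of word_size")
    frame = n // word_size
    return [sum(series[i:i + frame]) / frame for i in range(0, n, frame)]


print(paa([1, 3, 2, 4, 6, 8, 7, 5], 4))  # [2.0, 3.0, 7.0, 6.0]
```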

Apply Labels

Finally, we label each frame with a letter (from our “alphabet”) according to the breakpoint region of the Gaussian curve that its mean value falls into. In this way, we can turn a time series chart into a simple string such as accdfbad. Now we can do a lot with this string using standard analytical techniques!
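A sketch of the labeling step, assuming the PAA values were computed from z-normalized data and the breakpoints are sorted in ascending order. The helper `to_sax_word` and the constant `BREAKPOINTS_4` (rounded breakpoints for an alphabet of size 4) are illustrative names; letters are assigned from 'a' for the lowest region upward.

```python
import bisect
import string


def to_sax_word(paa_values, breakpoints):
    """Label each PAA value with a letter: 'a' for the lowest region, 'b' for the next, and so on.

    breakpoints must be sorted ascending; a value exactly on a breakpoint goes to the upper region.
    """
    if len(breakpoints) + 1 > len(string.ascii_lowercase):
        raise ValueError("alphabet too large for a-z labels")
    letters = string.ascii_lowercase
    # bisect_right counts how many breakpoints lie at or below v, i.e. the region index.
    return "".join(letters[bisect.bisect_right(breakpoints, v)] for v in paa_values)


# Rounded breakpoints for an alphabet of size 4, as in the standard Gaussian breakpoint table.
BREAKPOINTS_4 = [-0.67, 0.0, 0.67]
print(to_sax_word([-1.2, -0.3, 0.1, 0.9, 0.5], BREAKPOINTS_4))  # abcdc
```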

Measure the Similarity Between Two Strings

A simple string match may not be what we want. For example, “abcd” is closer to “abcc” than to “abca”, because the underlying PAA values are closer (“d” is closer to “c” than to “a”), yet a simple string comparison treats both pairs as equally similar, since each differs in exactly one character. To address this, we implemented the MINDIST function to measure distance. One of the key parameters for MINDIST is tolerance, which specifies the maximum distance two strings may have between them and still be considered the “same.”
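The sketch below follows the MINDIST definition published for SAX by Keogh's group: symbol pairs are scored with a lookup table built from the breakpoints, in which identical or adjacent symbols are at distance 0, and the sum is scaled by sqrt(n/w) so that it lower-bounds the Euclidean distance between the original normalized series. It does not include the tolerance parameter, and the function names are hypothetical. The two calls reproduce the “abcd” example.

```python
import math


def symbol_distance(r, c, breakpoints):
    """Lookup-table distance between two SAX symbols given as 0-based indices."""
    if abs(r - c) <= 1:
        return 0.0  # identical or adjacent symbols may come from arbitrarily close values
    return breakpoints[max(r, c) - 1] - breakpoints[min(r, c)]


def mindist(word_q, word_c, n, breakpoints):
    """MINDIST between two equal-length SAX words built from series of original length n."""
    if len(word_q) != len(word_c):
        raise ValueError("SAX words must have the same length")
    w = len(word_q)
    total = sum(
        symbol_distance(ord(q) - ord("a"), ord(c) - ord("a"), breakpoints) ** 2
        for q, c in zip(word_q, word_c)
    )
    return math.sqrt(n / w) * math.sqrt(total)


BREAKPOINTS_4 = [-0.67, 0.0, 0.67]  # alphabet size 4
print(mindist("abcd", "abcc", n=8, breakpoints=BREAKPOINTS_4))  # 0.0
print(round(mindist("abcd", "abca", n=8, breakpoints=BREAKPOINTS_4), 3))  # 1.895
```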

To summarize, implementing SAX allows us to do the following on time series data sets: 

  • Determine when the shape has broken a known pattern

  • Classify data set behavior into clusters

  • Make predictions using classical regression techniques

  • Implement time series JOINS for similar shapes by employing string search-like functions

Commercializing SAX

To commercialize the offering for our industrial IoT customers, we added the ability to track patterns in the shapes of time series data to our IoT Asset Monitoring Cloud Service.

When implementing the SAX profile and deploying it in commercial settings, we in Oracle IoT product development made the following observations:

  • Smaller alphabet sizes lead to a large number of false positives, while very large alphabet sizes can cause legitimate matches to be missed. We found that an alphabet size of 8 works well in most cases, and that the alphabet size should be no more than 10 and no less than 6.

  • The number of false positives is much lower if the pattern is distinct from the rest of the data (for example, a strong spike). If the data changes gradually and there are many similar instances in the data, the number of false positives will be very high.

  • We should not support a tolerance value when using the MINDIST function for comparison. Even with a plain MINDIST function, we get some false positives, and adding tolerance on top of it only increased the number of reported false positives. Plus, the best tolerance value to use may not be obvious to the user, and setting it too high would increase the false positives even further.

  • We have to configure the alphabet size depending on the amount of variation in the data, but for now we do not have a way to do this automatically.

Making SAX Efficient With Matrix Profile

While SAX met our goals of identifying anomalous events of interest and searching for them, we realized that we had to tune the word length and alphabet size parameters. In addition, to be able to track these events in the future, we wanted the ability to search for these patterns in real time. For this, we implemented the Matrix Profile, which computes a companion time series for the data set. For each subsequence, the Matrix Profile records the distance to its most similar neighbor. For example, the sequence starting at 921 has its nearest neighbor at a distance of 177.


Figure 4: Matrix Profile is a companion time series that specifies the distance to the most similar sequence

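Computing the Matrix Profile is expensive: by default it is O(n^2), because every subsequence is compared with every other one. However, there are techniques that make the computation more efficient: specifying exclusion zones (STMP) so that trivial matches next to each subsequence are skipped, using convolutions (MASS) so that each distance profile is computed in O(n log n) time, and computing distance profiles in random order (STAMP) so that a good approximation of the profile is available early. For our implementation, we used the STAMP technique to compute matrix profiles.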
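The STAMP idea can be sketched in pure Python as follows. This is an illustration, not the Oracle implementation: for readability each distance profile is computed directly instead of with the FFT-based MASS convolution, so the sketch keeps STAMP's structure (random query order, exclusion zone, element-wise minimum updates, early stopping via `fraction`) but not its speed. The exclusion-zone half-width of about m/4 and all function names are assumptions.

```python
import math
import random


def _znorm(x, eps=1e-8):
    """z-normalize one subsequence; a flat subsequence becomes all zeros."""
    m = len(x)
    mean = sum(x) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    return [(v - mean) / std if std > eps else 0.0 for v in x]


def stamp_self_join(series, m, fraction=1.0, seed=0):
    """STAMP-style anytime matrix profile of `series` for subsequence length m.

    Query positions are visited in random order; each query's distances update the
    running profile with an element-wise minimum. fraction < 1 stops early and returns
    an approximation that can only overestimate the exact profile.
    """
    n_subs = len(series) - m + 1
    if m < 2 or n_subs < 2:
        raise ValueError("need m >= 2 and at least two subsequences")
    subs = [_znorm(series[j:j + m]) for j in range(n_subs)]
    excl = max(1, m // 4)  # assumed exclusion-zone half-width (about m/4)
    mp = [math.inf] * n_subs  # nearest-neighbor distances
    mpi = [-1] * n_subs  # nearest-neighbor start positions
    order = list(range(n_subs))
    random.Random(seed).shuffle(order)
    for i in order[: max(1, int(fraction * n_subs))]:
        # Distance profile of query i (computed directly here; real STAMP uses MASS).
        for j in range(n_subs):
            if abs(i - j) <= excl:  # skip trivial matches right next to the query
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < mp[i]:
                mp[i], mpi[i] = d, j
            if d < mp[j]:  # distances are symmetric, so j can be updated too
                mp[j], mpi[j] = d, i
    return mp, mpi


series = [0, 1, 2, 3, 2, 1, 0, 5, 5, 5, 0, 1, 2, 3, 2, 1, 0]
mp, mpi = stamp_self_join(series, m=4)
print(mp[0], mpi[0])  # 0.0 10
```

The rising ramp at position 0 repeats exactly at position 10, so its profile value is 0. Because each query's distance profile is independent of the others, the outer loop is a natural unit of work to distribute, for example across Spark tasks, with partial profiles merged by an element-wise minimum.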

Oracle’s IoT Applications Cloud Service sits on a big data platform with Spark, which proved very useful for parallelizing the matrix profile computation. The matrix profile also gave us further insights without extra computation.


Figure 5: Matrix Profile helps in motif and anomaly discovery

For example, a dip in the Matrix Profile means that a closely matching pattern exists elsewhere in the data. This can help us identify default (or baseline) behavior or perform motif discovery. Conversely, a spike in the Matrix Profile means that there is no close repeating pattern anywhere else: an anomaly.
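Given a precomputed matrix profile, locating the deepest dip (the top motif) and the highest spike (the top discord) reduces to an argmin and an argmax. A minimal sketch; the function name is hypothetical, and entries left at infinity (for example, by an interrupted anytime computation) are ignored.

```python
import math


def motif_and_discord(matrix_profile):
    """Return (motif_index, discord_index) from a precomputed matrix profile.

    The motif is the subsequence with the smallest nearest-neighbor distance (a dip);
    the discord is the one with the largest finite distance (a spike).
    Ties are broken in favor of the lower index.
    """
    finite = [(d, i) for i, d in enumerate(matrix_profile) if math.isfinite(d)]
    if not finite:
        raise ValueError("matrix profile has no finite entries")
    motif = min(finite)[1]
    discord = max(finite, key=lambda t: (t[0], -t[1]))[1]
    return motif, discord


print(motif_and_discord([2.1, 0.3, 0.4, 2.0, 5.8, 2.2, 0.3]))  # (1, 4)
```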

We can also infer some complex behavior changes without significant additional complexity. For example, if a series of dips changes to a series of spikes, it means that the behavior of the signal has changed: a regime change. One example of this could be a pedometer reading when a person starts running. Understanding when regime changes occur helps us avoid false positives in anomaly detection.

The Matrix Profile can compute the distance to the nearest neighbor not only within the same time series data set, but also in another time series. In this case, the Matrix Profile, together with its index of nearest-neighbor positions, functions like a foreign key in a table, allowing for efficient joins.
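A brute-force sketch of such a join (often called an AB-join) in pure Python: for every subsequence of series A, it returns the distance to, and the start position of, the closest subsequence in series B. The returned `index` list is what behaves like a foreign key into B. The names are hypothetical, and a production version would use a faster distance-profile method such as MASS.

```python
import math


def _znorm(x, eps=1e-8):
    """z-normalize one subsequence; a flat subsequence becomes all zeros."""
    m = len(x)
    mean = sum(x) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    return [(v - mean) / std if std > eps else 0.0 for v in x]


def ab_join(series_a, series_b, m):
    """For every length-m subsequence of A, find its nearest neighbor in B.

    Returns (profile, index): profile[i] is the z-normalized Euclidean distance from
    A[i:i+m] to its closest subsequence of B, and index[i] is where that match starts in B.
    No exclusion zone is needed because A and B are different series.
    """
    if m < 2 or len(series_a) < m or len(series_b) < m:
        raise ValueError("both series must contain at least one subsequence of length m >= 2")
    subs_b = [_znorm(series_b[j:j + m]) for j in range(len(series_b) - m + 1)]
    profile, index = [], []
    for i in range(len(series_a) - m + 1):
        q = _znorm(series_a[i:i + m])
        best_d, best_j = math.inf, -1
        for j, c in enumerate(subs_b):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))
            if d < best_d:
                best_d, best_j = d, j
        profile.append(best_d)
        index.append(best_j)
    return profile, index


profile, index = ab_join([10, 20, 30], [0, 0, 1, 2, 4, 0, 0], m=3)
print(index)  # [1]
```

The ramp [10, 20, 30] in A has the same shape as [0, 1, 2] starting at position 1 in B, so its join key points there.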

A common question from developers implementing these algorithms in their services is when to use SAX and when to use the Matrix Profile. If you are dealing with “short” patterns, the Matrix Profile is what you need. If you are dealing with “long” patterns, you need a summarization method such as SAX.

Putting It Together: Revisiting Root Cause Analysis


Figure 6: Our solution wove together implementations of SAX and Matrix Profile with innovations in UX.

Now let’s go back to the root cause analysis. We used the Matrix Profile to continually scan real-time sensor readings and give the operator hints and hunches that something different is occurring. We implemented a UX inspired by a simple oscilloscope-like interface, where the operator can either ignore our hunch (they have the bigger context; for example, the line may have just switched to building a new part) or search for that pattern in history. For this search, we use SAX. If the operator discovers a new pattern that she wants to monitor (for example, to create a maintenance order when that signal arises), she creates that pattern with our UX and deploys it for real-time monitoring using the Matrix Profile.

Design Principles

This algorithm was intended to help plant managers identify and track anomalous events, so understanding how the algorithm would ultimately be used was important in the design process. To make the algorithm commercially successful across a variety of verticals, we designed it with certain principles in mind.

First, we did not position our solution as prescriptive, i.e., telling you what went wrong. That is a high-stakes bet that we were not willing to make at this stage. Instead, we give the process engineer a hunch. It is up to the operator to determine whether that hunch is correct (and we learn from it). For the process engineer, even figuring out where to start looking for the root cause is a big step, so making this step efficient adds significant value.

Second, we modeled our UX on how an oscilloscope operates. This is important because electrical and mechanical engineers relate best to this tool. Our goal is not just to provide our users with an emotionally comfortable user interface, but to fit it into an existing operating model. Disruptive and innovative technologies are good for enterprise sales, but disruptive operating models usually are not. By building an oscilloscope-like interface, we ensured that our techniques for root cause analysis could make existing processes more efficient without requiring new processes to be designed.

Lastly, when building the product, we let the UX drive the requirements for the algorithms during the software development and prototyping stage. We resisted the data scientists’ urge to unintentionally hijack the value narrative with complex stuff. From these flowcharts, we designed the UX. Then we challenged the data scientists to enable that UX.


Overall, we found the SAX algorithm to provide a great foundation to explore machine data and efficiently apply clustering and classification techniques. With Matrix Profile, we demonstrated how to parallelize the computation of the distance profile to determine in real-time if the event is anomalous. Using novel UI controls that let the user explore time series datasets as graphs (instead of traditional charts), we validated that to gain true productivity improvements, visualization has to be an equal partner in the design of the algorithm.

Special thanks to Cristian Toma, Vlad Petrovici and Marius Trufas who were instrumental in implementing SAX and Matrix Profile for Oracle IoT.

