The obvious way to start this particular post is to pose a couple of simple questions: what is a state machine and why should you care? In general I would say that you don't need to know about or care about state machines. That's the beauty of using SQL for pattern matching. The MATCH_RECOGNIZE clause encapsulates all the deep technical modelling and processing that has to be performed to run pattern matching on a data set. However, there are times when it is useful, probably vital, that you understand what is going on behind the scenes and one of the most obvious situations is when backtracking happens.
Therefore, the content covered in this post is a going to be a gently lead-in into my next post where I am going to discuss the concept of “backtracking” and the dreaded ORA-30009 error.
Let’s start our voyage of discovery…when you attempt to run a SQL statement containing a MATCH_RECOGNIZE clause during the compilation phase we generate a finite state machine based on the PATTERN and DEFINE clauses in your statement. What is an Finite State Machine? According to wikipedia:
A finite-state machine (FSM)…is a mathematical model of computation....it is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time…changes from one state to another when initiated by a triggering event or condition; this is called a transition. A particular FSM is defined by a list of its states, and the triggering condition for each transition.
Reference from wikipedia - https://en.wikipedia.org/wiki/Finite-state_machine
A state machine, which is the PATTERN and DEFINE elements of your MATCH_RECOGNIZE clause, can be represented by a directed graph called a state diagram. This diagram shows each of the possible states for the “machine” and the conditions that force the machine to either remain in its current state or move to the next state. Below is a simple example of a state machine:
On the above diagram each state is represented by a node (ellipse) which in this case are marked as “State 1” to “State 4”. The arrows, known as Edges, show the transition(s) from one state to another. If you look at states 2 and 4 you will notice that they have two edges although these edges are shown in different vertical positions on the diagram. When drawing a proper state diagram each event is labeled with the event (condition) that triggers transition. Events (conditions) that don’t cause a change of state are represented by a circular arrow returning to the original state and these can be seen on states 2 and 4.
The precedence for reading the information is to read from top-down. What this means is that when in State 2 the FSM will test to see if State 3 can be achieved and if it can’t it will then test to see if State 2 can be maintained. The reverse is true for State 4 where the FSM will test to see if State 4 can be maintained and if it can’t it will then, in this example, either end having determined that a match has completed or start backtracking to try and complete a match. I am sure you can now see how this is going to link into my next blog post.
State machines are not limited to just pattern matching. They have all sorts of other uses. If you want a gentle diversion to look at state machines in a little more detail then try this article by Enrique Ortiz from the OTN magazine in August 2004: Managing the MIDlet Life-Cycle with a Finite State Machine.
All of this flows directly into keywords that appear (or don’t appear) in the explain plans which was covered in this post MATCH_RECOGNIZE and the Optimizer from January 2015. As quick refresher…essentially there are four new keywords that you need to be aware of:
The fist three bullet points are reasonably obvious. The last keyword is linked to the use of “state machine”. Its appearance, or lack of appearance, affects the way our pattern is applied to our data set but that is all explained in the blog post. Most of my MATCH_RECOGNIZE examples are based on the stock ticker data set. Let’s assume that we are searching for V-shaped patters in our data set (https://docs.oracle.com/database/121/DWHSG/pattern.htm#CACHHJJG):
SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp,
LAST(DOWN.tstamp) AS bottom_tstamp,
LAST(UP.tstamp) AS end_tstamp
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+)
DEFINE
DOWN AS price < PREV(down.price),
UP AS price > PREV(up.price)
) MR
ORDER BY MR.symbol, MR.start_tstamp;
this is what the state diagram would look like:
These diagrams can be really helpful when you have more complex patterns and you need to consider the impact of backtracking. This posts is all about laying the building blocks for my next post on backtracking and the dreaded ORA-30009 error. If you have managed to read this far then you are guaranteed to be ready for an in-depth look at what happens inside MATCH_RECOGNIZE when we have move from right to left through our state diagram in an attempt to find a complete match. Now you should know everything you need to know about state machines and I am going to carry over the “why care” part to the next post…
If you want a recap of where we are in this series of pattern matching deep dive posts here is the full list:
As per usual, if you have any comments or questions then feel free to contact me directly via email: keith.laker@oracle.com
Technorati Tags: Analytics, Data Warehousing, Database 12c, SQL, SQL Analytics