Usually I take three or four user-submitted questions from the past two months and present those questions and my answers here in each Ask Tom column. In the previous column and the next three, however, I take a look at some key Oracle Database 12c features. These features are all part of the “12 Things About Oracle Database 12c” presentation I gave at Oracle OpenWorld 2012 in San Francisco. (You can find the slides for that presentation on asktom.oracle.com on the Files tab.) The first three Oracle Database 12c features I looked at last time were improved defaults, bigger datatypes, and top-n queries. In this issue, I take a look at a new row-pattern-matching clause and how undo for temporary tables has changed in Oracle Database 12c.
In the beginning, SQL provided the ability to look left and right in a result set. That is, you could look at a row of data, possibly the result of a join of many tables, and apply predicates to that row. You could compare any column in that row with any other column in that row. However, you could not look up and down in a result set. The ability to look up and down was included in Oracle8i Database in the form of analytic windowing functions. Using these analytic functions, you could look not only left and right in a result set but also up and down, using windowing functions such as LAG, LEAD, FIRST_VALUE, LAST_VALUE, and NTH_VALUE. These windowing functions are extremely powerful, and they opened up a whole new way of analyzing data with SQL. As powerful as they are, however, they had some limitations.
Often when processing data, you want to recognize a pattern in an ordered stream of data. For example, you might have an audit trail of data, with columns representing the username, an event that took place, and a time stamp representing when that event took place. You might be interested in finding all the people in that audit trail who used “application X” then “application Y” and then “application Z” and, finally, went back to a specific part of “application X.” Preceding, following, and in between those events might be countless other events. For some users, there might be no events between their interesting actions, but for other users, there might be hundreds or thousands of events. Trying to find such a pattern with analytic windowing functions falls short. You don’t know how many rows backward or forward in the result set you would have to look—LAG and LEAD analysis won’t really help. At the very least, you would have to make multiple passes on the data, use multiple self-joins, or resort to scalar subqueries. The query you’d need to write would be extremely complex, and the cost of executing it would be extremely high.
Enter row pattern matching, implemented via the MATCH_RECOGNIZE clause in Oracle Database 12c. This clause enables you to take a set of data (a result set—your audit trail, for example); partition it into nonoverlapping sets of data (by username in your audit trail); sort these sets (by time stamp in your audit trail); and then look for a pattern that spans many rows in that partitioned, ordered set. With the audit trail example, you would query all records for applications X, Y, and Z. You would partition the records by username and sort within each username by time stamp. Then you would look for one or more Xs, followed by one or more Ys, followed by one or more Zs, and finally followed by the specific part of application X you were interested in. You can do all this processing in a single pass through the data—no self-joins, no Cartesian joins, and no scalar subqueries are required. The query would be relatively easy to code—the MATCH_RECOGNIZE clause is very compact—and the performance of this query would be much better than that of one written without the MATCH_RECOGNIZE clause.
To demonstrate this new feature, I am going to analyze a set of stock data. Stock analysts are often interested in seeing a V- or W-shaped pattern in stock data. That is, they would like to know at which points in time a stock hits a high value, followed by a series of drops in price and then followed by a series of rises in price. They’d like to know when the pattern that defines the V shape started, when it hit the bottom, and when it hit the top again. I’ll start this example by defining a simple stock table:
SQL> create table stocks 2 ( symbol varchar2(10), 3 tstamp date, 4 price number, 5 primary key (symbol,tstamp) 6 ) 7 organization index 8 / Table created.
Now I’ll create a bit of data to analyze:
SQL> declare 2 l_data sys.odciNumberList := 3 sys.odciNumberList 4 ( 35, 34, 33, 34, 35, 5 36, 37, 36, 35, 34, 35, 36, 37 ); 6 l_cnt number := l_data.count; 7 begin 8 for i in 1 .. l_cnt 9 loop 10 insert into stocks 11 ( symbol, tstamp, price ) 12 values 13 ('XYZ', sysdate-l_cnt+i, l_data(i) ); 14 end loop; 15 commit; 16 end; 17 / PL/SQL procedure successfully completed.
Because this set of data is so small, I can analyze it with ASCII art in SQL*Plus easily, as shown in Listing 1.
Code Listing 1: Displaying ASCII art patterns
SQL> select symbol, tstamp, price, 2 rpad('*',price,'*') hist 3 from stocks 4 order by symbol, tstamp; SYMBOL TSTAMP PRICE HIST —————————— —————————————— ————— —————————————————————————————————————— XYZ 01-SEP-12 35 *********************************** XYZ 02-SEP-12 34 ********************************** XYZ 03-SEP-12 33 ********************************* XYZ 04-SEP-12 34 ********************************** XYZ 05-SEP-12 35 *********************************** XYZ 06-SEP-12 36 ************************************ XYZ 07-SEP-12 37 ************************************* XYZ 08-SEP-12 36 ************************************ XYZ 09-SEP-12 35 *********************************** XYZ 10-SEP-12 34 ********************************** XYZ 11-SEP-12 35 *********************************** XYZ 12-SEP-12 36 ************************************ XYZ 13-SEP-12 37 *************************************
This set of data is only 13 rows, so using a brute force method to analyze it works and I can readily see the pattern I am looking for. I can see clearly two V shapes in this data. I can see that on September 1, I have the beginning of a V, which bottoms out on September 3 and peaks again on September 7. I can see the second V easily, too; it begins on the 7th (the beginning of the second pattern is the end of the first—think about how you’d report on that with basic SQL), bottoms out on the 10th, and peaks again on the 13th. I am interested, therefore, in getting two rows of data—each with the three relevant dates. I would like you to think about how you might write a SQL statement in Oracle Database 11g Release 2 and how, before that, you might output two rows (repeating September 7, which is tricky) with the information I am looking for. It can be done, but it is rather complex and the performance would be questionable at best.
With the MATCH_RECOGNIZE clause, this is rather easy to query. For example:
SQL> SELECT * 2 FROM stocks MATCH_RECOGNIZE 3 ( PARTITION BY symbol 4 ORDER BY tstamp 5 MEASURES 6 STRT.tstamp AS start_tstamp, 7 LAST(DOWN.tstamp) AS bottom_tstamp, 8 LAST(UP.tstamp) AS end_tstamp 9 ONE ROW PER MATCH 10 AFTER MATCH SKIP TO LAST UP 11 PATTERN (STRT DOWN+ UP+) 12 DEFINE 13 DOWN AS DOWN.price < PREV(DOWN.price), 14 UP AS UP.price > PREV(UP.price) 15 ) MR 16 ORDER BY MR.symbol, MR.start_tstamp; SYMBOL START_TST BOTTOM_TS END_TSTAM —————— ————————— ————————— ————————— XYZ 01-SEP-12 03-SEP-12 07-SEP-12 XYZ 07-SEP-12 10-SEP-12 13-SEP-12
I’ll walk through this query line by line. The first two lines define the query that defines the result set I’ll partition and order and then apply my pattern to. This query can be any query—with joins, aggregation, and so on. The MATCH_RECOGNIZE clause starts on the end of line 2 and is contained in lines 3 through 15.
On lines 3 and 4, I set up my partitions and order this stream of data. I partition by the stock symbol, and within each of these stocks, I order the data by time stamp, from oldest to newest. Note that my result set is deterministic here. Because my primary key is SYMBOL, TSTAMP, the set of rows will be deterministic from run to run of this query. This fact is typically very important in pattern matching—you want to know that the rows that form your pattern are being observed in a “stable” state, a deterministic state. Otherwise, two executions of the same query might return different answers! See bit.ly/Z6nxLL for a further explanation and examples of why this deterministic behavior is desirable, and usually necessary, for correct results.
Now I’m going to skip ahead a bit to the pattern part of the query:
11 PATTERN (STRT DOWN+ UP+) 12 DEFINE 13 DOWN AS DOWN.price < PREV(DOWN.price), 14 UP AS UP.price > PREV(UP.price)
I’m interested in a data pattern in which I have any record, followed by one or more records in which the price of the stock goes down, followed by one or more records in which the stock price increases. That is what my pattern on lines 11 through 14 represents. I’m looking for a starting record—any record—followed by one or more “down” records, followed by one or more “up” records. To identify “any record,” I use the correlation name STRT, short for start. (That name is rather arbitrary—I could have used any valid identifier.) Because STRT is not in the DEFINE, the definition section, it can match any record—every record in the result set could potentially match and become a STRT record. For a record to become the STRT record, however, it must be followed by a DOWN record, at least one and possibly more (that is the meaning of the + after DOWN). Now, DOWN does have a definition in this case: a record can be considered a DOWN record if and only if the price of that record is less than the price of the previous record in the result set (remember, it is a deterministically ordered stream of data). As long as I have records that match this definition, I’ll be matching DOWN records. When I ultimately get to a record that no longer meets the criterion for being a DOWN record, I’ll continue the pattern matching in the hopes that it is an UP record. An UP record is defined as any record whose price is greater than the previous record’s price.
Now I’ll walk through the pattern in lines 11 through 14 with my actual data set. I start by processing the first record, for September 1. That record meets the criterion for being a STRT record, because there is no defining criterion. The first record can be the STRT record of a pattern match only if the following record is a DOWN record. So, I advance in the result set, get the second record, and apply the definition of DOWN to it. Is its price of 34 less than the prior record’s price, 35? Because it is, I am still matching my pattern; I can advance to the third record and apply the definition again. It is another DOWN record. Then I get to the fourth record—which is not a DOWN record—so I’ve finished matching the STRT DOWN+ portion of the pattern. I must now verify that the fourth record conforms to the definition of UP in my definition. And sure enough, it does. I’ve found the pattern—I know I have some record followed by one or more DOWN records followed by at least one UP record, but the pattern won’t stop here. Much as with regular expressions, I am going to continue matching to find the largest conforming pattern. I take a look at the fifth, sixth, and seventh records and find that they are all UP records, so they become part of the pattern. Then I get to the eighth record, and it does not match the definition of UP anymore, because its price is not greater than the prior record’s price. I am done with the first pattern, and the set of rows that match this pattern are September 1 through 7. I am now ready to output this information.
Going back to the MATCH_RECOGNIZE clause, a MEASURES clause starts on line 5. This identifies the outputs of my query, and in this case, I have asked for the rows in the pattern to be tagged with STRT.tstamp, the timestamp associated with the starting record, the beginning of my pattern. I also asked for LAST(DOWN.tstamp)—the last DOWN time stamp—and LAST(UP.tstamp)—the last UP time stamp. Those three values represent the beginning, the bottom, and the end of the V-shaped pattern. Line 9 of the query—ONE ROW PER MATCH—makes the MATCH_RECOGNIZE clause work very much like an aggregate function. For this one big pattern that spans seven rows of data, I’d like one row to be output with my three measures in it. This will help me analyze the data as I turn millions of rows into tens or hundreds or thousands of rows, which is something I can get my head around.
Now that I’ve found the first pattern, I’m ready to start looking for the next one. The question now becomes, “Where do I start looking for the next pattern?” I could start looking for the next pattern in the second row, but that wouldn’t make sense for this pattern, because I’d find V shapes inside of V shapes over and over again, which is not very interesting. Where I would like to start the search for the next pattern is on the last row of the first pattern. I want to pick up where I left off, and line 10 of the query—AFTER MATCH SKIP TO LAST UP—accomplishes that. It permits the last row of the first pattern in this case to potentially become the first row of the next pattern. And as you can see by the output, September 7 does, in fact, become the first row of the next pattern. September 7 is effectively output twice in this result set. (Think about what you would have to do in regular SQL to get a single row output twice in a result set!)
And that’s it. I take all the pattern matches, order them by the stock symbol and the starting price, and display them. In this case, I’ve taken 13 rows and turned them into 2 rows, each of them representing three points of data. It becomes very easy now to see these patterns and start to interpret them. You can see how you might further analyze and aggregate this data to answer questions such as “What is the average period of time elapsed in a V for a given stock symbol?” “What is the max time?” “How many V shapes do you usually see in a year?” “Is there any common time over the years or months when a stock bottoms out?” and so on.
Kyte’s “12 for 12” presentation is part of the Oracle Database 12c launch webcast.
Kyte covered improved defaults, increased size limits for some dataytpes, and easy top-n and pagination queries in the September/October 2013 issue of Oracle Magazine, and he covered row pattern matching and temporary undo in this issue.
Spoiler alert: Kyte covers more of the top 12 in this column in upcoming issues.
For detailed information on the MATCH_RECOGNIZE syntax, see bit.ly/15x5p0o. In particular, the “SQL for Pattern Matching” chapter in Oracle Database Data Warehousing Guide 12c Release 1 (12.1) is an excellent resource to get started with.
I’ll now take a look at a new way to process undo for global temporary tables in Oracle Database 12c.
Many DBAs and developers used to be surprised to discover that their operations on global temporary tables—such as INSERT, UPDATE, MERGE, and DELETE—generated redo: less redo than they observed on a regular, permanent table but still a sizable amount. Their first question was, “Where does this redo come from?” The answer was that it comes from the undo that Oracle Database has to generate on that global temporary table. The undo for global temporary tables must be generated—the database needs it in case an application issues a rollback and to provide for read-consistent results. For situations in which a developer inserts some information into a global temporary table and then issues a SELECT statement against it, followed by an UPDATE or a DELETE, the rules of read consistency state that the SELECT statement cannot see the effects of the UPDATE or DELETE. To make that possible, the database needs that undo. (See bit.ly/120NB0w for more information on read consistency, one of the core tenets of Oracle Database).
So a modification of a global temporary table needs to generate undo, and the undo tablespace must be protected by redo. In the event of an instance crash, the database needs the redo information in order to recover the undo tablespace so it can then roll back any transaction that was in process but not yet committed when the instance failed. And furthermore, before Oracle Database 12c, the undo tablespace did not distinguish between undo for temporary objects and undo for permanent objects.
But starting in Oracle Database 12c, temporary undo can be stored in the temporary tablespace and undo for permanent objects can be stored in the undo tablespace. What this effectively means is that operations on temporary tables will no longer generate redo. If you have large batch operations that utilize global temporary tables, you may well discover that the amount of redo you generate decreases by a large amount. Furthermore, you’ll be generating less undo in your undo tablespace. And that means you’ll be able to support a longer undo_retention time with a smaller undo tablespace.
Another pleasant side effect of this new change is that global temporary tables can now be utilized in a read-only Oracle Active Data Guard database. Yes, you can now have a read-only database in which you can read and write global temporary tables. Because one of the uses of a global temporary table has historically been in reporting systems—for storing intermediate query results—this makes a read-only Oracle Active Data Guard reporting database that much more viable.
How undo is generated in Oracle Database 12c for global temporary tables is controlled by a new init.ora parameter: temp_undo_enabled. It has two settings: TRUE and FALSE. By default, this parameter is set to FALSE and undo will be generated in the same fashion it was in the past. For example:
SQL> alter session set temp_undo_enabled = false; Session altered. SQL> insert into gtt 2 select * 3 from all_objects; 87310 rows created. Statistics ——————————————————————————— … 566304 redo size … SQ> update gtt 2 set object_name = lower(object_name); 87310 rows updated. Statistics ———————————————————————————— … 8243680 redo size …
As you can see, the INSERT generates about half a megabyte of redo (566,304 bytes) while the UPDATE generates upwards of 8 MB of redo (8,243,680 bytes). If I enable temporary undo, however:
SQL> alter session set temp_undo_enabled = true; Session altered. SQL> insert into gtt 2 select * 3 from all_objects; 87310 rows created. Statistics ——————————————————————————————— … 280 redo size … SQL> update gtt 2 set object_name = lower(object_name); 87310 rows updated. Statistics ——————————————————————————————— … 0 redo size …
the redo is either trivial or nonexistent.
In a read-only Oracle Active Data Guard database, the redo would be entirely nonexistent. This means you’ll potentially be shipping less redo, applying less redo, and enjoying a longer undo retention period in your primary database.
FOLLOW Tom on Twitter
DOWNLOAD Oracle Database 12c
Photography by Dmitri Popov, Unsplash