ODI 12c - Sessionizing Data with MATCH_RECOGNIZE and SQL Override
By David Allan-Oracle on Jul 23, 2014
The Oracle 12c database introduced a very useful new pattern matching capability, useful for many different use cases. The example I will use in this blog post within ODI is sessionizing data - some log data with sample URLs, IP addresses initiating the clicks and their timestamps. To incorporate this in ODI, I will use one of the lesser known capabilities that many users often want to use - it is the ability to override the generated SQL. I blogged about how this could be done in ODI 11g here and here - in this post I will show how this can be done in 12c.
There are some great posts and presentations from Keith Laker and many other partners and users on the MATCH_RECOGNIZE capability, the post on Sessionization with 12c pattern matching is super fast is a great read, check it out! I will base the SQL used on that post.
My source data has the following information;
- create table clicks (IP varchar2(16), Click_Timestamp timestamp, URL varchar2(200));
- insert into clicks values ('126.96.36.199','03-JUL-2014 11:33:15','stubhub.com');
- insert into clicks values ('188.8.131.52','03-JUL-2014 11:33:21','mlssoccer.com');
- insert into clicks values ('184.108.40.206','03-JUL-2014 11:59:08','netflix.com');
- insert into clicks values ('220.127.116.11','04-JUL-2014 09:02:19','amazon.com');
- insert into clicks values ('18.104.22.168','04-JUL-2014 09:02:26','amazon.com/travel');
- insert into clicks values ('22.214.171.124','04-JUL-2014 09:48:27','citibank.com');
- insert into clicks values ('126.96.36.199','04-JUL-2014 10:01:05','ebay.com');
The type of query we need to generate a session ID using MATCH_RECOGNIZE is shown below. As in Keith's blog, we defined a session as a sequence of one or more events with the same partition key (IP address) where the gap between the timestamps is less than 10 seconds - in this demo, the gap is completely arbitrary and could be set to any number as required. Here we detect the various sessions and assign a surrogate session id to each session within each partition (IP).
- SELECT ip, session_id start_time, no_of_events, duration
- FROM clicks MATCH_RECOGNIZE
- (PARTITION BY ip ORDER BY click_TimeStamp
- MEASURES match_number() session_id,
- count(*) as no_of_events,
- first(click_timestamp) start_time,
- last(click_timestamp) - first(click_timestamp) duration
- ONE ROW PER MATCH
- PATTERN (b s*)
- s as ((s.click_TimeStamp - prev(s.click_TimeStamp)) <= TO_DSINTERVAL('0 0:00:10.0'))
The Oracle syntax (...FROM clicks MATCH_RECOGNIZE....) for this operation is a little different from other SQL constructs, it isn't a join, it isn't a table function, it might be a custom component - but until that's public need another way. You could create a view and use that in ODI. Another option is to use the SQL override, custom template capability. This let's you define any arbitrary code for a source component in a mapping. Source datastores in mappings have a physical option named CUSTOM_TEMPLATE. You can define code here such as the SQL to produce the sessionized data above. The code can use ODI's substitution reference methods to make the code more generic (parameterized schemas etc.). For example the physical map design below defines a mapping with a source datastore defining the attributes projected from my query. I defined the datastore in an ODI model - it isn't actually in a database, it's just used in the mapping.
As it's description states 'Custom SQL statement as an inline view to override default code generation.' is what this is for.
Note this option is under the extract options on the source datastore's physical node, the value for CUSTOM_TEMPLATE can be edited and arbitrary code, in this case SQL, defined. Double clicking the value field in the KM options table brings up an editor where you can see the entire code block I have defined. Note below, I have hard-wired the clicks table in the SQL;
To make this more generic and flexible, I can use the odiRef methods to get the name of the clicks table generated based on the topology defined in ODI. My clicks table is in a model in ODI and the use of odiRef.getObjectName below let's me get the name correctly generated depending on the generation context - perhaps in one context it is in schema DEVX and in another it is in DEVY, using the SDK shields us from such changes.
You will note also that there is another KM option named SELECT_HINT, this is used by the default 12c code generation capabilities but if you override the code using this custom code option described above you can still incorporate the hints supplied in the mapping. So if you define a value for the KM option SELECT_HINT you can have it included in your custom code by using this getOption method - pretty flexible!
Hope this is a useful insight into some of the capabilities provided in ODI 12c. I used the latest and greatest 12.1.3 release in this post, that's what you will need to use this functionality.