ODI 12c - Sessionizing Data with MATCH_RECOGNIZE and SQL Override

The Oracle 12c database introduced a very useful new pattern matching capability, useful for many different use cases. The example I will use in this blog post within ODI is sessionizing data - some log data with sample URLs, IP addresses initiating the clicks and their timestamps. To incorporate this in ODI, I will use one of the lesser known capabilities that many users often want to use - it is the ability to override the generated SQL. I blogged about how this could be done in ODI 11g here and here - in this post I will show how this can be done in 12c.

There are some great posts and presentations from Keith Laker and many other partners and users on the MATCH_RECOGNIZE capability, the post on Sessionization with 12c pattern matching is super fast is a great read, check it out! I will base the SQL used on that post. 

My source data has the following information;

  1. create table clicks (IP varchar2(16), Click_Timestamp timestamp, URL varchar2(200));
  2. insert into clicks values ('192.65.20.33','03-JUL-2014 11:33:15','stubhub.com');
  3. insert into clicks values ('192.65.20.33','03-JUL-2014 11:33:21','mlssoccer.com');
  4. insert into clicks values ('192.65.20.33','03-JUL-2014 11:59:08','netflix.com');
  5. insert into clicks values ('192.65.9.89','04-JUL-2014 09:02:19','amazon.com');
  6. insert into clicks values ('192.65.9.89','04-JUL-2014 09:02:26','amazon.com/travel');
  7. insert into clicks values ('192.65.9.89','04-JUL-2014 09:48:27','citibank.com');
  8. insert into clicks values ('192.65.20.33','04-JUL-2014 10:01:05','ebay.com');
  9. commit;

The type of query we need to generate a session ID using MATCH_RECOGNIZE is shown below. As in Keith's blog, we defined a session as a sequence of one or more events with the same partition key (IP address) where the gap between the timestamps is less than 10 seconds - in this demo, the gap is completely arbitrary and could be set to any number as required. Here we detect the various sessions and assign a surrogate session id to each session within each partition (IP). 

  1. SELECT ip, session_id start_time, no_of_events, duration
  2. FROM clicks MATCH_RECOGNIZE
  3.  (PARTITION BY ip ORDER BY click_TimeStamp 
  4.   MEASURES match_number() session_id, 
  5.            count(*) as no_of_events,
  6.            first(click_timestamp) start_time, 
  7.            last(click_timestamp) - first(click_timestamp) duration 
  8.   ONE ROW PER MATCH 
  9.   PATTERN (b s*) 
  10.   DEFINE 
  11.        s as ((s.click_TimeStamp - prev(s.click_TimeStamp)) <= TO_DSINTERVAL('0 0:00:10.0')) 
  12.  ); 

The Oracle syntax (...FROM clicks MATCH_RECOGNIZE....) for this operation is a little different from other SQL constructs, it isn't a join, it isn't a table function, it might be a custom component - but until that's public need another way. You could create a view and use that in ODI. Another option is to use the SQL override, custom template capability. This let's you define any arbitrary code for a source component in a mapping. Source datastores in mappings have a physical option named CUSTOM_TEMPLATE. You can define code here such as the SQL to produce the sessionized data above. The code can use ODI's substitution reference methods to make the code more generic (parameterized schemas etc.). For example the physical map design below defines a mapping with a source datastore defining the attributes projected from my query. I defined the datastore in an ODI model - it isn't actually in a database, it's just used in the mapping.

As it's description states 'Custom SQL statement as an inline view to override default code generation.' is what this is for. 

 

Note this option is under the extract options on the source datastore's physical node, the value for CUSTOM_TEMPLATE can be edited and arbitrary code, in this case SQL, defined. Double clicking the value field in the KM options table brings up an editor where you can see the entire code block I have defined. Note below, I have hard-wired the clicks table in the SQL;

To make this more generic and flexible, I can use the odiRef methods to get the name of the clicks table generated based on the topology defined in ODI. My clicks table is in a model in ODI and the use of odiRef.getObjectName below let's me get the name correctly generated depending on the generation context - perhaps in one context it is in schema DEVX and in another it is in DEVY, using the SDK shields us from such changes.

You will note also that there is another KM option named SELECT_HINT, this is used by the default 12c code generation capabilities but if you override the code using this custom code option described above you can still incorporate the hints supplied in the mapping. So if you define a value for the KM option SELECT_HINT you can have it included in your custom code by using this getOption method - pretty flexible! 

Hope this is a useful insight into some of the capabilities provided in ODI 12c. I used the latest and greatest 12.1.3 release in this post, that's what you will need to use this functionality.

Comments:

very good post indeed!!!

I tried to use this option in 12c. But the LKM is not taking this code. Is there any special settings that I should do before using EXTRACT_OPTIONS?

Thanks,
Julia

Posted by guest on December 06, 2014 at 01:27 AM PST #

Hi Julia

Use at least 12.1.3, I think there was an issue in 12.1.2 - which version did you use?

Cheers
David

Posted by Julia on December 06, 2014 at 08:32 AM PST #

Yes David,

Thank you so much for quick response David. Yes, I am using ODI_12.1.3. I are not getting any failures. Just that the interface is not taking custom query during load C$. I am using IKM SCD on my target, Will that be causing an issue?

Thanks,
Julia

Posted by Julia on December 09, 2014 at 09:47 AM PST #

Hi Julia

Is the source datastore the only thing in your source execution unit and you are using LKM SQL to SQL/Oracle? I think I can recreate the issue with that case. If I add a source transformation such as a filter then it looks like the query if taken or if I use LKM Oracle to Oracle (pull dblink) then it also works ok. That's odd.

Cheers
David

Posted by David on December 09, 2014 at 09:56 AM PST #

Hi David,

Me too facing the same issue.The custom query is not picked up by the code.The souce execution unit has only the source datastore.The target and source datastores are in same database schema.Tried with the LKM Oracle to Oracle)pull dblink but did not work.I am using IKM Control Append and using 12.1.3 version.Any help on this is highly appreciated.My custom query is given below:

SELECT
LINE_ID,
COVERAGE_CD,
CHANGE,
ROW_NUMBER() OVER(partition by LINE_ID,COVERAGE_CD order by LINE_ID) AS PORTABLE_LIMIT ,
PDM_KEY,
BATCH_ID,
COVERAGE_XML_ID,
BOAT_ID,
TRAILER_ID,
BOAT_XML_ID,
TRAILER_XML_ID

FROM
<%=odiRef.getObjectName("L","COVERAGE","PDMPUB","D")%>

Thanks ,
NV

Posted by guest on December 17, 2014 at 11:28 AM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« March 2015
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    
       
Today