Wednesday Dec 18, 2013

ODI 12c - Temporal Data Loading

The temporal validity feature in 12c of the Oracle Database is a great feature for any time based data - see the 12c database OBE here for a demo of the database capability. If you are thinking dimensional data that varies over time.... the temporal validity capabilities of 12c are a great fit, worth checking it out. I wanted to post this to stir discussion in this area, this uses some customized KMs to integrate data into a database table that supports the temporal validity capabilities of the 12c database. This gives a few benefits - one is with respect to querying, you can easily query the current view of the world without adding filters on numerous tables and so on, you can also query back in time - again without having such filters messing up semantics of your logical query.

  1. create table DIM_CUSTOMER (
  2. KEY NUMBER NOT NULL,
  3. H_BID VARCHAR2(5) NULL,
  4. H_ADDR VARCHAR2(20) NULL,
  5. H_ZIP VARCHAR2(5) NULL,
  6. C_BID VARCHAR2(5) NULL,
  7. C_MSTAT VARCHAR2(10) NULL,
  8. C_TELE VARCHAR2(10) NULL
  9. );
  10. alter table DIM_CUSTOMER add period for CURRENT_IND;  

The data tables for your slowly changing dimension tables probably already have start/end timestamps, the above example shows how temporal information can be added to a table, adding the temporal validity capability is as simple as an 'ALTER TABLE' on the existing table. If we then try and reverse this in ODI, the columns to support temporal validity will not be reversed as they are hidden and the RKM Oracle shipped with ODI uses the ALL_TAB_COLUMNS dictionary view which does not project hidden columns. Creating a new RKM Oracle (Inc Hidden) lets us reverse engineer such tables into ODI. You can download my modified RKM here. Using this lets me reverse the datastore above and see the following definition in ODI (the RKM uses all_tab_cols rather than all_tab_columns etc so I can see hidden columns);


You can see the columns CURRENT_IND_START, CURRENT_IND_END and CURRENT_IND have been defined on the table, this was as a result of adding the temporal validity to the table in the ALTER TABLE statement above. I have defined the SCD tags for starting timestamp, ending timestamp and current record flag on the attributes. Note I can also define how new records are versioned, I define that some columns will simply have their values overwritten and other columns will trigger new rows to be added. I have also defined a surrogate key - this is optional, so you do not need to create such annotations to use this IKM.

The data is then loaded and I can specify the start/end timestamps for the row when it is loaded. The IKM will load the data, updating data marked as overwrite and inserting new rows where trigger data has changed.

The IKM is almost identical to the IKMs for slowly changing dimensions with some minor changes - ODI in the slowly changing KM will manage the current row indicator, with Oracle this is a virtual column so is computed. The population of that column has been removed from the KM plus a few other places it was used. I called it IKM Oracle Temporal Data, you can download it here.

 If I load my data then change the status from Married to Single and rerun, then I will get the following rows loaded, note I am querying the surrogate key also since I supplied that metadata (this is optional);

  1. select key, c_mstat,
  2. to_char(current_ind_start,'dd-mon-yyyy') "Start",
  3. to_char(current_ind_end,'dd-mon-yyyy') "End"
  4. from scott.dim_customer;
  5.        KEY C_MSTAT, Start       End               
  6.         11 Married 18-dec-2013 18-dec-2013
  7.         12 Single 18-dec-2013 01-jan-2400

A new surrogate key is generated for the versioned row. If I define only the current time to be used, then the database will automatically filter my data;

  1. exec dbms_flashback_archive.enable_at_valid_time('CURRENT');

  2. select key, c_mstat,
  3.   to_char(current_ind_start,'dd-mon-yyyy') "Start",
  4.   to_char(current_ind_end,'dd-mon-yyyy') "End"
  5. from scott.dim_customer;
  6.        KEY C_MSTAT Start End     
  7.         12 Single 18-dec-2013 01-jan-2400

I can also query as of a particular time without impacting my query

  1. exec dbms_flashback_archive.enable_at_valid_time('ASOF','18-DEC-13 08.28.30.771994 AM');

  2. select key, c_mstat,
  3.   to_char(current_ind_start,'dd-mon-yyyy') "Start",
  4.   to_char(current_ind_end,'dd-mon-yyyy') "End"
  5. from scott.dim_customer;
  6.        KEY C_MSTAT Start End
  7.         11 Married 18-dec-2013 18-dec-2013

Imagine without this, on the query side you will have to have lots of conditional queries based on time for each object that varies by time. With this approach you define the time window and query. Simple. The KMs help you easily load data into such tables too, you can see how flexible ODI is with respect to providing new integration patterns to take advantage of the greatest and latest features. The KMs used in this blog post have been posted on the java.net site to stir ideas and further discussion, I'm interested to hear what you think.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products-- including Oracle Data Integrator, Oracle GoldenGate and Oracle Enterprise Data Quality

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
3
5
6
7
8
9
10
12
13
14
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today