More on Total Recall for the DW - Use "AS OF" to publish data

After writing the post on the ETL part I was talking to folks over here and we discussed some other interesting aspects of using time windows and "AS OF" query patterns with ETL like scenarios.

The scenario goes a little like this. You have a reporting environment on top of the data warehouse (daah, obviously) and want to make sure data that comes in and is loaded only gets published after it is checked within the context of the entire system.

That means you need to do your loads, update the entire system, but shield the end users from that data until it is verified and certified. Once verified, you want to publish the data and update all data sources for the reports.

TR_publish_subscribe.jpg

The diagram above shows these steps. To make this work, the schema (on the left receiving the ETL loads) is covered with a layer of views that handle the exact timestamp of data visible to the end users.

In step 1: Update the view layer to set the timestamp to a moment before the ETL starts
In step 2: Run the regular jobs
In step 3: Ensure the data is correct and all present
In step 4: Publish the data by updating the viewlayer to a point in time after the ETL load
In step 5: The end users now query the updated data in the warehouse

Now, none of the above is required to achieve read consistency for queries! Oracle does that all by itself without any thoughts from the ETL guys (unlike other databases in the DW space). So let's not confuse those two things. The above is really intended as a mechanism to publish data, all at once (across an entire schema) after it is verified, tested and we like it.

Comments:

How does the performance of the 'AS-OF' views get impacted? Or is the 'historical' data simply sitting in the 'flashback archive' tablespace. Does this require additional I/O reads to get all the data or is it no different than querying regular data?

Posted by Jim Josephs on June 15, 2010 at 12:27 AM PDT #

There should be little impact on the reads as we read an older copy from the flashback archive. Even if there is some impact, that should be only there when we load data, so for a short period. As soon as all data is in, we move the AS OF to the new copy (which is roughly the actual data). JP

Posted by jean-pierre.dijcks on June 15, 2010 at 03:15 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

The data warehouse insider is written by the Oracle product management team and sheds lights on all thing data warehousing and big data.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
4
5
6
7
8
9
10
11
12
13
14
16
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today