X

Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

More on Total Recall for the DW - Use "AS OF" to publish data

Jean-Pierre Dijcks
Master Product Manager

After writing the post on the ETL part I was talking to folks over here and we discussed some other interesting aspects of using time windows and "AS OF" query patterns with ETL like scenarios.

The scenario goes a little like this. You have a reporting environment on top of the data warehouse (daah, obviously) and want to make sure data that comes in and is loaded only gets published after it is checked within the context of the entire system.

That means you need to do your loads, update the entire system, but shield the end users from that data until it is verified and certified. Once verified, you want to publish the data and update all data sources for the reports.

TR_publish_subscribe.jpg

The diagram above shows these steps. To make this work, the schema (on the left receiving the ETL loads) is covered with a layer of views that handle the exact timestamp of data visible to the end users.

In step 1: Update the view layer to set the timestamp to a moment before the ETL starts

In step 2: Run the regular jobs

In step 3: Ensure the data is correct and all present

In step 4: Publish the data by updating the viewlayer to a point in time after the ETL load

In step 5: The end users now query the updated data in the warehouse

Now, none of the above is required to achieve read consistency for queries! Oracle does that all by itself without any thoughts from the ETL guys (unlike other databases in the DW space). So let's not confuse those two things. The above is really intended as a mechanism to publish data, all at once (across an entire schema) after it is verified, tested and we like it.

Join the discussion

Comments ( 2 )
  • Jim Josephs Tuesday, June 15, 2010
    How does the performance of the 'AS-OF' views get impacted?
    Or is the 'historical' data simply sitting in the 'flashback archive' tablespace. Does this require additional I/O reads to get all the data or is it no different than querying regular data?
  • jean-pierre.dijcks Tuesday, June 15, 2010
    There should be little impact on the reads as we read an older copy from the flashback archive. Even if there is some impact, that should be only there when we load data, so for a short period. As soon as all data is in, we move the AS OF to the new copy (which is roughly the actual data).
    JP
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.