More on Total Recall for the DW - Use "AS OF" to publish data
By Jean-Pierre Dijcks on May 18, 2009
After writing the post on the ETL part I was talking to folks over here and we discussed some other interesting aspects of using time windows and "AS OF" query patterns with ETL like scenarios.
The scenario goes a little like this. You have a reporting environment on top of the data warehouse (daah, obviously) and want to make sure data that comes in and is loaded only gets published after it is checked within the context of the entire system.
That means you need to do your loads, update the entire system, but shield the end users from that data until it is verified and certified. Once verified, you want to publish the data and update all data sources for the reports.
The diagram above shows these steps. To make this work, the schema (on the left receiving the ETL loads) is covered with a layer of views that handle the exact timestamp of data visible to the end users.
In step 1: Update the view layer to set the timestamp to a moment before the ETL starts
In step 2: Run the regular jobs
In step 3: Ensure the data is correct and all present
In step 4: Publish the data by updating the viewlayer to a point in time after the ETL load
In step 5: The end users now query the updated data in the warehouse
Now, none of the above is required to achieve read consistency for queries! Oracle does that all by itself without any thoughts from the ETL guys (unlike other databases in the DW space). So let's not confuse those two things. The above is really intended as a mechanism to publish data, all at once (across an entire schema) after it is verified, tested and we like it.