Authors: Divya Nandakumar, Zhi Lin
A snapshot captures an image of source data at certain point of time, and preserves the data image plus a (snapshot) time label. A regular, transactional fact intends to store data in data warehouse format and reflect OLTP transactions with near-real-time latency. In this context, a regular fact basically captures the near-current snapshot of the OLTP transactions, and is capable to support status de quo analysis.
A snapshot history fact, (or snapshot fact, in short,) accumulates a series of snapshots and preserve all, each with a different time label. In this way, the change history of each transaction is preserved by the snapshot series. This fact is very useful to do trend analysis over time.
The typical attribute of a snapshot history fact is–
Many attributes of the original transactions are kept and inherited, like–
Foreign keys to the dimensions
In the case of a Business need to Analyze DAILY quantity-on-hand inventory levels by product and store for a Business process like “Retail store inventory”, the Granularity could be - Daily inventory by product at each store, the Dimensions - Date, product, store & Fact - Quantity on hand.
Storing source system daily snapshot would have a serious impact on storage of Data Warehouse. The first solution is to keep daily snapshots for a limited rolling period, like last 90 days for example. The daily snapshot table would accumulate daily snapshots from 1 day, 2 days,…, until 90 days. After that, it would always drop off the oldest daily snapshot, before it adds one more daily snapshot. Hence the ETL should always delete the snapshots older than 90 days first, and then append a new snapshot.
This method enables to keep a fairly granular of snapshots on daily level. However, older snapshots are not kept, so it’s not good for long term historical trending.
At the end of the month all accounts have their month ending balance captured. The event is the end of the month, and the month is stored as part of the data warehouse. The selection program reads through the operational data and upon encountering a record that meets the qualifications, moves the record to the data warehouse. At the end of the month, each account is queried and the balance of the account at the end of the month is transferred to the data warehouse environment. One account may have had no activities during the month and another account may have had 200 activities during the month. Both accounts will show up as exactly one record in the data warehouse environment. No continuity of activity is assumed using this technique.
The passage of time - day end, week end, month end, etc. - is all common ways of triggering a snapshot. But the periodic passage of time is hardly the only way that snapshots are triggered.
The Monthly snapshot table stores snapshots of all previous day’s historical data. ETL design would have a preload mapping which deletes the data loaded for current month, based on Current Month End date and then load with the latest data for current month.
In this way, we “aggregate” up from daily snapshots and archive great compact on data size. Longer term history trending can be stored and reported. However, we lost gross levels of details in between every two month ends.