Snapshot Facts in OBIA (2)

Authors: Divya Nandakumar, Zhi Lin

Range Snapshot History Fact

In a data warehouse implementation, the volume of OLTP transactions can already be very large, and consequently the volume of a snapshot fact can be enormous, depending on the snapshot frequency. The dilemma is that more frequent captures give better accuracy of change history, but they also make the data size overwhelming and badly degrade performance.

A solution is to create a variation of the snapshot history fact, which we call the range snapshot history fact (or snapshot period history fact). The idea is simple: if consecutive snapshots of a transaction share an identical image, concatenate them into a single new snapshot. The new snapshot is labeled with a start timestamp and an end timestamp, which together indicate the period the image lasted. Merging duplicate snapshots this way reduces the resulting data size significantly.

Typical data model

The typical attributes of a range snapshot history fact are:

EFFECTIVE_FROM_DT

EFFECTIVE_TO_DT

Many attributes of the original transactions are kept and inherited, such as:

Primary Key

Foreign keys to the dimensions
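To make the shape concrete, here is a minimal sketch of such a fact row as a Python dataclass; the key and dimension column names are hypothetical placeholders, since the real fact inherits whatever keys its source transaction carries.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class RangeSnapshotRow:
        """One row of a range snapshot history fact (illustrative only)."""
        transaction_id: int       # primary key inherited from the transaction
        customer_wid: int         # example foreign key to a dimension (hypothetical name)
        status: str               # the image (here, a status) being tracked over time
        effective_from_dt: date   # EFFECTIVE_FROM_DT: first day the image is in effect
        effective_to_dt: date     # EFFECTIVE_TO_DT: day the image stops being in effect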

Implementations

Concatenate daily snapshots into a range

This is the conventional way to build range snapshots. Two consecutive daily snapshots sharing an identical status can be merged into one snapshot spanning both days. Two with different statuses are stored as two separate snapshots, one for each day.

The concatenation is produced by gathering related daily snapshot records together. The degree of data condensation that can be achieved is remarkable, because a merged range can span an arbitrary period, unlike a fixed period of a week or a month. If the triggering event occurs very frequently, for example 20 times a day, this approach is not advisable. At the same time, every detail is preserved, because no daily snapshot is dropped from the concatenation.

The ETL flow requires daily snapshots as its starting point, then groups by the status of interest to merge identical rows. Its dependency on an accumulation of daily snapshots means an extra task and large storage. Incremental load can be a challenge, especially for a back-dated snapshot. Also, this method assumes there are no gaps between daily snapshots; a gap leads to an exception that is difficult to handle in ETL.

A status change in these related daily snapshots is what triggers a new range snapshot record to be entered into the data warehouse.
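A minimal sketch of this merge, assuming the daily snapshots for one transaction arrive as (snapshot_date, status) pairs sorted by date (the names and the one-row-per-day shape are assumptions for illustration, not the OBIA ETL itself):

    from datetime import date, timedelta

    def merge_daily_snapshots(daily):
        """Collapse sorted (snapshot_date, status) pairs into
        (status, effective_from_dt, effective_to_dt) ranges.
        Assumes at most one snapshot per day."""
        ranges = []
        for snap_date, status in daily:
            if ranges and ranges[-1][0] == status and ranges[-1][2] == snap_date:
                # Same status as the previous day and no gap: extend the open range.
                ranges[-1][2] = snap_date + timedelta(days=1)
            else:
                # First row, a status change, or a gap: start a new one-day range.
                ranges.append([status, snap_date, snap_date + timedelta(days=1)])
        return [tuple(r) for r in ranges]

    # Three Active days followed by two Dormant days collapse into two ranges.
    daily = [(date(2012, 3, 25), "Active"), (date(2012, 3, 26), "Active"),
             (date(2012, 3, 27), "Active"), (date(2012, 3, 28), "Dormant"),
             (date(2012, 3, 29), "Dormant")]
    print(merge_daily_snapshots(daily))

A gap simply starts a new range in this sketch; in the real ETL a gap is the exception case mentioned above.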

Range snapshots directly from transactions

Here we introduce a new way to build range snapshots that overcomes the shortcomings of the conventional method above. We remove the dependency on daily snapshots and build range snapshots directly by scanning through all transaction footprints.

We introduced a few key points to achieve this; a sketch of the full flow follows the numbered list below.

1) Create multiple range snapshots trailing each footprint (transaction). For example, one order placed on Mar 25, 2012 by Adam derives the trailing range snapshots below. The period duration of each snapshot is one year here, which is configurable.

Customer | Status  | Status Start Date | Status End Date
---------|---------|-------------------|----------------
Adam     | Active  | Mar 25, 2012      | Mar 25, 2013
Adam     | Dormant | Mar 25, 2013      | Mar 25, 2014
Adam     | Lost    | Mar 25, 2014      | Mar 25, 2015

2) Collapse all the trailing series generated in (1) so that only one status remains at any point in time, using priority rules. In the same example, the override priority is Active > Dormant > Lost.

3) On top of the collapsed results, concatenate consecutive snapshots having identical statuses into ranges.
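A minimal end-to-end sketch of these three steps, assuming each footprint is just an order date, a fixed one-year trailing duration, day-level granularity, and the Active > Dormant > Lost priority from the example (names and data shapes are illustrative, not the actual OBIA ETL code):

    from datetime import date, timedelta

    YEAR = timedelta(days=365)
    # Lower number = higher priority: Active overrides Dormant overrides Lost.
    PRIORITY = {"Active": 0, "Dormant": 1, "Lost": 2}

    def trailing_snapshots(order_date):
        """Step 1: derive the trailing (status, start, end) series for one footprint."""
        return [("Active",  order_date,            order_date + YEAR),
                ("Dormant", order_date + YEAR,     order_date + 2 * YEAR),
                ("Lost",    order_date + 2 * YEAR, order_date + 3 * YEAR)]

    def build_customer_ranges(order_dates):
        """Steps 2 and 3: pick one status per day by priority, then
        concatenate consecutive days sharing an identical status."""
        snapshots = [s for d in order_dates for s in trailing_snapshots(d)]
        start = min(s[1] for s in snapshots)
        end = max(s[2] for s in snapshots)
        ranges, day = [], start
        while day < end:
            # Step 2: of all statuses covering this day, keep the highest-priority one.
            covering = [s[0] for s in snapshots if s[1] <= day < s[2]]
            status = min(covering, key=PRIORITY.get) if covering else None
            # Step 3: extend the open range if the status is unchanged, else start a new one.
            if ranges and ranges[-1][0] == status:
                ranges[-1][2] = day + timedelta(days=1)
            else:
                ranges.append([status, day, day + timedelta(days=1)])
            day += timedelta(days=1)
        return [tuple(r) for r in ranges if r[0] is not None]

    # A single order on Mar 25, 2012 reproduces the three rows in the table above.
    print(build_customer_ranges([date(2012, 3, 25)]))

The day-by-day scan is only for clarity; a production ETL would work on the range boundaries directly rather than iterating over individual days.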


The challenge of incremental load, especially for back-dated records, is solved more easily here, because all of the source information, the transaction footprints, is usually persisted anyway. In the same example, the ETL can be as simple as deleting the records for a particular customer from the target table and recreating them from scratch every time that customer places an order.

Here we still achieve a great amount of data compression and robust ETL processing. The incremental load, however, is not yet precise to the most granular level: an incremental load that touches one transaction for every customer would end up truncating and rebuilding the entire target table.
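Sketched against the build_customer_ranges function above, the per-customer delete-and-rebuild could look like this, with target_fact standing in for the warehouse fact table (in a real ETL these two steps would be a DELETE plus an INSERT keyed by the customer):

    def refresh_customer(target_fact, customer, order_dates):
        """Incremental load for one customer: drop that customer's existing
        range snapshots, then rebuild them from the customer's full footprint history."""
        # Delete the customer's current rows from the target fact.
        target_fact[:] = [row for row in target_fact if row[0] != customer]
        # Recreate the customer's ranges from scratch (build_customer_ranges is sketched above).
        target_fact.extend((customer, *r) for r in build_customer_ranges(order_dates))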

Comments:

Is this the way the WORKFORCE EVENT fact is modeled in OBIA 11.1.1.7.1?

Posted by Jai on July 30, 2014 at 10:25 AM PDT
