Authors: Anbing Xue, Zhi Lin
Besides the trend lines of snapshots, we have requirement to plot the trend lines of the delta changes on a transaction along time line. The changes here can be either quantitative (on amount, units, etc) or qualitative (on names, description, etc). Hence we invented a new kind of snapshot fact, to specifically address it.
A typical attributes of a delta snapshot history fact is–
SNAPSHOT_DT
IMAGE_TYPE
Many attributes of the original transactions are kept and inherited, like–
Primary Key
Foreign keys to the dimensions
The delta snapshot history fact would capture and store a new pair of images, whenever we detect a change on the original transaction. The image pair is essential, especially for qualitative changes. Usually it consists of one pre image (or “negative” image) as well as one post image (or “positive” image). For example,
IMAGE_NUM |
ORDER_ID |
SHIP_TO_LOCATION |
AMOUNT |
SNAPSHOT_DT |
IMAGE_TYPE |
1 |
001 |
New York City |
$100 |
Mar 1, 2014 |
POST |
2 |
001 |
New York City |
-$100 |
Mar 2, 2014 |
PRE |
3 |
001 |
New York City |
$120 |
Mar 2, 2014 |
POST |
4 |
001 |
New York City |
-$120 |
Mar 3, 2014 |
PRE |
5 |
001 |
Boston |
$120 |
Mar 3, 2014 |
POST |
Basically, this delta snapshot history fact stores regular snapshots per change, and adds the “negative” snapshots before change. So it is enriched with a unique feature to report delta’s trend line, simply by clubbing both kinds of snapshots together.
Besides, we also introduced more flexibility for users to configure a subset of columns they are interested to track. Based on the configuration, the ETL would create new snapshots for changes only on the interested columns. Changes on the other columns would trigger an update to existing snapshots instead, in order to sync up with the original transactions.
Though, extra ETL complexity has to be introduced to handle pre vs. post images separately, plus the flexibility to track subset but all changes. The number of records is much less than regular daily snapshot facts. The data size of this fact is proportional to the number of changes.