Data Quality - Defining Accuracy Thresholds
By Rob Reynolds on Oct 29, 2008
As discussed in my previous post, data does not always have to be 100% accurate to be effective or actionable. So if there are thresholds at which information is “good enough” for decision making, how do we determine them?
The most common technique is to examine the decisions that the information will be supporting. Are the decisions time-critical? Forward-looking leading indicators that influence future results often do not require 100% accuracy because of their predictive nature. An example would be a sudden increase in the sales pipeline that triggers the need to increase production capacity.
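As a rough sketch of what such a trigger might look like in code, the Python example below flags a capacity review when pipeline growth crosses a threshold. The 20% threshold, function names, and dollar figures are illustrative assumptions, not values from this post; the point is that a few percent of measurement error rarely moves a growth figure across a trigger line this wide.

```python
# Hypothetical sketch: a leading-indicator trigger that tolerates noisy input.
# The 20% growth threshold and the pipeline figures are illustrative
# assumptions, not values from the post.

def pipeline_growth(previous_quarter: float, current_quarter: float) -> float:
    """Return the fractional change in sales pipeline between two periods."""
    return (current_quarter - previous_quarter) / previous_quarter

def should_flag_capacity_review(previous_quarter: float,
                                current_quarter: float,
                                growth_threshold: float = 0.20) -> bool:
    """Flag a capacity review when pipeline growth exceeds the threshold.

    Because this is a forward-looking indicator, the decision survives
    modest inaccuracy in the underlying numbers: a few percent of error
    rarely pushes growth across a 20% trigger line.
    """
    return pipeline_growth(previous_quarter, current_quarter) > growth_threshold

if __name__ == "__main__":
    # Even if each figure is off by a few percent, the signal is the same.
    print(should_flag_capacity_review(1_000_000, 1_350_000))  # True
```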
Are the decisions based on information that is statistical in nature, or broad enough that a wide set of records will be rolled up to a much larger order of magnitude? Measures that appear on a KPI or balanced scorecard can tolerate errors of a few decimal places, because rounding can cover up a host of sins. Other measures, such as those feeding financial reports, must be very precise, and the data supplying their calculations must be accurate.
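To make the rollup argument concrete, here is a small Python simulation using synthetic, illustrative numbers: per-record errors of up to ±1% all but vanish in a total that a scorecard would display rounded to the nearest tenth of a million.

```python
# Illustrative sketch (synthetic numbers): small per-record errors tend to
# wash out when thousands of rows roll up into one rounded KPI figure.
import random

random.seed(42)

true_values = [random.uniform(100, 10_000) for _ in range(5_000)]
# Assume each record is off by up to +/-1% -- an illustrative error band.
noisy_values = [v * random.uniform(0.99, 1.01) for v in true_values]

true_total = sum(true_values)
noisy_total = sum(noisy_values)

# Rounded to the nearest 0.1M, as a scorecard would display it,
# the two totals agree; independent errors largely cancel in the sum.
print(f"true KPI:  {round(true_total / 1_000_000, 1)}M")
print(f"noisy KPI: {round(noisy_total / 1_000_000, 1)}M")
print(f"relative error: {abs(noisy_total - true_total) / true_total:.4%}")
```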
By capturing the requirements for information effectiveness versus timeliness, we can then start to have conversations with business leaders about thresholds for data accuracy. Once thresholds are established, standard deviations can be used to manage the ongoing effectiveness of information. Periodic review of data quality deviations is required to avoid gradual erosion in accuracy and effectiveness.
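Here is a minimal sketch of what that standard-deviation-based monitoring could look like, assuming a daily accuracy metric such as the percentage of records passing validation. The metric and the two-sigma control band are my assumptions, not prescriptions from the post.

```python
# Minimal control-chart-style check: compare today's accuracy metric against
# the historical mean, flagging anything beyond n_sigmas standard deviations.
from statistics import mean, stdev

def accuracy_within_control(history: list[float],
                            today: float,
                            n_sigmas: float = 2.0) -> bool:
    """Return True if today's accuracy falls within n_sigmas of the
    historical mean -- i.e., no erosion beyond the agreed band."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(today - mu) <= n_sigmas * sigma

# Example (illustrative data): accuracy has hovered near 98%;
# a drop to 94% breaches the band and should prompt review.
baseline = [0.981, 0.978, 0.983, 0.979, 0.980, 0.982, 0.977]
print(accuracy_within_control(baseline, 0.980))  # True  -- in control
print(accuracy_within_control(baseline, 0.940))  # False -- investigate
```

Running a check like this on a schedule is one way to operationalize the periodic review: deviations are caught as they happen rather than discovered after trust in the numbers has already eroded.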
The key principle is to define and measure the degree to which the BI/DW system is becoming a vital source of reliable information. As long as the information drives decisions that add value, it is "good enough." And most importantly, the BI team must ensure that the information being produced is "actionable."