This blog is the second in a series by Audrey Rusch, Senior Director of Data Science at Oracle Data Cloud. Read the first post here.
In the first blog post in this series, we talked about the dirty little secrets of deterministic data and warned you about being wooed by faulty promises. Specifically, we talked about how the digital advertising ecosystem commonly thinks that deterministic data is the undeniable answer to all Identity problems.
While the industry uses the word “deterministic” to mean 100% correct, it really means that two identifiers were directly observed together (e.g., a cookie with an email). As a recipient of the deterministic data that the industry trusts, at Oracle Data Cloud we found that 80% of the deterministic links are wrong.
In this simplified world, another concept we like to talk about is probabilistic data. But what is this, exactly? Data manufactured out of thin air? Randomly associated data points from a random number generator?
Whereas deterministic data is directly observed and therefore assumed to be 100% correct, the term “probabilistic” conjures up nightmares of low quality or “fake data.” It has become a damning description for any solution, implying that scale was prioritized over quality.
Hold up, though. In the literal sense, what does it truly mean for data to be “probabilistic”?
Outside the echo chamber of AdTech, probabilistic data is data that has been assigned a probability describing how likely it is to have some desired trait. In fact, probabilistic data is used and widely accepted in other industries. When insurance companies evaluate how likely a person is to get in a car wreck, or credit companies use credit scores to evaluate how probable it is for someone to default on a loan, they're using probabilistic data. Even in our own industry, it's very common to evaluate how likely people are to buy a product or take some action and to include them in a target audience on that basis.
Probabilistic data is nothing more than observed data (deterministic data, if you will) where extra and valuable information is used to evaluate its integrity. In AdTech, though, when we label data as deterministic, it gets a free pass on a quality evaluation (ignorance is bliss). Label that same data as probabilistic and we've allowed ourselves to immediately assume that it's inferior.
Our industry will benefit by leaving behind the labels of “deterministic” and “probabilistic” and the baggage that comes with them. Instead, let’s start talking about what actually matters: quality. All of these links, regardless of the lazy label imposed on them by our industry, need to be evaluated for quality and kept or discarded based on scored confidence. Assuming that “deterministic” means 100% correct assigns an entirely subjective confidence and is a significant risk for any media buyer. When 80% of deterministic links are judged to be incorrect, marketers should care that only 1 out of 5 ads is going to the people they think they're reaching.
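The “1 out of 5” figure follows directly from the 80% error rate. As a rough back-of-the-envelope sketch, assuming ads are spread evenly across links and that an ad sent through an incorrect link never reaches the intended person (both simplifications), the arithmetic looks like this. The budget figure is purely hypothetical.

```python
# Back-of-the-envelope arithmetic for the "1 out of 5" claim.
# Assumptions (not from the original post): ads are spread evenly across
# links, and an ad sent over an incorrect link misses its intended person.

error_pct = 80                  # share of deterministic links judged incorrect
correct_pct = 100 - error_pct   # share of links that are correct

budget = 100_000                # hypothetical media budget, in dollars
on_target_spend = budget * correct_pct // 100
wasted_spend = budget - on_target_spend

print(f"Correct links: {correct_pct}% (1 in {100 // correct_pct})")
print(f"On-target spend: ${on_target_spend:,}")
print(f"Wasted spend: ${wasted_spend:,}")
```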
At Oracle Data Cloud, we evaluate all identity data to decide what makes it into our graph, meaning observed data doesn’t get a free pass. With the Oracle Identity graph, we provide a single, universal view of identity by evaluating and scoring the quality of all data, both deterministic and probabilistic. This process ultimately removes any connections that don’t meet our strict thresholds, leaving only those that we consider “qualified.”
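To make the idea of scoring and thresholding concrete, here is a minimal sketch. It is an illustration only and not a description of how the Oracle ID graph actually works: the `Link` structure, the confidence scores, and the 0.9 threshold are all hypothetical. The point it shows is that the supplier's label plays no role in the decision; only the score does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A candidate connection between two identifiers (hypothetical structure)."""
    id_a: str
    id_b: str
    label: str         # "deterministic" or "probabilistic", as supplied
    confidence: float  # assumed quality score in [0, 1]


def qualified_links(links, threshold=0.9):
    """Keep links whose score meets the threshold, ignoring the label."""
    return [link for link in links if link.confidence >= threshold]


# Made-up example data: a low-scoring "deterministic" link is discarded,
# while a high-scoring "probabilistic" link is kept.
candidates = [
    Link("cookie:123", "email:a@example.com", "deterministic", 0.35),
    Link("cookie:123", "device:xyz", "probabilistic", 0.95),
    Link("cookie:456", "email:b@example.com", "deterministic", 0.97),
]

kept = qualified_links(candidates)
for link in kept:
    print(f"{link.id_a} -> {link.id_b} ({link.label}, score {link.confidence})")
```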
In part 3 of this series, we'll describe in more detail how this process works and demonstrate how the Oracle ID graph helps ensure your marketing dollars aren’t wasted.
About Audrey Rusch
This week’s guest blog post is contributed by Audrey Rusch, Senior Director of Data Science at Oracle Data Cloud. Audrey leads the Identity Data Science team at Oracle Data Cloud. Her team is responsible for constructing the Oracle Identity Graph by starting with data at scale, evaluating it for quality, grounding it in reality, and reconciling universally, all while respecting privacy. The result is a sense of a person and all their devices that marketers can use to reach the right person, with the right ad, on the right device, at the right time. She has worked in digital marketing data science since 2012, with experience constructing data science products for audience, measurement, optimization, and identity.