This blog is part of a series by Audrey Rusch, senior director of Data Science at Oracle Data Cloud. Read her first two articles here:
How deterministic data lies to you
When it comes to identity, probabilistic or deterministic is not the question
In the previous two blogs in this series, we debunked some misconceptions about deterministic and probabilistic data within the digital advertising industry as it pertains to identity. In both cases, the labels have led to poor assumptions:
Deterministic data is assumed to be 100% correct.
Probabilistic data is assumed to be inferior precisely because it is evaluated for fidelity.
Today, the adtech industry continues to force a discussion about probabilistic versus deterministic data and dodges the real question: “How correct is identity?”
Just as machine learning can help you find look-alikes, it can be used to find identity links that have a higher likelihood of being true. Machine learning is a wildly useful tool to help find patterns in data beyond the few dimensions through which the human brain can reason, and in places where the human brain won’t think to look.
For example, patterns associated with “higher scoring” identity links include interactions being observed by multiple providers, having appropriate connectivity to other IDs, and exhibiting rational behavior one could expect to observe from people going about their daily lives. In other words, the IDs don’t teleport, they aren’t connected to millions of other IDs, and they’re less likely to be used in the middle of the night.
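To make those patterns concrete, here is a minimal sketch of how such signals might be computed for a single ID. It is an illustration only, not Oracle Data Cloud's feature set: the `Sighting` record, the `link_features` function, the feature names, and the midnight-to-5 a.m. definition of "night" are all assumptions made for this example, and timestamps are assumed to already be in the user's local time.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Sighting:
    """One observation of an ID (hypothetical record layout)."""
    when: datetime   # assumed to be local time
    lat: float
    lon: float
    provider: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def link_features(sightings, degree):
    """Summarize the signals described above for one ID.

    degree is the number of other IDs this ID is connected to.
    """
    ordered = sorted(sightings, key=lambda s: s.when)

    # "Teleporting": the fastest implied travel speed between consecutive sightings.
    max_speed_kmh = 0.0
    for a, b in zip(ordered, ordered[1:]):
        hours = (b.when - a.when).total_seconds() / 3600
        if hours > 0:
            speed = haversine_km(a.lat, a.lon, b.lat, b.lon) / hours
            max_speed_kmh = max(max_speed_kmh, speed)

    # Share of activity in the middle of the night (00:00 to 04:59, an assumed window).
    night_count = sum(1 for s in ordered if s.when.hour < 5)

    return {
        "provider_count": len({s.provider for s in ordered}),
        "degree": degree,
        "max_speed_kmh": max_speed_kmh,
        "night_fraction": night_count / len(ordered) if ordered else 0.0,
    }
```

An ID seen in New York at noon and in Los Angeles an hour later would show an implied speed of several thousand kilometers per hour, which is exactly the kind of signal a model can learn to treat as suspicious.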
Given the increase in computing power and the continued improvement in the accessibility of machine learning tools, why make assumptions about the quality of your identity graph? Let machine learning help!
As mentioned in the second blog post, at Oracle Data Cloud, all data used in our Identity Graph is guilty until proven innocent: We evaluate all identity data; observed data doesn’t get a free pass. Machine learning is the engine that powers our evaluation; all observed data is measured against a benchmark of known, extremely high-quality links.
Generally speaking, the Oracle Data Cloud Identity Graph:
Removes atrociously over-connected links
Trains machine learning models to learn patterns associated with high-quality links
Retains only the highest-scoring connections, leaving only qualified links
Currently, a typical Oracle Data Cloud graph build evaluates about 12 billion links across all ID spaces (i.e., cookies, MAIDs, phone numbers, emails, PII), with less than 20% of these links ultimately making up the graph promoted by Oracle Data Cloud.
In other words, we use machine learning to qualify and cut, not to expand.
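The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the actual Oracle Data Cloud build: the function names, the `max_degree` cap, the `keep_threshold` cutoff, and the choice of a small hand-rolled logistic regression are all hypothetical stand-ins for whatever models and thresholds a production system would use.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, epochs=500, learning_rate=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            bias -= learning_rate * error
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
    return weights, bias


def score(model, features):
    """Estimated probability that a link is a true link."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, features)))


def build_graph(links, benchmark, max_degree=1000, keep_threshold=0.9):
    """Qualify and cut: return the IDs of the links that survive all three steps.

    links:     {link_id: (degree, feature_vector)}
    benchmark: [(feature_vector, label)], with label 1 for known high-quality links
    """
    # Step 1: remove atrociously over-connected links (assumed degree cap).
    candidates = {k: f for k, (degree, f) in links.items() if degree <= max_degree}

    # Step 2: learn the patterns associated with high-quality links.
    model = train_logistic([f for f, _ in benchmark], [y for _, y in benchmark])

    # Step 3: retain only the highest-scoring connections.
    return {k for k, f in candidates.items() if score(model, f) >= keep_threshold}
```

Note that nothing in this sketch creates a link that was not observed. The model only scores candidate links, and the threshold only removes them, which is the "qualify and cut" idea in miniature.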
As a marketer, your goal is to find, and correctly target, the right people for your campaign with a certain amount of money. Identity is a huge part of this. When you ask, “What portion of your graph is deterministic versus probabilistic?” you’re missing the question that really matters—that of quality. You can and should demand that your identity solutions be evaluated for quality at scale.
Coming soon: part 4 of this series, which highlights how accuracy claims by identity providers can be misleading. We will cover how to navigate these claims and share tips for balancing the trade-off between scale and quality.
About Audrey Rusch
This week’s guest blog post is contributed by Audrey Rusch, Senior Director of Data Science at Oracle Data Cloud. Audrey leads the Identity Data Science team at Oracle Data Cloud. Her team is responsible for constructing the Oracle Identity Graph by starting with data at scale, evaluating it for quality, grounding it in reality, and reconciling universally, all while respecting privacy. The result is a sense of a person and all their devices, which marketers can use to reach the right person, with the right ad, on the right device, at the right time. She has worked in digital marketing data science since 2012, with experience constructing data science products for audience, measurement, optimization, and identity.