In the digital advertising ecosystem, there is a bit of an obsession with deterministic identity data. Advertisers keep demanding that platforms answer the question, "how much of your identity data is deterministic?" and platforms keep feeling the need to "out-deterministic" their competitors when pitching their solutions. The two assumptions made in this line of questioning are (1) deterministic data is 100% accurate and (2) more deterministic data means better identity.
But wait … is that true? Is more deterministic data inherently better? Is deterministic data undeniably correct?
Spoiler alert: The answer to both questions is no.
Data observed together, at the same time, without the need to impute or infer is classified as deterministic. This can happen if someone completed a form with name, address, birthdate, phone number, and email. It also can happen if someone logs into a website or app with an email address, creating connections between a cookie and email, or mobile advertising ID and email. In the first case, someone wrote the information all at once. In the second case, the information is collected in the same piece of code at the same time.
Since the data was observed together, it is assumed that deterministic data is always 100% correct. But notice that I didn't say someone filled out the form with their own name, address, etc., or that they logged in using their own email. Here are several examples of how deterministic data can lie to you:
I use a friend's email address to log in to a website. Now, my cookie is linked to my friend’s email.
I borrow a friend's subscription login for Hulu, Netflix, HBO GO, or MLB.TV. Now, my device is attached to my friend’s email.
I'm forced to enter an email address on a website, but I enter a fake email like firstname.lastname@example.org. Now, my cookie is attached to a fake email, which may be attached to millions of other cookies where other people did the same thing.
I ship something to my parents’ house. My name and email are now attached to my parents’ home address, even though I don't live with them.
Without my knowledge, a website generates a placeholder value for my email address. It turns out the site does the same thing for millions of other cookies. Now, we’re all connected to each other and connected to the wrong email.
All of these scenarios are considered deterministic, and all create identity connections that are 100% wrong and useless. (Gasp! But it’s deterministic—it must be right!)
To make matters worse, some identity providers simply join deterministic data together to produce reconciled identity. If we did that with the above data, my cookie and my mobile device would be attached to me, as well as to my friend, my parents, and any number of other people who creatively made up the same fake email address. Through this method, companies claiming to be 100% deterministic end up producing new links that were never observed together, yet those links are still shopped around as part of an entirely deterministic solution. And the resulting quality isn’t good: Oracle Data Cloud evaluated a deterministic graph and found that 80% of the deterministic pairs were incorrectly connected and, in one case, an email was deterministically linked to 2.3 million cookies.
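To make the "simply join it together" problem concrete, here is a minimal Python sketch. It is an illustration only, not how any particular identity provider builds its graph: the identifiers and pairs are hypothetical, modeled on the scenarios above, and the join is a plain transitive merge (union-find) over observed pairs.

```python
from collections import defaultdict

# Hypothetical observed pairs, modeled on the scenarios above.
# Every pair was "observed together", so each one counts as deterministic.
observed_pairs = [
    ("cookie:me", "email:friend"),      # I logged in with a friend's email
    ("device:friend", "email:friend"),  # my friend's own device and login
    ("cookie:me", "email:fake"),        # I typed a made-up email
    ("cookie:stranger", "email:fake"),  # a stranger typed the same made-up email
]


def naive_join(pairs):
    """Merge identifiers that are connected through any chain of observed pairs."""
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if it is new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


clusters = naive_join(observed_pairs)
for cluster in clusters:
    print(sorted(cluster))
```

With these four pairs, everything collapses into a single "person": my friend's device and a stranger's cookie end up in the same cluster even though that pair was never observed together.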
An advertiser is wise to be skeptical of a graph built on 100% deterministic data without a method to evaluate the correctness of such data and remove inaccuracies. While it's convenient to use deterministic vs. probabilistic as the ultimate indicator of quality, that label has no actual bearing on whether the data is correct. In the end, more deterministic data does not mean better identity. Marketers should demand the data be evaluated for correctness, no matter how it is sourced.
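As one example of what "evaluated for correctness" could look like, the sketch below flags any identifier linked to an implausible number of partners, such as a made-up email shared by many cookies, and drops its pairs before any joining happens. This is a simple assumption-laden heuristic for illustration, not a description of Oracle Data Cloud's method; the function names, sample data, and `max_links` threshold are all hypothetical.

```python
from collections import defaultdict


def flag_high_fanout(pairs, max_links):
    """Return identifiers linked to more than max_links distinct partners."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {ident for ident, linked in partners.items() if len(linked) > max_links}


def drop_flagged(pairs, flagged):
    """Keep only the pairs where neither side was flagged as suspicious."""
    return [(a, b) for a, b in pairs if a not in flagged and b not in flagged]


# Hypothetical data: one fake email shared by five cookies, plus one ordinary pair.
pairs = [("email:fake", f"cookie:{i}") for i in range(5)]
pairs.append(("email:real", "cookie:home"))

flagged = flag_high_fanout(pairs, max_links=3)  # threshold chosen for illustration
clean = drop_flagged(pairs, flagged)

print(flagged)
print(clean)
```

A fan-out check like this only catches the "one email, millions of cookies" case. The borrowed login or the package shipped to a parent's address looks perfectly normal by this measure, which is why a single rule is not a substitute for evaluating the data more broadly.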
In a world of people-based marketing, poor identity can sabotage even the most perfectly executed campaign. Save your marketing dollars, get identity right, and recognize that using deterministic data alone is not good enough.
At Oracle Data Cloud, we don’t take deterministic data at face value. We go to great lengths to identify and scrub deterministic data anomalies like the ones outlined above. Stay tuned for part 2 of this post where we discuss the concept of probabilistic data, along with how Oracle Data Cloud evaluates all identity data with data science to ensure an accurate and defensible view of identity is used throughout our products and services.
About Audrey Rusch
This week’s guest blog post is contributed by Audrey Rusch, Senior Director Data Science, Oracle Data Cloud. Audrey leads the Identity Data Science team at Oracle Data Cloud. Her team is responsible for construction of the Oracle Identity Graph by starting with data at scale, evaluating it for quality, grounding it in reality, and reconciling universally—all while respecting privacy. The result is a sense of a person and all their devices for use by marketers to reach the right person, with the right ad, on the right device, at the right time. She has worked in digital marketing data science since 2012 with experience constructing data science products for audience, measurement, optimization, and identity.