Busting performance claims and delivering true Identity Graph quality

September 27, 2019 | 4 minute read

This post is the final installment in a four-part series by Audrey Rusch, senior director of Data Science at Oracle Data Cloud.

 

In the previous three posts in this series, we emphasized the importance of quality in an Identity solution and the traps we fall into when we allow the labels “deterministic” and “probabilistic” to serve as proxies for quality. The most recent post highlighted how we at Oracle Data Cloud construct our graph by using machine learning to qualify each link and cut out junk.

So we’ve convinced you that quality matters. But now you’re wondering, “What is the quality of my graph?” Around the ecosystem, it’s common to hear answers such as “Our graph is 97.5% accurate,” or “Our graph is 99% precise!” It starts to feel like each company is trying to win business by wowing with their amazing numbers.

But buyer beware. These sound bites leave out important information: which metric was used, which dataset served as the basis for measurement, and which population was measured.

Given personalized advertising trends, marketers care about two things: reaching the right target audience when they show an ad to an ID, and reaching a meaningful number of people over the course of a campaign. Most identity providers quote 3 standard statistical measures of quality:

  1. Accuracy

  2. Precision

  3. Recall

Diving into these specific quality metrics, we start to understand if and how they properly meet marketers’ needs:

 

Accuracy

What it measures: Of the links known to be right or wrong, the percentage that the provider correctly classifies as right or wrong.

How you can cheat: This method gives providers credit for saying, “This link is bad” and being correct. Providers can drive up their numbers by including many questionable links, correctly classified as questionable, in the population being measured.

How it falls short: It doesn’t answer the question “Did you find the right person?” when you target an ID.

Precision

What it measures: The percentage of links used in the graph that are known to be right.

How you can cheat: If providers limit the population being measured to their very best links, they can claim high precision, since they’re sure those links are right.

How it falls short: Providers are confident they’re reaching the right people, but this population may be only 100 links, which isn’t meaningful scale.

Recall

What it measures: The percentage of all links known to be right that are also used in the graph.

How you can cheat: Providers can add all known “right” links to the population being measured and claim credit for them. Add enough links and, at some point, they’ll make a right one.

How it falls short: It doesn’t say how many bad links were added to get the result, or how much money was wasted on ads that reached the wrong people in order to eventually reach all the right ones.

 

Each of the metrics above must also be contextualized based on which dataset was deemed as “truth” for the calculation. That is, quality relative to what? No dataset is perfect, but some are useful on this front.
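To make the three definitions concrete, here is a minimal Python sketch that scores a graph against a labeled “truth” set. The function name, the representation of a link as an (ID, person) pair, and the toy data are all assumptions made for illustration; this is not how Oracle Data Cloud or any other provider computes its figures.

```python
def link_quality_metrics(graph_links, truth_right, truth_wrong):
    """Score a graph against a labeled truth set (hypothetical helper).

    graph_links: set of links the provider uses in its graph.
    truth_right / truth_wrong: links the truth set labels right / wrong.
    Links that do not appear in the truth set are ignored, because
    nothing is known about them.
    """
    tp = len(graph_links & truth_right)  # right links used in the graph
    fp = len(graph_links & truth_wrong)  # wrong links used in the graph
    fn = len(truth_right - graph_links)  # right links left out
    tn = len(truth_wrong - graph_links)  # wrong links correctly left out
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Toy truth set: each link is an (ID, person) pair. All values are made up.
truth_right = {("id1", "A"), ("id2", "A"), ("id3", "B"), ("id4", "C")}
truth_wrong = {("id1", "B"), ("id2", "C"), ("id5", "A"),
               ("id6", "B"), ("id7", "C"), ("id8", "A")}

# A "cautious" graph keeps only its single best link.
cautious = link_quality_metrics({("id1", "A")}, truth_right, truth_wrong)

# A "greedy" graph keeps every right link plus four wrong ones.
greedy_graph = truth_right | {("id1", "B"), ("id2", "C"),
                              ("id5", "A"), ("id6", "B")}
greedy = link_quality_metrics(greedy_graph, truth_right, truth_wrong)

print(cautious)  # {'accuracy': 0.7, 'precision': 1.0, 'recall': 0.25}
print(greedy)    # {'accuracy': 0.6, 'precision': 0.5, 'recall': 1.0}
```

In this toy example the cautious graph can claim 100% precision on the strength of one link, and most of its 70% accuracy comes from correctly calling bad links bad. The greedy graph can claim 100% recall while half of the links it uses are wrong. Change the truth set and every number changes with it.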

The reality is that the quality of a graph depends on its ability to meet your needs. Depending on your application, you may care more about scale, or you may care more about being confident that you’re reaching a certain set of people. In the end, your decision as a marketer depends on your willingness to have your ad reach more people but possibly target some less-than-perfect individuals.

At Oracle Data Cloud, the quality of our graph is based on the estimated probability that an ID is linked to the right person. By scoring each link in the Oracle Data Cloud graph, we allow a trade-off between scale and quality. Based on our dataset for measurement (an independent set of links from high-fidelity sources that we do not directly fulfill), our data science approach to graph construction produces a standard promoted graph that is 4.5X better at getting an ID linked to the right person than a rules-based approach. Additionally, this standard graph enables a marketer to target more than 200 million US consumers, providing the scale necessary to target even the largest audiences. Of course, our standard graph can be adjusted to meet your needs around scale or quality. As scale is trimmed, this increase in quality climbs even higher.
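The trade-off between scale and quality can be sketched in a few lines of Python. The snippet assumes each link carries a score (an estimated probability that it is right) and, for measurement purposes, a truth label. The function name, scores, and labels are invented for illustration and are not taken from the Oracle Data Cloud graph.

```python
def tradeoff_at_threshold(scored_links, threshold):
    """Return (scale, precision) for links scoring at or above threshold.

    scored_links: list of (score, is_right) pairs, where score is an
    estimated probability that the link is right and is_right is the
    label from a truth set. Both are hypothetical here.
    """
    kept = [is_right for score, is_right in scored_links if score >= threshold]
    scale = len(kept)
    precision = sum(kept) / scale if scale else None
    return scale, precision


# Made-up scored links: (score, is_right)
scored = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False), (0.65, True),
    (0.40, False), (0.35, True), (0.20, False), (0.10, False), (0.05, False),
]

for threshold in (0.0, 0.5, 0.8):
    scale, precision = tradeoff_at_threshold(scored, threshold)
    print(f"threshold={threshold:.1f}  scale={scale}  precision={precision:.2f}")
# threshold=0.0  scale=10  precision=0.50
# threshold=0.5  scale=5  precision=0.80
# threshold=0.8  scale=3  precision=1.00
```

Raising the threshold trims scale and raises the share of kept links that are right, which is the same direction of trade-off described above.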

Our approach to graph construction and carefully curating links stems from observing data from all over the ecosystem and discovering that there’s a high level of disagreement surrounding identity measurement. So, instead of merely fulfilling the linkages we receive, we qualify every linkage before fulfillment. Through this qualification process, we end up throwing out about 85% of linkages that we deem incorrect, setting a high threshold for quality and pushing the industry to provide even better results.

Over the course of this blog series, we’ve explored different aspects of Identity Graphs and the importance of thinking about quality. In the end, any identity solution is intended to enable marketers to reach the right people with the right ad, at the right time, to meet marketing objectives. Don’t let the perfect campaign be derailed by your choice of identity solution, and be wary of promises that rest on the labels “deterministic,” “probabilistic,” or “high accuracy.” And, if you’re in doubt, test us and our ability to meet your needs. We’d love to answer your questions.

***

Have a question about our identity graph? Ask The Data Hotline a question directly on our web form. 

 

About Audrey Rusch

This week’s guest blog post is contributed by Audrey Rusch, Senior Director Data Science, Oracle Data Cloud. Audrey leads the Identity Data Science team at Oracle Data Cloud. Her team is responsible for construction of the Oracle Identity Graph by starting with data at scale, evaluating it for quality, grounding it in reality, and reconciling universally—all while respecting privacy. The result is a sense of a person and all their devices for use by marketers to reach the right person, with the right ad, on the right device, at the right time. She has worked in digital marketing data science since 2012 with experience constructing data science products for audience, measurement, optimization, and identity.


