Oracle Data Cloud Blog

  • May 24, 2016

Data onboarding: An industry-wide accuracy challenge

This week’s guest blog is contributed by Curt Blattner, Head of OnRamp Solutions, Oracle Data Cloud.

With all the talk in the industry around data available for audience targeting, much of the discussion has focused on data collected primarily online and the quality and impact of that data.

On the flip side, advertisers and data owners starting with offline CRM data or granular PII-based data assets have largely assumed they only have a ‘reach’ issue. They know the quality of their offline data; they just need to find a large supply of cookies that identify these households online, right?

Unfortunately, it’s not quite that simple.

Reach is certainly important, but accuracy is more of an issue: even your high-quality first-party data can be compromised by an unreliable method of matching it to the cookie universe. We conduct head-to-head data onboarding reach tests all the time for advertisers who, understandably, want to reach as much of their audience online as possible. Unfortunately, these reach tests are missing half the equation. If the cookies that represent your target audience are inaccurate (the wrong households and the wrong consumers), then more reach is really just added cost without the payback.

For example, if 60% of your cookie audience accurately represents your customers, but the remaining 40% are incorrectly targeting households that are not your customers, the ROI from that audience is going to suffer significantly – the non-customers will likely respond and convert at a dramatically lower rate.
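To put rough numbers on this, here is a minimal sketch in Python. Only the 60/40 split comes from the example above; the two conversion rates are hypothetical figures chosen purely for illustration.

```python
def blended_conversion_rate(accuracy, customer_rate, non_customer_rate):
    """Expected conversion rate of an onboarded cookie audience.

    accuracy: share of cookies that really belong to your customers (0..1).
    customer_rate / non_customer_rate: assumed conversion rate of each group.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return accuracy * customer_rate + (1.0 - accuracy) * non_customer_rate


# Hypothetical rates, for illustration only.
CUSTOMER_RATE = 0.02       # customers convert at 2%
NON_CUSTOMER_RATE = 0.002  # non-customers convert at 0.2%

perfect = blended_conversion_rate(1.0, CUSTOMER_RATE, NON_CUSTOMER_RATE)
actual = blended_conversion_rate(0.6, CUSTOMER_RATE, NON_CUSTOMER_RATE)
shortfall = 1 - actual / perfect

print(f"100% accurate audience: {perfect:.2%}")
print(f" 60% accurate audience: {actual:.2%}")
print(f"Shortfall in conversions for the same spend: {shortfall:.0%}")
```

Under these assumed rates, the 60%-accurate audience converts at about 1.28% instead of 2%, a shortfall of roughly 36% for the same media spend.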

So what’s wrong with these cookies?

The problem comes upstream in the data onboarding process as data onboarding providers build out their ‘graph’ – the connection of anonymous cookies to PII at an individual and household level.

Cookies are often matched to individuals and households via email, which requires a match from the email provided on a site login to the email associated with name/postal in an offline PII database. Often, inaccuracy is less about the email that consumers log in with online, and more about the quality and accuracy of the email that onboarding companies have stored in their offline PII database. This email is the key link connecting online and offline, so getting it right is paramount for any on-boarder and any advertiser.

The email-to-PII links are often sourced from licensed email lists. With licensed email lists, sources may take a name, combine every variation of first and last name or first initial/last name with each major email domain (Gmail, Yahoo, Hotmail, AOL, etc.), and include each resulting address in their database, linked to that name and address.
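The sketch below illustrates why such generated lists are risky. It is not any provider's actual method: the local-part patterns, the household IDs, and the function name are all assumptions made for the illustration.

```python
from itertools import product

DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "aol.com"]


def guessed_emails(first, last, domains=DOMAINS):
    """Return guessed addresses for a name (hypothetical patterns)."""
    first, last = first.lower(), last.lower()
    local_parts = [
        f"{first}{last}",      # janesmith
        f"{first}.{last}",     # jane.smith
        f"{last}{first}",      # smithjane
        f"{first[0]}{last}",   # jsmith
    ]
    return [f"{local}@{domain}" for local, domain in product(local_parts, domains)]


# Two unrelated (made-up) households that share a first initial and last name.
households = {"HH-1": ("Jane", "Smith"), "HH-2": ("John", "Smith")}

claims = {}
for household_id, (first, last) in households.items():
    for email in guessed_emails(first, last):
        claims.setdefault(email, []).append(household_id)

# Addresses that the generated list ties to more than one household.
ambiguous = {email: ids for email, ids in claims.items() if len(ids) > 1}

print(len(guessed_emails("Jane", "Smith")), "guessed addresses per name")
for email, ids in sorted(ambiguous.items()):
    print(email, "->", ids)
```

Each name yields 16 guessed addresses, none of them verified, and every first-initial address is claimed by both households. Whoever really logs in with one of those addresses will be tied to the wrong household at least some of the time.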

The result is that the emails consumers use to log in, even in ‘deterministic’ matches, get associated with the wrong households. How do on-boarders solve this issue? To start, you need a massive database of high-quality transactional emails, like the ones consumers use for tracking product shipments, as the core source for your online-to-offline linkage.

Even if you have a sizable pool of email-to-PII links, that doesn’t mean you match your cookies correctly. A problem can also reside with the audience match partners themselves when they use different methods to match cookies back to PII. Not all match partners are equal and not all match types have the same level of accuracy.

The key to understanding accuracy is to leverage “confidence sets” – associations of anonymous IDs to known PII that are 99%+ accurate. These can be built from transactional data sets (such as online purchases) from retail partners where you have name, address, and a high-quality email paired with an observed cookie-to-email link.


Each audience match partner – as well as each type of match they provide – can be measured against these “confidence sets” to understand their accuracy.
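As a rough sketch of what that measurement could look like, the Python below scores each partner and match type against a confidence set. The data layout, the partner names, and the match-type labels are assumptions for illustration, not a description of any production system.

```python
from collections import defaultdict


def accuracy_by_source(links, confidence_set):
    """Score each (partner, match_type) against a confidence set.

    links: iterable of (partner, match_type, cookie_id, household_id).
    confidence_set: dict of cookie_id -> trusted household_id.
    Returns {(partner, match_type): (accuracy, links_checked)}.
    Links whose cookie is not in the confidence set cannot be checked
    and are skipped.
    """
    correct = defaultdict(int)
    checked = defaultdict(int)
    for partner, match_type, cookie_id, household_id in links:
        truth = confidence_set.get(cookie_id)
        if truth is None:
            continue
        key = (partner, match_type)
        checked[key] += 1
        if household_id == truth:
            correct[key] += 1
    return {key: (correct[key] / n, n) for key, n in checked.items()}


# Made-up sample data.
confidence_set = {"c1": "H1", "c2": "H2", "c3": "H3", "c4": "H4"}
links = [
    ("PartnerA", "login_email", "c1", "H1"),
    ("PartnerA", "login_email", "c2", "H2"),
    ("PartnerA", "login_email", "c3", "H3"),
    ("PartnerA", "inferred", "c1", "H9"),
    ("PartnerA", "inferred", "c2", "H2"),
    ("PartnerB", "login_email", "c3", "H3"),
    ("PartnerB", "login_email", "c4", "H7"),
    ("PartnerB", "login_email", "c5", "H5"),  # c5 is not in the confidence set
]

scores = accuracy_by_source(links, confidence_set)
for (partner, match_type), (accuracy, n) in sorted(scores.items()):
    print(f"{partner:9} {match_type:12} accuracy={accuracy:.0%} (n={n})")
```

Reporting the number of links checked alongside the accuracy matters in practice: a score based on a handful of overlapping cookies says very little about a partner.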

Then, you have two useful options. First, you can immediately eliminate poor quality match providers, or certain subsets of match types that are poor quality. Second, you can use the “confidence set” and an algorithm to evaluate all of the remaining links in your pool to identify the strongest links and resolve additional conflicts that arise naturally in data. In the end, you are left with the best mapping of cookie to household available for onboarding audiences.
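Both options can be sketched in a few lines of Python. The algorithm itself is not specified here, so the accuracy-weighted vote below is a simple stand-in chosen for illustration, and the 0.8 cutoff, source names, and accuracy figures are hypothetical.

```python
from collections import defaultdict


def resolve_links(links, source_accuracy, min_accuracy=0.8):
    """Pick one household per cookie from conflicting links.

    links: iterable of (source, cookie_id, household_id).
    source_accuracy: dict of source -> accuracy measured against a
        confidence set (0..1). Unknown sources are treated as 0.
    min_accuracy: sources below this cutoff are dropped (option one).
    Remaining links vote for a household, weighted by the accuracy of
    their source (a stand-in for option two). Ties go to the household
    seen first.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for source, cookie_id, household_id in links:
        accuracy = source_accuracy.get(source, 0.0)
        if accuracy < min_accuracy:
            continue  # eliminate poor-quality sources outright
        votes[cookie_id][household_id] += accuracy
    return {
        cookie_id: max(candidates, key=candidates.get)
        for cookie_id, candidates in votes.items()
    }


# Made-up sample data.
source_accuracy = {"A": 0.95, "B": 0.85, "C": 0.82, "D": 0.50}
links = [
    ("A", "c1", "H1"),
    ("B", "c1", "H2"),
    ("C", "c1", "H2"),
    ("D", "c2", "H3"),  # source D falls below the cutoff
    ("A", "c2", "H4"),
]

mapping = resolve_links(links, source_accuracy)
print(mapping)
```

In this sample, cookie c1 goes to household H2 because two reasonably accurate sources agree and together outweigh one very accurate source, while c2 goes to H4 because the only competing link came from a source that was eliminated.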

For marketers, it makes sense to verify the accuracy of the cookies that represent your onboarded CRM audience.

The only real way to do this is to run a live campaign against that universe where you collect PII and can compare that back to your initial offline CRM audience.

If you find out that the level of inaccuracy is 30%, 40% or worse, you may want to consider evaluating different providers to find an onboarding partner with the best combined reach and accuracy.
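The comparison step might look like the following minimal Python sketch. Matching on name plus postal code is a simplifying assumption, and all records shown are made up; real matching would need fuller address handling.

```python
def normalize(name, postal_code):
    """Build a comparable key from a name and postal code."""
    clean_name = " ".join(name.lower().split())
    clean_postal = postal_code.replace(" ", "").upper()
    return (clean_name, clean_postal)


def observed_accuracy(crm_records, campaign_responses):
    """Share of campaign respondents who appear in the CRM audience.

    Both arguments are iterables of (name, postal_code) pairs.
    """
    crm_keys = {normalize(name, postal) for name, postal in crm_records}
    responses = list(campaign_responses)
    if not responses:
        raise ValueError("no campaign responses to evaluate")
    matched = sum(
        1 for name, postal in responses if normalize(name, postal) in crm_keys
    )
    return matched / len(responses)


# Made-up sample data.
crm_audience = [
    ("Jane Smith", "80021"),
    ("Carlos Diaz", "10001"),
    ("Mei Chen", "94105"),
]
responses = [
    ("jane  smith", "80021"),
    ("Carlos Diaz", "10001"),
    ("Mei Chen", "94105"),
    ("John Smith", "80305"),
    ("Ana Lopez", "60601"),
]

accuracy = observed_accuracy(crm_audience, responses)
print(f"Observed accuracy: {accuracy:.0%}, inaccuracy: {1 - accuracy:.0%}")
```

Here three of five respondents match the CRM file, so the observed inaccuracy is 40%, which falls in the range that would justify evaluating other providers.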

The bottom line:

Accuracy is as important as reach when it comes to data onboarding. You really need to evaluate both, and identify a partner that values both equally. Otherwise, your poor results may have nothing to do with your campaign and everything to do with an incorrectly matched audience.

Photo: Rawpixel.com/Shutterstock
Photo: Africa Studio/Shutterstock
