
Oracle Data Cloud Blog

Is your ID graph data reliable?

There’s a wide range in the quality of ID graphs available today.

While using a high-quality ID graph can dramatically improve the effectiveness of your marketing efforts, using a low-quality ID graph can result in subpar performance, wasted advertising budgets and misleading analytics that drive erroneous strategic decisions.



When evaluating an ID graph provider’s solution, one of the most critical components is the reliability of their validated data. 

Every identity graph provider has to validate its probabilistic matching algorithms against validated data: a list of device pairs (device IDs and cookie IDs) known to be used by the same individual, typically based on identical login data used across different devices.

Each pair of validated data is identified by a unique user ID. Different pairs may be associated with the same user ID if more than two devices are determined to be in use by a single user.
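As a rough illustration of how such validated pairs might be used, here is a minimal Python sketch. The record layout, the device ID strings and the function names are all hypothetical, and scoring by precision and recall is one common choice rather than any particular vendor's method. It groups validated pairs by user ID, expands each user's devices into known-true links, and scores a set of predicted links against them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical validated records: (user_id, device_a, device_b).
validated_pairs = [
    ("u1", "cookie:aaa", "maid:111"),
    ("u1", "maid:111", "cookie:bbb"),  # same user ID, so a third device
    ("u2", "cookie:ccc", "maid:222"),
]

def devices_by_user(pairs):
    """Collect every device seen for each user ID."""
    devices = defaultdict(set)
    for user_id, dev_a, dev_b in pairs:
        devices[user_id].update((dev_a, dev_b))
    return dict(devices)

def true_links(devices):
    """Expand each user's device set into unordered device pairs."""
    links = set()
    for devs in devices.values():
        links.update(frozenset(p) for p in combinations(sorted(devs), 2))
    return links

def precision_recall(predicted, truth):
    """Score predicted links against validated links.

    Predictions involving a device that never appears in the validated
    data are skipped, because they cannot be judged either way.
    """
    known = set().union(*truth) if truth else set()
    scored = {link for link in predicted if link <= known}
    correct = scored & truth
    precision = len(correct) / len(scored) if scored else 0.0
    recall = len(correct) / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical output of a probabilistic matcher.
predicted = {
    frozenset(("cookie:aaa", "maid:111")),    # correct
    frozenset(("cookie:aaa", "cookie:bbb")),  # correct (both belong to u1)
    frozenset(("cookie:bbb", "maid:222")),    # wrong (u1 vs. u2)
    frozenset(("cookie:zzz", "maid:111")),    # unscored: unknown device
}

devices = devices_by_user(validated_pairs)
truth = true_links(devices)
precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that the scores are only as trustworthy as the validated pairs themselves: if those pairs contain errors, a good matcher can look bad and a bad one can look good.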

Oracle Data Cloud’s Director of Data Science, Audrey Thompson, discusses this, along with a deeper explanation of probabilistic and deterministic matching, in “The identity graph: Why quality matters for targeting.” She says, “The reality is that all links within an ID graph are not created equal and no link can be assumed to be a hundred percent correct.”



Even the so-called “deterministic” link of a mobile ad ID to an email address can be incorrect … “Deterministic” cannot be assumed to be correct all the time.

This means all linkages are “probabilistic” and while some connections are highly likely to be correct, others are lower quality and will be wrong more often.
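To make the trade-off concrete, here is a small Python sketch. It assumes each link carries a model-assigned probability of being correct and that those probabilities are well calibrated. The links, scores and threshold are invented for illustration. Raising the minimum confidence shrinks the graph but raises the expected share of correct links.

```python
# Hypothetical links: (id_a, id_b, probability the link is correct).
links = [
    ("maid:111", "email:a", 0.98),
    ("cookie:aaa", "maid:111", 0.90),
    ("cookie:bbb", "maid:222", 0.60),
    ("cookie:ccc", "maid:333", 0.30),
]

def summarize(links, min_confidence):
    """Return (links kept, expected fraction of kept links that are correct)."""
    kept = [p for _, _, p in links if p >= min_confidence]
    expected_accuracy = sum(kept) / len(kept) if kept else 0.0
    return len(kept), expected_accuracy

for threshold in (0.0, 0.8):
    count, accuracy = summarize(links, threshold)
    print(f"threshold={threshold}: {count} links, expected accuracy {accuracy:.3f}")
```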

It’s important to first understand that every ID graph provider, even those with large sets of “deterministic” matches, needs to incorporate probabilistic matching approaches. There are two main reasons:

  • Dirty data – Identity matches based on deterministic data (such as logins or email addresses) are sometimes incorrect. This is a result of inaccurate or faked email addresses (e.g., noname@noname.com), logins performed by one person on another person’s device and data not properly cleansed.
  • Scale – There simply isn’t enough high-quality deterministic data available to build an ID graph at meaningful scale, so even large players like Facebook and Google rely on “probabilistic” techniques.

    A probabilistic match means an inferred association; the vendor is making a prediction about the likelihood that two devices belong to one user in real life. Because activity data is available for many billions of devices, probabilistic matching makes it possible to generate a much larger ID graph than using deterministic data alone.

    Combining extensive data collection, some advanced math and the technology to tie it together, vendors can reliably identify huge numbers of these user-to-device connections—even if they weren’t observed directly.
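The dirty-data problem in the first bullet above can be sketched as a pre-filter on login events before they are treated as deterministic matches. This is a minimal Python sketch under stated assumptions: the placeholder address list, the device-count threshold and the function name are hypothetical, and a shared-login cutoff is only a crude heuristic for logins performed on other people's devices.

```python
from collections import defaultdict

# Assumed placeholder addresses; a real list would be far longer.
PLACEHOLDER_EMAILS = {"noname@noname.com", "test@test.com"}
# Assumed cutoff: a login seen on more devices than this is treated as shared.
MAX_DEVICES_PER_LOGIN = 5

def clean_login_events(events):
    """Split (email, device_id) events into (kept, dropped) lists."""
    # Normalize first so "Alice@Example.com " and "alice@example.com" match.
    normalized = [(email.strip().lower(), device) for email, device in events]

    devices_per_email = defaultdict(set)
    for email, device in normalized:
        devices_per_email[email].add(device)

    kept, dropped = [], []
    for email, device in normalized:
        suspicious = (
            email in PLACEHOLDER_EMAILS
            or "@" not in email
            or len(devices_per_email[email]) > MAX_DEVICES_PER_LOGIN
        )
        (dropped if suspicious else kept).append((email, device))
    return kept, dropped

events = [
    ("Alice@example.com ", "maid:111"),
    ("alice@example.com", "cookie:aaa"),
    ("noname@noname.com", "cookie:bbb"),  # faked address
    ("not-an-email", "cookie:ccc"),       # malformed
] + [("shared@example.com", f"cookie:s{i}") for i in range(6)]  # shared login

kept, dropped = clean_login_events(events)
print(len(kept), "kept,", len(dropped), "dropped")
```

A filter like this only removes the most obvious errors, so the remaining matches would still need to be treated as probabilistic rather than certain.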

It’s clear that deterministic data cannot be blindly trusted to be accurate. If a vendor trains its matching model against data that's unvalidated, it is bound to get bad results. It’s garbage in, garbage out.

When choosing a provider, it’s important to evaluate not only the number of linkages but also the quality behind them. At Oracle Data Cloud, we go beyond simply trusting the deterministic data we collect and use the power of our offline data assets to further ensure we are tying the correct devices to real people.

Contact The Data Hotline to learn more about the market-leading Oracle ID Graph™

Consult with our team of data experts who can provide recommendations for any campaign objective using our best-in-class data solutions.

Stay up to date with all the latest in data-driven news by following @OracleDataCloud on Twitter and Facebook! (What's The Data Hotline?)

Image: WAYHOME studio/Shutterstock
