# When it comes to identity, probabilistic or deterministic is not the question

This blog is the second in a series by Audrey Rusch, Senior Director of Data Science at Oracle Data Cloud. Read the first post here.

In the first blog post in this series, we talked about the dirty little secrets of deterministic data and warned you about being wooed by faulty promises. Specifically, we talked about how the digital advertising ecosystem commonly treats deterministic data as the undeniable answer to all identity problems.

While the industry uses the word “deterministic” to mean 100% correct, it really means only that two identifiers were directly observed together (e.g., a cookie with an email). At Oracle Data Cloud, as a recipient of the deterministic data that the industry trusts, we found that 80% of the deterministic links are wrong.

In this simplified world, another concept we like to talk about is probabilistic data. But what is this, exactly? Data manufactured out of thin air? Randomly associated data points from a random number generator?

## Probabilistic data: What is it, really?

Whereas deterministic data is directly observed and therefore assumed to be 100% correct, the term “probabilistic” conjures up nightmares of low quality or “fake data.” It has become a damning description for any solution, implying that scale has been prioritized over quality.

Hold up, though. In the literal sense, what does it truly mean for data to be “probabilistic”?

Outside the echo chamber of AdTech, probabilistic data is data that has been assigned a probability reflecting how likely it is to have some desired trait. In fact, probabilistic data is used and widely accepted in other industries. When insurance companies evaluate how likely a person is to get in a car wreck, or credit companies use credit scores to evaluate how probable it is for someone to default on a loan, they're using probabilistic data. Even in our own industry, it's very common to evaluate how likely people are to buy a product or take some action and to include them in a target audience on that basis.

Probabilistic data is nothing more than observed data (deterministic data, if you will) where extra and valuable information is used to evaluate its integrity. In AdTech, though, when we label data as deterministic, it gets a free pass on a quality evaluation (ignorance is bliss). Label that same data as probabilistic and we've allowed ourselves to immediately assume that it's inferior.

## Changing the probabilistic perception

Our industry will benefit by leaving behind the labels of “deterministic” and “probabilistic” and the baggage that comes with them. Instead, let’s start talking about what actually matters: quality. All of these links, regardless of the lazy label imposed on them by our industry, need to be evaluated for quality and kept or discarded based on scored confidence. Assuming that “deterministic” means 100% correct assigns an entirely subjective confidence and is a significant risk for any media buyer. When 80% of deterministic links are judged to be incorrect, marketers should care that only 1 out of 5 ads is going to the people they think they're reaching.
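To make the arithmetic behind that last point concrete, here is a minimal Python sketch. It assumes that impressions are spread evenly across links, so the share of ads reaching the intended person equals the share of correct links. The function name, the campaign size, and the budget are hypothetical and chosen only for illustration.

```python
def on_target_estimate(impressions: int, budget: float, link_accuracy: float) -> tuple[int, float]:
    """Return (expected on-target impressions, expected wasted spend).

    Assumption: impressions are spread evenly across links, so the fraction
    of correct links equals the fraction of ads that reach the right person.
    """
    if not 0.0 <= link_accuracy <= 1.0:
        raise ValueError("link_accuracy must be between 0 and 1")
    on_target = round(impressions * link_accuracy)
    wasted_spend = budget * (1.0 - link_accuracy)
    return on_target, wasted_spend


# If 80% of links are wrong, only 20% are correct.
# The impression count and budget below are made-up illustration values.
on_target, wasted = on_target_estimate(
    impressions=1_000_000, budget=10_000.0, link_accuracy=0.20
)
print(f"On-target impressions: {on_target}")
print(f"Wasted spend: {wasted:.2f}")
```

Under these assumptions, roughly one in five impressions lands where the buyer intended, and the remaining four-fifths of the budget is spent on the wrong people.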

## Ensuring quality in identity data

At Oracle Data Cloud, we evaluate all identity data to decide what makes it into our graph, meaning observed data doesn’t get a free pass. With the Oracle Identity graph, we provide a single, universal view of identity by evaluating and scoring the quality of all data, both deterministic and probabilistic. This process ultimately removes any connections that don’t meet our strict thresholds, leaving only those that we consider “qualified.”
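As a rough illustration of the idea (not a description of how the Oracle Identity graph is actually built), the sketch below keeps or discards links purely on a confidence score. The `Link` type, the `qualified_links` function, the sample scores, and the 0.9 threshold are all hypothetical. The point is that the `label` field plays no part in the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A hypothetical link between two identifiers, with a quality score."""
    id_a: str
    id_b: str
    label: str         # "deterministic" or "probabilistic"; ignored when filtering
    confidence: float  # assumed score in [0, 1] from some upstream evaluation


def qualified_links(links: list[Link], threshold: float = 0.9) -> list[Link]:
    """Keep only links whose confidence meets the threshold, whatever their label."""
    return [link for link in links if link.confidence >= threshold]


# Made-up sample data: a "deterministic" link can score poorly,
# and a "probabilistic" link can score well.
candidates = [
    Link("cookie:abc", "email:x@example.com", "deterministic", 0.35),
    Link("cookie:abc", "email:y@example.com", "probabilistic", 0.97),
    Link("device:123", "email:y@example.com", "deterministic", 0.91),
]

kept = qualified_links(candidates)
for link in kept:
    print(link.id_a, "->", link.id_b, f"({link.label}, {link.confidence})")
```

In this toy data set the low-scoring “deterministic” link is dropped while the high-scoring “probabilistic” one survives, which is the behavior a label-blind quality filter should have.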

In part 3 of this series, we'll describe in more detail how this process works and demonstrate how the Oracle ID graph helps ensure your marketing dollars aren’t wasted.