The Importance of Flexibility in CDP Profile Unification

April 10, 2023 | 5 minute read

By: Brandon Ray - Principal Solution Consultant & Dan DeZutter - VP, Solution Engineering

One of the main functions of a Customer Data Platform (“CDP”) is to integrate first-party data, stitch it together, and create a single customer view. This view lets marketing, sales, service, and other operations teams gain additional insight into their customers by combining profile, behavioral, transactional, and other types of information across disparate systems, and by running machine learning models that provide recommendations and predictions such as next best product or next best action. Marketers can also build strategic segments from more consolidated data than ever before and send those segments to activation platforms for execution within each channel.

But how do we create that single customer view?  Customer data is likely stored in many different platforms, including but not limited to:

  • CRM
  • Marketing Automation Platforms
  • eCommerce / POS
  • Profile / Preference / Consent Management
  • Web / Mobile Analytics
  • Customer Service
  • Internal or Cloud Data Warehouse

Each of these systems has a different view of the customer and will be updated at different times with different information by different sources.  These differences in data create some fundamental challenges and questions that need to be addressed during data unification.

First, the reliability of these systems differs: some are sourced directly by the customer (zero-party data), others are fed by other systems (first-party data), and some data may be purchased (second- or third-party data).

Next, the accuracy and/or completeness of the data may differ. I may have Jon Smith at 100 Main St, Hometown, IL in one system and John Smith at 100 N Main St, Hometown, IL in another. I may have John Smith at 123 Elm, Hometown, FL in a third system because he has moved or is a snowbird. Which is correct, and are they the same person?

There are further complexities if you are a B2B or B2B2C business because you additionally are working with accounts and contacts within those accounts.  Has a contact changed companies?  Is it the same contact? Do we need to have a different perspective on a customer’s data when we are working with them as part of a business entity compared to when they are an end consumer?

There are several processes that must take place to make sure the single customer view is accurate and complete. 


  • Standardize:  Breaks up fields into parts that are more readily used by the cleansing and matching processes.  For example, a name or address stored as one consolidated field is split into its components.  This is likely where data transformations occur, such as capitalizing all first names or converting an “F” to “Female”.
  • Cleanse:  Uses offline tables or postal processing to correct and/or normalize data.  May also process address moves (USPS® NCOALink®) or correct malformed email addresses.  Often the Standardize and Cleanse steps are performed simultaneously. 
  • Match:   This process identifies duplicate records across various systems using fuzzy or direct matching methods.
  • Consolidate:  Brings together attributes from one or more sources to define the best record combining the most reliable data sources. 
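
As a rough illustration of the Standardize step, here is a minimal Python sketch. The field names (full_name, gender), the GENDER_CODES lookup, and the standardize function are hypothetical and not taken from any particular CDP. Real name parsing is considerably harder than a whitespace split, and simple capitalization would mangle names such as "McDonald".

```python
# Hypothetical lookup table for coded values; a real one would be larger.
GENDER_CODES = {"F": "Female", "M": "Male"}


def standardize(record: dict) -> dict:
    """Split a consolidated name field and expand simple coded values."""
    parts = record.get("full_name", "").split()
    # Treat the first token as the first name and the last token as the surname.
    first = parts[0].capitalize() if parts else ""
    last = parts[-1].capitalize() if len(parts) > 1 else ""
    gender_raw = record.get("gender", "").strip()
    # Unknown codes are passed through unchanged rather than guessed at.
    gender = GENDER_CODES.get(gender_raw.upper(), gender_raw)
    return {"first_name": first, "last_name": last, "gender": gender}


print(standardize({"full_name": "john SMITH", "gender": "f"}))
```

The output of a step like this feeds the Cleanse and Match steps, which is why the two are often run together.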

Some CDPs provide Standardize and Cleanse functions natively, whereas others rely on add-on products or third-party partnerships.

Let’s dive deeper into the Match and Consolidate functions. 

There are two types of Matching:

  • Deterministic:  In deterministic matching, unique identifiers for each record are compared using an exact comparison between fields. Unique identifiers can include email addresses, customer IDs, system IDs, and so on. Deterministic matching on its own is generally not enough, since in some cases no single field can provide a reliable match between two records.
  • Probabilistic:  In probabilistic matching, several field values are compared between two records and each field is assigned a weight that indicates how closely the two field values match. The sum of the individual field weights indicates the likelihood of a match between two records.
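
To make the difference concrete, here is a small Python sketch of both approaches. It is only an illustration under stated assumptions: the field names, the WEIGHTS values, and the use of difflib.SequenceMatcher from the Python standard library as the similarity measure are all choices made for this example. Commercial matching engines typically use more specialized comparison functions and tuned weights.

```python
from difflib import SequenceMatcher

# Hypothetical per-field weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"first_name": 0.3, "last_name": 0.3, "address": 0.4}


def deterministic_match(a: dict, b: dict, key: str = "email") -> bool:
    """Exact comparison on a single identifier, ignoring case and padding."""
    va, vb = a.get(key), b.get(key)
    return bool(va) and bool(vb) and va.strip().lower() == vb.strip().lower()


def similarity(x: str, y: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def probabilistic_score(a: dict, b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-field similarities; higher means a likelier match."""
    return sum(w * similarity(a.get(f, ""), b.get(f, "")) for f, w in weights.items())


jon = {"first_name": "Jon", "last_name": "Smith", "address": "100 Main St, Hometown, IL"}
john = {"first_name": "John", "last_name": "Smith", "address": "100 N Main St, Hometown, IL"}
mary = {"first_name": "Mary", "last_name": "Jones", "address": "55 Oak Ave, Springfield, OH"}

print(deterministic_match(jon, john))              # no email on either record
print(round(probabilistic_score(jon, john), 2))    # high score
print(round(probabilistic_score(jon, mary), 2))    # much lower score
```

Neither Jon nor John carries a shared identifier here, so the deterministic check cannot link them, while the weighted score still rates them as a likely match. Where to set the match threshold is a business decision that usually needs tuning against real data.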

It is important that your CDP offer both types of matching.  As discussed, there are fundamental data challenges associated with unifying disparate sources of data. Data in your source platforms is often incomplete, has different field definitions, is inconsistently collected, and may contain only fragments of data.  One system may include name, user ID and phone number, while a different system may include an email address and physical address.  Some of these systems capture data that customers enter themselves, while others hold data entered by a customer service representative.  Second- and third-party data also frequently have incomplete or different data (e.g. Jon vs. John in our earlier example).

Email address can often be used as a deterministic match between sources, but it is best practice to also include it in probabilistic matching in case characters were entered incorrectly during data collection.  Names and addresses are typically matched through probabilistic methods because of abbreviations (Ave vs. Avenue) or missing secondary address elements such as apartment numbers.  Using several types of matching will help you create the best customer view; relying on deterministic matching alone often undermatches and is not reliable.  In practice, the best matching results usually take several iterations and are continually refined over time.
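
One way to picture this combination is a two-pass check: try an exact email match first, then fall back to fuzzy comparison of email and normalized address. The Python sketch below is illustrative only. The ABBREVIATIONS table, the 0.9 threshold, and the simple averaging of scores are assumptions made for the example, and difflib.SequenceMatcher stands in for whatever comparison function a real platform would use.

```python
from difflib import SequenceMatcher

# Hypothetical and far from complete; "st" could also mean "saint", for example.
ABBREVIATIONS = {"ave": "avenue", "st": "street"}


def normalize_address(address: str) -> str:
    """Lowercase, drop simple punctuation, and expand known abbreviations."""
    tokens = address.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def fuzzy(x: str, y: str) -> float:
    return SequenceMatcher(None, x, y).ratio()


def is_match(a: dict, b: dict, threshold: float = 0.9) -> bool:
    email_a = a.get("email", "").strip().lower()
    email_b = b.get("email", "").strip().lower()
    # Pass 1: deterministic match on email.
    if email_a and email_a == email_b:
        return True
    # Pass 2: average the fuzzy scores of whichever fields both records have.
    scores = []
    if email_a and email_b:
        scores.append(fuzzy(email_a, email_b))
    if a.get("address") and b.get("address"):
        scores.append(fuzzy(normalize_address(a["address"]), normalize_address(b["address"])))
    return bool(scores) and sum(scores) / len(scores) >= threshold


web = {"email": "jsmith@example.com", "address": "100 Main Ave"}
pos = {"email": "jsmtih@example.com", "address": "100 Main Avenue"}  # transposed letters
print(is_match(web, pos))
```

The transposed characters in the second email defeat the exact comparison, but the fuzzy pass still links the two records because the email is nearly identical and the normalized addresses agree.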

When consolidating sources into a single view, it is important to be able to choose, field by field, which source supplies each value in the final record.  You need to understand the source of the data (zero-party, first-party, etc.), the quality of the data, and the recency of the data. The data experts in your organization typically know which data is more likely to be accurate. Knowing this lets you rank some sources above others during the consolidation process, so that fields from higher-ranked sources take priority in your consolidated customer profile.  As an example, you could take the customer’s address from their most recent order and their phone number from your CRM system, assuming your CRM has comparatively more accurate data.
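
The address and phone example can be sketched in a few lines of Python. Everything here is hypothetical: the record layout (source, updated, data), the SOURCE_PRIORITY table, and the rule that fields without a ranking fall back to the most recently updated source are assumptions chosen to mirror the example, not a description of how any specific CDP works.

```python
from datetime import date

# Hypothetical per-field source ranking; unlisted fields fall back to recency.
SOURCE_PRIORITY = {"phone": ["crm", "ecommerce"]}


def consolidate(records: list, source_priority: dict = SOURCE_PRIORITY) -> dict:
    """Build one profile by picking the best source for each field."""
    profile = {}
    fields = {f for r in records for f in r["data"]}
    for field in sorted(fields):
        candidates = [r for r in records if r["data"].get(field)]
        if not candidates:
            continue  # every source has this field empty
        ranking = source_priority.get(field, [])

        def sort_key(r):
            # Lower rank wins; ties (and unranked fields) go to the newest record.
            rank = ranking.index(r["source"]) if r["source"] in ranking else len(ranking)
            return (rank, -r["updated"].toordinal())

        profile[field] = min(candidates, key=sort_key)["data"][field]
    return profile


records = [
    {"source": "crm", "updated": date(2022, 1, 15),
     "data": {"phone": "555-0100", "address": "100 Main St, Hometown, IL"}},
    {"source": "ecommerce", "updated": date(2023, 3, 1),
     "data": {"phone": "555-0199", "address": "123 Elm, Hometown, FL"}},
]
print(consolidate(records))
```

With this sample data the consolidated profile takes the address from the more recent ecommerce order and the phone number from the CRM, even though the CRM record is older.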

Given all of these data accuracy and source considerations, it should be clear that a “one size fits all” approach to profile unification fails to deliver a consolidated view of your customer that is relevant for your unique business. Every business is different, each with its own data nuances and requirements for what an ideal consolidated profile should look like. You should therefore look for a CDP that allows a great deal of flexibility in the unification process: the ability to standardize, transform, and clean data through a variety of methods, the ability to combine deterministic and probabilistic matching, and the ability to apply a dynamic set of consolidation criteria that account for data source, recency, and other metadata attributes. Finally, make sure all of these unification processes remain configurable well past the implementation date. We often see customers whose data requirements change over time, so it is important not to be “locked in” to one static method of unification as your business grows.
