An Introduction to Predictive Customer Lifetime Value Modeling

February 27, 2017 | 7 minute read

How can you predict the value of a customer over the course of his or her interactions with your business? That's a question many companies are trying to answer, and it was the subject of my Feb. 28 webcast on O’Reilly Media.

Customer lifetime value (CLV) is "the discounted value of future profits generated by a customer." The word "profits" here covers both cost and revenue estimates, as both metrics are very important in estimating true CLV; however, the focus of many CLV models is on the revenue side. The reason for this is that revenue is more difficult to forecast than cost, so a model is more necessary to predict it (and knowing the revenue a customer will generate can inform your spend on that customer). These types of models are often called "customer equity models."
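
To make "discounted value" concrete, here is a minimal sketch in Python. It assumes you already have an estimate of the profit a customer will generate in each future period and a per-period discount rate; the function name, the inputs, and the numbers are hypothetical and only illustrate the arithmetic.

```python
def discounted_clv(expected_profits, discount_rate):
    """Sum per-period expected profits, discounted back to the present.

    expected_profits[0] is the profit expected one period from now,
    expected_profits[1] two periods from now, and so on (hypothetical input).
    discount_rate is the per-period rate, e.g. 0.10 for 10%.
    """
    return sum(
        profit / (1.0 + discount_rate) ** period
        for period, profit in enumerate(expected_profits, start=1)
    )


# Illustrative numbers only: three periods of 100 in profit, 10% discount rate.
clv = discounted_clv([100.0, 100.0, 100.0], discount_rate=0.10)
print(round(clv, 2))
```

The further out a profit is expected, the less it contributes, which is why the forecast horizon matters when comparing customers.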

Customers can generate revenue for a company in many different ways. A customer who makes direct purchases obviously increases his or her lifetime value. In addition, referrals from that customer, indirect marketing, and word-of-mouth effects ultimately contribute to the value of a customer. Referrals are very important, and there’s nothing a company likes more than a "Like" on Facebook or a share on LinkedIn, for example.

Accounting for these network effects can be challenging at first, which is why, for the sake of simplicity, I will be focusing on direct purchases only in this post.

Historical Customer Lifetime Value

There are many methodologies that deal with the portion of CLV associated with direct purchases, but the two broadest classes are generally defined as historical and predictive CLV. Historical methods look at past data and make a judgment on the value of customers solely based on past transactions, without any attempt to predict what those customers will do next.

In principle, this is a valid approach if the customers behave similarly and have been interacting with the company for roughly the same amount of time. However, there’s generally a fair amount of heterogeneity among customers. The chart below shows a few purchasing trajectories to illustrate my point. Time goes from left to right. The vertical dashed line represents the present time, and each small, vertical line represents an order/purchase made by a customer:


Typical historical approaches apply a recency-of-last-purchase criterion to distinguish between active and inactive users. Average past purchase behavior is then used to measure the relative (or in some cases, absolute) value of customers.

However, there are several problems with such methodologies. For example, the first customer in the chart above has made more purchases than the second customer, but in fact, the first customer is more likely to be inactive than the second one. Value based on past averages would claim that the first customer is more valuable, yet the second customer is still active and could make many more purchases in the future. Methods that account for variation in the behavior of customers will allow us to arrive at more accurate conclusions about customer lifetime and purchase behavior.
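
A small sketch of a historical approach shows the problem. Everything here is an assumption for illustration: the function name, the 90-day recency cutoff, and the two made-up purchase histories, which mimic a customer who bought often long ago and one who bought less often but recently.

```python
def historical_summary(order_days, order_values, today, recency_cutoff_days):
    """Summarize a customer using past transactions only (no prediction)."""
    days_since_last_order = today - max(order_days)
    return {
        "n_orders": len(order_days),
        "avg_order_value": sum(order_values) / len(order_values),
        "days_since_last_order": days_since_last_order,
        # Recency rule: "active" if the last order is recent enough.
        "active": days_since_last_order <= recency_cutoff_days,
    }


# Hypothetical customers; days are counted from an arbitrary start date.
first = historical_summary([5, 12, 20, 31, 40], [30.0] * 5,
                           today=365, recency_cutoff_days=90)
second = historical_summary([200, 280, 350], [30.0] * 3,
                            today=365, recency_cutoff_days=90)

print(first["n_orders"], first["active"])    # more orders, but long silent
print(second["n_orders"], second["active"])  # fewer orders, still buying
```

Ranking by past order count or past spend puts the first customer on top, even though the recency rule flags that customer as inactive. The cutoff itself is arbitrary, which is one reason to prefer a model that estimates the probability of being active per customer.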

Predictive Customer Lifetime Value 

The goal of predictive CLV is to model the purchasing behavior of customers in order to infer what their future actions will be. Whether a predictive CLV model and methodology makes sense for your use case will largely be determined by the business context. For the purpose of this post, business context is defined along two dimensions: non-contractual vs. contractual business settings, and continuous vs. discrete purchase opportunities. This context definition should cover the vast majority of business cases. Below, I have included a table highlighting the differences between these contexts:


Below are some examples of business cases belonging to each one of the four quadrants. CLV models for fitness clubs or insurance policies will differ from the ones targeting grocery purchases, for example:


Probabilistic Models for the Non-contractual and Continuous Purchase Setting 

Perhaps the most common business context is the non-contractual one, in which the purchase opportunity is continuous. A large number of probabilistic models have been built to address the challenges of modeling lifetime value in such a context. These types of models have been in use for several decades. They are applicable to a wide variety of business situations and, in many cases, are your "go-to" models. Probabilistic models are definitely a good first step (and sometimes the only one!) toward CLV modeling.

Machine learning and Markov models are also worthy approaches to CLV modeling, but they need to be tweaked and sometimes customized to fit the particulars of a business situation. In the few case studies comparing the outcome of these different models, probabilistic approaches and machine learning models tend to produce results that are of a similar quality. 

Different Probabilistic Models, but Similar Modeling Frameworks

Let’s take a closer look at probabilistic models. There are several different flavors of probabilistic models out there; however, they all tend to share a similar modeling framework. In this framework, CLV models typically constrain the same three latent (unobserved) parameters characterizing customers' behavior:

  • Lifetime: the period over which a customer is maintaining his or her relationship with the company

  • Purchase rate: this parameter corresponds to the number of purchases a customer will make over a given period of time 

  • Monetary value: this part of the model is concerned with assigning a dollar amount to each future transaction

In the non-contractual setting, these parameters are unobserved. Probabilistic models will help us constrain these parameters at the customer level and make inferences about future purchases and value. 

The Pareto/NBD Model: A Good First Step Toward CLV Modeling 

The Pareto/NBD model is perhaps the most well-known and frequently applied probabilistic model in the non-contractual context. I created the chart below to illustrate how the model works: 


The Pareto/NBD portion is on the left side of the chart in the dashed rectangle. Pareto/NBD only focuses on the purchase count and lifetime. It does not address the monetary value component. There are a few models out there that address monetary value; I've chosen the Gamma-Gamma extension to the Pareto/NBD model (as seen in the chart above).

The Pareto/NBD model makes the following assumptions regarding the customer population: 

  • Purchase count follows a Poisson distribution with rate λ. In other words, the timing of these purchases is somewhat random, but the rate (in counts/unit time) is constant. In turn, this implies that the inter-purchase time at the customer level should follow an exponential distribution.

  • Lifetime follows an exponential distribution with rate μ. The expected value of this distribution is 1/μ and corresponds to the expected lifetime of the customer.

  • The latent parameters λ and μ are constrained by two prior gamma distributions representing our belief of how these latent parameters are distributed among the population of customers. These two gamma distributions have parameters (r,α) for the purchase count and (s,β) for the lifetime. The goal is to find these four parameters. From these, all actionable metrics can be derived. 
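
One way to build intuition for these assumptions is to simulate the generative process they describe. The sketch below uses only Python's standard library. It assumes α and β act as rate parameters of the gamma distributions (so the mean purchase rate is r/α and the mean dropout rate is s/β); the parameter values, the weekly time unit, and the function name are made up for illustration and are not fitted to any data.

```python
import random


def simulate_customer(r, alpha, s, beta, horizon, rng):
    """Draw one customer's purchase history under the Pareto/NBD assumptions."""
    # random.gammavariate takes (shape, scale). alpha and beta are treated as
    # rates here (an assumption), so the scale is their reciprocal.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)  # lambda
    dropout_rate = rng.gammavariate(s, 1.0 / beta)    # mu
    lifetime = rng.expovariate(dropout_rate)          # exponential, mean 1/mu
    observed_end = min(lifetime, horizon)

    # Constant-rate (Poisson) purchasing: exponential gaps between purchases,
    # stopping when the customer drops out or the observation window ends.
    purchase_times = []
    t = rng.expovariate(purchase_rate)
    while t < observed_end:
        purchase_times.append(t)
        t += rng.expovariate(purchase_rate)

    return {
        "purchase_rate": purchase_rate,
        "dropout_rate": dropout_rate,
        "lifetime": lifetime,
        "purchase_times": purchase_times,
    }


rng = random.Random(42)
# Hypothetical population parameters, time measured in weeks.
customers = [
    simulate_customer(r=2.0, alpha=20.0, s=2.0, beta=100.0, horizon=52.0, rng=rng)
    for _ in range(1000)
]
still_alive = sum(c["lifetime"] > 52.0 for c in customers)
print(still_alive, "of", len(customers), "simulated customers outlive the window")
```

Training a Pareto/NBD model runs this logic in reverse: given only the observed purchase times, it estimates the four population parameters (r, α, s, β) that make the data most plausible.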

In practice, this is how we train a Pareto/NBD model to find these four parameters. Below is a simple chart demonstrating the process:


First, you must train the model over a training period whose length is at least three times the typical inter-purchase time of your customers. Using customer data and simulations, we found that three times is the minimum; five to ten times is definitely better.
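
As a quick check of this rule of thumb, the sketch below estimates the typical inter-purchase time from a transaction log and multiplies it by three. Using the median gap as "typical" is my assumption here, and the function names and sample data are hypothetical.

```python
from statistics import median


def typical_interpurchase_time(purchase_days_by_customer):
    """Median gap between consecutive purchases, pooled across customers."""
    gaps = []
    for days in purchase_days_by_customer.values():
        ordered = sorted(days)
        gaps.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return median(gaps)


def minimum_training_length(purchase_days_by_customer, multiple=3):
    """Shortest training period suggested by the rule of thumb."""
    return multiple * typical_interpurchase_time(purchase_days_by_customer)


# Hypothetical purchase days for two repeat customers.
purchases = {"a": [0, 30, 65], "b": [10, 50]}
print(minimum_training_length(purchases))              # three times the median gap
print(minimum_training_length(purchases, multiple=5))  # a more comfortable margin
```

Customers with a single purchase contribute no gaps, so in a real dataset the estimate only reflects repeat buyers.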

The training period will give you an estimate for the model parameters. You should then be able to compare what the model predicts vs. what you observed in the training period at the customer level. If the purchase count is in agreement, the next step is to compare predictions with observations made in a validation/holdout period. This period has not been observed by the model. If the model performs well in the validation/holdout period, then you can forecast for a period of time from several months to several years, depending on your business needs. 
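
The bookkeeping behind this comparison can be sketched as follows. The function splits a transaction log into training and holdout purchase counts per customer, which gives you the observed values to compare against model predictions. The function name, the tuple layout, and the day cutoffs are assumptions for illustration, and customers who first appear during the holdout period are left out because the model never saw them.

```python
def split_training_holdout(transactions, training_end, holdout_end):
    """Count each customer's purchases in the training and holdout periods.

    transactions is a list of (customer_id, day) pairs (hypothetical layout).
    """
    counts = {}
    # First pass: customers observed during the training period.
    for customer_id, day in transactions:
        if day <= training_end:
            entry = counts.setdefault(customer_id, {"training": 0, "holdout": 0})
            entry["training"] += 1
    # Second pass: holdout purchases, only for customers the model trained on.
    for customer_id, day in transactions:
        if training_end < day <= holdout_end and customer_id in counts:
            counts[customer_id]["holdout"] += 1
    return counts


transactions = [("a", 3), ("a", 40), ("a", 95), ("b", 10), ("b", 120), ("c", 110)]
observed = split_training_holdout(transactions, training_end=90, holdout_end=130)
for customer_id, purchase_counts in sorted(observed.items()):
    print(customer_id, purchase_counts)
```

Comparing the "holdout" column with the model's predicted purchase counts for the same window is the test that matters, since the model has not seen those transactions.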

The Gamma-Gamma Extension to the Pareto/NBD Model

As mentioned above, the Pareto/NBD model focuses on modeling lifetime and purchase count. The monetary value extension to the Pareto/NBD model noted on the right side of the chart, Gamma-Gamma, makes a few assumptions: 

  • At the customer level, the transaction/order value varies randomly around each customer’s average transaction value. (That, in itself, isn’t too controversial.) 

  • The observed mean value is an imperfect metric of the latent mean transaction value E(M), where M represents the monetary value. 

  • Average transaction value varies across customers, though these values are stationary. (This is a big assumption to make.)

  • The distribution of average values across customers is independent of the transaction process. In other words, monetary value can be modeled separately from the purchase count and lifetime components of the model. This may or may not hold in typical business situations.
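
Under these assumptions, the expected average transaction value of a customer has a closed form: a weighted average of the population mean and the customer's own observed mean, with more weight on the observed mean as the number of transactions grows. The sketch below uses the parameter names p, q, and γ from the Gamma-Gamma literature, which do not appear in this post; the parameter values are invented for illustration rather than fitted.

```python
def expected_mean_transaction_value(p, q, gamma, observed_mean, n_transactions):
    """Expected latent mean transaction value for one customer.

    p, q, gamma are the Gamma-Gamma population parameters (names assumed from
    the literature). With zero transactions the result is the population mean
    gamma * p / (q - 1).
    """
    if q <= 1:
        raise ValueError("q must be greater than 1 for the mean to exist")
    return p * (gamma + observed_mean * n_transactions) / (p * n_transactions + q - 1.0)


# Hypothetical parameters giving a population mean of 30 per transaction.
p, q, gamma = 6.0, 4.0, 15.0
no_history = expected_mean_transaction_value(p, q, gamma, observed_mean=0.0, n_transactions=0)
two_orders = expected_mean_transaction_value(p, q, gamma, observed_mean=60.0, n_transactions=2)
twenty_orders = expected_mean_transaction_value(p, q, gamma, observed_mean=60.0, n_transactions=20)
print(no_history, two_orders, round(twenty_orders, 2))
```

A customer with only two orders averaging 60 is pulled back toward the population mean, while one with twenty orders is taken nearly at face value. This is what "the observed mean value is an imperfect metric" means in practice.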

Tying These Two Models Together: CLV Estimates at the Customer Level  

The Pareto/NBD model allows you to compute the expected number of purchases in a forecast period at the customer level. Furthermore, the Gamma-Gamma model allows you to assign a value to each of those future purchases. It becomes a trivial exercise to forecast CLV for each customer; you simply have to multiply the expectation values of each model. That should allow anyone to make CLV comparisons during the holdout period before making any forecasts. 
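
In code, the last step is a single multiplication per customer. The sketch assumes you already have, for each customer, the expected number of purchases from the Pareto/NBD model and the expected transaction value from the Gamma-Gamma model; the customer IDs and numbers below are placeholders, not model output.

```python
def customer_clv(expected_purchases, expected_transaction_value):
    """Forecast revenue for one customer over the forecast period."""
    return expected_purchases * expected_transaction_value


# Hypothetical per-customer forecasts: (expected purchases, expected value).
forecasts = {"cust_1": (3.2, 54.0), "cust_2": (0.4, 30.0)}
clv_by_customer = {
    customer_id: customer_clv(n_purchases, value)
    for customer_id, (n_purchases, value) in forecasts.items()
}

# Rank customers from most to least valuable.
ranked = sorted(clv_by_customer, key=clv_by_customer.get, reverse=True)
print(ranked)
```

Multiplying the two expectations relies on the independence assumption between monetary value and the transaction process, so it is worth checking these numbers against holdout-period spend before trusting a long forecast.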

Additional Information 

To help make these concepts concrete, I have created a public GitHub repo that contains a notebook and a test dataset from an online retailer to supplement my O'Reilly webcast. In the notebook, you will find the steps to train both the Pareto/NBD and Gamma-Gamma models and compute CLV at the customer level.

Jean-Rene Gauthier

I am a former astronomer working at DataScience as a Data Scientist. Oh and btw, I love hockey :-)
