It's the essential question of all businesses: How effective is our marketing campaign?
The answer is not as complicated as you might think. By using the Oracle Analytics data visualization feature and machine learning, we can gain valuable insight into our marketing strategy and measure our key performance indicators. You can then determine whether your message reaches the right customers at the right time.
First, let's address this by comparing the performance of two machine learning binary classification models using a Cumulative Gains chart and a Lift chart. In one of our earlier blog posts, "Which Machine Learning Model is Right for Me", you learned how to compare the performance of two machine learning models. This post goes further, explaining how the Oracle Analytics data visualization machine learning feature supports advanced model comparison techniques.
What are Cumulative Gains and Lift charts, and what are they used for?
Suppose a company wants to run a direct marketing campaign to get a response (such as a subscription or a purchase) from users. It plans to target around 10,000 users, of whom only about 1,000 are expected to respond. However, the company doesn't have the budget to reach all 10,000 customers. To minimize cost, it wants to contact as few customers as possible while still reaching most (a user-defined share) of the customers who are likely to respond. The company can build machine learning models that predict which users are likely to respond, and with what probability. The question then becomes: which model should it choose? Which model captures the largest number of respondents while contacting the smallest share of the original population? Cumulative Gains and Lift charts answer these questions.
Cumulative Gains and Lift charts measure the effectiveness of a binary classification model by comparing the results obtained with the predictive model against the results obtained without it (that is, with random selection). Lift is the ratio between the two, while cumulative gain is the share of all respondents captured after contacting a given share of the population. Both are visual aids for measuring model performance, and each contains a model curve and a baseline. The greater the area between the model curve and the baseline, the better the model. One academic reference on how to construct these charts can be found here. Gains and Lift charts are popular techniques in direct marketing.
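To make these definitions concrete, here is a minimal Python sketch (not the Oracle Analytics implementation): rank records by their predicted probability of responding, then after each contacted record compute the cumulative gain (share of all respondents captured so far) and the lift (that gain divided by the share of the population contacted). The function name `gains_and_lift` and its inputs are hypothetical.

```python
from typing import List, Tuple


def gains_and_lift(scores: List[float], actuals: List[int]) -> List[Tuple[float, float, float]]:
    """Return (population fraction, cumulative gain, lift) after each contacted record.

    scores:  predicted probability of a response for each record (hypothetical input)
    actuals: 1 if the record actually responded, else 0
    """
    if len(scores) != len(actuals):
        raise ValueError("scores and actuals must have the same length")
    total_positive = sum(actuals)
    if total_positive == 0:
        raise ValueError("need at least one actual respondent")
    n = len(scores)
    # Contact the highest-scoring records first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rows = []
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += actuals[i]
        population = rank / n              # share of the population contacted
        gain = captured / total_positive   # share of all respondents reached
        lift = gain / population           # improvement over random selection
        rows.append((population, gain, lift))
    return rows


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    for population, gain, lift in gains_and_lift(toy_scores, toy_actuals):
        print(f"{population:.0%} contacted -> {gain:.0%} of respondents, lift {lift:.2f}")
```

At 100 percent of the population, every curve reaches a gain of 1 and a lift of 1, which is why the charts are only informative at smaller contact depths.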
Sample Project for Cumulative Gains and Lift chart computation
The Oracle Analytics Store has an example project for this, built using a bank's marketing campaign data. This is what the charts look like:
Scenario: This marketing campaign aims to identify users who are likely to subscribe to one of a company's financial services. The company plans to run the campaign for close to 50,000 individuals, of whom only about 5,000 (roughly 10 percent) are likely to subscribe to the service. The marketing campaign data is split into training and testing data. Using the training data, we created two machine learning models, one with a Naive Bayes classifier and one with logistic regression, to identify likely subscribers along with a prediction confidence. (The actual outcome, that is, whether each customer subscribed or not, is also available in the dataset.) Now the company wants to find out which model identifies the greatest number of likely subscribers while selecting a relatively small portion of the 50,000-person campaign base.
We applied the machine learning models to the test data to obtain the Predicted Value and Prediction Confidence for each record. Using this prediction data together with the actual outcomes, we created data flows to compute cumulative gain and lift.
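The data flows themselves are built in Oracle Analytics. The plain-Python sketch below only illustrates the kind of computation they perform. It rests on two hypothetical assumptions: Prediction Confidence is a 0-100 score for the *predicted* class (so a record predicted "no" is converted to a confidence for "yes" before ranking), and outcomes are labelled "yes"/"no". The table is bucketed into population percentiles. The names `yes_score` and `gains_table` are illustrative.

```python
from typing import List, Tuple


def yes_score(predicted_value: str, prediction_confidence: float,
              positive_label: str = "yes") -> float:
    """Confidence (0-100) that a record is positive.

    Assumption: Prediction Confidence refers to the predicted class, so a
    'no' prediction at 80 percent confidence means 20 percent for 'yes'.
    """
    if predicted_value == positive_label:
        return prediction_confidence
    return 100.0 - prediction_confidence


def gains_table(predicted: List[str], confidence: List[float], actual: List[str],
                steps: int = 100, positive_label: str = "yes"
                ) -> List[Tuple[float, float, float]]:
    """Return (population %, cumulative gain %, lift) at each of `steps` cut-offs."""
    if not (len(predicted) == len(confidence) == len(actual)):
        raise ValueError("input columns must have the same length")
    scores = [yes_score(p, c, positive_label) for p, c in zip(predicted, confidence)]
    hits = [1 if a == positive_label else 0 for a in actual]
    # Rank records from most to least likely to be positive.
    ranked = [h for _, h in sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)]
    # prefix[k] = number of actual positives among the top k records.
    prefix = [0]
    for h in ranked:
        prefix.append(prefix[-1] + h)
    total = prefix[-1]
    if total == 0:
        raise ValueError("the actual outcomes contain no positive records")
    n = len(ranked)
    table = []
    for step in range(1, steps + 1):
        k = step * n // steps  # number of records contacted at this cut-off
        pct_population = 100.0 * step / steps
        gain = 100.0 * prefix[k] / total
        table.append((pct_population, gain, gain / pct_population))
    return table
```

With 50,000 test records and the default 100 steps, each cut-off contacts exactly 500 more records; for sizes not divisible by `steps`, the record count is rounded down at each cut-off.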
How to interpret these charts and measure the effectiveness of a model:
The Cumulative Gains chart plots the cumulative percentage of actual subscribers captured (Cumulative Actuals) on the Y-axis against the percentage of the total population (50,000) on the X-axis, with customers sorted in descending order of Prediction Confidence for Yes. Each model is compared with a random prediction (Gains Chart Baseline) and with the ideal prediction (Gains Chart Ideal Model Line), in which all 5,000 likely subscribers are identified within the first 5,000 customers. The model with the greater area between its Cumulative Actuals line and the baseline is more effective at identifying a larger portion of subscribers while selecting a relatively smaller portion of the total population.
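The two reference lines follow directly from the scenario numbers. A minimal sketch, assuming 5,000 subscribers among 50,000 individuals: a random selection of p percent of the population captures about p percent of the subscribers, while an ideal ranking contacts only subscribers until all of them are found, which happens at 10 percent of the population. The function names are hypothetical.

```python
def baseline_gain(pct_population: float) -> float:
    """Random selection: contacting p% of the population finds about p% of subscribers."""
    return pct_population


def ideal_gain(pct_population: float, subscribers: int = 5_000,
               population: int = 50_000) -> float:
    """Perfect ranking: every contact is a subscriber until all are found (capped at 100%)."""
    contacted = population * pct_population / 100.0
    return min(100.0, 100.0 * contacted / subscribers)


if __name__ == "__main__":
    for pct in (5, 10, 20, 50):
        print(f"{pct}% contacted: baseline {baseline_gain(pct):.0f}%, ideal {ideal_gain(pct):.0f}%")
```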
The Lift chart shows how many times more respondents we reach than if we contacted a random sample of customers. For example, by contacting only the top 10 percent of customers as ranked by each model, we reach 2.09 times (Logistic Regression) and 3.20 times (Naive Bayes) as many respondents as we would without a model.
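The lift figures translate into expected subscriber counts with simple arithmetic. Using the scenario numbers (5,000 subscribers among 50,000 individuals), contacting a random 10 percent (5,000 people) reaches about 500 subscribers, so a lift of 3.20 corresponds to about 1,600 subscribers and a lift of 2.09 to about 1,045. Equivalently, the cumulative gain at that point is the lift multiplied by 10 percent: 32 and 20.9 percent respectively. The helper names below are hypothetical.

```python
def expected_respondents(lift: float, pct_contacted: float,
                         total_respondents: int = 5_000) -> float:
    """Respondents expected when contacting the top pct_contacted percent ranked by a model."""
    # Without a model, contacting p% of the population reaches about p% of respondents.
    random_hits = total_respondents * pct_contacted / 100.0
    # Lift multiplies the number of respondents a random selection would reach.
    return lift * random_hits


def implied_gain_pct(lift: float, pct_contacted: float) -> float:
    """Cumulative gain (percent of all respondents captured) implied by a lift value."""
    return lift * pct_contacted


if __name__ == "__main__":
    for model, lift in [("Logistic Regression", 2.09), ("Naive Bayes", 3.20)]:
        hits = expected_respondents(lift, 10)
        print(f"{model}: about {hits:,.0f} subscribers "
              f"({implied_gain_pct(lift, 10):.1f}% gain) from 5,000 contacts")
```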
Max Gain shows the point at which the difference between the cumulative gains and the baseline is greatest. For Logistic Regression, this occurs at a population percentage of 23 percent, and the maximum gain recorded is 41.84 percent. For Naive Bayes, it occurs at a population percentage of 41 percent, and the maximum gain is 83.88 percent.
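Locating the max gain point takes a single pass over the gains table. This sketch assumes rows whose first two fields are (population %, cumulative gain %), such as a percentile table, and treats the random baseline at p percent as p percent. It returns the population percentage, the cumulative gain there, and the gap to the baseline. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def max_gain_point(table: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (population %, cumulative gain %, gain minus baseline) where the gap is largest.

    Only the first two fields of each row are used; the random baseline at
    population p% is p%.
    """
    best = max(table, key=lambda row: row[1] - row[0])
    return best[0], best[1], best[1] - best[0]


if __name__ == "__main__":
    toy_table = [(10, 30), (20, 50), (30, 62), (40, 70)]  # illustrative values only
    print(max_gain_point(toy_table))
```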
By simply examining these data visualizations, we can see that the Naive Bayes model has a larger area between its cumulative gains curve and the baseline, making it the better of the two models for this prediction.
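Visual inspection is usually enough, but the area between a gains curve and the baseline can also be computed numerically with the trapezoidal rule. This is a minimal sketch, assuming each curve is given as (population fraction, gain fraction) points sorted by population and running from (0, 0) to (1, 1). The function name is hypothetical.

```python
from typing import Sequence, Tuple


def area_above_baseline(points: Sequence[Tuple[float, float]]) -> float:
    """Area between a cumulative gains curve and the diagonal baseline (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Trapezoid under the gains curve minus trapezoid under the baseline y = x.
        area += (x1 - x0) * ((y0 + y1) / 2 - (x0 + x1) / 2)
    return area


if __name__ == "__main__":
    ideal = [(0.0, 0.0), (0.1, 1.0), (1.0, 1.0)]  # ideal model for a 10% response rate
    print(f"ideal-model area: {area_above_baseline(ideal):.2f}")
```

For the 10 percent response rate in this scenario, the ideal model's area is 0.45, which is the upper bound any model's curve can reach. Comparing each model's area against that bound gives a single number for how close it comes to a perfect ranking.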