Predicting Likelihood of Click with Multiple Presentations

When predictive models are used to estimate the likelihood that an ad or a banner will be clicked, it is common to ignore the fact that the same content may already have been presented to the same visitor. The resulting error may be small if visitors rarely see repeated content, but it can be very significant for sites that visitors return to repeatedly.

This is a well-recognized problem that usually gets handled with presentation thresholds: for example, do not present the same content more than 6 times.

Observations and measurements of visitor behavior provide evidence that something better is needed.


For a specific visitor, during a single session, for a banner in a not too prominent space, the second presentation of the same content is more likely to be clicked on than the first presentation. The likelihood of a click on the second presentation can be 30% to 100% higher than on the first.

That is, for example, if the first presentation has an average click rate of 1%, the second presentation may have an average CTR of between 1.3% and 2%.

After the second presentation the CTR stays more or less the same for a few more presentations. The number of presentations in this plateau seems to vary by the location of the content in the page and by the visual attraction of the content.

After these few presentations the CTR starts decaying along a curve that is very well approximated by an exponential decay. For example, the 13th presentation may have 90% of the likelihood of the 12th, and the 14th 90% of the likelihood of the 13th. The decay constant also seems to depend on the visibility of the content.

Figure: click likelihood as a function of the presentation number. The first presentation is less likely to be clicked than the second. The likelihood then plateaus and, after the sixth presentation, starts an exponential decay.
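The shape described above (a lower first presentation, a plateau, then exponential decay) can be written down as a small piecewise function. This is only a sketch: the function name and all parameter values are illustrative assumptions taken from the examples in the text (1% first-presentation CTR, a lift somewhere between 1.3 and 2.0, a plateau ending at the sixth presentation, a 0.9 decay factor), not fitted values.

```python
def expected_ctr(n, first_ctr=0.01, lift=1.5, plateau_end=6, decay=0.9):
    """Illustrative CTR for the n-th presentation (1-based).

    All defaults are assumptions chosen to match the examples in the text.
    """
    if n < 1:
        raise ValueError("presentation numbers start at 1")
    if n == 1:
        # First presentation: baseline click rate.
        return first_ctr
    plateau_ctr = first_ctr * lift
    if n <= plateau_end:
        # Presentations 2..plateau_end: roughly constant, lifted CTR.
        return plateau_ctr
    # After the plateau each presentation keeps `decay` of the previous one.
    return plateau_ctr * decay ** (n - plateau_end)


# CTR for the first 14 presentations under the assumed parameters.
curve = [expected_ctr(n) for n in range(1, 15)]
```

In practice the plateau length and the decay factor would have to be estimated per placement, since both appear to depend on how visible the content is.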

Modeling Options

Now that we know the empirical data, we can propose modeling techniques that will correctly predict the likelihood of a click.

Use presentation number as an input to the predictive model

Probably the most straightforward approach is to add the presentation number as an input to the predictive model. While this is certainly a simple solution, it carries with it several problems, among them:

  1. If the model learns on each case, repeated non-clicks for the same content will disproportionately reinforce what the model believes about the non-clicker. That is, one person who does not click across 200 presentations of an offer may carry the same weight as 100 other people who, on average, click on the second presentation.
  2. The effect of the presentation number is not a customer characteristic or a piece of contextual data about the interaction with the customer, but it is contextual data about the content presented.
  3. Models tend to underestimate the effect of the presentation number.

For these reasons it is not advisable to use this approach when the average number of presentations of the same content to the same person is above 3, or when the presentation number can become very large, in the tens or hundreds.
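The first problem in the list above can be illustrated with made-up data: one visitor who ignores 200 presentations, and 100 visitors who each click on their second presentation. The data and both helper functions are hypothetical, and plain click rates stand in for a real predictive model.

```python
from collections import defaultdict


def event_level_ctr(events):
    """Click rate when every presentation counts as one learning case."""
    return sum(clicked for _, clicked in events) / len(events)


def visitor_level_ctr(events):
    """Click rate when every visitor counts once, whatever their volume."""
    per_visitor = defaultdict(list)
    for visitor, clicked in events:
        per_visitor[visitor].append(clicked)
    rates = [sum(c) / len(c) for c in per_visitor.values()]
    return sum(rates) / len(rates)


# Hypothetical data: one heavy non-clicker with 200 presentations...
events = [("heavy", 0)] * 200
# ...and 100 visitors who skip the first presentation and click the second.
for i in range(100):
    events += [(f"v{i}", 0), (f"v{i}", 1)]

print(event_level_ctr(events))    # 100 clicks over 400 presentations
print(visitor_level_ctr(events))  # average of 101 per-visitor rates
```

The single non-clicker contributes 200 rows, as many as the other 100 visitors combined, so the per-presentation estimate is pulled down far more than the per-visitor one.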

Use presentation number as a partitioning attribute to the predictive model

In this approach we essentially build a separate predictive model for each presentation number. This approach overcomes all of the problems in the previous approach. Nevertheless, it can be applied only when the volume of data is large enough for these very specific sub-models to converge.
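A minimal sketch of the partitioning idea follows, with a plain click counter standing in for each sub-model. The class name, the `min_samples` threshold and the fallback to a pooled estimate are assumptions made for illustration; they are not part of any particular product API.

```python
from collections import defaultdict


class PartitionedClickModel:
    """One click-rate estimate per presentation number (hypothetical sketch)."""

    def __init__(self, min_samples=1000):
        if min_samples < 1:
            raise ValueError("min_samples must be at least 1")
        self.min_samples = min_samples
        self.shown = defaultdict(int)
        self.clicked = defaultdict(int)

    def learn(self, presentation_number, was_clicked):
        # Each observation updates only the partition it belongs to.
        self.shown[presentation_number] += 1
        self.clicked[presentation_number] += int(was_clicked)

    def predict(self, presentation_number):
        n = self.shown.get(presentation_number, 0)
        if n >= self.min_samples:
            # Enough data: use the partition's own estimate.
            return self.clicked[presentation_number] / n
        # Too little data in this partition: fall back to the pooled rate.
        total = sum(self.shown.values())
        if total == 0:
            return None
        return sum(self.clicked.values()) / total
```

The fallback makes the data-volume constraint visible: partitions for high presentation numbers see few cases, so they rarely reach the threshold and end up answering with the pooled rate.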

In the next couple of entries we will explore other solutions and a proposed modeling framework.


Great post! Using presentation number as a partitioning attribute sounds like a good idea, but I see two possible snags.

Firstly, as with using this data as input to the predictive model, this number "is not a customer characteristic or a piece of contextual data about the interaction with the customer, but it is contextual data about the content presented." Can we partition the models based on attributes of a choice, or would this mean we would need to partition the models for each observed value of the number for each choice available? The number of partitions might be enormous in the latter case.

Secondly, apart from the volumes of data required for the models to converge, I think we should also consider the effects of this partitioning on memory usage.

Could you please elaborate on these? Thanks.

Posted by Lukas Vermeer on December 11, 2011 at 09:00 PM PST #

You are right. The problems that you mention are real ones.

You can partition by an attribute of a choice, but then you have to make it different for each choice, so what you need to do is to copy the attribute of the choice into the session and call "learn" explicitly. Then change for the next choice and call learn again.

I agree with you regarding the memory usage and there is the additional problem of statistical coverage. So overall it is not a good idea. In the next entries we will be exploring other options to solve the problem.

Posted by Michel Adar on December 12, 2011 at 02:29 AM PST #


Issues related to Oracle Real-Time Decisions (RTD). Entries include implementation tips, technology descriptions and items of general interest to the RTD community.

