Short Season, Long Models - Dealing with Seasonality
By Michel Adar on Nov 17, 2011
Accounting for seasonality presents a challenge for the accurate prediction of events. Examples of seasonality include:
· Boxed cosmetics sets are more popular during Christmas. They sell at other times of the year, but their sales rise more than those of other products during the holiday season.
· Interest in a promotion rises around the time its TV advertising airs.
· Interest in the Sports section of a newspaper rises when there is a big football match.
There are several ways of dealing with seasonality in predictions.
If the length of the model time windows is short enough relative to the seasonality effect, then the models will see only seasonal data, and therefore will be accurate in their predictions. For example, a model with a weekly time window may be quick enough to adapt during the holiday season.
In order for time windows to be useful in dealing with seasonality it is necessary that:
- The time window is significantly shorter than the season itself
- There is enough volume of data in the short time windows to produce an accurate model
An additional issue to consider is that sometimes the season may have an abrupt end, for example the day after Christmas.
If the information is available, the seasonality effect can be included in the input data for the model. For example, the customer record may include a list of all the promotions advertised in the area of residence.
A model with these inputs will have to learn the effect of the input. It can learn the effect specific to each promotion, and along the way learn about inter-promotion cross feeding, by leaving the list of ads as it is; or it can learn the general effect by using a flag that indicates whether the promotion is being advertised.
For inputs to properly represent the effect in the model it is necessary that:
- The model sees enough events with the input present. This can happen because the model lifetime (or time window) is long enough to cover several “seasons”, or because there is enough volume for the model to learn the seasonality quickly.
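The two ways of presenting the advertising input described above can be sketched as follows. This is only an illustration: the function name, the promotion names and the dictionary layout are assumptions made for the example, not part of any particular product.

```python
def promotion_features(advertised, all_promotions, target):
    """Return two alternative encodings of the advertising input.

    advertised: promotions currently advertised in the customer's area
    all_promotions: every promotion the model knows about
    target: the promotion whose acceptance is being predicted
    (All three names are hypothetical, chosen for this sketch.)
    """
    advertised = set(advertised)
    # Specific encoding: one indicator per promotion, so the model can
    # also pick up cross effects between promotions.
    per_promotion = {p: int(p in advertised) for p in all_promotions}
    # General encoding: a single flag saying whether the target
    # promotion is being advertised right now.
    flag = int(target in advertised)
    return per_promotion, flag


per_promotion, flag = promotion_features(
    advertised=["gift_set", "free_shipping"],
    all_promotions=["gift_set", "free_shipping", "loyalty_card"],
    target="gift_set",
)
```

The specific encoding needs more data, since each promotion must be seen often enough on its own; the single flag learns faster but cannot distinguish between promotions.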
If we create a model that ignores seasonality, it is possible to use that model to predict how a specific person's likelihood differs from the average. If we have a divergence from the average, then we can transfer that divergence proportionally to the observed frequency at the time of the prediction.
Ft = trailing average frequency of the event at time “t”. The average is taken over a period long enough to achieve a statistically significant estimate.
F = average frequency as seen by the model.
L = likelihood predicted by the model for a specific person
Lt = predicted likelihood proportionally scaled for time “t”.
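One possible way to maintain Ft is a trailing average over the most recent events. The sketch below assumes a window counted in number of events rather than in elapsed time, and the class name is hypothetical; a time-based window would work equally well.

```python
from collections import deque


class TrailingFrequency:
    """Trailing average frequency over the last `window` events (assumed design)."""

    def __init__(self, window):
        # deque with maxlen silently drops the oldest outcome when full
        self.outcomes = deque(maxlen=window)

    def record(self, happened):
        # Store 1 when the event occurred (e.g. offer accepted), else 0
        self.outcomes.append(1 if happened else 0)

    def value(self):
        # No estimate is available before the first observation
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)


tracker = TrailingFrequency(window=4)
for happened in [True, False, False, False, True, True]:
    tracker.record(happened)
Ft = tracker.value()  # average over the last four outcomes only
```

The window size trades responsiveness against noise: a short window follows the season closely but gives a less reliable estimate.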
If the model is good at predicting deviation from average, and this holds over the interesting range of seasons, then we can estimate Lt as:
Lt = L * (Ft / F)
The likelihood L can be written as its deviation from the average plus the average:
L = (L – F) + F
Substituting we get:
Lt = [(L – F) + F] * (Ft / F)
Which simplifies to:
(i) Lt = (L – F) * (Ft / F) + Ft
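As a quick check of (i), here is a minimal sketch in Python. The function name and the sample numbers are made up for illustration; the arguments follow the definitions of L, F and Ft given above.

```python
def scale_likelihood(L, F, Ft):
    """Formula (i): Lt = (L - F) * (Ft / F) + Ft."""
    if F <= 0:
        raise ValueError("F must be positive to form the ratio Ft / F")
    return (L - F) * (Ft / F) + Ft


# Hypothetical numbers: the model saw a 2% average frequency, predicts 3%
# for this person, and the current trailing frequency is 4%.
Lt = scale_likelihood(L=0.03, F=0.02, Ft=0.04)  # about 0.06

# The expanded form agrees with the starting point Lt = L * (Ft / F).
direct = 0.03 * (0.04 / 0.02)
```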
This last expression can be understood as “The adjusted likelihood at time t is the average likelihood at time t plus the effect from the model, which is calculated as the difference from average times the ratio of the frequencies”.
The formula above assumes a linear translation of the proportion. It is possible to generalize the formula using a factor which we will call “a” as follows:
(ii) Lt = (L – F) * (Ft / F) * a + Ft
It is also possible to use a formula that does not scale the difference, like:
(iii) Lt = (L – F) * a + Ft
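The two variants can be compared side by side with a small sketch. The function names and the value chosen for “a” are assumptions for illustration only; in practice “a” would have to be fitted to observed data.

```python
def scale_ii(L, F, Ft, a=1.0):
    """Formula (ii): scaled difference, damped or amplified by the factor a."""
    return (L - F) * (Ft / F) * a + Ft


def scale_iii(L, F, Ft, a=1.0):
    """Formula (iii): unscaled difference times a, added to Ft."""
    return (L - F) * a + Ft


# Hypothetical numbers: F = 2%, L = 3%, Ft = 4%, a = 0.5
lt_ii = scale_ii(0.03, 0.02, 0.04, a=0.5)    # about 0.05
lt_iii = scale_iii(0.03, 0.02, 0.04, a=0.5)  # about 0.045
```

With a = 1, (ii) is the same as (i), while (iii) simply shifts the model's deviation onto the current frequency.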
While these formulas seem reasonable, they should be taken as hypotheses to be tested against empirical data. A theoretical analysis provides the following insights:
- The Cumulative Gains Chart (lift) should stay the same, as at any given time the order of the likelihood for different customers is preserved (for (ii) and (iii), provided “a” is positive)
- If F is equal to Ft then (i) reverts to “L”, as do (ii) and (iii) when “a” is 1
- If (Ft = 0) then Lt in (i) and (ii) is 0
- It is possible for Lt to be above 1
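These properties are easy to check numerically for (i). The sketch below uses made-up scores and frequencies; the function name is hypothetical.

```python
def scale_likelihood(L, F, Ft):
    """Formula (i): Lt = (L - F) * (Ft / F) + Ft."""
    return (L - F) * (Ft / F) + Ft


# Hypothetical setting: the model average is 30% and the current frequency is 60%.
F, Ft = 0.30, 0.60
scores = [0.10, 0.30, 0.45, 0.60]  # model likelihoods, already in ascending order
scaled = [scale_likelihood(L, F, Ft) for L in scores]

# The ranking of customers is unchanged, so the gains chart is unchanged.
order_preserved = scaled == sorted(scaled)

# The highest score is pushed above 1 (0.60 doubles to about 1.2).
exceeds_one = [s for s in scaled if s > 1]

# With F equal to Ft the formula gives back L; with Ft = 0 it gives 0.
same_season = scale_likelihood(0.45, F, F)
no_events = scale_likelihood(0.45, F, 0.0)
```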
To avoid going over 1 when the base frequency is relatively high, the multiplicative factor can be given a relative interpretation.
For example, if we say that Y is twice as likely as X, then we can interpret this sentence as:
- If X is 3%, then Y is 6%
- If X is 11%, then Y is 22%
- If X is 70%, then Y is 85% - in this case we interpret “twice as likely” as “half as likely to not happen”
Applying this reasoning to (i), for example, we would get:
If (L < F) or (Ft < 1 / ((L / F) + 1))
Then Lt = L * (Ft / F)
Else Lt = 1 – (F / L) + (Ft * F / L)
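The rule can be sketched in Python as below. The function name is hypothetical, and the sample values simply reuse the “twice as likely” example, with L/F = 2 playing the role of the factor and Ft the role of X.

```python
def scale_bounded(L, F, Ft):
    """Scale L to time t, switching to the 'less likely to not happen'
    interpretation when plain multiplication would grow too large."""
    if F <= 0:
        raise ValueError("F must be positive")
    if L < F or Ft < 1 / ((L / F) + 1):
        # Plain proportional scaling, the same as formula (i)
        return L * (Ft / F)
    # Here L >= F > 0: divide the chance of the event NOT happening by L / F
    return 1 - (F / L) + (Ft * F / L)


# L / F = 2, i.e. the person is "twice as likely" as average.
low = scale_bounded(L=0.2, F=0.1, Ft=0.03)   # about 0.06
mid = scale_bounded(L=0.2, F=0.1, Ft=0.11)   # about 0.22
high = scale_bounded(L=0.2, F=0.1, Ft=0.70)  # about 0.85
```

The threshold Ft = 1 / ((L / F) + 1) is the point where both branches give the same value, so the result is continuous in Ft, and the second branch never exceeds 1 as long as Ft does not.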