Some familiarity with the basic concepts of time series forecasting will allow the reader to better follow this tutorial, although advanced knowledge is not required. For a good introduction to the basics of forecasting, see this tutorial and this tutorial.
To follow the example, the reader should also be familiar with basic R syntax. R packages needed: forecast, prophet, bsts, ggplot2, repr, and zoo.
In this overview, we introduce decomposition-based approaches to time series forecasting. Decomposition-based methods are a simple but robust approach to modeling and forecasting time series data. The basic idea behind these methods is that they explicitly model the data as a combination of trend, seasonal, and remainder components, rather than trying to capture temporal dependencies and auto-correlations in the data the way ARIMA or GARCH models do.
Decomposing a time series involves splitting it into three or four components, in the form of:

Y(t) = T(t) + S(t) + R(t) + ε(t)

(Note that this is an additive decomposition; we will deal with the multiplicative case later.)

With:
Y(t): the modelled/forecast value at time t
T(t): the trend component at time t
S(t): the seasonal component at time t
R(t): the remainder at time t
ε(t): the error term
Time series decomposition is usually presented as an analysis step to be performed before generating predictions, but it can also be used as a forecasting method in and of itself if you know what the structure of your time series will look like beforehand. These scenarios frequently occur in a business context such as retail demand forecasting where it is safe to assume for some products that the sales data will have a yearly seasonal pattern and a year-over-year trend.
To forecast a time series using a decomposition model, you calculate the future values for each separate component and then add them back together to obtain a prediction. The challenge then simply becomes finding the best model for each of the components.
In the following overview, we will present three approaches to forecasting using decomposition with R: Seasonal and Trend decomposition using LOESS, Bayesian structural time series, and Facebook Prophet.
First, we will decompose the time series and forecast it using each of the three methods. Then, we will work on improving the accuracy of the forecasts.
Load The Required Libraries And Settings
# Call the required libraries
library(forecast)
library(prophet)
library(bsts)
library(ggplot2)
library(repr)
library(zoo)

#Choose a theme for clear and consistent data plots (theme_bw() is one reasonable choice)
theme_set(theme_bw())
Load The Data
We will use the air passengers data set, a classic data set for benchmarking time series models first introduced by Box and Jenkins in 1976 (it is to time series forecasting what the Iris dataset is to classification and regression algorithms). In particular, the air passenger time series has a very clear trend and seasonal pattern, so it is perfect for testing decomposition methods. This data set is available as part of the base R installation, but it can also be downloaded here (if downloading, you'll have to use something like read.csv() to load the data and format it as a time series object).
#Load the AirPassengers data
data("AirPassengers")
#Split the data into test and train
train <- window(AirPassengers, end = c(1958, 12))
test <- window(AirPassengers, start = c(1959, 1), end = c(1960,12))
Decomposition and Forecasting
Seasonal And Trend Decomposition Using LOESS (STL)
STL was proposed by Cleveland et al. in 1990. It uses the LOESS (LOcal regrESSion) method to model the trend and seasonal components using polynomial regression. The algorithm works by running two loops:
An outer loop where robustness coefficients are calculated to minimize the effect of outliers.
An inner loop where the trend component and seasonal component are iteratively updated using LOESS smoothing.
The algorithm can be set to run for a fixed number of iterations for each loop, or until a specific convergence criterion is met (in the R versions of STL presented here, we will use the default fixed number of iterations, which is 2). In addition, we can specify whether we want the seasonal component to remain constant over time or to change (in R, this is controlled by the s.window parameter).
To forecast with STL, we first use STL to decompose the time series into three components:

Y(t) = T(t) + S(t) + R(t)

We then apply a standard forecasting algorithm, such as ARIMA or exponential smoothing, to the remainder R(t) and generate an h-step ahead forecast R(t + h).

Lastly, we calculate the h-step ahead trend component T(t + h) and seasonal component S(t + h), and sum all three components to obtain the final forecast.
If we want to use STL for analysis only, then the stl() function that comes with the base R installation is sufficient. To use STL for forecasting, however, it is easier to use stlf() from the forecast package, which uses the original stl() function for decomposition but also allows us to specify which algorithm to use for forecasting the remainder.
To perform an STL decomposition, you can run:
#Decompose the time series
AirPassengersSTL <- stl(AirPassengers, s.window="periodic", robust=TRUE)
Note that you don't need to pass the number of seasons to the stl() function; it picks it up from the frequency defined in the original time series object AirPassengers.
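You can check that frequency directly; AirPassengers ships with base R, so this runs anywhere:

```r
# The frequency attribute of the ts object tells stl() how many
# observations make up one seasonal cycle (12 for monthly data)
frequency(AirPassengers)
# [1] 12
```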
To generate forecasts, you can use stlf(), which allows you to specify which algorithm to use for forecasting the remainder. In this case, we will use ARIMA, but we don't need to specify the ARIMA orders (p, d, q); the forecast method will find the optimal orders on its own.
#Train the STL model, using ARIMA to forecast the remainder
AirPassengersSTL_ARIMA <- stlf(train, method="arima")
#Plot the results
autoplot(train, ylab = 'Passengers') + scale_x_yearmon() +
  autolayer(test, series = "Test Data") +
  autolayer(ts(AirPassengersSTL_ARIMA$mean, frequency = 12, start = c(1959, 1)), series = "STL + ARIMA Forecasts")
Bayesian Structural Time Series
Proposed by Scott and Varian in 2013, Bayesian structural time series (BSTS) is a powerful set of methods that cover a large class of time series models using the State Space representation of time series and Bayesian statistics.
In the State Space approach, a time series is written as a set of two equations: a state equation describing the overall evolution of the system in terms of unobserved states, and an observation (or measurement) equation describing the relationship between the observable variables and the hidden states.
α(t + 1) = F α(t) + η(t): the state equation, with α(t) the state of the system at time t and F the state transition matrix.

Y(t) = Z(t) α(t) + ε(t): the observation equation relating the values of the time series to the hidden states.
It is straightforward to rewrite the trend and seasonal decomposition of a time series in this form:

Y(t) = T(t) + S(t) + ε(t)

(A BSTS model can also include a set of external regressors βX(t), although we don't do so here.)

In state space form, the observation equation uses:

α(t) = [T(t) S(t)]ᵀ and Z(t) = [1 1]

And the state equation is:

T(t) = T(t - 1) + δ(t - 1) + u(t)
δ(t) = δ(t - 1) + v(t)
S(t) = -Σᵢ S(t - i) + w(t)

with δ(t) the slope of the trend, S(t - i) a set of seasonal dummy variables used to model seasonality (the sum runs over the previous seasons of one cycle), and η(t) = (u(t), v(t), w(t)) the vector of state disturbances.
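In the bsts package, this model maps directly onto AddLocalLinearTrend() (the trend states T(t) and δ(t)) and AddSeasonal() (the seasonal dummies). A minimal sketch of fitting it on the training window and forecasting the test period; the number of MCMC iterations (niter = 500) is an arbitrary choice for illustration:

```r
library(bsts)

# Training window, as defined earlier
train <- window(AirPassengers, end = c(1958, 12))

# State specification: local linear trend plus 12-month seasonal dummies
ss <- AddLocalLinearTrend(list(), train)
ss <- AddSeasonal(ss, train, nseasons = 12)

# Fit the structural model by MCMC
AirPassengersBSTS <- bsts(train, state.specification = ss, niter = 500)

# Forecast 24 months ahead (the length of the test set);
# the point forecasts are the posterior means in $mean
BSTSForecast <- predict(AirPassengersBSTS, horizon = 24)
```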
Facebook Prophet

Prophet is an open-source forecasting method developed at Facebook. It is essentially a curve-fitting approach, very similar in spirit to how BSTS models trend and seasonality, except that it uses generalized additive models instead of a state space representation to describe each component.
Similar to the other two approaches, a basic Prophet model is written as:

Y(t) = T(t) + S(t) + H(t) + ε(t)

with T(t) the trend, S(t) the seasonality, and H(t) an additional component representing holiday effects.
The trend is modeled either as a logistic growth model for time series with saturating growth:

T(t) = C / (1 + exp(-k(t - m)))

with C the capacity (i.e. the maximum level of the time series), k the growth rate, and m an offset parameter.
Or as a piecewise linear model for unbounded growth:

T(t) = kt + m

Changes in trend are modeled using change points in the growth rate k.
The seasonal component is modeled using a Fourier series:

S(t) = Σ_{n=1}^{N} (a_n cos(2πnt / P) + b_n sin(2πnt / P))

with P the period of the time series (365.25 days for yearly seasonality, 7 days for weekly seasonality, etc.) and a_n, b_n the coefficients to be estimated.
The holiday component H(t) is modeled as a regression on the specific holiday dates (which would have to be provided separately, although we won't cover that here).
Facebook Prophet works great out of the box and is very intuitive, especially for non-specialists with no time series or data science training, but it has very rigid requirements for how the data should be formatted: the time series needs to be passed in as a data frame with a column 'ds' for the dates and a column 'y' for the values.
So the first step in training a Prophet model will be to format the data properly:
#Create a data frame in the right format for Prophet
FBTrain <- data.frame(ds = as.Date(as.yearmon(time(train))), y=as.matrix(train))
To fit a Prophet model to the data:
#Generate a model and then generate forecasts
m <- prophet(FBTrain,yearly.seasonality=TRUE)
Initial log joint probability = -2.56207
Optimization terminated normally:
Convergence detected: absolute parameter change was below tolerance
Prophet also requires that a data frame of future dates be defined before generating forecasts:
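The calls below sketch that step, repeating the model fit so the snippet is self-contained (24 monthly periods are used to cover the two-year test set):

```r
library(prophet)
library(zoo)

# Recreate the training data frame in Prophet's required format
train <- window(AirPassengers, end = c(1958, 12))
FBTrain <- data.frame(ds = as.Date(as.yearmon(time(train))), y = as.matrix(train))
m <- prophet(FBTrain, yearly.seasonality = TRUE)

# Build the future date frame extending 24 months past the training data
future <- make_future_dataframe(m, periods = 24, freq = 'month')

# Generate forecasts; point forecasts are in the 'yhat' column
forecast <- predict(m, future)
```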
The forecasts that we have generated so far using these three approaches are "OK": they follow the general trend of the time series and replicate its seasonal pattern, but they fail to capture the increase in the magnitude of the seasonal variation over time. This is because the air passengers series has a multiplicative rather than an additive seasonal pattern. In the next section, we will try to overcome this difficulty by applying a log transform to the data so that the multiplicative seasonality can be represented by an additive model. We will rerun each of the forecasting methods on the log-transformed data, then reverse transform the resulting forecasts and compare them to the original data.
Improving The Forecasts

#Plot the log transformed data - you can see that the seasonality is more additive now
autoplot(log10(train), ylab = 'Log10(Passengers)')
#Create a data frame in the right format for Prophet
FBTrain_log <- data.frame(ds = as.Date(as.yearmon(time(train))), y = as.matrix(log10(train)))
#Generate a model and then generate forecasts
m <- prophet(FBTrain_log, yearly.seasonality = TRUE, seasonality.prior.scale = 2)
future <- make_future_dataframe(m, periods = 24, freq = 'month')
forecast <- predict(m, future)
Initial log joint probability = -2.08326
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
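Because this model was fit on log10-transformed data, its forecasts come back on the log scale and must be reverse transformed (10^yhat) before comparing with the original series. A sketch, repeating the fitting calls above so it runs on its own:

```r
library(prophet)
library(zoo)

# Refit the Prophet model on the log10-transformed training data
train <- window(AirPassengers, end = c(1958, 12))
FBTrain_log <- data.frame(ds = as.Date(as.yearmon(time(train))), y = as.matrix(log10(train)))
m <- prophet(FBTrain_log, yearly.seasonality = TRUE, seasonality.prior.scale = 2)
future <- make_future_dataframe(m, periods = 24, freq = 'month')
forecast <- predict(m, future)

# Undo the log10 transform; the last 24 rows of 'forecast' are the test period
FBForecasts_log <- 10^tail(forecast$yhat, 24)
```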
In all three cases, using the log transform improved the accuracy of the forecasts, with BSTS appearing to perform better than the other two methods.
We didn't mention the question of stationarity, which usually comes up when dealing with ARMA and ARIMA models. Stationarity is not an explicit concern for decomposition-based methods; it is dealt with implicitly by modeling the trend and seasonal components separately from the remainder.
We didn't mention forecast intervals in this overview—we only generated point forecasts. In many business scenarios, such as demand forecasting and supply chain planning, the forecast intervals are just as important as the point forecasts. Each of the R functions used allows you to generate forecast intervals as well.
We informally evaluated the performance of the forecast methods based on plots of forecasts vs. actuals for hold-out test data. A more rigorous analysis would require calculating forecast error metrics such as MAPE and RMSE.
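For example, the accuracy() function in the forecast package computes RMSE, MAPE, and other error metrics against a hold-out set. A sketch for the STL + ARIMA model (repeating the fit so it runs standalone):

```r
library(forecast)

train <- window(AirPassengers, end = c(1958, 12))
test <- window(AirPassengers, start = c(1959, 1), end = c(1960, 12))

# Fit STL + ARIMA and forecast the 24-month test period
fit <- stlf(train, method = "arima", h = 24)

# Error metrics on both the training set and the hold-out test set;
# the output includes RMSE and MAPE columns
accuracy(fit, test)
```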
In the air passengers data set, there is only one yearly seasonal component, but many business time series have multiple seasonalities. For example, a restaurant customer time series will have daily seasonality (with peaks at lunch time and dinner time) and weekly seasonality (fewer customers on weekends). STL cannot handle more than one seasonal component. Prophet can handle multiple seasonalities although, as mentioned above, it requires specific date formats. BSTS can handle multiple seasonalities by using a Fourier series to represent the seasonal component, the same way Prophet does (call AddTrig instead of AddSeasonal).
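As a sketch of that BSTS option, AddTrig() replaces the seasonal dummies with a Fourier-series state (shown here on the monthly training data with a single 12-month cycle and its first three harmonics; for truly multi-seasonal data you would add one AddTrig() call per period):

```r
library(bsts)

train <- window(AirPassengers, end = c(1958, 12))

# Trigonometric (Fourier-series) seasonality instead of seasonal dummies
ss <- AddLocalLinearTrend(list(), train)
ss <- AddTrig(ss, train, period = 12, frequencies = 1:3)

model <- bsts(train, state.specification = ss, niter = 500)
```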
STL in its basic form doesn't allow for holiday effects, although the version of STL available in the R forecast package can take holidays as exogenous variables, passed to the forecast method used for modeling the remainder.
Prophet and BSTS can both handle holiday effects. In Prophet, this is explicit and all you need to do is provide a data frame with the holiday dates. In BSTS, you need to model the holidays as a set of external regressors.
BSTS' full potential is realized when we add data beyond the time series and holiday dates. For the air passenger data, for example, we could add factors such as economic conditions, airline marketing data, or the number of internet queries for airplane trips, and use Bayesian modeling to improve the accuracy of the forecasts. In particular, BSTS is very useful for "nowcasting": predicting the values of a time series in the present. In the air passenger example, it takes the airlines several months to compile and release the data, so the value for the current month isn't known until several months into the future. If you needed to forecast the number of air passengers for the current month and couldn't afford to wait, you could use the above-mentioned external data along with the historical values of air passenger traffic to "nowcast" the current month.
In this overview, we introduced three decomposition-based approaches to forecasting and showed how they can be used as simple, easily interpretable, but also robust methods for univariate time series forecasting.
Cleveland et al., "STL: A Seasonal-Trend Decomposition Procedure Based on Loess", 1990.
Scott & Varian, "Predicting the Present with Bayesian Structural Time Series", 2013.