Do you ever feel like you're living on the edge with your forecasts? With advanced analytics being built into more and more of the tools we rely on for everyday decision making, are you just the slightest bit uncertain about what you've gotten yourself into? And are you ready to be grilled on your results?
"What does the forecast for the quarter look like?" You: No worries, I've got this.
"Why does the forecast look like that?" You: I know my business (and I sure hope the data backs me up).
"Is this adjusted for seasonality? Have you reviewed the outliers?" You: Whoa now, say what?
"What algorithm did you use for this forecast?" You: I'm gonna have to get back to you on that.
In a world of automatic advanced analytics, sometimes we take the power for granted and forget that "with great power comes great responsibility." (Yeah... I stole that from Spider-Man.) So, let's make sure we know our stuff.
Say you want to forecast order volumes for the next three months. It's easy. All you need to do is: 1) Open Oracle Analytics Cloud; 2) Click on your sales data (or my sales data -- link to data); 3) Drag Quantity Ordered onto the canvas; 4) Drag Order Date > Month onto the canvas.
Toss on a filter (by dragging columns to the top bar) for individual product categories and products. We'll look at the following area for now: Product Category=Furniture, Product Sub Category=Bookcases.
Now, let's flex our analytic muscle with a handy forecast. You can get to this in one of two ways.
1) Hover over the line, right-click, and select Add Statistics > Forecast.
2) From the Data Panel on the left side, select the Analytics icon, expand Overlay & Projection, select Forecast, and drag it onto the canvas.
Whether you went left or right, now you should see some forecast values. So now what?
You've successfully executed a time series forecast for order quantity by month. But can you explain it?
What kind of algorithm has been applied? Is it the right one?
If you look at the lower left corner, you will see the chart properties pane, which shows the settings that were automatically selected for the forecast you just added.
In the pane, you see the following information:
What does all that mean?
ARIMA stands for Autoregressive Integrated Moving Average.
ETS stands for Exponential Triple Smoothing.
Not exactly the help you were looking for?
What do you need to know about ARIMA and ETS?
Both algorithms are used to predict the future value of a variable based on the historic values you provide in your dataset. You need a reasonable amount of data (sample size) to work from, and common sense should apply. For example, it rained yesterday, but does that mean it will rain today? You looked at last week's sales, but does that give you an idea of how the year will go? No, in both cases! You need data that represents at least 10-12 periods like the period you are trying to forecast, and if you suspect there are cycles in your data, you will need enough data to cover several of those cycles.
Both algorithms can handle seasonality, i.e., cycles in the data, but again, you need enough data for it to be apparent that a pattern is part of a cycle, not just a fluctuation. Take shoe sales, for example. You wouldn't be surprised to sell fewer sandals in New York in January than in June, and to see that pattern occur every year.
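If you want a quick feel for whether a cycle is really there, one rough check is to correlate the series with a copy of itself shifted by one season. The sketch below is not what Oracle Analytics runs internally; it is a plain-Python illustration, and the sandal numbers and the `autocorrelation` helper are made up for the example.

```python
from statistics import mean

def autocorrelation(values, lag):
    """Sample autocorrelation of `values` with itself shifted by `lag` periods."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    avg = mean(values)
    denom = sum((v - avg) ** 2 for v in values)
    if denom == 0:
        return 0.0  # a flat series has no pattern to correlate
    num = sum((values[i] - avg) * (values[i - lag] - avg) for i in range(lag, n))
    return num / denom

# Hypothetical monthly sandal sales: low in winter, high in summer, three years running.
one_year = [10, 12, 20, 35, 50, 70, 75, 68, 45, 25, 14, 11]
sales = one_year * 3

# A clearly positive value at lag 12 suggests a yearly cycle;
# a negative value at lag 6 says January and July move in opposite directions.
print(round(autocorrelation(sales, 12), 2))
print(round(autocorrelation(sales, 6), 2))
```

With only one year of data, the lag-12 check cannot even be computed, which is the "enough data" point in practice.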
Time to Experiment
Let's add an old fashioned reference line based on the average of the values over time.
A reference line is simply a horizontal line plotted on the graph to indicate the average of all values.
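In code terms, the reference line boils down to a single number repeated across the chart. Here is a tiny sketch with made-up monthly order quantities (the values are hypothetical, chosen only for illustration).

```python
from statistics import mean

# Hypothetical monthly order quantities for one product sub-category.
monthly_orders = [48, 61, 52, 57, 66, 50, 44, 59, 63, 55, 47, 58]

# The reference line is one number: the average of every plotted value.
reference_line = mean(monthly_orders)

# The same value is drawn at every point on the x-axis, so the line is flat
# and takes no account of the order in which the values occurred.
line_points = [reference_line] * len(monthly_orders)
print(reference_line)
```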
The first forecast shows us probable values of 60-75 orders plotted on the line, with the shaded area showing the prediction interval, which extends significantly higher and lower to account for 95 percent of possible outcomes.
Based on the average, we could be looking at something in the area of 55 orders.
Now let's change the number of periods forecast.
When doing this, it's important to keep in mind how much data you have to work with. We have four years of data as granular as the day level (that means we should have enough month-level data points), so let's try forecasting six months.
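To sanity check the "enough month-level data points" claim on your own data, you could roll daily rows up to months and count them. This is a sketch under assumptions: the `(date, quantity)` row shape, the helper names, the sample dates, and the threshold of 12 months (my reading of the 10-12 periods guideline) are all illustrative.

```python
from collections import Counter
from datetime import date, timedelta

def monthly_totals(daily_rows):
    """Sum (order_date, quantity) rows into {(year, month): total}."""
    totals = Counter()
    for order_date, quantity in daily_rows:
        totals[(order_date.year, order_date.month)] += quantity
    return dict(sorted(totals.items()))

def has_enough_history(num_periods, min_periods=12):
    """Assumed rule of thumb: at least `min_periods` historic periods."""
    return num_periods >= min_periods

# Hypothetical data: one unit ordered every day for four years.
start = date(2015, 1, 1)
days = (date(2019, 1, 1) - start).days
daily_rows = [(start + timedelta(days=i), 1) for i in range(days)]

totals = monthly_totals(daily_rows)
print(len(totals), has_enough_history(len(totals)))
```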
What if we change the Prediction Interval?
Changing the Prediction Interval affects only the shaded area; the forecast line itself stays where it is. Set to 90 percent, the shaded area covers 90 percent of possible outcomes instead of 95 percent, so it is narrower.
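Why does a lower percentage give a narrower band? A simplified way to see it, assuming forecast errors are roughly normally distributed (an assumption made for this sketch, not a statement of how Oracle Analytics computes its intervals), is shown here. The forecast of 68 orders and the standard error of 10 are hypothetical.

```python
from statistics import NormalDist

def prediction_interval(point_forecast, std_error, coverage):
    """Symmetric interval around a forecast, assuming normal forecast errors."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be between 0 and 1")
    # z is about 1.96 for 95 percent coverage and about 1.64 for 90 percent.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    margin = z * std_error
    return point_forecast - margin, point_forecast + margin

# Hypothetical forecast of 68 orders with a standard error of 10.
low_95, high_95 = prediction_interval(68, 10, 0.95)
low_90, high_90 = prediction_interval(68, 10, 0.90)
print(round(low_95, 1), round(high_95, 1))
print(round(low_90, 1), round(high_90, 1))
```

The point forecast is the same in both calls; only the margin around it changes.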
Now we'll change the algorithm.
Let's try the ETS option instead of ARIMA; it's supposed to be good for data with "noise," meaning data that is all over the place. Leave the number of periods set to six and the prediction interval at 90 percent and change the model to ETS.
You may notice that this result actually appears to be more consistent in pattern with the historic data than the ARIMA results, but which one is right?
The irony of forecasting and prediction is that only time will tell (not comforting, I know).
Data scientists continuously evaluate and monitor the algorithms they use to make sure the prediction quality is not degrading. As your data becomes more complex, with more variables in play, or even just as more time passes, it becomes increasingly difficult to predict based on the data alone. This is where you and what you know come in.
For example:
So what can you do?
1. Think like a data scientist, pushing the buttons and pulling the levers to see which algorithm best fits your data. You can use the copy and paste functionality to replicate the same visualization multiple times so you can do a side-by-side comparison. You can also keep this on a separate canvas for your own "monitoring" purposes even after your task is complete.
2. Use all the tools in your box. I've shown you the various forecast algorithms and the reference line, but there's more in the box. Trend lines, clusters, and outliers are there just waiting for you to explore. For example, you may want to add an outlier visualization to help other users understand the data when you present it.
3. Context is king. Be descriptive so that people with whom you share this information get the big picture.
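One way to put the "which algorithm best fits your data" advice from the list above into practice is a holdout test: hide the last few months, forecast them, and score each method against what actually happened. The sketch below uses three deliberately simple stand-in forecasters rather than ARIMA or ETS, and the order quantities are hypothetical.

```python
from statistics import mean

def mean_forecast(history, horizon):
    """Flat forecast at the historic average (the reference line)."""
    return [mean(history)] * horizon

def seasonal_naive_forecast(history, horizon, season_length=12):
    """Repeat the value seen in the same month one season earlier."""
    base = len(history) - season_length
    return [history[base + (h % season_length)] for h in range(horizon)]

def exponential_smoothing_forecast(history, horizon, alpha=0.3):
    """Simple (single) exponential smoothing: flat forecast at the last level."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon

def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical monthly orders: a yearly cycle that grows by 2 units each year.
one_year = [10, 12, 20, 35, 50, 70, 75, 68, 45, 25, 14, 11]
orders = one_year + [v + 2 for v in one_year] + [v + 4 for v in one_year]

# Hold out the last six months and forecast them from the rest.
train, holdout = orders[:-6], orders[-6:]
scores = {
    "mean": mean_absolute_error(holdout, mean_forecast(train, 6)),
    "seasonal naive": mean_absolute_error(holdout, seasonal_naive_forecast(train, 6)),
    "exp. smoothing": mean_absolute_error(holdout, exponential_smoothing_forecast(train, 6)),
}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Rerunning the same comparison as new months arrive is a lightweight version of the monitoring mentioned earlier.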
At the end of the day, quality analysis relies on quality data and your expertise. We build the tools to help you do more, faster and smarter, because that's what augmented analytics is all about.
Guest author Rachel Bland is a director of product management for Oracle Analytics.
So, if we want to forecast at the month level, we can create a line graph with month of year as the categories and one line per year. This quickly shows whether there is seasonality: if there is, the lines follow a similar shape from month to month.
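Outside the tool, the same layout amounts to pivoting monthly totals into one row per year. A small sketch, assuming the totals are keyed by `(year, month)`; the helper name and the sample numbers are hypothetical.

```python
from collections import defaultdict

def lines_by_year(monthly_totals):
    """Map year -> list of 12 values, January first, None where a month is missing."""
    table = defaultdict(lambda: [None] * 12)
    for (year, month), total in monthly_totals.items():
        table[year][month - 1] = total
    return dict(sorted(table.items()))

# Hypothetical totals for a few months of two years.
totals = {
    (2017, 1): 40, (2017, 6): 75, (2017, 12): 42,
    (2018, 1): 44, (2018, 6): 81, (2018, 12): 45,
}

# Each printed row is one line on the chart; similar shapes hint at seasonality.
for year, values in lines_by_year(totals).items():
    print(year, values)
```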
We can also add a trend line to see whether the data has a trend, if it is not obvious.
It is also important to check for gaps in the series; missing days or weeks of data will affect the forecast.
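A gap check can be as simple as listing every calendar day between the first and last order date and seeing which ones never show up. This is a minimal sketch; the helper name and the sample dates are hypothetical, and it assumes you expect at least one row per day.

```python
from datetime import date, timedelta

def missing_days(observed_dates):
    """Return the calendar days between the first and last date with no data."""
    present = set(observed_dates)
    if not present:
        return []
    start, end = min(present), max(present)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [day for day in all_days if day not in present]

# Hypothetical order dates with a two-day hole.
observed = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 3),
            date(2018, 1, 6), date(2018, 1, 7)]
print(missing_days(observed))
```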
Overall it is just 10 minutes of extra work, but it will get you to the end quicker. You also get some statistics on the forecast if you use a histogram instead of a line, which tell you whether the forecast is more correct from a statistical point of view.