Being able to calculate period-to-period growth and growth rates is a common need in most analytics insight projects today. Oracle Analytics Cloud provides a robust semantic layer capability to do this with the Oracle Analytics metadata repository, where time hierarchies and rich time series calculations can be defined and modeled.
Often in a self-service analytics context, we quickly upload a file-based dataset and don't take the time to model a metadata layer on it. We directly run visualizations on the data, and we soon need to start defining calculations like time series, growth rates, etc. to extract deeper insights.
This blog highlights a simple technique to build any period-to-period calculations without building an Oracle Business Intelligence Repository (whose file extension is RPD) model with Oracle Analytics. There are several ways this can be achieved; this blog explains a direct way of running live calculations using a datasource join preparation. Other upcoming blog entries will detail other alternative techniques for similar calculations.
The following 6-minute video goes through the process of defining this, and there are some illustrated steps below as well.
Our data is an Excel dataset with a list of 10,000 orders over several years. Each order has a date and several attributes and metrics. One of the many attributes is the Product Category (three distinct values), and another is Customer Segment (four distinct values). We want to calculate year-over-year growth of monthly sales by Customer Segment by Product Category.
The calculation needed here must be able to properly aggregate sales by these attributes (month, segment, category), call the same value for last year, and then compile the growth rate. As we have no repository on our .xls upload, we have no semantic time hierarchy available. So, we cannot leverage the time series-specific calculations in Oracle Analytics.
To achieve this dynamic calculation, we can use a simple technique of duplicating the initial dataset in Oracle Data Visualization and joining the two duplicate sources with a Period <---> (Period -1) join. If we do this, the initial dataset will still show actual period in the project, and the duplicated joined dataset will provide the data for the previous period, at the right aggregation level.
First, let's start by duplicating the original dataset in Oracle Analytics Data Visualization environment. Note that this does not duplicate the source file or source table data, it only creates another entry in Oracle Analytics pointing to the same source data.
Once a dataset is duplicated, we need to create a custom “join-key” column in each dataset, which we will use to join the two datasets. That happens in the data preparation tab of data visualization. The column we create in our case is a “month-key” column. In the original dataset, it's simply a month sequence showing YEAR*100+MONTH_NUMBER. In the example below, we see a date such as 21-Jan-2012 shows a month key of 201201.
We create this column in both datasets, but the trick is to add or subtract the number 1 to the year number in one of the datasets. In my case, I will use the duplicated dataset to show year ago data, so I create a column in this dataset with a “+1” in the year number. I could just as well have created a “Year-1” column in the other dataset.
Now when we join both datasets via this month-key, records with, for example, year 2013 in the original dataset will join to the records with year 2012 in the Yago dataset (i.e., 2013 = 2012 + 1). We can join the datasets in the Data Visualization data prepare dialogue box.
Note that we join on the month key column; but we must also join on every other column on which we want our analysis to potentially aggregate—customer segment and product category in our case. If we don't join on these, our reports by customer segment and product category will not give detailed results.
Once this is done, we just need to build visualizations on the joined datasets. One more important point—by default, visualizations will show a join type (Data Blending icon, on each viz properties tab) of All Rows joining with All Rows (full outer joins). In our case, we need to set this to “Only Matching Rows” for the Yago dataset. This configuration will be needed for each visualization we build using the two datasets.
With this, we can now build all sorts of visuals and calculations using the Yago Sales metric. Because we specifically joined on month-key, customer segment, and product category, all these computations will aggregate correctly at the levels of month and above (quarter, year), customer segment and above, and product category and above.
This technique can be used in a similar way in Data Visualization Data Flows, but in this case, datasets do not need to be duplicated. Also, default joining in Data Flows there is not of type full outer. That simplifies the various visualization configuration processes.
To learn how you can benefit from Oracle Analytics, visit