What would you say if I told you that advanced data analysis can be as easy as the right-click of a mouse? Preposterous? Impossible?
Oracle Analytics Cloud offers a user-friendly method to leverage advanced analytics functions on a chart with a single mouse click. Having advanced analytical functions like forecast, cluster, and outliers provides a strong capability to business users who want to have better insights into their data.
Supporting advanced analytics functions with a simple click of the mouse means you don't have to see all the underlying code and math required to support these functions. Unlike writing a formula, this right-click capability helps you quickly enrich your visualizations with advanced analytics-based insights.
We'll be showing examples of capabilities such as forecast, clusters computation, outliers identification, trend lines, and reference lines below.
Advanced analytics capabilities in Oracle Analytics Cloud can be added to charts in two different ways:
1) The first is by right-clicking on the chart as seen in the graphics below:
2) The second is by visiting the bottom left "Chart Properties" pane as seen in the graphic below:
Now that you are familiar with how this function works, let's explore some ways in which you can expand the capabilities of Oracle Analytics Cloud by engaging that right-click on your mouse.
Forecasting is the process of making predictions about the future based on past and present data and most commonly by analysis of trends. In the example below, we have a report showing "Sales by Ship Date Month." Let's forecast those sales by adding it to the chart. The sales line extends to show a forecast using a distinguishable color.
Various properties of a forecast can be controlled from the Oracle Analytics Cloud data visualization properties pane. The number of periods to be forecast can be chosen by overriding the default, for example. Confidence areas can be edited or turned off by setting the prediction interval. Eventually, the forecasting model used to calculate a forecast can be chosen between three options:Choose the appropriate prediction time frame to get an estimate of an interval in which a future observation will fall, with a certain probability given what has already been observed, such as indicated in the graphic below.
The right-click option we describe here is the quick way to add a forecast to most of your analyses. If you wish to go deeper and intend to us that forecast in other visualizations, then you may as well create a custom calculation using the same forecast function. This will give you more control over the various options for the computation. Below is the syntax that you will use, as well as what each section represents.
FORECAST(numeric_expr, series, output_column_name, options, runtime_binded_options)
numeric _expr—represents the metric to be forecast.
series—is the time period for which the forecast model is built.
dimension columns—are the dimension grain you want to forecast on. If series is omitted, that is basically the grain of your data query.
output_column_name—is the output column. The valid values are:
options—is a string list of name=value pairs each separated by a semicolon (;). The values can be directly hardcoded values, or they can refer to a column. In this case, put %1 ... %N, for the value and it will reference the same sequenced column in runtime_binded_options.
runtime_binded_options—is an optional comma separated list of runtime binded columns or literal expressions.
Example with ARIMA: In this example, model type used is ARIMA, number of periods to forecast is six (6), while prediction interval is 70.
FORECAST(
revenue,
(time_year, time_quarter),
'forecast',
'modelType=arima;numPeriods=6;predictionInterval=70;'
)
Example with ETS: In the example, model type used is ETS, number of periods to forecast is six (6), while prediction interval is 70.
FORECAST(
revenue,
(time_year, time_quarter),
'forecast',
'modelType=ETS;numPeriods=6;predictionInterval=70;'
)
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group show more of a coherence and proximity to each other than to those in other groups. In the following example, we have a scatter plot where profit and sales are plotted against each other with all the scatter dots representing cities. The colors of the dots represent the various clusters that they are grouped into.
Properties of a cluster calculation can be controlled from the properties pane, such as identifying the number of different clusters. Choose the algorithm used for clustering between K-means clustering and Hierarchical clustering.
K-means clustering aims to partition "n" observations into "k" clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
Hierarchical clustering is a hierarchy of clusters built either by an agglomerative (bottom-up) or a divisive (top-down) approach.
As you can see below, the number of clusters can be chosen by entering an appropriate number.
As for other advanced analytics options, if you wish to dig deeper and have more control over how the cluster is being calculated, there is an option to create a calculated column using the cluster function. Here is the syntax that you will use, again with what each section represents.
CLUSTER((dimension_expr1 , ... dimension_exprN), (expr1, ... exprN), output_column_name, options, runtime_binded_options))
dimension_expr—represents a list of dimensions , e.g., (productID, companyID), to be the individuals clustered.
expr—represents a list of attributes or measures that make up the space (all the variables) that will be used to cluster the selected dimension_expr.
output_column_name—is the output column. The valid values are 'clusterId,' 'clusterName,' 'clusterDescription,' 'clusterSize,' 'distanceFromCenter,' 'centers.'
options—is a string list of name=value pairs each separated by ';'. The values can be directly hardcoded values, or they can refer to a column. In this case, put %1 ... %N, for the value and it will reference the same sequenced column in runtime_binded_options.
runtime_binded_options—is an optional comma separated list of runtime binded columns or literal expressions.
Example with Clusters: The following is the algorithm used showing the k-means with number of clusters; in this case we identified five without the use of random seed.
CLUSTER((product, company), (billed_quantity, revenue), 'clusterName,' 'algorithm=k-means;numClusters=5;maxIter=%2;useRandomSeed=FALSE;enablePartitioning=TRUE'))
Outliers are data records that are located the furthest away from the average expectation of individual values. For example, extreme values that deviate the most from other observations fall into this category. They may indicate variability in measurement, experimental errors, or a novelty. To the same report where the clusters are added, outliers are added and depicted as different shapes.
Scatter dots represented by circles in the graph below are outliers, while the ones represented by squares are non-outliers.
Using the right-click feature, outlier calculations also offer an option to choose between algorithms K-means and Hierarchical to be computed.
The right-click option is a quick way to add outliers to your analysis. If you wish to dig deeper and have more control over how the outlier is being calculated, there is an option to create a calculated column using the outlier function. Here is the syntax along with some definitions.
OUTLIER((dimension_expr1 , ... dimension_exprN), (expr1, ... exprN), output_column_name, options, [runtime_binded_options]))
dimension_expr—represents a list of dimensions , e.g., (Product, Department).
expr—represents a list of dimension attributes or measures to be used to find outliers.
options—is a string list of name=value pairs each separated by ';'. The values can be directly hardcoded values, or they can refer to a column. In this case, put %1 ... %N for the value and it will reference the same sequenced column in runtime_binded_options.
runtime_binded_options—is an optional comma separated list of runtime binded columns or literal expressions.
Example with Outliers: Here the algorithm used is mvoutlier.
OUTLIER((product, company), (billed_quantity, revenue), 'isOutlier,' 'algorithm=mvoutlier'))
Similar to adding a forecast for your metric, you can measure the trend of your metric by adding a trend line which indicates a general course of the metric in question. A trend line is a straight line connecting a number of points on a graph. It is used to analyze the specific direction of a group of value sets in a presentation. In this example, we have "Sales by Ship Date Month" and the trend line shows that sales are on an upward trajectory.
Configuration options of the trend line can be controlled in the options pane. The method for
the trend line has three options:
Confidence Interval
A confidence interval—as seen below—is a range of values, derived from sample statistics, which is likely to contain the value of an unknown population parameter. Because of their random nature, it is unlikely that two samples from a given population will yield identical confidence intervals.
Similar to forecast, clusters, and outliers, the trend line can also be calculated in a custom calculation:
TRENDLINE(numeric_expr, ([series]) BY ([partitionBy]), model_type, result_type)
numeric_expr—represents the data to trend. This is the Y-axis, and is usually a measure column.
series—is the X-axis. It is a list of numeric or time dimension attribute columns.
partitionBY—is a list of dimension attribute columns that are in the view but not on the X-axis.
model_type—is one of the following ('LINEAR', 'EXPONENTIAL').
result_type—is one of the following ('VALUE', 'MODEL'). 'VALUE' will return all the regression Y values given X in the fit. 'MODEL' will return all the parameters in a JavaScript Object Notation (JSON) format string.
Example with Trend Line: Trend of revenue by product using linear regression.
TRENDLINE(revenue, (calendar_year, calendar_quarter, calendar_month) BY (product), 'LINEAR,' 'VALUE')
Reference lines are vertical or horizontal lines in a graph, corresponding with user-defined values on the X-axis and Y-axis respectively.
By default, the reference line is of a type; line and the method chosen for the computation of the line can be chosen from the various options like average, minimum, maximum, and others. For example, in the airline industry, if passenger turnout is plotted to time, the reference would be able to guide users to know whether the passenger turnout for a particular month is above or below average.
The example below shows where the reference line is added on a report where we have Sales by Ship Data Month. The reference is plotted on the average value of the sales.
The other way to draw a reference line is called the band. The band provides two lines with a range of reference. The function can be chosen between custom and standard deviation, while the upper limit and lower limit of the reference line can be chosen between average, maximum, minimum, and others. For example, if we are analyzing sales by month and we have a reference band plotted from average to maximum, it would be easy to identify those months where sales were falling in the above average category but below the maximum band.
This function can also be chosen as standard deviation and you have an option to select the value of 1, 2, or 3. If you choose 1, then the band is drawn in such a way that 68 percent of values lie within the band. If you choose 2, then 95 percent of the values are within the band. Choosing 3 means 99.7 percent of values lie within the band.
To see a live project which showcases these advanced capabilities, please visit our Oracle Analytics Sandbox (Oracle account required).
To learn how you can benefit from the new Oracle Analytics, visit Oracle.com/Analytics, and don't forget to subscribe to the Oracle Analytics Advantage blog and get the latest posts sent to your inbox.