News and Views: Drive Smart Decisions with Cloud Analytics, Machine Learning and More

Smarter and More Efficient Dataflows in Oracle Analytics

Philippe Lions
Senior Director

In nature, water follows the path of least resistance. Think of how a mountain stream eventually flows down to the ocean. In business analytics, the concept is much the same. You want to make sure your data flows efficiently from one software component to the next as it moves toward its destination.

Toward that end, the latest version of Oracle Analytics Cloud has improved data flow capabilities and is more powerful and useful than ever. Let's take a look.

Data flows provide built-in functions, or nodes, to transform and enrich data: adding columns and calculations, removing columns, grouping, binning, training machine learning models, forecasting, and sentiment analysis.

In the latest version of Oracle Analytics, new features like dataset prompts, branching, incremental data processing, and output columns metadata management have been added to data flows. We will go through each of these new features in detail in this blog.

Dataset Prompts 

The dataset prompts feature lets users choose the input or output datasets of a data flow at run time. This is useful when a user wants to reuse a complex data flow with another dataset, or write the output to a dataset with a different name, without opening and editing the flow. In effect, the prompt option parametrizes the input and output datasets of a data flow. The option is disabled by default, and users must enable it by clicking on the prompt check box. The screenshot below shows how to enable prompts for data flows:

  • Name: This field takes the default dataset name as input. This should be the name of the actual dataset present in the instance.
  • Prompt: This field takes the prompt text to be shown when running the data flow. The prompt option is available for both input and output datasets of the data flow. If the prompt option is not selected, the data flow runs with the dataset that was selected when the data flow was created or last edited.

This screenshot shows how the prompt window looks when a data flow with prompts enabled is run:

To summarize, the prompt option adds a great deal of flexibility to running data flows by specifying input and output datasets on the fly without having to edit the data flow.
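
As a rough analogy only (this is not Oracle Analytics code or its API), the prompt behavior described above resembles a function whose dataset names have defaults that can be overridden at run time. Everything in this sketch is hypothetical: the `CATALOG` dictionary, the `run_data_flow` function, the "High Value" column, and all dataset names other than Sample Order Lines.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical in-memory stand-in for the datasets available in an instance.
CATALOG: Dict[str, List[Row]] = {
    "Sample Order Lines": [{"Segment": "Consumer", "Sales": 100.0}],
    "Order Lines 2019": [{"Segment": "Corporate", "Sales": 250.0}],
}

# Design-time choices, analogous to the Name field of each prompt.
DEFAULT_INPUT = "Sample Order Lines"
DEFAULT_OUTPUT = "Flagged Order Lines"


def run_data_flow(input_name: Optional[str] = None,
                  output_name: Optional[str] = None) -> str:
    """Run the same transformation; fall back to the defaults when no
    run-time answer is supplied (analogous to prompts being disabled)."""
    source = input_name or DEFAULT_INPUT
    target = output_name or DEFAULT_OUTPUT
    # The transformation itself never changes: flag rows with large sales.
    CATALOG[target] = [
        {**row, "High Value": row["Sales"] >= 200.0}
        for row in CATALOG[source]
    ]
    return target


# Run once with the defaults, then reuse the flow on another dataset
# and write to a differently named output, without editing the flow.
run_data_flow()
run_data_flow("Order Lines 2019", "Corporate Flags")
```

The point of the sketch is that the transformation is written once, while the dataset names are supplied per run.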

The video below demonstrates the dataset prompt feature in data flows:

 

Branching
The Oracle Analytics Cloud branching option allows users to branch the output of any node (except the train machine learning model node) in the data flow into two or more branches. Users can apply different transformations on different branches and save the outputs of these different branches to different datasets. The end node of each branch will always be Save Data. To add a branch node, click on + and select branch node. The screenshot below shows what the branch node and its options look like:

The number of branches can be increased or decreased by entering a value or using the UI controls. Each branch can process its own subset of the data and return a distinct output. For example, in the screenshot shown below, three branches are added to the Sample Order Lines dataset.

The first branch computes Sales by Customer Segment and saves it in a dataset. The second branch computes Sales by Product Category and saves it in a second dataset. The third branch adds a month column and saves the entire result in a third dataset.

On running/executing this data flow, three different datasets will be created. Here is a snapshot of the output of these three branches:

The branch option in Oracle Analytics Cloud is useful when you want to do different transformations on different subsets of data and save them separately.
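
To make the three-branch example above concrete, here is a minimal plain-Python sketch of the same idea: one input feeds three independent transformations, each saved under its own name. It is an illustration, not how Oracle Analytics implements branching. The sample rows, the "Order Date" column, and the output dataset names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical sample of an order-lines dataset.
order_lines: List[Row] = [
    {"Customer Segment": "Consumer", "Product Category": "Furniture",
     "Order Date": "2019-01-15", "Sales": 120.0},
    {"Customer Segment": "Corporate", "Product Category": "Technology",
     "Order Date": "2019-01-20", "Sales": 300.0},
    {"Customer Segment": "Consumer", "Product Category": "Technology",
     "Order Date": "2019-02-03", "Sales": 80.0},
]


def sum_sales_by(rows: List[Row], key: str) -> Dict[str, float]:
    """Group rows by `key` and total the Sales column."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["Sales"]
    return dict(totals)


def add_month(rows: List[Row]) -> List[Row]:
    """Add a Month column (YYYY-MM) derived from the ISO order date."""
    return [{**row, "Month": row["Order Date"][:7]} for row in rows]


# Each entry plays the role of one branch ending in its own Save Data node.
outputs = {
    "Sales by Customer Segment": sum_sales_by(order_lines, "Customer Segment"),
    "Sales by Product Category": sum_sales_by(order_lines, "Product Category"),
    "Order Lines with Month": add_month(order_lines),
}
```

Because every branch reads the same upstream rows and none of them modifies the input, the three outputs can be produced in a single run.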

The video tutorial below shows how this branch option can be used:

 

Output Controls
The output controls feature lets users decide whether each output column of a data flow is treated and saved as an attribute or a measure (metric) in the output dataset. A measure column's default aggregation can also be chosen. When a Save Data node is added to the data flow, users can decide how each output column should be treated by selecting attribute or measure from the drop-down list. For measure columns, users can also select the default aggregation rule. The screenshot below shows how easy it is to change the data types and default aggregation rule for columns:

To summarize, this feature provides more control over the output column datatype.
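
One way to picture this column metadata is as a small record per output column: its name, whether it is an attribute or a measure, and, for measures, a default aggregation. The sketch below is an assumption-laden illustration, not the product's data model. The `OutputColumn` class, the `aggregate` helper, and the aggregation names in `AGGREGATIONS` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical aggregation rules; the names are illustrative only.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "Sum": sum,
    "Average": lambda values: sum(values) / len(values),
    "Minimum": min,
    "Maximum": max,
}


@dataclass
class OutputColumn:
    name: str
    treat_as: str                              # "attribute" or "measure"
    default_aggregation: Optional[str] = None  # only meaningful for measures


def aggregate(column: OutputColumn, values: List[float]) -> float:
    """Apply the column's default aggregation; attributes are not aggregated."""
    if column.treat_as != "measure":
        raise ValueError(f"{column.name} is an attribute and has no aggregation")
    rule = column.default_aggregation or "Sum"
    return AGGREGATIONS[rule](values)


sales = OutputColumn("Sales", "measure", "Average")
segment = OutputColumn("Customer Segment", "attribute")
average_sales = aggregate(sales, [10.0, 20.0, 30.0])
```

Storing the default aggregation with the column means any later summary of that column can apply the rule without being told again.
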
The video tutorial below demonstrates this feature:

 

Incremental Data Processing
The Oracle Analytics incremental data processing feature allows users to run the data flow for incremental data/rows that become available between batch runs. This feature helps drive the efficient use of resources by running data flows only on incremental data rather than on data which has already been processed. This option is available only for datasets created from database connections, and this can be enabled only for a single input dataset within a data flow.

Enabling incremental processing for a dataset is a two-step process. The first step is to set the New Data Indicator while creating the dataset from a database: on the configuration page, set the New Data Indicator field to one of the columns of the dataset. New data added to the database will be identified based on this indicator column. Here is a quick snapshot showing how to configure the New Data Indicator column:

In this case, the New Data Indicator is set to the TIME_BILL_DT (date) column. 

After adding the newly created dataset as an input to the data flow, the second step is to select the "Add new data only" field to enable incremental processing for this dataset. Below is a quick snapshot that shows how this should be done:

This dataset is now enabled for incremental processing, and any updates to this dataset/table will be processed incrementally the next time the data flow is run.

The output of the data flow for incremental data can either be appended to the existing output or can replace the existing output. Here is a quick snapshot which shows where to choose this option while saving the output data set:

Now all the required parameters are set for incremental processing. When you save and run this data flow for the first time, it runs on the entire dataset. On subsequent runs, the data flow runs only on the changed (added or removed) data in the configured dataset.
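
The general idea behind an indicator column can be sketched as a "high-water mark": remember the largest indicator value already processed, and on the next run pick up only rows beyond it. The sketch below is a simplified illustration under that assumption and does not describe Oracle Analytics internals. In particular, it detects added rows only, not removed ones. The function names and the AMOUNT column are hypothetical; TIME_BILL_DT is the indicator column from the example above.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def select_new_rows(rows: List[Row], indicator: str,
                    last_seen: Optional[date]) -> Tuple[List[Row], Optional[date]]:
    """Return the rows newer than `last_seen` and the updated high-water mark.
    With no mark yet (first run), every row counts as new."""
    if last_seen is None:
        new_rows = list(rows)
    else:
        new_rows = [row for row in rows if row[indicator] > last_seen]
    marks = [row[indicator] for row in new_rows]
    return new_rows, (max(marks) if marks else last_seen)


def save_output(existing: List[Row], new_rows: List[Row],
                mode: str = "append") -> List[Row]:
    """Either append the new results to the existing output or replace it."""
    if mode == "append":
        return existing + new_rows
    if mode == "replace":
        return list(new_rows)
    raise ValueError(f"unknown mode: {mode}")


table: List[Row] = [
    {"TIME_BILL_DT": date(2019, 1, 5), "AMOUNT": 10.0},
    {"TIME_BILL_DT": date(2019, 1, 9), "AMOUNT": 25.0},
]

# First run: the whole table is processed.
first_batch, mark = select_new_rows(table, "TIME_BILL_DT", None)
output = save_output([], first_batch)

# A row arrives between runs; the second run sees only that row.
table.append({"TIME_BILL_DT": date(2019, 2, 1), "AMOUNT": 40.0})
second_batch, mark = select_new_rows(table, "TIME_BILL_DT", mark)
output = save_output(output, second_batch, mode="append")
```

A date or timestamp column that only ever increases suits this scheme, because rows inserted with an older indicator value would fall below the mark and be skipped.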

Check out the video tutorial below to see these incremental data processing capabilities of data flows in action:

 

To find out whether Oracle Analytics Cloud is right for your business data plans, take this 2-minute assessment.

 

Comments ( 1 )
  • Matt Bedin Monday, April 22, 2019
    I noticed that when Data Flows use the same OAC Connection for the Source and Target, the Target is populated, behind the scenes, with a "CREATE TABLE AS SELECT" statement. This is interesting in that all the processing is pushed down to the database. This is very fast and effectively bypasses any Data Flow source file size or row limits.