Ask a runner about a marathon’s last mile and they’ll likely describe it as the most exhilarating and the most rewarding part of the race. But you can never be entirely sure of what the conditions will be on race day. Some adjustments must inevitably be made to finish the race, and the best adjustments can help runners achieve record times.
Analytics is similar: organizations spend time, effort, and money setting up corporate data sources to provide the best data possible. However, as in a race, conditions can vary on the day a business user must answer a question. Using the curated data sets provided is a great start, and you don’t want the cost and effort of that preparation to go to waste. But the business user usually needs to adjust the data they are given in order to complete their analytics story. This “race day” adjustment is the analytics last mile, where insights are translated into business value that drives change.
In this blog series, I will look more specifically at the distinct last mile analytics needs of different roles, including data engineers, citizen data scientists, analysts, and business users. Each of these roles either performs some form of last mile analytics or feeds data processes used by other groups; all of them need it, but they use it quite differently. This blog looks at last mile analytics for business users. Subsequent blogs will address the other data roles.
The last mile of analytics refers to any data processing tasks, such as fixing, enriching, combining, or cleaning, that must be performed on data sets after they have been pulled from their source systems. Data wrangling is the process of connecting to and accessing raw data from one or many source systems and transforming it into another format, with the intent of making it more appropriate and valuable for downstream purposes such as analytics. The two concepts are similar, but not synonymous.
Users might anticipate some measure of last mile data preparation, but it’s almost impossible to plan for the specific changes required. Instead, business users modify the data in an ad hoc fashion as each need arises, using whatever tools they have on hand. This usually means desktop or cloud-based spreadsheets, but I’ve personally seen it done with desktop databases too! These tools are usually preferred because everyone has access to them, and no special skills or expensive licenses are required.
Data wrangling, on the other hand, is a more planned task, performed by specific data roles within the company using specialized, corporate-provided data wrangling tools. These tools are expensive, so only a handful of people have access to them or the skills to use them.
Even the best data sources, whether data warehouses, lakehouses, reporting databases, or application data sources, sometimes don’t provide data in the exact shape and format that business users require for their reporting and analytics needs. Business requirements change so quickly, and the questions asked vary so much, that it’s nearly impossible for even the best data management systems to completely remove the need for last mile analytics. Attempts to make such changes to corporate systems create a bottleneck, with IT struggling to keep pace with change requests from business groups.
Spreadsheets are very versatile tools. They can be a database, development environment, data transformation tool, and reporting tool all in one. They’re installed on just about everyone’s computer, and they’re comparatively cheap, if not free. As such, spreadsheets have been the perfect stop-gap tool for completing data preparation and enrichment tasks, and on a small scale they do the job just fine.

But spreadsheets have drawbacks of their own, particularly for medium and large organizations concerned with governance, security, and compliance. Spreadsheets are an insecure platform for holding corporate data. There is no way to track data changes to ensure a governed and consistent approach to creating business metrics. The whole spreadsheet process usually relies on individuals, making each person a single point of failure and the process nearly impossible to replicate should that person leave the company tomorrow. This leads to the well-documented phenomenon known as “Excel hell”: an ungoverned, uncontrolled, chaotic environment, heavily affected by human error, that is unreliable and untrustworthy.
A better solution to the analytics last mile needs to address two key requirements:
This points to the need for a unified platform that addresses the entire analytics workflow, from data sourcing to decision making, not just parts of it. Using a different tool at each stage introduces architectural complexity, risk, and the added cost of moving data more than necessary. For example, Excel provides neither built-in machine learning nor spatial analytics, so if the business question requires both, additional tools or plug-ins are needed.
A holistic view of the analytics workflow is shown in the diagram below. It begins with more technical tasks, such as connecting to relevant data sources and modeling the physical sources into logical business views, followed by data preparation and enrichment. Next come more business-centric tasks, such as exploring data and deriving insights, which produce visual stories that can be shared with colleagues for collaboration.
Most analytics vendors address only parts of the analytics workflow. For example, one leading BI vendor provides a great interface for analysts but charges extra for data preparation. Some vendors even rely on third-party partnerships to cover more of the workflow. As a result, many organizations end up acquiring additional tools for data source extracts, modeling, preparation, and enrichment, or to add machine learning.
An ideal analytics platform should address every step of the analytics workflow in its entirety, including embedded machine learning and natural language interfaces. Business users, with no specialist skills, should be able to:
They should be able to do all of this in a collaborative environment that creates repeatable processes that are not reliant on individuals.
Example 1: A business user needs to calculate the number of days between two purchase records in a data set. The source system does not provide this metric directly, but it is essential for understanding how often a repeat customer returns to the retail store. Data wrangling processes run by IT do not typically perform this type of inter-record calculation. With Oracle Analytics, the business user can connect to the required data and then build a data flow that provides the necessary calculation logic, without exporting the data outside the secure environment.
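The inter-record logic in Example 1 is essentially a “previous row” (LAG-style) calculation. As a rough illustration of that logic only, and not of how Oracle Analytics data flows are implemented, the Python sketch below assumes hypothetical purchase records made up of a customer ID and a purchase date. It sorts them by customer and date, then computes the days since each customer’s previous purchase.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data (assumption, not from any real system):
# each record is (customer_id, purchase_date).
purchases = [
    ("C001", date(2023, 1, 5)),
    ("C002", date(2023, 1, 9)),
    ("C001", date(2023, 2, 14)),
    ("C001", date(2023, 3, 1)),
    ("C002", date(2023, 3, 20)),
]


def days_between_purchases(records):
    """Return (customer_id, purchase_date, days_since_previous) rows.

    days_since_previous is None for a customer's first purchase,
    mirroring what a LAG-style window calculation would produce.
    """
    result = []
    # Sort by customer, then date, so each customer's purchases are
    # adjacent and in chronological order before grouping.
    ordered = sorted(records, key=itemgetter(0, 1))
    for customer_id, rows in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, purchase_date in rows:
            gap = (purchase_date - previous).days if previous is not None else None
            result.append((customer_id, purchase_date, gap))
            previous = purchase_date
    return result


for row in days_between_purchases(purchases):
    print(row)
```

Each customer’s first purchase has no earlier record, so its gap is None; averaging the remaining gaps per customer would give a simple measure of how often that customer returns.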
Example 2: Business users often just need to unpivot a data set from a crosstab layout (a short, wide table) into a standard long, narrow layout, where each attribute has its own column and each row holds a single transaction. This manipulation, often performed in a spreadsheet, converts data from a form that is hard to use in reports into a more business-friendly one.
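The reshaping in Example 2 can be sketched in a few lines of plain Python. This is a minimal illustration under assumed inputs, not Oracle Analytics functionality: the table, its column names (product, Jan, Feb, Mar), and the unpivot helper are all hypothetical. Identifier columns are kept as-is, and every remaining column becomes its own row.

```python
# Hypothetical wide (short and wide) table: one row per product,
# one column per month. Names and values are illustrative only.
wide_header = ["product", "Jan", "Feb", "Mar"]
wide_rows = [
    ["Shoes", 120, 135, 150],
    ["Socks", 80, 75, 90],
]


def unpivot(header, rows, id_columns=1, var_name="month", value_name="sales"):
    """Convert a wide table into a long, narrow one.

    The first `id_columns` columns are kept as identifiers; every remaining
    column becomes its own row holding (identifiers..., column name, value).
    """
    long_header = header[:id_columns] + [var_name, value_name]
    long_rows = []
    for row in rows:
        ids = row[:id_columns]
        # Pair each non-identifier column name with this row's value for it.
        for column_name, value in zip(header[id_columns:], row[id_columns:]):
            long_rows.append(ids + [column_name, value])
    return long_header, long_rows


header, rows = unpivot(wide_header, wide_rows)
print(header)
for r in rows:
    print(r)
```

For readers working in pandas, `DataFrame.melt` performs the same wide-to-long reshaping.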
Data sources seldom provide data in a format entirely ready for business users and their analytics requirements. In most cases, some sort of last mile data preparation and enrichment must be performed, and spreadsheets have traditionally been the ideal stop-gap solution for this need. Spreadsheets, however, are insecure; they introduce human error and rely on individuals. A better approach is a secure analytics platform that provides similar functionality and removes the need for data exports or ungoverned data processing. Oracle Analytics provides all the data manipulation capabilities users need to blend and enrich their corporate, personal, and third-party data sets in a trackable, sharable, and repeatable business process. And unlike other providers, Oracle doesn’t charge extra for these data preparation capabilities; they’re all built in and included.
Learn more about Oracle Analytics’ built-in data preparation and enrichment. You can also find additional information at Oracle.com/analytics, follow us on Twitter @OracleAnalytics, and connect with us on LinkedIn.
Barry is a senior director for product marketing covering Oracle's AI and Analytics services.