If you’ve ever worked with data, you know that some level of blending, preparation, transformation, and enrichment is required to get it ready for analytics. Source data systems seldom provide data exactly as the business needs it for use in its visualization tools. This common need to modify data has predominantly manifested itself in complex, error-prone, Excel-based processes.
Alternatively, expensive additional tools, like Alteryx, or bolt-on extras like Tableau Prep, have been purchased to fill the gap. The issue with Excel-based extracts and transformations is that they are disconnected from the source systems, which leaves the data stale and insecure. Each Excel workbook can also be filled with complex logic that is decipherable only by its author. Bolt-on products require additional license fees, training, manual integration, and separate maintenance, and consequently add complexity to your architecture. Usually due to cost, only an elite few users get access to a good data transformation tool, while everyone else falls back to Excel, so the organization ends up with the problems of both options.
A better approach is to provide data preparation capabilities inline, within the same connected platform as the visualization and metadata layers, and at no additional cost. This empowers all users to do data transformations within a centralized platform that ensures security and auditability while providing all the transformation capabilities a user could need. If you could do it in Excel, then you should be able to do it in the data transformation tool, without specialized training. This way, if Joe from accounting who owns that crazy complex XLS file ever leaves, the ability to interpret, update, and understand that XLS file isn’t lost with Joe.
What is required of modern data transformation tools?
1. Data Connectivity. It is essential that users are able to connect to all the data sources they require, without a ton of help from IT, additional licenses, or extra training. All relevant information should be considered for the best business decision, and as such, corporate, external, and personal sources should feed into a business owned data flow.
Users connect to any authorized data source to bring in relevant information for their analytics. This could be the EDW in Snowflake, CX data from Salesforce, Oracle sources, open-source options like MongoDB or MySQL, or that Excel sheet your boss just emailed you. See all supported data sources here.
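To make the idea of blending corporate and personal sources concrete, here is a minimal Python sketch of the kind of join a data flow performs behind the scenes. All names and figures are hypothetical: an in-memory SQLite table stands in for the corporate warehouse, and a CSV string stands in for the emailed Excel sheet.

```python
import csv
import io
import sqlite3

# Stand-in for a corporate source (e.g., the EDW): an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("East", 1200.0), ("West", 800.0)])

# Stand-in for the emailed spreadsheet: a CSV export of regional targets.
spreadsheet = io.StringIO("region,target\nEast,1000\nWest,900\n")
targets = {row["region"]: float(row["target"]) for row in csv.DictReader(spreadsheet)}

# Blend the two sources into one dataset ready for analysis.
blended = [
    {"region": region, "revenue": revenue, "target": targets[region]}
    for region, revenue in conn.execute("SELECT region, revenue FROM orders")
]
print(blended)
```

The point is not the code itself but that, in a connected platform, this blending happens in a governed data flow rather than in a disconnected spreadsheet.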
2. Data Transformation and Enrichment. A complete set of functionality needs to be available to allow the transformation of the data. Calculated columns, logic calculations, groupings and aggregations, binning, sentiment analysis, and regular expressions are all required to ensure that data needed in the report, but not available in the data sources, can be created. This created information should also be auditable and traceable back to the original numbers used to create it. The connected data flow explains how you got to a particular number and where its components reside in the source systems.
A variety of capabilities for business users to prepare and enrich their data without any specialist training.
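For readers who think in code, the transformations named above (calculated columns, binning, grouping and aggregation, regular expressions) map to familiar operations. This Python sketch over hypothetical sample rows shows the equivalents a data flow applies without requiring the user to write any of this:

```python
import re
from collections import defaultdict

# Hypothetical sample rows a data flow might receive.
rows = [
    {"customer": "Acme Corp", "amount": 150.0, "comment": "Order #1001 arrived late"},
    {"customer": "Acme Corp", "amount": 900.0, "comment": "Order #1002 perfect"},
    {"customer": "Beta LLC",  "amount": 450.0, "comment": "Order #1003 ok"},
]

for row in rows:
    # Calculated column: a flat 8% tax derived from the amount.
    row["tax"] = round(row["amount"] * 0.08, 2)
    # Binning: place each amount into a size band.
    row["band"] = ("small" if row["amount"] < 300
                   else "medium" if row["amount"] < 700
                   else "large")
    # Regular expression: extract the order number from free text.
    match = re.search(r"#(\d+)", row["comment"])
    row["order_id"] = match.group(1) if match else None

# Grouping and aggregation: total amount per customer.
totals = defaultdict(float)
for row in rows:
    totals[row["customer"]] += row["amount"]
print(dict(totals))
```

Because each derived column is defined as a step in the flow, every number remains traceable back to its inputs.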
3. Collaboration with Data. The ability to output reusable, sharable, and live datasets for visualization is required. Datasets shouldn’t have row-count limits like Excel’s and should not cause performance problems for analysts using them. Users should also be able to push their transformed data back to the Autonomous Database—the centralized repository that is always performant and secure.
Output from this dataset is directed to the Oracle Autonomous Data Warehouse. Zero administration means no IT is required to manage or administer the instance. Users are able to securely push data flow results of any size into a centralized database for sharing.
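The publish step follows a pattern worth seeing once: create the shared target table, then push the transformed rows into it. In this sketch, sqlite3 merely stands in for the centralized warehouse (Autonomous Data Warehouse would be reached through its own driver); table and column names are hypothetical.

```python
import sqlite3

# Hypothetical data-flow output to publish for other analysts.
results = [("East", 1200.0), ("West", 800.0)]

# sqlite3 stands in for the centralized warehouse here; the pattern is the
# same: create the target table once, then push the transformed rows into it.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS regional_revenue (region TEXT, revenue REAL)")
warehouse.executemany("INSERT INTO regional_revenue VALUES (?, ?)", results)
warehouse.commit()

# Any authorized user can now query the shared, centralized dataset.
shared = warehouse.execute(
    "SELECT region, revenue FROM regional_revenue ORDER BY region").fetchall()
print(shared)
```

In the platform this push is a built-in data-flow step, so analysts never hand-code the load and the result stays inside the governed, secured database.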
4. Built-in Machine Learning. Being data-driven is top of mind for today’s business decision-makers who are serious about making the best decisions based on what their data is telling them, rather than gut-feel decisions based on untrustworthy numbers. Therefore, machine learning (ML) is a critical component of any complete data transformation toolkit. Machine learning should be an integrated capability, part of the unified data preparation tool. Again, this beats the complicated approach of multiple skill-specific tools, where data must be exported, imported into the external ML tool for processing, then exported again and reimported back into the visualization tool.
Data enrichment through machine learning is critical for modern business analysts. Here a revenue prediction model is being trained and tuned.
Oracle Analytics Cloud and Oracle Analytics Server (on premises) provide a modern unified platform that allows users to connect to the data they need, transform it securely, then save, share, and collaborate with the resulting datasets. Data and transformation logic always remain centralized, where they can be governed and controlled.
For more examples and to see data flows in action within Oracle Analytics, check out these short videos on YouTube.