Oracle Analytics provides powerful self-service visual data preparation capabilities that let users quickly and easily clean, normalize, and enrich data for analysis. I’m going to give you a quick overview of how the new Data Quality Insights give you an instant visual overview of the contents of each table in your data set. They provide interactive visualizations: horizontal bar graphs for text columns and histograms for number and date columns.
You can quickly discover hidden issues with your data, such as nulls, misspellings, or non-standard terms. The tiles also include a Data Quality Bar that gives an instant assessment of the quality of the values in each column. In addition to identifying nulls in your data, the quality assessment uses a deep semantic understanding of the data to identify invalid values based on semantic classifications. You can explore the data by using the instant filtering capability built into the bar graphs, and use the in-line transform capabilities to rapidly change column properties, standardize or repair values, and rename columns. Whether you are working with uploaded spreadsheets or a data set with multiple database tables and millions of records, Oracle Analytics will help you quickly improve your data and analysis through Data Quality Insights.
As an added convenience, all of these transform actions can be performed directly from Data Quality Insights in the data set's Join Diagram, without going into the full Transform Editor.
Data Quality Insights and Data Preview as shown in the Join Diagram of the Data Set Editor.
You can change the default “Treat As” property of a column by clicking the “Treat As” icon and selecting a new value from the drop-down. You can change a column from Measure to Attribute or vice versa.
To change a column’s “Treat As” property, single-click the icon to the left of the column header and select the new “Treat As” value from the drop-down.
You can quickly rename a column by double-clicking the column header name in the data preview below the tiles, entering the new name, and pressing the Enter key.
To rename a column, double-click the column header name, enter the new name, and press the Enter key.
You can standardize data values directly from the Quality Insights tile: double-click the frequency bar for a value, type the replacement value, and press the Enter or Tab key to accept it. Once you enter the new value, the column is re-profiled and the Quality Insights are updated.
To standardize data values, double-click the bars containing the non-standard data and type in the replacement value, then press the Enter or Tab key.
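If it helps to see the idea outside the UI, here is a minimal Python sketch of what "replace a value, then re-profile" amounts to. This is only an illustration and not how Oracle Analytics implements it; the function names `profile` and `replace_value` and the sample data are hypothetical.

```python
from collections import Counter

def profile(values):
    """Count how often each value occurs, most frequent first
    (roughly what the frequency bars on a tile display)."""
    return Counter(values).most_common()

def replace_value(values, old, new):
    """Return a copy of the column with every `old` value replaced by `new`."""
    return [new if v == old else v for v in values]

# Hypothetical column with a non-standard spelling of one state.
states = ["CA", "Calif.", "CA", "NY", "Calif.", "CA"]

before = profile(states)            # frequencies before the repair
states = replace_value(states, "Calif.", "CA")
after = profile(states)             # re-profile after the repair

print(before)
print(after)
```

After the replacement, the two bars for the same state collapse into one, which is the effect you see on the tile once the column is re-profiled.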
You can discover and repair null or missing data directly from the Quality Insights tile. The Data Quality Bar on the tile alerts you to null data with a red indicator, and the null values are also highlighted in red in the bar graph. You can replace null or missing values by double-clicking the frequency bar and typing in the replacement value.
To repair nulls, double-click the bar indicating “Missing or Null” data and type in the replacement value, then press the Enter or Tab key.
In addition to discovering nulls in your data, the system also detects invalid values based on a deep semantic understanding of your data. This is done using System Knowledge, a vast set of geographic and demographic reference data that is used during profiling to discover and classify columns containing geographic names such as cities, provinces, and countries. Once a column is classified, the Data Quality Bar reports the values that match the reference data for that category as Valid and the values that do not match as Invalid. You can then review and repair the invalid values to improve the quality of your data for downstream analytics. After the repairs, the Data Quality Bar gives you immediate feedback on the improvement in data quality.
To repair data, double-click the bars containing the invalid data and type in the replacement value, then press the Enter or Tab key.
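To make the Valid / Invalid / Missing breakdown concrete, here is a small Python sketch of the kind of check described above: compare each value in a classified column against a reference set for its category. This is an assumption-laden illustration, not the product's actual logic; the `quality_bar` function, the tiny `COUNTRIES` reference set, and the exact-match rule are all hypothetical stand-ins for System Knowledge.

```python
# Hypothetical, tiny stand-in for a reference data set of country names.
COUNTRIES = {"France", "Germany", "Japan"}

def quality_bar(values, reference):
    """Count values as valid (in the reference set), invalid (not in it),
    or missing (None or blank). Matching here is exact after trimming
    whitespace, which is an assumption made for this sketch."""
    counts = {"valid": 0, "invalid": 0, "missing": 0}
    for v in values:
        if v is None or v.strip() == "":
            counts["missing"] += 1
        elif v.strip() in reference:
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts

column = ["France", "Germany", None, "Frnace", "", "Japan"]
before = quality_bar(column, COUNTRIES)

# Repair the misspelled value, then reassess the column.
repaired = ["France" if v == "Frnace" else v for v in column]
after = quality_bar(repaired, COUNTRIES)

print(before)
print(after)
```

Repairing the misspelled entry moves one value from invalid to valid, which mirrors the immediate feedback you get in the Data Quality Bar.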
As a bonus, Data Quality Insights come with a powerful in-line filtering capability that allows you to explore your data by simply clicking on the frequency bars. Note that this feature is only enabled for text columns.
To explore your data with in-line filtering, single-click one or more bars in the column you wish to filter. To remove the filters, single click each column again.
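Conceptually, clicking bars builds up a set of selected values per column, and the preview shows only the rows that match. The Python sketch below illustrates that idea under my own assumptions (values selected within one column are combined with OR, and selections across columns with AND); the `filter_rows` function and the sample rows are hypothetical and do not come from Oracle Analytics.

```python
def filter_rows(rows, selections):
    """Keep rows whose value in each filtered column is one of the
    selected values. `selections` maps column name -> set of values."""
    return [
        row for row in rows
        if all(row.get(col) in chosen for col, chosen in selections.items())
    ]

rows = [
    {"city": "Paris", "segment": "Consumer"},
    {"city": "Berlin", "segment": "Corporate"},
    {"city": "Paris", "segment": "Corporate"},
    {"city": "Tokyo", "segment": "Consumer"},
]

# Two bars clicked in "city", one bar clicked in "segment".
selected = {"city": {"Paris", "Tokyo"}, "segment": {"Consumer"}}
filtered = filter_rows(rows, selected)

# Clearing every selection returns all rows again.
unfiltered = filter_rows(rows, {})

print(filtered)
```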
I hope you found some of these new features useful in helping you quickly discover and repair issues in your data sets. Now that you know about the powerful self-service visual data preparation capabilities in Oracle Analytics, I challenge you to go out and try them yourself! Whether you are working with small spreadsheets or big tables from a data warehouse, there is nothing better than spending less time wrangling data and spending more time where the value is — analyzing your data!