Increase data literacy and accelerate self-service analytics with data catalog

September 29, 2021 | 5 minute read
Ling Xiang
Director, Product Management

The Elephant in the Room

Today’s fragmented data analytics landscape reminds me of the Indian parable “Blind men and the elephant.” In the parable, a group of blind men each touch a different part of an elephant and then describe the animal, and no two descriptions agree. Disconnected data silos create the same problem today: each team sees only its own part of the data, and there is no single source of truth.

Digital transformation has ushered in a new era of data governance. Businesses large and small generate more data than before, operate a more complex technology landscape, and are looking for ways to create value from that data. Data governance is therefore taking on a new dimension: managing data as an asset.

What is Data Literacy?

Gartner defines “Data Literacy” as “the ability to understand, share common knowledge of and have meaningful conversations about data, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application and resulting value.” (1)

This definition extends beyond the narrow question of whether business users can interpret and understand data. Gartner treats "Data Literacy" as a collective data capability that spans every job function in business and IT that produces, manipulates, or consumes data.

Why is Data Literacy important?

Organizations large and small are undergoing rapid changes with data architecture. Relational data warehouses, Hadoop/Spark data lakes, self-service BI, machine learning, cloud, and hybrid cloud have created a plethora of data silos. 

Business analysts, data engineers, and data scientists spend a large amount of time finding, validating, and manipulating data before they can put it to productive use. Organizations struggle to establish a "single source of truth." The complexity and cost of fragmented, redundant data wrangling are often invisible, but they make analytics projects less effective, reduce productivity, and lower worker satisfaction.

Organizations often lack a common language for talking about data. When IT is removed from the business, it risks creating data silos. Data Literacy isn't the same as technical literacy. Like a math student who can’t explain what problem an equation solves, data engineers and data scientists can struggle to articulate the value of a data project even though they have expert technical knowledge. Business users have domain knowledge, but they have no visibility into how data is dispersed across different systems or into the impact and cost of poor data quality. When the cost of data production is linked to the value of data, business users have a bigger incentive to influence data quality at the source.

Data is fundamental to all data science projects. Only with timely, high-quality data at the right level of granularity can Artificial Intelligence and Machine Learning models produce meaningful results. Creating value from data as a digital business strategy gives Data Literacy new urgency. Data silos, compounded by organization and technology silos, have created knowledge gaps and waste. A common dictionary that explains what data exists, which data sources are trusted, which data are useful, and how data evolve across systems will greatly improve data usability, data governance, and worker productivity.

How does Data Catalog build Data Literacy?

Data Catalog Connects People, Data and Analytics

Data Catalog provides a central data dictionary and knowledge base

Organizations create a huge and growing amount of data, and not all of it is equally valuable. As with any type of information, data must be organized to be meaningful and useful. Data Catalog unifies technical metadata from different data systems and organizes it by business domain context.

Data Catalog establishes a knowledge baseline that educates data stakeholders on:

  • What are the business processes?
  • What are the key business questions that measure the business outcome?
  • What are the metrics that support the key business questions?
  • What data are captured and in what format?
  • What are critical and sensitive data?
  • How are data dispersed and transformed?
  • What are the commonly used analytics that support key business metrics?
  • What data are used in Machine Learning models and how do they influence prediction outcome?

Data Catalog helps scale analytics self-service

Data Catalog classifies data into data domains. It gives business users a frame of reference for data context, data provenance, data relationships, and data usage. Data Catalog lets users search for data across multiple data systems and user-generated data sets, which improves data reusability. With documented business knowledge and intended usage, Data Catalog can be the coach that trains users on data standards and consistent data practices.
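
To make the idea of domain classification and cross-system search concrete, here is a minimal Python sketch. It is an illustration only and does not represent the Oracle Cloud Infrastructure Data Catalog API: the `CatalogEntry` and `Catalog` classes, their fields, and the sample entries are all hypothetical names assumed for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One data asset described by technical and business metadata (hypothetical)."""
    name: str          # table or data set name
    system: str        # source system, e.g. a warehouse or a data lake
    domain: str        # business domain the asset is classified under
    description: str   # business description written for analysts
    tags: tuple = ()   # free-form labels


class Catalog:
    """A toy in-memory catalog; a real catalog would harvest metadata automatically."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def by_domain(self):
        """Group asset names by business domain, in registration order."""
        grouped = {}
        for entry in self._entries:
            grouped.setdefault(entry.domain, []).append(entry.name)
        return grouped

    def search(self, keyword):
        """Case-insensitive keyword match on name, description, and tags."""
        needle = keyword.lower()
        return [
            e for e in self._entries
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in tag.lower() for tag in e.tags)
        ]


# Sample entries from three different (made-up) systems.
catalog = Catalog()
catalog.register(CatalogEntry(
    "orders", "warehouse", "Sales", "Confirmed customer orders", ("revenue",)))
catalog.register(CatalogEntry(
    "clickstream_raw", "data_lake", "Marketing", "Raw web click events"))
catalog.register(CatalogEntry(
    "customer_orders_extract", "analyst_upload", "Sales",
    "Analyst-built extract of orders by customer"))

# One search spans the warehouse table and the analyst's own extract.
for hit in catalog.search("orders"):
    print(hit.system, hit.name)
```

A single keyword search here returns both the governed warehouse table and the analyst-built extract, which is the kind of visibility that lets a user reuse an existing data set instead of building another copy.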

By embedding a data dictionary within the analytics user experience, Data Catalog can put data knowledge at the fingertips of a business analyst. This closes the knowledge gaps between data engineering and business and empowers data translators to train and support analytics users at scale. 

Data Catalog improves data quality by closing the data consumption loop

Data Catalog can equalize data knowledge between business and IT. By closing the data production and consumption loop, it gives business teams a perspective on how data are propagated. Systematic data lineage connects disparate systems and helps business and IT visualize the complexity of enterprise data architecture and the cost of data cleansing. 
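
Data lineage can be pictured as a directed graph in which each edge says "this data set feeds that one." The Python sketch below walks such a graph in both directions. It is a simplified illustration under stated assumptions, not how any particular catalog product stores lineage: the `LINEAGE` mapping, the asset names, and the `downstream` and `upstream` helpers are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "crm.accounts": ["lake.accounts_raw"],
    "lake.accounts_raw": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["bi.revenue_dashboard", "ml.churn_features"],
    "erp.invoices": ["warehouse.fact_sales"],
    "warehouse.fact_sales": ["bi.revenue_dashboard"],
}


def downstream(source, edges):
    """Return every asset reachable from `source`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(target, edges):
    """Return every asset that `target` depends on, by walking reversed edges."""
    reversed_edges = {}
    for parent, children in edges.items():
        for child in children:
            reversed_edges.setdefault(child, []).append(parent)
    return downstream(target, reversed_edges)


# Impact analysis: what is affected if the CRM accounts data is wrong?
print(downstream("crm.accounts", LINEAGE))

# Provenance: where does the revenue dashboard get its data?
print(upstream("bi.revenue_dashboard", LINEAGE))
```

The downstream walk answers the business question "what breaks if this source has poor quality," and the upstream walk answers "where did this number come from." Those are the two views that connect data production with data consumption.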

By removing the language barrier, Data Catalog helps forge a close partnership between business and IT so that business users understand the entire data lifecycle, from data gathering to analysis to data science projects. This ensures that data consumption and data quality are considered up front in application design and that analytics are built with an enterprise viewpoint. By visualizing system lineage across BI tools, data warehouses, and data lakes, Data Catalog helps data engineers collaborate better on data standards and reduce unnecessary data cleansing. This helps data teams deliver consistent data that are ready for analysis and data science projects.


In Summary:

Everyone touches data but few own the data elephant in the room. To create value from data, business and IT teams must communicate in a common vernacular that bridges data, technology, and organization silos. By unifying technical metadata with business knowledge and context, and linking data dependencies across different systems, Data Catalog provides a tool that equalizes data knowledge and improves data usage and data quality within an organization. This is the building block of a data-driven organization.

Next Step:

Check out how Oracle Cloud Infrastructure Data Catalog helps your organization build data literacy and accelerate self-service analytics.

Further Reading:

  1. “A Data and Analytics Leader’s Guide to Data Literacy,” by Kasey Panetta, August 26, 2021
  2. “Liquidity is key to unlocking the value in data, researchers say,” by Tam Harbert, August 25, 2021
  3. “How to build data literacy in your company,” by Sara Brown, February 9, 2021

Ling Xiang is a product leader in Big Data & AI at Oracle. Making data analytics simple and intelligent is her passion. Ling has an MBA from MIT Sloan School of Management and an MS in Information System Management from Bentley University.
