Profiling – Data vs Metadata

Introduction

In today’s world of Data Governance and Data Analysis, the term “Data Profiling” is used in both the Data Quality and the Data Catalog (Metadata Management) domains, which leads to confusion among teams. However, the two activities it describes serve different purposes and operate at different levels of the data ecosystem. This blog clarifies these concepts with definitions, use cases, and examples. The distinction matters most when discussing a Modern Data Platform (MDP) or Data Governance requirements with customers.

Key Differences

Aspect	Data Profiling	Metadata Profiling
Focus	Actual data values	Structural information about the data
Purpose	Identify anomalies, assess quality	Understand schema, constraints, and structure
Examples	Nulls, duplicates, patterns, frequency	Data types, column names, keys, constraints
Used By	Data Analysts, Data Scientists	DBAs, Data Architects, Data Analysts
Tools	SQL, Oracle Enterprise Data Quality (EDQ)	Data dictionary views, schema explorers, OCI Data Catalog
Personas	Chief Data Officer	Chief Governance Officer

What is Data Profiling?

Data profiling is the process of examining the actual content of a dataset to understand its quality, its integrity, and the patterns in its values. It helps detect problems such as:

Missing or NULL values
Duplicate records
Inconsistent patterns
Incomplete records
Outliers in numeric fields
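The checks listed above can be sketched in plain Python. This is a minimal illustration, not an EDQ feature: the Employees-style rows, the `PHONE_PATTERN` format, the choice of required fields, and the 1.5 * IQR outlier rule are all assumptions made for the example. Every indicator is computed from the values themselves.

```python
import re
import statistics
from collections import Counter

# Hypothetical sample rows; column names and values are illustrative only.
ROWS = [
    {"emp_id": 101, "email": "asha@example.com", "phone": "555-0101", "salary": 52000},
    {"emp_id": 102, "email": None, "phone": "555-0102", "salary": 54000},
    {"emp_id": 103, "email": "ben@example.com", "phone": "5550103", "salary": 51000},
    {"emp_id": 103, "email": "ben@example.com", "phone": "5550103", "salary": 51000},
    {"emp_id": 104, "email": "chen@example.com", "phone": None, "salary": 950000},
]

# Assumed house format for phone numbers: three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"\d{3}-\d{4}")


def profile(rows, required=("email", "phone")):
    """Inspect actual values and return simple data-quality indicators."""
    # Missing or NULL values, counted per column.
    null_counts = {col: sum(r[col] is None for r in rows) for col in rows[0]}

    # Duplicate records: every exact copy beyond the first occurrence.
    copies = Counter(tuple(r.items()) for r in rows)
    duplicate_rows = sum(n - 1 for n in copies.values())

    # Inconsistent patterns: non-NULL phones that do not match the assumed format.
    pattern_mismatches = sum(
        1 for r in rows
        if r["phone"] is not None and not PHONE_PATTERN.fullmatch(r["phone"])
    )

    # Incomplete records: at least one required field is NULL.
    incomplete_records = sum(any(r[c] is None for c in required) for r in rows)

    # Outliers: salaries outside the 1.5 * IQR fences (an assumed rule).
    salaries = [r["salary"] for r in rows if r["salary"] is not None]
    q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    outliers = [s for s in salaries if s < q1 - fence or s > q3 + fence]

    return {
        "total_records": len(rows),
        "null_counts": null_counts,
        "duplicate_rows": duplicate_rows,
        "pattern_mismatches": pattern_mismatches,
        "incomplete_records": incomplete_records,
        "salary_outliers": outliers,
    }


print(profile(ROWS))
```

On this sample the report flags one NULL email, one NULL phone, one duplicate row, two badly formatted phone values, two incomplete records, and the 950000 salary as an outlier.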


Example (Oracle SQL):

SELECT COUNT(*) AS total_records FROM Employees;

SELECT COUNT(*) - COUNT(email) AS null_emails FROM Employees;

SELECT department, COUNT(*) FROM Employees GROUP BY department;

These queries analyze real values in the Employees table to highlight quality issues or patterns.
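To try the queries above without an Oracle database, the sketch below runs the same three statements against an in-memory SQLite table through Python's standard `sqlite3` module and adds one extra query for duplicate emails. SQLite stands in for Oracle here, and the table contents are hypothetical.

```python
import sqlite3


def load_sample(conn):
    """Create a hypothetical Employees table (SQLite stands in for Oracle)."""
    conn.execute("CREATE TABLE Employees (emp_id INTEGER, email TEXT, department TEXT)")
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?, ?)",
        [
            (1, "asha@example.com", "Sales"),
            (2, None, "Sales"),
            (3, "ben@example.com", "IT"),
            (4, "ben@example.com", "IT"),
            (5, None, "HR"),
        ],
    )


def profile_employees(conn):
    """Run the profiling queries plus a duplicate-value check."""
    total = conn.execute("SELECT COUNT(*) AS total_records FROM Employees").fetchone()[0]
    # COUNT(*) counts every row; COUNT(email) skips NULLs, so the difference is the NULL count.
    null_emails = conn.execute(
        "SELECT COUNT(*) - COUNT(email) AS null_emails FROM Employees"
    ).fetchone()[0]
    # Frequency distribution of values in the department column.
    by_department = dict(
        conn.execute("SELECT department, COUNT(*) FROM Employees GROUP BY department")
    )
    # Values that appear more than once in a column that is expected to be unique.
    duplicate_emails = conn.execute(
        "SELECT email, COUNT(*) FROM Employees "
        "WHERE email IS NOT NULL GROUP BY email HAVING COUNT(*) > 1"
    ).fetchall()
    return total, null_emails, by_department, duplicate_emails


conn = sqlite3.connect(":memory:")
load_sample(conn)
print(profile_employees(conn))
```

The same SQL text works on Oracle for these three original queries; only the connection and sample data are specific to this sketch.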

What is Metadata Profiling?

Metadata profiling inspects the schema, or structural layer, of a dataset. It does not examine data values; instead, it focuses on how the data is defined and governed.

Typical checks include:

Data types and column lengths
Primary and foreign key constraints
Naming conventions and descriptions
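The checks listed above need only column definitions, never the rows. A minimal sketch, assuming a hypothetical `ColumnMeta` record such as might be exported from a data dictionary or catalog, and an assumed upper-case snake_case naming rule:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnMeta:
    """One column definition, in a hypothetical export shape."""
    table: str
    name: str
    data_type: str
    length: Optional[int]
    description: Optional[str]
    is_primary_key: bool = False


# Assumed convention: upper-case snake_case, matching how Oracle stores unquoted names.
NAME_RULE = re.compile(r"[A-Z][A-Z0-9_]*")


def audit_metadata(columns):
    """Return (table, column, issue) findings; no data values are read."""
    findings = []
    for col in columns:
        if not NAME_RULE.fullmatch(col.name):
            findings.append((col.table, col.name, "naming"))
        if not (col.description or "").strip():
            findings.append((col.table, col.name, "missing description"))
        if col.data_type == "VARCHAR2" and not col.length:
            findings.append((col.table, col.name, "missing length"))
    # Tables where no column is flagged as part of a primary key.
    tables_with_pk = {c.table for c in columns if c.is_primary_key}
    for table in dict.fromkeys(c.table for c in columns):
        if table not in tables_with_pk:
            findings.append((table, None, "no primary key"))
    return findings


COLUMNS = [
    ColumnMeta("EMPLOYEES", "EMPLOYEE_ID", "NUMBER", 22, "Surrogate key", True),
    ColumnMeta("EMPLOYEES", "EMAIL", "VARCHAR2", 25, "Work email address"),
    ColumnMeta("EMPLOYEES", "deptName", "VARCHAR2", 30, None),
    ColumnMeta("DEPT_TMP", "ID", "NUMBER", 22, ""),
]

for finding in audit_metadata(COLUMNS):
    print(finding)
```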


Example (Oracle SQL):

SELECT column_name, data_type, data_length FROM user_tab_columns WHERE table_name = 'EMPLOYEES';

SELECT cols.column_name FROM all_constraints cons JOIN all_cons_columns cols ON cons.owner = cols.owner AND cons.constraint_name = cols.constraint_name WHERE cons.table_name = 'EMPLOYEES' AND cons.constraint_type = 'P' ORDER BY cols.position;

These queries reveal how data is structured, not what it contains.
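SQLite exposes similar catalog information through its `table_info` and `foreign_key_list` pragmas. The sketch below uses them as a stand-in for Oracle's `user_tab_columns` and constraint views; the schema is hypothetical, and no rows are inserted, which underlines that metadata profiling does not depend on data values.

```python
import sqlite3


def describe_table(conn, table):
    """Collect column types, primary key and foreign keys from SQLite's catalog."""
    columns = conn.execute(
        "SELECT name, type, pk FROM pragma_table_info(?)", (table,)
    ).fetchall()
    # pk is 0 for non-key columns, otherwise the column's 1-based position in the key.
    primary_key = [name for _, name in sorted((pk, name) for name, _, pk in columns if pk)]
    # Each foreign key as (local column, referenced table, referenced column).
    foreign_keys = conn.execute(
        'SELECT "from", "table", "to" FROM pragma_foreign_key_list(?)', (table,)
    ).fetchall()
    return {
        "columns": [(name, col_type) for name, col_type, _ in columns],
        "primary_key": primary_key,
        "foreign_keys": foreign_keys,
    }


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Departments (dept_id INTEGER PRIMARY KEY, dept_name VARCHAR(30) NOT NULL);
    CREATE TABLE Employees (
        emp_id INTEGER PRIMARY KEY,
        email VARCHAR(25),
        dept_id INTEGER REFERENCES Departments (dept_id)
    );
    """
)
# No rows are inserted: metadata profiling never reads the data values.
print(describe_table(conn, "Employees"))
```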

Use Cases

Use Case	Data Profiling	Metadata Profiling
Data quality checks	Yes	No
Schema validation	No	Yes
Migration readiness assessments	Yes	Yes
Compliance and governance audits	Sometimes	Yes
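Migration readiness appears as Yes in both columns because some checks need both kinds of profiling. One example, sketched under stated assumptions: compare the longest value actually present in a source column (data profiling) with the declared length of the matching target column (metadata profiling). The column names, values, and target lengths are hypothetical, and the sketch counts characters, ignoring byte-versus-character length semantics.

```python
def check_length_fit(source_values, target_lengths):
    """Compare observed value lengths (data) with declared target lengths (metadata).

    source_values:  {column: list of source values}  (data profiling input)
    target_lengths: {column: declared max length}    (metadata profiling input)
    Returns {column: (longest_observed, declared)} for columns that would not fit.
    """
    misfits = {}
    for column, declared in target_lengths.items():
        longest = max(
            (len(v) for v in source_values.get(column, []) if v is not None),
            default=0,
        )
        if longest > declared:
            misfits[column] = (longest, declared)
    return misfits


# Hypothetical source values and target column definitions.
SOURCE = {
    "email": ["asha@example.com", "benjamin.longname@example.com", None],
    "department": ["Sales", "IT"],
}
TARGET = {"email": 25, "department": 10}

print(check_length_fit(SOURCE, TARGET))  # {'email': (29, 25)}
```

Neither input alone answers the question: the declared length comes from the target schema, while the longest value comes only from scanning the source data.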

Why the Confusion?

Both are common in Data Governance initiatives.
Both can use SQL.
Teams may use profiling tools that blur the lines.

Let’s solve this by asking: “Are we checking the values or the structure?”

Conclusion

Data profiling and metadata profiling are complementary practices that serve different purposes. Using the terms correctly improves clarity, accountability, and outcomes in data management processes. Teams should adopt a shared vocabulary and toolkit that keeps the two separate and makes collaboration smoother. Oracle Enterprise Data Quality (EDQ) provides data profiling capabilities, while OCI Data Catalog provides metadata profiling capabilities.


Ravi Lingam

Master Principal Solution Engineer

