How Important Is Metadata To An Information Management Strategy?
By Ian Thomas on Aug 31, 2013
It's been a while since my first post and a lot has happened. I have now formed an Information Management Architecture team within our European Data Integration Solutions team to engage more strategically with our customers and partners around helping them to develop their information management strategy.
I'll be discussing aspects of information management strategy in this blog along the lines of the first blog posting. Let's start with a subject that always starts an interesting debate - metadata.
What is Metadata?
Metadata is 'data about data' and can be divided into four basic classifications:
- Business Metadata - the business meaning of data. It includes business definitions of the objects and metrics, hierarchies, business rules, and aggregation rules.
- Operational Metadata - Operational metadata stores information about who accessed what and when.
- Technical Metadata - Technical Metadata describes the data structures and formats such as table types, data types, indexes, and partitioning method.
- Process Metadata - Process Metadata describes the data input process.
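To make the four classifications concrete, here is a minimal sketch of how the metadata for one data object might be grouped. The `MetadataRecord` class, its field names and the sample values (such as `dw.sales_fact`) are all hypothetical and purely illustrative; they are not taken from any Oracle product.

```python
from dataclasses import dataclass, field

CLASSIFICATIONS = ("business", "operational", "technical", "process")


@dataclass
class MetadataRecord:
    """Metadata held about one data object, grouped by classification (hypothetical)."""
    object_name: str
    business: dict = field(default_factory=dict)     # definitions, business rules
    operational: dict = field(default_factory=dict)  # who accessed what and when
    technical: dict = field(default_factory=dict)    # data types, indexes, partitioning
    process: dict = field(default_factory=dict)      # how the data was loaded

    def missing_classes(self):
        """Return the classifications for which nothing has been recorded yet."""
        return [name for name in CLASSIFICATIONS if not getattr(self, name)]


# Illustrative values only.
sales = MetadataRecord(
    object_name="dw.sales_fact",
    business={"definition": "One row per order line", "aggregation_rule": "sum"},
    operational={"last_accessed_by": "report_user"},
    technical={"partitioning": "range on order_date"},
)

print(sales.missing_classes())
```

A check like `missing_classes()` is one simple way to spot where a repository holds only part of the picture for an object.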
Typical Use Cases
- Introspection - discovering and harvesting of information into a metadata repository
- Impact Analysis - analysing dependencies across the architecture. This requires an end-to-end view of interdependencies.
- Data Lineage - a description of the origins of a piece of data and the process by which it arrived in the database. The 'provenance' or 'pedigree.'
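Impact analysis and data lineage can both be thought of as walks over the same dependency graph, in opposite directions. The sketch below assumes the harvested metadata has already been reduced to simple (upstream, downstream) pairs; the component names and the `impact_of` and `lineage_of` helpers are hypothetical and only illustrate the idea.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges harvested into a repository: (upstream, downstream).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "dw.sales_fact"),
    ("staging.orders", "dw.sales_fact"),
    ("dw.sales_fact", "report.monthly_revenue"),
]


def build_graph(edges):
    """Index the edges in both directions."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(start, neighbours):
    """Breadth-first walk returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact_of(component, edges=EDGES):
    """Impact analysis: everything downstream of a component."""
    downstream, _ = build_graph(edges)
    return reachable(component, downstream)


def lineage_of(component, edges=EDGES):
    """Data lineage: everything upstream of a component."""
    _, upstream = build_graph(edges)
    return reachable(component, upstream)


print(sorted(impact_of("staging.orders")))
print(sorted(lineage_of("report.monthly_revenue")))
```

The same walk answers both questions; only the direction of the edges changes, which is why an end-to-end view of interdependencies matters for both use cases.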
Why Is Metadata Important To An Organization?
Accurate and comprehensive metadata enables the evaluation of an architecture's efficiency and eases the process of changing or evolving the architecture. For example, if you want to change a particular component, what is the impact on other components in the architecture?
Metadata can have a key part to play in regulatory compliance initiatives or certification processes to clearly demonstrate what components exist in the organization's architecture, their capabilities and interdependencies.
Metadata also underpins information and data management initiatives, such as common reference data across an organization as part of a Master Data Management strategy. A clean, accurate view of corporate reference data requires clean, accurate metadata.
It also enables cost savings and efficiency through better insight, more efficient troubleshooting and impact analysis, and by helping to avoid data duplication.
What Approaches Do Customers Take?
Shared Metadata Repository
A common repository used by different tools to share metadata across the toolsets. This is seen as the 'nirvana' of data integration and has many advantages but has some practical limitations that need to be overcome:
- There are many different roles involved in an integration strategy, from business users and Data Stewards to ETL Developers and report builders. The tool needs to cater for all these types of users, or each individual tool needs to provide a specific interface into the metadata repository.
- Common naming standards and definitions need to be defined, so all users understand they are addressing the same underlying data.
- Individual product innovation should not be stifled by having to adhere to a common metadata repository, such as having to wait for a new release of the repository before releasing a new version of the tool, or not being able to release new functionality because the common repository won't support it.
A common view of metadata across different tools, holding metadata about the individual metadata repositories. This is a more loosely coupled approach than a common repository and allows flexibility in introducing new tools and technologies and a more agile approach to data integration, as well as a single point of reference for regulatory reporting, for example.
There is an overhead in administration and in making sure the view of metadata information is kept up to date.
Specific Metadata Repository Integration Between Tools
Where tools have their own individual metadata repositories for describing functionality and capability within their tool set, specific points of integration can be developed or the metadata repository extended where it adds value to the business.
For example, sharing metadata between an ETL tool and a reporting tool would achieve a level of data lineage from report to data sources if the reporting tool could interrogate the ETL tool's metadata or import metadata from the ETL tool. A specific development is needed between the tools in this case.
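As a rough illustration of this kind of point integration, the sketch below joins metadata exported from two separate tools to trace a report back to its data sources. The two dictionaries stand in for exports from an ETL tool and a reporting tool; their shapes, the table names and the `report_lineage` function are assumptions made for the example, not any product's actual export format.

```python
# Hypothetical export from an ETL tool's repository: target table -> source tables.
ETL_MAPPINGS = {
    "dw.sales_fact": ["erp.orders", "crm.customers"],
    "dw.product_dim": ["erp.products"],
}

# Hypothetical export from a reporting tool: report -> warehouse tables it queries.
REPORT_TABLES = {
    "Monthly Revenue": ["dw.sales_fact", "dw.product_dim"],
    "Customer List": ["dw.customer_dim"],
}


def report_lineage(report, report_tables=REPORT_TABLES, etl_mappings=ETL_MAPPINGS):
    """Map each table a report reads to the sources the ETL tool loads it from."""
    lineage = {}
    for table in report_tables.get(report, []):
        # None marks a table the ETL metadata does not describe (a lineage gap).
        lineage[table] = etl_mappings.get(table)
    return lineage


print(report_lineage("Monthly Revenue"))
print(report_lineage("Customer List"))
```

The join only works because both tools refer to the warehouse tables by the same names, which is where common naming standards earn their keep even without a shared repository.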
What is Oracle's Approach?
The Oracle technology stack is driven by metadata and uses all the types of metadata mentioned above, from metadata within the business applications to technical metadata within the database and Fusion Middleware products. A 'one size fits all' approach is not viable across the whole stack, but there are some standards being used and some specific points of integration. Let's take a look at some of these:
Oracle Metadata Services (MDS)
MDS is a metadata standard across the Fusion Middleware stack and can be used as a central metadata store for multiple Fusion Middleware components such as SOA Suite, Oracle Application Development Framework (ADF) and Oracle WebCenter. It is described in detail in the article 'Storing SCA Metadata in the Oracle Metadata Services Repository'.
Oracle Enterprise Repository
Oracle Data Integration Solutions and Metadata
Oracle Data Integrator (ODI) is a metadata-driven tool and has its own metadata repository, which is extensible via FlexFields and has a Smart Export/Import utility to easily exchange data with other tools. A specific integration between ODI and Oracle Business Intelligence (OBIEE) has been done to share metadata for data lineage and impact analysis. This is described in the white paper 'Managing Metadata With Oracle Data Integrator'.
Oracle Enterprise Data Quality (EDQ) also has its own repository but can share metadata with ODI via Export/Import functions, and EDQ functionality can be directly included within ODI process flows.
ODI and Application Adapter for Hadoop
ODI builds Hadoop metadata through its Knowledge Modules and the Application Adapter for Hadoop. MapReduce jobs can be created, orchestrated and coordinated through an easy-to-use interface, and information loaded through to an Oracle relational database using ODI and the Oracle Loader for Hadoop. This enables shared metadata between Hadoop and Oracle and a bridge between the two types of technology as part of an overall information management strategy. For more information see 'Oracle Data Integrator Application Adapter for Hadoop'.
Metadata Bridging Technology
Third-party tools are available to act as a bridge or central point for multiple metadata repositories. An example is the Meta Integration Model Bridge (MIMB) from Meta Integration Technology Inc., which is the foundation technology for several vendors' metadata solutions, such as those from IBM and Informatica.
Here are a few key points to finish with:
- Take a Pragmatic Approach - Use specific metadata solutions for specific use cases, based on defined business benefits
- Integration at a Product Level Where Possible - Look for specific points of integration at a solution level to reduce development effort and overall product maintenance
- Metadata Sharing Promotes Operational Efficiency - Sharing metadata between data structures, as well as with SOA and Middleware components, will help with troubleshooting and impact analysis
- Better Insight and Cost Savings - Insight into relationships between data structures helps to analyse the cost of change and makes it easier to know how the data structures have been derived
- Bridge the Divide Between Structured and Unstructured Information - As part of a Big Data strategy, gain a better understanding of your overall information architecture through a standardised metadata approach
I hope this has given an insight into how important metadata can be to an information management strategy and how Oracle take a pragmatic approach to metadata management. Watch this space to see how Oracle will continue to develop this strategy.
Special thanks to Stephen Bennett of Oracle's Global Enterprise Architecture team for some of the definitions around metadata from his soon to be published Information Management Reference Architecture white paper.