X

Learn data science best practices

What Is Oracle Cloud Infrastructure Data Catalog?

This is a syndicated post, view the original post here

And What Can You Do with It?

Simply put, Oracle Cloud Infrastructure Data Catalog helps organizations manage their data by creating an organized inventory of data assets. It uses metadata to create a single, all-encompassing and searchable view to provide deeper visibility into your data assets across Oracle Cloud and beyond. This video provides a quick overview of the service.

This helps data professionals such as analysts, data scientists, and data stewards discover and assess data for analytics and data science projects. It also supports data governance by helping users find, understand, and track their cloud data assets and on-premises data as well—and it’s included with your Oracle Cloud Infrastructure subscription.

Never miss an update about big data! Subscribe to the Big Data Blog to receive the latest posts straight to your inbox!

Why Does Oracle Cloud Infrastructure Data Catalog Matter?

Hint: It has to do with self-service data discovery and governance.

Oracle Cloud Infrastructure Data Catalog matters because it’s a foundational part of the modern data platform—a platform where all of your data stores can act as one, and you can view and access that data easily, no matter whether it resides in Oracle Cloud, object storage, an on-premises database, big data system, or a self-driving database.

This means that data users—data scientists, data analysts, data engineers, and data stewards—can all find data across systems and the enterprise more easily because a data catalog provides a centralized, collaborative environment to encourage exploration. Now these key players can trust their data because they gain technical as well as business context around it. It means they don’t have to have SQL access, or understand what object storage is, or figure out the complexities of Hadoop—they can get started faster with their single unified view through their data catalog. It’s no longer necessary to have five different people with five different skillsets just to find where the right data resides.

Easy data discovery is now possible.

And of course, it’s not just data discovery that’s easier. Governance is also easier—and that is a key benefit with GDPR and ever more complex compliance requirements in today’s world of multiple enterprise systems, with on-premises, cloud, and multi-cloud environments.

With Oracle Cloud Infrastructure Data Catalog, you have better visibility into all of your assets, and business context is available in the form of a business glossary and user annotations. And of course, understanding the data you have is essential for governance.

How Does Oracle Cloud Infrastructure Data Catalog Work?

Oracle Cloud Infrastructure Data Catalog takes metadata—technical, business, and operational—from various data sources, users, and assets, and harvests it to turn it into a data catalog: a single collaborative solution for data professionals to collect, organize, find, access, enrich, and activate metadata to support self-service data discovery and governance for trusted data assets across Oracle Cloud.

And what’s so important about this metadata? Metadata is the key to Oracle Cloud Infrastructure Data Catalog. There are three types of metadata that are relevant and key to how our data catalog works:

  • Technical metadata: Used in the storage and structure of the data in a database or system
  • Business metadata: Contributed by users as annotations or business context
  • Operational metadata: Created from the processing and accessing of data, which indicates data freshness and data usage, and connects everything together in a meaningful way

You can harvest this metadata from a variety of sources, including:

    • Oracle Cloud Infrastructure Object Storage
    • Oracle Database
    • Oracle Autonomous Transaction Processing
    • Oracle Autonomous Data Warehouse
    • Oracle MySQL Cloud Service
    • Hive
    • Kafka

And the supported file types for Oracle Cloud Infrastructure Object Storage include:

    • CSV, Excel
    • ORC, Avro, Parquet
    • JSON

Once the technical metadata is harvested, subject matter experts and data users can contribute business metadata in the form of annotations to the technical metadata. By organizing all this metadata and providing a holistic view into it, Oracle Cloud Infrastructure Data Catalog helps data users find the data they need, discover information on available data, and gain information about the trustworthiness of data for different uses.

How Can You Use a Data Catalog?

Metadata Enrichment

Oracle Cloud Infrastructure Data Catalog enables users to collaboratively enrich technical information with business context to capture and share tribal knowledge. You can tag or link data entities and attributes to business terms to provide a more all-inclusive view as you begin to gather data assets for analysis and data science projects. These enrichments also help with classification, search, and data discovery.

Business Glossaries

One of the first steps towards effective data governance is establishing a common understanding of business concepts across the organization, and establishing their relationships to the data assets in the organization. Oracle Cloud Infrastructure Data Catalog makes it possible to see associations and linkages between glossary terms and other technical terms, assets, and artifacts. This helps increase user trust because users understand the relationships and what they’re looking at.

Oracle Cloud Infrastructure Data Catalog makes this possible by including capabilities to collaboratively define business terms in rich text form, categorize them appropriately, and build a hierarchy to organize this vocabulary. You can also create parent-child relationships between various terms to build a taxonomy, or set business term owners and approval status so that users know who can answer their questions regarding specific terms. Once created, users can then link these terms to technical assets to provide business meaning and use them for searching as well.

Searchable Data Asset Inventory

By organizing all this metadata and providing a more complete view into it, Oracle Cloud Infrastructure Data Catalog helps users find the data they need, discover information on available data, and gain information about the trustworthiness of data for different uses.

Being able to search across data stores makes finding the right data so much easier. With Oracle Cloud Infrastructure Data Catalog, you have a powerful, searchable, standardized inventory of the available data sources, entities, and attributes. You can enter technical information, defined tags, or business terms to easily pull up the right data entities and assets. You can also use filtering options to discover relevant datasets, or browse metadata based on the technical hierarchy of data assets, entities, and attributes. These features make it easier to get started with data science, analytics, and data engineering projects.

Data Catalog API and SDK

Many of Oracle Cloud Infrastructure Data Catalog’s capabilities are also available as public REST APIs to enable integrations such as:

  • Searching and displaying results in applications that use the data assets
  • Looking up definitions of defined business terms in the business glossary and displaying them in reporting applications
  • Invoking job execution to harvest metadata as needed

Available search capabilities include:

  • Search data based on technical names, business terms, or tags
  • View details of various objects
  • Browse Oracle Cloud Infrastructure Data Catalog based on data assets

Available single collaborative environment includes:

  • Homepage with helpful shortcuts and operational stats
  • Search and browse
  • Quick actions to manage data assets, glossaries, jobs, and schedules
  • Popular tags and recently updated objects

Conclusion

Oracle Cloud Infrastructure Data Catalog is the underlying foundation to data management that you’ve been waiting for—and it’s included with your Oracle Cloud Infrastructure subscription. Now, data professionals can use technical, business, and operational metadata to support self-service data discovery and governance for data assets in Oracle Cloud and beyond.

Leverage your data in new ways, and more easily than you ever could before. Try Oracle Cloud Infrastructure Data Catalog today and start discovering the value of your data. And don't forget to subscribe to the Big Data Blog for the latest on Big Data straight to your inbox!

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.