We are excited to announce a new release of Oracle Cloud Infrastructure (OCI) Data Catalog. With this release, data providers and subject matter experts can enrich and curate metadata more effectively with custom metadata properties, and data consumers can use these enrichments for a much-improved search and discovery experience. Users can also harvest data lake files at scale and understand how those files are logically grouped together. With the added ability to harvest technical metadata from non-Oracle sources such as Microsoft SQL Server, Microsoft Azure SQL DB, IBM DB2 and PostgreSQL, OCI Data Catalog brings metadata from even more enterprise systems into view for added business benefit.

 

How to Gain More Value from Your Data

As a quick refresher, Oracle Cloud Infrastructure (OCI) Data Catalog is a recently launched service that makes it easier to find valuable trusted data. It helps find data in Oracle Cloud and beyond, and collects details about the data to understand what data exists where, its business context and how it can be useful for analytics. It is a metadata management solution that supports data lakes, data warehouses, analytics, and data science solutions by enabling self-service data discovery and improving data governance.

If you are not familiar with this new service, the blog post "What is OCI Data Catalog?" is a helpful introduction.

Oracle Cloud Infrastructure Data Catalog is available in 19 commercial regions. For details, see the list of Oracle Cloud Infrastructure commercial regions.

 

New Features of Data Catalog

Here is a quick overview of these new features.

 

Improved Metadata Curation with Custom Properties:

With this release, users can define their own custom properties for specific metadata enrichment needs. Custom properties extend the default system metadata harvested by OCI Data Catalog so that it suits the business use case at hand, such as analytics and data science projects. For example, users can define “Business Description”, “Update Frequency”, “Data Owners” and so on. This gives data experts a mechanism for contributing business context to technical metadata beyond simple tagging. Once this information is populated for different data sets and fields, it helps with discovery, classification, and overall understanding of the data. Data providers now have an organized way of sharing this information, so they don’t have to keep answering the same questions from data consumers.

Screenshots of adding a custom property.
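
To make the idea concrete, here is a minimal Python sketch of how custom properties attached to catalog entities can drive discovery. It is only an illustration and does not use the OCI Data Catalog API or SDK: the `CatalogEntity` class, the `find_by_property` helper, and the sample entity names and values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntity:
    """Hypothetical stand-in for a harvested data entity."""
    name: str
    # Custom properties contributed by data experts, keyed by property name.
    custom_properties: dict = field(default_factory=dict)


def find_by_property(entities, prop, value):
    """Return the names of entities whose custom property equals the given value."""
    return [e.name for e in entities if e.custom_properties.get(prop) == value]


# Sample data (invented for illustration only).
entities = [
    CatalogEntity("ORDERS", {"Update Frequency": "Daily", "Data Owners": "Sales Ops"}),
    CatalogEntity("CUSTOMERS", {"Update Frequency": "Weekly", "Data Owners": "CRM Team"}),
    CatalogEntity("RETURNS", {"Update Frequency": "Daily"}),
]

daily = find_by_property(entities, "Update Frequency", "Daily")
print(daily)
```

In this sketch, a data consumer looking for data sets that refresh daily filters on the “Update Frequency” property instead of asking each data owner.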

 

Understand Data Lake Content Better with Logical Data Entities:

With this release of OCI Data Catalog, harvesting data lakes at scale has been improved with logical data entities. In a typical data lake that stores a large number of files, multiple files often represent a single data set, and the files follow a naming convention that indicates they are part of a single logical entity. Expert users can now define filename patterns and use them during harvesting. Filenames are matched against the pattern expressions to form logical data entities instead of individual file entities. These logical data entities then behave like any other entity and can be used for search and discovery. This organizes data lake content in a way that is meaningful to users and prevents an explosion of data entities and attributes in the catalog.

Screenshots of an example of a logical entity.
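
The grouping idea behind logical data entities can be sketched in a few lines of Python. This is a rough illustration using ordinary regular expressions, not the pattern syntax or harvesting logic that OCI Data Catalog itself uses. The pattern, the `group_into_logical_entities` function, and the sample file paths are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical filename pattern: the named group "entity" captures the part of
# the path shared by every file in one data set; the date and part number vary.
PATTERN = re.compile(r"^(?P<entity>[a-z_]+)/\d{4}-\d{2}-\d{2}/part-\d+\.csv$")


def group_into_logical_entities(filenames, pattern=PATTERN):
    """Group filenames that match the pattern under one logical entity name.

    Returns a dict of entity name -> list of files, plus the list of files
    that did not match and would remain individual file entities.
    """
    grouped = defaultdict(list)
    unmatched = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            grouped[match.group("entity")].append(name)
        else:
            unmatched.append(name)
    return dict(grouped), unmatched


# Sample object names (invented for illustration only).
files = [
    "sales/2021-01-01/part-0001.csv",
    "sales/2021-01-02/part-0001.csv",
    "customers/2021-01-01/part-0001.csv",
    "README.txt",
]

entities, unmatched = group_into_logical_entities(files)
for entity, members in sorted(entities.items()):
    print(entity, len(members))
```

Here the three matching CSV files collapse into two logical entities, while the file that does not match the pattern stays on its own. This is the effect described above: fewer, more meaningful entities in the catalog.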

 

Harvesting Support for Non-Oracle Data Sources:

With this release of OCI Data Catalog, users can harvest technical metadata from non-Oracle sources such as Microsoft SQL Server, Microsoft Azure SQL DB, IBM DB2 and PostgreSQL. This allows customers to expand the scope of systems that can be included in the data catalog.

A screenshot of a non-Oracle metadata source.

 

As a reminder, OCI Data Catalog is built on the secure, reliable, scalable and highly performant Oracle Cloud. It is a native Oracle Cloud Infrastructure service, hosted and managed in Oracle Cloud. All of its capabilities are also available through REST APIs and SDKs in multiple languages, so power users and developers can integrate OCI Data Catalog capabilities into their own applications. For access control, OCI Data Catalog uses OCI IAM policy management, like other Oracle Cloud Infrastructure services.

 

Want to Know More?

For more information, review the Oracle Cloud Infrastructure Data Catalog documentation and associated tutorials.

Organizations are embarking on their next-generation analytics journey with data lakes, autonomous databases, and advanced analytics with artificial intelligence and machine learning in the cloud. OCI Data Catalog helps support discovery, insights, and governance of data assets. The current capabilities of OCI Data Catalog are available to Oracle Cloud customers at no additional cost. Try it out today!