Enrich Metadata with Oracle Cloud Infrastructure Data Catalog

Rashmi Badan
Product Manager

Oracle Cloud Infrastructure (OCI) Data Catalog is a fully managed service that provides a centralized metadata repository for all your data sources within the Oracle ecosystem. It allows data providers to harvest, enrich, and curate technical and business metadata and allows data users to search and discover data suitable for their needs. To learn more, see the OCI Data Catalog page.

An important feature of OCI Data Catalog is the ability to enrich technical metadata using a business glossary and free-form tags. With the latest release of the service, we’re introducing custom properties as a flexible way to further enrich metadata.

How does metadata enrichment benefit data consumers and data providers? 

Data catalogs provide data consumers with a single place to explore all data sources within an organization. As a data consumer, you want to determine what data best suits your needs, but the sheer number and variety of data sources make it hard to find suitable data easily. Additionally, to choose the right data you need to understand more than just technical metadata; you need the business context of the data. A business glossary provides one way to establish and share a common vocabulary between data producers and data consumers in an organization. Free-form tags are another way to capture tribal knowledge in an organized manner. But these methods might not be sufficient to capture all the details a data user is looking for. What if you want to know who owns the data? Or the business description of a particular table and how its fields are derived? Or how frequently a table is updated, so you can decide how often to refresh the reports generated from it? Without a proper understanding of such details, you could spend a significant amount of time searching for and identifying the right data for your use and, worse, end up using the wrong data.

Data providers and data consumers face a common problem: they can’t collaboratively share specific details about data. One of our customers was facing exactly this problem. They manage a data warehouse fed by various services and source systems across the organization. Their end users are data analysts and data scientists, who need to explore these data systems to look for suitable data. To share information about the data, the warehouse team recorded business descriptions, data owners, update frequencies, security groups, certification status, and many other details for every data asset and the data entities within it. But they maintained these details in Confluence pages. Imagine the time their users spent switching between data sources and the Confluence pages to understand the data before they could decide which to use! As we all know, Confluence or internal wiki pages get outdated quickly, and then the consumers have to keep asking data providers the same questions. This situation is neither scalable nor pleasant.

What if their data consumers could get all the technical and business context in a single place? With OCI Data Catalog, now they can, by capturing such information with custom properties. This streamlining not only eliminated their team’s overhead of maintaining the extra information, but also made life much easier for their data providers and data consumers.
 

Understanding custom properties

OCI Data Catalog’s custom properties feature allows you to enrich metadata for harvested data sources in a customizable manner. Imagine a table with a column named OP_ID_reg_US. A sales or business analyst would learn nothing about its contents from that name. But if the data provider could add a description indicating that the column holds the unique account ID for customers in the US region, the analyst could make sense of the values in that column instantly. This extra information helps data users better understand the data, know where it comes from, and know how to use it. A data expert or data provider within your organization understands the data well and also recognizes how end users, such as data analysts and data scientists, use it. Data providers and subject matter experts can now provide the right kind of metadata enrichment using custom properties.

Custom properties are flexible, highly configurable, and easy to use. Once the data provider has identified information with which to enrich the metadata, creating and using custom properties involves the following simple steps:

  1. Create custom properties and associate them with object types, typically done by a catalog admin.
  2. Populate values for the properties for different objects, typically done by a data provider or subject matter expert.
  3. Search, explore, and better understand the data with custom properties, typically done by a data consumer.

So, let’s get started!

Consider the scenario where a data analyst is looking for the Revenue table of the Car Accessories division in the business. Without custom properties, a search for the Revenue table returns only the bare minimum of information, mostly limited to technical metadata.

Figure 1. A screenshot of Data Entity without custom properties
 

Clearly, this result doesn’t have any business context for the analyst to understand what the data is about. Let’s see how custom properties can enhance the user experience in understanding the data better.

Create custom properties

OCI Data Catalog provides a powerful way for the admin to create custom properties. You can define the following characteristics for each custom property created:

  • A name to easily identify the property
  • A description of the property that shows up as a tooltip to the data provider
  • A suitable datatype for the property (plain text, rich text, boolean, number, or date)
  • OCI Data Catalog object types the property can be associated with
  • An option to define a limited list of allowed values
  • An option to allow the property to contain multiple values
  • Ability to control, filter, and sort search results based on values of the property

Under Quick Actions, click Manage Custom Properties or Custom Properties on the home page to get started.

Let’s create a property called Data Owners to help our data analyst know whom to contact if there are inaccuracies in data in certain tables.

Figure 2. A screenshot of creating a custom property.

After providing a name and description for this property, we’ve configured some other options related to values of the property and search behavior. Suppose that the data owners in this organization are uniquely identified by their email addresses, which are simple strings. So, we pick plain text as the data type.

In our use case, a table can have multiple owners, so we select Allow multiple values. We associate this property with data entities and folders. Because we want to see the data owners right away in the search summary and be able to filter the search results by data owner, we select the Show in search results and Allow filtering options.

For properties such as update frequency, where you may want to allow only a fixed set of values such as hourly, daily, weekly, and monthly, you can provide them using the Use list of values option.

  Figure 3. A screenshot of value options available for custom properties.
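The choices made above for Data Owners and update frequency can be summarized as a small data model. The following Python sketch is only an illustration of those options: the class, field names, and type labels are hypothetical and are not the OCI Data Catalog API or SDK, and associating update frequency with data entities is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for the data types listed in this post.
DATA_TYPES = {"plain_text", "rich_text", "boolean", "number", "date"}


@dataclass
class CustomPropertyDefinition:
    """Illustrative model of the options an admin sets for a custom property."""
    name: str
    description: str
    data_type: str
    object_types: List[str]
    allowed_values: Optional[List[str]] = None  # "Use list of values"
    multi_valued: bool = False                  # "Allow multiple values"
    show_in_search_results: bool = False        # "Show in search results"
    filterable: bool = False                    # "Allow filtering"

    def __post_init__(self):
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unsupported data type: {self.data_type}")
        if not self.object_types:
            raise ValueError("associate the property with at least one object type")


# The Data Owners property as configured in this walkthrough.
data_owners = CustomPropertyDefinition(
    name="Data Owners",
    description="Whom to contact about inaccuracies in the data",
    data_type="plain_text",
    object_types=["data_entity", "folder"],
    multi_valued=True,
    show_in_search_results=True,
    filterable=True,
)

# An update frequency property restricted to a fixed list of values.
update_frequency = CustomPropertyDefinition(
    name="Update Frequency",
    description="How often the object is refreshed",
    data_type="plain_text",
    object_types=["data_entity"],  # assumed for illustration
    allowed_values=["hourly", "daily", "weekly", "monthly"],
)
```

Keeping the list of allowed values on the definition, rather than on each object, mirrors the idea that the admin decides the vocabulary once and data providers only pick from it.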

In this manner, you can create different custom properties that suit your organization’s requirements and benefit your data users. Now let’s see how we can put these created properties to use.

Populate values for custom properties

With the required custom properties created, the data provider can now go on to enrich the harvested metadata by populating the properties for different objects.

This step is where the metadata gains the value that benefits the end user. To populate values of every custom property associated with an object, search for the object that you want to enrich, go to its details page, and use Edit in the custom properties panel of the Summary tab. Let’s set values for the associated custom properties of the Revenue table.

Figure 4. A screenshot of setting values of custom properties.

As a data provider, we set values of all associated custom properties to describe this table. Properties such as business descriptions tend to be lengthy, so they use the rich text type, which lets us highlight important parts of the description. Boolean properties show up as radio buttons. Properties with a fixed list of values, such as data quality warning in the Edit screen, show up as a drop-down menu of options.
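The rules that the Edit screen enforces (one value unless multiple values are allowed, a matching data type, and membership in the list of values when one is defined) can be sketched as a simple check. This is a hypothetical helper written for illustration, not something the service or its SDK exposes; the type labels and the example email addresses are made up.

```python
from datetime import date

# Hypothetical mapping from the property data types to Python types.
PYTHON_TYPES = {
    "plain_text": str,
    "rich_text": str,
    "boolean": bool,
    "number": (int, float),
    "date": date,
}


def validate_values(data_type, values, multi_valued=False, allowed_values=None):
    """Return a list of problems; an empty list means the values are acceptable."""
    problems = []
    if len(values) > 1 and not multi_valued:
        problems.append("property accepts a single value")
    expected = PYTHON_TYPES[data_type]
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        if data_type == "number" and isinstance(value, bool):
            problems.append(f"{value!r} is not a number")
        elif not isinstance(value, expected):
            problems.append(f"{value!r} is not of type {data_type}")
        elif allowed_values is not None and value not in allowed_values:
            problems.append(f"{value!r} is not in the list of allowed values")
    return problems


# Two owners on a multi-valued plain text property: accepted.
owners_ok = validate_values(
    "plain_text",
    ["owner.one@example.com", "owner.two@example.com"],
    multi_valued=True,
)

# A frequency outside the fixed list: rejected.
frequency_bad = validate_values(
    "plain_text",
    ["yearly"],
    allowed_values=["hourly", "daily", "weekly", "monthly"],
)
```

Returning a list of problems instead of raising on the first one lets a data provider see everything that needs fixing in a single pass.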

A data provider can repeat this process for all the objects in the data catalog that have one or more custom properties associated with them. Once all of them are populated, the data user can search or browse for any object of interest and understand it better through these values of custom properties.

Search, filter, and understand better with custom properties

Custom properties have an impact on how the end user perceives the data and metadata. After the enrichment using custom properties, if the data user now searches for the same Revenue table, much richer information is available.

Figure 5. A screenshot of search results with custom properties.

As you can see, the search result now provides details like business description, update frequency, and data owners that were previously not available. You can filter the search results based on values of some custom properties. In the figure, we have filtered the results with data quality warning set to high. More information, such as update frequency, can help analysts determine how fresh the data is and how frequently they need to refresh their reports to ensure that they use the latest data. Questions about the data can now be directed to the right people listed in the data owners field. The data user already learns a great deal about the data from the search results alone.
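Conceptually, filtering on a custom property keeps only the results whose value for that property matches the selected one. The sketch below shows that idea over a made-up, in-memory list of results; the object names other than the Revenue table, the property values, and the function are all illustrative assumptions and say nothing about how the service implements search.

```python
# Made-up search results for illustration; a real catalog returns richer objects.
results = [
    {
        "name": "REVENUE",
        "custom_properties": {
            "Data Quality Warning": ["High"],
            "Update Frequency": ["Daily"],
        },
    },
    {
        "name": "REVENUE_ARCHIVE",
        "custom_properties": {
            "Data Quality Warning": ["Low"],
            "Update Frequency": ["Monthly"],
        },
    },
    # An object whose custom properties have not been populated yet.
    {"name": "REVENUE_STAGING", "custom_properties": {}},
]


def filter_by_property(items, property_name, wanted_value):
    """Keep results whose named property contains the wanted value."""
    return [
        item
        for item in items
        if wanted_value in item["custom_properties"].get(property_name, [])
    ]


high_warning = filter_by_property(results, "Data Quality Warning", "High")
print([item["name"] for item in high_warning])  # prints ['REVENUE']
```

Values are stored as lists so that the same check works for single-valued and multi-valued properties, and objects with unpopulated properties simply drop out of a filtered result.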

If you look at the object details page, it shows all the associated properties, including any extra custom properties that aren’t configured to appear in the search result summary.

  Figure 6. A screenshot of data entity with custom properties.

If you compare this result with the first screenshot without any custom properties, you can see that all this extra information gives sufficient business context to the data users to know what the data is about and evaluate if it is suitable for their use. With such information easily available, they no longer need to spend valuable time looking for it elsewhere and can focus on their actual work instead.

Access control for custom properties

Custom properties affect the way data users perceive data. So, creating and populating custom properties should be restricted to only those users who are directly or indirectly responsible for the overall data catalog management. For example, only data catalog admins can create custom properties in the data catalog and the data owners or data providers can populate values for them. Data users can see the values of properties associated with data catalog objects only if they have access to the objects themselves. As the steps to create, populate, and use custom properties are performed by different personas, you can define suitable Identity and Access Management (IAM) policies for the data-catalog-namespaces resource to regulate access by each persona.

You can use the following examples of policies for the different personas:

  • For admin users who need to create the properties identified by data experts: allow group data-catalog-admins to manage data-catalog-namespaces in compartment myCompartment
  • For data providers and data experts who need to populate the value of custom properties associated with OCI Data Catalog objects: allow group data-catalog-data-providers to manage data-catalog-namespaces in compartment myCompartment
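If you maintain these statements for several groups or compartments, generating them from one template helps keep them consistent. The helper below is a hypothetical convenience, not part of any OCI tooling; it only formats the statement text shown above, using the group and compartment names from this post.

```python
# Standard IAM policy verbs, from least to most privileged.
VERBS = ("inspect", "read", "use", "manage")


def namespace_policy(group, compartment, verb="manage"):
    """Format a policy statement for the data-catalog-namespaces resource."""
    if verb not in VERBS:
        raise ValueError(f"unknown policy verb: {verb}")
    return (
        f"allow group {group} to {verb} "
        f"data-catalog-namespaces in compartment {compartment}"
    )


admin_policy = namespace_policy("data-catalog-admins", "myCompartment")
provider_policy = namespace_policy("data-catalog-data-providers", "myCompartment")
```

The generated strings still need to be added to a policy in the usual way; the sketch only removes the risk of typos when the same statement is repeated for different groups.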

You can also define policies to allow privileges in specific catalog instances. To learn more about Data Catalog policies, see the documentation.

Conclusion

Custom properties allow data providers to effectively share the business context of the data. Data consumers can use this enrichment for searching, filtering, and understanding the data without guessing or asking data providers. Data consumers can more quickly and easily discover the most appropriate data through self-service, reducing their dependency on data providers or collective wisdom in the organization.

Still unsure what a catalog is used for? Check out What is a Data Catalog and What is Oracle Cloud Infrastructure Data Catalog?

Get started with Oracle Cloud Infrastructure Data Catalog today, and help your organization discover and use data in the way you’ve always wanted!

To learn more about the Custom Properties feature in Oracle Cloud Infrastructure Data Catalog, see the following resources:
