Harvest Metadata from On-Premise and Cloud Sources with a Data Catalog

Rashmi Badan
Product Manager

We’re announcing a new capability in Oracle Cloud Infrastructure Data Catalog that expands its ability to harvest metadata from data systems in private networks and on-premise.


In today’s world, the proliferation of data makes searching for relevant data a challenge for data analysts and data scientists. Data now comes not only in large volumes and varied formats but is also spread across many locations, so having a single window through which to explore all of it is critical to their work. Only when they know where the data lives and what it represents can they find the data that best suits their needs.


Oracle Cloud Infrastructure Data Catalog is a fully managed metadata management service that provides a central metadata repository for your distributed data systems on Oracle Cloud and on-premise. The new private endpoint capability gives it access to a whole set of systems that were previously inaccessible, extending its reach to data located anywhere within the Oracle Cloud Infrastructure ecosystem or on-premise, in a private or public network, whether that data is structured or semi-structured. As a result, data users have a broader set of data to work with and can add more data-driven value to their organizations.

Use a Data Catalog to Harvest Metadata from Private Data Sources


Many organizations store their data on-premise, whether because of privacy regulations, existing infrastructure they have already invested in, or because no cloud region serves their location. Some organizations run Oracle Autonomous Database in their own data centers or on shared Exadata Infrastructure to keep their data secure, and many also run traditional Oracle data systems on-premise. Because of organizational policies, not all of these data systems are reachable at a public endpoint. Other organizations choose Oracle Cloud Infrastructure to host their data in Autonomous Databases (ATP/ADW) within private subnets of their Virtual Cloud Networks (VCNs).

Oracle Cloud Infrastructure Data Catalog can now harvest metadata from across your Oracle Cloud Infrastructure ecosystem and from Oracle data systems on-premise, whether they are reachable publicly or privately. This secure access is provided by Data Catalog’s new support for private endpoints. By configuring your data catalog to access a private network, you can:

  • Harvest Oracle Cloud Infrastructure data sources that are only accessible privately.
  • Harvest on-premise data sources that are connected to an Oracle Cloud Infrastructure Virtual Cloud Network (VCN) using VPN Connect or FastConnect.

Private Endpoints in Oracle Cloud Infrastructure Data Catalog 

Private endpoints give Oracle Cloud Infrastructure Data Catalog secure access to private data systems, so traffic does not go over the internet. The private endpoint is created in a subnet of your choice within your VCN and is managed by Data Catalog. Data Catalog uses the private endpoint to set up connections to data systems in private networks or on-premise.
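To make the routing idea concrete, here is a deliberately simplified, hypothetical model in Python (not part of any Data Catalog API). It assumes a private endpoint can reach an address directly when it lies inside the VCN’s CIDR ranges, and reaches on-premise addresses only through a DRG over VPN Connect or FastConnect. Real reachability also depends on route tables and security rules; the function name `private_route` and the CIDR ranges are illustrative.

```python
import ipaddress
from typing import Optional


def private_route(
    target_ip: str,
    vcn_cidrs: list[str],
    onprem_cidrs: list[str],
) -> Optional[str]:
    """Classify how a private endpoint could reach target_ip (simplified model).

    Returns "vcn" if the address is inside the VCN, "drg" if it is in an
    on-premise range reached through the DRG (VPN Connect or FastConnect),
    and None if there is no private path.
    """
    ip = ipaddress.ip_address(target_ip)
    if any(ip in ipaddress.ip_network(cidr) for cidr in vcn_cidrs):
        return "vcn"
    if any(ip in ipaddress.ip_network(cidr) for cidr in onprem_cidrs):
        return "drg"
    return None


# Example: VCN 10.0.0.0/16, on-premise network 192.168.0.0/16 (illustrative ranges).
vcn = ["10.0.0.0/16"]
onprem = ["192.168.0.0/16"]
print(private_route("10.0.2.15", vcn, onprem))     # vcn
print(private_route("192.168.10.4", vcn, onprem))  # drg
print(private_route("203.0.113.7", vcn, onprem))   # None
```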

Figure: Private Endpoints for a data catalog on-premises

Before creating private endpoints in Oracle Cloud Infrastructure Data Catalog, complete these prerequisites:

  • Set up appropriate ingress and egress security rules for the subnet where the private endpoint will be created.
  • Set up FastConnect or VPN Connect to connect any on-premise data system to Oracle Cloud Infrastructure.
  • Set up a Dynamic Routing Gateway (DRG) if you plan to access on-premise data systems.
  • Set up the permissions and policies required for Oracle Cloud Infrastructure Data Catalog to create and manage the private endpoints.
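The security-rule prerequisite is easy to get wrong. The sketch below, a hypothetical model rather than the OCI Networking API, checks whether the endpoint subnet’s egress rules allow TCP traffic to a database listener. It assumes ports 1521/1522 (common Oracle Net listener ports) and simple stateful rules, where return traffic is allowed automatically; the `EgressRule` class and `egress_allows` function are illustrative names, and the data system’s own subnet or network still needs a matching ingress rule.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """Hypothetical, simplified stateful egress rule on the endpoint's subnet."""
    destination_cidr: str
    protocol: str  # e.g. "tcp"
    port_min: int
    port_max: int


def egress_allows(rules: list[EgressRule], target_ip: str, port: int,
                  protocol: str = "tcp") -> bool:
    """Return True if any rule lets the subnet open a connection to target_ip:port."""
    ip = ipaddress.ip_address(target_ip)
    return any(
        rule.protocol == protocol
        and ip in ipaddress.ip_network(rule.destination_cidr)
        and rule.port_min <= port <= rule.port_max
        for rule in rules
    )


# Allow TCP 1521-1522 to the on-premise range (illustrative values).
rules = [EgressRule("192.168.0.0/16", "tcp", 1521, 1522)]
print(egress_allows(rules, "192.168.10.4", 1521))  # True
print(egress_allows(rules, "192.168.10.4", 5432))  # False
print(egress_allows(rules, "10.0.2.15", 1521))     # False
```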

How to Set Up Private Endpoints with a Data Catalog

Once these prerequisites are in place, you can set up and use a private endpoint in Oracle Cloud Infrastructure Data Catalog in three steps:

  1. Create a private endpoint
  2. Attach the private endpoint to a catalog instance
  3. Use the private endpoint to create a data asset for harvesting
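The three steps can be summarized as an ordered plan. This sketch only builds request-like dictionaries; the operation names and body fields (`subnetId`, `dnsZones`, `catalogPrivateEndpointId`, `host`, `usePrivateEndpoint`) are assumptions meant to mirror the concepts above, so check the Data Catalog API reference for the exact names before calling the service.

```python
from typing import Any


def plan_private_harvest_setup(
    compartment_id: str,
    subnet_id: str,
    dns_zones: list[str],
    catalog_id: str,
    data_system_fqdn: str,
) -> list[dict[str, Any]]:
    """Return the three setup steps, in order, as hypothetical request sketches."""
    return [
        {   # Step 1: create the private endpoint in a subnet of your VCN.
            "operation": "create_private_endpoint",
            "body": {
                "compartmentId": compartment_id,
                "subnetId": subnet_id,
                "dnsZones": dns_zones,
            },
        },
        {   # Step 2: attach the endpoint to the catalog instance.
            "operation": "attach_private_endpoint",
            "catalogId": catalog_id,
            "body": {"catalogPrivateEndpointId": "<id returned by step 1>"},
        },
        {   # Step 3: create a data asset that uses the private endpoint.
            "operation": "create_data_asset",
            "catalogId": catalog_id,
            "body": {"host": data_system_fqdn, "usePrivateEndpoint": True},
        },
    ]


# Placeholder identifiers, for illustration only.
for step in plan_private_harvest_setup(
    "ocid1.compartment.oc1..example",
    "ocid1.subnet.oc1..example",
    ["private.example.com"],
    "ocid1.datacatalog.oc1..example",
    "adb1.private.example.com",
):
    print(step["operation"])
```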

When you create a private endpoint in Oracle Cloud Infrastructure Data Catalog, you specify a set of DNS zones; the service can then access only the data systems located within those DNS zones.
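One way to reason about the DNS zone restriction is as a suffix match on hostnames. The sketch below assumes a data system is “within” a zone when its FQDN equals the zone or is a subdomain of it; the service’s exact matching rules may differ, and `in_dns_zones` is a hypothetical helper.

```python
def in_dns_zones(fqdn: str, dns_zones: list[str]) -> bool:
    """Return True if fqdn is one of dns_zones or a subdomain of one.

    Comparison is case-insensitive, ignores a trailing dot, and works on whole
    labels, so "db.example.com" matches "example.com" but
    "db.badexample.com" does not.
    """
    host = fqdn.rstrip(".").lower()
    for zone in dns_zones:
        normalized = zone.rstrip(".").lower()
        if normalized and (host == normalized or host.endswith("." + normalized)):
            return True
    return False


zones = ["private.example.com", "corp.example.org"]
print(in_dns_zones("adb1.private.example.com", zones))  # True
print(in_dns_zones("ERP.Corp.Example.org.", zones))     # True
print(in_dns_zones("db.public.example.com", zones))     # False
```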

Figure: Set up a private endpoint for a data catalog

When you attach a private endpoint to a catalog instance, that instance can use the private endpoint. Note that you can attach only one private endpoint to a catalog instance.
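The one-endpoint-per-catalog rule can be modeled as a simple invariant. In this hypothetical sketch (not the service’s API), re-attaching the same endpoint is treated as a no-op, and attaching a different one fails until the current endpoint is detached; the detach step is an assumption of the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogInstance:
    """Hypothetical model of a catalog instance and its attached private endpoint."""
    catalog_id: str
    private_endpoint_id: Optional[str] = None

    def attach(self, endpoint_id: str) -> None:
        # Re-attaching the same endpoint is a no-op; a second, different
        # endpoint is rejected because only one can be attached at a time.
        if self.private_endpoint_id not in (None, endpoint_id):
            raise ValueError(
                f"{self.catalog_id} already has private endpoint "
                f"{self.private_endpoint_id} attached"
            )
        self.private_endpoint_id = endpoint_id

    def detach(self) -> None:
        self.private_endpoint_id = None


catalog = CatalogInstance("catalog-1")
catalog.attach("pe-1")
try:
    catalog.attach("pe-2")
except ValueError as err:
    print(err)  # catalog-1 already has private endpoint pe-1 attached
catalog.detach()
catalog.attach("pe-2")
print(catalog.private_endpoint_id)  # pe-2
```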

Figure: Setting up a private endpoint for a data catalog

You can then use the private endpoint when creating data assets for your privately accessible data systems. When you create the data asset, specify the fully qualified domain name (FQDN) of the data system and select the ‘Use Private Endpoint’ checkbox.
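Before creating a data asset, a quick local check can catch the two settings called out above. This is a hypothetical validator, not part of Data Catalog: it assumes the host field should hold an FQDN (so it rejects bare IP addresses and single-label names) and that the ‘Use Private Endpoint’ option is represented by a boolean `use_private_endpoint` flag.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateDataAssetSpec:
    """Hypothetical settings for a data asset harvested via a private endpoint."""
    name: str
    host: str
    port: int
    use_private_endpoint: bool


def validate(spec: PrivateDataAssetSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec looks usable."""
    problems = []
    try:
        ipaddress.ip_address(spec.host)
    except ValueError:
        # Not an IP address; an FQDN needs at least two labels, e.g. "db.example.com".
        if "." not in spec.host.strip("."):
            problems.append("host must be a fully qualified domain name")
    else:
        problems.append("host must be a fully qualified domain name, not an IP address")
    if not spec.use_private_endpoint:
        problems.append("select 'Use Private Endpoint'")
    if not 1 <= spec.port <= 65535:
        problems.append("port must be between 1 and 65535")
    return problems


ok = PrivateDataAssetSpec("erp-db", "erp.corp.example.org", 1521, True)
bad = PrivateDataAssetSpec("erp-db", "192.168.10.4", 1521, False)
print(validate(ok))   # []
print(validate(bad))
```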

Figure: Privately accessible data systems

APIs are also available for these operations. Once these steps are done, the remaining actions are the same as for any other data asset: create connections, schedule harvest jobs, and then explore the data and enrich the harvested metadata.
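Scripted through the APIs, the remaining steps follow the usual data asset flow. The sketch below codes against a hypothetical `CatalogClient` protocol rather than a real SDK; its method names and the cron-style `schedule` string are assumptions. A recording `FakeClient` shows the call order without touching the service.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class CatalogClient(Protocol):
    """Hypothetical client interface; method names are illustrative only."""

    def create_connection(self, data_asset_key: str, name: str) -> str: ...

    def create_harvest_job(self, data_asset_key: str, connection_key: str,
                           schedule: Optional[str]) -> str: ...


def harvest_private_asset(client: CatalogClient, data_asset_key: str,
                          schedule: Optional[str] = None) -> tuple[str, str]:
    """Create a connection for an existing data asset, then a harvest job using it.

    These are the same steps as for any other data asset; the private endpoint
    was already selected when the data asset was created.
    """
    connection_key = client.create_connection(data_asset_key, name=f"{data_asset_key}-conn")
    job_key = client.create_harvest_job(data_asset_key, connection_key, schedule)
    return connection_key, job_key


@dataclass
class FakeClient:
    """In-memory stand-in that records calls, useful for tests."""
    calls: list = field(default_factory=list)

    def create_connection(self, data_asset_key: str, name: str) -> str:
        self.calls.append(("create_connection", data_asset_key, name))
        return f"conn-{len(self.calls)}"

    def create_harvest_job(self, data_asset_key: str, connection_key: str,
                           schedule: Optional[str]) -> str:
        self.calls.append(("create_harvest_job", data_asset_key, connection_key, schedule))
        return f"job-{len(self.calls)}"


fake = FakeClient()
print(harvest_private_asset(fake, "asset-1", schedule="0 2 * * *"))  # ('conn-1', 'job-2')
```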

Conclusion

Oracle Cloud Infrastructure Data Catalog lets you search and discover all of your data and enrich, curate, and classify its metadata so that you can derive real value from it. With private endpoints, you can now do the same for your private data sources in Oracle Cloud Infrastructure and on-premise.


Still unsure what a data catalog is used for? Check out “What is a Data Catalog?” and “What is Oracle Cloud Infrastructure Data Catalog?”


To learn more about the private endpoint feature in Oracle Cloud Infrastructure Data Catalog, please view the following:

Get started with Oracle Cloud Infrastructure Data Catalog today and help your organization discover and use data in the way you’ve always wanted!
