Extracting REST Data using Oracle Cloud Infrastructure Data Integration (OCI-DI)

October 12, 2022 | 5 minute read
GuruDixit Chepuri
Principal Cloud Architect


Hello everyone, we hope that you're enjoying the introductory blogs about Oracle Cloud Infrastructure (OCI) Data Integration.

A common ask from customers integrating data in the cloud is the ability to read from REST URLs. The new Oracle Cloud Infrastructure Data Integration release adds support for REST services.
In this blog post, we’ll cover how to extract data using the Generic REST Adaptor.

Special Thanks to David Allan

Cloud native, serverless data integration

As a refresher, OCI Data Integration is a cloud native, fully managed serverless extract, transform, and load (ETL) service on OCI. Organizations building lakehouses for analytics and data science, AI and ML on OCI with Object Storage, and autonomous data warehouses can quickly deliver insights by simplifying, automating, and accelerating the consolidation of data from multiple data silos.

Data Integration provides a graphical, no-code design interface, with interactive data preparation and profiling. It also helps data engineers to design data pipelines using patterns and rules to handle schema evolution. It supports both Spark ETL and ELT push-down execution to the database. If you’re not familiar with this new service, check out this blog to find out more: What is Oracle Cloud Infrastructure Data Integration?

Data Integration is available in all OCI commercial regions.

Prerequisites

1) Data assets for the source and target are defined in the workspace.
2) The workspace, projects, integration task, and applications have been created. To learn more, refer to our blogs on Data Integration and the Oracle documentation.

Introduction

REST APIs provide a flexible, lightweight way to integrate applications, and have emerged as the most common method for connecting components in microservices architectures. An API, or application programming interface, is a set of rules that define how applications or devices can connect to and communicate with each other. A REST API is an API that conforms to the design principles of the REST (representational state transfer) architectural style. For this reason, REST APIs are sometimes referred to as RESTful APIs.
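As a rough illustration of what calling a REST API involves, the sketch below builds an HTTP GET request for a resource under a base URL, using only Python's standard library. The host `api.example.com`, the `employees` resource, and the `limit` parameter are hypothetical placeholders and do not refer to any Oracle service.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(base_url, resource, params=None):
    """Build (but do not send) a GET request for a REST resource."""
    # Join base URL and resource path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + resource.lstrip("/")
    if params:
        # Encode query parameters, e.g. {"limit": 10} becomes "limit=10".
        url += "?" + urlencode(params)
    # REST services commonly return JSON, so ask for it explicitly.
    return Request(url, headers={"Accept": "application/json"}, method="GET")


# Hypothetical endpoint, used only for illustration.
request = build_get_request("https://api.example.com/v1/", "employees", {"limit": 10})
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it and return the response, which for most REST services is a JSON document.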

Features of REST Adaptor

  1. REST as a source enables users to ingest data from a REST endpoint.
  2. Users can provide a base URL for the REST service.
  3. Users can upload a manifest file or set the manifest file URL.
  4. REST as a source supports No Authentication and Basic Authentication.
  5. Users can connect to a public or private REST service for ingestion.
  6. Users can browse resources and operations.
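To make the two authentication modes concrete at the HTTP level, here is a minimal sketch of the headers a client would send with no authentication versus basic authentication. It illustrates the standard HTTP Basic scheme only, and is not a description of how Data Integration implements it internally. The function names and the credentials are hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP Basic Authentication."""
    # Basic auth is "username:password", base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def request_headers(username=None, password=None):
    """Headers for the two modes: no authentication or basic authentication."""
    headers = {"Accept": "application/json"}
    if username is not None and password is not None:
        headers.update(basic_auth_header(username, password))
    return headers


# Placeholder credentials, for illustration only.
print(request_headers())
print(request_headers("demo_user", "demo_password"))
```

Because base64 is an encoding and not encryption, basic authentication is normally used only over HTTPS.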

Implementation

The following canvas screenshot shows the flow of creating a REST data asset and transforming the data into AVRO, JSON, or PARQUET format in OCI Data Integration.

Sign in to the OCI Console and click the hamburger menu in the top left corner.
Navigate to Analytics & AI --> Data Lake --> Data Integration.

Navigation

 

Create a new Data Asset for REST
From the Workspace's Home page, click the Open tab (plus icon) and then select Data Assets.
On the Data Assets page, select Generic REST as the data asset type and provide the base URL and manifest file.
After you complete all required fields successfully, click Test Connection to ensure a connection can be established.
Click Create. 
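This post does not show the contents of the manifest file. Assuming it is an OpenAPI-style JSON document (an assumption; check the Data Integration documentation for the supported format), the sketch below lists the resources (paths) and operations (HTTP methods) that such a file describes. The sample manifest, the `/employees` paths, and the operation IDs are all hypothetical.

```python
import json

# Hypothetical OpenAPI-style manifest; the real file depends on the REST service.
SAMPLE_MANIFEST = """
{
  "openapi": "3.0.0",
  "info": {"title": "Sample service", "version": "1.0"},
  "paths": {
    "/employees": {"get": {"operationId": "listEmployees"}},
    "/employees/{id}": {"get": {"operationId": "getEmployee"}}
  }
}
"""

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(manifest_text):
    """Return (resource, method, operationId) tuples found in the manifest."""
    manifest = json.loads(manifest_text)
    operations = []
    for resource, item in manifest.get("paths", {}).items():
        for method, details in item.items():
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                operations.append((resource, method.upper(), details.get("operationId")))
    # Sort by resource and method so the listing is stable.
    return sorted(operations, key=lambda op: op[:2])


for resource, method, operation_id in list_operations(SAMPLE_MANIFEST):
    print(f"{method} {resource} -> {operation_id}")
```

Reviewing a manifest this way before uploading it can help confirm that the resource and operation you plan to use as the source are actually described in the file.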

BaseURL 
ManifestFile Attached 

DataAsset

TestConnection

Create the data flow, then drag the Source and Target operators onto the canvas. Use the REST data asset as the source. In this example, I'm using an Object Storage data asset as the target.

DataAsset

Set the properties of the REST Data Asset.
For the selected Resource and Operation, I set the IN value to Available in the Advanced tab (this varies based on the use case).

Advanced

After validating the data flow, create an Integration Task and publish the task to an Application.

Publish

Verify the Patch and run the published task.

Execution

As my target data asset is Object Storage, I executed the job multiple times to create AVRO, JSON, and PARQUET files from the REST source.

Outcome

Want to know more?

Organizations are embarking on their next-generation analytics journey with data lakehouses and advanced analytics with artificial intelligence and machine learning in the cloud. For this journey to succeed, they need to ingest, prepare, transform, and load their data with Oracle Cloud Infrastructure Data Integration quickly and easily. Try it out today!

For more information, review the Oracle Cloud Infrastructure Data Integration documentation, associated tutorials, and the Oracle Cloud Infrastructure Data Integration blogs.

Guru Dixit specializes in Oracle and Azure Cloud technologies.
He excels in designing IaaS and PaaS solutions and delivering Data Integration and Analytics solutions utilizing Data Warehouse, Oracle Integration, Oracle Modern Data Platform, Data Lakehouse, and Multi-Cloud Data Platforms. 
With deep knowledge of Oracle Modern Data Platform and cloud-native architectures, Guru helps organizations adopt multi-cloud strategies for agility and growth.
