We’re excited to announce the release of the Oracle Cloud Infrastructure (OCI) Data Flow plugin for OCI Code Editor. This new plugin offers a streamlined way to develop and deploy serverless OCI Data Flow Spark applications on OCI.
OCI Data Flow is a powerful managed service that uses Apache Spark to process large datasets seamlessly without the need to deploy or manage any underlying infrastructure. The service also offers Spark Streaming, allowing developers to perform cloud-based extract, transform, load (ETL) operations on continuously produced streaming data. This integration speeds up the application delivery process, so developers can concentrate their efforts on application development rather than infrastructure management.
OCI Code Editor provides a rich, in-Console editing environment that enables you to edit code and update service workflows and scripts without having to switch between the Console and your local development environment. You can now access the power of managed Apache Spark directly from OCI Code Editor, simplifying your cloud experience even further.
The OCI Data Flow plugin for OCI Code Editor offers a range of features and benefits that make it an essential tool for developers working with serverless Data Flow on OCI. Consider a typical scenario:
You want to build a data pipeline that loads data from a CSV file stored in an OCI Object Storage bucket, performs some transformations on the data using Spark, and then saves the results to an Oracle Autonomous Data Warehouse instance. Before this plugin, that workflow required several manual steps outside the Code Editor.
Now you can stay within the Code Editor and use the Spark API to implement the entire data pipeline and run it on a Spark cluster powered by Data Flow. This approach simplifies development, testing, and deployment, resulting in a more efficient and streamlined workflow.
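To give a sense of what that pipeline looks like, here’s a minimal PySpark sketch. The bucket path, column name, and Autonomous Data Warehouse connection options are placeholders rather than values from the actual sample, and the write step assumes Data Flow’s Spark Oracle datasource:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, upper

spark = SparkSession.builder.appName("csv_to_adw").getOrCreate()

# Read the source CSV from an OCI Object Storage bucket (placeholder path).
df = spark.read.option("header", "true").csv(
    "oci://<bucket_name>@<tenancyname>/input/source.csv"
)

# Example transformation: uppercase a column and drop incomplete rows.
# The column name is a placeholder.
transformed = df.withColumn("CUSTOMER_NAME", upper(col("CUSTOMER_NAME"))).dropna()

# Write the result to Autonomous Data Warehouse using the Spark Oracle
# datasource. The OCID, table name, and credentials are placeholders.
(transformed.write.format("oracle")
    .option("adbId", "<autonomous_database_ocid>")
    .option("dbtable", "PROCESSED_DATA")
    .option("user", "ADMIN")
    .option("password", "<password>")
    .mode("overwrite")
    .save())
```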
Before you begin, make sure you have the necessary prerequisites in place.
Now you’re ready to set up your data pipeline with the following steps.
When the Code Editor window launches, click the Oracle logo button, which displays the Data Flow plugin. Expand the plugin to see all the compartments in your tenancy and their associated Data Flow Code Editor projects.
We’re using a Python example called “csv_metastore_adw,” which contains sample code that reads from a CSV file, transforms the data, writes the metadata to Data Catalog, and saves the transformed data to an Autonomous Data Warehouse instance. When you create the project, give it a name.
After you select the example, the plugin pulls the code into the editor. Next, inspect the code and make changes to capture the setup parameters for your environment. The changes to the input parameters, such as the path for the CSV file and the Autonomous Data Warehouse details, are described in the README file.
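As a rough illustration of how those input parameters can be wired up in code, the sketch below parses them with argparse. Only --table comes from the sample’s documented interface; the --input argument and its default path are hypothetical:

```python
import argparse

from pyspark.sql import SparkSession

# --table matches the argument shown on the Run Application screen;
# --input and its default path are hypothetical illustrations.
parser = argparse.ArgumentParser()
parser.add_argument("--table", required=True,
                    help="Data Catalog metastore table to write")
parser.add_argument("--input",
                    default="oci://<bucket_name>@<tenancyname>/input/source.csv",
                    help="Object Storage path of the source CSV")
args = parser.parse_args()

spark = SparkSession.builder.appName("csv_metastore_adw").getOrCreate()
df = spark.read.option("header", "true").csv(args.input)

# saveAsTable persists the data and registers the table's metadata in
# the configured Data Catalog metastore.
df.write.mode("overwrite").saveAsTable(args.table)
```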
Run the code locally to verify that it works as expected. Clicking Run locally opens the “Run Application” screen, which captures the application input parameters.
Enter the following input parameters to run the application locally:
Language: Python
File Name: csv_metastore_adw
Argument: --table <tableName>. Supply the Data Catalog metastore table.
Enable both the Spark Oracle data source property and the Spark Oracle metastore property. Enabling them requires you to provide the compartment that houses the metastore instance.
Click Run. This submits the Spark job locally and shows the output in the log file, runlog.txt. If an error occurs, it’s logged to both the terminal and the log file.
After verifying the output, deploy the code to the Object Storage bucket so that Data Flow can use it by clicking Upload artifact in the menu. While uploading, supply the path for the bucket: select the language (Python), select the Python file, and enter the Object Storage file URL path (oci://<bucket_name>@<tenancyname>/<folder_name>). Then press Enter.
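The plugin performs this upload for you, but the equivalent operation with the OCI Python SDK looks roughly like the following. The file, bucket, and folder names are placeholders:

```python
import oci

# Load credentials from the default OCI config file (~/.oci/config).
config = oci.config.from_file()
object_storage = oci.object_storage.ObjectStorageClient(config)
namespace = object_storage.get_namespace().data

# Upload the application file so Data Flow can reference it by its
# oci:// URL. The file, bucket, and folder names are placeholders.
with open("csv_metastore_adw.py", "rb") as f:
    object_storage.put_object(
        namespace_name=namespace,
        bucket_name="<bucket_name>",
        object_name="<folder_name>/csv_metastore_adw.py",
        put_object_body=f,
    )
```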
You can now start using the application in the Data Flow service.
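For example, you can register the uploaded artifact as a Data Flow application with the OCI Python SDK. The OCIDs, shapes, and Spark version below are placeholder values you’d replace for your tenancy:

```python
import oci

config = oci.config.from_file()
data_flow = oci.data_flow.DataFlowClient(config)

# Register the uploaded script as a Data Flow application.
# OCIDs, shapes, and the Spark version are placeholders.
app = data_flow.create_application(
    oci.data_flow.models.CreateApplicationDetails(
        compartment_id="<compartment_ocid>",
        display_name="csv-metastore-adw",
        language="PYTHON",
        spark_version="3.2.1",
        file_uri="oci://<bucket_name>@<tenancyname>/<folder_name>/csv_metastore_adw.py",
        driver_shape="VM.Standard2.1",
        executor_shape="VM.Standard2.1",
        num_executors=2,
        arguments=["--table", "<tableName>"],
    )
).data
print("Created application:", app.id)
```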
Revolutionize your Apache Spark application development and deployment with the OCI Data Flow plugin for OCI Code Editor. Say goodbye to the hassle of managing infrastructure and hello to a streamlined workflow that lets you create, test, and deploy your data pipeline, all in one powerful integrated environment. Unleash the power of the Oracle Cloud Infrastructure Data Flow plugin today, and see for yourself how easy it can be.
For more information, see the following resources:
Data Flow (Documentation)
Code Editor (Documentation)
At Oracle since 2018, Julien is a subject matter expert who has served as a cloud and cybersecurity/CDN solutions architect, product director, and global sales lead. He’s a certified architect on OCI, AWS, and Azure. Julien belongs to the OCI Global Product Marketing and Enablement team and is dedicated to the unique positioning of the Oracle Modern Data Platform. He is based in Vancouver, Canada, and was previously with Oracle in Amsterdam and Singapore.
Julien holds an MS from the Institut Polytechnique de Grenoble and an MBA from INSEAD, and speaks French, Spanish, and English.