Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Big Data service that lets you run Apache Spark applications at any scale with no administration. Spark is the leading Big Data processing framework, and OCI Data Flow is the easiest way to run Spark in OCI because developers have nothing to install or manage.
Spark 3 support added to Data Flow
Spark 3 is a major milestone in the Big Data ecosystem. It extends Spark’s lead in the Big Data landscape with faster SQL queries, better ANSI SQL compatibility, and better interoperability with the Python ML ecosystem. Data Flow makes it easy to run new Spark 3 applications, upgrade existing applications, or use a mixture of Spark 2 and Spark 3.
To use Spark 3 with a new application, select Spark 3 instead of Spark 2 as the Spark version when you create the application.
The version selection also shows the underlying Scala version. If you use Java or Scala, you need to compile your application against that Scala version. For tips on selecting the right artifacts when compiling, check out the Spark Quick Start page.
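As a rough illustration of what "the right artifacts" means, Spark's Maven artifact names carry the Scala binary version as a suffix, so the dependency you compile against changes along with the Spark version. The sketch below builds those coordinates in Python. The helper name `spark_artifact` and the lookup table are hypothetical and exist only for this example. The Scala 2.12 entry for Spark 3 matches the version named in this post, while the Scala 2.11 entry for Spark 2 is an assumption based on the default Spark 2.4 builds, so check the version shown in Data Flow before relying on it.

```python
# Scala binary version for each Spark major version.
# Spark 3 -> Scala 2.12 is the version named in this post.
# Spark 2 -> Scala 2.11 is an assumption (default for Spark 2.4 builds).
SCALA_BY_SPARK_MAJOR = {2: "2.11", 3: "2.12"}


def spark_artifact(module: str, spark_version: str) -> str:
    """Return Maven coordinates (group:artifact_scala:version) for a Spark module.

    This is a hypothetical helper for illustration, not part of any Oracle or
    Spark tooling.
    """
    major = int(spark_version.split(".")[0])
    try:
        scala = SCALA_BY_SPARK_MAJOR[major]
    except KeyError:
        raise ValueError(f"No Scala version known for Spark {spark_version}") from None
    return f"org.apache.spark:spark-{module}_{scala}:{spark_version}"


if __name__ == "__main__":
    # Coordinates to compile against for a Spark 3.0.2 application.
    print(spark_artifact("sql", "3.0.2"))
```

If the Scala suffix on your Spark dependencies does not match the Scala version of the Spark runtime, the application typically fails at run time with binary incompatibility errors, which is why the suffix is worth checking before you upload a new build.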
Upgrading existing applications to Spark 3
Spark applications need extensive testing before a version upgrade. Even then, you might discover after the upgrade that something went wrong and that you need to roll back. Data Flow makes both testing and rollback simple with its serverless execution model, where each run executes in a dedicated, isolated environment. This isolation lets you test the new version while continuing to run the old version and easily switch back and forth.
You can upgrade a Python or SQL application in place by editing the application and changing the version.
If your application is based on Java or Scala, you need to recompile it using Scala 2.12 to be compatible with Spark 3. Upload the new version and create an application for the newly compiled version. You can then run both versions in parallel and shut the old version down when you’re ready to switch.
The biggest benefit of running Spark in a serverless manner is that each run is completely independent and isolated. This capability lets you test upgrading to Spark 3 while continuing to run Spark 2 jobs without changes. When you’re ready to make the switch, simply upgrade your applications. If you need to, downgrading is just as easy.
Try it yourself
Spark 3.0.2 is available now in all commercial regions at no additional charge. With Data Flow, you pay only for the infrastructure resources that your Spark jobs use while Spark applications are running. To learn more about using and upgrading to Spark 3, see the Data Flow documentation.
To get started today, sign up for the Oracle Cloud Free Trial or sign in to your account to try OCI Data Flow. Try Data Flow’s 15-minute no-installation-required tutorial or our in-depth Introductory LiveLab to see just how easy Spark processing can be with Oracle Cloud Infrastructure.
