Big Data service major release

June 17, 2022 | 5 minute read
Karan Singh
Director Product Management, Data and AI

There are releases, and then there are major releases. Oracle Big Data service just had a major release that brings several key features that our customers have been asking for. Big Data service version 3.0.7 and later, with Oracle Distribution of Hadoop, now offers the following new features.

Autoscaling

Big Data autoscaling enables you to create rules that automatically scale a cluster horizontally or vertically based on the CPU utilization metric. With autoscaling, you automatically get more resources and performance when you need them, and you automatically reduce your consumption to optimize for cost when you don’t. Autoscaling works with both standard and AMD Flex shapes.

With vertical autoscaling, you can change the shape of nodes in your cluster based on the rules specified.

A screenshot of the Create autoscale configuration window for vertical autoscaling.

With horizontal autoscaling, when thresholds are met, worker nodes are automatically added to the cluster (scaled out) or removed from it (scaled in), depending on the configured rules.

A screenshot of the Create autoscale configuration for horizontal autoscaling.
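The threshold behavior described above can be illustrated with a minimal sketch. The rule fields, threshold values, and function name here are hypothetical and chosen only for illustration; they are not the service's API, and the real service evaluates rules on its own schedule.

```python
from dataclasses import dataclass


@dataclass
class HorizontalRule:
    """Hypothetical rule shape; field names are illustrative only."""
    scale_out_cpu_pct: float  # add nodes when average CPU is above this
    scale_in_cpu_pct: float   # remove nodes when average CPU is below this
    step: int                 # nodes to add or remove per scaling action
    min_nodes: int            # lower bound on worker nodes
    max_nodes: int            # upper bound on worker nodes


def next_worker_count(current: int, avg_cpu_pct: float, rule: HorizontalRule) -> int:
    """Return the worker-node count after applying the rule once."""
    if avg_cpu_pct > rule.scale_out_cpu_pct:
        target = current + rule.step   # scale out
    elif avg_cpu_pct < rule.scale_in_cpu_pct:
        target = current - rule.step   # scale in
    else:
        target = current               # within thresholds: no change
    # Clamp so the cluster never leaves the configured bounds.
    return max(rule.min_nodes, min(rule.max_nodes, target))


# Assumed example values, not defaults of the service.
rule = HorizontalRule(scale_out_cpu_pct=80, scale_in_cpu_pct=20,
                      step=2, min_nodes=3, max_nodes=10)
print(next_worker_count(5, 91.0, rule))  # 7
print(next_worker_count(5, 12.0, rule))  # 3
print(next_worker_count(5, 50.0, rule))  # 5
```

Clamping to a minimum and maximum keeps a sustained CPU spike or lull from growing or shrinking the cluster without limit.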

For more information, see Autoscaling a Cluster.

AMD Flex shapes

Big Data now supports VM.Standard.E4.Flex (AMD) shapes. When you create a Big Data cluster with this shape, you select the number of OCPUs and the amount of memory that you need to match your workload, enabling you to optimize performance and minimize cost.

A screenshot of the Create cluster window showing the options for virtual machine shapes.

Presto (Trino)

Trino is a distributed SQL query engine designed to query large data sets in the Big Data Hadoop Distributed File System (HDFS) and in Oracle Cloud Infrastructure (OCI) Object Storage. Trino is now preconfigured in Big Data clusters and can be managed through Ambari.

A screenshot of the Trino service configuration page in Ambari.
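As a rough illustration of querying through Trino from Python, the sketch below runs a simple SELECT through any DB-API 2.0 cursor. The helper name sample_rows is hypothetical, and the connection details in the comment (host, port, user, catalog, schema) are placeholders that depend on how your cluster is configured; the open source trino Python client is assumed to be installed separately.

```python
def sample_rows(cursor, table, limit=5):
    """Fetch up to `limit` rows from `table` through a DB-API 2.0 cursor."""
    # Table names cannot be passed as bound parameters, so accept only
    # dot-separated identifiers such as "catalog.schema.table".
    if not all(part.isidentifier() for part in table.split(".")):
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
    return cursor.fetchall()


# Possible usage with the open source Trino client (all values are placeholders):
#
#   import trino
#   conn = trino.dbapi.connect(host=TRINO_HOST, port=TRINO_PORT, user=USER,
#                              catalog=CATALOG, schema=SCHEMA)
#   rows = sample_rows(conn.cursor(), "my_table", limit=10)
```

Because the helper depends only on the DB-API cursor interface, it can be tried locally against any DB-API driver before pointing it at a cluster.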

For more information, see Managing Clusters.

JupyterHub

JupyterHub is a popular open source project that enables multiple users to work together by giving each user their own Jupyter notebook server on shared, centrally managed resources. Big Data now comes preconfigured with JupyterHub.

A screenshot of the Cluster information tab in the Console with the Jupyter server URL outlined in red.

From the URL provided in the Oracle Cloud Console, you can access JupyterHub to deploy a notebook server and open the launcher page.

A screenshot of the Jupyter service launcher.

From there, you can deploy one of the multiple kernels available by default, such as Python, PySpark, Spark, and SparkR.

For more information, see Using Jupyterhub.

Bootstrap scripts

Bootstrap scripts enable easier configuration and automation with Big Data. You can run a bootstrap script on all the cluster nodes after a cluster is created, when the shape of a cluster changes, or when you add or remove nodes from a cluster. You can use this script to install, configure, and manage custom components in a cluster.

A screenshot of the Create cluster screen showing the bootstrap options.

Bootstrap scripts can also be updated after a cluster has been created.

A screenshot of the Cluster information tab with the menu for More Actions expanded and the button for Update bootstrap scripts outlined in red.

For more information, see Updating bootstrap script URL.

Patch management

The Big Data Console page has a new Updates link in the Resources column. Click this link to view the available updates and installed updates for the cluster. This link allows you to easily manage and apply Big Data patches and review the history of the patches that were previously applied.

A screenshot of the Big Data service page in the Console showing the Updates tab in the Resources menu.

Custom Kerberos realm names

You can now specify your own custom Kerberos realm name while creating a Big Data cluster.

A screenshot of the Create Cluster page with the Kerberos realm name field outlined in red.

Enterprise customers can now use their own trusted realms and more easily integrate Big Data with their preexisting Kerberos.

For more information, see Creating a Cluster.

Other features

With this release comes the general availability of the following features:

  • Add compute-only worker nodes to your cluster.

  • Delete any worker node from your cluster.

  • Hue comes preconfigured in the cluster.

  • Livy comes preconfigured in the cluster.

Conclusion

All the features in this Big Data release help enterprises of all sizes use the power of managed open source software in Big Data more easily and efficiently to build their data lake; extract, transform, and load (ETL); querying; processing; and machine learning platforms. Future blog posts will cover some of these new features in more detail.

Karan Singh

Director Product Management, Data and AI

www.linkedin.com/in/karandeepsingh

