
Qubole and Oracle: Big Data in the Cloud with Better Performance Than On-Premises

Xing Quan
Senior Director of Product Management, Qubole

We are excited to be working with Oracle as the premier Big Data partner for the Oracle Cloud Platform. Qubole is the fastest-growing Big Data-as-a-Service platform. Our flagship product, Qubole Data Service (QDS), gives enterprises around the world self-service access to Spark, Hadoop, Hive, and many other open-source analytics tools. We believe Big Data belongs in the cloud, and we’re excited about bringing QDS to the Oracle Cloud Platform.

Over the past few weeks, we have been working closely with the Oracle Bare Metal Cloud Service team to prototype our offering on their new service. Oracle Bare Metal Cloud Service offers all the traditional benefits of the public cloud: fast deployment, the scale and elasticity to run petabyte-scale workloads, and flexible pay-as-you-go pricing.

In our discussions with the Oracle team, it quickly became clear that performance would be a differentiator as well. The bare metal compute instances run on dedicated hardware, so they avoid the performance and latency overhead of a virtualization layer. Oracle Bare Metal Cloud Service also offers compute shapes with NVMe (Non-Volatile Memory Express) SSDs (solid-state drives). NVMe lets an SSD process many I/O queues in parallel, so the drives deliver far more of their potential throughput than they do over older storage interfaces.

As our results below show, Oracle Bare Metal Cloud Service ran some query workloads up to 115% faster than a comparable on-premises setup.

Spark SQL benchmark

We decided to benchmark the performance of Spark SQL using the TPC-DS data and query set. Spark SQL is just one of the many tools that QDS offers, including Hive, Presto, MapReduce, and the rest of the Spark ecosystem (Scala, PySpark, MLlib, and Spark Streaming). In the future, we’ll look to benchmark some of these other products on the Oracle Bare Metal Cloud Service as well.

Here is the setup for our benchmark runs:

  • QDS Spark SQL, based on Apache Spark 1.6.1
  • TPC-DS data set at the 15 TB scale factor
  • 7-node cluster
  • Oracle Bare Metal Cloud compute instances with shape BM.HighIO1.36
  • 36 compute cores per node, running Intel Xeon X5-2600 v3
  • 512 GB of memory per node
  • 12.8 TB of NVMe SSD storage per node
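The cluster-wide totals in the comparison below follow from these per-node figures multiplied across the 7 nodes. A small Python check (the constant and function names are illustrative only, not part of QDS):

```python
# Per-node specs for the BM.HighIO1.36 shape, as listed above.
NODES = 7
CORES_PER_NODE = 36
MEMORY_GB_PER_NODE = 512
NVME_TB_PER_NODE = 12.8


def cluster_totals(nodes: int) -> dict:
    """Scale the per-node specs up to a whole cluster of `nodes` machines."""
    return {
        "cores": nodes * CORES_PER_NODE,
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        # Round to one decimal place to hide floating-point noise.
        "disk_tb": round(nodes * NVME_TB_PER_NODE, 1),
    }


if __name__ == "__main__":
    print(cluster_totals(NODES))
```

The 89.6 TB of NVMe storage is what the comparison table rounds to 90 TB.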

Up to 115% faster results than on-premises

We compared our results to a recently published benchmark by Cloudera, which used the same TPC-DS data set and a similar hardware profile. The main differences are that the Cloudera benchmark used stock Apache Spark and on-premises hardware with more than twice the memory and disk capacity. The table below summarizes the differences between the two compute clusters.

Resource           QDS on Oracle Bare Metal Cloud Service   Cloudera w/ On-prem Hardware
Total CPU cores    252 (Intel Xeon X5-2600 v3)              252 (Intel Xeon CPU E5-2630L)
Memory capacity    3,584 GB                                 8,064 GB
Disk capacity      90 TB                                    229 TB
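The claim of "more than twice the memory and disk capacity" can be checked directly from the table. This sketch divides the Cloudera totals by the QDS totals; the dictionaries simply restate the table values:

```python
# Cluster totals restated from the comparison table.
qds = {"cores": 252, "memory_gb": 3584, "disk_tb": 90}
cloudera = {"cores": 252, "memory_gb": 8064, "disk_tb": 229}


def resource_ratios(baseline: dict, other: dict) -> dict:
    """How many times more of each resource `other` has than `baseline`."""
    return {key: round(other[key] / baseline[key], 2) for key in baseline}


ratios = resource_ratios(qds, cloudera)
print(ratios)
```

Core counts are identical, while the on-premises cluster has 2.25 times the memory and roughly 2.54 times the disk capacity.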

We chose queries from the TPC-DS set that are representative of Spark SQL workloads. These queries are relatively complex, with more table joins, more filters, and more aggregation. Specifically, we used queries 7, 34, 43, 46, 53, 59, 79, and 89 from this repository of TPC-DS query source code. These queries correspond to the Analytics and Reporting queries in the Cloudera benchmark.
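For readers who want to time a similar run, a minimal harness might look like the following. It is a sketch under assumptions: `load_sql` and `run_sql` are hypothetical hooks supplied by the caller (for example, reading each query from a local file and executing it through Spark), and it is not the harness Qubole used, which the post does not describe.

```python
import time
from typing import Callable, Dict, Iterable

# TPC-DS query numbers used in this comparison.
SELECTED_QUERIES = [7, 34, 43, 46, 53, 59, 79, 89]


def time_queries(
    query_ids: Iterable[int],
    load_sql: Callable[[int], str],
    run_sql: Callable[[str], object],
) -> Dict[int, float]:
    """Run each query once and record its wall-clock time in seconds.

    `load_sql` (query number -> SQL text) and `run_sql` (executes SQL)
    are hypothetical hooks; the runner must force full execution.
    """
    timings: Dict[int, float] = {}
    for qid in query_ids:
        sql = load_sql(qid)  # load outside the timed region
        start = time.perf_counter()
        run_sql(sql)
        timings[qid] = time.perf_counter() - start
    return timings


def _placeholder_sql(qid: int) -> str:
    """Stand-in loader so the sketch runs without real query files."""
    return f"-- TPC-DS query {qid}"


def _no_op_runner(sql: str) -> None:
    """Stand-in runner so the sketch runs without a Spark cluster."""
    return None


if __name__ == "__main__":
    print(time_queries(SELECTED_QUERIES, _placeholder_sql, _no_op_runner))
```

With Spark 1.6, `run_sql` could be something like `lambda sql: sqlContext.sql(sql).collect()`. Calling `collect()` matters because Spark SQL evaluates lazily: building the DataFrame alone does not execute the query.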

Our results are summarized in the chart below. On the Analytics queries, QDS on Oracle Bare Metal Cloud Service ran 50% faster than the on-premises Cloudera setup. On the Reporting queries, it did even better, running 115% faster.
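A figure above 100% only makes sense if "X% faster" is read as a speedup ratio, (baseline time / new time - 1) x 100, because a reduction in runtime can never exceed 100%. Under that reading, 115% faster means the baseline run took about 2.15 times as long, and 50% faster means 1.5 times as long. The runtimes in this sketch are hypothetical, chosen only to show the arithmetic:

```python
def percent_faster(baseline_seconds: float, new_seconds: float) -> float:
    """Speedup as a percentage: (baseline / new - 1) * 100. Can exceed 100%."""
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return (baseline_seconds / new_seconds - 1.0) * 100.0


def percent_time_saved(baseline_seconds: float, new_seconds: float) -> float:
    """Runtime reduction as a percentage: (1 - new / baseline) * 100. Always below 100%."""
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return (1.0 - new_seconds / baseline_seconds) * 100.0


if __name__ == "__main__":
    # Hypothetical runtimes: the baseline takes 2.15x as long as the new run.
    baseline, new = 215.0, 100.0
    print(f"{percent_faster(baseline, new):.1f}% faster")
    print(f"{percent_time_saved(baseline, new):.1f}% less runtime")
```

The same run that is "115% faster" is only about 53.5% shorter, which is why it helps to know which definition a benchmark uses.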

[Chart: TPC-DS Spark SQL on Oracle Bare Metal Cloud Service]

To summarize, QDS on Oracle Bare Metal Cloud Service achieved up to 115% faster query performance, with an average speedup of 66%, compared to the on-premises setup. It did so on a much smaller hardware profile, with less than half the memory and disk capacity of the on-premises cluster.
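The post does not say how the 66% average was computed. Because speedups are ratios, an arithmetic mean and a geometric mean of per-query speedups can differ noticeably, and the geometric mean is often preferred for ratio data. This sketch computes both from made-up ratios, not from the benchmark's measurements:

```python
from statistics import geometric_mean, mean
from typing import Dict, Sequence


def average_percent_faster(speedups: Sequence[float]) -> Dict[str, float]:
    """Summarize per-query speedup ratios (baseline time / new time) two ways."""
    arithmetic = (mean(speedups) - 1.0) * 100.0
    geometric = (geometric_mean(speedups) - 1.0) * 100.0
    return {"arithmetic_pct": arithmetic, "geometric_pct": geometric}


# Hypothetical per-query speedup ratios, for illustration only.
example = [1.2, 1.5, 2.15, 1.4]
summary = average_percent_faster(example)
print({name: round(value, 1) for name, value in summary.items()})
```

By the inequality of arithmetic and geometric means, the geometric figure is never larger than the arithmetic one, so the choice of average affects the headline number.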

With QDS and Oracle Bare Metal Cloud Service, enterprises get the agility to start experimenting quickly, the scale to run massive workloads, and the elasticity and flexibility to pay only for what they use. We’re happy to see that performance is also better in the cloud, and we’ll keep working to improve it further. Together, QDS and Oracle Bare Metal Cloud Service can help enterprises get the best of both the cloud and on-premises worlds.

Thanks to Mayank Ahuja, Shridhar Ramachandran, and Harsh Shah for their efforts in leading this technical benchmark study.
