Tectonic shifts in the technology space usually precipitate out of a necessity or a desire to change the status quo. Big data, as we know it today is one such tectonic shift that was a result of the necessity to keep up with the deluge of data - a result of automation of pretty much everything that touches our life today. Big data encompasses data sets that are large and complex such that traditional data processing cannot keep up (think petabytes and exabytes of data). It matters in the Oracle Bare Metal Cloud Service context because cloud has been the accelerator for big data adoption and making it accessible to all. Big data belongs in the cloud. The elasticity of the cloud makes the use of big data analysis practical and the scalability, durability and cost effectiveness of object storage makes it possible to store data at the ‘big data’ scale.
The performance characteristics, cost effectiveness, and processing prowess of the Bare Metal Cloud make it the platform of choice to run all your big data workloads.
The Mapreduce reference architecture was the harbinger of all things big data. It was the first step to re-imagining how data should be analyzed at cloud scale. Apache Hadoop brought this dynamic, distributed, and highly scalable framework to open source. With built-in fault tolerance and high scalability, this framework became the cornerstone of big data infrastructure and over a period of time it has spawned development of different interfaces such as Hive, Spark, Presto, and others in the community. Engineers at Facebook first democratized this framework with Hive and made big data available and accessible to data users beyond developers. Since then engines such as Spark have emerged to fill the needs of data scientists while solutions like Presto and Impala have become popular solutions for Analysts.
Today, I am very excited to let you know that in addition to supporting DIY big data workloads, Bare Metal Cloud Service now also supports a turnkey, cloud scale big data-as-a-Service solution. We have accomplished this by integrating with Qubole’s big data-as-a-service platform ‘Qubole Data Service’ (QDS). Qubole knows a thing or two about big data processing at scale. It was founded by the same engineers who introduced Hive to Facebook and the Hadoop ecosystem. They were the architects of the Facebook data infrastructure, which today supports 1.85 billion users. In its five years of existence, QDS has scaled to process almost 500 petabytes of data each month.
Oracle Bare Metal Cloud Services and Qubole together, offer a compelling big data solution for enterprises that wish to gain a competitive edge by generating business insights and making their data work for them.
DIY big data or a turnkey solution – your choice
We like to give our customers choice. You can take two routes to standing up your big data infrastructure on Oracle Bare Metal Cloud Services:
Seven great things about the Oracle and Qubole big data solution
Oracle and Qubole high level architecture
Qubole offers native integration with the Oracle IaaS platform. The solution consists of three components
- Bare Metal Cloud Services infrastructure: This is basic compute, storage, networking infrastructure layer, where your data will be hosted and where the Hadoop clusters will run. You own the resources in your BMCS tenancy, at all times but Qubole manages it on your behalf using an IAM user that you create, specifically for this purpose.
- Qubole Tier: This is where the QDS engine is hosted. It drives activities like provisioning, orchestration, management, auto scaling, data visualization etc
- User Access tier: The customer manages their big data infrastructure via the user access layer. Users have a choice to manage their big data infrastructure via the UI (using a browser based interface) or an SDK. ODBC drivers are supported as well.
Four steps to starting your Oracle IaaS big data solution with Qubole
Getting started is a quick and easy. You can have you big data infrastructure up and running in a matter of hours.
Step #1: Create a BMCS account here and setup your tenancy to host Qubole big data clusters
Once you establish access to your Bare Metal Cloud Services account, log into the service console and set up your tenancy for Qubole to initialize your big data infrastructure. You’ll need to do the following
Be sure that you are closely following the instructions posted in the Qubole ‘quick start guide’ for smooth sailing.
Once the setup is done, identify the bucket in the Bare Metal Cloud object storage that hosts your data, or create a bucket in the Bare Metal Cloud Service console and import your data into the bucket.
Step # 2: Create a Qubole - Oracle account
Click here to establish your Qubole-Oracle account. You will receive a confirmation email from Qubole that will provide the log in details as well as the link to the Qubole on BMCS service console.
Step # 3: Input your Bare Metal Cloud Tenancy details into the Qubole service console
The compute and storage settings can be same of different, depending on whether a single or multiple users control access to the storage and compute resources.
Step # 4: Start analyzing your data
You are all set, you can now start running queries to analyze your data, schedule jobs and visualize data to detect patterns.
- Create a cluster configuration
- Create the Hive metastore and submit queries, via the ‘QDS Analyze’. You can also choose to recurring jobs via the ‘QDS Scheduler’
- Monitor your jobs
Blazing fast Spark performance
Workloads just run better on Oracle Bare Metal Cloud Services and we have a benchmark to prove it. Qubole on BMCS runs Spark SQL 116% faster than comparable solution on-premises.
Sample Data Sets
Just in case you want to give our big data solution a test run and are looking for some sample data, here are some of the public data sets that you can download:
Give it a try!
In summary, the Bare Metal Cloud Services team is invested in offering customers choice. We built our infrastructure to handle the most demanding workloads. Infrastructure as Service primitives like compute (with blazing fast NVMe storage), block storage, object storage and networking were purposely architected to efficiently host big data workloads.
So whether you are gravitating towards standing up your own big data cluster on BMCS or leveraging Qubole as your turnkey big data solution of choice, we have you covered.
Go ahead, I encourage you to take big data out for a spin and see how it works for you. If you need help, feel free to reach out to us either by commenting below, on stack overflow, or reach out to your Oracle account manager so that they can get you in direct contact with us.
Rachna Thusoo
Director, Product Management
Oracle Bare Metal Cloud Team