Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

Start Planning your Upgrade Strategy to Cloudera 6 on Oracle Big Data Now

Jean-Pierre Dijcks
Master Product Manager

Last week Cloudera announced the general availability of Cloudera CDH 6 (read more from Cloudera here). With that release, many of the ecosystem components moved to a newer base version, which should provide significant benefits for customer applications. This post describes Oracle's strategy for helping our customers adopt C6 quickly and efficiently, with minimal disruption to their infrastructure.

The Basics

One of the key differences in C6 is its set of core component versions, summarized here for everyone's benefit:

  • Apache Hadoop 3.0
  • Apache Hive 2.1
  • Apache Parquet 1.9
  • Apache Spark 2.2
  • Apache Solr 7.0
  • Apache Kafka 1.0
  • Apache Sentry 2.0
  • Cloudera Manager 6.0
  • Cloudera Navigator 6.0
  • and much more... for full details, always check the Cloudera download bundles or Oracle's documentation.
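When planning application testing, a simple first pass is to compare the versions your applications were built against with the C6 versions listed above and flag every major-version jump. The following is a minimal sketch of that idea, not an Oracle or Cloudera tool: the C6 versions are copied from the list above, while the function names and the sample application inventory are hypothetical and only there for illustration.

```python
# Component versions in C6, copied from the list above.
C6_VERSIONS = {
    "hadoop": "3.0",
    "hive": "2.1",
    "parquet": "1.9",
    "spark": "2.2",
    "solr": "7.0",
    "kafka": "1.0",
    "sentry": "2.0",
}


def major(version):
    """Return the major part of a dotted version string, e.g. '3.0' -> 3."""
    return int(version.split(".")[0])


def major_version_jumps(app_versions):
    """Return the components whose major version differs from C6.

    app_versions maps a component name to the version the application
    was built and tested against. Components not in C6_VERSIONS are ignored.
    """
    jumps = {}
    for name, built_against in app_versions.items():
        target = C6_VERSIONS.get(name)
        if target is not None and major(built_against) != major(target):
            jumps[name] = (built_against, target)
    return jumps


# Hypothetical inventory for one application (assumed values, not from the post).
example_app = {"hadoop": "2.6", "hive": "1.1", "spark": "2.2"}
print(major_version_jumps(example_app))
```

Components flagged this way are reasonable candidates for the first round of functional testing. A matching major version does not guarantee compatibility, so treat the result as a starting point only.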

Now what does this all mean for Oracle's Big Data platform (cloud and on-premises) customers?

Upgrading the Platform

This is the part where running Big Data Cloud Service, Big Data Appliance and Big Data Cloud at Customer makes a big difference. As with minor updates, where we move the entire stack (OS, JDK, MySQL, Cloudera CDH and everything else), we will also do this for your CDH 5.x to CDH 6.x move. What to expect:

Target Version: CDH 6.0.1, which, at the time of writing, has not been released

Target Dates: November 2018 with a dependency on the actual 6.0.1 release date

Automated Upgrade: Yes - as with minor releases, CDH and the entire stack (OS, MySQL, JDK) will be upgraded using the Mammoth Utility

As always, Oracle is building this all in house, and we are testing the migration across a number of scenarios for technical correctness.

Application Impact

The first thing to start planning for is what a version uptick like this means for your applications. Will everything work nicely as before? Well, that is where the hard work comes in: testing the actual applications on a C6 version. In general, we recommend configuring a small BDA/BDCS/BDCC cluster, loading some data (also note the section below on Erasure Coding in that respect) and then doing the appropriate functional testing. Once that is all running satisfactorily and per your expectations, you can start to upgrade existing clusters.

What about Erasure Coding?

This is the big feature that will become available in the 6.1 timeframe. Just to be clear, Erasure Coding is not in the first versions supported by Cloudera. Therefore it will also not be supported on the Oracle platforms, which are based on 6.0.1 (note the 0 in the middle :-) ).

As usual, once 6.1 is available, Oracle will offer it as a release to upgrade to, and at that time we will address the details around Erasure Coding: how to get there and how to leverage it on the Oracle Big Data solutions.

To give everyone a quick 10,000-foot guideline: keep using regular block replication (the current HDFS structure) for best performance, and use Erasure Coding for storage savings, while understanding that the additional network traffic can impact raw performance.
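To put a rough number on the storage savings, the sketch below compares the raw capacity needed for the same data under replication and under Erasure Coding. It assumes the HDFS default replication factor of 3 and a Reed-Solomon layout with 6 data units and 3 parity units, which is one of the policies in Apache Hadoop 3. Which policies will be offered on the Oracle platforms is not stated here, so treat the parameters and function names as illustrative assumptions.

```python
def raw_storage_replication(logical_tb, replicas=3):
    """Raw capacity needed when every block is stored `replicas` times."""
    return logical_tb * replicas


def raw_storage_erasure_coded(logical_tb, data_units=6, parity_units=3):
    """Raw capacity needed when each group of `data_units` data cells
    is stored together with `parity_units` parity cells."""
    return logical_tb * (data_units + parity_units) / data_units


logical = 100.0  # TB of user data, an arbitrary example figure

replicated = raw_storage_replication(logical)   # 100 * 3       = 300.0 TB
encoded = raw_storage_erasure_coded(logical)    # 100 * 9 / 6   = 150.0 TB

print(f"3x replication: {replicated:.1f} TB raw")
print(f"RS(6,3) coding: {encoded:.1f} TB raw")
print(f"Raw capacity saved: {1 - encoded / replicated:.0%}")
```

Under these assumptions the same data needs half the raw capacity. The trade-off is that reads and recovery involve more nodes, which is where the extra network traffic comes from.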

Do I have to Move?

No. You do not have to move to CDH 6, nor do you need to switch to Erasure Coding. We do expect one more 5.x release, most likely 5.16, and will release this on our platforms as well. That is of course a fully supported release. It is then - generally speaking - up to your timelines to move to the C6 platform.

As we move closer to the C6 release on BDA, BDCS and BDCC, we will provide updates on the specific versions to migrate from, dates, timelines and so on. Should you have questions, contact us in the big data community.

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

