Best practices, news, tips and tricks - learn about Oracle's R Technologies for Oracle Database and Big Data

  • News
    December 12, 2017

Announcing the release of Oracle R Advanced Analytics for Hadoop 2.7.1

Mark Hornick
Senior Director, Data Science and Machine Learning

We are pleased to announce the general availability of Oracle R Advanced Analytics for Hadoop (ORAAH) 2.7.1, a component of the Oracle Big Data Connectors, which enables big data analytics from R. With ORAAH, Data Scientists and Data Analysts have access to the rich and productive R language for accessing and manipulating data resident across multiple platforms, including HDFS, Hive, Oracle Database, and local files. By leveraging the parallel and distributed Hadoop and Spark computational infrastructure, users can take advantage of scalability and performance when analyzing big data.

ORAAH 2.7.1 provides several important advantages for big data analytics:

  • A general computation framework where users invoke parallel, distributed MapReduce jobs from R, writing custom mappers and reducers in R while also leveraging open source CRAN packages. Support for binary RData representation of input data enables R-based MapReduce jobs to match the I/O performance of pure Java-based MapReduce programs.
  • Parallel and distributed machine learning algorithms take advantage of all the nodes of your Hadoop cluster for scalable, high performance modeling on big data. Algorithms include linear regression, generalized linear models, neural networks, low rank matrix factorization, non-negative matrix factorization, k-means clustering, principal components analysis, and multivariate analysis. Functions use the expressive R formula object optimized for Spark parallel execution.
  • R functions wrap Apache Spark MLlib algorithms within the ORAAH framework using the R formula specification and Distributed Model Matrix data structure. ORAAH's MLlib R functions can be executed either on a Hadoop cluster using YARN to dynamically form a Spark cluster, or on a dedicated standalone Spark cluster.
  • Spark execution can be switched on or off using the spark.connect() and spark.disconnect() functions.
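
To make the mapper/reducer model in the first item above concrete, here is a minimal sketch of the map, shuffle, and reduce phases written in base R and run locally in a single process. It does not call ORAAH: the function names (mapper, reducer) and the toy flights data are hypothetical and chosen only for illustration, and a real job would submit equivalent R functions to the cluster through the ORAAH API described in the product documentation.

```r
# Local, single-process sketch of the map -> shuffle -> reduce pattern.
# Base R only; none of these helper names come from ORAAH.

# Mapper: emit one (key, value) pair per input record.
mapper <- function(record) {
  list(key = record$carrier, value = record$delay)
}

# Reducer: collapse all values that share a key into one summary row.
reducer <- function(key, values) {
  data.frame(carrier = key, mean_delay = mean(values), stringsAsFactors = FALSE)
}

# Hypothetical toy input; a real job would read its input from HDFS.
flights <- data.frame(
  carrier = c("AA", "UA", "AA", "DL", "UA"),
  delay   = c(10, 5, 20, 0, 15),
  stringsAsFactors = FALSE
)

# Map phase: apply the mapper to every row.
pairs <- lapply(seq_len(nrow(flights)), function(i) mapper(flights[i, ]))

# Shuffle phase: group the emitted values by key.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce phase: one reducer call per distinct key.
result <- do.call(rbind, lapply(names(grouped), function(k) reducer(k, grouped[[k]])))
print(result)
```

On the toy data this yields one row per carrier, with mean delays of 15 for AA, 0 for DL, and 10 for UA. On a cluster, the map and reduce calls would run in parallel across nodes, and the framework would perform the shuffle step itself.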

ORAAH’s architecture and approach to big data analytics leverage the cluster compute infrastructure for parallel, distributed computation, while shielding the R user from Hadoop’s complexity through a small number of easy-to-use functions.

What's new in ORAAH 2.7.1

  • Compatible with Oracle R Distribution 3.3.0 and Oracle R Enterprise 1.5.1
  • Support for Cloudera Distribution of Hadoop (CDH) release 5.12.0, both “classic” MR1 and YARN MR2 APIs
  • Extended support for Apache Spark: execute select predictive analytic functions either on a Hadoop cluster using YARN to dynamically form a Spark cluster, or on a dedicated standalone Spark cluster; switch Spark execution on or off using the new spark.connect() and spark.disconnect() functions
  • Support for the new OAAgraph package, which provides a tight integration with Oracle’s Parallel Graph AnalytiX (PGX) engine from the Oracle Big Data Spatial and Graph option
  • Function hdfs.write() adds support for writing Spark DataFrame objects, in addition to Distributed Model Matrix (DMM) objects, which can be saved in Comma-Separated Value (CSV) format on HDFS
  • Connect to multi-tenant container databases (CDB) of Oracle Database using a new parameter pdb, which specifies the service name of the pluggable database (PDB) to connect to
  • Improved performance of ore.create() for Hive, and the ability to create tables in a Hive database other than the one to which the session is connected
  • A new parameter append for ore.create() enables appending an ore.frame or data.frame to an existing Hive table
  • Support for a new environment configuration variable ORCH_CLASSPATH, which sets the CLASSPATH used by ORAAH and resolves an issue with supporting wildcards in the path
  • New features added to distributed model matrix and distributed formula to improve performance and support in all ORAAH and Spark MLlib-based functionality
  • Upgrade to Intel® Math Kernel Library Version 2017 for Intel® 64 architecture
  • Improved installers and un-installers for both server and client: the installer checks your environment and runs validation checks to ensure prerequisites are met; un-installation has been improved to work with co-existing Oracle R Enterprise and third-party packages
  • Bug fixes and updates across the platform improve stability and ease-of-use. For details, see the Change List

See the ORAAH OTN page to download the software and access the latest documentation.
