Thursday Mar 27, 2014

Why choose Oracle for Advanced Analytics?

If you're an enterprise company, chances are you have your data in an Oracle database. You chose Oracle for its global reputation for providing the best software products (and now engineered systems) to support your organization. Oracle Database is known for stellar performance and scalability, and Oracle delivers world-class support.

If your data is already in Oracle Database or moving in that direction, use the database's high-performance computing environment to analyze it. Traditionally, it was common practice to move data to separate analytic servers for the explicit purpose of model building. This is no longer necessary, nor is it scalable as your organization seeks to deliver value from Big Data. Oracle Database now has several state-of-the-art algorithms that execute directly in-database in a parallel and distributed architecture, and these can be augmented by custom algorithms written in the R statistical programming language. Leveraging Oracle Database for Advanced Analytics has benefits including:

  • Eliminates data movement to analytic servers

  • Enables analysis of all data, not just samples

  • Puts your database infrastructure to even greater use

  • Eliminates impedance mismatch in the form of model translation when operationalizing models

  • Makes all aspects of modeling and deployment optionally available via SQL, easing integration with other IT software

  • Leverages CRAN algorithms directly in the database

Customers such as StubHub, dunnhumby, CERN openlab, Financiera Uno, Turkcell, and others leverage Oracle Advanced Analytics to scale their applications, simplify their analytics architecture, and reduce the time to market of predictive models from weeks to hours or even minutes.

Oracle also uses Oracle Advanced Analytics in a wide range of its own applications and internal deployments, including:

  • Human Capital Management with Predictive Workforce to predict employee turnover and performance, and to run "what if" analysis

  • Customer Relationship Management with Sales Prediction Engine to predict sales opportunities, what to sell, how much, and when

  • Supply Chain Management with Spend Classification to flag non-compliance or anomalies in expense submissions

  • Retail Analytics with Oracle Retail Customer Analytics to perform shopping cart analysis and next best offers

  • Oracle Financial Services Analytic Applications to enable quantitative analysts in credit risk management divisions to author rules/models directly in R

Oracle wants you to be successful with advanced analytics. We work closely with customers to make Oracle Advanced Analytics an integral part of their analytics strategy, so that they can put their advanced analytics into production much faster.

Thursday Mar 20, 2014

ROracle 1.1-11 released - binaries for Windows and other platforms available on OTN

We are pleased to announce the latest update of the open source ROracle package, version 1.1-11, with enhancements and bug fixes. ROracle provides high-performance, scalable interaction from R with Oracle Database. In addition to availability on CRAN, ROracle binaries specific to Windows and other platforms can be downloaded from the Oracle Technology Network. Users of ROracle, please take our brief survey. We want to hear from you!

Latest enhancements in version 1.1-11 of ROracle:

• Performance enhancements for RAW data types and large result sets
• Ability to cache the result set in memory to reduce memory consumption on successive reads
• Added session mode to connect as SYSDBA or using external authentication
• Bug 17383542: Enhanced dbWriteTable() and dbRemoveTable() to work on global schema

Users of ROracle are quite pleased with the performance and functionality:

"In my position as a quantitative researcher, I regularly analyze database data up to a gigabyte in size on client-side R engines. I switched to ROracle from RJDBC because the performance of ROracle is vastly superior, especially when writing large tables. I've also come to depend on ROracle for transactional support, pulling data to my R client, and general scalability. I have been very satisfied with the support from Oracle -- their response has been prompt, friendly and knowledgeable."

           -- Antonio Daggett, Quantitative Researcher in Finance Industry

"Having used ROracle for over a year now with our Oracle Database data, I've come to rely on ROracle for high performance read/write of large data sets (greater than 100 GB), and SQL execution with transactional support for building predictive models in R. We tried RODBC but found ROracle to be faster, much more stable, and scalable."

           -- Dr. Robert Musk, Senior Forest Biometrician, Forestry Tasmania

See the ROracle NEWS for the complete list of updates.

We encourage ROracle users to post questions and provide feedback on the Oracle R Technology Forum.

In addition to being a high performance database interface to Oracle Database from R for general use, ROracle supports database access for Oracle R Enterprise.
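
As a rough illustration of the read/write workflow the users above describe, here is a minimal sketch. It assumes the DBI package is installed and, for the usage shown in the comments, that ROracle is installed and an Oracle Database is reachable. The function name round_trip, the table name, and the credentials are hypothetical placeholders, not part of ROracle.

```r
# Write a data frame to a table, read it back, then drop the table.
# `con` is an open DBI connection, for example one returned by
# ROracle's dbConnect(); ROracle supplies the methods for these generics.
round_trip <- function(con, table_name, df) {
  DBI::dbWriteTable(con, table_name, df)   # create the table and insert the rows
  DBI::dbCommit(con)                       # end the transaction explicitly
  # Drop the demo table when the function exits, even on error
  on.exit(DBI::dbRemoveTable(con, table_name), add = TRUE)
  DBI::dbReadTable(con, table_name)        # fetch the rows back as a data frame
}

# Usage (needs a live database; all connection details are placeholders):
# library(ROracle)
# drv <- dbDriver("Oracle")
# con <- dbConnect(drv, username = "demo_user", password = "demo_pw",
#                  dbname = "demo_db")
# str(round_trip(con, "RORACLE_DEMO", data.frame(ID = 1:3, VAL = c(0.5, 1.5, 2.5))))
# dbDisconnect(con)
```

Passing the connection in as an argument keeps the credentials out of the function and makes the same code usable from scripts that manage their own sessions.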

Monday Mar 17, 2014

Oracle R Enterprise Upgrade Steps

We've recently announced that Oracle R Enterprise 1.4 is available on all platforms. To upgrade Oracle R Enterprise to the latest version:

  1. *Install the version of R that is required for the new version of Oracle R Enterprise.
         See the Oracle R Enterprise supported platforms matrix for the latest requirements.
  2. Update Oracle R Enterprise Server on the database server by running the script
         and follow the prompts for the upgrade path.
  3. Update the Oracle R Enterprise Supporting packages on the database server.
  4. Update the Oracle R Enterprise Client and Supporting packages on the client.

For RAC/Exadata installations, upgrade steps 1, 2, and 3 must be performed on all compute nodes.

*If you've changed the R installation directory between releases, manually update the location of the R_HOME directory in the Oracle R Enterprise configuration table. The sys.rqconfigset procedure edits settings in a configuration table called sys.rq_config. Use of this procedure requires the sys privilege. You can view the contents of this table to verify various environment settings for Oracle R Enterprise. Among the settings stored in sys.rq_config is the R installation directory, or R_HOME. The following query shows sample values stored in sys.rq_config for a Linux server:

SQL> select * from sys.rq_config;
R_HOME       /usr/lib64/R
R_LIBS_USER  /u01/app/oracle/product/12.0.1/dbhome_1/R/library
VERSION      1.4

7 rows selected.

To point to the correct R_HOME:

SQL> exec sys.rqconfigset('R_HOME', '<path to current R installation directory>')

All Oracle R Enterprise downloads are available on the Oracle Technology Network. Refer to the instructions in section 8.3 of the Oracle R Enterprise Installations Guide for detailed steps on upgrading Oracle R Enterprise, and don't hesitate to post questions to the Oracle R forum.

Saturday Mar 15, 2014

Oracle R Enterprise 1.4 Released

We’re pleased to announce that Oracle R Enterprise (ORE) 1.4 is now available for download on all supported platforms. In addition to numerous bug fixes, ORE 1.4 introduces an enhanced high-performance computing infrastructure, new and enhanced parallel distributed predictive algorithms for both scalability and performance, added support for production deployment, and compatibility with the latest R versions. These updates enable IT administrators to easily migrate the ORE database schema to speed production deployment, and give statisticians and analysts access to a larger set of analytics techniques for more powerful predictive models.

Here are the highlights for the new and upgraded features in ORE 1.4:

Upgraded R version compatibility

ORE 1.4 is certified with R-3.0.1 - both open source R and Oracle R Distribution. See the server support matrix for the complete list of supported R versions. R-3.0.1 brings improved performance and big-vector support to R, and compatibility with more than 5000 community-contributed R packages.

High Performance Computing Enhancements

  • Ability to specify degree of parallelism (DOP) for parallel-enabled functions (ore.groupApply, ore.rowApply, and ore.indexApply)
  • An additional global option, ore.parallel, to set the number of parallel threads used in embedded R execution

Data Transformations and Analytics

  • ore.neural now provides a highly flexible network architecture with a wide range of activation functions, supporting thousands of formula-derived columns, in addition to being a parallel and distributed implementation capable of supporting billion-row data sets
  • ore.glm now also prevents selection of less optimal coefficient methods with parallel distributed in-database execution
  • Support for weights in regression models
  • New ore.esm enables time series analysis, supporting both simple and double exponential smoothing for scalable in-database execution
  • Execute standard R functions for Principal Component Analysis (princomp), ANOVA (anova), and factor analysis (factanal) on database data
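
For readers unfamiliar with the technique behind the time series support listed above, the following base R sketch shows simple exponential smoothing on a tiny in-memory vector. It illustrates the general method only: it is not the ore.esm implementation and does not use its arguments. The function name simple_exp_smooth and the sample data are hypothetical.

```r
# Simple exponential smoothing:
#   s[1] = x[1]
#   s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]   for t > 1
# alpha in (0, 1] controls how strongly recent observations are weighted.
simple_exp_smooth <- function(x, alpha) {
  stopifnot(is.numeric(x), length(x) >= 1, alpha > 0, alpha <= 1)
  s <- numeric(length(x))
  s[1] <- x[1]                       # initialize with the first observation
  for (t in seq_along(x)[-1]) {      # skipped entirely when length(x) == 1
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}

smoothed <- simple_exp_smooth(c(10, 20, 30), alpha = 0.5)
print(smoothed)  # 10.0 15.0 22.5
```

Double exponential smoothing extends the same recurrence with a second smoothed component for the trend.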

Oracle Data Mining Model Algorithm Functions

Newly exposed in-database Oracle Data Mining algorithms:

  • ore.odmAssocRules function for building Oracle Data Mining association models using the apriori algorithm
  • ore.odmNMF function for building Oracle Data Mining feature extraction models using the Non-Negative Matrix Factorization (NMF) algorithm
  • ore.odmOC function for building Oracle Data Mining clustering models using the Orthogonal Partitioning Cluster (O-Cluster) algorithm

Production Deployment

  • New migration utility eases production deployment from development environments
  • "Snapshotting" of production environments for debugging in test systems

For a complete list of new features, see the Oracle R Enterprise User's Guide. To learn more about Oracle R Enterprise, check out the white paper entitled "Bringing R to the Enterprise - A Familiar R Environment with Enterprise-Caliber Performance, Scalability, and Security", visit Oracle R Enterprise on Oracle's Technology Network, or review the variety of use cases on the Oracle R blog.

Monday Mar 10, 2014

Oracle R Distribution 3.0.1 Benchmarks

Oracle R Distribution, Oracle's distribution of open source R, improves performance by dynamically linking to optimized, multi-threaded BLAS libraries. Unlike open source R with its default BLAS, Oracle R Distribution uses all available cores and processors when dynamically linked against an optimized BLAS, resulting in increased performance. Thus, the more cores available to Oracle R Distribution, the higher the performance for many operations.

How is this possible? Standard R's internal BLAS library was created when multi-core machines were not widely used, so it is single-threaded, i.e., it operates on a single core. However, the BLAS API in R allows linking to different, multi-threaded BLAS libraries that let linear algebra computations use all cores and therefore run much faster. Oracle R Distribution simplifies the linking process by loading the high-performance math library after it's added to PATH or LD_LIBRARY_PATH, depending on the operating system. Then you are set to use optimized math libraries: the Intel Math Kernel Library (MKL), the AMD Core Math Library (ACML), or the Sun Performance Library on Solaris.

The benchmarks in this section demonstrate the performance of Oracle R Distribution 3.0.1 with and without dynamically loaded MKL. The R-25 benchmark script developed by the R community consists of fifteen tests. They are split into three groups (matrix calculation, matrix functions and "programmation") with trimmed means for each group, and each test is run three times. For this comparison, we report the mean for the three test runs. The benchmarks show that using Oracle R Distribution with dynamically loaded MKL libraries on a 30-core machine is significantly faster than the single core time.

[Figure: Oracle R Distribution 3.0.1 Benchmarks]

This benchmark was executed on a 3-node cluster, with 24 cores at 3.07GHz per CPU and 47 GB RAM, using Linux 5.5.
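
To get a feel for this kind of comparison on your own machine, here is a minimal base R sketch that times a few BLAS/LAPACK-heavy operations and reports the mean of three runs each. It is not the community benchmark script referenced above. The matrix size, the choice of operations, and the helper name time_mean are assumptions made for illustration. Run it once with R's default BLAS and once with an optimized library loaded to compare.

```r
# Time a zero-argument function over several runs and
# return the mean elapsed (wall-clock) time in seconds.
time_mean <- function(f, runs = 3) {
  elapsed <- vapply(
    seq_len(runs),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  mean(elapsed)
}

set.seed(42)
n <- 500                          # hypothetical size; increase for a real comparison
a <- matrix(rnorm(n * n), n, n)   # dense random matrix

results <- c(
  crossprod = time_mean(function() crossprod(a)),  # t(a) %*% a
  solve     = time_mean(function() solve(a)),      # matrix inverse
  svd       = time_mean(function() svd(a))         # singular value decomposition
)
print(results)
```

Absolute timings depend on the hardware and the linked math library, so only the relative difference between the two configurations is meaningful.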

In-Database Scalability and Parallelism with Oracle R Enterprise

Oracle R Enterprise, the set of big data analytics R packages provided by Oracle, provides scalable, parallel in-database data manipulation and algorithms for analyzing very large data sets. Oracle R Enterprise functions implement parallel, out-of-core algorithms that overcome R's limitations of being memory-bound and single-threaded by executing requested R calculations on data in Oracle Database, using the database itself as the computational engine. Oracle R Enterprise allows users to further leverage Oracle's engineered systems, like Exadata, Big Data Appliance, and Exalytics, for enterprise-wide analytics, as well as reporting tools like Oracle Business Intelligence Enterprise Edition dashboards and BI Publisher documents. The combination of Oracle Database and R delivers an enterprise-ready, integrated environment for advanced analytics.

At the time of this post, Oracle R Distribution 3.0.1 is certified with Oracle R Enterprise 1.4.  See the Install Guide's Oracle R Enterprise support matrix  for a list of current Oracle R Distribution supported configurations and platforms, and this link for instructions on enabling high performance library support for Oracle R Distribution on a Windows or Linux client.



