Friday Oct 24, 2014

Addressing Analytic Pain Points

If you’re an enterprise data scientist, data analyst, or statistician, and perform analytics using R or another third party analytics engine, you’ve likely encountered one or more of these pain points:

Pain Point #1: “It takes too long to get my data or to get the ‘right’ data”
Pain Point #2: “I can’t analyze or mine all of my data – it has to be sampled”
Pain Point #3: “Putting R (or other) models and results into production is ad hoc and complex”
Pain Point #4: “Recoding R (or other) models into SQL, C, or Java takes time and is error prone”
Pain Point #5: “Our company is concerned about data security, backup and recovery”
Pain Point #6: “We need to build 10s of thousands of models fast to meet business objectives”

Some pain points are related to the scale of data, yet others are felt regardless of data size. In this blog series, I’ll explore each of these pain points, how they affect analytics users and their organizations, and how Oracle Advanced Analytics addresses them.

Thursday Mar 20, 2014

ROracle 1-1.11 released - binaries for Windows and other platforms available on OTN

We are pleased to announce the latest update of the open source ROracle package, version 1-1.11, with enhancements and bug fixes. ROracle provides high performance and scalable interaction from R with Oracle Database. In addition to availability on CRAN, ROracle binaries specific to Windows and other platforms can be downloaded from the Oracle Technology Network. Users of ROracle, please take our brief survey. We want to hear from you!

Latest enhancements in version 1-1.11 of ROracle:

• Performance enhancements for RAW data types and large result sets
• Ability to cache the result set in memory to reduce memory consumption on successive reads
• Added session mode to connect as SYSDBA or using external authentication
• bug 17383542: Enhanced dbWritetable() & dbRemoveTable() to work on global schema

Users of ROracle are quite pleased with the performance and functionality:

"In my position as a quantitative researcher, I regularly analyze database data up to a gigabyte in size on client-side R engines. I switched to ROracle from RJDBC because the performance of ROracle is vastly superior, especially when writing large tables. I've also come to depend on ROracle for transactional support, pulling data to my R client, and general scalability. I have been very satisfied with the support from Oracle -- their response has been prompt, friendly and knowledgeable."

           -- Antonio Daggett, Quantitative Researcher in Finance Industry

"Having used ROracle for over a year now with our Oracle Database data, I've come to rely on ROracle for high performance read/write of large data sets (greater than 100 GB), and SQL execution with transactional support for building predictive models in R. We tried RODBC but found ROracle to be faster, much more stable, and scalable."

           -- Dr. Robert Musk, Senior Forest Biometrician, Forestry Tasmania

See the ROracle NEWS for the complete list of updates.

We encourage ROracle users to post questions and provide feedback on the Oracle R Technology Forum.

In addition to being a high performance database interface to Oracle Database from R for general use, ROracle supports database access for Oracle R Enterprise.

Wednesday Jun 12, 2013

R to Oracle Database Connectivity: Use ROracle for both Performance and Scalability

R users have a few choices of how to connect to their Oracle Database. The most commonly seen include: RODBC, RJDBC, and ROracle. However, these three packages have significantly different performance and scalability characteristics which can greatly impact your application development. In this blog, we’ll discuss these options and highlight performance benchmark results on a wide range of data sets.

If you use ROracle, we'd like to hear about your experience. Please take this brief survey.

By way of introduction, RODBC is an R package that implements ODBC database connectivity. There are two groups of functions: the largely internal odbc* functions implement low-level access to the corresponding ODBC functions having a similar name, and the higher level sql* functions that support read, save, copy, and manipulation of data between R data.frame objects and database tables. Here is an example using RODBC:


con <- odbcConnect("DD1", uid="rquser", pwd="rquser", rows_at_time = 500)

sqlSave(con, test_table, "TEST_TABLE")

sqlQuery(con, "select count(*) from TEST_TABLE")

d <- sqlQuery(con, "select * from TEST_TABLE")


The R package RJDBC is an implementation of the R DBI package – database interface – that uses JDBC as the back-end connection to the database. Any database that supports a JDBC driver can be used in connection with RJDBC. Here is an example using RJDBC:


drv <- JDBC("oracle.jdbc.OracleDriver",

classPath="…tklocal/instantclient_11_2/ojdbc5.jar", " ")
con <- dbConnect(drv, "
jdbc:oracle:thin:@myHost:1521:db", "rquser", "rquser")

dbWriteTable(con, "TEST_TABLE", test_table)

dbGetQuery(con, "select count(*) from TEST_TABLE")

d <- dbReadTable(con, "TEST_TABLE")

The ROracle package is an implementation of the R DBI package that uses Oracle OCI for high performance and scalability with Oracle Databases. It requires Oracle Instant Client or Oracle Database Client to be installed on the client machine. Here is an example using ROracle:


drv <- dbDriver("Oracle")

con <- dbConnect(drv, "rquser", "rquser")

dbWriteTable(con,”TEST_TABLE”, test_table)

dbGetQuery(con, "select count(*) from TEST_TABLE")

d <- dbReadTable(con, "TEST_TABLE")


Notice that since both RJDBC and ROracle implement the DBI interface, their code is the same except for the driver and connection details.

To compare these interfaces, we prepared tests along several dimensions:

  • Number of rows – 1K, 10K, 100K, and 1M
  • Number of columns – 10, 100, 1000
  • Data types – NUMBER, BINARY_DOUBLE, TIMESTAMP, and VARCHAR; Numeric data is randomly generated, all character data is 10 characters long.
  • Interface: RODBC 1.3-6 (with Data Direct 7.0 driver), RJDBC 0.2-1 (with rJava 0.9-4 with increased memory limit in,
    and ROracle 1.1-10 (with Oracle Database Client
  • Types of operations: select *, create table, connect

Loading database data to an R data.frame

Where an in-database capability as provided by Oracle R Enterprise is not available, typical usage is to pull data to the R client for subsequent processing. In Figure 1, we compare the execution time to pull 10, 100, and 1000 columns of data from 1K, 10, 100K, and 1M rows for BINARY_DOUBLE data on a log-log scale. Notice that RJDBC does not scale to 100 columns x 1M rows, or above 1000 cols x 100K records. While RODBC and ROracle both scale to these volumes, ROracle is consistently faster than RODBC: up to 2.5X faster. For RJDBC, ROracle is up to 79X faster.

Figure 1: Comparison of RJDBC, RODBC, and ROracle for BINARY_DOUBLE for Select *

In Figure 2, we provide the range of results for RODBC, ROracle, and RJDBC across all data types. Notice that only ROracle provides the full range of scalability while providing superior performance in general.

ROracle is virtually always faster than RODBC: NUMBER data up to 2.5X faster, VARCHAR2 data up to 142X faster, and time stamp data up to 214X faster. RODBC fails to process 1000 columns at 1M rows.

For RJDBC, ROracle is up to 13X faster on NUMBER data, 79X faster on binary double data, 3X for VARCHAR2 data (excluding the 25X over the smallest data set). Note that RODBC and RJDBC have a limit of 255 characters on the length the VARCHAR2 columns. TIMESTAMP data is the one area where RJDBC initially shines, but then fails to scale to larger data sets.

Figure 2: Comparing the three interfaces for select * from <table>

Data set sizes represented in megabytes are captured in Table 1 for all data types. With only minor variation, the data sizes are the same across data types.

Table 1: Dataset sizes in megabytes

Creating database tables from an R data.frame

Data or results created in R may need to be written to a database table. In Figure 3, we compare the execution time to create tables with 10, 100, and 1000 columns of data with 1K, 10, 100K, and 1M rows for BINARY_DOUBLE. Notice that in all three cases, RJDBC is slowest and does not scale. RJDBC does not support the NUMBER or BINARY_DOUBLE data types, but uses FLOAT(126) instead. ROracle scaled across the remaining data types, while RODBC and RJDBC were not tested.

ROracle is 61faster than RODBC for 10 columns x 10K rows, with a median of 5X faster across all data sets. ROracle is 630X faster on 10 columns x 10K rows, with a median of 135X faster across all data sets. RJDBC did not scale to the 1M row data sets.

Figure 3: Comparison of RJDBC, RODBC, and ROracle for BINARY_DOUBLE create table

Connecting to Oracle Database

Depending on the application any sub-second response time may be sufficient. However, as depicted in Figure 4, ROracle introduces minimal time to establish a database connection. ROracle is nearly 10X faster than RJDBC and 1.6X faster than RODBC.

Figure 4: Database connection times for ROracle, RODBC, and RJDBC

In summary, for maximal performance and scalability, ROracle can support a wide range of application needs. RJDBC has significant limitations in both performance and scalability. RODBC can be more difficult to configure on various platforms and while it largely scales to the datasets tested here, its performance lags behind ROracle.

If you use ROracle, we'd like to hear about your experience. Please take this brief survey.

All tests were performed on a 16 processor machine with 4 core Intel Xeon E5540 CPUs @ 2.53 GHz and 74 GB RAM. Oracle Database was version For JDBC, the following was modified before installing rJava.

rJava_0.9-4.tar.gz\rJava_0.9-4.tar\rJava\jri\bootstrap\ was modified to use 2GB :

try {

System.out.println(jl.toString()+" -cp "+System.getProperty("java.class.path")+" -Xmx2g -Dstage=2 Boot");

Process p = Runtime.getRuntime().exec(new String[] {

jl.toString(), "-cp", System.getProperty("java.class.path"),"-Xmx2g", "-Dstage=2", "Boot" });

System.out.println("Started stage 2 ("+p+"), waiting for it to finish...");


} catch (Exception re) {}

Wednesday Apr 04, 2012

New Release of ROracle posted to CRAN

Oracle recently updated ROracle to version 1.1-2 on CRAN with enhancements and bug fixes. The major enhancements include the introduction of support for Oracle Wallet Manager and datetime and interval types. 

Oracle Wallet support in ROracle allows users to manage public key security from the client R session. Oracle Wallet allows passwords to be stored and read by Oracle Database, allowing safe storage of database login credentials. In addition, we added support for datetime and interval types when selecting data, which expands ROracle's support for date data. 

See the ROracle NEWS for the complete list of updates.

We encourage ROracle users to post questions and provide feedback on the Oracle R Forum.

In addition to being a high performance database interface to Oracle Database from R for general use, ROracle supports database access for Oracle R Enterprise.

Wednesday Feb 29, 2012

ROracle 1.1-1 Delivers Performance Improvements

The Oracle R Advanced Analytics team is happy to announce the release of the ROracle 1.1-1 package on the Comprehensive R Archive Network (CRAN).  We’ve rebuilt ROracle from the ground up, working hard to fix bugs and add optimizations. The new version introduces key improvements for interfacing with Oracle Database from open-source R.

Specific improvements in ROracle 1.1-1 include:

  • Lossless database reads: consistent accuracy for database read results
  • Performance improvements: faster database read and write operations
  • Simplified installation:  Oracle Instant Client now bundled on Windows, Instant or full Oracle Client on Linux

ROracle uses the Oracle Call Interface (OCI) libraries to handle the database connections, providing a high-performance, native C-language interface to the Oracle Database.  ROracle 1.1-1 is supported for R versions 2.12 through 2.13, and with Oracle Instant Client and Oracle Database Client versions 10.2 through 11.2, both 32 and 64-bit running Linux and Windows.

We think ROracle 1.1-1 is a great step forward, allowing users to build high performance and efficient R applications using Oracle Database. Whether you are upgrading your existing interface or using it for the first time,  ROracle 1.1-1 is ready for download..  If you have any questions or comments please post on the Oracle R discussion forum. We'd love to hear from you!

Monday Feb 20, 2012

Announcing Oracle R Distribution

Oracle has released the Oracle R Distribution, an Oracle-supported distribution of open source R. This is provided as a free download from Oracle. Support for Oracle R Distribution is provided to customers of the Oracle Advanced Analytics option and Oracle Big Data Appliance. The Oracle R Distribution facilitates enterprise acceptance of R, since the lack of a major corporate sponsor has made some companies concerned about fully adopting R. With the Oracle R Distribution, Oracle plans to contribute bug fixes and relevant enhancements to open source R.

Oracle has already taken responsibility for and contributed modifications to ROracle - an Oracle database interface (DBI) driver for R based on OCI. As ROracle is LGPL and used for Oracle Database connectivity from R, we are committed to ensuring this is the best package for Oracle connectivity.

The place for best practices, tips, and tricks for applying Oracle R Enterprise, Oracle R Distribution, ROracle, and Oracle R Advanced Analytics for Hadoop in both traditional and Big Data environments.


« March 2015