By Sherry LaMonica on Feb 14, 2012
Following last week's press release, we wanted to post a series of demonstrations using Oracle R Enterprise. Stay tuned to learn more about Oracle R Enterprise (ORE),
a component in the Oracle Advanced Analytics Option of Oracle Database Enterprise Edition.
The R programming language and environment was originally designed to hold data in memory, providing fast and efficient calculations by not
requiring the user's program to access information stored on the hard drive. However, data set sizes have grown faster than
available RAM. Consequently, R users will often encounter errors similar to the following:
"cannot allocate vector of length xxx"
This error occurs because R asks the operating system for a block of memory large enough to hold the contents of the data file, and the operating system responds that not enough memory is available. The maximum amount of memory that can be accessed by 32-bit R is 3GB. On 64-bit versions of R, larger objects may be created, theoretically up to 8TB. However, the operating system limits the resources available to a single process, and working with such large objects may be unacceptably slow.
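As a rough illustration of how quickly these limits are reached, the sketch below estimates the memory needed to hold a purely numeric table in R. It assumes 8 bytes per double-precision value and ignores R's per-object overhead and any temporary copies made during computation. The function name `estimate_gb` and the example dimensions are hypothetical.

```r
# Back-of-the-envelope estimate of the RAM needed for a numeric table.
# Assumption: every value is stored as a double (8 bytes).
estimate_gb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^3
}

# Hypothetical data set: 100 million rows, 20 numeric columns.
needed_gb <- estimate_gb(1e8, 20)
print(round(needed_gb, 1))  # roughly 14.9 GB, far beyond the 32-bit limit
```

In practice the requirement is higher, since many R operations create intermediate copies of the data.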
R programmers with big data sets work around memory limitations in a variety of ways. Some opt to analyze data samples, and some divide the data into manageable batches, run jobs sequentially on a single processor, and then combine the results. These workarounds are both costly and time-consuming. For R users who like the flexibility of the R language and the support of the R community, the option to analyze and model large data sets in R is an exciting enhancement.
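The batch workaround described above can be sketched in base R as follows. This is only an illustration, not part of Oracle R Enterprise: it reads a CSV file a fixed number of rows at a time, keeps a running sum and count, and combines them into a mean at the end. The helper `chunk_mean`, the column name `x`, and the small temporary file standing in for a large data set are all hypothetical.

```r
# Hypothetical stand-in for a file too large to load at once.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:1000), path, row.names = FALSE)

# Compute the mean of one column by reading the file in batches.
chunk_mean <- function(path, column, chunk_rows = 100) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once and strip the quotes added by write.csv.
  header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break            # end of file reached
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])    # partial result for this batch
    n <- n + nrow(chunk)
  }
  total / n                                  # combine the partial results
}

result <- chunk_mean(path, "x")
print(result)
```

Only one batch is held in memory at a time, but every statistic has to be rewritten by hand in this accumulate-and-combine form, which is part of why the approach is costly.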
The Oracle R Enterprise framework allows R users to operate on Oracle Database tables and views directly from R. Instead of loading large data files into memory, the processing is moved from the R engine to the database, requiring minimal resources on the user's system, regardless of the size of the data.
In this introductory series, we'll cover everything you need to know to get started with Oracle R Enterprise, including:
Part 1: The ORE transparency layer - a collection of R packages with functions to connect to Oracle Database and use R functionality in Oracle Database. This enables R users to work with data too large to fit into the memory of a user's desktop system, and leverage the scalable Oracle Database as a computational engine.
Part 2: The ORE statistics engine - a collection of statistical functions and procedures corresponding to commonly-used statistical libraries. The statistics engine packages also execute in Oracle Database.
Part 3: ORE SQL extensions supporting embedded R execution on the database server. R users can execute R closures (functions) using an R or SQL API, while taking advantage of data parallelism. Using the SQL API for embedded R execution, sophisticated R graphics and results can be exposed in Oracle Business Intelligence EE dashboards and Oracle BI Publisher documents.
Part 4: Oracle R Connector for Hadoop (ORCH) - an R package that interfaces with the Hadoop Distributed File System (HDFS) and enables executing MapReduce jobs. ORCH enables R users to work directly with an Oracle Hadoop cluster, executing computations from the R environment, written in the R language and working on data resident in HDFS, Oracle Database, or local files.
But we won't stop there - expect to see posts discussing many new
features in 2012, including expanded platform support and an extended
set of analytics routines. Please come back frequently for updates that can help your organization mature in its implementation of in-database analytics.