By Mark Hornick on Feb 03, 2012
For many in the Oracle community, the addition of R through Oracle R Enterprise could leave them wondering "What is R?"
R has been receiving a lot of attention recently, although it’s been around for over 15 years. R is an open-source language and environment for statistical computing and data visualization, supporting data manipulation and transformations, as well as sophisticated graphical displays. It's being taught in colleges and universities in courses on statistics and advanced analytics - even replacing more traditional statistical software tools. Corporate data analysts and statisticians often know R and use it in their daily work, either writing their own R functionality, or leveraging the more than 3400 open source packages. The Comprehensive R Archive Network (CRAN) open source packages support a wide range of statistical and data analysis capabilities. They also focus on analytics specific to individual fields, such as bioinformatics, finance, econometrics, medical image analysis, and others (see CRAN Task Views).
So why do statisticians and data analysts use R?
Well, R is a statistics language similar to SAS or SPSS. It’s a powerful, extensible environment, and as noted above, it has a wide range of statistics and data visualization capabilities. It’s easy to install and use, and it’s free – downloadable from the CRAN R project website.
In contrast, statisticians and data analysts typically don’t know SQL and are not familiar with database tasks. R provides statisticians and data analysts access a wide range of analytical capabilities in a natural statistical language, allowing them to remain highly productive. For example, writing R functions is simple and can be done quickly. Functions can be made to return R objects that can be easily passed to and manipulated by other R functions. By comparison, traditional statistical tools can make the implementation of functions cumbersome, such that programmers resort to macro-oriented programming constructs instead.
So why do we need anything else?
R was conceived as a single user tool that is not multi-threaded. The
client and server components are bundled together as a single
executable, much like Excel.
R is limited by the memory and processing power of the machine where it runs, but in addition, being single threaded, it cannot automatically leverage the CPU capacity on a user’s multi-processor laptop without special packages and programming.
However, there is another issue that limits R’s scalability…
R’s approach to passing data between function invocations results in data duplication – this chews up memory faster. So inherently, R is not good for big data, or depending on the machine and tasks, even gigabyte-sized data sets.
This is where Oracle R Enterprise comes in. As we'll continue to discuss in this blog, Oracle R Enterprise lifts this memory and computational constraint found in R today by executing requested R calculations on data in the database, using the database itself as the computational engine. Oracle R Enterprise allows users to further leverage Oracle's engineered systems, like Exadata, Big Data Appliance, and Exalytics, for enterprise-wide analytics, as well as reporting tools like Oracle Business Intelligence Enterprise Edition dashboards and BI Publisher documents.