How much faster is Oracle’s R implementation than “normal” R?
By Mike.Hallett@Oracle-BI&EPM on Apr 25, 2013
For those of you just getting interested in “Advanced Analytics”, you may still be wondering “What is R?” R is an open-source language and environment for statistical computing and data visualization. It works with Oracle Business Intelligence (OBI) to enrich the graphics and predictive capabilities; a quick preview is available on YouTube.
It is being taught in colleges and universities in courses on statistics and advanced analytics, often in preference to more traditional statistical software tools, so R skills are readily available among recent graduates.
For experts in the field who already use R, the key question is: why use “Oracle’s version”?
The tight integration between R, Oracle Database 11g, and Hadoop lets R users write one R script that can run in three different environments: a laptop running open-source R, Hadoop running with Oracle Big Data Connectors, and Oracle Database 11g.
For large analyses on large Oracle data sets, it is much faster and easier to run the analysis “inside” the database than to export the data into a specialised external format. Some of the benchmarks below show speed-ups of 4x to 20x or more for various operations, and even 100x or more for some data-scoring algorithms.
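To make the “inside the database” point concrete, here is a minimal Python sketch that uses SQLite purely as a stand-in for Oracle Database. The `sales` table, its columns, and the helper functions are hypothetical and are not part of any Oracle product; the sketch only illustrates that an in-database aggregate returns a handful of rows, whereas the export approach moves every row to the client first. It says nothing about the actual speed-ups quoted above.

```python
import sqlite3
from collections import defaultdict


def build_demo_db(n_rows=10_000):
    """Create an in-memory SQLite table standing in for a large database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [("R%d" % (i % 4), float(i % 100)) for i in range(n_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def mean_by_region_exported(conn):
    """Export every row to the client, then aggregate there."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    sums, counts = defaultdict(float), defaultdict(int)
    for region, amount in rows:
        sums[region] += amount
        counts[region] += 1
    means = {region: sums[region] / counts[region] for region in sums}
    return means, len(rows)  # second value: rows that left the database


def mean_by_region_in_db(conn):
    """Let the database aggregate; only one row per region leaves it."""
    rows = conn.execute(
        "SELECT region, AVG(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    _, moved_export = mean_by_region_exported(conn)
    _, moved_in_db = mean_by_region_in_db(conn)
    print("rows moved (export):", moved_export)
    print("rows moved (in-database):", moved_in_db)
```

Both functions compute the same per-region means; the difference is how much data has to cross the boundary between the database and the analysis tool.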
For the in-database functionality of Oracle R Enterprise (ORE), part of the Oracle Advanced Analytics (OAA) Option, the performance gains come from two sources. The first is the R-to-SQL transparency layer, which delivers native SQL performance. The second is the OAA/ORE “mapping” to the SQL-based, high-performance data mining algorithms and statistical functions of OAA/Oracle Data Mining, which are native, parallelized in-database implementations of those algorithms.
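The idea behind a transparency layer can be sketched in a few lines of Python, again with SQLite standing in for the database. The `LazyTable` class and its methods are hypothetical names invented for this illustration; this is not ORE’s API or implementation, only a toy showing how operations on a proxy object can be translated into SQL and executed by the database instead of on locally held data.

```python
import sqlite3


class LazyTable:
    """Hypothetical proxy for a database table: operations build SQL, not local data."""

    _ALLOWED_OPS = {"=", "<", ">", "<=", ">="}

    def __init__(self, conn, name, where=None):
        self.conn = conn
        self.name = name
        self.where = list(where or [])

    def filter(self, column, op, value):
        """Record a filter condition; no rows are fetched at this point."""
        if op not in self._ALLOWED_OPS:
            raise ValueError("unsupported operator: %r" % op)
        return LazyTable(self.conn, self.name, self.where + [(column, op, value)])

    def sql(self, select):
        """Translate the recorded operations into a SQL string plus bind values."""
        # Sketch only: table and column names are interpolated without validation.
        query = f"SELECT {select} FROM {self.name}"
        if self.where:
            query += " WHERE " + " AND ".join(f"{c} {op} ?" for c, op, _ in self.where)
        return query, [value for _, _, value in self.where]

    def mean(self, column):
        """Run the aggregate in the database and return a single number."""
        query, params = self.sql(f"AVG({column})")
        return self.conn.execute(query, params).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (carrier TEXT, delay REAL)")
    conn.executemany(
        "INSERT INTO flights VALUES (?, ?)",
        [("AA", 10.0), ("AA", 30.0), ("UA", 5.0)],
    )
    aa = LazyTable(conn, "flights").filter("carrier", "=", "AA")
    print(aa.sql("AVG(delay)"))
    print(aa.mean("delay"))
```

The user writes ordinary method calls, and the generated SQL does the work; a real transparency layer covers far more of the language, but the division of labour is the same.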
Besides the simpler and more scalable architecture, the majority of the performance gains stem from eliminating the outer loop of extract, mine, apply models, and import. That loop can take weeks to months, largely because of the “human time” spent manually translating the data transformations and model logic into native SQL for in-database deployment. That conversion process is tedious, time-consuming and error-prone. Running the work in the database with OAA therefore cuts that latency down to seconds, minutes or hours.