Introduction to the ORE Transparency Layer

This Oracle R Enterprise (ORE) transparency layer tutorial is the first in a series to help users get started using ORE. Oracle R Enterprise  is a component in the Oracle Advanced Analytics Option of Oracle Database Enterprise Edition.

Oracle R Enterprise (ORE) implements a transparency layer on top of the R engine that allows R computations to be executed in Oracle Database from the R environment. A mapping is established between a special R object called an ORE frame and a corresponding database table or view, allowing a wide range of R functionality to be executed on these mapped objects.  The overloaded functions in the ORE packages generate SQL statements in the background, export the expensive computations to the database for execution, and return results to the R environment.  

Here's a quick overview and an example using the airline on-time performance data from Research and Innovative Technology Administration (RITA), which  coordinates the U.S. Department of Transportation (DOT) research programs. The data consists of 123 million records of U.S. domestic commercial flights between 1987 and 2008. 

Configuring the R Environment

We begin by configuring the local R environment by executing a few simple commands.  These commands may be saved in the .Rprofile file and executed during the initialization of each R session or typed directly into the R session.

Load the ORE library and connect to Oracle Database:

   >  library(ORE)
   >  ore.connect("USER", "SID", "HOST", "PASSWORD")

Syncing with the database syncs the metadata in the database schema with the R environment:

   >  ore.sync("SCHEMA")
   >  ore.ls()

Attaching the database provides access to views and tables so they can be manipulated from a local R session:

Data Exploration
   >  ore.attach("SCHEMA")


Travelers in the western United States may be interested in the mean arrival delay for the San Fransisco, Los Angeles and Seattle airports.
Executing this R code produces a boxplot representing the delay profile for these selected airports:

   > delay <- ontime$ARRDELAY[ontime$DEST %in% c("SEA", "SFO", "LAX"), ]
   > dest <- ontime$DEST[ontime$DEST %in% c("SEA", "SFO", "LAX"), ]
   > bd <- split(delay, dest)
   > boxplot(bd, notch = TRUE, col = "gold", cex = 0.5,
            outline = FALSE, horizontal = TRUE, yaxt = "n",
            main = "Arrival Delays by Destination",
            ylab = "Destination Airport", xlab = "Delay (minutes)")
   > labels <- levels(dest)
   > text(par("usr")[1] - 3, 1:length(labels), srt = 0, adj = 1, labels = labels, xpd = TRUE, cex = 0.75)


The delay profile shows that, on average, arrival delays are greater in San Fransisco than Seattle and Los



With this information, we proceed with fitting a linear model where arrival delay is modeled as a linear function of departure delay and destination. Fitting models in ORE requires minimal modification to traditional R syntax. Simply replace R's linear modeling function lm with ore.lm, and the remaining syntax is transparent:

   > mod.ore <-ore.lm(ARRDELAY ~ DEST + DEPDELAY, data=ontime)
   > summary(mod.ore)

Many R functions compute and store more information than they report by default. Users can easily save these results to an object and extract the components they need. This allows the output of one function to be used as the input to another - a very powerful feature of the R programming environment.  We will take advantage of this functionality by generating predictions from the linear model built in the database:

   > newdat <- ontime[,c("ARRDELAY","DEST", "DEPDELAY")]
   > newdat.sub <- subset(newdat,!(is.na(ARRDELAY) | is.na(DEST) | is.na(DEPDELAY)))
   > prd.ore <- predict(mod.ore, newdata=newdat)
   > res.ore <- cbind(newdat, PRED = prd.ore)


To complete our session, we may detach the schema if desired, and terminate the database connection:

    > ore.detach ("RQUSER")
    > ore.disconnect()

This a small subset of the functionality available in the ORE transparency layer.  The R functions overloaded in the transparency layer include:

    data type assignment and verification
    distribution testing
    mathematical transformations
    statistical summaries
    arithmetic operators
    comparison operators
    logical operators
    set operations
    string manipulations
    data manipulation
    graphics
    matrix operations
    gamma and bessel functions
    hypothesis testing
    regression 

 Summary

Using ORE objects, users transparently remain in the R language, requiring minimal modifications to their existing R scripts.  Although supported, it's not necessary to pull data from the database into R.  This eliminates the need to manipulate memory-bound R objects on the user's desktop system.  R users may access open-source R packages containing many standard and cutting-edge routines for data analysis.

To learn more about ORE offerings, including statistics and modeling features and and advanced topics like the Oracle R Connector for Hadoop (ORCH), view the documentation and training materials on our our product page.  To encourage collaboration, we provide a discussion forum for ORE topics.






Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

The place for best practices, tips, and tricks for applying Oracle R Enterprise, Oracle R Distribution, ROracle, and Oracle R Advanced Analytics for Hadoop in both traditional and Big Data environments.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today