Saving R Objects in Oracle Database using Oracle R Enterprise 1.3 Datastore

R allows users to save R objects to disk. The whole workspace of R objects can be saved to a file, and reloaded across R sessions, which allows users to return to their previous R environment even after quitting R or to avoid recreating objects needed in the future. One such type of object includes predictive models, which can be built in one session, and saved for scoring in another R session, or even multiple, possibly parallel R sessions.

R provides the save and load functions where objects in memory are serialized and unserialized, respectively. Figure 1 depicts an example where two R objects, a linear model and data.frame are saved to a file, and then reloaded. When objects are restored, they have the same names as when they were saved.

Figure 1: Using R save() and load() functions

Oracle R Enterprise (ORE) 1.3 supports object persistence using an R datastore in the database. Now, ORE proxy objects, as well as any R objects, can be saved and restored across R sessions as a named entity in Oracle Database. Serializing ORE objects, such as ore.frames, and saving them doesn’t work across sessions, since any referenced temporary tables or other database objects are not saved across R sessions. If these ore.frame proxy object references are not maintained, restoring them makes them incomplete and inoperative.

Figure 2 has an example similar to the previous example. The main difference is that we are using ore.save and ore.load, and providing the name of the datastore from which we want to retrieve ORE objects.

Figure 2: Using ORE datastore functions ore.save and ore.load

Each schema has its own datastore table where R objects are saved. By being managed in Oracle Database, ORE provides referential integrity of saved objects such that when otherwise temporary database objects are no longer referenced, they are auto-deleted at the end of the R session. This applies, for example, to tables created via ore.push or Oracle Data Mining models produced using the OREdm package.

Here’s a simple example:

DAT1 <- ore.push(iris)

ore.lm.mod <- ore.lm(Sepal.Length~., DAT1 )

lm.mod <- lm(mpg ~ cyl + disp + hp + wt + gear, mtcars)

nb.mod <- ore.odmNB(YEAR ~ ARRDELAY + DEPDELAY + log(DISTANCE), ONTIME_S)

ore.save(ore.lm.mod, lm.mod, nb.mod, name = "myModels")

We’re creating four objects: an ore.frame “DAT1” consisting of the iris data set, an ore.lm model that uses DAT1,a standard R lm model using the mtcars dataset, and an ODM naïve Bayes model using ONTIME_S. We then invoke ore.save for the three models and use the datastore name “myModels”.

The R objects are saved in the datastore, and any referenced data tables or ODM models are kept in the database, otherwise these are treated as temporary database objects and dropped when the ORE session ends.

To load these objects, invoke:

ore.load(name = "myModels")

 To see the content of a datastore, you can invoke ore.datastoreSummary with the name of the datastore.

The datastore also makes it easy to access R and ORE objects within ORE embedded R execution functions. Simply, the name of the datastore is passed as an embedded R function argument. Within the function, ore.load is invoked with the name of that datastore. We'll see an example of this using the SQL API below. In addition, by maintaining persisted R objects in the database, ORE facilitates application deployment while leveraging existing database backup, recovery, and security mechanisms. 

Consider the following ORE embedded R execution example using the SQL API.  We build a simple linear model and store it in a datastore with the name "myDatastore". This R script will be stored in the database R script repository with the name "BuildModel-1". To invoke this R script, we use the rqTableEval function, providing the input data from ONTIME_S, passing the parameters that include the name of the datastore, requesting XML output for the result, and specifying the R script by name. 

begin
  sys.rqScriptCreate('BuildModel-1',
 'function(dat,datastore_name) {
    mod <- lm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
    ore.save(mod,name=datastore_name, overwrite=TRUE)
    TRUE
}');
end;
/

select *

from table(rqTableEval(
  cursor(select ARRDELAY,DISTANCE,DEPDELAY from ONTIME_S),
  cursor(select 1 as "ore.connect", 'myDatastore' as "datastore_name"
         from dual),
  'XML',
  'BuildModel-1'));

To score using this model, we create an R script named "ScoreData-1" that loads the model from the named datastore, invokes predict, and binds the predictions with the predictors to be returned as the result.To invoke this script, we again use the rqTableEval function that takes as parameters a cursor specifying the data to be scored, parameters, a description of the result as a SQL query, and the R script name.

begin
  sys.rqScriptCreate('ScoreData-1',
 'function(dat, datastore_name) {
     ore.load(datastore_name)
     pred <- predict(mod, newdata=dat)
     pred[as.integer(rownames(pred))] <- pred
     cbind(dat, PRED = pred)
     }');
end;
/

select *
from table(rqTableEval(
    cursor(select ARRDELAY, DISTANCE, DEPDELAY from ONTIME_S
           where YEAR = 2003 and MONTH = 5 and DAYOFMONTH = 2),
    cursor(select 1 as "ore.connect",
                 'myDatastore' as "datastore_name" from dual),
    'select ARRDELAY, DISTANCE, DEPDELAY, 1 PRED from ontime_s',
    'ScoreData-1'))
order by 1, 2, 3;

Datastore functionality supports the following interface: ore.save, ore.load, ore.datastore, ore.datastoreSummary, and ore.delete. See the online documentation for details. A presentation on the ORE transparency layer that provides additional examples is available here.

Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

The place for best practices, tips, and tricks for applying Oracle R Enterprise, Oracle R Distribution, ROracle, and Oracle R Advanced Analytics for Hadoop in both traditional and Big Data environments.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today