Best practices, news, tips and tricks - learn about Oracle's R Technologies for Oracle Database and Big Data

  • February 18, 2013

Saving R Objects in Oracle Database using Oracle R Enterprise 1.3 Datastore

Mark Hornick
Senior Director, Data Science and Machine Learning

R allows users to save R objects to disk. The whole
workspace of R objects can be saved to a file, and reloaded across R sessions, which allows users to return to their previous R
environment even after quitting R or to avoid recreating objects needed in the future. One such type of object includes predictive
models, which can be built in one session, and saved for scoring in another R
session, or even multiple, possibly parallel R sessions.

R provides the save and load functions where objects in
memory are serialized and unserialized, respectively. Figure 1 depicts an example
where two R objects, a linear model and data.frame are saved to a file, and
then reloaded. When objects are restored, they have the same names as when they
were saved.

Figure 1: Using R save() and load() functions

Oracle R Enterprise (ORE) 1.3 supports object
persistence using an R datastore in
the database. Now, ORE proxy objects, as well as any R objects, can be saved and
restored across R sessions as a named entity in Oracle Database. Serializing ORE objects, such as
ore.frames, and saving them doesn’t work across sessions, since any
referenced temporary tables or other database objects are not saved across R
sessions. If these ore.frame proxy object references are not maintained,
restoring them makes them incomplete and inoperative.

Figure 2 has an example similar to the previous example. The
main difference is that we are using ore.save
and ore.load, and providing the
name of the datastore from which we want to retrieve ORE objects.

Figure 2: Using ORE datastore functions ore.save and ore.load

Each schema has its own datastore table where R objects are
saved. By being managed in Oracle Database, ORE provides referential integrity
of saved objects such that when otherwise temporary database objects are no
longer referenced, they are auto-deleted at the end of the R session. This
applies, for example, to tables created via ore.push
or Oracle Data Mining models produced using the OREdm package.

Here’s a simple example:

DAT1 <- ore.push(iris)

ore.lm.mod <- ore.lm(Sepal.Length~.,
DAT1 )

lm.mod <- lm(mpg ~ cyl + disp + hp + wt +
gear, mtcars)

nb.mod <- ore.odmNB(YEAR ~ ARRDELAY +

lm.mod, nb.mod, name = "myModels")

We’re creating four objects: an ore.frame
“DAT1” consisting of the iris data set, an ore.lm model that uses DAT1,a standard
R lm model using the mtcars dataset, and an ODM naïve Bayes model using
ONTIME_S. We then invoke ore.save
for the three models and use the datastore name “myModels”.

The R objects are saved in the datastore, and any referenced
data tables or ODM models are kept in the database, otherwise these are treated
as temporary database objects and dropped when the ORE session ends.

To load these objects, invoke:

= "myModels")

 To see the content of a datastore, you can invoke ore.datastoreSummary with the name of
the datastore.

The datastore also makes it easy to access R and ORE objects within ORE embedded R execution functions. Simply, the name of the datastore is
passed as an embedded R function argument. Within the function, ore.load is invoked with the name of
that datastore. We'll see an example of this using the SQL API below. In
addition, by maintaining persisted R objects in the database, ORE
facilitates application deployment while leveraging existing database
backup, recovery, and security mechanisms. 

Consider the following ORE embedded R execution example using the SQL API.  We build a simple linear model and store it in a datastore with the name "myDatastore". This R script will be stored in the database R script repository with the name "BuildModel-1". To invoke this R script, we use the rqTableEval function, providing the input data from ONTIME_S, passing the parameters that include the name of the datastore, requesting XML output for the result, and specifying the R script by name. 



 'function(dat,datastore_name) {

    mod <- lm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)

    ore.save(mod,name=datastore_name, overwrite=TRUE)




select *

from table(rqTableEval(


  cursor(select 1 as "ore.connect", 'myDatastore' as "datastore_name"

         from dual),


To score using this model, we create an R script named "ScoreData-1" that loads the model from the named datastore, invokes predict, and binds the predictions with the predictors to be returned as the result.To invoke this script, we again use the rqTableEval function that takes as parameters a cursor specifying the data to be scored, parameters, a description of the result as a SQL query, and the R script name.

 'function(dat, datastore_name) {
     pred <- predict(mod, newdata=dat)
     pred[as.integer(rownames(pred))] <- pred
     cbind(dat, PRED = pred)

select *
from table(rqTableEval(
           where YEAR = 2003 and MONTH = 5 and DAYOFMONTH = 2),
    cursor(select 1 as "ore.connect",
                 'myDatastore' as "datastore_name" from dual),
    'select ARRDELAY, DISTANCE, DEPDELAY, 1 PRED from ontime_s',
order by 1, 2, 3;

Datastore functionality supports the following interface: ore.save, ore.load, ore.datastore,
ore.datastoreSummary, and ore.delete. See the online documentation for details. A presentation on the ORE transparency layer that provides additional examples is available here.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.