Wednesday, February 28, 2018

Deploying Multiple R Scripts in Oracle R Enterprise

By: Jie Liu | Data Scientist

In this tips-and-tricks post, we share techniques from our own use of Oracle R Enterprise (ORE) on data science projects that you may find useful in yours.

Some data science projects have tens or hundreds of R scripts and functions written by developers or data scientists. Ideally, you would bundle these functions into a package, but that may involve more effort than you had in mind. This package vs. no-package tradeoff arose in one of our recent projects, built with ORE and running on a production database. Because the production environment is managed under strict access rules, our team did not have access to the Linux environment to install and reinstall packages at will. This posed a challenge, as our codebase contained hundreds of functions. As this was a live machine learning project, feedback and enhancement requests arrived daily early on, so we needed to deliver new features or fixes and deploy the updated model quickly. Installing our code as an R package would have been a heavyweight process involving administrators: package installation requires system administrator privileges, whereas loading R scripts into the R Script Repository is permitted to users with the RQADMIN role.

Since our deployment strategy requires the application to run inside Oracle Database using ORE embedded R execution, we store our top-level function in the database R Script Repository and invoke it by name. If that function calls other functions also stored in the R Script Repository, we can load each one by name using ore.scriptLoad, but that takes one invocation per function, which can add up to a lot of calls.

Here is a simple convenience function, ore.scriptLoad2, which allows using a regular expression to load multiple functions and leverages two existing ORE functions: ore.scriptList and ore.scriptLoad.

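# Load every script in the R Script Repository whose name matches `pattern`
ore.scriptLoad2 <- function(pattern=NULL, envir = parent.frame()){
  lst <- ore.scriptList(pattern = pattern)$NAME  # names of the matching scripts
  for( n in lst ){
    ore.scriptLoad(name= n, envir = envir)  # define each function in envir
  }
}
# Store the helper itself in the repository under the same name
ore.scriptCreate('ore.scriptLoad2',ore.scriptLoad2)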

We create the named script ore.scriptLoad2 so we can use it inside our embedded R functions as well.
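
The envir = parent.frame() default is what lets the loaded functions appear inside the function that calls ore.scriptLoad2. The following base R sketch illustrates the idea without a database connection. The names load_one, load_many, top_level, and ml.double are hypothetical stand-ins (load_one plays the role of ore.scriptLoad, load_many the role of ore.scriptLoad2), and the sketch only assumes that ore.scriptLoad defines the function in the environment it is given.

```r
# Hypothetical stand-in for ore.scriptLoad: defines one function in `envir`.
load_one <- function(name, fun, envir = parent.frame()) {
  assign(name, fun, envir = envir)
}

# Hypothetical stand-in for ore.scriptLoad2: forwards `envir` to every load.
load_many <- function(funs, envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before looping
  for (n in names(funs)) {
    load_one(n, funs[[n]], envir = envir)
  }
  invisible(names(funs))
}

top_level <- function() {
  # The default envir is this function's own frame, so ml.double
  # becomes callable here without touching the global environment.
  load_many(list(ml.double = function(x) 2 * x))
  ml.double(21)
}

result <- top_level()
leaked <- exists("ml.double", envir = globalenv(), inherits = FALSE)
```

Because the wrapper passes envir through explicitly, the functions land in the caller of the wrapper rather than in the wrapper's own frame, where they would vanish as soon as it returned.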

Tip: name your R functions with a common prefix or suffix related to your project to make it easy to grab them all at once.

For instance, if all our functions have the prefix ml, we can load every matching function in the following way.

ore.scriptLoad('ore.scriptLoad2') # loads our convenience function
ore.scriptLoad2(pattern = '^ml')

Note that the pattern parameter accepts a regular expression, so the developer can name the functions using any predefined pattern. The regular expression used here, with its leading ^ anchor, ensures that ml is matched only as a prefix.
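
To see why the anchor matters, the sketch below uses base R's grep on a vector of hypothetical script names as a stand-in for what ore.scriptList might return. It assumes the pattern argument follows ordinary R regular expression rules, which is worth confirming against your ORE version before relying on it.

```r
# Hypothetical script names standing in for ore.scriptList()$NAME
script_names <- c("ml.preproc", "ml.train", "mlx_scratch",
                  "html.report", "score_ml")

# "^ml" matches names that start with ml
prefix_only <- grep("^ml", script_names, value = TRUE)

# "ml" with no anchor matches ml anywhere, including inside "html"
unanchored <- grep("ml", script_names, value = TRUE)

# "ml$" matches names that end with ml (a suffix convention)
suffix_only <- grep("ml$", script_names, value = TRUE)

# "^ml\\." also requires a literal dot after the prefix
dotted_prefix <- grep("^ml\\.", script_names, value = TRUE)
```

An unanchored pattern is the risky one when combined with a bulk drop, since it can match scripts that belong to other projects.
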
Similarly, we provide a corresponding drop function so that we can also drop multiple functions from the R Script Repository in a single call. This makes it easy to ensure you have a clean environment.

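# Drop every script in the R Script Repository whose name matches `pattern`.
# Always pass a pattern: the default NULL applies no name filter.
ore.scriptDrop2 <- function(pattern=NULL){
  lst <- ore.scriptList(pattern = pattern)$NAME  # names of the matching scripts
  for( n in lst ){
    ore.scriptDrop(name= n)
  }
}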

Example

Here we show a simple deployment example. Suppose a machine learning application contains only two functions, ml.preproc and ml.train, in the source file main.R. To deploy them, we first need to create the scripts in the R Script Repository, as in the following sample code.

library(ORE)
ore.connect(...)
source('main.R') # contains ml.preproc and ml.train
ore.scriptDrop2(pattern='^ml') # drop the existing old versions
funcs <- as.vector(lsf.str()) # list all functions in the workspace
funcs <- grep('^ml', funcs, value = TRUE) # keep only the project functions
for (func in funcs) {  # create each script in the R Script Repository
  ore.scriptCreate(func, get(func))
}
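
If the workspace already holds unrelated functions, one alternative is to source the file into a separate environment and collect only the functions defined there. The sketch below is plain base R under stated assumptions: the temporary file is a hypothetical stand-in for main.R, deploy_env is an illustrative name, and the final ore.scriptCreate loop is left as a comment because it needs a live ORE connection.

```r
# Hypothetical stand-in for main.R, written to a temporary file
src <- tempfile(fileext = ".R")
writeLines(c(
  "ml.preproc <- function() 'preprocessed'",
  "ml.train <- function() 'trained'",
  "n_iter <- 10"  # a non-function object that should not be deployed
), src)

# Source into a dedicated environment instead of the global workspace
deploy_env <- new.env()
sys.source(src, envir = deploy_env)

# Keep only the objects that are functions
is_fun <- vapply(ls(deploy_env),
                 function(n) is.function(get(n, envir = deploy_env)),
                 logical(1))
funcs <- names(is_fun)[is_fun]

# With a live ORE connection, each function could then be registered, e.g.:
# for (f in funcs) ore.scriptCreate(f, get(f, envir = deploy_env))
```

This keeps the list of deployed functions tied to the contents of the source file rather than to whatever happens to be in the session.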

After the R scripts are created, we can load these user-defined R functions into our top-level function when we invoke ORE embedded R execution. For simplicity, we call them in ore.doEval().

ore.doEval(function(){
    ore.scriptLoad('ore.scriptLoad2') # load our convenience function
    ore.scriptLoad2(pattern = '^ml')
    ml.preproc()
    ml.train()
})

Note that all of this code runs from the ORE client and does not require direct command-line access to the database server machine.

Conclusion

In this post, we addressed agile deployment of machine learning software based on Oracle R Enterprise. We shared two functions that assist with ORE production deployments involving multiple user-defined R functions. They allow privileged R users to deploy code without system administrator privileges on the target machine. Used this way, ORE lets code be updated in batches and reduces system administrator overhead. In practice, this type of deployment proved advantageous for our project. We look forward to feedback from the community of ORE users.