X

Best practices, news, tips and tricks - learn about Oracle's R Technologies for Oracle Database and Big Data

Integrating custom algorithms with Oracle Advanced Analytics with R

Mark Hornick
Director, Advanced Analytics and Machine Learning

Data scientists and other users of machine learning and predictive analytics technology often have their favorite algorithm for solving particular problems. If they are using a tool like Oracle Advanced Analytics -- with Oracle R Enterprise and Oracle Data Mining -- there's a desire to use these algorithms within that tool's framework. Using ORE's embedded R execution, users can already use 3rd party R packages in combination with Oracle  Database for execution at the database server machine. New in Oracle R Enterprise 1.5.1 and Oracle Database 12.2 is the feature to integrate 3rd party algorithms with three user-defined functions: model build, score, and model details -- referred to as extensible R algorithm models.

Some details:
  • Users provide R scripts stored in the ORE R Script Repository that implement functionality for model build, score, and model details (viewing of R models)
  • Supports mining functions: classification, regression, clustering, feature_extraction, attribute_importance, and association
  • The predict method executes the score.function specified and returns an ore.frame containing the predictions along with any columns specified by the supplemental.cols argument
  • Function predict applicable to classification, regression, clustering, and feature_extraction models
  • Registered algorithms can be used with Oracle Data Mining in Oracle Database 12.2 or later

Here's a simple script that defines the functions supporting R's GLM algorithm. The first, glm_build, builds a GLM model given data, formula and family as arguments. The second, glm_score, invokes R's predict function on the model and data provided as arguments. The third, glm_detail, takes a model and returns a data.frame with whatever information the user intend.

IRIS <- ore.push(iris)
ore.scriptCreate("glm_build", 
                 function(data, form, family) {
                    glm(formula = form, data = data, family = family)})
ore.scriptCreate("glm_score", 
                 function(mod, data) {
                    res <- predict(mod, newdata = data); data.frame(res)})
ore.scriptCreate("glm_detail", 
                 function(mod) {
                   data.frame(name=names(mod$coefficients), 
                   coef=mod$coefficients)})

To invoke this algorithm from R, we use the function ore.odmRAlg, which expects an ore.frame with the data, the formula for building the model, and the three functions. Note that only the build function is required; scoring and model details are optional. The overloaded summary function returns the model details, and the overloaded predict function scores the data and returns any requested supplemental columns.

ralg.mod <- ore.odmRAlg(IRIS, mining.function = "regression",
                        formula = c(form="Sepal.Length ~ ."),
                        build.function = "glm_build", build.parameter = list(family="gaussian"),
                        score.function = "glm_score",
                        detail.function = "glm_detail", detail.value = data.frame(name="a", coef=1))
summary(ralg.mod)
ralg.mod$details
predict(ralg.mod, newdata = head(IRIS), supplemental.cols = "Sepal.Length")
Here is the output from invoking ore.odmRAlg and the remaining functions:
> ralg.mod <- ore.odmRAlg(IRIS, mining.function = "regression",
+                         formula = c(form="Sepal.Length ~ ."),
+                         build.function = "glm_build", build.parameter = list(family="gaussian"),
+                         score.function = "glm_score",
+                         detail.function = "glm_detail", detail.value = data.frame(name="a", coef=1))
> summary(ralg.mod)

Call:
ore.odmRAlg(data = IRIS, mining.function = "regression", formula = c(form = "Sepal.Length ~ ."), 
    build.function = "glm_build", build.parameter = list(family = "gaussian"), 
    score.function = "glm_score", detail.function = "glm_detail", 
    detail.value = data.frame(name = "a", coef = 1))

Settings: 
                                                                                       value
odms.missing.value.treatment                                         odms.missing.value.auto
odms.sampling                                                          odms.sampling.disable
prep.auto                                                                                OFF
build.function                                                              RQUSER.glm_build
build.parameter              select 'Sepal.Length ~ .' "form", 'gaussian' "family" from dual
details.format                 select cast('a' as varchar2(4000)) "name", 1 "coef" from dual
details.function                                                           RQUSER.glm_detail
score.function                                                              RQUSER.glm_score

               name       coef
1       (Intercept)  2.1712663
2      Petal.Length  0.8292439
3       Petal.Width -0.3151552
4       Sepal.Width  0.4958889
5 Speciesversicolor -0.7235620
6  Speciesvirginica -1.0234978
> ralg.mod$details
               name       coef
1       (Intercept)  2.1712663
2      Petal.Length  0.8292439
3       Petal.Width -0.3151552
4       Sepal.Width  0.4958889
5 Speciesversicolor -0.7235620
6  Speciesvirginica -1.0234978
> predict(ralg.mod, newdata = head(IRIS), supplemental.cols = "Sepal.Length")
  Sepal.Length PREDICTION
1          5.1   5.004788
2          4.9   4.756844
3          4.7   4.773097
4          4.6   4.889357
5          5.0   5.054377
6          5.4   5.388886

In a subsequent post, we'll look at doing the same from SQL.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services