Data scientists and other users of machine learning and predictive analytics technology often have their favorite algorithm for solving particular problems. If they are using a tool like Oracle Advanced Analytics -- with Oracle R Enterprise and Oracle Data Mining -- there's a desire to use these algorithms within that tool's framework. Using ORE's embedded R execution, users can already use 3rd party R packages in combination with Oracle Database for execution at the database server machine. New in Oracle R Enterprise 1.5.1 and Oracle Database 12.2 is the feature to integrate 3rd party algorithms with three user-defined functions: model build, score, and model details -- referred to as extensible R algorithm models.
Some details:Here's a simple script that defines the functions supporting R's GLM algorithm. The first, glm_build, builds a GLM model given data, formula and family as arguments. The second, glm_score, invokes R's predict function on the model and data provided as arguments. The third, glm_detail, takes a model and returns a data.frame with whatever information the user intend.
IRIS <- ore.push(iris) ore.scriptCreate("glm_build", function(data, form, family) { glm(formula = form, data = data, family = family)}) ore.scriptCreate("glm_score", function(mod, data) { res <- predict(mod, newdata = data); data.frame(res)}) ore.scriptCreate("glm_detail", function(mod) { data.frame(name=names(mod$coefficients), coef=mod$coefficients)})
To invoke this algorithm from R, we use the function ore.odmRAlg, which expects an ore.frame with the data, the formula for building the model, and the three functions. Note that only the build function is required; scoring and model details are optional. The overloaded summary function returns the model details, and the overloaded predict function scores the data and returns any requested supplemental columns.
ralg.mod <- ore.odmRAlg(IRIS, mining.function = "regression", formula = c(form="Sepal.Length ~ ."), build.function = "glm_build", build.parameter = list(family="gaussian"), score.function = "glm_score", detail.function = "glm_detail", detail.value = data.frame(name="a", coef=1)) summary(ralg.mod) ralg.mod$details predict(ralg.mod, newdata = head(IRIS), supplemental.cols = "Sepal.Length")Here is the output from invoking ore.odmRAlg and the remaining functions:
> ralg.mod <- ore.odmRAlg(IRIS, mining.function = "regression", + formula = c(form="Sepal.Length ~ ."), + build.function = "glm_build", build.parameter = list(family="gaussian"), + score.function = "glm_score", + detail.function = "glm_detail", detail.value = data.frame(name="a", coef=1)) > summary(ralg.mod) Call: ore.odmRAlg(data = IRIS, mining.function = "regression", formula = c(form = "Sepal.Length ~ ."), build.function = "glm_build", build.parameter = list(family = "gaussian"), score.function = "glm_score", detail.function = "glm_detail", detail.value = data.frame(name = "a", coef = 1)) Settings: value odms.missing.value.treatment odms.missing.value.auto odms.sampling odms.sampling.disable prep.auto OFF build.function RQUSER.glm_build build.parameter select 'Sepal.Length ~ .' "form", 'gaussian' "family" from dual details.format select cast('a' as varchar2(4000)) "name", 1 "coef" from dual details.function RQUSER.glm_detail score.function RQUSER.glm_score name coef 1 (Intercept) 2.1712663 2 Petal.Length 0.8292439 3 Petal.Width -0.3151552 4 Sepal.Width 0.4958889 5 Speciesversicolor -0.7235620 6 Speciesvirginica -1.0234978 > ralg.mod$details name coef 1 (Intercept) 2.1712663 2 Petal.Length 0.8292439 3 Petal.Width -0.3151552 4 Sepal.Width 0.4958889 5 Speciesversicolor -0.7235620 6 Speciesvirginica -1.0234978 > predict(ralg.mod, newdata = head(IRIS), supplemental.cols = "Sepal.Length") Sepal.Length PREDICTION 1 5.1 5.004788 2 4.9 4.756844 3 4.7 4.773097 4 4.6 4.889357 5 5.0 5.054377 6 5.4 5.388886
In a subsequent post, we'll look at doing the same from SQL.