### Step-by-step: Returning R statistical results as a Database Table

#### By Mark Hornick on Apr 27, 2014

R provides a rich set of statistical functions that we may want to use directly from SQL. Many of these results can be readily expressed as structured table data for use with other SQL tables, or for use by SQL-enabled applications, e.g., dashboards or other statistical tools.

In this blog post, we illustrate in a sequence of five simple steps how to go from an R function to a SQL-enabled result. Taken from recent "proof of concept" customer engagement, our example involves using the function princomp, which performs a principal components analysis on a given numeric data matrix and returns the results as an object of class princomp. The customer actively uses this R function to produce loadings used in subsequent computations and analysis. The loadings is a matrix whose columns contain the eigenvectors).

The current process of pulling data from their Oracle Database, starting an R engine, invoking the R script, and placing the results back in the database was proving non-performant and unnecessarily complex. The goal was to leverage Oracle R Enterprise to streamline this process and allow the results to be immediately accessible

through SQL.

As a best practice, here is a process that can get you from start to finish:**Step 1: Invoke from command line, understand results**

If you're using a particular R function, chances are you are familiar with its content. However, you may not be familiar with its structure. We'll use an example from the R princomp documentation that uses the USArrests data set. We see that the class of the result is of type princomp, and the model prints the call and standard deviations of the components. To understand the underlying structure, we invoke the function str and see there are seven elements in the list, one of which is the matrix loadings.

mod <- princomp(USArrests, cor = TRUE)

class(mod)

mod

str(mod)

Results:

R> mod <- princomp(USArrests, cor = TRUE)

R> class(mod)

[1] "princomp"

R> mod

Call:

princomp(x = USArrests, cor = TRUE)

Standard deviations:

Comp.1 Comp.2 Comp.3 Comp.4

1.5748783 0.9948694 0.5971291 0.4164494

4 variables and 50 observations.

R> str(mod)

List of 7

$ sdev : Named num [1:4] 1.575 0.995 0.597 0.416

..- attr(*, "names")= chr [1:4] "Comp.1" "Comp.2" "Comp.3" "Comp.4"

$ loadings: loadings [1:4, 1:4] -0.536 -0.583 -0.278 -0.543 0.418 ...

..- attr(*, "dimnames")=List of 2

.. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"

.. ..$ : chr [1:4] "Comp.1" "Comap.2" "Comp.3" "Comp.4"

$ center : Named num [1:4] 7.79 170.76 65.54 21.23

..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"

$ scale : Named num [1:4] 4.31 82.5 14.33 9.27

..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"

$ n.obs : int 50

$ scores : num [1:50, 1:4] -0.986 -1.95 -1.763 0.141 -2.524 ...

..- attr(*, "dimnames")=List of 2

.. ..$ : chr [1:50] "1" "2" "3" "4" ...

.. ..$ : chr [1:4] "Comp.1" "Comp.2" "Comp.3" "Comp.4"

$ call : language princomp(x = dat, cor = TRUE)

- attr(*, "class")= chr "princomp"**Step 2: Wrap script in a function, and invoke from ore.tableApply**

Since we want to invoke princomp on database data, we first push the demo data, USArrests, to the database to create an ore.frame. Other data we wish to use will also be in database tables.

We'll use ore.tableApply (for the reasons cited in the previous blog post) providing the ore.frame as the first argument and simply returning within our function the model produced by princomp. We'll then look at its class, retrieve the result from the database, and check its class and structure once again.

Notice that we are able to obtain the exact same result we received using our local R engine as with the database R engine through embedded R execution.

dat <- ore.push(USArrests)

computePrincomp <- function(dat) princomp(dat, cor=TRUE)

res <- ore.tableApply(dat, computePrincomp)

class(res)

res.local <- ore.pull(res)

class(res.local)

str(res.local)

res.local

res

Results:

R> dat <- ore.push(USArrests)

R> computePrincomp <- function(dat) princomp(dat, cor=TRUE)

R> res <- ore.tableApply(dat, dat, computePrincomp)

R> class(res)

[1] "ore.object"

attr(,"package")

[1] "OREembed"

R> res.local <- ore.pull(res)

R> class(res.local)

[1] "princomp"

R> str(res.local)

List of 7

$ sdev : Named num [1:4] 1.575 0.995 0.597 0.416

..- attr(*, "names")= chr [1:4] "Comp.1" "Comp.2" "Comp.3" "Comp.4"

$ loadings: loadings [1:4, 1:4] -0.536 -0.583 -0.278 -0.543 0.418 ...

..- attr(*, "dimnames")=List of 2

.. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"

.. ..$ : chr [1:4] "Comp.1" "Comap.2" "Comp.3" "Comp.4"

$ center : Named num [1:4] 7.79 170.76 65.54 21.23

..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"

$ scale : Named num [1:4] 4.31 82.5 14.33 9.27

..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"

$ n.obs : int 50

$ scores : num [1:50, 1:4] -0.986 -1.95 -1.763 0.141 -2.524 ...

..- attr(*, "dimnames")=List of 2

.. ..$ : chr [1:50] "1" "2" "3" "4" ...

.. ..$ : chr [1:4] "Comp.1" "Comp.2" "Comp.3" "Comp.4"

$ call : language princomp(x = dat, cor = TRUE)

- attr(*, "class")= chr "princomp"

R> res.local

Call:

princomp(x = dat, cor = TRUE)

Standard deviations:

Comp.1 Comp.2 Comp.3 Comp.4

1.5748783 0.9948694 0.5971291 0.4164494

4 variables and 50 observations.

R> res

Call:

princomp(x = dat, cor = TRUE)

Standard deviations:

Comp.1 Comp.2 Comp.3 Comp.4

1.5748783 0.9948694 0.5971291 0.4164494

4 variables and 50 observations.**Step 3: Determine what results we really need**

Since we are only interested in the loadings and any result we return needs to be a data.frame to turn it into a database row set (table result), we build the model, transform the loadings object into a data.frame, and return the data.frame as the function result. We then view the class of the result and its values.

Since we do this from the R API, we can simply print res to display the returned data.frame, as the print does an implicit ore.pull.

returnLoadings <- function(dat) {

mod <- princomp(dat, cor=TRUE)

dd <- dim(mod$loadings)

ldgs <- as.data.frame(mod$loadings[1:dd[1],1:dd[2]])

ldgs$variables <- row.names(ldgs)

ldgs

}

res <- ore.tableApply(dat, returnLoadings)

class(res)

res

ore.create(USArrests,table="USARRESTS")

Results:

R> res <- ore.tableApply(dat, returnLoadings)

R> class(res)

[1] "ore.object"

attr(,"package")

[1] "OREembed"

R> res

Comp.1 Comp.2 Comp.3 Comp.4 variables

Murder -0.5358995 0.4181809 -0.3412327 0.64922780 Murder

Assault -0.5831836 0.1879856 -0.2681484 -0.74340748 Assault

UrbanPop -0.2781909 -0.8728062 -0.3780158 0.13387773 UrbanPop

Rape -0.5434321 -0.1673186 0.8177779 0.08902432 Rape**Step 4: Load script into the R Script Repository in the database**

We're at the point of being able to load the script into the R Script Repository before invoking it from SQL. We can create the function from R or from SQL. In R,

ore.scriptCreate('princomp.loadings', returnLoadings)

or from SQL,

begin

--sys.rqScriptDrop('princomp.loadings');

sys.rqScriptCreate('princomp.loadings',

'function(dat) {

mod <- princomp(dat, cor=TRUE)

dd <- dim(mod$loadings)

ldgs <- as.data.frame(mod$loadings[1:dd[1],1:dd[2]])

ldgs$variables <- row.names(ldgs)

ldgs

}');

end;

/**Step 5: invoke from SQL SELECT statement**

Finally, we're able to invoke the function from SQL using the rqTableEval table function. We pass in a cursor with the data from our USARRESTS table. We have no parameters, so the next argument is NULL. To get the results as a table, we specify a SELECT string that defines the structure of the result. Note that the column names must be identical to what is returned in the R data.frame. The last parameter is the name of the function we want to invoke from the R script repository.

Invoking this, we see the result as a table from the SELECT statement.

select *

from table(rqTableEval( cursor(select * from USARRESTS),

NULL,

'select 1 as "Comp.1", 1 as "Comp.2", 1 as "Comp.3", 1 as "Comp.4", cast(''a'' as varchar2(12)) "variables" from dual',

'princomp.loadings'));

Results:

SQL> select *

from table(rqTableEval( cursor(select * from USARRESTS),NULL,

'select 1 as "Comp.1", 1 as "Comp.2", 1 as "Comp.3", 1 as "Comp.4", cast(''a'' as varchar2(12)) "variables" from dual','princomp.loadings'));

2 3

Comp.1 Comp.2 Comp.3 Comp.4 variables

---------- ---------- ---------- ---------- ------------

-.53589947 .418180865 -.34123273 .649227804 Murder

-.58318363 .187985604 -.26814843 -.74340748 Assault

-.27819087 -.87280619 -.37801579 .133877731 UrbanPop

-.54343209 -.16731864 .817777908 .089024323 Rape

As you see above, we have the loadings result returned as a SQL table.

In this example, we walked through the steps of moving from invoking an R function to obtain a specific result to producing that same result from SQL by invoking an R script at the database server under the control of Oracle Database.