Best practices, news, tips and tricks - learn about Oracle's R Technologies for Oracle Database and Big Data

Using RStudio Shiny with ORE for interactive analysis and visualization

Guest Author

Shiny, by RStudio, is a popular web application framework for R. It can be used, for example, for building flexible interactive analyses and visualization solutions without requiring web development skills and knowledge of Javascript, HTML, CSS, etc. An overview of it's capabilities with numerous examples is available on RStudio's Shiny web site.

In this blog we illustrate a simple Shiny application for processing and visualizing data stored in Oracle Database for the special case where this data is wide and shallow. In this use case, analysts wish to interactively compute and visualize correlations between many variables (up to 40k) for genomics-related data where the number of observations is small, typically around 1000 cases. A similar use case was discussed in a recent blog Consolidating wide and shallow data with ORE Datastore, which addressed the performance aspects of reading and saving wide data from/to an Oracle R Enterprise (ORE) datastore. ORE allows users to store any type of R objects, including wide data.frames, directly in an Oracle Database using ORE datastores.

Our shiny application, invoked at the top level via the shinyApp() command below, has two components: a user-interface (ui) definition myui and a server function myserver.
R> library("shiny")
R> shinyApp(ui = myui, server = myserver) 

The functionality is very basic. The user specifies the dataset to be analyzed, the sets of variables to correlate against each other, the correlation method (Pearson, Spearman,etc) and the treatment of missing data as handled by the 'method' and, respectively, the 'use' arguments of R's cor() function. The datasets are all wide (ncols > 1000) and have been already saved in an ORE datastore.

A partial code for the myui user interface is below. The sidebarPanel section handles the input and the correlation values are displayed graphically by plotOutput() within the mainPanel section. The argument 'corrPlot' corresponds to the function invoked by the server() component.

R> myui <- fluidPage(...,   
                  label   = "Data Sets",
                  choices = c(...,...,...),                   
                  selected = (...),      
                   label = "Correlation Type",
                   choices = list("Pearson"="pearson",
                   selected = "pearson"),
                  label   = "Use",
                  choices = c("everything","all.obs","complete.obs",
                  selected = "everything"),      
      textInput("yvars",label=...,value=...) ,
    mainPanel( plotOutput("corrPlot") )       

The server component consists of two functions :
  • The server function myserver(), passed as argument during the top level invocation of shinyApp(). myserver returns the output$corrPlot object (see the ui component) generated by shiny's rendering function renderPlot(). The plot object p is generated within renderPlot() by calling ore.doEval() for the embedded R execution of gener_corr_plot(). The ui input selections are passed to gener_corr_plot() through ore.doEval().
R> myserver <- function(input, output) {   
  output$corrPlot <- renderPlot({
    p<- ore.pull(ore.doEval(
  • A core function gener_corr_plot(), which combines loading data from the ORE datastore, invoking the R cor() function with all arguments specified in the ui and generating the plot for the resulting correlation values.
R> gener_corr_plot <- function(TblName,corrtyp="pearson",use="everything",
                            xvars=...,yvars=...,ds.name=...) {
       res <- cor(get(TblName)[,filtering_based_on_xvars],
       p <- qplot(y=c(res)....)

The result of invoking ShinyApp() is illustrated in the figure below. The data picked for this display is DOROTHEA, a drug discovery dataset from the UCI Machine Learning Repository and the plot shows the Pearson correlation between the variable 1 and the first 2000 variables of the dataset. The variables are labeled here generically, by their index number.

For the type of data considered by the customer (wide tables with 10k-40k columns and ~ 1k rows) a complete iteration (data loading, correlation calculations between one variable and all others & plot rendering) takes from a few seconds up to a few dozen seconds, depending on the correlation method and the handling of missing values. This is considerably faster than the many minutes reported by customers running their code in memory with data loaded from CSV files.

Several questions may come to mind when thinking about such Shiny-based applications. For example:

  • When and how to decouple the analysis from data reading?
  • How to build chains of reactive components for more complex applications?
  • What is an efficient rendering mechanism for multiple plotting methods and increasing data sizes?
  • What other options for handling data with more than 1000 columns exist? (nested tables?)

We will address these in future posts. In this Part 1 we illustrated that it is easy to construct a Shiny-based interactive application for wide data by leveraging ORE's datastores capability and support for embedded R execution. Besides improved performance, this solution offers the security, auditing, backup and recovery capabilities of Oracle Database.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha