Best practices, news, tips and tricks - learn about Oracle's R Technologies for Oracle Database and Big Data

News | Thursday, August 31, 2017

Oracle R Enterprise 1.5.1 for Oracle Database is now available

By: Mark Hornick | Director, Advanced Analytics and Machine Learning

We are pleased to announce that Oracle R Enterprise (ORE) 1.5.1 is now available for download for Oracle Database Enterprise Edition with Oracle R Distribution 3.3.0 / R-3.3.0. Oracle R Enterprise is a component of the Advanced Analytics option to Oracle Database.

With ORE 1.5.1, we introduce two new packages. OREdplyr, a transparency layer enhancement, gives ORE users access to many of the popular dplyr functions on ore.frames. OAAgraph 2.4.1, available as a separate download, provides an R interface to the powerful Oracle Spatial and Graph Parallel Graph Engine (PGX) for use in combination with ORE and database tables. New in-database algorithms exposed through R in the OREdm package include Expectation Maximization (EM), Explicit Semantic Analysis (ESA), and Singular Value Decomposition (SVD). In addition, ORE 1.5.1 enables automated text processing for Oracle Data Mining's Support Vector Machine (SVM), Generalized Linear Model (GLM), KMeans, SVD, Non-negative Matrix Factorization (NMF), and ESA models. It also enables the building of "partitioned models" and of "extensible R algorithm models", in which users define R functions that plug into the Oracle Advanced Analytics in-database model framework.

Here are the highlights for the new and enhanced features in ORE 1.5.1.

Upgraded R version compatibility

ORE 1.5.1 is certified with R-3.3.0 - both open source R and Oracle R Distribution. See the server support matrix for the complete list of supported R versions. R-3.3.0 brings improved performance, support for large in-memory data objects, and compatibility with the ever-growing set of community-contributed R packages.

ORE 1.5.1 also upgrades several supporting packages:

  • arules 1.5-0
  • Cairo 1.5-9
  • DBI 0.6-1
  • png 0.1-7
  • ROracle 1.3-1
  • statmod 1.4.29
  • randomForest 4.6-12

OREdplyr

The widely used dplyr package provides a grammar for data manipulation on data.frame-like objects, both in memory and out of memory. It also serves as an interface to database management systems, operating on data.frame or numeric vector objects.

OREdplyr provides a subset of dplyr functionality extending the Oracle R Enterprise transparency layer. OREdplyr functions accept ore.frames instead of data.frames for in-database execution of the corresponding dplyr functions. OREdplyr allows users to avoid costly movement of data while scaling to larger data volumes because operations are not constrained by R Client memory.

OAAgraph

OAAgraph is a new package that provides a single, unified interface supporting the complementary use of machine learning and graph analytics technologies. Graph analytics use a graph representation of data, where data entities are nodes and relationships are edges. Machine learning produces models that identify patterns in data for both descriptive and predictive analytics. Together, these technologies complement and augment one another.

Graph analytics can compute graph metrics and perform analyses using efficient graph algorithms and representations. These metrics can be added to structured data, where machine learning algorithms build models that include the graph metrics as predictors, which can produce more accurate results. Similarly, machine learning models can be used to score or classify data. These results can be added to graph nodes, where graph algorithms can further explore the graph or compute new metrics that leverage the machine learning result.
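To make the first direction concrete, here is a minimal Python sketch of adding a graph metric to structured data as a predictor column. It is an illustration only and does not use the OAAgraph or PGX APIs; the function names, the `customer_id` key, and the sample data are all hypothetical, and node degree stands in for whatever metric a real graph engine would compute.

```python
from collections import defaultdict

def degree_by_node(edges):
    """Count the undirected degree of every node that appears in an edge list."""
    degree = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)

def add_graph_metric(rows, edges, key="customer_id", metric_name="degree"):
    """Return copies of the rows with the graph metric added as a new column.

    Rows whose key does not appear in the graph get a metric of 0.
    """
    degree = degree_by_node(edges)
    return [{**row, metric_name: degree.get(row[key], 0)} for row in rows]

# Hypothetical data: edges are relationships, rows are structured records.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
customers = [
    {"customer_id": "a", "age": 34},
    {"customer_id": "c", "age": 51},
    {"customer_id": "e", "age": 29},  # not present in the graph
]

enriched = add_graph_metric(customers, edges)
for row in enriched:
    print(row)
```

The enriched rows could then be passed to any model-building routine, with `degree` treated like any other predictor.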

New In-database Algorithms

Expectation Maximization (EM) - a popular probability density estimation technique used to implement a distribution-based clustering algorithm. This algorithm implementation performs an automated model search that finds the number of clusters or components up to a stated maximum; protects against overfitting; supports numeric and multinomial distributions; produces models with high quality probability estimates; generates cluster hierarchy, rules, and other statistics; supports both Gaussian and multi-value Bernoulli distributions; and includes heuristics that automatically choose distribution types.
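For readers new to EM, the following Python sketch shows the textbook algorithm for a one-dimensional Gaussian mixture. It is not the in-database implementation: it has no automated model search, overfitting protection, or distribution selection, and the function names, starting means, and sample data are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_mixture(data, initial_means, iterations=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with plain EM; returns (weights, means, variances).

    Assumes every component keeps some responsibility mass, which holds for
    well-separated starting means like the ones used below.
    """
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor keeps the density finite
    return weights, means, variances

# Hypothetical data drawn around two centers, near 1 and near 5.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.1, 5.2, 4.8]
weights, means, variances = fit_mixture(data, initial_means=[0.0, 6.0])
print(weights, means, variances)
```

Each point's cluster membership is soft: the responsibilities computed in the E-step are the probability estimates that a distribution-based clustering model reports.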

Explicit Semantic Analysis (ESA) - designed to improve text categorization, this algorithm computes "semantic relatedness" using cosine similarity between vectors representing the text, collectively interpreted as a space of concepts explicitly defined and described by humans. The name "explicit semantic analysis" contrasts with latent semantic analysis (LSA) because ESA uses a knowledge base that makes it possible to assign human-readable labels to concepts comprising the vector space.
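The idea of comparing texts in a space of human-labeled concepts can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Oracle Data Mining implementation: the three-concept knowledge base, its word weights, and the function names are invented here, whereas a real ESA model derives its weights from a large knowledge base.

```python
import math

# Hypothetical knowledge base: each human-labeled concept maps words to weights.
CONCEPTS = {
    "Databases": {"sql": 1.0, "table": 0.8, "query": 0.9},
    "Statistics": {"regression": 1.0, "variance": 0.9, "model": 0.6},
    "Cooking": {"recipe": 1.0, "oven": 0.9, "table": 0.2},
}

def concept_vector(text):
    """Represent a text as one score per concept by summing word weights."""
    words = text.lower().split()
    return [sum(weights.get(word, 0.0) for word in words) for weights in CONCEPTS.values()]

def cosine(u, v):
    """Cosine similarity of two vectors; defined as 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relatedness(text_a, text_b):
    """Semantic relatedness as the cosine between the two concept vectors."""
    return cosine(concept_vector(text_a), concept_vector(text_b))

print(dict(zip(CONCEPTS, concept_vector("table query"))))
print(relatedness("sql query", "table query"))
print(relatedness("sql query", "oven recipe"))
```

Because every dimension carries a concept name, the vector for a text can be read directly, which is the "explicit" part that distinguishes this approach from latent methods.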

Singular Value Decomposition (SVD) - this feature extraction algorithm uses orthogonal linear transformations to capture the underlying variance of data by decomposing a rectangular matrix into three matrices: U, D and V. Matrix D is a diagonal matrix and its singular values reflect the amount of data variance captured by the bases. Special features of this algorithm implementation include: support for narrow data via Tall and Skinny solvers and for wide data via stochastic solvers; and a choice between traditional SVD for more stable results and eigensolvers for faster analysis with sparse data.
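As a small illustration of what the decomposition yields, the Python sketch below recovers only the leading singular value and its two singular vectors using power iteration. Power iteration is a swapped-in textbook technique chosen for brevity, not one of the solvers named above, and the function name and sample matrix are assumptions.

```python
import math

def top_singular_triplet(a, iterations=200):
    """Leading (u, sigma, v) of matrix `a` via power iteration on A^T A.

    Assumes `a` is a non-zero list of equal-length rows and that the uniform
    starting vector is not orthogonal to the leading right singular vector.
    """
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]   # A v
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]   # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))  # leading singular value
    u = [x / sigma for x in av]                # leading left singular vector
    return u, sigma, v

# Hypothetical 3x2 matrix whose singular values are 3 and 2.
a = [[3.0, 0.0],
     [0.0, 2.0],
     [0.0, 0.0]]
u, sigma, v = top_singular_triplet(a)

# Share of the matrix's total sum of squares captured by the leading basis.
share = sigma ** 2 / sum(x * x for row in a for x in row)
print(sigma, share)
```

The squared singular value divided by the total sum of squares is the sense in which a singular value reflects how much of the data a basis captures; for column-centered data this is a share of variance.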

Automated Text Processing

For select algorithms in the OREdm package (Support Vector Machine, Singular Value Decomposition, Non-negative Matrix Factorization, Explicit Semantic Analysis), users can now identify columns that should be treated as text, similar to how Oracle Data Mining enables automated text processing as a precursor to model building and scoring. A new argument, ctx.setting, allows the user to specify Oracle Text attribute-specific settings. This argument is applicable to building models in Database 12.2. The name of each list element refers to a column that should be treated as text while the list value specifies the text transformation.

Partitioned Models

The OREdm package with Oracle Database 12.2 enables the building of a type of ensemble model where each model consists of multiple sub-models. A sub-model is automatically built for each partition of data, where partitions are determined based on the unique values found in user-specified columns. Partitioned models also automate scoring: users reference only the top-level model, and the proper sub-model is chosen based on the values of the partition column(s) for each row of data to be scored.
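The build-per-partition and dispatch-at-scoring behavior can be sketched in Python as follows. This is a conceptual illustration only, not the OREdm API: the class name, column names, and data are hypothetical, and each "sub-model" is simply the mean of the target within its partition.

```python
from collections import defaultdict
from statistics import mean

class PartitionedMeanModel:
    """Top-level model holding one trivial sub-model per partition value."""

    def __init__(self, partition_col, target_col):
        self.partition_col = partition_col
        self.target_col = target_col
        self.sub_models = {}

    def fit(self, rows):
        """Build one sub-model for each unique value of the partition column."""
        groups = defaultdict(list)
        for row in rows:
            groups[row[self.partition_col]].append(row[self.target_col])
        # Here a "sub-model" is just the partition's mean target value.
        self.sub_models = {value: mean(targets) for value, targets in groups.items()}
        return self

    def predict(self, rows):
        """Score each row with the sub-model matching its partition value.

        Rows from a partition that was never seen in training score as None.
        """
        return [self.sub_models.get(row[self.partition_col]) for row in rows]

# Hypothetical training data partitioned by region.
training = [
    {"region": "east", "sales": 10.0},
    {"region": "east", "sales": 20.0},
    {"region": "west", "sales": 7.0},
]
model = PartitionedMeanModel("region", "sales").fit(training)

# The caller references only the top-level model; dispatch is automatic.
predictions = model.predict([{"region": "east"}, {"region": "west"}, {"region": "north"}])
print(predictions)
```

The caller never selects a sub-model explicitly, which is the convenience the partitioned model provides at scoring time.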

Extensible R Algorithm Models

This feature enables R users to create an Extensible R Algorithm model using the Oracle Data Mining framework in Oracle Database 12.2. Such models appear as Oracle Data Mining models and are accessible using the ODM SQL API. Extensible R Algorithm models enable the user to build, score, and view a model from R via user-provided R functions stored in the R Script Repository. The ORE overloaded predict method executes the user-specified scoring function for the model, returning an ore.frame with the predictions.

For a complete list of new features, see the Oracle R Enterprise User's Guide. To learn more about Oracle R Enterprise, visit Oracle R Enterprise on Oracle's Technology Network, or review the variety of use cases on the Oracle R Technologies blog.
