Graphs are everywhere, from social networks such as Facebook (friends of friends), Twitter, and LinkedIn to customer relationships such as who calls whom or which bank accounts transfer money between them.
Graph algorithms come in two major flavors: computational graph analytics, where we analyze the entire graph to compute metrics or identify graph components, and graph pattern matching, where queries find sub-graphs corresponding to specified patterns.
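To make the two flavors concrete, here is a minimal sketch in plain Python. It does not use OAAgraph or PGX; the friendship data and the function names `connected_components` and `friends_of_friends` are hypothetical and chosen only for illustration. The first function touches the entire graph to identify its components, while the second looks for one specific pattern (person, friend, friend of friend) around a single node.

```python
from collections import defaultdict

# Hypothetical undirected friendship edges (illustrative data only).
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("eve", "fay")]

# Adjacency sets: each person maps to the set of direct friends.
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)


def connected_components(adj):
    """Computational analytics: traverse the whole graph and group nodes."""
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


def friends_of_friends(adj, person):
    """Pattern matching: person - friend - x, where x is not already a friend."""
    result = set()
    for friend in adj[person]:
        result |= adj[friend]
    return result - adj[person] - {person}


components = connected_components(adj)
fof_ann = friends_of_friends(adj, "ann")
```

A real pattern-matching engine would accept a declarative query rather than hand-written traversal code, but the distinction between scanning the whole graph and matching a local shape is the same.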
In contrast, machine learning algorithms typically train models on observed data organized as cases, where each row often corresponds to a single case and the columns correspond to predictors and targets. These models learn patterns in the data and are then used to score new data and make predictions.
As depicted above, allowing seamless interaction between graph analytics and machine learning in a single environment or language, such as R, enables data scientists to leverage powerful graph algorithms to supplement the machine learning process with computed graph metrics as predictor variables. The graph analysis can provide additional strong signals, thereby making predictions more accurate. Similarly, machine learning scores or predictions can be used in combination with graph pattern matching or analytics. For example, identifying groups of close customers from their mobile call graph can improve customer churn prediction. If a customer with strong connections to other customers were to churn, this may increase the likelihood that customers in that customer's call graph will also churn. Since a given problem can be approached from different perspectives, it may be beneficial to investigate it using both graph and machine learning algorithms, and then compare and contrast the results for greater insight into the problem and solution.
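The churn example can be sketched as follows, in plain Python rather than with OAAgraph. The call records, churn labels, and the feature name `churned_neighbor_fraction` are all hypothetical; the point is only to show a graph-derived metric being attached to each case as an extra predictor column.

```python
from collections import defaultdict

# Hypothetical call records (caller, callee) and known churn labels.
calls = [("c1", "c2"), ("c1", "c3"), ("c2", "c3"), ("c4", "c1")]
churned = {"c1": True, "c2": False, "c3": False, "c4": False}

# Treat a call in either direction as a tie between two customers.
neighbors = defaultdict(set)
for caller, callee in calls:
    neighbors[caller].add(callee)
    neighbors[callee].add(caller)


def churned_neighbor_fraction(customer):
    """Share of a customer's call-graph contacts who have already churned."""
    contacts = neighbors.get(customer, set())
    if not contacts:
        return 0.0
    return sum(churned[c] for c in contacts) / len(contacts)


# One row per case; the graph metrics become additional predictor columns
# that a machine learning model could train on alongside other attributes.
rows = [
    {
        "customer": c,
        "degree": len(neighbors.get(c, ())),
        "churned_neighbor_fraction": churned_neighbor_fraction(c),
    }
    for c in sorted(churned)
]
```

In this toy data, customers who talk to the churned customer `c1` receive a nonzero value for the new column, which is the kind of signal a churn model might pick up.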
New with Oracle R Enterprise 1.5.1 - a component of the Oracle Advanced Analytics option to Oracle Database - is the availability of the R package OAAgraph, which provides a single, unified interface supporting the complementary use of machine learning and graph analytics technologies. OAAgraph leverages the ORE transparency layer and the Parallel Graph Analytics (PGX) engine from the Oracle Spatial and Graph option to Oracle Database. PGX is an in-memory graph analytics engine that provides fast, parallel graph analysis using built-in algorithm packages, graph query / pattern-matching, and custom algorithm compilation. With some thirty-five graph algorithms, PGX exceeds open source tool capabilities.
OAAgraph uses ore.frame objects representing a graph's node and edge tables to construct an in-memory graph. The basic node table contains node identifiers, and nodes can also have properties, stored in additional node table columns. Similarly, relationships among nodes are described as edges, each running from one node identifier to another. Each edge may also have properties stored in edge table columns. Various graph algorithms can then be applied to the graph, and results such as node or edge metrics, or sub-graphs, can be exported back to database tables for use by ORE machine learning algorithms.
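The table-to-graph-to-table workflow described above can be illustrated with a small, self-contained Python sketch. This is not the OAAgraph or PGX API: the tables are ordinary lists of dictionaries, and `build_graph`, `pagerank`, and `export_node_metrics` are hypothetical helper names. PageRank is used here simply as a familiar example of a node metric.

```python
# Hypothetical node table: an identifier column plus one property column.
node_table = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
    {"id": 4, "name": "d"},
]
# Hypothetical edge table: from-node id, to-node id, plus one edge property.
edge_table = [
    {"src": 1, "dst": 3, "kind": "call"},
    {"src": 2, "dst": 3, "kind": "call"},
    {"src": 3, "dst": 1, "kind": "call"},
    {"src": 4, "dst": 3, "kind": "call"},
]


def build_graph(node_table, edge_table):
    """Build an in-memory adjacency list (node id -> list of successor ids)."""
    out_edges = {row["id"]: [] for row in node_table}
    for row in edge_table:
        out_edges[row["src"]].append(row["dst"])
    return out_edges


def pagerank(out_edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling mass is spread uniformly."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(iterations):
        dangling = sum(rank[v] for v, outs in out_edges.items() if not outs)
        new_rank = {v: (1 - damping) / n + damping * dangling / n
                    for v in out_edges}
        for v, outs in out_edges.items():
            for w in outs:
                new_rank[w] += damping * rank[v] / len(outs)
        rank = new_rank
    return rank


def export_node_metrics(node_table, rank):
    """Return the node table with the metric added as a new column."""
    return [dict(row, pagerank=rank[row["id"]]) for row in node_table]


graph = build_graph(node_table, edge_table)
ranks = pagerank(graph)
metrics_table = export_node_metrics(node_table, ranks)
```

The resulting `metrics_table` has one row per node with its original properties and the computed metric, which is the shape a downstream machine learning step would expect.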
In subsequent blog posts, we will explore OAAgraph in more detail.