Oracle Big Data Spatial and Graph - technical tips, best practices, and news from the product team

Combining Graph Traversal with Powerful Graph Analytics

Alan Wu
Architect

The property graph feature of Oracle Big Data Spatial and Graph has two important components: the data access layer and the in-memory analyst. The data access layer lets you store, manage, index, query, and traverse property graph data in a horizontally scalable database (Apache HBase or Oracle NoSQL Database). The in-memory analyst offers a rich set of out-of-the-box graph analytics and graph operations. Together, these two components give users a solid framework for building graph-based applications.

In this post, I demonstrate how graph traversal, an important function supported by the data access layer, can be used together with graph analytics.


  • Setup


If you haven't already, download the Oracle Big Data Lite Virtual Machine v4.4.0 (or newer) from the following page:
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

  • Retrieve the latest property graph Hands-on-Lab/Demo scripts


- Log in to the Big Data Lite 4.4.0 VM

- Click the Refresh Samples icon on the desktop, follow the instructions, and download the latest property graph HoL/Demo scripts. (Kudos to Marty Gubar and Nigel Bayliss, who designed this very cool script that automatically fetches the latest content from GitHub!)

- Open the following page using the Firefox browser
  file:///home/oracle/src/hol/property_graph_hol_2015_Nov/property_graph_hol_2015_Nov.html

  • Load example property graph data


- Follow the steps described in 2.3 to 2.4.2 (if you are using Oracle NoSQL Database), or the steps in 4.10 to 4.11 (for Apache HBase), to load an example property graph.

  • Traverse the graph with Blueprints APIs and Gremlin syntax

In the built-in Groovy shell, one can easily navigate the graph using the Blueprints Java APIs, Gremlin syntax, or both. A few examples follow:

// find a start vertex using Blueprints Java API
opg-nosql> v=opg.getVertex(1L);
==>Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority}

opg-nosql> din=com.tinkerpop.blueprints.Direction.IN; dout=com.tinkerpop.blueprints.Direction.OUT;
==>OUT

// get in edges (using Java API)
opg-nosql> v.getEdges(din);
==>Edge ID 1078 from Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} =[collaborates]=> Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} edgeKV[{weight:flo:1.0}]
...

// get out edges (using Gremlin Syntax)
opg-nosql> v.outE
==>Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
...

// follow "collaborates" edges and add a filter on religion
opg-nosql> v.outE('collaborates').inV.filter{it.religion != 'Christianity'}
==>Vertex ID 3 {country:str:United States, name:str:Charlie Rose, role:str:talk show host journalist, show:str:Charlie Rose}
...
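
To make the semantics of the last traversal concrete, here is a minimal, self-contained Java sketch of the same idea: follow out-edges with a given label, then keep the target vertices whose property differs from an excluded value. It does not use the Blueprints or Oracle APIs; the class `LabelFilterTraversal`, its `Edge` type, the vertex IDs, and the sample properties are all hypothetical and exist only for illustration. One point it tries to mirror: as the Charlie Rose result above suggests, a vertex that has no religion property at all still passes a "not equal" filter, because a missing value is not equal to 'Christianity'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in types; none of this is the Blueprints or Oracle API.
final class LabelFilterTraversal {

    // A directed, labeled edge between two vertex IDs.
    static final class Edge {
        final long from;
        final long to;
        final String label;

        Edge(long from, long to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Follow the out-edges of `start` that carry `label`, then keep the target
    // vertices whose property `key` is not equal to `excluded`.
    // A vertex without the property is kept (null is not equal to `excluded`).
    static List<Long> outFiltered(List<Edge> edges, Map<Long, Map<String, String>> props,
                                  long start, String label, String key, String excluded) {
        List<Long> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from != start || !e.label.equals(label)) {
                continue;
            }
            Map<String, String> p = props.getOrDefault(e.to, Collections.emptyMap());
            if (!Objects.equals(p.get(key), excluded)) {
                result.add(e.to);
            }
        }
        return result;
    }

    // Made-up sample data (not taken from the example graph in the lab).
    static List<Edge> sampleEdges() {
        return Arrays.asList(
                new Edge(10L, 11L, "collaborates"),
                new Edge(10L, 12L, "collaborates"),
                new Edge(10L, 13L, "admires"),
                new Edge(10L, 14L, "collaborates"));
    }

    static Map<Long, Map<String, String>> sampleProps() {
        Map<Long, Map<String, String>> props = new HashMap<>();
        props.put(11L, Collections.singletonMap("religion", "Christianity"));
        props.put(12L, Collections.singletonMap("role", "journalist")); // no religion property
        props.put(13L, Collections.singletonMap("religion", "Buddhism"));
        props.put(14L, Collections.singletonMap("religion", "Buddhism"));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(outFiltered(sampleEdges(), sampleProps(),
                10L, "collaborates", "religion", "Christianity"));
    }
}
```

With the sample data, vertex 11 is dropped by the property filter and vertex 13 by the edge label, while vertex 12 is kept even though it has no religion property.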

  • Use PipeFunction to combine Gremlin traversal and in-memory analysis

The following script creates a session and an in-memory analyst, computes the PageRank value for the vertices, and then starts a simple Gremlin traversal from the vertex with ID 1, limiting the visited vertices to those whose PageRank value is above a threshold.

// Create in-memory analytics session and analyst
session=Pgx.createSession("session_ID_1");
analyst=session.createAnalyst();

// Read the graph from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

// Execute Page Rank
rank=analyst.pagerank(pgxGraph, 0.00000001, 0.85, 5000);

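// Vertex lives in com.tinkerpop.blueprints; importing it is harmless if the shell already does
import com.tinkerpop.blueprints.*;
import com.tinkerpop.gremlin.java.*;
import com.tinkerpop.pipes.*;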

opg-nosql> pipe = new GremlinPipeline(opg.getVertex(1).out("collaborates").filter(new PipeFunction<Vertex, Boolean>() { public Boolean compute(Vertex v) { return rank.get(v.getId()) > 0.01; } }));

// Traversal results
==>Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress}
...

The important part of the above traversal is the TinkerPop PipeFunction implementation: upon receiving a vertex from the traversal, it looks up the analytical result for that vertex (from a parallel in-memory PageRank computation) and uses that information to guide the traversal.
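
For readers who want to see the pattern without a running cluster, the following is a minimal, self-contained Java sketch of the same idea: compute a rank per vertex first, then let a predicate consult that rank while stepping through the out-neighbors of a start vertex. It is not the in-memory analyst or the TinkerPop API. The class `RankGuidedTraversal`, its adjacency-map graph, and the sample data are hypothetical, and the PageRank here is a plain single-threaded power iteration that only borrows the same kind of parameters (tolerance, damping factor, iteration cap) as the `analyst.pagerank(...)` call above. It assumes every vertex appears as a key in the adjacency map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; not the PGX analyst or the TinkerPop Pipes API.
final class RankGuidedTraversal {

    // Power-iteration PageRank over an adjacency map (vertex ID -> out-neighbors).
    // Assumes every vertex, including pure sinks, appears as a key in `out`.
    static Map<Long, Double> pageRank(Map<Long, List<Long>> out,
                                      double damping, double tolerance, int maxIterations) {
        int n = out.size();
        Map<Long, Double> rank = new HashMap<>();
        for (Long v : out.keySet()) {
            rank.put(v, 1.0 / n);
        }
        for (int i = 0; i < maxIterations; i++) {
            // Rank held by vertices without out-edges is spread evenly over all vertices.
            double dangling = 0.0;
            for (Map.Entry<Long, List<Long>> e : out.entrySet()) {
                if (e.getValue().isEmpty()) {
                    dangling += rank.get(e.getKey());
                }
            }
            double base = (1.0 - damping) / n + damping * dangling / n;
            Map<Long, Double> next = new HashMap<>();
            for (Long v : out.keySet()) {
                next.put(v, base);
            }
            for (Map.Entry<Long, List<Long>> e : out.entrySet()) {
                List<Long> targets = e.getValue();
                if (targets.isEmpty()) {
                    continue;
                }
                double share = damping * rank.get(e.getKey()) / targets.size();
                for (Long t : targets) {
                    next.merge(t, share, Double::sum);
                }
            }
            // Stop once the total change between iterations falls below the tolerance.
            double diff = 0.0;
            for (Long v : out.keySet()) {
                diff += Math.abs(next.get(v) - rank.get(v));
            }
            rank = next;
            if (diff < tolerance) {
                break;
            }
        }
        return rank;
    }

    // One traversal step: out-neighbors of `start` whose rank exceeds `threshold`.
    // This plays the role of the PipeFunction filter in the shell example.
    static List<Long> outAboveThreshold(Map<Long, List<Long>> out, Map<Long, Double> rank,
                                        long start, double threshold) {
        List<Long> result = new ArrayList<>();
        for (Long t : out.getOrDefault(start, Collections.emptyList())) {
            if (rank.getOrDefault(t, 0.0) > threshold) {
                result.add(t);
            }
        }
        return result;
    }

    // Made-up four-vertex graph (not the example graph from the lab).
    static Map<Long, List<Long>> sampleGraph() {
        Map<Long, List<Long>> out = new HashMap<>();
        out.put(1L, Arrays.asList(2L, 3L, 4L));
        out.put(2L, Arrays.asList(1L));
        out.put(3L, Arrays.asList(1L, 2L));
        out.put(4L, Arrays.asList(1L));
        return out;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> graph = sampleGraph();
        Map<Long, Double> rank = pageRank(graph, 0.85, 0.00000001, 5000);
        System.out.println(outAboveThreshold(graph, rank, 1L, 0.2));
    }
}
```

The design point is the same as in the shell example: the analytics run once, up front, and the traversal only performs cheap lookups against the precomputed result, so the filter adds little cost per visited vertex.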

Acknowledgement: thanks to Jay Banerjee for his input on this blog post.

