X

Oracle Big Data Spatial and Graph - technical tips, best practices, and news from the product team

From Relational Table(s) to Property Graph

Alan Wu
Architect

Lately, I have got quite a few questions on how to convert a
relational data source (tables or views) to a property graph. It
is actually straightforward. In this post, I am going to
demonstrate an end to end flow using a well-known table "EMP" in
the "SCOTT" schema. 

As usual, I am going to use Oracle Big Data Lite VM (latest
version is 4.4.0
as of Mar 17th 2016) because it has the whole big
data technology stack, Oracle Big Data Spatial and Graph, andOracle Database 12.1.0.2. Got everything we need in a single
box.



On the desktop, click "Start/Stop Services", check ORCL and NoSQL
database, and hit Enter.  This will bring up Oracle Database and
also Oracle NoSQL Database, if they are down.


Let's first take a look at the relational data source that we want
to convert into property graph. From a Linux terminal, login to
the Oracle Database and describe the "EMP" table.


sqlplus scott/tiger@orcl

SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 17
18:24:01 2016

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
- 64bit Production


SQL> desc emp;
 Name                                      Null?    Type
 ----------------------------------------- --------
----------------------------

 EMPNO                                     NOT NULL
NUMBER(4)

 ENAME                                             
VARCHAR2(10)

 JOB                                               
VARCHAR2(9)

 MGR                                               
NUMBER(4)

 HIREDATE                                           DATE
 SAL                                               
NUMBER(7,2)

 COMM                                              
NUMBER(7,2)

 DEPTNO                                            
NUMBER(2)



SQL> -- The following shows a few rows of the table
SQL> select empno, ENAME, mgr from emp;

     EMPNO ENAME             MGR
---------- ---------- ----------
      7369 SMITH            7902
      7499 ALLEN            7698
      7521 WARD             7698
      7566 JONES            7839
      7654 MARTIN           7698
      7698 BLAKE            7839
      7782 CLARK            7839
      7788 SCOTT            7566
      7839 KING
      7844 TURNER           7698
      7876 ADAMS            7788



This employee table (EMP) has well-defined columns. Now,
say we want to model employees as vertices, and the "manager"
relationship as edges in a property graph.
Conceptually, we will have a property graph with "SMITH",
"ALLEN" etc. as vertices, and edges of label "manager" that link
those people together. Note that employee "KING" is probably the
CEO as he is the only one that does not have a manager.


First, we need to copy this xdb.jar from ORACLE_HOME/jlib into
the lib/ directory of property graph installation.

$ cp
/u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/jlib/xdb.jar
/opt/oracle/oracle-spatial-graph/property_graph/lib/




Now, let's start the groovy shell.

cd
/opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/

sh gremlin-opg-nosql.sh

//
// First, create an empty property graph
//
server = new ArrayList<String>();
server.add("bigdatalite:5000");

// Create a graph config that contains the graph name
"connections"

// KV store name "kvstore", edge property "weight" to be
loaded into

// in-memory graph, etc.
cfg =
GraphConfigBuilder.forPropertyGraphNosql()             \

 
.setName("employee").setStoreName("kvstore")               \

 
.setHosts(server)                                          \

  .setMaxNumConnections(2).build();

// Get an instance of OraclePropertyGraph which is a key
Java

// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);


import oracle.jdbc.pool.*;
import oracle.jdbc.*;


// Make a connection to the Oracle Database

//

ds = new OracleDataSource();

ds.setURL("jdbc:oracle:thin:@localhost:1521:orcl");
ds.setUser("scott");
ds.setPassword("tiger");
conn = ds.getConnection();


// Now, read from EMP table, construct vertices and add
them to the property graph


// using Blueprints API

stmtV = conn.createStatement();
rs = stmtV.executeQuery(
  "select empno, ename, job, hiredate, sal, comm, deptno
from EMP"); i=1;

while (rs.next()) {
  System.out.println("EMP " + rs.getInt(1) + " ename " +
rs.getString(2));

  v = opg.addVertex((long) rs.getInt(1)); //
employee ID becomes vertex ID

  v.setProperty("name",    rs.getString(2));
  v.setProperty("job",     rs.getString(3));
  v.setProperty("hiredate",rs.getDate(4));
  v.setProperty("sal",     rs.getFloat(5));
  v.setProperty("comm",    rs.getFloat(6));
  v.setProperty("deptno",  rs.getInt(7));
}

rs.close();
stmtV.close();



//
// Now read the table again, construct and add edges
to the property graph

// using Blueprints API

// Note that we skip employees that have no manager
//

stmtE = conn.createStatement();

rs = stmtE.executeQuery(
  "select rownum, empno, mgr from EMP where mgr is
not null
"); i=1;

while (rs.next()) {
  System.out.println("EMP " + rs.getInt(2) + " mgr " +
rs.getString(3));

  vs = opg.getVertex((long) rs.getInt(2));
  vd = opg.getVertex((long) rs.getInt(3));
   e = opg.addEdge((long) rs.getInt(1), vs, vd,
"manager");

}

rs.close();
stmtE.close();
conn.close();

opg.commit();



//
// Finally, write it out as .opv and .ope files

//

OraclePropertyGraphUtils.exportFlatFiles(opg, "/u02/emp.opv",
"/u02/emp.ope", false);



//
// Let's check the output. Pay attention to the various
data types

// used (float, string, date, integer, etc.)

//

// A snippet of the vertex flat file

//
[oracle@bigdatalite ~]$ head -10 /u02/emp.opv
7369,comm,3,,0.0,
7369,name,1,SMITH,,
7369,job,1,CLERK,,
7369,hiredate,5,,,1980-12-17T00:00:00.000-05:00
7369,deptno,2,,20,
7369,sal,3,,800.0,
7566,comm,3,,0.0,
7566,name,1,JONES,,
7566,job,1,MANAGER,,
7566,hiredate,5,,,1981-04-02T00:00:00.000-05:00


// A snippet of the edge flat file

//
[oracle@bigdatalite ~]$ head -10 /u02/emp.ope
5,7654,7698,manager,%20,,,,
7,7782,7839,manager,%20,,,,
11,7900,7698,manager,%20,,,,
6,7698,7839,manager,%20,,,,
1,7369,7902,manager,%20,,,,
10,7876,7788,manager,%20,,,,
12,7902,7566,manager,%20,,,,
8,7788,7566,manager,%20,,,,
2,7499,7698,manager,%20,,,,
3,7521,7698,manager,%20,,,,



That is it. We have successfully converted the famous EMP
table into a property graph flat file format. BTW, this graph
has already been loaded into the database. It is ready for
further analysis.

Acknowledgement: thanks Jay Banerjee for his input on this blog post.


Join the discussion

Comments ( 4 )
  • guest Saturday, April 2, 2016

    sorry

    I got the change in config API (.forPropertyGraphHbase() ) nad it works,

    But now there is a following error, as follows

    thank you michele

    opg-nosql> server = new ArrayList<String>();

    opg-nosql> server.add("bigdatalite:5000");

    ==>true

    opg-nosql> cfg = GraphConfigBuilder.forPropertyGraphHbase().setName("myFirstGraph") \

    opg-nosql> .setZkQuorum("localhost").setZkClientPort(2181).addEdgeProperty("weight", \

    opg-nosql> PropertyType.DOUBLE, "1000000").build();

    ==>{"zk_quorum":"localhost","db_engine":"HBASE","vertex_props":[],"edge_props":[{"default":"1000000","type":"double","name":"weight"}],"format":"pg","zk_client_port":2181,"error_handling":{},"name":"myFirstGraph","loading":{}}

    opg-nosql> opg = OraclePropertyGraph.getInstance(cfg);

    No signature of method: static oracle.pg.nosql.OraclePropertyGraph.getInstance() is applicable for argument types: (oracle.pgx.config.PgHbaseGraphConfig) values: [{"zk_quorum":"localhost","db_engine":"HBASE","vertex_props":[],"edge_props":[{"default":"1000000","type":"double","name":"weight"}],"format":"pg","zk_client_port":2181,"error_handling":{},"name":"myFirstGraph","loading":{}}]

    Possible solutions: getInstance(oracle.pgx.config.PgNosqlGraphConfig), getInstance(oracle.kv.KVStoreConfig, java.lang.String), getInstance(oracle.kv.KVStoreConfig, java.lang.String, oracle.kv.LoginCredentials), getInstance(java.lang.String, java.lang.String, int, java.lang.String), getInstance(oracle.kv.KVStoreConfig, java.lang.String, java.lang.String, java.lang.String), getInstance(oracle.kv.KVStoreConfig, java.lang.String, java.lang.String, java.lang.String, java.lang.String)

    Display stack trace? [yN]


  • guest Monday, April 4, 2016

    I think the problem was that you used a config for HBase instead of a config for Oracle NoSQL Database.

    Here is an example on Oracle NoSQL Database.

    server = new ArrayList<String>();

    server.add("bigdatalite:5000");

    // Create a graph config that contains the graph name "connections"

    // KV store name "kvstore", edge property "weight" to be loaded into

    // in-memory graph, etc.

    cfg = GraphConfigBuilder.forPropertyGraphNosql() \

    .setName("myFirstGraph").setStoreName("kvstore") \

    .setHosts(server) \

    .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \

    .setMaxNumConnections(2).build();

    // Get an instance of OraclePropertyGraph which is a key Java

    // class to manage property graph data

    opg = OraclePropertyGraph.getInstance(cfg);

    Please let me know if you have any further questions.

    Zhe Wu


  • guest Tuesday, April 5, 2016

    Thanks Zhe,

    Yes, I want to use HBase and it works. The error was in the groovish script to be called, which if I wanto to use HBase is :

    # sh gremlin-opg-hbase.sh (instead of gremlin-opg-nosql.sh)

    As an additional fix of the code: in the first row of the Employees table, I find:

    EMPNO ENAME MGR

    ---------- ---------- ----------

    7369 SMITH 7902

    the MGR Id:7902 is NOT present in the EMPNO IDs, so the edge creation in this case is going to fail (to fix this I replaced 7902 with 7839)

    Given this the example works perfectly, thanks.

    As an additional question:

    Can we omit to provide primary identifiers to the edges/vertex creation or do we have some sequence-like component in the APIs allowing us to have fresh IDs ?

    Once I must create a graph reading from multiple RDBMS sources, I'm afraid to not provide two times the same ID inserting edges and vertex form different tables/schemas.

    thank you

    /Michele Piunti


  • guest Tuesday, April 5, 2016

    Glad you got it working :)

    It is recommended that you provide the IDs for vertices and edges. You can also provide a String value and we will hash it to a 64 bits integer.

    To avoid using the same ID, you can use something like a global instance of AtomicInteger or AtomicLong

    e.g.

    mySeq = new java.util.concurrent.atomic.AtomicInteger(initialValue);

    int newValue = mySeq.getAndIncrement()

    Hope it helps,

    Zhe Wu


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.