
Oracle Big Data Spatial and Graph - technical tips, best practices, and news from the product team

Identifying Influencers with the Built-in Page Rank Analytics in Oracle Big Data Spatial and Graph

Alan Wu
Architect

In a previous post, I talked about how to start Oracle NoSQL Database in the BigDataLite 4.2 VM and demonstrated a few property graph functions. In this post, I use the same property graph feature in that VM to identify influencers in a social network. This time, Apache HBase is the backend database.

  • Setup and configuration

First things first: if you haven't already, download the Oracle Big Data Lite Virtual Machine v4.2.0 from the following page:
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

It is recommended to use Oracle VM VirtualBox 4.3.28 to import this virtual machine. Once the import completes, log in as oracle (the default password is welcome1). On the desktop, there is a "Start/Stop Services" icon; double-clicking it opens a console with a list of services. Uncheck everything and press Enter. The existing services are then stopped, with output like the following on the terminal:

Stopping hadoop-hdfs-namenode
...
Stopping hadoop-yarn-nodemanager
...

Double-click "Start/Stop Services" again, and this time check ClouderaManager and press Enter. This starts Cloudera Manager. (Note that I have allocated about 12 GB of RAM to this VM.)

Now, start Firefox, open the following URL, and log in as admin/admin:
http://bigdatalite:7180/cmf

Click Add Service. A table titled "Select the type of service you want to add" appears, listing services such as "Accumulo 1.6", "Flume", and "HBase", among many others.

Check HBase, then click Continue a few times until you see the following message:

  "Congratulations! Your new service is installed and configured on your cluster."

Click Finish, and your browser automatically returns to the bigdatalite:7180/cmf/home page.

Go to a Linux terminal and run the following command:

$ sudo cp /opt/oracle/oracle-spatial-graph/property_graph/lib/sdopgdal-for-cdh5.2.1.jar /usr/lib/hbase/lib/


Switch back to the browser. Click the drop-down icon in the HBase row, select Start, click the Start button, and wait. After a while, the status icon to the left of HBase becomes a solid green circle.


Note that in this particular setup, the default Java heap size of the HBase RegionServer (the "Java Heap Size of HBase RegionServer in Bytes" setting) is only 50 MiB, which is far too small. I raised it to 800 MiB in Cloudera Manager.

  • Execute property graph functions

Open a Linux terminal in this virtual machine and type the following:

$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
$ unset CLASSPATH
$ ./gremlin-opg-hbase.sh

//
// Connect to Apache HBase in this virtual machine
//
cfg = GraphConfigBuilder.forHbase()                                                  \
                        .setName("connections")                                      \
                        .setZkQuorum("bigdatalite")                                  \
                        .setZkClientPort(2181)                                       \
                        .setZkSessionTimeout(60000)                                  \
                        .setMaxNumConnections(2)                                     \
                        .setInitialNumRegions(3)                                     \
                        .setSplitsPerRegion(1)                                       \
                        .addEdgeProperty("lbl", PropertyType.STRING, "lbl")          \
                        .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")   \
                        .build();


// Note: GraphConfigBuilder.forHbase above is available only in
// Big Data Spatial and Graph v1.0. For v1.1 or newer, use the
// updated Java APIs instead.

//
// Get an instance of OraclePropertyGraph
//
opg = OraclePropertyGraph.getInstance(cfg);
// Remove any vertices and edges already stored in this graph
opg.clearRepository();

//
// Use the parallel data loader API to load a
// sample property graph in flat file format with a
// degree of parallelism (DOP) of 2
//
vfile="../../data/connections.opv"
efile="../../data/connections.ope"
opgdl=OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, vfile, efile, 2);


// read through the vertices
opg.getVertices();

// read through the edges
opg.getEdges();

//
// You can add vertices/edges, change properties etc. here.
// ...
//

//
// Serialize the graph out into a pair of flat files with DOP=2
//
vOutput="/tmp/mygraph.opv"
eOutput="/tmp/mygraph.ope"
OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);

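//
// Run the built-in analytical function, PageRank, to identify influencers
//
analyst = opg.getInMemAnalyst();

// Arguments: tolerance 0.0001, damping factor 0.85, at most 100 iterations
rank = analyst.pagerank(0.0001, 0.85, 100).get();
// Show the five highest-ranked vertices with their PageRank values
rank.getTopKValues(5);

// IDs 1, 60 and 42 are the top three vertices in this sample's top-K output
v1=opg.getVertex(1L); v2=opg.getVertex(60L); v3=opg.getVertex(42L);    \
System.out.println("Top 3 influencers: \n " + v1.getProperty("name") + \
                     "\n " + v2.getProperty("name") +                  \
                     "\n " + v3.getProperty("name"));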
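The script loads the graph from, and exports it back to, a pair of flat files: .opv for vertices and .ope for edges. As a rough illustration of the vertex side, here is a minimal Java sketch that groups .opv lines into per-vertex property maps. It assumes each line has the comma-separated layout vertexId,keyName,valueType,stringValue,numericValue,dateValue and that no value contains an embedded comma, and it ignores the type code. The class name OpvSketch and the sample lines are hypothetical, so check the layout against the flat-file format section of the documentation for your version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical reader for vertex lines, ASSUMING the layout
 * vertexId,keyName,valueType,stringValue,numericValue,dateValue
 * with no embedded commas. Not Oracle's loader.
 */
public class OpvSketch {

    /** Groups property lines by vertex ID, preserving input order. */
    public static Map<Long, Map<String, String>> readVertices(List<String> lines) {
        Map<Long, Map<String, String>> vertices = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            // limit -1 keeps trailing empty columns such as an unused date field
            String[] cols = line.split(",", -1);
            if (cols.length < 6) {
                throw new IllegalArgumentException("Expected 6 columns: " + line);
            }
            long vertexId = Long.parseLong(cols[0]);
            String key = cols[1];
            // Take whichever of the three value columns is populated (type code ignored)
            String value = !cols[3].isEmpty() ? cols[3] : !cols[4].isEmpty() ? cols[4] : cols[5];
            vertices.computeIfAbsent(vertexId, id -> new LinkedHashMap<>()).put(key, value);
        }
        return vertices;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from connections.opv
        List<String> lines = List.of(
                "1,name,1,Alice,,",
                "2,name,1,Bob,,",
                "2,score,4,,0.5,");
        // prints {1={name=Alice}, 2={name=Bob, score=0.5}}
        System.out.println(readVertices(lines));
    }
}
```

Taking the first populated value column keeps the sketch short; a fuller reader would dispatch on the type code and convert numeric and date values accordingly.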


The last output of the script above should look like the following:

Top 3 influencers:
 Barack Obama
 Nicolas Maduro
 NBC
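For intuition about what analyst.pagerank(0.0001, 0.85, 100) computes, here is a minimal power-iteration PageRank in Java, reading the three arguments as convergence tolerance, damping factor, and maximum number of iterations. This is a textbook sketch under that assumption, not the in-memory analyst's implementation: the class name PageRankSketch, the adjacency-list input, and the handling of vertices without outgoing edges (their rank is spread evenly over all vertices) are illustrative choices. The topK helper mimics the idea behind rank.getTopKValues(k).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical power-iteration PageRank plus a top-k helper; illustration only. */
public class PageRankSketch {

    /**
     * @param outEdges  outEdges[v] lists the destinations of v's outgoing edges
     * @param tolerance stop once the L1 change between iterations drops below this
     * @param damping   probability of following an edge instead of jumping anywhere
     * @param maxIter   upper bound on the number of iterations
     */
    public static double[] pageRank(int[][] outEdges, double tolerance, double damping, int maxIter) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int v = 0; v < n; v++) {
                if (outEdges[v].length == 0) {
                    danglingMass += rank[v]; // no out-edges: redistribute evenly below
                } else {
                    double share = rank[v] / outEdges[v].length;
                    for (int w : outEdges[v]) {
                        next[w] += share; // each neighbor gets an equal share
                    }
                }
            }
            double diff = 0.0;
            for (int v = 0; v < n; v++) {
                next[v] = (1.0 - damping) / n + damping * (next[v] + danglingMass / n);
                diff += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (diff < tolerance) {
                break; // converged
            }
        }
        return rank;
    }

    /** Returns the indices of the k highest-ranked vertices, best first. */
    public static List<Integer> topK(double[] rank, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int v = 0; v < rank.length; v++) {
            ids.add(v);
        }
        ids.sort(Comparator.comparingDouble((Integer v) -> rank[v]).reversed());
        return ids.subList(0, Math.min(k, ids.size()));
    }

    public static void main(String[] args) {
        // Hypothetical 4-vertex graph: 0->1, 1->0, 2->0, 3->0, 3->1
        int[][] outEdges = { {1}, {0}, {0}, {0, 1} };
        double[] rank = pageRank(outEdges, 0.0001, 0.85, 100);
        // prints "Top 2: [0, 1]"
        System.out.println("Top 2: " + topK(rank, 2));
    }
}
```

Vertex 0 receives links from every other vertex and ends up ranked first; in general, a vertex scores highly when many highly ranked vertices point to it.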


It's really simple, isn't it? Now, are you interested in finding communities in this graph?



Comments (2)
  • marti Wednesday, August 5, 2015

    Hi, this example is really interesting and works with NoSQL. But how did you set the default Java Heap Size of HBase (RegionServer in Bytes) to 800MiB using Cloudera Manager? I cannot find this setting in Cloudera Manager in CDH 5.4.0. Thanks, marti


  • Zhe Wu Thursday, August 6, 2015

    Section 3.1.1.2, "Modifying the Java Memory Settings", of the following document describes the steps (the first part for CDH 5.2.x and 5.3.x, the second part for CDH 5.4.0) to change the Java heap size for Apache HBase:

    https://docs.oracle.com/cd/E63064_01/doc.42/e62124/pg_install.htm#BDSPA184

    Zhe Wu

