In a previous post, I talked about how to start Oracle NoSQL Database in the BigDataLite 4.2 VM and demonstrated a few property graph functions. In this post,
I am going to use the same property graph feature provided in that VM to identify influencers in a social network. This time, Apache HBase is used as the backend.
First things first, if you haven't already, download Oracle Big Data Lite Virtual Machine v4.2.0 from the following page.
It is recommended to use Oracle VM VirtualBox 4.3.28 to import this virtual machine. Once the import completes successfully, log in as oracle (default password is welcome1). On the desktop, there is a "Start/Stop Services" icon. Double clicking it opens a console popup with a list of services. Uncheck everything, hit Enter, and you will see existing services being terminated with the following output on the terminal.
Double click "Start/Stop Services" again and this time check ClouderaManager and hit Enter. This will bring up ClouderaManager. (Note that I have allocated ~12GB RAM to this VM.)
Now, start the Firefox browser, open the following URL, and log in as admin/admin:
Click Add Service. A table titled "Select the type of service you want to add" will be presented. In this table you will see services like "Accumulo 1.6", "Flume", and "HBase", among many others.
Check HBase. Click Continue a few times till you see the following message.
"Congratulations! Your new service is installed and configured on your cluster."
Click Finish and your browser will automatically go back to the bigdatalite:7180/cmf/home page.
Before starting HBase, open a terminal and copy the property graph jar into the HBase library directory:
$ sudo cp /opt/oracle/oracle-spatial-graph/property_graph/lib/sdopgdal-for-cdh5.2.1.jar /usr/lib/hbase/lib/
Switch back to the browser page. Click the drop-down icon in the table row for HBase, select Start, click the Start button, and wait. After a while, the status icon to the left of HBase will become a solid green circle, as shown below.
Note that in this particular setup, the default Java Heap Size of HBase (RegionServer in Bytes) is only 50MiB, which is way too small. I set it to 800MiB using the ClouderaManager.
Open a Linux terminal in this virtual machine and type in the following:
$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
$ unset CLASSPATH
// Connect to Apache HBase in this virtual machine
cfg = GraphConfigBuilder.forHbase() \
.addEdgeProperty("lbl", PropertyType.STRING, "lbl") \
.addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
.build();
// Note: the above GraphConfigBuilder.forHbase is available
// only in Big Data Spatial and Graph v1.0.
// For v1.1 or newer, use updated Java APIs instead. See here for details.
// Get an instance of OraclePropertyGraph
opg = OraclePropertyGraph.getInstance(cfg);
// Use the parallel data loader API to load a
// sample property graph in flat file format with a
// degree of parallelism (DOP) 2
opgdl.loadData(opg, vfile, efile, 2);
// read through the vertices
// read through the edges
// You can add vertices/edges, change properties etc. here.
// Serialize the graph out into a pair of flat files with DOP=2
OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);
// Run the built-in analytical function, PageRank, to identify influencers
analyst = opg.getInMemAnalyst();
rank = analyst.pagerank(0.0001, 0.85, 100).get();
v1=opg.getVertex(1l); v2=opg.getVertex(60l); v3=opg.getVertex(42l); \
System.out.println("Top 3 influencers: \n " + v1.getProperty("name") + \
"\n " + v2.getProperty("name") + \
"\n " + v3.getProperty("name"));
The last output of the script above should look something like the following:
Top 3 influencers:
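If you are curious what a call like pagerank(0.0001, 0.85, 100) computes, here is a minimal, self-contained Java sketch of power-iteration PageRank. It is not the implementation used by the in-memory analyst. It assumes the three arguments mean convergence tolerance, damping factor, and maximum number of iterations, which is how I read the values in the script. The class name PageRankSketch, the helper topK, and the tiny 4-vertex graph are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PageRankSketch {

    /**
     * Power-iteration PageRank on a directed graph given as an edge list.
     * edges[i] = {source, target}; vertices are numbered 0..n-1.
     * Parameter meanings are assumptions, not the analyst's documented API.
     */
    public static double[] pageRank(int n, int[][] edges, double tolerance,
                                    double damping, int maxIterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        // Start from a uniform distribution.
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Rank held by vertices without outgoing edges is spread evenly.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }

            // Stop once the total change (L1 norm) drops below the tolerance.
            double diff = 0.0;
            for (int v = 0; v < n; v++) {
                diff += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (diff < tolerance) {
                break;
            }
        }
        return rank;
    }

    /** Returns the indices of the k highest-ranked vertices, best first. */
    public static List<Integer> topK(double[] rank, int k) {
        return IntStream.range(0, rank.length).boxed()
                .sorted((a, b) -> Double.compare(rank[b], rank[a]))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical graph: vertices 1, 2 and 3 point to 0; 0 points to 1.
        int[][] edges = { {1, 0}, {2, 0}, {3, 0}, {0, 1} };
        double[] rank = pageRank(4, edges, 0.0001, 0.85, 100);
        for (int v : topK(rank, 3)) {
            System.out.printf("vertex %d: %.4f%n", v, rank[v]);
        }
    }
}
```

In this toy graph, vertex 0 ends up on top because every other vertex links to it, which is the same intuition behind calling the highest-ranked vertices of a social network its influencers.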
It is really simple, isn't it? Now, are you interested in finding out communities in this graph?