
Oracle Big Data Spatial and Graph - technical tips, best practices, and news from the product team

Recent Posts


A Tip on Using Property Value as Vertex Label(s)

Oracle's property graph database (Oracle Spatial and Graph Property Graph, and Oracle Big Data Spatial and Graph) has a feature that allows one to use the literal value of a designated property to define vertex label(s). The difference between a label and a property is not always clear; in general, a label denotes some kind of typing and/or categorization.

The feature is controlled by two setters on the graph config builder:

setUseVertexPropertyValueAsLabel("<property_key>")
setPropertyValueDelimiter(",")

One thing to watch out for: in addition to these two setters, it is also important to add setLoadVertexLabels(true) when constructing the graph config. Without it, your in-memory graph simply does not have the vertex labels.

A complete example follows. This Groovy-based code snippet assumes an Oracle NoSQL Database as the backend.

server = new ArrayList<String>();
server.add("bigdatalite:5000");

// -- use country as label
cfg = GraphConfigBuilder.forPropertyGraphNosql()
    .setName("connections").setStoreName("kvstore")
    .setHosts(server)
    .hasEdgeLabel(true)
    .setLoadEdgeLabel(true)
    .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
    .setMaxNumConnections(2)
    .addVertexProperty("country", PropertyType.STRING, "null")
    .addVertexProperty("name", PropertyType.STRING, "empty name")
    .setLoadVertexLabels(true)
    .setUseVertexPropertyValueAsLabel("country")
    .setPropertyValueDelimiter(",")
    .build();

opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

// OraclePropertyGraphDataLoader is a key Java class
// used to bulk load property graph data into the backend databases.
opgdl = OraclePropertyGraphDataLoader.getInstance();
vfile = "../../data/connections.opv"
efile = "../../data/connections.ope"
opgdl.loadData(opg, vfile, efile, 2);

// Create an in-memory analytics session and analyst
session = Pgx.createSession("session_ID_1");
analyst = session.createAnalyst();

// Read graph data from the database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

opg-nosql> v = pgxGraph.getVertex(1l)
==>PgxVertex[ID=1]
opg-nosql> v.labels
==>United States

// Add a vertex label constraint in a PGQL query
opg-nosql> pgxResultSet = pgxGraph.queryPgql("SELECT n,m WHERE (n:'United States') -> (m)")
==>PgqlResultSet[graph=connections,numResults=117]
opg-nosql> pgxResultSet.print(10);
---------------------------------------
| n                | m                |
=======================================
| PgxVertex[ID=38] | PgxVertex[ID=2]  |
| PgxVertex[ID=1]  | PgxVertex[ID=2]  |
...

Cheers,
Zhe
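Why does a label feature need a delimiter at all? Because a single property value can carry more than one label. A minimal sketch of the idea, using a hypothetical vertex whose country value is comma-separated (the vertex and the exact formatting of v.labels are assumptions for illustration):

// Hypothetical: suppose vertex 99 has country = "United States,Canada".
// With setPropertyValueDelimiter(",") the value is split on the comma,
// so the in-memory vertex carries both labels.
opg-nosql> v = pgxGraph.getVertex(99l)
opg-nosql> v.labels
==>[United States, Canada]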


Using PGQL in Python

I got a question on how to run and consume PGQL in Python yesterday, so I decided to write a short blog about it. Find below a complete example on executing PGQL, iterating through its result set, and generating a bar chart. Note that I am using BDSG v2.4 and Python 2.7. The same flow applies to Oracle Spatial and Graph (other than the graph configuration part).

Start the pyopg shell script. You can also use Jupyter if you want.

$ sh ./pyopg.sh
[PyOPG ASCII banner]
Oracle Big Data Spatial and Graph Property Graph Python Shell
...
Context available as opg
Class loading done
>>>

>>> gcb = JClass('oracle.pgx.config.GraphConfigBuilder')
pgx_types = JPackage('oracle.pgx.common.types')
server = JClass('java.util.ArrayList')()
server.add("bigdatalite:5000")
cfg = gcb.forPropertyGraphNosql() \
    .setName("connections").setStoreName("kvstore") \
    .setHosts(server) \
    .hasEdgeLabel(True).setLoadEdgeLabel(True) \
    .addVertexProperty("name", pgx_types.PropertyType.STRING, "empty name") \
    .setMaxNumConnections(2).build()

pgx_param = JClass("java.util.HashMap")()
instance = JClass("oracle.pgx.api.Pgx").getInstance()
if not instance.isEngineRunning():
    instance.startEngine(pgx_param)
session = instance.createSession("mysession1")
pgxGraph = session.readGraphWithProperties(cfg)

pgxResultSet = pgxGraph.queryPgql("SELECT n,m WHERE (n) -> (m)")
it = pgxResultSet.getResults().iterator()
while (it.hasNext()):
    element = it.next()
    print element.toString()

The output may look like what's shown below.

...
n(VERTEX)=56 m(VERTEX)=58
n(VERTEX)=56 m(VERTEX)=57
n(VERTEX)=56 m(VERTEX)=5
n(VERTEX)=59 m(VERTEX)=60
n(VERTEX)=67 m(VERTEX)=37
n(VERTEX)=67 m(VERTEX)=73
n(VERTEX)=67 m(VERTEX)=72
...

While command-line output is useful, we can do a bit of charting with Pyplot. The query itself is a simple aggregation and counting based on the vertex's name property.

pgxResultSet = pgxGraph.queryPgql("SELECT n.name, count(m) as size WHERE (n) -> (m) group by n.name limit 10")
it = pgxResultSet.getResults().iterator()
graph_communities = [{"name": i.get(0), "size": int(i.get(1).toString())} for i in it]

import pandas as pd
import numpy as np
community_frame = pd.DataFrame(graph_communities)
community_frame[:5]

import matplotlib as mpl
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(16, 22))
ax = community_frame.plot.bar(x='name', y='size', rot=0)
plt.show()

Cheers,
Zhe
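Beyond printing whole rows, each result element also exposes the matched vertices directly. A short sketch of pulling vertex IDs out of the result rows; the getVertex accessor mirrors the one used in the Groovy examples elsewhere on this blog, and the column index is an assumption to verify against your own SELECT list:

# Sketch: extract a matched vertex and its ID from each result row
it = pgxResultSet.getResults().iterator()
while (it.hasNext()):
    row = it.next()
    v = row.getVertex(1)   # column index assumed; check against your query
    print v.getId()        # a java.lang.Long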


Applying changes to an in-memory graph snapshot

A little while ago, I got a question on how to apply changes to an existing in-memory graph snapshot and generate an updated graph. The following Groovy code snippet shows how. Note that I am using Oracle Big Data Spatial and Graph (BDSG) v2.4.

// First, read the graph data out of an Oracle NoSQL Database
cfg = GraphConfigBuilder.forPropertyGraphNosql()             \
  .setName("connections").setStoreName("kvstore")            \
  .setHosts(server)                                          \
  .addVertexProperty("name", PropertyType.STRING, "null")    \
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
  .setMaxNumConnections(2).build();

// Get an instance of OraclePropertyGraph, which is a key Java
// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);

// Assume there is a vertex (ID = 1) with name="John" in this graph.
// Read this graph into an in-memory snapshot.
pgxGraph = session.readGraphWithProperties(opg.getConfig(), true);

// We can inspect vertex 1
pgxVertex1 = pgxGraph.getVertex(1)
opg-nosql> pgxVertex1.getProperty("name")
==>John

// Start to make changes
changeSet = pgxGraph.createChangeSet()
vm = changeSet.updateVertex(pgxVertex1.getId())
vm.setProperty("name", "Hey a new name")
pgxGraphNew = changeSet.build("new")

pgxVertexNew1 = pgxGraphNew.getVertex(1)
opg-nosql> pgxVertexNew1.getProperty("name")
==>Hey a new name

// The original vertex in the old graph still has name="John"
opg-nosql> pgxGraph.getVertex(1).getProperty("name")
==>John

Cheers,
Zhe
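Note that a changeset is not limited to a single modification; several updates can be batched before build() materializes one new snapshot. A minimal sketch using only the APIs shown above (vertex IDs 1 and 2 are assumed to exist in the graph):

// Batch two property updates into one changeset,
// then build a single new snapshot
changeSet = pgxGraph.createChangeSet()
vm1 = changeSet.updateVertex(1l)
vm1.setProperty("name", "Johnny")
vm2 = changeSet.updateVertex(2l)
vm2.setProperty("name", "Maria")
pgxGraphNew = changeSet.build("batch_of_changes")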


Join AskTOM Office Hours on May 31 - Gain Insights with Graph Analytics

Want to learn how to use powerful graph analysis in your business applications? Get a jumpstart in our free, monthly one-hour sessions for developers in the AskTOM series.

This month's topic: Gain Insights with Graph Analytics
May 31, 2018, 8:00 US PDT | 17:00 CEST

See the magic of graphs in this session. Graph analysis can answer questions like detecting patterns of fraud or identifying influential customers, and do it quickly and efficiently. Our experts Albert Godfrind and Zhe Wu will show you the APIs for accessing graphs and running analytics such as finding influencers, communities, and anomalies, and how to use them from various languages including Groovy, Python, and JavaScript, with Jupyter and Zeppelin notebooks. As always, feel free to bring any graph-related questions you have for us.

Learn more and register here: https://devgym.oracle.com/pls/apex/dg/office_hours/3084

If you missed them, view replays from prior AskTOM property graph sessions here:
Introduction to Property Graphs: https://youtu.be/e_lBqPh2k6Y
How To Model and Construct Graphs with Oracle Database: https://youtu.be/evFTmXWU7Zw


Free AskTOM Office Hours on April 24 - How To Model and Construct Graphs

We invite you to join our next free property graph AskTOM Office Hours session. This session focuses on Oracle Database, but the graph modeling concepts are also useful for users of big data platforms.

This month's topic: How To Model and Construct Graphs with Oracle Database
April 24, 2018, 8:00 US PDT | 17:00 CEST

With property graphs in Oracle Database, you can perform powerful analysis on big data such as social networks, financial transactions, sensor networks, and more. To use property graphs, you'll first need a graph model. For a new user, modeling and generating a suitable graph for an application domain can be a challenge. This month, we'll describe the key steps required to construct a meaningful graph, and offer a few tips on validating the generated graph. Join Albert Godfrind (EMEA Solutions Architect) and Zhe Wu (Architect), who will walk you through and take your questions.

Learn more and register here: https://devgym.oracle.com/pls/apex/dg/office_hours/3084

The replay from February's session, Introduction to Property Graphs, is available here: https://asktom.oracle.com/pls/apex/f?p=100:551:::NO:551:P551_CLASS_ID:3401&cs=11ED98BAEBB81DD95ADED292AE0744349


A Robust and Easy Way to Create Flat Files (.opv/.ope) using Python on BDSG

This is a sibling blog of a very similar one posted a few days ago. The only difference is that the example code snippets shown below use Python and JPype. Basically, a user's question made me realize it is indeed tricky for a user to manually create records in the flat file format (.opv/.ope). Fortunately, the product offers two utility APIs to help a user manually and incrementally create .opv/.ope files. Note that if you have a CSV or a relational data source, then there are utility APIs to convert the whole CSV or tables (views) into flat files in one shot; there is no need to do the graph generation incrementally.

Below, I am showing some code snippets in Python. Specifically, the snippets are calling Java APIs via JPype.

...
util = JClass("oracle.pg.common.OraclePropertyGraphUtilsBase")
osv = JClass('java.io.FileOutputStream')("/tmp/g.opv")
ose = JClass('java.io.FileOutputStream')("/tmp/g.ope")

vid1 = JClass("java.lang.Long")(1)
util.outputVertexRecord(osv, vid1, "name", "Jon");
util.outputVertexRecord(osv, vid1, "salary", 18000.5);
util.outputVertexRecord(osv, vid1, "age", 18);

vid2 = JClass("java.lang.Long")(2)
util.outputVertexRecord(osv, vid2, "name", "Mary");
util.outputVertexRecord(osv, vid2, "married", True);
osv.flush()

# Note the pattern: MM means month; lowercase mm would mean minutes
sdf = JClass('java.text.SimpleDateFormat')("MM/dd/yyyy")
eid1 = JClass("java.lang.Long")(100)
util.outputEdgeRecord(ose, eid1, vid1, vid2, "friend_of", "started_on", sdf.parse("01/02/2004"));
ose.flush()
...

Now, let's check the output. As you can see, the commas, data type IDs, and property values are all handled properly.

$ cat /tmp/g.opv
1,name,1,Jon,,
1,salary,4,,18000.5,
1,age,2,,18,
$ cat /tmp/g.ope
100,1,2,friend_of,started_on,5,,,2004-01-02T00:00:00.000-05:00

Cheers,
Zhe Wu


A Robust and Easy Way to Create Flat Files (.opv/.ope) on BDSG

Today I got a question about a data loading problem that happened during a parallel load of a PG flat file (.opv/.ope). This made me realize it is indeed tricky for a user to manually create those records in the flat file format. Some of you may even question who on earth designed such a weird-looking format with all those commas. Well, that would be me. Believe it or not, there is a reason for such a format :) Users who are familiar with Oracle Database external tables may see that it is trivial to define an external table on top of .opv/.ope and ingest the graph data into Oracle Database really efficiently.

Now, back to the original problem. Fortunately, the product offers two utility APIs to help a user manually and incrementally create .opv/.ope files. Note that if you have a CSV or a relational data source, then there are utility APIs to convert the whole CSV or tables (views) into flat files in one shot; there is no need to do the graph generation incrementally.

As usual, I am using Java code snippets in Groovy.

cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/

osV = new FileOutputStream("/tmp/g.opv")
osE = new FileOutputStream("/tmp/g.ope")

// Say there is a person Jon with some attributes (properties).
// The following static API, outputVertexRecord, handles escaping and
// comma placement for you. Note that different data types are used
// here for illustration purposes.
OraclePropertyGraphUtilsBase.outputVertexRecord(osV, 1l, "name", "Jon");
OraclePropertyGraphUtilsBase.outputVertexRecord(osV, 1l, "salary", 18000.5d);
OraclePropertyGraphUtilsBase.outputVertexRecord(osV, 1l, "age", 18);
osV.flush();

Now, if we take a look at the generated .opv file, you will see:

[oracle@bigdatalite ~]$ cat /tmp/g.opv
1,name,1,Jon,,
1,salary,4,,18000.5,
1,age,2,,18,

So far so good. Let's introduce a second vertex, Mary, who is married.

OraclePropertyGraphUtilsBase.outputVertexRecord(osV, 2l, "name", "Mary");
OraclePropertyGraphUtilsBase.outputVertexRecord(osV, 2l, "married", Boolean.TRUE);
osV.flush();

[oracle@bigdatalite ~]$ cat /tmp/g.opv
1,name,1,Jon,,
1,salary,4,,18000.5,
1,age,2,,18,
2,name,1,Mary,,
2,married,6,Y,,

Finally, let's create an edge (denoting a friend relationship) that links these two vertices together. Assume this friendship started on a fine day in 2004.

sdf = new java.text.SimpleDateFormat("MM/dd/yyyy")  // MM = month; lowercase mm would mean minutes
OraclePropertyGraphUtilsBase.outputEdgeRecord(osE, 100l, 1l, 2l, "friend_of", "started_on", sdf.parse("01/02/2004"));
osE.flush()

And let's take a look at the generated .ope file.

[oracle@bigdatalite ~]$ cat /tmp/g.ope
100,1,2,friend_of,started_on,5,,,2004-01-02T00:00:00.000-05:00

Cheers,
Zhe Wu
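For reference, here is how the columns in those records line up, inferred from the sample output above (the authoritative definition is the "Oracle Flat File Format" section of the developer guide):

// .opv: vertex_id, property_name, property_type_id, string_value, numeric_value, date_value
//   1,salary,4,,18000.5,   -> type 4 (double); the value sits in the numeric column
//   2,married,6,Y,,        -> type 6 (boolean); encoded as Y in the string column
// .ope: edge_id, src_vertex_id, dst_vertex_id, edge_label,
//       property_name, property_type_id, string_value, numeric_value, date_value
//   100,1,2,friend_of,started_on,5,,,2004-01-02T00:00:00.000-05:00  -> type 5 (date)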


Announcing Graph Developer Training Day – March 19, 2018

The Oracle Spatial and Graph product team announces Graph Developer Training Day 2018, a free full-day workshop to help you understand and develop property graph applications using Oracle Database Spatial and Graph and Big Data Spatial and Graph. Oracle partners, customers, attendees of Analytics and Data Summit 2018, and Oracle staff are invited. The workshop is targeted toward developers, architects, and data scientists. Sessions will be delivered by Oracle developers and product managers.

The event will take place at Oracle HQ on Monday, March 19, before Analytics and Data Summit. Please RSVP to marion.smith@oracle.com by March 6 if you're planning to attend! Seating is limited.

Details:
March 19, 2018, 8:15am – 5:00pm
Oracle Conference Center
350 Oracle Pkwy, Redwood City, CA 94065

Agenda (subject to change):

8:15 – 9:00am    Breakfast/Registration

9:00 – 10:30am   Getting Started with Graph Databases
- Welcome and overview of graph technologies
- Provisioning an Oracle Database Cloud Service (DBCS)
- Understanding graph formats and efficient data loading

10:30 – 11:00am  Break

11:00 – 12:30    Generating and Analyzing Graph Data
- Graph generation - how to construct a graph from source data
- Graph analytics using PGX and RDBMS

12:30 – 1:30pm   Networking Lunch

1:30 – 3:00pm    Graph Query and Visualization
- Property Graph Query Language (PGQL) - on PGX and RDBMS
- Graph visualization (Cytoscape)

3:00 – 3:30pm    Break

3:30 – 5:00pm    New Tooling and Functionality + Lightning Round
- Notebook UI
- Graph Studio
- Lightning Round

5:00             Wrap-up and Close


Spatial and Graph Sessions at Analytics and Data Summit 2018

All Analytics. All Data. No Nonsense.
Featuring Spatial and Graph Summit
March 20–22, 2018
Oracle HQ Conference Center, 350 Oracle Pkwy, Redwood City, CA 94065

We've changed our name! Formerly called the BIWA Summit with the Spatial and Graph Summit. Same great technical content - great new name!

Announcing 24+ technical sessions, case studies, and hands-on labs around spatial and graph technologies. The agenda for Spatial and Graph Summit at Analytics and Data Summit 2018 is now available; view it at http://www.biwasummit.org/schedule/ and register today for best rates.

Join us for this premier event for spatial, graph, and analytics and attend:
- Technology sessions from Oracle experts on the latest developments, useful features, and best practices
- Case studies from top customers and partners
- Hands-on Labs - ramp up fast on new technologies
- Keynotes by industry experts
- Sessions on machine learning, big data, cloud, and database technologies from Analytics and Data Summit

Selected Spatial + Graph sessions include the following. Check out the agenda for more sessions and details.

Spatial technologies for business applications and GIS
- Enriching Business Data with Location – Albert Godfrind, Oracle
- Using GeoJSON in the Oracle Database – Albert Godfrind, Oracle
- Powerful Spatial Features You Never Knew Existed in Oracle Spatial and Graph – Daniel Geringer, Oracle
- 18c Spatial New Features Update – Siva Ravada, Oracle
- Using Spatial in Oracle Cloud with Developer Tools and Frameworks – David Lapp & Siva Ravada, Oracle
- Spatial analytics with Oracle DV, Analytics Cloud, and Database Cloud Service – David Lapp & Jayant Sharma, Oracle
- Spatial Analytics with Spark & Big Data – Siva Ravada, Oracle

Geospatial Industry Use Cases
- Country Scale digital maps data with Oracle Spatial & Graph – Ankeet Bhat, MapmyIndia
- Ordnance Survey Ireland: National Mapping as a Service – Éamonn Clinton, Ordnance Survey Ireland
- Feeding a Hungry World: Using Oracle Products to Ensure Global Food Security – Mark Pelletier, USDA/InuTeq
- 3D Spatial Utility Database at CALTRANS – Donna Rodrick, California Dept of Transportation

Hands On Labs – Get started with property graphs
- Property Graph 101 on Oracle Database 12.2 for the completely clueless – Arthur Dayton and Cathye Pendley, Vlamis Software Solutions
- Using Property Graph and Graph Analytics on NoSQL to Analyze Data on Meetup.com – Karin Patenge, Oracle Germany
- Using R for Big Data Advanced Analytics, Machine Learning, and Graph – Mark Hornick, Oracle

Graph Technical Sessions
- An Introduction to Graph: Database, Analytics, and Cloud Services – Hans Viehmann, Zhe Wu, Jean Ihm, Oracle
- Sneak Preview: Graph Cloud Services and Spatial Studio for Database Cloud – Jim Steiner & Jayant Sharma, Oracle
- Analyzing Blockchain and Bitcoin Transaction Data as Graph – Zhe Wu & Xavier Lopez, Oracle
- Ingesting streaming data into Graph database – Guido Schmutz, Trivadis

Applications of Graph Technologies
- Analyze the Global Knowledge Graph with Visualization, Cloud, & Spatial Tech – Kevin Madden, Tom Sawyer Software
- Graph Modeling and Analysis for Complex Automotive Data Management – Masahiro Yoshioka, Mazda
- Follow the Money: A Graph Model for Monitoring Bank Transactions – Federico Garcia Calabria, Oracle
- Anomaly Detection in Medicare Provider Data using OAAgraph – Sungpack Hong, Mark Hornick, Francisco Morales, Oracle
- Fake News, Trolls, Bots and Money Laundering – Find the truth with Graphs – Jim Steiner and Sungpack Hong, Oracle


Kicking off Relational to Property Graph Conversion from BDSG

Today, I got a question on converting relational data to PG (flat files). I actually wrote about PG data generation in the past. What makes this one different is that the relational data is accessible from within an Oracle Database, but the property graph database is on the big data platform. Here is an example flow with Python & JPype. The BDSG version is 2.1.

Assume we have a relational table with three columns: employee ID (EMPID), manager ID (MGRID), and a random property X. Say we want to generate a set of edges based on this manager-employee relationship and treat that X column as an edge property.

SQL> desc reports
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMPID                                              NUMBER
 MGRID                                              NUMBER
 X                                                  NUMBER

We can use the following Python code (Java wrapped in JPype) to convert this table into a .ope file:

ora = JClass('oracle.pg.rdbms.Oracle')
conn = ora("jdbc:oracle:thin:@<host>:1521:<sid>", "<user>", "<password>").getConnection()
ctamCls = JClass("oracle.pg.common.ColumnToAttrMapping")
c1 = ctamCls.getInstance("X", "X", JClass("java.lang.Long"))
util = JClass("oracle.pg.common.OraclePropertyGraphUtilsBase")
os = JClass("java.io.FileOutputStream")("/tmp/test.ope")
try:
    util.convertRDBMSTable2OPE(conn, "reports", None, 0l, "EMPID", "MGRID",
        0l, False, "reportsTo", [c1], 1, os, None)
except jpype.JException(java.lang.RuntimeException), ex:
    print ex.message()
    print ex.stacktrace()

An example output is as follows:

$ cat /tmp/test.ope
1,1,2,reportsTo,X,7,,100,
2,3,2,reportsTo,X,7,,100,

Cheers,


Tips on Sharing and Cleaning Up In-Memory Graphs for BDSG Property Graph

The in-memory graph snapshots stored in compact data structures by PGX are the foundation of many cool graph analytics and powerful graph queries (PGQL). In this post, I am going to share a couple of tips on sharing in-memory graphs and cleaning up the resources they use.

1. To share an existing in-memory graph snapshot G, one needs to use the same graph config as the one used to load G. Assume G was loaded with graphConfig; then the following command will reuse the same in-memory graph snapshot.

pgxGraph = pgxSession.readGraphWithProperties(graphConfig);

This technique is used in Oracle's support for Cytoscape visualization of property graphs.

2. To clean up an existing in-memory graph snapshot G, one can invoke the pgxGraph.close() API. Note that it does not immediately free up the resources taken by G. Rather, it simply denotes that the current session will not use G anymore. A periodically scheduled task in PGX goes through all shareable graphs and cleans them up if:
- the overall memory consumption is above a certain threshold (by default, cleanup kicks in when more than 85% of total memory, the sum of on-heap and off-heap, is used), and
- the graph G doesn't have any references from any sessions.

The default period for the memory cleanup is 600 seconds, and it can be configured using the config parameter "memory_cleanup_interval" (in seconds). The default cleanup threshold is 85% and can be changed using the config parameter "release_memory_threshold".

Acknowledgement: thanks to Alexander Weld for his input on this blog post!
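For reference, both parameters live in the PGX configuration file (pgx.conf); a minimal excerpt with the default values described above:

{
  "memory_cleanup_interval": 600,
  "release_memory_threshold": 0.85
}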


Sanity Checking Property Graph Functions After Upgrade

Recently I worked with Oracle Support on resolving a property graph related issue. One question asked was "how do we quickly sanity check property graph functions after an upgrade?" Well, I usually run through the following steps.

- Make sure BDSG Property Graph works well with Oracle NoSQL Database using the built-in Groovy shell.

cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh gremlin-opg-nosql.sh

// Provide Oracle NoSQL Database server name and port
server = new ArrayList<String>();
server.add("bigdatalite:5000");

// Create a graph config that contains the graph name "connections",
// KV store name "kvstore", edge property "weight" to be loaded into
// the in-memory graph, etc.
cfg = GraphConfigBuilder.forPropertyGraphNosql()             \
  .setName("connections").setStoreName("kvstore")            \
  .setHosts(server)                                          \
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
  .setMaxNumConnections(2).build();

// Get an instance of OraclePropertyGraph which is a key Java
// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);

// Set the Degree of Parallelism (DOP) for clearing graph data
// from the existing OraclePropertyGraph instance
opg.setClearTableDOP(2);  // will return NULL because this API has
                          // no return value. It is expected.
opg.clearRepository();    // remove all vertices and edges

opgdl = OraclePropertyGraphDataLoader.getInstance();
vfile = "../../data/connections.opv"  // vertex flat file
efile = "../../data/connections.ope"  // edge flat file

// Set Degree of Parallelism (DOP) to 2 and load in parallel the
// above property graph data files into the database.
opgdl.loadData(opg, vfile, efile, 2);
opg.getVertices();

// Create in-memory analytics session and analyst
session = Pgx.createSession("session_ID_1");
analyst = session.createAnalyst();

// Read the graph from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

// Create a helper function for pretty printing
def p(v) { s1=v.getProperty("name");                 \
    if (s1.length() > 30) return s1;                 \
    s=s1;                                            \
    for (int idx = 0; idx < 30 - s1.length(); idx++) \
      { s=s+ " ";};                                  \
    return ("vertex " + s  + " id " + v.getId());    \
}

// Execute PageRank
rank = analyst.pagerank(pgxGraph, 0.00000001, 0.85, 5000);

// Get the 3 vertices with the highest PageRank values
it = rank.getTopKValues(3).iterator();     \
while(it.hasNext()) {                      \
  v=it.next();                             \
  id=v.getKey().getId();                   \
  pr=v.getValue();                         \
  System.out.println("Influencers --->" +  \
    p(opg.getVertex(id)) + " pr= " + pr);  \
}

:quit

All the above steps should work without a problem.

- Now, time to move on to Apache HBase.
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh gremlin-opg-hbase.sh

// Get a graph config that has graph name "connections" and
// Zookeeper host, port, and some other parameters
cfg = GraphConfigBuilder.forPropertyGraphHbase()            \
 .setName("connections")                                    \
 .setZkQuorum("bigdatalite").setZkClientPort(2181)          \
 .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)   \
 .setInitialVertexNumRegions(3).setSplitsPerRegion(1)       \
 .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
 .build();

// Get an instance of OraclePropertyGraph which is a key Java
// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

// OraclePropertyGraphDataLoader is a key Java class
// to bulk load property graph data into the backend databases.
opgdl = OraclePropertyGraphDataLoader.getInstance();
vfile = "../../data/connections.opv"
efile = "../../data/connections.ope"
opgdl.loadData(opg, vfile, efile, 2);

Note that the above HBase-related steps are shorter than those for Oracle NoSQL Database. The reason is that we no longer need to retest the embedded PGX, because we have already done that with Oracle NoSQL Database.

- Finally, start a PGX server which can be used by a remote PGX client. For simplicity, I am using HTTP (instead of HTTPS or two-way SSL), and this requires setting "enable_tls": false and "enable_client_authentication": false in the following configuration file.

/opt/oracle/oracle-spatial-graph/property_graph/pgx/conf/server.conf

To kick off the PGX server:

cd /opt/oracle/oracle-spatial-graph/property_graph/pgx/bin/
./start-server

Open a browser and connect to the following URL; you should see a single line of text describing the version.

http://<hostname>:7007/version

If you want to tune this endpoint a bit, take a look at a previous blog I wrote. Good luck!


3 Ways to Serialize and Write a Sub-Graph to Client Side (III)

This is the third installment of the "3 Ways to Serialize and Write a Sub-Graph to Client Side" series. The first installment showed an approach that serializes a sub-graph on the server side and copies the graph data files to the client side. The second installment, on the other hand, showed a more direct way that first reads a sub-graph to the client side and then uses utility methods in the OraclePropertyGraphUtilsBase Java class to serialize the graph data. I saved the easiest for last :)

Approach #3: Use grep, awk, or Whatever Your Favorite Text Processing Tool Is to Apply Filtering Directly on the Flat Files

The flat file format used by Oracle Big Data Spatial and Graph (BDSG) is in fact quite text-processing friendly. Assume you have a large graph stored in flat files (.opv, .ope) and you want to create a sub-graph on the client side. Chances are you can use grep, egrep, gawk, or whatever your favorite text processing tool is to apply filtering directly on the flat files, as long as the filtering condition is at the per-record level. For example, the following egrep will keep only edges with label "collaborates".

cd /opt/oracle/oracle-spatial-graph/property_graph/data
cat connections.ope | egrep ",collaborates" > /tmp/my_subgraph.ope

If you worry about possible mismatches against other text fields, then a bit of regular expression can make sure we only match against the edge label field.

cat connections.ope | egrep "^[^,]*,[^,]*,[^,]*,collaborates," > /tmp/my_subgraph.ope

head -5 /tmp/my_subgraph.ope
1000,1,2,collaborates,weight,3,,1.0,
1001,1,3,collaborates,weight,3,,1.0,
1007,6,7,collaborates,weight,3,,1.0,
1009,7,6,collaborates,weight,3,,1.0,
1014,8,9,collaborates,weight,3,,1.0,

To use this approach well, one needs a solid understanding of the flat file format.

Cheers,
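Since the edge label is always the fourth comma-separated field, a field-aware tool like awk can avoid the regular expression entirely; a sketch of the same filter (assuming, as above, that the labels themselves contain no embedded commas):

# Keep only edge records whose 4th field (the edge label) is "collaborates"
awk -F',' '$4 == "collaborates"' connections.ope > /tmp/my_subgraph.ope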


3 Ways to Serialize and Write a Sub-Graph to Client Side (Part II)

This is the second installment of the "3 Ways to Serialize and Write a Sub-Graph to Client Side" series. The first installment showed an approach that serializes a sub-graph on the server side and copies the graph data files to the client side. Today, I am going to talk about a different approach that does not require access to the server-side OS or filesystem.

Approach #2: Read a Sub-Graph to the Client Side and Serialize It

Similar to the first approach, I am using a Groovy script. The important APIs are called out in the comments.

cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh ./gremlin-opg-nosql.sh

pgxServer = "http://127.0.0.1:7007"
pgxSession = Pgx.getInstance(pgxServer).createSession("session1");

server = new ArrayList<String>();
server.add("bigdatalite:5000");

cfg = GraphConfigBuilder.forPropertyGraphNosql()
  .setCreateEdgeIdMapping(true)
  .setName("connections").setStoreName("kvstore")
  .setHosts(server)
  .hasEdgeLabel(true).setLoadEdgeLabel(true)
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
  .setMaxNumConnections(2).build();

opg = OraclePropertyGraph.getInstance(cfg);

// Read the graph into the remote PGX instance
pgxGraph = pgxSession.readGraphWithProperties(opg.getConfig());
==>PgxGraph[name=connections,N=78,E=164,created=1507628356244]

// Filter a sub-graph out of the full graph
ef = new EdgeFilter("edge.weight < 5.0");
subGraph = pgxGraph.filter(ef);
==>PgxGraph[name=connections_sub-graph_1,N=69,E=118,created=1507628611702]

Now subGraph, an in-memory sub-graph, is ready. The following code snippet fetches a set of vertices from the sub-graph to the client side and uses the utility API OraclePropertyGraphUtilsBase.outputVertexRecord to write the vertices into a .opv file. Note that this utility API takes care of escaping and encoding, so you don't have to worry about special characters including commas, newlines, quotes, etc.

vertexSet = subGraph.getVertices();
vertexIter = vertexSet.iterator();
osVertex = new FileOutputStream("/tmp/mysubgraph.opv");
while (vertexIter.hasNext()) {
  v = vertexIter.next();
  OraclePropertyGraphUtilsBase.outputVertexRecord(osVertex, v.getId(), null, null);
}
osVertex.flush();
osVertex.close();

As the last step, issue a simple PGQL query about edges, get back a result set, and write out the edges into a .ope file using OraclePropertyGraphUtilsBase.outputEdgeRecord.

osEdge = new FileOutputStream("/tmp/mysubgraph.ope");
pgxResultSet = subGraph.queryPgql("SELECT n.id(), e.label(), m.id(), e.id() WHERE (n) -[e]-> (m)")
resultsIter = pgxResultSet.getResults().iterator();
while (resultsIter.hasNext()) {
  r = resultsIter.next()
  OraclePropertyGraphUtilsBase.outputEdgeRecord(osEdge, r.get(3), r.get(0), r.get(2), r.get(1), null, null);
}
osEdge.flush()
osEdge.close()

That is it. If the above steps complete successfully, you will see the following two files on the client side.

[root@hqgraph1 Downloads]# head -5 /tmp/mysubgraph.opv
2,%20,,,,
4,%20,,,,
11,%20,,,,
16,%20,,,,
18,%20,,,,
[root@hqgraph1 Downloads]# head -3 /tmp/mysubgraph.ope
1081,38,2,admires,%20,,,,
1000,1,2,collaborates,%20,,,,
1084,36,2,collaborates,%20,,,,

Cheers,


Property Graph (v2.2) in a Box

To start, download Oracle Big Data Lite Virtual Machine v4.9 from the following page.

http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

It is recommended to use Oracle VM VirtualBox 5.0.16 or above to import this virtual machine. Once the import is done successfully, log in as oracle (default password is welcome1). On the desktop, there is a "Start/Stop Services" icon - double clicking it will lead to a console popup with a list of services. Check Oracle NoSQL Database, hit Enter, and the built-in Oracle NoSQL Database will start automatically. If you need to shut down the Oracle NoSQL Database, just repeat this process.

Next, I am going to show you a simple Groovy-based script that loads a sample property graph representing a small social network, reads out vertices and edges, writes the graph out, and finally counts the number of triangles and runs a simple PGQL query on this network.

Open a Linux terminal in this virtual machine and type in the following:

$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
$ ./gremlin-opg-nosql.sh
--------------------------------

//
// Connect to the Oracle NoSQL Database in this virtual box
//
server = new ArrayList<String>();
server.add("bigdatalite:5000");
cfg = GraphConfigBuilder.forPropertyGraphNosql()             \
  .setName("connections").setStoreName("kvstore")            \
  .setHosts(server)                                          \
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
  .addVertexProperty("company", PropertyType.STRING, "NULL") \
  .setMaxNumConnections(2).build();

// Get an instance of OraclePropertyGraph
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

//
// Use the parallel data loader API to load a
// sample property graph in flat file format with a
// degree of parallelism (DOP) of 2
//
vfile = "../../data/connections.opv"
efile = "../../data/connections.ope"
opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, vfile, efile, 2);

opg-nosql> opg.countVertices()
==>78
opg-nosql> opg.countEdges()
==>164

// Read through the vertices
opg-nosql> opg.getVertices();
==>Vertex ID 21 {country:str:United States, name:str:Hillary Clinton, occupation:str:67th United States Secretary of State, political party:str:Democratic, religion:str:Methodism, role:str:political authority}
==>Vertex ID 25 {country:str:Russia, name:str:Vladimir Putin, occupation:str:president of Russia, political party:str:Comunist party of the Soviet Union, role:str:political authority}
...

// Read through the edges
opg-nosql> opg.getEdges();
==>Edge ID 1051 from Vertex ID 20 {company:str:Farallon Capital Management, country:str:United States, name:str:Tom Steyer, occupation:str:founder, political party:str:Democratic, role:str:philantropist} =[collaborates]=> Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} edgeKV[{weight:flo:1.0}]
==>Edge ID 1054 from Vertex ID 19 {country:str:United States, name:str:Kirsten Gillibrand, occupation:str:junior United States Senator from New York, political party:str:Democratic, religion:str:Methodism, role:str:political authority} =[admires]=> Vertex ID 31 {country:str:United States, name:str:Jason Collins, occupation:str:basketball center player, role:str:professional basketball player, team:str:Brooklyn nets} edgeKV[{weight:flo:1.0}]
...
//
// You can add vertices/edges, change properties, etc. here.
//
a = opg.addVertex(1l);
a.setProperty("company", "oracle");
a.setProperty("age", 31);
b = opg.addVertex(2l);
b.setProperty("company", "oracle");
b.setProperty("age", 28);

// Add an edge e1
e1 = opg.addEdge(1l, a, b, "knows");
e1.setProperty("type", "partners");

// Serialize the graph out into a pair of flat files with DOP=2
vOutput = "/tmp/mygraph.opv"
eOutput = "/tmp/mygraph.ope"
OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);

// Start the PGX service (from a separate terminal)
$ cd /opt/oracle/oracle-spatial-graph/property_graph/pgx/bin
$ sh ./start-server

// Create an analyst instance
session = Pgx.createSession("session_ID_1");
analyst = session.createAnalyst();

// Read graph data from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig(), true);

// Call the count triangles API
analyst.countTriangles(pgxGraph, true);
==>22

// Run a PGQL query
opg-nosql> pgxResultSet = pgxGraph.queryPgql("SELECT n.company,m WHERE (n WITH company='Koch Industries')->(m)->(n), n!=m")
==>PgqlResultSetImpl[graph=connections,numResults=4]
opg-nosql> pgxResultSet.print(4);
+-------------------------------------+
| n.company         | m               |
+-------------------------------------+
| "Koch Industries" | PgxVertex[ID=1] |
| "Koch Industries" | PgxVertex[ID=9] |
| "Koch Industries" | PgxVertex[ID=1] |
| "Koch Industries" | PgxVertex[ID=8] |
+-------------------------------------+
==>null

// Get the total number of records from the result set
pgxResultSet.getNumResults()
==>132

Note that the same virtual machine has Apache HBase installed as well as Oracle NoSQL Database. Once Apache HBase is configured and started, the same script above (except the DB connection initialization part) can be used without a change.

// HBase DB connection initialization configuration
cfg = GraphConfigBuilder.forPropertyGraphHbase()            \
 .setName("connections")                                    \
 .setZkQuorum("bigdatalite").setZkClientPort(2181)          \
 .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)   \
 .setInitialVertexNumRegions(3).setSplitsPerRegion(1)       \
 .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
 .addVertexProperty("company", PropertyType.STRING, "NULL") \
 .build();

For more detailed developer-level information, please visit our formal developer guide here: http://docs.oracle.com/bigdata/bda49/index.htm


3 Ways to Serialize and Write a Sub-Graph to Client Side (Part I)

When applying graph analytics or running graph queries, some applications need to work on a sub-graph instead of the whole graph. In this blog, I am going to show three different approaches to serialize and write a sub-graph to a local file system on the client side. By making a sub-graph available on the client side, it becomes easy to re-load it for further analysis in the future.

Approach #1: Serialize a Sub-Graph on the Server Side and Copy It to the Client Side

The following Groovy script (containing Java code snippets) shows how one can issue a sub-graph serialization command from the client side and get the serialized graph data files generated on the server side. The important APIs (method names) are called out in the comments. Note that one needs to set "allow_local_filesystem": true in pgx.conf (in the directory /opt/oracle/oracle-spatial-graph/property_graph/pgx/conf/) on the server side.

cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh ./gremlin-opg-nosql.sh

pgxServer = "http://127.0.0.1:7007"
pgxSession = Pgx.getInstance(pgxServer).createSession("session1");

server = new ArrayList<String>();
server.add("bigdatalite:5000");

cfg = GraphConfigBuilder.forPropertyGraphNosql()
  .setCreateEdgeIdMapping(true)
  .setName("connections").setStoreName("kvstore")
  .setHosts(server)
  .hasEdgeLabel(true).setLoadEdgeLabel(true)
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
  .setMaxNumConnections(2).build();

opg = OraclePropertyGraph.getInstance(cfg);

// Read the graph into the remote PGX instance
pgxGraph = pgxSession.readGraphWithProperties(opg.getConfig());
==>PgxGraph[name=connections,N=78,E=164,created=1507628356244]

// Filter a sub-graph out of the full graph
ef = new EdgeFilter("edge.weight < 5.0");
subGraph = pgxGraph.filter(ef);
==>PgxGraph[name=connections_sub-graph_1,N=69,E=118,created=1507628611702]

// Store the sub-graph in flat file format on the server side
graphConfig = GraphConfigBuilder.forFileFormat(Format.FLAT_FILE)
  .setVertexUris("/tmp/g.opv").setEdgeUris("/tmp/g.ope")
  .setSeparator(",")
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
  .build();
subGraph.store(graphConfig, true);

Now, on the server side, you can find a pair of flat files, g.opv and g.ope, under /tmp.

[oracle@bigdatalite ~]$ head -5 /tmp/g.opv
2,%20,,,,
4,%20,,,,
11,%20,,,,
16,%20,,,,
18,%20,,,,
[oracle@bigdatalite ~]$ head -5 /tmp/g.ope
1078,2,1,collaborates,weight,4,,1.0,
1079,2,36,collaborates,weight,4,,1.0,
1077,2,35,admires,weight,4,,1.0,
1019,11,1,collaborates,weight,4,,1.0,
1020,11,10,collaborates,weight,4,,1.0,

Finally, copy (cp, scp, ftp, sftp, or whatever tool of your choice) the above two files to the client side.
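Once the files are on the client, the same flat-file graph config shown above can be used to read the sub-graph back into a PGX session for further analysis; a minimal sketch (assuming the URIs are re-pointed at the client-side copies):

// Re-load the serialized sub-graph from the copied flat files
reloaded = pgxSession.readGraphWithProperties(graphConfig);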


How Big is My In-Memory Graph?

This blog details the steps I used to achieve the following:
- Start a remote PGX instance with customized server settings
- Load, from the Groovy shell, a pair of property graph flat files (.opv/.ope) into the remote PGX instance
- Measure the memory consumption

The test environment used was the Oracle BigDataLite VM (v4.8) and Oracle Big Data Spatial and Graph (BDSG v2.2). The updated configuration files and scripts are listed below.

[oracle@bigdatalite ~]$ cd /opt/oracle/oracle-spatial-graph/property_graph/pgx/bin/
[oracle@bigdatalite bin]$ cat ../conf/server.conf
{
  "port": 7007,
  "enable_tls": false,
  "enable_client_authentication": false
}
[oracle@bigdatalite bin]$ cat ../conf/pgx.conf
{
  "allow_idle_timeout_overwrite": true,
  "allow_task_timeout_overwrite": true,
  "allow_local_filesystem": true,
  "enable_gm_compiler": true,
  "graphs": [],
  "max_active_sessions": 1024,
  "max_queue_size_per_session": -1,
  "max_snapshot_count": 0,
  "memory_cleanup_interval": 600,
  "num_workers_analysis": "<no-of-CPUs>",
  "num_workers_fast_track_analysis": 1,
  "num_workers_io": "<no-of-CPUs>",
  "path_to_gm_compiler": null,
  "release_memory_threshold": 0.85,
  "session_idle_timeout_secs": 0,
  "session_task_timeout_secs": 0,
  "strict_mode": true,
  "tmp_dir": "<system-tmp-dir>"
}
[oracle@bigdatalite bin]$ export _JAVA_OPTIONS="-Dpgx.num_workers_io=2 -Dpgx.max_off_heap_size=4000 -Dpgx.num_workers_analysis=3 -Xmx6000m"
[oracle@bigdatalite bin]$ sh ./start-server
Picked up _JAVA_OPTIONS: -Dpgx.num_workers_io=2 -Dpgx.max_off_heap_size=4000 -Dpgx.num_workers_analysis=3 -Xmx6000m
...

[oracle@bigdatalite ~]$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
[oracle@bigdatalite groovy]$ sh gremlin-opg-nosql.sh

pgxServer = "http://127.0.0.1:7007/"
pgxSession = Pgx.getInstance(pgxServer).createSession("graph-session");

vfile = "file:///u02/mygraph.opv"  // vertex flat file
efile = "file:///u02/mygraph.ope"  // edge flat file
config = GraphConfigBuilder
  .forMultipleFileFormat(Format.FLAT_FILE)
  .setSeparator(",")
  .addVertexUri(vfile)
  .addEdgeUri(efile)
  .addVertexProperty("first_name", PropertyType.STRING, "null")
  .build();

opg-nosql> g = pgxSession.readGraphWithProperties(config);
==>PgxGraph[name=...,N=1600000,E=5399998,created=1499720843627]
opg-nosql> g.getMemoryMb()
==>533

The last two steps print the size of the graph: the number of vertices and edges, and an estimated memory consumption in MB.

Hope it helps,
Zhe


A Tip to Help You Quickly Identify Object Types When Using Graph APIs

Oracle Big Data Spatial and Graph (BDSG) and Oracle Spatial and Graph offer a rich set of Java APIs to process property graph data. Sometimes, the return value (object) of an API can be a bit tricky to deal with. Here is a real question I recently got: "If I run PGQL selecting an id(), is there a way to convert this to a long to use when looking for a corresponding partition derived from an SCC call?"

To find the answer, one can either check the product Javadocs or use a Groovy script. To get the exact type (class) of an object, simply run .getClass().getName() on it.

As an illustration, the following Groovy script runs a PGQL query, iterates through the result set, gets an ID, and finds its type.

opg-nosql> pgxResultSet = pgxGraph.queryPgql("SELECT n,m WHERE (n) -> (m)")
opg-nosql> r = pgxResultSet.getResults()
==>n(VERTEX)=1 m(VERTEX)=10
==>n(VERTEX)=11 m(VERTEX)=10
==>n(VERTEX)=1 m(VERTEX)=20
...
opg-nosql> r.getClass().getName()
==>oracle.pgx.api.PgqlResultSetImpl$PgqlResultIterable
opg-nosql> a = r.iterator().next()
==>n(VERTEX)=1 m(VERTEX)=10
opg-nosql> a.getVertex(1)
==>PgxVertex[ID=10]
opg-nosql> a.getVertex(1).getId()
==>10
opg-nosql> a.getVertex(1).getId().getClass().getName()
==>java.lang.Long

Hope this small tip can make your graph exploration journey a bit easier!
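Back to the original question: since getId() already returns a java.lang.Long, auto-unboxing yields the primitive long directly; no explicit conversion call is needed. A sketch follows; the commented-out SCC lookup is an assumption about the Partition API, so verify the exact method names in the Javadocs:

// The PGQL vertex ID arrives as java.lang.Long; unbox it to a primitive long
id = a.getVertex(1).getId()
long vid = id   // or id.longValue()

// Hypothetical follow-up: look up the vertex's strongly connected component
// partition = analyst.scc(pgxGraph)
// component = partition.getPartitionIndexOfVertex(pgxGraph.getVertex(vid))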


How Many Ways to Run Property Graph Query Language (PGQL) in BDSG? (II)

This is the second installment of "How Many Ways to Run Property Graph Query Language (PGQL) in BDSG?" In this blog, I am going to show how one can run PGQL and visualize the results from Cytoscape, an open source network visualization tool.

After running startCytoscape.sh (a script bundled in the latest Cytoscape visualization kit for BDSG v2.1 [1]), click File >> Load >> Property graph >> Connect to Oracle NoSQL Database (or Connect to Apache HBase Database), then click the tab Start from PGQL. Alternatively, after establishing a connection to one of the supported backend graph databases, click the icon (located right under the menu bar) that looks like f(...). Either route leads to a PGQL pop-up window.

In that pop-up window, one can type in a PGQL query, customize the vertex/edge properties referenced by the PGQL query, and click the Execute button. The matched vertices and edges are then rendered on the canvas; what you see depends on the actual graph data.

Cheers,
Zhe

[1] http://www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph/downloads/bdsp-downloads-2562575.html

NOTE: in the zip file (Oracle Big Data Spatial and Graph 2.1 Support for Cytoscape), there is a superfluous file named api-bundle-3.2.0.jar which should be removed for Cytoscape to function properly.


How Many Ways to Run Property Graph Query Language (PGQL) in BDSG? (I)

As many of you know already, Property Graph Query Language (PGQL) is a very powerful feature offered in Oracle Big Data Spatial and Graph (BDSG)*. PGQL provides a clean, intuitive, SQL-like syntax for writing expressive query patterns. By design, PGQL combines strengths of SQL syntax, Cypher syntax, and a bit of SPARQL (a query language for RDF). Details on PGQL can be found here.

One thing you may not know is that there are actually quite a few different ways to execute PGQL: from the Cytoscape UI, Groovy, Python, and Zeppelin, to name a few. Over a few installments, I am going to illustrate how to execute PGQL in different environments.

Approach #1: Execute PGQL from Groovy/Java

The following script can be executed with the Groovy shell:

./gremlin-opg-hbase.sh
...

// Get a graph config that has graph name "connections" and
// Zookeeper host, port, and some other parameters
cfg = GraphConfigBuilder.forPropertyGraphHbase()            \
 .setName("connections")                                    \
 .setZkQuorum("bigdatalite").setZkClientPort(2181)          \
 .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)   \
 .setInitialVertexNumRegions(3).setSplitsPerRegion(1)       \
 .addVertexProperty("name", PropertyType.STRING, "empty name") \
 .build();

// Get an instance of OraclePropertyGraph which is a key Java
// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

// OraclePropertyGraphDataLoader is a key Java class
// to bulk load property graph data into the backend databases.
opgdl = OraclePropertyGraphDataLoader.getInstance();
vfile = "../../data/connections.opv"
efile = "../../data/connections.ope"
opgdl.loadData(opg, vfile, efile, 2);

// Create an in-memory analytics session and analyst
session = Pgx.createSession("session_ID_1");
analyst = session.createAnalyst();

// Read graph data from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

pgxResultSet = pgxGraph.queryPgql("SELECT n, n.name, m WHERE (n) -> (m)")
pgxResultSet.print(10);
pgxResultSet.getNumResults()

---------------------------------------------------------------
| n                    | n.name        | m                    |
===============================================================
| PgxVertex with ID 78 | Hosain Rahman | PgxVertex with ID 77 |
| PgxVertex with ID 71 | Pony Ma       | PgxVertex with ID 69 |
| PgxVertex with ID 44 | Seth Meyers   | PgxVertex with ID 42 |
| PgxVertex with ID 5  | Pope Francis  | PgxVertex with ID 59 |
| PgxVertex with ID 5  | Pope Francis  | PgxVertex with ID 56 |
| PgxVertex with ID 5  | Pope Francis  | PgxVertex with ID 58 |
| PgxVertex with ID 5  | Pope Francis  | PgxVertex with ID 57 |
| PgxVertex with ID 77 | Tony Fadell   | PgxVertex with ID 78 |
| PgxVertex with ID 77 | Tony Fadell   | PgxVertex with ID 73 |
| PgxVertex with ID 75 | Bobby Murphy  | PgxVertex with ID 74 |
---------------------------------------------------------------

pgxResultSet = pgxGraph.queryPgql("SELECT n, e, m WHERE (n) -[e:'collaborates']-> (m)")
pgxResultSet.print(10);
pgxResultSet.getNumResults()
...

Cheers,
Zhe

* Note that PGQL is part of BDSG on the big data platform. It is also available with the Oracle Spatial and Graph option in Oracle Database (12.2.0.1) via a patch.


Persisting Results of Graph Analytics into BDSG Graph Database (Part II)

This is the second part of Persisting Results of Graph Analytics into BDSG Graph Database. See the following blog for the first installment.

https://blogs.oracle.com/bigdataspatialgraph/entry/persisting_results_of_graph_analytics

Part II: When the number of results to be persisted is big

In this case, it is no longer efficient to store computation results into vertices/edges on the fly, because doing so incurs too many small, incremental changes. A much more efficient way is to convert the computation results into Oracle-defined flat files (a format designed for property graphs), and then load them into a BDSG graph database in parallel. Here is an example code snippet:

OraclePropertyGraph opg = OraclePropertyGraph.getInstance(cfg);
...
Partition communities = analyst.communitiesLabelPropagation(g);
fw = new BufferedWriter(new FileWriter("/tmp/communities.opv"));
i = 0;
while (i < communities.size()) {
  VertexCollection commu = communities.getPartitionByIndex(i);
  Iterator it = commu.iterator();
  while (it.hasNext()) {
    PgxVertex v = (PgxVertex) (it.next());
    fw.write("" + v.getId()         // vertex ID
        + "," + "community"         // property name
        + "," + "1"                 // property type (1 = String)
        + "," + "community_" + i    // property value
        + "," + ",\n");             // empty numeric and date columns
  }
  i++;
}
fw.close();

The above code is very straightforward. Basically, it detects communities in a property graph (by running label propagation), iterates through all the communities, and writes a text line for each vertex in a community denoting the community assignment for that vertex. In this case, there is no need to do any escaping/encoding, because we are not using any tricky characters. When there are commas, newlines, etc. involved, please refer to the "Oracle Flat File Format" section in Chapter 5 of the following development guide for the exact encoding.

http://docs.oracle.com/bigdata/bda47/BDSPA/toc.htm

Now that we have the computed results in a flat file, we can easily persist them into the same property graph with a single loadData API call. Note that because the community assignments are only for vertices, we simply provide an empty flat file for the edges (.ope).

OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(
  opg,
  "/tmp/communities.opv",
  "/tmp/empty_file.ope",  // it is OK to use an empty edge flat file
  8                       // 8 threads
);

Thanks,
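Incidentally, the manual string concatenation above can also be delegated to the outputVertexRecord utility covered in other posts on this blog, which handles the comma placement and any escaping for you; a minimal sketch of the inner loop body (assuming the same partition index i and an output stream os in place of the BufferedWriter):

os = new FileOutputStream("/tmp/communities.opv");
// Writes one .opv record per vertex, e.g. "1234,community,1,community_7,,"
OraclePropertyGraphUtilsBase.outputVertexRecord(os, v.getId(), "community", "community_" + i);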

Enable REST Interface for Oracle Big Data Spatial and Graph on Oracle BigDataLite VM

In this blog post, I am going to show you a few quick steps to enable a Rexster-based REST interface for a BDSG property graph in the Oracle BigDataLite VM (4.5 and above).

First of all, download and copy the following Java libraries into the /opt/oracle/oracle-spatial-graph/property_graph/dal/webapp/ directory.

  blueprints-rexster-graph-2.3.0.jar
  commons-cli-1.2.jar
  commons-configuration-1.6.jar
  grizzly-core-2.2.16.jar
  grizzly-http-2.2.16.jar
  grizzly-http-server-2.2.16.jar
  grizzly-http-servlet-2.2.16.jar
  javassist-3.15.0-GA.jar
  javax.servlet-api-3.0.1.jar
  jersey-core-1.17.jar
  jersey-json-1.17.jar
  jersey-server-1.17.jar
  jersey-servlet-1.17.jar
  jung-algorithms-2.0.1.jar
  jung-api-2.0.1.jar
  jung-visualization-2.0.1.jar
  msgpack-0.6.5.jar
  rexster-core-2.3.0.jar
  rexster-protocol-2.3.0.jar
  rexster-server-2.3.0.jar

In the same directory, update rexster.xml (it can be found in the same zip file) with the correct port, base-uri, and connection/graph information for a BDSG property graph. The following shows a snippet of the relevant rexster.xml configuration file.

<?xml version="1.0" encoding="UTF-8"?>
<rexster>
    <http>
        <server-port>8182</server-port>
        <server-host>0.0.0.0</server-host>
        <base-uri>http://127.0.0.1</base-uri>
        <web-root>public</web-root>
        <character-set>UTF-8</character-set>
...
        <graph>
            <graph-name>test_nosql</graph-name>
            <graph-type>oracle.pg.nosql.OraclePropertyGraphConfiguration</graph-type>
            <properties>
              <host>127.0.0.1</host>
              <port>5000</port>
              <storeName>kvstore</storeName>
            </properties>
            <extensions>
                <allows>
                    <allow>tp:gremlin</allow>
                </allows>
            </extensions>
        </graph>
...

We are almost there. The last step is to create rexster-opg.sh under the same directory:

$ cat /opt/oracle/oracle-spatial-graph/property_graph/dal/webapp/rexster-opg.sh
#!/bin/bash
CP=$( echo `dirname $0`/*.jar . | sed 's/ /:/g')
CP=$CP:$(find -L `dirname $0`/../../lib/ -name "*.jar" | tr '\n' ':')
CP=$CP:$(find -L `dirname $0`/../groovy/ -name "*.jar" | tr '\n' ':')
PUBLIC=`dirname $0`/../public/

# Find Java
if [ "$JAVA_HOME" = "" ] ; then
    JAVA="java -server"
else
    JAVA="$JAVA_HOME/bin/java -server"
fi

# Set Java options
if [ "$JAVA_OPTIONS" = "" ] ; then
    JAVA_OPTIONS="-Xms2G -Xmx4G "
fi

# Launch the application
$JAVA $JAVA_OPTIONS -cp $CP com.tinkerpop.rexster.Application $@ -wr $PUBLIC

# Return the program's exit code
exit $?
############################

Now we are all ready to go. To start the REST service:

sh ./rexster-opg.sh --start -c ./rexster.xml

To test the newly created REST service, run the following commands from a Linux terminal. They should create a new vertex and then read it out from the REST endpoint.

curl --output /tmp/curl.log --data "query=dummy" "http://127.0.0.1:8182/graphs/test_nosql/vertices/123?name=a&hobby=b"

curl "http://127.0.0.1:8182/graphs/test_nosql/vertices/123"

Cheers,

p.s. The following list of wget commands can help you locate the jar files needed. You can run them in a Linux terminal. Don't forget to set a proxy if your terminal is behind a corporate firewall.
echo "Downloading dependency blueprints-rexster-graph-2.3.0.jar" wget http://central.maven.org/maven2/com/tinkerpop/blueprints/blueprints-rexster-graph/2.3.0/blueprints-rexster-graph-2.3.0.jar echo "Downloading dependency commons-cli-1.2.jar" wget http://central.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar echo "Downloading dependency commons-configuration-1.6.jar" wget http://central.maven.org/maven2/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar echo "Downloading dependency grizzly-core-2.2.16.jar" wget http://central.maven.org/maven2/org/glassfish/grizzly/grizzly-core/2.2.16/grizzly-core-2.2.16.jar echo "Downloading dependency grizzly-http-2.2.16.jar" wget http://central.maven.org/maven2/org/glassfish/grizzly/grizzly-http/2.2.16/grizzly-http-2.2.16.jar echo "Downloading dependency grizzly-http-server-2.2.16.jar" wget http://central.maven.org/maven2/org/glassfish/grizzly/grizzly-http-server/2.2.16/grizzly-http-server-2.2.16.jar echo "Downloading dependency grizzly-http-servlet-2.2.16.jar" wget http://central.maven.org/maven2/org/glassfish/grizzly/grizzly-http-servlet/2.2.16/grizzly-http-servlet-2.2.16.jar echo "Downloading dependency javassist-3.15.0-GA.jar" wget http://central.maven.org/maven2/org/javassist/javassist/3.15.0-GA/javassist-3.15.0-GA.jar echo "Downloading dependency javax.servlet-api-3.0.1.jar" wget http://central.maven.org/maven2/javax/servlet/javax.servlet-api/3.0.1/javax.servlet-api-3.0.1.jar echo "Downloading dependency jersey-core-1.17.jar" wget http://central.maven.org/maven2/com/sun/jersey/jersey-core/1.17/jersey-core-1.17.jar echo "Downloading dependency jersey-json-1.17.jar" wget http://central.maven.org/maven2/com/sun/jersey/jersey-json/1.17.1/jersey-json-1.17.1.jar echo "Downloading dependency jersey-server-1.17.jar" wget http://central.maven.org/maven2/com/sun/jersey/jersey-server/1.17/jersey-server-1.17.jar echo "Downloading dependency jersey-servlet-1.17.jar" wget http://central.maven.org/maven2/com/sun/jersey/jersey-servlet/1.17/jersey-servlet-1.17.jar echo "Downloading dependency jung-algorithms-2.0.1.jar" wget http://central.maven.org/maven2/net/sf/jung/jung-algorithms/2.0.1/jung-algorithms-2.0.1.jar echo "Downloading dependency jung-api-2.0.1.jar" wget http://central.maven.org/maven2/net/sf/jung/jung-api/2.0.1/jung-api-2.0.1.jar echo "Downloading dependency jung-visualization-2.0.1.jar" wget http://central.maven.org/maven2/net/sf/jung/jung-visualization/2.0.1/jung-visualization-2.0.1.jar echo "Downloading dependency msgpack-0.6.5.jar" wget http://central.maven.org/maven2/org/msgpack/msgpack/0.6.5/msgpack-0.6.5.jar echo "Downloading dependency rexster-core-2.3.0.jar" wget http://central.maven.org/maven2/com/tinkerpop/rexster/rexster-core/2.3.0/rexster-core-2.3.0.jar echo "Downloading dependency rexster-protocol-2.3.0.jar" wget http://central.maven.org/maven2/com/tinkerpop/rexster/rexster-protocol/2.3.0/rexster-protocol-2.3.0.jar echo "Downloading dependency rexster-server-2.3.0.jar" wget http://central.maven.org/maven2/com/tinkerpop/rexster/rexster-server/2.3.0/rexster-server-2.3.0.jar      

Persisting Results of Graph Analytics into BDSG Graph Database (Part I)

From time to time, I have received questions on how to persist analytical results back to the graph database. There are many reasons for doing this: performance, persistence, the possibility to build a text index, and ease of query, to name a few. In two installments, I am going to share with you good practices on persisting results of graph analytics back to a BDSG graph database.

Part I: When # of results to be persisted is small

In this case, one can simply update properties of vertices (and/or edges) on the fly with the newly computed values. As an illustration, the following Java code snippet (running in Groovy) shows a flow of getting an instance of OraclePropertyGraph, executing personalized page rank, selecting the top 50 influencers, and persisting those page rank values back to the relevant vertices in the graph.

opg = OraclePropertyGraph.getInstance(cfg);
...
ppr = analyst.personalizedPagerank(pgxGraph, vertexSet, ... );
it = ppr.getTopKValues(50).iterator();
while (it.hasNext()) {
  entry = it.next();
  vid = entry.getKey().getId();
  v = opg.getVertex(vid);
  v.setProperty("pr_value", entry.getValue());
}
opg.commit();  // flush the property updates to the backend database

Basically, the important thing here is that we fetch the vertex ID (vid) from the result, ask OraclePropertyGraph for the corresponding vertex object (v), and then update it with the newly computed value. Note that you should only take this approach when the number of results is relatively small. The next installment will talk about how one can save, say, whole-graph page rank values or all cluster assignments, back to the graph database.
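Once the loop completes, a persisted value can be read back through the same DAL API as a quick sanity check; a minimal hedged sketch reusing vid from the snippet above:

// Fetch the vertex again and print the persisted page rank value
v = opg.getVertex(vid);
System.out.println(v.getProperty("pr_value"));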

A Use Case In Which Direction of Graph Edges Matters

This work was jointly done with Charles Wang. Recently, we were involved in a Proof of Concept (PoC) and one key objective was to identify influencers in a network of investors, banks, and companies. Very naturally, a property graph was used to model investors, financial institutions, companies, and their relationships. One crucial relationship in this application is ownership, which defines how much one individual or one financial institution owns of a particular company.

As shown in a previous blog post [1], one can use Page Rank, a built-in analytic of Oracle Big Data Spatial and Graph (BDSG), to find influencers and their influence denoted by a non-negative numeric value. In this PoC, however, a direct application of Page Rank is not suitable because the original Page Rank algorithm does not take into consideration weights on edges. In other words, Page Rank is not applicable for this PoC because it cares only about the topology of a graph, not the properties of edges. Fortunately, BDSG offers a weighted page rank that takes edge weights into consideration. There is a catch, however, that deserves attention in order to get the desired output.

Let's first take a step back and examine general relationships in a graph. Normally, there are at least two ways to assert a relationship. Take a father-daughter relationship, for example: to express "John is Mary's father", we can say

(John) -- [father_of] --> (Mary)

or we can also claim

(John) <-- [has_father] -- (Mary)

Assuming "father_of" is simply the inverse of "has_father", the two assertions above are equivalent except that the direction of the edge differs. The direction here is not critical for graph navigation, because in order to navigate from John to Mary we can follow either an outgoing edge or an incoming edge. Not a challenge one way or another. The choice of edge direction, however, plays a crucial role in page rank and also in weighted page rank. If there is an edge from vertex A to vertex B, it implies that A's page rank value will make a positive contribution to B's page rank value, but not the other way around!

Now, back to the PoC. To model the relationship between an investor X (which could be an individual or a financial institution) and a company C, and to make sure X's influence (or importance) contributes to C's influence, we need to create the following edge and attach a weight property whose value measures the amount of the asset X owns of C.

X -- [investor_of] --> C    with weight = 19888.6

Thanks,

[1] https://blogs.oracle.com/bigdataspatialgraph/entry/identifying_influencers_with_the_built
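p.s. To make the modeling concrete, here is a minimal Groovy sketch of creating such a weighted ownership edge with the DAL API used elsewhere on this blog (the vertex/edge IDs are hypothetical; the weight value is the one from the example above):

// X (investor) and C (company); the edge direction lets X's influence flow to C
x = opg.addVertex(100l);
c = opg.addVertex(200l);
e = opg.addEdge(1000l, x, c, 'investor_of');
e.setProperty('weight', 19888.6);  // amount of C that X owns
opg.commit();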

Spatial + BIWA Summit 2017: Sessions on Big Data Spatial & Graph Technologies

Technical sessions, hands-on labs, and use cases on graph and spatial big data technologies will be shared at the Oracle Spatial + BIWA Summit 2017: THE Big Data + Analytics + Spatial + Cloud + IoT + Everything Cool User Conference (http://www.biwasummit.org), Jan. 31 - Feb. 2 at the Oracle Conference Center, Redwood Shores, CA. Our experts from Oracle as well as leading partners and customers will lead sessions and share their insights.

Sessions will include:

A Shortest Path to Using Graph Technologies -- Best Practices in Graph Construction, Indexing, Analytics and Visualization - Zhe Wu & Hans Viehmann, Oracle
Build Recommender Systems, Detect Fraud, and Integrate Deep Learning with Graph Technologies - Zhe Wu, Oracle / Chris Nicholson, Skymind
Combining Graph and Machine Learning Technologies using R - Mark Hornick & Hassan Chafi, Oracle
Building a Tax Fraud Detection Platform with Big Data Graph Technologies - Wojciech Wcislo, Oracle
Analysing the Panama Papers with Oracle Big Data Spatial and Graph - Robin Moffatt, Rittman Mead
Visualizing Graph Data with Geographical Information - Kevin Madden, Tom Sawyer
Context Aware GeoSocial Graph Mining - Ugur Demiryurek, USC
Understanding how a Tweet goes Viral using Oracle Big Data Spatial and Graph - Mark Rittman, MJR Associates
Bringing Location Intelligence To Big Data Applications on Spark, Hadoop, and NoSQL Hands-On Lab - Siva Ravada / Eve Kleiman, Oracle

Full tracks over 3 days will also cover Oracle's big data, business intelligence, data warehousing, spatial, cloud, analytics, machine learning, and IoT technologies in depth. Registration is now available at http://www.biwasummit.org. Hope to see you at this great event!

Connecting a Windows Client to Oracle Big Data Spatial and Graph Property Graph Stored in a Secure Apache HBase

This is a joint work with Hugo Hlabra, Gabriela Montiel (primary contributors), and Juan Fco. Garcia.

This blog post shows detailed steps on how a client application running on Windows (Windows 7) can communicate with a secure CDH cluster and use a property graph stored in a secure Apache HBase. In the following, we will demonstrate how to configure your Windows machine to authenticate to a Kerberos KDC in order to access property graph data stored in a secure Apache HBase. To get started, we need to configure the Windows client machine so it is able to locate the KDC and obtain a ticket to authenticate itself to the secure Apache HBase service. To do this, we need to create a Kerberos configuration file krb5.ini (equivalent to krb5.conf on Linux) in the Windows/ directory. The following are two options to create the Kerberos configuration file:

Option 1: Navigate to the Windows directory under "C:\Windows" and create/edit a file named "krb5.ini" with the following contents:

[domain_realm]
.example.com = "EXAMPLE.COM"
example.com = "EXAMPLE.COM"

[libdefaults]
default_realm = "EXAMPLE.COM"

[realms]
EXAMPLE.COM = {
admin_server = "myhost.example.com"
kdc = "myhost.example.com"
master_kdc = "myhost.example.com"
}

You must configure the realm name, the realm mappings, and the KDC server according to the Kerberos realm and KDC specified in the Kerberos configuration of the machines that the secure Apache HBase uses (in this case EXAMPLE.COM). In order to successfully obtain a ticket using Kerberos, it is important that you have a default_realm specified in the [libdefaults] section. If you don't have any default realm specified, you should use the realm specified in the Kerberos configuration used with the secure Apache HBase. With this configuration, your Windows machine will be able to find the realm and get a ticket from the KDC.

Option 2: Use a graphical user interface to create and configure the file through the Network Identity Manager software provided by MIT. To do so:

Download the MIT Kerberos software from the following URL: http://web.mit.edu/kerberos/dist/kfw/3.2/kfw-3.2.1/kfw-3-2-1.exe
Execute the downloaded file and follow the instructions below to install it.
Choose a language from the combo box and click OK.
Click Next on the welcome screen of the installer.
Accept the license by clicking the "I agree" button.
Select the components to install (as a minimum you must select the KfW client) and click Next.
Select a location to install the program and click Next.
Select the "Use packaged configuration files for the ATHENA.MIT.EDU realm." option and click Next (this option creates the krb5.ini file for you).
Select whether you want to auto-start this software with your Windows login (as we are only using this software to create the krb5.ini file, you can deselect this option) and click Install.
Wait for the installer to finish and click Finish.
Use the Start button in Windows, and from the "Kerberos for Windows" directory open the Network Identity Manager software. Be sure to run it as administrator by right-clicking on it.
On the menu bar, click "Options" and then "Kerberos v5…"
Navigate to the "Realms" section and click in the "<New realm…>" table to create an entry.
Configure the realm entry according to the Kerberos realm and KDC specified in the Kerberos configuration of the machines that the secure Apache HBase uses.
Click OK.

After following these steps, your Windows machine will have the krb5.ini file created and configured to contact your KDC and get a ticket from it.
Assume Oracle Big Data Spatial and Graph is installed under a directory on the Windows machine. For simplicity, we refer to this directory as %OPG_HOME%. The property graph directory should contain the following structure:

At this point we are ready to connect to the secure Apache HBase and work with property graph features. To do so, it is necessary to specify some additional security parameters to the PgHbaseGraphConfigBuilder to denote the security authentication for Hadoop and HBase, the Kerberos principals for the Apache HBase region server and master, as well as the user credentials we will be using to connect to the database. Here is an example code snippet of what your configuration should look like; be sure to customize it for your own setup.

String szQuorum = "my.cdh.secure.host.com";
String szGraphName = "social_graph";

PgHbaseGraphConfigBuilder builder = GraphConfigBuilder.forPropertyGraphHbase();
builder = builder.setName(szGraphName);
builder = builder.setZkQuorum(szQuorum);
builder = builder.setZkClientPort(2181);
builder = builder.setInitialEdgeNumRegions(3);
builder = builder.setInitialVertexNumRegions(3);

// These parameters are used for secure HBase connections;
// they must not be null or empty
builder = builder.setRsKerberosPrincipal("hbase/_HOST@EXAMPLE.COM");
builder = builder.setHmKerberosPrincipal("hbase/_HOST@EXAMPLE.COM");
builder = builder.setUserPrincipal(szArgs[0]);   // user principal, first program argument
builder = builder.setHbaseSecAuth("kerberos");
builder = builder.setHadoopSecAuth("kerberos");
builder = builder.setKeytab(szArgs[1]);          // path to keytab, second program argument

builder = builder.setZkSessionTimeout(Integer.parseInt("3600"));
builder = builder.setMaxNumConnections(Integer.parseInt("4"));
PgHbaseGraphConfig config = builder.build();

OraclePropertyGraph opg = oracle.pg.hbase.OraclePropertyGraph.getInstance(config);

// Add a vertex
Vertex v1 = opg.addVertex(1l);
v1.setProperty("age", Integer.valueOf(18));
v1.setProperty("name", "Name");
v1.setProperty("weight", Float.valueOf(30.0f));
v1.setProperty("height", Double.valueOf(1.70d));
v1.setProperty("female", Boolean.TRUE);
opg.commit();

System.out.println("Fetch 1 vertex: " + opg.getVertices().iterator().next());
opg.shutdown();

You can create a Java application with this code snippet and compile it using a classpath with all the jar files located in the %OPG_HOME%\lib directory, as follows:

javac -cp %OPG_HOME%\lib\* YourJavaProgram.java

Finally, we need to configure one property that will tell Hadoop how to match the principal names and be able to authenticate. In order to do this, create a file with the name "core-site.xml" and add the following content to it:

<configuration>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>RULE:[1:$1]
    RULE:[2:$1]</value>
  </property>
</configuration>

Be sure to add the directory containing "core-site.xml" to the classpath when running your Java program. With that set, we are now able to run our Java program and successfully connect to a secure Apache HBase cluster to create a property graph!

java -cp <directory_of_core-site.xml>;%OPG_HOME%\lib\* YourJavaProgram <user> <path_to_keytab>

Be sure to check our next entry, where we will show how to visualize a property graph with Cytoscape and run interesting analytics, all in the realm of a secure CDH setup!

Acknowledgement: thanks to Jay Banerjee and Steven Serra for their input on this blog post.

Oracle Big Data Spatial and Graph: Using Apache Spark with Property Graph and Oracle NoSQL Database

This is a joint work with Eduardo Pacheco and Gabriela Montiel (primary contributors).

Following our previous blog on how to integrate Apache Spark with Oracle Big Data Spatial and Graph, this blog will now guide you on how to read a property graph stored in Oracle NoSQL Database into Apache Spark in order to execute Spark SQL queries on the graph data.

Let's bring back our property graph named socialNet, this time stored in Oracle NoSQL Database using Oracle Big Data Spatial and Graph. This graph describes the relationships among a set of individuals, where each vertex is identified by an ID and has a single property (name). Likewise, each edge in the graph is composed of an ID, a label, and incoming and outgoing vertices. See the following figure.

Property graphs in Oracle NoSQL Database are stored in two main tables: <graph_name>VT_ for vertices data and <graph_name>GE_ for edges data. Thus socialNetVT_ and socialNetGE_ are the table names in Oracle NoSQL Database containing the information of vertices and edges, respectively.

To execute SQL queries over property graph data using Spark, remember we need to create a Spark application that loads the graph data into RDDs. To create our Spark application, we first need to import the following Apache Spark libraries:

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel
import scala.collection.mutable.HashMap
import collection.JavaConversions._
import scala.io.Source
import org.apache.spark.sql.DataFrame

Since we need to read rows out of Oracle NoSQL Database, we need a few more packages:

import oracle.kv.hadoop.table.TableInputFormat
import oracle.kv.table.PrimaryKey;
import oracle.kv.table.Row;
import org.apache.hadoop.conf.Configuration;

After that, we create a SparkContext so our Spark program will become the driver application and we will be able to create RDDs out of the graph data. The following snippet of code describes how to get a Spark context in Scala. We also set the application name and the master machine:

val sc = new SparkContext(new SparkConf()
  .setAppName("Example")
  .setMaster("spark://localhost:7077"))
val sqlCtx = new SQLContext(sc)

Since our purpose is to run SQL queries on our graph data, we also need to create a SQLContext. Notice that if we are using the Spark shell we can skip this step, since the Spark shell will do it for us. Using the SparkContext that we just created, we can easily get an RDD out of a Hadoop file by invoking the newAPIHadoopRDD method. This time we will create two Hadoop Configuration objects to access the vertices and edges tables stored in Oracle NoSQL Database. For this, we need to specify the parameters to connect to Oracle NoSQL Database as well as the names of the graph tables socialNetVT_ and socialNetGE_.

@transient val noSQLNodeConf = new Configuration();
noSQLNodeConf.set("oracle.kv.kvstore", "kvstore")
noSQLNodeConf.set("oracle.kv.tableName", "socialNetVT_")
noSQLNodeConf.set("oracle.kv.hosts", "node41:5000")

@transient val noSQLEdgeConf = new Configuration()
noSQLEdgeConf.set("oracle.kv.kvstore", "kvstore")
noSQLEdgeConf.set("oracle.kv.tableName", "socialNetGE_")
noSQLEdgeConf.set("oracle.kv.hosts", "node41:5000")

Note that the file's format must meet Hadoop's InputFormat specification. We can use the Oracle NoSQL TableInputFormat as the input format for our tables. The classes of the keys and values of the tables should be mapped to the PrimaryKey and Row classes in Oracle NoSQL Database.
From there we can create one RDD (oracleRowVertices) from the vertices table and a second RDD from the edges table (oracleRowEdges). Both of them are RDDs of type RDD[(PrimaryKey, Row)], where each Row object contains the data of exactly one vertex or edge stored in Oracle NoSQL Database.

val oracleRowVertices = sc.newAPIHadoopRDD(noSQLNodeConf,
  classOf[oracle.kv.hadoop.table.TableInputFormat]
    .asSubclass(classOf[org.apache.hadoop.mapreduce.InputFormat[PrimaryKey, Row]]),
  classOf[PrimaryKey],
  classOf[Row])

val oracleRowEdges = sc.newAPIHadoopRDD(noSQLEdgeConf,
  classOf[oracle.kv.hadoop.table.TableInputFormat]
    .asSubclass(classOf[org.apache.hadoop.mapreduce.InputFormat[PrimaryKey, Row]]),
  classOf[PrimaryKey],
  classOf[Row])

Let us define a transformation row2Vertex for getting the vertex information out of each Row object. This transformation takes an Oracle NoSQL Row object and retrieves the Map object containing each property defining a vertex, in this case just "name". The value nameArr is an array of bytes of length two: the first element is a byte representing the type of the property, and the second one is the property value. Similarly, the transformation row2Edge creates an Edge instance. If the corresponding edge has more properties, they can be retrieved by getting the corresponding Map object as in the case of vertices, that is: noSqlRow.get("kvs").asMap.

case class Vertex(Id : Long, name : String)
case class Edge(Id : Long, source : Long, target : Long, label : String)

def row2Vertex(noSqlRow : Row) : Vertex = {
  val nameArr : Array[Byte] = noSqlRow.get("kvs").asMap.get("name").asString.get.getBytes
  val name = new String(Array(nameArr(1)))
  Vertex(noSqlRow.get("vid").asLong.get, name)
}

def row2Edge(noSqlRow : Row) : Edge = {
  Edge(noSqlRow.get("eid").asLong.get, noSqlRow.get("svid").asLong.get,
    noSqlRow.get("dvid").asLong.get, noSqlRow.get("el").asString.get)
}

Now we can apply these transformations to each Row object in oracleRowVertices and oracleRowEdges to obtain RDDs of types RDD[Vertex] and RDD[Edge]; we do so in the following snippet of code.

val verticesRDD = oracleRowVertices.values.map(row => row2Vertex(row))
val edgesRDD = oracleRowEdges.values.map(row => row2Edge(row))

Simple, right? At this point we just need to create DataFrames from the RDDs using the SQL context to run our Spark SQL queries. Remember that the case classes Vertex and Edge play an important role here, since Spark uses them to figure out the data frame's column names. After creating the DataFrames, we register them as temporary tables.

val verticesDF = sqlCtx.createDataFrame(verticesRDD)
verticesDF.registerTempTable("VERTICES_TABLE")
val edgesDF = sqlCtx.createDataFrame(edgesRDD)
edgesDF.registerTempTable("EDGES_TABLE")

Now we are ready for queries. For instance, let us find all of Jay's friends other than Miriam.

sqlCtx.sql("select name from (select target from EDGES_TABLE WHERE source = 2) REACHABLE left join VERTICES_TABLE on VERTICES_TABLE.id = REACHABLE.target WHERE name != 'Miriam'").show

Thanks!

Acknowledgement: thanks to Jay Banerjee and Steven Serra for their input on this blog post.

Integrating Apache Spark with Oracle Big Data Spatial and Graph Property Graph

This is a joint work with Eduardo Pacheco and Gabriela Montiel (primary contributors).

Apache Spark is an exciting open source big-data framework that allows data scientists to do in-memory computations. It allows one to process large amounts of data efficiently. Apache Spark comes with a set of powerful libraries for processing data, namely SQL, MLlib, Spark Streaming, and DataFrames, and it is able to read data from different sources like HDFS and Apache HBase. Because of all this, Apache Spark can be an excellent complementary technology to our Oracle Big Data Spatial and Graph (BDSG) Property Graph feature.

This blog post will guide you through the process of loading instances of BDSG property graphs into Spark in order to query them using Spark SQL. We assume, in this case, that the graph is stored in Apache HBase. Throughout this blog we will use Scala 2.10.4 and Spark 1.6 (included in CDH 5.7). One advantage of using Scala is that you can simply copy-paste the provided snippets of code into Spark's shell (small modifications are required based on your setup) to start playing with your data in Spark. We assume that the reader has some basic knowledge of Spark and Scala.

Let's start with a property graph named socialNet stored in Apache HBase using Oracle Big Data Spatial and Graph. This graph describes the relationships among a set of individuals; see Figure 1. As the figure shows, each vertex is identified by an ID and has a single property (name). And each edge in the graph has an ID, a label, and incoming and outgoing vertices.

BDSG Property Graph for Apache HBase stores the vertices and edges information of a graph in two main HBase tables: <graph_name>VT. and <graph_name>GE., respectively. Thus socialNetVT. and socialNetGE. are the table names in HBase containing the information of vertices and edges of our example graph. To execute SQL queries over property graph data using Spark, we need to create a Spark application that loads the graph data into RDDs. We will create an RDD for each table (socialNetVT. and socialNetGE.). To create our Spark application, we first need to import the following libraries:

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel
import scala.collection.mutable.HashMap
import collection.JavaConversions._
import scala.io.Source
import org.apache.spark.sql.DataFrame

Additionally, we need the following libraries to read rows out of the HBase tables:

import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, CellUtil, Cell}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes

The first thing a Spark program must do is create a SparkContext object, which is required to create RDDs. The following snippet of code describes how to get a Spark context in Scala, specifying the application name and the master:

val sc = new SparkContext(new SparkConf()
  .setAppName("Example")
  .setMaster("spark://localhost:7077"))
val sqlCtx = new SQLContext(sc)

Since our purpose is running SQL queries on our graph data, we also need to create a SQLContext. Notice that if we are using Spark's shell we can skip this step, since Spark's shell will take care of it for us. Using the SparkContext that we just created, we can easily get an RDD out of a Hadoop file by invoking the newAPIHadoopRDD method.
The first step is to create Hadoop Configuration objects to access the vertices and edges stored in Apache HBase. Note that the file's format must meet Hadoop's InputFormat specification; therefore we can easily read the data from our socialNetVT. and socialNetGE. tables using this method. A Configuration object specifies the parameters used to connect to Apache HBase, such as the Zookeeper quorum, the Zookeeper client port, and the table name.

@transient val hBaseConfNodes = HBaseConfiguration.create()
hBaseConfNodes.set(TableInputFormat.INPUT_TABLE, "socialNetVT.")
hBaseConfNodes.set("hbase.zookeeper.quorum", "node041,node042,node043")
hBaseConfNodes.set("hbase.zookeeper.port", "2181")

@transient val hBaseConfEdges = HBaseConfiguration.create()
hBaseConfEdges.set(TableInputFormat.INPUT_TABLE, "socialNetGE.")
hBaseConfEdges.set("hbase.zookeeper.quorum", "node041,node042,node043")
hBaseConfEdges.set("hbase.zookeeper.port", "2181")

We pass this Configuration object to newAPIHadoopRDD. We also pass in the class of the InputFormat and the classes of its keys and values. We will create an RDD (called bytesResultVertices) for the vertices table and an RDD for the edges table (bytesResultEdges). Both of them are RDDs of type RDD[(ImmutableBytesWritable, Result)], where each Result object contains the data of one vertex or one edge.

val bytesResultVertices = sc.newAPIHadoopRDD(
  hBaseConfNodes,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

val bytesResultEdges = sc.newAPIHadoopRDD(
  hBaseConfEdges,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

Let us define a transformation res2Vertex to get the vertex information out of each Result object in bytesResultVertices. This simple transformation takes an HBase Result object, gets the corresponding vertex id and name, and creates a Vertex instance out of this data. Similarly, the transformation res2Edge does the same to get an Edge instance.

case class Vertex(Id : Long, name : String)
case class Edge(Id : Long, source : Long, target : Long, label : String)

def res2Vertex(res : Result) : Vertex = {
  val id = Bytes.toLong(res.getRow(), 8)
  val family = CellUtil.cloneFamily(res.rawCells()(0))
  val map = res.getFamilyMap(family)
  val bname = Bytes.toBytes("kname")
  val bdefault = Bytes.toBytes("null")
  val name = Bytes.toString(map.getOrElse(bname, bdefault)).trim
  Vertex(id, name)
}

def res2Edge(res : Result) : Edge = {
  val id = Bytes.toLong(res.getRow(), 8)
  val family = CellUtil.cloneFamily(res.rawCells()(0))
  val map = res.getFamilyMap(family)
  val bsource = Bytes.toBytes("ao")
  val btarget = Bytes.toBytes("ai")
  val blabel = Bytes.toBytes("al")
  val bdefault = Bytes.toBytes("null")
  val ksource = Bytes.toLong(map.getOrElse(bsource, bdefault))
  val ktarget = Bytes.toLong(map.getOrElse(btarget, bdefault))
  val klabel = Bytes.toString(map.getOrElse(blabel, bdefault)).trim
  Edge(id, ksource, ktarget, klabel)
}

Now we can apply these transformations to each Result object in bytesResultVertices and bytesResultEdges to obtain RDD[Vertex] and RDD[Edge]; we do so in the following snippet of code.

val verticesRDD = bytesResultVertices.values.map(result => res2Vertex(result))
val edgesRDD = bytesResultEdges.values.map(result => res2Edge(result))

Simple, right? At this point we are almost ready to run SQL queries on our graph. To do so, we need to create a DataFrame object from our RDDs using the SQL context we defined at the beginning and the method createDataFrame.
The case classes Vertex and Edge play an important role here, since Spark uses them in order to figure out the DataFrame's column names. After creating the DataFrames, we register them as temporary tables.

val verticesDF = sqlCtx.createDataFrame(verticesRDD)
verticesDF.registerTempTable("VERTICES_TABLE")
val edgesDF = sqlCtx.createDataFrame(edgesRDD)
edgesDF.registerTempTable("EDGES_TABLE")

Now we are set to run queries. For instance, let us find all of Hugo's friends.

sqlCtx.sql("select name from (select target from EDGES_TABLE WHERE source = 1) REACHABLE left join VERTICES_TABLE on VERTICES_TABLE.id = REACHABLE.target").show

So that's it, folks! In addition to reading graph data directly out of Apache HBase and performing operations on the graph in Apache Spark, BDSG also has a very useful feature that allows one to use the in-memory analyst to analyze graph data in Apache Spark. See Section 6.14 in [1] for details.

Acknowledgement: thanks to Jay Banerjee for his input on this blog post.

[1] http://docs.oracle.com/bigdata/bda46/BDSPA/using-in-memory-analyst.htm#BDSPA362

Adding Visualization to Python-Notebook Based BDSG Property Graph Workflow

This is a joint work with Isai Barajas (primary contributor) and John Wyant.

Python Notebook is a very convenient tool for building a workflow/demo based on Oracle Big Data Spatial and Graph (BDSG) property graph. In this blog, we are going to demonstrate how to add a bit of visualization to a Python-Notebook based property graph workflow.

Requirements: Python 2.7, JPype 0.5.7 and pyopg (under /opt/oracle/oracle-spatial-graph), D3.js version 3, Big Data Lite VM 4.6, and Oracle Big Data Spatial and Graph v2.0.

Detailed steps are as follows. Please cut & paste the code snippets into Python Notebook.

Step 1: Specify a few necessary libraries and imports

import matplotlib as mpl
import matplotlib.pyplot as plt
import sys
default_stdout = sys.stdout
default_stderr = sys.stderr
reload(sys)
sys.setdefaultencoding("utf-8")
sys.stdout = default_stdout
sys.stderr = default_stderr
from pyopg.core import *
pgx_config = JPackage('oracle.pgx.config')
pgx_types = JPackage('oracle.pgx.common.types')
pgx_control = JPackage('oracle.pgx.api')
hbase = JPackage('oracle.pg.hbase')

Step 2: Create a graph configuration

graph_builder = pgx_config.GraphConfigBuilder.forPropertyGraphHbase() \
    .setName("connections").setZkQuorum("bigdatalite").setZkClientPort(2181) \
    .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3) \
    .setInitialVertexNumRegions(3).setSplitsPerRegion(1)

graph_builder.addEdgeProperty("weight", pgx_types.PropertyType.DOUBLE, "1000000")

Step 3: Read the "connections" graph into the in-memory analyst

opg = hbase.OraclePropertyGraph.getInstance(graph_builder.build())
pgx_param = JClass("java.util.HashMap")()
instance = JClass("oracle.pgx.api.Pgx").getInstance()
if not instance.isEngineRunning():
    instance.startEngine(pgx_param)
session = instance.createSession("my_recommender_session1")
analyst = session.createAnalyst()

pgxGraph = session.readGraphWithProperties(opg.getConfig(), True)
pgxGraph.getNumEdges()

Step 3.1 (optional): Read out a few vertices

for element in range(1, 10, 1):
    vertex = opg.getVertex(element)
    print 'Vertex ID: ' + str(element) + ' - Name: ' + vertex.getProperty("name")

Vertex ID: 1 - Name: Barack Obama
Vertex ID: 2 - Name: Beyonce
...
Step 4: Create JSON objects (nodes, links) out of the edges (and vertices) that we want to visualize

# Get edges
edges = opg.getEdges().iterator()
edge = edges.next()

# Lists for nodes and links
nodes = []
links = []
names = []
sources = []
targets = []
values = []

# Get nodes
for count in range(1, 20, 1):
    # Vertex values
    outVertexName = edge.getOutVertex().getProperty("name")
    outVertexRole = edge.getOutVertex().getProperty("country")
    inVertexName = edge.getInVertex().getProperty("name")
    inVertexRole = edge.getInVertex().getProperty("country")

    # Add out vertex
    if {"name": outVertexName, "group": outVertexRole} not in nodes:
        nodes.append({"name": outVertexName, "group": outVertexRole})
        names.append(outVertexName)

    # Add in vertex
    if {"name": inVertexName, "group": inVertexRole} not in nodes:
        nodes.append({"name": inVertexName, "group": inVertexRole})
        names.append(inVertexName)

    # Edge information
    sources.append(outVertexName)
    targets.append(inVertexName)
    values.append(edge.getLabel())

    # Next edge
    edge = edges.next()

# Get links
for count in range(0, 19, 1):
    # Vertex values
    outVertexName = sources[count]
    inVertexName = targets[count]

    # Edge values
    source = names.index(outVertexName)
    target = names.index(inVertexName)
    value = values[count]
    links.append({"source": source, "target": target, "value": value})

from IPython.display import Javascript
import json

# Transform the graph into a JSON graph
data = {"nodes": nodes, "links": links}
jsonGraph = json.dumps(data, indent=4)

# Send to Javascript
Javascript("""window.jsonGraph={};""".format(jsonGraph))

Step 5: Set up a <div>...</div> for graph plotting

%%html
<div id="d3-example"></div>
<style>
.node {stroke: #fff; stroke-width: 1.5px;}
.link {stroke: #999; stroke-opacity: 5.6;}
</style>

[The rendered output here is a force-directed graph of the "connections" network; node tooltips show each person's Name and Country, and link tooltips show edge labels such as leads, admires, collaborates, and feuds.]

Step 6: Graph processing with the D3 force-directed layout

%%javascript
// We load the d3.js library from the Web.
require.config({paths: {d3: "http://d3js.org/d3.v3.min"}});
require(["d3"], function(d3) {
    // The code in this block is executed when the
    // d3.js library has been loaded.
    // First, we specify the size of the canvas containing
    // the visualization (size of the <div> element).
    var width = 800, height = 600;

    // We create a color scale.
    var color = d3.scale.category20();

    // We create a force-directed dynamic graph layout.
    var force = d3.layout.force()
        .charge(-300)
        .linkDistance(100)
        .size([width, height]);

    // In the <div> element, we create a <svg> graphic
    // that will contain our interactive visualization.
    var svg = d3.select("#d3-example").select("svg")
    if (svg.empty()) {
        svg = d3.select("#d3-example").append("svg")
            .attr("width", width)
            .attr("height", height);
    }

    // We load the JSON graph we generated from iPython input
    var graph = window.jsonGraph;

    plotGraph(graph);

    // Graph plot function
    function plotGraph(graph) {
        // We load the nodes and links in the force-directed graph.
        force.nodes(graph.nodes)
            .links(graph.links)
            .start();

        // We create a <line> SVG element for each link in the graph.
        var link = svg.selectAll(".link")
            .data(graph.links)
            .enter().append("line")
            .attr("class", "link")
            .attr("stroke-width", 7);

        // Link value
        link.append("title")
            .text(function(d) { return d.value; });

        // We create a <circle> SVG element for each node
        // in the graph, and we specify a few attributes.
        var node = svg.selectAll(".node")
            .data(graph.nodes)
            .enter().append("circle")
            .attr("class", "node")
            .attr("r", 16) // radius
            .style("fill", function(d) {
                // The node color depends on the group (country).
                return color(d.group);
            })
            .call(force.drag);

        // The tooltip of each node shows its name and country.
        node.append("title")
            .text(function(d) {
                var info = "Name: " + d.name + "\n" + "Country: " + d.group;
                return info;
            });

        // Text over nodes
        var text = svg.append("g")
            .selectAll("text")
            .data(force.nodes())
            .enter().append("text")
            .attr("x", function(d) { return -10 })
            .attr("y", 0)
            .style("font-size", "10px")
            .text(function(d) {
                if (d.name.length > 15) {
                    return d.name.substring(0, 15) + "...";
                }
                return d.name;
            });

        // We bind the positions of the SVG elements
        // to the positions of the dynamic force-directed graph,
        // at each time step.
        force.on("tick", function() {
            link.attr("x1", function(d) { return d.source.x; })
                .attr("y1", function(d) { return d.source.y; })
                .attr("x2", function(d) { return d.target.x; })
                .attr("y2", function(d) { return d.target.y; });
            node.attr("cx", function(d) { return d.x; })
                .attr("cy", function(d) { return d.y; });
            text.attr("transform", function(d) {
                return "translate(" + d.x + "," + d.y + ")";
            });
        });
    }
});

Acknowledgement: thanks to Jay Banerjee for his input on this blog post.

Using iPython Notebook and BDSG Property Graph

This blog post is for users that are using Python and Oracle Big Data Spatial and Graph (BDSG). It shows how one can easily develop a demo flow with iPython Notebook and BDSG functions. As usual, we are using the famous Big Data Lite VM (version 4.5.0), and the demo flow is about building a Personalized Page Rank (PPR) based recommender system.

Step 1: Start iPython Notebook (for brevity, setup of iPython is omitted) with "$ ipython notebook --no-mathjax". Type in the code in the browser page (Notebook); the first few lines set the UTF-8 encoding and import several packages.

Step 2: Create a graph config. Note that the graph data is stored in Apache HBase. The graph name is "user_movie" and this property graph has two kinds of vertices: users and movies. If a user U clicked movie M, then there is an edge with label "click" from U to M. In addition, there is a reverse edge with label "clickedBy" from M to U.

Step 3: Get an instance of OraclePropertyGraph, start the in-memory analyst (PGX), and create a session for running recommendations.

Step 4: Read the property graph from Apache HBase into the in-memory analyst.

Step 5: Use text search to find a vertex with a first_name that starts with "nathan". Note that "first_name" is a property of vertices representing users in this graph.

Step 6: Say we want to recommend movies for this user "Nathaniel" we just found. Create a vertex set that includes this user "Nathaniel".

Step 7: Execute Personalized Page Rank to recommend movies (and also similar users) to Nathaniel.

Step 8: Prepare data for plotting a chart of the top personalized page rank values.

Step 9: Plot it out in iPython Notebook.

Cheers,

Acknowledgement: thanks to Jay Banerjee for his input on this blog post.

Identifying Potential Fraud Activities in Finance

There are many forms of fraud in pretty much every aspect of our social life. In this blog post, I am going to discuss one specific financial fraud that is based on a circular payment scheme. In such a scheme, a notable pattern is a clear, circular payment (or purchase) relationship that chains together a set of individuals or financial institutions. As a simple example, user A made a payment to user B, user B made a payment to user C, and user C made a payment back to user A. Granted, not every circular payment is a true fraud case. Circular payments, nevertheless, deserve scrutiny.

Starting from v1.2.0, Oracle Big Data Spatial and Graph (BDSG) property graph has support for PGQL, a declarative, SQL-like query language for querying property graph data. It has a clean and intuitive syntax; see [1] for the language specification. PGQL, together with the scalable graph database and powerful graph analytics, can be used to detect circular payments in a graph. An end-to-end flow is shown next. As usual, the Big Data Lite VM is used, and BDL 4.5 has BDSG v1.2.0 pre-installed. No vertex or edge property is used, for simplicity. It is straightforward, however, to add a name property to vertices and a weight property to edges.

// Start Groovy for BDSG against Oracle NoSQL Database (Apache HBase can also be used).
//
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh gremlin-opg-nosql.sh

server = new ArrayList<String>();
server.add("bigdatalite:5000");

// We start with creating an empty graph named "loop" in Oracle NoSQL Database
cfg = GraphConfigBuilder.forPropertyGraphNosql()
  .setName("loop").setStoreName("kvstore")
  .setHosts(server)
  .hasEdgeLabel(true).setLoadEdgeLabel(true)
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
  .setMaxNumConnections(2)
  .addVertexProperty("name", PropertyType.STRING, "empty name")
  .build();
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

// Add a cycle of 2 edges to the "loop" graph
//
a21=opg.addVertex(21l); a22=opg.addVertex(22l);
e1=opg.addEdge(2122, a21, a22, 'paid');
e2=opg.addEdge(2221, a22, a21, 'paid');
opg.commit()

// Add a cycle of 3 edges to the "loop" graph
//
a31=opg.addVertex(31l); a32=opg.addVertex(32l); a33=opg.addVertex(33l);
opg.addEdge(3132, a31, a32, 'paid');
opg.addEdge(3233, a32, a33, 'paid');
opg.addEdge(3331, a33, a31, 'paid');
opg.commit()

// Add a cycle of 4 edges to the "loop" graph
//
a41=opg.addVertex(41l); a42=opg.addVertex(42l); a43=opg.addVertex(43l); a44=opg.addVertex(44l);
opg.addEdge(4142, a41, a42, 'paid');
opg.addEdge(4243, a42, a43, 'paid');
opg.addEdge(4344, a43, a44, 'paid');
opg.addEdge(4341, a44, a41, 'paid');
opg.commit()

// Add a cycle of 5 edges to the "loop" graph
//
a51=opg.addVertex(51l); a52=opg.addVertex(52l); a53=opg.addVertex(53l); a54=opg.addVertex(54l); a55=opg.addVertex(55l);
opg.addEdge(5152, a51, a52, 'paid');
opg.addEdge(5253, a52, a53, 'paid');
opg.addEdge(5354, a53, a54, 'paid');
opg.addEdge(5455, a54, a55, 'paid');
opg.addEdge(5551, a55, a51, 'paid');
opg.commit()

// Now, read the graph into the in-memory analyst
//
session=Pgx.createSession("session_ID_1");
analyst=session.createAnalyst();
pgxGraph = session.readGraphWithProperties(opg.getConfig(), true);

// Execute a query to find a cycle with 2 hops.
// An inequality constraint is added to avoid finding a smaller cycle.
//
pgxResultSet = pgxGraph.queryPgql("SELECT n,m WHERE (n)->(m)->(n), n!=m")
pgxResultSet.print(10);
pgxResultSet.getNumResults()
// Execute a query to find a cycle with 3 hops.
// Inequality constraints are added to avoid finding a smaller cycle.
//
pgxResultSet = pgxGraph.queryPgql("SELECT n,m,o WHERE (n)->(m)->(o)->(n), n!=m, n!=o, m!=o")
pgxResultSet.print(10);
pgxResultSet.getNumResults()

// Execute a query to find a cycle with 4 hops.
// Inequality constraints are added to avoid finding a smaller cycle.
//
pgxResultSet = pgxGraph.queryPgql("SELECT n,m,o,p WHERE (n)->(m)->(o)->(p)->(n), n!=m, n!=o, n!=p, m!=o, m!=p, o!=p")
pgxResultSet.print(10);
pgxResultSet.getNumResults()

// Execute a query to find a cycle with 5 hops.
// Inequality constraints are added to avoid finding a smaller cycle.
//
pgxResultSet = pgxGraph.queryPgql("SELECT n,m,o,p,q WHERE (n)->(m)->(o)->(p)->(q)->(n), n!=m, n!=o, n!=p, n!=q, m!=o, m!=p, m!=q, o!=p, o!=q, p!=q")
pgxResultSet.print(10);
pgxResultSet.getNumResults()

An example output of the last command is as follows:

------------------------------------------------------------------------------------------------
| n                | m                | o                | p                | q                |
================================================================================================
| PgxVertex[ID=53] | PgxVertex[ID=54] | PgxVertex[ID=55] | PgxVertex[ID=51] | PgxVertex[ID=52] |
| PgxVertex[ID=52] | PgxVertex[ID=53] | PgxVertex[ID=54] | PgxVertex[ID=55] | PgxVertex[ID=51] |
| PgxVertex[ID=51] | PgxVertex[ID=52] | PgxVertex[ID=53] | PgxVertex[ID=54] | PgxVertex[ID=55] |
| PgxVertex[ID=55] | PgxVertex[ID=51] | PgxVertex[ID=52] | PgxVertex[ID=53] | PgxVertex[ID=54] |
| PgxVertex[ID=54] | PgxVertex[ID=55] | PgxVertex[ID=51] | PgxVertex[ID=52] | PgxVertex[ID=53] |
------------------------------------------------------------------------------------------------
==>null
opg-nosql> pgxResultSet.getNumResults()
==>5

You may wonder how one would detect cycles without knowing the size of the cycle. For that, we have a built-in API to detect strongly connected components (SCCs). One can easily find cycles within each SCC, and it is guaranteed that there are zero cycles across SCCs (a minimal sketch follows at the end of this post). Details on that are in an upcoming blog post. Stay tuned.

Acknowledgement: thanks to Jay Banerjee and Oskar van Rest for their input on this blog post.

[1] PGQL 0.9 Specification: https://docs.oracle.com/cd/E56133_01/1.2.0/PGQL_Specification.pdf
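p.s. As a teaser for that upcoming post: the PGX analyst exposes SCC detection directly, and it returns the same Partition structure used for community detection elsewhere on this blog. A hedged Groovy sketch (the per-component handling is elided; this is not the full upcoming flow):

// Any payment cycle lies entirely inside one strongly connected component,
// so SCCs narrow down where the cycle queries need to run.
partition = analyst.sccKosaraju(pgxGraph);
i = 0;
while (i < partition.size()) {
  component = partition.getPartitionByIndex(i);
  // components with more than one vertex are candidates for payment cycles
  i++;
}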

Oracle Big Data Spatial and Graph: Converting CSV to Oracle-defined Property Graph Serialization Format

Oracle Big Data Spatial and Graph provides converter APIs to transform graph data from a comma-separated values (CSV) format to the Oracle-defined serialization format, storing the transformed data in vertices and edges files with ".opv" and ".ope" extensions, respectively. The benefits of using the Oracle-defined serialization format are described in the post: Oracle Big Data Spatial and Graph Flat File Format: a fast, rich serialization format for Property Graphs.

The CSV format can be used to encode the vertices and edges of a graph. In this format, each record of the CSV file represents a single vertex or edge, with all its properties. The converter APIs do not require two separate CSV files. If a single CSV has both vertices and edges with corresponding relationships, then the converter APIs can be executed on the same CSV file twice (or even more times, as needed). The CSV file may include a header line specifying the column name and the data type of the attribute in the column. If the data type isn't specified, the converter APIs assume the data type is String.

The Java APIs that convert CSV to OPV or OPE receive an InputStream of vertices in CSV format and write the vertices in .opv or .ope format to an OutputStream. The converter APIs also allow customization of the conversion process. If the CSV file does not include a header, one needs to specify a ColumnToAttrMapping array describing all of the attribute names and value data types in the same order in which they appear in the CSV file. Additionally, the columns in the CSV file must be described in full in the array, including special columns such as the ID for the vertices, the ID for the edges if it applies, and the mandatory START_ID, END_ID, and TYPE columns. If you want to specify the headers for the columns in the first line of the same CSV file, then this parameter must be set to null.

Converting Vertices

To convert a CSV file representing vertices, one can use one of the convertCSV2OPV APIs. The simplest of these APIs requires:

- An InputStream to read vertices from a CSV file
- The name of the column representing the vertex ID (this column must appear in the CSV file)
- An integer offset added to the VID (an offset is useful to avoid collisions among ID values for graph elements)
- A ColumnToAttrMapping array (which must be null if headers are specified in the file)
- A degree of parallelism (DOP)
- An integer denoting the offset before beginning the conversion, in other words the number of vertex records to skip
- An OutputStream to which the vertex flat file (.opv) will be written
- An optional DataConverterListener that can keep track of the conversion progress and decide what to do if an error occurs

There are four additional parameters that can be used to specify a different format of the CSV file:

- The delimiter character, which is used to separate tokens in a record; the default is the comma character ','.
- The quotation character, which is used to quote String values so they can contain special characters, for example, commas. If a quotation character appears in the value of the String itself, it must be escaped either by duplication or by placing a backslash character '\' before it. Some examples are:
  """Hello, world"", the screen showed…"
  "But Vader replied: \"No, I am your father.\""
- The date format, which will be used to parse date values.
  For the CSV conversion, this parameter can be null, but it is recommended that it be specified if the CSV has a specific date format. Providing a specific date format helps performance, since that format will be used as the first option when trying to parse date values. Some example date formats are:
  "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
  "MM/dd/yyyy HH:mm:ss"
  "ddd, dd MMM yyyy HH':'mm':'ss 'GMT'"
  "dddd, dd MMMM yyyy hh:mm:ss"
  "yyyy-MM-dd"
  "MM/dd/yyyy"
- A flag indicating whether the CSV file contains String values with new-line characters. If this parameter is set to true, all the Strings in the file that contain new lines or quotation characters as values must be quoted, for example:
  "The first lines of Don Quixote are: ""In a village of La Mancha, the name of which I have no desire to call to mind""."

The following code fragment shows how to create a ColumnToAttrMapping array and use the API to convert a CSV file into a .opv file.

String inputCSV = "/path/mygraph-vertices.csv";
String outputOPV = "/path/mygraph.opv";

ColumnToAttrMapping[] ctams = new ColumnToAttrMapping[4];
ctams[0] = ColumnToAttrMapping.getInstance("VID", Long.class);
ctams[1] = ColumnToAttrMapping.getInstance("name", String.class);
ctams[2] = ColumnToAttrMapping.getInstance("score", Double.class);
ctams[3] = ColumnToAttrMapping.getInstance("age", Integer.class);
String vidColumn = "VID";

isCSV = new FileInputStream(inputCSV);
osOPV = new FileOutputStream(new File(outputOPV));

// Convert vertices
OraclePropertyGraphUtilsBase.convertCSV2OPV(isCSV, vidColumn, 0, ctams, 1, 0, osOPV, null);
isCSV.close();
osOPV.close();

In this example, the CSV file to be converted must not include the header and must contain four columns (one for the vertex ID, the second one for name, the third one for score, and the last one for age). An example CSV is shown below:

1,John,4.2,30
2,Mary,4.3,32
3,"Skywalker, Anakin",5.0,46
4,"Darth Vader",5.0,46
5,"Skywalker, Luke",5.0,53

The resulting .opv is as follows:

1,name,1,John,,
1,score,4,,4.2,
1,age,2,,30,
2,name,1,Mary,,
2,score,4,,4.3,
2,age,2,,32,
3,name,1,Skywalker%2C%20Anakin,,
3,score,4,,5.0,
3,age,2,,46,
4,name,1,Darth%20Vader,,
4,score,4,,5.0,
4,age,2,,46,
5,name,1,Skywalker%2C%20Luke,,
5,score,4,,5.0,
5,age,2,,53,

Converting Edges

To convert a CSV file representing edges, one can use one of the convertCSV2OPE APIs. The simplest of these APIs requires:

- An InputStream to read edges from a CSV file
- The name of the column representing the edge ID (this is optional in the CSV file; if it is not present, then the line number will be used as the ID)
- An integer offset to add to the EID (an offset is useful to avoid collisions in ID values of graph elements)
- The name of the column representing the source vertex ID (this column must appear in the CSV file)
- The name of the column representing the destination vertex ID (this column must appear in the CSV file)
- A boolean flag indicating whether the edge label column is present in the CSV file
- The name of the column representing the edge label (if this column is not present in the CSV file, then this parameter will be used as a constant for all edge labels)
- A ColumnToAttrMapping array (which must be null if the headers are specified in the file)
- A degree of parallelism (DOP)
- An integer denoting the offset (number of edge records to skip) before converting
- An OutputStream to which the edge flat file (.ope) will be written
- An optional DataConverterListener that can be used to keep track of the conversion progress and decide what to do if an error occurs
There are four additional parameters that can be used to specify a different format of the CSV file, the same as for vertex conversion:
The delimiter character, which is used to separate tokens in a record; the default is the comma character ','.
The quotation character, which is used to quote String values so they can contain special characters, for example, commas. If a quotation character appears in the value of the String itself, it must be escaped either by duplication or by placing a backslash character '\' before it. Some examples are:
"""Hello, world"", the screen showed…"
"But Vader replied: \"No, I am your father.\""
The date format, which will be used to parse date values. For the CSV conversion, this parameter can be null, but it is recommended to specify it if the CSV has a specific date format. Providing a specific date format helps performance, since that format will be used as the first option when trying to parse date values. Some example date formats are:
"yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
"MM/dd/yyyy HH:mm:ss"
"ddd, dd MMM yyyy HH':'mm':'ss 'GMT'"
"dddd, dd MMMM yyyy hh:mm:ss"
"yyyy-MM-dd"
"MM/dd/yyyy"
A flag indicating whether the CSV file contains String values with new-line characters. If this parameter is set to true, all the Strings in the file that contain new lines or quotation characters as values must be quoted, for example:
"The first lines of Don Quixote are: ""In a village of La Mancha, the name of which I have no desire to call to mind""."

The following code fragment shows how to use the API to convert a CSV file into a .ope file with a null ColumnToAttrMapping array.

String inputCSV  = "/path/mygraph-edges.csv";
String outputOPE = "/path/mygraph.ope";

String eidColumn   = null;       // null implies that an integer sequence will be used
String svidColumn  = "START_ID";
String dvidColumn  = "END_ID";
boolean hasLabel   = true;
String labelColumn = "TYPE";

isCSV = new FileInputStream(inputCSV);
osOPE = new FileOutputStream(new File(outputOPE));

// Convert edges
OraclePropertyGraphUtilsBase.convertCSV2OPE(isCSV, eidColumn, 0, svidColumn, dvidColumn, hasLabel, labelColumn, null, 1, 0, osOPE, null);

Because the ColumnToAttrMapping array is null, an input CSV reusing the vertices from the former example must include a header line specifying the column names and their types. An example CSV file is shown below:

START_ID:long,weight:float,END_ID:long,:TYPE
1,1.0,2,loves
1,1.0,5,admires
2,0.9,1,loves
1,0.5,3,likes
2,0.0,4,likes
4,1.0,5,is the dad of
3,1.0,4,turns to
5,1.0,3,saves from the dark side

The resulting .ope file is as follows:

1,1,2,loves,weight,3,,1.0,
2,1,5,admires,weight,3,,1.0,
3,2,1,loves,weight,3,,0.9,
4,1,3,likes,weight,3,,0.5,
5,2,4,likes,weight,3,,0.0,
6,4,5,is%20the%20dad%20of,weight,3,,1.0,
7,3,4,turns%20to,weight,3,,1.0,
8,5,3,saves%20from%20the%20dark%20side,weight,3,,1.0,
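Once both flat files are generated, they can be bulk loaded into a property graph backend. A minimal hedged sketch, assuming a graph config cfg built with GraphConfigBuilder as shown in other posts on this blog:

opg = OraclePropertyGraph.getInstance(cfg);
opgdl = OraclePropertyGraphDataLoader.getInstance();
// Load the converted vertex and edge flat files with a DOP of 2
opgdl.loadData(opg, "/path/mygraph.opv", "/path/mygraph.ope", 2);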


Spatial Features

Questions & Answers from the July 19th Big Data Spatial on Hadoop webcast

We had a number of good questions from the audience during our July 19 Directions webcast, Get the most from your data, developers and data scientists using Big Data spatial analysis on Hadoop and NoSQL. Below are more answers to the questions, including those we didn't have time for during the event.

Q&A

Q: Is a recorded version of this webinar available for review at a later date?
A: Yes, you can view the recording on demand here: http://www.directionsmag.com/webinars/get-the-most-from-your-data-developers-and-data-scientists-using-big-d/469420

Q: Where can I find product documentation, data sheets, and software downloads?
A: The Oracle Big Data Spatial and Graph product site on OTN has all these materials; visit the site here: www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph The Big Data Spatial and Graph blog also has examples and tips: https://blogs.oracle.com/bigdataspatialgraph/

Q: What type of clustering algorithm do you use, and why did you choose this algorithm?
A: We use K-means based clustering, as it is one of the most popular clustering algorithms. In the future, we are looking at an extensible framework so that users can plug their own clustering algorithms into our framework.

Q: What is the difference between the spatial and graph features available in Oracle Database vs. Big Data?
A: Oracle Spatial and Graph, an option for Oracle Database, is mainly developed for the following classes of applications: applications that need transactional support, applications that need a SQL interface, applications that need to interact with other business data stored in relational databases, etc. It is a more complete platform for spatial, graph, and large-scale GIS analytical and operational applications. In addition to native 2D and 3D vector, point cloud, and raster datatype support, it includes data models for network analysis and topology data. It also includes OGC web services, linear referencing, a geocoding engine, and a routing engine. Because it is tightly integrated with Oracle Database, it inherits the security and support for a wide range of database analytic, performance, and manageability features.

Oracle Big Data Spatial and Graph was developed to support the following classes of applications: applications that are batch-processing oriented, applications that need to process large amounts of unstructured data to filter and aggregate based on spatial attributes, raster data processing applications that need to do quick and simple filtering of the data, etc.

Both technologies are developed to address different types of applications, but there may be some areas where either technology is suitable. Another factor to consider is the technical expertise required to develop applications on these platforms. With the Database technology, standard SQL and simple Java programming skills are required. For Big Data platforms, complex programming in Java may be required.

Q: What do you recommend for storing spatial data (images): database or file system?
A: See above for some considerations. Storing raster imagery in the Database, through the native GeoRaster format that is part of Oracle Spatial and Graph, or in files, are both possible alternatives. This can be very application dependent.

Q: Is the functionality you are showing actual software?
A: Yes, this is commercially supported software, and it can be deployed on any CDH or HDP Hadoop cluster.
The software demonstrated was Oracle Big Data Spatial and Graph standalone for the raster demo, and in conjunction with the Oracle Big Data Discovery product for the analytic demo.

Q: Is there any way to get a trial version to get a feel for the capabilities?
A: Yes. The Oracle Big Data Lite Virtual Machine provides a free sandbox environment that you can download to try out Big Data Spatial and Graph, and other Oracle software components, for evaluation. You can download Big Data Lite here: http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html There's also a spatial and graph Hands-On Lab with exercises and sample data sets that you can try, available here: http://tinyurl.com/BDSG-HOL

Q: Is it possible to use the applications in an educational setting, so that students can see how this works?
A: Yes, you may download Big Data Lite at no cost, plus the hands-on lab materials above, and try out the software and demos. It's a great resource, and we encourage folks to try it out. Additionally, Oracle Academy allows educational institutions to use our software for teaching and includes a variety of additional benefits, such as access to training material. We invite you to join the program. Please visit https://academy.oracle.com for more information.

Q: How can we store spatial data like shapefiles, satellite imagery, and maps in Big Data and process them with our own algorithms?
A: For vector data, it is as simple as copying the data to HDFS using HDFS file copy functions. For raster data, some data processing is required to split the data into HDFS blocks. See our technical white paper here for more details. We also have a GDAL-based loader that does this data loading for raster data.

Q: How will it support GIS web application APIs such as Leaflet, Google Maps API, Esri JS API, etc.?
A: It is possible for the developer to expose the vector and raster processing APIs as micro web services and programmatically integrate these services into GIS web application APIs. In addition, those GIS web application APIs can access the results of processing like any other data in Hadoop.

Q: How is it possible to mix MapReduce-oriented spatial functions developed by your team with in-house GIS development?
A: The APIs we have are like any other Java API, so they can be combined with any other functionality developed in house. At the top-level MapReduce job, these APIs can be combined to process the data.

Q: How do you support on-the-fly map projection, and with what accuracy?
A: We use the GDAL PROJ4 driver for this, so the accuracy is based on the PROJ4 libraries.

Q: How are analytic processes documented so that end users have understanding of and confidence in derived data products?
A: These are documented as Java APIs, along with documentation in white papers and user guides.

Q: Have you talked to state DOTs about using this for crashes and location research?
A: We have a number of initiatives with different customers. Several departments of transportation are users of our Database Spatial and Graph features for highway/roadway management, crash analysis, and more. Here is one customer DOT example using spatial technologies for geolocation analysis of crashes: http://download.oracle.com/otndocs/products/spatial/pdf/osuc2013_presentations/osuc13_autoeventgeoloc_dildine.pdf

Q: Dr. Siva was speaking about loading data by picking layers. Where is this data sourced?
A: This data can be from open source or some other commercial source.
This data has to be loaded into HDFS as GeoJSON layers, and our APIs can use them.

Q: Could it work as a micro-service we call on a regular basis based on a pre-defined model?
A: Yes, any of these APIs can be published as micro-services. One can define services that just invoke the underlying Java APIs.

Q: Area "binning" is very good - is there such a thing as "linear binning"? Interested in road network incident mapping / exploring for data not (yet) linearly referenced.
A: Linear binning is not part of the prebuilt functionality. For extensive support of linear referencing and road networks, you may want to look into using the Spatial and Graph option in the Oracle Database (more information here).

Q: I have never come across custom-made Oracle software that can handle spatial data processing, especially raster image processing, and building a database for spatial data. Can you point me to one apart from the one being demonstrated at the moment? How is it different from other image processing software products?
A: As part of the Oracle Spatial and Graph option in the Oracle Database, we have included a GeoRaster feature that can do raster processing. It was primarily meant for raster data management and not necessarily for raster image processing. Since 12c Release 1 we have been adding more image processing capabilities to GeoRaster (like raster algebra and other image processing features). You can find more info here: Oracle Spatial and Graph GeoRaster Technical White Paper. With BDSG (the Hadoop-based product) we are focused more on image processing, since the types of workflows we want to support with this product are more image-processing workloads.

Q: How many incidents are refreshed/managed in real time (fast-moving objects)?
A: Since this is a Hadoop-based system, it is not geared for real-time data management; Hadoop processes have a long latency for starting MapReduce jobs. We plan to release support for Apache Spark in a future release, which can support real-time workloads. Oracle also has a stream processing engine, Oracle Stream Analytics, that can, depending on processing power, handle over 1.5M ops (see the benchmark white paper).

Q: For raster, can you give some use cases for what you would do with raster data? Face recognition?
A: Face recognition is a pure image processing application, and we do include software for this type of application with BDSG as part of the multimedia features (see this blog example). For raster processing, the applications include processing to quickly assess the quality of satellite images (cloud cover, etc.), quickly mosaic rasters to create new data products, create new data like hillshading from elevation models, process elevation models to calculate flood risks, etc.

Q: On storing satellite images in Hadoop: accepted that they can be stored, but how do we retrieve them again, assuming I have deleted the original image on the source system?
A: The data can be sent back to a regular file system after some raster processing is done. For example, the subset or mosaic operations can process rasters stored in HDFS, and the results can be written back to NFS in a user-specified format like GeoTIFF. Once the results are on NFS, other applications can access them.

Q: Do you work with external sources in their native format, or do you need to migrate to HDFS?
A: For vectors, we work with their native format. Users need to provide a RecordReader class to read the native format and plug that Java class into our framework.
For raster, we still work with the native format, but the file block structure is organized in such a way as to optimize raster data processing. So it does change the layout of the raster data in the (HDFS) data blocks, but the format itself is not changed.

Q: Is there any way one can do satellite image processing on Hadoop after storing the images in the Hadoop environment?
A: Yes, this can be done with the raster data analysis framework we have in the product. The raster demo at the end is an example of such satellite image processing operations.

Q: Can you work in a 3D environment?
A: The vector API we provide does work with 3D data as well. Additionally, the Spatial and Graph option provides native 3D and Lidar/point cloud support in Oracle Database.


Graph Features

Oracle Big Data Spatial and Graph: Using Secure Apache HBase on CDH

The property graph feature of Oracle Big Data Spatial and Graph supports Secure Apache HBase (Kerberos-enabled) on CDH, as well as secure Oracle NoSQL Database. Users may encounter a SaslException when using Secure Apache HBase:

cause: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed

This SaslException can occur if you use a specific hostname when setting the Kerberos principals, hbase.master.kerberos.principal and hbase.regionserver.kerberos.principal, as shown in the example below.

config.set("hbase.master.kerberos.principal",       "hbase/mycluster_host1@YOUR_REALM.COM");
config.set("hbase.regionserver.kerberos.principal", "hbase/mycluster_host1@YOUR_REALM.COM");

You can avoid this exception by using "hbase/_HOST" as shown below.

config.set("hbase.master.kerberos.principal",       "hbase/_HOST@YOUR_REALM.COM");
config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@YOUR_REALM.COM");

Verifying your Secure Apache HBase configuration

Confirm that HDFS is operating normally:
hdfs dfs -ls /

Confirm that HBase is operating normally:
hbase shell
status
list

Log in to Cloudera Manager and confirm that Zookeeper, HDFS, and HBase are all operating normally.

Test whether you can connect to your Apache HBase database and perform a simple scan using the Groovy script provided below. (If this script completes successfully, your Big Data Spatial and Graph property graph functions should execute successfully.) Note: please customize this Groovy script before executing it, including the quorum, port, principal, keytab, table name, column family name, etc.

import org.apache.hadoop.security.*;

szQuorum = "host1,host2,host3";  // needs customization

config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", szQuorum);
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("hbase.security.authentication", "kerberos");
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.rpc.engine", "org.apache.hadoop.hbase.ipc.SecureRpcEngine");
config.set("hbase.master.kerberos.principal",       "hbase/_HOST@YOUR_REALM.COM");
config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@YOUR_REALM.COM");

UserGroupInformation.setConfiguration(config);
ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "cloudera-scm/admin@YOUR_REALM.COM", "YOUR_PATH_HERE/cmf.keytab");
UserGroupInformation.setLoginUser(ugi);

hconn = HConnectionManager.createConnection(config);
hti = hconn.getTable("connectionsVT.");   // "connectionsVT." is the table name

scan = new Scan();
scan.addFamily(Bytes.toBytes("v"));       // "v" is the column family name
rsScanner = hti.getScanner(scan);

// Iterate over the scan results
obj = rsScanner.next();
while (obj != null) {
  println(obj);
  obj = rsScanner.next();
}
rsScanner.close();
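If the scan still fails with a login-related error, it can help to confirm which principal the process actually authenticated as. A quick hedged check using the standard Hadoop security API (not specific to BDSG):

// Print the effective login user and whether the login came from a keytab
import org.apache.hadoop.security.UserGroupInformation;
println(UserGroupInformation.getLoginUser());
println("keytab-based login: " + UserGroupInformation.isLoginKeytabBased());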


Spatial Features

Upcoming Webinar 7/19: Get the most from your data, developers and data scientists using Big Data spatial analysis on Hadoop and NoSQL

You're invited to join us for this webinar: Get the most from your data, developers and data scientists using Big Data spatial analysis on Hadoop and NoSQL
Tuesday, July 19th 2016, 11:00am - 12:00pm US Eastern, 8:00 - 9:00 am US Pacific
REGISTER HERE TODAY
Hosted by Directions Magazine
Note: The event will be recorded for viewing later, if you're unable to attend the live event. To receive an email with a link to the recording, please also register at the link above.

Overview
Just the facts

Hadoop, NoSQL, Spark, and related technologies are at the heart of a powerful new development and analytic platform that enterprises are using to lower cost and discover insights from a wide range of big data sources. Data scientists and software developers use these distributed frameworks to create a new class of solutions designed to analyze and process massive amounts of data. Oracle Big Data Spatial and Graph offers a set of high-performance, commercial spatial analysis components, services, and data models that bring location analysis to Big Data workloads and enable Big Data processing of data for use in geospatial workflows. A geo-enrichment service provides location identifiers to big data, enabling data harmonization based on location. Dozens of pre-built geographic and location analysis functions can be used for analyzing, categorizing, and filtering data. For geo-processing and raster processing workloads, the ability to perform large-scale operations for cleansing data and preparing imagery, sensor data, and raw input data is provided. Results of spatial analysis and processing can be displayed using an HTML5 map visualization API. We will share an overview of the new features, demos and examples showing how to invoke these services for your application, and use cases. Learn how developers and data scientists can handle spatial and raster big data analytics and processing workloads.

In this webinar you will learn
Where Hadoop and related technologies fit in vector and raster workflows, and how to best use this analytic and processing platform
How to discover location relationships and patterns in big data among customers, organizations, and assets, and enrich your big data with location
How to enrich, categorize, harmonize, and visualize big data with location services
How to handle the most challenging spatial and raster analytic and data processing workloads

More details
Oracle is the world's most complete, open, and integrated business software and hardware systems company. Oracle is a leader in geospatial technologies, offering a wide range of 2D and 3D spatial capabilities based on OGC and ISO standards, for database, middleware, big data, and cloud platforms. Oracle's spatial technologies are used by numerous 3rd party tools, components, and solutions, as well as by Oracle's enterprise applications for on-premise and cloud deployment.

Speakers
James Steiner, Vice President, Server Technologies, Oracle
Dr. Siva Ravada, Senior Director of Development, Oracle
David Lapp, Senior Principal Product Manager, Oracle

Who should attend
Chief information officers; chief data officers; Big Data, IT and GIS developers; data scientists; anyone responsible for innovation, analytics, Big Data strategy and implementation
REGISTER HERE TODAY


Graph Features

Oracle Big Data Spatial and Graph Flat File Format: a fast, rich serialization format for Property Graphs

Serialization in flat file formats is handy for bulk loading and exchanging models among applications, systems, and databases. Oracle Big Data Spatial and Graph Property Graph can be serialized into the common formats GraphML, GraphSON, and GML. However, if you have billions of edges, need parallel loading, and/or need additional built-in data types such as date and Serializable Java objects, you might consider using the easy-to-read-and-write Oracle flat file format. It is included with the property graph feature of Big Data Spatial and Graph. It has strong data type support for string, integer, float, double, date, boolean, and Serializable Java objects. The Oracle flat file format makes it possible to specify and load graphs with billions of edges because it is simple and quick to read, write, and parse, and it is also well suited to being broken into chunks for parallelized loading.

Using this flat file format, each property graph is encoded using a pair of files: a .opv (vertex) file and a .ope (edge) file. The files are very intuitive. Each line in a vertex file is a record that describes a vertex of the property graph, and each line in an edge file is a record that describes an edge. If a vertex or edge has more than one property (multiple K/V pairs), then multiple records/lines are used to represent those properties of the vertex or edge.

An example of a property graph in Oracle flat file format is as follows. In this graph, there are two vertices (John and Mary), and a single edge denoting that John is a friend of Mary.

% cat simple.opv
1,age,2,,10,
1,name,1,John,,
2,name,1,Mary,,
2,hobby,1,soccer,,

Each line above describes the vertex ID, key (of a property), data type of the value, and the actual value.

% cat simple.ope
100,1,2,friendOf,%20,,,

Each line in the above .ope describes the edge ID, source vertex ID, destination vertex ID, edge label, key (of a property), data type of the value, and the actual value. The data type field is specified as an encoding: for example, data type 1 means string, 2 means integer, etc. Details for the fields can be found in Chapter 4, section 4.11 of the Big Data Spatial and Graph User's Guide and Reference: https://docs.oracle.com/en/bigdata/ Note that in the Oracle-defined property graph flat file format, the delimiter used is "," and the number and positions of delimiters per line are pre-defined and have to be correct.

To illustrate, I am going to use the Oracle Big Data Lite VM, which can be downloaded from the following location: http://www.oracle.com/technetwork/community/developer-vm/index.html#bi

After starting the VM and logging in, use the "Start/Stop Services" icon on the desktop to start the Oracle NoSQL Database service. Open a Linux terminal and type in the following:

cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh gremlin-opg-nosql.sh

server = new ArrayList<String>();
server.add("bigdatalite:5000");

// Create a graph config that contains the graph name "test_flat_file"
cfg = GraphConfigBuilder.forPropertyGraphNosql()             \
  .setName("test_flat_file").setStoreName("kvstore")         \
  .setHosts(server)                                          \
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
  .setMaxNumConnections(2).build();

opg = OraclePropertyGraph.getInstance(cfg);
opg.setClearTableDOP(2); // will return NULL because this
                         // API has no return value.
                         // It is expected.
opg.clearRepository();   // remove all vertices and edges

vx = opg.addVertex(1234l);
vy = opg.addVertex(1235l);

// Add an edge from vx to vy, and another from vy to vx
e1 = opg.addEdge(3000l, vx, vy, "likes");
e1.setProperty("weight", 1.1d);
e2 = opg.addEdge(3001l, vy, vx, "likes");
e2.setProperty("weight", 1.5d);

opg.commit();

// Serialize this graph out
OraclePropertyGraphUtils.exportFlatFiles(opg, "/tmp/test.opv", "/tmp/test.ope", 2, false /*append*/);

// exit Groovy shell
:quit

The Java code snippet above (running inside Groovy) does quite a few things. It creates a graph called "test_flat_file", removes all vertices and edges, adds two vertices and two edges, and finally serializes the graph out using the flat file format. Feel free to tweak it to add more vertices and edges to your satisfaction. Now, let's inspect the contents of the two files.

[oracle@bigdatalite groovy]$ cat /tmp/test.opv
1234,%20,,,,
1235,%20,,,,

[oracle@bigdatalite groovy]$ cat /tmp/test.ope
3001,1235,1234,likes,weight,4,,1.5,
3000,1234,1235,likes,weight,4,,1.1,

We hope you find this flat file format useful. Please let us know what you think.
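As a toy illustration of the record layout just shown (this is not the product parser; the real loader also decodes escaped values such as %20 and %2C), a few lines of Groovy can split an .ope record into its fields:

line = "3000,1234,1235,likes,weight,4,,1.1,"
fields = line.split(",", -1)        // -1 keeps trailing empty fields
println "edge ID:    " + fields[0]
println "source VID: " + fields[1]
println "dest VID:   " + fields[2]
println "label:      " + fields[3]
println "key:        " + fields[4]
println "type code:  " + fields[5]  // 4 = double in this encoding
println "value:      " + fields[7]  // which slot holds the value depends on the data type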


Graph Features

Using Geospatial Data and Applying GeoSpatial Search with BDSG Property Graph

I'd like to show how to store geospatial data and apply geospatial search with Oracle Big Data Spatial and Graph (BDSG) Property Graph. Filtering graph entities with geospatial search is a powerful way to enhance social network analysis, recommendation systems, and fraud analysis workflows. Apache Solr is a well-known search engine supporting geospatial index and search. In this example, we'll use Apache Solr to generate a text index on a sample social graph (vertices and edges) which has geospatial data (lat/long) associated with vertices. Using the built-in Groovy shell environment, we'll invoke the data access layer Java APIs and apply Solr's built-in geospatial functions to filter the graph results with a spatial "window of interest". The resulting entities and their locations can be used for further analysis or visualization on a map.

As usual, let's use the well-known Big Data Lite VM. After login, click on the "Start/Stop Services" icon on the desktop, make sure Zookeeper, HDFS, HBase, NoSQL, and Solr are checked, and hit Enter. Open the following file in an editor:

/opt/oracle/oracle-spatial-graph/property_graph/dal/opg-solr-config/schema.xml

Go to line 138, add the following, save, and quit the editor. Basically, this configuration change declares that the value of any property whose name ends with "location_str" will be treated as the location data type.

<dynamicField name="*location_str" type="location" indexed="true" stored="true"/>

Run the following command line in a Linux terminal to upload the Solr config:

/usr/lib/solr/bin/zkcli.sh -zkhost bigdatalite:2181/solr -cmd upconfig -confdir /opt/oracle/oracle-spatial-graph/property_graph/dal/opg-solr-config/

That is it! We are done with configuration. Don't believe me? Let's give it a try. First, we need a piece of property graph data with geospatial content. Unfortunately, we don't have one readily available in the VM. Don't worry, we can improvise. For the connections graph, under the directory /opt/oracle/oracle-spatial-graph/property_graph/data/, we have a vertex file with information about some interesting people and companies. Let's add some spatial data, using random values to save time.

$ cat connections.opv | cut -d ',' -f 1 | awk '{print $1}' | sort | uniq | awk '{print $1 ",location,1," (37.52914+rand() -0.5) "%2C" (-122.2669+rand() -0.5) ",," }' > /tmp/addon_spatial.opv
$ cat /dev/null > /tmp/addon_spatial.ope

Now, what exactly is going on? The above script pulls out all the unique vertex IDs, adds to each vertex a new property "location", and assigns randomly generated coordinates centered around 400 Oracle Pkwy. Since this example only deals with spatial information on vertices, we use an empty .ope (edge) file. If you want, you can invoke a geocoding service and assign real coordinates; the idea stays the same. Let's see a few examples:

$ head -3 /tmp/addon_spatial.opv
1,location,1,37.2669%2C-122.476,,
10,location,1,37.875%2C-122.615,,
11,location,1,37.6147%2C-122.573,,

In this case, the following coordinate (lat/long) is assigned to vertex 1. Yours may be different, depending on the random values generated.

37.2669,-122.476

That "%2C" is the encoded form of a comma. Such an encoding is required because Oracle-defined flat files use commas as delimiters. The show is ready. Now, we need to load the graph data into the database and create a Solr-based index.
$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
$ sh ./gremlin-opg-hbase.sh

cfg = GraphConfigBuilder.forPropertyGraphHbase()            \
 .setName("connectionsHBase")                               \
 .setZkQuorum("bigdatalite").setZkClientPort(2181)          \
 .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)   \
 .setInitialVertexNumRegions(3).setSplitsPerRegion(1)       \
 .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
 .build();

opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

opgdl = OraclePropertyGraphDataLoader.getInstance();
vfile = "../../data/connections.opv"
efile = "../../data/connections.ope"
opgdl.loadData(opg, vfile, efile, 2);

vfile = "/tmp/addon_spatial.opv"
efile = "/tmp/addon_spatial.ope"
opgdl.loadData(opg, vfile, efile, 2);

szSolrServerUrl = "bigdatalite:2181/solr"
szNodeSet = "bigdatalite.localdomain:8983_solr"

indexParams = OracleIndexParameters.buildSolr("opgconfiglower", szSolrServerUrl, szNodeSet,
  15 /*ZkTimeout*/, 1 /*nShards*/, 1 /*nRepF*/, 1 /*shardsPerNode*/, 1 /*numConnections*/,
  10000 /*batchSize*/, 500000 /*commitSize*/, 20 /*writeTimeout*/);

opg.setDefaultIndexParameters(indexParams);

opg.createKeyIndex("name", Vertex.class);
opg.createKeyIndex("role", Vertex.class);
opg.createKeyIndex("country", Vertex.class);
opg.createKeyIndex("religion", Vertex.class);
opg.createKeyIndex("occupation", Vertex.class);
opg.createKeyIndex("location", Vertex.class);

The above Groovy-based script opens a connection to Apache HBase, loads the original connections graph, loads additional geospatial data into the same graph, and finally creates a Solr index.

import oracle.pg.text.solr.*;
import org.apache.solr.client.solrj.*;

index = (SolrIndex<Vertex>) opg.getAutoIndex(Vertex.class);

// Find vertices located within 16 km of the coordinate 37.529147,-122.26693
// and return them sorted by distance
query = new SolrQuery("name_str:*")
        .addFilterQuery("{!geofilt sfield=location_str pt=37.529147,-122.26693 d=16}")
        .setSort("geodist(location_str,37.529147,-122.26693)", SolrQuery.ORDER.asc);
index.get(query);

An example output is as follows:

==>Vertex ID 19 {country:str:United States, occupation:str:junior United States Senator from New York, role:str:political authority, name:str:Kirsten Gillibrand, location:str:37.5321,-122.2, religion:str:Methodism, political party:str:Democratic}
==>Vertex ID 5 {country:str:Italy, occupation:str:pope, role:str:Catholic religion authority, name:str:Pope Francis, location:str:37.6152,-122.168, religion:str:Catholicism}
==>Vertex ID 14 {role:str:business magnate, name:str:Aliko Dangote, location:str:37.3961,-122.275, religion:str:Islam, company:str:Dangote Group}
==>Vertex ID 77 {country:str:United States, occupation:str:CEO of Nest, name:str:Tony Fadell, location:str:37.6668,-122.291}
==>Vertex ID 17 {country:str:United States, role:str:actress, name:str:Robin Wright, location:str:37.6609,-122.196}

Now, if we reduce the radius from 16 km to 15 km, only a smaller subset of vertices is returned.
query = new SolrQuery("name_str:*")
        .addFilterQuery("{!geofilt sfield=location_str pt=37.529147,-122.26693 d=15}")
        .setSort("geodist(location_str,37.529147,-122.26693)", SolrQuery.ORDER.asc);
index.get(query);

==>Vertex ID 19 {country:str:United States, occupation:str:junior United States Senator from New York, role:str:political authority, name:str:Kirsten Gillibrand, location:str:37.5321,-122.2, religion:str:Methodism, political party:str:Democratic}
==>Vertex ID 5 {country:str:Italy, occupation:str:pope, role:str:Catholic religion authority, name:str:Pope Francis, location:str:37.6152,-122.168, religion:str:Catholicism}
==>Vertex ID 14 {role:str:business magnate, name:str:Aliko Dangote, location:str:37.3961,-122.275, religion:str:Islam, company:str:Dangote Group}

For details on the geofilt and geodist functions mentioned above, please refer to the Apache Solr Reference Guide [1]. The following screenshot shows those randomly generated coordinates using Oracle MapViewer. An interactive map can be found here. Note that you may need to click on the "Show all content" button when viewing it in IE. Have fun adding geospatial data and queries to your property graph!

[1] https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf

Acknowledgement: thanks to Xavier Lopez for his input on this blog post.
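P.S. As a hedged variation (standard Solr 4.x spatial syntax, not specific to BDSG): if an approximate filter is acceptable, Solr's bbox query parser matches on the bounding box of the same circle, which is cheaper to evaluate than geofilt at the cost of a few extra matches near the corners.

query = new SolrQuery("name_str:*")
        .addFilterQuery("{!bbox sfield=location_str pt=37.529147,-122.26693 d=16}")
        .setSort("geodist(location_str,37.529147,-122.26693)", SolrQuery.ORDER.asc);
index.get(query);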


Multimedia Features

Using Oracle Big Data Spatial and Graph and Oracle Big Data Lite VM for Multimedia Analytics

David Bayard, Oracle Big Data Pursuit Team
April 2016

Multimedia Analytics with Oracle Big Data Spatial and Graph
Facial Detection, Optical Character Recognition, and QR/Barcode Detection

In this blog, we will explore how to use and extend the new Multimedia Analytics (MMA) framework that ships with the Oracle Big Data Spatial and Graph (BDSG) product. We will show how we can leverage the BDSG Multimedia Analytics framework to help us do things like facial recognition, optical character recognition, and barcode/QR detection.

Note: This blog builds upon some of the work in our previous blog about doing barcode and QR detection with the Big Data Lite VM. You might benefit from reading the previous blog before continuing. The previous blog is available here: https://blogs.oracle.com/datawarehousing/entry/using_spark_scala_and_oracle

Getting Started with Oracle Big Data Spatial and Graph on the Oracle Big Data Lite VM:

This demonstration uses the Oracle Big Data Lite VM version 4.4, which is available here: http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html Version 4.4 of the VM includes CDH 5.5 and Oracle Big Data Spatial and Graph version 1.1.2. You may not be familiar with Oracle Big Data Spatial and Graph; if so, you can find more information at http://docs.oracle.com/cd/E69290_01/doc.44/e67958/toc.htm This blog will focus on the multimedia analytics framework, which is Chapter 6 in the previous link.

Obviously, your first step is to download the Oracle Big Data Lite VM (version 4.4 or higher). Once you have the Big Data Lite VM downloaded, imported, and running, click on the "Refresh Samples" icon on the VM desktop to refresh the samples. At this point, you should find the files needed for this blog under your /home/oracle/src/Blogs/BDSG_Barcode directory. [Note: If you want to get access to the files referenced in this blog outside of the Oracle Big Data Lite VM, you can find them here: https://github.com/oracle/BigDataLite/tree/master/Blogs/BDSG_Barcode ]

Now, run setup.sh

The setup.sh script will configure the Big Data Lite VM for this blog by first getting the Java libraries needed by ZXing (which is an open-source barcode/QR detection library). Then it will proceed to set up Tesseract, which is an open-source C++ library for optical character recognition. Next, we grab the Tess4J Java libraries that make it easy to call Tesseract from Java. Finally, we copy a couple of sample videos into HDFS. Review the setup.sh script if you are curious about the specific details and commands.

Running the basic sample from BDSG Multimedia Analytics

Now that we have clicked on Refresh Samples and run setup.sh, our first activity will be to run a facial detection sample that ships with Oracle Big Data Spatial and Graph's Multimedia Analytics feature. We will try to identify the faces from this video (which is pretty funny, so check it out with the sound on): https://www.youtube.com/watch?v=Qz8bRyf1374&list=PL0DF9A83456FF4351&index=3 You can learn more about the facial detection sample here: http://docs.oracle.com/cd/E69290_01/doc.44/e67958/GUID-4B15F058-BCE7-4A3C-A6B8-163DB2D4368B.htm#GUID-3C6B70D7-8AE9-4580-AE1C-7F8F15093F3E The facial detection sample leverages the open source OpenCV libraries.
You can read more about OpenCV's functionality for facial recognition here: http://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html For this sample, the Big Data Lite VM has already been trained to detect the faces of 4 different individuals (you can re-run the training with the /home/oracle/src/samples/mma/facetrain/trainface_bdl.sh script if you are interested). The set of training images is located at /u01/oracle-spatial-graph/multimedia/example/facetrain/faces/bigdata . Below are the training images for individual #4.

For this blog, let's invoke the multimedia analytics framework to do facial detection using our trained facial data against our sample video (which is stored locally on the VM at /u01/oracle-spatial-graph/multimedia/example/video/bigdata.mp4 ). To do so:

$ cd /home/oracle/src/samples/mma/analytics
$ hadoop fs -rm -R voutput_image
$ ./runjob_bdl.sh

Notice that a MapReduce job is started. The multimedia analytics framework provides an extensible MapReduce job that we can customize to do various kinds of image analytics. When the MapReduce job finishes, we can run the script playimage_bdl.sh (located at /home/oracle/src/samples/mma/analytics ) to see what faces it detected. When done, close the image window and type Ctrl-C in the terminal window to get back to the terminal prompt.

Building a simple custom OrdFrameProcessor for the framework

Now that we've seen a working example, let's look at what it will take to build our own. As we noted above, the multimedia analytics framework provides an out-of-the-box MapReduce program to extend. The main way you will extend the framework is by writing your own implementation of the OrdFrameProcessor class (read more about extending the framework here: http://docs.oracle.com/cd/E69290_01/doc.44/e67958/GUID-4B15F058-BCE7-4A3C-A6B8-163DB2D4368B.htm#BDSPA-GUID-090BD058-396D-41F8-814E-D407DF0941F6 and here: http://docs.oracle.com/cd/E69290_01/doc.44/e66533/oracle/ord/hadoop/mapreduce/OrdFrameProcessor.html ). Our first OrdFrameProcessor will be very basic; it will simply take the individual images passed in to it and pass them along. In essence, this example will leverage the framework to convert a video into a series of images.

Before continuing, let's run this simple example by executing the run_video2image.sh script in the /home/oracle/src/Blogs/BDSG_Barcode directory. When the framework's MapReduce job completes, enter Y to view the results. This will launch a script (save_images.sh) that saves the output images (which by default were written to a set of SequenceFiles on HDFS) to the local Linux file system. Then it will launch the Linux "xdg-open" utility to let you browse the individual image files. Use the Next and Previous arrows to navigate among the images. NOTE: the first image will be black because the video starts out with a black image; simply click the Next arrow to see the images as expected.

Let's look at the code:

import oracle.ord.hadoop.mapreduce.OrdFrameProcessor;
import oracle.ord.hadoop.mapreduce.OrdImageWritable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import java.awt.image.BufferedImage;

public class VideoToImage extends OrdFrameProcessor<Text, OrdImageWritable, Text, OrdImageWritable> {

 private Text m_frame_key_text = null;
 private Text m_frame_value_text = null;
 private OrdImageWritable m_frame_image = null;

 public VideoToImage(Configuration conf) {
   super(conf);
 }

 /**
  * Implement the processFrame method to process the key-value pair, an image,
  * in the mapper of a MapReduce job.
  */
 @Override
 public void processFrame(Text key, OrdImageWritable value) {
   if (m_frame_key_text == null || m_frame_value_text == null || m_frame_image == null) {
     m_frame_image = new OrdImageWritable();
     m_frame_key_text = new Text();
     m_frame_value_text = new Text();
   }
   m_frame_key_text.set(key);
   // This is where we do our custom code.
   // In this example, do a simple identity map: take the image and return it.
   BufferedImage bi = value.getImage();
   m_frame_image.setImage(bi);
 }

 /**
  * Implement the getKey method to return the key after processing an image
  * in the mapper.
  */
 @Override
 public Text getKey() {
   return m_frame_key_text;
 }

 /**
  * Implement the getValue method to return the value after processing an
  * image in the mapper.
  */
 @Override
 public OrdImageWritable getValue() {
   return m_frame_image;
 }
}

Notice that the class extends OrdFrameProcessor, and that we defined the types of the keys and values that are passed in and out of the class. In this example, the input and output keys are both Text, while the input and output values are both OrdImageWritable. Also notice the implementation of the processFrame method. This method reads the value being passed in (an OrdImageWritable that can be converted to a regular Java BufferedImage via OrdImageWritable.getImage). Then it stores the BufferedImage as an OrdImageWritable (via OrdImageWritable.setImage). Essentially, this code demonstrates how to convert from BufferedImage to OrdImageWritable and back.

Beyond writing our OrdFrameProcessor, we also need to set up a configuration file that tells the framework's MapReduce job about our custom code. Here is what the configuration file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
 <!-- Framework properties -->
 <property>
   <name>oracle.ord.hadoop.numofsplits</name>
   <value>2</value>
 </property>
 <property>
   <name>oracle.ord.hadoop.frameinterval</name>
   <value>3</value>
 </property>
 <property>
   <name>oracle.ord.hadoop.inputtype</name>
   <value>video</value>
 </property>
 <property>
   <name>oracle.ord.hadoop.outputtypes</name>
   <value>image</value>
 </property>
 <property>
   <name>oracle.ord.hadoop.ordframegrabber</name>
   <value>oracle.ord.hadoop.decoder.OrdJcodecFrameGrabber</value>
 </property>
 <property>
   <name>oracle.ord.hadoop.ordframeprocessor</name>
   <value>VideoToImage</value>
 </property>
 <property>
   <name>oracle.ord.hadoop.ordframeprocessor.k2</name>
   <value>org.apache.hadoop.io.Text</value>
 </property>
 <property>
   <name>oracle.ord.hadoop.ordframeprocessor.v2</name>
   <value>oracle.ord.hadoop.mapreduce.OrdImageWritable</value>
 </property>
</configuration>

Notice that in the configuration file, we specify our custom class (VideoToImage) to use for the OrdFrameProcessor, and we inform the framework of the types to use for the output key and value. Finally, let's look at the command that runs it (this is a command inside the run_video2image.sh script):

hadoop jar ${MMA_HOME}/lib/ordhadoop-multimedia-analytics.jar -libjars ${CUR_DIR_NAME}/Video2Image/VideoToImage.class -conf ${CUR_DIR_NAME}/Video2Image/video2image.xml bdsg_mma_input bdsg_mma_output

Notice that in the command line, we've told the framework MapReduce job where to find the configuration file, as well as which HDFS directories to use for input and output.

Building and Running the OrdFaceDetectionSample

Now that we have gotten a very simple example built and running, we will look again at the face detection sample that we ran previously. You can find the code in the OrdFaceDetection directory (/home/oracle/src/Blogs/BDSG_Barcode/OrdFaceDetection ).
Let's first compile and run this example:

$ cd /home/oracle/src/Blogs/BDSG_Barcode
$ run_sample.sh

When asked, type Y to view the image output. Notice how the faces are detected in most scenes, especially when they are fully facing forward. Now, explore the Java source (/home/oracle/src/Blogs/BDSG_Barcode/OrdFaceDetection/OrdFaceDetectionSample.java), configuration file (/home/oracle/src/Blogs/BDSG_Barcode/OrdFaceDetection/sample.xml), and the hadoop jar command (/home/oracle/src/Blogs/BDSG_Barcode/run_sample.sh) to see how this example was built.

QR Detection (ZXing) with Big Data Spatial and Graph:

Our next example will build on some of the work done in our previous blog post, located here: https://blogs.oracle.com/datawarehousing/entry/using_spark_scala_and_oracle Specifically, we will re-use the BarcodeProcessor.java from the previous blog. Run the run_QRImage.sh script to see it in action against a video QR.mov that we've provided. If we look at the Java source code (/home/oracle/src/Blogs/BDSG_Barcode/QR/QRImage.java) for our OrdFrameProcessor, we will see this code in the processFrame() method, showing how we call our BarcodeProcessor class and how we draw the results on the output image:

BufferedImage bi = value.getImage();
String barcodeString = "";
try {
  barcodeString = BarcodeProcessor.processImage(bi);
  System.out.println("Testing. " + barcodeString);
} catch (Exception e) {
  System.out.println("Key:" + key + " exception: " + e);
  barcodeString = "Exception:" + e;
}
m_frame_value_text.set(barcodeString);
int width = bi.getWidth();
int height = bi.getHeight();
BufferedImage bufferedImage = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
Graphics2D g2d = bufferedImage.createGraphics();
g2d.drawImage(bi, 0, 0, null);
g2d.setPaint(Color.blue);
g2d.setFont(new Font("Serif", Font.BOLD, 36));
FontMetrics fm = g2d.getFontMetrics();
int y = fm.getHeight();
for (String line : barcodeString.split(" ---- ")) {
  int x = width - fm.stringWidth(line) - 5;
  g2d.drawString(line, x, y += fm.getHeight());
}
g2d.dispose();
m_frame_image.setImage(bufferedImage);
...

To learn more about QR/barcode detection and the open source ZXing library, refer to our previously mentioned blog.

Optical Character Recognition (Tesseract) with Big Data Lite VM:

Our next example will use the open source Tesseract library to do optical character recognition (OCR). Read about Tesseract here: https://github.com/tesseract-ocr/tesseract One of your first challenges in working with Tesseract will be installing it. We've taken care of installing it for you in the setup.sh script you ran earlier, but let's discuss what setup.sh did. Tesseract is a C++ program. One way to install it is by downloading the source and building it. For the purposes of this blog, we decided to look for an already compiled version of Tesseract. Given that the Big Data Lite VM runs Oracle Linux (which uses rpms/yum like RedHat), we wanted to find an rpm-based version of Tesseract. We were able to find this at the EPEL yum repository. EPEL stands for "Extra Packages for Enterprise Linux" (you can find out more about EPEL here: https://fedoraproject.org/wiki/EPEL/FAQ ). Our setup.sh script first configures our Big Data Lite VM to know about EPEL, then it uses yum to download and install Tesseract and its dependencies.
Tesseract also needs training data to recognize images as characters, so we download some pre-built training data via EPEL as well. Now that Tesseract is installed in your Big Data Lite VM, feel free to test it out on the command line; simply run "tesseract -help".

As discussed above, Tesseract is an open-source C++ project. However, much of the Hadoop ecosystem is more tailored towards working with Java. There are a couple of approaches to bridge Tesseract to Java, such as Tess4J and the java-cpp-presets project. This blog/demonstration shows the Tess4J approach, although we've used the java-cpp-presets approach in other situations and don't judge one better than the other. For more on Tess4J, see http://tess4j.sourceforge.net/ The setup.sh script downloaded the jar files needed by Tess4J and its dependencies.

Using Tesseract with Big Data Spatial and Graph:

Now that we've discussed how Tesseract and Tess4J were set up, let's look at an example of Tesseract in a custom OrdFrameProcessor. Run the script run_tessImage.sh to run against a sample video OCR.mov that we've provided. You should see that Tesseract did a fairly good job of identifying the characters in the frames grabbed from the sample video. Let's look at parts of the code (/home/oracle/src/Blogs/BDSG_Barcode/TessImage/TessImage.java) of our custom OrdFrameProcessor:

public TessImage(Configuration conf) {
  super(conf);
  instance = Tesseract.getInstance();
  instance.setDatapath("/usr/share/tesseract");
  instance.setLanguage("eng");
  instance.setPageSegMode(3);
}
…
String ocrString = "";
try {
  ocrString = instance.doOCR(bi);
  System.out.println("Key:" + key + " OCR:" + ocrString);
} catch (Exception e) {
  System.out.println("Key:" + key + " exception: " + e);
  ocrString = "Exception:" + e;
}
…

Key things to highlight in the code are the necessary Tesseract setup commands in the class constructor and the call to the Tesseract API (instance.doOCR) in the processFrame() method.

Outputting Text Information from the framework

So far, all of our OrdFrameProcessor examples have generated image (OrdImageWritable) output. Our final example will show how to generate text output. Run the script run_tessText.sh As seen above, this example saves its output as textual data, not images. To see how this worked, notice the changes in the XML configuration file (/home/oracle/src/Blogs/BDSG_Barcode/TessText/tess.xml) as well as the Tess.java class (/home/oracle/src/Blogs/BDSG_Barcode/TessText/Tess.java).

<property>
  <name>oracle.ord.hadoop.outputtypes</name>
  <value>csv</value>
</property>
<property>
  <name>oracle.ord.hadoop.ordframeprocessor.v2</name>
  <value>org.apache.hadoop.io.Text</value>
</property>

public class Tess extends OrdFrameProcessor<Text, OrdImageWritable, Text, Text> {
…
public Text getValue() {
  return m_frame_value_text;
}
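For completeness, here is a hedged sketch of what the rest of Tess.java's processFrame method might look like (the field and method names follow the framework APIs shown earlier; the actual sample file on the VM may differ):

@Override
public void processFrame(Text key, OrdImageWritable value) {
  if (m_frame_key_text == null || m_frame_value_text == null) {
    m_frame_key_text = new Text();
    m_frame_value_text = new Text();
  }
  m_frame_key_text.set(key);
  String ocrString = "";
  try {
    // same Tess4J call as in TessImage.java above
    ocrString = instance.doOCR(value.getImage());
  } catch (Exception e) {
    ocrString = "Exception:" + e;
  }
  // the output value is Text (not an image), matching the v2 setting in tess.xml
  m_frame_value_text.set(ocrString);
}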
Moving Beyond:

Hopefully, this was a good start for your journey into Big Data Spatial and Graph's Multimedia Analytics framework. Here are some possible future paths you could take:
You could test the above code with your own movies. For instance, I used my iPhone to record some videos, copied them onto my laptop, and then copied them into my Big Data Lite VM.
You could get your own set of facial photos and train the facial detection example to work with your own videos.
You could explore other possibilities for a custom OrdFrameProcessor, using other features of OpenCV or other open source libraries.

NOTE: If you want to play around with the source files and make modifications, you should probably copy the BDSG_Barcode directory tree into a new directory outside of /home/oracle/src. This is because the "Refresh Samples" utility will wipe out the /home/oracle/src directory every time it runs.

Conclusion:

This blog has shown you how to work with and extend the Multimedia Analytics framework that is part of Oracle Big Data Spatial and Graph. We have explored how the facial recognition example works and built new examples to do barcode/QR detection and optical character recognition. Hopefully, this has made you more comfortable working with tools like the Oracle Big Data Lite VM, Oracle Big Data Spatial and Graph, OpenCV, ZXing, Tesseract, Tess4J, and EPEL. Enjoy.

About the Author: David Bayard is a member of the Big Data Pursuit team for Oracle North America Sales.


Graph Features

From Relational Table(s) to Property Graph

Lately, I have gotten quite a few questions on how to convert a relational data source (tables or views) to a property graph. It is actually straightforward. In this post, I am going to demonstrate an end-to-end flow using the well-known table "EMP" in the "SCOTT" schema. As usual, I am going to use the Oracle Big Data Lite VM (the latest version is 4.4.0 as of Mar 17th, 2016) because it has the whole big data technology stack, Oracle Big Data Spatial and Graph, and Oracle Database 12.1.0.2. Got everything we need in a single box. On the desktop, click "Start/Stop Services", check ORCL and NoSQL database, and hit Enter. This will bring up Oracle Database and also Oracle NoSQL Database, if they are down.

Let's first take a look at the relational data source that we want to convert into a property graph. From a Linux terminal, log in to the Oracle Database and describe the "EMP" table.

sqlplus scott/tiger@orcl

SQL*Plus: Release 12.1.0.2.0 Production on Thu Mar 17 18:24:01 2016
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

SQL> desc emp;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMPNO                                     NOT NULL NUMBER(4)
 ENAME                                              VARCHAR2(10)
 JOB                                                VARCHAR2(9)
 MGR                                                NUMBER(4)
 HIREDATE                                           DATE
 SAL                                                NUMBER(7,2)
 COMM                                               NUMBER(7,2)
 DEPTNO                                             NUMBER(2)

SQL> -- The following shows a few rows of the table
SQL> select empno, ename, mgr from emp;

     EMPNO ENAME             MGR
---------- ---------- ----------
      7369 SMITH            7902
      7499 ALLEN            7698
      7521 WARD             7698
      7566 JONES            7839
      7654 MARTIN           7698
      7698 BLAKE            7839
      7782 CLARK            7839
      7788 SCOTT            7566
      7839 KING
      7844 TURNER           7698
      7876 ADAMS            7788

This employee table (EMP) has well-defined columns. Now, say we want to model employees as vertices, and the "manager" relationship as edges in a property graph. Conceptually, we will have a property graph with "SMITH", "ALLEN", etc. as vertices, and edges labeled "manager" that link those people together. Note that employee "KING" is probably the CEO, as he is the only one that does not have a manager.

First, we need to copy xdb.jar from ORACLE_HOME/rdbms/jlib into the lib/ directory of the property graph installation.

$ cp /u01/app/oracle/product/12.1.0.2/dbhome_1/rdbms/jlib/xdb.jar /opt/oracle/oracle-spatial-graph/property_graph/lib/

Now, let's start the Groovy shell.

cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh gremlin-opg-nosql.sh

//
// First, create an empty property graph
//
server = new ArrayList<String>();
server.add("bigdatalite:5000");

// Create a graph config with the graph name "employee" and
// KV store name "kvstore"
cfg = GraphConfigBuilder.forPropertyGraphNosql()             \
  .setName("employee").setStoreName("kvstore")               \
  .setHosts(server)                                          \
  .setMaxNumConnections(2).build();

// Get an instance of OraclePropertyGraph, which is a key Java
// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);

import oracle.jdbc.pool.*;
import oracle.jdbc.*;

// Make a connection to the Oracle Database
ds = new OracleDataSource();
ds.setURL("jdbc:oracle:thin:@localhost:1521:orcl");
ds.setUser("scott");
ds.setPassword("tiger");
conn = ds.getConnection();

// Now, read from the EMP table, construct vertices, and add them to the
// property graph using the Blueprints API
stmtV = conn.createStatement();
rs = stmtV.executeQuery(
  "select empno, ename, job, hiredate, sal, comm, deptno from EMP");
while (rs.next()) {
  System.out.println("EMP " + rs.getInt(1) + " ename " + rs.getString(2));
  v = opg.addVertex((long) rs.getInt(1)); // employee ID becomes vertex ID
  v.setProperty("name",     rs.getString(2));
  v.setProperty("job",      rs.getString(3));
  v.setProperty("hiredate", rs.getDate(4));
  v.setProperty("sal",      rs.getFloat(5));
  v.setProperty("comm",     rs.getFloat(6));
  v.setProperty("deptno",   rs.getInt(7));
}
rs.close();
stmtV.close();

//
// Now read the table again, construct and add edges to the property graph
// using the Blueprints API.
// Note that we skip employees that have no manager.
//
stmtE = conn.createStatement();
rs = stmtE.executeQuery(
  "select rownum, empno, mgr from EMP where mgr is not null");
while (rs.next()) {
  System.out.println("EMP " + rs.getInt(2) + " mgr " + rs.getString(3));
  vs = opg.getVertex((long) rs.getInt(2));
  vd = opg.getVertex((long) rs.getInt(3));
  e = opg.addEdge((long) rs.getInt(1), vs, vd, "manager");
}
rs.close();
stmtE.close();
conn.close();

opg.commit();

//
// Finally, write it out as .opv and .ope files
//
OraclePropertyGraphUtils.exportFlatFiles(opg, "/u02/emp.opv", "/u02/emp.ope", false);

//
// Let's check the output. Pay attention to the various data types
// used (float, string, date, integer, etc.)
//

// A snippet of the vertex flat file
[oracle@bigdatalite ~]$ head -10 /u02/emp.opv
7369,comm,3,,0.0,
7369,name,1,SMITH,,
7369,job,1,CLERK,,
7369,hiredate,5,,,1980-12-17T00:00:00.000-05:00
7369,deptno,2,,20,
7369,sal,3,,800.0,
7566,comm,3,,0.0,
7566,name,1,JONES,,
7566,job,1,MANAGER,,
7566,hiredate,5,,,1981-04-02T00:00:00.000-05:00

// A snippet of the edge flat file
[oracle@bigdatalite ~]$ head -10 /u02/emp.ope
5,7654,7698,manager,%20,,,,
7,7782,7839,manager,%20,,,,
11,7900,7698,manager,%20,,,,
6,7698,7839,manager,%20,,,,
1,7369,7902,manager,%20,,,,
10,7876,7788,manager,%20,,,,
12,7902,7566,manager,%20,,,,
8,7788,7566,manager,%20,,,,
2,7499,7698,manager,%20,,,,
3,7521,7698,manager,%20,,,,

That is it. We have successfully converted the famous EMP table into the property graph flat file format. By the way, this graph has already been loaded into the database; it is ready for further analysis.

Acknowledgement: thanks to Jay Banerjee for his input on this blog post.
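As a small hedged follow-up (plain Blueprints API, using the graph we just built): since each "manager" edge points from an employee to their manager, KING's direct reports can be listed by following incoming "manager" edges.

import com.tinkerpop.blueprints.Direction;

vKing = opg.getVertex(7839l);   // KING's empno
for (v in vKing.getVertices(Direction.IN, "manager")) {
  System.out.println(v.getProperty("name"));  // expect JONES, BLAKE, CLARK
}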


Graph Features

Intuitive Explanation of Personalized Page Rank and its Application in Recommendation

In this blog post, I am going to talk about personalized page rank, its definition, and its application. Let's start with some basic terms and definitions.

Definitions

Random Walk: Given a graph, a random walk is an iterative process that starts from a random vertex and, at each step, either follows a random outgoing edge of the current vertex or jumps to a random vertex. The jump part is important because some vertices may not have any outgoing edges, so a walk would terminate at those places without jumping to another vertex. In the following graph, an example random walk can start from A, follow an outgoing edge of A to B, follow an outgoing edge of B to E, follow one of E's outgoing edges back to B, jump to D, follow an outgoing edge of D to F, and finally jump to C.

Page Rank (PR) measures the stationary distribution of one specific kind of random walk that starts from a random vertex and, in each iteration, with a predefined probability p jumps to a random vertex, and with probability 1-p follows a random outgoing edge of the current vertex. Page rank is usually computed on a graph with homogeneous edges, for example, a graph with edges of the form "A linksTo B", "A references B", "A likes B", "A endorses B", "A readsBlogsWrittenBy B", or "A hasImpactOn B". Running the page rank algorithm on a graph generates a ranking (PR value) for each vertex, and the numeric PR values can be viewed as the "importance" or "relevance" of the vertices. A vertex with a high PR value is usually considered more "important", more "influential", or of higher "relevance" than a vertex with a low PR value.

Personalized Page Rank (PPR) is the same as PR, except that jumps go back to one of a given set of starting vertices. In a way, the walk in PPR is biased towards (or personalized for) this set of starting vertices and is more localized than the random walk performed in PR.

PPR for Recommendation

Now that we are clear about random walks, PR, and PPR, let's take a look at how PPR can be used for recommendations. Assume we have the following Customer-purchase-Product graph (with reverse edges). Say we are going to start a PPR from user John. Quite intuitively, the random walk in PPR will very likely touch the products purchased by John, the other users who purchased those products, and also the products purchased by those users, so on and so forth. In a way, this walk is able to reach users that are similar to John because they purchased the same (or similar, or related) products. In addition, the walk in PPR discovers similar/related products because they were purchased by the same (or similar, or related) users.

Recommendation Workflow Using Oracle Big Data Spatial and Graph Property Graph

The following Java code snippets can be executed in the Groovy environment supported by Oracle Big Data Spatial and Graph. For some basic information on Groovy, refer to a previous blog post.

// First, execute gremlin-opg-hbase.sh or gremlin-opg-nosql.sh
// ...
// Get an instance of OraclePropertyGraph, which is a key Java
// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();   // remove all vertices and edges

// Add vertices for the users
vJohn=opg.addVertex(1l);
vJohn.setProperty("name","John");
vJohn.setProperty("age",10i);

vMary=opg.addVertex(2l);
vMary.setProperty("name","Mary");
vMary.setProperty("sex","F");

vJill=opg.addVertex(3l);
vJill.setProperty("name","Jill");
vJill.setProperty("city","Boston");

vTodd=opg.addVertex(4l);
vTodd.setProperty("name","Todd");
vTodd.setProperty("student",true);

// Add vertices for the products. Note that the date pattern uses
// "MM" (month); the lowercase "mm" pattern would be minutes.
sdf = new java.text.SimpleDateFormat("MM/dd/yyyy");

vPhone=opg.addVertex(10l);
vPhone.setProperty("type","Prod");
vPhone.setProperty("desc","iPhone5");
vPhone.setProperty("released",sdf.parse("02/21/2012"));

vKindle=opg.addVertex(11l);
vKindle.setProperty("type","Prod");
vKindle.setProperty("desc","Kindle Fire");

vFitbit=opg.addVertex(12l);
vFitbit.setProperty("type","Prod");
vFitbit.setProperty("desc","Fitbit Flex Wireless");
vFitbit.setProperty("rating","****");

vPotter=opg.addVertex(13l);
vPotter.setProperty("type","Prod");
vPotter.setProperty("desc","Harry Potter");

vHobbit=opg.addVertex(14l);
vHobbit.setProperty("type","Prod");
vHobbit.setProperty("desc","Hobbit");

// List the vertices
opg.getVertices()
==>Vertex ID 14 {type:str:Prod, desc:str:Hobbit}
==>Vertex ID 2 {name:str:Mary, sex:str:F}
==>Vertex ID 4 {name:str:Todd, student:bol:true}
==>Vertex ID 11 {type:str:Prod, desc:str:Kindle Fire}
==>Vertex ID 10 {type:str:Prod, desc:str:iPhone5, released:dat:Tue Feb 21 00:00:00 EST 2012}
==>Vertex ID 12 {type:str:Prod, desc:str:Fitbit Flex Wireless, rating:str:****}
==>Vertex ID 13 {type:str:Prod, desc:str:Harry Potter}
==>Vertex ID 1 {name:str:John, age:int:10}
==>Vertex ID 3 {name:str:Jill, city:str:Boston}

// Add edges with the Blueprints Java APIs. Note that if the number of
// edges were much bigger, we would use the parallel data loader and the
// flat file format instead.
opg.addEdge(1l,  vJohn, vPhone, "purchased");
opg.addEdge(2l,  vPhone, vJohn, "purchased by");
opg.addEdge(3l,  vJohn, vKindle, "purchased");
opg.addEdge(4l,  vKindle, vJohn, "purchased by");
opg.addEdge(5l,  vMary, vPhone, "purchased");
opg.addEdge(6l,  vPhone, vMary, "purchased by");
opg.addEdge(7l,  vMary, vKindle, "purchased");
opg.addEdge(8l,  vKindle, vMary, "purchased by");
opg.addEdge(9l,  vMary, vFitbit, "purchased");
opg.addEdge(10l, vFitbit, vMary, "purchased by");
opg.addEdge(11l, vJill, vPhone, "purchased");
opg.addEdge(12l, vPhone, vJill, "purchased by");
opg.addEdge(13l, vJill, vKindle, "purchased");
opg.addEdge(14l, vKindle, vJill, "purchased by");
opg.addEdge(15l, vJill, vFitbit, "purchased");
opg.addEdge(16l, vFitbit, vJill, "purchased by");
opg.addEdge(17l, vTodd, vFitbit, "purchased");
opg.addEdge(18l, vFitbit, vTodd, "purchased by");
opg.addEdge(19l, vTodd, vPotter, "purchased");
opg.addEdge(20l, vPotter, vTodd, "purchased by");
opg.addEdge(21l, vTodd, vHobbit, "purchased");
opg.addEdge(22l, vHobbit, vTodd, "purchased by");
opg.commit();

// Create in-memory analytics session and analyst
session=Pgx.createSession("session_ID_1");
analyst=session.createAnalyst();

// Read the graph from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

// We are going to get a recommendation for user John.
// Find this vertex with a simple lookup by the name property
v=opg.getVertices("name", "John").iterator().next();

// Add John to the set of starting vertices
vertexSet=pgxGraph.createVertexSet();
vertexSet.addAll(v.getId());

// Run personalized page rank using John as the start vertex
ppr=analyst.personalizedPagerank(pgxGraph, vertexSet, \
            0.0001 /*maxError*/, 0.85 /*dampingFactor*/, 1000 /*maxIterations*/);

// Examine the top 9 entries of the PPR output.
// The vertices for John, iPhone5, and Kindle Fire have the highest PR
// values (shown below in the first column) because John is the starting point
// of PPR, and iPhone5 and Kindle Fire were purchased by John.
// Mary and Jill also have high PR values because they made similar purchases as John.
//
it=ppr.getTopKValues(9).iterator();
while (it.hasNext()) {
     entry=it.next(); vid=entry.getKey().getId();
     System.out.format("ppr=%.4f  vertex=%s\n", entry.getValue(), opg.getVertex(vid));}

==>
ppr=0.2496  vertex=Vertex ID 1 {name:str:John, age:int:10}
ppr=0.1758  vertex=Vertex ID 11 {type:str:Prod, desc:str:Kindle Fire}
ppr=0.1758  vertex=Vertex ID 10 {type:str:Prod, desc:str:iPhone5, released:dat:Tue Feb 21 00:00:00 EST 2012}
ppr=0.1229  vertex=Vertex ID 3 {name:str:Jill, city:str:Boston}
ppr=0.1229  vertex=Vertex ID 2 {name:str:Mary, sex:str:F}
ppr=0.0824  vertex=Vertex ID 12 {type:str:Prod, desc:str:Fitbit Flex Wireless, rating:str:****}
ppr=0.0451  vertex=Vertex ID 4 {name:str:Todd, student:bol:true}
ppr=0.0128  vertex=Vertex ID 13 {type:str:Prod, desc:str:Harry Potter}
ppr=0.0128  vertex=Vertex ID 14 {type:str:Prod, desc:str:Hobbit}

// Now, let's filter out users from this list (assume we only want to recommend
// products for John, not other similar buyers).
// If we exclude the top 2 products that John purchased before, the remaining
// three, Fitbit, Harry Potter, and Hobbit, are our recommendations for John,
// in that order. Note that for John, Fitbit has a much higher PR value than
// Harry Potter and Hobbit.
//
it=ppr.getTopKValues(9).iterator();
while (it.hasNext()) {
     entry=it.next(); vid=entry.getKey().getId(); vertex=opg.getVertex(vid);
     if ("Prod".equals(vertex.getProperty("type")))
     System.out.format("ppr=%.4f  vertex=%s\n", entry.getValue(), vertex);}

==>
ppr=0.1758  vertex=Vertex ID 11 {type:str:Prod, desc:str:Kindle Fire}
ppr=0.1758  vertex=Vertex ID 10 {type:str:Prod, desc:str:iPhone5, released:dat:Tue Feb 21 00:00:00 EST 2012}
ppr=0.0824  vertex=Vertex ID 12 {type:str:Prod, desc:str:Fitbit Flex Wireless, rating:str:****}
ppr=0.0128  vertex=Vertex ID 13 {type:str:Prod, desc:str:Harry Potter}
ppr=0.0128  vertex=Vertex ID 14 {type:str:Prod, desc:str:Hobbit}

// So the final recommended products for John are Fitbit, Harry Potter, and Hobbit:
ppr=0.0824  vertex=Vertex ID 12 {type:str:Prod, desc:str:Fitbit Flex Wireless, rating:str:****}
ppr=0.0128  vertex=Vertex ID 13 {type:str:Prod, desc:str:Harry Potter}
ppr=0.0128  vertex=Vertex ID 14 {type:str:Prod, desc:str:Hobbit}

In a future blog post, I am going to show an optimization that can reduce the number of edges needed in this graph.

Acknowledgement: thanks Jay Banerjee for his input on this blog post; thanks to Sungpack Hong, Martin Sevenich, Korbi Schmid, and Hassan Chafi for all those discussions; and thanks to Gaby Montiel, Jane Tao, and Hugo Labra for their work on the backend databases.
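The filter above still shows the two products John already owns. As a small extension that is not part of the original post, you can compute John's purchased set from his outgoing "purchased" edges and keep only new products. This is a hedged Groovy sketch that assumes the same shell session, with v (John) and ppr from above, and uses the Blueprints getVertices(Direction, label) call seen elsewhere on this blog:

// Collect the IDs of products John already purchased
dout = com.tinkerpop.blueprints.Direction.OUT;
purchased = new java.util.HashSet();
for (w in v.getVertices(dout, "purchased")) {
  purchased.add(w.getId());
}

// Keep only products John has not bought yet
it = ppr.getTopKValues(9).iterator();
while (it.hasNext()) {
  entry = it.next(); vid = entry.getKey().getId(); vertex = opg.getVertex(vid);
  if ("Prod".equals(vertex.getProperty("type")) && !purchased.contains(vid)) {
    System.out.format("ppr=%.4f  vertex=%s\n", entry.getValue(), vertex);
  }
}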


Graph Features

Combining Graph Traversal with Powerful Graph Analytics

Oracle Big Data Spatial and Graph has, in the property graph feature, two important components: the data access layer and the in-memory analyst. The first component, the data access layer, allows one to store, manage, index, query, and traverse property graph data in a horizontally scalable database (Apache HBase or Oracle NoSQL Database). The second component, the in-memory analyst, offers a rich set of out-of-the-box graph analytics and graph operations. Together, these two components provide a solid framework for building graph-based applications. In this blog, I am going to demonstrate how graph traversal, an important function supported by the data access layer, can be used together with graph analytics.

Setup

If you haven't already, download Oracle Big Data Lite Virtual Machine v4.4.0 (or newer) from the following page:
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

Retrieve the latest property graph Hands-on-Lab/Demo scripts:
- Login to the Big Data Lite 4.4.0 VM
- Click the Refresh Samples icon on the desktop, follow the instructions, and download the latest property graph HoL/Demo scripts. (Kudos to Marty Gubar and Nigel Bayliss, who designed this very cool script that can automatically fetch the latest content from Github!)
- Open the following page using the Firefox browser:
file:///home/oracle/src/hol/property_graph_hol_2015_Nov/property_graph_hol_2015_Nov.html

Load example property graph data:
- Follow the steps described in 2.3 to 2.4.2 (if you are using Oracle NoSQL Database), or the steps in 4.10 to 4.11 (for Apache HBase), to load an example property graph.

Traverse the graph with Blueprints APIs and Gremlin syntax

In the built-in groovy shell, one can easily navigate the graph using Blueprints Java APIs and/or Gremlin syntax. A few examples follow:

// find a start vertex using the Blueprints Java API
opg-nosql> v=opg.getVertex(1l);
==>Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority}

opg-nosql> din=com.tinkerpop.blueprints.Direction.IN; dout=com.tinkerpop.blueprints.Direction.OUT;
==>OUT

// get in edges (using the Java API)
opg-nosql> v.getEdges(din);
==>Edge ID 1078 from Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} =[collaborates]=> Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} edgeKV[{weight:flo:1.0}]
...

// get out edges (using Gremlin syntax)
opg-nosql> v.outE
==>Edge ID 1000 from Vertex ID 1 {country:str:United States, name:str:Barack Obama, occupation:str:44th president of United States of America, political party:str:Democratic, religion:str:Christianity, role:str:political authority} =[collaborates]=> Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress} edgeKV[{weight:flo:1.0}]
...

// follow "collaborates" edges and add a filter on religion
opg-nosql> v.outE('collaborates').inV.filter{it.religion != 'Christianity'}
==>Vertex ID 3 {country:str:United States, name:str:Charlie Rose, role:str:talk show host journalist, show:str:Charlie Rose}
...
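Before moving on, here is one more traversal, a hedged sketch that is not part of the HoL script: it chains two "collaborates" hops and filters on the country property shown in the outputs above, using the same Gremlin 2 Groovy syntax this shell supports.

// two-hop collaborators of vertex 1 located in the United States, by name
opg-nosql> v.out('collaborates').out('collaborates').filter{it.country == 'United States'}.name.dedup()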
Use PipeFunction to combine Gremlin traversal and in-memory analysis

The following script creates a session and an in-memory analyst, computes the page rank value for the vertices, and then starts a simple Gremlin traversal from the vertex with ID 1, limiting the visited vertices to those with a page rank value above a threshold.

// Create in-memory analytics session and analyst
session=Pgx.createSession("session_ID_1");
analyst=session.createAnalyst();

// Read the graph from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

// Execute Page Rank
rank=analyst.pagerank(pgxGraph, 0.00000001, 0.85, 5000);

import com.tinkerpop.gremlin.java.*;
import com.tinkerpop.pipes.*;

opg-nosql> pipe = new GremlinPipeline(opg.getVertex(1l).out("collaborates").filter(
  new PipeFunction<Vertex, Boolean>() {
    public Boolean compute(Vertex v) {
      return (rank.get(v.getId()) > 0.01);
    }
  }));

// Traversal results shown below
==>Vertex ID 2 {country:str:United States, music genre:str:pop soul , name:str:Beyonce, role:str:singer actress}
...

The important part of the above traversal is that it includes a TinkerPop PipeFunction implementation which, upon receiving a vertex from the traversal, checks the analytical result (from a parallel in-memory page rank computation) for that vertex, and uses that information to guide the traversal.

Acknowledgement: thanks Jay Banerjee for his input on this blog post.
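A natural variation, sketched below under the same assumptions (the shell session above, with rank already computed), is to factor the threshold test into a reusable PipeFunction and apply it after each hop, so that every vertex visited along the path must be an influencer.

// Hedged sketch: a reusable high-page-rank filter applied to a two-hop traversal
hiRank = new PipeFunction<Vertex, Boolean>() {
  public Boolean compute(Vertex v) {
    return (rank.get(v.getId()) > 0.01);  // keep only high page rank vertices
  }
};
pipe = new GremlinPipeline(opg.getVertex(1l))  \
         .out("collaborates").filter(hiRank)   \
         .out("collaborates").filter(hiRank)   \
         .dedup();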


Big Data Spatial and Graph Hands On Labs and Summit ’16 Slides in the OTN!!

Guest Post By: Tracy McLane, GIS Manager at Bechtel and Vice Chair of the Oracle Spatial and Graph SIG

On Tuesday, the slides from the Oracle Spatial and Graph Summit at BIWA 2016 were posted to the Oracle Technology Network (OTN), which you can now access from the link below, with over 25 presentations and use cases covering Spatial and Graph features in the Database and Big Data platforms.

Oracle Spatial and Graph Summit at BIWA 2016 - Agenda and Presentation slides:
http://www.oracle.com/technetwork/database/options/spatialandgraph/learnmore/oracle-spatial-summit-at-biwa-2016-2881713.html

For those of you interested in Big Data Spatial and Big Data Graph, there are at least nine Big Data presentations, plus two hands-on labs from the conference as well. This is a hot topic in GIS communities now, as it not only offers a way to optimally handle unstructured and social data in Hadoop or NoSQL database environments, but also to analyze, connect, and even graph your sensor, social media feed, or raster imagery Big Data in new and exciting ways! For those of you looking for some hands-on experience, you can supplement the OTN Big Data Lite VM with the data, slides, and hands-on lab materials from the Oracle Spatial and Graph Summit at BIWA 2016 as well. I attended both of these hands-on labs at the conference and have since downloaded the thirteen (13) Big Data Lite VM files, which are around 24.5 GB. The additional Applying Spatial Analysis to Big Data zip file contains the data used for the hands-on lab and requires only an additional 28.2 MB of storage.

Hands On Lab materials for Big Data Spatial and Big Data Graph (includes workbooks, data file downloads, and presentation slides):
http://www.oracle.com/technetwork/database/options/spatialandgraph/learnmore/biwa-2016-more-session-information-2889878.html

For those of you just getting started in graph technologies, I would recommend the following books, which I am finding helpful for learning Oracle Graph concepts and the use of Cytoscape, an open source graph visualization tool:

Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data, by Richard Brath and David Jonker
http://www.amazon.com/Graph-Analysis-Visualization-Discovering-Opportunity/dp/1118845846/ref=sr_1_1?ie=UTF8&qid=1456409056&sr=8-1&keywords=Graph+Analysis

Instant Cytoscape Complex Network Analysis How-to, by Gang Su
http://www.amazon.com/Instant-Cytoscape-Complex-Analysis-Paperback/dp/B00ZVP7KO8/ref=sr_1_fkmr0_2?ie=UTF8&qid=1456409201&sr=8-2-fkmr0&keywords=Graph+Analysis+Cytoscape+book

And for a 2-minute introduction to graph technologies, check out these videos:

What is Oracle Big Data Spatial and Graph? https://youtu.be/t9pJJhzZKOE
How Can Graph Analytics Help My Business? https://youtu.be/0dJNzBi7B-k


Graph Features

Some important Property Graph API Changes Introduced in v1.1

The latest Oracle Big Data Spatial and Graph Property Graph (v1.1) has made some important API changes as compared to the v1.0 release. Here are some highlights.

To create a graph config for Oracle NoSQL Database:

     PgNosqlGraphConfig cfg = GraphConfigBuilder.forNosql()...

is replaced with

     PgNosqlGraphConfig cfg = GraphConfigBuilder.forPropertyGraphNosql()...

To create a graph config for Apache HBase:

     PgHbaseGraphConfig cfg = GraphConfigBuilder.forHbase()...

is replaced with

     PgHbaseGraphConfig cfg = GraphConfigBuilder.forPropertyGraphHbase()...

To create the in-memory analyst config:

     Map<PgxConfig.Field, Object> confPgx = new HashMap<PgxConfig.Field, Object>();
     ...
     confPgx.put(PgxConfig.Field.SESSION_TASK_TIMEOUT_SECS, 0);  // no timeout set
     confPgx.put(PgxConfig.Field.SESSION_IDLE_TIMEOUT_SECS, 0);  // no timeout set
     PgxConfig.init(confPgx);

is replaced with

     Map<PgxConfig.Field, Object> confPgx = new HashMap<PgxConfig.Field, Object>();
     ...
     confPgx.put(PgxConfig.Field.SESSION_TASK_TIMEOUT_SECS, 0);  // no timeout set
     confPgx.put(PgxConfig.Field.SESSION_IDLE_TIMEOUT_SECS, 0);  // no timeout set
     ServerInstance instance = Pgx.getInstance();
     instance.startEngine(confPgx);

To get the in-memory analyst:

     analyst = opg.getInMemAnalyst();

is replaced with

     session = Pgx.createSession("session-id-1");  // Put your session description here.
     analyst = session.createAnalyst();

To run analytical functions:

     analyst.countTriangles().get();

is replaced with

     pgxGraph = session.readGraphWithProperties(opg.getConfig());
     analyst.countTriangles(pgxGraph, false);

Thanks,
Zhe Wu
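Putting the pieces together, a minimal end-to-end v1.1 flow looks roughly like the sketch below. It is assembled from the replacements above (plus the NoSQL config used in other posts on this blog), so treat it as illustrative rather than a verbatim sample from the release.

// Hedged sketch of the v1.1 flow, assuming an Oracle NoSQL Database backend
// with the usual "connections" graph and a populated server host list
PgNosqlGraphConfig cfg = GraphConfigBuilder.forPropertyGraphNosql()
    .setName("connections").setStoreName("kvstore")
    .setHosts(server)
    .setMaxNumConnections(2).build();
opg = OraclePropertyGraph.getInstance(cfg);

// Start the in-memory analyst engine with the new configuration style
Map<PgxConfig.Field, Object> confPgx = new HashMap<PgxConfig.Field, Object>();
confPgx.put(PgxConfig.Field.SESSION_TASK_TIMEOUT_SECS, 0);  // no timeout set
confPgx.put(PgxConfig.Field.SESSION_IDLE_TIMEOUT_SECS, 0);  // no timeout set
ServerInstance instance = Pgx.getInstance();
instance.startEngine(confPgx);

// Create a session and analyst, load the graph, and run an analytical function
session = Pgx.createSession("session-id-1");
analyst = session.createAnalyst();
pgxGraph = session.readGraphWithProperties(opg.getConfig());
analyst.countTriangles(pgxGraph, false);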


Graph Features

Detecting Communities in a Social Graph

Hi, In this post, I am going to demonstrate an easy flow to detect communities in a social network. Communities are very important to a social graph because the individuals of a community tend to share a set of common characteristics or exhibit one or multiple common behaviors.

The following Groovy-based scripts require Oracle Big Data Spatial and Graph v1.1, which is bundled with Oracle Big Data Appliance v4.3.0. One can also get Oracle Big Data Spatial and Graph v1.1 from My Oracle Support.

cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
sh gremlin-opg-hbase.sh

First, load a small test graph "connections" into Apache HBase.

// Get a graph config that has graph name "connections" and
// Zookeeper host, port, and some other parameters
cfg = GraphConfigBuilder.forPropertyGraphHbase()            \
 .setName("connections")                                    \
 .setZkQuorum("bigdatalite").setZkClientPort(2181)          \
 .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)   \
 .setInitialVertexNumRegions(3).setSplitsPerRegion(1)       \
 .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
 .build();

// Get an instance of OraclePropertyGraph, which is a key Java
// class to manage property graph data
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

// OraclePropertyGraphDataLoader is a key Java class
// to bulk load property graph data into the backend databases.
opgdl=OraclePropertyGraphDataLoader.getInstance();
vfile="../../data/connections.opv"
efile="../../data/connections.ope"
opgdl.loadData(opg, vfile, efile, 2);

Next, add a tiny loop of just two vertices, vx and vy, with the Blueprints Java API.

vx = opg.addVertex(1234l);
vy = opg.addVertex(1235l);

// Add an edge from vx to vy, and another from vy to vx
e1=opg.addEdge(3000l, vx, vy, "likes");
e1.setProperty("weight", 1.1d);
e2=opg.addEdge(3001l, vy, vx, "likes");
e2.setProperty("weight", 1.5d);
opg.commit();

Get the in-memory analyst.

// Create an in memory analytics session and analyst
session=Pgx.createSession("session_ID_1");
analyst=session.createAnalyst();

// Read graph data from database into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

Run the community detection algorithms.

// Run the WCC (weakly connected components) algorithm
partition = analyst.wcc(pgxGraph)
partition.size() // should be 2

// Get the first community (a collection of vertices)
vertexCollection = partition.getPartitionByIndex(0);

// Run Label Propagation
partition = analyst.communitiesLabelPropagation(pgxGraph)

// How many communities do we have?
partition.size()

// Get the first community by ID
vertexCollection = partition.getPartitionByIndex(0);

An example output of the above command is as follows. Yours might be different:

==>PgxVertex with ID 77
==>PgxVertex with ID 78

Now that we have detected all the communities, we can look into the community that contains an entity of interest.

// Look into the community that has the vertex Alibaba
v = opg.getVertices("name", "Alibaba").iterator().next();
vertexCollection = partition.getPartitionByVertex(pgxGraph.getVertex(v.getId()));

// Get details of the 4 vertices in this community.
// "l" below indicates a Long integer
opg.getVertex(69l);
opg.getVertex(68l);
opg.getVertex(65l);
opg.getVertex(71l);

:quit

Cheers, Zhe Wu
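To get a quick overview of the partition produced by either algorithm, you can loop over its communities and print their sizes. This is a hedged sketch (assuming the partition object from the session above; the collection accessors mirror the getPartitionByIndex call used earlier in this post):

// Print the size of each detected community
for (i in 0..<partition.size()) {
  community = partition.getPartitionByIndex(i);
  System.out.println("community " + i + " has " + community.size() + " vertices");
}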


Graph Features

Identifying Influencers with the Built-in Page Rank Analytics in Oracle Big Data Spatial and Graph

In a previous post, I talked about how to start Oracle NoSQL Database in the BigDataLite 4.2 VM and demonstrated a few property graph functions. In this post, I am going to use the same property graph feature provided in that VM to identify influencers in a social network. This time, Apache HBase is used as the backend database.

Setup and configuration

First things first, if you haven't already, download Oracle Big Data Lite Virtual Machine v4.2.0 from the following page:
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

It is recommended to use Oracle VM VirtualBox 4.3.28 to import this virtual machine. Once the import is done successfully, login as oracle (default password is welcome1).

On the desktop, there is a "Start/Stop Services" icon - double clicking that will lead to a console popup with a list of services. Uncheck everything, hit Enter, and you will see the existing services being terminated with the following output on the terminal:

Stopping hadoop-hdfs-namenode ...
Stopping hadoop-yarn-nodemanager ...

Double click "Start/Stop Services" again, and this time check ClouderaManager and hit Enter. This will bring up Cloudera Manager. (Note that I have allocated ~12GB RAM to this VM.)

Now, start the Firefox browser, open the following URL, and login as admin/admin:

http://bigdatalite:7180/cmf

Click Add Service. A table titled "Select the type of service you want to add" will be presented. You will see in this table services like "Accumulo 1.6", "Flume", and "HBase", among many others. Check HBase. Click Continue a few times till you see the following message: "Congratulations! Your new service is installed and configured on your cluster." Click Finish and your browser will automatically go back to the bigdatalite:7180/cmf/home page.

Go to a Linux terminal and run the following command:

$ sudo cp /opt/oracle/oracle-spatial-graph/property_graph/lib/sdopgdal-for-cdh5.2.1.jar /usr/lib/hbase/lib/

Switch back to the browser page. Click the drop-down icon in the table row for HBase, select Start, click the Start button, and wait. After a while, the status icon to the left of HBase will become a solid green circle, as shown below.

Note that in this particular setup, the default Java Heap Size of HBase (RegionServer in Bytes) is only 50 MiB, which is way too small. I set it to 800 MiB using Cloudera Manager.
Execute property graph functions

Open a Linux terminal in this virtual machine and type in the following:

$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
$ unset CLASSPATH
$ ./gremlin-opg-hbase.sh

//
// Connect to Apache HBase in this virtual box
//
cfg = GraphConfigBuilder.forHbase()                                                  \
                        .setName("connections")                                      \
                        .setZkQuorum("bigdatalite")                                  \
                        .setZkClientPort(2181)                                       \
                        .setZkSessionTimeout(60000)                                  \
                        .setMaxNumConnections(2)                                     \
                        .setInitialNumRegions(3)                                     \
                        .setSplitsPerRegion(1)                                       \
                        .addEdgeProperty("lbl", PropertyType.STRING, "lbl")          \
                        .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")   \
                        .build();

// Note: the above GraphConfigBuilder.forHbase is available
// only in Big Data Spatial and Graph v1.0.
// For v1.1 or newer, use the updated Java APIs instead. See here for details.

//
// Get an instance of OraclePropertyGraph
//
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

//
// Use the parallel data loader API to load a
// sample property graph in flat file format with a
// degree of parallelism (DOP) 2
//
vfile="../../data/connections.opv"
efile="../../data/connections.ope"
opgdl=OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, vfile, efile, 2);

// read through the vertices
opg.getVertices();

// read through the edges
opg.getEdges();

//
// You can add vertices/edges, change properties, etc. here.
// ...
//

//
// Serialize the graph out into a pair of flat files with DOP=2
//
vOutput="/tmp/mygraph.opv"
eOutput="/tmp/mygraph.ope"
OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);

//
// Run the built-in analytical function, PageRank, to identify influencers
//
analyst = opg.getInMemAnalyst();
rank = analyst.pagerank(0.0001, 0.85, 100).get();
rank.getTopKValues(5);
v1=opg.getVertex(1l); v2=opg.getVertex(60l); v3=opg.getVertex(42l);   \
System.out.println("Top 3 influencers: \n " + v1.getProperty("name") + \
                   "\n " + v2.getProperty("name") +                    \
                   "\n " + v3.getProperty("name"));

The last output of the script above should be something as follows:

Top 3 influencers:
 Barack Obama
 Nicolas Maduro
 NBC

It is really simple, isn't it? Now, are you interested in finding out communities in this graph?
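If you are on v1.1 or newer, where the analyst comes from a PGX session (see the API-changes post above), you can avoid hardcoding the three vertex IDs by iterating the top-k entries directly, following the same entry-iteration pattern used in the personalized page rank post on this blog. A hedged sketch, assuming pgxGraph has been read from opg.getConfig():

// v1.1+ sketch: print the top 3 influencers by name, without hardcoding IDs
rank = analyst.pagerank(pgxGraph, 0.0001, 0.85, 100);
it = rank.getTopKValues(3).iterator();
while (it.hasNext()) {
  entry = it.next();
  vid = entry.getKey().getId();
  System.out.println("influencer: " + opg.getVertex(vid).getProperty("name"));
}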


Graph Features

Property Graph in a Box

I have very good news for those of you who want to play with Oracle Big Data Spatial and Graph 1.0. The most recent Oracle Big Data Lite Virtual Machine (v4.2.0) provides a very convenient environment to help one get started quickly with the Oracle Big Data platform, and it comes with the property graph feature installed.

To start, download Oracle Big Data Lite Virtual Machine v4.2.0 from the following page:
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html

It is recommended to use Oracle VM VirtualBox 4.3.28 to import this virtual machine. Once the import is done successfully, login as oracle (default password is welcome1). On the desktop, there is a "Start/Stop Services" icon - double clicking that will lead to a console popup with a list of services. Check Oracle NoSQL Database, hit enter, and the built-in Oracle NoSQL Database will start automatically. If you need to shut down the Oracle NoSQL Database, just repeat this process.

Next, I am going to show you a simple Groovy-based script that loads a sample property graph representing a small social network, reads out the vertices and edges, writes the graph out, and finally counts the number of triangles in this network.

Open a Linux terminal in this virtual machine and type in the following:

$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
$ unset CLASSPATH
$ ./gremlin-opg-nosql.sh

//
// Connect to the Oracle NoSQL Database in this virtual box
//
server = new ArrayList<String>();
server.add("bigdatalite:5000");
cfg = new PgNosqlGraphConfigBuilder()                          \
    .setDbEngine(DbEngine.NOSQL)                               \
    .setName("connections")                                    \
    .setStoreName("kvstore")                                   \
    .setHosts(server)                                          \
    .addEdgeProperty("lbl", PropertyType.STRING, "lbl")        \
    .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000") \
    .setMaxNumConnections(1).build();

// Note: the above PgNosqlGraphConfigBuilder.setDbEngine is available
// only in Big Data Spatial and Graph v1.0.
// For v1.1 or newer, use the updated Java APIs instead. See here for details.

//
// Get an instance of OraclePropertyGraph
//
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

//
// Use the parallel data loader API to load a
// sample property graph in flat file format with a
// degree of parallelism (DOP) 2
//
vfile="../../data/connections.opv"
efile="../../data/connections.ope"
opgdl=OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, vfile, efile, 2);

// read through the vertices
opg.getVertices();

// read through the edges
opg.getEdges();

//
// You can add vertices/edges, change properties, etc. here.
// ...
//

//
// Serialize the graph out into a pair of flat files with DOP=2
//
vOutput="/tmp/mygraph.opv"
eOutput="/tmp/mygraph.ope"
OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);

//
// Run a built-in analytical function
//
analyst = opg.getInMemAnalyst();
analyst.countTriangles(false).get();

The last output of the script above should be "22", meaning that there are 22 triangles in this small network.

Note that the same virtual box has Apache HBase installed as well as Oracle NoSQL Database. Once Apache HBase is configured and started, the same script (except the DB connection initialization part) can be used without a change.

Now, do you know how to find out the influencers in this graph? Stay tuned.
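As a pointer for the HBase route mentioned above, the connection initialization would look roughly like the v1.0 builder shown in the influencer post earlier on this blog; treat this as a hedged sketch rather than a tested sample.

// v1.0-style Apache HBase connection that can replace the NoSQL config above;
// the rest of the script stays the same
cfg = GraphConfigBuilder.forHbase()                                 \
        .setName("connections")                                     \
        .setZkQuorum("bigdatalite")                                 \
        .setZkClientPort(2181)                                      \
        .setZkSessionTimeout(60000)                                 \
        .setMaxNumConnections(2)                                    \
        .setInitialNumRegions(3)                                    \
        .setSplitsPerRegion(1)                                      \
        .addEdgeProperty("lbl", PropertyType.STRING, "lbl")         \
        .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")  \
        .build();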


Spatial Features

Oracle Big Data Lite VM 4.2 - Installing the Image Processing Framework for Big Data Spatial and Graph

Cross-posting from Marty Gubar's entries at the Data Warehouse Insider blog. Thanks Marty for the info!

Oracle Big Data Lite Virtual Machine 4.2 is now available on OTN. For those of you that are new to the VM - it is a great way to get started with Oracle's big data platform, with many platform components preinstalled and configured. One of the cool new features is Big Data Spatial and Graph. In order to use this new feature, there is one more configuration step required. Normally, we include everything you need in the VM - but this is a component that we couldn't distribute.

For the Big Data Spatial Image Processing Framework, you will need to install and configure Proj.4 - Cartographic Projections Library. Simply follow these steps:

- Start the Big Data Lite VM and log in as user "oracle"
- Launch Firefox and download this tarball (http://download.osgeo.org/proj/proj-4.9.1.tar.gz) to ~/Downloads
- Run the following commands at the Linux prompt:

cd ~/Downloads
tar -xvf proj-4.9.1.tar.gz
cd proj-4.9.1
./configure
make
sudo make install

This will create the libproj.so file in the directory /usr/local/lib/. Now that the file has been created, create links to it in the appropriate directories. At the Linux prompt:

sudo ln -s /usr/local/lib/libproj.so /u02/oracle-spatial-graph/shareddir/spatial/demo/imageserver/native/libproj.so
sudo ln -s /usr/local/lib/libproj.so /usr/lib/hadoop/lib/native/libproj.so

That's all there is to it. Big Data Lite is now ready for Oracle Big Data Spatial and Graph!


Announcing Oracle Big Data Spatial and Graph

We have just shipped a new big data product. Oracle Big Data Spatial and Graph offers a set of analytic services and data models that support Big Data workloads on Apache Hadoop and NoSQL database technologies. For over a decade, Oracle has offered leading spatial and graph analytic technology for the Oracle Database. Oracle is now applying this expertise to work with social network data and to exploit Big Data architectures. This post provides product feature highlights. You can get more detail at the OTN website here.

Oracle Big Data Spatial and Graph includes two main components:

- A distributed property graph database with 35 built-in graph analytics to discover graph patterns in big data, such as communities and influencers within a social graph, and to generate recommendations based on interests, profiles, and past behaviors

- A wide range of spatial analysis functions and services to evaluate data based on how near or far something is to one another, or whether something falls within a boundary or region, and to process and visualize geospatial map data and imagery

Our objective is to provide the spatial and graph capabilities that are best suited to the use cases, data sets, and workloads found in big data environments. Oracle Big Data Spatial and Graph can be deployed on Oracle Big Data Appliance, as well as other supported Hadoop and NoSQL systems on commodity hardware.

Property Graph Data Management and Analysis

The property graph feature of Oracle Big Data Spatial and Graph facilitates big data discovery and dynamic schema evolution with real-world modeling and proven in-memory parallel analytics. Property graphs are commonly used to model and analyze relationships, such as communities, influencers and recommendations, and other patterns found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.

Property graphs model the real world as networks of linked data comprising vertices (entities), edges (relationships), and properties (attributes) for both. Property graphs are flexible and easy to evolve; metadata is stored as part of the graph, and new relationships are added by simply adding an edge. Graphs support sparse data; properties can be added to a vertex or edge but need not be applied to all similar vertices and edges.

Standard property graph analysis enables discovery with analytics that include ranking, centrality, recommender, community detection, and path finding. Oracle Big Data Spatial and Graph provides an industry-leading property graph capability on Apache HBase and Oracle NoSQL Database with a Groovy-based console; parallel bulk load from common graph file formats; text indexing and search; querying graphs in database and in memory; ease of development with open source Java APIs and popular scripting languages; and an in-memory, parallel, multi-user graph analytics engine with 35 standard graph analytics.

Spatial Analysis and Services - Enrich and Categorize Your Big Data with Location

With the spatial capabilities, users can take data with any location information, enrich it, and use it to harmonize their data. For example, Big Data Spatial and Graph can look at datasets like Twitter feeds that include a zip code or street address, and add or update city, state, and country information.
It can also filter or group results based on spatial relationships: for example, filtering customer data from logfiles based on how near one customer is to another, or finding how many customers are in each sales territory. These results can be visualized on a map with the included HTML5-based web mapping tool. Location can be used as a universal key across disparate data commonly found in Hadoop-based analytic solutions.

Also, users can perform large-scale operations for data cleansing, preparation, and processing of imagery, sensor data, and raw input data with the raster services. Users can load raster data on HDFS using dozens of supported file formats; perform analysis such as mosaic and subset; write and carry out other analysis operations; visualize data; and manage workflows. Hadoop environments are ideally suited to storing and processing these high data volumes quickly, in parallel across MapReduce nodes.

An Enterprise-Class Big Data Platform for Spatial and Graph Data Processing

Oracle has taken nearly two decades of experience working with spatial and graph technologies to deliver a new Big Data platform designed to handle the most demanding workloads. With the introduction of Oracle Big Data Spatial and Graph, developers and data scientists can manage their most challenging graph, spatial, and raster data processing in a single enterprise-class Big Data platform. As observed by the partner community:

"Big Data systems are increasingly being used to process large volumes of data from a wide variety of sources. With the introduction of Oracle Big Data Spatial and Graph, Hadoop users will be able to enrich data based on location and use this to harmonize data for further correlation, categorization and analysis. For traditional geospatial workloads, it will provide value-added spatial processing and allow us to support customers with large vector and raster data sets on Hadoop systems."
- Steve Pierce, CEO, Think Huddle

"Oracle Spatial and Graph is already a very capable technology. With the explosion of Hadoop environments, the need to spatially-enable workloads has never been greater, and Oracle could not have introduced Oracle Big Data Spatial and Graph at a better time. This exciting new technology will provide value-add to spatial processing and handle very large raster workloads in a Hadoop environment. We look forward to exploring how it helps address the most challenging data processing requirements."
- Keith Bingham, Chief Architect and Technologist, Ball Aerospace

Oracle Big Data Spatial and Graph represents a new opportunity for the Big Data platform. By offering out-of-the-box spatial enrichment services and analysis functions, as well as dozens of the most popular graph analysis functions, analysts and developers can now apply commercial-grade algorithms to their Big Data workloads.

Learn more about Oracle Big Data Spatial and Graph at the OTN product website:
http://www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph/overview/index.html

Read the Data Sheet:
http://download.oracle.com/otndocs/products/bigdata-spatialandgraph/bdsg-data-sheet.pdf

Read the Spatial Feature Overview:
http://download.oracle.com/otndocs/products/bigdata-spatialandgraph/bdsg-spatial-feature-overview.pdf
