X

Oracle Big Data Spatial and Graph - technical tips, best practices, and news from the product team

Using Geospatial Data and Applying GeoSpatial Search with BDSG Property Graph

Alan Wu
Architect

I’d like to show how to store geospatial data and apply geospatial search with Oracle Big Data Spatial and Graph (BDSG) Property Graph.  Filtering graph entities with geospatial search is a powerful way to enhance  social network analysis, recommendation systems, and fraud analysis workflows.  Apache Solr is a well-known search engine supporting geospatial index and search.  In this example, we’ll use Apache Solr to generate a text index on a sample social graph (vertices and edges)  which has geospatial data (lat/long) associated with vertices.  Using the built-in Groovy shell environment, we’ll invoke the data access layer Java APIs and apply Solr’s built-in geospatial functions to filter the graph results with a spatial “window of interest”.  The resulting entities and their locations can be used for further analysis or visualization on a map.

As usual, let's use the well-known Big Data Lite VM. After login, click on
"Start/Stop Services" icon on the desktop, make sure Zookeeper,
HDFS, HBase, NoSQL and Solr are checked, and hit Enter.



Open the following file in an editor,
/opt/oracle/oracle-spatial-graph/property_graph/dal/opg-solr-config/schema.xml


Go to line 138, add the following, save and quit the editor.
Basically this configuration change declares values of any property
with a name suffix "location_str" will be treated as location data
type.
   <dynamicField name="*location_str"  type="location"
indexed="true" stored="true"/>



Run the following command line in a Linux terminal to load the solr
config.

 /usr/lib/solr/bin/zkcli.sh -zkhost bigdatalite:2181/solr
-cmd upconfig -confdir
/opt/oracle/oracle-spatial-graph/property_graph/dal/opg-solr-config/


That is it! We are done with configuration.


Don't believe me? Let's give it a try. First thing, we need to find
a piece of property graph with GeoSpatial data. Unfortunately, we
don't have one readily available in the VM. Don't worry, we can improvise. For the connections graph, under the
directory of /opt/oracle/oracle-spatial-graph/property_graph/data/,
we have a vertex file which has information about some interesting
people and companies. Let's add some Spatial data, some random ones
to save time.

$ cat connections.opv |cut -d ',' -f 1 | awk '{print $1}'
|sort |uniq |

awk '{print $1 ",location,1,"
(37.52914+rand() -0.5) "%2C" (-122.2669+rand() -0.5) ",," }'  

> /tmp/addon_spatial.opv

$ cat /dev/null >/tmp/addon_spatial.ope



Now, what exactly is going on? The above script pulls out all the
unique vertex IDs, adds to each vertex a new property "location",
and assigns randomly generated coordinates centered around 400
Oracle Pkwy. Since this example only deals with Spatial information
on vertices, we will use an empty .ope (edge) file. If you want, you can
invoke a geo coding service and assign real coordinates. The idea
stays the same though.


Let's see a few examples:

$ head -3 /tmp/addon_spatial.opv
1,location,1,37.2669%2C-122.476,,
10,location,1,37.875%2C-122.615,,
11,location,1,37.6147%2C-122.573,,


In this case,  the following coordinate (lat/long) is assigned to
vertex 1. Yous might be different depending on the random values
generated.
37.2669,-122.476


That "%2C" is the encoded form of a comma. Such an encoding is
required because Oracle-defined flat files use commas to delimit.


The show is ready. Now, we need to load the graph data up into the
database and create a Solr-based index.

$ cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/
$ sh ./gremlin-opg-hbase.sh

cfg = GraphConfigBuilder.forPropertyGraphHbase()           
\

 .setName("connectionsHBase")                              
\

 .setZkQuorum("bigdatalite").setZkClientPort(2181)         
\

 .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)  
\

 .setInitialVertexNumRegions(3).setSplitsPerRegion(1)      
\

 .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
\

 .build();

opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();

opgdl=OraclePropertyGraphDataLoader.getInstance();
vfile="../../data/connections.opv"
efile="../../data/connections.ope"
opgdl.loadData(opg, vfile, efile, 2);

vfile="/tmp/addon_spatial.opv"
efile="/tmp/addon_spatial.ope"
opgdl.loadData(opg, vfile, efile, 2);

szSolrServerUrl = "bigdatalite:2181/solr"
szNodeSet = "bigdatalite.localdomain:8983_solr"
indexParams =
OracleIndexParameters.buildSolr("opgconfiglower",szSolrServerUrl,
szNodeSet, 15 /*ZKTimeout*/, 1 /*nShards*/, 1/*nRepF*/,
1/*shardsPerNode*/, 1/*numConnections*/,
10000/*batchSize*/,500000/*commitSize*/, 20/*writeTimeout*/);

opg.setDefaultIndexParameters(indexParams);
opg.createKeyIndex("name", Vertex.class);
opg.createKeyIndex("role", Vertex.class);
opg.createKeyIndex("country", Vertex.class);
opg.createKeyIndex("religion", Vertex.class);
opg.createKeyIndex("occupation", Vertex.class);
opg.createKeyIndex("location", Vertex.class);


The above Groovy-based script opens a connection to Apache HBase,
loads the original connections graphs, loads into the same graph
additional GeoSpatial data, and finally creates a Solr index.

import oracle.pg.text.solr.*;
import org.apache.solr.client.solrj.*;
index = (SolrIndex<Vertex>)
opg.getAutoIndex(Vertex.class);




// To find vertices located within 16KM to a given coordinate
37.529147,-122.26693

// and return them sorted based on the distance

//
query = new SolrQuery("name_str:*")
       .addFilterQuery("{!geofilt sfield=location_str
pt=37.529147,-122.26693 d=16}")

.setSort("geodist(location_str,37.529147,-122.26693)
", SolrQuery.ORDER.asc);

index.get(query);


 An example output is as follows:

==>Vertex ID 19 {country:str:United States,
occupation:str:junior United States Senator from New York,
role:str:political authority, name:str:Kirsten Gillibrand, location:str:37.5321,-122.2,
religion:str:Methodism, political
party:str:Democratic}

==>Vertex ID 5 {country:str:Italy, occupation:str:pope,
role:str:Catholic religion authority, name:str:Pope Francis,
location:str:37.6152,-122.168,
religion:str:Catholicism}

==>Vertex ID 14 {role:str:business magnate,
name:str:Aliko Dangote,
location:str:37.3961,-122.275,
religion:str:Islam, company:str:Dangote Group}

==>Vertex ID 77 {country:str:United States,
occupation:str:CEO of Nest, name:str:Tony Fadell,
location:str:37.6668,-122.291}
==>Vertex ID 17 {country:str:United States,
role:str:actress, name:str:Robin Wright,
location:str:37.6609,-122.196}


Now, if we reduce the radius from 16KM to 15 KM, only a smaller
subset of vertices is returned.

query = new SolrQuery("name_str:*")
       .addFilterQuery("{!geofilt sfield=location_str
pt=37.529147,-122.26693 d=15}")

.setSort("geodist(location_str,37.529147,-122.26693)
", SolrQuery.ORDER.asc);

index.get(query);

==>Vertex ID 19 {country:str:United States,
occupation:str:junior United States Senator from New York,
role:str:political authority, name:str:Kirsten Gillibrand,
location:str:37.5321,-122.2,
religion:str:Methodism, political
party:str:Democratic}

==>Vertex ID 5 {country:str:Italy, occupation:str:pope,
role:str:Catholic religion authority, name:str:Pope Francis,
location:str:37.6152,-122.168,
religion:str:Catholicism}

==>Vertex ID 14 {role:str:business magnate,
name:str:Aliko Dangote,
location:str:37.3961,-122.275,
religion:str:Islam, company:str:Dangote Group}



For details on geofilt, geodist functions mentioned above, please
refer to Apache Solr Reference Guide [1]

The following screenshot shows those randomly generated coordinates using Oracle MapViewer. An interactive map can be found here. Note that you may need to click on "Show all content" button when viewing it in IE.

Spatial Data in Property Graph

Have fun adding GeoSpatial data and query to property graph!


[1]
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf

Acknowledgement: thanks Xavier Lopez for his input on this blog post.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.