Thursday Jul 02, 2015

Using Oracle Big Data Spatial and Graph

Wondering how to get started with graph analyses?  The latest Oracle Big Data Lite VM includes Oracle's new spatial and graph toolkit for big data.  Check out these two blog posts that describe how to find interesting relationships in data:

 Pretty cool :)


Saturday Jun 20, 2015

Oracle Big Data Spatial and Graph - Installing the Image Processing Framework

Oracle Big Data Lite 4.2 was just released - and one of the cool new features is Oracle Big Data Spatial and Graph.  To use this new feature, one more configuration step is required.  Normally, we include everything you need in the VM - but this is a component that we couldn't distribute.

For the Big Data Spatial Image Processing Framework, you will need to install and configure Proj.4 - Cartographic Projections Library.  Simply follow these steps: 

  • Start the Big Data Lite VM and log in as user "oracle"
  • Launch Firefox and download the proj-4.9.1.tar.gz tarball to ~/Downloads
  • Run the following commands at the Linux prompt:
    • cd ~/Downloads
    • tar -xvf proj-4.9.1.tar.gz
    • cd proj-4.9.1
    • ./configure
    • make
    • sudo make install

This will create the library file in the directory /usr/local/lib/.  Now that the file has been created, create links to it in the appropriate directories.  At the Linux prompt:

  • sudo ln -s /usr/local/lib/ /u02/oracle-spatial-graph/shareddir/spatial/demo/imageserver/native/
  • sudo ln -s /usr/local/lib/ /usr/lib/hadoop/lib/native/

That's all there is to it.  Big Data Lite is now ready for Oracle Big Data Spatial and Graph!

Oracle Big Data Lite 4.2 Now Available!

Oracle Big Data Lite Virtual Machine 4.2 is now available on OTN.  For those of you who are new to the VM - it is a great way to get started with Oracle's big data platform.  It has a ton of products installed and configured - including: 

  • Oracle Enterprise Linux 6.6
  • Oracle Database 12c Release 1 Enterprise Edition - including Oracle Big Data SQL-enabled external tables, Oracle Multitenant, Oracle Advanced Analytics, Oracle OLAP, Oracle Partitioning, Oracle Spatial and Graph, and more.
  • Cloudera Distribution including Apache Hadoop (CDH5.4.0)
  • Cloudera Manager (5.4.0)
  • Oracle Big Data Connectors 4.2
    • Oracle SQL Connector for HDFS 3.3.0
    • Oracle Loader for Hadoop 3.4.0
    • Oracle Data Integrator 12c
    • Oracle R Advanced Analytics for Hadoop 2.5.0
    • Oracle XQuery for Hadoop 4.2.0
  • Oracle NoSQL Database Enterprise Edition 12cR1 (3.3.4)
  • Oracle Big Data Spatial and Graph 1.0
  • Oracle JDeveloper 12c (12.1.3)
  • Oracle SQL Developer and Data Modeler 4.1
  • Oracle Data Integrator 12cR1 (12.1.3)
  • Oracle GoldenGate 12c
  • Oracle R Distribution 3.1.1
  • Oracle Perfect Balance 2.4.0
  • Oracle CopyToBDA 2.0

Check out our new product - Oracle Big Data Spatial and Graph (and don't forget to read the blog post on a small config update you'll need to make to use it).  It's a great way to find relationships in data and query and visualize geographic data.  Speaking of analysis... Oracle R Advanced Analytics for Hadoop now leverages Spark for many of its algorithms for (way) faster processing.

 But, that's just a couple of features... download the VM and check it out for yourself :). 

Friday May 15, 2015

Big Data Spatial and Graph is now released!

Cross-posting this from the announcement of the new spatial and graph capabilities. You can get more detail on OTN.

The product objective is to provide spatial and graph capabilities that are best suited to the use cases, data sets, and workloads found in big data environments.  Oracle Big Data Spatial and Graph can be deployed on Oracle Big Data Appliance, as well as other supported Hadoop and NoSQL systems on commodity hardware. 

Here are some feature highlights.   

Oracle Big Data Spatial and Graph includes two main components:

  • A distributed property graph database with 35 built-in graph analytics to
    • discover graph patterns in big data, such as communities and influencers within a social graph
    • generate recommendations based on interests, profiles, and past behaviors
  • A wide range of spatial analysis functions and services to
    • evaluate data based on how near or far things are from one another, or whether something falls within a boundary or region
    • process and visualize geospatial map data and imagery

Property Graph Data Management and Analysis

The property graph feature of Oracle Big Data Spatial and Graph facilitates big data discovery and dynamic schema evolution with real-world modeling and proven in-memory parallel analytics. Property graphs are commonly used to model and analyze relationships, such as communities, influencers and recommendations, and other patterns found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.  

Property graphs model the real world as networks of linked data comprising vertices (entities), edges (relationships), and properties (attributes) for both. Property graphs are flexible and easy to evolve; metadata is stored as part of the graph and new relationships are added by simply adding an edge. Graphs support sparse data; properties can be added to a vertex or edge but need not be applied to all similar vertices and edges.  Standard property graph analysis enables discovery with analytics that include ranking, centrality, recommender, community detection, and path finding.
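To make the model concrete, here is a minimal sketch in plain Python. It is not the Oracle Big Data Spatial and Graph API; the vertex names, the edge labels, and the helper functions are all hypothetical. It shows sparse properties, adding a relationship by adding an edge, a simple degree ranking, and breadth-first path finding.

```python
from collections import deque

# Hypothetical toy graph: vertices and edges both carry free-form properties.
vertices = {
    "mary": {"name": "Mary", "age": 28},
    "john": {"name": "John"},                    # sparse: no "age" property
    "acme": {"name": "Acme", "kind": "company"},
}
edges = [
    {"src": "mary", "dst": "john", "label": "knows", "since": 2010},
    {"src": "john", "dst": "acme", "label": "worksFor"},
]

def add_edge(src, dst, label, **props):
    """A new relationship is just one more edge record."""
    edges.append({"src": src, "dst": dst, "label": label, **props})

def degree_ranking():
    """Rank vertices by number of incident edges (a very simple centrality)."""
    degree = {v: 0 for v in vertices}
    for e in edges:
        degree[e["src"]] += 1
        degree[e["dst"]] += 1
    # Highest degree first; ties broken alphabetically by vertex ID.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))

def shortest_path(start, goal):
    """Breadth-first search along edge direction; returns a vertex list or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for e in edges:
            if e["src"] == current and e["dst"] not in parents:
                parents[e["dst"]] = current
                queue.append(e["dst"])
    return None
```

Calling shortest_path("mary", "acme") walks through "john"; after add_edge("mary", "acme", "follows") the same call finds the direct hop. A production engine would use indexed adjacency rather than scanning the edge list, which is done here only to keep the sketch short.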

Oracle Big Data Spatial and Graph provides an industry-leading property graph capability on Apache HBase and Oracle NoSQL Database with a Groovy-based console; parallel bulk load from common graph file formats; text indexing and search; querying graphs in database and in memory; ease of development with open source Java APIs and popular scripting languages; and an in-memory, parallel, multi-user, graph analytics engine with 35 standard graph analytics.

Spatial Analysis and Services Enrich and Categorize Your Big Data with Location

With the spatial capabilities, users can take data with any location information, enrich it, and use it to harmonize their data.  For example, Big Data Spatial and Graph can look at datasets like Twitter feeds that include a zip code or street address, and add or update city, state, and country information.  It can also filter or group results based on spatial relationships:  for example, filtering customer data from logfiles based on how near one customer is to another, or finding how many customers are in each sales territory.  These results can be visualized on a map with the included HTML5-based web mapping tool.  Location can be used as a universal key across disparate data commonly found in Hadoop-based analytic solutions. 
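The two filtering ideas above (how near one customer is to another, and how many customers fall in each sales territory) can be sketched in a few lines of plain Python. This is only an illustration, not the product's API: the customer coordinates, the rectangular territories, and the function names are made up, and distance is computed with the standard haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical customer records with approximate city coordinates.
customers = [
    {"id": "c1", "lat": 37.7749, "lon": -122.4194},  # San Francisco
    {"id": "c2", "lat": 37.8044, "lon": -122.2712},  # Oakland
    {"id": "c3", "lat": 40.7128, "lon": -74.0060},   # New York
]

# Hypothetical territories as bounding boxes: (min_lat, min_lon, max_lat, max_lon).
territories = {
    "west": (32.0, -125.0, 49.0, -100.0),
    "east": (24.0, -100.0, 49.0, -66.0),
}

def customers_near(target, radius_km):
    """IDs of all other customers within radius_km of the target customer."""
    return [
        c["id"]
        for c in customers
        if c["id"] != target["id"]
        and haversine_km(target["lat"], target["lon"], c["lat"], c["lon"]) <= radius_km
    ]

def count_by_territory():
    """Count customers per territory; each customer lands in at most one box."""
    counts = {name: 0 for name in territories}
    for c in customers:
        for name, (min_lat, min_lon, max_lat, max_lon) in territories.items():
            if min_lat <= c["lat"] <= max_lat and min_lon <= c["lon"] < max_lon:
                counts[name] += 1
                break
    return counts
```

Real sales territories are usually arbitrary polygons, so a real system would use point-in-polygon tests and a spatial index rather than bounding boxes and a full scan.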

Also, users can perform large-scale operations for data cleansing, preparation, and processing of imagery, sensor data, and raw input data with the raster services.  Users can load raster data on HDFS using dozens of supported file formats, perform analysis such as mosaic and subset, write and carry out other analysis operations, visualize data, and manage workflows.  Hadoop environments are ideally suited to storing and processing these high data volumes quickly, in parallel across MapReduce nodes.  

Learn more about Oracle Big Data Spatial and Graph at the OTN product website:

Read the Data Sheet

Read the Spatial Feature Overview

Sunday Nov 04, 2012

Blueprints for Oracle NoSQL Database

I think that some of the most interesting analytic problems are graph problems.  I'm always interested in new ways to store and access graphs.  As such, I really like the work being done by Tinkerpop to create Open Source Software to make property graphs more accessible over a wide variety of datastores.  Since key-value stores like Oracle NoSQL Database are well-suited to storing property graphs, I decided to extend the Blueprints API to work with it.  Below I'll discuss some of the implementation details, but you can check out the finished product here:

 What's in a Property Graph? 

In the most general sense, a graph is just a collection of vertices and edges.  Vertices and edges can have properties: weights, names, or any number of other traits.  In an undirected graph, edges connect vertices without direction.  A directed graph specifies that all edges have a head and a tail --- a direction.  A multi-graph allows multiple edges to connect two vertices.  A "property graph" encompasses all of these traits.

Key-Value Stores for Property Graphs

Key-Value stores like Oracle NoSQL Database tend to be ideal for implementing property graphs.  First, if any vertex or edge can have any number of traits, we can treat it as a hash map.  For example:

Vertex["name"] = "Mary"

Vertex["age"] = 28

Vertex["ID"] = 12345

 and so on.  This is a natural key-value relationship: the key "name" maps to the value "Mary."  Moreover if we maintain two hash maps, one for vertex objects and one for edge objects, we've essentially captured the graph.  As such, any scalable key-value store is fertile ground for planting graphs.
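As a rough sketch of the two-hash-map idea in plain Python (this is not the Blueprints API; the second vertex, the edge IDs, and the helper names are hypothetical), the edge map below also records a tail and a head for direction, and it allows several edges between the same pair of vertices:

```python
vertices = {}  # vertex ID -> property map
edges = {}     # edge ID -> property map, including the two endpoints

def add_vertex(vid, **props):
    """Store a vertex as a plain key-value map of its properties."""
    vertices[vid] = dict(props)

def add_edge(eid, tail, head, label, **props):
    """Store a directed edge from tail to head; both endpoints must exist."""
    if tail not in vertices or head not in vertices:
        raise KeyError("both endpoints must be added before the edge")
    edges[eid] = {"tail": tail, "head": head, "label": label, **props}

def out_edges(vid):
    """IDs of all edges leaving the given vertex, in insertion order."""
    return [eid for eid, e in edges.items() if e["tail"] == vid]

add_vertex(12345, name="Mary", age=28)
add_vertex(67890, name="John")                     # hypothetical second vertex
add_edge("e1", 12345, 67890, "knows", weight=0.9)
add_edge("e2", 12345, 67890, "worksWith")          # multi-graph: second edge, same pair
```

Here out_edges(12345) returns both edge IDs, while out_edges(67890) is empty because the edges are directed.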

Oracle NoSQL Database as a Scalable Graph Database

While Oracle NoSQL Database offers useful features like tunable consistency, what lends it to storing property graphs are the storage guarantees around its key structure.  Keys in Oracle NoSQL Database are divided into two parts: a major key and a minor key.  The storage guarantee is simple.  Major keys are distributed across storage nodes, which could encompass a large number of servers.  However, all minor keys that are children of a given major key are guaranteed to be stored on the same storage node.  For example, the vertices:




May be stored on different servers, but




will always be on the same server.  This means that we can structure our graph database such that retrieving all the properties for a vertex or edge requires I/O from only a single storage node.  Moreover, Oracle NoSQL Database provides a storeIterator which allows us to store a huge number of vertices and edges in a scalable fashion.  By storing the vertices and edges as major keys, we guarantee that they are distributed evenly across all storage nodes.  At the same time we can use a partial major key to iterate over all the vertices or edges (e.g. we search over /Personnel/Vertex to iterate over all vertices).
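The following is a toy simulation of that layout in plain Python, not the Oracle NoSQL Database client API. The key paths under /Personnel, the three-node cluster, and the MD5-based placement are assumptions made for illustration only. Placement depends solely on the major key, so every property (minor key) of a vertex lands on the same simulated node, and a partial major key selects all vertices.

```python
import hashlib

NUM_NODES = 3  # hypothetical number of storage nodes

# (major key path, minor key path) -> value
store = {}

def node_for(major):
    """Pick a storage node from the major key only (stand-in hash placement)."""
    digest = hashlib.md5("/".join(major).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

def put(major, minor, value):
    """Store one value under a (major, minor) key pair."""
    store[(tuple(major), tuple(minor))] = value

def get_properties(major):
    """All minor-key values under one major key; one node holds them all."""
    major = tuple(major)
    return {"/".join(mi): v for (ma, mi), v in store.items() if ma == major}

def iterate_partial(prefix):
    """Distinct major keys that start with a partial major key, sorted."""
    prefix = tuple(prefix)
    return sorted({ma for (ma, _mi) in store if ma[:len(prefix)] == prefix})

# Hypothetical data: two vertices and one edge.
put(("Personnel", "Vertex", "1"), ("name",), "Mary")
put(("Personnel", "Vertex", "1"), ("age",), 28)
put(("Personnel", "Vertex", "2"), ("name",), "John")
put(("Personnel", "Edge", "1"), ("label",), "knows")
```

get_properties(("Personnel", "Vertex", "1")) gathers Mary's properties from a single simulated node, and iterate_partial(("Personnel", "Vertex")) walks the vertices without touching the edges.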

Fork It!

The Blueprints API and Oracle NoSQL Database present a great way to get started using a scalable key-value database to store and access graph data.  However, a graph store isn't useful without a good graph to work on.  I encourage you to fork or pull the repository, store some data, and try using Gremlin or any other language to explore.


The data warehouse insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.

