We received a number of good questions from the audience during our
July 19 Directions webcast, "Get the most from your data, developers and
data scientists using Big Data spatial analysis on Hadoop and NoSQL."
Below are answers to those questions, including the ones we didn't have
time for during the event.
Q: Is a recorded version of this webinar available for review at a later date?
A: Yes, you can view the recording on demand here:
Q: Where can I find product documentation, data sheets, and software downloads?
A: The Oracle Big Data Spatial and Graph product site on OTN has all these
materials. Visit the site here:
The Big Data Spatial and Graph blog also has
examples and tips:
Q: What type of clustering algorithm do you use, and why did you choose this algorithm?
A: We use K-means based clustering because it is one of the most popular
clustering algorithms. In the future, we are looking at an extensible
framework so that users can plug their own clustering algorithms into it.
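As a rough illustration of what K-means does, here is a minimal sketch in plain Python. It is not the product's implementation: the function name `kmeans`, its parameters, and the sample points are all made up for this example, and it assumes planar (x, y) coordinates and caller-supplied starting centroids.

```python
import math


def kmeans(points, initial_centroids, max_iterations=20):
    """Cluster 2D points with Lloyd's algorithm (illustrative sketch only).

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid that points[i] was assigned to.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                updated.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(centroids[c])
        if updated == centroids:
            break  # Converged: centroids stopped moving.
        centroids = updated
    return centroids, assignments


# Hypothetical sample: two small groups of points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(sample, initial_centroids=[(0, 0), (10, 10)])
```

The result depends on the starting centroids, which is one reason a pluggable framework for alternative algorithms can be useful.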
Q: What is the difference between the Spatial and Graph available in Oracle Database vs. Oracle Big Data Spatial and Graph?
A: The Oracle Spatial and Graph option for Oracle Database is mainly
developed for the following classes of applications: applications that
need transactional support, applications that need a SQL interface,
applications that need to interact with other business data stored in
relational databases, etc.
It is a more complete platform for Spatial, Graph and large-scale GIS
analytical and operational applications. In addition to native 2D and 3D
vector, point cloud, and raster datatype support, it includes data models
for network analysis and topology data. It also includes OGC web services,
linear referencing, a geocoding engine and a routing engine. Because it is
tightly integrated with Oracle Database, it inherits the security and
support for a wide range of database analytic, performance, and
manageability features.
Oracle Big Data Spatial and Graph was developed
to support the following classes of applications: applications that are batch
processing oriented, applications that need to process large amounts of
unstructured data to filter and aggregate based on spatial attributes, raster
data processing applications that need to do quick and simple filtering type of
processing on the data, etc.
The two technologies were developed to address
different types of applications, but there may be some areas where either
technology is suitable. Another factor to consider is the technical
expertise required to develop applications on these platforms. With Database
technology, standard SQL and simple Java programming skills are required. For
Big Data platforms, complex programming in Java may be required.
Q: What do you recommend for storing the spatial data (images): database or filesystem?
A: See above for some considerations. Raster imagery can be stored either
in the Database, through the native GeoRaster format that is a part of
Oracle Spatial and Graph, or in files. The choice can be very application
dependent.
Q: Is the functionality you are showing actual software?
A: Yes, this is commercially supported software, and it can be deployed on
any CDH or HDP Hadoop cluster. The software demonstrated was Oracle Big
Data Spatial and Graph, standalone for the Raster demo and in conjunction
with the Oracle Big Data Discovery product for the analytic demo.
Q: Is there any way to get a trial version to get a feel for the capabilities?
A: Yes. The Oracle Big Data Lite Virtual Machine provides a free sandbox
environment that you can download to try out Big Data Spatial and Graph
and other Oracle software components.
You can download Big Data Lite here:
There’s also a spatial and graph Hands On Lab
with exercises and sample data sets that you can try, available here:
Q: Is it possible to use the applications in an educational setting, so
that students can see how this works?
A: Yes, you may download Big Data Lite at no cost, along with the hands-on
lab materials above, and try out the software and demos. It's a great
resource, and we encourage folks to try it out.
Additionally, Oracle Academy allows educational institutions to use our
software for teaching and includes a variety of additional benefits, such
as access to training material. We invite you to join the program. Please
visit https://academy.oracle.com for more information.
Q: How can we store spatial data like shapefiles, satellite imagery, and
maps in Big Data and process it with our own algorithms?
A: For Vector data, it is as simple as copying the data to HDFS using HDFS
file copy functions. For Raster data, some data processing is required to
split the data into HDFS blocks. See our technical white paper for more
details. We also have a GDAL based loader that does this data loading for
Raster data.
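To give a feel for what splitting a raster into block-sized pieces involves, here is a small sketch that computes tile windows whose uncompressed pixels each fit in a given block size. It is not the product's loader, and the white paper describes the actual layout. The function name `tile_windows`, the square-tile strategy, and the example dimensions are assumptions made for illustration.

```python
import math


def tile_windows(width, height, bands, bytes_per_sample, block_bytes):
    """Return (col_off, row_off, tile_w, tile_h) windows covering a raster.

    Each window holds at most block_bytes of uncompressed pixel data.
    Tiles are square except along the right and bottom edges.
    """
    bytes_per_pixel = bands * bytes_per_sample
    # Largest square tile whose pixels fit in one block.
    side = math.isqrt(block_bytes // bytes_per_pixel)
    if side == 0:
        raise ValueError("block_bytes is too small to hold a single pixel")
    windows = []
    for row_off in range(0, height, side):
        for col_off in range(0, width, side):
            windows.append((
                col_off,
                row_off,
                min(side, width - col_off),
                min(side, height - row_off),
            ))
    return windows


# Hypothetical example: a 3-band, 8-bit, 20000 x 15000 image and 128 MB blocks.
windows = tile_windows(20000, 15000, bands=3, bytes_per_sample=1,
                       block_bytes=128 * 1024 * 1024)
```

Each window could then be read from the source image and written as its own block-aligned chunk, so that one map task can process one tile without fetching data from other blocks.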
Q: Does it, or will it, support GIS Web Application APIs such as Leaflet, Google Maps API, ESRI JS API, etc.?
A: It is possible for the developer to expose
the vector and raster processing APIs as micro web services and
programmatically integrate these services into GIS Web Application APIs. In addition, those GIS Web Application APIs
can access the results of processing as with any other data in Hadoop.
Q: Is it possible to mix MapReduce oriented spatial functions developed by
your team and internal GIS development?
A: The APIs we have are like any other Java
API, so they can be combined with any other functionality developed in house. At
the top level MapReduce job, these APIs can be combined to process the data.
Q: Do you support on-the-fly map projection, and with what accuracy?
A: We use the GDAL PROJ4 driver for this, so
the accuracy is based on the PROJ4 libraries.
Q: How are analytic processes documented so that end users have
understanding and confidence in derived data products?
A: These are documented as Java APIs along
with the documentation in White Papers and User Guides.
Q: Have you talked to State DOTs about using this for crashes and location research?
A: We have a number of initiatives with different customers. Several
departments of transportation are users of our Database Spatial and Graph
features for highway/roadway management, crash analysis, and more.
Here is one customer DOT example using spatial technologies for
geolocation analysis of crashes: http://download.oracle.com/otndocs/products/spatial/pdf/osuc2013_presentations/osuc13_autoeventgeoloc_dildine.pdf
Q: Siva was speaking about loading data by picking layers. Where is this data from?
A: This data can be from open source or some
other commercial source. This data has to be loaded into HDFS as GeoJSON layers
and our APIs can use them.
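For readers unfamiliar with GeoJSON, the sketch below builds a tiny point layer as a FeatureCollection and serializes it to text, which could then be copied to HDFS. The function name `make_layer`, the place names, and the coordinates are invented for the example. The properties a real layer needs depend on the application.

```python
import json


def make_layer(records):
    """Build a GeoJSON FeatureCollection of points from (name, lon, lat) tuples."""
    features = []
    for name, lon, lat in records:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample records.
layer = make_layer([("depot", -122.42, 37.77), ("yard", -73.99, 40.73)])
geojson_text = json.dumps(layer, indent=2)
```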
Q: Can it work as a micro-service we call on a regular basis based on a pre-defined
A: Yes, any of these APIs can be published as micro-services. One can
define services that just invoke the underlying Java APIs.
Q: "Binning" very good - is there such a thing as "linear binning"?
Interested in road network incident mapping / exploring for data not (yet)
linear referenced.
A: Linear binning is not part of the prebuilt functionality. For extensive
support of linear referencing and road networks, you may want to look into
using the Spatial and Graph option in the Oracle Database.
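One way to read "linear binning" is to snap each incident to the nearest position along a road centerline and count incidents per fixed-length stretch of road. The sketch below illustrates that interpretation only. It is not a product feature, it assumes planar coordinates in consistent units, and the names `measure_along` and `linear_bins` and the sample data are made up.

```python
import math
from collections import Counter


def measure_along(line, point):
    """Distance along polyline `line` to the spot on it nearest to `point`."""
    best_dist = math.inf
    best_measure = 0.0
    travelled = 0.0  # Length of the segments already walked.
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # Skip duplicate vertices.
        # Project the point onto the segment and clamp to its end points.
        t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        px, py = x1 + t * dx, y1 + t * dy
        dist = math.hypot(point[0] - px, point[1] - py)
        if dist < best_dist:
            best_dist = dist
            best_measure = travelled + t * seg_len
        travelled += seg_len
    return best_measure


def linear_bins(line, points, bin_length):
    """Count points per bin of `bin_length` along the line, keyed by bin index."""
    return dict(Counter(
        int(measure_along(line, p) // bin_length) for p in points
    ))


# Hypothetical road with one bend, and five incident locations near it.
road = [(0, 0), (10, 0), (10, 10)]
incidents = [(1, 1), (2, -1), (9, 0.5), (10.5, 6), (11, 9)]
counts = linear_bins(road, incidents, bin_length=5)
```

Bin `i` here covers the stretch from `i * bin_length` to `(i + 1) * bin_length` measured from the start of the line. For real road networks with many routes, a linear referencing system is the more robust approach.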
Q: I have never come across a custom made Oracle software that can handle
spatial data processing, especially raster image processing and building a
database for spatial data. Can you point me to one apart from the one that
is being demonstrated at the moment? How is it different from other image
processing software products?
A: As part of the Oracle Spatial and Graph option in the Oracle Database,
we have included a GeoRaster feature that can do raster processing. It was
primarily meant for raster data management and not necessarily for raster
image processing. Since 12c Release 1, we have been adding more image
processing capabilities to GeoRaster (like raster algebra and other image
processing features). You can find more info here: Oracle Spatial and
Graph GeoRaster Technical White Paper
With BDSG (the Hadoop based product) we are focused more on image
processing, since the types of workflows we want to support with this product
are more image processing type workloads.
Q: How many incidents are refreshed/managed in real time (fast moving objects)?
A: Since this is a Hadoop based system, it is
not geared for real-time data management. Hadoop processes have a long latency for
starting the MapReduce jobs. We plan to release support on Apache Spark in a
future release that can support real-time workloads. Oracle also has a stream
processing engine, Oracle Stream Analytics, that can, depending on processing
power, handle over 1.5M
Q: For raster, can you give some use cases for what you would like to do
with Raster?
A: Face recognition is a pure image processing application, and we do
include software to do this type of application with BDSG as part of the
multimedia features (see this blog example).
For Raster processing, the applications include quickly assessing the
quality of satellite images (cloud cover, etc.), quickly mosaicking rasters
to create new data products, creating new data like hillshading from
elevation models, processing elevation models to calculate flood risks, etc.
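As an example of deriving a new data product from an elevation model, here is a minimal hillshading sketch in plain Python. It is a generic textbook-style calculation (surface normal dotted with a light direction), not the product's raster API. The function name `hillshade`, its parameters, and the tiny sample grid are assumptions for illustration.

```python
import math


def hillshade(dem, cell_size=1.0, azimuth_deg=315.0, altitude_deg=45.0):
    """Return illumination values in 0..1 for the interior cells of `dem`.

    `dem` is a list of rows of elevations, with rows running north to south.
    Azimuth is measured clockwise from north, altitude above the horizon.
    """
    az = math.radians(azimuth_deg)
    alt = math.radians(altitude_deg)
    # Unit vector pointing toward the light (x east, y north, z up).
    light = (
        math.sin(az) * math.cos(alt),
        math.cos(az) * math.cos(alt),
        math.sin(alt),
    )
    shaded = []
    for r in range(1, len(dem) - 1):
        out_row = []
        for c in range(1, len(dem[0]) - 1):
            # Central differences; row index grows southward, so north is r - 1.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            norm = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            normal = (-dz_dx / norm, -dz_dy / norm, 1.0 / norm)
            dot = sum(n * l for n, l in zip(normal, light))
            out_row.append(max(0.0, dot))  # Faces turned away are fully dark.
        shaded.append(out_row)
    return shaded


# Hypothetical 3 x 3 elevation grid that rises toward the southeast,
# so its surface faces the default northwest light source.
facing_light = hillshade([[0, 1, 2], [1, 2, 3], [2, 3, 4]])
```

Because each output cell depends only on its immediate neighbors, this kind of operation splits naturally across raster tiles, provided each tile carries a one-cell overlap with its neighbors.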
Q: On storing satellite images on Hadoop: accepted that it can be stored,
but how do we retrieve it again, assuming I have deleted the original image
on the system?
A: The data can be sent back to a regular file system after some raster
processing is done. For example, the subset or mosaic operations can
process rasters stored in HDFS, and the results can be written back to NFS
in a user specified format like GeoTIFF. Once the results are on NFS, other
applications can access them.
Q: Do you work with external sources in their native format, or do you need to migrate to
A: For vectors, we work in their native format. Users need to provide a
RecordReader class to read the native format and plug that Java class into
our framework. For Raster, we still work in the native format, but the file
block structure is organized in such a way as to optimize the raster data
processing. So it does change the layout of the raster data in the data
(HDFS) blocks, but the format itself is not changed.
Q: Is there any way one can do satellite image processing on Hadoop after
storing the images in the Hadoop environment?
A: Yes, this can be done with the raster data analysis framework we have in
the product. The raster demo at the end is an example of such satellite
image processing operations.
Q: Do you work in a 3D environment?
A: The vector API we provide does work with
3D data as well.