Monday Jan 19, 2015

The Importance of Organizing Spatial Data By Proximity

Guest Post By: Nick Salem, Distinguished Engineer, Neustar

Note: Thanks to Nick Salem, Technical Chair of the Oracle Spatial SIG , for contributing this valuable tuning and performance tip that will be useful to most Oracle Spatial users!

The goal of this post is to shed some light on a technique that I feel is many times overlooked by users working to tune large spatial datasets: to organize the rows in a table by their spatial proximity. Dan Geringer alluded to this in the post “Tips on tuning SDO_NN (nearest neighbor) queries” . What I want to highlight is that this technique is not only beneficial for SDO_NN queries, but also for basically all queries that use any of the spatial query operators like SDO_FILTER, SDO_RELATE, SDO_WITHIN_DISTANCE, as well as the SDO_RELATE helper operators such as SDO_ANYINTERACT, SDO_INSIDE and so on. The SDO_JOIN operator by itself may not benefit from this approach because it relies solely on the spatial index, but if you decide to join the resultset of the SDO_JOIN operation back to the input tables of the join operation, then you will most likely also benefit from having the data stored by proximity.

To understand the dynamics of this, one has to understand in general how Oracle queries work. When a user issues a spatial query, a spatial index is used to find the rowids for the rows that need to be returned. Oracle uses these rowids to retrieve the database blocks from disk into the buffer cache where it can process the data. Each database block can contain one or more rows of data. In order for Oracle to retrieve data for a query, it needs to retrieve all the blocks needed to satisfy a query. The more blocks that need to be scanned, the more I/O operations are performed and the longer the query will take. For example, a spatial query resulting in the scan of one or two adjacent blocks will return a lot faster than the same spatial query needing to scan a large number of disparate blocks to process the same results. Of course, this issue is not confined to just spatial queries. In fact, most Oracle DBAs are aware of this with the concept of the index clustering factor which describes the relationship between the order of an index and its corresponding table. For non-spatial data, achieving an optimal index clustering factor can be as easy as ordering the data in a table by a scalar data type column that you plan to index and then creating that index. For spatial data, this can be a little trickier because one cannot just simply order by the SDO_GEOMETRY column. In Dan’s post “Tips on tuning SDO_NN (nearest neighbor) queries”, he shows an example using the MD.HHENCODE_BYLEVEL function to return a value that you can use to sort your spatial data by.

In this post, I will show an example of a use case I tested that highlights the impact on performance when ordering spatial data by proximity. The example will include taking the US Business listing table and creating two copies: one that is not ordered by proximity, and the other ordered by proximity using the MD.HHENCODE_BYLEVEL function. Then, I will test running a simple SDO_WITHIN_DISTANCE query on both tables to retrieve the sum of employees for an arbitrary 5 mile radius and compare the results.

Example Pre-requisites

In this example, I am starting with a 16 million row table containing all the US business locations with approx 30 columns called BUSINESS1. This table is not ordered by spatial proximity. BUSINESS1 also has a spatial index created on column GEOM.

Step 1) Next I will create table BUSINESS2, which will be an exact copy of BUSINESS1, but will be ordered by proximity using the MD.HHENCODE_BY_LEVEL function as described in Dan’s post.


CREATE TABLE business2 PCTFREE 0 NOLOGGING PARALLEL
as
SELECT /*+ parallel(8) */
  b.*
FROM
  business1 b
ORDER BY
  row_number() over (order by md.hhencode_bylevel(
     b.geom.sdo_point.x,-180,180,27,
     b.geom.sdo_point.y,-90,90,26)) ;

* Note: the MD.HHENCODE_BYLEVEL function takes an x and y coordinate and so the example works well with the US Business points. If you are working with polygons instead of points, you will need to retrieve the centroid of the shape and then pass the X/Y point coordinates to function. For more info, please refer to Dan Geringer’s original post “Tips on tuning SDO_NN (nearest neighbor) queries” to see an example of how this is done.

* Note: only use PCTFREE 0 for read only tables. In this example, the BUSINESS2 is read only and using PCTFREE 0 allows more rows to be packed into a single database block.

Step 2) add diminfo and create spatial index

begin

mdsys.sdo_meta.change_all_sdo_geom_metadata( USER, ‘BUSINESS2’, ‘GEOM’, mdsys.sdo_dim_array (
     mdsys.sdo_dim_element('X', -180, 180, .5),
     mdsys.sdo_dim_element('Y', -90, 90, .5) ),8307);
end;
/

CREATE INDEX xsp_business2 ON business2(geom)
INDEXTYPE is mdsys.spatial_index PARAMETERS(' SDO_RTR_PCTFREE=0 WORK_TABLESPACE=WORK');

* Note: it is recommended to use a WORK_TABLESPACE that is a different tablespace than the one where the index will be created. Although optional, the use of a WORK_TABLESPACE can reduce the fragmentation of your spatial index which is important for performance.

* Note: only use SDO_RTR_PCTFREE=0 for read only data. In this example, the US BUSINESS listing table is pretty much a read only dataset and gets completely replaced every month.

Step 3) Perform query comparison test

In SQL*PLUS, set timing and flush the buffer cache

SQL> SET TIMING ON;
SQL>
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

* Note: you should not flush the buffer cache on a production system. But in a test environment, flushing the buffer cache can help with testing comparative performance by ensuring there are no blocks in the SGA from prior queries that could skew performance results.

Run the first query on the non-spatially ordered table

select
  sum(b.num_of_employees)
from
 business1 b
where
  sdo_within_distance( b.geom,
                       mdsys.sdo_geometry(2001,8307,
                       mdsys.sdo_point_type( -117.047071, 32.75521, null),null,null),
                       'distance=5 unit=mile' ) = 'TRUE';

SUM(B.NUM_OF_EMPLOYEES)
-----------------------
                 137964

Elapsed: 00:00:35.14

Run the query on the spatially ordered business table and compare results.

select
  sum(b.num_of_employees)
from
 business2 b
where
  sdo_within_distance( b.geom,
                       mdsys.sdo_geometry(2001,8307,
                       mdsys.sdo_point_type( -117.047071, 32.75521, null),null,null),
                       'distance=5 unit=mile' ) = 'TRUE';

SUM(B.NUM_OF_EMPLOYEES)
-----------------------
                 137964

Elapsed: 00:00:01.49

Conclusion

In this example, I was able to achieve more than 20X better performance just by taking the same table and ordering by a geographic extent. The queries in the example utilized the spatial operator SDO_WITHIN_DISTANCE to return the sum of employees with a 5 mile radius around a location. Since both tables (BUSINESS1 and BUSINESS2) are identical except in storage of the order of rows of data, the spatial index performance should be pretty much the same. The difference in performance that we are seeing is due to the amount of disk and memory I/O processing caused by the different number of blocks that needed to be accessed for each of the queries. This is a substantial improvement in performance and highlights the importance of the order of spatial data by geographic proximity. And as mentioned in the beginning of the post, ordering spatial data may boost any spatial query operations, whether you are performing within distance queries, sdo relate queries, nearest neighbor or performing map visualizations. Results can vary based on size of table, the speed of disk I/O and also the order of the original dataset. It is possible that the original dataset you are working with is already organized by some geographic extent such as county or ZIP Code so additional ordering using the method described in this post could result in some performance gains but nothing as significant as the 20X I have experienced. But if the order of the table is completely random and not tied to any geographic extent, then you can expect to see greater performance gains. All in all, I definitely recommend looking into organizing larger and more frequently queried spatial data by geographic proximity as a best practice technique for optimizing your spatial data.


Monday Oct 06, 2014

OOW 2014 Spatial SIG Meetup Discussion

sigphoto

At Oracle OpenWorld last week in San Francisco, the Oracle Spatial SIG invited users to join an informal meetup -- at the OTN Lounge in Moscone South.  Users from the US and Europe, from industries such as rail, local city government, national land management agencies, and defense, met with SIG Board and  Oracle Spatial product team members to discuss questions around Oracle Spatial and MapViewer implementations.

Topics included: 

-- challenges of and strategies for handling complex geometries

-- enterprise deployments with various GIS tools

-- MapViewer capabilities such as HTML5 / JSon support, editing, and new mobile/BI features

Read a full recap of the discussion topics here.

And check out more photos from the event. 

Monday Aug 11, 2014

Save the dates! Jan 27-29, 2015 -- Oracle Spatial Summit 2015

Oracle Corporation
header image

Oracle Spatial Summit 2015

Moving to California as part of the Oracle BIWA Summit 2015

divider line
shadow

Save the dates! Jan 27-29, 2015

Oracle Conference Center at Oracle HQ Campus, Redwood Shores, CA

  • Learn directly from the senior management and developers of Oracle’s industry-leading spatial technologies -- Dan Geringer, Siva Ravada, Xavier Lopez, Jayant Sharma and Jim Steiner.
  • Participate in Hands-On-Labs for Oracle Spatial and MapViewer.
  • Hear from leading practitioners and customers about their experiences with Oracle’s spatial technologies, Exadata, OBIEE and more.
  • Experience Oracle BIWA with expert and customer sessions on Business Intelligence, Data Warehouse, and Big Data technologies.

SUBMIT YOUR SESSION ABSTRACT FOR THE ORACLE SPATIAL SUMMIT 2015 NOW!

Hardware and Software, Engineered to Work Together
Copyright © 2014, Oracle and/or its affiliates.
All rights reserved.
Contact Us | Legal Notices and Terms of Use | Privacy Statement

Tuesday Jul 29, 2014

How much space do your SDO_GEOMETRY tables consume?

 By Karin Patenge, Principle Sales Consultant, Oracle Germany

I would like to briefly offer a simple function to calculate the memory allocated for SDO_GEOMETRY tables. This takes into account only tables, LOB and index segments (this is not applicable for partitioned tables).


create or replace function alloc_space_in_mbytes (tablename in varchar2)
return number 
is
table size number;
begin
  select 
    sum (bytes) into table size
  from (
    select 
      segment_name, 
      bytes  
    from 
      USER_SEGMENTS - table segments             
    where  
      segment_name = tablename
    union all
    select 
      s.segment_name segment_name, 
      s.bytes bytes
    from 
      user_lobs l, - Lobsegmente
      USER_SEGMENTS s
    where 
      l.table_name = tablename and
      s.segment_name = l.segment_name
    union all
    select 
      s.segment_name segment_name, 
      s.bytes bytes
    from 
      USER_INDEXES i, - index segments
      USER_SEGMENTS s
    where 
      i.table_name = tablename and
      s.segment_name = i.index_name);

  table size: = tablesize/1024/1024; - Conversion in MB
  return table size;
end;
/
The call to the function can then look like this:
select
  alloc_space_in_mbytes ('GEOM_TABLE_UNTRIMMED') untrimmed, 
  alloc_space_in_mbytes ('GEOM_TABLE_TRIMMED') trimmed
from dual
/
A test table with 218 237 polygons, a total of 60,754,462 bases and existing spatial index result in the following values:

SDO_ORDINATE_ARRAY values with predominantly 12 to 13 decimal places: 1643.5 MB
SDO_ORDINATE_ARRAY values truncated to 5 decimal places: 1033.5 MB

Monday Feb 24, 2014

Performance Boost with Aggregating Geometries - with 12c Spatial Vector Performance Acceleration

Nick Salem of Neustar recently shared some impressive performance gains realized using Oracle Spatial 12c Vector Performance Acceleration, in a Google+ post.   He observed performance gains of 40X to 300X -- with a use case aggregating all ZIP Code geographies in California using a SDO_AGGR_UNION operations (plain and with group by/mod functions as described in 11g documented best practices). 

Read more details of his test case and results here.  https://plus.google.com/114373574274269737617/posts/2caAypKxwff

Thanks for sharing, Nick!

Wednesday Feb 05, 2014

Data Validation with Esri and Oracle Spatial

By:  Daniel Geringer, Senior Development Manager, Oracle Spatial and Graph

One of the things users can run into when using Esri with Oracle Spatial can be problems with data validation when loading data using Esri utilities into Oracle Spatial data types. From my experience, the solution is a relatively straightforward change in the workflow used to load data.

Here is an excerpt from a user problem where it sounds like Esri is calling Oracle's validation on load (not Esri's validation).

I don't believe Esri's validation would be sensitive to the database version.

  • It seems there are tighter tolerances (or the tolerances work) in 11.2.0.3 and features that were not reporting as invalid in 11.2.0.2 are now reporting as invalid features (mainly self-intersecting polygons) in the 11.2.0.3.  We are assuming that for some reason the SDE tools are crashing when they hit an invalid feature.

This sounds more like a workflow defined by Esri than differences in validation.  I think the issue is once Esri encounters an invalid geometry, it stops loading the rest (even if they are valid).  I believe Esri chooses this workflow to ensure all geometries in a layer are valid.  I agree the outcome is optimal, and what Oracle recommends too.  A deficiency in the workflow is:

1. It prevents valid geometries from being loaded after an invalid one is encountered
2. Not loading the invalid geometry does not provide a way to inspect it

From a product perspective, a change in the Esri workflow would have to be implemented by Esri.  (Maybe the default behavior can remain the same, but enable a switch that populates invalid geometries into a different table for inspection at a later time. This is just one suggestion.  There may be other options.)

Currently, the best way to circumvent this is to do what this customer did, and load data with Oracle or third party tools like FME.  Once the data is loaded, Oracle also recommends validation, and removing/isolating invalid geometries until they can be fixed.  I discuss a strategy to do this in a best practices presentation. 

One thing I keep hearing is the claim that Esri and Oracle validate differently.

  • Both ESRI and Oracle validate by OGC rules. 
  • The only difference I see is when the data is geodetic.  Esri does not consider geodesic paths between vertices for linestrings or polygons, but Oracle does. 
  • There is only one case I have come across that is valid in Esri, but not in Oracle or by OGC standards. 
    • Imagine a box with a triangle void, where the triangle only touches the box at one point. 
    • Esri can model this as a single exterior ring with no voids, and considers it valid.  This does not comply with OGC standards.
    • Oracle and the OGC standard state that this case is valid if it is modeled with two rings, one exterior and one interior, where the interior ring touches the exterior at one point.
  • Other than geodetic data validation, I believe both Esri and Oracle offer comprehensive validation, and for all intents and purposes, the validation offered by both solutions is equivalent.

Friday Jan 10, 2014

New Point-in-Polygon function in Oracle Spatial and Graph 12c

By: Jim Steiner, Siva Ravada, Rick Anderson

With the increased adoption of Exadata for spatial workloads, we have been looking at ways to exploit more and more of the capabilities of this architecture to address problems faced in large scale spatial analysis. The new point-in-polygon function in Spatial can result in 100s of times faster UPDATE and INSERT operations with no degradation in query performance for large scale point-in-polygon operations. Mask operations (DISJOINT, TOUCH, INSIDE, ANYINTERACT) can be performed with the point-in-polygon function.

When working with point data and performing point-in-polygon analysis the existing spatial operators to do a fast query on the data if there is a Spatial index on the point data. However, in many cases, the data volume is very high, so creating and maintaining the index becomes very expensive. With 12c, we exploit Exadata smartscan by implementing a different model to take advantage of all the CPUs to do point in polygon operations and not have the overhead of a Spatial index. The mdsys.PointInPolygon() function returns those rows that reside within a specified polygon geometry. This parallel-enabled Point-In-Polygon function takes an arbitrary set of rows whose first column is a point's x-coordinate value and the second column is a point's y-coordinate value.

The mdsys.PointInPolygon() function API is the following:

mdsys.sdo_PointInPolygon(cur SYS_REFCURSOR,
                         geom_obj IN SDO_GEOMETRY,
                         tol IN NUMBER,
                         params IN VARCHAR2 DEFAULT NULL);

The "cur" parameter is used to select an "x" and "y" point coordinate from a

user table. The two columns must be of type NUMBER; this is NOT a geometry

parameter.

The "geom_obj" parameter is either a polygon geometry from a table, or a

Transient instance of a polygon geometry, against which all of the selected points from "cur" will be validated.

The "tol" parameter is the desired tolerance value, which must be greater than the value "0.0".



The following examples show the performance benefits of this new approach:

Here we select all rows from the "weather_sensor" table and query those rows against a transient polygon geometry instance. Only 1 weather_sensor row (out of 4) resides within the specified polygon.

SQL> SELECT *

2 FROM TABLE(mdsys.sdo_PointInPolygon(

3 CURSOR(select * from weather_sensor),

4 MDSYS.SDO_GEOMETRY(

5 2003,

6 NULL,

7 NULL,

8 MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),

9 MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),

10 0.05));

In order to utilize parallel query servers, you must either specify the

"/*+ PARALLEL(4) */" optimizer hint, or enable parallel query execution,

using the command:

alter session force parallel query;

Below is the same as above, but uses 4 parallel query servers:

SQL> SELECT /*+ PARALLEL(4) */ *

2 FROM TABLE(mdsys.sdo_PointInPolygon(

3 CURSOR(select * from weather_sensor),

4 MDSYS.SDO_GEOMETRY(

5 2003,

6 NULL,

7 NULL,

8 MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),

9 MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),

10 0.05));

There can be a huge performance benefit to using parallel query servers.

The following "worst-case" example queries 1-million rows against a transient

polygon geometry instance, using the non-parallel query execution:

SQL> -- instead of the actual data...

SQL>

SQL> -- Test "non-parallel" execution first

SQL> timing start "sdo_PointInPolygon()"

SQL> SELECT COUNT(*)

2 FROM TABLE(mdsys.sdo_PointInPolygon(

3 CURSOR(select * from pip_data),

4 MDSYS.SDO_GEOMETRY(

5 2003,

6 NULL,

7 NULL,

8 MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),

9 MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),

10 0.05));

timing for: sdo_PointInPolygon()

Elapsed: 00:05:00.73

Enabling the parallel query servers dramatically reduces the query execution time:

SQL> -- Now test using 4 parallel query servers

SQL> timing start "sdo_PointInPolygon()"

SQL> SELECT /*+ PARALLEL(4) */ COUNT(*)

2 FROM TABLE(mdsys.sdo_PointInPolygon(

3 CURSOR(select * from pip_data),

4 MDSYS.SDO_GEOMETRY(

5 2003,

6 NULL,

7 NULL,

8 MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),

9 MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),

10 0.05));

SQL> timing stop

timing for: sdo_PointInPolygon()

Elapsed: 00:02:18.18

For more information about this new feature, link to the documentation URL:
SDO_PointInPolygon


About

The official blog for the spatial features of Oracle Spatial and Graph, an option of Oracle Database - brought to you by the product managers and developers. Get technical tips, product information, and the latest news here. Visit our official product website at http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/spatialfeatures-1902020.html

Search

Categories
Archives
« July 2015
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
 
       
Today