Tips on tuning SDO_NN (nearest neighbor) queries

By:  Daniel Geringer, Senior Development Manager, Oracle Spatial and Graph 

Many times I come across customers who ask for help tuning their SDO_NN queries.

I generally group SDO_NN queries into two categories.
The category defines which SDO_NN parameter to use, SDO_NUM_RES or SDO_BATCH_SIZE.
Choose either SDO_NUM_RES or SDO_BATCH_SIZE, never both.

SDO_NUM_RES -    Find the nearest candidates only with respect to proximity to the window geometry.
                                     There are no additional conditions in the WHERE clause. 
SDO_BATCH_SIZE - Find the nearest candidates, and match additional conditions in the WHERE clause.
                                       For example, find the 5 nearest banks with name = 'CITIBANK'


For "SDO_NN with SDO_BATCH_SIZE", a template is recommended.
Optimizations have been developed when the template is used for "SDO_NN with SDO_BATCH_SIZE" queries.

The template must include the FIRST_ROWS hint and ORDER BY distance for the optimization to take effect.
The following is an example of the template.

--
--  Find 30 closest banks named CITIBANK to the window geometry no further than 8046.72 meters (or 5 miles) away.
--   This SDO_BATCH_SIZE query includes the first_rows hint, and also ORDER BY distance.
--
--   Specifying the cutoff distance is not required, but it can also really help performance.
--   For example, distance=8046.72 means stop searching once you reach 5 miles away from the window geometry.
--
SELECT bank_id, distance
FROM (SELECT /*+ first_rows */ a.bank_id, sdo_nn_distance(1) distance
             FROM banks a
             WHERE sdo_nn (geometry, window_geometry, 'SDO_BATCH_SIZE=10 distance=8046.72', 1)='TRUE'
                 AND a.bank_name = 'CITIBANK'
             ORDER BY distance)
WHERE rownum <=  30;


Not always necessary, but for SDO_NN queries against very dense data (for example, when searching customers in a city with a large population), major performance gains can be achieved by ordering the data in your table spatially by a linear key.

The idea is to order the table during load, where rows containing geometries close to each other
get placed in the same database block. That way, when nearest neighbor gets the next sdo_batch_size set of rows,
and checks additional conditions for a match, the data for the additional condition checks are likely to be in a
database block that is already cached in the SGA. 

If the table is not ordered spatially, potentially, every nearest neighbor candidate (obtained by the spatial index)
might have its additional attributes in database blocks spread all over the place.  This results in much more
disk reads vs in-memory hits to compare the additional conditions in the query.

SDO_NN with SDO_BATCH_SIZE can perform much faster when this is done.
Below are a polygon and point example of how to order a table spatially by a linear key:

--------------------------------------------------------------------------------
-- Make a copy of the POLYGON table.  Call it POLYGON_ORDERED.
--------------------------------------------------------------------------------
connect / as sysdba
grant execute on mdsys.md to some_user;
exit;

connect some_user/some_user
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION ENABLE PARALLEL QUERY;
DROP TABLE polygon_ordered;
CREATE TABLE polygon_ordered NOLOGGING TABLESPACE users PARALLEL AS
WITH
  part1 AS (SELECT /*+ parallel (8) */
                   column1,
                   column2,
                   column3,
                   geom,
                   sdo_geom.sdo_centroid(geom,.05) centroid
              FROM polygon_table)
SELECT /*+ parallel (8) */
       p1.column1,
       p1.column2,
       p1.column3,
       p1.geom,
       row_number() OVER (ORDER BY md.hhencode_bylevel (p1.centroid.sdo_point.x, -180, 180, 27,
                                                                                                     p1.centroid.sdo_point.y,  -90,  90, 26)) rn
FROM part1 p1;


--------------------------------------------------------------------------------
-- Make a copy of the POINT table.  Call it POINT_ORDERED.
--------------------------------------------------------------------------------
connect / as sysdba
grant execute on mdsys.md to some_user;
exit;

connect some_user/some_user
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION ENABLE PARALLEL QUERY;
DROP TABLE point_ordered;
CREATE TABLE point_ordered NOLOGGING TABLESPACE users PARALLEL AS
WITH
  part1 AS (SELECT /*+ parallel (8) */
                   column1,
                   column2,
                   column3,
                   geom
              FROM point_table)
SELECT /*+ parallel (8) */
       p1.column1,
       p1.column2,
       p1.column3,
       row_number() OVER (ORDER BY md.hhencode_bylevel (p1.geom.sdo_point.x, -180, 180, 27,
                                                                                                     p1.geom.sdo_point.y,  -90,  90, 26)) rn
FROM part1 p1;

Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

The official blog for the spatial features of Oracle Spatial and Graph, an option of Oracle Database - brought to you by the product managers and developers. Get technical tips, product information, and the latest news here. Visit our official product website at http://www.oracle.com/technetwork/database/options/spatialandgraph/overview/spatialfeatures-1902020.html

Search

Categories
Archives
« May 2015
SunMonTueWedThuFriSat
     
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
20
21
22
23
24
25
26
27
28
29
30
31
      
Today