By: Jim Steiner, Siva Ravada, Rick Anderson
With the increased adoption of Exadata for spatial workloads, we have been looking at ways to exploit more and more of the capabilities of this architecture to address problems faced in large scale spatial analysis. The point-in-polygon function in Spatial can result in 100s of times faster UPDATE and INSERT operations with no degradation in query performance for large scale point-in-polygon operations. Mask operations (DISJOINT, TOUCH, INSIDE, ANYINTERACT) can be performed with the point-in-polygon function.
When working with point data and performing point-in-polygon analysis the existing spatial operators to do a fast query on the data if there is a Spatial index on the point data. However, in many cases, the data volume is very high, so creating and maintaining the index becomes very expensive. We exploit Exadata smartscan by implementing a different model to take advantage of all the CPUs to do point in polygon operations and not have the overhead of a Spatial index. The mdsys.PointInPolygon() function returns those rows that reside within a specified polygon geometry. This parallel-enabled Point-In-Polygon function takes an arbitrary set of rows whose first column is a point’s x-coordinate value and the second column is a point’s y-coordinate value.
The mdsys.PointInPolygon() function API is the following:
mdsys.sdo_PointInPolygon(cur SYS_REFCURSOR
geom_obj IN SDO_GEOMETRY,
tol IN NUMBER,
params IN VARCHAR2 DEFAULT NULL);
The “cur” parameter is used to select an “x” and “y” point coordinate from a user table. The two columns must be of type NUMBER; this is NOT a geometry parameter.
The “geom_obj” parameter is either a polygon geometry from a table, or a transient instance of a polygon geometry, against which all of the selected points from “cur” will be validated.
The “tol” parameter is the desired tolerance value, which must be greater than the value “0.0”.
The following examples show the performance benefits of this approach:
Here we select all rows from the “weather_sensor” table and query those rows against a transient polygon geometry instance. Only 1 weather_sensor row (out of 4) resides within the specified polygon.
SQL> SELECT *
FROM TABLE(mdsys.sdo_PointInPolygon(
CURSOR(select * from weather_sensor),
MDSYS.SDO_GEOMETRY(
2003,
NULL,
NULL,
MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),
MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),
0.05));
In order to utilize parallel query servers, you must either specify the “/*+PARALLEL(4) */” optimizer hint, or enable parallel query execution, using the command:
alter session force parallel query;
Below is the same as above, but uses 4 parallel query servers:
SQL> SELECT /*+ PARALLEL(4) */ *
FROM TABLE(mdsys.sdo_PointInPolygon(
CURSOR(select * from weather_sensor),
MDSYS.SDO_GEOMETRY(
2003,
NULL,
NULL,
MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),
MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),
0.05));
There can be a huge performance benefit to using parallel query servers.
The following “worst-case” example queries 1 million rows against a transient polygon geometry instance, using the non-parallel query execution:
SQL> -- instead of the actual data...
SQL>
SQL> -- Test "non-parallel" execution first
SQL> timing start "sdo_PointInPolygon()"
SQL> SELECT COUNT(*)
FROM TABLE(mdsys.sdo_PointInPolygon(
CURSOR(select * from pip_data),
MDSYS.SDO_GEOMETRY(
2003,
NULL,
NULL,
MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),
MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),
0.05)); timing for: sdo_PointInPolygon()
Enabling the parallel query servers will result in close to linear speedup for the query execution time.
SQL> -- Now test using 4 parallel query servers
SQL> timing start "sdo_PointInPolygon()"
SQL> SELECT /*+ PARALLEL(4) */ COUNT(*)
FROM TABLE(mdsys.sdo_PointInPolygon(
CURSOR(select * from pip_data),
MDSYS.SDO_GEOMETRY(
2003,
NULL,
NULL,
MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 1),
MDSYS.SDO_ORDINATE_ARRAY(5, 1, 8, 1, 8, 6, 5, 7, 5, 1)),
0.05));
SQL> timing stop timing for: sdo_PointInPolygon()
You will notice that the query now runs 4 times faster with a degree of parallelism (DoP) of 4.
For more information about this feature, here is a link to the documentation: SDO_PointInPolygon
