Is Parallelism married to Partitioning?
By Rekha Balwada on Jul 18, 2011
In the last couple of months I have been asked a number of times about the dependency of parallelism on partitioning. There seem to be some misconceptions and confusion about their relationship, and some people still believe that partitioning is required to perform operations in parallel. Well, this is not the case, so let me try to briefly explain how Oracle parallelizes operations. Note that 'parallel operation' in this context means to break down a single operation – e.g. a single SQL statement – into smaller units of work that are processed in parallel to leverage more or even all resources in a system to return the result faster. More in-depth information and whitepapers on Parallel Execution and Partitioning can be found on OTN.
Let’s begin with a quick architecture recap. Oracle is a shared everything system, and yes, it can run single operations in a massively parallel fashion. All of the data is shared between all nodes in a cluster, and the ability to parallelize operations is not restricted by any predetermined, static data distribution decided at database or table creation time. In a pure shared nothing architecture, database files (or tables) have to be partitioned to map disjoint data sets – “partitions” of data – to specific “nodes” of a multi-core or multi-node system to enable parallel processing, as illustrated below (note that the distribution in a shared nothing system is normally done based on a hash algorithm; I just used letter ranges for illustration purposes here):
[Figure] Design strategies for parallelization: Shared Nothing vs. Shared Everything architecture
In other words, with the Oracle Database any node in a cluster configuration can access any piece of data in an arbitrary parallel fashion, and the data does not have to be “partitioned” at creation time for parallelization.
Oracle’s parallel execution dynamically determines the optimal set of smaller parallel units of work for each user request – namely, each SQL statement. Oracle does not rely on Oracle Partitioning to enable parallel execution, but it takes advantage of partitioning wherever it makes sense, just like a shared nothing system; we will talk about this a bit later.
So how does Oracle dynamically determine these smaller units of work? In Oracle, the basic unit of work in parallelism is called a granule. Oracle divides the operation being parallelized (for example a table scan, table update, or index creation) into granules. Oracle distinguishes two different kinds of granules for initial data access: Block range granules and Partition based granules.
Block range granules are ranges of physical blocks from a table. Each parallel execution server works on its own block range granule. Block range granules are the basic unit of most parallel operations, even on partitioned tables. Therefore, from an Oracle Database perspective, the degree of parallelism (DOP) is not related to the number of partitions, or even to whether a table is partitioned at all. The number of granules and their size correlate with the DOP and the object size.
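Because block range granules are computed at run time, a completely non-partitioned table can be scanned in parallel just as well. A minimal sketch (the table name and DOP are illustrative, not from the original example):

-- A plain, non-partitioned heap table:
CREATE TABLE orders
( order_id NUMBER,
  amount   NUMBER );

-- Request a DOP of 8 for this statement; Oracle carves the blocks of
-- ORDERS into block range granules at run time, one granule at a time
-- per parallel execution server:
SELECT /*+ PARALLEL(o, 8) */ COUNT(*)
FROM orders o;

No partitioning clause is involved anywhere; the granules exist only for the duration of the statement.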
That said, Oracle is flexible in its decisions: some operations can benefit from an underlying partitioned data structure and leverage individual partitions as granules of work, which brings me to the second kind of granule: partition based granules. With partition based granules, only one parallel execution server performs the work for all of the data in a single partition or subpartition (the maximum usable DOP then obviously becomes the number of partitions).
The best known operation that can benefit from partition based granules is a join between large equi-partitioned tables, a so-called partition-wise join. A partition-wise join takes place either when the tables to be joined are equi-partitioned on the join key (a full partition-wise join), or when the operation is executed in parallel and the larger of the tables being joined is partitioned on the join key (a partial partition-wise join).
By joining the equivalent partitions the database has to do less data distribution, and the sort/join resource requirements are reduced. The picture below illustrates the execution of a partition-wise join. Instead of joining all data of SALES with all data of CUSTOMERS, partition pairs are joined in parallel by multiple parallel execution servers, with each parallel execution server working on one pair of partitions. You can think of this as shared-nothing-like processing at runtime.
Now that we know about Oracle’s choice of granules, how do we tell which type of granule was chosen? The answer lies in the execution plan. In the example below, operation ID 7 (above the table access of SALES) says PX BLOCK ITERATOR, which means that block range granules have been used to access table SALES in parallel.
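A sketch of what such a plan looks like, produced for instance by a parallel aggregation like SELECT time_id, COUNT(*) FROM sales GROUP BY time_id (the query and the exact plan details are illustrative; cost columns are omitted):

-----------------------------------------------------------------------------
| Id  | Operation                | Name     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |        |      |            |
|   1 |  PX COORDINATOR          |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY         |          |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |          |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH        | :TQ10000 |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY      |          |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| SALES    |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------

Note that the PX BLOCK ITERATOR at operation ID 7 sits directly above the full scan of SALES, regardless of whether SALES is partitioned.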
To illustrate partition-wise joins and the use of partition based granules, consider the following example: table SALES is range partitioned by date (8 partitions) and subpartitioned by hash on cust_id using 16 subpartitions for each range partition, for a total of 128 subpartitions. The CUSTOMERS table is hash partitioned on cust_id with 16 hash partitions.
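A DDL sketch matching this layout (the column lists and partition boundary dates are assumptions for illustration, not taken from the original schema):

-- CUSTOMERS: hash partitioned on the join key, 16 partitions
CREATE TABLE customers
( cust_id        NUMBER       NOT NULL,
  cust_last_name VARCHAR2(40) )
PARTITION BY HASH (cust_id) PARTITIONS 16;

-- SALES: range partitioned by date, hash subpartitioned on the join key
-- (8 range partitions x 16 hash subpartitions = 128 subpartitions)
CREATE TABLE sales
( cust_id     NUMBER NOT NULL,
  time_id     DATE   NOT NULL,
  amount_sold NUMBER )
PARTITION BY RANGE (time_id)
SUBPARTITION BY HASH (cust_id) SUBPARTITIONS 16
( PARTITION sales_q1_1998 VALUES LESS THAN (TO_DATE('01-APR-1998','DD-MON-YYYY')),
  PARTITION sales_q2_1998 VALUES LESS THAN (TO_DATE('01-JUL-1998','DD-MON-YYYY')),
  PARTITION sales_q3_1998 VALUES LESS THAN (TO_DATE('01-OCT-1998','DD-MON-YYYY')),
  PARTITION sales_q4_1998 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-YYYY')),
  PARTITION sales_q1_1999 VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-YYYY')),
  PARTITION sales_q2_1999 VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-YYYY')),
  PARTITION sales_q3_1999 VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-YYYY')),
  PARTITION sales_q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')) );

Because both tables are hash partitioned on cust_id with the same number of hash (sub)partitions, a join on cust_id can be executed as a full partition-wise join.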
Below is the execution plan for the following query:
SELECT c.cust_last_name, COUNT(*)
FROM sales s, customers c
WHERE s.cust_id = c.cust_id AND
s.time_id BETWEEN TO_DATE('01-JUL-1999', 'DD-MON-YYYY') AND
TO_DATE('01-OCT-1999', 'DD-MON-YYYY')
GROUP BY c.cust_last_name HAVING COUNT(*) > 100;
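A sketch of the resulting plan (cost and byte columns omitted; the Pstart/Pstop columns show the partitions and subpartitions actually touched):

-----------------------------------------------------------------------------------------
| Id  | Operation                             | Name      | Pstart| Pstop |IN-OUT|
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |           |       |       |      |
|   1 |  PX COORDINATOR                       |           |       |       |      |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001  |       |       | P->S |
|   3 |    FILTER                             |           |       |       | PCWC |
|   4 |     HASH GROUP BY                     |           |       |       | PCWP |
|   5 |      PX RECEIVE                       |           |       |       | PCWP |
|   6 |       PX SEND HASH                    | :TQ10000  |       |       | P->P |
|   7 |        HASH GROUP BY                  |           |       |       | PCWP |
|   8 |         PX PARTITION HASH ALL         |           |     1 |    16 | PCWC |
|   9 |          HASH JOIN                    |           |       |       | PCWP |
|  10 |           TABLE ACCESS FULL           | CUSTOMERS |     1 |    16 | PCWP |
|  11 |           PX PARTITION RANGE ITERATOR |           |     8 |     9 | PCWC |
|  12 |            TABLE ACCESS FULL          | SALES     |   113 |   144 | PCWP |
-----------------------------------------------------------------------------------------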
Operation ID 11, above the table access of SALES, says PX PARTITION RANGE ITERATOR, which means partition based granules were used to access partitions 8 and 9, i.e. subpartitions 113 to 144 (partition pruning, which will be discussed in another blog).
Operation ID 8 says PX PARTITION HASH ALL, which means the corresponding 16 hash subpartitions of the SALES table are joined to the CUSTOMERS table using a full partition-wise join. (For more information on partition-wise joins refer to the Oracle Database VLDB and Partitioning Guide.)
As we have seen, Oracle parallel execution combines the best of both worlds: unlike shared nothing systems it does not depend on a predefined data partitioning, providing the most flexible way of parallelizing an operation, yet it still takes full advantage of pre-existing partitioning strategies for optimal execution when needed.