In the last post we looked at multiple parallelizers and how they could cause a SQL statement use more PX servers than expected. You will notice that some types of statements behave differently in 12c compared to 11g in this regard. For these statements 12c does not use multiple parallelizers, thereby avoiding the negative consequences of multiple parallelizers.
Here is a simple query that checks if the sales table has any rows without a corresponding channel_id in the channels table.
SELECT /*+ parallel(2) monitor */ COUNT(*) FROM sales s WHERE NOT EXISTS (SELECT /*+ no_unnest */ 1 FROM channels c WHERE c.channel_id=s.channel_id);
I have used the no_unnest hint on purpose for illustration purposes to force the optimizer to use a filter operation rather than unnesting the query and using a join between two tables.
First let's look at the 11g plan for this statement. Note that this plan is from a 12c database with optimizer_features_enable set to 22.214.171.124 in the session. You can click on the pictures to see them larger.
We see there are two parallelizers (Line Ids 3 and 7), each with one PX server set, totally 2 PX server sets are used for this statement. The first PX server set scans the sales table and sends the rows to the PX coordinator (Line Ids 4-6). The PX coordinator executes the subquery by using another PX server set (Line Ids 8-10), but does the filtering itself serially (Line Id 2). If we look at the "Activity %" column we see that nearly all the time was spent on Line Ids 2, 3, and 4 as the filter operation is done serially by the PX coordinator. In Line Id 4 we spent time sending 102M rows (indicated by the Actual Rows column) to the PX coordinator, in Line Ids 2 and 3 we spent time getting these rows and running the filter operation.
If we run the same statement with optimizer_features_enable set to 126.96.36.199 to get the 12c behavior here is what we see.
In 12c we do not see two parallelizers anymore, we have only one parallelizer (Line Id 2) which has only one PX server set. The PX servers in this set scan the sales table (Line Ids 6, 7), then each PX server scans the channels table and does the filter operation for the rows it scanned from the sales table (Line Ids 5, 8). After the filter operation each PX server counts the rows (Line Id 4) and sends the result to the PX coordinator.
If we look at the Timeline column for these two cases we see that the same query finished in 67 seconds in 11g and in 31 seconds in 12c.
You can see the SQL Monitor reports for these two examples here, 11g and 12c. You may see a blank page if you try to open these links if your browser is configured to reject unsafe scripts, this is because this blog site is using HTTPS and SQL Monitor reports use HTTP links, please download the reports and open them locally in this case.
Here is a simple query that uses this type of a subquery in the select list.
SELECT /*+ parallel(2) */ (SELECT c.channel_desc FROM channels c WHERE c.channel_id=s.channel_id) channel_desc FROM sales s WHERE cust_id<100000;
Here is the plan for this query with optimizer_features_enable set to 188.8.131.52.
We have two parallelizers (Line Ids 2, 5) and one PX server set under each parallelizer. One PX server set (Line Ids 6-8) scan the sales table and sends the rows to the PX coordinator. The PX coordinator uses another PX server set (Line Ids 2-4) to scan the channels table for the subquery.
Here is the 12c plan for the same statement.
Now we have only one parallelizer (Line Id 1) and one PX server set (Line Ids 2-6). The PX servers in this single set scan the sales table, each PX server scans the channels table for the subquery. 12c introduces a new operation called "Expression Evaluation" (Line Id 3) which indicates the subquery expression is evaluated by each PX server after scanning the sales table.
The elapsed times for this query in 11g and 12c are similar as you can see in the Timeline column.
In all examples here you will notice that the number of executions for the subquery does not match the number of rows from the sales table, this is because of scalar subquery caching. Oracle tries to cache the result of the subquery so that it will not need to run the subquery again for the same values from the sales table. This is also why we see the scan of the channels table is not active for the whole duration of execution (indicated by the Timeline column).
In all examples in this post I have used the Swingbench SH schema with all indexes disabled for illustration purposes.
For correlated filters and scalar subquery expressions in the select list 12c provides improvements so that statements do not use multiple parallelizers. This prevents such statements from the implications of multiple parallelizers we mentioned in the last post. Note that you can still see cases where these optimizations do not kick in depending on the complexity of the statement.