X

Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

  • May 17, 2016

Single PX Server Execution

Yasin Baskan
Director, Product Management

I recently helped a customer tune a few queries in their Oracle Database In Memory POC. I want to talk about a simplified version of one of their queries as it is a nice tuning example and also a good opportunity to talk about a new parallel execution feature introduced in Oracle Database 12c.

Let me start with what the feature is and then look at the performance problem this particular customer was having.

Single PX server execution

SQL constructs like rownum can introduce serialization points in the execution plan of a SQL statement. Until 12c these serial steps are executed by the query coordinator (QC). This can cause several issues:


1. The query coordinator (QC) will be doing processing for the serial steps. This can impact performance negatively as the QC will be busy doing processing rather than coordination.

2. The statement can use multiple parallelizers which introduce their own implications.

3. The serialization point can stop the parallelization of the subsequent plan steps, again leaving all further processing to the QC.



Here is an example showing some of these issues, this is from an 11.2.0.4 database. I create a table with 5M rows. The query selects the top 10 rows from this table based on column ID, then joins that result set to the same table.

create table t as 
with ttemp as (select rownum r from dual connect by level<=10000)
select rownum id,rpad('X',100) pad
from ttemp,ttemp
where rownum<=5000000;
select /*+ parallel(2) */ count(*)
from (
select * from (select id
from t
order by id desc)
where rownum<=10
) t1, t
where t1.id=t.id
;

The rownum predicate introduces a serialization point in line ID 12. The data is sent to the QC at that step and all operations in line IDs 8-11 are executed by the QC. In line ID 11 we see that an additional parallelizer has been used.

12c introduces the concept of a single PX server executing parts of a plan instead of the QC. A DFO can be executed by a single PX server to free the QC so that it can do its coordination job. This also prevents the statement from using extra parallelizers. Here is the plan for the same statement in 12c.

There is a new distribution method, PX send 1 slave, in line ID 12. This is the same serialization point caused by the rownum predicate just like in 11g, but this time the data is sent to a single PX server rather than the QC. Operations in line IDs 7-11 are executed by this single PX server, this frees up the load on the QC. Also note that there is no extra parallelizer caused by the serialization point.

Tuning exercise

Let's now look at what issue this customer was having with single PX server execution. I start with the same table with 5M rows, this time declared as inmemory to match the customer's table definition.

create table t inmemory no duplicate distribute by rowid range as 
with ttemp as (select rownum r from dual connect by level<=10000)
select rownum id,rpad('X',100) pad
from ttemp,ttemp where rownum<=5000000;

Again, the query given below joins the top 10 rows based on column ID from table T and joins that result set to the same table. In the customer's case there were a lot more joins, I simplified the case as other parts of the query were irrelevant in this tuning exercise. This is on a 2-node RAC database.

select /*+ parallel(2) */ count(*)
from (
select * from
(select id, rownum
from t
order by id desc)
where rownum<=10
) t1, t
where t1.id=t.id
;

Here is the SQL Monitor report for this query. Line IDs 13 and 14 are interesting, they do 50% of the activity to send and receive 5M rows. The distribution method, PX send 1 slave, is a new distribution method in 12c which sends all rows to a single PX server. This means 5M rows were sent to one PX server in those plan steps. This also means all the steps in that specific DFO (Line IDs 7-13) were executed by that single PX server even though SQL Monitor indicates those steps as parallel with the red people icon.

If you look at the Parallel tab in SQL Monitor you see that one PX server (p001) in parallel set 2 does all the work in that set.

The query is actually asking for top 10 rows from table T in the inner query, then why does it send all 5M rows to a single PX server rather than sending only the top 10 rows from each producer PX server?

This is because the select list contains rownum in addition to the table columns, the evaluation of the select list is done serially so all the rows are sent to a single process. That rownum expression in the select list is not used anywhere in the outer query, the outer query is only asking for the count of rows. So, in this case it is safe to remove rownum from the select list, this gives us the following SQL Monitor report.

select /*+ parallel(2) */ count(*)
from (
select * from
(select id
from t
order by id desc)
where rownum<=10
) t1, t
where t1.id=t.id
;

This time we see that only 19 rows were sent to a single PX server in line ID 12. This is because the rownum is pushed down to the PX servers doing the scan of the table (line ID 17) and only the top 10 rows from each PX server are sent further. Distributing these few rows as opposed to 5M rows before improves the elapsed time dramatically. In addition, the single PX server executing line IDs 7-11 works on only a few rows which also takes less time than before. The elapsed time now is 3 secs compared to 11 secs before.

Join the discussion

Comments ( 2 )
  • guest Wednesday, May 18, 2016

    The very last SQL has no difference than early ones, where is the tuning?

    Thanks,


  • Yasin Baskan Wednesday, May 18, 2016

    If you look at the statement before the change, you will see that there is rownum in the select list in the subquery. After it was removed, the plan changed and the performance was much better.


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.