Tuesday Mar 24, 2015

Finding the Distribution Method in Adaptive Parallel Joins

Parallel Execution (PX) uses a producer/consumer model which requires distribution of rows between producers and consumers (two PX server sets). Until 12c, the distribution method was decided at parse time based on optimizer statistics. 12c brings adaptive parallel distribution methods, which enable the distribution method decision to be made at runtime based on the actual number of rows. This functionality is explained in the "Optimizer with Oracle Database 12c" white paper in the "Adaptive Parallel Distribution Methods" section. This was also discussed by Randolf Geist here.

So, how can we find out which distribution method was chosen at runtime? One approach is to use the view V$PQ_TQSTAT as explained by Randolf in his post, or to look at the rows processed by producers and consumers in SQL Monitor as explained in the above-mentioned paper. But the easiest way to find this information is to look at the OTHER column in SQL Monitor.

The OTHER column shows a binocular icon in the lines with PX SEND HYBRID HASH row source.

When you click on those icons you can see the distribution method used at runtime.

Like the DOP downgrade reason I talked about in my previous post, this is also shown as a cryptic number code as of 12.1. For the adaptive distribution methods there are three possible values: 6 = BROADCAST, 5 = ROUND-ROBIN, and 16 = HASH distribution.
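If you would rather work from the command line than from SQL Monitor, a query like the following against V$PQ_TQSTAT (run in the same session, right after the parallel statement completes) shows how many rows each PX server produced and consumed per table queue; seeing every consumer receive all the rows, for example, points at a broadcast distribution. This is just a sketch; the formatting is up to you.

```sql
-- Rows processed per PX server for each table queue of the last statement.
-- Note: V$PQ_TQSTAT is only populated in the session that ran the statement.
select dfo_number, tq_id, server_type, process, num_rows
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type desc, process;
```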

Any comments and feedback welcome. 

Friday Mar 20, 2015

Finding the Reason for DOP Downgrades

Whether with manual DOP or with Auto DOP, the runtime DOP of a statement is not always the same as the requested DOP. There are several reasons for this, like hitting the maximum number of parallel execution (PX) servers allowed, hitting the DOP limit set by the resource manager, etc.

To find out why a specific statement's DOP got downgraded, you can use the SQL Monitor reports in 12.1. SQL Monitor shows the DOP downgrade in the General section as a people icon with a red down-pointing arrow. When you hover over that icon you can see the actual DOP, the downgrade percentage, the number of PX servers requested, and the number of PX servers allocated, like below.

To find out the reason for the downgrade you can click the binocular icon in the OTHER column in the PX COORDINATOR line.

That shows you the actual DOP after the downgrade and the reason for the downgrade (as a cryptic code obviously) like below.

Here is the list of possible reasons and their number codes as of 12.1.

       350 DOP downgrade due to adaptive DOP
       351 DOP downgrade due to resource manager max DOP
       352 DOP downgrade due to insufficient number of processes
       353 DOP downgrade because slaves failed to join

You can look at an example SQL Monitor report showing this here.
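If you want to generate such a report yourself rather than pull it from Enterprise Manager, DBMS_SQLTUNE.REPORT_SQL_MONITOR produces the same report; the 'ACTIVE' type gives you the interactive HTML version with the OTHER column and its binocular icons. The sql_id below is a placeholder; substitute the one for your statement.

```sql
-- Generate an active (HTML) SQL Monitor report for a given statement.
set long 1000000 longchunksize 1000000 pagesize 0 linesize 500 trimspool on
spool sqlmon.html
select dbms_sqltune.report_sql_monitor(
         sql_id => 'an05rsj1up6st',   -- placeholder sql_id, replace with yours
         type   => 'ACTIVE')
from   dual;
spool off
```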

We plan to provide this information in an easy-to-reach section of SQL Monitor in a human-readable format in a future release.

Tuesday Mar 10, 2015

Parallel DML on Tables with LOB Columns

Until 12c, running parallel DML on tables with LOB columns required the table to be partitioned. Starting with 12c, we allow parallel INSERTs into non-partitioned tables with LOB columns, provided that those columns are declared as SecureFiles LOBs. Parallel UPDATE, DELETE, and MERGE are still not supported. The documentation on this has some errors leading some customers to expect all parallel DML to be supported, so I wanted to point out the current behavior.

INSERTs into non-partitioned table with SecureFiles LOB column

Here is an example showing that the INSERT will be executed in parallel.

SQL> create table t (a number,b varchar2(200),c clob) lob(c) store as securefile;
Table created.

SQL> explain plan for insert /*+ parallel enable_parallel_dml */
  2  into t select /*+ parallel */ * from t;


SQL> select * from table(dbms_xplan.display(format=>'basic +note'));

Other DML on non-partitioned table with SecureFiles LOB column

A DELETE statement, on the other hand, is executed serially, as indicated in the note section of the plan. The note also shows the reason, which is that the table is non-partitioned. This behavior applies to UPDATE and MERGE statements too.

SQL> explain plan for delete /*+ parallel enable_parallel_dml */  from t;

SQL> select * from table(dbms_xplan.display(format=>'basic +note'));

DML on partitioned table with SecureFiles LOB column

For partitioned tables with LOB columns, all parallel DML operations are supported as before. Remember that the effective DOP will be the number of partitions; this behavior is also the same as before, because we use the partitions as PX granules in this case.

This example shows that parallel DML is allowed on a partitioned table with LOB columns. V$PQ_SESSTAT shows we get a DOP of 4 even though the plan shows a DOP of 8; that is because the table has 4 partitions.

SQL> create table t (a number,b varchar2(200),c clob)
  2  lob(c) store as securefile partition by hash(a) partitions 4;

Table created.

SQL> explain plan for delete /*+ parallel(8) enable_parallel_dml */  from t;


SQL> select * from table(dbms_xplan.display(format=>'basic +note'));

SQL> delete /*+ parallel(8) enable_parallel_dml */  from t;

0 rows deleted.

SQL> select * from V$PQ_SESSTAT;

STATISTIC                      LAST_QUERY SESSION_TOTAL     CON_ID
------------------------------ ---------- ------------- ----------
Queries Parallelized                    0             0          0
DML Parallelized                        1             1          0
DDL Parallelized                        0             0          0
DFO Trees                               1             1          0
Server Threads                          4             0          0
Allocation Height                       4             0          0
Allocation Width                        1             0          0
Local Msgs Sent                         8             0          0
Distr Msgs Sent                         0             0          0
Local Msgs Recv'd                       8             0          0
Distr Msgs Recv'd                       0             0          0
DOP                                     4             0          0
Slave Sets                              1             0          0

13 rows selected.

As always, we are fixing the documentation about this.

Friday Aug 05, 2011

Big Data: In-Memory MapReduce

Achieving the impossible often comes at a cost. In today's big data world that cost is often latency. Bringing real-time, deep analytics to big data requires improvements to the core of the big data platform. In this post we will discuss parallel analytics and how we can bring in-memory – or real-time – processing to very large data volumes.

The Parallel Platform that runs in Memory

One of the often overlooked things we did in Oracle 11.2 is that we changed the behavior of data that resides in memory across a RAC cluster. Instead of restricting the data size to the size of the memory in a single node in the cluster, we now allow the database to see the memory as one large pool. We no longer do cache fusion to replicate all pieces of data in an object to all nodes.


That cache fusion process is shown above. In state A, node 1 has acquired pieces of data into the buffer cache, as has node 2. These are different pieces of data. Cache fusion now kicks in and ensures that all data for that object is in the cache of each node.

In 11.2 we have changed this by allowing parallel execution to leverage the buffer cache as a grid. To do that we no longer do cache fusion, but instead pin (affinitize is the not so nice English word we use) data onto the memory of a node based on some internal algorithm. Parallel execution keeps track of where data lives and shuffles the query to that node (rather than using cache fusion to move data around).


The above shows how this works. P1 and P2 are chunks of data living in the buffer cache and are affinitized with a certain node. The parallel server processes on that node will execute the query and send the results back to the node that originated the query. With this in-memory grid we can process much larger data volumes.

In-Database MapReduce becomes In-Memory MapReduce

We talked about in-database MapReduce quite a bit on this blog, so I won’t repeat myself. If you are new to this topic, have a look at this post.

Because of how the database is architected, any code running within it that leverages parallelism can now use the data hosted in the memory of the machine. Whether this is across the grid or not doesn't matter. So rather than having to figure out how to create a system that allows MapReduce code to run in memory, you just need to figure out how to write the code; Oracle will ensure that if the data fits, it will leverage memory instead of disk. That is shown in the following picture.


Now, is this big data? Well, if I look at an Exadata X2-8 machine today, we will have 2TB of memory to work with, and with Exadata Hybrid Columnar Compression (yes, the data resides in memory in compressed state) this means I should easily be able to run upwards of 10TB in memory. As memory footprints grow, more data will fit within the memory and we can do more and more interesting analytics on that data, bringing at least real-time performance to some fairly complex analytics.

More at Oracle Openworld

For those of you who are going to San Francisco (with or without flowers in your hat), expect to see and hear a lot more on big data and MapReduce! See you there!

Monday Jun 13, 2011

Parallel Execution – Precedence of Hints and other Factors

The following table is a reflection of the precedence of hints and other factors, such as alter session enable parallel dml, when using Auto DOP. It is also important to understand how the DML and query parts work together and how they influence each other.

All of the below is based on a simple statement:

insert into t3 select * from t1, t2 where t1.c1 = t2.c1;


Some explanatory words based on the lines in the picture above:

Line 1 => The cleanest way to run PX statements, where Auto DOP gets to do its work and we will use the computed DOP of the statement

Line 4 => Because FORCE parallel is used, we must ensure that the DOP > 1

Line 9 => The statement-level hint overrides all other means and we run the statement with the DOP in the hint

A word on the internal degree limit. This is NOT (repeat NOT) a parameter you can find, set, or find an underscore for. It is the built-in limit on DOPs (set to the default DOP, or parallel_degree_limit = CPU). It is an internal boundary to ensure we do not blast through the upper bound of CPU power. Also note that for any non-computed degree, those boundaries and limits do not apply. That in itself is a reason to go look at and understand Auto DOP.
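As a quick sketch of the precedence described above, using the same table names as the example statement: a statement-level hint wins over session-level settings, so the following runs with a DOP of 16 regardless of the session-level FORCE setting (and, being a non-computed degree, it is not subject to the internal degree limit either).

```sql
-- Session asks for FORCE PARALLEL with DOP 4...
alter session enable parallel dml;
alter session force parallel query parallel 4;

-- ...but the statement-level hint overrides all other means: DOP 16
insert /*+ parallel(16) */ into t3
select /*+ parallel(16) */ * from t1, t2 where t1.c1 = t2.c1;
```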

Tuesday Jun 07, 2011

Take a Class on Parallel Execution

Thanks to our curriculum development team, we now have a comprehensive, 1-day class on all things parallel. The class has just been released to the public, so there are no scheduled dates yet. If you are interested in taking the class, which can be instructor-led (in-person), a live virtual class, or self-paced, please register your interest at the course site.


Friday May 20, 2011

Screencasts on Parallel Execution are now Live

New screencasts discussing automatic parallelism in Oracle 11g Release 2 are now live.

Sunday Sep 27, 2009

In-Memory Parallel Execution in Oracle Database 11gR2

As promised, the next entry in our 11gR2 explorations is In-Memory Parallel Execution. If you are going to Oracle OpenWorld next month make sure you check out the following session:

Tuesday, October 13 2009 5:30PM, Moscone South Room 308
Session S311420
Extreme Performance with Oracle Database 11g and In-Memory Parallel Execution.

In this session you will get more details and insight from the folks who actually built this functionality! A must see if this is of any interest, so book that ticket now and register!

Down to business, what is "In-Memory Parallel Execution"?

Let's begin by taking a quick trip down memory lane, back to Oracle Database 7 when Parallel Execution (PX) was first introduced. The goal of PX then and now is to reduce the time it takes to complete a complex SQL statement by using multiple processes to go after the necessary data instead of just one process. Up until now these parallel server processes typically bypassed the buffer cache and read the necessary data directly from disk. The main reasoning for this was that the objects accessed by PX were large and would not fit into the buffer cache. Any attempt to read these large objects into the cache would have resulted in trashing the cache content.

However, as hardware systems have evolved, the memory capacity on a typical database server has become extremely large. Take for example the 2 CPU socket Sun server being used in the new Sun Oracle Database Machine. It has an impressive 72GB of memory, giving a full Database Machine (8 database nodes) over half a TB of memory. Suddenly using the buffer cache to hold large objects doesn't seem so impossible any more.

In-Memory Parallel Execution (In-Memory PX) takes advantage of these larger buffer caches but it also ensures we don't trash the cache.

In-Memory PX begins by determining if the working set (group of database blocks) necessary for a query fits into the aggregated buffer cache of the system. If the working set does not fit, then the objects will be accessed via direct path IO just as they were before. If the working set fits into the aggregated buffer cache, then the blocks will be distributed among the nodes and each block will be affinitized, or associated, with a specific node.

In previous releases, if the parallel execution of one statement read part of an object into the buffer cache, then subsequent SQL statements on other nodes in the cluster would access that data via Cache Fusion. This behavior could eventually result in a full copy of that table in every buffer cache in the cluster. In-Memory PX is notably different because Cache Fusion will not be used to copy the data from its original node to another node, even if a parallel SQL statement that requires this data is issued from another node. Instead, Oracle uses the parallel server processes on the same node (where the data resides) to access the data and returns only the result to the node where the statement was issued.

The decision to use the aggregated buffer cache is based on an advanced set of heuristics that include: the size of the object, the frequency at which the object changes and is accessed, and the size of the aggregated buffer cache. If the object meets these criteria it will be fragmented, or broken up into pieces, and each fragment will be mapped to a specific node. If the object is hash partitioned then each partition becomes a fragment; otherwise the mapping is based on the FileNumber and ExtentNumber.




To leverage In-Memory PX you must set the initialization parameter PARALLEL_DEGREE_POLICY to AUTO (the default is MANUAL). Once this is set, the database controls which objects are eligible to be read into the buffer cache and which objects will reside there at any point in time. It is not possible to manually control the behavior for specific statements.
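For example, to let In-Memory PX be considered for a session (subject to the heuristics described above):

```sql
-- Auto DOP must be active for In-Memory PX to be considered
alter session set parallel_degree_policy = auto;

-- Verify the current setting
show parameter parallel_degree_policy
```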

Obviously this post is more of a teaser; for in-depth discussions on this, go to OpenWorld and/or keep an eye out for a new white paper called Parallel Execution Fundamentals in Oracle Database 11gR2 that will be coming soon to oracle.com. This paper covers not only In-Memory PX but also Auto DOP and parallel statement queuing.

Stay tuned for more on 11gR2 coming soon...


The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.

