Wednesday Jul 18, 2012

OWB 11gR2 – Parallelization using DBMS_PARALLEL_EXECUTE

As well as all of the parallel query and DML capabilities exposed in OWB, the 11gR2 release of OWB includes out of the box support for the DBMS_PARALLEL_EXECUTE package in 11gR2 of the Oracle database, there are various articles on how to do the grunt work manually coding it on the web (see here for example). The use cases for this range from Oracle techie stuff like managing roll back segments for large table updates to parallelizing your PLSQL mappings…whatever they may be. Here we will see how with a couple of configuration settings in OWB you can self parallelize mappings – divide and conquer the olde fashioned way.

The use case that is generally mentioned for using this is for doing large table updates – basically its dividing a BIG problem into smaller pieces, the package also exposes a resume capability for reprocessing failed chunks which is also exposed from OWB (when you execute a chunked mapping you can resume or start a mapping). The approach can also be utilized for parallelizing arbitrary PLSQL (support for set based, row based also the table being chunked can be the source or the target or an arbitrary SQL statement). The patch is best for this.

Chunk by SQL Statement

The example below updates employees salaries, driving the parallel worker threads by the departments table. As you see below we the start and end department id are the same, we are doing a distinct, so all employees in a department will be updated in each chunk, so we can increase the salary within the EMPLOYEES table for each department in a chunk. We are using the department id to drive the chunking – each department if processed in its own thread. We will see how we can control the thread pool of workers.

Note above there are other options such as chunk by ROWID and an arbitrary SQL statement).

The mapping now using olde fashioned divide and conquer has new runtime properties which relate back to the DBMS_PARALLEL_EXECUTE package and include the resume functionality for processing failed chunks and the parallel level, so you can change at runtime whether 1,2 or n threads are used to process the chunks.

We are saying that we want 2 child processes processing the chunks, this is a runtime parameter. Its wise to think about the number of chunks and the number of child processes to ensure optimal execution plan. The image below depicts the code that gets generated with chunk parallel level 2, and parallel level 4 – essentially the package ‘main’ procedure uses DBMS_PARALLEL_EXECUTE to process the heart of the mapping in the child processes.

There is much more to the package than meets the eye the resume capability for example provides a way of reprocessing failed chunks, rather than reprocessing everything again. This is also exposed as a runtime parameter for the chunked OWB mapping so you can resume and reprocess only the failed chunks.

Chunk by ROWID

This is the classic Large Table Update example that typical divide and conquer is used for. This example updates employees salaries, driving the parallel worker threads by rows in the target employees table itself. With this example when we configure the mapping we pick chunk method as ROWID, the chunk table EMPLOYEES, chunk type BY_ROWS and size is 1000 for example. The EMPLOYEES table is set to perform and update, I define the target filter for update for the ON clause in the merge/update – so its the ‘how do I identify the rows to update within this chunk’ clause. The other thing I had to do was define a chunking expression – now in this case its kind of redundant since the chunk is being done in the ON clause of the update – so we can trick OWB by just saying ‘start=start and end=end’ using the variables. If you don’t specify this, OWB will complain that the chunking expression is invalid.

So the MERGE statement within the chunking code will increase the SALARY for the chunk, you will see the expression to increase salary by, the dummy chunking expression used in selecting from the source (DUAL) and which rows to match – the rows in this chunk.

This let’s us perform large table updates in chunks and drive the parallelized mapping using mapping input parameters.

The parallelization for PLSQL (or row based mappings) is an interesting case - for example for the likes of match merge which has an inherent divide and conquer strategy (in binning), with match merge out of the box the processing of the bins is serialized by default. Combining the chunk parallelization with the match merge binning lets you boost the performance of such mappings. So if you pair the chunking criteria with the binning you can substantially increase the performance of such mappings.


ETL, CDC, Real-Time DI and Data Quality for the Oracle Database from the inside.


« July 2012 »