
Welcome to All Things Warehouse Builder

  • ETL
    November 3, 2011

Parallel Processing with DBMS_PARALLEL_EXECUTE

David Allan
Architect

Here is another illustration of some of the powerful capabilities of the DBMS_PARALLEL_EXECUTE package in the Oracle database, carrying on from the earlier post here. One of the comments on that post asked how to insert into a differently named table within each chunk, where that insert can also perform parallel DML. This kind of scenario could be interesting for very high-end processing; the targets could be the final tables, or tables that are prepared and then swapped in via partition exchange or similar.

The image below shows a variation on the original post: rather than inserting into a specific partition, you write into a specific table.

Your own chunking criteria can drive the whole process. The question was how to drive it from a table, using SQL such as 'select distinct level_key, level_key from chunk_table', where chunk_table holds the level_key and the target table name. For example, it could contain the following data;


level_key   table_name
---------   ------------
1           sales_level1
2           sales_level2
3           sales_level3
4           sales_level4

So the first chunk, with level_key 1, will write its results to table sales_level1, and so on.

You can use the DBMS_PARALLEL_EXECUTE package as follows to create this functionality. The start/end values have to be of data type NUMBER, so you will have to look up the target table name inside your PL/SQL block, within the statement provided in the RUN_TASK call.

This block has the query to determine the chunks...

begin
   -- Drop any previous incarnation of the task, ignoring errors if it does not exist
   begin
     DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'TASK_NAME');
   exception when others then null;
   end;
   DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'TASK_NAME');
   -- The chunking query must return (start_id, end_id) pairs; selecting level_key
   -- twice makes each chunk cover exactly one level_key value
   DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL(task_name => 'TASK_NAME',
     sql_stmt => 'select distinct level_key, level_key from chunk_table',
     by_rowid => false);
end;
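
As a quick sanity check before running anything, you can see the generated chunks and their status in the USER_PARALLEL_EXECUTE_CHUNKS view:

select chunk_id, start_id, end_id, status
from   user_parallel_execute_chunks
where  task_name = 'TASK_NAME'
order  by chunk_id;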

Then the next block will construct and process the task...

begin
   DBMS_PARALLEL_EXECUTE.RUN_TASK (task_name => 'TASK_NAME',
     sql_stmt =>'declare
       s varchar2(16000);
       vstart_id number := :start_id;
       vend_id   number := :end_id;  -- bound but unused; start and end are equal here
       vtable_name varchar2(30);
     begin
       -- look up the target table name for this level_key
       select table_name into vtable_name from chunk_table where level_key = vstart_id;
       s := ''insert into ''||vtable_name||'' select /*+ PARALLEL(STG, 8) */ colx from STAGING_TABLE STG
         where level_key = :vstart_id'';
       execute immediate s using vstart_id;
       commit;
     end;',
     language_flag => DBMS_SQL.NATIVE, parallel_level => 2 );
end;

The anonymous PL/SQL block can be any arbitrary code. You can see the target table name being retrieved, and the insert's source query being parallelized with the PARALLEL(STG, 8) hint. Anyway, good to share.
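
As a wrap-up step (not shown in the original run, but the usual pattern with this package), you can check how the task finished, retry any failed chunks, and then drop the task; a minimal sketch using the same task name as above:

declare
  l_status number;
begin
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('TASK_NAME');
  -- if some chunks errored out, RESUME_TASK reprocesses only those chunks
  if l_status = DBMS_PARALLEL_EXECUTE.FINISHED_WITH_ERROR then
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('TASK_NAME');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('TASK_NAME');
  end if;
  -- clean up once everything has processed
  if l_status = DBMS_PARALLEL_EXECUTE.FINISHED then
    DBMS_PARALLEL_EXECUTE.DROP_TASK('TASK_NAME');
  end if;
end;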


Comments ( 8 )
  • guest Tuesday, November 8, 2011

    Neat to see ongoing information on OWB-- I think the surface has still barely been scratched by most users. Any chance we can get an overview of what was new in the latest 11.2 patch? I heard there are some new features.


  • David Tuesday, November 8, 2011

    These are covered in the 11.2.0.3 new features documentation for the Oracle Database (linked below); here is a summary;

    Use of Preprocessor in External Table

    Use of the ORACLE_DATAPUMP access driver in External Table

    Table Compression - supports all the Oracle Database 11gR2 table compression options (table and partition level) for OWB Tables.

    Partitioning Options - supports all the partitioning options of the Oracle Database 11gR2.

    Table partitions in OWB Mapping - Mapping table and materialized view Operators have been enhanced to allow specifying partitions.

    Column Encryption - supports column encryption applied to Table or External Table.

    Compiled PL/SQL - supports PL/SQL native compilation in Mapping and Transformation.

    Control Center Agent migration to WebLogic

    Database failover - introduces official failover support for OWB-EE customers.

    Support of Bulk Code Template Mappings - provides bulk data movement code templates.

    Multi-instance implementation - supports multiple Control Center Services running in different database instances created from the same database Oracle_Home

    There is a summary in the Oracle Database 11.2.0.3 New Features Guide;

    http://download.oracle.com/docs/cd/E11882_01/server.112/e22487/chapter1_11203.htm

    There is also a detailed note on each feature on the Oracle support site, under note 1348291.1.

    Cheers

    David


  • guest Tuesday, January 22, 2013

    Hi David,

    I tried to use this method but facing some issues. Kindly help.

    I created a task and created the chunks by rowid using:

    begin
      dbms_parallel_execute.create_task('XYX');
      dbms_parallel_execute.create_chunks_by_rowid
      ( task_name => 'XYX',
        table_owner => 'APPS',
        table_name => 'BIG_TABLE',
        by_row => false,
        chunk_size => 10000 );
    end;
    /

    After this I was able to see the table chunks in dba_parallel_execute_chunks.

    However when I execute the task using

    begin
      dbms_parallel_execute.run_task
      ( task_name => 'XYZ',
        sql_stmt => 'begin populate_emp( :start_id, :end_id ); end;',
        language_flag => DBMS_SQL.NATIVE,
        parallel_level => 4 );
    end;
    /

    NOTE: populate_emp( :start_id, :end_id ) is used to insert data from "big_table" into the "emp" table.

    The procedure executes successfully but nothing is inserted into the emp table. Also, the status in dba_parallel_execute_chunks is supposed to change to "PROCESSED", however it remains "UNASSIGNED". Please let me know if I am missing something. I also checked whether the populate_emp procedure is in a valid state and found it valid. When I run the procedure without dbms_parallel_execute, it works. Kindly help.

    Also, I am a little confused as to how the values for :start_id and :end_id are being passed. Do I need to replace them with actual ROWIDs, or leave them as they are? And what does DBMS_SQL.NATIVE stand for?

    Thanks in advance.

    -Ankit


  • David Tuesday, January 22, 2013

    Hi Ankit

    The code to be executed has the :start_id and :end_id bind variables bound to that chunk's boundary values for each chunk executed.

    The fact that the chunks are UNASSIGNED should be researched; it sounds like the RUN_TASK command is not even being executed, so you should verify that it runs. If you are running from ODI, check the operator log. Best to get a feel for the API by testing in SQL*Plus or wherever, then try within ODI.

    You are using chunking by rowid, so you should ensure your procedure also uses rowids in its implementation; see the rowid example below;

    http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/d_parallel_ex.htm#CHDGCHFA
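
    To make that concrete, here is a minimal sketch of what a rowid-driven populate_emp could look like, following the pattern in that linked example (the column mapping is an assumption; it presumes emp takes the same columns you select from big_table):

    create or replace procedure populate_emp
    ( p_start_id in rowid,
      p_end_id   in rowid )
    as
    begin
      -- each chunk processes only the rows in its own rowid range
      insert into emp
      select *  -- hypothetical column list; adjust to your tables
      from   big_table
      where  rowid between p_start_id and p_end_id;
      commit;
    end;
    /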

    DBMS_SQL.NATIVE is the standard/default; it is possible to execute with older behavior;

    http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/d_sql.htm#i1027592

    Cheers

    David


  • guest Friday, March 1, 2013

    In

    begin
      dbms_parallel_execute.run_task
      ( task_name => <task_name>,
        sql_stmt => 'begin stored_procedure( :start_id, :end_id ); end;',
        language_flag => DBMS_SQL.NATIVE,
        parallel_level => 4 );
    end;

    how do I send any other bind variables to sql_stmt along with the mandatory :start_id and :end_id placeholders?


  • David Thursday, March 7, 2013

    The SQL statement is executed by the chunking engine; the only way to get dynamic content at execution time is either via these bind variables or via your own code, i.e. read information from somewhere and dynamically execute it.

    If you are talking about dynamically constructing the statement rather than runtime bind variables, you can concatenate and construct the string.
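
    For example, a value known when the task is launched can be concatenated into the statement text; a minimal sketch (the third parameter of stored_procedure is hypothetical):

    declare
      l_run_id number := 42;  -- hypothetical extra value known at launch time
    begin
      dbms_parallel_execute.run_task
      ( task_name => 'TASK_NAME',
        -- :start_id/:end_id stay as binds; the extra value is baked into the text
        sql_stmt => 'begin stored_procedure( :start_id, :end_id, ' || l_run_id || ' ); end;',
        language_flag => DBMS_SQL.NATIVE,
        parallel_level => 4 );
    end;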

    Cheers

    David


  • Jean Thursday, February 1, 2018
    Hi,

    I had tried inserting into a table, as follows:

    BEGIN
      dbms_parallel_execute.run_task
      ( task_name => 'task1',
        sql_stmt => 'DECLARE
          l_start_id ROWID := :START_ID;
          l_end_id   ROWID := :END_ID;
          l_cnt      NUMBER;
        BEGIN
          INSERT /*+ append */ INTO t1
          ( /* column list */ )
          SELECT /* column list */
          FROM   cii, msib
          WHERE  1 = 1
          AND    /* join conditions and other conditions */;
          COMMIT;
        END;',
        language_flag => DBMS_SQL.NATIVE,
        parallel_level => 5 );
    END;
    /

    But the table got populated with 177 million rows (~4 times the distinct rows), where only 44 million rows were distinct and the rest were redundant. The above code executed for more than 24 hours before I killed it. How can I avoid redundant data with this approach?
  • David Wednesday, March 14, 2018
    Hi Jean

    Hope you resolved this? Was it a problem with the SQL and the bind variables?

    Cheers
    David
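
    For readers hitting the same symptom: the inner block above declares l_start_id and l_end_id but never uses them, so every chunk inserts the full result set. A likely fix, assuming the chunks were created by rowid on cii, is to restrict the driving table to the chunk's rowid range so each chunk processes a disjoint slice:

    INSERT INTO t1 ( /* column list */ )
    SELECT /* column list */
    FROM   cii, msib
    WHERE  cii.rowid BETWEEN l_start_id AND l_end_id  -- only this chunk's rows
    AND    /* join conditions and other conditions */;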