Tuesday Aug 03, 2010

Gathering Stats for Workflow's Queue Tables

This blog post contributed by Ajay Verma.

Statistics gathering is an important activity because it enables the cost-based optimizer (CBO) to choose among the possible execution plans for a given SQL statement. It is recommended to gather statistics for an object when it is in a steady state, i.e. when the corresponding tables hold a representative amount of data, so that the resulting statistics lead the CBO to pick the best execution plan.

From 10g onwards, the Oracle database provides the Automatic Optimizer Statistics Collection feature, in which a scheduled job, GATHER_STATS_JOB, runs nightly and gathers statistics for tables with empty or stale statistics using the DBMS_STATS package APIs. This is a great feature, but for highly volatile tables such as AQ/Streams tables it is quite possible that, when the stats collection job runs, these tables do not contain data representative of their full load period. One scenario is stats being collected on queue tables while they are empty, with the optimizer consequently choosing poor execution plans when those tables hold a lot of data during normal workload hours.
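For instance, comparing NUM_ROWS and LAST_ANALYZED for the Workflow queue tables can reveal statistics captured at an unrepresentative point. This is only a hedged example: the owner and table names below are typical of an Apps instance, and the columns come from the standard DBA_TAB_STATISTICS view.

SQL> SELECT owner, table_name, num_rows, last_analyzed, stattype_locked
     FROM   dba_tab_statistics
     WHERE  owner = 'APPLSYS'
     AND    table_name IN ('WF_DEFERRED', 'WF_JAVA_DEFERRED', 'WF_NOTIFICATION_OUT');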

As a direct fix for this issue, from Database 10gR2 onwards, AQ table statistics are locked immediately after the table is created so that the automatic stats collector does not change them later. This can be confirmed as below:

SQL> CREATE TYPE event_msg_type AS OBJECT
       ( name            VARCHAR2(10),
         current_status  NUMBER(5),
         next_status     NUMBER(5)
       );
     /
Type created.
--Create Queue Table
SQL> EXEC DBMS_AQADM.create_queue_table( -
              queue_table=>'test_queue_tab', -
              queue_payload_type=>'event_msg_type')
PL/SQL procedure successfully completed. 
--Create Queue
SQL> EXEC DBMS_AQADM.create_queue(queue_name =>'test_queue', -
                             queue_table=>'test_queue_tab');
PL/SQL procedure successfully completed.
--Check for locks on stats
SQL> select owner, table_name, stattype_locked
     from dba_tab_statistics
     where stattype_locked is not null
     and owner='APPS' and table_name='TEST_QUEUE_TAB';
OWNER       TABLE_NAME      STAT
---------- ---------------- ----- 
APPS        TEST_QUEUE_TAB   ALL 
--Gather stats
SQL> EXEC dbms_stats.gather_table_stats('APPS','TEST_QUEUE_TAB')
BEGIN dbms_stats.gather_table_stats('APPS','TEST_QUEUE_TAB'); END; 
* 
ERROR at line 1: 
ORA-20005: object statistics are locked (stattype = ALL) 
ORA-06512: at "SYS.DBMS_STATS", line 18408 
ORA-06512: at "SYS.DBMS_STATS", line 18429 
ORA-06512: at line 1

As we can see above, the queue table's statistics are locked and hence further stats gathering fails. We can still use the force parameter provided in the DBMS_STATS package to override the lock on statistics:

SQL> exec dbms_stats.gather_table_stats('APPS','TEST_QUEUE_TAB', -
           force=>TRUE)
PL/SQL procedure successfully completed.

The recommended practice is to gather statistics while the queue holds a representative message load and then keep the statistics locked so that they are not picked up by the automatic stats collector.
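A minimal sketch of that practice, using the owner and table from the example above: gather the statistics with force while the queue is loaded, then confirm the lock is still in place (force overrides the lock for that one call but does not remove it).

begin
  -- gather while the queue holds a representative message load
  dbms_stats.gather_table_stats(ownname => 'APPS',
                                tabname => 'TEST_QUEUE_TAB',
                                force   => TRUE);
end;
/

-- confirm the statistics are fresh and still locked
select table_name, last_analyzed, stattype_locked
from   dba_tab_statistics
where  owner = 'APPS' and table_name = 'TEST_QUEUE_TAB';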

What happens in Apps Instance?

From Oracle Apps 11i (on top of Oracle Database 10gR2) onwards, the automatic statistics gathering job GATHER_STATS_JOB is disabled using $APPL_TOP/admin/adstats.sql, as can be confirmed below:

 
SQL> select job_name, enabled from DBA_SCHEDULER_JOBS 
WHERE job_name = 'GATHER_STATS_JOB';
JOB_NAME                       ENABLE
------------------------------ ------
GATHER_STATS_JOB               FALSE	

Oracle Apps provides separate concurrent programs that use procedures in the FND_STATS package to gather statistics for Apps database objects. FND_STATS is basically a wrapper around DBMS_STATS and is recommended by Oracle for stats collection in an Oracle Apps environment because of the flexibility it provides in identifying objects with empty and stale stats: the modification threshold (the percentage of changed rows beyond which statistics are considered stale) is fixed at 10% for DBMS_STATS, while it can vary from 0 to 100% with FND_STATS.

But FND_STATS does not provide a force parameter to override a lock on statistics, and hence it fails while gathering statistics for a queue table whose stats are locked:

 
SQL> select owner, table_name, stattype_locked ,last_analyzed
  2   from dba_tab_statistics where table_name='WF_DEFERRED'
  3  and owner='APPLSYS';
OWNER          TABLE_NAME      STATT LAST_ANAL
-------------- --------------- ----- ---------
APPLSYS        WF_DEFERRED     ALL   23-FEB-09
SQL> exec fnd_stats.gather_table_stats('APPLSYS','WF_DEFERRED');
BEGIN fnd_stats.gather_table_stats('APPLSYS','WF_DEFERRED'); END;
*
ERROR at line 1:
ORA-20005: object statistics are locked (stattype = ALL)
ORA-06512: at "APPS.FND_STATS", line 1505
ORA-06512: at line 1	

The same error is observed when running the Gather Table Statistics concurrent program for such queue tables. The Gather Schema Statistics concurrent program in general skips a table if any error is encountered while gathering its stats, so queue tables with locked stats are also skipped.

Hence, currently, in order to gather statistics for Workflow queue tables whose stats are locked, the approach should be to temporarily unlock the queue table using the DBMS_STATS unlock API when the table has a representative load (so that correct stats are gathered), gather the statistics using FND_STATS, and lock it again:

SQL> begin
  2    dbms_stats.unlock_table_stats('APPLSYS','WF_DEFERRED');
  3    fnd_stats.gather_table_stats('APPLSYS','WF_DEFERRED');
  4    dbms_stats.lock_table_stats('APPLSYS','WF_DEFERRED');
  5  end;
  6  /
PL/SQL procedure successfully completed.
SQL> select owner, table_name, stattype_locked ,last_analyzed
  2  from dba_tab_statistics where table_name='WF_DEFERRED'
  3  and owner='APPLSYS';
OWNER            TABLE_NAME       STATT LAST_ANALYZED
---------------- ---------------- ----- -------------
APPLSYS          WF_DEFERRED      ALL   12-APR-09	

Another approach could be to unlock the WF queue table stats during the adpatch run itself, since Oracle's automatic stats collection feature is disabled in Apps environments and stats collection generally happens through the Apps-provided concurrent programs. What do you think?

Monday Aug 02, 2010

Looping within a Workflow Process

This blog post contributed by Ajay Verma.

While designing a workflow process there seems to be a general tendency to make the process as granular as possible. Each step is broken into several simpler steps, which are executed as separate workflow activities. This does give the end user a much finer view of the business process. But sometimes this drive to 'make it simpler' can lead to a different kind of complexity.

Let's see this with one particular example.

Below is a simplified version of part of a workflow process we came across recently:

[Fig: Process 1 - a workflow loop between the 'Get Recipient' and 'Send Notification' activities]

Here, notifications were being sent using a workflow loop, wherein completion of an activity causes a transition to another activity that has already completed earlier. In this case, the 'Get Recipient' activity determined a suitable notification recipient based on certain business rules, while the 'Send Notification' activity simply used the Notification System's Send() API to send the notification to the identified recipient. The loop ended when no more eligible recipients were found.

Two issues were observed in this particular set up:

  1. The process took too long (more than 40 hours) to complete when notifications were being sent to around 35K users.
  2. The rate at which notifications were created decreased over time, i.e. when the process started, around 12-16 notifications (for the respective recipients) were created per second, and later this rate dropped to one notification every 2 minutes.

The issue was reproducible in a separate instance for, say, 2500 notifications, with the same observations as above: it took more than 3 hours for the process to complete, and the rate of notification creation did decrease with time.

Now see the process extract below, wherein the two activities in Process 1 have been merged into a single activity:

[Fig: Process 2 - a single activity replacing the workflow loop]

This activity does the same job as the two activities in Process 1, but within a PL/SQL loop rather than a workflow loop:

--------------------------------------------------------------

Loop Starts

    Get Recipient Logic

    Send Notification to the identified recipient

Loop ends

--------------------------------------------------------------
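In PL/SQL terms, the merged activity could look something like the sketch below. This is only an illustration, not the actual code from the process: get_next_recipient is a hypothetical helper standing in for the 'Get Recipient' business rules, and the message name passed to the standard WF_NOTIFICATION.Send() API is made up.

procedure notify_all_recipients (itemtype  in varchar2,
                                 itemkey   in varchar2,
                                 actid     in number,
                                 funcmode  in varchar2,
                                 resultout out nocopy varchar2)
is
  l_recipient varchar2(320);
  l_nid       number;
begin
  if funcmode = 'RUN' then
    loop
      l_recipient := get_next_recipient(itemkey);   -- hypothetical 'Get Recipient' logic
      exit when l_recipient is null;                -- no more eligible recipients
      l_nid := wf_notification.send(role     => l_recipient,
                                    msg_type => itemtype,
                                    msg_name => 'NOTIFY_MSG');  -- illustrative message name
    end loop;
  end if;
  resultout := 'COMPLETE:Y';
end notify_all_recipients;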

Process 2 completed for 2500 notifications in less than 2 minutes! And there was no noticeable degradation in the rate at which notifications were created.

So, what made the difference?

In general, each iteration of a workflow loop internally runs a number of extra DML statements to re-execute the same function activities again and again. And Workflow never issues intermediate commits while executing a given flow until it reaches the end of the flow or a blocking activity. Hence the DML generated by Process 1 results in heavy usage of redo logs and rollback segments until a commit happens. In effect, the long-running workflow loop unnecessarily stresses the Workflow engine and degrades performance, when it can easily be avoided as in Process 2.

In the end, from a developer's perspective, use your own judgement and common sense to decide how granular your workflow process should be, so that the granularity doesn't hit you somewhere else, as demonstrated above.

Workflow Engine vs Business Event System

Oracle Workflow has two major execution engines.

    • Workflow Engine
    • Workflow Business Event System

Here is a simple comparison of what they process and their associated background components.

  • Executes: the Workflow Engine executes workflow processes created using the Windows-based Workflow Builder client; the Business Event System executes subscriptions to business events registered using the Event Manager in the Workflow Administrator Web Applications responsibility.
  • Entry point foreground APIs: WF_ENGINE.CreateProcess and WF_ENGINE.StartProcess vs. WF_EVENT.Raise.
  • Deferral to background: execution is deferred by enqueuing a message to AQ WF_DEFERRED_QUEUE_M vs. AQ WF_DEFERRED.
  • Entry point background API: WF_ENGINE.Background vs. WF_EVENT.Listen.
  • AQ payload: SYSTEM.WF_PAYLOAD_T vs. WF_EVENT_T.
  • Background processing: done by the concurrent program FNDWFBG (Workflow Background Engine) vs. the GSC component Workflow Deferred Agent Listener.
  • Administration: the Background Engine is submitted as a recurring concurrent request from the SRS form or Workflow Manager in OAM; the Agent Listener is a service component managed through Workflow Manager in OAM.
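For reference, here is a minimal sketch of the Workflow Engine foreground entry points listed above; the item type, item key and process names are purely illustrative. Any activities deferred by such a process are later picked up by the Background Engine (WF_ENGINE.Background, submitted through the FNDWFBG concurrent program).

begin
  -- create and start an instance of an (illustrative) item type and process
  wf_engine.createprocess(itemtype => 'XXDEMO',
                          itemkey  => 'ORDER-12345',
                          process  => 'MAIN_PROCESS');
  wf_engine.startprocess(itemtype => 'XXDEMO',
                         itemkey  => 'ORDER-12345');
  commit;
end;
/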

This blog post is the result of confusion about which AQ and which background component come into the picture when troubleshooting a given problem; users often check the wrong component. For example:

  • When troubleshooting issues with Business Event System, users verify that the Workflow Background Engine is running.
  • When troubleshooting deferred workflow processes, users verify that the Workflow Deferred Agent Listener is running.

It is important to understand the two processing engines in Oracle Workflow, the supporting background components and how these two engines integrate with each other.

PLSQL vs Java Business Event System

Oracle Workflow provides a Business Event System implementation both within the database (PLSQL) and in the middle tier (Java). The implementation is exactly the same in terms of event subscription processing in both layers; the only difference is how the developer wants to leverage the Business Event System's capabilities for a given event processing requirement. With the Business Event System available in both PLSQL and Java, different subscription processing scenarios can be achieved.

Subscription Scenarios

    1. Execute a PLSQL rule function synchronously or asynchronously. Oracle Workflow provides a number of default rule functions in the WF_RULE package for your use; a sketch of a custom rule function follows this list.
    2. Execute a Java rule function synchronously or asynchronously. Oracle Workflow provides a default rule function oracle.apps.fnd.wf.bes.WebServiceInvokerSubscription to invoke web services.
    3. Launch a workflow process.
    4. Send the event to an agent (AQ).
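For scenario 1, a custom PL/SQL rule function follows the standard subscription rule function signature, as sketched below. The function name, the 'ORDER_ID' parameter and the package name passed to WF_CORE.CONTEXT are illustrative; the signature, return values and error handling pattern are the standard ones.

create or replace function xx_order_created_rule
  (p_subscription_guid in            raw,
   p_event             in out nocopy wf_event_t)
  return varchar2
is
begin
  -- read a parameter from the event (GetValueForParameter is a WF_EVENT_T method)
  if p_event.getvalueforparameter('ORDER_ID') is not null then
    null;  -- custom processing would go here
  end if;
  return 'SUCCESS';
exception
  when others then
    wf_core.context('XX_DEMO', 'xx_order_created_rule', p_event.geteventname());
    wf_event.seterrorinfo(p_event, 'ERROR');
    return 'ERROR';
end xx_order_created_rule;
/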

Event Raise APIs

Workflow Business Events can be raised in one of the following two ways:

    1. PLSQL API WF_EVENT.Raise (a sketch follows this list)
    2. Java method oracle.apps.fnd.wf.BusinessEvent.raise
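A minimal sketch of the PL/SQL case: the event name, event key and the 'ORDER_ID' parameter are illustrative, while WF_EVENT.Raise and WF_EVENT.AddParameterToList are the standard APIs.

declare
  l_parameters wf_parameter_list_t := wf_parameter_list_t();
begin
  -- attach an (illustrative) parameter to the event
  wf_event.addparametertolist(p_name          => 'ORDER_ID',
                              p_value         => '12345',
                              p_parameterlist => l_parameters);
  -- raise the event; deferred subscriptions are enqueued in this transaction
  wf_event.raise(p_event_name => 'xx.oracle.apps.demo.order.created',
                 p_event_key  => '12345',
                 p_parameters => l_parameters);
  commit;
end;
/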

Testing Business Events

Starting with the 12.1.1 and 11.5.10 ATG RUP7 releases, the Test Business Event page in the Workflow Administrator Web Applications responsibility is enhanced to test a business event using both the PLSQL and Java Raise APIs.

[Figure: Test Business Event page with 'Raise in PLSQL' and 'Raise in Java' buttons]

Raise in PLSQL

Clicking the "Raise in PLSQL" button for a business event causes the PLSQL API WF_EVENT.Raise to be used to raise the event. This results in the following behavior for the different subscription definitions:

  1. Execute PLSQL subscriptions with phase < 100 in the same session
  2. Enqueue PLSQL subscriptions with phase >= 100 to WF_DEFERRED queue
  3. Enqueue ALL Java subscriptions regardless of phase to WF_JAVA_DEFERRED queue

Raise in Java

Clicking the "Raise in Java" button for a business event causes the Java method oracle.apps.fnd.wf.BusinessEvent.raise() to be used to raise the event. This results in the following behavior for the different subscription definitions:

  1. Execute JAVA subscriptions (such as WebServiceInvokerSubscription) with phase < 100 in the same session
  2. Enqueue JAVA subscriptions with phase >= 100 to WF_JAVA_DEFERRED queue
  3. Execute PLSQL subscriptions with phase < 100 in the same session
  4. Enqueue PLSQL subscriptions with phase >= 100 to WF_JAVA_DEFERRED queue

The different combinations of subscription types and phases, and how a subscription is executed based on where the event is raised, are discussed in the next section.

Verifying Subscription Execution

As long as the call to the event Raise API does not throw an error, the event is raised successfully. There is a wrong notion among developers and administrators that all raised events are enqueued to an agent (AQ) such as WF_DEFERRED or WF_JAVA_DEFERRED. An event is not always enqueued to an agent; what happens is driven by how the subscription is defined. So how are event subscriptions dispatched, and how can the status of a subscription execution be checked?

Subscriptions are either synchronous (phase < 100) or asynchronous (phase >= 100) with respect to the layer in which the event is raised (PLSQL or Java) and that layer's ability to dispatch them. The following table gives an idea of how different subscriptions are dispatched.

 

 

 

Scenario  Event raised in  Subscription type  Subscription phase  Agent (AQ)        Dispatched by
--------  ---------------  -----------------  ------------------  ----------------  -------------------------------------
1         PLSQL            PLSQL              >= 100              WF_DEFERRED       Workflow Deferred Agent Listener
2         PLSQL            PLSQL              < 100               None              Same session as WF_EVENT.Raise
3         PLSQL            Java               Any                 WF_JAVA_DEFERRED  Workflow Java Deferred Agent Listener
4         Java             Java               >= 100              WF_JAVA_DEFERRED  Workflow Java Deferred Agent Listener
5         Java             Java               < 100               None              Same session as BusinessEvent.raise()
6         Java             PLSQL              >= 100              WF_JAVA_DEFERRED  Workflow Java Deferred Agent Listener
7         Java             PLSQL              < 100               None              Same session as BusinessEvent.raise()
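A rough way to see what is still waiting on the agent listeners is to look at the deferred queues directly, assuming APPLSYS owns the queue tables and the standard AQ$ views are in place; for WF_DEFERRED the correlation id generally carries the event name.

select corr_id, msg_state, count(*) msg_count
from   applsys.aq$wf_deferred          -- similarly applsys.aq$wf_java_deferred
group  by corr_id, msg_state
order  by corr_id, msg_state;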

 

About

This blog is dedicated to bringing the latest information on Oracle Workflow new features, best practices, troubleshooting techniques and important fixes directly from its Development Team.
