Oracle Support Master Note for AQ Queue Monitor Process (QMON) (Doc ID 305662.1)

Copyright (c) 2011, Oracle Corporation. All Rights Reserved.

In this Document
  Purpose
  Scope and Application
  Master Note for AQ Queue Monitor Process (QMON)
     Queue Monitor - QMON
     QMON coordinator
     QMON tasks
     QMON Server Processes 
     Pre-10g QMON Architecture
     10g/11g QMON Architecture
     Significance of AQ_TM_PROCESSES Parameter
     Common Observations / Issues linked to QMON
      - PROCESSED Messages not being removed
      - TM Operations : Delay, Expiration, Retention not working as expected
      - Delay / WAIT Period Incorrect after Daylight Saving Time change
      - High CPU usage from QMON Coordinator process
      - Unexpected Growth in Queue Table Objects
     QMON Space Reclamation / Coalesce Queues
     Collecting Diagnostic Information for Troubleshooting QMON issues
  References


Applies to:

Oracle Server - Enterprise Edition - Version: 8.1.7.0 to 11.2.0.2 [Release: 8.1.7 to 11.2]
Oracle Server - Standard Edition - Version: 9.2.0.1 to 11.2.0.1 [Release: 9.2 to 11.2]
Information in this document applies to any platform.

Purpose

In this article, we will discuss the following:

1. The Queue Monitor Coordinator (QMNC), the QMON Server Processes (QXXX) and the Task Operations which can be assigned to Servers. These are collectively named QMON.

2. Known issues which affect these processes.

3. How to collect useful diagnostic information when problems arise with them.

Scope and Application

Database administrators of Advanced Queueing (AQ) and Streams databases.

Master Note for AQ Queue Monitor Process (QMON)

Queue Monitor - QMON

QMON is associated with Oracle Streams Advanced Queueing (AQ), Streams and a variety of other database products, and it monitors and maintains system and user-owned AQ persistent and buffered objects. For example, the Oracle job scheduler uses AQ and serves as a client to various database components to allow operations to be coordinated at scheduled times and intervals. Similarly, Oracle Grid Control relies on AQ for its Alerts and Service Metrics, and database server utilities such as Data Pump now use AQ. Furthermore, Oracle Applications has been using AQ for a significant period of time and this will continue.

QMON is primarily associated with the mechanisms for message expiration, retry, delay, maintaining queue statistics, removing PROCESSED messages from a queue table and updating the dequeue IOT as necessary.

QMON has a part to play in permanent and buffered message processing.

If a qmon process should fail, this should not cause the instance to fail. This is also the case with job queue processes.

QMON itself operates on queues but does not use a database queue for its own processing of tasks and time based operations.

QMON can be envisaged as a number of discrete tasks which are run by Queue Monitor processes or servers.

QMON coordinator

The coordinator was introduced in 10.1 and is responsible for allocating tasks to QMON processes. Some of these tasks are scheduled, time based activities whereas others are event driven.

In the case of buffered messaging in a RAC environment, if a RAC instance should fail, an existing QMON process will move ownership of the queues, where necessary, to a new owning instance. This would be relevant, for example, in a Streams configuration where a primary / secondary instance is defined. Because buffered messages are held in memory, when an instance is down the processing of its buffered messages has to be done on a surviving instance (a related Capture or Apply process would also need to be relocated). Once the owning instance has been changed, QMON can resume activity on the buffered queue on this instance.

Starting with 11.2.0.1, the coordinator is visible in GV$QMON_COORDINATOR_STATS.

QMON tasks

Tasks relate to a specific action which will be allocated to a QMON server process.

In 11.2.0.1, the view GV$QMON_TASK_STATS shows all the tasks available at this version and whether any errors have been encountered in task processing. The view shows details relating to the following tasks (based on the columns task_name and remark, as detailed in the Oracle Reference Guide):

Task Name                    Remark
---------------------------  --------------------------------------------------------
QMON_PERSISTENT_TM           Persistent messages time manager activity
QMON_SPILL                   Buffered messages spilling
QMON_DEALLOC_SPILLED         Spilled messages memory deallocation
QMON_DELETE_SPILLED          Dequeued spilled messages deletion
QMON_PURGE                   Not specified
QMON_COMPUTE_ACKS            Acknowledgement update for a queue locally
QMON_FLUSH_STATS             Replay info table update
QMON_PROCESS_IPC             IPC message send and receive for queue operations
QMON_RECOVER_SPILLED         Spilled messages recovery on startup
QMON_PROP_MSGDELETE          Acknowledged buffered messages deletion
QMON_JOBCACHE_REPARTITION    Queue table ownership change
QMON_PURGE_SPILLED           Purge spilled messages at startup
QMON_BUFFERED_TM_COORD       Buffered messages time manager activity check
QMON_BUFFERED_TM             Buffered messages time manager activity
QMON_QUEUE_SERVICE_START     Start queue services at startup
QMON_PURGE_REGISTRATION      Notification registration purge
QMON_RECOVER_EMON            EMON recovery at startup
QMON_ORPHANED_MSGDELETE      Orphaned messages deletion
QMON_SEND_ALTEROWNER         Non-owner persistent time manager activity send to owner
QMON_NONDURSUB_SESS_DEL      Session end nondurable subscriber delete
QMON_NONDURSUB_INST_DEL      Instance end Nondurable subscriber delete
QMON_DELETE_DEADREG          Notification delete registrations of dead locations


Note: Earlier versions may not have implemented all of the above.



The task list gives an impression of the operations the QMON processes are responsible for. A significant number of them are cleanout and housekeeping activities. Handling these in the background is more efficient for the foreground application AQ processes, which can concentrate on enqueue / dequeue operations. TM (Time Management: delay, retry delay, expiration, retention) related activity is also handled by QMON server processes. For example, when an application enqueues a message with a delay period, the message only becomes available for dequeue once the delay period has elapsed and QMON has changed the state of the message to READY.
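
The delay behaviour described above can be observed with a small test queue. The following is a minimal sketch, not taken from the original note: the queue table DEMO_QT and queue DEMO_Q are hypothetical names, a RAW payload is assumed for simplicity, and the connected user is assumed to have execute privileges on DBMS_AQADM and DBMS_AQ.

```sql
-- Hypothetical objects: DEMO_QT (queue table) and DEMO_Q (queue), RAW payload.
begin
  dbms_aqadm.create_queue_table(queue_table        => 'demo_qt',
                                queue_payload_type => 'RAW');
  dbms_aqadm.create_queue(queue_name  => 'demo_q',
                          queue_table => 'demo_qt');
  dbms_aqadm.start_queue(queue_name => 'demo_q');
end;
/

declare
  enq_opts  dbms_aq.enqueue_options_t;
  msg_props dbms_aq.message_properties_t;
  msg_id    raw(16);
begin
  -- Delay is expressed in seconds; the message is not dequeueable before then.
  msg_props.delay := 60;
  dbms_aq.enqueue(queue_name         => 'demo_q',
                  enqueue_options    => enq_opts,
                  message_properties => msg_props,
                  payload            => utl_raw.cast_to_raw('delayed message'),
                  msgid              => msg_id);
  commit;
end;
/

-- Check the message state through the AQ$<queue_table> view.
select msg_state, count(*)
  from aq$demo_qt
 group by msg_state;
```

If QMON is operating normally, the message would be expected to appear in state WAIT at first and to move to READY some time after the 60 seconds have elapsed. A message that stays in WAIT well beyond the delay is a hint that the time manager activity is not running.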

In 11.2.0.1, the view GV$QMON_TASKS shows the tasks which are running or have been scheduled by QMON.

Some tasks can only be run on a single instance for a queue, as might be the case with buffered messaging; others can be run (not at the same time) across multiple instances by different QMON processes. Some tasks are categorised as repeatable operations and are scheduled to run periodically; others are one-time operations with no schedule, as detailed in GV$QMON_TASKS.
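
As an illustration, the two task views can be queried as follows. This is a sketch for 11.2.0.1 or later; apart from task_name and remark, which are mentioned above, the column names are given from memory of the Oracle Reference and should be checked with DESCRIBE before relying on them.

```sql
-- Task catalogue with failure counts per instance.
select inst_id, task_name, total_task_runs, total_task_failures, remark
  from gv$qmon_task_stats
 order by inst_id, task_name;

-- Tasks currently running or scheduled, and the server assigned to each.
select inst_id, task_name, task_type, task_status, server_name
  from gv$qmon_tasks
 order by inst_id, task_name;
```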

QMON Server Processes


These are Processes or Servers at the OS level which are associated with task work activities scheduled by the coordinator.

In 11.2.0.1, the view GV$QMON_SERVER_STATS shows the server processes which are active.
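
A quick way to see the coordinator together with its servers is sketched below. It assumes 11.2.0.1 or later, and the column names are given from memory of the Oracle Reference, so they should be confirmed with DESCRIBE on the target release.

```sql
-- One row per instance: coordinator state and number of servers started.
select inst_id, qmnc_pid, status, num_servers
  from gv$qmon_coordinator_stats;

-- One row per QMON server: what each one is doing right now.
select inst_id, server_name, server_pid, status, task_name
  from gv$qmon_server_stats
 order by inst_id, server_name;
```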

Pre-10g QMON Architecture

The number of queue monitor processes was controlled via the dynamic initialisation parameter AQ_TM_PROCESSES. If this parameter is set to a non-zero value X, Oracle creates that number of QMNn processes, starting from ora_qmn0_<SID> (where <SID> is the identifier of the database) up to ora_qmn<X-1>_<SID>; if the parameter is not specified or is set to 0, then QMON processes are not created. There can be a maximum of 10 QMON processes running on a single instance. For example, the parameter can be set in the init.ora as follows:

aq_tm_processes=1

or set dynamically via

alter system set aq_tm_processes=1;

10g/11g QMON Architecture

Beginning with release 10.1, the architecture of the QMON processes was changed to an automatically controlled coordinator / slave architecture. The Queue Monitor Coordinator, ora_qmnc_<SID>, dynamically spawns slaves named ora_qXXX_<SID>, depending on the system load, up to a maximum of 10 per instance. These server processes are outlined above.

Significance of AQ_TM_PROCESSES Parameter


For version 10.1 onwards it is no longer necessary to set AQ_TM_PROCESSES when Oracle Streams AQ or Streams is used. However, if you do specify a value, then that value is taken into account but the number of processes can still be auto-tuned and so the number of running qXXX processes can be different from what was specified by AQ_TM_PROCESSES.

It should be noted that if AQ_TM_PROCESSES is explicitly specified then the process(es) started will only maintain persistent messages. For example, if aq_tm_processes=1 then at least one queue monitor slave process will be dedicated to maintaining persistent messages. Other processes can still be automatically started to maintain buffered messages. If you explicitly set aq_tm_processes=10 then there will be no processes available to maintain buffered messages. This should be borne in mind in environments which use Streams replication and, from 10.2 onwards, user enqueued buffered messages.

In addition, you should never disable the Queue Monitor processes by setting aq_tm_processes=0 on a permanent basis. As can be seen above, disabling them will stop all processing related to the tasks outlined. This will likely have a significant effect on the operation of queues: PROCESSED messages will not be removed, time related (TM) actions will not succeed, and AQ objects will grow in size.

To check whether auto-tuning is enabled or aq_tm_processes=0 do the following:

connect / as sysdba

set serveroutput on

declare
  mycheck number;
begin
  -- A row is returned only when the value is 0 and it was set explicitly
  -- (not the default) or modified since instance startup.
  select 1 into mycheck
    from v$parameter
   where name = 'aq_tm_processes'
     and value = '0'
     and (ismodified <> 'FALSE' or isdefault = 'FALSE');
  if mycheck = 1 then
    dbms_output.put_line('The parameter ''aq_tm_processes'' is explicitly set to 0!');
  end if;
exception
  when no_data_found then
    dbms_output.put_line('The parameter ''aq_tm_processes'' is not explicitly set to 0.');
end;
/


The parameter should not be set to 0 explicitly. If it is, then it is recommended to unset the parameter. However, this requires bouncing the database. In the meantime, if the database cannot be bounced immediately, the recommended value is 1, which can be set dynamically:

connect / as sysdba
alter system set aq_tm_processes = 1;

To unset the parameter:

When using a pfile:

Comment out or remove the aq_tm_processes entry, and restart the database.

When using a spfile:

connect / as sysdba
alter system reset aq_tm_processes scope=spfile sid='*';

and restart the database
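
To confirm the result after the restart, the spfile entry and the running value can be compared. This is a minimal sketch using the standard V$SPPARAMETER and V$PARAMETER views; it is an addition for illustration and not part of the original procedure.

```sql
-- ISSPECIFIED should be FALSE once the entry has been reset in the spfile.
select sid, name, value, isspecified
  from v$spparameter
 where name = 'aq_tm_processes';

-- ISDEFAULT should be TRUE once the instance is running without an explicit setting.
select name, value, isdefault, ismodified
  from v$parameter
 where name = 'aq_tm_processes';
```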

Common Observations / Issues linked to QMON


The following outlines a number of commonly observed issues which are attributable to certain aspects of QMON operation or which may have an effect on QMON. Some cases outline specific steps to resolve an issue, or detail steps to run to avoid problems connected with it.

Pertinent references are detailed with the intention of providing relevant context into what is being discussed.

 - PROCESSED Messages not being removed

If processed messages are not being cleaned out of queues once all subscribers have dequeued the message, this would suggest that QMON is not operating as expected. Determine whether the operation is occurring at all, or whether it is taking considerably longer than expected.

The consequence of this may be the growth of queue table related objects.

Useful related references are : Note:251737.1 PROCESSED Messages remain in Queue Table after a Successful Dequeue , Note:378247.1 PROCESSED Messages not removed from Queue Table in a RAC database after Reconfiguration, Note:752708.1 Intermittently PROCESSED Messages are not removed from Queue Tables by the QMON Processes.

 - TM Operations : Delay, Expiration, Retention not working as expected

Check whether any of these Time Manager features are in use. Deferred processing of messages in these cases may require more processing than would otherwise be necessary. Also check whether high CPU is being observed, which might suggest that something else is behind the problem. As a general rule, high CPU from a process is typically connected with high buffer gets, suggesting that a large object is being accessed, possibly with a Full Table Scan. In such a situation an AWR report and/or a 10046 level 12 trace (as detailed below) can identify the object. tkprof can then be used to summarise the execution plan as well as statistics such as buffers accessed. Using retention has the effect of keeping messages for longer than they would be kept otherwise, with the obvious knock-on effect that queue table related objects will be larger.

Useful related references are as follows : Note:341133.1 Messages not changed from Wait To Ready State in a RAC database , Note:343282.1 CPU Consumption Of Queue Monitor Processes Increases when using Retention, Note:464514.1 Messages Enqueued With a Delay Specified to an Advanced Queue in a RAC Database Are Not Dequeued Immediately After the Delay Expires, Note:732743.1 Qmon Processes Are Not Removing Processed Messages or changing the state of WAITING messages.

 - Delay / WAIT Period Incorrect after Daylight Saving Time change

Following a change in DST, TM based activities may not occur when expected. The enq_time may not be as expected and, given that the wait time or delay is calculated relative to the enq_time, this will have an effect on the operation. The related fix referenced in the notes below does correct QMON activity.

This is outlined in Note:429630.1 - A Dequeue Condition fails to work properly after a Daylight Savings Time Change and Note:429681.1 - Casting AQ$QUEUE_TABLE Enqueue and Dequeue Time Values To SESSIONTIMEZONE causes Reporting and Message Processing issues.

 - High CPU usage from QMON Coordinator process

Ensure that aq_tm_processes is not set to 10. Note:393781.1, Note:604246.1 and Note:738873.1 all refer to this same type of issue, which manifests itself as high CPU from the Coordinator.

 - Unexpected Growth in Queue Table Objects

First of all please refer to section : QMON Space Reclamation / Coalesce Queues.

QMON should perform periodic clean out of single consumer queue table indexes and coalesce multi consumer IOTs to ensure that space is reclaimed for AQ objects. If this does not work as expected, this can cause growth in these objects when there are actually few messages in the associated queues.

An initial analysis would be to consider enqueue / dequeue activity as well as how many references there are to messages in the queue before then determining the space used by the related objects :

- what is the throughput of messages in the queue - X messages per hour;
- are any of the TM features : delay, retry delay , expiration or retention being used;
- how many messages are currently in the queue (refer to queue table) :

select count(*), msg_state from aq$<queue_table> group by msg_state;
select count(*) from aq$_<queue_table>_i;
select count(*) from aq$_<queue_table>_l;  -- new in 11.2.0.1
select count(*) from aq$_<queue_table>_h;
select count(*) from aq$_<queue_table>_t;
select count(*) from aq$_<queue_table>_p;  -- optional / spill / Streams related
select count(*) from aq$_<queue_table>_d;  -- optional / spill / Streams related

- then, for each of the above and their associated IOTs, determine the related space usage :

select sum(bytes)/1024/1024 MB from user_segments where segment_name='<object_name>';
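
Because the segment behind an IOT is named after its SYS_IOT_TOP_<N> index rather than after the table, it can be convenient to map the AQ IOTs to their segments in one query. The following is a sketch only: DEMO_QT is a hypothetical queue table name, the query is assumed to run as the queue table owner, and ANSI join syntax is assumed to be available (9i onwards).

```sql
-- Size of the IOT segments belonging to the AQ$_DEMO_QT_* tables.
-- Assumes SQL*Plus SET ESCAPE is off, so the backslash reaches the LIKE clause.
select i.table_name,
       i.index_name                      as iot_segment,
       round(sum(s.bytes)/1024/1024, 1)  as mb
  from user_indexes i
  join user_segments s
    on s.segment_name = i.index_name
 where i.index_type = 'IOT - TOP'
   and i.table_name like 'AQ$\_DEMO\_QT\_%' escape '\'
 group by i.table_name, i.index_name
 order by mb desc;
```

Comparing these sizes with the row counts gathered above gives a first indication of whether an IOT is large because it holds many rows or because space has not been reclaimed.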

The above is for a multi consumer queue ; a single consumer queue is simpler to look at as there is only the queue table and related indexes.

Note: If Streams related objects are large, this might be a valid Application issue, suggesting for example that Streams has spilled due to memory pressure, possibly indicating some other problem.

If an IOT in particular is large, the following references may be useful : Note:394713.1 Index SYS_IOT_TOP_<N> on History IOT is very large / Qmn uses high CPU, Note:267137.1 QMON does not perform space management operations on the dequeue IOT in Locally Managed Tablespaces using ASSM or when using FREELIST GROUPs, Note:238272.1 Procedure to Manually Purge Messages from a Single-Consumer Queue when QMON fails to do it efficiently, Note:271855.1 Procedure to manually coalesce all the IOTs/indexes associated with Advanced Queueing tables to maintain Enqueue/Dequeue performance and reduce QMON CPU usage and Redo generation.

QMON Space Reclamation / Coalesce Queues

This is linked directly to the potential growth in AQ related objects described in the section Unexpected Growth in Queue Table Objects. As discussed in Note:271855.1 Procedure to manually coalesce all the IOTs/indexes associated with Advanced Queueing tables to maintain Enqueue/Dequeue performance and reduce QMON CPU usage and Redo generation, QMON does not service all related queue objects correctly until 11.2.

Please consult this note and implement the script in your environment since it is probable that queues will have been created in ASSM tablespaces. As well as the space usage implications of this issue, the effect of implementing this procedure will likely be to improve the performance and effectiveness of QMON.
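
For orientation only, a manual coalesce of the IOTs behind a multi consumer queue table might look like the sketch below. It relies on the standard ALTER TABLE ... COALESCE syntax for index-organized tables; DEMO_QT is a hypothetical queue table name, and the exact list of objects and the recommended scheduling are assumptions here, so the script in Note:271855.1 should be treated as the authoritative procedure.

```sql
-- Hypothetical queue table DEMO_QT, run as the queue table owner.
alter table aq$_demo_qt_i coalesce;  -- dequeue IOT
alter table aq$_demo_qt_h coalesce;  -- history IOT
alter table aq$_demo_qt_t coalesce;  -- time manager IOT
```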

Collecting Diagnostic Information for Troubleshooting QMON issues


If the issue cannot be easily understood and addressed using the section Common Observations / Issues linked to QMON, then troubleshooting is ideally progressed with a testcase.

In the absence of a testcase, the following are some useful diagnostic steps for troubleshooting QMON issues. Typically this will be a situation in which the QMON process(es) are consuming a large amount of CPU or processed messages are not being removed.

1. For CPU consumption issues, SQL trace the QMON process in question by doing the following:

Determine the pid of the Queue Monitor process (either qmnc or q00*), call it X
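
One way to find the OS pid from inside the database is sketched below. It assumes that the PROGRAM column of V$PROCESS carries the background process name in parentheses, for example "oracle@host (QMNC)", which is the usual format on Unix platforms; verify the format on your own platform before relying on the filter.

```sql
-- SPID is the operating system process id to pass to "oradebug setospid".
select spid, program
  from v$process
 where program like '%(QMNC)%'
    or program like '%(Q0%'
 order by program;
```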

sqlplus "/ as sysdba"
oradebug setospid X
oradebug unlimit
oradebug Event 10046 trace name context forever, level 12
--Generate trace for 20 minutes
oradebug Event 10046 trace name context off
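
To locate the trace file that was written, the following can be run in the same SQL*Plus session, after the setospid call above. This is a small addition for convenience, not part of the original steps.

```sql
-- Prints the full path of the trace file for the process attached with setospid.
oradebug tracefile_name
```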

Run tkprof on the raw SQL trace file as described in Note 232443.1. Provide both the raw trace file and the tkprof output to Oracle Support.


2. For issues where a queue table is not being serviced in some way then the following may be useful:

Determine the pid of the Queue Monitor processes (either qmnc or q00*), call them X, Y, etc.

sqlplus "/ as sysdba"
oradebug setospid X
oradebug unlimit
oradebug Event 10046 trace name context forever, level 12
oradebug Event 10850 trace name context forever, level 10
--10852 only applies to 10.1 onwards
oradebug Event 10852 trace name context forever, level 32
--Generate trace for 20 minutes
oradebug Event 10046 trace name context off
oradebug Event 10850 trace name context off
oradebug Event 10852 trace name context off

Repeat this tracing for all the running Queue Monitor Coordinator and Queue Monitor slave processes.

Run tkprof on the raw SQL trace file as described in Note 232443.1. Provide both the raw trace file and the tkprof output to Oracle Support.

3. For investigating issues with QMON processes in a RAC environment, the following trace events are also useful:

oradebug Event 10852 trace name context forever, level 128

this traces queue table ownership changes and

Event = '26700 trace name context forever, level 256'

which traces inter-instance IPC communication.

Note that event 26700 has a different meaning in 9.2 and should not be used.
 

References

NOTE:232443.1 - How to Identify Resource Intensive SQL for Tuning
NOTE:47318.1 - Init.ora Parameter "AQ_TM_PROCESSES" Reference Note
