X

Insights into Statistics, Query Optimization and the Oracle Optimizer

Concurrent Statistics Gathering


Gathering optimizer statistics is one of life's necessary evils even if it can take an extremely long time to complete. In this blog post, we discuss one remedy to improve the efficiency of statistics gathering.

Introduction

Oracle Database 11g Release 2 (11.2.0.2) introduces a new statistics gathering mode, 'concurrent statistics gathering'. The goal of this new mode is to enable a user to gather statistics on multiple tables in a schema (or database), and multiple (sub)partitions within a table concurrently. Gathering statistics on multiple tables and (sub)partitions concurrently can reduce the overall time it takes to gather statistics by allowing Oracle to fully utilize a multi-processor environment.

Concurrent statistics gathering is controlled by a global preference, CONCURRENT, in the DBMS_STATS package. The CONCURRENT preference is of type boolean, so it can be set to TRUE or FALSE. By default it is set to FALSE.  When CONCURRENT is set to TRUE, Oracle employs Oracle Job Scheduler and Advanced Queuing components to create and manage multiple statistics gathering jobs concurrently.

If you call dbms_stats.gather_table_stats on a partitioned table when CONCURRENT is set to true, Oracle will create a separate statistics gathering job for each (sub)partition in the table. The Job Scheduler will decide how many of these jobs will execute concurrently, and how many will be queued based on available system resources. As the currently running jobs complete, more jobs will be dequeued and executed until all (sub)partitions have had their statistics gathered on them.

If you gather statistics using dbms_stats.gather_database_stats,  dbms_stats.gather_schema_stats, or dbms_stats.gather_dictionary_stats, then Oracle will create a separate statistics gathering job for each non-partitioned table, and each (sub)partition for the partitioned tables. Each partitioned table will also have a coordinator job that manages its (sub)partition jobs. The database will then run as many concurrent jobs as possible, and queue the remaining jobs until the executing jobs complete. However, multiple partitioned tables are not allowed to be processed simultaneously to prevent possible deadlock cases. Hence, if there are some jobs running for a partitioned table, other partitioned tables in a schema (or database or dictionary) will be queued until the current one completes. There is no such restriction for non-partitioned tables. The maximum number of concurrent statistics gathering jobs is bounded by the job_queue_processes initialization parameter (per node on a RAC environment) and the available system resources.

The following figure illustrates the creation of jobs at different levels, where Table 3 is a partitioned table, while other tables are non-partitioned tables. Job 3 acts as a coordinator job for Table 3, and creates a job for each partition in that table, as well as a separate job for the global statistics of Table 3.

Job Creation.small3.png
Figure
1: 
Job Scheduling during Optimizer
Statistics Gathering in Oracle 11g R2

As another example, assume that the parameter job_queue_processes is set to 32, and you issued a dbms_stats.gather_schema_stats on the SH schema. Oracle would create a statistics gathering job (Level 1 in Figure 1) for each of the non-partitioned tables;

  • SUPPLEMENTARY_DEMOGRAPHICS
  • COUNTRIES
  • CUSTOMERS
  • PROMOTIONS
  • CHANNELS
  • PRODUCTS
  • TIMES
And, a coordinator job for each partitioned table, i.e., SALES and COSTS, in turn creates a statistics gathering job for each of partition in SALES and COSTS tables, respectively (Level 2 in Figure 1). Then, the Oracle Job Scheduler would allow 32 statistics gathering jobs to start, and would queue the rest (assuming that there are sufficient resources for 32 concurrent jobs). Suppose that 29 jobs (one for each partition + a coordinator job) for the SALES table get started, then three non-partitioned table statistics gathering jobs would also be started. The statistics gathering jobs for the COSTS table will be automatically queued, because only for one partitioned table can be processed at any one time. As each job finishes, another job will be dequeued and started, until all 63 jobs have been completed.

Configuration and Settings


In Oracle Database 11.2.0.2, the concurrency setting for statistics gathering is turned off by default. It can be turned on using the following command.

Begin

DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT','TRUE');

End;

/


You will also need some additional privileges above and beyond the regular privileges required to gather statistics. The user must have the following Job Scheduler and AQ privileges:

  • CREATE JOB
  • MANAGE SCHEDULER
  • MANAGE ANY QUEUE

The SYSAUX tablespace should be online, as the Job Scheduler stores its internal tables and views in SYSAUX tablespace.

Finally the job_queue_processes parameter should be set to at least 4. If you want to fully utilize all of the system resources during the statistics gathering process but you don't plan to use parallel execution you should set the job_queue_processes to 2* total number of CPU cores (this is a per node parameter in a RAC environment).Please make sure that you set this parameter system-wise (i.e., ALTER SYSTEM ... or in init.ora file) rather than at the session level (i.e., ALTER SESSION).

Using Concurrent Statistics Gathering with Parallel Execution



When using concurrent statistics gathering it is still possible to have each individual statistics gather job execute in parallel. This combination is normally used when the objects (tables or (sub)partitions) being analyzed are large. If you plan to execute the concurrent statistics gathering jobs in parallel you should disable the parallel adaptive multi-user initialization parameter. That is;

Alter system set parallel_adaptive_multi_user=false;

It is also recommended that you enable parallel statement queuing. This requires Resource Manager to be activated (if not already), and the creation of a temporary resource plan where the consumer group "OTHER_GROUPS" should have queuing enabled. By default, Resource Manager is activated only during the maintenance windows. The following script illustrates one way of creating a temporary resource plan (pqq_test), and enabling the Resource Manager with this plan.

-- connect as a user with dba privileges

begin

  dbms_resource_manager.create_pending_area();

  dbms_resource_manager.create_plan('pqq_test', 'pqq_test');

  dbms_resource_manager.create_plan_directive(

        'pqq_test',

        'OTHER_GROUPS',

        'OTHER_GROUPS directive for pqq',

        parallel_target_percentage =>
90);

  dbms_resource_manager.submit_pending_area();

end;

/

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'pqq_test' SID='*';

Monitoring Concurrent Statistics Gathering Jobs

You can use the standard database scheduler views to monitor the concurrent statistics gathering jobs. The comments field of a job in dba_scheduler_jobs shows the target object for that statistics gathering job in the form of owner.table.(sub)partition. All statistics collection job names start with 'ST$' for easy identification. The jobs whose name start with ST$T###_### are created by a coordinator job for a partitioned table, and works on a partition or subpartition of the table.

The jobs whose name begin with ST$SD###_### are created for a table in a schema or database, and either works as a coordinator for its partition level jobs (if the table is partitioned), or directly performs the statistics collection for the table (if the table is not partitioned).

Finally, those with ST$D###_### in their naming are created for dictionary level tasks (when gather_dictionary_stats is used), and jobs does similar tasks as those with SD in their names.

Using the following query you can see all of the concurrent statistics gathering jobs that have been created:

select job_name, state, comments

from dba_scheduler_jobs

where job_class like 'CONC%'


To only see the currently running jobs, filter
by the job state:

select job_name, state, comments

from dba_scheduler_jobs

where job_class like 'CONC%'

and state = 'RUNNING';


Similarly, to see the scheduled (i.e., waiting to run) jobs you just need to change the state:

select job_name, state, comments

from dba_scheduler_jobs

where job_class like 'CONC%'

and state = 'SCHEDULED';


It is also possible to see the elapse time for the currently running statistics gathering jobs:

select job_name, elapsed_time

from dba_scheduler_running_jobs

where job_name like 'ST$%';


Known Limitations


Concurrent statistics gathering does not use any automated way to determine the maximum number of jobs that can concurrently run based on the system load and resources. Concurrent statistics gathering solely relies on the Job Scheduler and the value of the job_queue_processes parameter.
Session-wide parameter settings and events are not transferred to the newly created jobs (scheduler limitation).
Indexes are not gathered concurrently.

Join the discussion

Comments ( 12 )
  • Rakesh Tripathi Tuesday, March 29, 2011
    In an E-Business Suite scenario, Oracle recommend executing specific package - FND_STATS (via a wrapper Concurrent Program, Gather Schema Statistics) and not the database package - DBMS_STATS.
    Do we have such concurrency feature available on FND_STATS (via some patch) as well to optimize on execution timings of Gather Schema Statistics program ?
    Regards,
    Rakesh
  • Maria Colgan Thursday, March 31, 2011
    Hi Rakesh,
    Unfortunately I am not currently aware of a patch to FND_STATS that allows you to take advantage of concurrency in E-Business Suite. But I am sure EBS guys will but add the capability soon.
  • Hans Hufstetler Monday, April 25, 2011
    I continually visit your blog and retrieve everything you post here but I never commented but these days when I saw this post, i couldn’t stop myself from commenting here. great mate!
  • szkolenia bhp Wednesday, April 27, 2011
    Interesting post, THX
  • guest Thursday, December 1, 2011

    Under Known limitations for 11.2, its stated "Indexes are not gathered concurrently"

    If we use the concurrent method, don't we run the risk of stats for data being updated before the index is...?

    This matters to us, because, when you have a multi TB DB and are performing nighly loads getting these out of sync can effect execution plans and performance.

    So is this one of the recommended testing scenerio's we need to take into account before jumping on board the concurrent gather bandwagon?

    Thoughts?


  • Hemant Bhavsar Monday, January 23, 2012

    11g has incremental statistics for partition tables. Try using that.


  • guest Wednesday, April 11, 2012

    CONCURRENT stats gathering does not honor schema level preference for GRANULARITY set to GLOBAL.

    In other words, if I set GRANULARITY to GLOBAL for a particular schema and set CONCURRENT to TRUE at the global level, stats are gathered at partition level anyhow. Ours is an OLTP system and we don't want to gather stats at the partition level since we don't employ partition pruning and use only global indexes.

    Maybe CONCURRENT stats are not to be used in our case?


  • guest Wednesday, April 11, 2012

    CONCURRENT also doesn't honor METHOD_OPT. I had set METHOD_OPT to FOR ALL COLUMNS SIZE 1. The concurrent stats job gathered histograms on all the tables. So I guess there are a few of these bugs that need to be resolved before we can use CONCURRENT. I am using 11.2.0.3.


  • Maria Colgan Wednesday, April 18, 2012

    Hi Prashant,

    I couldn't reproduce the problem you described with CONCURRENT not honoring METHOD_OPT. I ran the following script as my test, can you tell me how your test differed?

    Thanks,

    Maria

    SQL> conn sh/sh

    Connected.

    SQL> select dbms_stats.get_prefs('CONCURRENT') pref from dual;

    PREF

    ------------------------------

    TRUE

    SQL> exec dbms_stats.set_schema_prefs('SH', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');

    PL/SQL procedure successfully completed.

    SQL> col pref format a30

    SQL> select dbms_stats.get_prefs('METHOD_OPT', 'SH', 'SALES') pref from dual;

    PREF

    ------------------------------

    FOR ALL COLUMNS SIZE 1

    SQL> exec dbms_stats.delete_schema_stats(null);

    PL/SQL procedure successfully completed.

    SQL> col column_name format a25

    SQL> select table_name, column_name, histogram from user_tab_col_statistics order by 1, 2;

    no rows selected

    SQL> exec dbms_stats.gather_schema_stats(null);

    PL/SQL procedure successfully completed.

    SQL> select table_name, column_name, histogram from user_tab_col_statistics where table_name = 'SALES' order by 1, 2 ;

    TABLE_NAME COLUMN_NAME HISTOGRAM

    ------------------------- ------------------------- ---------------

    SALES AMOUNT_SOLD NONE

    SALES CHANNEL_ID NONE

    SALES CUST_ID NONE

    SALES PROD_ID NONE

    SALES PROMO_ID NONE

    SALES QUANTITY_SOLD NONE

    SALES TIME_ID NONE

    7 rows selected.


  • Maria Colgan Wednesday, April 18, 2012

    Hi Prashant,

    I wanted to respond to your recent comments that "CONCURRENT stats gathering does not honor schema level preference for GRANULARITY set to GLOBAL.

    In other words, if I set GRANULARITY to GLOBAL for a particular schema

    and set CONCURRENT to TRUE at the global level, stats are gathered at

    partition level anyhow. Ours is an OLTP system and we don't want to

    gather stats at the partition level since we don't employ partition

    pruning and use only global indexes. Maybe CONCURRENT stats are not to be used in our case?"

    I tried to reproduce the behavior you reported using the SH sample schema. However, I was unable to reproduce the reported behavior.

    Here is the test case I used for this one. Can you let me know what you tested that was different to this?

    Thanks,

    Maria

    SQL> conn / as sysdba

    Connected.

    SQL> grant dba to sh;

    Grant succeeded.

    SQL> conn sh/sh

    Connected.

    SQL> exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');

    PL/SQL procedure successfully completed.

    SQL> col pref format a10

    SQL> col table_name format a25

    SQL> col partition_name format a20

    SQL> col last_analyzed format a30

    SQL> set lines 200 pages 10000

    SQL> select dbms_stats.get_prefs('CONCURRENT') pref from dual;

    PREF

    ----------

    TRUE

    SQL> exec dbms_stats.set_schema_prefs('SH', 'GRANULARITY', 'GLOBAL');

    PL/SQL procedure successfully completed.

    SQL> select dbms_stats.get_prefs('GRANULARITY', 'SH', 'SALES') pref from dual;

    PREF

    ----------

    GLOBAL

    SQL> select dbms_stats.get_prefs('GRANULARITY', 'SH', 'COSTS') pref from dual;

    PREF

    ----------

    GLOBAL

    SQL> exec dbms_stats.delete_schema_stats(null);

    PL/SQL procedure successfully completed.

    SQL> select table_name, partition_name, cast(last_analyzed as timestamp) last_analyzed from user_tab_statistics order by 3, 1, 2;

    TABLE_NAME PARTITION_NAME LAST_ANALYZED

    ------------------------- -------------------- ------------------------------

    CAL_MONTH_SALES_MV

    CHANNELS

    COSTS COSTS_Q1_1998

    COSTS COSTS_Q1_1999

    COSTS COSTS_Q1_2000

    COSTS COSTS_Q2_1998

    COSTS COSTS_Q2_1999

    COSTS COSTS_Q2_2000

    COSTS COSTS_Q3_1998

    COSTS COSTS_Q3_1999

    COSTS COSTS_Q3_2000

    COSTS COSTS_Q4_1998

    COSTS COSTS_Q4_1999

    COSTS COSTS_Q4_2000

    COSTS

    COUNTRIES

    CUSTOMERS

    DIMENSION_EXCEPTIONS

    FWEEK_PSCAT_SALES_MV

    MV_CAPABILITIES_TABLE

    PLAN_TABLE

    PRODUCTS

    PROMOTIONS

    REWRITE_TABLE

    SALES SALES_1995

    SALES SALES_1996

    SALES SALES_H1_1997

    SALES SALES_H2_1997

    SALES SALES_Q1_1998

    SALES SALES_Q1_1999

    SALES SALES_Q1_2000

    SALES SALES_Q2_1998

    SALES SALES_Q2_1999

    SALES SALES_Q2_2000

    SALES SALES_Q3_1998

    SALES SALES_Q3_1999

    SALES SALES_Q3_2000

    SALES SALES_Q4_1998

    SALES SALES_Q4_1999

    SALES SALES_Q4_2000

    SALES

    SALES_TRANSACTIONS_EXT

    TIMES

    43 rows selected.

    SQL> exec dbms_stats.gather_schema_stats(null);

    PL/SQL procedure successfully completed.

    SQL> select table_name, partition_name, cast(last_analyzed as timestamp) last_analyzed from user_tab_statistics order by 3, 1, 2;

    TABLE_NAME PARTITION_NAME LAST_ANALYZED

    ------------------------- -------------------- ------------------------------

    CAL_MONTH_SALES_MV 13-APR-12 09.50.14.000000 AM

    CHANNELS 13-APR-12 09.50.14.000000 AM

    COSTS 13-APR-12 09.50.14.000000 AM

    COUNTRIES 13-APR-12 09.50.15.000000 AM

    CUSTOMERS 13-APR-12 09.50.15.000000 AM

    DIMENSION_EXCEPTIONS 13-APR-12 09.50.15.000000 AM

    FWEEK_PSCAT_SALES_MV 13-APR-12 09.50.15.000000 AM

    MV_CAPABILITIES_TABLE 13-APR-12 09.50.15.000000 AM

    PLAN_TABLE 13-APR-12 09.50.15.000000 AM

    PROMOTIONS 13-APR-12 09.50.15.000000 AM

    REWRITE_TABLE 13-APR-12 09.50.15.000000 AM

    PRODUCTS 13-APR-12 09.50.16.000000 AM

    SALES 13-APR-12 09.50.16.000000 AM

    TIMES 13-APR-12 09.50.16.000000 AM

    COSTS COSTS_Q1_1998

    COSTS COSTS_Q1_1999

    COSTS COSTS_Q1_2000

    COSTS COSTS_Q2_1998

    COSTS COSTS_Q2_1999

    COSTS COSTS_Q2_2000

    COSTS COSTS_Q3_1998

    COSTS COSTS_Q3_1999

    COSTS COSTS_Q3_2000

    COSTS COSTS_Q4_1998

    COSTS COSTS_Q4_1999

    COSTS COSTS_Q4_2000

    SALES SALES_1995

    SALES SALES_1996

    SALES SALES_H1_1997

    SALES SALES_H2_1997

    SALES SALES_Q1_1998

    SALES SALES_Q1_1999

    SALES SALES_Q1_2000

    SALES SALES_Q2_1998

    SALES SALES_Q2_1999

    SALES SALES_Q2_2000

    SALES SALES_Q3_1998

    SALES SALES_Q3_1999

    SALES SALES_Q3_2000

    SALES SALES_Q4_1998

    SALES SALES_Q4_1999

    SALES SALES_Q4_2000

    SALES_TRANSACTIONS_EXT

    43 rows selected.


  • guest Thursday, November 29, 2012

    Is this feature documented anywhere in the 11gR2 documentation, or on metalink?


  • guest Friday, November 30, 2012

    Unfortunately concurrent statistics gathering is not documented in the current release. This functionality is a subset of a new feature from our upcoming release that was backported to 11.2.0.3 in order to help customers improve the performance of large statistics gathering jobs.


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services