Concurrent Statistics Gathering


Gathering optimizer statistics is one of life's necessary evils even if it can take an extremely long time to complete. In this blog post, we discuss one remedy to improve the efficiency of statistics gathering.

Introduction

Oracle Database 11g Release 2 (11.2.0.2) introduces a new statistics gathering mode, 'concurrent statistics gathering'. The goal of this new mode is to enable a user to gather statistics on multiple tables in a schema (or database), and multiple (sub)partitions within a table concurrently. Gathering statistics on multiple tables and (sub)partitions concurrently can reduce the overall time it takes to gather statistics by allowing Oracle to fully utilize a multi-processor environment.

Concurrent statistics gathering is controlled by a global preference, CONCURRENT, in the DBMS_STATS package. The CONCURRENT preference is of type boolean, so it can be set to TRUE or FALSE. By default it is set to FALSE.  When CONCURRENT is set to TRUE, Oracle employs Oracle Job Scheduler and Advanced Queuing components to create and manage multiple statistics gathering jobs concurrently.

If you call dbms_stats.gather_table_stats on a partitioned table when CONCURRENT is set to true, Oracle will create a separate statistics gathering job for each (sub)partition in the table. The Job Scheduler will decide how many of these jobs will execute concurrently, and how many will be queued based on available system resources. As the currently running jobs complete, more jobs will be dequeued and executed until all (sub)partitions have had their statistics gathered on them.

If you gather statistics using dbms_stats.gather_database_stats,  dbms_stats.gather_schema_stats, or dbms_stats.gather_dictionary_stats, then Oracle will create a separate statistics gathering job for each non-partitioned table, and each (sub)partition for the partitioned tables. Each partitioned table will also have a coordinator job that manages its (sub)partition jobs. The database will then run as many concurrent jobs as possible, and queue the remaining jobs until the executing jobs complete. However, multiple partitioned tables are not allowed to be processed simultaneously to prevent possible deadlock cases. Hence, if there are some jobs running for a partitioned table, other partitioned tables in a schema (or database or dictionary) will be queued until the current one completes. There is no such restriction for non-partitioned tables. The maximum number of concurrent statistics gathering jobs is bounded by the job_queue_processes initialization parameter (per node on a RAC environment) and the available system resources.

The following figure illustrates the creation of jobs at different levels, where Table 3 is a partitioned table, while other tables are non-partitioned tables. Job 3 acts as a coordinator job for Table 3, and creates a job for each partition in that table, as well as a separate job for the global statistics of Table 3.

Job Creation.small3.png
Figure 1: Job Scheduling during Optimizer Statistics Gathering in Oracle 11g R2

As another example, assume that the parameter job_queue_processes is set to 32, and you issued a dbms_stats.gather_schema_stats on the SH schema. Oracle would create a statistics gathering job (Level 1 in Figure 1) for each of the non-partitioned tables;

  • SUPPLEMENTARY_DEMOGRAPHICS
  • COUNTRIES
  • CUSTOMERS
  • PROMOTIONS
  • CHANNELS
  • PRODUCTS
  • TIMES
And, a coordinator job for each partitioned table, i.e., SALES and COSTS, in turn creates a statistics gathering job for each of partition in SALES and COSTS tables, respectively (Level 2 in Figure 1). Then, the Oracle Job Scheduler would allow 32 statistics gathering jobs to start, and would queue the rest (assuming that there are sufficient resources for 32 concurrent jobs). Suppose that 29 jobs (one for each partition + a coordinator job) for the SALES table get started, then three non-partitioned table statistics gathering jobs would also be started. The statistics gathering jobs for the COSTS table will be automatically queued, because only for one partitioned table can be processed at any one time. As each job finishes, another job will be dequeued and started, until all 63 jobs have been completed.

Configuration and Settings

In Oracle Database 11.2.0.2, the concurrency setting for statistics gathering is turned off by default. It can be turned on using the following command.

Begin

DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT','TRUE');

End;

/

You will also need some additional privileges above and beyond the regular privileges required to gather statistics. The user must have the following Job Scheduler and AQ privileges:

  • CREATE JOB
  • MANAGE SCHEDULER
  • MANAGE ANY QUEUE
The SYSAUX tablespace should be online, as the Job Scheduler stores its internal tables and views in SYSAUX tablespace.

Finally the job_queue_processes parameter should be set to at least 4. If you want to fully utilize all of the system resources during the statistics gathering process but you don't plan to use parallel execution you should set the job_queue_processes to 2* total number of CPU cores (this is a per node parameter in a RAC environment).Please make sure that you set this parameter system-wise (i.e., ALTER SYSTEM ... or in init.ora file) rather than at the session level (i.e., ALTER SESSION).

Using Concurrent Statistics Gathering with Parallel Execution


When using concurrent statistics gathering it is still possible to have each individual statistics gather job execute in parallel. This combination is normally used when the objects (tables or (sub)partitions) being analyzed are large. If you plan to execute the concurrent statistics gathering jobs in parallel you should disable the parallel adaptive multi-user initialization parameter. That is;

Alter system set parallel_adaptive_multi_user=false;

It is also recommended that you enable parallel statement queuing. This requires Resource Manager to be activated (if not already), and the creation of a temporary resource plan where the consumer group "OTHER_GROUPS" should have queuing enabled. By default, Resource Manager is activated only during the maintenance windows. The following script illustrates one way of creating a temporary resource plan (pqq_test), and enabling the Resource Manager with this plan.

-- connect as a user with dba privileges
begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_plan('pqq_test', 'pqq_test');
  dbms_resource_manager.create_plan_directive(
        'pqq_test',
        'OTHER_GROUPS',
        'OTHER_GROUPS directive for pqq',
        parallel_target_percentage => 90);
  dbms_resource_manager.submit_pending_area();
end;
/
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'pqq_test' SID='*';

Monitoring Concurrent Statistics Gathering Jobs

You can use the standard database scheduler views to monitor the concurrent statistics gathering jobs. The comments field of a job in dba_scheduler_jobs shows the target object for that statistics gathering job in the form of owner.table.(sub)partition. All statistics collection job names start with 'ST$' for easy identification. The jobs whose name start with ST$T###_### are created by a coordinator job for a partitioned table, and works on a partition or subpartition of the table.

The jobs whose name begin with ST$SD###_### are created for a table in a schema or database, and either works as a coordinator for its partition level jobs (if the table is partitioned), or directly performs the statistics collection for the table (if the table is not partitioned).

Finally, those with ST$D###_### in their naming are created for dictionary level tasks (when gather_dictionary_stats is used), and jobs does similar tasks as those with SD in their names.

Using the following query you can see all of the concurrent statistics gathering jobs that have been created:

select job_name, state, comments

from dba_scheduler_jobs

where job_class like 'CONC%'


To only see the currently running jobs, filter by the job state:

select job_name, state, comments

from dba_scheduler_jobs

where job_class like 'CONC%'

and state = 'RUNNING';


Similarly, to see the scheduled (i.e., waiting to run) jobs you just need to change the state:

select job_name, state, comments

from dba_scheduler_jobs

where job_class like 'CONC%'

and state = 'SCHEDULED';


It is also possible to see the elapse time for the currently running statistics gathering jobs:

select job_name, elapsed_time

from dba_scheduler_running_jobs

where job_name like 'ST$%';


Known Limitations

Concurrent statistics gathering does not use any automated way to determine the maximum number of jobs that can concurrently run based on the system load and resources. Concurrent statistics gathering solely relies on the Job Scheduler and the value of the job_queue_processes parameter. Session-wide parameter settings and events are not transferred to the newly created jobs (scheduler limitation). Indexes are not gathered concurrently.
Comments:

In an E-Business Suite scenario, Oracle recommend executing specific package - FND_STATS (via a wrapper Concurrent Program, Gather Schema Statistics) and not the database package - DBMS_STATS. Do we have such concurrency feature available on FND_STATS (via some patch) as well to optimize on execution timings of Gather Schema Statistics program ? Regards, Rakesh

Posted by Rakesh Tripathi on March 29, 2011 at 12:05 PM PDT #

Hi Rakesh, Unfortunately I am not currently aware of a patch to FND_STATS that allows you to take advantage of concurrency in E-Business Suite. But I am sure EBS guys will but add the capability soon.

Posted by Maria Colgan on March 31, 2011 at 07:49 AM PDT #

I continually visit your blog and retrieve everything you post here but I never commented but these days when I saw this post, i couldn’t stop myself from commenting here. great mate!

Posted by Hans Hufstetler on April 25, 2011 at 12:49 AM PDT #

Interesting post, THX

Posted by szkolenia bhp on April 27, 2011 at 07:07 AM PDT #

Under Known limitations for 11.2, its stated "Indexes are not gathered concurrently"
If we use the concurrent method, don't we run the risk of stats for data being updated before the index is...?
This matters to us, because, when you have a multi TB DB and are performing nighly loads getting these out of sync can effect execution plans and performance.
So is this one of the recommended testing scenerio's we need to take into account before jumping on board the concurrent gather bandwagon?
Thoughts?

Posted by guest on December 01, 2011 at 07:20 AM PST #

11g has incremental statistics for partition tables. Try using that.

Posted by Hemant Bhavsar on January 23, 2012 at 07:52 AM PST #

CONCURRENT stats gathering does not honor schema level preference for GRANULARITY set to GLOBAL.
In other words, if I set GRANULARITY to GLOBAL for a particular schema and set CONCURRENT to TRUE at the global level, stats are gathered at partition level anyhow. Ours is an OLTP system and we don't want to gather stats at the partition level since we don't employ partition pruning and use only global indexes.
Maybe CONCURRENT stats are not to be used in our case?

Posted by guest on April 11, 2012 at 02:02 AM PDT #

CONCURRENT also doesn't honor METHOD_OPT. I had set METHOD_OPT to FOR ALL COLUMNS SIZE 1. The concurrent stats job gathered histograms on all the tables. So I guess there are a few of these bugs that need to be resolved before we can use CONCURRENT. I am using 11.2.0.3.

Posted by guest on April 11, 2012 at 02:28 AM PDT #

Hi Prashant,

I couldn't reproduce the problem you described with CONCURRENT not honoring METHOD_OPT. I ran the following script as my test, can you tell me how your test differed?

Thanks,
Maria

SQL> conn sh/sh
Connected.
SQL> select dbms_stats.get_prefs('CONCURRENT') pref from dual;

PREF
------------------------------
TRUE

SQL> exec dbms_stats.set_schema_prefs('SH', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> col pref format a30
SQL> select dbms_stats.get_prefs('METHOD_OPT', 'SH', 'SALES') pref from dual;

PREF
------------------------------
FOR ALL COLUMNS SIZE 1

SQL> exec dbms_stats.delete_schema_stats(null);

PL/SQL procedure successfully completed.

SQL> col column_name format a25
SQL> select table_name, column_name, histogram from user_tab_col_statistics order by 1, 2;

no rows selected

SQL> exec dbms_stats.gather_schema_stats(null);

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, histogram from user_tab_col_statistics where table_name = 'SALES' order by 1, 2 ;

TABLE_NAME COLUMN_NAME HISTOGRAM
------------------------- ------------------------- ---------------
SALES AMOUNT_SOLD NONE
SALES CHANNEL_ID NONE
SALES CUST_ID NONE
SALES PROD_ID NONE
SALES PROMO_ID NONE
SALES QUANTITY_SOLD NONE
SALES TIME_ID NONE

7 rows selected.

Posted by Maria Colgan on April 18, 2012 at 09:59 AM PDT #

Hi Prashant,

I wanted to respond to your recent comments that "CONCURRENT stats gathering does not honor schema level preference for GRANULARITY set to GLOBAL.
In other words, if I set GRANULARITY to GLOBAL for a particular schema
and set CONCURRENT to TRUE at the global level, stats are gathered at
partition level anyhow. Ours is an OLTP system and we don't want to
gather stats at the partition level since we don't employ partition
pruning and use only global indexes. Maybe CONCURRENT stats are not to be used in our case?"

I tried to reproduce the behavior you reported using the SH sample schema. However, I was unable to reproduce the reported behavior.

Here is the test case I used for this one. Can you let me know what you tested that was different to this?

Thanks,
Maria

SQL> conn / as sysdba

Connected.

SQL> grant dba to sh;

Grant succeeded.

SQL> conn sh/sh

Connected.

SQL> exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');

PL/SQL procedure successfully completed.

SQL> col pref format a10

SQL> col table_name format a25

SQL> col partition_name format a20

SQL> col last_analyzed format a30

SQL> set lines 200 pages 10000

SQL> select dbms_stats.get_prefs('CONCURRENT') pref from dual;

PREF
----------
TRUE

SQL> exec dbms_stats.set_schema_prefs('SH', 'GRANULARITY', 'GLOBAL');

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_prefs('GRANULARITY', 'SH', 'SALES') pref from dual;

PREF
----------
GLOBAL

SQL> select dbms_stats.get_prefs('GRANULARITY', 'SH', 'COSTS') pref from dual;

PREF
----------
GLOBAL

SQL> exec dbms_stats.delete_schema_stats(null);

PL/SQL procedure successfully completed.

SQL> select table_name, partition_name, cast(last_analyzed as timestamp) last_analyzed from user_tab_statistics order by 3, 1, 2;

TABLE_NAME PARTITION_NAME LAST_ANALYZED

------------------------- -------------------- ------------------------------
CAL_MONTH_SALES_MV
CHANNELS
COSTS COSTS_Q1_1998
COSTS COSTS_Q1_1999
COSTS COSTS_Q1_2000
COSTS COSTS_Q2_1998
COSTS COSTS_Q2_1999
COSTS COSTS_Q2_2000
COSTS COSTS_Q3_1998
COSTS COSTS_Q3_1999
COSTS COSTS_Q3_2000
COSTS COSTS_Q4_1998
COSTS COSTS_Q4_1999
COSTS COSTS_Q4_2000
COSTS
COUNTRIES
CUSTOMERS
DIMENSION_EXCEPTIONS
FWEEK_PSCAT_SALES_MV
MV_CAPABILITIES_TABLE
PLAN_TABLE
PRODUCTS
PROMOTIONS
REWRITE_TABLE
SALES SALES_1995
SALES SALES_1996
SALES SALES_H1_1997
SALES SALES_H2_1997
SALES SALES_Q1_1998
SALES SALES_Q1_1999
SALES SALES_Q1_2000
SALES SALES_Q2_1998
SALES SALES_Q2_1999
SALES SALES_Q2_2000
SALES SALES_Q3_1998
SALES SALES_Q3_1999
SALES SALES_Q3_2000
SALES SALES_Q4_1998
SALES SALES_Q4_1999
SALES SALES_Q4_2000
SALES
SALES_TRANSACTIONS_EXT
TIMES

43 rows selected.

SQL> exec dbms_stats.gather_schema_stats(null);

PL/SQL procedure successfully completed.

SQL> select table_name, partition_name, cast(last_analyzed as timestamp) last_analyzed from user_tab_statistics order by 3, 1, 2;

TABLE_NAME PARTITION_NAME LAST_ANALYZED

------------------------- -------------------- ------------------------------
CAL_MONTH_SALES_MV 13-APR-12 09.50.14.000000 AM
CHANNELS 13-APR-12 09.50.14.000000 AM
COSTS 13-APR-12 09.50.14.000000 AM
COUNTRIES 13-APR-12 09.50.15.000000 AM
CUSTOMERS 13-APR-12 09.50.15.000000 AM
DIMENSION_EXCEPTIONS 13-APR-12 09.50.15.000000 AM
FWEEK_PSCAT_SALES_MV 13-APR-12 09.50.15.000000 AM
MV_CAPABILITIES_TABLE 13-APR-12 09.50.15.000000 AM
PLAN_TABLE 13-APR-12 09.50.15.000000 AM
PROMOTIONS 13-APR-12 09.50.15.000000 AM
REWRITE_TABLE 13-APR-12 09.50.15.000000 AM
PRODUCTS 13-APR-12 09.50.16.000000 AM
SALES 13-APR-12 09.50.16.000000 AM
TIMES 13-APR-12 09.50.16.000000 AM
COSTS COSTS_Q1_1998
COSTS COSTS_Q1_1999
COSTS COSTS_Q1_2000
COSTS COSTS_Q2_1998
COSTS COSTS_Q2_1999
COSTS COSTS_Q2_2000
COSTS COSTS_Q3_1998
COSTS COSTS_Q3_1999
COSTS COSTS_Q3_2000
COSTS COSTS_Q4_1998
COSTS COSTS_Q4_1999
COSTS COSTS_Q4_2000
SALES SALES_1995
SALES SALES_1996
SALES SALES_H1_1997
SALES SALES_H2_1997
SALES SALES_Q1_1998
SALES SALES_Q1_1999
SALES SALES_Q1_2000
SALES SALES_Q2_1998
SALES SALES_Q2_1999
SALES SALES_Q2_2000
SALES SALES_Q3_1998
SALES SALES_Q3_1999
SALES SALES_Q3_2000
SALES SALES_Q4_1998
SALES SALES_Q4_1999
SALES SALES_Q4_2000
SALES_TRANSACTIONS_EXT

43 rows selected.

Posted by Maria Colgan on April 18, 2012 at 10:14 AM PDT #

Is this feature documented anywhere in the 11gR2 documentation, or on metalink?

Posted by guest on November 29, 2012 at 11:01 AM PST #

Unfortunately concurrent statistics gathering is not documented in the current release. This functionality is a subset of a new feature from our upcoming release that was backported to 11.2.0.3 in order to help customers improve the performance of large statistics gathering jobs.

Posted by guest on November 29, 2012 at 09:01 PM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today