Incremental Statistics Maintenance – what statistics will be gathered after DML occurs on the table?

Incremental statistics maintenance was introduced in Oracle Database 11g to improve the performance of gathering statistics on large partitioned tables. When incremental statistics maintenance is enabled for a partitioned table, Oracle accurately generates global level statistics by aggregating partition level statistics. As more people adopt this functionality, we have received more questions about how incremental statistics is expected to behave in a given scenario. For example, last week we got a question about which partitions should have statistics gathered on them after DML has occurred on the table.

The person who asked the question assumed that statistics would only be gathered on partitions that had stale statistics (10% of the rows in the partition had changed). However, what they actually saw when they ran DBMS_STATS.GATHER_TABLE_STATS was that all of the partitions affected by the DML had statistics re-gathered on them. This is the expected behavior: incremental statistics maintenance is supposed to yield the same statistics as gathering table statistics from scratch, just faster. This means incremental statistics maintenance needs to gather statistics on any partition that will change the global or table level statistics. For instance, the min or max value for a column could change after just one row is inserted or updated in the table. It might be easier to demonstrate this using an example.

Let’s take the ORDERS2 table, which is partitioned by month on order_date.  We will begin by enabling incremental statistics for the table and gathering statistics on the table.
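
A minimal sketch of those two steps, assuming ORDERS2 lives in the current schema and relying on the DBMS_STATS defaults (AUTO_SAMPLE_SIZE and AUTO granularity) that incremental statistics requires:

BEGIN
  -- Enable incremental statistics maintenance for ORDERS2
  dbms_stats.set_table_prefs(user, 'ORDERS2', 'INCREMENTAL', 'TRUE');
  -- Gather statistics using the default parameters
  dbms_stats.gather_table_stats(user, 'ORDERS2');
END;
/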

After the statistics gather, the last_analyzed date for the table and all of the partitions now shows 13-Mar-12.
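
The dates can be checked with queries along these lines (the comments below use the same kind of query against dba_tab_partitions):

select table_name, last_analyzed
from   user_tables
where  table_name = 'ORDERS2';

select partition_name, last_analyzed, num_rows
from   user_tab_partitions
where  table_name = 'ORDERS2'
order  by partition_position;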

And we now have the following column statistics for the ORDERS2 table.
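
These can be seen with a query like the following:

select column_name, num_distinct, low_value, high_value, histogram
from   user_tab_col_statistics
where  table_name = 'ORDERS2';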

We can also confirm that we really did use incremental statistics by querying the dictionary table sys.HIST_HEAD$, which should have an entry for each column in the ORDERS2 table.
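
Adapted for ORDERS2, the query (which also appears later in the comments) looks like this; note that it relies on an undocumented internal table, so the BITAND check on SPARE2 is specific to 11g:

SELECT o.name table_name, c.name column_name,
       DECODE(BITAND(h.spare2, 8), 8, 'YES', 'NO') incremental
FROM   sys.hist_head$ h, sys.obj$ o, sys.col$ c
WHERE  h.obj#    = o.obj#
AND    o.obj#    = c.obj#
AND    h.intcol# = c.intcol#
AND    o.name    = 'ORDERS2'
AND    o.subname IS NULL;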

So, now that we have established a good baseline, let's move on to the DML. Information is loaded into the latest partition of the ORDERS2 table once a month. Existing orders may also be updated to reflect changes in their status. Let's assume the following transactions take place on the ORDERS2 table this month.
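
For illustration only, the DML might look something like this (the column list, the ORDER_ID values, and the new STATUS value are assumptions made for this sketch):

-- Load this month's orders into the new partition ORDERS_MAR_2012
insert into orders2 (order_id, order_date, status)
values (1000001, to_date('05-MAR-2012', 'DD-MON-YYYY'), 'New');

-- Update the status of an existing order in ORDERS_FEB_2012
update orders2
set    status = 'Backordered'
where  order_id = 999942;

commit;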

After these transactions have occurred we need to re-gather statistics, since the partition ORDERS_MAR_2012 now has rows in it and the number of distinct values and the maximum value for the STATUS column have also changed.
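
This is just the default gather again, with incremental already enabled:

BEGIN
  dbms_stats.gather_table_stats(user, 'ORDERS2');
END;
/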

Now if we look at the last_analyzed date for the table and the partitions, we will see that the global statistics and the statistics on the partitions where rows have changed due to the update (ORDERS_FEB_2012) and the data load (ORDERS_MAR_2012) have been updated.

The column statistics also reflect the changes, with the number of distinct values in the STATUS column increasing to reflect the update.

So, incremental statistics maintenance will gather statistics on any partition whose data has changed, when that change will impact the global level statistics.

You can get a copy of the script I used to generate this post here.

+Maria Colgan

Comments:

Sigh. So let me get this right ...

It's a Sunday evening.
It's the day after St. Patrick's Day.

... and you're blogging about Incremental Stats?

blogging machine

Posted by Doug Burns on March 18, 2012 at 01:54 PM PDT #

; )

Posted by guest on March 18, 2012 at 03:08 PM PDT #

This is great info, I was about to set up a similar test for one of my customers to verify this behavior. I guess I put it off just the right amount of time :)

Posted by David Mann on March 19, 2012 at 08:42 PM PDT #

Is incremental stats a good idea in an OLTP database?

Posted by guest on April 01, 2012 at 01:04 AM PDT #

Incremental statistics is a good idea if you have a partitioned table where data is inserted into or changed in only a small number of partitions. This could be in an OLTP system, but we see it being used most frequently in data warehouses and data stores.
Incremental statistics won't be as beneficial if data is changing in all of the partitions between statistics gathers.

Posted by Maria Colgan on April 02, 2012 at 05:35 AM PDT #

I have found that incremental stats does not work when the statistics on one or more partitions are locked (and this caveat is not included in any of the documentation I've seen regarding incremental statistics).

In the DSS database with which I work, we perform inserts and updates to our largest (partitioned) tables weekly. While the vast majority (over 99%) of the inserts occur in the most recent partition, there are a few transactions each week that result in inserts/updates in older partitions. These few updates in the older partitions would not have a measurable impact on the global statistics. (Our largest table contains roughly 870 million rows).

Therefore, the ideal situation for us would be to:

1). Gather statistics incrementally one time, creating synopses on all partitions

2). Lock the statistics on older partitions

3). Then use incremental statistics gathering on a weekly basis to update the stats on the non-locked partitions and the global stats.

However, when testing this scheme, I found that after locking the statistics on several of the older partitions and applying a week's worth of updates, dbms_stats.gather_table_stats ignores the fact that the incremental preference for the table is set to TRUE, and gathers global stats via full table scan.

Is this behavior expected? Is there a reason that the synopses for locked partitions can't also be locked but still used to gather global stats? Any chance Oracle will make this approach possible in the near future?

I would be happy to provide more detail regarding the tests I performed if that would be helpful.

Posted by guest on April 06, 2012 at 06:11 AM PDT #

Incremental statistics does work with locked partition statistics as long as no DML occurs on the locked partitions. However, if DML does occur on the locked partitions, then we can no longer guarantee that the global statistics built from the locked statistics will be accurate, so we fall back to the old full table scan approach when gathering global statistics. Ideally you should not lock the older partitions. If no DML occurs on them, we won't re-gather statistics on them. If DML does occur on a subset of partitions (the latest one and one or two older ones), then we will re-gather statistics on just that subset of partitions and generate global statistics from the partition level statistics. Typically this will be much faster than using the old approach and will guarantee accurate global statistics.
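
A minimal sketch of that scenario, using the ORDERS2 names from the post (the partition name here is illustrative):

-- Lock statistics on an older partition
exec dbms_stats.lock_partition_stats(user, 'ORDERS2', 'ORDERS_JAN_2012')

-- As long as no DML touches ORDERS_JAN_2012, an incremental gather can still
-- aggregate global statistics from the existing synopses
exec dbms_stats.gather_table_stats(user, 'ORDERS2')

-- Once DML does hit the locked partition, the gather falls back to a full scan,
-- so it is usually better to leave the older partitions unlocked
exec dbms_stats.unlock_partition_stats(user, 'ORDERS2', 'ORDERS_JAN_2012')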

Posted by Maria Colgan on April 09, 2012 at 01:02 PM PDT #

Is incremental statistics a good idea for an interval partitioned table?

I'm following your example with an interval partitioned table, but it does not work...

Example:

I have an interval partitioned table (by day) that acts as a FIFO.

1) Create table

create table FRI_INDEX.sales
(
sales_id number,
sales_dt date
)
partition by range (sales_dt)
interval (numtodsinterval(1,'day'))
( partition p0901 values less than (to_date('2012-01-01','yyyy-mm-dd')) );

Table created.

2) Enabling incremental statistics

BEGIN
dbms_stats.set_table_prefs('FRI_INDEX','SALES','INCREMENTAL','TRUE');
END;
PL/SQL procedure successfully completed.

3)

SELECT DBMS_STATS.GET_PREFS ('INCREMENTAL','FRI_INDEX','SALES')
FROM DUAL
;

DBMS_STATS.GET_PREFS('INCREMENTAL','FRI_INDEX','SALES')
---------------------------------------------------------
TRUE

4) GET STATS

begin
dbms_stats.gather_table_stats(
ownname=> 'FRI_INDEX',
tabname=> 'SALES' ,
estimate_percent=> 100,
cascade=> DBMS_STATS.AUTO_CASCADE,
degree=> null,
no_invalidate=> DBMS_STATS.AUTO_INVALIDATE,
granularity=> 'ALL',
method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1');
end;

PL/SQL procedure successfully completed

select table_name, partition_name, global_stats, last_analyzed, num_rows
from dba_tab_partitions
where table_NAME='SALES'
order by 1, 2, 4 desc nulls last

TABLE_NAME PARTITION_NAME GLOBAL_STATS LAST_ANALYZED NUM_ROWS
------------ --------------
SALES P0901 YES 11-APR-12 0

5) Insert some records

DECLARE
i NUMBER := 1;
j NUMBER := 0;
BEGIN
LOOP
i:= 0;
LOOP
INSERT INTO FRI_INDEX.sales
VALUES (i, TO_DATE ('2012-01-01', 'yyyy-mm-dd') + J);
i := i + 1;
EXIT WHEN i > 100;
END LOOP;
J := J + 1;
EXIT WHEN J > 5;
END LOOP;
END;
COMMIT;

SELECT COUNT (9)
FROM FRI_INDEX.SALES;
COUNT(9)
----------
606
6) Get stats:

begin
dbms_stats.gather_table_stats(
ownname=> 'FRI_INDEX',
tabname=> 'SALES' ,
estimate_percent=> 100,
cascade=> DBMS_STATS.AUTO_CASCADE,
degree=> null,
no_invalidate=> DBMS_STATS.AUTO_INVALIDATE,
granularity=> 'AUTO',
method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1');

end;
PL/SQL procedure successfully completed.

select table_name, partition_name, global_stats, last_analyzed, num_rows
from dba_tab_partitions
where table_NAME='SALES'
order by 1, 2, 4 desc nulls last

TABLE_NAME PARTITION_NAME GLOBAL_STATS LAST_ANALYZED NUM_ROWS
SALES P0901 YES 11/04/2012 10:39:24 AM 0
SALES SYS_P299 YES 11/04/2012 10:39:24 AM 101
SALES SYS_P300 YES 11/04/2012 10:39:24 AM 101
SALES SYS_P301 YES 11/04/2012 10:39:24 AM 101
SALES SYS_P302 YES 11/04/2012 10:39:24 AM 101
SALES SYS_P303 YES 11/04/2012 10:39:24 AM 101
SALES SYS_P304 YES 11/04/2012 10:39:24 AM 101

Confirm incremental stats:

SELECT O.NAME,C.NAME,DECODE (BITAND(H.SPARE2,8),8, 'YES', 'NO') INCREMENTAL
FROM SYS.HIST_HEAD$ H, SYS.OBJ$ O, SYS.COL$ C
WHERE H.OBJ#=O.OBJ#
AND O.OBJ#=C.OBJ#
AND H.INTCOL# = C.INTCOL#
AND O.NAME='SALES'
AND O.SUBNAME IS NULL;

no rows selected.

SELECT MIN (sales_dt)
FROM FRI_INDEX.SALES;
01/01/2012

SELECT MAX (sales_dt)
FROM FRI_INDEX.SALES;

06/01/2012

7) From here we will:
- delete the oldest day
- insert new records (max sales_dt + 1) -- in order to create new partitions
- run stats

DELETE FROM FRI_INDEX.SALES
WHERE SALES_DT BETWEEN TO_DATE ('2012-01-01 00:00:00', 'yyyy-mm-dd HH24:MI:SS') AND TO_DATE ('2012-01-01 23:59:59', 'yyyy-mm-dd HH24:MI:SS');

DECLARE
i NUMBER := 1;
BEGIN
LOOP
INSERT INTO FRI_INDEX.sales
VALUES (i, TO_DATE ('2012-01-07', 'yyyy-mm-dd') );
i := i + 1;
EXIT WHEN i > 100;
END LOOP;
END;
COMMIT ;

101 rows deleted.
PL/SQL procedure successfully completed.
Commit complete.

STATS

begin
dbms_stats.gather_table_stats(
ownname=> 'FRI_INDEX',
tabname=> 'SALES' ,
estimate_percent=> 100,
cascade=> DBMS_STATS.AUTO_CASCADE,
degree=> null,
no_invalidate=> DBMS_STATS.AUTO_INVALIDATE,
granularity=> 'AUTO',
method_opt=> 'FOR ALL INDEXED COLUMNS SIZE 1');
end;

TABLE_NAME PARTITION_NAME GLOBAL_STATS LAST_ANALYZED NUM_ROWS

SALES P0901 YES 11/04/2012 11:02:27 AM 0
SALES SYS_P299 YES 11/04/2012 11:02:27 AM 101
SALES SYS_P300 YES 11/04/2012 11:02:27 AM 101
SALES SYS_P301 YES 11/04/2012 11:02:27 AM 101
SALES SYS_P302 YES 11/04/2012 11:02:27 AM 101
SALES SYS_P303 YES 11/04/2012 11:02:27 AM 101
SALES SYS_P304 YES 11/04/2012 11:02:27 AM 0
SALES SYS_P305 YES 11/04/2012 11:02:27 AM 100

DELETE FROM FRI_INDEX.SALES
WHERE SALES_DT BETWEEN TO_DATE ('2012-01-02 00:00:00', 'yyyy-mm-dd HH24:MI:SS') AND TO_DATE ('2012-01-02 23:59:59', 'yyyy-mm-dd HH24:MI:SS');

DECLARE
i NUMBER := 1;
BEGIN
LOOP
INSERT INTO FRI_INDEX.sales
VALUES (i, TO_DATE ('2012-01-08', 'yyyy-mm-dd') );
i := i + 1;
EXIT WHEN i > 100;
END LOOP;
END;
COMMIT ;

101 rows deleted.
PL/SQL procedure successfully completed.
Commit complete.

TABLE_NAME PARTITION_NAME GLOBAL_STATS LAST_ANALYZED NUM_ROWS

SALES P0901 YES 11/04/2012 11:06:20 AM 0
SALES SYS_P299 YES 11/04/2012 11:06:20 AM 0
SALES SYS_P300 YES 11/04/2012 11:06:20 AM 101
SALES SYS_P301 YES 11/04/2012 11:06:20 AM 101
SALES SYS_P302 YES 11/04/2012 11:06:20 AM 101
SALES SYS_P303 YES 11/04/2012 11:06:20 AM 101
SALES SYS_P304 YES 11/04/2012 11:06:20 AM 0
SALES SYS_P305 YES 11/04/2012 11:06:20 AM 100
SALES SYS_P306 YES 11/04/2012 11:06:20 AM 100

Posted by guest on April 11, 2012 at 03:14 AM PDT #

Hi Yuri,

The reason incremental statistics is not working in your example is that you are manually setting ESTIMATE_PERCENT to 100. In order for incremental statistics to work, ESTIMATE_PERCENT must be left at the default, AUTO_SAMPLE_SIZE.

Thanks,
Maria

Posted by Maria Colgan on April 11, 2012 at 09:50 AM PDT #

Hi, a complementary note from MetaLink:

When you call gather_table_stats, ensure you follow the documented restrictions for incremental (listed below) and ensure not to set method_opt to FOR INDEXED COLUMNS unless there are actually indexes on the table.

---> The user specifies AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT and AUTO for GRANULARITY when gathering statistics on the table.

begin
dbms_stats.gather_table_stats(
ownname=> 'TEST',
tabname=> 'SALES',
estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE,
cascade=> DBMS_STATS.AUTO_CASCADE,
degree=> null,
no_invalidate=> DBMS_STATS.AUTO_INVALIDATE,
granularity=> 'AUTO',
method_opt=> 'FOR ALL COLUMNS SIZE AUTO'
);
end;
/

and note: 989250.1

Symptoms
Global statistics are not being computed when a partition is modified and statistics are gathered using dbms_stats with incremental option

Cause
The problem was that, for some unknown reason, none of the partitions had data for histograms.

Tables tabpart$ and hist_head$ are joined during the process of gathering statistics when incremental mode is used. As sys.hist_head$ did not have any rows for the partitions, global statistics were not correctly gathered.

Solution
-Delete the stats from both the table and partition
-Delete the stats preferences
-Calculate the stats on the table and partitions with cascade on, sample 10 percent, for all indexed columns size auto
-Put back the prefs and incremental
-Gather statistic using incremental mode

Thanks a lot

Posted by Yuri on May 04, 2012 at 10:12 AM PDT #

The docs state the granularity => 'AUTO' is required for INCREMENTAL to work.

I've just finished some testing that shows that INCREMENTAL also works with granularity => 'ALL'

This is on 11.2.0.2 on Linux

Posted by Jared on November 09, 2012 at 02:35 PM PST #

Incremental statistics will be used as long as the granularity parameter includes global statistics. So AUTO, ALL, and GLOBAL AND PARTITION should all work.

Thanks,
Maria

Posted by Maria Colgan on November 12, 2012 at 10:29 AM PST #

Thanks for that clarification Maria.
I should have asked here first before expending time to test if the docs were correct.

Posted by guest on November 12, 2012 at 01:49 PM PST #

Hi
It is a very nice explanation.

I would like to know: in case we have INCREMENTAL set to TRUE for a table with interval partitioning and subpartitioning by range, and we load more data that adds new subpartitions, will gathering table statistics take care of all of the newly added partitions and subpartitions?

Posted by guest on August 18, 2013 at 04:33 AM PDT #
