
Insights into Statistics, Query Optimization and the Oracle Optimizer

Incremental Statistics Maintenance – what statistics will be gathered after DML occurs on the table?

By: Nigel Bayliss | Product Manager

Hi visitor. This blog post was originally written in 2012. It still serves as a good introduction to incremental statistics. Once you've read it, there's now more on this topic here (part one of a three-part series).

Incremental statistics maintenance was introduced in Oracle Database 11g to improve the performance of gathering statistics on large partitioned tables. When incremental statistics maintenance is enabled for a partitioned table, Oracle accurately generates global-level statistics by aggregating partition-level statistics. As more people adopt this functionality, we get more questions about how incremental statistics are expected to behave in a given scenario. For example, last week we got a question about which partitions should have statistics gathered on them after DML has occurred on the table.

The person who asked the question assumed that statistics would only be gathered on partitions that had stale statistics (that is, where 10% or more of the rows in the partition had changed). However, what they actually saw when they ran DBMS_STATS.GATHER_TABLE_STATS was that all of the partitions affected by the DML had statistics re-gathered on them. This is the expected behavior: incremental statistics maintenance is supposed to yield the same statistics as gathering table statistics from scratch, just faster. This means incremental statistics maintenance needs to gather statistics on any partition that will change the global, or table-level, statistics. For instance, the min or max value for a column could change after just one row is inserted or updated in the table. It might be easier to demonstrate this with an example.

Let’s take the ORDERS2 table, which is partitioned by month on ORDER_DATE.  We will begin by enabling incremental statistics for the table and gathering statistics on the table.
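The original post illustrated this step with screenshots that are not reproduced here. A minimal sketch of the commands involved might look like the following; the schema name SH is an assumption for illustration, not taken from the original script:

```sql
-- Enable incremental statistics for ORDERS2, then gather table statistics.
-- The schema name 'SH' is an assumption, not from the original script.
BEGIN
  dbms_stats.set_table_prefs(ownname => 'SH',
                             tabname => 'ORDERS2',
                             pname   => 'INCREMENTAL',
                             pvalue  => 'TRUE');
  dbms_stats.gather_table_stats(ownname => 'SH',
                                tabname => 'ORDERS2');
END;
/
```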


After the statistics gather, the last_analyzed date for the table and all of its partitions shows 13-Mar-12.
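The screenshot showing those dates is not reproduced here, but a standard dictionary query of roughly this shape would display them (a sketch; the table-name filter is from the post, everything else is stock data dictionary usage):

```sql
-- When was the table itself last analyzed?
SELECT table_name, last_analyzed
FROM   dba_tables
WHERE  table_name = 'ORDERS2';

-- And each of its partitions?
SELECT partition_name, last_analyzed
FROM   dba_tab_partitions
WHERE  table_name = 'ORDERS2'
ORDER  BY partition_position;
```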

And we now have the following column statistics for the ORDERS2 table.
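Those column statistics (originally shown as a screenshot) can be listed with a dictionary query along these lines:

```sql
-- Number of distinct values and raw min/max per column for ORDERS2.
SELECT column_name, num_distinct, low_value, high_value
FROM   dba_tab_col_statistics
WHERE  table_name = 'ORDERS2'
ORDER  BY column_name;
```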

We can also confirm that we really did use incremental statistics by querying the dictionary table SYS.HIST_HEAD$, which should have an entry for each column in the ORDERS2 table.
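A query against that internal table appears later in the comment thread; adapted for ORDERS2 it looks like this. Note that HIST_HEAD$ is an undocumented SYS-owned table, so this is for curiosity only and requires access to the SYS schema:

```sql
-- Decode the incremental flag for each column of ORDERS2 (undocumented internals).
SELECT o.name table_name,
       c.name column_name,
       DECODE (BITAND (h.spare2, 8), 8, 'YES', 'NO') incremental
FROM   sys.hist_head$ h, sys.obj$ o, sys.col$ c
WHERE  h.obj# = o.obj#
AND    o.obj# = c.obj#
AND    h.intcol# = c.intcol#
AND    o.name = 'ORDERS2'
AND    o.subname IS NULL;
```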

So, now that we have established a good baseline, let's move on to the DML. Information is loaded into the latest partition of the ORDERS2 table once a month. Existing orders may also be updated to reflect changes in their status. Let's assume that a large number of update transactions take place on the ORDERS2 table this month. After these transactions have occurred we need to re-gather statistics, since the partition ORDERS_MAR_2012 now has rows in it and the number of distinct values and the maximum value for the STATUS column have also changed.
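The DML itself is not shown in the post (it lives in the linked script), but it is of roughly this shape; every column name other than ORDER_DATE and STATUS is a hypothetical stand-in:

```sql
-- Monthly load into the newest partition (ORDERS_MAR_2012).
-- ORDER_ID and the status values are assumptions for illustration.
INSERT INTO orders2 (order_id, order_date, status)
VALUES (1000001, DATE '2012-03-05', 'NEW');

-- Status updates that touch an older partition (ORDERS_FEB_2012).
UPDATE orders2
SET    status = 'SHIPPED'
WHERE  order_date >= DATE '2012-02-01'
AND    status = 'PROCESSING';

COMMIT;
```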

Now if we look at the last_analyzed date for the table and the partitions, we will see that the global statistics and the statistics on the partitions where rows have changed due to the update (ORDERS_FEB_2012) and the data load (ORDERS_MAR_2012) have been updated.

The column statistics also reflect the changes, with the number of distinct values in the STATUS column increasing to reflect the update.

So, incremental statistics maintenance will gather statistics on any partition whose data has changed, whenever that change will impact the global-level statistics.

You can get a copy of the script I used to generate this post here.

As I mentioned at the beginning, there's more on this topic here (part one of a three-part series).



Comments ( 14 )
  • Doug Burns Sunday, March 18, 2012

    Sigh. So let me get this right ...

    It's a Sunday evening.

    It's the day after St. Patrick's Day.

    ... and you're blogging about Incremental Stats?

    blogging machine


  • guest Sunday, March 18, 2012

    ; )


  • David Mann Tuesday, March 20, 2012

    This is great info, I was about to set up a similar test for one of my customers to verify this behavior. I guess I put it off just the right amount of time :)


  • guest Sunday, April 1, 2012

    Is incremental stats a good idea in an OLTP database?


  • Maria Colgan Monday, April 2, 2012

    Incremental statistics is a good idea if you have a partitioned table where data is inserted into or changed in only a small number of partitions. This could be in an OLTP system, but we see it used most frequently in data warehouses and data stores.

    Incremental statistics won't be as beneficial if data is changing in all of the partitions between statistics gathers.


  • guest Friday, April 6, 2012

    I have found that incremental stats does not work when the statistics on one or more partitions are locked (and this caveat is not included in any of the documentation I've seen regarding incremental statistics).

    In the DSS database with which I work, we perform inserts and updates to our largest (partitioned) tables weekly. While the vast majority (over 99%) of the inserts occur in the most recent partition, there are a few transactions each week that result in inserts/updates in older partitions. These few updates in the older partitions would not have a measurable impact on the global statistics. (Our largest table contains roughly 870 million rows).

    Therefore, the ideal situation for us would be to:

    1). Gather statistics incrementally one time, creating synopses on all partitions

    2). Lock the statistics on older partitions

    3). Then use incremental statistics gathering on a weekly basis to update the stats on the non-locked partitions and the global stats.

    However, when testing this scheme, I found that after locking the statistics on several of the older partitions and applying a week's worth of updates, dbms_stats.gather_table_stats ignores the fact that the incremental preference for the table is set to TRUE, and gathers global stats via full table scan.

    Is this behavior expected? Is there a reason that the synopses for locked partitions can't also be locked but still used to gather global stats? Any chance Oracle will make this approach possible in the near future?

    I would be happy to provide more detail regarding the tests I performed if that would be helpful.


  • Maria Colgan Monday, April 9, 2012

    Incremental statistics does work with locked partition statistics as long as no DML occurs on the locked partitions. However, if DML does occur on the locked partitions, then we can no longer guarantee that the global statistics built from the locked statistics will be accurate, so we fall back to the old full-table-scan approach when gathering global statistics. Ideally, you should not lock the older partitions. If no DML occurs on them, we won't re-gather statistics on them. If DML does occur on a subset of partitions (the latest one and one or two older ones), then we will re-gather statistics on just that subset of partitions and generate global statistics from the partition-level statistics. Typically this will be much faster than the old approach and will guarantee accurate global statistics.


  • guest Wednesday, April 11, 2012

    Is incremental statistics a good idea for an interval partitioned table?

    I'm following your example with an interval partitioned table, but it does not work ..

    Example:

    I have an interval partitioned table (by day) that acts as a FIFO:

    1)

    Create table:

    create table FRI_INDEX.sales
    (
      sales_id number,
      sales_dt date
    )
    partition by range (sales_dt)
    interval (numtodsinterval(1,'day'))
    ( partition p0901 values less than (to_date('2012-01-01','yyyy-mm-dd')) );

    Table created.

    2)

    Enabling incremental statistics:

    BEGIN
      dbms_stats.set_table_prefs('FRI_INDEX','SALES','INCREMENTAL','TRUE');
    END;

    PL/SQL procedure successfully completed.

    3)

    SELECT DBMS_STATS.GET_PREFS ('INCREMENTAL','FRI_INDEX','SALES') FROM DUAL;

    DBMS_STATS.GET_PREFS('INCREMENTAL','FRI_INDEX','SALES')
    ---------------------------------------------------------
    TRUE

    4)

    GET STATS

    begin
      dbms_stats.gather_table_stats(
        ownname          => 'FRI_INDEX',
        tabname          => 'SALES',
        estimate_percent => 100,
        cascade          => DBMS_STATS.AUTO_CASCADE,
        degree           => null,
        no_invalidate    => DBMS_STATS.AUTO_INVALIDATE,
        granularity      => 'ALL',
        method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1');
    end;

    PL/SQL procedure successfully completed.

    select table_name, partition_name, global_stats, last_analyzed, num_rows
    from dba_tab_partitions
    where table_name = 'SALES'
    order by 1, 2, 4 desc nulls last;

    TABLE_NAME  PARTITION_NAME  GLOBAL_STATS  LAST_ANALYZED  NUM_ROWS
    ----------  --------------  ------------  -------------  --------
    SALES       P0901           YES           11-APR-12             0

    5)

    Insert some records:

    DECLARE
      i NUMBER := 1;
      j NUMBER := 0;
    BEGIN
      LOOP
        i := 0;
        LOOP
          INSERT INTO FRI_INDEX.sales
          VALUES (i, TO_DATE ('2012-01-01', 'yyyy-mm-dd') + j);
          i := i + 1;
          EXIT WHEN i > 100;
        END LOOP;
        j := j + 1;
        EXIT WHEN j > 5;
      END LOOP;
    END;

    COMMIT;

    SELECT COUNT (9) FROM FRI_INDEX.SALES;

      COUNT(9)
    ----------
           606

    6)

    Get stats:

    begin
      dbms_stats.gather_table_stats(
        ownname          => 'FRI_INDEX',
        tabname          => 'SALES',
        estimate_percent => 100,
        cascade          => DBMS_STATS.AUTO_CASCADE,
        degree           => null,
        no_invalidate    => DBMS_STATS.AUTO_INVALIDATE,
        granularity      => 'AUTO',
        method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1');
    end;

    PL/SQL procedure successfully completed.

    select table_name, partition_name, global_stats, last_analyzed, num_rows
    from dba_tab_partitions
    where table_name = 'SALES'
    order by 1, 2, 4 desc nulls last;

    TABLE_NAME  PARTITION_NAME  GLOBAL_STATS  LAST_ANALYZED           NUM_ROWS
    ----------  --------------  ------------  ----------------------  --------
    SALES       P0901           YES           11/04/2012 10:39:24 AM         0
    SALES       SYS_P299        YES           11/04/2012 10:39:24 AM       101
    SALES       SYS_P300        YES           11/04/2012 10:39:24 AM       101
    SALES       SYS_P301        YES           11/04/2012 10:39:24 AM       101
    SALES       SYS_P302        YES           11/04/2012 10:39:24 AM       101
    SALES       SYS_P303        YES           11/04/2012 10:39:24 AM       101
    SALES       SYS_P304        YES           11/04/2012 10:39:24 AM       101

    Confirm incremental stats:

    SELECT O.NAME, C.NAME, DECODE (BITAND(H.SPARE2,8), 8, 'YES', 'NO') INCREMENTAL
    FROM SYS.HIST_HEAD$ H, SYS.OBJ$ O, SYS.COL$ C
    WHERE H.OBJ# = O.OBJ#
    AND O.OBJ# = C.OBJ#
    AND H.INTCOL# = C.INTCOL#
    AND O.NAME = 'SALES'
    AND O.SUBNAME IS NULL;

    no rows selected.

    SELECT MIN (sales_dt) FROM FRI_INDEX.SALES;
    01/01/2012

    SELECT MAX (sales_dt) FROM FRI_INDEX.SALES;
    06/01/2012

    7)

    From here we will:
    - delete the oldest day
    - insert new records (max sales_dt + 1) -- in order to create new partitions
    - run stats

    DELETE FROM FRI_INDEX.SALES
    WHERE SALES_DT BETWEEN TO_DATE ('2012-01-01 00:00:00', 'yyyy-mm-dd HH24:MI:SS')
                       AND TO_DATE ('2012-01-01 23:59:59', 'yyyy-mm-dd HH24:MI:SS');

    DECLARE
      i NUMBER := 1;
    BEGIN
      LOOP
        INSERT INTO FRI_INDEX.sales
        VALUES (i, TO_DATE ('2012-01-07', 'yyyy-mm-dd'));
        i := i + 1;
        EXIT WHEN i > 100;
      END LOOP;
    END;

    COMMIT;

    101 rows deleted.
    PL/SQL procedure successfully completed.
    Commit complete.

    STATS

    begin
      dbms_stats.gather_table_stats(
        ownname          => 'FRI_INDEX',
        tabname          => 'SALES',
        estimate_percent => 100,
        cascade          => DBMS_STATS.AUTO_CASCADE,
        degree           => null,
        no_invalidate    => DBMS_STATS.AUTO_INVALIDATE,
        granularity      => 'AUTO',
        method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1');
    end;

    TABLE_NAME  PARTITION_NAME  GLOBAL_STATS  LAST_ANALYZED           NUM_ROWS
    ----------  --------------  ------------  ----------------------  --------
    SALES       P0901           YES           11/04/2012 11:02:27 AM         0
    SALES       SYS_P299        YES           11/04/2012 11:02:27 AM       101
    SALES       SYS_P300        YES           11/04/2012 11:02:27 AM       101
    SALES       SYS_P301        YES           11/04/2012 11:02:27 AM       101
    SALES       SYS_P302        YES           11/04/2012 11:02:27 AM       101
    SALES       SYS_P303        YES           11/04/2012 11:02:27 AM       101
    SALES       SYS_P304        YES           11/04/2012 11:02:27 AM         0
    SALES       SYS_P305        YES           11/04/2012 11:02:27 AM       100

    DELETE FROM FRI_INDEX.SALES
    WHERE SALES_DT BETWEEN TO_DATE ('2012-01-02 00:00:00', 'yyyy-mm-dd HH24:MI:SS')
                       AND TO_DATE ('2012-01-02 23:59:59', 'yyyy-mm-dd HH24:MI:SS');

    DECLARE
      i NUMBER := 1;
    BEGIN
      LOOP
        INSERT INTO FRI_INDEX.sales
        VALUES (i, TO_DATE ('2012-01-08', 'yyyy-mm-dd'));
        i := i + 1;
        EXIT WHEN i > 100;
      END LOOP;
    END;

    COMMIT;

    101 rows deleted.
    PL/SQL procedure successfully completed.
    Commit complete.

    TABLE_NAME  PARTITION_NAME  GLOBAL_STATS  LAST_ANALYZED           NUM_ROWS
    ----------  --------------  ------------  ----------------------  --------
    SALES       P0901           YES           11/04/2012 11:06:20 AM         0
    SALES       SYS_P299        YES           11/04/2012 11:06:20 AM         0
    SALES       SYS_P300        YES           11/04/2012 11:06:20 AM       101
    SALES       SYS_P301        YES           11/04/2012 11:06:20 AM       101
    SALES       SYS_P302        YES           11/04/2012 11:06:20 AM       101
    SALES       SYS_P303        YES           11/04/2012 11:06:20 AM       101
    SALES       SYS_P304        YES           11/04/2012 11:06:20 AM         0
    SALES       SYS_P305        YES           11/04/2012 11:06:20 AM       100
    SALES       SYS_P306        YES           11/04/2012 11:06:20 AM       100


  • Maria Colgan Wednesday, April 11, 2012

    Hi Yuri,

    The reason incremental statistics is not working in your example is that you are manually setting ESTIMATE_PERCENT to 100. In order for incremental statistics to work, ESTIMATE_PERCENT must be left at the default, AUTO_SAMPLE_SIZE.

    Thanks,

    Maria


  • Yuri Friday, May 4, 2012

    Hi, a complementary note from Metalink (My Oracle Support):

    When you call gather_table_stats, ensure you follow the documented restrictions for incremental statistics (listed below), and ensure not to set method_opt to FOR ALL INDEXED COLUMNS unless there are actually indexes on the table.

    ---> The user specifies AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT and AUTO for GRANULARITY when gathering statistics on the table.

    begin
      dbms_stats.gather_table_stats(
        ownname          => 'TEST',
        tabname          => 'SALES',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
        cascade          => DBMS_STATS.AUTO_CASCADE,
        degree           => null,
        no_invalidate    => DBMS_STATS.AUTO_INVALIDATE,
        granularity      => 'AUTO',
        method_opt       => 'FOR ALL COLUMNS SIZE AUTO');
    end;

    And note 989250.1:

    Symptoms: Global statistics are not being computed when a partition is modified and statistics are gathered using dbms_stats with the incremental option.

    Cause: The problem was that, for some unknown reason, none of the partitions had data for histograms. Tables tabpart$ and hist_head$ are joined during the process of gathering statistics when incremental mode is used. As sys.hist_head$ did not have any rows for the partitions, global statistics were not correctly gathered.

    Solution:
    - Delete the stats from both the table and partitions
    - Delete the stats preferences
    - Calculate the stats on the table and partitions with cascade on, sample 10 percent, for all indexed columns size auto
    - Put back the prefs and incremental
    - Gather statistics using incremental mode

    Thanks a lot


  • Jared Friday, November 9, 2012

    The docs state the granularity => 'AUTO' is required for INCREMENTAL to work.

    I've just finished some testing that shows that INCREMENTAL also works with granularity => 'ALL'

    This is on 11.2.0.2 on Linux


  • Maria Colgan Monday, November 12, 2012

    Incremental statistics will be used as long as the granularity parameter includes global statistics. So AUTO, ALL, and GLOBAL AND PARTITION should all work.

    Thanks,

    Maria


  • guest Monday, November 12, 2012

    Thanks for that clarification Maria.

    I should have asked here first before expending time to test if the docs were correct.


  • guest Sunday, August 18, 2013

    Hi

    It is a very nice explanation.

    I would like to know: in case we have INCREMENTAL set to TRUE for a table with interval partitioning and range subpartitioning, and we load more data into both so that new subpartitions are added, will gathering table statistics take care of all of the newly added partitions and subpartitions?

