CBO为什么不走索引?

原帖是Itpub上的网友提出一个CBO为什么不走索引的问题, 该问题的演示如下:

SQL> create table maclean1 as select * from dba_objects;

Table created.

SQL> update maclean1 set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;                                       

Commit complete.

SQL> create index ind_maclean1 on maclean1(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN1',cascade=>true);

PL/SQL procedure successfully completed.

SQL> explain plan for select * from maclean1 where status='INVALID';

Explained.

SQL> set linesize 140 pagesize 1400
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 987568083

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 11320 |  1028K|    85   (0)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MACLEAN1 | 11320 |  1028K|    85   (0)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='INVALID')

13 rows selected.

10053 trace 

Access path analysis for MACLEAN1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN1[MACLEAN1]
  Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.500000
  Table: MACLEAN1  Alias: MACLEAN1
    Card: Original: 22639.000000  Rounded: 11320  Computed: 11319.50  Non Adjusted: 11319.50
  Access Path: TableScan
    Cost:  85.33  Resp: 85.33  Degree: 0
      Cost_io: 85.00  Cost_cpu: 11935345
      Resp_io: 85.00  Resp_cpu: 11935345
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN1
    resc_io: 185.00  resc_cpu: 8449916
    ix_sel: 0.500000  ix_sel_with_filters: 0.500000
    Cost: 185.24  Resp: 185.24  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 85.33  Degree: 1  Resp: 85.33  Card: 11319.50  Bytes: 0

可以从以上10053中看到因为没有直方图存在,所以这里的Density = 0.5 是从 1/ NDV 算得的
也就意味着粗糙的统计信息显示STATUS='INVALID"的数据行占总行数的一半,
所以优化器选择做全表扫描是有道理的

以上符合”STATUS”=’INVALID’ condition的只有2行,且status列上建有索引,同时也使用了dbms_stats包收集表和索引上的统计信息,照理说CBO因该选择INDEX Range ind_maclean1,而避免全表扫描,但实际优化器opitimizer没有这样做。

实际上这个问题和统计信息收集时是否收集直方图有关系,只要收集了直方图,那么优化器就会了解到status=’INVALID’条件仅有少量的card满足,具有良好的选择性:

[oracle@vrh4 ~]$ sqlplus  / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 17 19:15:45 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> show parameter optimizer_fea

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.2.0.2

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.oracledatabase12g.com  & www.askmaclean.com

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL>  commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 2');

PL/SQL procedure successfully completed.

这里我们仅收集2个bucket的直方图, 就足以让优化器做出正确选择了。

得益于Quest公司的Guy Harrison所写的一个列出FREQUENCY直方图信息的脚本,以下为该脚本:

rem
rem Generate a histogram of data distribution in a column as recorded
rem  in dba_tab_histograms
rem
rem Guy Harrison Jan 2010 : www.guyharrison.net
rem
rem hexstr function is from From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707586567563 

set pagesize 10000
set lines 120
set verify off

col char_value format a10 heading "Endpoint|value"
col bucket_count format 99,999,999 heading "bucket|count"
col pct format 999.99 heading "Pct"
col pct_of_max format a62 heading "Pct of|Max value"
rem col endpoint_value format 9999999999999 heading "endpoint|value" 

CREATE OR REPLACE FUNCTION hexstr (p_number IN NUMBER)
    RETURN VARCHAR2
AS
    l_str      LONG := TO_CHAR (p_number, 'fm' || RPAD ('x', 50, 'x'));
    l_return   VARCHAR2 (4000);
BEGIN
    WHILE (l_str IS NOT NULL)
    LOOP
        l_return := l_return || CHR (TO_NUMBER (SUBSTR (l_str, 1, 2), 'xx'));
        l_str := SUBSTR (l_str, 3);
    END LOOP;

    RETURN (SUBSTR (l_return, 1, 6));
END;
/

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT nvl(endpoint_actual_value,endpoint_value) endpoint_value ,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data;

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT hexstr(endpoint_value) char_value,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data
ORDER BY endpoint_value;

使用该脚本,我们可以更直观的感受FREQUENCY直方图信息:

这里dbms_stats 包收集到的STATUS=’INVALID’ bucket count=9 percent = 0.04 ,可以和之后的10053 trace中的信息对比以下:

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SQL>  select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------
Plan hash value: 3087014066

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     9 |   837 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MACLEAN     |     9 |   837 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MACLEAN |     9 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='INVALID')

以上可以看到只要收集了直方图CBO就会认识到满足STATUS=’INVALID’的cardnality很少 , 该条件具有良好的选择性 ,使用index range scan而非Full table scan。

我们进一步来看看有直方图情况下的10053 trace:

SQL> alter system flush shared_pool;

System altered.

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug event 10053 trace name context forever ,level 1;
Statement processed.

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN[MACLEAN]
  Column (#10):
    NewDensity:0.000199, OldDensity:0.000022 BktCnt:22640, PopBktCnt:22640, PopValCnt:2, NDV:2

  这里的NewDensity= bucket_count / SUM(bucket_count) /2

   Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.000199
    Histogram: Freq  #Bkts: 2  UncompBkts: 22640  EndPtVals: 2
  Table: MACLEAN  Alias: MACLEAN
 Card: Original: 22640.000000 Rounded: 9 Computed: 9.00 Non Adjusted: 9.00
  Access Path: TableScan
    Cost:  85.30  Resp: 85.30  Degree: 0
      Cost_io: 85.00  Cost_cpu: 10804625
      Resp_io: 85.00  Resp_cpu: 10804625
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN
    resc_io: 2.00  resc_cpu: 20763
    ix_sel: 0.000398  ix_sel_with_filters: 0.000398
    Cost: 2.00  Resp: 2.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IND_MACLEAN
         Cost: 2.00  Degree: 1  Resp: 2.00  Card: 9.00  Bytes: 0

上例中我们手动指定收集2 bucket的直方图后CBO优化器才能做出正确的选择,那么岂不是要人工干预来收集列的统计信息,默认的dbms_stats.DEFAULT_METHOD_OPT方式不能为我们提供有效的直方图收集方式吗?

实际上dbms_stats的自动决定直方图的收集与否及收集的桶数受到col_usage$基本中列充当predicate的历史记录影像,关于col_usage$详见<了解你所不知道的SMON功能(四):维护col_usage$字典基表>

假设在统计表上信息的dbms_stats存储过程执行之前,col_usage$中已经存有表上相关列充当predicate的记录,那么dbms_stats存储过程就会考虑为该列收集直方图信息, 如:

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

使用dbms_stats默认method_opt收集maclean表

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
old  12:    WHERE owner = '&owner'
new  12:    WHERE owner = 'SYS'
Enter value for table: MACLEAN
old  13:      AND table_name = '&table'
new  13:      AND table_name = 'MACLEAN'
Enter value for column: STATUS
old  14:      AND column_name = '&column'
new  14:      AND column_name = 'STATUS'

no rows selected

因为缺少col_usage$列使用信息,所以依然没有收集status列的直方图

    declare
    begin
    for i in 1..500 loop
	execute immediate ' alter system flush shared_pool';
	DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
    execute immediate 'select count(*)  from maclean where status=''INVALID'' ' ;
    end loop;
    end;
    /

PL/SQL procedure successfully completed.

SQL> select obj# from obj$ where name='MACLEAN';

      OBJ#
----------
     97215
SQL> select * from  col_usage$ where  OBJ#=97215;

       OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
     97215          1              1              0                 0           0          0          0 17-OCT-11
     97215         10            499              0                 0           0          0          0 17-OCT-11

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
Enter value for table: MACLEAN
Enter value for column: STATUS

Endpoint        bucket         Pct of
value            count     Pct Max value
---------- ----------- ------- --------------------------------------------------------------
INVALI               2     .04
VALIC3           5,453   99.96  *************************************************
Comments:

可以加上china这个标签

Posted by guest on October 09, 2012 at 03:46 AM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

author's avatar The Maclean Liu
Advanced Customer Services
AskMaclean Logo 10g_ocm SHOUG

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today