11g Nested Loop Join的改变

11g中对Nested Loop Join有了一些改变,使用了新的实现方法来减少物理I/O的总体延迟:
官方在线文档在这里:
ttp://docs.oracle.com/cd/E11882_01/server.112/e16638/optimops.htm#PFGRF94642
对于一个示例SQL:
SELECT e.first_name, e.last_name, e.salary, d.department_name
    FROM hr.employees e, hr.departments d
    WHERE d.department_name IN ('Marketing', 'Sales')
      AND e.department_id = d.department_id;

11g以前(比如10g)的执行计划是这样:
-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |    19 |   722 |     3  (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES         |    10 |   220 |     1  (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |                   |    19 |   722 |     3  (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL        | DEPARTMENTS       |     2 |    32 |     2  (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN         | EMP_DEPARTMENT_IX |    10 |       |     0  (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("D"."DEPARTMENT_NAME"='Marketing' OR "D"."DEPARTMENT_NAME"='Sales')
   4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID" )

11g是这样:
------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost(%CPU)| Time      |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |    19 |   722 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |                   |       |       |            |          |
|   2 |   NESTED LOOPS               |                   |    19 |   722 |     3   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL         | DEPARTMENTS       |     2 |    32 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | EMP_DEPARTMENT_IX |    10 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES         |    10 |   220 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("D"."DEPARTMENT_NAME"='Marketing' OR "D"."DEPARTMENT_NAME"='Sales')
   4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID" )

并且提到了:Oracle Database 11g会将多个物理I/O请求合并,用一个总的向量I/O来处理,而不是一次处理一个。

那么,实际运行时Oracle究竟做了哪些改变呢?
我在AIX 6.1和linux上分别做了测试,来研究两个问题:
1)这个额外的NESTED LOOPS步骤,究竟有哪些改变?
2)什么是向量I/O?

10g在AIX上的10046 trace是这样:

10046 trace 10.2.0.5 on AIX 6.1:
---------------------------------------
WAIT #1: nam='db file scattered read' ela= 6907 file#=5 block#=20 blocks=5 obj#=-2147482976 tim=395875491122   <======obj#=-2147482976:DEPARTMENTS table Full scan  
...
WAIT #4: nam='db file sequential read' ela= 12942 file#=4 block#=84 blocks=1 obj#=-2147482977 tim=395875507033  <======obj#=-2147482977:EMP_DEPARTMENT_IX index branch block
WAIT #4: nam='db file sequential read' ela= 16 file#=4 block#=625 blocks=1 obj#=-2147482977 tim=395875507110    <======obj#=-2147482977:EMP_DEPARTMENT_IX index branch block
...
WAIT #4: nam='db file sequential read' ela= 14 file#=4 block#=5158 blocks=1 obj#=-2147482977 tim=395875507334   <======obj#=-2147482977:EMP_DEPARTMENT_IX index leaf block
WAIT #4: nam='db file sequential read' ela= 5071 file#=5 block#=12 blocks=1 obj#=-2147482978 tim=395875512477   <======obj#=-2147482978:EMPLOYEES table access by row id
WAIT #4: nam='db file sequential read' ela= 398 file#=5 block#=13 blocks=1 obj#=-2147482978 tim=395875512959    <======obj#=-2147482978:EMPLOYEES table access by row id
...

Nested Loop 11g之前的实现步骤是这样的:
a)全表扫描内部表DEPARTMENTS(可能使用部分过滤条件)
b)使用得到的数据关联外部表的索引,读取索引EMP_DEPARTMENT_IX枝节点(branch block)信息,
比如在这个例子里面,D"."DEPARTMENT_NAME"='Marketing' OR "D"."DEPARTMENT_NAME"='Sales',对应的DEPARTMENT_ID分别是20和80:
20    Marketing
80    Sales

在这一步,Oracle会获取哪些叶子节点(leaf block)才会包含有DEPARTMENT_ID=20或者80的ROWID,注意,它得到的只是叶子节点的block id,而没有真正获得叶子节点中存储的表上的ROWID。
也就是说,在这一步,它还没有读取叶子节点block,但是它已经拿到了所有叶子节点的block_id,作为中间结果(interim result),它实际已经获得了完整的结果集,
只不过这个结果集还在叶子节点存储的ROWID指向的表中的block上,也就是指向指针的指针:
叶子节点block_id => 叶子节点(ROWID) =>表block

c)根据叶子节点block_id读取叶子节点,以得到表ROWID,再读取表block.
这一步是读取一个叶子block,再读取多个表block,而不是一次取出全部叶子block,再一起读取表block。
因为不同叶子节点中存放的ROWID指向的表block可能是相同的,这样做可能提交了重复I/O请求。

11g的改变我猜测应该是把所有叶子节点读出来,再用一个大向量提交所有的表block I/O请求,也就是这个额外的NESTED LOOPS步骤。
但是实际测试的结果出人意料:

10046 trace 11.2.0.2 on AIX 6.1
---------------------------------------
WAIT #4574448288: nam='db file scattered read' ela= 3216 file#=5 block#=139 blocks=5 obj#=76651 tim=314602694473   <======obj#=76651:DEPARTMENTS Full scan
...
WAIT #4573144808: nam='db file sequential read' ela= 10457 file#=4 block#=171 blocks=1 obj#=76650 tim=314602714531 <======obj#=76650:EMP_DEPARTMENT_IX index branch block
WAIT #4573144808: nam='db file sequential read' ela= 9882 file#=4 block#=760 blocks=1 obj#=76650 tim=314602724535
...

WAIT #4573144808: nam='db file sequential read' ela= 4484 file#=4 block#=5255 blocks=1 obj#=76650 tim=314602744557 <======obj#=76650:EMP_DEPARTMENT_IX index leaf block
WAIT #4573144808: nam='db file sequential read' ela= 5378 file#=5 block#=131 blocks=1 obj#=76649 tim=314602750018  <======obj#=76649:EMPLOYEES table access by row id
...
WAIT #4573144808: nam='db file sequential read' ela= 9959 file#=4 block#=271 blocks=1 obj#=76650 tim=314602970033  <======obj#=76650:EMP_DEPARTMENT_IX index leaf block
WAIT #4573144808: nam='db file sequential read' ela= 15470 file#=5 block#=194 blocks=1 obj#=76649 tim=314602985572 <======obj#=76649:EMPLOYEES table access by row id
WAIT #4573144808: nam='db file sequential read' ela= 3234 file#=5 block#=195 blocks=1 obj#=76649 tim=314602988952  <======obj#=76649:EMPLOYEES table access by row id

从10046 trace看不出任何区别,取了truss才看出来差别在操作系统IO调用函数上:

10.2.0.5
------------
3014728:    26083579: kpread(12, 0x07000000128BA000, 8192, 0x000000000284C000, 0x0FFFFFFFFFFE7630) = 8192 <======12 users01.dbf INDEX EMP_DEPARTMENT_IX
3014728:    26083579: kpread(10, 0x0700000016622000, 8192, 0x0000000000018000, 0x0FFFFFFFFFFE7D30) = 8192 <======10 example01.dbf TABLE EMPLOYEES
3014728:    26083579: kpread(10, 0x0700000011AC8000, 8192, 0x000000000001A000, 0x0FFFFFFFFFFE7D30) = 8192 <======10 example01.dbf TABLE EMPLOYEES
3014728:    26083579: kpread(10, 0x0700000017578000, 8192, 0x000000000001E000, 0x0FFFFFFFFFFE7D30) = 8192 <======10 example01.dbf TABLE EMPLOYEES

<=========10.2.0.5上中规中矩,每个'db file sequential read'都由单个kpread提交。

11.2.0.2
-------------
7995454:    23658739: kpread(10, 0x070000000B77E000, 8192, 0x000000000021E000, 0x0700000018836A00) = 8192
<======10 users01.dbf INDEX EMP_DEPARTMENT_IX, 1 block
7995454:    23658739: kpread(10, 0x070000000CA16000, 8192, 0x00000000003EC000, 0x0700000018836A00) = 8192
<======10 users01.dbf INDEX EMP_DEPARTMENT_IX, 1 block
7995454:    23658739: listio64(0x0000000010000004, 0x000000000FFFFFFF, 0x00000000FFFC8890, 0x00000000000000FF, 0x0000000000000000, 0x0000000000000000, 0x0FFFFFFFFFF5F7B0, 0x00000000000000FE) = 0x0000000000000000
7995454:    23658739: aio_nwait64(0x0000000000001000, 0x00000000000000FF, 0x0FFFFFFFFFFD8890, 0x900000000000D032, 0x0000000000000000, 0x00000000000001A8, 0x0000000000000000, 0x0000000000000000) = 0x00000000000000FF
...
7995454:    23658739: kpread(10, 0x070000000FC7A000, 8192, 0x00000000004BA000, 0x0700000018836A00) = 8192
7995454:    23658739: listio64(0x0000000010000004, 0x000000000FFFFFFF, 0x00000000FFFC8890, 0x00000000000000FF, 0x0000000000000000, 0x0000000000000000, 0x0FFFFFFFFFF5F7B0, 0x00000000000000FE) = 0x0000000000000000
7995454:    23658739: aio_nwait64(0x0000000000001000, 0x00000000000000FF, 0x0FFFFFFFFFFD8890, 0x900000000000D032, 0x0000000000000000, 0x0000000000000164, 0x0000000000000000, 0x0000000000000000) = 0x00000000000000FF
...
<=========11.2.0.2上Index节点读取由单个kpread提交,表的数据读取由listio64提交,并用aio_nwait64等待返回结果。
但是仍然不是读取完全部ROWID之后统一提交,而是一个索引叶子block对应一次listio64.
listio64是API接口,操作系统上man lio_listio可以查到类似的解释。

man lio_listio

lio_listio or lio_listio64 Subroutine

       The lio_listio or lio_listio64 subroutine includes information for the
       POSIX AIO lio_listio subroutine (as defined in the IEEE std 1003.1-
       2001), and the Legacy AIO lio_listio subroutine.

       POSIX AIO lio_listio Subroutine

Purpose

       Initiates a list of asynchronous I/O requests with a single call.

Description

       The lio_listio subroutine initiates a list of I/O requests with a
       single function call.

Parameters
       mode
            Determines whether the subroutine returns when the I/O operations
            are completed, or as soon as the operations are queued.
       list
            An array of pointers to aio control structures defined in the
            aio.h file.
       nent
            Specifies the length of the array.
       sig
            Determines when asynchronous notification occurs.

以上正好解释了第2个问题,什么是向量I/O,向量I/O是将多个I/O请求存储在一个结构(structure)中,并用一个请求提交。
它依赖于不同的操作系统接口函数来实现。

综上所述,Oracle 11g关于Nested Loop Join的改变,并不在Join顺序或者中间结果集的存放方法上,而仅仅是在操作系统函数调用上封装了一层,把以前依次提交的多个I/O请求封装到了一个结构体中,并一次提交这些请求。
比较与线性I/O的实现,向量I/O的实现除了可以减少系统调用的次数,还可以经内部的优化提供性能的改善。

评论:

发表一条评论:
  • HTML语法: 禁用
About

本博客由Oracle全球技术支持中国区的工程师维护。为中文用户提供数据库相关的技术支持信息,包括常用的诊断工具、诊断方法、产品新特性、案例分析等。此外,MOS也陆续推出各类中文内容:技术通讯统一发布在Note 1529795.1 中,中文文档列表更新在Note 1533057.1 中,网上讲座请查看MOS文档 1456176.1,在"Archived"中可以下载历史的录音和文档。

Search

Archives
« 七月 2014
星期日星期一星期二星期三星期四星期五星期六
  
1
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
今天