X

Celebrating the joy and power of Oracle SQL with the Oracle Developer Advocate team

Fetching a Row Plus N Either Side Performance Analysis

Chris Saxon
Developer Advocate
In a previous article, we looked methods for fetching a row from a table along with a number of rows either side of it. In the comments, Stew Ashton offered a solution using the match_recognize clause introduced in 12c. How do the numbers stack up on the performance of these different methods?

Let's find out!

To test this, I created a one million row table (with this script). For the tests I looked for one row along with the five either side of it (:lname := 'Aaaq', :lvl := 5). Below is each query along with its elapsed time and execution plans with selected statistics (actual rows, buffer gets and elapsed time).

First up the lag() and lead() approach:

with boundaries as (
  select prevr, nextr 
  from  (
    select person_id, 
           last_name, 
           lead(person_id,:lvl,0) over  (order by person_id) as nextr,
           lag(person_id,:lvl,0) over  (order by person_id) as prevr        
    from test_people
  )
  where last_name like :lname
)
select /*+ gather_plan_statistics */emp.person_id, emp.last_name 
from   test_people emp
join   boundaries b 
on     emp.person_id between b.prevr and b.nextr
order  by emp.person_id;

Elapsed: 00:00:12.72

OPERATION                                               A_ROWS     BGETS    ETIME
------------------------------------------------------- ---------- -------- -----------------
SELECT STATEMENT
  SORT_ORDER BY                                                 11   14,720        12,483,154
    MERGE JOIN                                                  11   14,720        12,303,574
      SORT_JOIN                                          1,000,000    7,360         1,331,089
        TABLE ACCESS_FULL TEST_PEOPLE                    1,000,000    7,360           882,359
      FILTER                                                    11    7,360         9,038,280
        SORT_JOIN                                          971,628    7,360         5,210,613
          VIEW                                                   1    7,360         1,852,479
            WINDOW_SORT                                  1,000,000    7,360         2,644,408
              TABLE ACCESS_FULL TEST_PEOPLE              1,000,000    7,360           829,814

Ouch. That's taking twelve seconds to execute and features two full table scans!

Clearly this isn't going to be a workable solution unless you're certain the table will only ever store a few hundred rows.

What about match_recognize?

select /*+ gather_plan_statistics */person_id, last_name
from   test_people
match_recognize (
  order by person_id
  measures match_number() match_number
  all rows per match
  pattern(a{0, 5} b a{0, 5})
  define b as last_name like :lname
); 

Elapsed: 00:00:04.78

OPERATION                                                   A_ROWS    BGETS             ETIME
------------------------------------------------------- ---------- -------- -----------------
SELECT STATEMENT
 VIEW                                                           11    7,360           334,040
  MATCH RECOGNIZE_SORT                                          11    7,360           334,028
   TABLE ACCESS_FULL TEST_PEOPLE                         1,000,000    7,360           939,029 

That's significantly quicker - just less than five seconds. This is still doing a full table scan, so as the data grows this is going to get slower. I think this is the most straightforward query to read (at least, once you've got to grips with match_recognize!) and may be good enough for ad-hoc analysis. If this is part of an application you'll want something better.

Next, let's take a look at the solution min/max solution:

with boundaries (prevr, nextr, lvl) as (
  select nvl((select max(e.person_id) 
              from   test_people e 
              where  e.person_id < emp.person_id), 
              emp.person_id) prevr,
         nvl((select min(e.person_id) 
              from   test_people e 
              where  e.person_id > emp.person_id),
              emp.person_id) nextr,
         1 lvl
  from   test_people emp
  where  last_name like :lname
  union all
  select nvl((select max(person_id) 
              from   test_people e 
              where  e.person_id < prevr),
              prevr
         ) prevr,
         nvl((select min(person_id) 
              from   test_people e 
              where  e.person_id > nextr),
              nextr
         ) nextr,
         lvl+1 lvl
  from   boundaries
  where  lvl+1 <= :lvl
)
  select /*+ gather_plan_statistics */e.person_id, e.last_name
  from   test_people e
  join   boundaries b
  on     e.person_id between b.prevr and b.nextr
  and    b.lvl = :lvl
  order  by e.person_id;   

Elapsed: 00:00:00.25

OPERATION                                                 A_ROWS    BGETS        ETIME
------------------------------------------------------- -------- -------- ------------
SELECT STATEMENT
 SORT_ORDER BY                                                11       34          550
  NESTED LOOPS                                                11       34          570
   NESTED LOOPS                                               11       33          478
    VIEW                                                       1       30          437
     UNION ALL (RECURSIVE WITH)_BREADTH FIRST                  5       30          518
      SORT_AGGREGATE                                           1        3           32
       FIRST ROW                                               1        3           17
        INDEX_RANGE SCAN (MIN/MAX) PERSON_PK                   1        3           12
      SORT_AGGREGATE                                           1        3           20
       FIRST ROW                                               1        3           10
        INDEX_RANGE SCAN (MIN/MAX) PERSON_PK                   1        3            6
      TABLE ACCESS_BY INDEX ROWID BATCHED TEST_PEOPLE          1        4           54
       INDEX_RANGE SCAN PEOP_NAME_I                            1        3           39
      SORT_AGGREGATE                                           4       10           70
       FIRST ROW                                               4       10           46
        INDEX_RANGE SCAN (MIN/MAX) PERSON_PK                   4       10           26
      SORT_AGGREGATE                                           4       10           62
       FIRST ROW                                               4       10           36
        INDEX_RANGE SCAN (MIN/MAX) PERSON_PK                   4       10           19
      RECURSIVE WITH PUMP                                      4        0           22
    INDEX_RANGE SCAN PERSON_PK                                11        3           26
   TABLE ACCESS_BY INDEX ROWID TEST_PEOPLE                    11        1           33

Nice! Execution time is well under one second - fast enough for most purposes.

Can we do any better?

My latest colleague came up with another approach. At a high level:

  • Find the row of interest
  • Walk along the (primary key) index downwards until our boundary
  • Walk along the (primary key) index upwards until our boundary
Like the min/max method, this uses subquery factoring:
with the_row_i_need as ( 
  select person_id from test_people where last_name like :lname
), lookdown as ( 
  select person_id
  from  ( 
    select person_id
    from   (
       select /*+ index_desc(e) */ e.person_id
       from  test_people e
       where e.person_id < ( select r.person_id from the_row_i_need r )
       order by e.person_id desc
    )
    where rownum <= :lvl
    order by person_id asc
  )
  where rownum = 1
), lookup as ( 
  select person_id
  from ( 
    select person_id
    from   ( 
      select /*+ index_asc(e) */ e.person_id
      from  test_people e
      where e.person_id > ( select r.person_id from the_row_i_need r )
      order by e.person_id asc
    )
    where rownum <= :lvl
    order by person_id desc
  )
  where rownum = 1
)
  select /*+ gather_plan_statistics */ e.*
  from   test_people e,
         lookup,
         lookdown
  where  e.person_id >= lookdown.person_id
  and    e.person_id <= lookup.person_id; 

Elapsed: 00:00:00.35

OPERATION                                                      A_ROWS    BGETS        ETIME
---------------------------------------------------------- ---------- -------- ------------
SELECT STATEMENT
 TEMP TABLE TRANSFORMATION                                         11       24       19,086
  LOAD AS SELECT                                                    0        4       17,533
   TABLE ACCESS_BY INDEX ROWID BATCHED TEST_PEOPLE                  1        4           80
    INDEX_RANGE SCAN PEOP_NAME_I                                    1        3           52
  NESTED LOOPS                                                     11       20        1,057
   NESTED LOOPS                                                    11       18        1,291
    NESTED LOOPS                                                    1       14        1,025
     VIEW                                                           1        8          789
      COUNT_STOPKEY                                                 1        8          776
       VIEW                                                         1        8          758
        SORT_ORDER BY STOPKEY                                       1        8          746
         COUNT_STOPKEY                                              5        8          735
          VIEW                                                      5        8          709
           INDEX_RANGE SCAN DESCENDING PERSON_PK                    5        8          682
            VIEW                                                    1        5          608
             TABLE ACCESS_FULL SYS_TEMP_0FD9D66DB_1374337           1        5          589
     VIEW                                                           1        6          212
      COUNT_STOPKEY                                                 1        6          198
       VIEW                                                         1        6          177
        SORT_ORDER BY STOPKEY                                       1        6          161
         COUNT_STOPKEY                                              5        6          162
          VIEW                                                      5        6          131
           INDEX_RANGE SCAN PERSON_PK                               5        6          103
            VIEW                                                    1        3           66
             TABLE ACCESS_FULL SYS_TEMP_0FD9D66DB_1374337           1        3           51
    INDEX_RANGE SCAN PERSON_PK                                     11        4          226
   TABLE ACCESS_BY INDEX ROWID TEST_PEOPLE                         11        2           90 

The execution time is marginally higher than the min/max approach. It uses fewer buffer gets however. It also only accesses the person_pk index twice instead of four times. If we ask for more rows each side, this solution should scale better.

Unfortunately, there's a catch with this approach.

So far I've assumed that names are unique. Clearly this isn't the case. What happens if the search matches multiple names?

ORA-01427: single-row subquery returns more than one row 

Uh-oh.

The look up and down approach is only guaranteed to work if searching on unique columns. Looks like the min/max approach is the most efficient general case solution.

What do you think? Which method do you like best? Can you find a faster or more elegant approach?

If so, please let us know in the comments!

Join the discussion

Comments ( 2 )
  • Stew Ashton Thursday, June 25, 2015

    Hi Chris,

    I tried min/max with :lname = 'Ey' (57 occurences in my table) and it ran quickly.

    I then tried with :lname = 'E%' and it's still running.

    The trouble with these optimizations written by us is that we cannot change our minds dynamically based on statistics, whereas the Optimizer can.

    We have to know there are few occurences of the "last name" we are searching on, otherwise the min/max solution will be slow to very slow.

    Try this:
    select a.person_id source_id, l.* from test_people a,
    lateral(
    select * from (
    select * from test_people
    where person_id >= a.person_id
    order by person_id
    )
    where rownum <= 6
    union all
    select * from (
    select * from test_people
    where person_id < a.person_id
    order by person_id desc
    )
    where rownum <= 5
    ) l
    where a.last_name = 'Ey'
    order by 1,2;

  • Chris Saxon Thursday, June 25, 2015

    Fair point Stew. Though I'd argue if your original search will return a large number of rows then the point of also returning N either side is probably flawed! A search for E% is almost certain to find names within 10 rows of each-other (there's over 10,000 in my resultset!)

    Nice work with lateral - looks like a new winner to me :)

Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.