X

Enabling Real-Time Analytics With Database In-Memory

  • August 27, 2020

Database In-Memory and the Autonomous Data Warehouse

Andy Rivenes
Product Manager

I was watching Maria Colgan's (@SQLMaria) excellent replay session "Achieving Hyperscale with Oracle Autonomous Database" from the Oracle Developer Live event and one of the examples she used looked very familiar. If you scan forward in the video to the 26:30 mark you will see Maria demonstrating joins and aggregations. Well surprise, surprise the execution plans shows a vector group by plan. But you might be thinking, how can that be? Vector group by is a Database In-Memory optimization and you would be right. But guess what? The Autonomous Data Warehouse (ADW) uses the Database In-Memory feature that enables the use of Exadata flash cache for Database In-Memory columnar formats. This feature was first introduced in 12.2 Exadata System Software.

Note: ADW does not currently support the In-Memory (IM) Column Store, which is different than the Database In-Memory columnar format in Exadata flash.

We have blogged about the use of Exadata flash cache for Database In-Memory columnar formats several times and you can also find information about CELLMEMORY on Roger Macnicol's SmartScan Deep Dive blog. Since ADW uses Database In-Memory columnar formats in flash cache it can also take advantage of Database In-Memory's other query optimizations like vector group by.

I thought it would be fun to explore the query that Maria ran on ADW in a little more detail since it is also applicable to other Exadata environments as well. The following is almost the same query, this is the version from our DBIM hands on lab examples, and results in the same execution plan:

select /*+ NO_RESULT_CACHE */
  d.d_year,
  c.c_nation,
  sum(lo_revenue - lo_supplycost) profit
from
  LINEORDER l,
  DATE_DIM d,
  PART p,
  SUPPLIER s,
  CUSTOMER C
where
  l.lo_orderdate = d.d_datekey
  and l.lo_partkey = p.p_partkey
  and l.lo_suppkey = s.s_suppkey
  and l.lo_custkey = c.c_custkey
  and s.s_region = 'AMERICA'
  and c.c_region = 'AMERICA'
group by
  d.d_year,
  c.c_nation
order by
  d.d_year,
  c.c_nation

Notice the NO_RESULT_CACHE hint. This just ensures that the query runs in the database so we can evaluate the session statistics properly.

Here is the execution plan from running the above query in my ADW environment:

Plan hash value: 2021245554

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |       |       |  3592 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9FD40F_2838F7D9 |       |       |            |          |
|   3 |    HASH GROUP BY                         |                             |     7 |    98 |     3  (34)| 00:00:01 |
|   4 |     KEY VECTOR CREATE BUFFERED           | :KV0000                     |     7 |    98 |     2   (0)| 00:00:01 |
|   5 |      TABLE ACCESS STORAGE FULL           | DATE_DIM                    |  2556 | 25560 |     2   (0)| 00:00:01 |
|   6 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9FD410_2838F7D9 |       |       |            |          |
|   7 |    HASH GROUP BY                         |                             |     1 |     9 |    52  (60)| 00:00:01 |
|   8 |     KEY VECTOR CREATE BUFFERED           | :KV0001                     |     1 |     9 |    27  (23)| 00:00:01 |
|   9 |      TABLE ACCESS STORAGE FULL           | PART                        |   800K|  3906K|    25  (16)| 00:00:01 |
|  10 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9FD411_2838F7D9 |       |       |            |          |
|  11 |    HASH GROUP BY                         |                             |     1 |    22 |     3  (34)| 00:00:01 |
|  12 |     KEY VECTOR CREATE BUFFERED           | :KV0002                     |     1 |    22 |     2   (0)| 00:00:01 |
|* 13 |      TABLE ACCESS STORAGE FULL           | SUPPLIER                    |  4113 | 74034 |     2   (0)| 00:00:01 |
|  14 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9FD412_2838F7D9 |       |       |            |          |
|  15 |    HASH GROUP BY                         |                             |     5 |   190 |    22  (37)| 00:00:01 |
|  16 |     KEY VECTOR CREATE BUFFERED           | :KV0003                     |     5 |   190 |    19  (27)| 00:00:01 |
|* 17 |      TABLE ACCESS STORAGE FULL           | CUSTOMER                    | 60436 |  2006K|    19  (27)| 00:00:01 |
|  18 |   SORT GROUP BY                          |                             |    35 |  3605 |  3512  (26)| 00:00:01 |
|* 19 |    HASH JOIN                             |                             |    35 |  3605 |  3511  (26)| 00:00:01 |
|  20 |     MERGE JOIN CARTESIAN                 |                             |    35 |  2590 |     8   (0)| 00:00:01 |
|  21 |      MERGE JOIN CARTESIAN                |                             |     5 |   300 |     6   (0)| 00:00:01 |
|  22 |       MERGE JOIN CARTESIAN               |                             |     1 |    31 |     4   (0)| 00:00:01 |
|  23 |        TABLE ACCESS STORAGE FULL         | SYS_TEMP_0FD9FD410_2838F7D9 |     1 |     9 |     2   (0)| 00:00:01 |
|  24 |        BUFFER SORT                       |                             |     1 |    22 |     2   (0)| 00:00:01 |
|  25 |         TABLE ACCESS STORAGE FULL        | SYS_TEMP_0FD9FD411_2838F7D9 |     1 |    22 |     2   (0)| 00:00:01 |
|  26 |       BUFFER SORT                        |                             |     5 |   145 |     4   (0)| 00:00:01 |
|  27 |        TABLE ACCESS STORAGE FULL         | SYS_TEMP_0FD9FD412_2838F7D9 |     5 |   145 |     2   (0)| 00:00:01 |
|  28 |      BUFFER SORT                         |                             |     7 |    98 |     6   (0)| 00:00:01 |
|  29 |       TABLE ACCESS STORAGE FULL          | SYS_TEMP_0FD9FD40F_2838F7D9 |     7 |    98 |     0   (0)|          |
|  30 |     VIEW                                 | VW_VT_80F21617              |    35 |  1015 |  3503  (26)| 00:00:01 |
|  31 |      VECTOR GROUP BY                     |                             |    35 |  1680 |  3503  (26)| 00:00:01 |
|  32 |       HASH GROUP BY                      |                             |    35 |  1680 |  3503  (26)| 00:00:01 |
|  33 |        KEY VECTOR USE                    | :KV0000                     |  3815K|   174M|  3501  (26)| 00:00:01 |
|  34 |         KEY VECTOR USE                   | :KV0001                     |  3815K|   160M|  3501  (26)| 00:00:01 |
|  35 |          KEY VECTOR USE                  | :KV0003                     |  3815K|   145M|  3500  (26)| 00:00:01 |
|  36 |           KEY VECTOR USE                 | :KV0002                     |    12M|   431M|  3500  (26)| 00:00:01 |
|* 37 |            TABLE ACCESS STORAGE FULL     | LINEORDER                   |    59M|  1830M|  3499  (26)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  13 - storage("S"."S_REGION"='AMERICA')
       filter("S"."S_REGION"='AMERICA')
  17 - storage("C"."C_REGION"='AMERICA')
       filter("C"."C_REGION"='AMERICA')
  19 - access("ITEM_12"=INTERNAL_FUNCTION("C0") AND "ITEM_13"=INTERNAL_FUNCTION("C0") AND
              "ITEM_15"=INTERNAL_FUNCTION("C0") AND "ITEM_14"=INTERNAL_FUNCTION("C0"))
  37 - storage((SYS_OP_KEY_VECTOR_FILTER("L"."LO_SUPPKEY",:KV0002) AND
              SYS_OP_KEY_VECTOR_FILTER("L"."LO_CUSTKEY",:KV0003) AND SYS_OP_KEY_VECTOR_FILTER("L"."LO_PARTKEY",:KV0001) AND
              SYS_OP_KEY_VECTOR_FILTER("L"."LO_ORDERDATE",:KV0000)))
       filter((SYS_OP_KEY_VECTOR_FILTER("L"."LO_SUPPKEY",:KV0002) AND
              SYS_OP_KEY_VECTOR_FILTER("L"."LO_CUSTKEY",:KV0003) AND SYS_OP_KEY_VECTOR_FILTER("L"."LO_PARTKEY",:KV0001) AND
              SYS_OP_KEY_VECTOR_FILTER("L"."LO_ORDERDATE",:KV0000)))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
   - statistics feedback used for this statement
   - vector transformation used for this statement

Notice that a vector group by plan is used. Also notice that the access paths are TABLE ACCESS STORAGE FULL and no mention of INMEMORY. Otherwise it looks just like the execution plan when the Database In-Memory IM column store is enabled. You can check out an in-memory version of the plan here, just scroll down to page 59.

OK great, so you can do a vector group by on ADW. Does it actually use Database In-Memory columnar formats in the flash cache? Well we know that we can validate whether Database In-Memory was actually used by examining the session statistics for the query. We have talked about using session statistics since Database In-Memory was first released starting here.

Rather than list all 176 non-zero statistics generated during the query, I will list the key ones that show the use of CELLMEMORY, key vector processing and vector group by. The CELLMEMORY statistics are the same basic ones that my colleague Markus Kissling described in his blog post on CELLMEMORY.

NAME                                                                        VALUE
---------------------------------------------------------------- ----------------

cell physical IO bytes processed for IM capacity                       1601241088
cell physical IO bytes saved by columnar cache                          676462592

cellmemory IM scan CUs processed for capacity                                1738

key vector filtered on cell                                              59405934
key vector hash cells scanned                                               62329
key vector hash inserts                                                        12
key vector hash probes                                                      62317
key vector probed on cell                                               138639692
key vector queries                                                              1
key vector rows processed by value                                        4883012
key vector serializations for cell                                              4
key vectors created                                                             4
key vectors created (bit wide)                                                  2
key vectors created (nibble wide)                                               2
key vectors created (offset layout)                                             1
key vectors created (simple layout)                                             3
key vectors sent to cell                                                        4

vector group by accumspace cardinality                                         35
vector group by accumspace size                                              1120
vector group by used                                                            1


To sum up, ADW automatically takes advantage of Database In-Memory optimizations for Exadata flash cache and these are the same optimizations that you can take advantage of on other Exadata systems when you enable Database In-Memory.

 

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.