X

Enabling Real-Time Analytics With Database In-Memory

  • June 1, 2020

Ask TOM Office Hours Follow Up

Andy Rivenes
Product Manager

In the last Ask TOM Office Hours session we held on May 27th we talked about Database In-Memory and how population and queries work on RAC. I started with a default population of the SSB schema on a 3 node RAC database. The population looked like the following:


SQL> @rac_imseg

   INST_ID SEGMENT_NAME         PARTITION_NAME       BYTES BYTES_NOT_POPULATED BYTES_POPULATED TOTAL_BYTES
---------- -------------------- --------------- ---------- ------------------- --------------- -----------
         1 CUSTOMER                               24862720                   0        24862720    24862720
         1 DATE_DIM                                 122880                   0          122880      122880
         1 LINEORDER                            3693232128          2617409536      1075822592  3693232128
         2 LINEORDER                            3693232128          2147729408      1545502720  3693232128
         3 LINEORDER                            3693232128          2621325312      1071906816  3693232128
         1 PART                                   56721408                   0        56721408    56721408
         1 SUPPLIER                                1761280                   0         1761280     1761280

7 rows selected.

SQL>

Towards the end of the session I showed an example of populating the LINEORDER table on a specific node using the DISTRIBUTE FOR SERVICE clause:

alter table lineorder inmemory priority high memcompress for capacity high distribute for service "dbim2";

After the population a view of GV$IM_SEGMENTS looked like the following:

SQL> @rac_imseg

   INST_ID SEGMENT_NAME         PARTITION_NAME       BYTES BYTES_NOT_POPULATED BYTES_POPULATED TOTAL_BYTES
---------- -------------------- --------------- ---------- ------------------- --------------- -----------
         1 CUSTOMER                               24862720                   0        24862720    24862720
         1 DATE_DIM                                 122880                   0          122880      122880
         2 LINEORDER                            3693232128                   0      3693232128  3693232128
         1 PART                                   56721408                   0        56721408    56721408
         1 SUPPLIER                                1761280                   0         1761280     1761280

SQL>

The purpose was to show that you can direct the population of a specific object to a specific column store, the LINEORDER table to just instance 2 in this example, using the DISTRIBUTE FOR SERVICE clause.

However, we had a question during the session asking "Do you have to use parallel query if you're querying a table that is fully populated on the same node that you're connected to?".  So I've modified the example that I used in the session to instead populate the LINEORDER table fully on instance 1 so that all of the tables are populated on the same node. The population now looks like this:

SQL> @rac_imseg

   INST_ID SEGMENT_NAME         PARTITION_NAME       BYTES BYTES_NOT_POPULATED BYTES_POPULATED TOTAL_BYTES
---------- -------------------- --------------- ---------- ------------------- --------------- -----------
         1 CUSTOMER                               24862720                   0        24862720    24862720
         1 DATE_DIM                                 122880                   0          122880      122880
         1 LINEORDER                            3693232128                   0      3693232128  3693232128
         1 PART                                   56721408                   0        56721408    56721408
         1 SUPPLIER                                1761280                   0         1761280     1761280

SQL>

I then ran the same serial query that I ran during the session that is a join on the DATE_DIM and LINEORDER tables. In the original example the query ran slowly because it could not access the parts of the LINEORDER table that were populated on the other RAC instances (see the first population example above). Now in this example, when connected to instance 1, the query can access all of the data in the IM column store running with serial execution:

SQL> @imq-serial.sql

Session altered.

SQL>
SQL> -- Demonstrate an in-memory join query
SQL>
SQL> select /*+ MONITOR */
  2         sum(lo_extendedprice * lo_discount) revenue
  3  From   LINEORDER l, DATE_DIM d
  4  Where  l.lo_orderdate = d.d_datekey
  5  And    l.lo_discount between 2 and 3
  6  And    l.lo_quantity < 24
  7  And    d.d_date='December 24, 1996';

   REVENUE
----------
9710699495

Elapsed: 00:00:01.38
SQL>
SQL> set echo off
SQL Monitoring Report

SQL Text
------------------------------
select /*+ MONITOR */ sum(lo_extendedprice * lo_discount) revenue From LINEORDER l, DATE_DIM d Where l.lo_orderdate = d.d_datekey And l.lo_discount between 2 and 3 And l.lo_quantity < 24 And d.d_date='December 24, 1996'

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  SSB (158:12722)
 SQL ID              :  bf93ptmt5vvpq
 SQL Execution ID    :  16777217
 Execution Started   :  06/01/2020 19:50:09
 First Refresh Time  :  06/01/2020 19:50:09
 Last Refresh Time   :  06/01/2020 19:50:10
 Duration            :  1s
 Module/Action       :  SQL*Plus/-
 Service             :  dbim1
 Program             :  sqlplus@oscracvm245 (TNS V1-V3)
 Fetch Calls         :  1

Global Stats
=================================================
| Elapsed |   Cpu   |  Other   | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    1.38 |    1.31 |     0.07 |     1 |      3 |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2403472142)
==================================================================================================================================================
| Id |            Operation            |   Name    |  Rows   | Cost |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                                 |           | (Estim) |      | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================
|  0 | SELECT STATEMENT                |           |         |      |         1 |     +1 |     1 |        1 |     . |          |                 |
|  1 |   SORT AGGREGATE                |           |       1 |      |         1 |     +1 |     1 |        1 |     . |          |                 |
|  2 |    HASH JOIN                    |           |    2085 | 6363 |         1 |     +1 |     1 |     2131 | 299KB |          |                 |
|  3 |     JOIN FILTER CREATE          | :BF0000   |       1 |   12 |         1 |     +1 |     1 |        1 |     . |          |                 |
|  4 |      TABLE ACCESS INMEMORY FULL | DATE_DIM  |       1 |   12 |         1 |     +1 |     1 |        1 |     . |          |                 |
|  5 |     JOIN FILTER USE             | :BF0000   |      5M | 6335 |         1 |     +1 |     1 |     2131 |     . |          |                 |
|  6 |      TABLE ACCESS INMEMORY FULL | LINEORDER |      5M | 6335 |         1 |     +1 |     1 |     2131 |     . |          |                 |
==================================================================================================================================================


SQL>

Note that I've added the MONITOR hint to ensure that we captured the SQL Monitor data (i.e. the query is going to run too fast to invoke SQL Monitor capture by default).

And we now see that if a serial query is run from the same node where an object(s) is populated in the IM column store then Parallel Query does not have to be used. Now, in many cases, and this one included, the query time may not be as fast. However, this is not due to not being able to query all of the data in the IM column store, but instead is because parallel execution, even across RAC nodes, can be faster than serial execution. If you're curious and don't want to go watch the video, the parallel query with the LINEORDER table distributed across all three instances ran in 0.57 seconds and this serial query with all of the objects populated in instance 1 ran in 1.38 seconds. However, the serial query that could only access one third of the LINEORDER table in the Ask TOM session took 19.74 seconds.

At this point you might be wondering why any of this matters. There are applications, Oracle's EBS and SAP ERP are two examples, where inter-instance parallelism is not supported. This means that you really can't use Database In-Memory on RAC for those applications and get the best analytic performance. However, with the DISTRIBUTE FOR SERVICE clause you have the ability to direct population and application connections to the same instance. This makes it possible to allow those applications that restrict the use of parallel execution to still take full advantage of Database In-Memory's amazing analytic query performance on RAC databases.

 

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.