X

Enabling Real-Time Analytics With Database In-Memory

How to control where objects are populated into memory on RAC

Maria Colgan
Master Product Manager

If you recall from part 1 of our blog series on RAC each RAC node has it’s own In-Memory column store (IM column store). When a table is populated into memory in a RAC environment it will be automatically distributed across all of the IM column stores in the cluster. That is to say, a piece of the table will appear in each RAC node, effectively making the IM column store a shared-nothing architecture.

We have also discussed how you can use Database Services to control where an object would be populated into memory, in a RAC environment, as long as the object was manually populated (no INMEMORY PRIORTY specified).

Starting in Oracle Database 12c Release 2, it is now much easier to selectively populate objects to specific IM column stores, in a RAC environment, regardless of how they’re populated, by taking advantage of the new DISTRIBUTE FOR SERVICE subclause of the INMEMORY attribute.

The new FOR SERVICE subclause of the INMEMORY DISTRIBUTE clause enables you to control exactly where an object is populated into memory based on where a service is allowed to run. If the service is stopped then the objects distributed for that service would be automatically removed from the IM column store(s).

The FOR SERVICE sub-clause was actually introduced to manage where objects should be populated into memory in an Active Data Guard environment (primary or standby). However, this post will focus on the secondary use, controlling the location where objects will be populated into memory in a RAC environment.

Let’s take a look at an example to see exactly how the new DISTRIBUTE FOR SERVICE clause works. If you prefer to watch a video of the demo just scroll to the bottom of this post.

Imagine we have a two-node RAC cluster (RAC1 and RAC2), on which 3 services have been defined:

srvctl add service -db orcl -service IM1 –preferred “rac1”
srvctl add service -db orcl -service IM2 -r preferred “rac2”
srvctl add service -db orcl -service IM_ALL -r preferred “rac1,rac2”

srvctl start service -db orcl -service "IM1,IM2,IM_ALL"

srvctl status service -db orcl
Service IM1 is running on instance(s) rac1
Service IM2 is running on instance(s) rac2
Service IM_ALL is running on instance(s) rac1,rac2

Now let’s assume we would like to populate the CUSTOMERS table into memory on RAC1, the PRODUCTS table on RAC2 and have the SALES table distributed across the IM column store on both RAC nodes. The new syntax that allows us to do this is really quite simple:

ALTER TABLE customers INMEMORY PRIORITY HIGH DISTRIBUTE FOR SERVICE IM1;

ALTER TABLE products INMEMORY PRIORITY MEDIMUM DISTRIBUTE FOR SERVICE IM2;

ALTER TABLE sales INMEMORY PRIOIRTY HIGH;

Let’s now look at the v$ performance views (gv$INMEMORY_AREA and gv$IM_SEGMENTS) to see what’s happening inside the IM column stores, across the RAC nodes.

   INST_ID POOL             ALLOC_BYTES USED_BYTES POPULATE_STATUS                
---------- --------------- ----------- ---------- ---------------------
         1 1MB POOL         854589440   32505856 	DONE                                
         1 64KB POOL        201326592     917504 	DONE                                
         2 1MB POOL         854589440   70254592 	DONE                                
         2 64KB POOL        201326592    1703936 	DONE                                

As you can see, we have populated data into the IM column store on each of the RAC nodes but we will need to check gv$IM_SEGMENTS to see which objects have been populated into memory and where.

SELECT v.inst_id, 
       v.owner, 
       v.segment_name name, 
       v.populate_status, 
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated 
FROM   gv$im_segments v 
GROUP  BY v.inst_id,  v.owner, v.segment_name, v.populate_status; 

INST_ID OWNER      NAME                 POPULATE_STAT MB_NOT_POPULATED
------ ---------- -------------------- ------------- --------------
2 		SH         SALES             COMPLETED    262.968
1 		SH         CUSTOMERS         COMPLETED    0
1 		SH         SALES             COMPLETED    476.429
2 		SH         PRODUCTS          COMPLETED    0

The CUSTOMERS table is fully populated in the IM column store on RAC1, while the PRODUCT table is fully populate in the IM column store on RAC2 (BYTES_NOT_POPULATED is 0). Since we didn’t specify a DISTRIBUTE FOR SERVICE clause on the SALES table, it has been partially populated on both (BYTES_NOT_POPULATED is not 0).

In order to confirm the SALES table is fully populated we will have to look at a different view called gv$IM_SEGMENTS_DETAIL and compare the number of blocks in-memory on each node to the total number of blocks in the SALES table.

SELECT m.inst_id, 
       m.blocksinmem, 
       m.datablocks 
FROM   gv$im_segments_detail m, 
       user_objects o 
WHERE  m.dataobj = o.object_id 
AND    o.object_name = 'SALES'; 

  
 INST_ID   BLOCKSINMEM DATABLOCKS
---------- ----------- ----------
	 1	 33660      94643
	 2	 60983      94643

As you can see the SALES table is made up of 94,643 blocks. 33,660 blocks are populated on node 1 and 60,983 on node 2. Adding these values together we see that all 94,643 blocks from the SALES table are populated into memory.

33660 + 60983 = 94643

So, now we know everything is populated correctly, let’s see what happens if we connect to the database via the service IM1 and run a query that accesses all three tables.

SELECT c.cust_city, 
       p.prod_name, 
       SUM(s.amount_sold) 
FROM   sales s, 
       customers c, 
       products p 
WHERE  s.cust_id = c.cust_id 
AND    s.prod_id = p.prod_id 
GROUP  BY c.cust_city, p.prod_name; 

From the execution plan, we can see that the goal is to access some or all of the data in each of the tables from the IM column store, as the access method chosen for each table is TABLE ACCESS INMEMORY FULL.

-----------------------------------------------------
| Id  | Operation                       | Name      | 
-----------------------------------------------------
|   0 | SELECT STATEMENT                |           |
|   1 |  SORT AGGREGATE                 |           |  
|   2 |   VIEW                          |           |  
|   3 |    HASH GROUP BY                |           |  
|   4 |     HASH JOIN                   |           |   
|   5 |      TABLE ACCESS INMEMORY FULL | CUSTOMERS |    
|   6 |      HASH JOIN                  |           | 
|   7 |       TABLE ACCESS INMEMORY FULL| PRODUCTS  |   
|   8 |       TABLE ACCESS INMEMORY FULL| SALES     | 
------------------------------------------------------

However, when we look at the In-Memory session statistics we can see that only two tables were actually accessed via the IM column store (table scan (IM) is 2) and one table was accessed via the disk (IM scan segments disk is 1).

select t1.name, t2.value 
FROM v$sysstat t1, v$mystat t2 
WHERE t1.name IN ('table scans (long tables)',
                  'table scans (IM)',
                  'session logical reads',
                  'session logical reads - IM',
                  'IM scan rows', 
                  'IM scan segments disk')
AND t1.statistic# = t2.statistic# 
ORDER BY t1.name;

NAME			      VALUE
-------------------------  ----------
IM scan rows		      2093832
IM scan segments disk               1
session logical reads          109548
session logical reads - IM      45469
table scans (IM)                    2
table scans (long tables)           1

6 rows selected.

Since the PRODUCTS table is in-memory only on the IM2 service, we won’t be able to access it from the IM1 service.

In reality if our three tables were going to be used by an application that connects to the database via the IM1 service, we should specify that service in the DISTRIBUTE FOR SERVICE clause on all three tables. Let’s do that now and see what happens to our query.

ALTER TABLE products INMEMORY PRIORITY MEDIMUM DISTRIBUTE FOR SERVICE IM1;

ALTER TABLE sales INMEMORY PRIOIRTY HIGH DISTRIBUTE FOR SERVICE IM1;

Changing the DISTRIBUTE FOR SERVICE clause on the PRODUCTS and SALES tables will trigger those tables to be removed from the IM column store completely and repopulated honoring the new distribution method.

SELECT v.inst_id, 
       v.owner, 
       v.segment_name name, 
       v.populate_status, 
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated 
FROM   gv$im_segments v 
GROUP  BY v.inst_id,  v.owner, v.segment_name, v.populate_status; 

INST_ID OWNER      NAME                 POPULATE_STAT MB_NOT_POPULATED
------ ---------- -------------------- ------------- --------------
1 		SH         CUSTOMERS         COMPLETED    0

Once the repopulation completes we will see all three table fully populated on RAC1.

SELECT v.inst_id, 
       v.owner, 
       v.segment_name name, 
       v.populate_status, 
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated 
FROM   gv$im_segments v 
GROUP  BY v.inst_id,  v.owner, v.segment_name, v.populate_status; 

INST_ID OWNER      NAME                 POPULATE_STAT MB_NOT_POPULATED
------ ---------- -------------------- ------------- --------------
1 		SH         SALES             COMPLETED    0
1 		SH         CUSTOMERS         COMPLETED    0
1 		SH         PRODUCTS          COMPLETED    0

Now if we try our query again, we will get the same execution plan but the session statistics show we really are accessing all three tables from the IM column store, as expected.

SELECT t1.name, t2.value 
FROM v$sysstat t1, v$mystat t2 
WHERE t1.name IN ('table scans (long tables)',
                  'table scans (IM)',
                  'session logical reads',
                  'session logical reads - IM',
                  'IM scan rows', 
                  'IM scan segments disk')
AND t1.statistic# = t2.statistic# 
ORDER BY t1.name;


NAME			      VALUE
-------------------------  ----------
IM scan rows		      5040744
IM scan segments disk               0
session logical reads          109548
session logical reads - IM     109405
table scans (IM)                    3
table scans (long tables)           1

6 rows selected.

Finally, let’s see what happens if we stop one of database service IM1.

srvctl stop service -db main -service "IM1"

Now that service is stopped, let's check the contents of the IM column stores across the cluster.

SELECT v.inst_id, 
       v.owner, 
       v.segment_name name, 
       v.populate_status. 
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated 
FROM   gv$im_segments v 
GROUP  BY v.inst_id,  v.owner,  v.populate_status; 

No rows

Since we have stopped the service that all of tables were associated with, they have all been removed from the column store.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.