I've been meaning to update our four-part RAC series, which we wrote when Database In-Memory first became available with Oracle Database 12c patch set 12.1.0.2. A lot has changed since then, but rather than re-invent the wheel I thought it would be better to just update what we had already written. So here goes …
When running Database In-Memory in a RAC environment it is important to consider that each instance will have its own In-Memory column store (IM column store). By default, objects populated into the IM column store will be distributed across all of the IM column stores in the cluster, effectively making the IM column store a shared-nothing architecture in a RAC environment. This means that parallel query must be used to access data on other RAC nodes. This post focuses on how parallelization should be used when running Database In-Memory in a RAC environment.
But let’s start with some basic background information on how the IM column stores are populated on RAC and then discuss how to manage parallelization for Database In-Memory in a RAC environment.
In a RAC environment, how an object is populated into the IM column store is controlled by the DISTRIBUTE sub-clause. By default, the DISTRIBUTE sub-clause is set to AUTO. This means that Oracle will distribute the object across the available IM column stores using one of the following options, unless the object is so small that it consists of just one In-Memory Compression Unit (IMCU), in which case it will reside on just one node:

- BY ROWID RANGE
- BY PARTITION
- BY SUBPARTITION
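As a quick sketch of the syntax, the distribution can be left to AUTO (the default) or specified explicitly when setting the INMEMORY attribute; the table name here is purely illustrative:

```sql
-- Let Oracle choose the distribution (the default behavior)
ALTER TABLE lineorder INMEMORY DISTRIBUTE AUTO;

-- Or specify it explicitly, e.g. by rowid range for a non-partitioned table
ALTER TABLE lineorder INMEMORY DISTRIBUTE BY ROWID RANGE;
```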
The distribution is performed by the Wnnn background processes (w000, w001, and so on) as part of the segment population task.
On Oracle Engineered Systems the DUPLICATE sub-clause of the INMEMORY attribute can also be used to mirror the IMCUs for any objects specified with the DUPLICATE sub-clause.
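A sketch of that syntax as well, again with an illustrative table name:

```sql
-- Mirror each IMCU on one other instance (Engineered Systems only)
ALTER TABLE lineorder INMEMORY DUPLICATE;

-- Or populate a complete copy of the object on every instance
ALTER TABLE lineorder INMEMORY DUPLICATE ALL;
```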
When data is populated in-memory in a RAC environment it is affinitized to a specific RAC node. This means that we have to access IMCUs in the IM column stores in which they reside; we don't ship IMCUs across the interconnect to the node on which the query is being executed. This is why all in-memory queries must be parallelized with a degree of parallelism (DOP) equal to at least the number of instances that were involved in the distributed population. Each IM column store must be visited in order to have access to all of the IMCUs for the object(s); otherwise you lose the benefit of Database In-Memory because some of the data will have to be accessed in the row format.
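One way to see how an object has been distributed is to query GV$IM_SEGMENTS, which reports population status per instance. A minimal sketch:

```sql
-- Show how much of each segment is populated in each instance's
-- IM column store (BYTES_NOT_POPULATED > 0 on an instance means
-- that instance holds only part of the object)
SELECT inst_id, segment_name, populate_status,
       inmemory_size, bytes_not_populated
FROM   gv$im_segments
ORDER  BY segment_name, inst_id;
```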
Back in the 12.1.0.2 days you had to use Auto DOP to ensure that the degree of parallelism was at least as great as the number of IM column store instances. It also enabled something called "home location awareness", which meant that the parallel execution coordinator, or query coordinator, knew which IMCUs were on which nodes.
This was important because you have to have enough parallel server processes available to run on each instance where IMCUs for the object(s) involved reside. Otherwise, some of the data will have to come from the buffer cache or disk as there's no other mechanism in Oracle Database to allow a single query to access multiple RAC instances.
All of this became optional in Oracle Database 12.2. Now the only requirement is that the DOP is at least as great as the number of IM column store instances. We still recommend using Auto DOP, but you can manually set the DOP if you want.
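For example, either of the following approaches satisfies the DOP requirement; the table name and DOP value are illustrative (a DOP of 3 assuming a 3-node cluster):

```sql
-- Option 1: enable Auto DOP so the optimizer computes the DOP
ALTER SESSION SET parallel_degree_policy = AUTO;

-- Option 2: set the DOP manually, as long as it is at least
-- the number of IM column store instances
SELECT /*+ PARALLEL(3) */ COUNT(*)
FROM   lineorder;
```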
This makes things a lot easier since now you have choices about how you manage parallel execution with Database In-Memory. This is usually most important in mixed-workload environments, where maintaining capacity for OLTP response time is of primary importance.
In a RAC environment the IM column store is a shared-nothing architecture. This means that for a query to take advantage of the IM column stores in a RAC cluster, parallel query must be used, and there must be parallel server processes available on all instances that have IMCUs populated for the object(s) involved in the query.
The following example uses a 3-node RAC database with a LINETEST table populated across all 3 nodes. Since the table is not partitioned it is distributed by rowid range, which means that approximately one third of the table is populated on each of the 3 nodes. We will run the same query twice: once with Auto DOP enabled and once forced to run serially. Auto DOP remained enabled for both tests; I used a NO_PARALLEL hint to force the serial execution.
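The exact statement isn't reproduced here, but the serial run was forced with a statement-level NO_PARALLEL hint along these lines (the query shape and column names are illustrative, not the actual test query):

```sql
-- Serial run: NO_PARALLEL overrides Auto DOP for this statement only
SELECT /*+ NO_PARALLEL */ SUM(l.lo_extendedprice)
FROM   linetest l
WHERE  l.lo_quantity < 25;
```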
For both of these tests I generated SQL Monitor reports, but the "active" version is harder to display in a blog post so I also generated the "text" version. I will use a combination of the two outputs below to show the overall workload and then the execution plan and the time spent in each "activity".
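Both report formats can be generated with the DBMS_SQLTUNE.REPORT_SQL_MONITOR function; in this sketch &sql_id is a SQL*Plus substitution variable standing in for the query's SQL_ID:

```sql
-- "Text" version of the SQL Monitor report
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => '&sql_id',
         type   => 'TEXT') AS report
FROM   dual;

-- "Active" (HTML-based, interactive) version of the report
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => '&sql_id',
         type   => 'ACTIVE') AS report
FROM   dual;
```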
In the Auto DOP case we got a DOP of 3 and an elapsed time of 11 seconds, all of it CPU time. Only the LINETEST table was populated in the IM column stores; the other tables were already cached in the buffer cache and only resulted in buffer gets. This same setup was repeated for both query executions to get a consistent comparison.
| Elapsed Time(s) | Queuing Time(s) | Cpu Time(s) | Application Waits(s) | Cluster Waits(s) | Other Waits(s) | Fetch Calls | Buffer Gets |
|---|---|---|---|---|---|---|---|
| 11 | 0.01 | 11 | 0.00 | 0.00 | 0.36 | 4 | 7534 |
Parallel Execution Details (DOP=3, Servers Allocated=6)
Instances : 3
In the "active" SQL Monitor report below we can see where the time was spent during the execution:
If you hover over an activity bar in a SQL Monitor active report you can see what accounted for the time of that activity. For example, when I hover over the activity bar for the TABLE ACCESS INMEMORY FULL operation in this example, you see that all of the time was spent in "Cpu: in memory", and that's what we would expect. All of the time spent accessing an in-memory object should be CPU time, with no I/O time.
However, in the serial case the query ran for 69 seconds and did lots of I/O, even though the table was fully populated across all three IM column stores. This illustrates why parallel query must be used when data is distributed across multiple IM column stores.
| Elapsed Time(s) | Cpu Time(s) | IO Waits(s) | Application Waits(s) | Cluster Waits(s) | Fetch Calls | Buffer Gets | Read Reqs | Read Bytes |
|---|---|---|---|---|---|---|---|---|
| 69 | 38 | 24 | 0.00 | 7.26 | 4 | 1M | 9089 | 9GB |
In the "active" version below we can see where the time was spent during the execution:
Now we see that over 87% of the execution time was spent accessing the LINETEST table and if we hover over the activity bar for the TABLE ACCESS INMEMORY FULL operation for the LINETEST table we see something very different:
Now we have spent "Cpu" time, but not "Cpu: in memory" time, and if we hover over the other part of the activity bar:
We see "User I/O: db file scattered read", which means we did I/O. That is to be expected, since this was a serial query and we could not access the other two IM column stores. Instead, the data populated in those two IM column stores had to be accessed in the row format, which means from either the buffer cache or disk on the local instance.
Hopefully these examples have helped to illustrate why parallel query is required to access data distributed across IM column stores in a RAC environment.
Andy Rivenes is a Senior Principal Product Manager at Oracle Corporation and is a Product Manager for Database In-Memory. Andy has been working with Oracle products since 1992 and spent a large part of his career managing Oracle databases for both custom and E-Business Suite applications. Andy now spends his time helping customers implement Database In-Memory and providing information about Database In-Memory through blogs, conferences, and customer visits.