X

Enabling Real-Time Analytics With Database In-Memory

  • August 20, 2018

Using In-Memory Dynamic Scans – Part 2

Andy Rivenes
Product Manager

In-Memory Dynamic Scans (IMDS) is a new feature in Oracle Database 18c that allows parallelizing In-Memory table scans without having to use Parallel Query (PQ). However, it does work with PQ just fine. I blogged about IMDS here and I thought it was worth following up with more details since this is such a powerful feature. As a quick review, IMDS is able to parallelize In-Memory table scans using lightweight threads. This feature is used in conjunction with the Resource Manager, and this is required because Resource Manager is what determines how many threads will be used based on CPU count and the current load on the system.

In the previous blog post I touched upon the characteristics of an In-Memory query and how parallelism is a key method to speed up those queries. In-Memory queries are typically gated on CPU resources and if there is sufficient CPU capacity, then IMDS is a great feature to use to further increase Database In-Memory query performance because it is dynamic and can automatically increase or decrease its parallelism based on query need and CPU capacity. Since IMDS only provides its parallelization benefit during an in-memory scan (i.e. TABLE ACCESS INMEMORY FULL), PQ is still needed if other parts of the query can also be parallelized. In this way the two features are complimentary and can work together.

The key statistic in determining whether IMDS was used is "IM scan (dynamic) multi-threaded scans". I showed in the previous post that there are many "IM scan (dynamic) …" statistics but if "IM scan (dynamic) multi-threaded scans" is greater than 0 then the query benefitted from IMDS. Since IMDS is controlled by the Resource Manager, the actual number of lightweight threads used by the query can fluctuate. This makes IMDS dynamic, and the easiest way to see how much work was processed by IMDS is to compare the session statistics "IM scan (dynamic) rows" with "IM scan rows valid". This compares the number of rows scanned with IMDS versus the total number of valid IM rows scanned.

In the previous post I listed an example query with the session statistics for a query of the LINEORDER table where it performed an aggregation on the quantity attribute (i.e.  sum(LO_QUANTITY)). Below is the example with the statistics:

I have highlighted three of the statistics, and we can see that "IM scan (dynamic) multi-threaded scans" is 1 so we used IMDS and then we see that the "IM scan (dynamic) rows" is essentially the same as "IM scan rows valid". This query scanned all of the in-memory rows using IMDS.

At this point you might be wondering what operations benefit the most from IMDS? In general, aggregation functions, group by aggregations and expensive filters will benefit the most from IMDS. It is important to realize that IMDS will only speed up the actual scan of an in-memory object. This is because IMDS is parallelizing the actual scan of the IMCUs of the populated object in the IM column store. It won't speed up the other parts of the query which is why Parallel Query used with IMDS can provide the best in query performance and dynamic resource usage.

The following example illustrates the dynamic nature of IMDS. It starts out with IMDS and then turns itself off. The following is the query execution:

And the key session statistics that were generated by running the query:

Notice that IM scan (dynamic) multi-threaded scans is greater than 1 so IMDS is enabled, but the "IM scan (dynamic) rows" is only a fraction of the total number of rows scanned in-memory (i.e. "IM scan rows valid"). In fact, it is only about 3% of the total which means that execution of the query started out using IMDS, but was dynamically stopped when it was internally determined to be costlier than a traditional in-memory scan. This query was run in parallel with a DOP of 8:

The execution plan also shows that not only does IMDS work with Parallel Query it can also take advantage of other Database In-Memory optimizations like In-Memory Aggregation.

IMDS can also be used in a RAC environment. Database In-Memory queries on RAC use Parallel Query to ensure that all of the data on each node can be accessed in-memory. To accomplish this at least one parallel server process is allocated to each node where an IM column store has data for the object. With IMDS enabled dynamic parallelism can be used by each parallel server process in the query. This can provide a very flexible way to enable parallelism in a mixed workload environment without overloading CPU resources.

The following is an example that I ran on a three node RAC database. The LINEORDER table has been distributed between all three nodes:

The following is the example query execution:

The plan output shows us that the query executed in parallel and that it was affinitized for inmemory (which means that it was home location aware and ran fully in-memory across all three RAC nodes):


The statistics show us that we performed three multi-threaded scans, one for each of the three RAC nodes, and that we used IMDS to process all of the rows for the query:

The really exciting part about this is that with a DOP minimized to just the number of RAC nodes we were able to use IMDS to then provide dynamic parallelism on each of the nodes. This is the best of both worlds! Since IMDS works in conjunction with Resource Manager the risk of saturating the CPU resources is significantly reduced. This means parallelism can be fully utilized with Database In-Memory without causing capacity issues since Resource Manager will protect the nodes from exceeding CPU thresholds. And this is just as true on single instance databases as it is on RAC databases.

 

 

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.