Enabling Real-Time Analytics With Database In-Memory

Recent Posts

Questions you asked: Database In-Memory and Small Tables

I have received several emails during the last few weeks from customers mentioning that some tables have not been populated into the IM Column Store even though they marked them with INMEMORY PRIORITY CRITICAL. The answer was easily found once we realized that each of the tables was smaller than 64KB. Tables smaller than 64KB in size are not populated into the IM Column Store, as they would waste a considerable amount of space since memory in the column store is allocated in 1MB chunks. To demonstrate this behavior, we can create a table with two columns and insert one row. Keep in mind that in this case the INITIAL extent size in the storage_clause is 65536 bytes (64KB) when creating the table. We see the same size when we query the table in the user_segments view. To figure out whether the table has been populated into the IM Column Store we query the view v$im_segments, and we see that the table has not been populated since its size is not larger than 64KB. Let's insert some more rows to increase the size of the table and see what happens. Again, we query the user_segments view and see that the table has allocated some more extents and is now larger than 64KB. On the next run of the IMCO (In-Memory Coordinator) background process the background worker processes will be triggered to populate the table into the IM Column Store. In summary: in order for an object to be eligible to be populated into the IM Column Store it must be larger than 64KB. You can find more restrictions on objects that are not eligible to be populated into the IM Column Store in our white paper Oracle Database In-Memory with Database 18c - Technical Overview on page 11.
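To make the example above concrete, here is a minimal sketch of the kind of test case described (the table name, column definitions and row counts are illustrative, not the ones from the original demonstration):

-- Create a small two-column table marked for in-memory population and insert a single row
CREATE TABLE small_tab (id NUMBER, val VARCHAR2(100)) INMEMORY PRIORITY CRITICAL;
INSERT INTO small_tab VALUES (1, 'only row');
COMMIT;

-- The segment consists of a single 64KB INITIAL extent
SELECT segment_name, bytes FROM user_segments WHERE segment_name = 'SMALL_TAB';

-- No row is returned here because the segment is not larger than 64KB
SELECT segment_name, inmemory_size, populate_status
FROM   v$im_segments WHERE segment_name = 'SMALL_TAB';

-- Grow the table past 64KB and check again after the next IMCO run
INSERT INTO small_tab SELECT level, RPAD('x', 100, 'x') FROM dual CONNECT BY level <= 10000;
COMMIT;
SELECT segment_name, bytes FROM user_segments WHERE segment_name = 'SMALL_TAB';
SELECT segment_name, inmemory_size, populate_status
FROM   v$im_segments WHERE segment_name = 'SMALL_TAB';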

Database In-Memory Compression

When I'm speaking to customers or at conferences, and I describe the higher levels of Database In-Memory compression, and I mean anything higher than the default, I invariably get looks like, "Yeah right, I know compression is going to impact performance so I'm not even going to consider it." The problem with this line of thinking is that it doesn't take into account that even if there is a performance penalty, it's not going to be close to the difference of not having the object in the IM column store at all. If your choice is between populating an object in the IM column store at a higher compression level versus not populating it at all, or only partially populating it, then in most cases you will be better off taking the higher compression level. Now, this wouldn't be an issue if you had more than enough memory to populate all of your objects in the IM column store. But in the real world, most customers don't have enough memory to do that and so it's a balancing act between what can be populated with the amount of memory that is available.

Database In-Memory supports 5 levels of compression and you can even disable compression altogether if you want. As part of the ALTER TABLE/CREATE TABLE commands and the INMEMORY sub-clause, the options are:

NO MEMCOMPRESS
MEMCOMPRESS FOR DML
MEMCOMPRESS FOR QUERY LOW – the default
MEMCOMPRESS FOR QUERY HIGH
MEMCOMPRESS FOR CAPACITY LOW
MEMCOMPRESS FOR CAPACITY HIGH

Each successive level typically decreases the amount of memory required to populate the object at a possible reduction in scan performance. I say possible because some customers actually see a performance increase when going from MEMCOMPRESS FOR QUERY LOW to MEMCOMPRESS FOR QUERY HIGH, for example.

I thought it would be interesting to run some simple tests using the LINEORDER table from the SSB schema to illustrate the differences in compression for the various compression levels. Now, this is not a benchmark and not meant to be anything more than an illustrative example. The LINEORDER table I chose was approximately 630 GB uncompressed on disk. I wanted it to be large enough so that we would hopefully see a significant variation in memory usage based on the Database In-Memory compression level chosen. I also ran the following query at least three times to give a relative sense of the performance difference between the different levels of compression:

SELECT SUM(lo_quantity), SUM(lo_ordtotalprice) FROM lineorder

The following chart lists the compression level, the amount of space occupied in the IM column store for that compression level and the minimum execution time of the three or more executions for that compression level: Based on these results, we can save a huge amount of memory by moving from the default, MEMCOMPRESS FOR QUERY LOW, to MEMCOMPRESS FOR QUERY HIGH. At QUERY HIGH we save over 100GB of column store memory and if we go to CAPACITY HIGH then it is less than half the QUERY LOW size. Performance is still pretty good as well. We can see that we go from 7.66 seconds to run the query to 10.14 at QUERY HIGH and 12.19 at CAPACITY HIGH. That's a degradation in query performance of around 32-59% for a space savings of around 1.6x to almost 2.5x. In many situations that may be a perfectly valid trade-off if that allows you to populate a lot more objects in the IM column store. I think it's also worth mentioning that the compression numbers for NO MEMCOMPRESS and MEMCOMPRESS FOR DML are basically the same.
That's because MEMCOMPRESS FOR DML is optimized for DML operations and performs little or no data compression. In practice, it will only provide compression if all of the column values are the same. At this point you might be thinking that you could just exclude some columns to save space, and you would be right. But I talked about the downside of doing that in my blog post "Questions You Asked: What happens if you access a column that is not populated in the IM column store?". You might also be thinking, well what if I just populate most of the table in the IM column store? In my experiment above, what if I only had 200GB that I could allocate to the IM column store? If I use the default compression level then I cannot populate the entire LINEORDER table because we know that it will take approximately 339 GB. If I only have a 200GB column store then only about 150GB will be populated because some of that space is allocated to the 64KB Pool. If I run the test query above on this configuration it takes around 10 minutes to run. Now my system doesn't have a very good I/O subsystem, but clearly the query is going to take a lot longer than if all the data is populated in the column store. In fact, if I populate the LINEORDER table with CAPACITY HIGH and leave the dimension tables at the default FOR QUERY LOW then I can fit all of the tables in the IM column store with an allocation of only 200GB. I think the takeaway from all of this is that trading a small performance penalty for a huge gain in IM column store capacity may be seriously worth considering. After all, even at the CAPACITY HIGH compression level the scan performance is going to be way faster than not being able to populate the object(s) in memory at all or having them only partially populated.
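For reference, a hedged sketch of how the compression level is changed (using the LINEORDER table from this example; the object is repopulated at the new level the next time population runs):

-- The default, equivalent to specifying INMEMORY on its own
ALTER TABLE lineorder INMEMORY MEMCOMPRESS FOR QUERY LOW;

-- Trade some scan performance for a much smaller in-memory footprint
ALTER TABLE lineorder INMEMORY MEMCOMPRESS FOR CAPACITY HIGH;

-- Compare the populated size and compression level after repopulation
SELECT segment_name, inmemory_size, bytes, inmemory_compression
FROM   v$im_segments
WHERE  segment_name = 'LINEORDER';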

Database In-Memory Sessions at OOW 2018

Use this guide to find sessions covering Oracle Database In-Memory. Main sessions presented by Oracle product management and development are listed under “Key Sessions”. Other sessions that can also have some coverage for Oracle Database In-Memory are listed under “You May Also Like”. Please note that “In-Memory Cost Management” is a service that builds on Database In-Memory; sessions for this service will not go into In-Memory technology specifics. In addition to the sessions, please also visit The Exchange (the exhibition hall) for Oracle Database In-Memory demos. Members from the product team will be available to answer your questions! A PDF version of the guide is also attached. Download the PDF guide here.

Monday Oct 22nd

Key Sessions
Oracle Exadata: Strategy and Roadmap for New Technologies, Cloud, and On-Premises [PRM4114] 9:00 a.m. - 9:45 a.m. | Moscone West - Room 3008
Oracle Database In-Memory: Under the Hood, What's New, What's Coming [PRM4012] 3:45 p.m. - 4:30 p.m. | Moscone West - Room 3008
Oracle Database In-Memory on Exadata: A Potent Combination [PRO4016] 4:45 p.m. - 5:30 p.m. | Moscone West - Room 3008

You May Also Like
Oracle Active Data Guard: Best Practices and New Features Deep Dive [PRO4030] 10:30 a.m. - 11:15 a.m. | Moscone West - Room 3007
Manage Complete Cost-to-Serve with New Oracle In-Memory Cost Management Cloud Service [PRO4242] 4:45 p.m. - 5:30 p.m. | Moscone South - Room 104
Get Ready for Brain Overload with Oracle Database 12.2 and 18c New Features [TIP3543] 5:45 p.m. - 6:30 p.m. | Moscone West - Room 3002

Tuesday Oct 23rd

Key Sessions
Hands-on Lab: Oracle Database In-Memory [HOL6312] 12:45 p.m. - 1:45 p.m. | Marriott Marquis (Yerba Buena Level) - Salon 1/2
Oracle Database In-Memory 360: Oracle Experts Answer Your Questions [BQS4011] 1:45 p.m. - 2:30 p.m. | Moscone West - Room 3008

You May Also Like
SAP on Oracle: Development Update [PRO4405] 12:30 p.m. - 1:15 p.m. | Moscone West - Room 3000
Unlock the Value from Your SaaS Data Using Oracle Cloud Infrastructure [CAS2408] 4:45 p.m. - 5:30 p.m. | Moscone South - Room 153
Saving Your Database from Memory Loss: Oracle Database In-Memory Analytics Improvements [TRN1263] 4:45 p.m. - 5:30 p.m. | Moscone West - Room 3006

Wednesday Oct 24th

Key Sessions
Hands-on Lab: Oracle Database In-Memory [HOL6312] 9:30 a.m. - 10:30 a.m. and 3:45 p.m. - 4:45 p.m. | Marriott Marquis (Yerba Buena Level) – Salon 1/2
Oracle Database In-Memory: When to Use and Quick Start Guide [TRN4013] 12:30 p.m. - 1:15 p.m. | Moscone West - Room 3008
Using Oracle Database In-Memory to Accelerate Analytics in the Cloud [CAS2401] 3:45 p.m. - 4:30 p.m. | Moscone West - Room 3004

You May Also Like
Deep Dive: Oracle Database 18c New Features: Continuing Database Innovations [TRN6462] 8:30 a.m. - 12:30 p.m. | Marriott Marquis (Atrium Level) - Foothill G1/G2

Thursday Oct 25th

Key Sessions
Hands-on Lab: Oracle Database In-Memory [HOL6312] 10:30 a.m. - 11:30 a.m. | Marriott Marquis (Yerba Buena Level) - Salon 1/2
Oracle Database In-Memory: Implementation Best Practices and Deep Dive [TRN4014] 12:00 p.m. - 12:45 p.m. | Moscone West - Room 3008

You May Also Like
In-Memory Cost Management for Real-Time Cost and Profitability Analysis [PRO4247] 9:00 a.m. - 9:45 a.m. | Moscone North - Room 22
Oracle Database 12c R1 and R2 Are Awesome, Now Meet Oracle Database 18c [TRN1553] 10:00 a.m. - 10:45 a.m. | Moscone West - Room 3002
Oracle Maximum Availability Architecture: Best Practices for Oracle Database 18c [TIP4028] 11:00 a.m. - 11:45 a.m. | Moscone West - Room 3007
From 54 Systems to Oracle Cloud: Success with Enterprise Cloud Adoption Lifecycle Approach [CAS3499] 11:00 a.m. - 11:45 a.m. | Moscone West - Room 3016
Maximize Analytical SQL Performance with Oracle Database In-Memory [CAS1461] 12:00 p.m. - 12:45 p.m. | Moscone West - Room 3001

Questions You Asked: What happens if you access a column that is not populated in the IM column store?

I was speaking with a customer recently and they asked the question, "What happens if you access a column that is not populated in the IM column store?" I thought that would make a good blog post and sure enough, it did back in October of 2015. You can read it here. To be honest, I had forgotten that I had written it until I went back and looked for it. Since it was so long ago I thought I would repeat it and add a few more details. When you populate an object in the IM column store it is possible to exclude columns. This can be useful if you have LOB or other columns in an object and your application does not access them from your analytic queries. If you don't populate a column then it won't take up space in the IM column store. But what if a query does access one or more of those excluded columns? Well, it is an all-or-nothing proposition. If you access columns that are not populated then all columns will be accessed from the row store. Let's see an example. This time around we'll use the CUSTOMER table from our SSB schema. Note that the table is fully populated in the IM column store: Something that I didn't mention in the previous post is the ability to verify that a table's columns are eligible to be populated. We can see this if we query the view V$IM_COLUMN_LEVEL. In this case all of the columns are eligible and use the default compression level. We'll refer back to this view shortly. Now we'll run a query to verify that we can access the CUSTOMER table in the IM column store: We can see from the execution plan and the statistics that we accessed the CUSTOMER table from the IM column store. I have highlighted the key line in the execution plan and a couple of the key statistics. Now let's exclude a column from the CUSTOMER table. I'm going to ALTER the table NO INMEMORY and then ALTER it INMEMORY with the C_NATION column excluded (i.e. NO INMEMORY). This will re-populate the CUSTOMER table without the C_NATION column. It is possible to just issue the ALTER command with the NO INMEMORY column exclusion, but then you don't get the benefit of memory savings until all of the IMCUs have been repopulated based on changes, and that could take a very long time, if ever. It will depend on how much of the table is modified, and in the meantime Database In-Memory will treat the column as if it was not populated, so it just makes sense to remove it from the IM column store first. Once we've verified that the CUSTOMER table has been fully populated let's run the same query that we ran above: And we see that since Oracle Database knows that the C_NATION column has not been populated the Optimizer does not even consider an in-memory access. Instead it chooses TABLE ACCESS FULL and we see from the statistics that we did not access the IM column store. We can also see that the column has been excluded in the V$IM_COLUMN_LEVEL view: And finally, let's run a query that doesn't access the C_NATION column just to be sure that we can still access the rest of the table in the IM column store: You can see that we are back to using TABLE ACCESS INMEMORY FULL and that the statistics show us that we did access the data in the IM column store. If you are curious what query I'm using to display the execution plan and session statistics I've listed it below: And now you have a little more detail about how Database In-Memory works when columns are excluded from the IM column store.
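For completeness, a minimal sketch of the steps described above (the table and column names come from this post; the syntax is how I would expect it, so verify it against your release):

-- Remove the table from the column store, then mark it INMEMORY again without C_NATION
ALTER TABLE customer NO INMEMORY;
ALTER TABLE customer INMEMORY NO INMEMORY (c_nation);

-- Verify which columns are populated and at what compression level
SELECT table_name, column_name, inmemory_compression
FROM   v$im_column_level
WHERE  table_name = 'CUSTOMER'
ORDER  BY segment_column_id;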

Using In-Memory Dynamic Scans – Part 2

In-Memory Dynamic Scans (IMDS) is a new feature in Oracle Database 18c that allows parallelizing In-Memory table scans without having to use Parallel Query (PQ). However, it does work with PQ just fine. I blogged about IMDS here and I thought it was worth following up with more details since this is such a powerful feature. As a quick review, IMDS is able to parallelize In-Memory table scans using lightweight threads. This feature is used in conjunction with the Resource Manager, and this is required because Resource Manager is what determines how many threads will be used based on CPU count and the current load on the system. In the previous blog post I touched upon the characteristics of an In-Memory query and how parallelism is a key method to speed up those queries. In-Memory queries are typically gated on CPU resources and if there is sufficient CPU capacity, then IMDS is a great feature to use to further increase Database In-Memory query performance because it is dynamic and can automatically increase or decrease its parallelism based on query need and CPU capacity. Since IMDS only provides its parallelization benefit during an in-memory scan (i.e. TABLE ACCESS INMEMORY FULL), PQ is still needed if other parts of the query can also be parallelized. In this way the two features are complementary and can work together. The key statistic in determining whether IMDS was used is "IM scan (dynamic) multi-threaded scans". I showed in the previous post that there are many "IM scan (dynamic) …" statistics but if "IM scan (dynamic) multi-threaded scans" is greater than 0 then the query benefitted from IMDS. Since IMDS is controlled by the Resource Manager, the actual number of lightweight threads used by the query can fluctuate. This makes IMDS dynamic, and the easiest way to see how much work was processed by IMDS is to compare the session statistics "IM scan (dynamic) rows" and "IM scan rows valid". This compares the number of rows scanned with IMDS versus the total number of valid IM rows scanned. In the previous post I listed an example query with the session statistics for a query of the LINEORDER table where it performed an aggregation on the quantity attribute (i.e. sum(LO_QUANTITY)). Below is the example with the statistics: I have highlighted three of the statistics, and we can see that "IM scan (dynamic) multi-threaded scans" is 1 so we used IMDS, and then we see that the "IM scan (dynamic) rows" is essentially the same as "IM scan rows valid". This query scanned all of the in-memory rows using IMDS. At this point you might be wondering which operations benefit the most from IMDS. In general, aggregation functions, group by aggregations and expensive filters will benefit the most from IMDS. It is important to realize that IMDS will only speed up the actual scan of an in-memory object. This is because IMDS is parallelizing the actual scan of the IMCUs of the populated object in the IM column store. It won't speed up the other parts of the query, which is why Parallel Query used with IMDS can provide the best query performance and dynamic resource usage. The following example illustrates the dynamic nature of IMDS. It starts out with IMDS and then turns itself off. The following is the query execution: And the key session statistics that were generated by running the query: Notice that "IM scan (dynamic) multi-threaded scans" is greater than 1 so IMDS is enabled, but the "IM scan (dynamic) rows" is only a fraction of the total number of rows scanned in-memory (i.e.
"IM scan rows valid"). In fact, it is only about 3% of the total which means that execution of the query started out using IMDS, but was dynamically stopped when it was internally determined to be costlier than a traditional in-memory scan. This query was run in parallel with a DOP of 8: The execution plan also shows that not only does IMDS work with Parallel Query it can also take advantage of other Database In-Memory optimizations like In-Memory Aggregation. IMDS can also be used in a RAC environment. Database In-Memory queries on RAC use Parallel Query to ensure that all of the data on each node can be accessed in-memory. To accomplish this at least one parallel server process is allocated to each node where an IM column store has data for the object. With IMDS enabled dynamic parallelism can be used by each parallel server process in the query. This can provide a very flexible way to enable parallelism in a mixed workload environment without overloading CPU resources. The following is an example that I ran on a three node RAC database. The LINEORDER table has been distributed between all three nodes: The following is the example query execution: The plan output shows us that the query executed in parallel and that it was affinitized for inmemory (which means that it was home location aware and ran fully in-memory across all three RAC nodes): The statistics show us that we performed three multi-threaded scans, one for each of the three RAC nodes, and that we used IMDS to process all of the rows for the query: The really exciting part about this is that with a DOP minimized to just the number of RAC nodes we were able to use IMDS to then provide dynamic parallelism on each of the nodes. This is the best of both worlds! Since IMDS works in conjunction with Resource Manager the risk of saturating the CPU resources is significantly reduced. This means parallelism can be fully utilized with Database In-Memory without causing capacity issues since Resource Manager will protect the nodes from exceeding CPU thresholds. And this is just as true on single instance databases as it is on RAC databases.    

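If you want to make the same comparison in your own session, a simple sketch (the statistic names are the ones discussed in this post) is:

SELECT n.name, s.value
FROM   v$mystat s, v$statname n
WHERE  s.statistic# = n.statistic#
AND    n.name IN ('IM scan (dynamic) multi-threaded scans',
                  'IM scan (dynamic) rows',
                  'IM scan rows valid');
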
How to Determine if Columnar Format on Exadata Flash Cache is Being Used

With Exadata Flash Cache you can easily expand your In-Memory Column Store into Flash and run analytical queries over tens of terabytes of data. In his blog post Andy Rivenes (@TheInMemoryGuy) described in detail the Columnar Formats in Exadata Flash Cache. One question that often arises is: how can I detect that the Exadata Flash Cache and the Columnar Format are actually being used in my analytical queries? As described in the blog post above, the only thing you have to do is enable Oracle Database In-Memory on the database servers by setting INMEMORY_SIZE to at least 100 MB. This also enables the Columnar Cache on Exadata Flash Cache automatically. In the following example we query tables which are not yet populated into the In-Memory Column Store on the database servers. Just run a query on v$sesstat and v$statname to check that the columnar cache is being used by looking at the following statistics. "Cell physical IO bytes saved by columnar cache" shows that only the relevant columns have been sent. "Cellmemory IM scan CUs processed for capacity" reflects the default columnar cache format and indicates that only a small portion of the Column Units (CUs) had to be scanned. It is important to know that segments have to be scanned at least twice an hour for them to be eligible for the Columnar Cache; in this case the same rule as for the AUTOKEEP pool applies. Note that this is also applicable to partitions, subpartitions and materialized views. Also remember that with the release of Oracle Exadata System Software 18c (18.1.0) the limitation of only supporting HCC formatted data for the Oracle Database In-Memory columnar format was removed, and now uncompressed tables and OLTP compressed tables can also benefit from the Oracle Database In-Memory columnar format. You can also use statistics to verify that segments are fully loaded. For more details see Roger MacNicol's blog post here. In a later blog post we will take a deeper dive into Smart Scan and the invalidation/repopulation of In-Memory store contents.
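As a hedged sketch, a session-level check could look like the following; the statistic names are the ones mentioned above (they appear in lower case in v$statname), and the query itself is illustrative:

SELECT n.name, s.value
FROM   v$sesstat s, v$statname n
WHERE  s.statistic# = n.statistic#
AND    s.sid = SYS_CONTEXT('USERENV', 'SID')
AND    (n.name = 'cell physical IO bytes saved by columnar cache'
        OR n.name LIKE 'cellmemory IM scan CUs%')
ORDER  BY n.name;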

The In-Memory Column Store

The In-Memory column store (IM column store) is a store and not a cache. We've talked about this in various blog posts:

In-Memory Column Store versus the Buffer Cache
Memory Usage with Oracle Database In-Memory
Ask TOM Office Hours Demo – Why Are Column Store Scans Faster Than Row Store Scans
Oracle Database In-Memory Population

This topic came up again recently when a colleague was speaking with a potential customer. I know from personal experience that many times when I'm speaking with customers, or at a conference, I find that people think that Database In-Memory is similar to the buffer cache. I have even heard other speakers refer to the IM column store as a cache, and I always cringe since I think we've worked hard at trying to convey that the IM column store is not a cache. I'm sure part of the problem is the idea of "in-memory" in general. I think most people think of in-memory computing as being another way to "cache" an entire working set. What do I mean when I say that the IM column store is not a cache? Database In-Memory will populate an entire segment, assuming there is enough space, in the IM column store and that object won't be removed unless the instance goes down, an ALTER TABLE … NO INMEMORY command is issued, an ADO policy acts on it, or AIM decides to evict it. This is fundamentally different than how a cache works, a good example being the database buffer cache. As Maria described in her In-Memory Column Store versus the Buffer Cache post, "Standard caches such as a buffer cache rely on non-uniform access patterns that are common in OLTP applications. Some rows within a table are accessed much more frequently than other rows. That is why you can have a buffer cache that is 10% the size of a table but that captures 95% of the random (OLTP) accesses to the table." However, that's not how Database In-Memory works. With Database In-Memory an entire object is populated into the IM column store. It is possible that the object being populated won't fit, in which case subsequent queries will access the data that has been populated in the IM column store and then the rest of the data from the row store. But the intent is that the entire object will be populated and will stay populated. The reason for this is that Database In-Memory is targeted at analytic queries and analytic queries need to access all of the values in the columns accessed by the query in order to look for patterns or trends in that data. Since all of the column values will be accessed it makes sense that the best performance will be obtained by having all of the data in-memory. This is fundamentally different than how a cache works. When an object is populated into the IM column store it is changed from a row format into a columnar format, it is compressed for optimized scan performance and In-Memory storage indexes are created. This allows access to the object in the IM column store to take full advantage of all of the features of Database In-Memory like predicate push down and aggregations, SIMD vector processing, hash joins with Bloom filters, In-Memory Aggregation, In-Memory Expressions, Join Groups, and all of the other great features that have been added to speed up analytic queries in Database In-Memory.

Automatic Data Optimization for Database In-Memory

Automatic Data Optimization (ADO) and Heat Map were first introduced in Oracle Database 12.1. Both Gregg Christman (@aco_gregg) and I have blogged about Heat Map and ADO as far back as 2014 when the features first appeared. That seems like ancient history now! In Oracle Database 12.2 we introduced support for ADO and Heat Map with Database In-Memory. The same basic infrastructure was used for both versions of ADO, that is, for on-disk segments and in-memory segments. With in-memory though, we were more concerned with managing the column store rather than strictly compressing segments or tiering them to secondary storage. ADO for Database In-Memory introduced three new policies that only apply to in-memory enabled objects:

SET INMEMORY – sets the INMEMORY attribute
MODIFY INMEMORY – modifies the compression level
NO INMEMORY – sets the NO INMEMORY attribute

Each of these policies acts on one segment and only runs once (i.e. they are segment level policies). ADO for Database In-Memory still requires that the initialization parameter HEAT_MAP be enabled (i.e. set to ON), that Database In-Memory is enabled (i.e. INMEMORY_SIZE >= 100MB) and that the COMPATIBLE parameter be set to 12.2.0 or higher. Let's take a look at an example. We're going to cheat a bit and change how often ADO evaluates policies. By default, it evaluates policies on a daily basis during the maintenance window, but just for this example we're going to change it so that it evaluates policies in seconds. This will allow us to see the effects of our policies without having to wait for a full day. You can determine whether policies are being evaluated in days or seconds (and seconds should really only be used for testing purposes) by querying the view DBA_ILMPARAMETERS and looking at the value for POLICY_TIME (i.e. 0 equals days and 1 equals seconds). As part of the DBMS_ILM_ADMIN package there is the constant ILM_POLICY_IN_SECONDS that can be passed to the CUSTOMIZE_ILM procedure to change policy evaluation from days to seconds. This allows us to set the policy evaluation to seconds with the following: With that changed we can create a policy that will evict the SUPPLIER table after 14 days (seconds in our example) of no access. Does this make sense? Well maybe not for the SUPPLIER table since it is an important dimension table in our SSB schema, but in practice it may be very appropriate to evict a table from the IM column store after 14 days of not being accessed. Just think, if your critical business cycle is one week then having objects "linger" in the column store occupying valuable DRAM memory is a luxury that you may not be able to afford. Especially if the object hasn't been accessed in two weeks. But wait, let's get a little more sophisticated. First, we'll compress the object to a higher compression level once it's no longer being modified and then we'll evict it once it is no longer being accessed. This will require two policies, one for compression to a higher level and then one to evict it when it has not been accessed. Initially our IM column store is populated with the following objects: To create a compression policy that will compress the SUPPLIER table to a higher compression level (i.e. MEMCOMPRESS FOR CAPACITY HIGH) after 3 days of no modification we need to run the following SQL: We can then query the USER_ILMOBJECTS and USER_ILMDATAMOVEMENTPOLICIES views to see how the policy has been created: The policy will run automatically during the maintenance window, but let's force a run to see what the results will be: To run the policy, I invoked the EXECUTE_ILM procedure which is part of the DBMS_ILM package. I then queried the USER_ILMEVALUATIONDETAILS view and we see that policy P1 was selected for execution. Now let's take a look at the results of running the policy: We see that the INMEMORY_SIZE for the SUPPLIER table has gone from 164,429,824 bytes (see the initial size in the query of initially populated objects at the beginning of the post) to just 55,377,920 bytes and that the INMEMORY_COMPRESSION attribute has been changed from "FOR QUERY LOW" to "FOR CAPACITY HIGH". Next, I will create a policy to evict the object when it has not been accessed in 14 days: Running this SQL results in a second policy called P2: Notice that our first policy, P1, has been disabled (i.e. ENABLED = NO) since it has already been run. Recall that segment level policies only run once. Now let's run our second policy and see what happens. Remember that we're invoking the policies manually, but normally these policies would be run automatically during the maintenance window. You can see that policy P2 has been selected for execution. Let's take a look at the results: We can see that the SUPPLIER table has been evicted from the IM column store since it is no longer populated and that the INMEMORY attribute has been set to DISABLED. What really happens behind the scenes is that the table is altered by the policy invocation and then the normal Database In-Memory processing occurs to remove the SUPPLIER table from in-memory, just like any ALTER TABLE … NO INMEMORY command would. And there you have it. An example of using ADO policies on an in-memory enabled segment to manage that object's in-memory life cycle automatically based on Heat Map heuristics. If you're interested in more possibilities for automating the contents of the IM column store then check out our post on Automatic In-Memory (AIM) which is available in Oracle Database 18c.
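To make the steps above easier to follow, here is a hedged sketch of the ILM calls and the two policies; the exact policy syntax and the DBMS_ILM/DBMS_ILM_ADMIN signatures are how I recall them, so check the documentation for your release:

-- Evaluate ADO policies in seconds instead of days (testing only)
EXEC dbms_ilm_admin.customize_ilm(dbms_ilm_admin.POLICY_TIME, dbms_ilm_admin.ILM_POLICY_IN_SECONDS);

-- Policy P1: raise the compression level after 3 "days" of no modification
ALTER TABLE supplier ILM ADD POLICY
  MODIFY INMEMORY MEMCOMPRESS FOR CAPACITY HIGH
  SEGMENT AFTER 3 DAYS OF NO MODIFICATION;

-- Policy P2: evict the table after 14 "days" of no access
ALTER TABLE supplier ILM ADD POLICY
  NO INMEMORY SEGMENT AFTER 14 DAYS OF NO ACCESS;

-- Normally the policies run in the maintenance window; force an evaluation for the demo
DECLARE
  v_task NUMBER;
BEGIN
  dbms_ilm.execute_ilm(owner => USER, object_name => 'SUPPLIER', task_id => v_task);
END;
/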

Automatic In-Memory

Automatic In-Memory (AIM) was introduced as part of Database In-Memory in Oracle Database 18c and allows automatic management of the contents of the IM column store. This feature builds on the underlying data of Heat Map, a feature introduced with Automatic Data Optimization (ADO) in Oracle Database 12.1. Support for Database In-Memory was added in Oracle Database 12.2 and now AIM takes the concepts of ADO even further. When AIM is enabled, if the size of the objects that have been enabled for in-memory exceeds the size of the IM column store then the least active populated object(s) will be evicted to make room for more frequently accessed in-memory enabled objects. The really cool part of this is that AIM uses access tracking, column and other relevant statistics to determine which are the least active objects. No guessing required! AIM requires that the initialization parameter INMEMORY_AUTOMATIC_LEVEL be set. There are three values for the parameter:

• OFF (default)
• LOW
• MEDIUM

Key to the function of AIM is the concept of "memory pressure". AIM will only kick in when there is not enough space to populate the object. Prior to AIM, it was possible to run out of IM column store space when populating an object. The great thing about Database In-Memory is that even if an object is only partially populated, queries that access the object in the IM column store won't fail. They will access the portion of data that is populated and then get the rest of the column data from the row-store. Obviously, this is not going to provide the best performance and that's where AIM kicks in. With AIM enabled that object can be fully populated because one or more objects with the least activity will be evicted to make room. Let's take a look at AIM in action. First let's verify that we have enabled AIM: Next let's look at the tables involved in our SSB schema with INMEMORY status and priority listed: Here is the current population of objects in the IM column store: And the total space consumed in the IM column store: Now we will access TABLE3 which is enabled for INMEMORY with a priority of NONE: This results in an alert that there is not enough memory to populate the table: Let's take a look at V$IM_SEGMENTS and see how much of TABLE3 was populated: Surprise! We see that TABLE3 has been fully populated, but now TABLE1 is missing. It is no longer populated in the IM column store. Let's see if we can figure out what happened. With the introduction of AIM two new views have been added. These views provide information about the tasks performed by AIM and the details associated with each task. The following shows the DBA_INMEMORY_AIMTASKS view. This view provides information about AIM management tasks: The task that we're interested in is the last one, task_id 192. We can take a look at this task's details by querying the view DBA_INMEMORY_AIMTASKDETAILS: You can see that TABLE1 was evicted as part of the AIM task which freed up room for TABLE3. The LOW value for the parameter INMEMORY_AUTOMATIC_LEVEL specifies that when the IM column store is under memory pressure the least active segment(s) will be evicted. As I said earlier, the determination is made based on access tracking, column and other relevant statistics, and since we never accessed TABLE1 in this example it was obviously a candidate to be evicted. Only objects with an INMEMORY PRIORITY of NONE are eligible to be evicted so if you have objects that you want to remain populated just set a priority for them.
AIM provides a powerful tool to automate the population of objects in the IM column store that will get the most benefit from Database In-Memory. Obviously if you have enough memory in the IM column store for all of your in-memory candidate objects then you don't need AIM. However, for most customers this is not the case and AIM makes great sense for those customers. The other thing to remember is that AIM only takes action when there are more objects enabled for INMEMORY than can fit in the IM column store, and the default is that the feature is disabled. You have to explicitly enable it to take advantage of AIM.
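For reference, a hedged sketch of enabling AIM and looking at its decisions (the parameter and view names are the ones described above; I use SELECT * since the exact column lists can vary slightly by release):

-- Enable Automatic In-Memory at the LOW level
ALTER SYSTEM SET inmemory_automatic_level = 'LOW' SCOPE=BOTH;

-- What has AIM been doing?
SELECT * FROM dba_inmemory_aimtasks ORDER BY task_id;

-- Why was a particular segment evicted or populated? (bind the task id you are interested in)
SELECT * FROM dba_inmemory_aimtaskdetails WHERE task_id = :task_id;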

Questions You Asked: How much space are objects using in the In-Memory column store?

Recently the question, "How much space are objects using in the In-Memory column store?" has come up a couple of times so I thought it would be worth revisiting. I blogged about space usage in the In-Memory column store (IM column store) in the blog post "What is an In-Memory Compression Unit (IMCU)?". In that blog post I talked about the two pools that compose the IM column store, the 1MB pool for IMCUs and the 64KB pool for Snapshot Metadata Units (SMUs) and other metadata. Of course, now all of that information is also documented in the Database In-Memory Guide which first became available in the 12.2 Oracle Database documentation. The following image is from the Database In-Memory Guide: I think this image shows a nice representation of the In-Memory Area with both IMCUs and SMUs. It is worth repeating that an object populated into the IM column store is made up of one or more IMCUs, and an IMCU is made up of one or more 1MB extents taken from the 1MB pool. Each IMCU has a corresponding SMU which is made up of one or more 64KB extents taken from the 64KB pool. It is also worth noting that you cannot control the allocation of the 1MB and 64KB pools. You can only allocate the overall size of the IM column store with the INMEMORY_SIZE initialization parameter. In addition to IMCUs and SMUs, 12.2 introduced In-Memory Expressions and Join Groups. When an In-Memory Expression is created space is allocated from the 1MB pool for In-Memory Expression Units (IMEUs). When Join Groups are created space for the common dictionary is allocated from the 64KB pool. In Oracle Database 18c, In-Memory Optimized Arithmetic will use additional space when enabled because both the traditional Oracle NUMBER data type and the new In-Memory optimized NUMBER type will be stored in the IM column store. It is possible to query information about these pools by querying the dynamic performance view V$INMEMORY_AREA and to see the space used by an individual object by querying the dynamic performance view V$IM_SEGMENTS. As an example, let's take a look at a database where we will create and populate just one table: Let's take a look at the V$INMEMORY_AREA  and V$IM_SEGMENTS dynamic performance views. Note that only one table, IMSPACE, has been populated. I did this to make it easier to see the space used by that object in the IM column store: We can see that 1048576 bytes, or one 1MB extent, has been used from the 1MB pool. This means that we have one IMCU with one 1MB extent. We can also see that we have used 262144 bytes from the 64KB pool or 4 64KB extents for the SMU associated with this table and its single IMCU. It is worth noting that a segment must be greater than 64KB in size to be eligible for population in the IM column store. In addition, the minimum size of an object populated in the column store is 1MB (i.e. one IMCU made up of one 1MB extent) plus any 64KB extents needed for corresponding SMU and other metadata. It is also worth mentioning that since SMU(s) include various metadata about a corresponding IMCU, and since this metadata includes information like the transaction journal, it is entirely possible that overall metadata space usage will grow based on workload like DML. What this means in practice is that the querying of the Database In-Memory dynamic performance views is very much a snapshot in time and can vary based on the workload.    
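If you want to look at the pools and per-object usage yourself, a minimal sketch of the two queries discussed above is:

-- Overall 1MB and 64KB pool allocation and usage
SELECT pool, alloc_bytes, used_bytes, populate_status
FROM   v$inmemory_area;

-- Per-object space in the IM column store
SELECT owner, segment_name, inmemory_size, bytes, bytes_not_populated
FROM   v$im_segments;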

Ask TOM Office Hours Demo – Configuring the IM Column Store

I've created a video of the demonstration that we used in the Ask TOM Office Hours session on configuring the In-Memory (IM) column store. Maria Colgan (@SQLMaria) joined me for the demo and in it we review allocating the IM column store, populating the LINEORDER table and reviewing how much space has been allocated and used. This updates information that we wrote about quite a while ago in our Getting Started series with Installing and Enabling Database In-Memory and In-Memory Population. In this video we also talk about how Database In-Memory works in a Multitenant environment, which we originally covered in this blog post, and how compression affects the size of objects in the column store. We covered compression in the In-Memory Population blog post and in another post where we discussed Advanced Row compression and In-Memory compression. We're going to make this an ongoing video series covering the many aspects of Database In-Memory since seeing the feature in action tends to be more intuitive than just reading about it. After all, isn't a picture worth a thousand words? The video is available by clicking on the image below and is located on the Database In-Memory YouTube channel. While you're there check out some of the other videos we have available as well.

New in 18c: Supercharging In-Memory Queries Using In-Memory Dynamic Scans

In-Memory Dynamic Scans (IMDS) is a new feature in Oracle Database 18c that allows parallelizing In-Memory table scans without having to use Parallel Query. IMDS is able to parallelize In-Memory table scans using lightweight threads. The feature is used in conjunction with the Resource Manager. This is required because Resource Manager is what determines how many threads will be used based on CPU count and the current load on the system. One of the big advantages of IMDS is that communication between the threads only occurs within the invoking process so there is no external contention. When talking about In-Memory Expressions and Join Groups I usually refer to them as the high-performance features of Database In-Memory. IMDS has the potential for taking performance much further and I dub this feature the “supercharged” feature of Database In-Memory. With Database In-Memory, performance is effectively gated by available CPU and memory bandwidth. There is no I/O involved since the objects are populated in their entirety within the IM column store, and since the IMCUs are read only for analytic queries there is very little serialization that has to take place. This is why I refer to the ultimate Database In-Memory query as being dominated by CPU time. The goal is that all of the query time is spent scanning and filtering data in the IM column store. The following SQL Monitor active report illustrates this by showing 100% of the activity being consumed by CPU scanning the LINEORDER table in the IM column store: This brings us back to improving the performance of In-Memory queries. IMDS provides the ability to dynamically parallelize queries without having to invoke parallel query and manage its usage across the instance or RAC. In addition, IMDS will dynamically scale based on CPU load, something that parallel query cannot do. This is especially important in mixed workload environments where parallel operations are often avoided because they can require too many CPU resources. Let’s see an example of IMDS at work. We’ll use a very simplistic query for the LINEORDER table to avoid any complications: select sum(LO_QUANTITY) from lineorder; In our first example we’ll disable Resource Manager (remember that Resource Manager must be enabled to use IMDS) and parallelism so that we get a simple serial query: The execution plan shows a serial, in-memory execution: And the session level statistics show the normal IM Scan statistics that we’ve talked about in our other blog posts: Now we’ll enable Resource Manager and see what changes: The execution plan remains the same: And let’s take a look at the session level statistics: Note that we now see a new set of "IM scan (dynamic)" statistics (I truncated the other statistics to highlight these new ones). These indicate that we used IMDS during the query execution. There is also information available in a SQL Monitor active report to identify if IMDS was used. If you click on the binoculars for the table scan operation in a SQL Monitor active report, you can see that a popup window shows that IMDS was used: Does IMDS also work with Parallel Query? Let’s see: We’ve enabled both Resource Manager and Auto DOP. Let’s see if we get a parallel plan: The query did go parallel, with a parallel degree of 16. Let’s see if we took advantage of IMDS: Sure enough we see that we did take advantage of IMDS for the query execution. 
In fact, we can see that we executed 16 multi-threaded scans, one for each of the parallel query processes that ran in the query (see the DOP section in the execution plan notes – and again I truncated the rest of the session statistics). What about performance? I ran the same query on a much larger LINEORDER table with and without IMDS for a serial query to highlight the performance difference: Without IMDS the query took 11.76 seconds and as we can see from the session statistics the query ran in-memory. In fact, the query scanned 10,806 IMCUs and almost 5.7 billion rows. With IMDS enabled the query only took 1.43 seconds as we were able to use up to 80 threads to help speed up, or supercharge, the query. Quite an impressive difference! Also let's not forget that since Resource Manager is managing the CPU workload, this will not affect the overall performance of other, existing workload. It is worth noting that if Resource Manager prevents the use of IMDS because sufficient CPU resources are not available then performance will not be any worse than if the query ran serially. IMDS represents a big step forward in not only supercharging Database In-Memory scan performance but also in dynamically taking advantage of idle CPU resources on the database server to automatically scale performance based on available resources. Stay tuned for more information about IMDS.    
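As a rough sketch of the setup used here (DEFAULT_PLAN is just one way to enable Resource Manager; your environment may already have a plan active):

-- IMDS requires Resource Manager to be enabled
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN' SCOPE=BOTH;

-- Run the test query...
SELECT SUM(lo_quantity) FROM lineorder;

-- ...then look for the new "IM scan (dynamic)" session statistics
SELECT n.name, s.value
FROM   v$mystat s, v$statname n
WHERE  s.statistic# = n.statistic#
AND    n.name LIKE 'IM scan (dynamic)%'
AND    s.value > 0;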

New in 18c: In-Memory Optimized Number Format Speeds Up Arithmetic Operations

One of the new features in Oracle 18c is In-Memory Optimized Arithmetic. In Oracle 18c we introduced an in-memory optimized NUMBER format for fast calculations using SIMD vector processing. For tables compressed with QUERY LOW (which is the default), NUMBER columns can be encoded using this optimized format. Computations are a large portion of analytical workloads and with SIMD vector processing, aggregations and other arithmetic operations can be a lot faster. With the new in-memory optimized NUMBER format you can expect a significant performance boost for arithmetic computations. We have seen up to 9X better performance. The new in-memory optimized NUMBER format is enabled through the INMEMORY_OPTIMIZED_ARITHMETIC initialization parameter. As an example, let's run the following query without using the In-Memory optimized NUMBER format. Now set the initialization parameter INMEMORY_OPTIMIZED_ARITHMETIC to ENABLE (DISABLE is the default) as a user with the necessary privileges: ALTER SYSTEM SET INMEMORY_OPTIMIZED_ARITHMETIC = 'ENABLE' SCOPE=BOTH; Now let's run the query again to benefit from the In-Memory optimized NUMBER format: As you can see, in this example the speedup is about 9X. Note that there is some space overhead when using the in-memory optimized NUMBER format, which is why it is not enabled by default. This is because we must store both the original Oracle Database NUMBER data type in the column store, as well as the new In-Memory optimized NUMBER type as an additional In-Memory Expression column (In-Memory Expressions were introduced in Oracle Database 12.2). This space overhead depends on how many NUMBER columns exist in the table and can typically be around 10-15%. However, this overhead may be an acceptable tradeoff in exchange for much faster arithmetic aggregations.
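A hedged sketch of a before/after test might look like the following; the LINEORDER columns come from the SSB-style examples used elsewhere on this blog, and existing segments pick up the optimized encoding as they are repopulated, which is why I force a repopulate here:

-- Baseline: optimized NUMBER format disabled (the default)
ALTER SYSTEM SET inmemory_optimized_arithmetic = 'DISABLE' SCOPE=BOTH;
SELECT SUM(lo_quantity), SUM(lo_ordtotalprice) FROM lineorder;

-- Enable the in-memory optimized NUMBER format and repopulate the table
ALTER SYSTEM SET inmemory_optimized_arithmetic = 'ENABLE' SCOPE=BOTH;
EXEC dbms_inmemory.repopulate(USER, 'LINEORDER');

-- Re-run the aggregation and compare the elapsed times
SELECT SUM(lo_quantity), SUM(lo_ordtotalprice) FROM lineorder;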

The Power of In-Memory Join Groups

Speeding up In-Memory Joins using Join Groups

What are Join Groups?

A new In-Memory feature called Join Groups was introduced with the Database In-Memory Option in Oracle Database 12.2. Join Groups can be created to significantly speed up hash join performance in an In-Memory execution plan. Creating a Join Group involves identifying up-front the set of join key columns (across any number of tables) likely to be joined with by subsequent queries. For example, the following join group object connects the EMP and DEPT tables with their respective deptno join key columns:

create inmemory join group JG (EMP(deptno), DEPT(deptno));

After the creation of the join group, populating the tables involved in the join group will result in a single domain dictionary being built, to which the join keys from the join columns specified in the Join Group are mapped. By pre-defining and storing a single map across the join columns in memory, the process of joining two tables can be simplified: a) dictionary encodings can be leveraged so that comparing two join keys is simply a matter of comparing the respective integer codes in the common domain dictionary to each other, b) the hash table used for joining can be replaced with a faster index-based array structure, and c) the time spent in Bloom filter evaluation can be reduced by pre-computing and caching the hash computations of the join keys during join group construction.

How do you know if Join Groups were leveraged?

Join Groups are currently only leveraged during hash join processing. Additionally, all operators below the hash join must be able to carry/transmit the join key column encodings all the way through the row sources (and not require expanding or decompressing the encodings). For example, if we have a parallel hash join plan, underneath the hash join (on either side) there may be Table Queue (TQ) operators. TQ operators today require encoded values to be decompressed/decoded before messages are transmitted to the receiving end points. In doing so, we lose the shared mapping/codes which Join Groups provide (and which the new Join Group aware hash join processing requires), and have instead the actual join key value/length pairs. So there can be no redistribution happening underneath either side of the hash join. Furthermore, the SQL Monitor report provides two new row source statistics associated with the HASH JOIN operator which can determine if Join Groups are leveraged during SQL execution. These two stats are called "Columnar Encodings Observed" and "Columnar Encodings Leveraged". Both stats are cumulative in that when a HASH JOIN operator is running in parallel, the reported values are the summation of the stats collected from all slave processes involved in executing the HASH JOIN row source. Due to a restriction on SQL Monitor stats currently, these two stats are overloaded for multiple purposes. As they relate to Join Group usage, if a slave process optimizes join processing with Join Groups, then the "Columnar Encodings Leveraged" stat is incremented, while the "Columnar Encodings Observed" stat is not. To summarize, in order for Join Groups to be leveraged for faster hash join processing, the hash join row source must either be evaluated serially or with parallel execution without redistribution. The SQL Monitor report will further show a positive "Columnar Encodings Leveraged" statistic and a non-existent "Columnar Encodings Observed" stat.
When are Join Groups NOT Leveraged?

Join Groups will not always get leveraged during SQL execution for a variety of reasons. The main reason is that the execution plan overall disallows their use, as noted earlier. Listed below are some other reasons explaining why Join Groups won't get used:

The inner (build) table contains multiple duplicate keys (i.e. a many-to-many join)
Not enough memory is available to store the Join Group array
Too many modified/invalid rows in the column store (currently a 12.5% threshold)
Payload columns from the inner (build) table are too large, or the payload record itself is too large
The build side of the join (payload columns + join key) must reside entirely in-memory (no spilling)
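To tie this together, a minimal sketch of creating a Join Group and confirming its definition (using the EMP/DEPT example from above; USER_JOINGROUPS is the dictionary view I would check, with SELECT * to avoid depending on exact column names):

-- Both tables must be populated in the IM column store for the join group to help
ALTER TABLE emp  INMEMORY;
ALTER TABLE dept INMEMORY;

CREATE INMEMORY JOIN GROUP jg (emp(deptno), dept(deptno));

-- Confirm the join group and its columns
SELECT * FROM user_joingroups;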

New in 18c: Introduction to In-Memory External Tables in 18c

On June 10th 2014, Larry Ellison officially launched the new Oracle Database In-Memory option which introduced revolutionary dual-format storage to access the same data as either rows or columns transparently. At OOW that year, two ISVs put in a request for us to also support In-Memory External Tables (IMXT). The use cases of IMXTs include:

Low-value or ephemeral data that the customer doesn't wish to load into Oracle storage but needs to scan repeatedly in a short space of time
Big Data that has been summarized through Map/Reduce or other Hadoop aggregation tools that now needs to be joined to Enterprise Data for reporting
Data that needs to be queried both from the RDBMS side and from Hadoop tools and thus doesn't need to be duplicated again in Oracle storage

Simply integrating Enterprise Data and Big Data will not generate value for a company; instead, advanced analytics that draw on data from both sources are required. Such queries often require an iterative approach as a data analyst explores multiple avenues for revenue enhancement. These analytics need to run within Oracle where the richest set of SQL tools in any ecosystem exist, and they need high performance access to both data sources. Any time external data needs to be queried repeatedly in a relatively short space of time, going out to HDFS or other external disk formats is not cost effective and hence can benefit from Oracle's DBIM technology. Prior to 18.1, there was no way to get fast scans without loading external data into Oracle. One mechanism that people did use to perform repeated loads and high performance scans in prior releases was to define an INMEMORY Materialized View and set the session Query_Rewrite_Integrity parameter to STALE_TOLERATED:

Create Materialized View <imxtmv>
Build Immediate
Refresh Complete On Demand
As Select * From <external table>
INMEMORY;

In this case you would see the query plan say MAT_VIEW ACCESS INMEMORY FULL as its rowsource. Other related requests that I'm aware of include:

In-Memory DBLINKs
In-Memory only Materialized Views

I personally am not aware of any plans to implement these.

What's in Oracle 18.1

In 18.1 we implemented the INMEMORY MEMCOMPRESS clause for External Tables that use the two legacy drivers (ORACLE_LOADER and ORACLE_DATAPUMP). We are still working on adding support for the Big Data drivers (ORACLE_HDFS and ORACLE_HIVE). The INMEMORY clause goes at the end of the table definition in the same section as the Reject Limit clause:
create table s_et(
    s_suppkey            number ,
    s_name               char(25) ,
    s_address            varchar2(40) ,
    s_nationkey          number ,
    s_phone              char(15) ,
    s_acctbal            number ,
    s_comment            varchar2(101)
)
organization external (
  type ORACLE_LOADER
  default directory T_WORK
  access parameters (
    records delimited by newline
    nobadfile
    nologfile
    fields terminated by '|'
    missing field values are null
  )
  location (T_WORK:'supplier.tbl'))
reject limit unlimited
INMEMORY MEMCOMPRESS FOR CAPACITY;

All the DBIM compression levels are supported and have the same meanings as they do with heap tables:

NO INMEMORY
INMEMORY
INMEMORY NO MEMCOMPRESS
INMEMORY MEMCOMPRESS FOR QUERY LOW
INMEMORY MEMCOMPRESS FOR QUERY HIGH
INMEMORY MEMCOMPRESS FOR CAPACITY LOW
INMEMORY MEMCOMPRESS FOR CAPACITY HIGH

If you use 'INMEMORY' on its own you will get 'INMEMORY MEMCOMPRESS FOR QUERY LOW'.

Table must be fully loaded before use

One key difference with heap tables is that an In-Memory External Table must be fully loaded into memory before a table scan can use it. External Tables do not currently support hybrid In-Memory/On-Disk scans. See below under 'Data Dictionary' for how to use v$im_segments to check its population status.

Session must set Query_Rewrite_Integrity

In-Memory External Tables function like refresh on-demand Materialized Views: once we've populated the In-Memory area we are unaware of any changes to the underlying external files pointed to by the Location clause. Unlike heap tables, External Tables don't generally support DMLs so the issue is mostly that one or more of the files may have been replaced with different (newer?) versions. Therefore, since an In-Memory scan could return different results to an external scan we need the user to tell us explicitly that this is OK (just as you have to when using refresh on-demand Materialized Views). This is done by setting:

SQL> alter session set query_rewrite_integrity=stale_tolerated;

For the use cases outlined at the top, this actually provides a real benefit. Let's say, for example, that an external process is producing output in csv format or a map-reduce job is creating HDFS files. The decoupling of the In-Memory snapshot from the disk file allows the external process to complete without interrupting any queries that are running. Then, when the external process is complete, the last step can be to call dbms_inmemory.repopulate to create a new In-Memory snapshot of the external data. Any queries currently running against the old snapshot will hold a read latch on IMCUs and so should complete successfully.

Data Dictionary

The In-Memory attributes of an External Table can be seen in any of six views:

USER_EXTERNAL_TABLES
ALL_EXTERNAL_TABLES
DBA_EXTERNAL_TABLES
USER_TABLES
ALL_TABLES
DBA_TABLES

So for the example table shown above we could see:

SQL> column TABLE_NAME format a10
SQL> select table_name, inmemory, inmemory_compression
  2  from user_tables where EXTERNAL = 'YES'

TABLE_NAME INMEMORY INMEMORY_COMPRESS
---------- -------- -----------------
R_ET       DISABLED
N_ET       DISABLED
S_ET       ENABLED  FOR QUERY LOW

3 rows selected.

We have also enhanced some of the v$ views used by DBIM to include External Tables: v$im_segments and v$im_segment_detail both gain a new column called 'IS_EXTERNAL' which has values TRUE, FALSE.
For example, let's load our example table and also a heap table loaded from it:

SQL> exec dbms_inmemory.populate(USER,'S_ET');
PL/SQL procedure successfully completed.

SQL> exec dbms_inmemory.populate(USER,'SUPPLIER');
PL/SQL procedure successfully completed.

SQL> select SEGMENT_NAME,INMEMORY_SIZE,BYTES_NOT_POPULATED,POPULATE_STATUS,IS_EXTERNAL
     from v$im_segments;

SEGMENT_NAME INMEMORY_SIZE BYTES_NOT_POPULATED POPULATE_STAT IS_EX
------------ ------------- ------------------- ------------- -----
SUPPLIER           2359296                   0 COMPLETED     FALSE
S_ET               2228224                   0 COMPLETED     TRUE

Before querying an In-Memory External Table, you need to check the bytes_not_populated and populate_status columns of v$im_segments to be sure that the table is indeed fully loaded.

Notes:

- The following External Table drivers are currently supported: ORACLE_LOADER and ORACLE_DATAPUMP.
- You have to explicitly invoke DBMS_INMEMORY.(re)populate to update the in-memory contents of the external table.
- Population of the in-memory contents of the external table is serial.
- Only serial scans of an In-Memory External Table are supported; PQ is not yet supported. This is unlikely to be a serious performance impediment since an In-Memory scan is so many orders of magnitude faster than a scan that goes out to an external table.
- Only the MEMCOMPRESS sub-clause of the INMEMORY clause is supported; the PRIORITY, DISTRIBUTE and DUPLICATE sub-clauses are not yet supported.
- Only unpartitioned External Tables are supported; specifying INMEMORY on a partition or on the top level of a partitioned External Table will give an error.

Roger MacNicol
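For illustration, here is a minimal sketch of the refresh cycle described above, reusing the S_ET table from the example (the repopulate call is the one named in the notes; check the PL/SQL Packages and Types Reference for the exact signature in your release):

-- allow queries to use the decoupled In-Memory snapshot
alter session set query_rewrite_integrity=stale_tolerated;

-- initial population of the In-Memory External Table
exec dbms_inmemory.populate(USER,'S_ET');

-- ... the external process replaces supplier.tbl with a newer version ...

-- refresh the In-Memory snapshot from the new file contents
exec dbms_inmemory.repopulate(USER,'S_ET');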


Using Database In-Memory to Supercharge Exadata

I thought it would be a good idea to remind everyone why In-Memory is an excellent complementary feature for Exadata users. I again encountered this question in a recent meeting with a customer: "I have Exadata already, and it is really fast. Do I really need Database In-Memory? Will it still benefit me?" The answer is really simple. The answer is "Yes". The answer has always been "Yes" since the first release of Database In-Memory with Oracle Database 12.1.0.2. There are a number of additional reasons why this is an even bigger "YES" with the new Exadata-specific features in Oracle Database 12cR2 and 18c.

1. Database In-Memory provides the fastest possible analytics on hot data since it leverages superfast local memory on Exadata Database Servers: Memory that is local to the CPUs of the Compute Servers can be accessed with an order of magnitude higher bandwidth than accesses to Storage Servers. So for the super-critical, hottest data needing real-time analytics, you will benefit by populating that data into the In-Memory column store. This will allow you to process that data at the rate of billions of rows per second using SIMD vector instructions for near instant analytic response times. The Exadata X7-2 has between 384GB and 1.5TB of memory per Database Server, and a total of up to 28.5TB of memory per rack, allowing for a very large in-memory column store.

2. Unique In-Memory Fault Tolerance on Exadata Database Servers: On Exadata, a unique feature for the In-Memory column store is In-Memory fault tolerance: all in-memory sections (referred to as In-Memory Compression Units or IMCUs) for a table in the column store can be populated into the memory of two database instances on different database servers in the same RAC cluster. This means that if any instance fails, there is zero application impact from having to repopulate the lost IMCUs since each IMCU is safe on another instance. This mode is enabled by declaring the table to be "INMEMORY DUPLICATE".

3. Full In-Memory Duplication for Fault-Tolerance and Performance: As a special case of the above capability, it is also possible to fully duplicate all the IMCUs of a table on all instances. This provides both availability (since the in-memory contents of the table continue to be available as long as there is at least one surviving instance in the RAC cluster) as well as performance (since all accesses to the IMCUs of the table are local to the instance from which the access is being issued). This mode of full duplication (enabled by declaring the table to be "INMEMORY DUPLICATE ALL") is ideal for smaller reference tables and dimension tables.

4. New from 12cR2 onwards: Database In-Memory expands the column store to much larger Flash, for intermediate data: With the 12cR2 release of Exadata storage software, the same in-memory format that is available on the compute servers is used as the format for the columnar flash cache. This means that an offloaded smart scan running on the storage server will process data using the same SIMD vector optimizations available on the database servers, leading to much faster smart scan performance. See Andy Rivenes' blog post for more details. Since the flash cache capacity of a full rack Exadata X7-2 can be an astonishing 350TB, this means that it is possible to expand your in-memory column store to hundreds of terabytes using Exadata Flash Cache combined with Database In-Memory. The format for in-memory on Flash is referred to as CELLMEMORY.
By default, if the inmemory option is enabled, all HCC compressed tables in 12.2 will also be automatically declared to be in the CELLMEMORY format. Starting with Oracle Database 18c, all tables (not just HCC compressed tables) will be marked with the CELLMEMORY format if the in-memory option is enabled. No user intervention required!

5. New from 12cR2 onwards: In-Memory on Active Data Guard: A unique capability for Database In-Memory on Exadata, starting with Oracle Database 12cR2, is the ability to populate tables into the in-memory column store of an Active Data Guard standby database. This is a huge advantage for Exadata customers: if they run an Active Data Guard standby on Exadata, that standby can be used for real-time reporting on in-memory tables. This is a very powerful and flexible mechanism: the standby database can have a completely different set of tables populated into memory compared to the primary database, and if you are using multiple standby databases, each can have a completely independent in-memory column store. Many different combinations of in-memory tables across primary and standby databases are possible; for instance, recent partitions on the primary and older partitions on the standby, or smaller tables on the primary (where there is also memory demand from OLTP workloads) and larger tables on the standby (which can be exclusively used for analytics), etc. Deciding where to populate the contents of a table or partition is done by the new "INMEMORY DISTRIBUTE FOR SERVICE" syntax; you can use this syntax to populate a table or partition on any database instance (primary or secondary) where the specified service is enabled.

6. New in 18c: Automatic In-Memory: In Oracle Database 18c, it is possible to automatically manage the contents of the column store on Exadata using the new Automatic In-Memory mechanism. Automatic In-Memory is useful when it is difficult to identify an exact set of database segments to populate into the column store, and the total size of potential in-memory candidates exceeds the size of available memory. With Automatic In-Memory enabled, the system will automatically track accesses to the in-memory column store and make room for hotter objects by automatically evicting colder objects.

Conclusion

Exadata is the premier, flagship platform for Oracle Database. Not only does Database In-Memory bring additional analytic workload performance to this amazing platform (just as a supercharger brings additional performance to a high end sports car), it also brings with it advanced and unique functionality specific to Exadata. So the answer to the question "Does Database In-Memory provide additional value if I already have Exadata?" is a resounding "Yes." Get the supercharger for your sports car (or for your more restrained looking sedan or SUV) and have fun at each stoplight!
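For reference, a minimal sketch of the syntax behind points 2, 3 and 5 above, using hypothetical table and service names (see the SQL Language Reference for the full INMEMORY clause):

-- point 2: protect each IMCU by populating it on two RAC instances
ALTER TABLE sales INMEMORY DUPLICATE;

-- point 3: fully duplicate a small dimension table on every instance
ALTER TABLE products INMEMORY DUPLICATE ALL;

-- point 5: populate only where the named service runs (e.g. on an Active Data Guard standby)
ALTER TABLE sales_history INMEMORY DISTRIBUTE FOR SERVICE reporting_svc;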


Oracle Database In-Memory Helps Die Mobiliar Deliver Innovative Product Offerings and Manage Risk

Insurance is serious business that impacts the lives of individuals and helps corporations manage risk. And who better to understand that than Die Mobiliar. Die Mobiliar is a leader in the insurance industry and is the top insurer in Switzerland. It is number one for household contents, business and pure risk life insurance. Die Mobiliar has a home-grown application called RiCo (Risk Controlling) that is used for offering new products, managing risk, as well as for compliance reporting to authorities and auditors. The RiCo system manages customers, contracts, products, tariffs and contractual items. Contract details span more than a decade of history. Die Mobiliar uses Oracle as the data platform, and this is front-ended by a SAS analytics application. One of their key challenges has been to deal with the increasing complexity of analytics queries in the face of increasing data volumes. Sub-two-minute response times for their workloads are a key requirement. A high level of flexibility for adding and changing insurance product offerings is important. These requirements lead to increasing complexity in their analytic queries: increasingly complex filtering, comparison of data sets on the fly, and joining of more data sources. This is where Die Mobiliar turned to Oracle Database In-Memory and realized orders of magnitude faster response times. The Die Mobiliar team leaders had the following to say: "Very complex queries should not take more than two minutes of runtime. That allows me to perform explorative analysis efficiently. With In-Memory I can achieve this in most cases." [Die Mobiliar statistics team leader's perspective] "We aim at a better service for the business. We test increasingly demanding queries. In-Memory gives us new opportunities at the lowest possible effort for all teams." [Die Mobiliar application team leader's perspective] "I can't believe it's already finished." [Die Mobiliar DBA's perspective] Paolo Kreth, head of the data management team at Die Mobiliar, spoke at the Oracle Database In-Memory & Multitenant Virtual Conference (link). See the "Real World Experience with Oracle Database In-Memory" recording. Paolo's presentation can be found here. Paolo also talked to us about how easy it was for Die Mobiliar to leverage Oracle Database In-Memory for performance gains of several orders of magnitude. Watch the video:


Automatic In-Memory, External Table Support and Greater Performance in Oracle Database 18c

Oracle Database 18c is here, and it comes with some great new features for Oracle Database In-Memory which you won't want to miss. In case you haven't already done so, make sure you take a quick look at Dom Giles's blog post to learn about Oracle Database 18c. It provides a primer on all the new innovations in Oracle Database 18c. Now, it has been just over three years since we introduced the ground-breaking Oracle Database In-Memory capability with our dual-format architecture, all seamlessly integrated into Oracle Database. We certainly have not been resting on our laurels since then. In Oracle Database 12c Release 2 (12.2) we greatly enhanced Oracle Database In-Memory: new features to help manage the column store with Automatic Data Optimization, expansion of the column store with Active Data Guard and Exadata Flash Cache, and increased performance with In-Memory Expressions and Join Groups. See this blog post for details. We continue this innovation journey with Oracle Database 18c. Now, with this latest release, you can automatically manage the objects in the IM column store via the new Automatic In-Memory capability. You can also expand the reach of Database In-Memory by running analytic queries on non-Oracle data via the support for external tables. In addition, there are a number of performance enhancements, a couple of which are highlighted in this blog. We're really excited about this release and we will be blogging about all of the new Database In-Memory features in the coming weeks, but here's a teaser to get you started thinking about Oracle Database 18c.

Automatic In-Memory

We introduced Automatic Data Optimization (ADO) for Database In-Memory in Oracle Database 12.2. ADO allowed policy-based management of objects in the IM column store. In Oracle Database 18c we've added a new feature called Automatic In-Memory. This expands on the management of the IM column store by allowing you to automatically manage its contents based on Heat Map data. This is another step on our journey to making the IM column store self-managing. When Automatic In-Memory is enabled, if the size of the objects that have been enabled for in-memory exceeds the size of the IM column store and the population of a new object won't fit, the least active object(s) will be evicted to make enough room to populate the new object. All of this can happen automatically without needing any administrative intervention.

In-Memory External Tables

In-Memory External Tables build on the theme of expanding analytic queries to all data, not just Oracle native data. Oracle Database already supports accessing external data with features like External Tables and Big Data SQL to allow fast and secure SQL query on all types of data. In-Memory External Tables allow essentially any type of data to be populated into the IM column store. This means non-native Oracle data can be analyzed with any data in Oracle Database using Oracle SQL and its rich feature set, and it also gets the benefit of using all of the performance enhancing features of Database In-Memory.

In-Memory Dynamic Scans

Building on what we call the high-performance options of 12.2, In-Memory Expressions and Join Groups, we've added a really innovative feature called In-Memory Dynamic Scans (IMDS). This can supercharge Database In-Memory performance. With IMDS, when additional CPU is available Database In-Memory will automatically parallelize its scans of data in the IM column store.
It does this by using multiple lightweight threads of execution within a process. This allows Database In-Memory to further increase scan performance by utilizing idle CPU resources. All of this is controlled by the Resource Manager so you don't need to worry about impacting the performance of your other database tasks.

In-Memory Optimized Arithmetic

Another performance innovation in Oracle Database 18c is the ability to create In-Memory Optimized Arithmetic encoded numbers. Similar to what In-Memory Expressions did in 12.2 with the storing of pre-computed expressions, In-Memory Optimized Arithmetic builds on the ability to pre-optimize NUMBER format data types in the IM column store. This enables faster arithmetic calculations using SIMD hardware and can significantly improve performance by performing arithmetic operations natively right in the hardware.

Columnar Format Exadata Flash Cache Enhancements

With Oracle Database 18c we now support accessing non-HCC objects in the Exadata Flash Cache in full Database In-Memory columnar format. In 12.2 this was restricted to just HCC objects. We blogged about this new ability here since it is dependent on Exadata storage server software as well. The bottom line is that with Exadata it is possible to easily have a multi-terabyte column store and perform analytic queries on huge sets of data. We have also made a lot of other miscellaneous improvements to make Database In-Memory even better at scanning and filtering data, enabling you to run your analytic queries even faster than before. Oracle Database 18c continues our commitment to innovate and improve the performance of Database In-Memory. In the upcoming weeks, we will be blogging in detail about these new features and more. In the meantime you can start exploring what's new in Database In-Memory by taking a look at the new documentation here.
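As a teaser of how little is involved, here is a hedged sketch of turning Automatic In-Memory on and off; in 18c this is governed by the INMEMORY_AUTOMATIC_LEVEL initialization parameter (check the Database Reference for the values supported in your release):

-- let the database evict colder objects to make room for hotter ones
ALTER SYSTEM SET inmemory_automatic_level = 'MEDIUM';

-- return to manual management of the IM column store (the default)
ALTER SYSTEM SET inmemory_automatic_level = 'OFF';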


Oracle Database In-Memory - Consultation and Advisory Workshops for Customers

Get the Best Out of Oracle In-Memory

Who

Are you an Oracle In-Memory customer but have not yet got around to unleashing its potential for blazingly fast analytics? Do you want to ensure you are targeting the right use case scenarios and getting the best bang for the buck? Do you need a helping hand from an expert to ensure early success of your initial deployment?

What

Oracle In-Memory, introduced in Oracle 12c, can supercharge your reporting and analytics workloads. It is super easy to use and get started with, and requires no changes to your application. However, it is important to target the right use cases and workloads to maximize your performance results. So a little guidance can't hurt.

Our Offer

What better way to do that than have an Oracle In-Memory Product Manager provide personalized guidance? We will set up an online consultation, and can offer a full-day onsite workshop if necessary. At no cost to you! The online consultation or onsite workshop can involve:

· Discussion of your solution objectives and identification of the right use cases and workloads for the best results
· Best practices for deploying Oracle In-Memory

Contact & Next Steps

Interested in unleashing the power of In-Memory? If so, please reach out to the Oracle In-Memory product management team to set up your consultation session. Contact: Raj Rathee, raj.rathee@oracle.com

Disclaimers

Please note that availability of this offer will be dependent on availability of Oracle Product Management resources, and it is being offered at our discretion. We may not be able to meet all requests, especially for the onsite full-day advisory workshop. It is expected that interested customers will have familiarized themselves with Oracle In-Memory, have taken concrete steps to identify target use cases, and have run the Oracle Database In-Memory Advisor.


Lufthansa Keeps Us Safe With Oracle Database In-Memory

I bet you've heard of statistics that show flying is actually safer than driving your car. According to the National Safety Council (link) your odds of dying in a motor vehicle crash are 1 in 114, compared to 1 in 9,821 odds of dying in air and space transport incidents. Funny how these stats don't help one bit when my plane is taxiing down the runway ready to take off! I travel a lot in my job and one of my favorite airlines for jumping across the pond to Europe and beyond is Lufthansa. Till now I loved Lufthansa for the overall service and predictably pleasant experience. Now I have another reason! Recently I learned how Lufthansa goes to remarkable lengths to keep its planes safe and in tip-top condition. Every plane, every 6-10 years, goes through what is called a D-check. This takes 4-6 weeks and 30-50K man-hours of labor. The airplane is nearly completely disassembled and every part is examined to see if it needs repair. The repairs are done by Lufthansa's workshop or subcontractors. As you can imagine, live tracking of all these components is a big deal! Every component needs to be tracked: where it is at any point in time (being transported, being repaired in a workshop, or sitting in the warehouse), when and if it will be useable again, or if not, which other component can be used instead. Various business reports are constantly being run to keep track of it all. Guess what product makes this all possible. Well, no surprise there! It is indeed Oracle Database and Oracle Database In-Memory that runs this tracking database. Mr. Thorsten Pensky, who works for Lufthansa Industry Solutions, an IT provider and subsidiary of Lufthansa Group, presented his experience with Oracle Database In-Memory at Oracle Open World. Mr. Pensky pointed out that he saw query performance increases of up to a factor of 600. Take a look at his presentation. No application changes required. Amazing performance. They even gained storage savings by dropping some indexes. See this video where Mr. Thorsten Pensky summarizes his experience with Oracle Database In-Memory.


Come Learn About Database In-Memory in 2018!

Happy New Year from the Database In-Memory team! We're going to jump right into the new year with lots of new information about Database In-Memory. You can join us at several upcoming events: I will be in Irving, TX on January 17th for an Oracle Master Class. I will be presenting a Database In-Memory Deep Dive session and then a By Example session where you can see Database In-Memory in action. If you're in the Dallas area please plan on coming and saying hello. Registration is available here. Next up will be the RMOUG Training Days 2018 conference (#TD2018) in Denver, CO on February 20-22. Hurry and register, the early bird deadline is January 12th, and this is a really great conference. It is one of the largest independent Oracle user group conferences in the US, and Denver is a great place to visit. I will then head to Norway for the OUGN Spring Seminar (#OUGN18) in Oslo from March 8-10. This is one of the largest European conferences and is sure to be a great place to learn about all things Oracle, and of course, Database In-Memory. And to finish off the month of March I will be back at Oracle HQ presenting at the Analytics and Data Summit 2018 on March 20-22. I'm really excited to be invited back since analytics is what we do! Stay tuned for even more events as we get the word out about how great Database In-Memory is and what you can do with it.


Storing Values Up To 32KB In Size In The In-Memory Column Store

Oracle Database supports storing up to approximately 4000 bytes of data "inline" in the table row, and if greater than 4000 bytes then the data is stored outside the table row (i.e. out-of-line). Since the VARCHAR2 data type is limited to 4000 bytes it is really just LOB data that is affected by this. However, starting in Oracle Database 12c Release 1 (12.1) we introduced the ability to have "Extended Data Types" which support up to 32K byte VARCHAR2 data types. This requires the init.ora parameter MAX_STRING_SIZE to be set to EXTENDED, but the whole in-line versus out-of-line storage issue still applies. So what does this have to do with Database In-Memory, you might ask? The In-Memory column store (IM column store) will only populate data that is stored in-line in the on-disk segment. This means that for columns with data greater in size than 4000 bytes, the on-disk segment only stores a locator to the data, the actual data is stored in a separate LOB segment (i.e. out-of-line), and only the locator will be populated into the IM column store. For most LOB data that was OK because, in general, LOBs aren't really something that you're accessing in analytic queries. But what about JSON documents? We introduced support for JSON in 12.1, and in Oracle Database 12c Release 2 (12.2) we support a binary JSON format in the IM column store to leverage DBIM features to allow high performance analytics directly on the JSON data. You might also be thinking that typically JSON documents can be quite large, and to add them to Oracle Database 12c you've had to use a LOB or a 32K VARCHAR2 because a normal 4000 byte VARCHAR2 was just too small. There are also many customers who would like to scan the JSON documents that they have stored in the database with analytic queries. Can these customers store these documents in the IM column store, even though they are larger than 4000 bytes? The answer is yes. If you do have JSON documents that are larger than 4000 bytes we can still populate them in the IM column store. There are a couple of things that we do specifically to optimize JSON data with Database In-Memory. The first is that when you define JSON columns we will automatically define virtual columns for those JSON columns (subject to the parameter settings listed below) to ensure that if they are populated, they will be populated in a special binary format, similar to what we do with binary XML. We refer to this format as OSON, or Oracle JSON. This allows us to populate up to 32K bytes of OSON data in the column store. We do this by storing the OSON data in an In-Memory Expression Unit (IMEU), the same structure that we use for In-Memory Expressions. This behavior requires that you verify the setting of two init.ora parameters and ensure that each JSON column has an "is json" check constraint:

The use of extended data types must be enabled, so you have to set MAX_STRING_SIZE to EXTENDED
INMEMORY_EXPRESSIONS_USAGE must be set to ENABLE or STATIC_ONLY
When a JSON column is defined, a corresponding is json check constraint must also be defined

The following shows the creation of the J_PURCHASEORDER table from the JSON Developer's Guide (Section 4 Creating a Table With a JSON Column) and the corresponding virtual column definition: That section also shows the insertion of two rows which I do not show here. Since the binary format of JSON (i.e. OSON) is typically much smaller than the text based format, this means that we can populate much larger JSON documents in the IM column store.
This also means that you should be able to run analytic queries on a lot more of your JSON documents. The same applies to In-Memory Expressions (IMEs): if you have large IMEs, that is, greater than 4000 bytes but less than 32K bytes, we can also store those in the IM column store. Section 25 In-Memory JSON Data in the JSON Developer's Guide shows the steps necessary to populate the J_PURCHASEORDER table into the IM column store, and the expected result. The following is the plan I got from using SQL*Plus and the autotrace feature: Note that we have a TABLE ACCESS INMEMORY FULL access path, and in the "Predicate Information" we can see that we're accessing the SYS_IME_OSON_000100000BF73832 column in-memory (from line 2). I have also listed the output from the V$IMEU_HEADER dynamic performance view to show that the 2 column values for the virtual column have actually been populated into an IMEU (and the corresponding object_id to show that it is associated with the J_PURCHASEORDER table): The following chart shows how JSON data is populated in the IM column store: If you're interested in exploring this further you can reference this information and examples in the In-Memory Guide and in the Database JSON Developer's Guide.
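Putting the pieces together, here is a hedged sketch of the setup described above; the DDL is a simplified variant of the J_PURCHASEORDER example rather than the exact statements from the JSON Developer's Guide:

-- prerequisites (note: switching MAX_STRING_SIZE to EXTENDED is a one-way
-- migration with additional required steps; see the Database Reference)
ALTER SYSTEM SET max_string_size = EXTENDED SCOPE=SPFILE;
ALTER SYSTEM SET inmemory_expressions_usage = 'ENABLE';

-- JSON column with the required "is json" check constraint, enabled for In-Memory
CREATE TABLE j_purchaseorder
  (id           RAW(16) NOT NULL,
   date_loaded  TIMESTAMP(6) WITH TIME ZONE,
   po_document  VARCHAR2(32767)
                CONSTRAINT ensure_json CHECK (po_document IS JSON))
  INMEMORY;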


Columnar Formats in Exadata Flash Cache

With the introduction of Oracle Database 12c Release 2 it is now possible to take advantage of the Database In-Memory columnar format, and all of its ground-breaking scan performance, in the Exadata flash cache. Unfortunately there isn't much documentation on this feature (see Section 7.9 Enabling or Disabling In-Memory Columnar Caching on Storage Servers in the Oracle Exadata System Software User's Guide). The following is a description of the different columnar formats available in the Exadata flash cache and how they work with Database In-Memory. Oracle Exadata Storage Server Software release 12.1.2.1.0 introduced the ability to transform Hybrid Columnar Compressed (HCC) data into a pure columnar format in the flash cache. The process to accomplish this rearranges 1 MB worth of HCC data into a true columnar form and stores the page mapping in the flash cache's hash table to tell us where the columns are. It is not uncommon to see a savings of 50-80% in disk I/O from the columnar cache and significant improvements in wall clock time for queries which are cell CPU bound[1]. Many people mistakenly think that this format is the same as the pure in-memory columnar format used by Database In-Memory (DBIM). This is not the case. In fact, the Database In-Memory columnar format was not introduced into Oracle Exadata Storage Server Software until release 12.2.1.1.0, and it is only used if the In-Memory column store has been enabled (i.e. inmemory_size is at least 100MB). If those conditions are met then, for HCC objects only, the format used in the flash cache will be the same as that used for Database In-Memory. There is a two-step process for this to happen. Initially, data from eligible scans will be cached in the 12.1.2.1.0 columnar cache format so that the columnar format is available immediately. Then in the background the data will be rewritten into DBIM format. The background process runs at a lower priority so that the rewrites don't interfere with the main workload. By default the DBIM compression level used is MEMCOMPRESS FOR CAPACITY LOW. The rewrite to true DBIM formats is done so that Smart Scans can take advantage of SIMD vector instructions (Single Instruction Multiple Data values), dictionary encoding, vector group by, and other enhancements only available to the Database In-Memory columnar format. This also ensures that the rest of the query processing on the DB servers fully utilizes the columnar properties of the dataset. With the release of Oracle Exadata System Software 18c (18.1.0), the limitation of only supporting HCC formatted data for the DBIM columnar format was removed, and now uncompressed tables and OLTP compressed tables can also benefit from the DBIM columnar format. Again, the restriction remains that this only works if the In-Memory column store has been enabled; otherwise the 12.1.2.1.0 format is used, and only for HCC objects. Since all of this support is provided by the Oracle Exadata System Software, it is possible for both 12.1.0.2 databases (requires a minimum software version of 12.1.0.2.161018DBBP and the patch for bug 24521608) and 12.2.0.1 databases to take advantage of these features. One of the questions we're often asked is, "How do we take advantage of this new feature?" The good news is that you don't have to do anything to take advantage of it. It is done for you, automatically. Depending on the configuration of your environment, as detailed above, the data from eligible scans will be cached in the flash cache in one of the columnar formats.
It is possible to change the compression used, or to prevent segments from being eligible, with the CELLMEMORY command detailed in Section 7.9 of the Oracle Exadata System Software User's Guide mentioned earlier. One of the other questions we have been asked is, "Will the columnar cached data consume all of the flash cache?" The answer is no. If there is no OLTP workload then the DW workload, including columnar formatting, can consume 100% of the flash cache, but an OLTP workload will push out the DW workload up to at least 50% of the size of the flash cache. This ensures that OLTP workload performance is not sacrificed for DW scans. Also, the DBIM format is used instead of caching the raw blocks so that there is no significant increase in flash usage. In 18.1, reformatting OLTP compressed blocks into DBIM format can significantly reduce the amount of flash needed, as DBIM gets very good compression. The bottom line is that the In-Memory columnar format in Exadata flash cache is one more differentiator that makes Exadata the best platform for Database In-Memory and allows customers to economically support terabytes of columnar formatted data for high performance analytic processing.

[1] This description is taken from Roger MacNicol's blog description in "What's new in 12.2: CELLMEMORY Part 1"
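For completeness, the per-segment CELLMEMORY control mentioned above can be sketched as follows; this is a hedged illustration using hypothetical table names, and the authoritative syntax is in Section 7.9 of the Oracle Exadata System Software User's Guide:

-- cache this table's data in the flash cache in the DBIM columnar format
ALTER TABLE sales CELLMEMORY MEMCOMPRESS FOR QUERY;

-- exclude a table from the columnar flash cache entirely
ALTER TABLE audit_history NO CELLMEMORY;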


What's New in 12.2: In-Memory Expressions, Part 3

In our previous two posts on In-Memory Expressions (IM expressions) we discussed the two types of IM expressions, user-defined In-Memory virtual columns and automatically detected In-Memory expressions. Now let's visit some of the details of IM expressions. Probably the first thing to note is that since IM expressions are stored in the IM column store, their use is going to require additional memory. We can see this when we populate IM expressions. In Part 1 of our series, we populated a user-defined IM virtual column for an expression in the LINEORDER table. Let's take a look at the space this required. Prior to creating the virtual column, the IM column store had these segments and their sizes populated in my test system based on V$IM_SEGMENTS: Notice the INMEMORY_SIZE value for the LINEORDER table. Then we created a virtual column for the expression that we had identified and re-populated the table. The following were the new values for INMEMORY_SIZE in V$IM_SEGMENTS: It seems fairly obvious that if we are going to materialize expressions and store them in the IM column store, they are going to take up additional space. How much will depend on the data created. What's more interesting to note, though, is that IM expressions are stored in a new structure called the In-Memory Expression Unit (IMEU). When an IM expression is created for a table, one IMEU is created for each In-Memory Compression Unit (IMCU). Recall that an IMCU is a read-only container that stores the columnar data for an object, and each object populated in the IM column store will be composed of one or more IMCUs. Just as you can see information about IMCUs in the views V$IM_HEADER and V$IM_COL_CU, there are corresponding views with information for IMEUs called V$IMEU_HEADER and V$IM_IMECOL_CU. Now that we've set up IM expressions, how do we tell which objects have them and whether they're populated in the IM column store? Since IM expressions are based on virtual columns, all of the normal procedures to identify virtual columns apply to IM expressions. In other words, we can use the ALL|DBA|USER_TAB_COLS views to display any virtual columns and their attributes. Let's take a look at what our LINEORDER table looked like from Part 1 of the series: Note the column V1: it is a virtual column and the expression is visible as the default value for the column. Now that a virtual column is defined, how do we know if it is enabled for inmemory? Note that this is where user-defined IM virtual columns and automatically detected IM expressions begin to differ in behavior. To determine if a user-defined virtual column is enabled for in-memory storage we need to query the view V$IM_COLUMN_LEVEL: Note that we see the virtual column V1 in this view. Also note the value of UNSPECIFIED for the INMEMORY_COMPRESSION attribute. I will describe this in more detail below. To determine if an automatically detected IM expression is enabled for in-memory storage we need to query the DBA|USER_IM_EXPRESSIONS view. It shows us the automatically detected IM expressions that are enabled for in-memory storage. The following shows the automatically detected IM expression from Part 2 of our series: Note that the name starts with SYS_IME, and recall that since it is an automatically detected IM virtual column it is implemented as a hidden virtual column. Other behavioral changes to note involve the population of the two types of IM expressions.
The initialization parameter INMEMORY_VIRTUAL_COLUMNS only controls the population behavior of user-defined virtual columns. For automatically detected IM expressions the initialization parameter INMEMORY_EXPRESSIONS_USAGE controls the population behavior. The other big difference is that if a table has user-defined IM virtual columns and it is marked as NO INMEMORY and then re-enabled with INMEMORY, the population is still controlled by the INMEMORY_VIRTUAL_COLUMNS parameter. However, if a table with automatically detected IM expressions is marked as NO INMEMORY and then re-enabled with INMEMORY, the currently defined SYS_IME virtual columns will not be re-enabled. They will have been marked internally as disabled by the NO INMEMORY command. This is done to avoid populating automatically detected IM expressions that might no longer be among the most frequently accessed expressions. It is also possible to view the compression level for user-defined IM virtual columns in the view V$IM_COLUMN_LEVEL. As we noted above, there was a value of "UNSPECIFIED" in the INMEMORY_COMPRESSION field for the V1 virtual column. This means that the table's default MEMCOMPRESS level will be used (i.e. the column compression value was unspecified). If we set the compression to a different level then that level will be used and will be reflected in the view: Note that the initialization parameter INMEMORY_VIRTUAL_COLUMNS is set to ENABLE in this environment. If it were set to MANUAL then we would also have to explicitly enable the column for in-memory. The command would need to be the following: ALTER TABLE lineorder INMEMORY INMEMORY MEMCOMPRESS FOR QUERY HIGH(v1); We've shown how to determine if an IM expression is enabled for in-memory storage. But we asked, is there a way to tell if the IM expression has actually been populated? There is another view, V$IM_IMECOL_CU, that shows the columns populated and the number of IMEUs they occupy. In the case of the V1 IM virtual column from Part 1 we see: This shows us that the V1 IM virtual column occupies 133 IMEUs. Recall that there is at least one IMEU that corresponds to every IMCU. To conclude, this series on In-Memory Expressions has described the two types of IM expressions: user-defined IM virtual columns and automatically detected IM expressions. It has shown the performance potential of one of the high-performance features of Database In-Memory, and how the Optimizer and the ESS help automate the detection of frequently used expressions and their population into the IM column store. Finally, we've described some of the details of IM expressions so you can understand how they are implemented.
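As a quick reference, here are hedged sketches of the kinds of dictionary queries used throughout this post; the column lists are abbreviated and should be checked against the Database Reference for your release:

-- compression level of user-defined IM virtual columns
SELECT table_name, column_name, inmemory_compression
FROM   v$im_column_level
WHERE  table_name = 'LINEORDER';

-- automatically detected IM expressions stored as hidden SYS_IME virtual columns
SELECT table_name, column_name, data_default
FROM   user_tab_cols
WHERE  column_name LIKE 'SYS_IME%';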


Oracle Open World 2017 - It's that time again!

It's almost time for Oracle Open World again. Open World starts on Sunday October 1 and runs through Thursday October 5th. There will be lots of Database In-Memory sessions and hands-on labs, and we will be in the Oracle demo grounds and Engineered Systems showcase as well. Here is a list of sessions to help with your planning.

Oracle Presentations:

HOL7584 - Oracle Database In-Memory Hands-on Lab
Monday, Oct 02, 6:00 p.m. - 7:00 p.m. | Hilton San Francisco Union Square (Lobby Level) - Plaza Room A
Tuesday, Oct 03, 11:30 a.m. - 12:30 p.m. | Hilton San Francisco Union Square (Lobby Level) - Plaza Room A
Wednesday, Oct 04, 1:15 p.m. - 2:15 p.m. | Hilton San Francisco Union Square (Lobby Level) - Plaza Room A

CON6661 - Oracle Exadata: Disruptive New Memory and Cloud Technologies, Monday, Oct 02, 2:15 p.m. - 3:00 p.m. | Moscone West - Room 3014, Juan Loaiza, Senior Vice President, Oracle

CON6583 - Memory Without Bounds: Policy-Based Automation in In-Memory Column Store Content, Monday, Oct 02, 3:15 p.m. - 4:00 p.m. | Moscone West - Room 3010, Andy Rivenes, Product Manager, Oracle

CON6584 - Oracle Database In-Memory Deep Dive: Past, Present, and Future, Tuesday, Oct 03, 11:30 a.m. - 12:15 p.m. | Moscone West - Room 3014, Tirthankar Lahiri, Vice President, Oracle

CON6682 - Revolutionize Analytics with Oracle Database In-Memory, Tuesday, Oct 03, 12:45 p.m. - 1:30 p.m. | Moscone West - Room 3014, Juan Loaiza, Senior Vice President, Oracle

CON6589 - Quick Start Your Oracle Database In-Memory Deployment: Step-by-Step Guide, Wednesday, Oct 04, 1:00 p.m. - 1:45 p.m. | Moscone West - Room 3004, Raj Rathee, Product Manager, Oracle

CON6660 - Applying Oracle Database 12c and Real-World Performance Techniques to SAP, Wednesday, Oct 04, 2:00 p.m. - 2:45 p.m. | Moscone West - Room 3012, Juergen Mueller, Oracle, and Andrew Holdsworth, Vice President, Real World Performance, Oracle

CON6590 - Oracle Database In-Memory: Oracle Experts Answer Your Questions, Wednesday, Oct 04, 3:30 p.m. - 4:15 p.m. | Moscone West - Room 3004

Customer presentations:

SUN4939 - Best Practices for Getting Started with Oracle Database In-Memory 12c, Sunday, Oct 01, 9:45 a.m. - 10:30 a.m. | Moscone South - Room 160, Xinghua Wei, Woqutech

SUN5644 - Oracle Database In-Memory: Adventures with SwingBench TPC-DS, Sunday, Oct 01, 11:45 a.m. - 12:30 p.m. | Moscone South - Room 153, Jim Czuprynski, Vion Corporation

SUN5680 - The Best Oracle Database 12c and 12cR2 Tuning Features, Sunday, Oct 01, 3:45 p.m. - 4:40 p.m. | Moscone South - Room 153, Rich Niemiec, Viscosity North America

CON4349 - Improving Performance with Oracle Database 12c In-Memory Option, Monday, Oct 02, 11:00 a.m. - 11:45 a.m. | Moscone West - Room 3022, Sergiy Smyrnov, Fong Zhuang, Walgreens

CON1967 - Oracle In-Memory Applications for Reduced Latency in Maintenance Processes, Wednesday, Oct 04, 2:00 p.m. - 2:45 p.m., Sonja Meyer, Oracle, Thorsten Pensky, Lufthansa

The OOW Session Content Catalog is now available so take a look and sign up! You can also use Focus on Documents for other In-Memory Technology sessions.


What's New in 12.2: In-Memory Expressions, Part 2

In our previous post on In-Memory Expressions (IM expressions) we described what an IM expression is and showed the potential for additional high-performance acceleration using the feature. We said that there are two types of IM expressions: user-defined IM virtual columns and automatically detected IM expressions, which are stored as hidden virtual columns. This post will focus on automatically detected IM expressions and how they work. In Oracle Database 12c Release 2 (12.2) a new feature called expression tracking was added. You can read more about this feature in an Optimizer blog post available here. The bottom line is that the Optimizer tracks expressions and stores them in a repository called the Expression Statistics Store (ESS). Along with the expression and other information, the number of times the expression was evaluated and the cost of evaluating the expression are tracked. This is exposed in the ALL|DBA|USER_EXPRESSION_STATISTICS view, and Database In-Memory uses this information to determine the 20 most frequently accessed expressions in either the past 24 hours (i.e. the LATEST snapshot) or since database creation (i.e. the CUMULATIVE snapshot). The following is an example for the LINEORDER table in my test system for the LATEST snapshot: Notice that I've highlighted the expression from Part 1 of this series, as it shows up as one of the tracked expressions. In order to populate automatically detected IM expressions the INMEMORY_EXPRESSIONS_USAGE parameter must be set to either ENABLE (the default) or DYNAMIC_ONLY. In my test system it is set to the default value: To capture and populate expressions tracked by the ESS for objects populated in the IM column store we need to run two procedures. The first will "capture" the expressions by identifying the top 20 expressions for objects populated in the IM column store and will create hidden virtual columns for the expression(s) in the appropriate tables. Note that because automatically detected IM expressions are implemented as hidden virtual columns, they must also meet the restrictions for virtual columns. The virtual columns for automatically detected IM expressions will begin with the string "SYS_IME". The following shows the capture for the LATEST, or CURRENT, snapshot (note that the ALL|DBA|USER_EXPRESSION_STATISTICS view uses LATEST and the DBMS_INMEMORY_ADMIN.IME_CAPTURE_EXPRESSIONS procedure uses CURRENT): Note that I first executed the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure to flush the expression tracking data to the ESS. Normally this is done automatically every 15 minutes, but I wanted to make sure that I didn't miss any data. Once this has been run we will see that the top 20 in-memory expressions have been stored in their respective tables as hidden virtual columns. In my test system, the expression for the LINEORDER table that we saw above was identified as one of the top 20 and was added to the LINEORDER table. We can see this in the view USER_TAB_COLS: Notice that a hidden virtual column named SYS_IME000100000025B970 has been created and the DATA_DEFAULT attribute shows the expression. How do we know that the column has been enabled for inmemory? We can query the DBA|USER_IM_EXPRESSIONS views. In this case I queried the USER_IM_EXPRESSIONS view: Now we have to re-populate the LINEORDER table to take advantage of the newly added IM expression.
This will be done as IMCUs are re-populated based on DML changes, or the DBMS_INMEMORY_ADMIN.IME_POPULATE_EXPRESSIONS procedure can be used to force the population of all expressions captured in the latest invocation of the DBMS_INMEMORY_ADMIN.IME_CAPTURE_EXPRESSIONS procedure. Forcing the population of any expressions that were added is a single procedure call (a sketch of the full capture-and-populate cycle appears at the end of this post). Since IM expressions are stored in the IM column store they do consume memory. How much will depend on the data. You can see this in the V$INMEMORY_AREA and V$IM_SEGMENTS views. We will talk about this plus other internal details of IM expressions in the final part of this series.
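A minimal sketch of the capture-and-populate cycle described in this post, using the procedures named above (the snapshot argument and exact signatures should be verified against the PL/SQL Packages and Types Reference):

-- flush expression tracking data to the ESS
exec dbms_stats.flush_database_monitoring_info;

-- create hidden SYS_IME virtual columns for the top tracked expressions
exec dbms_inmemory_admin.ime_capture_expressions('CURRENT');

-- force population of the expressions captured above
exec dbms_inmemory_admin.ime_populate_expressions;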



How to control where objects are populated into memory on RAC

If you recall from part 1 of our blog series on RAC, each RAC node has its own In-Memory column store (IM column store). When a table is populated into memory in a RAC environment it will be automatically distributed across all of the IM column stores in the cluster. That is to say, a piece of the table will appear in each RAC node, effectively making the IM column store a shared-nothing architecture. We have also discussed how you can use Database Services to control where an object would be populated into memory in a RAC environment, as long as the object was manually populated (no INMEMORY PRIORITY specified). Starting in Oracle Database 12c Release 2, it is now much easier to selectively populate objects to specific IM column stores in a RAC environment, regardless of how they're populated, by taking advantage of the new DISTRIBUTE FOR SERVICE subclause of the INMEMORY attribute. The new FOR SERVICE subclause of the INMEMORY DISTRIBUTE clause enables you to control exactly where an object is populated into memory based on where a service is allowed to run. If the service is stopped then the objects distributed for that service will be automatically removed from the IM column store(s). The FOR SERVICE sub-clause was actually introduced to manage where objects should be populated into memory in an Active Data Guard environment (primary or standby). However, this post will focus on the secondary use: controlling the location where objects will be populated into memory in a RAC environment. Let's take a look at an example to see exactly how the new DISTRIBUTE FOR SERVICE clause works. If you prefer to watch a video of the demo just scroll to the bottom of this post. Imagine we have a two-node RAC cluster (RAC1 and RAC2), on which 3 services have been defined:

srvctl add service -db orcl -service IM1 -preferred "rac1"
srvctl add service -db orcl -service IM2 -preferred "rac2"
srvctl add service -db orcl -service IM_ALL -preferred "rac1,rac2"
srvctl start service -db orcl -service "IM1,IM2,IM_ALL"

srvctl status service -db orcl
Service IM1 is running on instance(s) rac1
Service IM2 is running on instance(s) rac2
Service IM_ALL is running on instance(s) rac1,rac2

Now let's assume we would like to populate the CUSTOMERS table into memory on RAC1, the PRODUCTS table on RAC2, and have the SALES table distributed across the IM column stores on both RAC nodes. The new syntax that allows us to do this is really quite simple:

ALTER TABLE customers INMEMORY PRIORITY HIGH DISTRIBUTE FOR SERVICE IM1;
ALTER TABLE products INMEMORY PRIORITY MEDIUM DISTRIBUTE FOR SERVICE IM2;
ALTER TABLE sales INMEMORY PRIORITY HIGH;

Let's now look at the v$ performance views (gv$INMEMORY_AREA and gv$IM_SEGMENTS) to see what's happening inside the IM column stores, across the RAC nodes.

INST_ID POOL       ALLOC_BYTES USED_BYTES POPULATE_STATUS
------- ---------- ----------- ---------- ---------------
      1 1MB POOL     854589440   32505856 DONE
      1 64KB POOL    201326592     917504 DONE
      2 1MB POOL     854589440   70254592 DONE
      2 64KB POOL    201326592    1703936 DONE

As you can see, we have populated data into the IM column store on each of the RAC nodes, but we will need to check gv$IM_SEGMENTS to see which objects have been populated into memory and where.
SELECT v.inst_id, v.owner, v.segment_name name, v.populate_status,
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated
FROM   gv$im_segments v
GROUP BY v.inst_id, v.owner, v.segment_name, v.populate_status;

INST_ID OWNER      NAME                 POPULATE_STAT MB_NOT_POPULATED
------- ---------- -------------------- ------------- ----------------
      2 SH         SALES                COMPLETED              262.968
      1 SH         CUSTOMERS            COMPLETED                    0
      1 SH         SALES                COMPLETED              476.429
      2 SH         PRODUCTS             COMPLETED                    0

The CUSTOMERS table is fully populated in the IM column store on RAC1, while the PRODUCTS table is fully populated in the IM column store on RAC2 (BYTES_NOT_POPULATED is 0). Since we didn't specify a DISTRIBUTE FOR SERVICE clause on the SALES table, it has been partially populated on both (BYTES_NOT_POPULATED is not 0). In order to confirm the SALES table is fully populated we will have to look at a different view called gv$IM_SEGMENTS_DETAIL and compare the number of blocks in memory on each node to the total number of blocks in the SALES table.

SELECT m.inst_id, m.blocksinmem, m.datablocks
FROM   gv$im_segments_detail m, user_objects o
WHERE  m.dataobj = o.object_id
AND    o.object_name = 'SALES';

INST_ID BLOCKSINMEM DATABLOCKS
------- ----------- ----------
      1       33660      94643
      2       60983      94643

As you can see, the SALES table is made up of 94,643 blocks. 33,660 blocks are populated on node 1 and 60,983 on node 2. Adding these values together we see that all 94,643 blocks from the SALES table are populated into memory. 33660 + 60983 = 94643. So, now that we know everything is populated correctly, let's see what happens if we connect to the database via the service IM1 and run a query that accesses all three tables.

SELECT c.cust_city, p.prod_name, SUM(s.amount_sold)
FROM   sales s, customers c, products p
WHERE  s.cust_id = c.cust_id
AND    s.prod_id = p.prod_id
GROUP BY c.cust_city, p.prod_name;

From the execution plan, we can see that the goal is to access some or all of the data in each of the tables from the IM column store, as the access method chosen for each table is TABLE ACCESS INMEMORY FULL.

------------------------------------------------------
| Id | Operation                       | Name        |
------------------------------------------------------
|  0 | SELECT STATEMENT                |             |
|  1 |  SORT AGGREGATE                 |             |
|  2 |   VIEW                          |             |
|  3 |    HASH GROUP BY                |             |
|  4 |     HASH JOIN                   |             |
|  5 |      TABLE ACCESS INMEMORY FULL | CUSTOMERS   |
|  6 |      HASH JOIN                  |             |
|  7 |       TABLE ACCESS INMEMORY FULL| PRODUCTS    |
|  8 |       TABLE ACCESS INMEMORY FULL| SALES       |
------------------------------------------------------

However, when we look at the In-Memory session statistics we can see that only two tables were actually accessed via the IM column store (table scans (IM) is 2) and one table was accessed via the disk (IM scan segments disk is 1).

SELECT t1.name, t2.value
FROM   v$sysstat t1, v$mystat t2
WHERE  t1.name IN ('table scans (long tables)', 'table scans (IM)',
                   'session logical reads', 'session logical reads - IM',
                   'IM scan rows', 'IM scan segments disk')
AND    t1.statistic# = t2.statistic#
ORDER BY t1.name;

NAME                            VALUE
-------------------------- ----------
IM scan rows                  2093832
IM scan segments disk               1
session logical reads          109548
session logical reads - IM      45469
table scans (IM)                    2
table scans (long tables)           1

6 rows selected.

Since the PRODUCTS table is in memory only on the IM2 service, we won't be able to access it from the IM1 service. In reality, if our three tables were going to be used by an application that connects to the database via the IM1 service, we should specify that service in the DISTRIBUTE FOR SERVICE clause on all three tables.
Let's do that now and see what happens to our query.

ALTER TABLE products INMEMORY PRIORITY MEDIUM DISTRIBUTE FOR SERVICE IM1;
ALTER TABLE sales INMEMORY PRIORITY HIGH DISTRIBUTE FOR SERVICE IM1;

Changing the DISTRIBUTE FOR SERVICE clause on the PRODUCTS and SALES tables will trigger those tables to be removed from the IM column store completely and repopulated honoring the new distribution method.

SELECT v.inst_id, v.owner, v.segment_name name, v.populate_status,
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated
FROM   gv$im_segments v
GROUP BY v.inst_id, v.owner, v.segment_name, v.populate_status;

INST_ID OWNER      NAME                 POPULATE_STAT MB_NOT_POPULATED
------- ---------- -------------------- ------------- ----------------
      1 SH         CUSTOMERS            COMPLETED                    0

Once the repopulation completes we will see all three tables fully populated on RAC1.

SELECT v.inst_id, v.owner, v.segment_name name, v.populate_status,
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated
FROM   gv$im_segments v
GROUP BY v.inst_id, v.owner, v.segment_name, v.populate_status;

INST_ID OWNER      NAME                 POPULATE_STAT MB_NOT_POPULATED
------- ---------- -------------------- ------------- ----------------
      1 SH         SALES                COMPLETED                    0
      1 SH         CUSTOMERS            COMPLETED                    0
      1 SH         PRODUCTS             COMPLETED                    0

Now if we try our query again, we will get the same execution plan but the session statistics show we really are accessing all three tables from the IM column store, as expected.

SELECT t1.name, t2.value
FROM   v$sysstat t1, v$mystat t2
WHERE  t1.name IN ('table scans (long tables)', 'table scans (IM)',
                   'session logical reads', 'session logical reads - IM',
                   'IM scan rows', 'IM scan segments disk')
AND    t1.statistic# = t2.statistic#
ORDER BY t1.name;

NAME                            VALUE
-------------------------- ----------
IM scan rows                  5040744
IM scan segments disk               0
session logical reads          109548
session logical reads - IM     109405
table scans (IM)                    3
table scans (long tables)           1

6 rows selected.

Finally, let's see what happens if we stop one of the database services, IM1.

srvctl stop service -db orcl -service "IM1"

Now that the service is stopped, let's check the contents of the IM column stores across the cluster.

SELECT v.inst_id, v.owner, v.segment_name name, v.populate_status,
       SUM(v.bytes_not_populated)/1024/1024 MB_not_populated
FROM   gv$im_segments v
GROUP BY v.inst_id, v.owner, v.segment_name, v.populate_status;

no rows selected

Since we have stopped the service that all of the tables were associated with, they have all been removed from the column store.


What's New in 12.2: In-Memory Expressions

In-Memory Expressions (IM expressions) provide the ability to materialize simple expressions and store them in the In-Memory column store (IM column store) so that they only have to be calculated once, not each time they are accessed. They are also treated like any other column in the IM column store, so the database can scan and filter those columns and take advantage of all Database In-Memory query optimizations like SIMD vector processing and IM storage indexes. The Database In-Memory Guide (in case you missed it, we created a manual dedicated to Database In-Memory in 12.2) defines an In-Memory Expression as "... a combination of one or more values, operators, and SQL or PL/SQL functions (DETERMINISTIC only) that resolve to a value." I often tell people when I'm presenting the feature that "the fastest way to do something is to not do it at all"[1], but the next fastest way to do something is to do it only once. I think that's the easiest way to think about In-Memory Expressions. There are actually two types of IM expressions: a user-defined In-Memory virtual column (IM virtual column) that meets the requirements of an IM expression, and automatically detected IM expressions which are stored as hidden virtual columns when captured. The automatically detected IM expressions are captured in the new Expression Statistics Store (ESS). IM expressions are fully documented in the In-Memory Guide so I'm not going to repeat that documentation here. Instead, let's take a look at what an IM expression is, how to create them, and why IM expressions are one of the high-performance features of Database In-Memory. In this post, we will look at how to create user-defined IM virtual columns and how they can significantly improve query performance. In the next post, we will explore automatically detected IM expressions and the details of how they work. Let's start with an example. In our other blog posts, and in our Hands On Labs, we use a modified Star Schema Benchmark (SSB) schema. Many of the queries that we use have expressions in them. The following is an example:

SELECT lo_shipmode, SUM(lo_ordtotalprice),
       SUM(lo_ordtotalprice - (lo_ordtotalprice*(lo_discount/100)) + lo_tax) discount_price
FROM   LINEORDER
GROUP BY lo_shipmode
ORDER BY lo_shipmode;

The following shows how the query ran on my test system: Notice that the query ran in 2.69 seconds. The execution plan shows in-memory access: And the session statistics confirm that we accessed the IM column store: Notice the following expression in the query:

(lo_ordtotalprice - (lo_ordtotalprice*(lo_discount/100)) + lo_tax)

This expression is simply an arithmetic expression to find the total price charged with discount and tax included, and it meets the criteria that we defined at the beginning of the post. We will create a virtual column and re-populate the LINEORDER table to see what difference it makes. To start we need to check a couple of things. To make sure that we populate virtual columns in the IM column store we need to ensure that the initialization parameter INMEMORY_VIRTUAL_COLUMNS is set to ENABLE or MANUAL. The default is MANUAL, which means that you must explicitly mark the virtual columns as INMEMORY enabled.
In this test environment I have set it to ENABLE to automatically populate any virtual columns in tables we populate into the IM column store: Next, we will add a virtual column to the LINEORDER table for the expression we identified above and re-populate the table: We can verify that the LINEORDER table has been fully populated by querying the V$IM_SEGMENTS view: Now let's re-run our query and see if there is any difference: Notice that now the query runs in only 1.30 seconds! The query ran in 2.69 seconds without the IM expression. The session statistics show that we accessed IM expressions: Notice the statistics that start with "IM scan EU ...". IM expressions are stored in the IM column store in In-Memory Expression Units (IMEUs) rather than IMCUs, and the statistics for accessing IM expressions all show "EU" for "Expression Unit". This simple example shows that even relatively simple expressions can be computationally expensive, and the more frequently they are executed the more savings in CPU resources IM expressions will provide. There's much more to IM expressions and in the next post we will explore automatically detected IM expressions, and then in the final post we will take a look at some of the internal details of IM expressions. [1] I attribute this to Cary Millsap since he was the first person I heard say this phrase in describing Oracle performance.
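For reference, here is a minimal sketch of the steps walked through above, assuming the SSB LINEORDER table from the example. The virtual column name disc_price is purely illustrative, and the parameter change assumes a test system where it can be altered online:

-- Allow virtual columns to be populated into the IM column store (test system assumption)
ALTER SYSTEM SET inmemory_virtual_columns = ENABLE;

-- Add a virtual column for the expression used in the query above (column name is illustrative)
ALTER TABLE lineorder ADD (disc_price AS (lo_ordtotalprice - (lo_ordtotalprice * (lo_discount / 100)) + lo_tax));

-- Force the table to be re-populated so the new column is picked up
ALTER TABLE lineorder NO INMEMORY;
ALTER TABLE lineorder INMEMORY PRIORITY HIGH;

-- Verify that the table is fully populated again
SELECT segment_name, populate_status, bytes_not_populated
FROM   v$im_segments
WHERE  segment_name = 'LINEORDER';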

In-Memory Expressions (IM expressions) provide the ability to materialize simple expressions and store them in the In-Memory column store (IM column store) so that they only have to be calculated...

Oracle In-Memory Toolkit for SAP Applications

If you're running SAP NetWeaver applications and you don't know about the Oracle 12c In-Memory Toolkit for SAP NetWeaver then this post is for you. Based on work done by the Oracle Real World Performance Team, the Oracle 12c In-Memory Toolkit can be used to enable your SAP applications to take advantage of Oracle Database In-Memory on Oracle Database 12c. The toolkit is also available as SAP Note 2351252. The toolkit works with SAP BW as well as  OLTP-like applications, for example SAP ECC and SAP SCM. In the case of SAP BW, the toolkit enables In-Memory for all NetWeaver 7.x versions supported for Oracle Database 12c. There is no need to be on the latest SAP BW version. The toolkit allows you to populate specific SAP BW objects (i.e. DSO, InfoCubes, etc.) into the Database In-Memory column store (IM column store). To replace an existing BWA the toolkit generates In-Memory materialized views. For OLTP-like applications such as ECC and SCM, the toolkit helps to generate and manage the tables for real-time reporting. Once these tables are populated into the IM column store, the real-time reports can be run directly on the OLTP system with no impact on the OLTP workload. Real-time reports in this context mean dashboard type reports. For example, reports that answer questions like what is the revenue to date? Which material has the lowest inventory? How many sales orders were processed this week, this month, this quarter?  For these kinds of reports there is no need to export the data to BW or any other tool! For decades, SAP reports are written with the following methodology: Step 1: Read the entire table(s) of interest into the application server Step 2: On the application server run through FOR LOOPs to apply the report logic within the FOR LOOP and check the logic against the database Step 3: Spool the report This obsolete style of data processing dates back to the days of the mainframe with the goal of saving expensive CPU cycles. It is much more efficient with databases like Oracle to focus on set based processing rather than row by row processing in the application. The Oracle database can process large row sets much more efficiently and this type of processing also leverages all of the optimizations in the IM column store. In short, you should also think about simplifying your old-fashioned SAP reports to fully utilize the performance improvements of Oracle Database and Database In-Memory. The toolkit will generate SQL scripts to enable objects to be populated into the IM column store. The toolkit consists of a tar file that contains SQL scripts and instructions on how to use the toolkit. In addition to that there is also contact information if you need help or want to provide feedback, and a link to an Oracle Learning Library tutorial to help you set up and use the toolkit.  
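To make the contrast concrete, here is a hedged, purely illustrative example; the table and column names below are invented and are not taken from the toolkit. Instead of fetching every row into the application server and looping over it, a single set-based statement lets the database, and the IM column store, do the heavy lifting:

-- One set-based statement replacing row-by-row report logic (names are illustrative)
SELECT material_id,
       SUM(quantity)  AS total_quantity,
       SUM(net_value) AS total_value
FROM   sales_orders
WHERE  order_date >= TRUNC(SYSDATE, 'Q')   -- current quarter only
GROUP  BY material_id
ORDER  BY total_value DESC;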

If you're running SAP NetWeaver applications and you don't know about the Oracle 12c In-Memory Toolkit for SAP NetWeaver then this post is for you. Based on work done by the Oracle Real World...

What's New in 12.2: IM FastStart

In-Memory FastStart was introduced in 12.2 to speed up the re-population of the IM column store when an instance is restarted. IM FastStart works by periodically checkpointing IMCUs to a designated IM FastStart tablespace. This is done automatically by background processes. The motivation for FastStart is to reduce the I/O and CPU intensive work required to convert row based data into columnar data with the associated compression and IM storage indexes that are part of the population process. With IM FastStart the actual columnar formatted data is written out to persistent storage and can be read back faster and with less I/O and CPU than if the data has to be re-populated from the row-store. It is also worth noting that if the IM FastStart tablespace fills up or becomes unavailable the operation of the IM column store is not affected. The IM FastStart area does not change the behavior of population. Priorities are still honored and if data is not found in the IM FastStart area, that is it hasn't been written to the IM FastStart area yet or it has been changed, then that data is read from the row-store. If a segment is marked as "NO INMEMORY" then it is removed from the IM FastStart area. The bottom line is that IM FastStart hasn't changed the way Database In-Memory works, it just provides a faster mechanism to re-populate in-memory data. So how does this work? The first thing you have to do is to enable IM FastStart. You do this by designating a tablespace that you create as the IM FastStart tablespace. It must be empty and be able to store the contents of the IM column store. Oracle recommends that the tablespace be sized at twice the size of the INMEMORY_SIZE parameter setting. On my test system, my INMEMORY_SIZE setting is 1500 MB so I created a tablespace that was 3000 MB: To enable the IM FastStart area I then ran the FASTSTART_ENABLE procedure in the DBMS_INMEMORY_ADMIN package and specified the tablespace name that I created in the previous step: To check the status and size of the IM FastStart tablespace you can query the V$INMEMORY_FASTSTART_AREA view: The actual IM FastStart data is written to a SecureFile LOB in the IM FastStart tablespace. You can display the LOB information from the DBA_LOBS view: And that's all there is to it. We are now using the IM FastStart area for my database. Let's say that my test database is brought down or maybe we alter one or more of the tables that are enabled for "INMEMORY" to "NO INMEMORY". How do we tell that we used the IM FastStart area to re-populate these objects? There is a new system statistic that tells us how many IMCUs were read from the IM FastStart area. The statistic is "IM faststart read CUs requested" and is available in the V$SYSSTAT view. To conclude, IM FastStart provides a new feature to speed the re-population of the IM column store and enable the database to provide the full benefits of Database In-Memory in less time when re-starting a database instance.  
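For reference, a minimal sketch of the setup described above, assuming the 1500 MB INMEMORY_SIZE from my test system; the tablespace name and datafile path are illustrative:

-- Create an empty tablespace roughly twice the size of INMEMORY_SIZE (path is illustrative)
CREATE TABLESPACE im_faststart
  DATAFILE '/u01/app/oracle/oradata/orcl/im_faststart01.dbf' SIZE 3000M;

-- Designate it as the IM FastStart area
EXEC DBMS_INMEMORY_ADMIN.FASTSTART_ENABLE('IM_FASTSTART');

-- Check the status and space usage of the FastStart area
SELECT * FROM v$inmemory_faststart_area;

-- After an instance restart, see how many IMCUs were read back from the FastStart area
SELECT name, value FROM v$sysstat WHERE name = 'IM faststart read CUs requested';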

In-Memory FastStart was introduced in 12.2 to speed up the re-population of the IM column store when an instance is restarted. IM FastStart works by periodically checkpointing IMCUs to a designated IM...

Oracle Database In-Memory

What's New in 12.2: Join Groups

I'm starting a new series called "What's New in 12.2" and I will highlight the new Database In-Memory features in Oracle Database 12c Release 2 as outlined in this blog post here and in the Database In-Memory technical white paper available here. To further improve join performance in 12.2 a new feature called Join Groups was introduced. A Join Group is created explicitly with a DDL statement and tells Oracle that the named columns will be used in joins. When a Join Group is created the compression dictionary entries for the specified columns will be shared and this allows the join to be performed directly on the dictionary entries rather than the data values themselves. This saves CPU by not having to decompress and hash the data to perform the join. This use of a common dictionary requires that the segments be re-populated in the IM column store after the Join Group has been created. When a segment is populated into the IM column store a common dictionary is created for all distinct values for the column in the table. The following illustrates a simple example. Notice that all CUs share the same dictionary entries. With a Join Group defined the dictionary entries for the specified columns are shared between the segments specified. The following illustrates an example of a Join Group and the synchronization of the common dictionary between the two segments. Let's see how we can use Join Groups in a real example. A Join Group is created with a new JOIN GROUP DDL command. The following commands will create Join Groups for our SSB schema: CREATE INMEMORY JOIN GROUP lineorder_jg1 ( lineorder(lo_orderdate), date_dim(d_datekey)); CREATE INMEMORY JOIN GROUP lineorder_jg2 ( lineorder(lo_partkey), part(p_partkey)); CREATE INMEMORY JOIN GROUP lineorder_jg3 ( lineorder(lo_suppkey), supplier(s_suppkey)); Once created with the above SQL the segments then need to be re-populated. You can query the user_joingroups view to see if the Join Group definitions have been created: Now we will run a query to see if we can make use of the Join Groups that we have defined. We'll use the following SQL: SELECT   d.d_year, p.p_brand1, sum(lo_revenue) rev FROM     lineorder l,          date_dim d,          part p,          supplier s WHERE    l.lo_orderdate = d.d_datekey AND      l.lo_partkey   = p.p_partkey AND      l.lo_suppkey   = s.s_suppkey AND      p.p_category     = 'MFGR#12' AND      s.s_region     = 'AMERICA' AND      d.d_year     = 1997 GROUP BY d.d_year, p.p_brand1; In order to determine if we actually used the Join Group(s) we have to check a SQL Monitor Active Report for the query. Using SQL Monitor Active Reports is currently the only way to verify that you used Join Groups in your query. Notice that I have created a SQL Monitor Active Report below and clicked on the eye glasses icon for one of the hash joins (the fourth one on Line 7 in this case) and we can see that "Columnar Encodings Leveraged" is set to 1. This means that we used a Join Group. If it was any other value then we would not have used a Join Group. It is also possible to see "Columnar Encodings Observed" and other non-zero values for "Columnar Encodings Leveraged" and these are used for a new type of hash join that was introduced and is not associated with Join Groups. And there you have it. You can create Join Groups for segments that you know will be joined often. For example, in Data Warehouse environments where you have star schemas and know which columns are used to join dimensions tables with fact tables. 
Just like our example SSB star schema.  
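If it helps, here is a hedged sketch of the verification and re-population steps mentioned above; the USER_JOINGROUPS columns shown are the ones I would expect, so check the reference guide for your release:

-- Confirm the Join Group definitions exist
SELECT joingroup_name, table_name, column_name
FROM   user_joingroups
ORDER  BY joingroup_name;

-- Re-populate the affected segments so the shared dictionaries are built
ALTER TABLE lineorder NO INMEMORY;
ALTER TABLE lineorder INMEMORY;
-- (repeat for DATE_DIM, PART and SUPPLIER, then query the tables to trigger population)

-- Generate the SQL Monitor Active Report to check "Columnar Encodings Leveraged"
SELECT DBMS_SQLTUNE.report_sql_monitor(sql_id => '&sql_id', type => 'ACTIVE') FROM dual;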

I'm starting a new series called "What's New in 12.2" and I will highlight the new Database In-Memory features in Oracle Database 12c Release 2 as outlined in this blog post here and in the Database...

Upto 200X Faster Analytics - Insurance Leader Die Mobiliar Implements Real-Time Risk Analysis With Database In-Memory

Real-time risk analysis is a huge competitive advantage in the insurance industry. Being able to perform risk analysis on live operational data even as transactions are going on is a big deal! It gives the insurer real-time insight, visibility, and agility for the business. Not only does this improve the customer experience, it also allows for real-time up-selling and cross-selling of additional products for new revenue opportunities. Die Mobiliar was able to deploy real-time risk analysis by moving away from the traditional, latency-ridden approach of moving operational data via ETL to a separate data warehouse. They did this by leveraging Oracle Database In-Memory. With Oracle Database In-Memory, Die Mobiliar was able to: deliver on real-time risk analysis; maintain a single copy of data and eliminate the need for new analytic indexes and materialized views; and save valuable resources, since data transformation and manual tuning were no longer needed. Die Mobiliar can now utilize their operational data directly as that data is being generated and, in real time, provide their customers with additional coverage options and discounts, thus opening up new revenue opportunities. For further details on Die Mobiliar, as well as other Database In-Memory customer success stories, please see this link.

Real-time risk analysis is a huge competitive advantage in the insurance industry. Being able to perform risk-analysis on live operational data even as transactions are going on is a big deal! It...

Oracle Database In-Memory

New Database In-Memory Features in Oracle Database Release 12.2

Oracle Database 12.2.0.1 was made generally available on March 1, 2017. (Note that Oracle Database 12.2 has also been available on Oracle Cloud since late September 2016.) Oracle Database 12.2  includes major enhancements to Database In-Memory covering four general areas: Performance, Availability, Capacity, and Ease of Use. This post is a high level summary of these enhancements. 1. Performance: Database In-Memory allows real-time analytics by processing data at the rate of billions of rows per second. With the Oracle Database 12.2 release, the following Database In-Memory features provide further performance improvements for more use cases: a) Join Groups: With Oracle Database 12.2., it is possible to declare that two tables will be joined on a set of columns using a new DDL construct known as a Join Group. Doing so enables a variety of join processing optimizations, such as co-encoding the compressed join columns with the same dictionary so that joins can be performed on compact dictionary codes instead of on values. Join groups have been shown to provide an additional 2-3x performance boost on top of already very fast in-memory joins. b) In-Memory Expressions: While Database In-Memory speeds up analytic data access by orders of magnitude, another cost of Analytic workloads is Analytic data processing or computation. In-Memory Expressions attempt to substantially reduce the cost of analytic data processing by using the column store to materialize the result of commonly evaluated expressions. For instance, if an application frequently computes the net_price as (price * ( 1  - discount) * (1 + tax)) it is possible to declare this net_price expression as an inmemory virtual column. The result of the evaluation is then stored within the column store, and is maintained along with other columns as the base data columns change. In addition to explicitly declared inmemory virtual columns, it is also possible to automatically detect frequently evaluated expressions using a mechanism known as the Expressions Statistics Store. Omitting the repeated reevaluation of complex expressions has been found to improve the performance of complex queries by 5x or more.  c) In-Memory JSON columns: With Database 12.2, JSON values are stored in a highly optimized inmemory format within the inmemory column store.  This allows all the machinery of the inmemory columnar engine to be applied to JSON queries. In-Memory JSON can provide speedups of up to 60x for queries against semi-structured JSON data (e.g. Find all books where books.author.name contains "Kafka" ). 2. Availability: In Oracle Database 12.1, the column store must be rebuilt (populated) on instance restart, although the database is immediately available and queries need not wait for population to complete. While the column store is being rebuilt, queries may encounter some loss of performance. With Exadata and Supercluster, in-memory fault tolerance allows an object to be populated into memory of more than one instance in a RAC cluster (similar to storage mirroring) so that there is no performance downtime while an instance is being restarted. Oracle Database 12.2 provides the following additional enhancements to increase overall column store availability:   a) In-Memory Fast-Start: Ordinarily when an instance is restarted, the inmemory column store must be rebuilt from scratch, a process referred to as inmemory populate. The process of populating the column store can be CPU intensive since it must convert row major data into compressed columnar data. 
With Oracle Database 12.2, the In-Memory Fast-Start mechanism can significantly reduce the total time required for population by keeping a checkpointed copy of the column store on storage (using Secure File lobs preferably on a Flash-based tablespace). As a result when the instance is restarted, the checkpointed copy can be read back into memory without requiring any further transformation of the data. In-Memory Fast-Start can provide a 3x reduction in the time required to instantiate the column store. b) In-Memory on Active Data Guard: With Oracle Database 12.2 on Exadata, Supercluster or on Oracle Public Cloud PaaS offerings (e.g. DBaaS - Extreme Performance, Exadata Cloud Service,  Exadata Cloud Machine), it is now possible to use the Active Data Guard standby for inmemory query processing. In those environments it is possible to designate tables and partitions to be populated into memory on the Active Data Guard standby, resulting in orders of magnitude improvement for analytic queries running on the standby database. In-Memory on Active Data Guard can significantly improve column store availability: When the same objects are  brought into memory on both the primary site and the standby site, inmemory queries can be run on either site regardless of temporary outages of either site. This is effectively another form of in-memory fault-tolerance across the primary and the standby sites.   3. Capacity: The following features in Oracle Database 12.2 help to increase the total effective inmemory columnar capacity,  a) In-Memory on Active Data Guard: The same feature described in 2(b) above can also help to increase the total columnar capacity of a deployment featuring a Primary database and one or more Active Data Guard Standbys. This is because it is possible to populate completely different objects into the column store of a Standby database. For instance, it is possible to populate the partition for the current month of orders in the Primary database, and to populate the partitions for the prior one year of orders into the memory of a Standby database. Queries on the one month partition must run on the Primary database, while queries on the last one year of partitions must run on the Standby database. This mechanism thus provides a much larger effective column store across the two databases. b) In-Memory on Exadata Flash Cache: The 12.2 release of Exadata cell software features a format referred to CELLMEMORY, that is identical to the format of INMEMORY  tables on DB nodes. Using the same inmemory optimized algorithms to format and access data on much larger (10-100x larger) Flash allows a much larger effective inmemory column store capacity for the system as a whole. Very hot tables and partitions that can fit in memory should still be populated into the In-Memory column store on DB nodes, while tables of intermediate activity and very large tables should be kept in the CELLMEMORY format on Storage nodes. As a result, scans offloaded to Exadata storage servers can benefit from the same inmemory optimizations as inmemory queries on the DB nodes. Note that this feature requires the 12.2 release of Exadata Storage Server software, but works with the 12.1 release of the database. Use of the 12.2 release of the database enables additional performance optimizations for offloaded CELLMEMORY scan processing.  4. 
Ease of Use: Database In-Memory is a very easy to use inmemory technology, requiring only that the user specify the INMEMORY attribute on selected tables, partitions, and sub-partitions that are required for real-time in-memory processing. Once an object is no longer required in the column store, it can be marked NO INMEMORY to evict it and make room for more important objects. Even so, further improvements were made in Oracle Database 12.2 so that system administrators can manage the column store even more seamlessly. a) Dynamic In-Memory area: In Oracle Database 12.1, the size of the inmemory area (specified by the parameter inmemory_size) is static and cannot be altered without restarting the instance. In Oracle Database 12.2, the current size of the inmemory area can be dynamically increased without requiring the instance to be restarted. This allows a DBA to accommodate an increase in the total size of the objects that need to be populated into the inmemory column store. b) Automatic In-Memory Policies: Oracle Database 12.1 introduced Automatic Data Optimization to perform automated Information Lifecycle Management (ILM) on database objects. Automatic Data Optimization allows the user to define custom ILM storage-tiering policies, for instance, to change the compression and storage properties of objects as accesses to the objects cool with time. With Oracle Database 12.2, the Automatic Data Optimization feature is extended to allow automated, policy-based management of the inmemory column store. It is now possible to set policies that: increase the inmemory compression level for a table as the table cools down (a table that is heavily modified should be minimally compressed, a table that is heavily read and less frequently modified can be compressed optimally for query processing performance, and a table that is infrequently accessed can be compressed to maximize space savings); or evict a table altogether from the column store after a certain number of days or after a certain period of inactivity. The latter is ideal for sliding window access patterns – for instance a near-term reporting application against data partitioned by week, but that operates only on the last four weeks of activity. Once data crosses the one-month threshold it can be evicted from the column store in order to make room for new data. Using Automatic Data Optimization, therefore, a system administrator needs to perform only the one-time task of setting up the inmemory ILM policies, after which the inmemory column store becomes self-managing. In conclusion, Oracle Database 12.2 provides many improvements with Database In-Memory for Performance, Availability, Capacity, and Ease of Use. We are looking forward to having our customers try it out! 
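To make the two Ease of Use items concrete, here is a hedged sketch; the size, table name, and day counts are illustrative, and the exact ADO policy syntax should be checked against the 12.2 documentation:

-- (a) Grow the In-Memory area online, within the limits of the SGA
ALTER SYSTEM SET inmemory_size = 2G SCOPE=BOTH;

-- (b) Automatic Data Optimization policies for the column store
ALTER TABLE orders ILM ADD POLICY
  MODIFY INMEMORY MEMCOMPRESS FOR CAPACITY HIGH
  SEGMENT AFTER 30 DAYS OF NO MODIFICATION;

ALTER TABLE orders ILM ADD POLICY
  NO INMEMORY SEGMENT AFTER 90 DAYS OF NO ACCESS;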

Oracle Database 12.2.0.1 was made generally available on March 1, 2017. (Note that Oracle Database 12.2 has also been available on Oracle Cloud since late September 2016.) Oracle Database 12.2 ...

Oracle Database In-Memory

Questions You Asked: I installed the latest PSU, why am I still getting this error?

The Database In-Memory team has been busy attending conferences in the last couple of months letting you know what's new with Database In-Memory. These conferences have included Open World, the East Coast Oracle Users Conference (ECO), the Bulgarian Oracle User Group (BGOUG) conference, the German Oracle User Group (DOAG) conference, and the All India Oracle Users Group (AIOUG) conference SANGAM16, and we would like to thank everyone for attending our sessions and stopping by to ask questions. One of the questions that continues to get asked is about Patch Set Updates (PSUs). We're still finding that people are installing PSUs (see MOS Note 854428.1), or even Critical Patch Updates (CPUs), and expecting to get fixes for Database In-Memory. We've blogged about this in the past here, but I think it's worth repeating. Database In-Memory became available in the first Oracle Database 12c Release 1 patch set (12.1.0.2) back in July 2014. Since then we've made numerous enhancements and fixes to the product, but the only way to get all of these changes is to install what is now known as a Database Proactive Bundle Patch (see MOS Note 1937782.1). Many people are then concerned that Database Proactive Bundle Patches are only for engineered systems, and others are concerned that they don't include all of the fixes in PSUs and/or CPUs. As MOS Note 1937782.1 explains, each bundle patch includes all database PSU fixes along with additional proactive fixes, and some of these additional proactive fixes include all of the fixes to Database In-Memory. The bottom line is please apply the latest Database Proactive Bundle Patch when using Database In-Memory. After all, there's no sense in re-inventing the wheel and discovering problems that have already been fixed, or not getting to take advantage of the enhancements that have been made.
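If you are not sure what is installed in a given database, one quick check (hedged, since the exact columns vary a little by release) is to query the SQL patch registry:

-- List the bundle patches and PSUs that have been applied to this database
SELECT patch_id, action, status, description
FROM   dba_registry_sqlpatch
ORDER  BY action_time;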

The Database In-Memory team has been busy attending conferences in the last couple of months letting you know what's new with Database In-Memory. These conferences have included Open World, the...

Oracle Database In-Memory

Analytics In The Cloud

With the introduction of Oracle Database Exadata Express Cloud Service I thought it would be nice to review where Database In-Memory is offered in Oracle's cloud services. The Oracle Cloud not only offers customers an easy and cost-effective way to deploy their applications in the cloud, but it also allows customers to easily try out Oracle options that they might not have otherwise licensed. In the case of Oracle Database Exadata Express Cloud Service it is possible to not only try out Oracle Database 12.2 but also the new 12.2 features of Database In-Memory with the X50IM option. The Oracle Database Exadata Express Cloud Service can be an excellent entry-level environment for a test or development system. Oracle offers three database cloud services that have options for Database In-Memory: Oracle Database Cloud Service (Extreme Performance), Oracle Database Exadata Cloud Service (quarter, half, full rack), and Oracle Database Exadata Express Cloud Service (X50IM option). Currently the Oracle Database Cloud Service and Oracle Database Exadata Cloud Service options support Database In-Memory on Oracle Database 12.1.0.2. The Oracle Database Exadata Express Cloud Service supports Database In-Memory on Oracle Database 12.2 with the X50IM option. So what do the configurations of the different options look like?

Oracle Database Cloud Service (Extreme Performance): Runs 12.1.0.2 with Database In-Memory. The only limitations are how much CPU and memory you purchase.

Oracle Database Exadata Cloud Service: Runs 12.1.0.2 with Database In-Memory. The only limitations are how much CPU and memory is available in each configuration (e.g. quarter, half or full rack of Exadata).

Oracle Database Exadata Express Cloud Service (X50IM option only):
•    Single PDB running 12.2
•    No parallel query supported
•    Maximum 5 GB column store
•    In-Memory Join Groups limited to a maximum of 5 columns per group*
•    In-Memory Expressions: only the STATIC_ONLY and DISABLE settings are supported for the INMEMORY_EXPRESSIONS_USAGE initialization parameter (a hedged example follows below)*

* See the Oracle Database Exadata Express Cloud Service Feature Restrictions and Limitations documentation for more information.
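As a hedged example of the In-Memory Expressions restriction noted in the list above (whether you can change the parameter yourself depends on the privileges granted on the service), the supported settings on the X50IM option would look like this:

-- Only STATIC_ONLY and DISABLE are supported on this service tier
ALTER SYSTEM SET inmemory_expressions_usage = 'STATIC_ONLY';
-- or
ALTER SYSTEM SET inmemory_expressions_usage = 'DISABLE';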

With the introduction of Oracle Database Exadata Express Cloud Service I thought it would be nice to review where Database In-Memory is offered in Oracle's cloud services. The Oracle Cloud not only...

Oracle Database In-Memory

Time to start planning for Oracle Open World 2016

There is less than 6 weeks to go until Oracle Open World, the largest gathering of Oracle customers, partners, developers, and technology enthusiasts, which begins on September 18th in San Francisco. Of course the In-Memory development group will be there and you will have multiple opportunities to meet up with us, in one of our technical sessions, our hands-on-labs or at the Oracle demogrounds. This year the In-Memory team has 4 technical sessions and there are also a number of excellent customer sessions you shouldn't miss:

Monday, September 19th at 5:30pm
Oracle Database In-Memory—What’s New and What’s Coming
Session CON6380 at Moscone South - room 104
Juan Loaiza reveals what motivated Oracle to develop this new technology and provides a clear understanding of how this unique “dual format” architecture works. He will also provide details on how Oracle Database In-Memory takes advantage of Software in Silicon.

Tuesday, September 20th at 4:00pm
Oracle Database In-Memory Part 2: A Deep Dive into the Next Generation
Session CON6409 at Moscone South - room 103
Tirthankar Lahiri, the VP of In-Memory development, will provide an insiders view of the technology underlying Oracle Database In-Memory and give you a sneak peek at what you can expect from Oracle Database In-Memory in our next generation.

Tuesday, September 20th at 5:15pm
Best Practices for Getting Started with Oracle Database In-Memory
Session CON6059 at Moscone South - room 103
After hearing all about Database In-Memory, we are sure you will be ready to start adopting it on your own system so Andy Rivenes, Database In-Memory Product Manager, will finish the day with a session where he will share our top tips for getting started with Oracle Database In-Memory and the best use cases for this new technology.

Wednesday, September 21st at 3:00pm
Oracle Database In-Memory Customer Panel
Session CON6381 at The Park Central San Francisco - room Franciscan I
Four production Database In-Memory customers will participate in a round table discussion on why they chose Oracle Database In-Memory to tackle the demand for real-time analytics and business intelligence in their companies. The session will begin with each customer briefly sharing how they are using Oracle Database In-Memory before we dive into an open discussion. The audience will be actively encouraged to participate in this session and pose any questions they have on Database In-Memory and other alternative solutions considered by the panelists.

You will also have an opportunity to try out Oracle Database In-Memory for yourself in our hands-on lab, Oracle Database In-Memory Option Boot Camp: Everything You Need to Know, that will run every day in the Golden Gate room in the Nikko Hotel. If you have burning questions related to Oracle Database In-Memory, you can ask them at the In-Memory demo booth in the Database area of the demogrounds in Moscone South. Members of the In-Memory development team will be there Monday to Wednesday from 9:45am until 5:30pm. The In-Memory team and our customers aren't the only ones talking about Oracle Database In-Memory at this year's conference. There are also a number of great sessions being delivered by Oracle ACEs and the Oracle Application teams. Check out the full searchable OOW catalog on-line to start planning your trip today!

There is less than 6 weeks to go until Oracle Open World, the largest gathering of Oracle customers, partners, developers, and technology enthusiasts, which begins on September 18th in San Francisco. Of...

Oracle Database In-Memory

Questions You Asked: Where did the other bloom filter go?

A couple of weeks ago at the Spring Conference of the Bulgarian Oracle User Group, Joel Goodman gave a great session on Bloom Filters and how they can improve the performance of hash joins in the Oracle Database, especially in the In-Memory column store. At the end of that session the chap sitting in front of me turned around to show me an execution plan they got while testing Oracle Database In-Memory and posed the following question, “Where did the other bloom filter go?” Below is a similar execution plan I was able to reproduce easily using the following query:

SELECT SUM(lo_extendedprice * lo_discount) revenue
FROM   lineorder l, date_dim d
WHERE  l.lo_orderdate = d.d_datekey
AND    l.lo_discount BETWEEN 2 AND 3
AND    l.lo_quantity < 24
AND    d.d_date = 'December 24, 2015';

Usually, it’s easy to identify bloom filters in the execution plan. They appear in two places: at creation time and again when they are applied. In the plan above, you will see a good example of this, bloom filter BF0001. Bloom filter BF0001 was created on line 4 after the scan of the DATE_DIM table on line 5. It was then used on line 6 as part of the scan of the LINEORDER table. Now you might be wondering why Oracle came up with two bloom filters in this case, BF000 and BF0001, when there was only one join in the query, and why bloom filter BF000 wasn’t used. We are actually using both bloom filters to do two different things in this plan: partition pruning (BF000) and converting the join into a filter operation (BF0001). The use of a bloom filter to do partition pruning is commonly referred to as bloom pruning. Although you may not have seen the bloom filter BF000 being used in the plan at first glance (its second occurrence is not visible in the NAME column), it’s actually used on line 7, ‘PARTITION RANGE (JOIN FILTER)’, to access only a limited number of partitions in the LINEORDER table. How do I know that? Let’s look at a more detailed version of the execution plan so you can see what happened to BF000. In the more detailed version of the plan you will see extra columns, including the PARTITION_START and PARTITION_STOP columns. This is where the BF000 filter went. BF000 in the PARTITION_START and PARTITION_STOP columns indicates the bloom filter was used to limit the number of partitions scanned in the LINEORDER table for this query. Bloom pruning is possible in this case as the LINEORDER table is range partitioned on the join column LO_ORDERDATE.
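If you want to reproduce the more detailed version of the plan yourself, a hedged example using DBMS_XPLAN, run in the same session right after executing the query, is:

-- Show the last executed plan with all columns, including Pstart/Pstop,
-- so the join-filter (bloom) pruning on LINEORDER is visible
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'ALL'));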

A couple of weeks ago at the Spring Conference of the Bulgarian Oracle User Group, Joel Goodman gave a great session on Bloom Filters and how the can improve the performance of hash joins in the...

Oracle Database In-Memory

Defining Analytics

We tell people that Database In-Memory is all about improving analytic processing. The definition that I use is "using aggregation to find patterns or trends in data". I believe I saw this used in one of Juan Loaiza's OpenWorld presentations and it made sense to me. After all, Database In-Memory is really good at scanning and filtering a lot of data, and most of the articles on this blog echo that theme. The problem is, the definition that I use doesn't necessarily make sense for everyone. When we do presentations or give webinars many people don't seem to really get what we mean by analytics. While watching the Golden State Warriors two weeks ago from my hotel room, the TV commentators began discussing how the Warriors use analytics pretty extensively and they mentioned that this is in no way mainstream in basketball. This reminded me of where baseball was back when Michael Lewis published Moneyball: The Art of Winning an Unfair Game, the story about how the Oakland Athletics (another SF Bay Area team) used an analytic, evidence-based approach to competing in Major League Baseball. At the time they were severely chastised by the baseball establishment, and the story made for a pretty good book and movie. So I did a little surfing on what the Warriors define as analytics and found the following quote in an article on SF Gate: “We always want to be pioneers, first-adopters, because we believe that having the most information gives us the best chance of making the right decisions,” said Warriors assistant general manager Kirk Lacob, who often spearheads the team’s analytics staff. “I don’t understand anyone who says, 'I don’t like analytics,’ because they’re basically saying, 'I like to just guess.’ Analytics doesn't mean stats; it means using information or data to make informed decisions.” SF Gate, http://www.sfgate.com/warriors/article/Golden-State-Warriors-at-the-forefront-of-NBA-5753776.php I'll re-phrase Kirk Lacob's definition of analytics as "using information or data to make informed decisions". And that's what Database In-Memory is all about: the ability to run ad hoc, analytic queries on source data without impacting the existing workload, except for the CPU cycles required to scan the data in-memory. There is also an additional benefit, or side-effect, in a mixed workload environment: it allows the removal of analytic indexes that probably only exist to support analytic reporting. You can even read about a real world use case where Bosch was able to remove 76 analytic reporting indexes to improve the performance of their SAP CRM system. Now that we have a better idea of what analytics are, next up we will try characterizing an analytic query.

We tell people that Database In-Memory is all about improving analytic processing. The definition that I use is "using aggregation to find patterns or trends in data". I believe I saw this used in one...

Oracle Database In-Memory

Questions You Asked: Can I force a query to use the IM column store?

I was recently asked by a customer, "is it possible to force a query to use the In-Memory column store?" and I thought it would be a good idea to share the answer here on the blog. The first thing that comes to mind when you start thinking about forcing how the Oracle Database will execute a query is Optimizer hints. There is an INMEMORY hint available, but that hint just enables the use of the In-Memory column store (IM column store) if the INMEMORY_QUERY parameter has been set to DISABLE (the default is ENABLE). If we look closer at the documentation for the hint, we see that it says "This hint does not instruct the optimizer to perform a full table scan. If a full table scan is desired, then also specify the FULL hint." So it appears that the answer is that you have to use the FULL hint to ensure that the access method is a full table scan, which is a requirement for accessing the IM column store. If the INMEMORY_QUERY parameter has been set to DISABLE then you would also need to use the INMEMORY hint. Of course this also assumes that the IM column store has been enabled and that the object has been enabled for in-memory. Let's try this solution out with some queries on the LINEORDER table. It has been fully populated into the IM column store, and it has a b-tree index on the LO_ORDERKEY column called STEP3_3. We will experiment with both values of the INMEMORY_QUERY parameter to show the effect of the INMEMORY hint. The query we're going to start with is the following:

SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorder;

With the INMEMORY_QUERY parameter set to ENABLE (the default) this should result in an access of the LINEORDER table via the IM column store. And it does, so we know that we can access the LINEORDER table via the IM column store. Now let's run a query that will access a single row through an index that has been created on the LINEORDER table. The query is the following:

SELECT lo_orderkey, lo_custkey, lo_revenue FROM lineorder lo WHERE lo_orderkey = 500000;

The lowest cost access path is to use the STEP3_3 index and we see that the Optimizer has chosen to use that index. Now let's see what happens if we add the FULL hint. The FULL hint has changed the access path to a full table scan and this allows the use of the IM column store. Now let's change the INMEMORY_QUERY parameter to DISABLE. We see that we've performed a full table scan but we haven't accessed the IM column store since we've disabled the ability to use it. With the INMEMORY_QUERY parameter set to DISABLE we can now use the INMEMORY hint in combination with the FULL hint to force the use of the IM column store. If you found this interesting, then there is a post on the Optimizer blog that covers other In-Memory hints that you might find interesting.
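Putting the two hints together, a hedged example of forcing an in-memory scan for the indexed query above (using the table alias lo from the query):

-- Force a full table scan; the INMEMORY hint is only needed when INMEMORY_QUERY=DISABLE
SELECT /*+ FULL(lo) INMEMORY(lo) */ lo_orderkey, lo_custkey, lo_revenue
FROM   lineorder lo
WHERE  lo_orderkey = 500000;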

I was recently asked by a customer, "is it possible to force a query to use the In-Memory column store?" and I thought it would be a good idea to share the answer here on the blog. The first thing that...

Oracle Database In-Memory

Questions You Asked: Why do I sometimes get better compression using Advanced Row compression compared to In-Memory Query compression?

The compression techniques that are used for Database In-Memory are different from those used for on-disk compression. On-disk compression is designed to reduce the overall space usage for table segments, while the compression algorithms used for Database In-Memory are designed for query performance: Database In-Memory queries can operate directly on the compressed format. Due to the different techniques used, it is not uncommon for table segments to have different compression sizes in-memory versus on-disk. It is also not uncommon to get better compression on-disk as compared to in-memory compression when using the QUERY levels of compression. Columnar formats generally have the advantage of potentially better compression, because in-memory compression units are much bigger than Oracle database block sizes and because values for the same column (and datatype) are stored together and tend to compress better. But remember that every column in the column store is compressed using its own dictionary, because every column is stored in a separate column structure. Advanced Row compression (part of Oracle Advanced Compression Option), on the other hand, has its own advantages in this respect: With on-disk compression using Advanced Row compression we can share the dictionary across all the columns in the table. That is, every database block has a local dictionary, but that dictionary is shared across all the rows and columns in the block. This can give a substantial improvement in compression ratios. Advanced Row compression also supports multi-column dictionaries. That is, each symbol in the dictionary can represent more than one column from the table, so a single dictionary reference can encode more than one column for a row. Column reordering and recursive dictionary entries are used to create bigger, longer multi-column symbols, and column reordering often gives a substantial boost to compression ratios. Compression in general is very difficult to predict because it is so data dependent. It is entirely possible to get better compression from on-disk compression than from in-memory compression, but the reverse is also true.
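If you want to see how your own tables compare, a hedged query against V$IM_SEGMENTS (which reports both the on-disk bytes and the in-memory size of each populated segment) is an easy way to check:

-- Compare on-disk size with in-memory size for populated segments
SELECT owner,
       segment_name,
       ROUND(bytes / 1024 / 1024)         AS disk_mb,
       ROUND(inmemory_size / 1024 / 1024) AS inmemory_mb,
       ROUND(bytes / inmemory_size, 2)    AS ratio,
       inmemory_compression
FROM   v$im_segments
ORDER  BY owner, segment_name;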

The compression techniques that are used for Database In-Memory compression are different from what are used for on-disk compression. On-disk compression is designed to reduce the overall space usage...

Oracle Database In-Memory

Why does BYTES_NOT_POPULATED suddenly show a non-zero value?

I recently got a question from a customer who has just begun adopting Database In-Memory and thought it was worth answering as a blog post in case anyone else had encountered the same scenario and was wondering what was going on. Here is the question I got: “I've observed a scenario where a table is fully populated in the In-Memory column store with 0 in the BYTES_NOT_POPULATED column in v$im_segments, but after some time, the BYTES_NOT_POPULATED column shows a non-zero value. Why is Oracle purging part of my table from memory?” Let me begin to answer this by assuring you all that Oracle does not purge your data out of the In-Memory column store (IM column store). Remember, it’s a store not a cache, and any data populated into the IM column store will remain there until you either mark the object NO INMEMORY or the instance is shut down. So, if data isn’t being purged, why would the BYTES_NOT_POPULATED column show a non-zero value? The non-zero value is the result of a data load operation, most likely a direct path load operation if the value in the BYTES_NOT_POPULATED column is large, as it was in this case. If you are not familiar with the term direct path load, let me explain what I mean by that. A direct path load occurs if you do a CREATE TABLE AS SELECT command or an INSERT statement with the APPEND hint. A direct path load parses the input data, converts the data for each input field to its corresponding Oracle data type, and then builds a column array structure for the data. These column array structures are used to format Oracle data blocks and build index keys. The newly formatted database blocks are then written directly to the database, bypassing the standard SQL processing engine and the database buffer cache. A direct path load operation is an all or nothing operation. This means that the operation is not committed until all of the data has been loaded. Should something go wrong in the middle of the operation, the entire operation will be aborted. To meet this strict criterion, a direct path load inserts data into database blocks that are created above the segment high water mark (the maximum number of database blocks used so far by an object or segment). Once the direct path load is committed, the high water mark is moved to encompass the newly created blocks into the segment and the blocks will be made visible to other SQL operations on the same table. Up until this point the IM column store is not aware that any data change occurred on the segment. Once the operation has been committed, the IM column store is instantly aware it does not have all of the data populated for the object. The size of the missing data will be visible in the BYTES_NOT_POPULATED column of the v$IM_SEGMENTS view. Let’s take a look at an example. Imagine I have the SALES2 table fully populated into the IM column store as: Now, let's insert a new set of data into the SALES2 table via an INSERT statement with an APPEND hint. If we check the output of v$IM_SEGMENTS before we commit the INSERT statement we see the BYTES_NOT_POPULATED column is still set to zero. This is because the data inserted into the SALES2 table has gone into database blocks above the high water mark and is therefore not considered to be part of the SALES2 table yet. Let’s now commit our INSERT command and check the output of v$IM_SEGMENTS. 
Now the BYTES_NOT_POPULATED column shows a non-zero value as the IM column store becomes aware that we are missing some of the data from the SALES2 table as soon as the commit statement completes. If we had specified a PRIORITY on the SALES2 table then the newly added data would be automatically populated into the IM column store the next time the IMCO (In-Memory coordinator) background process woke up. Remember it’s on a two-minute cycle. Since we didn’t specify a priority, the missing data won’t be populated until we query the SALES2 table again. Let’s try that now and check what happens in v$IM_SEGMENTS. As you can see the POPULATE_STATUS has changed to STARTED, as the missing data is populated into the IM column store. If we wait a minute and check again we will see that the POPULATE_STATUS has changed again to COMPLETED and the BYTES_NOT_POPULATED column is 0.
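For anyone who wants to try this, here is a hedged walk-through of the sequence above; the staging table name is illustrative:

-- Direct path load into SALES2; the new blocks sit above the high water mark until commit
INSERT /*+ APPEND */ INTO sales2 SELECT * FROM sales_staging;

-- Before the commit, BYTES_NOT_POPULATED is still 0
SELECT segment_name, populate_status, bytes_not_populated
FROM   v$im_segments WHERE segment_name = 'SALES2';

COMMIT;

-- After the commit, BYTES_NOT_POPULATED shows the size of the newly loaded data
SELECT segment_name, populate_status, bytes_not_populated
FROM   v$im_segments WHERE segment_name = 'SALES2';

-- With no PRIORITY set, querying SALES2 triggers population of the missing data
SELECT /*+ FULL(s) */ COUNT(*) FROM sales2 s;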

I recently got a question from a customer who has just begun adopting Database In-Memory and thought it was worth answering as a blog post in case anyone else had encountered the same scenario and was...

Oracle Database In-Memory

Why didn't Database In-Memory improve the performance of my query?

After I posted last week's blog post on identifying analytic queries that benefit from Database In-Memory, I got the following question, "Why didn't Database In-Memory improve the performance of my query?" I've simplified the query in question down to the following:

SELECT Count(DISTINCT cust_id) FROM ssh.sales WHERE amount_sold > 1;

So, why didn’t Database In-Memory improve the performance of this query? The answer becomes a little clearer if you look at the SQL Monitor report for the query. Even with the SALES table fully populated into the In-Memory column store (IM column store), the query is completely bottlenecked on the aggregation / sorting step. Since there is only one table involved in the query, In-Memory Aggregation or Vector Transformation won’t kick in, as it needs a join to be present. So, what can we do to speed up this query? One option open to us in Oracle Database 12c is to replace the COUNT DISTINCT with the new APPROX_COUNT_DISTINCT function. The APPROX_COUNT_DISTINCT function uses a HyperLogLog algorithm, which enables the processing of large amounts of data significantly faster than a COUNT DISTINCT, with negligible deviation from the exact result. This technique was originally designed to improve the performance of statistics gathering and is used by the DBMS_STATS package to calculate the number of distinct values in a column when the ESTIMATE_PERCENT parameter is set to AUTO_SAMPLE_SIZE (the default) from 11g onwards. There are in fact many real-world use cases where an approximate answer is good enough. For example, for 'how many distinct visitors came to our website last month?' it may not be necessary to have a precise answer; an approximate answer that is, say, within 1% of the actual value would be sufficient. Luckily for me that was the case here and we were able to replace the COUNT DISTINCT with the new APPROX_COUNT_DISTINCT function.

SELECT Approx_count_distinct(cust_id) FROM ssh.sales WHERE amount_sold > 1;

So, just how much faster would it be? Let’s take a look at the SQL Monitor report. By replacing the COUNT DISTINCT with the new APPROX_COUNT_DISTINCT function the query is no longer bottlenecked on the aggregation / sorting step and we were able to improve the performance by a factor of 4X. What about the actual query results? In this case the difference between the COUNT DISTINCT and the APPROX_COUNT_DISTINCT function was negligible (3%). However, given that this is an approximation (albeit a very good one) it may not always be possible to replace every COUNT DISTINCT with the new APPROX_COUNT_DISTINCT function. For example, financial period close calculations couldn’t be modified to use this, but it can certainly help improve the performance of a lot of workloads, especially if it’s used in conjunction with Oracle Database In-Memory!
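If you want to gauge the deviation on your own data, a hedged side-by-side comparison is easy to run (bearing in mind that the exact COUNT DISTINCT is the slow part):

-- Compare the exact and approximate distinct counts in one pass
SELECT COUNT(DISTINCT cust_id)        AS exact_cnt,
       APPROX_COUNT_DISTINCT(cust_id) AS approx_cnt
FROM   ssh.sales
WHERE  amount_sold > 1;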

After I posted last weeks blog post on identifying analytic queries that benefit from Database In-Memory, I got the following question, Why didn't Database In-Memory improve the performance of my...

Oracle Database In-Memory

How do I identify analytic queries that will benefit from Database In-Memory?

Although Database In-Memory can be used in both Enterprise OLTP systems and Data Warehouses, only analytic queries will benefit from accessing data from the In-Memory column store (IM column store). Remember, the IM column store enables large volumes of data to be rapidly scanned and processed using a variety of optimizations. Simpler transaction-based queries, such as a primary key look-up, do not benefit from the IM column store and are automatically directed to the buffer cache. But how do you identify an analytic query? An analytic query is one where a large amount of data is scanned and filtered to return critical data that drives better business decisions. It doesn't necessarily mean a query that contains one of Oracle's analytical functions, although those can also see benefits. A star query is a good example of such a query. By star query, I mean a query where a large fact table (center of the star) is joined to a number of smaller dimension tables (the points of the star). If you don't have a simple star schema (most people don't) then you can identify a good candidate query by examining some key characteristics. You are looking for queries that scan larger tables (ones that contain many rows) and apply where clause predicates or filters that use operators such as =, <, >, and IN lists to limit the number of rows returned. You also want to check that the query selects just a subset of columns from the tables, for example, selecting 5 columns of a 100-column table. The next thing to look at would be the join predicates used in the query. Queries that use equality join predicates that are selective (i.e. reduce the number of rows returned) are best, as these join predicates can be converted into a bloom filter predicate that is applied to the larger fact table. Below are some SQL Monitor reports that I will use to help demonstrate the type of queries I would consider excellent candidates for In-Memory, as well as some others that will see less benefit as they are limited by other factors. Let's start with a very simple query that accesses just one table. SELECT /*+ MONITOR */ Count(*), SUM(lo_profit) FROM lineorder l WHERE lo_custkey BETWEEN 48000 AND 48500; You'll notice that I'm using a hint in my query to force a SQL Monitor report to be generated for the query. This is needed as all of my queries are very fast running, so there is no guarantee SQL Monitor will automatically record them without the hint. Let's first look at the execution plan for this statement without Database In-Memory. Note: If you right click on the SQL Monitor report above and select view image you will be able to see the report in more detail. Without Database In-Memory we get an index access plan and you will notice that 100% of the time is spent scanning and filtering the data to find the information we are interested in. Since the query accesses only 2 columns in a 20-column table and there is a selective where clause predicate (only 101,000 rows accessed out of 18 million), this query appears to be a good candidate for In-Memory. So, let's try it out. After marking the LINEORDER table with the INMEMORY attribute and populating it into the IM column store, we get the following execution plan. The access method has changed to a full table scan via the IM column store and the query is no longer bottlenecked on the data scanning and filtering. The query is now bottlenecked on the sort (line 1 of the plan). It also executes 20 times faster than before, so overall a very good candidate for the IM column store. 
Let’s take a look at a more complex query that includes multiple tables. SELECT p.p_brand1, SUM(lo_revenue) rev FROM lineorder l, part p, supplier s WHERE l.lo_partkey = p.p_partkey AND l.lo_suppkey = s.s_suppkey AND p.p_category = 'MFGR#12' AND s.s_region = 'AMERICA' GROUP BY p.p_brand1; Without Database In-Memory, and because we don’t have any indexes on these tables, the execution plan for this query consists of two hash joins followed by a hash group by. You will notice that this query is bottlenecked on the hash join between the PART and LINEORDER tables. This is actually a very selective join as only 715,000 of the 18 million rows that enter the join find a match. The reason it’s so selective is the where clause predicate on the P_CATEGORY column in the PART table. So would this query benefit from Database In-Memory? Just as before, the query accesses only a subset of columns, there are selective where clause predicates, and a selective join make this query appear to be a good candidate for Database In-Memory. With Database In-Memory the execution plan switches to take advantage of a bloom filter. A bloom filter transforms a join into a filter that can be applied as part of the scan of the table on the right hand side of the join (typically a large fact table). Bloom filters are very efficiently applied to columnar data via SIMD vector processing, thus the bottleneck on the join is removed and the query executes 3X faster by simply introducing Database In-Memory. But what if I don’t have a selective join? Let’s take a look at another join query. SELECT d.d_year, c.c_nation, lo_revenue, lo_supplycost FROM lineorder l, date_dim d, part p, supplier s, customer c WHERE l.lo_orderdate = d.d_datekey AND l.lo_partkey = p.p_partkey AND l.lo_suppkey = s.s_suppkey AND l.lo_custkey = c.c_custkey; In this case there are no selective where clause predicates. If we execute this query with Database In-Memory we get the following execution plan. Even though all of the tables are accessed via the IM column store, we didn’t get bloom filters created for the three resource intensive joins. Why not? The optimizer can only transform a hash join into a bloom filter when there is a selective where clause predicate on the table on the left hand side of the join (typically a small dimension table). In this case we don’t have any selective where clause predicates, so no bloom filters. Then how did we get the bloom filter on the DATE_DIM table in this query? The LINEORDER table is partitioned on the LO_ORDERDATE column, which is also the join column used with the DATE_DIM table.  Even though there is no filter predicate on the DATE_DIM table we can still create a bloom filter to do partition pruning. The use of a bloom filter to do partition pruning is commonly referred to as bloom pruning. Although you may not have seen the bloom filter BF000 being used in the plan at first glance, it’s actually used in the operation ‘PARTITION RANGE (JOIN FILTER)’ on line 11 to access only a limited number of partitions in the LINEORDER table. But I digress. How can we speed up this query if simply putting the tables in the IM column store didn’t help us? The simplest way to improve the performance of a query that is bottlenecked on resource intensive joins is to create a Materialized View (MV) that pre-joins the tables. The optimizer will automatically rewrite our query to take advantage of the MV thus removing the resource intensive joins from the query execution. 
By placing the MV in the IM column store we can get an even bigger performance improvement. Below is the command I used to create the MV.

CREATE MATERIALIZED VIEW maria_mv
ON PREBUILT TABLE WITH REDUCED PRECISION
ENABLE QUERY REWRITE
AS SELECT d.d_year, c.c_nation, lo_revenue, lo_supplycost
FROM lineorder l, date_dim d, part p, supplier s, customer c
WHERE l.lo_orderdate = d.d_datekey
AND l.lo_partkey = p.p_partkey
AND l.lo_suppkey = s.s_suppkey
AND l.lo_custkey = c.c_custkey;

By placing the MV in the IM column store I'm able to get the following extremely simple and efficient execution plan that is 120X faster than before. Let's take a look at one more complex query.

SELECT d.d_year, c.c_nation, SUM(lo_revenue - lo_supplycost)
FROM lineorder l, date_dim d, part p, supplier s, customer c
WHERE l.lo_orderdate = d.d_datekey
AND l.lo_partkey = p.p_partkey
AND l.lo_suppkey = s.s_suppkey
AND l.lo_custkey = c.c_custkey
AND s.s_region = 'AMERICA'
AND c.c_region = 'AMERICA'
GROUP BY d.d_year, c.c_nation
ORDER BY d.d_year, c.c_nation;

Without Database In-Memory we get the following execution plan. The execution plan for this query consists of four hash joins followed by a sort group by. You will notice that this query is bottlenecked on both the initial scan of the LINEORDER table and the sort group by. Based on our previous example, we know that placing the tables in the IM column store will improve the performance of the scan of the LINEORDER table, but what about the sort group by? Database In-Memory also includes many SQL optimizations designed to accelerate queries with aggregation, known collectively as In-Memory Aggregation (IMA). By placing our tables into the IM column store we should see an improvement in both the scan of the LINEORDER table and the sort group by. As you can see from the plan above, the majority of the time is now spent scanning and filtering the LINEORDER table, as all of the joins and the group by steps have been converted to key vectors and a vector group by, which are executed as part of the scan of the LINEORDER table. By taking advantage of IMA the query executes 15X faster simply by placing the tables into the IM column store. As you can see, Database In-Memory can improve the performance of a large number of analytical queries, and by using it in conjunction with existing performance enhancing features like MVs you can improve the performance of any application without having to change the application code.
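As a footnote to the MV example above: the post shows the CREATE MATERIALIZED VIEW command but not how the MV was placed in the IM column store. A minimal sketch (the compression and priority settings here are just illustrative choices) might be:

ALTER MATERIALIZED VIEW maria_mv INMEMORY MEMCOMPRESS FOR QUERY LOW PRIORITY CRITICAL;

-- Once MARIA_MV is populated, re-running the original five-table join should be
-- rewritten to the MV, and the plan should show an in-memory full scan of MARIA_MV
-- (e.g. MAT_VIEW REWRITE ACCESS INMEMORY FULL) instead of the hash joins.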


Oracle Database In-Memory

Oracle Database In-Memory at Collaborate 16 in Las Vegas

Andy Rivenes (@TheInMemoryGuy) from the Oracle Database In-Memory team will be presenting two hands-on-labs and will be co-presenting a session at this year's IOUG Collaborate 16 conference, in Las Vegas April 10 - 14. Below are details on the sessions he will be presenting. We hope that you have an opportunity to check out some of these sessions if you plan to attend the event! Tuesday April 12th 10:45am - Oracle Database In-Memory Option Bootcamp, Session 1833, South Seas Ballroom D. This hands-on lab provides a step-by-step guide on how to get started with Oracle Database In-Memory, how to identify which of the optimizations are being used, and how your SQL statements benefit from them. You will also get to experience firsthand just how easy it is to start taking advantage of this technology and the incredible performance improvements it has to offer. Tuesday April 14th 3:30pm - Oracle Database In-Memory Option Bootcamp, Session 1888, South Seas Ballroom D. This hands-on lab provides a step-by-step guide on how to get started with Oracle Database In-Memory, how to identify which of the optimizations are being used, and how your SQL statements benefit from them. You will also get to experience firsthand just how easy it is to start taking advantage of this technology and the incredible performance improvements it has to offer. Thursday April 14th 11:00am - Database In-Memory Meets Oracle RAC, Session 4765. Andy Rivenes and Markus Michalewicz, Senior Director of Product Management - Oracle Real Application Clusters (RAC), will give brief overviews of the latest versions of Oracle Database In-Memory and Oracle RAC, and will discuss how these state-of-the-art Oracle technologies can be used together to meet your real-time data analysis as well as real-time transaction processing requirements for all existing applications. They will also discuss the most efficient configuration and deployment scenarios considering performance, scalability, availability as well as isolation in order for you to meet even the most challenging Service Level Agreements. Note: The bootcamps will require that you bring your own laptop and be able to establish a wireless connection to the Oracle Solution Center. Full instructions will be available at the bootcamp. You will also need to be able to ssh or VNC into the bootcamp virtual machine (one will be provided for each bootcamp attendee). Windows users can get an ssh program here or VNC here.


Oracle Database In-Memory

What is an In-Memory Compression Unit (IMCU)?

In our previous blog series "Getting Started", we described how the In-Memory column store (IM column store) is part of the System Global Area (SGA) and is allocated at instance startup by setting the initialization parameter inmemory_size. For the purposes of this blog entry we have a 12.1.0.2 database running with an 800MB inmemory_size and an sga_target of 3008MB. This results in the following SGA allocation: What we didn't explain was that the IM column store is actually divided into two pools, a 1MB pool and a 64KB pool. This can be observed by querying the v$inmemory_area dynamic performance view: Notice that out of the 800MB that was allocated to the IM column store, 639MB was allocated to the 1MB pool and 144MB to the 64KB pool. So what are these pools used for? The 1MB pool is used to store the data populated into the IM column store, and the 64KB pool is used to store metadata about that data. Frequently we are asked if these space allocations can be changed, and the answer is no; the sizes are determined by Oracle Database. The following picture shows the makeup of the IM column store with both the 1MB pool and the 64KB pool. You'll notice that the data in the 1MB pool is stored in IMCUs and the data in the 64KB pool is stored in SMUs. But what are IMCUs and SMUs? An In-Memory Compression Unit or IMCU is a logical unit of storage within the In-Memory column store (IM column store). It is roughly equivalent to an extent within a tablespace. When a segment is populated in the IM column store it is stored within one or more IMCUs. Each IMCU contains thousands of rows from the segment. The average row length and the in-memory compression type chosen control the number of rows per IMCU: the higher the compression level chosen, the more rows in the IMCU. All of the IMCUs for a given segment contain approximately the same number of rows. The number of rows contained within an IMCU can be observed in the v$im_header dynamic performance view (more on this later). An IMCU contains an IMCU header, which holds metadata about the IMCU, and column compression units or column CUs. There will be one column CU for each column in the segment (by default) plus one for the rowids that correspond to the column values (so that the columns can be easily "stitched" back together to correspond to their original rows). The basic layout is shown below: Each IMCU maps to a Snapshot Metadata Unit or SMU in the 64KB pool that holds the metadata about the IMCU. A common question that gets asked is what is the relationship between an IMCU and the 1MB pool when querying the v$inmemory_area view? Let's populate an object into the IM column store and examine the relationship between the IMCUs and SMUs and the two pools. Let's choose the LINEORDER table for this example. We will alter the table INMEMORY and accept the default compression level of MEMCOMPRESS FOR QUERY LOW. We will then SELECT from the table since the default priority is NONE, which means in-memory population won't begin until the LINEORDER table is accessed for the first time. During the population there are background processes that perform the actual population. These processes are named in the format ora_wxxx_sid, and in this case we see three processes in my "top" output named ora_w000_orcl, ora_w001_orcl and ora_w002_orcl: Once the population is complete we see the following when we query the v$im_segments and v$inmemory_area views: The LINEORDER table is fully populated (i.e.
bytes_not_populated = 0) and has consumed 550MB from the 1MB pool and 1472KB from the 64KB pool. So how does this relate to IMCUs? Does this mean we have 550 IMCUs for the LINEORDER table (i.e. 550 1MB IMCUs)? Let's find out. There is another dynamic performance view, mentioned earlier, that can help us figure this out. It is called v$im_header and it shows the IMCUs allocated by segment. Since there is only one segment populated, it should be pretty easy to see how the LINEORDER table is populated. Now you might notice that I added a filter criterion to my query: "where is_head_piece = 1". I did this because if you query more of the columns in the v$im_header view you will discover that an IMCU can be made up of one or more "pieces". It appears then that an IMCU is made up of one or more 1MB extents, because it is allocated from the 1MB pool, and each IMCU can be made up of one or more pieces. Why would an IMCU be made up of more than one piece? As was stated earlier, the number of rows that an IMCU holds dictates the amount of space an IMCU consumes. If the target number of rows causes the IMCU to grow beyond the contiguous 1MB extents available in the 1MB pool, then additional piece(s), or extents, are created to hold the remaining column CUs. To summarize, when an object is populated into the IM column store it is populated into one or more IMCUs, and these IMCUs will contain one or more 1MB extents allocated from the 1MB pool. Each IMCU will also have a corresponding SMU, which has been allocated from the 64KB pool.
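For reference, here is a minimal sketch of the kinds of queries used above (the column lists are limited to ones mentioned in this post or in the standard view definitions):

-- The two pools that make up the IM column store
SELECT pool, alloc_bytes, used_bytes, populate_status
  FROM v$inmemory_area;

-- Space consumed by the populated segment
SELECT segment_name,
       ROUND(inmemory_size / 1024 / 1024) AS inmemory_mb,
       bytes_not_populated
  FROM v$im_segments;

-- One head piece per IMCU, so this counts the IMCUs for the populated segment(s)
SELECT COUNT(*) AS imcu_count
  FROM v$im_header
 WHERE is_head_piece = 1;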


Oracle Database In-Memory

How do I limit the amount of memory each PDB can use in the IM column store?

In case you aren't familiar with what a PDB is, let me begin by explaining what a Multitenant environment is and how PDBs fit into it. Oracle Multitenant is a new database consolidation model in Oracle Database 12c in which multiple Pluggable Databases (PDBs) are consolidated within a single Container Database (CDB). While keeping many of the isolation aspects of single databases, Oracle Multitenant allows PDBs to share the system global area (SGA) and background processes of a common CDB. When used with Oracle Database In-Memory, PDBs also share a single In-Memory column store (IM column store), and hence the question, "How do I control how much memory each PDB can use in the IM column store?" The total size of the IM column store is controlled by the INMEMORY_SIZE parameter setting in the CDB. By default, each PDB sees the entire IM column store and has the potential to fully populate it and starve its fellow PDBs. In order to avoid starving any of the PDBs, you can specify how much of the shared IM column store a PDB can use by setting the INMEMORY_SIZE parameter inside the specific PDB using the following command: ALTER SYSTEM SET inmemory_size = 4G CONTAINER = CURRENT SCOPE = SPFILE; Not all PDBs in a given CDB need to use the IM column store. Some PDBs can have the INMEMORY_SIZE parameter set to 0, which means they won't use the IM column store at all. The following shows an example with three PDBs. It is also not necessary for the sum of the PDBs' INMEMORY_SIZE parameters to be less than or equal to the size of the INMEMORY_SIZE parameter of the CDB. It is possible for the PDBs to oversubscribe to the IM column store. Oversubscription is allowed to ensure that valuable space in the IM column store is not wasted should one of the pluggable databases be shut down or unplugged. Since the INMEMORY_SIZE parameter is static (i.e. it requires a database instance restart for changes to take effect), it is better to allow the PDBs to oversubscribe, so all of the space in the IM column store can be used. However, when doing this it is possible for one PDB to starve another PDB of space in the IM column store due to the oversubscription. If you don't expect any PDBs to be shut down for extended periods of time, or any of them to be unplugged, it is recommended that you don't oversubscribe. You might now be wondering how to control the population order with PDBs. Each PDB is a full Oracle database in its own right, so each PDB will have its own priority list. When a PDB starts up, the objects on its priority list will be populated into the IM column store in order, assuming there is available space.
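As a minimal sketch (the PDB names and sizes below are made up for illustration), the CDB-level and per-PDB settings might look like this; as noted above, INMEMORY_SIZE is static, so the changes take effect after a restart:

-- In the CDB root: size the shared IM column store
ALTER SYSTEM SET inmemory_size = 20G SCOPE = SPFILE;

-- In a PDB that should be capped at 4GB of the shared column store
ALTER SESSION SET CONTAINER = pdb1;
ALTER SYSTEM SET inmemory_size = 4G CONTAINER = CURRENT SCOPE = SPFILE;

-- In a PDB that should not use the IM column store at all
ALTER SESSION SET CONTAINER = pdb2;
ALTER SYSTEM SET inmemory_size = 0 CONTAINER = CURRENT SCOPE = SPFILE;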


Oracle Database In-Memory

What happens to a table In-Memory if one of my RAC nodes goes down?

We've written a number of blog posts on how Database In-Memory behaves in a RAC environment, but recently we've gotten a lot of questions regarding what happens if one of the RAC nodes should fail. So, I thought I would tackle that question today and point out a couple of other interesting aspects of running Database In-Memory on RAC in this post. Quick Recap If you recall from part 1 of our RAC series, each RAC node has its own In-Memory column store (IM column store). When a table is populated into memory in a RAC environment it will be distributed across all of the IM column stores in the cluster. That is to say, a piece of the table will appear in each RAC node. Let's take a look at an example using the LINEORDER table, which has 5 million rows in it and is approximately 550MB in size, on my 3 node RAC cluster. As you can see below, the INMEMORY attribute has been specified on the LINEORDER table but the IM column store on each RAC node is empty. Remember, if you don't specify the PRIORITY subclause when you apply the INMEMORY attribute to a table, then it won't be populated into the IM column store until someone queries the table. So let's do that now. Approximately 1/3 of the LINEORDER table is now populated on each of the RAC nodes. You'll notice that the BYTES_NOT_POPULATED column is not 0, even though the POPULATE_STATUS is COMPLETE. That's because not all of the data for the LINEORDER table is populated on each RAC node. So, how do I know if the entire LINEORDER table got populated into memory? In order to answer that question we need to look at a different view called gv$im_segments_detail and compare the number of blocks in-memory on each node to the total number of blocks in the LINEORDER table. As you can see, the LINEORDER table is made up of 69,381 blocks. 19,619 blocks were populated into memory on node 1, 21,202 blocks were populated into memory on node 2, and 28,560 were populated into memory on node 3. Add these three values together and we see all 69,381 blocks from the LINEORDER table were populated. 19,619 + 21,202 + 28,560 = 69,381 But how does that correlate to the BYTES_NOT_POPULATED column that we see when we query gv$im_segments? A little more math is required to generate that value. If you subtract the number of blocks in-memory on a node from the total number of blocks in the LINEORDER table and multiply it by the block size (8K in my case), you get the value in the BYTES_NOT_POPULATED column. Note that I've been rounding up the bytes to MB to make my query output a little easier to read. But I digress; let's get on with what happens when we have a node failure. What happens if one of the RAC nodes fails? Let's shut down node 1 of the RAC cluster to simulate a node failure. If we look in gv$im_segments we see that 1/3 of the data is now missing from the IM column store. This means that if I run a query on the LINEORDER table, 1/3 of the data will have to be read from somewhere else, either the buffer cache, flash, or disk. But if we were to wait 5 minutes after node 1 went down and then query the LINEORDER table again, what we will notice is that the missing data is automatically populated into the two remaining RAC nodes, assuming of course there is enough free space in the IM column store to accommodate the additional data.
Once the population completes and we run the query again, we see all of the data coming from the IM column store on the two remaining RAC nodes, and if we check gv$im_segments and gv$im_segments_detail after the population completes, we can see that all of the data for the LINEORDER table has been populated into the 2 remaining nodes in the RAC cluster. You may be wondering why we had to wait 5 minutes and also query the table before the automatic population of the missing data took place. There is a 5-minute timeout in place just in case someone is simply bouncing RAC node 1 for maintenance. We don't want to go to the trouble of populating the missing data onto nodes 2 and 3 only to have to rebalance the data again when node 1 comes back up. The reason we had to query the table is because we didn't specify a PRIORITY subclause when we applied the INMEMORY attribute to the table at the beginning of this post. As we saw above, you must access a table that does not have a PRIORITY specified to trigger a population. So, what happens when node 1 is brought back up? Nothing will happen until we issue a query against the LINEORDER table, but once we do, the data in the IM column store will be automatically redistributed across the 3 RAC nodes. So, there you have it. In-Memory on RAC will automatically populate missing data into the remaining RAC node(s) should a node go down. It will also redistribute the data across the RAC nodes when a new node joins the cluster. Remember, these steps will happen automatically if you have specified a PRIORITY on your tables, or immediately after you query the table if no PRIORITY is specified.
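For reference, a minimal sketch of the per-node arithmetic described above might look like this (the SSB schema owner and the 8K block size are assumptions taken from this post):

-- Blocks populated in each node's IM column store, derived from BYTES_NOT_POPULATED
SELECT s.inst_id,
       seg.blocks - (s.bytes_not_populated / 8192) AS blocks_in_memory
  FROM gv$im_segments s,
       dba_segments   seg
 WHERE s.segment_name   = 'LINEORDER'
   AND seg.segment_name = 'LINEORDER'
   AND seg.owner        = 'SSB'
 ORDER BY s.inst_id;

-- The per-node values should add up to the total number of blocks in the segment,
-- e.g. 19,619 + 21,202 + 28,560 = 69,381 in the example above.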


Oracle Database In-Memory

Questions You Asked: When using interval partitioning, will new partitions be placed In-Memory?

This is the second in our series of "Questions You Asked" and this time the question has to do with interval partitioning and whether newly created partitions will be populated into the IM column store. We'll begin our experiment by creating an interval partitioned table based on the SUPPLIER table from our SSB schema. I've added a key_no column to make the interval partitioning easy. We then insert data into three partitions and list the results. Note that the first partition is named p1 because we had to create at least one partition with our CREATE TABLE statement. The other two have system generated names, and those partitions were created automatically as part of the interval partitioning feature when we ran the second and third insert statements: Now let's alter the table in-memory and see what happens: So far so good, our partitions are now enabled for in-memory. What happens when the next partition is added? So yes, newly created partitions will be added to the IM column store when using interval partitioning, because the newly created partition inherits the INMEMORY attribute from the table. Now we could stop there, but there are some subtleties with partitioning that we should explore if we really want to do this topic justice. For instance, what happens if we only want the current n partitions to be in-memory? How do we remove just one partition from the IM column store without removing all of them? Let's start by looking at the different data dictionary views associated with partitions: Notice that in the user_tables view the INMEMORY field is blank. Since the table we are interested in is a partitioned table, which is a logical entity in the database and not a physical segment, the table level view doesn't display the segment attribute. Instead we need to query the view USER_PART_TABLES, and we see that there is a new column there called DEF_INMEMORY (for default INMEMORY), and it is this column, or its attribute, that controls whether new partitions are populated into the IM column store. So the question of how we remove "older" partitions from the IM column store is easily answered: we need to perform the action at the partition level. For instance, let's remove the first partition we created, p1, from the IM column store: To follow on with this line of reasoning, if we had not altered the table to be INMEMORY, then the DEF_INMEMORY attribute would not be set, and we could have controlled whether a partition was populated into the IM column store manually by altering just the partition(s) that we wanted to populate. Of course, this isn't what the original question was, but it is important to understand how partitioning works so that you can make the appropriate decisions for your environment.
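A minimal sketch of the partition-level commands discussed above (the table name SUPPLIER_P is just a placeholder for the interval partitioned table created in this example):

-- The table-level attribute becomes the default that new interval partitions inherit
ALTER TABLE supplier_p INMEMORY;

SELECT def_inmemory FROM user_part_tables WHERE table_name = 'SUPPLIER_P';

-- Remove a single "older" partition from the IM column store without touching the rest
ALTER TABLE supplier_p MODIFY PARTITION p1 NO INMEMORY;

SELECT partition_name, inmemory
  FROM user_tab_partitions
 WHERE table_name = 'SUPPLIER_P'
 ORDER BY partition_position;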


Oracle Database In-Memory

Oracle OpenWorld 2015 Wrap up!

I can't believe another Oracle OpenWorld has been and gone. Thanks to all those who attended the In-Memory sessions and stopped by the In-Memory demopod during this year's Oracle OpenWorld. It was an amazing conference this year with lots of great technical sessions and interesting discussion at the demo grounds. I've been inundated with requests to share the presentations we gave on In-Memory at the conference, so I thought it would be a good idea to share the links to the presentations here. Session CON6775 Best Practices for Getting Started with Oracle Database In-Memory Session CON8659 Oracle Database In-Memory—What’s New and What’s Coming Session CON8652 Oracle Database In-Memory: Under the Hood A number of Oracle Database In-Memory customers presented on their experiences of using In-Memory. You can find details from their sessions below: Session CON8657 Oracle Database In-Memory Customer Panel Session CON4292 How Oracle Exadata and Oracle Database In-Memory Power Precision Marketing at General Mills Session CON2715 Top Five Ways Swiss Mobiliar Improved Business Value with Oracle Database In-Memory Session CON3087 Oracle Database In-Memory: a Practical Solution We also gave folks the opportunity to try out Oracle Database In-Memory for themselves in our hands-on lab, Oracle Database In-Memory Option Boot Camp: Everything You Need to Know, so I thought I would share the lab handbook in case anyone wants to try some of the labs at home. Finally, there was a lot of talk about how Oracle Database In-Memory takes advantage of Software in Silicon, so I thought it would be good to share this link to a short video of Juan Loaiza describing exactly what we mean by Software in Silicon. We hope you enjoyed the conference as much as we did and we look forward to seeing you all again next year at OOW 2016, September 18–22, 2016 in San Francisco. Maria Colgan+


Oracle Database In-Memory

Day 4 of Oracle OpenWorld 2015 October 28th

Thanks to all those who stopped by the demogrounds to chat with the In-Memory developers over the last two days. Remember, today is the last day of the demogrounds, so if you haven't had a chance to stop by yet, do so today. The In-Memory developers will be there from 10:00am until 4pm. This may be your last chance to snag an In-Memory bumper sticker, so don't miss out! But before you head to the demogrounds, why not start your day at our hands-on-lab Oracle Database In-Memory Bootcamp (Session HOL10483) in the Golden Gate room (25th Floor) at the Hotel Nikko. The fun begins bright and early at 8:45 am! There are also some great technical sessions on today, including your first opportunity to ask questions of production Oracle Database In-Memory customers and see if the technology lives up to the hype. I will be hosting our first Oracle Database In-Memory Customer Panel (Session CON8657) at 1:45pm in Moscone South - room 103. The session will begin with each customer briefly sharing how they are using Oracle Database In-Memory before we dive into an open discussion. I strongly encourage you to actively participate in this session and pose any questions you may have about Database In-Memory and other alternative solutions considered by the panelists. Then in the afternoon we have our last customer session on Database In-Memory, given by General Mills, How Exadata & Database In-Memory Power Precision Marketing at General Mills (Session CON4292) at 3:00pm in Moscone South - room 254. Mark and Josh from General Mills explain how Oracle Exadata with Oracle Database In-Memory gave them a scalable and reliable platform from which to build a consumer-first data warehouse. Finally, before you rush off to put on your dancing shoes for the appreciation event, we have another great technical session, Oracle Database In-Memory: Under the Hood (Session CON8652) at 4:15pm in Moscone South - room 103, where Tirthankar Lahiri, the VP of Database In-Memory development, will provide an insider's view of the technology underlying Oracle Database In-Memory and give you a sneak peek at what you can expect from Oracle Database In-Memory in our next generation!


Oracle Database In-Memory

Day 3 of Oracle OpenWorld 2015 October 27th

Hopefully you enjoyed yesterday, the first full day of technical sessions at Oracle OpenWorld, and are ready for more today! First thing this morning at 8:45 am is your second opportunity to attend our hands-on-lab, Oracle Database In-Memory Bootcamp (Session HOL10483) in the Golden Gate room (25th Floor) at the Hotel Nikko. I know it's early, but trust me, it will be a great way to start your day as it will leave you inspired and energized! Today we will also give our first two technical sessions. First up is Best Practices for Getting Started with Oracle Database In-Memory (Session CON6775) at 11:00am, in Moscone South - room 104. In this session, I'll provide you with a step-by-step guide on how to get started with Oracle Database In-Memory, including advice on when and where you should take advantage of Oracle Database In-Memory. I'll also outline strategies to help ensure that you get the promised performance boost regardless of your database environment. If you do come to this session, be sure to keep an eye out for this little fellow, as he will be on top of all of the slides with the top tips! Later in the day we have our 3rd customer session, when Dan Huls from AT&T WiFi presents Oracle Database In-Memory: A Practical Solution (Session CON3087) at 4pm in Moscone South - room 307. Dan will share with you how they implemented Oracle Database In-Memory on a large customer-facing data warehousing system in under two months with little or no changes to their application. He will also provide details on why they chose Oracle Database In-Memory versus a more expensive SAP HANA solution. Finally, Juan Loaiza, EVP of Database Development, will present Oracle Database In-Memory—What’s New and What’s Coming (Session CON8659) at 5:15pm in Moscone South - room 103. In this session Juan will explain in detail what motivated Oracle to develop this new technology and provide a clear understanding of how this unique “dual format” architecture works. He will also give you a sneak preview of what to expect next for Database In-Memory. A definite must-see session! +Maria Colgan


Oracle Database In-Memory

Day 2 of Oracle OpenWorld 2015 October 26th

Oracle OpenWorld started yesterday and San Francisco is just buzzing with Oracle folks. If you are attending the conference, don't miss the opportunity to chat with the Oracle Database In-Memory team at one of our sessions or at the Oracle demogrounds. Today kicks off with one of the best general sessions of the conference, General Session: Next-Generation Data Management Architecture (Session GEN6765) at 11am in Marriott Marquis—Salon 7/8/9 (Yerba Buena Ballroom), in which Andy Mendelsohn, EVP of Database development, will share how Oracle Database 12c and its next-generation architecture are helping customers on their journey to the cloud. Then in the afternoon, the first 2 of the 8 customers speaking about their experiences with Oracle Database In-Memory will have their sessions. First up is Facebook’s Use of Oracle Database In-Memory from Inception to Rollout (Session CON1685) at 2:45pm in Moscone South - room 301. Sudhi Vijayakumar will describe how Facebook evaluated Oracle Database In-Memory and then rolled it out in production to reap the benefits of faster response times for analytical queries. Then at 4:00pm is Five Ways Swiss Mobiliar Improved Business Value with Database In-Memory (Session CON2715) in Moscone South - room 310. In this session folks from Swiss Mobiliar explain how they extended their traditional business intelligence (BI) systems to a more real-time solution using Oracle Database In-Memory. They will also compare this new solution to their original IBM DB2 solution. Today is also your first chance to catch our hands-on-lab, Oracle Database In-Memory Bootcamp (Session HOL10483) in the Golden Gate room (25th Floor) at the Hotel Nikko. In this hands-on lab we'll provide a step-by-step guide on how to get started with Oracle Database In-Memory, how to identify which of the optimizations are being used, and how your SQL statements benefit from them. You will also get to experience firsthand just how easy it is to start taking advantage of this technology and the incredible performance improvements it has to offer. Remember, it's about a 15 minute walk from Moscone to the Hotel Nikko, so make sure you give yourself plenty of time to get there. Members of the In-Memory team will also be available at booth SLD 28 from 10:00am to 6pm at the Oracle Database demogrounds in Moscone South, to answer any questions you might have. Remember, the In-Memory demo booth is the best place to snag a limited edition In-Memory bumper sticker! +Maria Colgan


Oracle Database In-Memory

Day 1 of Oracle OpenWorld 2015 October 25th

The biggest week of the Oracle calendar has arrived. Oracle Open World 2015 officially starts today and it's going to be a fantastic conference for anyone interested in Oracle Database In-Memory. We have tons of great sessions planned for you, as well as hands-on labs and of course plenty of opportunities to talk directly with the In-Memory development team. The show kicks off with tons of great technical sessions today, selected by the Oracle User Groups, including the following sessions, which feature Oracle Database In-Memory: Eighteen Months of Production Experience with Oracle Database 12c at CERN (Session UGF 6827) at 9:00 am in Moscone South - room 306. Eric Grancher and Prasanth Kothuri from CERN share the reasons why they decided to become early adopters of Oracle Database 12c and their experiences both during the migration and also now that they have been in production for over a year. Oracle Database In-Memory Option: Challenges and Possibilities (Session UGF10319) at 11:00 am in Moscone South - room 262. Christian Antognini explains the key concepts of the Oracle Database In-Memory option, reviews what to expect from this new technology, and looks at situations in which the user can take advantage of it. More Than Another 12 on Oracle Database 12c (Session UGF 3190) at 1:30pm in Moscone South - room 306. In this session twelve of the best speakers in EMEA—all ACE directors—and little old me will share our knowledge on Oracle Database 12c. Quick-fire, seven-minute topics provide nuggets of wisdom to consider and explore during and after Oracle OpenWorld. A must-attend session if you love different accents! And of course, Larry's keynote is this evening from 5:00pm – 7:00pm in Moscone North. A must-see, as he is bound to make some exciting announcements to get the show started! +Maria Colgan


Oracle Database In-Memory

Star Schema Challenge - Part 5

In my previous post I established a baseline for 100 concurrent users running a variety of business intelligence queries against a 500 million row star schema on a commodity Intel-based server with 60 cores and 1 TB of DRAM. All tables were loaded in the In-Memory column store (IM column store). At 100 users, the star schema ran at a median of 21.5 seconds per query and the de-normalized fact table ran at a median of 10.0 seconds per query. Given that this test runs 100 users on a 60-core server, it should be obvious that it is asking a lot of the server. With 25 users the star ran at a median of 3.85 seconds and the de-normalized table ran at a median of 1.57 seconds per query, an excellent result for this more reasonable workload. Median query times for in-memory star and de-normalized tables at 25, 50, 75 and 100 concurrent users. With tables in the IM column store, no I/O or other significant wait events, and the server running at 90% host CPU, I need to find other ways to be more efficient and use less CPU per query. Preferably about 90% less, if I'm to get to about 2 seconds per query at 100 users. No worries. I just need to use other features of the Database along with Database In-Memory (which has already done a fantastic job of speeding up this workload). A quick review of the first post in the series makes note of two important characteristics of this challenge: The workload is diverse. There are business intelligence dashboard type queries (often high level aggregations), slice and dice ad-hoc queries and anything in-between. Queries can use any subset or all of the dimension tables. In total, over 50,000 queries will be run. I can do anything I want to optimize the application. I can specify how data is organized and physically stored, how the queries are formed and what features of the Oracle Database I use. I must, however, use features that are available to and easily implemented by anyone. And in the second post I provided some more information about the query workload: The query workload will be familiar to anyone that supports a business intelligence platform such as Oracle Business Intelligence (or Business Objects, MicroStrategy, etc.). Users are presented with a variety of reports in a business intelligence dashboard and from there are allowed to explore by drilling down, adding filters, choosing different measures, etc. My challenge uses generated queries that start at high level aggregates (the first query in a series) and drill down 5 to 8 times depending on the dimensions used in the query. This process repeats until 528 queries are run for each user (a total of 52,800 queries for 100 users). The dimensions and measures vary for each series, so there are very few repeated queries (on average, 511 unique queries per user). It's time to take a closer look at the workload. There are a few things that immediately stand out: There are groupings of queries that start out at high-level aggregates, followed by 5 to 8 drill downs to more detailed data per grouping. Queries select from between 4 and 9 dimension tables. There are a lot of users running a lot of queries (52,800 queries in total). The first two characteristics are pretty typical for a business intelligence application. The 100 users running with little time between queries is a bit aggressive on this size server. With these characteristics identified for the workload, it is pretty safe to make a few assumptions: There are a lot of queries that access high-level aggregate data.
These will be the first few queries in the grouping of 5 to 8. Each grouping will start out with a grouping of levels such as Year, Region, Department, Channel and either the total of the demographic dimensions or the detail of those dimensions. These queries will tend to access and aggregate large numbers of fact rows. It's probably safe to assume that these will be relatively expensive queries because they access and aggregate the most data. There will probably be a small number of queries that are repeated often. These will probably be the highest-level aggregates (usually the first query in a grouping) and the four dimensional queries. This is because there are relatively few dimensions with relatively few members at these aggregate levels. Think of these as being like reports on a business intelligence dashboard. Queries that reference more dimensions will spend more time in joins than queries that reference fewer dimensions. So, the queries that reference 9 dimensions can be expected to be more expensive than queries that reference 4 dimensions. Queries that drill down to more detailed data will tend to be less expensive because they access and aggregate fewer fact rows. The more detailed, 4 dimensional queries will probably be the least expensive. To test these assumptions, I logged each query executed and did some analysis. In the 100 user query run: There were 52,800 queries executed, of which 45,458 were unique (7,352 queries were executed more than once). 156 queries were executed 10 or more times. These represented 0.34% of the unique queries, but 12.1% of the processing time (as measured by elapsed seconds). On average these queries were within the first 2.7 drill downs, so they were generally at upper aggregates. The first queries in a grouping (the first drill down and highest level aggregates) returned in a median of 35.9 seconds and accounted for 532,702 elapsed seconds. The last queries in a grouping (the last drill and the lowest level aggregates) returned in a median of 11.8 seconds and accounted for 23,413 elapsed seconds. Queries with 4 dimensions returned in a median of 18.7 seconds and queries with 9 dimensions returned in a median of 32.3 seconds. Queries Executed Most Often: The queries that are repeated most often tend to be at higher levels of aggregation and more expensive to execute. These are good candidates for optimization. Median Query Time and Total Elapsed Time by Drill Grouping: Median query time (seconds) and total execution time (all queries, all users) by drill group. Again, the higher level aggregates are more resource intensive. Median Query Time by Number of Dimensions: Median query time by number of dimensions. More dimensions are more expensive. Median query times for 4 and 9 dimensional queries. More dimensions, and thus more joins, are more expensive. These observations support the hypothesis that the highest-level queries are the most expensive and most often repeated, and that queries with more joins are more expensive than queries with fewer joins. My next pass at optimizing this workload focused on the higher level aggregates, where queries tended to repeat more often and take longer to execute and thus use more CPU time. I wanted something very easy to implement, so I chose to use the Oracle result cache.
My implementation was very simple: I used the RESULT_CACHE hint for all queries in the first three drill groupings and prevented the use of the result cache with the NO_RESULT_CACHE hint for queries in drill groupings 4 and higher. To get a sense of the efficacy of this strategy I ran a quick test with 10 concurrent users and no wait time between queries (as compared to the usual average of 5 seconds between queries). Results were impressive. The total elapsed time for those queries that were executed 5 or more times was 8,149 seconds without using the result cache and 1,193 seconds when using the RESULT_CACHE hint (an 85% reduction). Median query time dropped from 22.0 seconds to 2.4 seconds (an 89% reduction). Median Elapsed Time of Queries Executed 5 or More Times: The average of median query times for queries executed 5 or more times in the 10 user test. The result cache proved to be very effective in reducing query times. Total Elapsed Time of Queries Executed 5 or More Times: The total elapsed time (all queries, all users) for queries executed 5 or more times in the 10 user test. For queries that repeat often, the result cache can significantly reduce the load on the server. Will this get me to queries of just a few seconds with 100 users? With 100 users the median query time for the star schema was 21.5 seconds. That means I need to reduce CPU use by about 90 percent to get down to about 2 seconds. Using the result cache with queries that were often repeated was quick, easy and very effective, but the overall reduction in elapsed time was about 13% in the 10 user run. That suggests that the result cache would be extremely effective for queries that are run often, but I need to do a bit more work to get where I need to be with the overall workload. This test also makes a very important point: Oracle isn't just a one-trick pony; there are many features that can be used together to optimize a query workload. So, stay tuned. Since we are coming up on Open World and there will be a lot of things to talk about then, I think I'll finish up this challenge in the next post of this series.
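To give a concrete, though purely illustrative, picture of the hinting strategy described above (the challenge's real schema and generated queries are not shown in these posts, so the sketch below borrows the SSB-style names used elsewhere on this blog):

-- Drill groups 1-3: high-level aggregates that repeat often, so cache the results
SELECT /*+ RESULT_CACHE */ d.d_year, SUM(l.lo_revenue) AS revenue
  FROM lineorder l, date_dim d
 WHERE l.lo_orderdate = d.d_datekey
 GROUP BY d.d_year;

-- Drill groups 4 and higher: detailed, rarely repeated queries bypass the cache
SELECT /*+ NO_RESULT_CACHE */ d.d_year, c.c_nation, SUM(l.lo_revenue) AS revenue
  FROM lineorder l, date_dim d, customer c
 WHERE l.lo_orderdate = d.d_datekey
   AND l.lo_custkey   = c.c_custkey
 GROUP BY d.d_year, c.c_nation;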


Oracle Database In-Memory

Questions you asked: What happens if a column is not In-Memory?

When we're talking to customers or giving presentations, questions often get asked that seem simple, but could have an answer that is worth showing with an example rather than just saying, "yes, it works this way". So I'm going to start a series of posts that will address those questions. I will continue the series as I come across what I think are interesting questions. This first post in the series of "Questions You Asked" will be "What happens if a column is not In-Memory?" And the follow-on to that question was “… and will it show up in the execution plan?” So let's try it. We know that if all of the columns we are querying are populated in the In-Memory column store (IM column store) then we will see an execution plan that shows "TABLE ACCESS INMEMORY FULL" and our session statistics will also show that we accessed the object in the IM column store. So let's see what happens if we query an object where not all of the columns are populated into the IM column store and we query one or more of those columns. We'll use the LINEORDER table and we'll populate it minus the lo_orderdate, lo_quantity and lo_tax columns: Next we'll query the LINEORDER table and select one of the columns (lo_quantity) that we didn't populate in the IM column store: We can see from the execution plan that we didn't access the table in-memory. We did a normal full table scan. So the optimizer clearly recognizes that if not all of the columns referenced by the query are populated in the IM column store then it has to access the object through the buffer cache/direct path. Is this also true if we access one of the columns that isn't populated through the WHERE clause? Yes, even though the columns that we're selecting are populated in the IM column store, the fact that we require a column that is not populated in the WHERE clause also forces us to access the table from the buffer cache/direct path. Just for fun, let's run our query with just columns that are populated in the IM column store. And of course we see that we have an in-memory execution plan. So, if you query an object and one or more of the columns that you are querying have not been populated in the IM column store then your query will access that object through the row store, and the execution plan will not show an INMEMORY access.
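A minimal sketch of the setup described above (the column-level INMEMORY clause and the V$IM_COLUMN_LEVEL check shown here are the standard ones, but verify the output on your own system):

-- Populate LINEORDER but exclude three columns from the IM column store
ALTER TABLE lineorder INMEMORY NO INMEMORY (lo_orderdate, lo_quantity, lo_tax);

-- Confirm which columns are (and are not) in the column store
SELECT column_name, inmemory_compression
  FROM v$im_column_level
 WHERE table_name = 'LINEORDER'
 ORDER BY segment_column_id;

-- A query that touches an excluded column falls back to the row store
SELECT SUM(lo_quantity) FROM lineorder;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
-- expect TABLE ACCESS FULL (or an index plan) rather than TABLE ACCESS INMEMORY FULL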


Oracle Database In-Memory

Oracle Open World 2015: Time to plan your schedule!

There are only 6 weeks to go until Oracle Open World, the largest gathering of Oracle customers, partners, developers, and technology enthusiasts, which begins on October 25th in San Francisco. Of course the In-Memory development group will be there and you will have multiple opportunities to meet up with us, in one of our technical sessions, our hands-on-labs or at the Oracle demogrounds. This year the In-Memory team has 4 technical sessions and there are also 5 excellent customer sessions you shouldn't miss: Monday, October 26th at 2:45pm Facebook’s Use of Oracle Database In-Memory from Inception to Rollout, Session CON1685 at Moscone South - room 301. Sudhi Vijayakumar describes how Facebook evaluated Oracle Database In-Memory and then rolled it out in production to reap the benefits of faster response times for analytical queries. Monday, October 26th at 4:00pm Five Ways Swiss Mobiliar Improved Business Value with Database In-Memory, Session CON2715 at Moscone South - room 310. Folks from Swiss Mobiliar explain how they extended their traditional business intelligence (BI) systems to a more real-time solution using Oracle Database In-Memory. They will also compare this new solution to their original IBM DB2 solution. Tuesday, October 27th at 11:00am Best Practices for Getting Started with Oracle Database In-Memory, Session CON6775 at Moscone South - room 104. During this session I'll share my top tips for getting started with Oracle Database In-Memory and the best use cases for this new technology. Tuesday, October 27th at 4:00pm Oracle Database In-Memory: A Practical Solution, Session CON3087 at Moscone South - room 307. Dan Huls from AT&T Wi-Fi Services tells how they implemented Oracle Database In-Memory on a large customer-facing data warehousing system in under two months with little or no changes to their application. He will also provide details on why they chose Oracle Database In-Memory versus a more expensive SAP HANA solution. Tuesday, October 27th at 5:15pm Oracle Database In-Memory—What’s New and What’s Coming, Session CON8659 at Moscone South - room 103. Juan Loaiza reveals what motivated Oracle to develop this new technology and provides a clear understanding of how this unique “dual format” architecture works. He will also provide details on how Oracle Database In-Memory takes advantage of Software in Silicon. Wednesday, October 28th at 1:45pm Oracle Database In-Memory Customer Panel, Session CON8657 at Moscone South - room 103. Four production Database In-Memory customers will join me in a round table discussion on why they chose Oracle Database In-Memory to tackle the demand for real-time analytics and business intelligence in their companies. The session will begin with each customer briefly sharing how they are using Oracle Database In-Memory before we dive into an open discussion. The audience will be actively encouraged to participate in this session and pose any questions they have on Database In-Memory and other alternative solutions considered by the panelists.
Wednesday, October 28th at 3:00pm How Exadata & Database In-Memory Power Precision Marketing at General Mills, Session CON4292 at Moscone South - room 254. Mark and Josh from General Mills explain how Oracle Exadata with Oracle Database In-Memory gave them a scalable and reliable platform from which to build a consumer-first data warehouse. Wednesday, October 28th at 4:15pm Oracle Database In-Memory: Under the Hood, Session CON8652 at Moscone South - room 103. Tirthankar Lahiri, the VP of In-Memory development, will provide an insider's view of the technology underlying Oracle Database In-Memory and give you a sneak peek at what you can expect from Oracle Database In-Memory in our next generation. Thursday, October 29th at 9:30am Introducing Analytic Views—Bringing Star Queries into the Twenty-First Century, Session CON8742 at Moscone South - room 309. During this session Bud Endress will introduce Analytic Views and describe how they easily transform tables into the next-generation data source for BI applications and Excel Pivot Tables. You will also have an opportunity to try out Oracle Database In-Memory for yourself in our hands-on lab, Oracle Database In-Memory Option Boot Camp: Everything You Need to Know, which will run every day in the Golden Gate room in the Nikko Hotel. If you have burning questions related to Oracle Database In-Memory, you can ask them at the In-Memory demo booth in the Database area of the demogrounds in Moscone South. Members of the In-Memory development team will be there Monday to Wednesday from 9:45am until 5:30pm. You may even be able to pick up an In-Memory bumper sticker. More details on the bumper stickers in my next post! The In-Memory team and our customers aren't the only ones talking about Oracle Database In-Memory at this year's conference. There are also a number of great sessions being delivered by Oracle ACEs. Check out the full searchable OOW catalog on-line to start planning your trip today! Maria Colgan+


Oracle Database In-Memory

Star Schema Challenge - Part 4

In my previous post Star Schema Challenge - Part 3, I reported the first of the in-memory performance results with 25 users querying the star schema in the In-Memory column store (IM column store). In Star Schema Challenge - Part 3.1 I did the same for the de-normalized fact table in the IM column store. This serves as a comparison between the row format in-memory (buffer cache) and the IM column store. For this comparison I stopped at 25 users because it became clear this was enough users for the row store table. Now it is time to start working towards 100 concurrent users. As a reminder, the goal of this challenge is to support 100 concurrent users running a variety of business intelligence queries against a 500 million row star schema on a commodity Intel-based server with 60 cores and 1TB of DRAM. (See Star Schema Challenge - Part 1 for more details regarding the server, data, query workload and goals.) So far, I've run the following tests: 25 users querying the star schema in row store format (median query time was 16.9 seconds per query). 25 users querying a de-normalized fact table in row store format (median query time was 33.5 seconds per query). 25 users querying the star schema in the in-memory column store (median query time was 3.85 seconds per query). 25 users querying the de-normalized fact table in the in-memory column store (median query time was 1.57 seconds per query). Median query times for 25 users. AWR reports showed the Database to be operating efficiently; there were no unusual wait events. At 25 users there were still some CPU cycles to spare for both the star and de-normalized tables. At 25 users I could call it a success at 1.57 seconds per query on the de-normalized table. If I preferred the star schema for some reason (for example, perhaps that is what an application is already using) I would want to reduce query time by at least half (from 3.85 seconds per query to under 2 seconds). My goal, however, is 100 users. So how am I doing there? Well, the short of it is that I still need to make some improvements to get the query time down to just a few seconds. Using the same configuration as the 25 user run, I also ran with 50, 75 and 100 concurrent users. At 100 users the star schema ran at a median of 21.5 seconds per query and the de-normalized fact table ran at a median of 10.0 seconds per query. Median query times for in-memory star and de-normalized tables at 25, 50, 75 and 100 concurrent users. Even at 25 users there is little opportunity for parallel query on this 60 core server. And, as would be expected, the server gets more saturated as the number of users increases. The following chart compares Host CPU (% User) for various test runs. Percent Host CPU (% User) for in-memory star and de-normalized tables at 25, 50, 75 and 100 concurrent users. Another indication that the server is saturated is that the in-memory scan rows per second flatten out at 75 users. Note that with both the star and the de-normalized table there are operations in the query plan other than TABLE ACCESS INMEMORY FULL. Both styles of query aggregate data, and the star query includes joins. As a result, these queries are not the best measurement of pure scan speed. Even so, it's just another indication that the server is saturated with 100 users in this particular configuration. As I move forward toward my goal of fast queries for 100 concurrent users I will look for ways to be even more CPU efficient. I will closely examine the query workload and begin to blend in additional features of the Database to improve performance.
I will pay particular attention to how queries are distributed across the business model (for example, the dimensions and levels of aggregation of the queries) and the number of rows accessed and aggregated and use that knowledge to tune the workload.


Oracle Database In-Memory

Popular Statistics with Database In-Memory

Throughout our previous posts we have mentioned various session level statistics that are available to help identify what is actually going on with Database In-Memory. Since these statistic definitions didn't make it into the Reference manual, I was asked recently if there is a list anywhere that defines what these statistics are. Since there isn't, I thought I'd post a summary of the popular statistics that we've mentioned in the blog and that you might want to know about.

IM scan rows optimized: Number of rows that were skipped (because of storage index pruning) or that weren't accessed due to aggregations with predicate push downs
IM scan rows projected: Number of rows returned to the upper layer
IM scan rows: Number of rows scanned in all IMCUs
IM scan rows valid: Number of rows scanned in all IMCUs after applying the valid vector
IM scan CUs no memcompress, IM scan CUs memcompress for dml, IM scan CUs memcompress for query low, IM scan CUs memcompress for query high, IM scan CUs memcompress for capacity low, IM scan CUs memcompress for capacity high: Number of times IMCUs of each memcompress type were touched
IM scan CUs columns accessed: Number of columns accessed by a scan
IM scan CUs invalid or missing revert to on disk extent: Number of on disk extents accessed due to missing or invalid IMCUs
IM scan CUs pruned: Number of IMCUs with no rows passing min/max
IM scan segments minmax eligible: Number of IMCUs that are eligible for min/max pruning
IM scan segments disk: Number of times a segment marked for in-memory was accessed entirely from the buffer cache/direct read
IM scan CUs predicates applied: Number of min/max predicates applied
IM scan CUs predicates optimized: Number of IMCUs where either all rows passed min/max or no rows passed min/max
IM scan CUs predicates received: Number of min/max predicates received
IM populate segments requested: Number of population tasks for in-memory segments
table scan disk IMC fallback: Number of rows in blocks scanned from the buffer cache/direct read where an IM scan was possible
table scan disk non-IMC rows gotten: Number of rows in blocks scanned from the buffer cache/direct read where an IM scan was not possible
table scans (IM): Number of segments scanned in-memory
session logical reads - IM: Number of blocks scanned in an IMCU
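A quick way to watch these statistics for your own session is to join V$MYSTAT to V$STATNAME, for example (the LIKE filters below are just a convenience, not an exhaustive list):

SELECT n.name, s.value
  FROM v$mystat   s,
       v$statname n
 WHERE s.statistic# = n.statistic#
   AND (n.name LIKE 'IM scan%' OR n.name LIKE 'IM populate%' OR n.name LIKE 'table scan%')
   AND s.value > 0
 ORDER BY n.name;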


Understanding new In-Memory notes in an execution plan

If you have started to play around with Database In-Memory, chances are you have been examining execution plans to see if the optimizer has chosen to use the In-Memory column store (IM column store) or not. In addition to the execution plan itself, you should also check out the NOTE section under the plan, which contains more information about how a query was executed, such as whether dynamic sampling was used during the query optimization. The information in the NOTE section is especially useful when you start executing queries in parallel in a RAC environment. If you recall, in our previous series on RAC we explained that each RAC instance has its own IM column store and when data is populated in-memory in RAC it is distributed across all of the IM column stores in the cluster. In fact the data is affinitized to a specific RAC instance, which means that we need to access that piece of the data via the IM column store(s) it resides in. We don't ship in-memory data (IMCUs) across the interconnect to the RAC instance on which the query is being executed, which is why all in-memory queries on RAC must be parallelized with a degree equal to at least the number of instances that were involved in the distributed population. We also said that the best way to parallelize your statements to ensure this data affinity is taken into consideration is to use AutoDOP. So, how can you tell if you are using AutoDOP and if you got data affinity? Information on what parallel degree was chosen for a SQL statement and why is now shown in the NOTE section of the execution plan, as are the details on whether affinity was used or not. If you prefer to use SQL Monitor to examine execution plans then you can access the NOTE section by clicking on the yellow Plan Note just above the plan in an active report. In this example, when we click on the Plan Note we get the following popup window. It is not exactly the same format we got in the NOTE section when we displayed the plan using the DBMS_XPLAN package in SQL*Plus, but it does have the same information, once you know how to decode it. Let's start with whether or not AutoDOP kicked in. The first two lines of the Plan Note let us know what Degree of Parallelism (DOP) was used for the statement and why. In this example the DOP used was 2 and it was chosen because of the scan of object SSB.LINEORDER. What does that mean? Remember, with AutoDOP Oracle will automatically determine the DOP for a statement based on a number of criteria, including whether there are any scan operations in the execution plan. In this case, there was a full table scan of the LINEORDER table in the plan, which the optimizer considered expensive, and so it decided to employ parallel execution. The degree chosen is based on the amount of work the optimizer believes needs to be done to complete the scan and the maximum DOP allowed as specified by the parameter PARALLEL_DEGREE_LIMIT. What about affinity? Did we get data affinity for this statement? Yes, the third line in the note, PX In Memory Imc yes, indicates that the parallel scan of the LINEORDER table was affinitized. So, that just leaves the last line of the Plan Note, PX In Memory no. What does it mean? Strange as this might sound, that entry in the Plan Note has nothing to do with Database In-Memory. It's actually referring to the use of parallel execution within the buffer cache in a RAC environment.
Yasin Baskan, the Product Manager for parallel execution, gives a great explanation of what that particular note actually means in his post on the Data Warehouse Insider blog. Now that you have the decode key for the SQL Monitor note, hopefully it will make your experiments with Database In-Memory easier to interpret!
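If you are working in SQL*Plus rather than SQL Monitor, the same information is printed in the Note section that DBMS_XPLAN appends below the plan. A minimal sketch (the SQL_ID in the second call is just a placeholder for one of your own statements):

-- plan and Note section for the last statement executed in this session
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());

-- or for a specific statement, replacing the placeholder SQL_ID
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('an1s9f8zkfczs', NULL, 'TYPICAL'));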


Star Schema Challenge - Part 3.1

In my previous post (Star Schema Challenge – Part 3) I revealed the first of the Database In-Memory results: 25 users querying a 500 million row star schema with tables in the In-Memory column store (IM column store) on a commodity Intel-based server with 60 cores and 1TB of DRAM. In this post I will reveal the first of the in-memory results for the de-normalized version of the fact table. Here is a summary of what I've seen so far with 25 concurrent users and an average wait time of 5 seconds between queries:
A median of 16.9 seconds per query with tables in a star schema and conventional row store format.
A median of 33.5 seconds per query with the data in a de-normalized table and conventional row store format.
A median of 3.85 seconds per query with tables in a star schema loaded into the IM column store (a 77% reduction as compared to the star schema in row store format).
Median query time for 25 users and row store star, row store de-normalized and in-memory star.
As a reminder, the goal of this challenge is query times of just a few seconds for 100 concurrent users on this 60 core Intel server. As with the star schema, I kept this first in-memory test of the de-normalized table very simple, only altering the table to mark it INMEMORY with the default compression level of MEMCOMPRESS FOR QUERY LOW:
ALTER TABLE units_fact_500m_10_dnorm INMEMORY;
That's it ... about 10 seconds of effort on my part. Let's first take a look at compression. The star schema compressed from about 52.1 GB on disk to 15.2 GB in the IM column store, a compression ratio of 3.4. I am perfectly happy with that compression ratio given that the star schema is well constructed with efficient integer keys (that is, not a lot of fluff to compress out). The de-normalized table includes both the integer keys and the much longer text columns representing aggregate level attributes, so it is to be expected that the de-normalized table would be considerably larger. The average row length for the star fact table is 96 while the average row length of the de-normalized table is 483. On disk, the de-normalized table is 107.7 GB. In the IM column store the de-normalized table compresses to 17.4 GB, a compression ratio of 6.2 with MEMCOMPRESS FOR QUERY LOW (only 14% larger than the in-memory star table). Excellent.
Row and IM column store sizes of the star and de-normalized fact tables.
Since I could use a higher compression level for the row store tables and there is no real penalty for the de-normalized table in the IM column store, I'm not concerned about the size of the de-normalized table. Let's move on to look at query performance. With the row store tables, the star schema was the clear winner, but what happens when the tables are in the IM column store? For 25 users (5 second wait between queries) the de-normalized table turned in a median of just 1.57 seconds per query, a reduction of over 95% as compared to the row store de-normalized table and about 2x faster than the star table in-memory. That's impressive. And remember, since this is 25 users on a 60 core server there is not much room for parallel query. It is really the efficiency of Database In-Memory that makes this performance possible.
Median query times for 25 users (5 second wait between queries).
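(If you want to verify the on-disk and in-memory footprints quoted above on your own system, the numbers come straight from the dictionary. A minimal sketch, assuming the table names used in this series:)

SELECT segment_name,
       ROUND(bytes / 1024 / 1024 / 1024, 1)         AS disk_gb,
       ROUND(inmemory_size / 1024 / 1024 / 1024, 1) AS im_gb,
       ROUND(bytes / inmemory_size, 1)              AS comp_ratio
FROM   v$im_segments
WHERE  segment_name IN ('UNITS_FACT_500M_10', 'UNITS_FACT_500M_10_DNORM');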
While the columns and filters vary, all of the de-normalized queries take the same form as the following example:
SELECT calendar_year_name, calendar_quarter_name, month_name, all_products_name, department_name, category_name, all_customers_name, all_channels_name,
       SUM(sales), SUM(units), SUM(measure_3), SUM(measure_4)
FROM   units_fact
WHERE  calendar_year_name    = 'CY2010'
AND    calendar_quarter_name = 'Q4-CY2010'
AND    all_products_name     = 'All Products'
AND    department_name       = 'Computers'
GROUP BY calendar_year_name, calendar_quarter_name, month_name, all_products_name, department_name, category_name, all_customers_name, all_channels_name
ORDER BY calendar_year_name, calendar_quarter_name, month_name, all_products_name, department_name, category_name, all_customers_name, all_channels_name;
Plans for the queries against the de-normalized table are very straightforward and the same for row store and in-memory, except that the in-memory plan uses TABLE ACCESS (INMEMORY FULL). For example:
SQL plan for query of the de-normalized table.
With the row store table, the majority of time was spent in TABLE ACCESS. With the in-memory table, time for TABLE ACCESS was negligible. As usual, the Database ran efficiently in this test and showed no unusual wait events.
Key AWR statistics for 25 users, star tables and the Database In-Memory tests.
At 25 users these numbers look very encouraging. In the next post I'll increase the number of concurrent users and see where I stand.



Oracle Database In-Memory on RAC - Part 4

Setting Up Independent In-Memory Column Stores
In previous posts we've talked about how to use RAC services to enable the IM column store to be run on a subset of nodes in a RAC environment. We also mentioned that it is possible, using RAC services and the DUPLICATE sub-clause on engineered systems, to enable rolling patches and upgrades when running the IM column store on a subset of nodes. In this article we're going to talk about how to set up independent IM column stores on a RAC cluster using services and some database initialization parameters. But first let's ask and answer the question, why would you want to do this? This might be a good idea if you are trying to enforce application affinity at the node level and don't want to allow inter-instance parallelism. This technique could be used to allow the running of all of the workload for a given application, along with any Database In-Memory workload to support that application, on just a single node in a RAC environment. This technique can also be used to support multiple "independent" IM column stores if, for example, you wanted to partition multiple applications across nodes in your RAC environment. However, the objects populated in the IM column stores in this example have to be distinct; you cannot intermix objects between multiple IM column stores. In other words, you cannot cheat and try to set up duplication of objects. That will only work on engineered systems. So how do we get started? The following will take you through setting up three "independent" IM column stores on a three node RAC, and then we will populate objects into two separate IM column stores and see how it all works. We're going to assume, though, that we already have the three node RAC running and that we've allocated an IM column store on each instance by setting the INMEMORY_SIZE parameter to a non-zero value.
Network and Service Connections
First we will add services and appropriate TNS entries so that we can connect to each of our RAC instances independently. This will be required to ensure that we only populate and access the IM column store that we intend to use for each of our applications.
srvctl add service -db dgrac -service IM1 -preferred rac
srvctl add service -db dgrac -service IM2 -preferred rac2
srvctl add service -db dgrac -service IM3 -preferred rac3
srvctl start service -db dgrac -service "IM1,IM2,IM3"
If we query the services they should look something like the following. Note that the RAC instance names in this example are rac, rac2 and rac3.
The database unique name is dgrac:
$ srvctl status service -db dgrac
Service IM1 is running on instance(s) rac
Service IM2 is running on instance(s) rac2
Service IM3 is running on instance(s) rac3
$
Add a TNS entry for each service:
IM1 =
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac1)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac2)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac3)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=IM1)
    )
  )
IM2 =
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac1)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac2)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac3)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=IM2)
    )
  )
IM3 =
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac1)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac2)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=rac3)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=IM3)
    )
  )
Verify the service name connections:
SQL> connect ssb/ssb@im1
Connected.
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
rac
SQL> connect ssb/ssb@im2
Connected.
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
rac2
SQL> connect ssb/ssb@im3
Connected.
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
rac3
SQL>
Parallelism
Next we need to set the initialization parameter PARALLEL_FORCE_LOCAL to TRUE. This has implications for all parallel processing since we will effectively be preventing inter-node parallelism, whether it's for Database In-Memory parallel queries or any other parallel queries. In other words, parallel server processes will be restricted so that they can only run on the node on which the SQL statement was executed. We have one more step in this setup and that is to set the PARALLEL_INSTANCE_GROUP parameter on each of our RAC instances. The PARALLEL_INSTANCE_GROUP parameter restricts parallel query operations to just the instances supported by the service name or specified by the instance_groups parameter on the instance where it is set. In our case we will set the PARALLEL_INSTANCE_GROUP parameter to the service name we defined for each of our three instances. In other words: on node1 we define PARALLEL_INSTANCE_GROUP = IM1, on node2 PARALLEL_INSTANCE_GROUP = IM2 and on node3 PARALLEL_INSTANCE_GROUP = IM3. This will only allow population of the IM column store to occur on the node where we first access the object(s) that we want populated.
$ sqlplus system@im1
SQL*Plus: Release 12.1.0.2.0 Production on Wed Jul 29 16:05:55 2015
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Enter password:
Last Successful login time: Wed Jul 29 2015 16:05:38 -07:00
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
SQL> alter system set parallel_instance_group=IM1 scope=both sid='rac';
System altered.
SQL> alter system set parallel_instance_group=IM2 scope=both sid='rac2';
System altered.
SQL> alter system set parallel_instance_group=IM3 scope=both sid='rac3';
System altered.
SQL>
Priority
To ensure that specific objects are populated into specific IM column stores we will have to set the population priority to NONE for the objects that we want to access in the IM column store, and we will have to access the object(s) (e.g. run a select) to start their population while connected to the instance that we want the object(s) populated on. Not to confuse the issue, but what happens if you set the priority to a level other than NONE?
The answer is that you will not be able to control which node the population occurs on. At instance or database startup the population will be eligible to begin and it won't be possible to specify which objects should be populated into which IM column stores. This defeats the purpose of populating based on specific node/service combinations to achieve application affinity. Note that this also prevents any kind of distribution since we've limited parallelism to just the node that we execute from.
Population
Now that we have the connectivity set up, let's see how we'll populate two of the IM column stores with tables from our SSB schema. First we will connect to the IM1 service (e.g. the rac instance) and we will alter the CUSTOMER table to be in-memory with a priority of NONE. We will then select from it to initiate the population. Now we'll populate the SUPPLIER table to the IM column store for the IM2 service (e.g. the rac2 instance). Note that we now have the CUSTOMER table entirely populated within the inst_id 1 IM column store and the SUPPLIER table entirely populated within the inst_id 2 IM column store.
Queries
So we now have our two tables populated into separate IM column stores in a single RAC environment. Please remember that this is just a simple example. We would really be populating the tables for an entire application into a single IM column store, not just one table as in this example. So what happens if we connect to our IM2 service (e.g. the rac2 instance) and query each of the tables? Let's give it a try. In this first query we have performed a simple count for the SUPPLIER table. From the execution plan and timing it looks like we've accessed the table through the IM column store, and if we take a look at the statistics we can verify this: we see that we have IM statistics and that we accessed all 4000 rows from the IM column store (that is, IM scan rows projected is 4000). Now let's do the same for the CUSTOMER table, but remember that we're connected to the IM2 service and the CUSTOMER table is populated in the IM column store for the IM1 service, which is on instance 1 (e.g. the rac instance). Now that's strange. The explain plan indicates that we accessed the CUSTOMER table in-memory as well. Let's see what the statistics tell us. Whoops, no IM related stats other than IM scan segments disk, and if we look further down the list we see that we did not get the 60,000 rows in the table from the IM column store (see table scan disk non-IMC rows gotten). So this is really just meant to show that once you decide to use separate IM column stores for application affinity, if you need to access in-memory objects from a different node then you will have to go to the buffer cache/disk. Of course the whole point of setting up separate IM column stores was so that you could isolate an application to just one node, and why we had to set PARALLEL_FORCE_LOCAL = TRUE and the PARALLEL_INSTANCE_GROUP parameter to the service name for each of the instances.
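A minimal sketch of the population steps described above (SSB schema as in this example; run the first two statements connected to the IM1 service and the next two connected to the IM2 service):

-- connected via IM1 (instance rac)
ALTER TABLE ssb.customer INMEMORY PRIORITY NONE;
SELECT /*+ FULL(c) */ COUNT(*) FROM ssb.customer c;   -- the full scan triggers population on this instance

-- connected via IM2 (instance rac2)
ALTER TABLE ssb.supplier INMEMORY PRIORITY NONE;
SELECT /*+ FULL(s) */ COUNT(*) FROM ssb.supplier s;   -- the full scan triggers population on this instance

-- check which instance each table ended up on
SELECT inst_id, segment_name, populate_status, inmemory_size
FROM   gv$im_segments
ORDER  BY inst_id, segment_name;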



What to do with optimizer statistics when upgrading to 12c to take advantage of Database In-Memory

Before most customers can take advantage of Database In-Memory they will need to navigate the tricky terrain of a database upgrade. One of the most challenging aspects of an upgrade is figuring out how to minimize performance regressions due to execution plan changes. And if that wasn't enough to handle, the introduction of Database In-Memory into a 12c environment has the potential to change even more of the execution plans. So, what should you do? First and foremost, you need to focus on getting to Oracle Database 12c with the same or better performing execution plans before introducing Database In-Memory into the mix. In order to achieve this goal you will need to arm yourself with two things: a copy of your existing optimizer statistics and a set of your existing (hopefully performant) execution plans.
Capturing existing optimizer statistics
You want to change as little as possible during the upgrade in order to make it easier to diagnose any changes that may occur. Since statistics have the biggest impact on the optimizer it is advisable for them to stay constant during the upgrade. You should continue to use your 11g statistics until your 12c system performance is stable. The best way to ensure you have your 11g statistics is to keep a backup. So before the upgrade, export a complete set of optimizer statistics into a statistics (or stats) table using the DBMS_STATS package.
Step 1: Create the stats table
BEGIN
  dbms_stats.create_stat_table('SYS', 'MY_STATS_TAB');
END;
/
Step 2: Export the statistics for your critical schemas and select a stats_id to make it easy to identify your 11g statistics
BEGIN
  dbms_stats.export_schema_stats('SH', 'MY_STATS_TAB', '11g_stats');
END;
/
Note: If you're not planning on doing an in-place upgrade, you will have to move the statistics to the Oracle Database 12c system by exporting the stats table and importing it into the 12c database. Then use the DBMS_STATS.IMPORT_SCHEMA_STATS procedure to populate your 11g statistics into the data dictionary.
Capturing existing execution plans
As we discussed in our previous post, SQL Plan Management (SPM) is your best safety net to ensure execution plans do not change, and it should definitely be employed during an upgrade to help maintain stable performance. There are a number of different ways you can capture your existing 11g plans in SPM, all of which are described in detail in the capture section of the SPM white paper. Note: If you are not planning on doing an in-place upgrade you will have to move the SQL plan baselines to the Oracle Database 12c system by exporting them via a staging table using the DBMS_SPM package. Again you will find details on how to do this in the SPM white paper.
After the upgrade
Immediately after the upgrade you should be using your 11g plans and statistics to achieve the same execution plans you had before. Any new plans found by the 12c optimizer will be automatically captured in your existing SQL plan baselines but won't be used until they have been verified to perform better than the 11g plan. But what should we do about gathering statistics and introducing Database In-Memory? You should start with statistics, since the optimizer needs a good, representative set of statistics in order to determine the correct execution plan, including in-memory plans. Since we have SPM in place, gathering a new set of statistics poses no threat to our system performance. Any new plans found after the statistics gather won't be used until they are verified to perform better than the 11g plan.
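For reference, the import on the 12c side (for the non-in-place upgrade case mentioned above) is the mirror image of the export. A minimal sketch, using the same stats table and stats_id and assuming the table was moved into the SYS schema as in the example:

BEGIN
  -- pull the 11g statistics from the staging table into the 12c data dictionary
  dbms_stats.import_schema_stats(
    ownname => 'SH',
    stattab => 'MY_STATS_TAB',
    statid  => '11g_stats',
    statown => 'SYS');
END;
/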
We strongly recommend you follow Oracle’s best practices for gathering optimizer statistics. We also recommend that you run an SPM evolve task at this point, to ensure you are using the best execution plan for every statement. Once you are comfortable you have stable and acceptable performance in your 12c environment, you can introduce Database In-Memory by following the guidelines outlined in our previous post. Hopefully this approach will make you feel more comfortable about upgrading to Oracle Database 12c to take advantage of  Database In-Memory. +Maria Colgan



Oracle Database In-Memory Bundle Patch 10 Released

The latest Bundle Patch for Database In-Memory has been released. The Bundle Patch is 21188742, or 12.1.0.2 Bundle Patch 10 for Engineered Systems and DB In-Memory (July 2015). This Bundle Patch improves the performance of mixed workload environments (OLTP & DW workloads), as well as enhancing the performance of analytic queries with aggregation. More information on the latest Bundle Patch can be found in MOS note 21188742.8 or in the MOS note: 12.1.0.2 Bundle Patches for Engineered Systems and DB In-Memory (Doc ID 1937782.1). Remember that Bundle Patches are cumulative, which means that Bundle Patch 10 includes all fixes from Bundle Patches 1 through 9 in addition to new content. It also includes the most recently released Patch Set Update (PSU) (July 2015 GI PSU and DB PSU), which also includes the most recently released Critical Patch Update (CPU) as of the time that the Bundle Patch is created. This has caused quite a bit of concern so I wanted to point out two MOS notes that provide more details: Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1261380.1) and Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1). Don't let the "Exadata" in the name of the Bundle Patch scare you off. Bundle Patches are supported for Exadata and non-Exadata systems and on RAC and non-RAC databases as well. When it comes to patching your system, it's important to remember that an installation can only use one of the CPU, PSU or Bundle Patch patching methods. This is documented in MOS note: Oracle Database - Overview of Database Patch Delivery Methods (Doc ID 1962125.1). For In-Memory customers we recommend that you always run with the latest Bundle Patch! Good luck and happy patching!


Star Schema Challenge - Part 3

In my previous posts on the Star Schema Challenge, I established baseline results for query performance with row store tables (that is, without using Database In-Memory). A 9 dimensional, 500 million row star schema supported 25 users with a median query time of 16.9 seconds. (With a median of 16.9 seconds, there was not much point in increasing the number of users with the row store tables.) The same data in a 500 million row de-normalized table supported 25 users with a median query time of 33.5 seconds per query. Remember, all of these tests are being conducted on a 60 core Intel server with commodity disk and 1 TB of DRAM.
Median query performance (in seconds) of the query workload with 25 users on row store tables.
The overall goal for this series is to support 100 users with second-level response times and an average think time of 5 seconds. So, the star schema needs to support 4x the number of users with about a 10x improvement in query performance, and the de-normalized table needs to support 4x the number of users with a 20x improvement. Time to get to work with Database In-Memory! In this phase of the challenge I needed to enable the database and star schema for Database In-Memory and get a new star schema baseline for both 25 and 100 users. In this instance I allocated 300G to the In-Memory Area via the INMEMORY_SIZE parameter. This comes out of the 800G allocated to the SGA via the SGA_MAX_SIZE parameter. (This turned out to be much more than I needed for both the star and de-normalized tables.) I altered each table in the star schema to mark the table INMEMORY with the default compression level of MEMCOMPRESS FOR QUERY LOW. This compression level would be expected to provide the best query performance. For example:
ALTER TABLE units_fact_500m_10 INMEMORY;
I added the VECTOR_TRANSFORM hint to each query to force the vector transformation plan (also known as In-Memory Aggregation). For example:
SELECT /*+ vector_transform */
       d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name,
       SUM(f.sales), SUM(f.units), SUM(f.measure_3), SUM(f.measure_4)
FROM   time_dim d1, product_dim d2, customer_dim_500m_10 d3, channel_dim d4, units_fact_500m_10 f
WHERE  d1.day_id = f.day_id
AND    d2.item_id = f.item_id
AND    d3.customer_id = f.customer_id
AND    d4.channel_id = f.channel_id
GROUP BY d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name
ORDER BY d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name;
I forced the vector transform plan because I know from previous tests that vector transform is almost always more efficient than alternative plans. (See Efficient Query Processing with Database In-Memory.) (There is also a hidden database parameter to force the vector transform plan, but I prefer the hint because it allows more fine-grained control.) And finally, I made all indexes on tables in the star schema INVISIBLE to ensure the optimizer did not accidentally choose an INDEX ACCESS with undesirable results. (With some tables and workloads INDEX ACCESS might be the best choice. With these tables and this query workload, which typically access large numbers of rows, it is better to always access the table in the in-memory column store.) All in all, I invested about 10 minutes of work to enable Database In-Memory, ALTER a few tables and indexes, and add a hint to my queries. So what did I get for this minor investment in time and system resource?
The star schema used only a small fraction of the 300G I allocated to the In-Memory Area and was about 1/3 the size of the row store table. I'm happy with that given that: the fact table in the star schema was efficiently designed (for example, all INTEGER keys) so there isn't a lot of fluff such as long text keys to compress out, and I'm using MEMCOMPRESS FOR QUERY LOW (rather than a higher compression level).
Star schema size on disk.
The median query performance for 25 concurrent users (with a 5 second wait time between queries) was 3.85 seconds, a 77% reduction. Keeping in mind that 25 users on a 60 core machine leaves little room for parallelism, that's pretty good.
Median query performance for 25 users, row store vs. Database In-Memory.
As with most queries using a vector transformation plan, the most time was spent in KEY VECTOR USE (a join operation) and VECTOR GROUP BY (aggregation). TABLE ACCESS (INMEMORY FULL) typically accounted for less than 10% of the time. As with previous tests, the machine ran very efficiently with no unusual wait events, IO wait, etc. The one notable statistic is a relatively low HOST CPU at only 46%. This is not surprising given the conservative setting of PARALLEL_DEGREE_LIMIT of 5 for the 25 user run and the 5 second wait time between queries. Given a more aggressive setting for PARALLEL_DEGREE_LIMIT I probably could get a lower median query time.
Key AWR statistics for 25 users, star tables and the first Database In-Memory tests.
Rather than trying more aggressive parallelism, I did run a 25 user test with no wait time between queries. In this test the median query time only increased to 4.69 seconds per query and HOST CPU increased to 73%. For 25 users, where I have baselines for each configuration, here's where we stand at this point.
Median query time for 25 users and row store star, row store de-normalized and in-memory star.
With 100 concurrent users and 5 seconds wait time the median query time was 21.5 seconds per query. (Remember, I didn't run 100 users with the row store tables so there is no comparison here.) For the 100 user run, I set PARALLEL_DEGREE_LIMIT = 2 so there was very little opportunity for parallel query. The AWR report starts to get a bit more interesting with the 100 user run. The machine stays busy with host CPU at 89.4%. Queries are run on the server using a PL/SQL procedure in a job managed by the Oracle Job Scheduler. The scheduler accounted for 13.2% of DB Time, reducing the CPU available for DB CPU. It's probably fair to think that query time could have been 15-20% better if user sessions were not run through the job scheduler, but this is just the baseline for Database In-Memory so I'm not going to worry about that.
Key AWR statistics for 100 users, star tables and the first Database In-Memory tests.
For my next post, I'll test 100 users with the de-normalized fact table in the In-Memory column store.
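(As an aside, the index step mentioned above is easy to script. A minimal sketch that generates the ALTER INDEX statements for the fact table used in this series; run the generated statements afterwards:)

SELECT 'ALTER INDEX ' || index_name || ' INVISIBLE;' AS ddl
FROM   user_indexes
WHERE  table_name = 'UNITS_FACT_500M_10';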


Star Schema Challenge - Part 2.1

In Star Schema Challenge – Part 2 I revealed baseline results for my query workload running on a 500 million row star schema with all tables marked NO INMEMORY and In-Memory Aggregation prevented using the NO_VECTOR_TRANSFORM hint. With a median of 16.9 seconds per query (allowing for an average think time between queries of 5 seconds), I decided that 25 users were enough for a baseline. After my first post in this series someone suggested that the best implementation might be a de-normalized table. I decided to satisfy this user's curiosity and test a de-normalized table also, starting with a NO INMEMORY test. (Vector transform is not applicable to the de-normalized table because it does not join to another table.) Like the fact table in the star schema, the de-normalized table used COMPRESS BASIC on the row store table and attribute clustering as defined below:
CLUSTERING
  BY INTERLEAVED ORDER (
    "UNITS_FACT_500M_10_DNORM"."CALENDAR_YEAR_NAME",
    "UNITS_FACT_500M_10_DNORM"."DEPARTMENT_NAME",
    "UNITS_FACT_500M_10_DNORM"."REGION_NAME",
    "UNITS_FACT_500M_10_DNORM"."CLASS_NAME")
  YES ON LOAD
  YES ON DATA MOVEMENT
  WITHOUT MATERIALIZED ZONEMAP
As with the star schema, I set PARALLEL_DEGREE_POLICY to LIMITED and controlled the number of parallel processes by setting PARALLEL_DEGREE_LIMIT = (PARALLEL_MAX_SERVERS / 2) / Number of Users to prevent statement queuing and distribute resources evenly across users. Remember, this is a non-RAC environment so I don't have to worry about data affinity. For the most part, plans were TABLE ACCESS (INMEMORY FULL) and HASH (GROUP BY).
Typical SQL execution plan with de-normalized fact table.
Without Database In-Memory the de-normalized table supported 25 users (with an average think time of 5 seconds between queries) with a median query time of 33.5 seconds per query.
Median query performance (in seconds) of the query workload with 25 users.
Perhaps this is not too surprising given the greater expense of scanning the full de-normalized table. The de-normalized table has 44 columns and an average row length of 483 bytes, while the star fact table has 22 columns and an average row length of 96 bytes. In addition to slower queries, the de-normalized table is almost 2X larger on disk. (Again, both tables used COMPRESS BASIC.)
Size on disk of row store tables.
An AWR report showed that the server ran efficiently and used more of the machine than the star baseline (host CPU was 85.3% as compared to the star baseline at 70.9%). Also, even though the de-normalized table was about twice as large as the star table, there was no IO wait.
AWR report for a 25 user query workload against the de-normalized row store table.
So, at least on this commodity server and with conventional row store tables, the star schema is the clear winner. My next post will include the first of the Database In-Memory results.



Oracle Database In-Memory Bundle Patch 9 Released

The latest Bundle Patch for Database In-Memory has been released. The Bundle Patch is 21053000, or 12.1.0.2 Bundle Patch 9 for Engineered Systems and DB In-Memory (June 2015). More information on the latest Bundle Patch can be found in the MOS note: 12.1.0.2 Bundle Patches for Engineered Systems and DB In-Memory (Doc ID 1937782.1), and for specific details on Bundle Patch 9 see MOS note 21053000.8. Remember that Bundle Patches are cumulative, which means that Bundle Patch 9 includes all fixes from Bundle Patches 1 through 8 in addition to new content. It also includes the most recently released Patch Set Update (PSU), which also includes the most recently released Critical Patch Update (CPU) as of the time that the Bundle Patch is created. This has caused quite a bit of concern so I wanted to point out two MOS notes that provide more details: Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1261380.1) and Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1). Don't let the Exadata part scare you off. Bundle Patches are also supported for non-Exadata databases as well. Note the text right in the Bundle Patch 9 note (21053000.8):
***
*** This patch includes fixes for both Engineered Systems and for DB In-Memory.
*** It can be used on both Exadata and non-Exadata systems, and can be used
*** for both RAC and non-RAC configurations. For RAC configurations the bundle
*** includes GI PSU fixes to be applied to the Grid home.
***
*** This patch includes all fixes from April 2015 GI PSU and DB PSU.
***
Note that it also states which PSU is included. It is important to remember that an installation can only use one of the CPU, PSU or Bundle Patch patching methods. This is documented in MOS note: Oracle Database - Overview of Database Patch Delivery Methods (Doc ID 1962125.1). We recommend that you always run with the latest Bundle Patch, and hopefully this information will allay any concern that Bundle Patches are only for Exadata platforms. Good luck and happy patching.



Do I really have to drop all of my reporting indexes?

I'm back on the road this month, meeting with customers to discuss their initial impressions and experiences with Oracle Database In-Memory. During one such discussion, I got asked a very peculiar question. The question was, "Do I really have to drop all of my reporting indexes if I use Database In-Memory?" I have to admit I was a little taken aback by this question. After all, I thought most folks would be delighted to have an opportunity to give up the majority of their indexes, not just because of the space savings and DML performance benefits but also because of the maintenance nightmare that indexes can sometimes become. Assuming this was a trick question, I deployed the standard stalling technique of answering a question with a question, "Can you tell me a little more about your situation?" To which the system architect explained that they were in production with Oracle Database In-Memory on a 2 node RAC cluster running on commodity servers and a crap IO subsystem (his words, not mine). They had a snowflake schema, and had enough memory to accommodate all of their dimension tables but only the last 3 months of data in their two fact tables. Following my guidelines, they had kept their primary key indexes but dropped the rest of their indexes. He assured me that the performance of most of their queries had improved 100X and their ETL jobs were finishing 2X faster without the indexes, but there were some queries that accessed more than just the last 3 months worth of data in the fact table and their performance had gotten worse, a lot worse. It was in that moment that I realized that our guidance on dropping all reporting indexes with Database In-Memory had been based on an assumption that was not always true. The assumption I had been working under was: all of your performance critical data resides in memory, or you have a good IO sub-system (engineered system etc.). Luckily for me, there was a solution I could offer to the system architect to restore the performance of his queries that access more than 3 months worth of data from his fact table. The solution was to take advantage of the table expansion optimizer transformation in conjunction with partially usable local indexes. If you aren't familiar with the table expansion transformation, it was introduced in Oracle Database 11g Release 2 to allow the optimizer to generate a plan that uses indexes on only some of the partitions in a table, but not all of them. Hence, DBAs can configure a table so that an index is only created on the partitions that are predominantly read-only, and will not suffer the overhead of index maintenance on the active portions of the data. This same technique can be used with Database In-Memory. In this case the index is created only on the partitions in the fact table that are not in the In-Memory column store (IM column store). For the most current partitions that are in-memory, the index is marked unusable. Therefore it doesn't have to be maintained during DML. Let's look at an example of table expansion in action with Oracle Database In-Memory. Consider the following query:
SELECT COUNT(*)
FROM   partition_sales s
WHERE  s.lo_orderdate BETWEEN TO_DATE('2011-09-01','YYYY-MM-DD') AND TO_DATE('2015-05-31','YYYY-MM-DD')
AND    s.lo_orderkey = 5819909;
The PARTITION_SALES table is range partitioned by month on LO_ORDERDATE. In order to ensure that the query performs efficiently in Oracle Database 11g, a local index has been created on the LO_ORDERKEY column.
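A local index of the kind described here might look like the following hedged sketch (the index, table and column names are the ones used in this example):

CREATE INDEX idx_lo_ordkey
  ON partition_sales (lo_orderkey)
  LOCAL;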
Below is the execution plan for the statement in 11g: After upgrading to Oracle Database 12c, I only have enough memory to accommodate the latest 3 months of data in the IM column store. The following query shows the latest 3 months populated into the IM column store: Rather than dropping the index on the LO_ORDERKEY column completely, I used the following command to mark the index unusable on the 3 latest partitions:
ALTER INDEX idx_lo_ordkey MODIFY PARTITION p45 UNUSABLE;
Marking the index unusable on these 3 partitions means that the index won't be maintained during any DML operations on these partitions. Since data only changes in these most recent partitions, we still get all of the benefits we would have gotten if the index had been dropped altogether. If the date range in our query is changed so that it only accesses older data, then the Optimizer picks our original index driven plan:
SELECT COUNT(*)
FROM   partition_sales s
WHERE  s.lo_orderdate BETWEEN TO_DATE('2011-09-01','YYYY-MM-DD') AND TO_DATE('2015-02-28','YYYY-MM-DD')
AND    s.lo_orderkey = 5819909;
If however the date range in our query is only for 1 month and that partition is in-memory, then the plan becomes a full table scan via the IM column store:
SELECT COUNT(*)
FROM   partition_sales s
WHERE  s.lo_orderdate BETWEEN TO_DATE('2015-04-01','YYYY-MM-DD') AND TO_DATE('2015-04-30','YYYY-MM-DD')
AND    s.lo_orderkey = 5819909;
And when the date range spans both the data in the IM column store and on disk (as in the original query above), the Optimizer takes advantage of the table expansion transformation to automatically rewrite the query into a UNION ALL query. The transformed or rewritten UNION ALL query is made up of two branches, one branch accessing the older data on disk and the second branch accessing the 3 latest partitions in-memory. In this example the transformed query would be as follows:
SELECT COUNT(*)
FROM   partition_sales s
WHERE  s.lo_orderdate BETWEEN TO_DATE('2011-09-01','YYYY-MM-DD') AND TO_DATE('2015-02-28','YYYY-MM-DD')
AND    s.lo_orderkey = 5819909
UNION ALL
SELECT COUNT(*)
FROM   partition_sales s
WHERE  s.lo_orderdate BETWEEN TO_DATE('2015-03-01','YYYY-MM-DD') AND TO_DATE('2015-05-31','YYYY-MM-DD')
AND    s.lo_orderkey = 5819909;
Transforming or rewriting the query to a UNION ALL enables the optimizer to get the best execution plan for all of the partitions in the table, with the first branch using index access and the second branch using the In-Memory column store. So, as you can see, there may be situations where it is not advantageous to drop all of your reporting indexes completely, but in those situations remember that a combination of the table expansion transformation and partially usable indexes will give you a very similar benefit. +Maria Colgan


Star Schema Performance Challenge - Part 2

In my previous post, Star Schema Performance Challenge – Part 1, I outlined a challenge to support 100 concurrent users querying a 9 dimensional, 500 million row star schema using a Sun X4-4 (with 60 cores, 1 TB DRAM and commodity storage). The users, of course, expect great query performance. The challenging part of this is 100 active users on a 60 core machine. With far fewer or less active users this might not be so challenging, but 100 users on this size machine is a different story. In this post I'll talk more about the workload and share some baseline results. The query workload will be familiar to anyone that supports a business intelligence platform such as Oracle Business Intelligence (or Business Objects, MicroStrategy, etc.). Users are presented with a variety of reports in a business intelligence dashboard and from there are allowed to explore by drilling down, adding filters, choosing different measures, etc. My challenge uses generated queries that start at high level aggregates (the first query in a series) and drill down 5 to 8 times depending on the dimensions used in the query. This process repeats until 528 queries are run for each user (a total of 52,800 queries for 100 users). The dimensions and measures vary for each series, so there are very few repeated queries (on average, 511 unique queries per user). For example, the first series for a user might start with a query such as:
SELECT d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name,
       SUM(f.sales), SUM(f.units), SUM(f.measure_3), SUM(f.measure_4)
FROM   time_dim d1, product_dim d2, customer_dim d3, channel_dim d4, units_fact f
WHERE  d1.day_id = f.day_id
AND    d2.item_id = f.item_id
AND    d3.customer_id = f.customer_id
AND    d4.channel_id = f.channel_id
GROUP BY d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name
ORDER BY d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name;
The first drill down in this series might be on Calendar Year 2014:
SELECT d1.calendar_year_name, d1.calendar_quarter_name, d1.month_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name,
       SUM(f.sales), SUM(f.units), SUM(f.measure_3), SUM(f.measure_4)
FROM   time_dim d1, product_dim d2, customer_dim d3, channel_dim d4, units_fact f
WHERE  d1.day_id = f.day_id
AND    d2.item_id = f.item_id
AND    d3.customer_id = f.customer_id
AND    d4.channel_id = f.channel_id
AND    d1.calendar_year_name = 'CY2014'
GROUP BY d1.calendar_year_name, d1.calendar_quarter_name, d1.month_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name
ORDER BY d1.calendar_year_name, d1.calendar_quarter_name, d1.month_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name;
This process repeats until the last query in the series.
For example:
SELECT d1.calendar_year_name, d1.calendar_quarter_name, d1.month_name, d1.day_name,
       d2.all_products_name, d2.department_name, d2.category_name,
       d3.all_customers_name, d3.region_name, d3.country_name,
       d4.all_channels_name, d4.class_name, d4.channel_name,
       SUM(f.sales), SUM(f.units), SUM(f.measure_3), SUM(f.measure_4)
FROM   time_dim d1, product_dim d2, customer_dim d3, channel_dim d4, units_fact f
WHERE  d1.day_id = f.day_id
AND    d2.item_id = f.item_id
AND    d3.customer_id = f.customer_id
AND    d4.channel_id = f.channel_id
AND    d1.calendar_year_name    = 'CY2014'
AND    d1.calendar_quarter_name = 'Q3-CY2014'
AND    d1.month_name            = 'Jul-14'
AND    d2.all_products_name     = 'All Products'
AND    d2.department_name       = 'Computers'
AND    d3.all_customers_name    = 'All Customers'
AND    d3.region_name           = 'Asia'
AND    d4.all_channels_name     = 'All Channels'
AND    d4.class_name            = 'Direct'
GROUP BY d1.calendar_year_name, d1.calendar_quarter_name, d1.month_name, d1.day_name,
       d2.all_products_name, d2.department_name, d2.category_name,
       d3.all_customers_name, d3.region_name, d3.country_name,
       d4.all_channels_name, d4.class_name, d4.channel_name
ORDER BY d1.calendar_year_name, d1.calendar_quarter_name, d1.month_name, d1.day_name,
       d2.all_products_name, d2.department_name, d2.category_name,
       d3.all_customers_name, d3.region_name, d3.country_name,
       d4.all_channels_name, d4.class_name, d4.channel_name;
The user then moves on to the next series with different dimensions and different measures. For example:
SELECT d1.calendar_year_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name,
       d5.all_ages_name, d6.all_sizes_name, d7.all_years_name, d8.all_incomes_name, d8.income_name, d9.all_status_name,
       SUM(f.measure_6), SUM(f.measure_4), SUM(f.measure_3), SUM(f.measure_9)
FROM   time_dim d1, product_dim d2, customer_dim d3, channel_dim d4,
       age_dim d5, household_size_dim d6, years_customer_dim d7, income_dim d8, marital_status_dim d9,
       units_fact f
WHERE  d1.day_id = f.day_id
AND    d2.item_id = f.item_id
AND    d3.customer_id = f.customer_id
AND    d4.channel_id = f.channel_id
AND    d5.age_id = f.age_id
AND    d6.household_size_id = f.household_size_id
AND    d7.years_customer_id = f.years_customer_id
AND    d8.income_id = f.income_id
AND    d9.marital_status_id = f.marital_status_id
AND    d8.all_incomes_name = 'All Incomes'
GROUP BY d1.calendar_year_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name,
       d5.all_ages_name, d6.all_sizes_name, d7.all_years_name, d8.all_incomes_name, d8.income_name, d9.all_status_name
ORDER BY d1.calendar_year_name, d2.all_products_name, d3.all_customers_name, d4.all_channels_name,
       d5.all_ages_name, d6.all_sizes_name, d7.all_years_name, d8.all_incomes_name, d8.income_name, d9.all_status_name;
Due to the changes in each series, random drill patterns and random filters, the workload represents a very diverse set of reports and ad-hoc queries. The star schema includes the following dimension tables:
Time, with 1,827 days (5 years) plus months, quarters and years.
Product, with 2,713 items plus subtypes, types, categories and departments.
Customer, with 1.1 million customers plus cities, states, countries and regions.
Channel, with 11 distribution channels plus distribution classes.
And 5 demographic dimensions (Age, Income, Marital Status, Years Customer, Household Size), each with fewer than 10 members.
A 500 million row fact table with keys for all dimensions and 9 measures.
The time dimension and fact table are each partitioned by time (interval, by month). Keys are INTEGER data type and measures are NUMBER data type. Constraints include primary and foreign keys (RELY DISABLE). Bitmap indexes were created for each of the keys in the dimension and fact tables. This is a typical (but well crafted) star schema. I also used the new attribute clustering feature, which should help when querying the row store table (but I didn't test with and without this feature).
CLUSTERING
  BY INTERLEAVED ORDER (
    UNITS_FACT.CUSTOMER_ID,
    UNITS_FACT.ITEM_ID,
    UNITS_FACT.CHANNEL_ID)
  YES ON LOAD
  YES ON DATA MOVEMENT
  WITHOUT MATERIALIZED ZONEMAP
For the baseline timings I started with all tables NO INMEMORY and prevented In-Memory Aggregation with the NO_VECTOR_TRANSFORM hint. After some experimentation I set PARALLEL_DEGREE_POLICY to LIMITED. I controlled the number of parallel processes using PARALLEL_DEGREE_LIMIT, using the standard formula of (PARALLEL_MAX_SERVERS / 2) / Number of Users. These settings prevented statement queuing and distributed resources evenly across users. With users running a query once every 5 seconds (ranging from 0 to 10 seconds) I didn't expect to fit 100 users on this 60 core machine and get great query performance, so I started with 25 users. That proved to be enough to get a NO INMEMORY baseline. The median query time was 16.9 seconds (.26 seconds in parse and 15.9 in fetch). SQL execution plans for less selective queries tended to include HASH JOIN and HASH GROUP BY. Plans for more selective queries (more filters) often used Bloom filters. Since tables were not in-memory, TABLE ACCESS (FULL) was usually used with the dimension tables and always used with the fact table. TABLE ACCESS (FULL) typically accounted for 25-50% of the CPU time of the query. An AWR report showed that the server stayed reasonably busy at 71% host CPU and ran efficiently. Some key statistics from an AWR report follow. Note that even though the tables are NO INMEMORY there is no IO wait. This is because the tables fit within the buffer cache. This allows us to compare the efficiency of In-Memory vs without In-Memory, which is why the challenge focuses on a large number of concurrent users rather than fact table size. So, without Database In-Memory my baseline is a median of 16.9 seconds per query for 25 concurrent users. My goal is to support 100 concurrent users. So, all I need to do is make the queries run 10 to 15 times faster with 4 times the number of concurrent users. No worries, right? Stay tuned for my next post to see what approach I try next to accomplish this goal!
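(To make the parallel settings concrete: assuming, for example, PARALLEL_MAX_SERVERS = 250 (an assumed value, not one quoted in this series), the formula gives (250 / 2) / 25 = 5 for a 25 user run, which would be set as follows.)

-- assumed: parallel_max_servers = 250, 25 concurrent users
-- (250 / 2) / 25 = 5
ALTER SYSTEM SET parallel_degree_limit = 5;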



SQL Plan Management

Have you ever upgraded something to a new version, like your phone or your laptop, and found that while most of it worked great there were just a few things that were worse or even broken? You might have decided to restore back to the previous version because the problem(s) were just too annoying or catastrophic to ignore. Well, this was one of the motivations behind the release of SQL Plan Management back in Oracle Database 11g. SQL Plan Management (SPM) provides a framework for completely transparent, controlled execution plan evolution. The main idea is that your application should see no performance regressions due to execution plan changes when you upgrade or change something in your database environment. So what does this have to do with Oracle Database In-Memory? Just like a database upgrade, the addition of Oracle Database In-Memory is a big change and can introduce SQL plan changes. We've talked before on this blog about how the Optimizer in Oracle Database 12.1.0.2 has been enhanced to make it aware of Oracle Database In-Memory. But let's face it, there have been a lot of changes and no software is perfect. To minimize the risk of database upgrades we routinely recommend using SPM to prevent any regressions in SQL performance, and an upgrade to Oracle Database In-Memory is no different. SPM provides the ability to help ensure that all of your SQL will run at least as fast as it did prior to the upgrade. For Oracle Database In-Memory implementations we recommend that SQL plan baselines be captured prior to implementing Oracle Database In-Memory. This should be considered the application "baseline" and the performance of any SQL executions should not regress from this baseline. Ideally we would expect it to improve. The following process can be used to capture the initial baseline: Once these SQL plan baselines have been captured, Oracle Database In-Memory can be implemented. The same workload used to create the baseline should be executed again with all objects populated into the In-Memory column store. If the optimizer finds new execution plans that take advantage of the In-Memory column store, they will be automatically added to the existing SQL plan baselines but they will not be used. The original plans captured during the baseline run will still be used. A set of evolve tasks can be run and only the new execution plans that perform better than the original baselines will be accepted. The following shows an example of running a set of evolve tasks: The workload can then be run a final time and should now use the best possible execution plans with no regressions. You can find additional information about SQL Plan Management in this white paper or in the Oracle Database 12c documentation.
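A minimal sketch of what the capture and evolve steps can look like (one approach among several; it assumes the representative workload is run from the session doing the capture):

-- capture: record a SQL plan baseline for every repeatable statement run in this session
ALTER SESSION SET optimizer_capture_sql_plan_baselines = TRUE;
-- ... run the representative workload ...
ALTER SESSION SET optimizer_capture_sql_plan_baselines = FALSE;

-- evolve: after re-running the workload with the IM column store populated,
-- verify the new in-memory plans and accept only those that perform better
DECLARE
  tname VARCHAR2(128);
  ename VARCHAR2(128);
  cnt   NUMBER;
BEGIN
  tname := DBMS_SPM.CREATE_EVOLVE_TASK;
  ename := DBMS_SPM.EXECUTE_EVOLVE_TASK(task_name => tname);
  cnt   := DBMS_SPM.IMPLEMENT_EVOLVE_TASK(task_name => tname);  -- accepts only verified-better plans
END;
/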



Memory Usage with Oracle Database In-Memory

We often get asked the same two questions at conferences and customer presentations. That is: can't I just cache my tables in the buffer cache or put them in the keep pool, and how much memory do I need to allocate to the IM column store? While not directly related, I'm going to try and answer both questions in this post. First let's tackle the issue of "in-memory". If all of my data is "in-memory" in the buffer cache then why do I need Database In-Memory? The problem with this type of reasoning is that it ignores the real key to Database In-Memory, and that is the columnar format, along with the other optimizations that are part of the columnar format of Database In-Memory like compression, storage indexes and SIMD vector processing. The fact that the column store is in-memory is essentially just an enabler for the columnar format. Let's take a look at an example to see what I mean. The following query makes use of our SSB schema, and specifically the LINEORDER table. The LINEORDER table has 23,996,604 rows and has been fully populated into the IM column store and fully populated into the buffer cache as well. First let's see the query run with a NO_INMEMORY hint against the buffer cache: We can see that we performed a full table scan against the LINEORDER table and the query took 2.45 seconds on the database running in VirtualBox on my laptop. In the statistics section note that we did no physical I/O, and also note the amount of CPU used by the query. Next let's look at the same query run against the IM column store. The only change that I've made to the query is to remove the NO_INMEMORY hint. Now we see that the execution plan tells us that we did a full in-memory scan of the LINEORDER table and that it took no time (of course this isn't really true, but it took so little time that SQL*Plus didn't record any time for the execution). We can also see that we only had to look at one IMCU (IM scan CUs memcompress for query low - IM scan CUs pruned) and that we only had to return one row (IM scan rows projected). Now obviously this is a best-case scenario, but it highlights the difference that the columnar format makes with Database In-Memory. One other thing to note is the CPU usage. Look at how much less CPU was consumed for the same query. What if we actually had to scan all of the values for the columns in the query? In other words, what if we had a query that couldn't take advantage of storage indexes and had to scan all of the data for the columns in the query in all of the IMCUs for the LINEORDER table in the IM column store? Will that still be faster than scanning the data in the buffer cache? Recognizing that this is a contrived example, I think it is still worth exploring in light of our initial question. I've taken the first query and removed the where clause predicates and surrounded each column returned with a COUNT function. This will generate a full table scan of the LINEORDER table for the three column values and return the total count for each column. Here's an example for the buffer cache: Now here's an example of the query against the IM column store: In this example we had to look at all of the data in the IMCUs for the columns in the query. Note that now the IM scan rows projected statistic shows all of the "rows" being returned. The in-memory query runs in 1.15 seconds versus 3.42 seconds for the "row" version. We still did no physical I/O so both formats were truly "in-memory".
However, we still see that the IM column store is significantly faster and more efficient than accessing the data in row format from the buffer cache. Hopefully this helps answer the question of why just placing your row-based data "in-memory" is not the same thing as using Database In-Memory.

And what about our second question: how much memory do I need to allocate to the IM column store? The simple answer is that you need enough memory allocated to hold the objects that you want to be able to run analytic queries on. Normally this generates two more questions: how do I know how much memory each object will consume in the IM column store, and how do I tell which objects should be placed into the IM column store? The answer to the first question is the Compression Advisor, which can be run to determine how much space an object will consume in the IM column store based on the compression level chosen. But be warned, only the 12c Compression Advisor is aware of the new In-Memory compression techniques. The answer to the second question is to start with the In-Memory Advisor. Between the output of the In-Memory Advisor and your knowledge of your application, you should be able to determine which objects are the best candidates for the IM column store.

Let's dig a little deeper though. Now that you know how much memory you will need for the IM column store, you need to figure out where you're going to get it. Unfortunately, the tendency we have seen is to take the memory from existing usage. In general this is a bad approach. Very few existing systems have unallocated memory on their database servers. If you are in that category then by all means use that memory, but more often than not the tendency is to "steal" the memory from something else: the buffer cache, the SGA in general, or the memory allocated for the PGA. The problem with these approaches is that they are very likely to impact the performance of your existing workload. We recommend that the memory required for the IM column store be in addition to your existing database memory allocations. Here is a formula that reflects this goal: SGA_TARGET = SGA_TARGET(original) + INMEMORY_SIZE

In addition, sufficient PGA must be allocated to handle the memory requirements of the parallel processing, sorting and aggregation that can occur due to the potentially large amounts of data being queried by in-memory analytic queries, without spilling to disk. This is in addition to the PGA requirements of your existing workload. A good way to view your current PGA usage is to use the AWR reports from your running systems. In the Advisory Statistics section there is a PGA Memory Advisory section, and the data there can provide good information about current PGA usage. Alternatively, you can provision enough memory for the maximum number of parallel server processes allowed on the system to allocate their full amount of memory using the following formula: PGA_TARGET = MAX_PARALLEL_SERVERS * 2GB

Hopefully this will help in understanding and planning for memory usage when implementing Database In-Memory.
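To make the two formulas concrete, here is a minimal sketch of how they might be applied, assuming a purely hypothetical starting point of a 50 GB SGA_TARGET, a 32 GB IM column store and a limit of 64 parallel server processes, and assuming that PGA_TARGET and MAX_PARALLEL_SERVERS in the formulas correspond to the PGA_AGGREGATE_TARGET and PARALLEL_MAX_SERVERS parameters. Substitute your own numbers from the Compression Advisor, the In-Memory Advisor and your AWR reports.

-- SGA_TARGET = SGA_TARGET(original) + INMEMORY_SIZE  =>  50G + 32G = 82G (hypothetical values)
ALTER SYSTEM SET inmemory_size = 32G SCOPE=SPFILE;
ALTER SYSTEM SET sga_target    = 82G SCOPE=SPFILE;

-- PGA_TARGET = MAX_PARALLEL_SERVERS * 2GB  =>  64 * 2GB = 128G, in addition to existing PGA needs
ALTER SYSTEM SET pga_aggregate_target = 128G SCOPE=SPFILE;

With SCOPE=SPFILE these changes take effect at the next instance restart.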

Oracle Database In-Memory

Star Schema Performance Challenge - Part 1

Over the last 20 years or so I’ve had the opportunity to help build a variety of business intelligence solutions for large organizations. It will be no surprise that one of the core requirements is fast query response times. This requirement seems pretty obvious – faster response times lead to greater end user satisfaction. The performance of a database can also determine what types of applications are practical, and breakthroughs in performance often lead to innovative new applications that can change how people work. A more efficient application is usually less expensive to operate since it might not require as large a server. Oracle Database In-Memory, together with the availability of more powerful servers with many cores and plenty of memory, really is a game changer when it comes to improving query response times. With Database In-Memory far fewer CPU cycles are used to scan, filter, join and aggregate data. With modern servers there are plenty of CPU cycles available for processing, and DRAM is relatively inexpensive. So Database In-Memory can do more with less, and since it is a feature of the Oracle Database that works with all of the other performance-enhancing features of the Oracle Database, it has become yet another tool in my big tool box of performance-improving techniques.

Recently, I was presented with a challenge. Could I provide extremely fast query response times for a large number of concurrent users running an ad-hoc business intelligence workload on a database running on a single server? More specifically, at least 100 concurrent users querying a 500 million row table with an average response time of just a few seconds. This is the first post in a series where I will walk you through my solution to this performance challenge. I’m actively working on the challenge as I write this first post, so I’m not exactly sure how the numbers will work out, but I’m pretty confident I’ll be successful. And you can be sure that Database In-Memory will be at the core of the solution! I have a plan in mind, but perhaps some of you will have some suggestions. Feel free to let me know via the comment section below if you do.

Here’s what I know today: I have a star schema with a 500 million row fact table and 9 dimension tables. There are 5 years of data in the Sales fact table (by day), about 3,000 items in a product dimension table, about 100,000 customers, a handful of distribution channels and some demographic attributes such as income and age. There are at least 100 users actively querying this data set at any given time. They are an active group of users, issuing a query on average once every 5 seconds. The workload is diverse: there are business intelligence dashboard type queries (often high level aggregations), slice-and-dice ad-hoc queries and anything in between. Queries can use any subset or all of the dimension tables. In total, over 50,000 queries will be run. I can do anything I want to optimize the application. I can specify how data is organized and physically stored, how the queries are formed and which features of the Oracle Database I use. I must, however, use features that are available to, and easily implemented by, anyone. I will be running a single database instance on a Sun X4-4 with 60 cores, 1 TB of DRAM and spinning disks.

I’m going to step through various designs and optimizations, documenting the improvements along the way. I’m going to start with a baseline using only features that are available in Oracle 11.2.
That, of course, means the baseline will not use Database In-Memory. Since I can organize the data and write the queries any way I want, I’m going to take a prescriptive approach to this challenge. My implementation will be designed to be CPU efficient (that is, to use as few CPU cycles as possible) and to avoid I/O and other wait events. I will decide what SQL execution plan I want first and design the implementation around that. This is usually a much better approach than trying to compensate for an inefficient schema or for queries that do not result in the best plan. Let me know if you have thoughts. In my next post, I’ll share my starting point and baseline results.
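Purely as an illustration of the kind of star schema being described (the real table and column names aren't given in the post), here is a minimal sketch using hypothetical names, with the fact table keyed by day and by the dimension tables mentioned above.

-- Hypothetical fact table: one row per sale, 5 years of daily data
CREATE TABLE sales (
  sale_date    DATE         NOT NULL,   -- day-level key
  product_id   NUMBER       NOT NULL,   -- ~3,000 products
  customer_id  NUMBER       NOT NULL,   -- ~100,000 customers
  channel_id   NUMBER       NOT NULL,   -- a handful of distribution channels
  quantity     NUMBER,
  amount       NUMBER(12,2)
);

-- Two of the nine dimension tables, again purely illustrative
CREATE TABLE product  (product_id  NUMBER PRIMARY KEY, product_name VARCHAR2(100), category VARCHAR2(50));
CREATE TABLE customer (customer_id NUMBER PRIMARY KEY, income_band  VARCHAR2(20),  age_band VARCHAR2(20));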

Oracle Database In-Memory

Getting started with Oracle Database In-Memory Part V - Controlling Access

I’m finally going to make good on a promise I made way back in part 3 of our getting started with In-Memory series, to explain how you can control which queries use the In-Memory column store (IM column store) and which don't. As with all new query performance-enhancing features in the Oracle Database, a number of initialization parameters and hints have been introduced that enable you to control when and how the IM column store will be used. This post provides information on the initialization parameters, while the details on the Optimizer hints that control the use of the IM column store can be found on the Optimizer blog.

When it comes to controlling the use of the IM column store, the most powerful parameter at your disposal is INMEMORY_QUERY (default value ENABLE). Setting INMEMORY_QUERY to DISABLE, either at the session or the system level, disables the use of the IM column store completely. It blinds the Optimizer to what is in the IM column store and prevents the execution layer from scanning and filtering data there. Let’s take a look at an example using the simplest of queries, one that asks the question, “What is the most expensive order we have received to date?” SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorder; The LINEORDER table has 23,996,604 rows and it's been fully populated into the IM column store. If we run the query with all default parameter settings, it completes in 0.04 seconds. If we check the execution plan, you can see that the query did a full table scan via the IM column store.

Now let’s set INMEMORY_QUERY to DISABLE at the session level and rerun our simple query. The elapsed time is considerably longer this time, over 36 seconds. If we examine the execution plan you’ll see that we still have the same SQL_ID (53xrbdufu4h4q), but a new child cursor (child number 1) has been created because, although we still do a full table scan, it’s not an in-memory scan. What’s even more interesting is that the plan hash value (2267213921) is the same, even though the plan text is different. The plan hash value doesn’t change because the keyword INMEMORY is not actually considered when the plan hash value is determined. The reason the keyword is ignored actually goes back a lot further than Database In-Memory. When Oracle Exadata was first introduced in 2008, a new keyword, STORAGE, was added to the TABLE ACCESS FULL operation to indicate that the scan was being offloaded to Exadata storage and would potentially benefit from smart scan. In order to ensure the new keyword wouldn’t trigger automated testing tools to consider the smart scan a plan change, it was decided that keywords such as STORAGE and INMEMORY would be ignored when determining the plan hash value.

But I digress; let’s get back to our discussion on controlling the use of the IM column store. If you want to prevent the Optimizer from considering the information it has about the objects populated into the IM column store (in-memory statistics), or in other words revert the cost model back to what it was before In-Memory came along, you have two options. The first is to set OPTIMIZER_FEATURES_ENABLE to 12.1.0.1 or lower. This forces the Optimizer to use the cost model from that previous release, effectively removing all knowledge of the IM column store. However, you will also undo all of the other changes made to the Optimizer in the subsequent releases, which could result in some undesirable side effects.
The second option is to set the new OPTIMIZER_INMEMORY_AWARE parameter to FALSE. This is definitely the less dramatic approach, as it disables only the Optimizer cost model enhancements for in-memory. Setting the parameter to FALSE causes the Optimizer to ignore the in-memory statistics of tables during the optimization of SQL statements. I should point out that even with the Optimizer in-memory enhancements disabled, you might still get an in-memory plan. Let’s go back to our simple query so you can see what I mean. In a new session I’ve set the OPTIMIZER_INMEMORY_AWARE parameter to FALSE and executed our query. As you can see, we still get a sub-second response time and the execution plan below shows a TABLE ACCESS INMEMORY FULL, so does that mean the parameter change didn’t take effect? No, the parameter definitely did take effect. The change in the Optimizer environment forced a new child cursor to be created (child number 2). You can confirm this by querying V$SQL_OPTIMIZER_ENV:

SELECT c1.name, c1.value, c2.value, c3.value
FROM   v$sql_optimizer_env c1, v$sql_optimizer_env c2, v$sql_optimizer_env c3
WHERE  c1.sql_id = '53xrbdufu4h4q'
AND    c2.sql_id = '53xrbdufu4h4q'
AND    c3.sql_id = '53xrbdufu4h4q'
AND    c1.child_number = 0
AND    c2.child_number = 1
AND    c3.child_number = 2
AND    c1.name = c2.name
AND    c1.name = c3.name
AND    c1.value != c3.value;

We can also see the effect of the parameter on the cost of the plan. If you look back at the original two plans, you'll see that the default in-memory plan had a cost of just 2,201, while the non-in-memory plan had a cost of 48,782. The new plan also has a cost of 48,782. So although the Optimizer didn’t consider the in-memory statistics for the table, it still came up with a full table scan plan based on the disk statistics, hence the high cost. But that doesn't explain why the keyword INMEMORY still shows up in the plan. The keyword remains in the plan because we actually executed the scan via the IM column store. If you would rather influence the use of the IM column store at an individual statement level, you can do so using the INMEMORY and NO_INMEMORY hints. There’s been a lot of confusion about what these hints actually control, so check out the full story on the Optimizer blog. Hopefully this post has made it clear how you can use the new initialization parameters to control the use of the In-Memory column store and the in-memory statistics.
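Since the execution plans and timings above came from output that isn't reproduced here, the following is only a minimal sketch of the sequence of session-level settings being discussed. The SQL_ID is the one from the original example; the child numbers assume the same pattern of cursor creation described above, and DBMS_XPLAN.DISPLAY_CURSOR is just one convenient way to inspect the resulting child cursors.

-- 1. Default settings: expect TABLE ACCESS INMEMORY FULL (child number 0)
SELECT MAX(lo_ordtotalprice) most_expensive_order FROM lineorder;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('53xrbdufu4h4q', 0));

-- 2. Disable the IM column store entirely for this session (creates child number 1)
ALTER SESSION SET inmemory_query = DISABLE;
SELECT MAX(lo_ordtotalprice) most_expensive_order FROM lineorder;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('53xrbdufu4h4q', 1));

-- 3. Re-enable in-memory scans but hide the in-memory statistics from the Optimizer (child number 2)
ALTER SESSION SET inmemory_query = ENABLE;
ALTER SESSION SET optimizer_inmemory_aware = FALSE;
SELECT MAX(lo_ordtotalprice) most_expensive_order FROM lineorder;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('53xrbdufu4h4q', 2));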
