
It's All In

Leverage the in-memory column store in Oracle Database In-Memory to supercharge database performance—without changing application code.

By Arup Nanda, Oracle ACE Director

September/October 2014

Maria, the lead DBA at Acme Inc., has some concerned visitors today. Heli—the primary user of a major application—is upset with that application’s performance when running large analytic queries. John—the DBA of that application’s database—wants the developers to alter the application code to make it more efficient, but Sophie—the lead developer—says it can’t be done because it’s a packaged application and they can’t change it. John has also tried increasing the size of the buffer cache, but that has delivered little or no benefit. Heli has offered to create additional indexes to improve query performance, but Sophie notes that the application includes online transaction processing (OLTP) functionality: it issues DML statements, and those processes will be slowed by a proliferation of indexes.

Stalemate. Maria’s visitors look to her for a solution to their problems.

There is a solution, Maria assures them, one that requires neither application code changes nor creation of any new indexes. The solution is the in-memory column store, part of the Oracle Database In-Memory option, available for use with Oracle Database 12c Release 1 (12.1.0.2).


Column Store

Before Maria can present the solution, John wants to know why a larger buffer cache did not improve performance as expected.

It’s true, Maria explains, that a bigger buffer cache keeps more data in memory for longer, which reduces the need to fetch from disk and consequently reduces I/O. However, she clarifies, the buffer cache does much more than act as an interim staging area between the disk and the users: it is also the primary staging area for read consistency, multiversioning, and other database operations, all of which require space. Expanding the buffer cache also increases the overhead of managing the cache, and in some cases that added overhead erodes the gains of the larger cache so much that any performance improvement is insignificant. Expanding the buffer cache, Maria concludes, is not the real solution to the current problem.

In addition, Maria continues, the data access pattern plays a significant role in performance. Traditionally, Oracle Database has stored data in multicolumn records—also known as rows—one column after another. When all the columns of a row are populated, the next row starts in that database block—again, with the same sequence of columns, Maria explains, pointing to Figure 1. If a query selects all the columns of a table every time, this arrangement of data inside the database block will perform well. However, the analytic queries issued by Heli typically select only a handful of columns from all the rows. The database fetches the entire database block into the buffer cache, locates the first row, identifies the start position of the column, extracts the value, locates the starting point of the next row, locates the relative position of the same columns in that row, extracts the value, and so on. The process repeats until the values of these selected columns have been extracted from all the rows.

Figure 1: Traditional data block storage

Heli sees the problem immediately. If the database could read the column values from all the rows directly, instead of performing the additional tasks of locating the starting points of the rows and then the relative positions of the columns inside them, performance would be fantastic. Precisely, Maria confirms, and that calls for a new design called a column store database, in which the columns—not the rows—are stored together, as shown in Figure 2. Column values from the rows are stored together in a new structure called a column unit (CU). The CU is not the same as a data block, she clarifies; it can be much bigger than the typical 8 K data block. This arrangement of columns enables a query to grab the values of a selected column from all the rows at once, without going through the time-consuming task of identifying each row and the columns inside it.

Figure 2: Organization of data in column storage

John is skeptical. This column-store-database concept is not new. Database vendors have been providing this technology for years. But we can’t use it, John says, for other types of operations, particularly OLTP. The traditional Oracle Database storage format, the row store, processes data change operations most efficiently. So, John continues, queries may be faster with a column store database, but a row store database is the right choice for OLTP.

Why not have it both ways? asks Maria. The good news is that these two choices need not be mutually exclusive. In Oracle Database 12c Release 1 (12.1.0.2), we have the option of using both types of stores in the same database. And, she continues, the column store exists entirely in memory. The combination of the column store structure and memory residency makes the performance of analytic queries soar.


Configuration

Maria gathers her visitors around her desk to demonstrate how Oracle Database In-Memory works.

The in-memory column store is a new pool in the system global area (SGA), she explains. The DBA decides how much memory to allocate to this new pool and issues this statement to create the in-memory column store:

alter system set inmemory_size=48g scope=spfile;

This is the only configuration step required to enable the column store. The parameter is not dynamic, however; the database instance must be restarted for the new column store to take effect.
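Once the instance is back up, the new pool can be inspected; a quick check might query the V$INMEMORY_AREA view, which in 12.1.0.2 reports the subpools of the in-memory area:

```sql
-- Show the in-memory area subpools, their sizes, and population status
select pool, alloc_bytes, used_bytes, populate_status
from   v$inmemory_area;
```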

Because the size of this sample column store is only 48 GB, it may be smaller than the overall database size. With Oracle Database In-Memory, however, the DBA can choose specific data to populate the in-memory column store. Even better, the DBA can choose individual partitions of tables rather than entire tables. Maria explains the basic syntax for putting tables in the in-memory column store:

alter table TableName inmemory;
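The attribute can be set at the partition level, or removed again, with the same ALTER syntax; a sketch (the table and partition names here are illustrative):

```sql
-- Put a single partition in the in-memory column store
alter table sales modify partition sales_q1_2014 inmemory;

-- Take a table back out of the in-memory column store
alter table sales no inmemory;
```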

Maria pauses and directs her audience’s attention to a very important characteristic of this process of populating tables from disk to the in-memory column store. The version of the table in the in-memory column store is in the new column store format for better performance. On disk, however, the table continues to be stored in the traditional row store format. When users modify the data in that table, the version of the table in the buffer cache is updated and written back to the disk. When the data is queried, however, the in-memory column store version of the table is used.

When the INMEMORY attribute is set for a segment—a table or a partition—that segment is automatically loaded into the in-memory column store. A new background process, Maria explains—IMCO—coordinates the populating of the data. The actual populating is performed by the SMCO and Wnnn background processes.

John, seeing a potential issue, points out that when the table data gets updated, the buffer cache is where the update occurs, so the in-memory column store version is no longer the most current version of the table. Doesn’t that make the data access of the updated table from that column store incorrect? John asks. Not at all, assures Maria. Oracle Database automatically keeps the in-memory column store transactionally consistent with the row store format in the buffer cache, similar to how an index is kept transactionally consistent with a table.


In-Memory Options

Heli raises a good question: what if the total size of all the segments placed in the in-memory column store is more than 48 GB? “No worries,” Maria responds. Oracle Database will determine which segments are loaded in what sequence and how long they stay there, based on a priority that can be defined in the ALTER statement with the PRIORITY subclause of the INMEMORY attribute. The PRIORITY subclause takes five options—CRITICAL, HIGH, MEDIUM, LOW, and NONE. The last option, Maria explains, does not populate the segment into the column store automatically; that populating occurs only when someone selects from the table.
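The priority is declared in the same ALTER statement; for example (T1 here is just an illustrative table name):

```sql
-- Populate T1 into the column store as soon as possible after startup
alter table t1 inmemory priority critical;

-- Defer population until the table is first queried
alter table t1 inmemory priority none;
```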

In addition, Maria continues, to save space, a DBA can compress the data in the in-memory column store, using the MEMCOMPRESS subclause of the INMEMORY attribute. She reminds everyone that this compression is inside the in-memory column store only; the segment on the disk is not affected. There are five options for the MEMCOMPRESS subclause—FOR DML, FOR QUERY LOW (the default), FOR QUERY HIGH, FOR CAPACITY LOW, and FOR CAPACITY HIGH—from lowest to highest compression. The higher the compression, the less space occupied in the in-memory column store but the more CPU consumed by data accesses. If compression is not needed, the DBA can use the NO MEMCOMPRESS option.
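A compression level is requested the same way, again with an illustrative table name:

```sql
-- Favor space savings over scan speed in the column store
alter table t1 inmemory memcompress for capacity high;

-- Keep the data uncompressed in the column store
alter table t1 inmemory no memcompress;
```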

John asks how the in-memory column store works with the multiple nodes in an Oracle Real Application Clusters (Oracle RAC) database. By default, Maria responds, the column store is distributed among the caches of all nodes. It also can be duplicated on all Oracle RAC nodes by use of another clause: DUPLICATE.
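The distribution behavior can likewise be declared per segment; a sketch (note that the DUPLICATE clause takes effect only on engineered systems such as Oracle Exadata):

```sql
-- Let the database decide how to spread the data across Oracle RAC nodes
alter table t1 inmemory distribute auto;

-- Keep a full copy of the table in every node's column store
alter table t1 inmemory duplicate all;
```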

Heli is impressed, but she has a concern. Her queries do not select many of the columns of a table, so having those unused columns in the in-memory column store does not help performance but rather takes up valuable space. Good point, Maria concedes; the DBA can define a table to be in the in-memory column store while excluding some of its columns. Different compression levels can also be defined for different columns in the column store. Maria demonstrates with a table named T1. It should be in the in-memory column store, but its columns c4 and c5 should not. In addition, columns c2 and c3 should be compressed with the CAPACITY LOW option, whereas the other columns should be compressed with QUERY LOW (the default). Maria issues the following SQL statement to accomplish all that:

alter table t1 inmemory
memcompress for
capacity low (c2,c3)
no inmemory (c4,c5);


Case Study

Every one of Maria’s guests wants to see how Oracle Database In-Memory works in an actual use case. Maria demonstrates with the SH schema that comes with Oracle Database. The COSTS table is partitioned by range—one partition per quarter or, in the case of very old data, one partition per half year or per year. Sophie knows that Acme uses something similar to this table in its major application and that that table is accessed frequently and thus should be a good candidate for the in-memory column store, with some compression to save space. Sophie also explains that more-recent partitions are queried much more frequently than older ones. Because the level of compression affects CPU consumption, Maria makes a rule: apply lesser levels of compression or no compression at all for segments that are accessed more frequently—that is, more-recent partitions. Older partitions, which are accessed less frequently, can be more aggressively compressed to save space. She also makes a rule that higher priority (the PRIORITY clause) for populating the in-memory column store should be assigned to more-recent partitions, because she wants them to be loaded sooner and stay in memory longer. With this set of requirements in place, Maria alters the COSTS table by using the SQL shown in Listing 1.

Code Listing 1: Altering the COSTS table to put partitions in memory

alter table costs modify partition COSTS_Q4_2003 inmemory no memcompress 
priority high;
… script truncated for brevity …
alter table costs modify partition COSTS_1996 inmemory memcompress for
capacity high priority none;
alter table costs modify partition COSTS_1995 inmemory memcompress for
capacity high priority none;

The USER_TABLES view—or the USER_TAB_PARTITIONS view for partitioned tables—provides in-memory column store information on the selected table. Three new columns in these views—INMEMORY_PRIORITY, INMEMORY_DISTRIBUTE, and INMEMORY_COMPRESSION—display the priority, the distribution in an Oracle RAC database, and the compression type, respectively. Maria uses the SQL shown in Listing 2 to get these details for each partition.

Code Listing 2: Displaying in-memory attributes of the COSTS table

select partition_name, inmemory_priority, 
inmemory_distribute, inmemory_compression
from user_tab_partitions
where table_name = 'COSTS'
order by partition_position;
PARTITION_NAME  INMEMORY  INMEMORY_DISTRI  INMEMORY_COMPRESS
——————————————  ————————  ———————————————  —————————————————
COSTS_1995      NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_1996      NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_H1_1997   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_H2_1997
COSTS_Q1_1998   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q2_1998   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q3_1998   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q4_1998   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q1_1999   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q2_1999   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q3_1999   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q4_1999   NONE      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q1_2000   MEDIUM    AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q2_2000   MEDIUM    AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q3_2000   MEDIUM    AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q4_2000   MEDIUM    AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q1_2001   HIGH      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q2_2001   HIGH      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q3_2001   HIGH      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q4_2001   HIGH      AUTO DISTRIBUTE  FOR CAPACITY HIGH
COSTS_Q1_2002   HIGH      AUTO DISTRIBUTE  FOR CAPACITY LOW
COSTS_Q2_2002   HIGH      AUTO DISTRIBUTE  FOR CAPACITY LOW
COSTS_Q3_2002   HIGH      AUTO DISTRIBUTE  FOR CAPACITY LOW
COSTS_Q4_2002   HIGH      AUTO DISTRIBUTE  FOR CAPACITY LOW
COSTS_Q1_2003   CRITICAL  AUTO DISTRIBUTE  FOR QUERY
COSTS_Q2_2003   CRITICAL  AUTO DISTRIBUTE  BASIC
COSTS_Q3_2003   CRITICAL  AUTO DISTRIBUTE  BASIC
COSTS_Q4_2003   CRITICAL  AUTO DISTRIBUTE  BASIC

To find out what segments are in the in-memory column store, Maria queries a new dynamic performance view—V$IM_SEGMENTS. She explains some important columns of the view:

  • OWNER, SEGMENT_NAME, PARTITION_NAME: The column names describe the information included there.
  • BYTES: This column shows the size of the segment on disk.
  • POPULATE_STATUS: This column shows the status of populating the cache from this segment on disk. After populating is complete, this column shows COMPLETED.
  • BYTES_NOT_POPULATED: This column shows the number of bytes of the segment not yet populated while the cache is being populated.
  • INMEMORY_SIZE: This column shows the size of that segment in the in-memory column store.
  • INMEMORY_PRIORITY, INMEMORY_DISTRIBUTE, INMEMORY_COMPRESSION: These columns are the priority, Oracle RAC distribution, and compression attributes of that segment, respectively, and they provide the same information as the identically named columns in the USER_TABLES and USER_TAB_PARTITIONS views.

Maria executes the SQL shown in Listing 3 to check for the segments in the in-memory column store and how much they have been compressed. Heli notes a very important fact: not all the partitions are visible in the in-memory column store. The reason, Maria explains, is simple: the segments where she has defined the priority to be NONE are not populated automatically in the column store. For example, the priority of the COSTS_1995 partition is set to NONE, as shown in Listing 2, so that partition is not yet in the column store. When someone selects from that partition, the in-memory column store will be populated.
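To bring such a PRIORITY NONE segment in without waiting for an application query, any read of the segment will do; one possible nudge (the hint simply forces a full scan of the partition):

```sql
-- Reading the partition triggers its on-demand population
select /*+ full(c) */ count(*)
from   costs partition (costs_1995) c;
```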

Code Listing 3: Checking for segments in the in-memory column store

select partition_name, populate_status, bytes_not_populated "Not_Pop",
       bytes, inmemory_size, bytes/inmemory_size comp_ratio
from   v$im_segments;
PARTITION_NAME  POPULATE_S  Not_Pop    BYTES  INMEMORY_SIZE  COMP_RATIO
——————————————  ——————————  ———————  ———————  —————————————  ——————————
COSTS_Q3_2000   COMPLETED         0  8388608        1179648  7.11111111
COSTS_Q4_2001   COMPLETED         0  8388608        1179648  7.11111111
COSTS_Q1_2001   COMPLETED         0  8388608        1179648  7.11111111
COSTS_Q2_2000   COMPLETED         0  8388608        1179648  7.11111111
COSTS_Q3_2001   COMPLETED         0  8388608        1179648  7.11111111
COSTS_Q4_2000   COMPLETED         0  8388608        1179648  7.11111111
COSTS_Q1_2000   COMPLETED         0  8388608        1179648  7.11111111
COSTS_Q2_2001   COMPLETED         0  8388608        1179648  7.11111111

Code Listing 4: Query for evaluation

select /*+ gather_plan_statistics */ PROD_NAME, TIME_ID, 
avg(UNIT_PRICE) avg_cost
from costs c, products p
where p.prod_id = c.prod_id
group by PROD_NAME, TIME_ID
having avg(unit_price) > 1000
order by PROD_NAME, TIME_ID;
select * from
table(dbms_xplan.display_cursor(format=>'TYPICAL'));

Code Listing 5: Execution plan with the in-memory column store

——————————————————————————————————————————————————————————————————————————————————————————
| Id  | Operation                        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————————————————————————————————————————————————————————————————————
|   0 | SELECT STATEMENT                 |          |       |       |     7 (100)|          |
|*  1 |  FILTER                          |          |       |       |            |          |
|   2 |   SORT GROUP BY                  |          |  1799 | 84553 |     7  (58)| 00:00:01 |
|*  3 |    HASH JOIN                     |          | 82112 | 3768K |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL            | PRODUCTS |    72 |  2160 |     3   (0)| 00:00:01 |
|   5 |     PARTITION RANGE ALL          |          | 82112 | 1363K |     1 (100)| 00:00:01 |
|   6 |      TABLE ACCESS INMEMORY FULL  | COSTS    | 82112 | 1363K |     1 (100)| 00:00:01 |
——————————————————————————————————————————————————————————————————————————————————————————

Next, Maria takes a typical query from Acme’s major application, joining the COSTS table with the PRODUCTS table, which is not defined to be in memory. Listing 4 shows the query, along with a DBMS_XPLAN.DISPLAY_CURSOR call that retrieves its execution plan. (She uses the SET TIMING ON command before running the query to get the response time.) The query in Listing 4 took 3.64 seconds to execute. Listing 5 shows the execution plan. Maria then disables the use of the in-memory column store by executing the following SQL:

alter session 
set inmemory_query = disable;

Then she executes the Listing 4 query again. This time the query does not use the in-memory column store, and the elapsed time jumps to 6.99 seconds: a 92 percent increase! Everyone in Maria’s office wants to know what happened. To answer that question, Maria draws their attention to the execution plan of the second query execution, shown in Listing 6. The two plans are almost identical except for step 6, which changed from “TABLE ACCESS INMEMORY FULL” to a conventional “TABLE ACCESS FULL.” The cost of that step is 1 in the first execution (with the in-memory column store), Maria observes, but 134 in the second (without it). The CPU had to do extra work to get the data from the buffer cache when the in-memory column store was not used, hence the 92 percent performance penalty. Maria reminds everyone that this performance boost came purely from the in-memory column store, without code changes or additional indexes.

Code Listing 6: Execution plan without the in-memory column store

————————————————————————————————————————————————————————————————————————————————
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
————————————————————————————————————————————————————————————————————————————————
|   0 | SELECT STATEMENT        |          |       |       |   140 (100)|          |
|*  1 |  FILTER                 |          |       |       |            |          |
|   2 |   SORT GROUP BY         |          |  3665 |  168K |   140   (3)| 00:00:01 |
|*  3 |    HASH JOIN            |          | 82112 | 3768K |   137   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | PRODUCTS |    72 |  2160 |     3   (0)| 00:00:01 |
|   5 |     PARTITION RANGE ALL |          | 82112 | 1363K |   134   (1)| 00:00:01 |
|   6 |      TABLE ACCESS FULL  | COSTS    | 82112 | 1363K |   134   (1)| 00:00:01 |
————————————————————————————————————————————————————————————————————————————————

Her visitors wonder how to decide which segments are good candidates for the in-memory column store. Maria suggests some simple rules. Tables whose queries read all the rows, even if only for a few columns, are great candidates for the in-memory column store. Tables or partitions that participate in OLTP transactions more than in queries will not benefit as much, so she advises against putting them into the in-memory column store. When in doubt, tables can be tested in the in-memory column store and then kept there or moved out after the overall performance has been checked. For guidance on compression, Maria suggests using the DBMS_COMPRESSION package to estimate the space savings a segment could achieve in the in-memory column store.
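Such an estimate might be obtained with DBMS_COMPRESSION.GET_COMPRESSION_RATIO and the in-memory compression constants added in 12.1.0.2; a sketch (the scratch tablespace name is an assumption, and the exact parameter names should be verified against the PL/SQL packages reference for your release):

```sql
-- Estimate the in-memory compression ratio for SH.COSTS at QUERY LOW
set serveroutput on
declare
  l_blkcnt_cmp    pls_integer;
  l_blkcnt_uncmp  pls_integer;
  l_row_cmp       pls_integer;
  l_row_uncmp     pls_integer;
  l_cmp_ratio     number;
  l_comptype_str  varchar2(32767);
begin
  dbms_compression.get_compression_ratio(
    scratchtbsname => 'USERS',   -- assumption: an existing scratch tablespace
    ownname        => 'SH',
    objname        => 'COSTS',
    subobjname     => null,
    comptype       => dbms_compression.comp_inmemory_query_low,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  dbms_output.put_line(l_comptype_str || ' estimated ratio: ' ||
                       round(l_cmp_ratio, 2));
end;
/
```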


Conclusion

Maria’s visitors want a quick solution to a common performance problem but without changing code or adding indexes. In this article, Maria solves the problem by marking relevant tables to be populated into the in-memory column store in Oracle Database In-Memory. Oracle Database transparently gets the data from disk to this store, and Maria does not need to choose between the two different store technologies—column and row. She can choose both in the same Oracle database and put segments in either store as appropriate. Her visitors, quite impressed, leave the meeting happy and with a clear action plan.

Next Steps

LEARN more about Oracle Database In-Memory
bit.ly/inmemfeature
bit.ly/inmemdoc

DOWNLOAD Oracle Database 12c



Photography by Scott Webb, Unsplash