By Arup Nanda
Organizations are storing increasingly large amounts of data in their data warehouses to support their businesses, and many government regulations and industry mandates require this data to be archived rather than deleted. Unchecked, this exponential explosion in active and archived data will lead to skyrocketing storage costs and slowing query performance as a result of increased I/O.
Fortunately, Hybrid Columnar Compression—a key feature in Oracle Exadata Storage Server Software in Oracle Exadata V2—addresses both the data explosion and potential I/O bottlenecks. It minimizes storage requirements and significantly improves query performance, by changing how compressed data is organized. Instead of storing rows of the table together, Hybrid Columnar Compression stores the columns —where the data has similar characteristics —together. The new organization and the similar characteristics of the column data allow a much better compression ratio. This article explains how to enable and manage Hybrid Columnar Compression and examines the storage and performance impact.Rows Versus Columns
Traditionally, database table rows have been stored in blocks, as shown in Figure 1. Typically, a row is fully contained in a block, with the columns of the row stored next to each other. However, when the row becomes too large to fit into a block, the row overflows into the next block—a phenomenon known as row chaining —but the organization of the columns being stored next to each other still remains the same.
|Figure 1: Row-centric data block structure|
Oracle Database 11g Release 1 introduced online transaction processing (OLTP) compression, and that compression mechanism replaces a value in a row with a much smaller symbol, reducing the length of the row.
In real-world situations, however, data is more often repeated in columns, not rows. For example, here is the data for a simple (abbreviated) table:
FIRST_NAME LAST_NAME ---------- --------- Albert Smith Bernie Smith Charles Smith David Smith John Smith ... and so on ...
The Smith value repeats many times, so a great deal of compression can be achieved by replacing the Smith value with a much smaller symbol. And because the same symbol can represent all of the repeated Smith values, fewer unique symbols will need to be stored, reducing the size of the compressed data significantly.
With Hybrid Columnar Compression, Oracle Exadata Storage Server in Oracle Exadata V2 creates a column vector for each column, compresses the column vectors, and stores the column vectors in data blocks. The collection of blocks is called a compression unit . The blocks in a compression unit contain all the columns for a set of rows, as shown in Figure 2. (In Hybrid Columnar Compression, a row typically spans several data blocks.)
|Figure 2: A compression unit in Hybrid Columnar Compression|
Types of Hybrid Columnar Compression
Hybrid Columnar Compression comes in two basic flavors: warehouse compression and archive compression.
Warehouse compression. For warehouse compression, the compression algorithm has been optimized for query performance, specifically for scan-oriented queries used heavily in data warehouses. This approach is ideal for tables that will be queried frequently.
Here is how you create a table with warehouse compression:
CREATE TABLE XXX COMPRESS FOR QUERY AS SELECT * FROM YYY;
Archive compression. With archive compression, the compression algorithm has been optimized for maximum storage savings. This approach is ideal for tables that are infrequently accessed. (Note that for compressing or decompressing data, archive compression may consume a significant amount of CPU compared to warehouse compression.)
Here is how you create a table with archive compression:
CREATE TABLE XXX COMPRESS FOR ARCHIVE AS SELECT * FROM YYY;
This is all that’s necessary for enabling Hybrid Columnar Compression—no application changes are necessary. Note that within each of the two Hybrid Columnar Compression flavors, you can also set a HIGH or LOW modifier to control the amount of compression.
Comparing compression performance. When a table compressed with Hybrid Columnar Compression is read, the CPU consumption may be higher than for an uncompressed table. However, because the number of blocks returned by a query against a compressed table is significantly lower, the logical reads and consistent gets are lower as well, often resulting in a reduction in both CPU consumption and I/O. So, the overall CPU consumption may actually be lower for queries against tables compressed with Hybrid Columnar Compression.
Now let’s look at the effects of different compression types and settings on performance with an example table. The following syntax creates our compressed tables:
create table loc_nocomp nologging as select * from locations / create table loc_compqrylow nologging compress for query low as select * from loc_nocomp / create table loc_compqryhigh nologging compress for query high as select * from loc_nocomp / create table loc_comparclow nologging compress for archive low as select * from loc_nocomp / create table loc_comparchigh nologging compress for archive high as select * from loc_nocomp /
We use data from a table called LOCATIONS, available from a sample Oracle E-Business Suite database, and then we create one uncompressed table and four compressed tables from this source table, with different compression clauses. The LOC_COMPQRYLOW, LOC_COMPQRYHIGH, LOC_COMPARCLOW, and LOC_COMPARCHIGH tables are created with the query low, query high, archive low, and archive high compression settings, respectively.
After the tables have been created, we can check the space consumption of the tables by using this query:
SELECT SEGMENT_NAME, BYTES/1024/1024 MB FROM USER_SEGMENTS WHERE SEGMENT_NAME LIKE 'LOC%'; SEGMENT_NAME MB --------------- ------ LOC_NOCOMP 33020 LOC_COMPQRYLOW 5455 LOC_COMPQRYHIGH 3352 LOC_COMPARCLOW 2727 LOC_COMPARCHIGH 2093
From the output, we can calculate the compression ratio (the size of the compressed table as a percentage of the uncompressed one) for each type of compression.
Finally, let’s run a small test to identify the resource consumption of a query against the tables—the uncompressed as well as the four compressed tables. The code is shown in Listing 1. (Please note that the exact output may be different in your case.) Table 1 shows the summary of the performance metrics as well as the compression ratios in each case.
Code Listing 1: Performance test for accessing the compressed tables
set serveroutput on size unlimited alter system flush buffer_cache / col value noprint new_value start_cpu select value from v$sesstat s, v$statname n where sid = (select sid from v$mystat where rownum < 2) and s.statistic# = n.statistic# and n.name in ('CPU used by this session') / col value noprint new_value start_reads select value from v$sesstat s, v$statname n where sid = (select sid from v$mystat where rownum < 2) and s.statistic# = n.statistic# and n.name in ('session logical reads') / set autot on explain stat set timing on select city, APPLICATION_ID,avg((sysdate-CREATION_DATE)) from LOC_COMQRYLOW group by city, APPLICATION_ID order by city, APPLICATION_ID / -- -- Repeat the test by replacing the tablename above with -- the other compressed table names -- set autot off select value - &start_cpu cpu_consumed from v$sesstat s, v$statname n where sid = (select sid from v$mystat where rownum < 2) and s.statistic# = n.statistic# and n.name in ('CPU used by this session') / select value - &start_reads logical_reads from v$sesstat s, v$statname n where sid = (select sid from v$mystat where rownum < 2) and s.statistic# = n.statistic# and n.name in ('session logical reads') /
|Compression Setting||Table Size,
|Consistent Gets||Physical Reads||CPU Utilization||Logical Reads|
As you can see, with compression set at query low, the table consumes only 17 percent of the storage required by the original table, whereas the table consumes only 6 percent of the original storage when using archive high. As a result of compression, the query against the compressed tables required significantly fewer logical reads, with only modest increases in CPU consumption, and in the case of warehouse compression, this significantly reduces the elapsed time of execution. With the query high compression setting, for example, elapsed query time was 12 percent of the time required to run the same query against the uncompressed table. Because the number of blocks queried was reduced, the logical I/O was reduced significantly—ultimately reducing the elapsed time.
Checking for Compression Candidates
The more aggressive the compression setting, the greater the space savings, but how can you determine the space savings that would result from using a particular compression setting?
You can use the DBMS_COMPRESSION package to estimate the space savings. Listing 2 shows how to use this package to estimate the compression ratio for the LOCATIONS table compressed with query low. From the output, we can see that the compression ratio would be 7.71 times. You can re-execute the package for each of the other compression types by changing the comptype parameter. In this example, the value is DBMS_COMPRESSION.comp_for_query_low, which estimates the compression factor for the “for query low” clause during compression. By changing it to comp_for_query_high, comp_for_archive_low, or comp_for_archive_high, you can estimate the compression factor for the clauses “for query high,” “for archive low,” and “for archive high,” respectively. Listing 3 shows the output for all the Hybrid Columnar Compression options.
Code Listing 2: Estimating the compressed table size reduction
DECLARE l_blkcnt_cmp BINARY_INTEGER; l_blkcnt_uncmp BINARY_INTEGER; l_row_cmp BINARY_INTEGER; l_row_uncmp BINARY_INTEGER; l_cmp_ratio NUMBER; l_comptype_str VARCHAR2 (200); BEGIN DBMS_COMPRESSION.get_compression_ratio ( scratchtbsname => 'USERS', ownname => 'ARUP', tabname => 'LOCATIONS', partname => NULL, comptype => DBMS_COMPRESSION.comp_for_query_low, blkcnt_cmp => l_blkcnt_cmp, blkcnt_uncmp => l_blkcnt_uncmp, row_cmp => l_row_cmp, row_uncmp => l_row_uncmp, cmp_ratio => l_cmp_ratio, comptype_str => l_comptype_str ); DBMS_OUTPUT.put_line ('l_blkcnt_cmp=' || l_blkcnt_cmp); DBMS_OUTPUT.put_line ('l_blkcnt_uncmp=' || l_blkcnt_uncmp); DBMS_OUTPUT.put_line ('l_row_cmp=' || l_row_cmp); DBMS_OUTPUT.put_line ('l_row_uncmp=' || l_row_uncmp); DBMS_OUTPUT.put_line ('l_cmp_ratio=' || l_cmp_ratio); DBMS_OUTPUT.put_line ('l_comptype_str=' || l_comptype_str); END; l_blkcnt_cmp=184 l_blkcnt_uncmp=1419 l_row_cmp=216 l_row_uncmp=27 l_cmp_ratio=7.71 l_comptype_str="Compress For Query Low"
Code Listing 3: Output of DBMS_COMPRESSION for all compression types
l_BLKCNT_CMP=178 l_BLKCNT_UNCMP=2147 l_ROW_CMP=2501 l_ROW_UNCMP=207 l_CMP_RATIO=12.06 l_COMPTYPE_STR="Compress For Query Low" l_BLKCNT_CMP=103 l_BLKCNT_UNCMP=2006 l_ROW_CMP=4041 l_ROW_UNCMP=207 l_CMP_RATIO=19.47 l_COMPTYPE_STR="Compress For Query High" l_BLKCNT_CMP=99 l_BLKCNT_UNCMP=1992 l_ROW_CMP=4169 l_ROW_UNCMP=207 l_CMP_RATIO=20.12 l_COMPTYPE_STR="Compress For Archive Low" l_BLKCNT_CMP=66 l_BLKCNT_UNCMP=1607 l_ROW_CMP=5049 l_ROW_UNCMP=207 l_CMP_RATIO=24.34 l_COMPTYPE_STR="Compress For Archive High"
Tables under Hybrid Columnar Compression can have data loaded or modified with any type of SQL operation. Data that is direct-path-loaded will be compressed with the Hybrid Columnar Compression format, whereas data loaded or modified with conventional data manipulation language (DML) will be compressed with OLTP table compression. Because there are no DML restrictions and because Oracle Database automatically manages all compression operations, the compression is completely application transparent.
You can also alter an existing, noncompressed table to use Hybrid Columnar Compression. However, when you alter an existing noncompressed table to use Hybrid Columnar Compression, the existing table data will not be immediately compressed. Newly loaded rows, but not the existing data, will be compressed. To compact the existing data, use ALTER TABLE MOVE. (This same method can be used to migrate OLTP-compressed rows to Hybrid Columnar Compression.) Listing 4 shows how to use ALTER TABLE MOVE to compact a table called LOCATIONS.
Code Listing 4: Compressing existing tables
SQL> create table loc_normal nologging 2 as select * from locations where rownum < 1000001; -- Let's check for space consumption SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL'; BYTES EXTENTS -------------- -------- 1,140,850,688 200 -- Now alter the table to add compression SQL> alter table loc_normal compress for query low; -- checking for space again SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL'; BYTES EXTENTS -------------- -------- 1,140,850,688 200 -- The space consumption didn't change. Now, move the table to compress the data SQL> alter table loc_normal move; SQL> select bytes, extents from dba_segments where segment_name = 'LOC_NORMAL'; BYTES EXTENTS -------------- -------- 167,772,160 91 -- The table is now compressed
If a table to be compressed is partitioned, you may want to apply different degrees of compression to each partition. For instance, you may want to keep the most recent partition uncompressed, the partition for last month compressed for query, and older partitions compressed for archive. You can easily accomplish that, as shown in Listing 5.
Code Listing 5: Separate compression mechanisms for different partitions
CREATE TABLE trans ( acc_no number not null, txn_id number not null, txn_dt date not null, txn_amt number(15,2) not null ) PARTITION BY RANGE (txn_dt) ( partition y09m06 VALUES LESS THAN(TO_DATE('2009-07-01', 'yyyy-mm-dd')) COMPRESS FOR archive high, partition y09m07 VALUES LESS THAN(TO_DATE('2009-08-01', 'yyyy-mm-dd')) COMPRESS FOR archive low, partition y09m08 VALUES LESS THAN(TO_DATE('2009-09-01', 'yyyy-mm-dd')) COMPRESS FOR query high, partition y09m09 VALUES LESS THAN(TO_DATE('2009-10-01', 'yyyy-mm-dd')) COMPRESS FOR query low, partition y09m10 VALUES LESS THAN(TO_DATE('2009-11-01', 'yyyy-mm-dd')) nocompress )
You can check the compression settings in place for tables by using the DBA_TABLES view. Compression setting information for individual partitions and subpartitions is available from the DBA_TAB_PARTITIONS and DBA_TAB_SUBPARTITIONS views, respectively. The noteworthy columns are
You can also add and drop columns from hybrid columnar compressed tables. When adding a column, you have to define the column as NOT NULL if you are specifying a default value. When dropping a column, the database marks the column as unused immediately and drops it later. This avoids a potentially long decompression and recompression operation on the table. (Note that unlike OLTP table compression, which allows only tables with 255 or fewer columns, Hybrid Columnar Compression allows any number of columns.)
You can also check for the compression characteristics of a specific row in the table, by executing the GET_COMPRESSION_TYPE function in the DBMS_COMPRESSION package, shown in Listing 6. The output “8” means that the row was compressed with the clause “for query low.” Here are the possible outputs and their meanings:
1 Uncompressed 2 Compressed for OLTP 4 For query high 8 For query low 16 For archive high 32 For archive lowCode Listing 6: Checking for compression for a specific row
SQL> variable ret number SQL> begin 2 :ret := dbms_compression.get_compression_type ( 3 'ARUP', 'LOC_COMPQRYLOW', 'AAASOgAAEAAAAaLAAA'); 4 end; 5 / PL/SQL procedure successfully completed. SQL> print ret RET -------------- 8
When a session locks a row in an uncompressed table, it puts an entry in a construct known as the Interested Transaction List, inside the block header. When the table is under Hybrid Columnar Compression, there is no specific block for a given row, so where does Oracle Database put the locking information?
In a hybrid columnar compressed table, when a lock is placed on a row, Oracle Database locks all the rows in that compression unit, not just that row. Be aware of this behavior change and the impact it may have on your applications.Conclusion
OLTP table compression, introduced in Oracle Database 11g Release 1, allows DML against compressed tables. To accommodate the demands of exponential data growth, Oracle Exadata V2 has introduced a new technology called Hybrid Columnar Compression that changes the fundamental way the data is structured inside a block. Instead of grouping rows, it groups like values in columns to make the compression ratio orders of magnitude higher. This enables databases to store more data for data warehousing and archival purposes without a corresponding size increase.
LEARN more about
Photography by Scott Webb, Unsplash