By Philippe Deverchère on Sep 03, 2013
In a previous issue of this blog, we have analyzed why the Oracle Sun ZFS Storage Appliance is an excellent storage system to host an enterprise database. Because of its unique controller architecture based on the Hybrid Storage Pool (HSP), it guarantees best in class performance and ease of integration. In this second part of the blog, we are going to address another critical requirement in today’s Enterprise database implementations: compression.
Best in class compression capabilities is a must if we consider the current growth of structured data in the IT centers. While traditional compression algorithms typically found in disk subsystem controllers can do a good job, a more specialized compression mechanism aligned with the database structure can go much further in terms of data reduction.
At the top of that, compression is not just a matter of saving space in a database environment:
- It also has a great side effect when properly implemented: it reduces IO traffic between storage and the database node. The point is that disk drive capacity and IO capability are quite unbalanced. We can store massive amounts of data in today’s HDD, but the IO capabilities of one given disk have remained basically unchanged in the recent past. At the same time, the more data we store, the more IOs we need from a given disk. Therefore, it is often more important to pay attention at decreasing the IOs from a storage system rather than increasing its capacity.
- It increases efficiency not only in terms of data reduction on disk, but also for the backup tasks, Disaster Recovery replication, memory usage (SGA) and ARC/L2ARC caches when using the Sun ZFS Storage Appliance. On that aspect, it goes far beyond what a traditional compression or deduplication approach at disk controller level can achieve.
Compression has always been an important aspect of an Oracle database environment. The table below summarizes the different types of compression which are available in an Oracle database and provides for each of them:
- the database level needed to use the compression capability;
- what is required in terms of storage or database option to use it;
- what type of compression it operates;
- what type of operation the compression applies to.
|Compression||DB Level||Requirements||Type of Compression||DB Operations|
|Row compression on indexes through deduplication of the index leading columns||Compression on indexes when created or rebuild an index to compress it.|
|Basic Table Compression||9i||
||Row compression on tables through deduplication of repeated values||Only applies to bulk operations, i.e. compression is not maintained during DML operations such as INSERT, UPDATE and DELETE, so basic compression should be reserved for data which is mostly read-only.|
|Advanced Row Compression**||11gR1||
||Row compression at block-level through deduplication of repeated values||Compression is maintained during DML operations (INSERT and UPDATE) with some restrictions (compression is triggered when new records reach a threshold).
|RMAN Backup Compression||10g||
||Compression levels are BASIC (BZIP2), LOW (LZO), MEDIUM (ZLIB) and HIGH (BZIP2)||Backup data is compressed before being written to disk or tape and read back directly in compressed format (only for backup sets).|
|SecureFile Compression and Deduplication||11gR1||
||Compression is based on industry standard compression algorithms||Deduplication is performed first on SecureFile data, then compression and finally encryption if required.|
|Data Pump Export Compression||11gR1||
||Data Pump compression provides comparable reduction to standard gzip algorithms||Inline compression during export and import of dump files.|
|Data Guard Redo Transport Compression||11gR1||
||Data Guard Redo Transport Compression uses the zlib compression engine at level 1 (equivalent to gzip -1 level)||Data Guard Redo data is transmitted in a compressed format to reduce network bandwidth and improve performance.|
|Hybrid Columnar Compression||11gR2||
||Hybrid row/column compression that comes in 4 flavors: Query Low (LZO) and Query High (ZLIB) for warehouse compression, Archive Low (ZLIB) and Archive High (BZIP2) for archive compression||Hybrid Columnar Compression is used on bulk operations. When DML is performed on HCC compressed data, new or modified rows are stored uncompressed or possibly compressed by OLTP Compression if the option is available. HCC is intended for dormant or possibly near-dormant data.|
* Advanced Compression Option (ACO) in Oracle Database Enterprise Edition. We do not mention in the table the following ACO options: Flashback Data Archive (11g), Heat Map (12c), Automatic Data Optimization (12c), Advanced Network Compression (12c), Storage Snapshot Optimization (12c) and Online Move Partition (12c).
** Starting with Oracle Database 12c, OLTP Table Compression is called Advanced Row Compression.
The Advanced Compression Option in the recently announced Oracle 12c brings several new compression features including Heat Map and Automatic Data Optimization (ADO). Heat Map is a new Oracle Database feature that collects usage information at the block and segment levels, enabling further automated compression and movement of data which becomes less accessed and used over time. Automatic Data Optimization is the policy engine that performs compression and storage tiering of table partitions according to the data maintained in the Heat Map. Compression can be any type of compression supported by the Oracle Database including Advanced Row Compression (formerly called OLTP Table Compression) and all levels of Hybrid Columnar Compression (HCC).
The Advanced Row Compression introduced in Oracle DB 9iR2 was a great improvement compared to the previous Basic Table Compression as it supports conventional DML (Data Manipulation Language) operations such as INSERT and UPDATE. Furthermore, the database does not require data to be uncompressed and keeps data compressed in memory. This means that performance can be improved on reads since fewer IOs are performed while memory can be more efficiently used. It must be noted that Advanced Compression Options (ACO) regroups several different types of compression: Advanced Row Compression, RMAN Backup Compression, SecureFile Compression, Data Pump Export Compression and Data Guard Redo Transport Compression (as well as several new options introduced in Oracle Database 12c).
Hybrid Columnar Compression (HCC) goes to the next level in terms of compression ratios compared to the other compression techniques. Typically, you will get a compression ratio between 1:5 and 1:20 as opposed to 1:2 and 1:4 for Basic Table Compression and Advanced Row Compression. HCC can be used with two compression modes: Query and Archive. The Query compression mode will provide faster decompression for data which is relatively frequently accessed but a lower compression ratio. On the other hand, the Archive compression mode will provide slower decompression for data which is not often accessed but a very significantly higher compression ratio.
Let’s now have a deeper look at how Hybrid Columnar Compression works and why it can bring so much value for compressing warehouse and archive data on Oracle’s storage.
|Imagine that we have a 7-row and 6-column table as shown on the left and that we want to compress it with HCC. The first thing to understand is how such a table would be stored in an Oracle database without any compression applied. The Oracle database splits table data into “blocks”, each block containing data in the row format, i.e. the first row is sequentially written into the block, then the second one is appended and so on. If all the rows do not fit into one block, no problem, a second block is simply chained to the first one. So our table is going to be stored in the Oracle database as shown below (we ignore here the details about the header blocks and other subtle implementation tricks):|
Such an arrangement of data in the database blocks is in most cases very effective when one wants to read for instance a complete row since the column values can be sequentially read for that particular row. However, it does not land itself well to traditional industry standard compression algorithms such as LZO and ZLIB since data with similar types and values are most of the time sorted in columns and are now dispersed within the blocks as show below:
This is where HCC is coming into play… In order to perform an effective compression without losing the benefit of having the rows “locally stored” within the blocks, we are going to store the data in a columnar format, but only for a subset of rows and not the whole table. So let’s say that, for the sake of our example, we define a set of 3 rows, that we call a Compression Unit (as shown on the right). For that Compression Unit, we reorganize the data in the “columnar” way, i.e. we group first the data by columns then we move along the rows. Here is what our compression unit looks like after the data reorganization process:
In reality, a Compression Unit would include thousands or tens of thousands of lines, but for the sake of our example, we are going to limit ourselves to three… Now, before storing the Compression Unit data into the database blocks, we are going to compress it. Guess what, we are now in a much better position to achieve a good compression ratio since similar data from the same columns now sit next to each other, at least for the rows that constitute a Compression Unit. After compression using a traditional “zip-like” algorithm, here is what we are going to get:
Our Compression Unit has dramatically shrunk and is now ready to be stored into the database blocks as shown below. If the compression unit does not fit in one block, it is simply split on a second block and the blocks are chained together.
In order to understand how DELETE, UPDATE and INSERT commands are managed with HCC compressed format, let’s first see in more details how a compression structure is stored within blocks:
The Header stores the offsets and lengths of all the column entries within the Compression Unit while the Bitmap is in charge of identifying the table rows which are stored by the Compression Unit (one can see the Bitmap as a “map” that gives the status of each of the row and column of the table zone which is covered by the Compression Unit). So here is how the data is actually stored into the blocks (we also represent here the Block Headers but not the Row Directories nor the Row Headers):
Let’s say now that a table row is actually deleted through a DELETE DML statement. Since the data corresponding to that row is deeply mingled with the other rows into the Compression Unit, there is no way we can free that space into the blocks. So all what we do is to mark the row data as deleted into the Compression Unit Bitmap as shown below with the small red squares into the Bitmap:
When compressed with HCC, a deleted entry is actually never removed from the table stored on the disk. It is simply marked as invalid. The only way to get rid of the data which is not more valid is to copy the table through a database command.
If a row is updated or inserted respectively through an UPDATE or INSERT statement, then the new row is simply added as a row (and not as an HCC hybrid structure) into the free space of the last block that constitutes the Compression Unit. No compression is applied to this modified or new row, except if the Advanced Row Compression option is set for the particular table or partition the row belongs to (but this is not an HCC compression which is used here). The figure below depicts the case of a row update where the old data is marked as invalid into the Compression Unit bitmap and where the new row is added in an uncompressed format into the free space of the second block (if there is not enough free space into that block, another block is simply chained to it):
Now that we understand how DML operations are performed in an HCC compressed table or partition, it becomes clear that we should not perform too many changes on that type of table and that HCC must only be used for dormant or near-dormant data such as Data Warehouse and Database Archive environments. Despite this limitation, in a vast majority of cases, the compression ratios achieved by HCC will be much better than what can be achieved through Advanced Row Compression.
Let’s now have a look at the different HCC options that can be used:
|HCC Option||Target Environment||Typical Compression Ratio||Objective|
|QUERY LOW||Warehouse Compression||Between 1:5 and 1:10||Optimized for speed where load time is important|
|QUERY HIGH||Warehouse Compression||Between 1:10 and 1:15||Optimized for speed on data access|
|ARCHIVE LOW||Online Archive Compression||Between 1:10 and 1:20||Optimized for highly compressed historical and archive data|
|ARCHIVE HIGH||Online Archive Compression||Between 1:15 and 1:50||Optimized for maximum storage reduction for data which is lightly accessed|
At the top of storage reduction, HCC often brings an additional value in terms of performance for database loads which can be limited by an IO bottleneck between the database server and the storage device. In the case of database tables stored in a ZFS Storage Appliance and HCC compressed, all the queries to the disk subsystem are returned in the HCC compressed format, resulting in a much faster transmission of the data and greatly improved query performances. This is especially true when the ZFS Storage Appliance unit is connected to the database node through a low latency Infiniband network, which represents quite a unique capability of the ZFS Storage Appliance on the database storage market.
The figure below summarizes the benefits of handling HCC compressed tables:
- Data remains compressed during transfers over the network;
- Data remains compressed in the database SGA Buffer Cache;
- Data remains compressed in the Smart Flash Cache of the database server;
- Data stays compressed when handled by RMAN for backup to disk or tape;
- Data remains compressed when shipping logs to the Data Guard node over the network.
It is very clear that using a traditional disk array based compression does not bring any of the advantages mentioned above, and this makes HCC an extremely powerful feature for storing efficiently Warehouse and Archive tables or partitions. When combined with the unique capabilities of the Oracle Sun ZFS Storage Appliance (IO bandwidth, throughput, snapshot and cloning, connectivity and observability), Hybrid Columnar Compression represents an unmatched tool to reduce infrastructure costs and improve performance.