Compression can be done at a variety of levels and locations. Disk vendors for years have touted compression in place on storage to consume less space. Compression has been used in a variety of industries. Audio compression, for example, takes recorded audio and under samples the changes in volume and pitch and only records only 8,000 samples per second since the ear can not really hear changes faster than that. These changes are then compressed and stored in an mp3 or avi format. Programs know how to take the mp3 format and rebuild the 8k sample and drive a speaker to estimate the sound that was originally created. Some people can hear the differences and still want to listen to music recorded on reel to reel tape or vinyl because the fidelity is better than CD-ROM or DVD. Videos do the same thing by compressing a large number of bits on a screen and break it into squares on the screen. Only the squares that are changing are transmitted rather than sending all of the data across the whole screen and the blocks that did not change are redisplayed rather than being retransmitted thirty times a second. This allows for video distribution of movies and video recordings across the internet and storage on a DVD rather than recording all of the data all of the time.
Generically compressing data for a database can be complex and if done properly works well. It can also be done very poorly and cause performance problems and issues when reading back the data. Let's take the census data that we talked about earlier. If we store the data as bytes it will consume 198K of space on the disk. If we use the compression ratio that we talked about we will consume roughly 20K of data. This gives us a 10x compression ratio and saves us a significant amount of space on the disk. If the disk sub-system does this compression for us we write 198K of data to the disk, it consumes 20K of storage on the spindles, but when we read it back it has to be rehydrated and we transfer 198K back to the processor and consume 198K of memory to hold the rehydrated data. If the database knew what the compression algorithm and compressed the data initially in memory it would only transmit 20K to the disk, store 20K on the spindles, read 20K back from the disk, and consume 20K of memory to hold the data. This might not seem significant but if we are reading the data across a 2.5 G/second SCSI connection it takes 80ms to read the data rather than 8ms. This 72ms difference can be significant if we have to repeat this a few thousand times. It can also be significant if we have a 1 GigE network connection rather than a direct attached disk. The transfer time jumps to 200ms by moving the data from an attached disk to an nfs or smb mounted disk. We see performance problems like this with database backups to third party storage solutions like Data Domain. If you take a database backup and copy it to a Data Domain solution you get the 10x compression and the backup takes roughly an hour. You have to estimate that it will take seven to eight times the time to rehydrate the data so a restore will take 7-8 hours to recover your database.
The recommended solution is to use compression inside the database rather than third party compression solutions that are designed to compress backups, home directories, and email attachments. Oracle offers Advanced Compressions options for information stored in the database. If you look at the 12c Advanced Compression Data Sheet you will notice that there are a variety options available for compression. You can compress
Heat Map Compression
At the segment level, Heat Map tracks the timestamps of the most recent modification and query of each table and partition in the database. At the block level, Heat Map tracks the most recent modification timestamp. These timestamps are used by Automatic Data Optimization to define compression and storage policies which will be automatically maintained throughout the lifecycle of the data. Heat Map skips internal operations done for system tasks -- automatically excluding Stats Gathering, DDLs, Table Redefinitions and similar operations. In addition, Heat Map can be disabled at the session level, allowing DBA’s to exclude manual maintenance, avoiding
pollution of Heat Map data.
With the data collected by Heat Map, Oracle Database can automatically compress each partition of a table independently based on Heat Map data, implementing compression tiering. This compression tiering can use all forms of Oracle table compression, including: Advanced Row Compression and all levels of Hybrid Columnar Compression (HCC) if the underlying storage supports HCC. Oracle Database can also compress individual database blocks with Advanced Row Compression based on Heat Map data.
a segment-level ADO policy is created to automatically compress the entire table after there have been no
modifications for at least 30 days, using Advanced Row Compression:
ALTER TABLE employee ILM ADD POLICY ROW STORE COMPRESS ADVANCED SEGMENT AFTER 30 DAYS OF NO MODIFICATION;
ALTER TABLE employee ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 3 DAYS OF NO MODIFICATION;
In this example, a tablespace-level ADO policy automatically moves the table to a different tablespace when the tablespace currently containing the object meets a pre-defined tablespace fullness threshold:
ALTER TABLE employee ILM ADD POLICY tier to ilmtbs;
Advanced Row Compression uses a unique compression algorithm specifically designed to work with OLTP applications. The algorithm works by eliminating duplicate values within a database block, even across multiple columns. Compressed blocks contain a structure called a symbol table that maintains compression metadata. When a block is compressed, duplicate values are eliminated by first adding a single copy of the duplicate value to the symbol table. Each duplicate value is then replaced by a short reference to the appropriate entry in the symbol table.
Consider an email application where 10 users receive an email with the same 1MB attachment. Without
Advanced LOB Deduplication, the system would store one copy of the file for each of the 10 users –
requiring 10MB of storage. If the email application in our example uses Advanced LOB Deduplication, it will
store the 1MB attachment just once. That’s a 90% savings in storage requirements.
In addition to the storage savings, Advanced LOB Deduplication also increases application performance.
Specifically, write and copy operations are much more efficient since only references to the SecureFiles
data are written. Further, read operations may improve if duplicate SecureFiles data already exists in the
Backup data compression
RMAN makes a block-by-block backup of the database data, also known as a “physical” backup, which can be used to perform database, tablespace or block level recovery. Data Pump is used to perform a “logical” backup by offloading data from one or more tables into a flat file.
Due to RMAN’s tight integration with Oracle Database, backup data is compressed before it is written to
disk or tape and doesn’t need to be uncompressed before recovery – providing an enormous reduction in
storage costs and a potentially large reduction in backup and restore times. There are three levels of RMAN Compression: LOW, MEDIUM, and HIGH. The amount of storage savings increases from LOW to HIGH, while potentially consuming more CPU resources.
Data Pump compression is an inline operation, so the reduced dump file size means a significant savings
in disk space. Unlike operating system or file system compression utilities, Data Pump compression is fully
inline on the import side as well, so there is no need to decompress a dump file before importing it. The
compressed dump file sets are automatically decompressed during import without any additional steps by
the Database Administrator.
Data Guard redo log compression
Data Guard Redo Transport Services are used to transfer this redo data to the standby site(s). With Advanced Compression, redo data may be transmitted in a compressed format to reduce network bandwidth consumption and in some cases reduce transmission time of redo data. Redo data can be transmitted in a compressed format when the Oracle Data Guard configuration uses either synchronous redo transport (SYNC) or asynchronous redo transport (ASYNC).
Advanced Index compression is a new form of index block compression. Creating an index using Advanced Index Compression reduces the size of all supported unique and non-unique indexes -- while still providing efficient access to the indexes. Advanced Index Compression works well on all supported indexes, including those indexes that are not good candidates (indexes with no duplicate values, or few duplicate values, for given number of leading columns of the index) with the existing index Prefix Compression feature.
Advanced Network Compression, also referred to as SQL Network Data Compression, can be used to compress the network data to be transmitted at the sending side and then uncompress it at the receiving side to reduce the network traffic. Advanced Network Compression reduces the size of the session data unit (SDU) transmitted over a data connection. Reducing the size of data reduces the time required to transmit the SDU.
Advanced Network Compression not only makes SQL query responses faster but also saves bandwidth. On narrow bandwidth connections, with faster CPU, it could significantly improve performance. The compression is transparent to client applications.
We won't cover the last two options since they don't apply to database services in the cloud unless you purchase the Exadata as a Service option. There is a Compression Estimation Tool to help you estimate the benefits of compression. A sample of this looking at 100 TB of database data shows a significant cost savings in the millions of dollars.
There is also a Compression Advisor that can be downloaded and installed in your database to look at your tables and estimate how much storage you can save based on your data and your usage patterns. You can watch a Four minute marketing video on the tool and how to use it. I recommend Tyler Mouth's blog entry on customizing the output of the compression advisor to be a little more user friendly. I would also look at Mike Haas's Blog on compression and the DBAORA blog that provides a good overview of 11g compressions. Mike Messin's blog is a good blog on installing and executing the compression advisor.
In summary, compression can be used with a variety of mechanisms based on your usage patterns and objectives. This option is not one size fits all and requires a DBA with knowledge of the usage patterns and familiarity of the data and applications. Letting a non-DBA decide on the compression mechanism can lead to poor performance, missing recovery objective times, increased network throughput, and higher processor utilization than necessary. The Database 12c Compression Documentation details how to create tables that are compressed, how to look and see if tables are compressed, and how to update tables for compression. Compression is a mechanism that can directly reduce your storage costs by consuming significantly less amounts of storage to store the same data. In the cloud this correlates directly to storage cost savings. You get compression as an option for High Performance Edition and Extreme Performance Edition but not the Standard Edition or Enterprise Edition versions of the database.