Database Storage Optimization best practices, tips and tricks and guidance from Database Compression Product Management

  • November 27, 2018

Can Data Compression Ratios Be Improved By Sorting Data Before Loading?

Gregg Christman
Product Manager

This is a very common question, but before we discuss this further we should review how Advanced Row Compression works.

Advanced Row Compression uses a compression algorithm specifically designed to work with OLTP/DW applications. The algorithm works by eliminating duplicate values within a database block, even across multiple columns. Compressed blocks contain a structure called a symbol table that maintains compression metadata. When a block is compressed, duplicate values are eliminated by first adding a single copy of the duplicate value to the symbol table. Each duplicate value is replaced by a short reference to the appropriate entry in the symbol table.

The compression ratio achieved in a given environment depends on the data being compressed, specifically the cardinality of the data. In general, organizations can expect to reduce their storage space consumption by a factor of 2x to 4x by using Advanced Row Compression. That is, the amount of space consumed by uncompressed data will be two to four times larger than that of the compressed data.

So given how Advanced Row Compression works, it may be possible to improve a table’s compression ratio by presorting the data when it is loaded. You will have to decide which column(s) to sort on based on the cardinality of the data in each column: if you can sort on a column that has a small number of distinct values, that could produce better compression ratios. However, presorting will require additional preparation of the data before loading - you will need to weigh that additional time versus any compression ratio gain.

A very small gain in compression ratio, going from a 2.3x compression ratio to a 2.5x compression ratio for example, may not be worth the time and resources required to manage the data sort prior to the data load. On the other hand, a significant improvement in the compression ratio, going from a 2.3x compression ratio to a 4.1x compression ratio for example, may be well worth the time and resources needed to sort the data prior to loading.

Improving compression ratios, by sorting data before loading, can be especially useful for cold/historic data where the data is read-mostly/only and few (if any) modifications are expected. If sorting the data can improve the compression ratio for data that is maintained for reporting, historic or regulatory reasons, then even smaller storage savings may be well worth the time and resources required for the sorting before loading.

As always, it is recommended that you test with your own data to see if sorting data, before loading the data, will benefit the tables/partitions compression ratio and that the time and resources needed to perform the sort does not outweigh the benefits.

Note that this discussion applies to Advanced Row Compression, OLTP Table Compression and Basic Table Compression.

For more information on how to determine the compression ratio of your tables/partitions, please see this Oracle White Paper that discusses how to use the free Oracle Compression Advisor (DBMS_COMPRESSION).

In my next blog, we will continue discussing common compression questions.

The database storage optimization adventure continues!

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.