X

Database Storage Optimization best practices, tips and tricks and guidance from Database Compression Product Management

Compressing your Indexes: Index Key Compression (PART 1)

Guest Author

Folks have often commented to me that Table Compression (and
Advanced Row Compression) are  great at helping
to reduce the storage footprint of an Oracle database by a factor of 2x-5x, but
Oracle shouldn’t forget about the indexes.

In my experience, indexes often take up to 50% of the total
database space and it is not uncommon to have 10-20 indexes on a single table
(in extreme cases, I have seen upward of 100 indexes per table).

The good news is indexes are the first class citizens in
Oracle and we have NOT forgotten about them.

Index
Key Compression
is perhaps one of the oldest compression features within
the Oracle database, released with Oracle 8.1.3 (long before Basic Table
Compression in 9.2). If used correctly, Index Key Compression has the potential
to substantially reduce the overall size of indexes. It helps both multi-column
unique indexes and non-unique indexes alike and is also one of the most
critical index optimization options available.

Index Key Compression allows us to compress portions of the key values in an index segment (or Index Organized Table), by reducing the storage inefficiencies of storing repeating values. It compresses the data by splitting the index key into two parts; the leading group of columns, called the prefix entry (which are potentially shared across multiple key values), and the suffix columns (which is unique to every index key). As the prefixes are potentially shared across multiple keys in a block, these can be stored more optimally (that is, only once) and shared across multiple suffix entries, resulting in the index data being compressed.


Index Key compression is done in the leaf blocks of a B-Tree
index. The keys are compressed locally within an index leaf block, that is,
both the prefix and suffix entries are stored within same block. Suffix entries
form the compressed representation of the index key. Each one of these
compressed rows refers to the corresponding prefix, which is stored in the same
block. By storing the prefixes and suffixes locally in the same block, each
index block is self-contained and in order to construct the complete key there
is no additional block IO involved. It is a very cheap memory only operation.

So how does one enable compression on indexes?

This can be achieved easily by specifying the COMPRESS
option for the index. New indexes can be automatically created as compressed,
or the existing indexes can be rebuilt compressed.

CREATE INDEX idxname ON tabname(col1, col2, col3) COMPRESS;

By default the prefix consists of all indexed columns for
non-unique indexes, and all indexed columns excluding the last one for unique
indexes.

In the above example, the default prefix would be (col1,
col2). In this case, if the index leaf contains the keys, say (A, B, X), (A, B,
X), (A, B, Y), (A, C, X), (A, C, Z), (A, D, X) etc., the prefix column values (A,
B) and (A, C) occur repeatedly in the keys and can be compressed.

Alternatively you can also specify the prefix length, which
is the number of columns in the prefix.

CREATE INDEX idxname ON tabname(col1, col2, col3) COMPRESS [<prefix_col_length>]

The <prefix_col_length> after the COMPRESS keyword
denotes how many columns to compress. The default (and the maximum) prefix
length for a non-unique index is the number of key columns, and the maximum
prefix length for a unique index is the number of key columns minus one.

If you specify a prefix length of 1 in the above example,
then the prefix is only (col1) and the suffix is (col2, col3). For the same
list of values, the prefix value (A) occurs repeatedly in the keys and can be
compressed.

Prefix entries are written to the index block only if the
index block does not already contain that prefix. They are available for
sharing across multiple suffix entries immediately after being written and
remain available until the last referencing suffix entry is deleted and cleaned
out from the block.

Keep in mind that although key compression reduces the
storage requirements of an index by sharing parts of keys across multiple
entries, there is a small CPU overhead to reconstruct the key column values
during index lookup or scans (which is minimized by keeping the prefixes
locally in the block).

A more optimal representation of an index, especially if it
stays permanently compressed without any subsequent overhead on the maintenance
operations, is a good thing. Not only will it have a positive impact on the
storage and space savings, but has secondary benefits such as better cache
efficiency, fewer leaf blocks and less deep tree resulting in potentially fewer
logical IOs and hence a cheaper execution plans.

Index Key compression can be useful in many different
scenarios, such as:

  • In case of a
    non-unique index where ROWID is appended to make the key unique. If such
    an index is compressed using key compression, the duplicate key is stored
    as a prefix entry in the index block without the ROWID. The remaining rows
    become suffix entries consisting of only the ROWID.
  • In case of
    unique multicolumn index (key compression is not possible for unique
    single column index because there is a unique piece but there are no
    prefix grouping pieces to share).
  • In case of Index
    Organized Tables, the same considerations as unique multicolumn indexes
    apply.

This is all great. But going back to my comment on "If
used correctly...", so where is the problem?

Compression can be very beneficial when the prefix columns
of an index are repeated many times within a leaf block. However, if the
leading columns are very selective or if there are not many repeated values for
the prefix columns, then we have a problem.

In these scenarios, Oracle still creates prefix entries
storing all unique combinations of compressed column values within a leaf
block. The index rows will refer to the prefix entry, which is hardly shared
(if at all) by other index rows. So, it’s possible that compression in these
cases is not beneficial, and could end up bloating the index size due to the
overhead of storing all of the prefix entries.

For index compression to be beneficial, you should ensure
low cardinality columns are the leading columns in a concatenated index.
Otherwise you run the risk of getting negative compression such that leaf blocks
can no longer store as many keys as their non-compressed counterparts.

There is also no point in compressing a single column unique
index or compressing every column in a concatenated, multi-column unique index.
In these cases compression will result in an index structure that increases in
size rather than decreasing (negative compression) due to all the overheads
associated with having prefix entries for each and every index row.

The key to getting good index compression is identifying
which indexes will benefit from it and how to specify <prefix_col_length>
correctly.

There is no easy way to figure this out without knowing your
data well (which often requires you to analyze all your data) and knowing how
the data is going to evolve over time.

On the positive side, Oracle does try and protect you under
certain obvious cases. For example, you are not able to create a compressed
index on a single column unique index. Nor are you allowed to specify a prefix
length equal to or greater than the number of columns for unique index, (remember
the default prefix length for a unique concatenated index is the number of
indexed columns minus one) etc.

You may also want to check out Richard
Foote’s
blog series on index compression. Jonathan
Lewis
also gives a great example on how index compression and compression
in general can help with overall database storage and performance.

We will talk more on how to overcome
the issues with Index Key Compression and always guarantee a positive
compression with Advanced Index
Compression,
a feature of Advanced Compression, in Part 2 of this
blog.

Join the discussion

Comments ( 2 )
  • guest Wednesday, March 16, 2016

    "There is no easy way to figure this out"

    As long as you have some realistic data, an "analyze" can help

    SQL> desc index_stats

    Name Null? Type

    ----------------------------- -------- ----------------

    HEIGHT NUMBER

    BLOCKS NUMBER

    NAME VARCHAR2(128)

    ...

    ...

    OPT_CMPR_COUNT NUMBER <====

    OPT_CMPR_PCTSAVE NUMBER <====


  • Vineet Marwah-Oracle Friday, March 18, 2016

    I agree with you, and as I mentioned in the blog post:

    "There is no easy way to figure this out without knowing your data well (which often requires you to analyze all your data) and knowing how the data is going to evolve over time."

    ANALYZE can help you to obtain the optimal prefix column count, but it produces the optimal count for the index as a whole and may not yield the best compression ratio. Additionally, running ANALYZE INDEX takes an exclusive lock on the table, effectively making the table “offline” for this period.

    More on this in Part 2 of the blog.


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.