Compressing Individual Partitions in the warehouse

Partitioning, now that is a topic that we talk a lot about here at Oracle. If you look in the data warehouse guide and related publications you will see lots on partition pruning, partition wise joins and partitioned indexing. For more on this see also this excellent best practices white paper.

If I use my new "volume test" (the old one being "how thick is the piece of paper that describes the topic") and measure the number of centimeters of screen that is needed for all that and compare it with the small piece of my screen dedicated to the compression part, it warrants a little bit of a write up.

What am I talking about? Well, lets say you are running a data warehouse and want to achieve two things:

- Use less of your high end storage for older data
- Try to improve large table scan performance without buying Oracle Exadata Storage or a Database Machine

Let's say that the use of expensive storage for data that is read once in a while is not something you can avoid. So you want to use as little of that space as possible. Simply said, compress it and you have achieved that. But you want to leave your new data (which will also gets updates etc.) uncompressed.

Obviously, partitioning will allow you to scan less data due to pruning, but that is not what I'm after here. If you compress data on disk, you reduce the total volume (and footprint if you may say that) on the actual disk. That will give you less space for more data. E.g. in a simplistic manner, in an I/O constraint system, you have just potentially improved the performance of large table scans. Oracle Exadata is obviously the other really cool solution... for this and for much more performance improvements.

So what do we need to do to get some of these improvements?

First lets look at the SH schema that comes with 11g database (and older versions, but I'm using 11g). To see if a partition is compressed and for which operations run the following:

select compression, compress_for
from user_tab_partitions
where partition_name = 'SALES_1995';

You get something like this:

COMPRESS COMPRESS_FOR
----------- --------------------
ENABLED DIRECT LOAD ONLY

Now we use the example SQL from the VLDB and Partitioning book (Oracle's documentation) with a little tweak - I run this on a laptop, so I took PARALLEL off to change the COMPRESS_FOR.

ALTER TABLE sales
MOVE PARTITION sales_1995
COMPRESS FOR ALL OPERATIONS
NOLOGGING;

And ask again, what we compressed for you get the expected result:

COMPRESS COMPRESS_FOR
-------------- ------------------------
ENABLED FOR ALL OPERATIONS

What this means is that we now have compression on, and if we insert a row, or update a row, we still keep the compression going. The earlier status is really the one to use for older data that you do not see changing at all. COMPRESS FOR ALL OPERATIONS is actually the 11g Advanced Compression database priced option.

BTW, note that partition maintenance invalidates indexes... so make sure to rebuild the indexes at some point of time! I'm not doing that here, since it is not really relevant to what I'm trying to show.

Continuing with our example, if we use the SH.SALES table we need to un-compress the table or partition first (if you check you will see the whole lot is compressed already):

ALTER TABLE sales
MOVE PARTITION SALES_Q1_1998
NOCOMPRESS
NOLOGGING
/

This gives us a size as follows:

SEGMENT_NAME: SALES
SEGMENT_TYPE: TABLE PARTITION
BYTES: 2097152

Using the earlier syntax to compress this again we get the following:

SEGMENT_NAME: SALES
SEGMENT_TYPE: TABLE PARTITION
BYTES: 655360

That gives us a compression ratio of 3.2 which is quite nice if our goal is to reduce costly space on disk.

One last step we can take it to consolidate the partitions into a larger partition. This won't save space, but might make management of the partitions simpler. For example rather than moving a set of 12 monthly partitions to different storage, you deal with a single yearly partition.

That statement (very simple indeed) is:

alter table sales
merge partitions SALES_H1_1997, SALES_H2_1997
into partition sales_1997;

Comments:

JP, People do read your posts . Thanks

Posted by Rich on December 07, 2009 at 04:21 AM PST #

Post a Comment:
Comments are closed for this entry.
About

The data warehouse insider is written by the Oracle product management team and sheds lights on all thing data warehousing and big data.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
4
5
6
7
8
9
10
11
12
13
14
16
18
19
20
21
23
24
25
26
27
28
29
30
   
       
Today