Information, tips, tricks and sample code for Big Data Warehousing in an autonomous, cloud-driven world

  • January 5, 2010

New whitepaper on Exadata Hybrid Columnar Compression

Keith Laker
Senior Principal Product Manager

The Exadata team has just released a new technical white paper on Exadata Hybrid Columnar Compression (EHCC). It is available on OTN and you can download it by clicking here.

Following on from Jean-Pierre's recent posting, "Increase Performance While Reducing Cost", the new Exadata compression is probably one of the most important and exciting features of this Exadata release for data warehouse customers, because it increases performance while significantly reducing the overall cost of storage.

In a data warehouse it is very useful to apply different rates of compression based on data usage and query patterns. Data that is being actively updated should be compressed differently from historical or archive data. Ideally, as much historical data as possible should be kept online to support ad hoc analysis and the development of data mining models. Many database vendors have tried to meet this growing need: keeping more data online and maintaining query performance while reducing overall storage costs. The result to date has been either 1) high rates of compression with poor query performance, which hurts general BI and ad hoc queries as well as the development of data mining models, or 2) low rates of compression, which do not allow large amounts of historical data to be kept online.

EHCC has two compression modes, which makes it very useful for data warehousing. The first mode, which is optimized for query performance, provides compression ratios of up to 10x, with corresponding improvements in query performance. The second mode provides compression ratios of up to 50x with only limited impact on performance. This makes EHCC ideal for managing data that is suitable for archiving but still needs to be kept online to support various business queries, so it is possible to keep more data online for much longer. Having access to much more data helps all sorts of data warehouse operations, especially data mining, where large datasets help develop more robust models.
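To put the two ratios in perspective, here is a small back-of-the-envelope sketch. The 100 TB starting volume is a made-up figure, and because both ratios are quoted as "up to", the results are best-case footprints rather than what any particular dataset will achieve.

```python
def best_case_size_tb(raw_tb: float, max_ratio: float) -> float:
    """Smallest footprint in TB if the quoted 'up to' ratio were fully achieved."""
    if max_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_tb / max_ratio


raw_tb = 100.0  # hypothetical uncompressed data volume, not from the white paper

query_mode_tb = best_case_size_tb(raw_tb, 10)    # query-optimized mode, up to 10x
archive_mode_tb = best_case_size_tb(raw_tb, 50)  # archive mode, up to 50x

print(f"Query mode:   {query_mode_tb:.1f} TB at best")
print(f"Archive mode: {archive_mode_tb:.1f} TB at best")
```

Actual ratios depend heavily on the data itself, so treat these numbers as a lower bound on storage rather than a sizing estimate.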

Combining EHCC with partitioning provides the ideal solution. Oracle partitioning provides the ability to divide a single table into smaller partitions. These partitions are typically based on date ranges (although other options are available). Using the combination of partitioning and EHCC it is possible to develop something like this:

  • Active Partitions (0-6 months) - up to 10x compression using Exadata Hybrid Columnar Compression - Query mode
  • Historical Partitions (7-24 months) - up to 10x compression using Exadata Hybrid Columnar Compression - Query mode
  • Archive Partitions (25-60 months) - up to 50x compression using Exadata Hybrid Columnar Compression - Archive mode
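The tiering above can be sketched as a small helper that picks a compression clause from a partition's age and builds the matching statement. This is only an illustration under stated assumptions: the table name `sales` and partition name `sales_2007_q1` are hypothetical, the mapping of each tier to `COMPRESS FOR QUERY` or `COMPRESS FOR ARCHIVE` is my reading of "Query mode" and "Archive mode", and no LOW/HIGH level is chosen because the text does not specify one. Check the SQL reference for your database release before using any generated statement.

```python
# (upper age bound in months, tier name, assumed compression clause)
TIERS = [
    (6, "active", "COMPRESS FOR QUERY"),
    (24, "historical", "COMPRESS FOR QUERY"),
    (60, "archive", "COMPRESS FOR ARCHIVE"),
]


def compression_clause(age_months: int) -> str:
    """Return the assumed compression clause for a partition of the given age."""
    if age_months < 0:
        raise ValueError("age must be non-negative")
    for upper_bound, _tier, clause in TIERS:
        if age_months <= upper_bound:
            return clause
    raise ValueError("partitions older than 60 months fall outside the tiers")


def move_partition_sql(table: str, partition: str, age_months: int) -> str:
    """Build (but do not run) a statement that recompresses one partition."""
    clause = compression_clause(age_months)
    return f"ALTER TABLE {table} MOVE PARTITION {partition} {clause}"


# Hypothetical example: a 30-month-old partition lands in the archive tier.
print(move_partition_sql("sales", "sales_2007_q1", 30))
```

A scheduled job could call a helper like this as partitions age from one tier into the next; the sketch only generates text and leaves execution, index maintenance, and validation to the DBA.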

Defining a compression strategy similar to this provides a lot of benefits. In almost all cases query performance will improve due to improved disk scan rates and a reduction in the number of I/Os. Oracle Database does not require data to be decompressed - it can keep data compressed in memory, thereby further reducing I/Os and providing fast data reads.

So now you can store more data and access that data much faster. For more information visit the Exadata home page on OTN: http://www.oracle.com/technology/products/bi/db/exadata/index.html


Comments ( 1 )
  • Mitchel Thursday, January 14, 2010
    That is a huge leap in compression and performance. Nice to see Oracle leading the pack on this technology. It seems like a lot of their competitors are not making much progress at all.