11gR2 - What is in it for me?

By now you have probably heard the news of Oracle Database 11g Release 2. It is available and comes with a number of very interesting features for data warehousing.

Obviously you will want to try this out, so here is the download site for the software (OTN license required, of course). The data warehouse collateral can be found on OTN as well; here is the link for that.

I would recommend that everyone first and foremost read the update on what Oracle Database offers for data warehousing.

As a guide to the paper just mentioned, you may want to skim some of the sections: look for In-Memory Parallel Execution, look for either Archive Compression or Warehouse Compression (or for both under Exadata Hybrid Columnar Compression), and take a quick peek at the Warehouse Builder section. Those sections will give you a quick glance at some of the new features. These are just the features that made it into the highest-level overview paper.

In this blog post and in the next set of postings (yes, this is a series of posts...) we will discuss a myriad of new features (including the ones mentioned above) in much more detail.

One of the features that has already hit the blogosphere is Exadata Hybrid Columnar Compression. Kevin commented on it in a post (here).

As there seems to be some confusion as to what we really did here: we did introduce a columnar compression mechanism. It consists of two compression grades - if I may use that word here - one called warehouse compression, which achieves great compression (10x) with excellent query performance, and an archive compression that really focuses on the compression ratio at the cost of performance. You can mix and match these compression forms by applying different schemes to different partitions.
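As a sketch of that mix-and-match, the DDL looks along these lines (table and partition names are hypothetical; both compression settings assume Exadata storage):

```sql
CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  -- older data: archive compression, best ratio at a performance cost
  PARTITION p_2008 VALUES LESS THAN (DATE '2009-01-01')
    COMPRESS FOR ARCHIVE HIGH,
  -- recent data: warehouse compression, tuned for query performance
  PARTITION p_2009 VALUES LESS THAN (DATE '2010-01-01')
    COMPRESS FOR QUERY HIGH
);
```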

Technically, what we did here is something along these lines...

Data is still organized in tables and rows; however, the database now organizes this data in so-called compression units. A compression unit holds a set of rows (let's say a couple of thousand, to make it a bit more concrete), and the data within a compression unit is organized by column and then compressed. This column orientation within a compression unit allows for higher compression rates, as we have more of the same values close together.
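To get a feel for why column orientation helps, here is a small Python sketch (the data and layouts are made up for illustration; a generic compressor stands in for the real algorithm). It builds the same set of rows twice, once interleaved row by row and once with all values of a column stored together, and compresses both:

```python
import zlib

# A hypothetical slice of a fact table: a few columns with lots of
# repeated values, as is typical in a data warehouse.
rows = [("US", "2009", "PROD%03d" % (i % 500)) for i in range(2000)]

# Row-oriented layout: values from different columns interleaved.
row_layout = "|".join("|".join(r) for r in rows).encode()

# Column-oriented layout (as inside a compression unit): all values
# of one column stored together before compressing.
columns = list(zip(*rows))
col_layout = "|".join("|".join(c) for c in columns).encode()

print("row layout compressed:   ", len(zlib.compress(row_layout, 9)))
print("column layout compressed:", len(zlib.compress(col_layout, 9)))
```

Because like values sit next to each other in the column layout, the compressor typically finds longer runs of similar data there, which is exactly the effect the compression unit exploits.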

Hybrid columnar compression is - just to be clear - something other than Advanced Compression (often referred to as OLTP compression), and you do not need one to make the other work.

OLTP compression is a method to compress data that sees very frequent inserts and updates; it keeps the data compressed at a decent ratio (3-4x) while you write, update and read.

Hybrid Columnar Compression for data warehousing is aimed at data that gets delivered in bulk loads (direct writes) and gets queried a lot rather than updated a lot. It delivers significantly higher compression rates (10x, and we have seen more) than OLTP compression, but it is aimed at that write-once, read-many workload.

As Kevin said, Hybrid is the key word in the compression feature new in 11gR2 (ehm, sorry, new in Exadata...). By using a hybrid solution we get great compression rates, excellent table scans and good random row access. The latter is the one that is always a problem in pure columnar storage.

One of the fun little features - to switch gears here - added to SQL is in analytics. We now have something called LISTAGG. LISTAGG lets you flatten a group of rows into a single, delimited value per group. This is often useful for 1:many constructions such as hierarchical data and bills of materials.

If we use good old EMP and DEPT, you get something like this:

SELECT deptno,
       LISTAGG(ename, '; ') WITHIN GROUP (ORDER BY ename)
  FROM emp
 GROUP BY deptno;

    DEPTNO LISTAGG(ENAME,';')WITHINGROUP(ORDERBYENAME)
---------- -------------------------------------------
        10 CLARK; KING; MILLER
        20 ADAMS; FORD; JONES; SCOTT; SMITH
        30 ALLEN; BLAKE; JAMES; MARTIN; TURNER; WARD
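For readers who think in code rather than SQL, the same aggregation can be sketched in a few lines of Python over an in-memory copy of the classic EMP rows (the list below is just the familiar SCOTT sample data):

```python
from collections import defaultdict

# The classic EMP sample rows as (deptno, ename) pairs.
emp = [(20, "SMITH"), (30, "ALLEN"), (30, "WARD"), (20, "JONES"),
       (30, "MARTIN"), (30, "BLAKE"), (10, "CLARK"), (20, "SCOTT"),
       (10, "KING"), (30, "TURNER"), (20, "ADAMS"), (30, "JAMES"),
       (20, "FORD"), (10, "MILLER")]

# Group by deptno, then join each group's names ordered by ename,
# mirroring LISTAGG(ename, '; ') WITHIN GROUP (ORDER BY ename).
groups = defaultdict(list)
for deptno, ename in emp:
    groups[deptno].append(ename)

listagg = {d: "; ".join(sorted(names)) for d, names in groups.items()}
for deptno in sorted(listagg):
    print(deptno, listagg[deptno])
# 10 CLARK; KING; MILLER
# 20 ADAMS; FORD; JONES; SCOTT; SMITH
# 30 ALLEN; BLAKE; JAMES; MARTIN; TURNER; WARD
```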

Next up, we'll look at in-memory parallel execution. If you want to see how fast it is, just take a look at the TPC benchmark sites here... Find the 1000GB benchmark or click here.

Comments:

This is an excellent article, I will definitely be sure to add your blog to my bookmarks!

Posted by Susan Costello on December 09, 2009 at 05:20 AM PST #

About

The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
