500GB/sec and Database Machine Generation 2

Last Tuesday we announced the second generation of Database Machine. This second generation of Oracle Exadata is now running on Sun hardware. The premise for Database Machine is still the same: deliver extreme performance systems on commodity hardware with ease of deployment.

The Database Machine is a prime example (as was the first generation) of software-enabled hardware. The software offers the real value; the hardware is off-the-shelf, which allows a great price point and an easy way to quickly release a next-generation system and get the benefits of faster chips and other components. The software allows the easy migration and the extreme benefits.

The Sun Oracle Database Machine comes with some new and very cool Exadata software features. It once again has InfiniBand (generation 2 delivers even higher throughput numbers), and it is now available in smaller configurations.

So what is new here?

For one, the addition of flash into the system is something very compelling and a leap forward in terms of performance and throughput. And yes, that is where the 500GB/sec comes from...

cache_hierarchy.JPG

Effectively, what we did in generation 2 is add a very fast cache to the storage tier of the system, and by doing this we created a hierarchy as shown above. The fastest tier is the actual memory in the database nodes, which we increased on the machine. The bottom of the hierarchy is disk, where we increased the throughput for a whole rack to 21GB/sec. By adding flash cards (not flash drives!) to the storage tier we can leverage them as a cache and get the benefits of a scale-out strategy. As we scale out the storage, we scale out the flash and the throughput.

The Exadata cache is a smart cache that we carefully manage. If you deem it necessary you can pin objects into the cache as well. Since the Exadata Storage Server actually understands the structure of the data stored, the cache does so too. It is after all managed by the Exadata software. This means that we do not use a regular LRU (Least Recently Used) algorithm, but determine which data is hot and cache these sets when we deem it better to do so.

One distinct difference from the flash you see in traditional storage arrays is that we are not using flash disks in Exadata. We are using PCIe cards. This means we are not constrained by slow disk controllers and can get these massive throughput numbers of 50GB/sec for a full-rack Database Machine.

On top of this, we are introducing Hybrid Columnar Compression with Exadata generation 2. We talked about this already in a previous post around the 11g Release 2 database new features.

In the data warehousing workload (assuming bulk loads for example and lots of querying) we can achieve a 10x compression of the data with almost no impact on query performance. That compression rate allows us to achieve up to 500GB/sec of scan rates from the flash cards.

To put that into perspective, in generation 1 of the Database Machine we achieved up to 14GB/sec of throughput from the disks (in a full rack). In generation 2 we are up to 21GB/sec; both numbers are uncompressed. Flash gets us to around 50GB/sec. The truly staggering numbers come with that 10x Hybrid Columnar Compression rate: 50GB/sec of compressed data read from flash corresponds to roughly 500GB/sec of user data scanned. For anyone who has ever run queries on a system, 500GB/sec is really, really fast!
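The arithmetic behind the headline number is simple enough to write down. The following is a minimal sketch, not a benchmark: it uses only the full-rack figures quoted in this post, and the function name `effective_scan_rate` and the 10TB table are made up for illustration. It also assumes the idealized case where every scanned byte comes from flash and compresses at the full 10x.

```python
# Full-rack throughput figures quoted in this post, in GB/sec (uncompressed).
DISK_GEN1_GB_PER_SEC = 14
DISK_GEN2_GB_PER_SEC = 21
FLASH_GEN2_GB_PER_SEC = 50

# Compression ratio quoted for Hybrid Columnar Compression in a
# data warehousing workload.
HCC_RATIO = 10


def effective_scan_rate(physical_gb_per_sec, compression_ratio):
    """User data scanned per second when the stored data is compressed.

    Idealized: assumes all data scanned compresses at the given ratio.
    """
    return physical_gb_per_sec * compression_ratio


flash_effective = effective_scan_rate(FLASH_GEN2_GB_PER_SEC, HCC_RATIO)
disk_effective = effective_scan_rate(DISK_GEN2_GB_PER_SEC, HCC_RATIO)

# Hypothetical example: time to scan 10TB (10,000GB) of user data.
table_gb = 10_000
seconds_from_flash = table_gb / flash_effective
seconds_from_disk_uncompressed = table_gb / DISK_GEN2_GB_PER_SEC

print(f"Effective flash scan rate: {flash_effective} GB/sec")
print(f"Effective disk scan rate:  {disk_effective} GB/sec")
print(f"10TB from compressed flash: {seconds_from_flash:.0f} sec")
print(f"10TB from uncompressed disk: {seconds_from_disk_uncompressed:.0f} sec")
```

Real workloads will mix flash and disk reads and will see varying compression ratios, so treat these as upper bounds rather than expected results.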

Storage Indexes

That is not all, though. Generation 2 of Exadata also introduces Storage Indexes. A storage index is something more akin to a range partition, but we evaluate it at the storage layer. Sometimes this is referred to as a negative index.

What happens is that for each commonly queried column we transparently store the min and max values of that column. We do this per fixed amount of data: as soon as we finish writing the data and fill up that predefined size, we calculate the min and max for the relevant columns. The result is something like this:

storage_index_schematic.JPG

If the user now issues a query such as SELECT * FROM TABLE WHERE B<2, the scan will only read the first set of rows in the picture above. Since the minimum value of B in the second block is 3, no rows matching the query can be in that set of rows. This allows a Storage Index to give us transparent data elimination without overhead, making the scans more selective and therefore faster.
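To make the idea concrete, here is a small sketch of min/max pruning. It is an illustration of the concept only, not the Exadata implementation: the `Region` class, the helper names and the sample rows are all hypothetical, chosen so that the second region has a minimum B of 3 as in the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Region:
    """A fixed-size chunk of rows plus its per-column (min, max) summary."""
    rows: List[Dict[str, int]]
    min_max: Dict[str, Tuple[int, int]]


def build_region(rows, columns):
    # Computed once, when the chunk has been completely written.
    summary = {
        col: (min(r[col] for r in rows), max(r[col] for r in rows))
        for col in columns
    }
    return Region(rows=rows, min_max=summary)


def scan_less_than(regions, column, bound):
    """Evaluate `column < bound`, skipping regions that cannot match."""
    matches = []
    regions_read = 0
    for region in regions:
        low, _high = region.min_max[column]
        if low >= bound:
            # Even the smallest value fails the predicate: skip the I/O.
            continue
        regions_read += 1
        matches.extend(r for r in region.rows if r[column] < bound)
    return matches, regions_read


# Hypothetical data; the second region has min(B) = 3.
regions = [
    build_region([{"A": 1, "B": 1}, {"A": 2, "B": 2}, {"A": 3, "B": 5}], ["B"]),
    build_region([{"A": 4, "B": 3}, {"A": 5, "B": 6}, {"A": 6, "B": 8}], ["B"]),
]

# Equivalent of: SELECT * FROM TABLE WHERE B < 2
rows, regions_read = scan_less_than(regions, "B", 2)
print(rows, "regions read:", regions_read)
```

Note that the summary can only rule regions out, never confirm a match, which is why it gets called a negative index: the first region still has to be read and filtered row by row.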

So as you can see, the whole system is faster on all counts than the already fast generation 1 system. It is also much faster than anything else out there in the market.

Seeing that there is much more news, like the actual family details (half racks, quarter racks and smaller), the offloading of data mining scoring and all the 11g Release 2 details we haven't yet covered, expect quite a few follow-up posts on both 11gR2 and Generation 2 Exadata.

Next is, as promised earlier, the 11gR2 in-memory parallel execution.

Comments:

Are you dumping HP Oracle Database Machine now that you have a faster version in Sun?

Posted by anton on September 22, 2009 at 02:31 PM PDT #

Anton, Remember when Apple introduced the Intel based Macs and stopped using the earlier generation chips (what was it again Power?)? Did they stop supporting their older Macs? Nope, they simply saw a better value proposition out there and adopted that for their solution. And the solution remained the same high-end value proposition. Bottom line, a Mac is not just the chip it runs on (then it would just be a PC), it is the whole package, OS X, the applications, the styling, even the box the thing comes in... A Database machine is something very similar. It is not the hardware that gives it value (that just keeps it a low price point), it is the entire package of software, ease of deployment and the overall performance it delivers. Oracle supports the current HP production customers - remember you call Oracle for support - as we support the new Sun customers. We just moved to a newer generation of chips and boxes, and added a lot of software enhancements and great features to create a faster and more versatile solution. JP

Posted by Jean-Pierre on September 23, 2009 at 01:25 AM PDT #

"Oracle supports the current HP production customers" HP Oracle Database Machine customers can upgrade to V2 Exadata Storage Server Software and get the benefits of Hybrid Columnar Compression and Storage Indexes.

Posted by Kevin Closson on September 23, 2009 at 11:36 PM PDT #

Would there be any Solaris version or just OEL? 11gR2 for Solaris is not yet downloadable; what is the expected date? All the features used in Exadata V2 are in R2.

Posted by Azhar on September 27, 2009 at 08:02 PM PDT #

As with generation 1, the new machine is also OEL only. 11gR2 is released on the Linux platforms first; other platforms, with the large Unix versions first, will become available shortly.

Posted by jean-pierre.dijcks on September 28, 2009 at 12:34 AM PDT #

About

The data warehouse insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
