Friday Oct 14, 2011

A Closer Look at Oracle Big Data Appliance

Oracle OpenWorld just flew by. A lot happened in the big data space, of course, and you can read about it in articles, blogs and other interesting materials all over the web.

What I thought I’d do here is to go through the big data appliance in a little more detail so everyone understands what the make-up of the machine is, what software we are putting on the machine and how it integrates with the Exadata machines.

Now, if you are bored reading, you can actually see and hear Todd and me discuss all this stuff using this link. This should be fun if you have never been to Openworld, as the interview is recorded at the OTN Lounge in the Howard street tent.

Oracle Big Data Appliance

The machine details are as follows:

  • 18 Nodes – Sun Servers
  • 2 CPUs per node, each with 6 cores (216 cores total)
  • 12 Disks per node (432 TB raw disk total)
  • Redundant InfiniBand Switches with 10GigE connectivity

To scale the machines, simply add a rack to the original full rack via InfiniBand. By leveraging InfiniBand we generally remove the network bottlenecks within and between the machines. We chose InfiniBand over 10GigE connectivity because we believe a network capacity of 40Gb/sec is a valuable asset in a Hadoop cluster. We also think that using InfiniBand to connect the big data appliance to an Exadata machine will have a positive influence on the batch loads done into an Oracle system.


The software we are going to pre-install on the machine is:

  • Oracle Linux and Oracle Java HotSpot VM
  • Open-source distribution of Apache Hadoop
  • Oracle NoSQL Database Enterprise Edition (also available stand-alone)
  • Oracle Loader for Hadoop (also available stand-alone)
  • Open-source distribution of R (statistical package)
  • Oracle Data Integrator Application Adapter for Hadoop (also available stand-alone with ODI)

The goal of this software stack combined with the Sun hardware as an appliance is to create an enterprise class solution for Big Data that is:

  • Optimized and Complete - Everything you need to store and integrate your lower information density data
  • Integrated with Oracle Exadata - Analyze all your data
  • Easy to Deploy - Risk Free, Quick Installation and Setup
  • Single Vendor Support - Full Oracle support for the entire system and software set

As we get closer to the delivery date, you will see more detailed descriptions of the appliance, so stay tuned.

Thursday Sep 29, 2011

Added Session: Big Data Appliance

We added a new session to discuss Oracle Big Data Appliance. Here are the session details:

Oracle Big Data Appliance: Big Data for the Enterprise
Wednesday 10:15 AM
Marriott Marquis - Golden Gate C3 

Should be a fun session... see you all there!!

Monday Sep 19, 2011

Focus On: Big Data at Openworld

With Oracle OpenWorld rapidly approaching and many new things coming and being announced around big data, I figured it would be good to share some of the sessions that are interesting in a big data context.

All big data related sessions can be found here: Focus on Big Data.

A couple of important highlights to set the scene:

  • Sunday 5.30pm: Openworld Welcome Keynote featuring some of the announcements
  • Monday 2.00pm: Extreme Data Management - Are you Ready? by Andy Mendelsohn - this one promises to be a lively session mixing fun and technical content

With those sessions under your belt, you are ready to dive into the details listed in the Focus On document. And if you just want to look at the new machines, come visit us at the Engineered Systems Showcase in front of the keynote hall, or look around the Big Data area in the database demogrounds section!

See you all in San Francisco!

Friday Aug 05, 2011

Big Data: In-Memory MapReduce

Achieving the impossible often comes at a cost. In today's big data world that cost is often latency. Bringing real-time, deep analytics to big data requires improvements to the core of the big data platform. In this post we will discuss parallel analytics and how we can bring in-memory – or real-time – processing to very large data volumes.

The Parallel Platform that Runs in Memory

One of the often overlooked things we did in Oracle 11.2 is that we changed the behavior of data that resides in memory across a RAC cluster. Instead of restricting the data size to the size of the memory in a single node in the cluster, we now allow the database to see the memory across the cluster as one large pool. We no longer use cache fusion to replicate all pieces of data in an object to all nodes.


That cache fusion process is shown above. In state A, node 1 has acquired pieces of data into its buffer cache, as has node 2. These are different pieces of data. Cache fusion now kicks in and ensures that all data for that object is in the cache of each node.

In 11.2 we have changed this by allowing parallel execution to leverage the buffer cache as a grid. To do that we no longer do cache fusion, but instead pin (affinitize is the not so nice English word we use) data onto the memory of a node based on some internal algorithm. Parallel execution keeps track of where data lives and shuffles the query to that node (rather than using cache fusion to move data around).


The above shows how this works. P1 and P2 are chunks of data living in the buffer cache and are affinitized with a certain node. The parallel server processes on that node will execute the query and send the results back to the node that originated the query. With this in-memory grid we can process much larger data volumes.
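The routing idea can be sketched in a few lines of Python. This is purely illustrative: the node names, chunk IDs, and the `sum()` "query" below are made up, and Oracle's internals are of course far more involved.

```python
# Sketch: chunks of data are pinned (affinitized) to specific nodes, and
# work is shipped to the node that owns each chunk, instead of shipping
# the data to the querying node via cache fusion.

# hypothetical cluster state: chunk -> owning node, plus per-node cached rows
affinity = {"P1": "node1", "P2": "node2"}
cache = {
    "node1": {"P1": [10, 20, 30]},
    "node2": {"P2": [40, 50]},
}

def run_on_node(node, chunk):
    """Parallel server process on `node` scans its locally cached chunk."""
    return sum(cache[node][chunk])

def parallel_query(chunks):
    """Originating node: ship work to each chunk's owner, merge the partial results."""
    partials = [run_on_node(affinity[c], c) for c in chunks]
    return sum(partials)

print(parallel_query(["P1", "P2"]))  # 150
```

Only the partial results travel back to the originating node, which is the key difference from replicating the data itself.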

In-Database MapReduce becomes In-Memory MapReduce

We talked about in-database MapReduce quite a bit on this blog, so I won’t repeat myself. If you are new to this topic, have a look at this post.

Because of how the database is architected, any code running within it that leverages parallelism can now use the data hosted in the memory of the machine, whether that memory is spread across the grid or not. So rather than having to figure out how to create a system that allows MapReduce code to run in memory, you just need to figure out how to write the code; Oracle will ensure that, if the data fits, it leverages memory instead of disk. That is shown in the following picture.
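For readers new to the model itself, here is a minimal MapReduce word count in Python. This only illustrates the map/reduce programming model; it is not Oracle's in-database implementation, where the equivalent logic is written as pipelined table functions.

```python
# Minimal MapReduce sketch: a map phase emits (key, value) pairs,
# a reduce phase aggregates all values per key.
from collections import defaultdict

def map_phase(records):
    """Map: emit (word, 1) for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the values for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

docs = ["big data", "big memory"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 1, 'memory': 1}
```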


Now, is this big data? Well, if I look at an Exadata X2-8 machine today, we have 2TB of memory to work with, and with Exadata Hybrid Columnar Compression (yes, the data resides in memory in compressed form) this means I should easily be able to run upwards of 10TB in memory. As memory footprints grow, more data will fit in memory and we can do more and more interesting analytics on that data, bringing real-time capability to at least some fairly complex analytics.

More at Oracle Openworld

For those of you who are going to San Francisco (with or without flowers in your hat), expect to see and hear a lot more on big data and MapReduce! See you there!

Monday Jun 27, 2011

Big Data Accelerator

For everyone who does not regularly listen to earnings calls, Oracle's Q4 call was interesting (as it mostly is). One of the announcements in the call was the Big Data Accelerator from Oracle (Seeking Alpha link here - slightly tweaked for correctness shown below):

 "The big data accelerator includes some of the standard open source software, HDFS, the file system and a number of other pieces, but also some Oracle components that we think can dramatically speed up the entire map-reduce process. And will be particularly attractive to Java programmers [...]. There are some interesting applications they do, ETL is one. Log processing is another. We're going to have a lot of those features, functions and pre-built applications in our big data accelerator."

 Not much else we can say right now, more on this (and Big Data in general) at Openworld!

Tuesday Jun 07, 2011

Big Data: Achieve the Impossible in Real-Time

Sure, we all want to make the impossible possible… in any scenario, in any business. Here we are talking about driving performance to levels previously considered impossible and doing so by using just data and advanced analytics.
An amazing example of this is the BMW Oracle Americas cup boat and its usage of sensor data and deep analytics (story here).

Consider these two quotes from the article:

"They were measuring an incredible number of parameters across the trimaran, collected 10 times per second, so there were vast amounts of [sensor] data available for analysis. An hour of sailing generates 90 million data points."

"[…] we could compare our performance from the first day of sailing to the very last day of sailing, with incremental improvements the whole way through. With data mining we could check data against the things we saw, and we could find things that weren't otherwise easily observable and findable."

Winning the Cup

BMW Oracle Racing © Photo Gilles Martin-Raget

The end result of all of this (and do read the entire article, it is truly amazing, with things like data projected in sunglasses!) is that the crew can make a sailboat go THREE times as fast as the wind that propels it.

To make this magic happen, a couple of things had to be done:

  1. Put the sensors in place and capture all the data
  2. Organize the data and analyze all of it in real-time
  3. Provide the decisions to the people who need it, exactly when they need it (like in the helmsman’s sunglasses!)
  4. Convince the best sailors in the world to trust and use the analysis to drive the boat

Since this blog is not about sailing but about data warehousing, big data and other (only slightly) less cool things, the intent here is to explain how you can deliver magic like this in your own company.

Move your company onto the next value curve

The above example gives you an actual environment where the combination of high volume, high velocity sensor data, deep analytics and real-time decisions are used to drive performance. This example is a real big data story.

Sure, a multi-billion dollar business will often collect more data, but the point of the above story is the analysis of a previously unseen, massive influx of data: the team estimated 40x more data than in conventional environments. The especially interesting aspect, however, is that decisions are automated. Rather than flooding the sunglasses with data, only relevant decisions and data are projected. The helmsman did not need to interpret the data; he simply acted on the decision points.

To project the idea of acting on decision points into an organization, your IT will have to start changing, as will your end users. To do so, you need to jump onto the bandwagon called big data. The following describes how to get on that bandwagon.

Today, your organization is doing the best it can by leveraging its current IT and DW platforms. That means – for most organizations – that you have squeezed all the relevant information out of the historical data assets you analyze. You are the dot on the lower value curve and you are on the plateau. Any extra dollar invested in the plateau is just about keeping the lights on, not about generating competitive advantage or business value. To jump to the next curve, you need to find some way to harness the challenges imposed by big data.

Value Curves Today

From an infrastructure perspective, you must design a big data platform. That big data platform is a fundamental part of your IT infrastructure if your company wants to compete over the next few years.

Value Curves Tomorrow

The main components in the big data platform provide:

  • Deep Analytics – a fully parallel, extensive and extensible toolbox full of advanced and novel statistical and data mining capabilities
  • High Agility – the ability to create temporary analytics environments in an end-user driven, yet secure and scalable environment to deliver new and novel insights to the operational business
  • Massive Scalability – the ability to scale analytics and sandboxes to previously unknown scales while leveraging previously untapped data potential
  • Low Latency – the ability to instantly act based on these advanced analytics in your operational, production environments

Read between the lines and you see that the big data platform is based on the three hottest topics in the industry: security, cloud computing and big data, all working in conjunction to deliver the next generation big data computing platform.

IT Drives Business Value

Over the next couple of years, companies that drive efficiency, agility and IT as a service via the cloud, that drive new initiatives and top line growth by leveraging big data and analytics, and that keep all their data safe and secure will be the leaders in their industries.

Oracle is building the next generation big data platforms on these three pillars: cloud, security and big data. Over the next couple of months – leading up to Oracle OpenWorld – we will cover details about Oracle’s analytical platform and in-memory computing for real-time big data (and general purpose speed!) on this blog.

A little bit of homework to prepare you for those topics is required. If you have not yet read the following, do give them a go, they are a good read:

These older blog posts will give you an understanding of in-database MapReduce techniques, how to integrate with Hadoop, and a peek at some futuristic applications that I think are generally cool and surely coming down the pipeline in some form or fashion.

Sunday Sep 27, 2009

In-Memory Parallel Execution in Oracle Database 11gR2

As promised, the next entry in our 11gR2 explorations is In-Memory Parallel Execution. If you are going to Oracle OpenWorld next month make sure you check out the following session:

Tuesday, October 13 2009 5:30PM, Moscone South Room 308
Session S311420
Extreme Performance with Oracle Database 11g and In-Memory Parallel Execution.

In this session you will get more details and insight from the folks who actually built this functionality! A must see if this is of any interest, so book that ticket now and register!

Down to business, what is "In-Memory Parallel Execution"?

Let's begin with a quick trip down memory lane, back to Oracle Database 7 when Parallel Execution (PX) was first introduced. The goal of PX, then and now, is to reduce the time it takes to complete a complex SQL statement by using multiple processes to go after the necessary data instead of just one process. Up until now these parallel server processes typically bypassed the buffer cache and read the necessary data directly from disk. The main reasoning for this was that the objects accessed by PX were large and would not fit into the buffer cache; any attempt to read these large objects into the cache would have resulted in trashing the cache content.

However, as hardware systems have evolved, the memory capacity on a typical database server has become extremely large. Take for example the 2 CPU socket Sun server used in the new Sun Oracle Database Machine: it has an impressive 72GB of memory, giving a full Database Machine (8 database nodes) over ½ a TB of memory. Suddenly, using the buffer cache to hold large objects doesn't seem so impossible any more.

In-Memory Parallel Execution (In-Memory PX) takes advantage of these larger buffer caches while also ensuring we don't trash the cache.

In-Memory PX begins by determining whether the working set (the group of database blocks) necessary for a query fits into the aggregated buffer cache of the system. If the working set does not fit, the objects will be accessed via direct path IO just as they were before. If the working set fits into the aggregated buffer cache, the blocks will be distributed among the nodes and affinitized, or associated, with those nodes.

In previous releases, if the parallel execution of one statement read part of an object into the buffer cache, then subsequent SQL statements on other nodes in the cluster would access that data via Cache Fusion. This behavior could eventually result in a full copy of that table in every buffer cache in the cluster. In-Memory PX is notably different because Cache Fusion will not be used to copy the data from its original node to another node, even if a parallel SQL statement that requires this data is issued from another node. Instead, Oracle uses the parallel server processes on the node that the data resides on to access the data, and returns only the result to the node where the statement was issued.

The decision to use the aggregated buffer cache is based on an advanced set of heuristics that include: the size of the object, the frequency at which the object changes and is accessed, and the size of the aggregated buffer cache. If the object meets these criteria, it will be fragmented, or broken up into pieces, and each fragment will be mapped to a specific node. If the object is hash partitioned, each partition becomes a fragment; otherwise the mapping is based on the FileNumber and ExtentNumber.
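A toy sketch of such a mapping follows. The real placement algorithm is internal to Oracle; the CRC-based hash and the four-node cluster below are assumptions purely for illustration of the idea: a deterministic function from a fragment's identity to a node, so that every process in the cluster agrees on where each fragment lives.

```python
# Illustrative only: deterministically map a fragment, identified by
# (FileNumber, ExtentNumber), to one node of the cluster.
import zlib

NUM_NODES = 4  # hypothetical cluster size

def node_for_fragment(file_no, extent_no, num_nodes=NUM_NODES):
    """Hash the fragment identity to a node index; stable across callers."""
    key = f"{file_no}:{extent_no}".encode()
    return zlib.crc32(key) % num_nodes

# every caller computes the same placement for the same fragment
assert node_for_fragment(5, 17) == node_for_fragment(5, 17)
placement = {(f, e): node_for_fragment(f, e) for f in range(2) for e in range(3)}
print(placement)
```

Because the mapping is a pure function of the fragment identity, no node-to-node coordination is needed to decide where a query fragment should run.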




To leverage In-Memory PX you must set the initialization parameter PARALLEL_DEGREE_POLICY to AUTO (the default is MANUAL). Once this is set, the database controls which objects are eligible to be read into the buffer cache and which objects will reside there at any point in time. It is not possible to manually control the behavior for specific statements.

Obviously this post is more of a teaser. For in-depth discussions, go to OpenWorld and/or keep an eye out for a new white paper called Parallel Execution Fundamentals in Oracle Database 11gR2 that will be coming soon. The paper covers not only In-Memory PX but also Auto-DOP and parallel statement queuing.

Stay tuned for more on 11gR2 coming soon...

Wednesday Mar 04, 2009

Managing Optimizer Statistics in Oracle Database 11g

Knowing when and how to gather optimizer statistics has become somewhat of a dark art, especially in a data warehouse environment where statistics maintenance can be hindered by the fact that as the data set increases, the time it takes to gather statistics also increases. By default the DBMS_STATS package will gather global (table level), partition level, and sub-partition level statistics for each of the tables in the database. The only exception is hash sub-partitions: they do not need statistics, as the optimizer can accurately derive any necessary statistics from the partition level statistics because, due to the linear hashing algorithm, the hash sub-partitions are all approximately the same size.
As mentioned above, the length of time it takes to gather statistics grows proportionally with your data set, so you may now be wondering whether the optimizer truly needs statistics at every level for a partitioned table, or whether time could be saved by skipping one or more levels. The short answer is "no", as the optimizer will use statistics from one or more of the levels in different situations.

The optimizer will use global or table level statistics if one or more of your queries touches two or more partitions.

The optimizer will use partition level statistics if your queries do partition elimination, such that only one partition is necessary to answer each query. If your queries touch two or more partitions the optimizer will use a combination of global and partition level statistics.

The optimizer will use sub-partition level statistics if your queries do partition elimination, such that only one sub-partition is necessary. If your queries touch two or more sub-partitions, the optimizer will use a combination of sub-partition and partition level statistics.
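The three rules above can be summarized in a small decision function. This is a simplified mirror of the rules as just described, not the optimizer's actual logic:

```python
def stats_level(partitions_touched, subpartitions_touched=None):
    """Which statistics the optimizer leans on (simplified sketch)."""
    if subpartitions_touched is not None:
        # query eliminated down to sub-partition granularity
        if subpartitions_touched == 1:
            return "subpartition"
        return "subpartition + partition"
    if partitions_touched == 1:
        return "partition"          # elimination down to a single partition
    return "global + partition"     # query spans two or more partitions

print(stats_level(1))                           # partition
print(stats_level(3))                           # global + partition
print(stats_level(1, subpartitions_touched=1))  # subpartition
```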

How to gather statistics?
Global statistics are by far the most important statistics, but they also take the longest time to collect because a full table scan is required. However, in Oracle Database 11g this issue has been addressed with the introduction of Incremental Global statistics. Typically with partitioned tables, new partitions are added and data is loaded into these new partitions. After a partition is fully loaded, partition level statistics need to be gathered and the global statistics need to be updated to reflect the new data. If the INCREMENTAL preference for the partitioned table is set to TRUE, and the DBMS_STATS GRANULARITY parameter is set to AUTO, Oracle will gather statistics on the new partition and update the global table statistics by scanning only those partitions that have been modified, not the entire table. Below are the steps necessary to use incremental global statistics:

SQL> exec dbms_stats.set_table_prefs('SH', 'SALES', 'INCREMENTAL', 'TRUE');

SQL> exec dbms_stats.gather_table_stats(ownname=>'SH', tabname=>'SALES', partname=>'23_MAY_2008', granularity=>'AUTO');

Incremental Global Stats works by storing a synopsis for each partition in the table. A synopsis is statistical metadata for that partition and the columns in the partition. Each synopsis is stored in the SYSAUX tablespace and takes approximately 10KB. Global statistics are generated by aggregating the synopses from each partition, thus eliminating the need for the full table scan (see Figure below). When a new partition is added to the table, you only need to gather statistics for the new partition; the global statistics will be automatically updated by aggregating the new partition's synopsis with the existing partitions' synopses.
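A toy model of the aggregation idea follows. Real synopses are compact hash-based structures, not the full value sets used here; the point is only that per-partition metadata can be combined into global statistics without re-scanning the whole table. Note that the number of distinct values (NDV) cannot simply be added across partitions, which is exactly why a synopsis is needed.

```python
# Toy synopsis: row count plus the set of distinct column values
# (a real synopsis stores a compact hash-based summary instead).
def make_synopsis(rows):
    return {"num_rows": len(rows), "values": set(rows)}

def aggregate(synopses):
    """Derive global statistics from per-partition synopses alone."""
    all_vals = set().union(*(s["values"] for s in synopses))
    return {"num_rows": sum(s["num_rows"] for s in synopses),
            "ndv": len(all_vals)}

p1 = make_synopsis([1, 2, 2, 3])
p2 = make_synopsis([3, 4])        # newly loaded partition
print(aggregate([p1, p2]))        # {'num_rows': 6, 'ndv': 4}
```

When a new partition arrives, only its synopsis must be built; the global figures are refreshed by re-running the cheap aggregation step.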


But what if you are not using Oracle Database 11g and you can't afford to gather partition level statistics (not to mention global statistics) after data is loaded? In Oracle Database 10g you can use the DBMS_STATS.COPY_TABLE_STATS procedure. This procedure enables you to copy statistics from an existing [sub]partition to a new [sub]partition and will adjust the statistics to account for the additional partition of data (for example the number of blocks and number of rows). For a range partitioned table, it sets the new partition's high bound partitioning value as the maximum value of the first partitioning column, and the high bound partitioning value of the previous partition as the minimum value of the first partitioning column. For a list-partitioned table it will find the maximum and minimum from the list of values.

SQL> exec dbms_stats.copy_table_stats('sh', 'sales', 'sales_q3_2000', 'sales_q4_2000', force=>TRUE);

When should you gather Statistics?
If you use the automatic stats job, or dbms_stats.gather_schema_stats with the option "GATHER AUTO", Oracle only collects statistics at the global level if the table has changed more than 10% or if the global statistics have not yet been collected. Partition level statistics will always be gathered if they are missing. For most tables this frequency is fine.
However, in a data warehouse environment there is one scenario where this is not the case. If a partitioned table is constantly having new partitions added, data is loaded into each new partition, and users instantly begin querying the new data, then it is possible to get a situation where an end-user's query supplies a value in a where-clause predicate that is outside the [min,max] range for the column according to the optimizer statistics. For predicate values outside the statistics' [min,max] range, the optimizer prorates the selectivity for that predicate based on the distance between the value and the max (assuming the value is higher than the max). This means the farther the value is from the maximum value, the lower the selectivity will be, which may result in sub-optimal execution plans.
You can avoid this "Out of Range" situation by using the new incremental Global Statistics or the copy table statistics procedure.
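To make the prorating idea concrete, here is a small sketch. The exact formula Oracle uses is internal to the optimizer; the linear decay below (selectivity reaching zero one full [min,max] width beyond the max) is an assumption for illustration only.

```python
# Sketch of out-of-range selectivity prorating for an equality predicate.
def prorated_selectivity(value, col_min, col_max, ndv):
    """Estimate selectivity; decays linearly outside the known [min, max]."""
    base = 1.0 / ndv                      # in-range equality estimate (1/NDV)
    if col_min <= value <= col_max:
        return base
    width = col_max - col_min
    distance = (value - col_max) if value > col_max else (col_min - value)
    # prorate: farther from the known range -> lower selectivity
    return base * max(0.0, 1.0 - distance / width)

print(prorated_selectivity(50, 0, 100, 50))   # 0.02 (in range: 1/NDV)
print(prorated_selectivity(110, 0, 100, 50))  # ~0.018 (10% past max)
```

With stale statistics, freshly loaded values sit ever farther past the recorded max, so their estimated selectivity keeps shrinking even though the rows are really there, which is exactly the "Out of Range" plan problem described above.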

More information on Incremental Global Statistics or the copy table statistics procedure can be found on the Optimizer developers blog.


The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.

