Friday Aug 05, 2011

Big Data: In-Memory MapReduce

Achieving the impossible often comes at a cost. In the today’s big data world that cost is often latency. Bringing real-time, deep analytics to big data requires improvements to the core of the big data platform. In this post we will discuss parallel analytics and how we can bring in-memory – or real-time – to very large data volumes.

The Parallel Platform that run in Memory

One of the often overlooked things we did in Oracle 11.2 is that we changed the behavior of data that resides in memory across a RAC cluster. Instead of restricting the data size to the size of a the memory in a single node in the cluster, we now allow the database to see the memory as a large pool. We no longer do cache fusion to replicate all pieces of data in an object to all nodes.


That cache fusion process is show above. In state A, node 1 has acquired pieces of data into the buffer cache, as has node 2. These are different pieces of data. Cache fusion now kicks in and ensures that all data for that object is in the cache of each node.

In 11.2 we have changed this by allowing parallel execution to leverage the buffer cache as a grid. To do that we no longer do cache fusion, but instead pin (affinitize is the not so nice English word we use) data onto the memory of a node based on some internal algorithm. Parallel execution keeps track of where data lives and shuffles the query to that node (rather than using cache fusion to move data around).


The above shows how this works. P1 and P2 are chunks of data living in the buffer cache and are affinitized with a certain node. The parallel server processes on that node will execute the query and send the results back to the node that originated the query. With this in-memory grid we can process much larger data volumes.

In-Database MapReduce becomes In-Memory MapReduce

We talked about in-database MapReduce quite a bit on this blog, so I won’t repeat myself. If you are new to this topic, have a look at this post.

Because of how the database is architected, any code running within it leveraging parallelism can now use the data hosted in memory of the machine. Whether this is across the grid or not, doesn’t matter. So rather than having to figure out how to create a system that allows for MapReduce code to run in memory, you need to just figure out how to write the code, Oracle will ensure that if the data fits, it will leverage memory instead of disk. That is shown in the following picture.


Now, is this big data? Well, if I look at an Exadata X2-8  machine today, we will have 2TB of memory to work with and with Exadata Hybrid Columnar Compression (yes the data resides in memory on compressed state) this means I should easily be able to run upwards of 10TB in memory.  As memory footprints grow, more data will fit within the memory and we can do more and more interesting analytics on that data, bringing at least realtime to some fairly complex analytics.

More at Oracle Openworld

For those of you who are going to San Francisco (with or without flowers in your hat), expect to see and hear a lot more on big data and MapReduce! See you there!

Monday Jun 13, 2011

Parallel Execution – Precedence of Hints and other Factors

The following table is a reflection of the precedence of hints, things like alter session enable parallel DML when using Auto DOP. It is also important to understand how the DML and query parts work together and how they influence each other.

All of the below is based on a simple statement:

insert into t3 as select * from t1, t2 where t1.c1 = t2.c1;


Some explanatory words based on the lines in the picture above:

Line 1 => The cleanest way to run PX statements, where Auto DOP gets to do its work and we will use the computed DOP of the statement

Line 4 => Because a FORCE parallel is used, we must ensure that the DOP > 1

Line 9 => The statement level hint over rides all other means and we run the statement with the DOP in the hint

A word on internal degree limit. This is NOT (repeat NOT) a parameter you can find, or set or find an underscore for. It is the built in limit to DOPs (set to default DOP or parallel_degree_limit = CPU). It is an internal boundary to ensure we do not blast through the upper bound of CPU power. Also note, for any non-compute degree, those boundaries and limits do not apply. That in itself is a reason to go look at and understand Auto DOP.

Tuesday Jun 07, 2011

Take a Class on Parallel Execution

Thanks to our curriculum development team, we now have a comprehensive, 1 day class on all things parallel. The class has just been released to the public, so there are no scheduled dates yet. If you are interested in taking the class, which can be instructor-led (in-person), a live virtual class or self paced, please register your interest at the course site.


Friday May 20, 2011

Screencasts on Parallel Execution are now Live

New screencasts discussing automatic parallelism in Oracle 11g Release 2[Read More]

Sunday Sep 27, 2009

In-Memory Parallel Execution in Oracle Database 11gR2

As promised, the next entry in our 11gR2 explorations is In-Memory Parallel Execution. If you are going to Oracle OpenWorld next month make sure you check out the following session:

Tuesday, October 13 2009 5:30PM, Moscone South Room 308
Session S311420
Extreme Performance with Oracle Database 11g and In-Memory Parallel Execution.

In this session you will get more details and insight from the folks who actually built this functionality! A must see if this is of any interest, so book that ticket now and register!

Down to business, what is "In-Memory Parallel Execution"?

Let's begin by having a quick trip down memory-lane back to Oracle Database 7 when Parallel Execution (PX) was first introduced. The goal of PX then and now is to reduce the time it takes to complete a complex SQL statement by using multiple processes to go after the necessary data instead of just one process. Up until now these parallel server processes, typically by-passed the buffer cache and read the necessary data directly from disk. The main reasoning for this was that the objects accessed by PX were large and would not fit into the buffer cache. Any attempt made to read these large objects into the cache would have resulted in trashing the cache content.

However, as hardware systems have evolved; the memory capacity on a typical database server have become extremely large. Take for example the 2 CPU socket Sun server being used in new the Sun Oracle Database Machine. It has an impressive 72GB of memory, giving a full Database Machine (8 database nodes) over ½ a TB of memory. Suddenly using the buffer cache to hold large object doesn't seem so impossible any more.

In-Memory Parallel Execution (In-Memory PX) takes advantage of these larger buffer caches but it also ensures we don't trash the cache.

In-Memory PX begins by determining if the working set (group of database blocks) necessary for a query fits into the aggregated buffer cache of the system. If the working set does not fit then the objects will be accessed via direct path IO just as they were before. If the working set fits into the aggregated buffer cache then the blocks will be distributed among the nodes and the blocks will be affinitzed or associated with that node.

In previous releases, if the Parallel Execution of one statement read part of an object into the buffer cache, then subsequent SQL statement on other nodes in the cluster would access that data via Cache Fusion. This behavior could eventually result in a full copy of that table in every buffer cache in the cluster. In-Memory PX is notably different because Cache Fusion will not be used to copy the data from its original node to another node, even if a parallel SQL statement that requires this data is issued from another node. Instead Oracle uses the parallel server process on the same node (that the data resides on) to access the data and will return only the result to the node where the statement was issued.

The decision to use the aggregated buffer cache is based on an advanced set of heuristics that include; the size of the object, the frequency at which the object changes and is accessed, and the size of the aggregated buffer cache. If the object meets these criteria it will be fragmented or broken up into pieces and each fragment will be mapped to a specific node. If the object is hash partitioned then each partition becomes a fragment, otherwise the mapping is based on the FileNumber and ExtentNumber.




To leverage In-Memory PX you must set the initialization parameter PARALLEL_DEGREE_POLICY to AUTO (default MANUAL). Once this is set, the database controls which objects are eligible to be read into the buffer cache and which object will reside there at any point in time. It is not possible to manual control the behavior for specific statements.

Obviously this post is more of a teaser, for in-depth discussions on this, go to Openworld and/or keep an eye out for a new white paper called Parallel Execution Fundemental in Oracle Database 11gR2 that will be coming soon to This paper not only covers In-Memory PX but Auto-DOP and parallel statement queuing.

Stay tuned for more on 11gR2 coming soon...


The data warehouse insider is written by the Oracle product management team and sheds lights on all thing data warehousing and big data.


« July 2016