Thursday Mar 31, 2016

In-Memory Parallel Query

In an earlier post we looked at how you can see if a query uses Oracle Database In-Memory (DBIM) or In-Memory Parallel Query (IMPQ). In this post let's look at what IMPQ is and how it works in 12c.

What is IMPQ?

Before 11gR2 Parallel Execution (PX) in the Oracle Database used direct reads to scan large tables and bypassed the buffer cache. As memory sizes got bigger and SGA sizes got bigger as a result, IMPQ was introduced in 11gR2 to make use of the SGA for data warehouse workloads. IMPQ is a performance enhancement feature that caches the data in the SGA so that further scans against the same data can avoid IO and read the data from the buffer cache much more faster. IMPQ makes use of the aggregated buffer cache across all nodes in a RAC cluster so that more data can be cached than can be done in a single server's memory. It is not a RAC only feature, it kicks in for single instance databases too.

How does IMPQ work?

As you may already know PX uses granules to scan objects. Each object in a statement is divided into granules and granules are assigned to PX servers so that each process can read a part of the object. A granule can be a partition or a range of blocks depending on the physical definition of the object, the degree of parallelism (DOP), and the execution plan.

With IMPQ, for the first statement that accesses an object each PX server reads its granules and caches them in the buffer cache of the node it is on. In the following example a table is divided into four granules, PX server P1 on node 1 reads two of them and stores them in the buffer cache. PX server P2 on node 2 does the same for the other two granules. Note that the number of granules changes based on the object size and the DOP, I show four granules here for demonstration purposes.

Now, half of the table is cached in node 1 and the other half is cached on node 2. When another parallel statement comes and accesses the same table, as you can see in the below picture the PX servers assigned to this statement (P3 and P4) will read the data from their node's buffer cache instead of going to storage and doing IO. Note that cache fusion is not used to read data from other node's memory, all buffer cache access is local.

In the following example, table T is 700MB and I have a 2-node RAC database, each node with a 500MB buffer cache. We see that the first execution did close to 90K physical reads and finished in around 4 seconds. The second execution shows no physical reads and finished in less than a second. This means the table was cached during the first execution and read from the cache during the second execution. Note that the first execution of the statement can take longer than without IMPQ because of the overhead of populating the buffer cache.

SQL> set autot on stat
SQL> select /*+ parallel(2) */ count(*) from t;


Elapsed: 00:00:03.58

          7  recursive calls
          4  db block gets
      90486  consistent gets
      89996  physical reads
        248  redo size
        544  bytes sent via SQL*Net to client
        551  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select /*+ parallel(2) */ count(*) from t;


Elapsed: 00:00:00.68

          6  recursive calls
          0  db block gets
      90485  consistent gets
          0  physical reads
        248  redo size
        544  bytes sent via SQL*Net to client
        551  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

When does IMPQ kick in?

IMPQ is tied to the parameter parallel_degree_policy. You need to set it to AUTO to enable IMPQ. Be aware that this setting also enables Auto DOP and Parallel Statement Queuing.

If IMPQ is enabled there are two decisions to be made for a statement to use IMPQ.

1. First the query coordinator (QC) decides to use IMPQ for an object or not. This decision is based on the object size to be scanned. By default if the object size is between 2% and 80% of the total buffer cache size in the cluster the object is a candidate for IMPQ. If the object is larger IMPQ will not be used and that object will be accessed using direct reads bypassing the buffer cache. For smaller objects IMPQ will not be used but the object can be cached like an object accessed by a serial query, Cache Fusion will be used to transfer data between nodes when necessary. For object size Oracle looks at the optimizer stats if they have been gathered, if not it looks at the actual object size.

If the query is using partition granules the object size is the total size of the partitions after static partition pruning. For dynamic partition pruning where the exact partitions to be scanned are determined at runtime the total table size is used in the IMPQ decision.

The total buffer cache size is calculated as the size of the buffer cache of the node where the QC resides multiplied by the number of nodes.

2. After the decision to use IMPQ is made the QC decides which granules will be cached on which node. It affinitizes each granule to a node so that subsequent accesses to the same granules are done on the same nodes. For partition granules this affinity is done based on the partition number, for block range granules it is done based on the data block address of the first block in the granule.

At this point the granules are assigned to PX servers on each node. Here each PX server makes the decision to use direct reads or buffered reads based on a set of heuristics and tries to cache as much data as possible. Depending on this decision, it is still possible to see direct reads even if the table is a candidate for IMPQ.

IMPQ adaptive offloading on Exadata

The smart scan capability of Exadata kicks in if an object is scanned using direct reads. If IMPQ tries to cache an object the object will be scanned using buffered reads and smart scan will not kick in. In this case you will see the wait event cell multiblock physical read instead of cell smart table scan. Not using smart scans means you will be using the database nodes to do filtering and projection. This can cause a query to run slower if that query benefits a lot from smart scan.

To prevent this PX servers use adaptive offloading on Exadata. The first scan of an object bypasses offloading and populates the buffer cache. In subsequent scans of the same object each PX server calculates the buffer cache scan rate and smart scan rate. Depending on the ratio of these rates PX servers will favor direct reads or buffer cache reads. So, if smart scan is very fast for your query you will see that PX servers will use more direct reads than buffer cache reads. For example, if the ratio of the smart scan rate to buffer cache read rate is 3:1 PX servers will scan 3x number of granules using direct reads compared to the number of granules scanned from the buffer cache. This rate comparison is done for each query execution by PX servers.

Things to be aware of

IMPQ on heterogeneous clusters

As of IMPQ assumes the RAC cluster is homogenous which means every RAC instance has the same buffer cache size. If you have instances with different buffer cache sizes you can see IMPQ being used or not depending on which instance your session is on.

For example, if you have a 2-node RAC with node 1 having a 300MB buffer cache and node 2 having a 700MB buffer cache, IMPQ will use 600MB as the total buffer cache size if you are on node 1, it will use 1400MB if you are on node 2. So, if you query a table of 700MB, IMPQ may or may not kick in, if you are on node 2 it will try to cache the table, if you are on node 1 it will not.

To make use of IMPQ efficiently we recommend sizing the buffer cache equally on all instances.

Effect of DOP change on IMPQ

As of granule-node affinity in IMPQ depends on the statement DOP. If you query a table with DOP=4 and then query the same table with DOP=8 the node affinity may not be the same. This means the second query may do IO to read the table even if the table is totally cached. This is because the node affinity depends on the starting data block address of a granule, DOP change means the number and size of granules can change which means the starting data block address can change.

DOP and the number of instances

Auto DOP automatically rounds up the DOP to a multiple of the number of instances so that every instance has equal number of PX servers. This means the table will be cached on all instances uniformly. If you are using hints to specify the DOP or if you are using Database Resource Manager(DBRM) to limit the DOP, make sure to set the DOP to a multiple of the number of instances to get the same behavior, this is because hints and DBRM override Auto DOP.

Tuesday Mar 22, 2016

Data loading into HDFS - Part2. Data movement from the Oracle Database to the HDFS

[Read More]

Monday Mar 21, 2016

SQL Pattern Matching deep dive - Part 1

There has been quite a lot of interest in the new 12c MATCH_RECOGNIZE feature for SQL pattern matching. Therefore, I thought now would be the perfect time to start a series of quick deep dive posts that explain how SQL pattern matching works. Over the coming weeks I will cover the following topics in a series of posts.

This is the start of a series of posts based on a presentation that I put together for the recent annual BIWA conference at Oracle HQ. The Oracle BI, DW and Analytics user community always puts on a great conference and this year was the best yet. You can download any or all of the presentations from this year’s conference by following this link. My pattern matching deep dive presentation started life about a year ago as a post covering some of the new keywords in the explain plan that are linked to pattern matching, see here. It has now expanded to cover a much wider range of topics.
The aim of this group of posts is to help you understand the underlying mechanics of the MATCH_RECOGNIZE clause. During these posts we will explore key concepts such as: how to get consistent results, using built-in debugging functions, deterministic vs. non-deterministic state machines, back-tracking (what is it and how to identify when it is occurring), and finally greedy vs. reluctant quantifiers. If you need a quick refresher on how MATCH_RECOGNIZE works then I would recommend that you take a look at the following links[Read More]

Thursday Mar 17, 2016

Big Data SQL 3.0 is now available!

Oracle Big Data SQL 3.0 is now available!  This is an exciting milestone for Oracle.  With support for Cloudera CDH (both on Big Data Appliance and non-Big Data Appliance), Hortonworks HDP and Oracle Database 12c (both Exadata and non-Exadata) - the benefits derived from unified queries across relational, Hadoop and NoSQL stores can now be achieved across a wide breadth of big data deployments.

Hadoop and NoSQL are rapidly becoming key components of today's data management platform, and many Oracle Database customers use Hadoop or NoSQL in their organization. Using multiple data management solutions typically lead to data silos, where different people and applications can only access a subset of the data needed. Big Data SQL offers an industry leading solution to deliver one fast, secure SQL query on all data: in Hadoop, Oracle Database, and NoSQL. Big Data SQL leverages both innovations from Oracle Exadata and specific Hadoop features to push processing down to Hadoop, resulting in minimized data movement and extreme performance for SQL on Hadoop. 

In summary, Big Data SQL 3.0: 

  • Expands support for Hadoop platforms - covering Hortonworks HDP, Cloudera CDH on commodity hardware as well as on Oracle Big Data Appliance
  • Expands support for database platforms - covering Oracle Database 12c on commodity hardware as well as on Oracle Exadata
  • Improves performance through new features like Predicate Push-Down on top of Smart Scan and Storage Indexes 


To learn more:


Wednesday Mar 16, 2016

Maximum Availability Architecture for Big Data Appliance

Oracle Maximum Availability Architecture (MAA) is Oracle's best practices blueprint based on proven Oracle high availability technologies, along with expert recommendations and customer experiences. MAA best practices have been highly integrated into the design and operational capability of Oracle Big Data Appliance, and together they provide the most comprehensive highly available solution for Big Data.

Oracle MAA papers are published at the MAA home page of the Oracle Technology Network (OTN) website. Oracle Big Data Appliance (BDA) Maximum Availability Architecture is a best-practices blueprint for achieving an optimal high-availability deployment using Oracle high-availability technologies and recommendations.

The Oracle BDA MAA exercise for this paper was executed on Oracle Big Data Appliance and Oracle Exadata Database Machine to validate high availability and to measure downtime in various outage scenarios. The current release of this technical paper covers the first phase of the overall Oracle BDA MAA project. The project comprises the following two phases:

Phase 1: High Availability and Outage scenarios at a single site

Phase 2: Disaster Recovery Scenarios across multiple sites

The white paper covering Phase 1 is now published here

Tuesday Mar 15, 2016

Big Data SQL Quick Start. Parallel Query - Part3.

[Read More]

Sunday Mar 06, 2016

Hadoop Compression. Choosing compression codec. Part2.

Many customers are keep asking me about "default" (single) compression codec for Hadoop. Actually answer on this question is not so easy and let me explain why.

Bzip2 or not Bzip2?

In my previous blogpost I published results of the compression rate for some particular compression codecs into Hadoop. Based on those results you may think that it’s a good idea to compress everything with bzip2. But be careful with this. Within the same research, I noted that bzip2 actually has on average 3 times worse performance than Gzip for querying (decompress) and archive (compress) data (it’s not surprising based on the complexity of algorithm).  Are you ready to sacrifice performance? I think it will depend on the compression benefits derived from bzip2 and the frequency of querying this data (compression speed is not so import after data is stored in Hadoop systems since you usually compress data once and read it many times).  On average, bzip2 is 1.6 times better than gzip.  But, again my research showed that sometimes you can achieve 2.3 times better compression, while other times you may gain only 9% of the disk space usage (and performance is still much worse compared to gzip and other codecs). Second factor to keep in mind is the frequency of data querying and your performance SLAs. If you don’t care about query performance (don’t have any SLAs) and you select this data very rarely – bzip2 could be good a candidate.  Otherwise consider other options. I encourage you to benchmark your own data and decide for yourself “Bzip2 or not Bzip2”.

[Read More]

Friday Feb 19, 2016

How Can Graph Databases and Analytics Help Your Business?

Several videos describing the value of graph analyses are now available from our Oracle Big Data Spatial and Graph + Oracle Labs teams.


Check out this blog post for details :). 

Big Data Lite 4.4.0 is now available on OTN

big data lite

It's now available for download on OTN.  Check out this VM to help you learn about Oracle's big data platform.[Read More]

Thursday Feb 04, 2016

Hadoop Compression. Compression rate. – Part1.

Compression codecs.

Text files (csv with “,” delimiter):

Codec Type  Average rate  Minimum rate  Maximum rate
bzip2 17.36 3.88 61.81
gzip 9.73 2.9 26.55
lz4 4.75 1.66 8.71
snappy 4.19 1.61 7.86
lzo 3.39 2 5.39

RC File: 

Codec Type Average rate Minimum rate Maximum rate
 bzip2 17.51 4.31 54.66
 gzip 13.59 3.71 44.07
 lz4 7.12 2 21.23
 snappy 6.02 2.04  15.38
 lzo 4.37 2.33 7.02

Parquet file:

Codec Type Average rate Minimum rate Maximum rate
 gzip 17.8 3.9 60.35
 snappy 12.92 2.63 45.99

[Read More]

Using Spark(Scala) and Oracle Big Data Lite VM for Barcode & QR Detection

Big Data and Scalable Image Processing and Analytics

Guest post by Dave Bayard - Oracle's Big Data Pursuit Team 

One of the promises of Big Data is its flexibility to work with large volumes of unstructured types of data such as images and photos. In todayís world, there are many sources of images including social media photos, security cameras, satellite images, and more. There are many kinds of image processing and analytics that are possible from optical character recognition (OCR), license plate detection, bar code detection, face recognition, geological analysis and more. And there are many open source libraries such as OpenCV, Tesseract, ZXing, and others that are available to leverage.

[Read More]

Tuesday Jan 26, 2016

Big Data SQL Quick Start. Offloading - Part2.

After reading these articles: Big Data SQL Quick Start. Introduction, One fast Query All your Data, you know what Big Data SQL is, and you understandthat it allows you query data from Hadoop through the Oracle Database.  But you also should to know that it’s not just reading data. Big Data SQL allows you to process data stored in HDFS locally and return back to the database only data relevant to the query. Let’s imagine a simple diagram of a data management system that includes Oracle Database and Hadoop:

[Read More]

Monday Jan 25, 2016

Parallel Correlated Filters and Expressions in 12c

In the last post we looked at multiple parallelizers and how they could cause a SQL statement use more PX servers than expected. You will notice that some types of statements behave differently in 12c compared to 11g in this regard. For these statements 12c does not use multiple parallelizers, thereby avoiding the negative consequences of multiple parallelizers.

Correlated filters

Here is a simple query that checks if the sales table has any rows without a corresponding channel_id in the channels table.

[Read More]

Tuesday Jan 19, 2016

Big Data SQL Quick Start. Introduction - Part1.

Today I am going to explain steps that required to start working with Big Data SQL. It’s really easy!  I hope after this article you all will agree with me. First, if you want to get caught up on what Big Data SQL is, I recommend that you read these blogs: Oracle Big Data SQL: One Fast Query, Big Data SQL 2.0 - Now Available.

The above blogs cover design goals of Big Data SQL. One of the goals of Big Data SQL is transparency. You just define table that links to some directory in HDFS or some table in HCatalog and continue working with it like with general Oracle Database table.It’s also useful to read the product documentation.

Your first query with Big Data SQL

Let’s start with simplest one example and query data that is actually stored in HDFS via Oracle Database using Big Data SQL. I’m going to begin this example by checking of the data that actually lies into HDFS. To accomplish this, I run the hive console and check hive table DDL:

[Read More]

Wednesday Jan 13, 2016

BIWA 2016 - here's my list of must-attend sessions and labs

It’s almost here - the 2016 BIWA conference at the Oracle Conference Center. The conference starts on January 26 with a welcome by the conference leaders at 8:30am. The BIWA summit is dedicated to providing all the very latest information and best practices for data warehousing, big data, spatial analytics and BI. This year the conference has expanded to include the most important query language on the planet: SQL. There will be a whole track dedicated to YesSQL! The full agenda is available here

Unfortunately I won’t be able to attend this year’s conference but if I was going to be there, then this would be my list of must-attend sessions and hands-on labs.

[Read More]

The data warehouse insider is written by the Oracle product management team and sheds lights on all thing data warehousing and big data.


« June 2016