Since version 3.0, Big Data SQL has an awesome feature: Predicate Push Down (PPD). It works over ORC files, Parquet files, and a few NoSQL databases (like HBase). I plan to devote a separate post to NoSQL databases and Big Data SQL; today I'm going to explain the ORC and Parquet file case. Both formats use the same approach, so I'll explain the ORC case (Parquet works the same way). If you are not familiar with these file types, I advise you to read this documentation about Parquet files and this one about ORC. These files carry some metadata inside: ORC files store the minimum and maximum value for every 10,000 rows (that's the default, defined by the orc.row.index.stride parameter).
This means that with a fast lookup we can decide whether to scan a stripe or skip it. If we scan a fairly unique column (like a primary key), we can skip a lot of IO operations. Let's have an example. I have a table that contains two stripes, and I want to scan it using a predicate on column B.
If I use the predicate "B=6" in the "WHERE" clause, I will skip the scan of the first stripe. If I use "B=15", I will skip both stripes.
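A minimal sketch of that lookup, assuming a toy index where the first stripe holds B values 1–5 and the second holds 6–10 (both ranges are invented for illustration):

```python
# Toy ORC-style stripe index: (min, max) of column B for each stripe.
# The ranges are hypothetical, matching the two-stripe example above.
stripe_index = [(1, 5), (6, 10)]

def stripes_to_scan(value, index):
    """Keep only stripes whose [min, max] range could contain the value."""
    return [i for i, (lo, hi) in enumerate(index) if lo <= value <= hi]

print(stripes_to_scan(6, stripe_index))   # [1] -> only the second stripe is read
print(stripes_to_scan(15, stripe_index))  # []  -> no stripe is read at all
```

One range comparison per stripe replaces a full scan of 10,000 rows, which is why PPD pays off so well on near-unique columns.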
Big Data SQL workflow.
Now is a good time to explain what the Big Data SQL workflow looks like. Let's imagine that we have a table stock_prices in HDFS, stored as ORC (or Parquet). Over this table we run the following query:
Here is the workflow:
1) Reading data from disk into the External Table Services level
In the first step we read data from disk, and we can apply a few very significant optimizations here:
- Column pruning. If the table stock_prices has 100 columns but the query lists only 4 (ticker, price, mnth, stx_xchange), the other 96 will not be read from disk.
- Predicate Push Down. In the given example we have two columns in the WHERE clause and two predicates to be pushed:
Using the ORC index, we can eliminate part of the reads. We will not read data that definitely does not match our WHERE predicate; we only read data that may contain matching rows.
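The two read-level optimizations can be sketched together. The column names come from the query above; the stripe contents and min/max stats are invented for the sketch:

```python
# Hypothetical stripes: each carries per-column (min, max) stats plus its rows.
stripes = [
    {"stats": {"mnth": (1, 6)},
     "rows": [{"ticker": "ORCL", "price": 40.1, "mnth": 3,
               "stx_xchange": "NYSE", "volume": 900}]},
    {"stats": {"mnth": (7, 12)},
     "rows": [{"ticker": "ORCL", "price": 51.2, "mnth": 9,
               "stx_xchange": "NYSE", "volume": 700}]},
]

needed_columns = ("ticker", "price", "mnth", "stx_xchange")  # column pruning
predicate_column, predicate_value = "mnth", 9                # pushed predicate

result = []
for s in stripes:
    lo, hi = s["stats"][predicate_column]
    if not (lo <= predicate_value <= hi):   # skip the stripe via min/max index
        continue
    for row in s["rows"]:
        if row[predicate_column] == predicate_value:
            # project only the listed columns; the others stay on disk
            result.append({c: row[c] for c in needed_columns})

print(result)
```

The first stripe is never opened, and the surviving rows carry only the four requested columns.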
I borrowed a picture from this presentation about the Parquet file format; it illustrates perfectly what we can achieve with Column Pruning and Predicate Push Down.
2) Transforming the data to Oracle RDBMS format and passing it to Smart Scan
In the second step we transform the data from its HDFS format (binary, Avro, ORC, Parquet) into Oracle RDBMS format. This is a very CPU-intensive step: here we apply the SerDe and convert the data into Oracle RDBMS types. If we pruned columns and rows (thanks to the Predicate Push Down feature) in the previous step, we avoid burning a lot of CPU.
3) Smart Scan
In the next step, Smart Scan is applied over the "WHERE" predicates.
While Predicate Push Down eliminates part of the data at stripe granularity, Smart Scan does the exact filtration. It can also do more than just evaluate predicates; the full list of those functions you can find here:
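The division of labor can be sketched like this (toy data, not the real engine): PPD is a coarse stripe-level filter, and a stripe whose (min, max) range covers the value may still contain non-matching rows, so an exact row-level filter must follow.

```python
# One stripe that survives the min/max check but still needs exact filtration.
stripe = {"min": 1, "max": 10, "rows": [1, 4, 6, 6, 9]}
value = 6

survives_ppd = stripe["min"] <= value <= stripe["max"]   # stripe-level check (PPD)
exact = [r for r in stripe["rows"] if r == value] if survives_ppd else []

print(survives_ppd, exact)  # True [6, 6] -> PPD kept the stripe, exact filter kept 2 rows
```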
4) Oracle RDBMS part.
This step is outside of my picture, but it exists. Not all functions can be applied at the Smart Scan level (PL/SQL or window functions, for example), and this part of the query is passed on to the database. In our example, AVG(price) OVER (PARTITION BY ticker ORDER BY mnth) as avg_price and the ORDER BY will be done at the database level.
The whole picture looks like this:
Predicate Push Down: good and bad use cases.
For testing good and bad use cases for the Predicate Push Down feature, I took the store_sales table from Intel Big Bench. It's a relatively big table.
The data is compressed with the ZLIB codec (the default one for ORC):
From the statistics tables I found the most selective and most unselective columns:
Now I will create two external tables: one (STORE_SALES_ORC) with predicate pushdown enabled, and a second (STORE_SALES_ORC_NOPPD) with it disabled. Predicate Push Down is enabled by default; if you want to disable it, you need to add the com.oracle.bigdata.ppd=FALSE parameter to the "CREATE TABLE" statement.
Now let's run the tests.
1) Filtering using an unselective predicate:
done in 89 seconds
done in 84 seconds
2) Filtering using a selective predicate:
done in 9 seconds
done in 87 seconds
3) Joins. The Predicate Push Down feature can even accelerate join operations. Not all joins, but joins where a predicate operator exists, even implicitly, as in a nested loop.
Let's join our big fact tables (store_sales_orc and store_sales_orc_noppd, each one 6,385,178,703 rows) with the small table date_dim_orc (only 109,573 rows). I enforce a nested loop with the corresponding hint, but only for demo purposes; don't do this on your production system, let the optimizer decide which join type to use.
Both joins over the table with the PPD feature enabled:
and over the table with the PPD feature disabled:
have the same query plan:
which has predicates on the store_sales table:
But the join over the table with PPD enabled was done in 11 seconds, while the join over the table with PPD disabled took 128 seconds. We got this effect only because we require a small dataset from the store_sales fact table. The feature will not be as efficient if we need the whole dataset or a significant part of it.
Note: this happens only if the join type is Nested Loop.
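Why the nested loop matters can be sketched as follows: each probe from the small dimension table carries an equality predicate into the big fact table, and the stripe index can answer that predicate cheaply (all data below is invented):

```python
# Toy fact table split into stripes with (min, max) stats on the join key.
fact_stripes = [
    {"min": 1,   "max": 100, "rows": [(10, "a"), (55, "b")]},
    {"min": 101, "max": 200, "rows": [(150, "c"), (199, "d")]},
]
dim_keys = [55, 150]  # join keys coming from the small dimension table

joined, stripes_read = [], 0
for key in dim_keys:                      # nested loop: one probe per dim row
    for stripe in fact_stripes:
        if not (stripe["min"] <= key <= stripe["max"]):
            continue                      # PPD skips this stripe entirely
        stripes_read += 1
        joined += [r for r in stripe["rows"] if r[0] == key]

print(joined, stripes_read)  # each probe read only 1 of the 2 stripes
```

A hash or sort-merge join would instead read the whole fact table once, so the stripe index has nothing to skip.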
4) Cumulative predicates
Predicates are cumulative, which means that if you run a query with several unselective predicates (which are bad for BDS individually), cumulatively they can give you a selective dataset, and performance will be great. I took the 5 most unselective predicates from the previous example and ran the query:
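A back-of-the-envelope illustration of the effect, assuming the predicates are independent (the individual selectivities below are made up):

```python
# Five individually unselective predicates, each keeping a large fraction
# of the rows; ANDed together they can still yield a small result set.
selectivities = [0.5, 0.4, 0.5, 0.4, 0.5]

combined = 1.0
for s in selectivities:
    combined *= s  # under an independence assumption, selectivities multiply

print(f"combined selectivity: {combined:.1%}")  # 2.0% of the table survives
```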
How to monitor predicate pushdown.
To determine whether it will work or not, you have to check the query plan. As usual there are two common ways: either Enterprise Manager:
or explain plan statement:
In both cases you have to check that the predicate exists over the big table that you store in HDFS.
1) Predicate Push Down works perfectly for selective queries (which return only a few rows).
2) Predicate Push Down works a bit slower over unselective columns (when we need to scan each stripe anyway, plus extra overhead for the index lookup). But the difference is so small that we can just neglect it.
3) Column projection always applies, and that's great because it saves a lot of CPU by preventing data type conversion.
4) Joins that need a small piece of data from a big fact table can also be accelerated (if a nested loop is used).