Sunday Mar 06, 2016
By Alexey Filanovskiy-Oracle on Mar 06, 2016
Many customers keep asking me about a "default" (single) compression codec for Hadoop. The answer to this question is not so simple, and let me explain why.
Bzip2 or not Bzip2?
In my previous blog post I published compression-rate results for several compression codecs in Hadoop. Based on those results you may think it’s a good idea to compress everything with bzip2. But be careful with this. In the same research, I noted that bzip2 is on average about 3 times slower than gzip both for querying (decompressing) and for archiving (compressing) data, which is not surprising given the complexity of the algorithm. Are you ready to sacrifice that performance? The answer depends on the compression benefit you get from bzip2 and on how frequently you query this data (compression speed matters less once data is stored in Hadoop, since you usually compress it once and read it many times). On average, bzip2 compresses 1.6 times better than gzip. But, again, my research showed that sometimes you can achieve 2.3 times better compression, while other times you may gain only 9% of disk space (and performance is still much worse than gzip and other codecs). The second factor to keep in mind is how frequently the data is queried and your performance SLAs. If you don’t care about query performance (you have no SLAs) and you select this data very rarely, bzip2 could be a good candidate; otherwise, consider other options. I encourage you to benchmark your own data and decide for yourself: “Bzip2 or not Bzip2”.
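If you want to run this comparison on your own data, one simple approach is to write the same data set out twice from Hive, once per codec, and then compare the on-disk sizes and the elapsed time of the same query against each copy. A minimal sketch, assuming Hive and the standard Hadoop codec classes (the table names and source table are made up):

-- Write one copy compressed with gzip and one with bzip2 (illustrative names)
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
CREATE TABLE sales_gzip STORED AS TEXTFILE AS SELECT * FROM sales_raw;

SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec;
CREATE TABLE sales_bzip2 STORED AS TEXTFILE AS SELECT * FROM sales_raw;

Then compare the directory sizes (for example with hadoop fs -du -s -h) and time an identical query against sales_gzip and sales_bzip2 to judge the trade-off for your workload.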
Thursday Feb 04, 2016
By Alexey Filanovskiy-Oracle on Feb 04, 2016
Text files (csv with “,” delimiter):
[Table: Codec Type | Average rate | Minimum rate | Maximum rate — the measured compression rates were not preserved in this extract]
Thursday Jan 07, 2016
By Alexey Filanovskiy-Oracle on Jan 07, 2016
Today I’m starting the first article in a series devoted to a very important topic in the Hadoop world: loading data into HDFS. Before anything else, let me explain the different approaches to loading and processing data in different IT systems.
Schema on Read vs Schema on Write
When we talk about data loading, we usually load into a system that belongs to one of two types. The first is schema on write. With this approach we have to define columns, data formats and so on up front, and during reading every user will observe the same data set. Since we have already performed ETL (transformed the data into the format most convenient for this particular system), reading is fast and overall system performance is good. But keep in mind that we already paid the penalty for this when we loaded the data. Classic examples of schema-on-write systems are relational databases such as Oracle or MySQL.
Schema on Write
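As a small illustration of schema on write (a hedged sketch; the table and column names are made up), the structure and data types are declared before any data is stored, and a row that does not conform is rejected at write time:

CREATE TABLE sales (order_id NUMBER(10), order_date DATE, amount NUMBER(10,2));
-- This row is validated against the declared schema as it is written
INSERT INTO sales VALUES (1, DATE '2016-01-07', 99.95);
-- This one fails at write time because 'abc' is not a valid NUMBER
INSERT INTO sales VALUES (2, DATE '2016-01-07', 'abc');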
The other approach is schema on read. In this case we load data as-is, without any changes or transformations. We skip the ETL step (we don’t transform the data) and we have no headaches with data formats or data structure; we just put the file on the file system, much like copying photos from a flash card or external storage to your laptop’s disk. How to interpret the data is decided at read time. The interesting part is that the same data (the same files) can be read in different ways. For instance, if you have binary data, you have to define a serialization/deserialization framework and use it in your query to get structured data; otherwise you will just get a set of bytes. As another example, even with the simplest CSV files you can read the same column as a number or as a string, which leads to different results for sorting and comparison operations.
Schema on Read
The Hadoop Distributed File System is a classical example of a schema-on-read system. More details about the schema-on-read and schema-on-write approaches can be found here. Now we are going to talk about loading data into HDFS. I hope that, after the explanation above, you understand that loading data into Hadoop is not the same as ETL (the data is not transformed).
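To make the schema-on-read idea concrete, here is a hedged sketch using Hive external tables (the HDFS path and column names are made up): the same CSV files are exposed twice, once with the amount column typed as a string and once as a number, and no data is moved or transformed.

-- Two definitions over the same raw CSV directory; the schema is applied only at read time
CREATE EXTERNAL TABLE sales_as_text (order_id STRING, amount STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales_csv';

CREATE EXTERNAL TABLE sales_as_numbers (order_id INT, amount DECIMAL(10,2))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales_csv';

-- Sorting by amount now gives different orderings: lexicographic vs. numeric
SELECT * FROM sales_as_text ORDER BY amount;
SELECT * FROM sales_as_numbers ORDER BY amount;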
Tuesday Nov 11, 2014
By nbayliss-Oracle on Nov 11, 2014
Most of you will be familiar with partition pruning, where the Oracle Database will avoid the need to scan table and index partitions based on query predicates. This optimization is transparent to your application, but for it to work, the database has to find a way of mapping a query filter predicate to the partitioning key column (or columns). Partition pruning can only occur if the query has predicates that match the predetermined shape of a partitioned object. For example, a query on a SALES table partitioned by ORDER_DATE will need to include ORDER_DATE in a join or WHERE clause for it to be optimized by partition pruning.
What if you could do better than this? What if you could prune partitions using a variety of column predicates and dimension hierarchies, irrespective of their appearance in the partitioning key? How about pruning at a much finer level of granularity than a partition? Perhaps we want to optimize queries that filter SALES by SHIP_DATE, STATE and COUNTY, as well as ORDER_DATE. The new Oracle Database 12c zone map feature is designed to achieve this, and just like partitioning, zone maps are transparent to your queries; you don’t have to change your applications to make use of them.
Zone maps are available in Oracle Database 12c for Oracle Engineered Systems. Conceptually, they divide a table up into contiguous regions of blocks called zones (the default zone size being 1024 blocks). For each zone, the Oracle database records the minimum and maximum values for specified columns using a new database object called a zone map. Queries that filter on zone map columns have the potential to be optimized; it’s possible to prune zones that contain ranges of column values outside the match specified in the query predicate.
Consider a query that filters a sales table by (North American) state; in this case “CA”. A zone map on the STATE column will record the minimum and maximum values for this column for each zone in the table. This makes it possible to skip the zones that we can be certain won’t contain rows for “CA”.
You are probably aware that Exadata storage cells and the Oracle Database In-Memory Column Store use similar storage index techniques, so what benefits do zone maps add? Besides the fact that you can control zone maps explicitly, the most significant difference between zone maps and storage indexes is that zone maps can be used to prune zones using column predicates from multiple (joined) tables. Consider a more realistic scenario, in which the SALES table doesn’t have a STATE column, but instead has a LOCATION_ID referencing a dimension table called LOCATIONS. This is our query for summing the sales figures in California:
SELECT SUM(amount) FROM sales s, locations l WHERE s.location_id = l.location_id AND l.state = 'CA';
It would be great if we could avoid scanning zones in SALES that don’t contain rows associated with “CA”. Before we look at how we can do this, we’ll make the scenario even more realistic by assuming that LOCATIONS is a dimensional hierarchy of State and County, like this:
Each State is made up of multiple Counties, so “CA” will be associated with multiple LOCATION_ID values. If we want the “CA” rows in SALES, we’ll need to match the ones marked below in bold/red:
If we want to optimize a scan for “CA” rows, we will have to address a few issues:
- The SALES table does not have a STATE column, so no storage index structure on SALES data will allow us to directly prune disk regions based on “CA”.
- Table rows associated with “CA” are likely to be physically scattered throughout the SALES table, so it’s unlikely that these rows will be confined to a relatively small number of zones or disk regions. We might not be able to make efficient use of an Exadata storage index on SALES.LOCATION_ID, if any (note that I am consciously ignoring the push down of BLOOM FILTERs to Exadata here, which still suffers from the physical scattering).
- A SALES storage index based on min/max Location ID is likely to be less efficient than using zones based on min/max State values, simply because each State is made up of multiple Location IDs. This inefficiency is more pronounced if Location IDs for “CA” are not numerically close to one another - it will reduce the chances that the Location IDs we’re searching for will be found within the same min/max Location ID regions.
Of course, zone maps are designed to address these issues - with a little bit of help from another Oracle Database 12c feature called attribute clustering. I introduced attribute clustering in an earlier post, but don’t worry if you haven’t read that yet; I’ll cover the basics here anyway. You’ve probably deduced that we can reduce the number of zones that contain “CA” rows if we cluster or sort the rows in SALES, keeping these rows close to one another, like this:
Attribute clustering is the feature that’s used to cluster the rows together. Zone maps are used to record the min/max values for specified columns for each zone (and this can include column values derived from joins; LOCATIONS.STATE and LOCATIONS.COUNTY in our case).
The following DDL will create a zone map on our SALES fact table using the dimension table columns LOCATIONS.STATE and LOCATIONS.COUNTY. It will also enable attribute clustering, using the same columns to cluster the table’s rows:
ALTER TABLE sales ADD CLUSTERING sales JOIN locations ON (sales.location_id = locations.location_id) BY LINEAR ORDER (locations.state, locations.county) WITH MATERIALIZED ZONEMAP;
The LINEAR ORDER clause specifies a linear clustering algorithm, which is ideal for this example. Another algorithm is available; it is specified with "INTERLEAVED" and is optimized for more complex combinations of query predicates and dimension tables. Note that the definition of attribute clustering by itself does not change any data stored on disk; instead, it provides a directive for direct path operations such as INSERT /*+ APPEND */ and MOVE, which physically perform the clustering for us. If there are pre-existing rows in SALES, we can MOVE the table (or its partitions) to re-order them, as shown below.
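For example, after adding the clustering directive to a table that already contains data, a move operation rewrites the rows in clustered order (the partition name here is purely hypothetical):

-- Re-order existing rows according to the clustering directive, one partition at a time
ALTER TABLE sales MOVE PARTITION sales_q1_2014;
-- Or, for a non-partitioned table, move the whole table
ALTER TABLE sales MOVE;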
Joins between SALES and the dimension table are now candidates for optimization when the query includes predicates on the dimension hierarchy “state” and “state, county”. For example:
SELECT SUM(amount) FROM sales JOIN locations ON (sales.location_id = locations.location_id) WHERE locations.state = 'NM';
SELECT SUM(amount) FROM sales JOIN locations ON (sales.location_id = locations.location_id) WHERE locations.state = 'CA' AND locations.county = 'Kern';
By clustering the rows and recording appropriate min/max column values for our zones, we have addressed all of the issues I identified above. What’s more, we can still get benefit from Exadata storage indexes because zone maps and storage indexes complement one another, and they work together transparently.
Zone maps are explicitly created and controlled by the database administrator on a table-by-table basis. They are an inherent part of the physical database design and can be thought of as a coarse anti-index structure (unlike an index, a zone map tells you what zones not to access). Zone maps are very compact, and in some cases it is possible to use them where you would otherwise use an index. This is most relevant in data warehousing environments where scanning is often more appropriate than indexed row retrieval, and where indexes can use a considerable amount of storage space. Zone maps must be refreshed to be synchronized with the underlying table data, so you will need to give some consideration to how you want them to be kept up-to-date if you decide to use them as an alternative to indexes.
In summary, take a look at zone maps if you want to:
- Optimize scanning queries, particularly when joining with one or more tables.
- Reduce your dependency on indexes, particularly in data warehousing environments.
- Improve performance in your data warehouse; particularly for star or snowflake schemas.
Here’s an example of using zone maps to optimize a table scan. To compare before and after, start by creating a table that has no zone map or attribute clustering:
CREATE TABLE sales_zm (sale_id NUMBER(10), customer_id NUMBER(10));
Insert 8 million rows with the following PL/SQL code. Why that many? With our example, we'll read one or two zones rather than the entire table, so I'm aiming to make the difference pretty obvious when you look at the block read statistics:
DECLARE
  i NUMBER(10);
BEGIN
  FOR i IN 1..80 LOOP
    INSERT INTO sales_zm SELECT ROWNUM, MOD(ROWNUM,1000) FROM dual CONNECT BY LEVEL <= 100000;
    COMMIT;
  END LOOP;
END;
/
EXEC dbms_stats.gather_table_stats(ownname=>NULL, tabname=>'SALES_ZM');
SET AUTOTRACE ON STATISTICS
Run the following query a few times to see what value “consistent gets” settles at:
SELECT COUNT(DISTINCT sale_id) FROM sales_zm WHERE customer_id = 50;
On my machine, I read 7,545 blocks from the buffer cache, but since the value depends on some storage defaults, don’t be surprised if your value is different:
The following DDL will create a zone map, but since attribute clustering is a property of the table (like compression), any existing rows will not be re-ordered:
ALTER TABLE sales_zm ADD CLUSTERING BY LINEAR ORDER (customer_id) WITH MATERIALIZED ZONEMAP;
The zone map will not be efficient until we cluster the rows together, so we’ll MOVE the table to achieve this. This will refresh the zone map too:
ALTER TABLE sales_zm MOVE;
Run the same query a few times to see what value “consistent gets” settles at:
SELECT COUNT(DISTINCT sale_id) FROM sales_zm WHERE customer_id = 50;
On my database, I read around 1,051 database blocks instead of 7,545: a considerable improvement:
You'll find more examples covering zone maps and attribute clustering in the Oracle Learning Library and inside the Oracle Github repository. Full details on zone maps and attribute clustering can be found in the Oracle documentation library; particularly the Oracle 12c Database Data Warehousing Guide.
There's an earlier post on attribute clustering if you haven't read it already.
If there's anything you need to ask, or if you can't find what you need regarding zone maps or attribute clustering, please let me know by leaving a comment below. Thanks!
Wednesday May 25, 2011
By Rekha Balwada on May 25, 2011
Over the last couple of years there has been a lot of debate about space management in data warehousing environments and the benefits of having fewer, larger extents. Many believe the easiest way to achieve this is to use uniform extents, but the obvious benefits can often be outweighed by some not-so-obvious drawbacks.
For performance reasons most loads leverage direct path operations, where the load process formats and writes Oracle blocks directly to disk instead of going through the buffer cache. This means that the loading process allocates extents and fills them with data during the load. During parallel loads, each loader process allocates its own extent and no two processes work on the same extent. When loading data into a table with UNIFORM extents, each loader process allocates its own uniform extent and begins loading the data; if the extents are not fully populated, the table is left with a lot of partially filled extents, effectively creating ‘holes’ in the table. Not only does this waste space, it also impacts query performance, as subsequent queries that scan the table have to scan all of the extents even if they are not fully filled.
What is different with AUTOALLOCATE? AUTOALLOCATE will dynamically adjust the size of an extent and trim the extent after the load if it is not fully loaded (extent trimming).
Tom Kyte covers this problem in great detail in his post Loading and Extents, but below is a simple example just to illustrate what a huge difference there can be in space management when you load into a table with uniform extents versus a table with autoallocated extents.
1) Create two tablespaces: Test_Uniform (using uniform extent management), Test_Allocate (using auto allocate)
create tablespace test_uniform datafile '+DATA/uniform.dbf' SIZE 1048640K
AUTOEXTEND ON NEXT 100M
EXTENT management local uniform size 100m;
create tablespace test_allocate datafile '+DATA/allocate2.dbf' SIZE 1048640K
AUTOEXTEND ON NEXT 100M
EXTENT management local autoallocate segment space management auto;
2) Create a flat file with 10,000,000 records.
-rw-r--r-- 1 oracle dba 1077320689 May 17 16:59 TEST.dat
3) Do a parallel direct path load of this file into each tablespace:
create table UNIFORM_TEST parallel
tablespace test_uniform as select * from big_table_ext;

create table AUTOALLOCATE_TEST parallel
tablespace test_allocate as select * from big_table_ext;
Let's view the space usage using a PL/SQL package called show_space.
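If you don't have the show_space package installed, a rough comparison of allocated space can be made directly from the data dictionary. This is only a minimal sketch; it reports allocated extents and megabytes rather than the formatted/unformatted block detail that show_space provides:

-- Compare allocated space and extent counts for the two test tables
SELECT segment_name,
       COUNT(*) AS extents,
       SUM(bytes)/1024/1024 AS allocated_mb
FROM   user_extents
WHERE  segment_name IN ('UNIFORM_TEST', 'AUTOALLOCATE_TEST')
GROUP  BY segment_name;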
As you can see from the results, there are no unformatted blocks in the autoallocate table, as extent trimming has taken place after the load completed. The same cannot be said for the uniform_test table: it is quite evident from the numbers that there is substantial space wastage. Although the count of full blocks is the same in both cases, the total MBytes used is 10x greater in the uniform table.
Conclusion: space utilization is much better with autoallocate because of extent trimming. As I said before, more information on this topic can be found in Tom Kyte's post.
Sunday Sep 27, 2009
By Maria Colgan-Oracle on Sep 27, 2009
As promised, the next entry in our 11gR2 explorations is In-Memory Parallel Execution. If you are going to Oracle OpenWorld next month make sure you check out the following session:
Tuesday, October 13 2009 5:30PM, Moscone South Room 308
Extreme Performance with Oracle Database 11g and In-Memory Parallel Execution.
In this session you will get more details and insight from the folks who actually built this functionality! A must see if this is of any interest, so book that ticket now and register!
Down to business, what is "In-Memory Parallel Execution"?
Let's begin by taking a quick trip down memory lane, back to Oracle Database 7 when Parallel Execution (PX) was first introduced. The goal of PX, then and now, is to reduce the time it takes to complete a complex SQL statement by using multiple processes to go after the necessary data instead of just one. Up until now these parallel server processes typically bypassed the buffer cache and read the necessary data directly from disk. The main reasoning for this was that the objects accessed by PX were large and would not fit into the buffer cache; any attempt to read these large objects into the cache would have resulted in trashing the cache content.
However, as hardware systems have evolved, the memory capacity on a typical database server has become extremely large. Take for example the 2 CPU socket Sun server used in the new Sun Oracle Database Machine. It has an impressive 72GB of memory, giving a full Database Machine (8 database nodes) over half a TB of memory. Suddenly, using the buffer cache to hold large objects doesn't seem so impossible any more.
In-Memory Parallel Execution (In-Memory PX) takes advantage of these larger buffer caches but it also ensures we don't trash the cache.
In-Memory PX begins by determining whether the working set (the group of database blocks) necessary for a query fits into the aggregated buffer cache of the system. If the working set does not fit, the objects will be accessed via direct path IO just as they were before. If the working set does fit into the aggregated buffer cache, the blocks will be distributed among the nodes and affinitized, or associated, with those nodes.
In previous releases, if the Parallel Execution of one statement read part of an object into the buffer cache, then subsequent SQL statements on other nodes in the cluster would access that data via Cache Fusion. This behavior could eventually result in a full copy of that table in every buffer cache in the cluster. In-Memory PX is notably different because Cache Fusion will not be used to copy the data from its original node to another node, even if a parallel SQL statement that requires this data is issued from another node. Instead, Oracle uses the parallel server processes on the same node that the data resides on to access the data and returns only the result to the node where the statement was issued.
The decision to use the aggregated buffer cache is based on an advanced set of heuristics that includes the size of the object, the frequency at which the object changes and is accessed, and the size of the aggregated buffer cache. If the object meets these criteria it will be fragmented, or broken up into pieces, and each fragment will be mapped to a specific node. If the object is hash partitioned then each partition becomes a fragment; otherwise the mapping is based on the FileNumber and ExtentNumber.
To leverage In-Memory PX you must set the initialization parameter PARALLEL_DEGREE_POLICY to AUTO (the default is MANUAL). Once this is set, the database controls which objects are eligible to be read into the buffer cache and which objects will reside there at any point in time. It is not possible to manually control this behavior for specific statements.
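For example (a minimal sketch; try it at the session level first if you only want to experiment rather than change behavior system-wide):

-- Enables Auto DOP, parallel statement queuing and In-Memory PX for this session
ALTER SESSION SET parallel_degree_policy = AUTO;
-- Or system-wide:
ALTER SYSTEM SET parallel_degree_policy = AUTO;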
Obviously this post is more of a teaser; for in-depth discussions, go to OpenWorld and/or keep an eye out for a new white paper called Parallel Execution Fundamentals in Oracle Database 11gR2 that will be coming soon to oracle.com. This paper covers not only In-Memory PX but also Auto DOP and parallel statement queuing.
Stay tuned for more on 11gR2 coming soon...
Wednesday Mar 04, 2009
By Maria Colgan-Oracle on Mar 04, 2009
Knowing when and how to gather optimizer statistics has become somewhat of a dark art, especially in a data warehouse environment where statistics maintenance can be hindered by the fact that, as the data set grows, the time it takes to gather statistics also grows. By default the DBMS_STATS package gathers global (table level), partition level, and sub-partition level statistics for each of the tables in the database. The only exception to this is if you have hash sub-partitions. Hash sub-partitions do not need statistics, as the optimizer can accurately derive any necessary statistics from the partition level statistics, because the hash sub-partitions are all approximately the same size due to the linear hashing algorithm.
As mentioned above, the time it takes to gather statistics grows proportionally with your data set, so you may now be wondering whether the optimizer truly needs statistics at every level for a partitioned table, or whether time could be saved by skipping one or more levels. The short answer is "no", as the optimizer will use statistics from one or more of the levels in different situations.
The optimizer will use global or table level statistics if one or more of your queries touches two or more partitions.
The optimizer will use partition level statistics if your queries do partition elimination, such that only one partition is necessary to answer each query. If your queries touch two or more partitions the optimizer will use a combination of global and partition level statistics.
The optimizer will use sub-partition level statistics if your queries do partition elimination, such that only one sub-partition is necessary. If your queries touch two or more sub-partitions the optimizer will use a combination of sub-partition and partition level statistics.
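As a concrete (purely illustrative) example using the familiar SH.SALES sample table, which is range partitioned on TIME_ID: the first query below prunes to a single partition, so the optimizer can rely on that partition's statistics, while the second spans many partitions and therefore relies on the global statistics.

-- Prunes to one partition: partition level statistics are used
SELECT SUM(amount_sold) FROM sh.sales WHERE time_id = DATE '2008-05-23';
-- Touches many partitions: global (table level) statistics are used
SELECT SUM(amount_sold) FROM sh.sales WHERE promo_id = 999;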
How to gather statistics?
Global statistics are by far the most important statistics, but they also take the longest time to collect because a full table scan is required. However, in Oracle Database 11g this issue has been addressed with the introduction of incremental global statistics. Typically with partitioned tables, new partitions are added and data is loaded into these new partitions. After a partition is fully loaded, partition level statistics need to be gathered and the global statistics need to be updated to reflect the new data. If the INCREMENTAL preference for the partitioned table is set to TRUE, and the DBMS_STATS GRANULARITY parameter is set to AUTO, Oracle will gather statistics on the new partition and update the global table statistics by scanning only those partitions that have been modified, rather than the entire table. Below are the steps necessary to use incremental global statistics:
SQL> exec dbms_stats.set_table_prefs('SH', 'SALES', 'INCREMENTAL', 'TRUE');
SQL> exec dbms_stats.gather_table_stats(ownname=>'SH', tabname=>'SALES', partname=>'23_MAY_2008', granularity=>'AUTO');
Incremental global statistics work by storing a synopsis for each partition in the table. A synopsis is statistical metadata for that partition and the columns in the partition. Each synopsis is stored in the SYSAUX tablespace and takes approximately 10KB. Global statistics are generated by aggregating the synopses from each partition, thus eliminating the need for the full table scan (see the figure below). When a new partition is added to the table you only need to gather statistics for the new partition; the global statistics will be automatically updated by aggregating the new partition's synopsis with the existing partitions' synopses.
But what if you are not using Oracle Database 11g and you can't afford to gather partition level statistics (not to mention global statistics) after data is loaded? In Oracle Database 10g (10.2.0.4) you can use the DBMS_STATS.COPY_TABLE_STATS procedure. This procedure enables you to copy statistics from an existing [sub]partition to a new [sub]partition and will adjust the statistics to account for the additional partition of data (for example, the number of blocks and the number of rows). For a range partitioned table it sets the new partition's high bound partitioning value as the maximum value of the first partitioning column, and the high bound partitioning value of the previous partition as the minimum value of the first partitioning column. For a list partitioned table it will find the maximum and minimum from the list of values.
SQL> exec dbms_stats.copy_table_stats('sh', 'sales', 'sales_q3_2000', 'sales_q4_2000', force=>TRUE);
When should you gather Statistics?
If you use the automatic stats job or dbms_stats.gather_schema_stats with the option 'GATHER AUTO', Oracle only collects statistics at the global level if the table has changed by more than 10% or if the global statistics have not yet been collected. Partition level statistics will always be gathered if they are missing. For most tables this frequency is fine.
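For example (a minimal sketch; the schema name is illustrative), the stale-only behavior described above corresponds to:

-- Gather statistics only for objects whose statistics are missing or stale (more than 10% changed)
SQL> exec dbms_stats.gather_schema_stats(ownname=>'SH', options=>'GATHER AUTO');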
However, in a data warehouse environment there is one scenario where this is not the case. If new partitions are constantly added to a partitioned table, data is loaded into the new partition, and users instantly begin querying the new data, then it is possible to get a situation where an end-user's query supplies a value in one of the where clause predicates that is outside the [min,max] range for the column according to the optimizer statistics. For predicate values outside the statistics [min,max] range the optimizer prorates the selectivity for that predicate based on the distance between the value and the max (assuming the value is higher than the max). This means that the farther the value is from the maximum value, the lower the selectivity will be, which may result in sub-optimal execution plans.
You can avoid this "Out of Range" situation by using the new incremental Global Statistics or the copy table statistics procedure.