Thursday Jul 09, 2015

Parallel_Degree_Limit, Parallel_Max_Degree, Maximum DOP? Confused?

Here are two questions for you. What is the maximum DOP you can get with Auto DOP in a 12c database? Where can you see what that number is without generating execution plans? The answers to these questions can get tricky if you start to play with init.ora parameters.

What is the maximum DOP you can get with Auto DOP in a 12c database? 

The answer to this question may seem obvious if you look at the documentation: the init.ora parameter PARALLEL_DEGREE_LIMIT determines the maximum DOP. So you can set it to any integer value you want and that will be your maximum. That is it, or is it? What happens if you set it to a really high value? What is missing from the documentation is that the maximum DOP you can get will never exceed the default DOP, which is CPU_COUNT * PARALLEL_THREADS_PER_CPU. No matter what number you set for PARALLEL_DEGREE_LIMIT, this internally enforced limit stays the same; any value higher than it is ignored. The same applies to the Database Resource Manager (DBRM): the parameter and the resource manager directives are obeyed only as long as they are lower than the internal limit.

Here is how the maximum DOP is derived for a session. 

Max DOP = MIN(CPU_COUNT * PARALLEL_THREADS_PER_CPU, PARALLEL_DEGREE_LIMIT, DBRM directive PARALLEL_DEGREE_LIMIT_P1)

Here is a sample test case to show the behavior in 12.1.0.2. 

SQL> create table t as select * from all_objects;

Table created.

SQL> exec dbms_stats.set_table_stats(user,'T',numrows=>1000000000000,numblks=>1000000000);

PL/SQL procedure successfully completed.

I created an empty table and set the statistics so that the optimizer thinks it is a very large table. This is to make sure that Auto DOP will calculate a super high DOP. Here are the related parameters:

SQL> select name, value from v$parameter
  2  where name in ('parallel_degree_limit','cpu_count','parallel_threads_per_cpu');

NAME                           VALUE
------------------------------ --------------------------------------------------------------------------------
cpu_count                      2
parallel_threads_per_cpu       2
parallel_degree_limit          128 

Let's look at a statement accessing the large table.
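Here is a sketch of the kind of plan output you would see, assuming Auto DOP is active (PARALLEL_DEGREE_POLICY set to AUTO or LIMITED); the plan body is trimmed and the exact note wording can vary slightly between versions:

SQL> explain plan for select count(*) from t;

Explained.

SQL> select * from table(dbms_xplan.display);

...
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 4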

The DOP is capped at 4 even though PARALLEL_DEGREE_LIMIT is 128, because CPU_COUNT * PARALLEL_THREADS_PER_CPU is 4.

Where can you see the maximum DOP allowed without generating execution plans?

As you can see above, an execution plan shows the computed DOP in the note section under the plan. But that is the computed DOP; if the statement does not hit any limits, you will not be able to find out what the maximum DOP can be. So where can we see what our database allows as the maximum DOP?

The view V$SES_OPTIMIZER_ENV can give a clue about this. This view shows the optimizer settings for all sessions in an instance. It also includes settings related to parallel execution. Of particular interest to this discussion are two of them: PARALLEL_DEGREE_LIMIT and PARALLEL_MAX_DEGREE. Let's see what they show in my test environment. Here are the init.ora parameters.

SQL> select name, value from v$parameter
  2  where name in ('parallel_degree_limit','cpu_count','parallel_threads_per_cpu');

NAME                           VALUE
------------------------------ --------------------------------------------------------------------------------
cpu_count                      2
parallel_threads_per_cpu       2
parallel_degree_limit          CPU 

Here is what the view shows with these parameters.

SQL> select name, value from v$ses_optimizer_env
  2  where name in ('parallel_degree_limit','parallel_max_degree')
  3  and sid=(select sid from v$mystat where rownum=1);

NAME                           VALUE
------------------------------ --------------------------------------------------------------------------------
parallel_degree_limit          65535
parallel_max_degree            4 

The first thing to notice is that parallel_degree_limit in this view is not the same thing as the init.ora parameter PARALLEL_DEGREE_LIMIT. Since I did not explicitly set the init.ora parameter to an integer value, the view shows a large number (64K-1); the actual limit is shown as parallel_max_degree. This tells me that on this system I cannot get a DOP higher than 4 with Auto DOP.

If you set the init.ora parameter to an integer value this is what the view shows. 

SQL> alter system set parallel_degree_limit=128;

System altered. 

SQL> select name, value from v$ses_optimizer_env
  2  where name in ('parallel_degree_limit','parallel_max_degree')
  3  and sid=(select sid from v$mystat where rownum=1);

NAME                           VALUE
------------------------------ --------------------------------------------------------------------------------
parallel_degree_limit          128
parallel_max_degree            4 

Now parallel_degree_limit shows the actual init.ora parameter. The rule still applies: the maximum DOP I can get is parallel_max_degree, which is 4 as shown in the execution plan above. If you set the init.ora parameter to a value lower than CPU_COUNT * PARALLEL_THREADS_PER_CPU, you will see that parallel_max_degree reflects that. This is also true for DBRM.

So, parallel_max_degree is calculated as: MIN(CPU_COUNT * PARALLEL_THREADS_PER_CPU, PARALLEL_DEGREE_LIMIT, DBRM directive PARALLEL_DEGREE_LIMIT_P1).

The only caveat with parallel_max_degree is that it can only show values up to 4095 (4K-1); above that it wraps around and starts over from 0. If you bump up one of the variables in the formula, like CPU_COUNT, you will see cases like the one below.

SQL> select name, value from v$parameter
  2  where name in ('parallel_degree_limit','cpu_count','parallel_threads_per_cpu');

NAME                           VALUE
------------------------------ ----------
cpu_count                      2048
parallel_threads_per_cpu       2
parallel_degree_limit          CPU 

SQL> select name, value from v$ses_optimizer_env
  2  where name in ('parallel_degree_limit','parallel_max_degree')
  3  and sid=(select sid from v$mystat where rownum=1);

NAME                           VALUE
------------------------------ --------------------------------------------------------------------------------
parallel_degree_limit          65535
parallel_max_degree            0 

In these cases the maximum DOP you can get will be different from parallel_max_degree.

So, you can rely on this value as long as your CPU_COUNT is not in the two-thousands range! If you have such a huge system, use the formula given above to calculate the maximum DOP allowed. And please let me know how you are using parallel execution in such an environment :)
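As a sketch, you can compute the formula straight from v$parameter. This assumes PARALLEL_DEGREE_LIMIT is either CPU or an integer value; if a DBRM directive is in place you still need to apply it on top:

SQL> select least(
  2           c.value * t.value,
  3           case p.value when 'CPU' then c.value * t.value
  4                        else to_number(p.value) end) as max_auto_dop
  5  from (select to_number(value) as value from v$parameter where name = 'cpu_count') c,
  6       (select to_number(value) as value from v$parameter where name = 'parallel_threads_per_cpu') t,
  7       (select value from v$parameter where name = 'parallel_degree_limit') p;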

Remember that these are only the maximum values the optimizer can calculate; at runtime you are still bounded by PARALLEL_MAX_SERVERS, and your statement will get downgraded if that limit is reached.
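To check that ceiling and the current parallel server activity, a quick query (a sketch; the statistic names in V$PX_PROCESS_SYSSTAT can vary slightly between versions):

SQL> select name, value from v$parameter
  2  where name = 'parallel_max_servers';

SQL> select statistic, value from v$px_process_sysstat
  2  where statistic like 'Servers%';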

Tuesday Jul 07, 2015

Update to BDA X5-2 provides more flexibility and capacity with no price changes

As more people pick up big data technologies, we see the workloads run on these big data systems evolve and diversify. The initial workloads were all MapReduce and fit a specific (micro-)batch workload pattern. Over the past couple of years that has changed, and that change is reflected in the Hadoop tools - specifically with YARN. While there is still quite a bit of batch work being done, typically using MapReduce (think Hive, Pig, etc.), we are seeing our customers move to more mixed workloads where the batch work is augmented with both more online SQL and more streaming workloads.

More Horsepower - More Capacity 

The change towards a more mixed workload leads us to change the shape of the underlying hardware. Systems now shift away from the once sacred "1-core to 1-disk ratio" and also from the small memory footprints for the worker nodes.

With the BDA X5-2 update in December 2014, BDA doubled the base memory configuration and added 2.25x more CPU resources in every node, as well as upgrading to Intel's fastest Xeon E5 CPU. BDA X5-2 now has 2 * 18 Xeon cores per node to enable CPU-intensive workloads like analytic SQL queries using Oracle Big Data SQL, machine learning, graph applications, etc.

With the processing covered for these more mixed workloads, we looked at other emerging trends or workloads and their impact on the BDA X5-2 hardware. The most prominent trend we see in big data is the large data volumes we expect from the Internet of Things (IoT) explosion and the potential cost associated with storing that data.

To address this issue (and storage cost in general) we are now adding 2x more disk space onto each and every BDA disk, doubling the total available space on the system while keeping the list price constant. That is correct: 2x capacity but no price change!

More Flexibility

And if that isn't enough, we are also changing the way our customers can grow their systems by introducing BDA Elastic Configurations.

As we see customers build out production in large increments, we also see a need to be more flexible in expanding the non-production environments (test, QA and performance environments). BDA X5-2 Elastic Configurations enable expansion of a system in 1-node increments by adding a BDA X5-2 High Capacity (HC) node plus InfiniBand Infrastructure into a 6-node Starter Rack.

The increased flexibility enables our customers to start with a production-scale cluster of 6 nodes (X5-2 or older) and then increment within the base rack up to 18 nodes, then expand across racks without any additional switching (no top-of-rack switch required, all on the same InfiniBand network) to build large(r) clusters. The expansion is of course fully supported by the Oracle Mammoth configuration utility and its CLI, greatly simplifying expansion of clusters.

Major Improvement, No Additional Cost

Over the past generations BDA has quickly adapted to changing usage and workload patterns, enabling the adoption of Hadoop into the data ecosystem with minimal infrastructure disruption but with maximum business benefits. The latest update to BDA X5-2 adds flexibility, delivers more storage capacity and runs more workloads than ever before.

For more information see the BDA X5-2 Data Sheet on OTN

Thursday Jul 02, 2015

Using Oracle Big Data Spatial and Graph

Wondering how to get started with graph analyses?  The latest Oracle Big Data Lite VM includes Oracle's new spatial and graph toolkit for big data.  Check out these two blog posts that describe how to find interesting relationships in data:

Pretty cool :)

Saturday Jun 20, 2015

Oracle Big Data Spatial and Graph - Installing the Image Processing Framework

Oracle Big Data Lite 4.2 was just released - and one of the cool new features is Oracle Spatial and Graph.  In order to use this new feature, there is one more configuration step required.  Normally, we include everything you need in the VM - but this is a component that we couldn't distribute.

For the Big Data Spatial Image Processing Framework, you will need to install and configure Proj.4 - Cartographic Projections Library.  Simply follow these steps: 

  • Start the Big Data Lite VM and log in as user "oracle"
  • Launch Firefox and download this tarball (http://download.osgeo.org/proj/proj-4.9.1.tar.gz) to ~/Downloads
  • Run the following commands at the Linux prompt:
    • cd ~/Downloads
    • tar -xvf proj-4.9.1.tar.gz
    • cd proj-4.9.1
    • ./configure
    • make
    • sudo make install

This will create the libproj.so file in the directory /usr/local/lib/.  Now that the file has been created, create links to it in the appropriate directories.  At the Linux prompt:

  • sudo ln -s /usr/local/lib/libproj.so /u02/oracle-spatial-graph/shareddir/spatial/demo/imageserver/native/libproj.so
  • sudo ln -s /usr/local/lib/libproj.so /usr/lib/hadoop/lib/native/libproj.so

That's all there is to it.  Big Data Lite is now ready for Oracle Big Data Spatial and Graph!

Oracle Big Data Lite 4.2 Now Available!

Oracle Big Data Lite Virtual Machine 4.2 is now available on OTN.  For those of you who are new to the VM - it is a great way to get started with Oracle's big data platform.  It has a ton of products installed and configured - including:

  • Oracle Enterprise Linux 6.6
  • Oracle Database 12c Release 1 Enterprise Edition (12.1.0.2) - including Oracle Big Data SQL-enabled external tables, Oracle Multitenant, Oracle Advanced Analytics, Oracle OLAP, Oracle Partitioning, Oracle Spatial and Graph, and more.
  • Cloudera Distribution including Apache Hadoop (CDH5.4.0)
  • Cloudera Manager (5.4.0)
  • Oracle Big Data Connectors 4.2
    • Oracle SQL Connector for HDFS 3.3.0
    • Oracle Loader for Hadoop 3.4.0
    • Oracle Data Integrator 12c
    • Oracle R Advanced Analytics for Hadoop 2.5.0
    • Oracle XQuery for Hadoop 4.2.0
  • Oracle NoSQL Database Enterprise Edition 12cR1 (3.3.4)
  • Oracle Big Data Spatial and Graph 1.0
  • Oracle JDeveloper 12c (12.1.3)
  • Oracle SQL Developer and Data Modeler 4.1
  • Oracle Data Integrator 12cR1 (12.1.3)
  • Oracle GoldenGate 12c
  • Oracle R Distribution 3.1.1
  • Oracle Perfect Balance 2.4.0
  • Oracle CopyToBDA 2.0

Check out our new product - Oracle Big Data Spatial and Graph (and don't forget to read the blog post on a small config update you'll need to make to use it).  It's a great way to find relationships in data and query and visualize geographic data.  Speaking of analysis... Oracle R Advanced Analytics for Hadoop now leverages Spark for many of its algorithms for (way) faster processing.

But that's just a couple of features... download the VM and check it out for yourself :)

Tuesday May 26, 2015

Space Management and Oracle Direct Path Load

If you're loading your data warehouse using Oracle direct path load, you might want to check out a post on the Optimizer blog. It covers how the Oracle database manages space during a direct path load and what we've changed in Oracle Database 12c.

Wednesday May 20, 2015

PX In Memory, PX In Memory IMC?

In my previous post I talked about how to use SQL Monitor to monitor parallel statements. The part of that post which talks about In-Memory Parallel Execution triggered some questions and I want to provide more information on that in a separate post here.

In-Memory Database? In-Memory Parallel Execution? In-Memory Everywhere

Back in the 11.2 days, when there was no Oracle Database In-Memory Option (DBIM), there was no confusion about in-memory because there was only one way to use the database memory for parallel execution, and that was In-Memory Parallel Execution (IMPX, also known as IMPQ). Starting with 12.1.0.2, DBIM boosted the performance, and also the confusion about the term in-memory parallel execution.

I will not go into the details of how these two features work in this post but let us look at the plan note from the previous post and try to understand what it shows about these.

As we can see, there are two notes about in-memory: Px In Memory Imc and Px In Memory. Let us look at what these two mean.

Px In Memory

This plan note is only related to IMPX, not to DBIM. It shows whether the statement uses IMPX or not. If you see yes here, it means some or all of the tables in the query are accessed through the buffer cache rather than via direct reads from disk. IMPX looks at the table sizes and the buffer cache size and decides whether to read the blocks into the buffer cache or not. Once the blocks are in the buffer cache, subsequent parallel statements can read the data from there rather than going to disk. In the case of RAC, IMPX makes sure that PX servers are allocated on the nodes where the table blocks reside; as a result, Cache Fusion is not used to transfer blocks between nodes, which in turn improves performance.

This plan note in SQL Monitor maps to the note parallel scans affinitized for buffer cache in the DBMS_XPLAN output. Here is an example showing IMPX will be used.
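The note appears at the end of the DBMS_XPLAN output; here is a trimmed sketch (the plan lines are illustrative, the note text is what to look for):

SQL> select * from table(dbms_xplan.display_cursor);

...
|   PX BLOCK ITERATOR          |           |
|    TABLE ACCESS FULL         | CUSTOMERS |
...
Note
-----
   - parallel scans affinitized for buffer cache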

The first access to the CUSTOMERS table will read the data from the disk to the buffer cache, subsequent accesses to this table will read data from the buffer cache. The term affinitized here means that specific blocks are assigned to specific nodes and PX servers on those nodes will read the data through the buffer cache.

You will see this note whenever IMPX is used, independent of whether you are on a single-instance database or a RAC database.

So, if you see the note parallel scans affinitized for buffer cache in the DBMS_XPLAN output you will see Px In Memory=yes in SQL Monitor.

Px In Memory Imc

This plan note is only related to DBIM, not to IMPX. IMC here means In-Memory Columnar, which indicates DBIM. The IMC store provides metadata that shows which part of an in-memory table is in which node's memory in a RAC database. Parallel Execution uses this information to allocate PX servers on those nodes so that they can read the data from the IMC store without going to disk. The plan note Px In Memory Imc shows whether the statement accesses an in-memory table and whether PX servers are affinitized to RAC nodes. It does not indicate whether DBIM is used; that is indicated by the operations in the plan, like TABLE ACCESS INMEMORY FULL.

This note maps to the note parallel scans affinitized for inmemory in the DBMS_XPLAN output. Here is an example showing in-memory affinity is used.
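Again a trimmed sketch of the relevant DBMS_XPLAN output (plan lines are illustrative):

SQL> select * from table(dbms_xplan.display_cursor);

...
|   PX BLOCK ITERATOR          |           |
|    TABLE ACCESS INMEMORY FULL| CUSTOMERS |
...
Note
-----
   - parallel scans affinitized for inmemory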

In a single-instance database you will not see this note because any PX server can read the data from the single node's IMC store and there is no concept of affinity of PX servers to nodes.

So, if you see the note parallel scans affinitized for inmemory in the DBMS_XPLAN output you will see Px In Memory Imc=yes in SQL Monitor.

One or The Other?

It is possible to have parallel statements using both IMPX and DBIM. If your statement accesses both in-memory tables and tables determined to be read via the buffer cache you can see both notes in the plan output.

In this example, the CUSTOMERS table is an in-memory table, and since this is a RAC database the PX servers are affinitized to nodes. The CUSTOMERSPART table is not an in-memory table, but Parallel Execution decided to read it through the buffer cache.

There is a lot of information about how DBIM works on the In-Memory blog; I will talk about how IMPX works in a future post. In the meantime, please comment here if you have any questions related to it.

Friday May 15, 2015

Big Data Spatial and Graph is now released!

Cross-posting this from the announcement of the new spatial and graph capabilities. You can get more detail on OTN.

The product objective is to provide spatial and graph capabilities that are best suited to the use cases, data sets, and workloads found in big data environments.  Oracle Big Data Spatial and Graph can be deployed on Oracle Big Data Appliance, as well as other supported Hadoop and NoSQL systems on commodity hardware. 

Here are some feature highlights.   

Oracle Big Data Spatial and Graph includes two main components:

  • A distributed property graph database with 35 built-in graph analytics to
    • discover graph patterns in big data, such as communities and influencers within a social graph
    • generate recommendations based on interests, profiles, and past behaviors
  • A wide range of spatial analysis functions and services to
    • evaluate data based on how near or far things are from one another, or whether something falls within a boundary or region
    • process and visualize geospatial map data and imagery

Property Graph Data Management and Analysis

The property graph feature of Oracle Big Data Spatial and Graph facilitates big data discovery and dynamic schema evolution with real-world modeling and proven in-memory parallel analytics. Property graphs are commonly used to model and analyze relationships, such as communities, influencers and recommendations, and other patterns found in social networks, cyber security, utilities and telecommunications, life sciences and clinical data, and knowledge networks.  


Property graphs model the real world as networks of linked data comprising vertices (entities), edges (relationships), and properties (attributes) for both. Property graphs are flexible and easy to evolve; metadata is stored as part of the graph, and new relationships are added by simply adding an edge. Graphs support sparse data; properties can be added to a vertex or edge but need not be applied to all similar vertices and edges.  Standard property graph analysis enables discovery with analytics that include ranking, centrality, recommender, community detection, and path finding.

Oracle Big Data Spatial and Graph provides an industry-leading property graph capability on Apache HBase and Oracle NoSQL Database with a Groovy-based console; parallel bulk load from common graph file formats; text indexing and search; querying graphs in database and in memory; ease of development with open source Java APIs and popular scripting languages; and an in-memory, parallel, multi-user graph analytics engine with 35 standard graph analytics.

Spatial Analysis and Services Enrich and Categorize Your Big Data with Location

With the spatial capabilities, users can take data with any location information, enrich it, and use it to harmonize their data.  For example, Big Data Spatial and Graph can look at datasets like Twitter feeds that include a zip code or street address, and add or update city, state, and country information.  It can also filter or group results based on spatial relationships:  for example, filtering customer data from logfiles based on how near one customer is to another, or finding how many customers are in each sales territory.  These results can be visualized on a map with the included HTML5-based web mapping tool.  Location can be used as a universal key across disparate data commonly found in Hadoop-based analytic solutions. 

Also, users can perform large-scale operations for data cleansing, preparation, and processing of imagery, sensor data, and raw input data with the raster services.  Users can load raster data on HDFS using dozens of supported file formats, perform analysis such as mosaic and subset, write and carry out other analysis operations, visualize data, and manage workflows.  Hadoop environments are ideally suited to storing and processing these high data volumes quickly, in parallel across MapReduce nodes.  

Learn more about Oracle Big Data Spatial and Graph at the OTN product website:

Read the Data Sheet

Read the Spatial Feature Overview

Thursday May 07, 2015

Monitoring Parallel Execution using Real-Time SQL Monitoring in Oracle Database 12c

The Real-Time SQL Monitoring feature (SQL Monitor) was introduced in 11g to make it easier to monitor the execution of long-running SQL statements. As of 12c this tool is part of both Oracle Enterprise Manager Cloud Control and Oracle Enterprise Manager Database Express.

SQL Monitor is especially helpful in monitoring parallel statements as these statements access, join, aggregate and analyze large amounts of data. In this post we talk about how to use SQL Monitor to monitor parallel statements. It requires a basic understanding of how parallel execution (PX) works; for an introduction to Oracle Parallel Execution, please see the white paper Oracle Parallel Execution Fundamentals. The examples used here were taken from Oracle Enterprise Manager Database Express on a 12.1.0.2 database. You can click on the images to see the full size versions of the screenshots.

General View of Monitored Statements

The main page of SQL Monitor shows all monitored statements, their statuses, durations, degree of parallelism (DOP), and the database time they consumed.

For Real Application Clusters (RAC) databases this page also shows the number of instances used for each statement.

The Status column indicates whether a statement is running, finished, or queued. The clock icon means the statement is queued, the throbber icon means the statement is running, and the checkmark icon means the statement is completed. The cross icon means the statement was cancelled, killed, or got an error. You can see the actual error message when you hover over that icon.

The Parallel column shows the actual degree of parallelism (DOP) for the statement. A down arrow icon indicates that the DOP was downgraded. When you hover over it you can see the downgrade percentage, the number of parallel execution (PX) servers requested, and the number of PX servers allocated.

SQL Execution Details

To look at the details of a statement, click on the SQL_ID (found in the ID column) of that statement. This takes you to the SQL Monitor report for that specific statement. The SQL Monitor report shows general information like SQL text, elapsed time, DOP, time and wait statistics, as well as detailed execution plan statistics. This helps us to find out where the execution time was spent and focus on those areas for performance tuning.
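If you prefer SQL*Plus over Enterprise Manager, the same report can be generated with DBMS_SQLTUNE.REPORT_SQL_MONITOR. A minimal sketch (the SQL_ID is a placeholder):

SQL> set long 10000000 longchunksize 10000000 linesize 250 pagesize 0
SQL> select dbms_sqltune.report_sql_monitor(
  2           sql_id => '5v81gkb9p4aaj',  -- placeholder: use your statement's SQL_ID
  3           type   => 'TEXT') as report
  4  from dual;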

For statements that are still running, a green arrow icon indicates the steps that are currently being executed. The page refreshes every 15 seconds so that you can monitor the progress of those statements. You can change the refresh interval at the top right of the screen, or you can use the refresh button there to manually refresh the page.

For completed statements the report shows the final statistics for that execution of the statement.

Let's walk through a sample SQL Monitor report to find out what information it provides about parallel execution. The report used in this example can be found here.

Degree of Parallelism (DOP)

There are two places in the SQL Monitor report that show information about the DOP. First, you can see the runtime DOP in the General section.

For RAC databases the same section also shows the number of instances used for parallel execution.

Second, you can click the Plan Note button in the Details section to find out more about the DOP.

The Plan Note shows the requested DOP and the reason for picking that DOP. In this example we see that the DOP was set to 4 because of the parallel degree limit (set by the parallel_degree_limit parameter or set by the resource manager parallel_degree_limit_p1 directive).

If the DOP is downgraded, the General section shows it with a down arrow icon; when you hover over it you can see the downgrade percentage, the number of PX servers requested, and the number of PX servers allocated. In the following example a downgrade percentage of 50% indicates a DOP of 4 was requested but the DOP was downgraded to 2. The number of PX servers allocated was 4 (2 * DOP) in this example, because this statement had two sets of PX servers as a result of the producer-consumer model used in parallel execution.

To find out why the DOP was downgraded you can look at the OTHER column for the PX COORDINATOR line in the Plan Statistics tab in the Details section.

This shows the runtime DOP and the reason for the downgrade.

Possible downgrade reasons are:

  • 350 = DOP downgrade due to adaptive DOP
  • 351 = DOP downgrade due to resource manager max DOP
  • 352 = DOP downgrade due to insufficient number of processes
  • 353 = DOP downgrade because slaves failed to join

In-Memory Parallel Execution

Plan Note also shows whether the statement uses in-memory affinity which makes sure the PX servers are allocated on the RAC nodes where in-memory tables reside, and whether In-Memory Parallel Execution is used. In this example, the statement did not use in-memory affinity (Px In Memory Imc is "no"), and it did not use In-Memory Parallel Execution (Px In Memory is "no").

Time & Wait Statistics

The Time&Wait Statistics section shows the elapsed time, the database time, and the CPU and wait activity percentages for the statement.

The Duration bar breaks down the elapsed time into queue time and running time. In this example, we can see that the elapsed time is 3.7 minutes, the statement waited in the parallel statement queue for 9.3ms and had a running time of 3.6 minutes after queuing.

The Database bar shows us the same information as it does for a serial statement; the difference is that the database time will be much higher than the elapsed time, as the database time of all PX servers is accumulated into this number. It shows the CPU time and wait times classified into wait event classes like user I/O, concurrency, etc. In this example, we see that we spent 2.3 minutes on CPU, 11.1 minutes on I/O, etc.

The Activity bar shows the percentage of CPU time and wait times so that we can focus on the most time consuming event for performance tuning. In this example, we see that we spent 18% of the time on CPU, 69% of the time in direct path temp writes, and about 8.8% of the time in direct path temp reads.

PX Server Sets

The Plan Statistics tab in the Details section shows the execution plan and runtime statistics for each plan step, like the number of rows processed, wait events, number of IOs, etc. This section also shows the PX coordinator and the PX server sets, indicating which operations were executed by which set.

The PX coordinator is shown with a person icon; each PX server set is shown as a people icon with a different color. In this example we see two sets, one red and one blue. The red set executed the operations with Line IDs 8-11 and 14-16, whereas the blue set executed the operations with Line IDs 3-7 and 12-13. Since the DOP is 4, each set has 4 PX servers. The operations with Line IDs 0-2 were executed by the PX coordinator.

Parallel Distribution Methods

Plan Statistics shows the parallel distribution methods in the plan, just like the output of the Explain Plan command.

In the case of adaptive parallel distribution where the distribution method is decided in runtime, SQL Monitor shows the runtime distribution method used in the OTHER column of the corresponding plan line. In this example, we see that an adaptive distribution method was used (HYBRID HASH).

When you click on the binocular icon in the OTHER column of the related lines (Line IDs 8, 14) the runtime distribution is shown.

Possible runtime distribution methods for adaptive distribution are:

  • 5 = Round-robin
  • 6 = Broadcast
  • 16 = Hash

CPU, IO, and Memory Usage

The Metrics tab shows you information about the CPU, IO, and memory usage of the statement. You can see the number of CPUs used, the IO and memory throughput, the number of reads and writes, amount of memory and temporary space used.

You can toggle between Storage and Buffer Cache tabs in the IO throughput section to look at the IO throughput and the memory (buffer cache) throughput.

You can toggle between PGA Usage and Temp Usage to look at the PGA memory usage and temporary space usage for the statement.

Work Distribution Across PX Servers

You can look at individual PX servers and their activity statistics in the Parallel tab.

This shows you the database time spent by each PX server set and their activity percentages compared to the total time. Here, we see that parallel set 1 consumed 11.8 minutes of database time and parallel set 2 consumed 2.1 minutes. 86% of the activity was done by parallel set 1 and 13% of the activity was done by parallel set 2. The variance between the database time consumed by different PX sets is not critical as it depends on what operations are executed by each set.

The important thing to be careful about is the distribution of work across PX servers in a single set. If you expand the parallel sets you can see the activity of individual PX servers.

The database time spent by each PX server gives a picture of how the work was distributed between PX servers. If the database time of each PX server is similar, it means every PX server performed an equal amount of work. If the difference in database time between PX servers is high, that may indicate a problem with the data and work distribution between PX servers; this results in lower utilization of some PX servers, which in turn results in bad performance. In this example, we see that each PX server in parallel set 1 performed about 21%-22% of the work, and each PX server in parallel set 2 performed about 2%-3% of the work; this indicates that the work was distributed equally across PX servers.

Here is a sample SQL Monitor report showing unequal work distribution across PX servers.

In this example, the process p000 consumed much more database time compared to the other processes in the same set (p001, p002, p003). This indicates the performance of this query would be better if each PX server performed a similar amount of work. To find out why this process did more work, you can go to the Plan Statistics tab and look at the runtime statistics for that individual process. Near the Plan Note button there is a list of PX servers consuming a significant amount of database time.

If we look at the statistics for the process p000 we see that it spent most of the time in the join operation, and it processed 51M rows of the SALES_SKEW table (Line ID 12).

When you hover over the Actual Rows column for this line you can see the number of rows processed by this process compared to the number of rows processed by the whole parallel set. Here we see that this PX server processed 51M rows of the total 55M rows. The skew ratio of 3.73 indicates this process did 3.73/4 of the job as the DOP is 4 in this case.

This is because the join key column (CUST_ID) is highly skewed in the SALES_SKEW table; most of the rows have the same CUST_ID value. Since the plan is using HASH distribution (Line ID 14), all those rows were sent to the same process, leading to unequal work distribution across PX servers.
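A quick way to confirm this kind of skew is to look at the most frequent join key values. A sketch against the SALES_SKEW table from this example:

SQL> select cust_id, count(*)
  2  from sales_skew
  3  group by cust_id
  4  order by count(*) desc
  5  fetch first 5 rows only;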

For RAC databases the Parallel tab also shows the distribution of work across instances, so you can compare the work done in each instance.

I hope this helps as a starting point for looking at parallel statements in SQL Monitor. As always, any comments are welcome.

Friday Apr 24, 2015

Managing overflows in LISTAGG

This is an interesting problem that has come up a few times in discussions (and I think it has been mentioned on the SQL forums as well).  When using LISTAGG on very large data sets you can sometimes create a list that is too long and consequently get an "ORA-01489: result of string concatenation is too long" error.

Obviously, it is possible to determine in advance if the error is going to occur using some bespoke PL/SQL code and then take appropriate action within your application to manage the rows that contain strings that exceed the VARCHAR2 limit.

Many customers have implemented workarounds to overcome the ORA-01489 error; however, this has mostly involved the use of complex code which has impacted performance. Wouldn’t it be great if there was a simple yet elegant way to resolve this issue? Actually there is, and we can use a few of the most recent analytical SQL functions. If you are using Database 12c you can make use of the MATCH_RECOGNIZE clause to effectively create chunks of strings that do not exceed the VARCHAR2 limit...
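As a minimal sketch of the idea (assuming the classic EMP table and the 4,000-byte VARCHAR2 limit; the full treatment is behind the link below), MATCH_RECOGNIZE groups rows into chunks whose aggregated length stays under the limit, and LISTAGG then aggregates each chunk separately:

SQL> select deptno, chunk,
  2         listagg(ename, ';') within group (order by ename) as name_list
  3  from emp
  4  match_recognize (
  5    partition by deptno
  6    order by ename
  7    measures match_number() as chunk
  8    all rows per match
  9    pattern (s b*)
 10    define b as sum(length(ename)) + count(*) <= 4000
 11  )
 12  group by deptno, chunk;

Each DEPTNO can now return several rows, one per chunk, instead of raising ORA-01489.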

[Read More]

Tuesday Apr 14, 2015

Statement of Direction -- Big Data Management System

Click here to start reading the Full Statement of Direction. 

Introduction: Oracle Big Data Management System Today 

As today's enterprises embrace big data, their information architectures must evolve. Every enterprise has data warehouses today, but the best-practices information architecture embraces emerging technologies such as Hadoop and NoSQL. Today’s information architecture recognizes that data is stored not only in increasingly disparate data platforms, but also in increasingly disparate locations: on-premises and potentially multiple cloud platforms. The ideal of a single monolithic ‘enterprise data warehouse’ has faded as a new, more flexible architecture has emerged. Oracle calls this new architecture the Oracle Big Data Management System, and today it consists of three key components:

  • The data warehouse, running on Oracle Database and Oracle Exadata Database Machine, is the primary analytic database for storing much of a company’s core transactional data: financial records, customer data, point-of-sale data and so forth. Despite now being part of a broader architecture, the demands on the RDBMS for performance, scalability, concurrency and workload management are greater than ever; Oracle Database 12c introduced Oracle Database In-Memory (with columnar tables, SIMD processing, and advanced compression schemes) as the latest in a long succession of warehouse-focused innovations. The market-leading Oracle Database is the ideal starting point for customers to extend their architecture to the Big Data Management System.
  • The ‘data reservoir’, hosted on Oracle Big Data Appliance, will augment the data warehouse as a repository for the new sources of large volumes of data: machine-generated log files, social-media data, and videos and images -- as well as a repository for more granular transactional data or older transactional data which is not stored in the data warehouse. Oracle’s Big Data Management System embraces complementary technologies and platforms, including open-source technologies: Oracle Big Data Appliance includes Cloudera’s Distribution of Hadoop and Oracle NoSQL Database for data management.
  • A ‘franchised query engine,’ Oracle Big Data SQL, enables scalable, integrated access in situ to the entire Big Data Management System. SQL is the accepted language for day-to-day data access and analytic queries, and thus SQL is the primary language of the Big Data Management System.  Big Data SQL enables users to combine data from Oracle Database, Hadoop and NoSQL sources within a single SQL statement.  Leveraging the architecture of Exadata Storage Software and the SQL engine of the Oracle Database, Big Data SQL delivers high-performance access to all data in the Big Data Management System.

Using this architecture, the Oracle Big Data Management System combines the performance of Oracle’s market-leading relational database, the power of Oracle’s SQL engine, and the cost-effective, flexible storage of Hadoop and NoSQL. The result is an integrated architecture for managing Big Data, providing all of the benefits of Oracle Database, Exadata, and Hadoop, without the drawbacks of independently-accessed data repositories.  

Note that the scope of this statement of direction is the data platform for Big Data. An enterprise Big Data solution would also include big data tools and big data applications built upon this data platform.

Read the full Statement of Direction -- Big Data Management System here.


Wednesday Apr 08, 2015

Noteworthy event for big data and data warehousing

I know there is a choice of conferences and events, but this one strikes me as very interesting. The thing that (amongst a whole bunch of other interesting sessions) stands out for me is the extra day with a master class on Information Architecture in the "big data world". Officially this is called Oracle Information Management and Big Data Reference Architecture, and I think we - as a community - can all use some more foundation on information management. Or maybe a nice refresher, for those of us who are in the weeds of our daily jobs?

On top of the master class, there are a whole load of interesting sessions. Some are really on the consumer side, some are focused on the back-end, but I think both perspectives are very useful.

Some of the sessions that look very promising to me (in Brighton) are "A journey into big data and analytics - Liberty Global" and in either location anything on Big Data Discovery - really cool stuff, and something I think will change how we derive value from big data.

So, where do you go? Brighton and Atlanta of course...

Read more about the event here.

PS. I'm hearing our own Reiner Zimmermann is doing a keynote in Brighton! 

Tuesday Apr 07, 2015

Oracle Academy: Data Science Bootcamp for 2015

I'm pleased to announce that Oracle Academy has released our Data Science Bootcamp for 2015.  As I've spent a great deal of time over the past few months helping Oracle Academy develop the content, I wanted to briefly explain what the Bootcamp series is and why it's worth a look.

What is This Thing?

The Data Science Bootcamp is an attempt at providing asynchronous training for data science fundamentals.  There are videos for each of the 16 lessons, example code, tight integration with our Big Data Lite VM, and even an online textbook.  Between these elements, we think students can learn in the way that best fits their schedule and level of interest.  Watch, try, or read about each problem in whatever way helps you learn best.

[Read More]

Friday Mar 27, 2015

Open World 2015 call for papers - my simple guidelines

OOW Banner 2013

Most of you will already have received an email from the OpenWorld team announcing the call for papers for this year’s conference: https://www.oracle.com/openworld/call-for-proposals.html. Each year, a lot of people ask me how they can increase their chances of getting their paper accepted. Well, I am going to start by stating that product managers have absolutely no influence over which papers are accepted - even mentioning that a product manager will be co-presenting with you will not increase your chances! Yes, sad but true!

So how do you make sure that your presentation title and abstract catches the eye of the selection committee? Well, here is my list of top 10 guidelines for submitting a winning proposal...

[Read More]

Why SQL Part 4 - Intelligent and continuous evolution

In the third part of this series of blog posts (Why SQL Part 3 - Simple optimization makes life simpler) I explained that by letting the database perform the optimizations at source, where the data is located, it is possible to share tuning improvements across a wide spectrum of front-end systems (BI reports, dashboards, applications etc). The ability of the database to “own” the optimization and processing characteristics of how a SQL statement is executed is an important element in the success of SQL as a data analysis language. In the whitepaper that forms the basis for this series of blog posts this section is quite brief; this was done for reasons that will become obvious as you work through this blog post.

Now that we have looked at the powerful framework that supports SQL and the simplified optimization layer we can move on to look at how SQL has evolved over time to support ever more sophisticated features and functions and compare this with some of the newer open source frameworks.

[Read More]
About

The Data Warehouse Insider is written by the Oracle product management team and sheds light on all things data warehousing and big data.
