Monday Apr 15, 2013

How does the METHOD_OPT parameter work?

In last week’s post we addressed the most popular question surrounding statistics gathering, what sample size should be used? Once that decision has been made, folks typically turn their attention to histograms and the time honored question of, do I need histograms or not?

Regardless of which side of this religious debate you come down on, you are going to need to figure out what to set the METHOD_OPT parameter to in your statistics gathering command.

This post explains in detail what the METHOD_OPT parameter controls and how it can be used to influence which columns get statistics and what type of statistics they get.

[Read More]

Monday Apr 08, 2013

How does AUTO_SAMPLE_SIZE work in Oracle Database 11g?

When it comes to gathering statistics, one of the most critical decisions you have to make is, what sample size should be used? A 100% sample will ensure accurate statistics but could take a really long time. Where as a 1% sample will finish quickly but could result in poor statistics.

The ESTIMATE_PERCENT parameter in the DBMS_STATS.GATHER_*_STATS procedures controls the sample size used when gathering statistics and it default values is AUTO_SAMPLE_SIZE.

In an earlier blog post, we talked about the new implementation of AUTO_SAMPLE_SIZE in Oracle Database 11g in terms of its improvements in the speed and accuracy of statistics gathering compared to the old AUTO_SAMPLE_SIZE.

In this post, we will take a closer look at the how the new AUTO_SAMPLE_SIZE algorithm works and how it affects the accuracy of the statistics being gathered.

[Read More]

Wednesday Nov 21, 2012

How do I create statistics to make ‘small’ objects appear ‘large’ to the Optmizer?

I recently spoke with a customer who has a development environment that is a tiny fraction of the size of their production environment. His team has been tasked with identifying problem SQL statements in this development environment before new code is released into production.

The problem is the objects in the development environment are so small, the execution plans selected in the development environment rarely reflects what actually happens in production.

[Read More]

Wednesday Aug 29, 2012

How do I restrict concurrent statistics gathering to a small set of tables from a single schema?

I got an interesting question from one of my colleagues in the performance team last week about how to restrict a concurrent statistics gather to a small subset of tables from one schema, rather than the entire schema. I thought I would share the solution we came up with because it was rather elegant, and took advantage of concurrent statistics gathering, incremental statistics, and the not so well known “obj_filter_list” parameter in DBMS_STATS.GATHER_SCHEMA_STATS procedure.

[Read More]

Friday May 18, 2012

How do I compare statistics?

This question came up recently when I was helping a customer migrate a large data warehouse to Oracle Database 11g. Prior to the upgrade, they were using an ESTIMATE_PERCENT of 0.000001, the smallest possible sample size allowed, when they gathering statistics on their larger tables. When I asked why they picked such a tiny sample size they said it was because they needed statistics to be gathered extremely quickly after their daily load both at the partition and at the global level.

Since these large tables were partitioned, I thought they would be an excellent candidate for incremental statistics. However, in order to use incremental statistics gathering you have to let the ESTIMATE_PERCENT parameter default to AUTO_SAMPLE_SIZE. Although the customer saw the benefit of using incremental statistics they were not keen on changing the value of ESTIMATE_PERCENT. So I argued that with the new AUTO_SAMPLE_SIZE in Oracle Database 11g they would get statistics that were equivalent to a 100% sample but with the speed of a 10% sample.  And since we would be using incremental statistics we would only have to gather statistics on the freshly loaded partition and the global statistics (table level statistics) would be automatically aggregated correctly.

So with much skepticism, they tried incremental statistics in their test system and they were pleasantly surprised at the elapse time. However, they didn’t trust the statistics that were gathered and asked me, “how do I compare the statistics I got with AUTO_SAMPLE_SIZE  to the statistics I normally get with an ESTIMATE_PERCENT of 0.000001?”

The answer to that was easy, ‘use DBMS_STAT.DIFF_TABLE_STATS’.
[Read More]

Sunday Apr 08, 2012

Lies, damned lies, and statistics Part 2

There was huge interest in our OOW session on Managing Optimizer Statistics. It seems statistics and the maintenance of them continues to baffle people. In order to help dispel the mysteries surround statistics management we have created a two part white paper series on Optimizer statistics. 

Part one of this series was released in November last years and describes in detail, with worked examples, the different concepts of Optimizer statistics. Today we have published part two of the series, which focuses on the best practices for gathering statistics, and examines specific use cases including, the fears that surround histograms and statistics management of volatile tables like Global Temporary Tables.

[Read More]

Monday Apr 02, 2012

I thought the new AUTO_SAMPLE_SIZE in Oracle Database 11g looked at all the rows in a table so why do I see a very small sample size on some tables?

I recently got asked this question and thought it was worth a quick blog post to explain in a little more detail what is going on with the new AUTO_SAMPLE_SIZE on Oracle Database 11g and what you should expect to see in the dictionary views. Let’s take the SH.CUSTOMERS table as an example.  There are 55,500 rows in the SH.CUSTOMERS tables.[Read More]

Sunday Mar 18, 2012

Incremental Statistics Maintenance – what statistics will be gathered after DML occurs on the table?

Incremental statistics maintenance was introduced in Oracle Database 11g to improve the performance of gathering statistics on large partitioned table. When incremental statistics maintenance is enabled for a partitioned table, oracle accurately generated global level  statistics by aggregating partition level statistics. As more people begin to adopt this functionality we have gotten more questions around how they expected incremental statistics to behave in a given scenario. For example, last week we got a question around what partitions should have statistics gathered on them after DML has occurred on the table?[Read More]

Monday Jan 09, 2012

Fixed Objects Statistics and why they are important

Fixed objects are the x$ tables and their indexes. The v$performance views in Oracle are defined in top of X$ tables (for example V$SQL and V$SQL_PLAN). Since V$ views can appear in SQL statements like any other user table or view then it is important to gather optimizer statistics on these tables to help the optimizer generate good execution plans. However, unlike other database tables, dynamic sampling is not automatically use for SQL statement involving X$ tables when optimizer statistics are missing. The Optimizer uses predefined default values for the statistics if they are missing. These defaults may not be representative and could potentially lead to a suboptimal execution plan, which could cause severe performance problems in your system. It is for this reason that we strong recommend you gather fixed objects statistics.[Read More]

Tuesday Nov 08, 2011

Lies, damned lies, and statistics

There was huge interest in our OOW session on Managing Optimizer Statistics. It seems statistics and the maintenance of them continues to baffle people. However, understanding and managing Optimizer statistics is key to optimal SQL execution. Knowing when and how to gather statistics in a timely manner is critical to maintaining acceptable performance. In order to clarify all of the information surrounding statistics we have put together a two part whitepaper on Optimizer statistics and the management of them.[Read More]

Wednesday Aug 03, 2011

How do I know if the cardinality estimates in a plan are accurate?

It is often recommended when you are tuning a SQL statement to confirm that the Optimizer's cardinality estimates, in the execution plan, are accurate. But how do you do that?

It is probably easier to answer this question with an example. Let's take the following example of a simple two table join.

[Read More]

Wednesday Apr 20, 2011

How do I drop an existing histogram on a column and stop the Auto Stats gathering job from creating it in the future?

Before answering this question, I want to provide a bit more background to this scenario. The person who submitted this question has a table with a VARCHAR2 column called ID whose data distribution is skewed. When the Auto Stats job kicks in for this table a histogram is automatically created on this column as it is used in nearly every SQL statement and it has a data skew. However, the values in the ID column are extremely long and the first 32 characters in each ID are identical.[Read More]

Thursday Mar 24, 2011

How do I know what extended statistics are needed for a given workload?

In our previous post we introduced extended statistics, which help the Optimizer improve the accuracy of cardinality estimates for SQL statements that contain predicates involving a function wrapped column (e.g. UPPER(LastName)) or multiple columns from the same table used in filter predicates, join conditions, or group-by keys. So extended statistics are extremely useful but how do you know which extended statistics should be created?

In Oracle Database we introduced Auto Column Group Creation, which automatically determines which column groups are required for a table based on a given workload. Please note this functionality does not create extended statistics for function wrapped columns it is only for column groups. Auto Column Group Creation is a simple three step process:

[Read More]

Thursday Mar 17, 2011

Extended Statistics

In real-world data, there is often a relationship or correlation between the data stored in different columns of the same table. For example, in the customers table, the values in the cust_state_province column are influenced by the values in the country_id column, as the state of California is only going to be found in the United States. Until now, the Optimizer had no way of knowing about these real-world relationships and could potentially miscalculate the cardinality estimate if multiple columns from the same table are used in the where clause of a statement. With extended statistics you now have an opportunity to tell the Optimizer about these real-world relationships between the columns.[Read More]

Friday Jan 28, 2011

Concurrent Statistics Gathering

Gathering optimizer statistics is one of life's necessary evils even if it can take an extremely long time to complete. In this blog post, we discuss one remedy to improve the efficiency of statistics gathering.[Read More]

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.


« June 2016