How do I create statistics to make ‘small’ objects appear ‘large’ to the Optimizer?

I recently spoke with a customer who has a development environment that is a tiny fraction of the size of their production environment. His team has been tasked with identifying problem SQL statements in this development environment before new code is released into production.

The problem is that the objects in the development environment are so small that the execution plans selected there rarely reflect what actually happens in production.

To ensure the development environment accurately reflects production in the eyes of the Optimizer, the statistics used in the development environment must be the same as the statistics used in production. This can be achieved by exporting the statistics from production and importing them into the development environment. Even though the underlying objects are a fraction of the size of production, the Optimizer will see them as the same size and treat them the same way as it would in production.

Below are the necessary steps to achieve this in their environment. I am using the SH sample schema as the application schema whose statistics we want to move from production to development.

Step 1. Create a staging table in the production environment where the statistics can be stored.
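For illustration, something along the following lines would do it (the staging table name MY_STATS_TAB is just an example; use whatever name suits your environment):

BEGIN
  -- Create an empty staging table, owned by SH, to hold the exported statistics
  DBMS_STATS.CREATE_STAT_TABLE(
    ownname => 'SH',
    stattab => 'MY_STATS_TAB');
END;
/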

Step 2. Export the statistics for the application schema from the data dictionary in production into the staging table.
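Assuming the staging table MY_STATS_TAB from Step 1, the export would look roughly like this (a sketch, not necessarily the exact script used for this post):

BEGIN
  -- Copy the SH schema statistics from the data dictionary into the staging table
  DBMS_STATS.EXPORT_SCHEMA_STATS(
    ownname => 'SH',
    stattab => 'MY_STATS_TAB',
    statown => 'SH');
END;
/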

Step 3. Create an Oracle directory on the production system where the export of the staging table will reside and grant the SH user the necessary privileges on it.
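For example (the directory name STATS_DIR and the file system path are assumptions; adjust them to your own environment):

CREATE OR REPLACE DIRECTORY stats_dir AS '/home/oracle/stats_export';
GRANT READ, WRITE ON DIRECTORY stats_dir TO sh;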


Step 4. Export the staging table from production using Data Pump export.
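A sketch of the Data Pump command, assuming the directory and staging table names from the earlier steps and an illustrative dump file name of sh_stats.dmp (you will be prompted for the SH password):

expdp sh DIRECTORY=stats_dir DUMPFILE=sh_stats.dmp LOGFILE=sh_stats_exp.log TABLES=my_stats_tab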


Step 5. Copy the dump file containing the staging table from production to development.
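Any file transfer mechanism works here; for example, with scp (host name and paths are purely illustrative):

scp /home/oracle/stats_export/sh_stats.dmp oracle@devhost:/home/oracle/stats_import/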

Step 6. Create an Oracle directory on the development system where the export of the staging table resides and grant the SH user the necessary privileges on it. 
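This mirrors Step 3 on the development side (again, the directory name and path are assumptions):

CREATE OR REPLACE DIRECTORY stats_dir AS '/home/oracle/stats_import';
GRANT READ, WRITE ON DIRECTORY stats_dir TO sh;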


Step 7. Import the staging table into the development environment using Data Pump import.
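A sketch of the corresponding import, using the same illustrative names as in Step 4:

impdp sh DIRECTORY=stats_dir DUMPFILE=sh_stats.dmp LOGFILE=sh_stats_imp.log TABLES=my_stats_tab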


Step 8. Import the statistics from the staging table into the dictionary in the development environment.
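Assuming the same staging table name as before, something like:

BEGIN
  -- Load the statistics from the staging table into the data dictionary,
  -- replacing the existing statistics for the SH schema
  DBMS_STATS.IMPORT_SCHEMA_STATS(
    ownname => 'SH',
    stattab => 'MY_STATS_TAB',
    statown => 'SH');
END;
/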

You can get a copy of the script I used to generate this post here.

+Maria Colgan

Comments:

Hi Maria,

Here's my issue with this approach :-)

It's making the presumption that merely by examination you can assess the worthiness of an execution plan. That's just a reinforcement of the age-old problem:

"I saw TABLE ACCESS FULL, so it must be a bad plan"

Let's say I've copied the stats from my large environment back to my small environment, and the plan comes up as:

-----------------------------------
| Id  | Operation         | Name  |
-----------------------------------
|   0 | SELECT STATEMENT  |       |
|   1 | TABLE ACCESS FULL | EMP   |
-----------------------------------

Now what? Is that the best plan? Is it the worst one? I can hunt around the row estimates etc. for each line, but what if they are out (e.g. the moment I have a couple of joins in there)?

I can't use 'gather_plan_statistics' because the actual-vs-estimate comparison wouldn't make sense, given the disparate stats.

I can't actually *run* it, because (if my dev environment is tiny), then the response might be super fast (when it might not be in production)...or alternatively, it might be super slow (when it might have been super fast in production). I can't actually make a decision on that plan based on its response time.

I'm only pontificating because (in my humble opinion) perhaps the largest cause of production performance issues is unrealistic or unrepresentative data sample sizes in non-production environments.

Copying stats around serves to perpetuate that problem rather than resolve it.

I would have pressed 'Post' 10 mins ago...but I'm still struggling on the proof-of-non-spammer maths question :-)

Cheers
Connor

Posted by Connor McDonald on November 21, 2012 at 09:10 PM PST #

Will those imported statistics be overwritten by the regular job that gathers statistics (usually during the night)? Or will running dbms_stats.import_schema_stats() somehow lock them?

Posted by guest on December 04, 2012 at 01:28 AM PST #

To Connor:
I see your point. Our production system is OLTP, where 95% of the queries return less than 50 rows in less than 250ms. The database has hundreds of tables with 50+ million rows each. A full table scan of any such big table is absolutely the wrong thing to do in our case. Although Maria's solution may not work in all environments, it will work in our environment.

My problem however is this: the development environment is ALWAYS ALWAYS ahead of the production environment, i.e. new columns, new tables, etc. How will this export/import stats approach work with this in mind?

Posted by guest on December 04, 2012 at 08:24 AM PST #

Unfortunately, importing statistics will not automatically lock them. So you will have to either disable the automatic statistics gathering job or manually lock the statistics after the import.
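A minimal sketch of locking all the statistics for the SH schema used in the post:

BEGIN
  -- Prevent the automatic statistics gathering job from overwriting the imported stats
  DBMS_STATS.LOCK_SCHEMA_STATS(ownname => 'SH');
END;
/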

Posted by Maria Colgan on December 04, 2012 at 09:22 AM PST #

@Connor,

I understand what you're saying, but I think the purpose of transporting the stats is not to provide actual timings of different query execution paths that can then be compared to production, but rather to provide comparisons of different execution paths within DEV only. E.g. does execution path A produce a faster or slower response than execution path B, when both A and B have access to the same stats?
You can then attempt to predict the production execution path and tune the query on the smaller subset of data.

I do have to admit that some of the 11g features, like SQL Plan Management, make this approach slightly redundant.

Posted by guest on January 02, 2013 at 03:53 AM PST #

@guest,

I think you need to think about having a schema that is an exact copy of prod.

Posted by guest on January 23, 2013 at 02:29 PM PST #

Replacing the stats is just not a controlled and predictable approach. The subject just makes this technique sound more sophisticated, but all knowledge is good :)

Posted by @Roderick on January 29, 2013 at 06:24 AM PST #

Many shops simply can't afford to have large development environments, so replacing the stats seems to be about as close as some of them may get to replicating their production database's object statistics.

Posted by guest on February 04, 2013 at 01:41 PM PST #

@guest with the extra columns: Sounds like you have a "developmentesting" environment and have put your finger on one of the many problems with this all-too-common setup.

Posted by guest on March 11, 2013 at 09:12 AM PDT #

This only works well if dev/UAT have exactly the same DDL structures in place, i.e. prod could have additional partitions etc. Will this break the import of the stats?

Posted by guest on April 09, 2013 at 10:28 AM PDT #

You are correct. The assumption here is that the DDL for the objects in the test environment is identical to production.

Thanks,
Maria

Posted by guest on April 09, 2013 at 01:20 PM PDT #

One problem with this approach is that any CBO mechanisms that relate to actual data values, such as Adaptive Cursor Sharing or Bind Variable Peeking, may not produce the same effects as with the parent table, as there could be swathes of absent values in the smaller table.

Posted by Richard Armstrong-Finnerty on June 22, 2013 at 03:08 PM PDT #
