
Insights into Statistics, Query Optimization and the Oracle Optimizer

Recent Posts

Optimizer

How does the Oracle Database choose what histograms to create?

Introduction

I am often asked, "How does Oracle choose which columns need a histogram?" Also, "Why have I got a histogram on my primary key column?" Jonathan Lewis posted an explanation in response to the latter question some while ago, but I thought I'd cover a little bit more on this topic and focus on how Oracle chooses which columns should have histograms. In particular, I want to clarify how the database determines what histograms to create if you use the recommended (default) method for gathering statistics, which is where the DBMS_STATS.GATHER...STATS parameter METHOD_OPT is set to 'FOR ALL COLUMNS SIZE AUTO'.

The Oracle Database can create a variety of different types of histogram (hybrid, top frequency, frequency) and I will mention them briefly below. However, for the purposes of this post it is not necessary to dwell on the differences between them. I want instead to concentrate on the circumstances that lead to histogram creation in general. This post is aimed at Oracle Database 12c Release 1 onwards.

You will probably know already that histograms are useful for improving cardinality estimates, particularly if the data in a column is skewed. As you will see, this is not the full story, but I will start by covering what is meant by skew.

What is Skew?

There are two types of skew considered by the database. For most of us, there is one type that springs to mind. Here is a chart representing how many order entries a pet shop has for each type of fish it sells:

The shop gets a lot of orders for goldfish; they are cheap and common. On the other hand, Koi carp are highly valued and expensive, so it's not surprising that fewer are sold. Technically speaking, there is a non-uniformity in the repetitions of values, or value skew. In other words, some fish names appear significantly more often than others. Let's take a look at a couple of simple queries:

select sum(order_amount) from fish_sales where fish = 'Gold';    [1200 rows match]
select sum(order_amount) from fish_sales where fish = 'Koi';     [22 rows match]

For the above queries, the numbers of rows matching the WHERE clauses are very different, so it would be useful to give the optimizer a way to figure this out. That is what a histogram is for, of course.

There is another type of skew, and I will illustrate it with a few examples.

Example 1

Imagine a small but fast-growing company that is hiring new people over time. Employee information is stored in a STAFF table, which includes a HIRE_DATE column. Back in the days when the company started up it was common for a number of weeks to pass between each new hire, but gradually the rate of arrival increased. For any given day we can plot the number of people hired like this:

I generated the plot above using a query like this:

select trunc(hire_date) "Date"
      ,count(*)         "Number of Hires Today"
from staff
group by trunc(hire_date)
order by trunc(hire_date);

You will see that a large number of days passed in 2012 with no hires, but in 2015 there were multiple people joining almost every day. There is a wide variation in the number of rows the database will match in the STAFF table for a given range of hire dates. Technically speaking, there is a non-uniformity in range, or range skew.
To clarify what I mean, consider the following queries:

select avg(salary) from staff
where hire_date between to_date('01-mar-2012') and to_date('30-apr-2012');

select avg(salary) from staff
where hire_date between to_date('01-mar-2014') and to_date('30-apr-2014');

The first query will match a small number of rows and the second will match a large number of rows, even though the size of the range is the same in both cases (I was careful to avoid including February since 2012 is a leap year!). We need to help the optimizer figure out that if a query filters rows over a given range of HIRE_DATE values, then it can expect to see a wide variation in the number of rows it will match. We are going to need a histogram.

There is value skew in this example too. I mention this fact because I want to point out that range and value skews are not mutually exclusive: column values can sometimes exhibit both types of skew to varying degrees.

Example 2

Perhaps the simplest way to see range skew is to consider a column containing unique numeric values like this:

1, 100, 200, 300, 301, 302, 303, 304, ..., 998, 999, 1000

Each value occurs only once, but the number of values falling into the range (1, 100) is very different from the number falling into the ranges (300, 400) and (400, 500). This is similar to Jonathan's example mentioned above.

Example 3

When considering numeric or date columns, it is easy to visualize range skew in terms of missing values or days. For character-based fields it is not so intuitive. Personally, I like to think of range skew in terms of how many rows will match a range predicate if we take a fixed range and 'slide it up and down' the dataset. Consider the following (real) example: a table containing a list of US surnames. Range skew is easily revealed like this:

select count(*) from us_surnames where surname between 'A' and 'C';  [19281 rows]
select count(*) from us_surnames where surname between 'H' and 'J';  [9635 rows]
select count(*) from us_surnames where surname between 'X' and 'Z';  [1020 rows]

Example 4

Finally, it is not always quite as obvious as Example 3 implies. Consider a dataset like this:

AA, AB, AC, AD, ..., AZ, BA, BB, BC, ..., ZX, ZY, ZZ

It doesn't look like there are any gaps at all, but remember that there are non-alphabet characters and we could use longer and shorter strings in the range predicates. Both of the following queries return a count of zero:

select count(*)
from   the_table
where  column_value between 'B1' and 'B9';

select count(*)
from   the_table
where  column_value between 'AZB' and 'B';

A histogram will yield better cardinality estimates because it will make the optimizer aware of these gaps. The database encodes column values and uses statistical techniques to characterize the degree of variation in cardinality estimates when a range-based query predicate is used. Once this analysis is complete, internal thresholds are used to decide whether or not a histogram will be useful. The net result is that text-based columns will often acquire histograms if they are used in range-based query predicates.

Automatic Histogram Creation

Histograms are created automatically when statistics are gathered using the SKEWONLY and AUTO options in METHOD_OPT. For example:

EXEC DBMS_STATS.GATHER_TABLE_STATS( … METHOD_OPT=>'FOR ALL COLUMNS SIZE SKEWONLY' …)

If you choose to use FOR ALL COLUMNS SIZE SKEWONLY, then all columns will need to be tested to see if they require a histogram (excluding columns with datatypes such as LONG and CLOB).
This is not the best choice for day-to-day statistics collection because there is a more efficient option (which also happens to be the default):

EXEC DBMS_STATS.GATHER_TABLE_STATS( … METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO' …)

AUTO uses column usage information to identify which columns are used in query predicates and joins, making them potential candidates for a histogram. When statistics are gathered, the candidate columns are tested further to identify skew and calculate the number of distinct column values. Restricting these tests to an initial list of candidates will, in many cases, make AUTO much more efficient than SKEWONLY.

For the remainder of this post you can assume that 'FOR ALL COLUMNS SIZE AUTO' is used. In addition, remember that some column datatypes are not eligible for histograms (such as LONG and CLOB).

Column Usage

How does the Oracle Database know that a particular column is used in a query predicate or join? This information is gathered by the Oracle Optimizer at parse time and ultimately stored in the Oracle data dictionary in a table called SYS.COL_USAGE$. We can get a user-friendly view of column usage data using the DBMS_STATS API. In the following example, the report tells us that the HIRE_DATE column in STAFF was used in a range predicate:

set long 10000
select dbms_stats.report_col_usage(user, 'staff') from dual;

DBMS_STATS.REPORT_COL_USAGE(USER,'STAFF')
--------------------------------------------------------------------------------
LEGEND:
.......
EQ         : Used in single table EQuality predicate
RANGE      : Used in single table RANGE predicate
LIKE       : Used in single table LIKE predicate
NULL       : Used in single table is (not) NULL predicate
EQ_JOIN    : Used in EQuality JOIN predicate
NONEQ_JOIN : Used in NON EQuality JOIN predicate
FILTER     : Used in single table FILTER predicate
JOIN       : Used in JOIN predicate
GROUP_BY   : Used in GROUP BY expression
...............................................................................

###############################################################################
COLUMN USAGE REPORT FOR ADHOC.STAFF
...................................
1. HIRE_DATE : RANGE
###############################################################################

If you want to test this for yourself, then note that column usage data is flushed to the data dictionary periodically, so if you are using a test script then you might not see the data immediately. If you are in a hurry, you can initiate a manual flush like this:

execute dbms_stats.flush_database_monitoring_info()

Identifying Candidate Columns

We have established that the database has a way of determining how individual columns are used in queries. A column is considered a candidate for a histogram if it is used in joins or query predicates. For example, the following queries will make col1 and txtcol potential candidates for histograms (the corresponding column usage is shown in brackets):

select sum(amount) from sales where col1 = 10;                        [EQ]
select sum(amount) from sales where col1 != 10;                       [recorded as EQ]
select sum(amount) from sales where col1 > 10;                        [RANGE]
select sum(amount) from sales s, customers c where s.col1 = c.col1;   [EQ_JOIN]
select sum(amount) from sales s, customers c where s.col1 != c.col1;  [EQ_JOIN NONEQ_JOIN]
select sum(amount) from sales where txtcol like 'ALA%';               [LIKE]

Once column usage information has been used to identify candidate columns, the database needs to examine the data in those columns to establish whether a histogram will be useful for improving cardinality estimates.
This happens when statistics are gathered (e.g. when using DBMS_STATS.GATHER_TABLE_STATS), but what principles does the database use to finally decide? I will cover that next.

Beneficial Histograms

The database applies some principles to decide whether a histogram will be of benefit. A histogram will be deemed beneficial if:

1. The column has value skew and column usage indicates RANGE, LIKE, EQ or EQ_JOIN.
2. The column has range skew and column usage indicates LIKE or RANGE.
3. The column has a low number of distinct values (with some repeated values) and column usage indicates RANGE, LIKE, EQ or EQ_JOIN.
4. When incremental statistics are used, a histogram may be created in a partition even though the column does not have value or range skew in that partition. The database will use partition-level histograms to derive global histograms. Histograms created for non-skewed data are ignored by the optimizer.

You will see that there is overlap between principles 1 and 3, and there is no mention of skew in number 3. The reason is that Oracle considers histograms useful for low number of distinct value (NDV) columns even if there is no significant range or value skew (provided that there are some repeated values). A typical benefit of doing this is in the case where a data distribution is quite uniform in both value and range but there are nevertheless some "holes" in values. The cost of building histograms on low cardinality columns is small, so if a column is used in joins or predicates then there is little reason not to create one.

In practice this means that a column with usage RANGE, LIKE, EQ or EQ_JOIN and an NDV of less than or equal to 254 will be given a frequency histogram (or even a top frequency histogram if the NDV is a little over 254). The number 254 is significant because it is the default histogram bucket count and is one of the factors taken into account when choosing the type of histogram used. There is more on this here.

There is some additional complexity I have not covered. For example, histograms may be created from a partial or full sample of the data (depending on how and when statistics are gathered). Datatypes can be important: for example, if you use a LIKE predicate on a numeric column, then this type of usage will not be recorded in column usage. The boundary between "skewed" and "non-skewed" is a statistical construct (or, if you prefer, a "fuzzy" construct) and column values can have both range skew and value skew. Nevertheless, I hope that you now appreciate the general principles at work. Comments welcome.
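If you want to see the outcome of these principles on your own tables, a quick way is to gather statistics with the default METHOD_OPT and then check the data dictionary. Here is a minimal, hedged sketch using the STAFF table from the example above (adjust the table name to suit your own schema):

-- Make sure recent column usage has reached the data dictionary
exec dbms_stats.flush_database_monitoring_info()

-- Gather statistics using the default METHOD_OPT
exec dbms_stats.gather_table_stats(user, 'STAFF', method_opt => 'FOR ALL COLUMNS SIZE AUTO')

-- See which columns ended up with a histogram, and of what type
select column_name, num_distinct, histogram, num_buckets
from   user_tab_col_statistics
where  table_name = 'STAFF'
order  by column_name;

If HIRE_DATE has been used in range predicates and has the sort of range skew described above, you would expect to see a histogram against it (the exact type depends on the data and the NDV).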


19c

What is automatic SQL plan management and why should you care?

Oracle Database 19c adds a new feature called automatic SQL plan management. This post covers:

- What automatic SQL plan management is and how it works.
- Why it is important for you to be aware of it.
- How to configure it.
- A summary.

Where is it available?

Recently we changed the availability of this feature so that the new auto mode is no longer available on all platforms in Oracle Database 19c. In addition, it is no longer the default setting. See Automatic SQL Plan Management in table 1-6 of the license guide. Nevertheless, SPM still enables you to 'repair' SQL performance regressions by locating better SQL execution plans. Take a look at this post.

What is it?

Automatic SQL plan management identifies SQL statements that consume significant system resources and that have been observed by the database to be using multiple SQL execution plans, some apparently more optimal than others. Auto SPM identifies the best plans from the alternatives it discovers and prevents the sub-optimal plans from being chosen (at least until SPM evolution establishes that a sub-optimal plan has become optimal).

How is it enabled?

The automatic mode is enabled by setting the DBMS_SPM evolve task parameter ALTERNATE_PLAN_BASELINE to AUTO. This setting is not available on all platforms. The ALTERNATE_PLAN_SOURCE parameter can be set to AUTO on all platforms and it is currently equivalent to "AUTOMATIC_WORKLOAD_REPOSITORY+CURSOR_CACHE+SQL_TUNING_SET".

How does it work?

Here is a summary of the flow:

1. The Automatic Workload Repository (AWR) is inspected for SQL execution plans that consume significant system resources. In addition, the database inspects the automatic SQL tuning set (ASTS) if it is available (this is a tuning set maintained by the database primarily for automatic indexing).
2. The database looks for alternative SQL execution plans in various sources such as AWR, SQL tuning sets and the cursor cache.
3. The plans identified are added to the SQL plan history.
4. The SPM evolve advisor task executes the alternative plans and compares their performance.
5. The evolve advisor decides which plans perform best and adds them to the SQL plan baseline.
6. SQL plan baselines prevent 'regressed' execution plans from being used.

How is it configured?

I am aware that some DBAs use SQL plan management in a tactical fashion, controlling regression in a carefully chosen set of SQL statements. Some DBAs will capture SQL statements according to a predetermined schedule and perhaps run SPM evolution manually as-and-when required. Some DBAs disable SPM evolution altogether, choosing exactly when and how to evolve a particular SQL statement. If you currently have a particular way of using SPM, then when you upgrade to Oracle Database 19c you will need to decide whether to choose:

Option #1: Use the new 'auto regime' in Oracle Database 19c alone.
Option #2: Continue to use SPM in the way you have in the past, but in conjunction with automatic SPM.
Option #3: Disable automatic SPM and continue to use SPM in the way you have used it in the past.

Option #1

BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
       task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
       parameter => 'ALTERNATE_PLAN_BASELINE',
       value     => 'AUTO');
END;
/

BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
       task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
       parameter => 'ALTERNATE_PLAN_SOURCE',
       value     => 'AUTO');
END;
/

Option #2

Use AUTO and continue to capture and evolve as you see fit. The automatic approach can work alongside existing strategies.
Option #3

The Oracle Database 19c defaults are now the same as Oracle Database 12c Release 2 and Oracle Database 18c:

BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
       task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
       parameter => 'ALTERNATE_PLAN_BASELINE',
       value     => 'EXISTING');   /* The Default */
END;
/

BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
       task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
       parameter => 'ALTERNATE_PLAN_SOURCE',
       value     => 'CURSOR_CACHE+AUTOMATIC_WORKLOAD_REPOSITORY');   /* The Default */
END;
/

If you want to use the same settings in Oracle Database 19c as the Oracle Database 12c Release 1 defaults:

BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
       task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
       parameter => 'ALTERNATE_PLAN_BASELINE',
       value     => '');
END;
/

BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
       task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
       parameter => 'ALTERNATE_PLAN_SOURCE',
       value     => '');
END;
/

Notes

To view the current parameter settings:

SELECT PARAMETER_NAME, PARAMETER_VALUE
FROM   DBA_ADVISOR_PARAMETERS
WHERE  TASK_NAME = 'SYS_AUTO_SPM_EVOLVE_TASK';

If you want to accept evolved execution plans manually, I recommend that you continue to allow the automatic SPM evolve advisor task to execute (so that you can view SPM reports and findings). Then, to prevent alternative plans from being accepted automatically, use the following setting:

BEGIN
    DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
       task_name => 'SYS_AUTO_SPM_EVOLVE_TASK',
       parameter => 'ACCEPT_PLANS',
       value     => 'FALSE');
END;
/

Note that the evolve advisor task, SYS_AUTO_SPM_EVOLVE_TASK, was introduced in Oracle Database 12c.

Summary

Automatic SQL plan management is a great new way to prevent query performance regressions and capture SQL plan baselines transparently, without any management overhead. The parameters used above were first introduced in Oracle Database 12c Release 2. The internal implementation for identifying alternative plans continued to be enhanced in Oracle Database 18c and Oracle Database 19c. See the SQL Tuning Guide for more details.
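Once the automatic task has been running for a while, you will probably want to see what it has done. The following is a hedged sketch (not from the original post) showing two ways to inspect the results: the auto evolve task report and the SQL plan baselines view.

-- Report from the automatic evolve advisor task
SET LONG 1000000
SELECT DBMS_SPM.REPORT_AUTO_EVOLVE_TASK FROM dual;

-- Baselines, including where they came from and whether they are accepted
SELECT sql_handle, plan_name, origin, enabled, accepted, last_verified
FROM   dba_sql_plan_baselines
ORDER  BY last_verified DESC NULLS LAST;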


19c

What's New in the Oracle Optimizer for Oracle Database 19c?

For those of you that want to keep up-to-speed with the latest changes, this post includes a quick summary of the headline new features in Oracle Database 19c with links to the documentation. In addition, most of the new optimizer white papers have been published. You will find links to them in the new Oracle Query Optimization page (and there will be further updates to this page soon).

Automatic Indexing

This is a complex and sophisticated piece of functionality but, ironically, it requires only a tiny API because it has been designed to be very easy to use and require virtually zero manual intervention. I'm sure that you can expect a lot to be written about this feature, but you can find material in the SQL Tuning Guide and the DBMS_AUTO_INDEX API.

Real-Time Statistics

Statistics can now be collected 'on-the-fly' during conventional DML operations. This feature requires no manual intervention, but you can find details in the SQL Tuning Guide's Optimizer Statistics Concepts chapter.

High-Frequency Statistics Gathering

Real-time statistics are a complement to conventional gathered statistics, so there is still a need for both. To help keep gathered statistics up-to-date, we added a new task to do it more frequently. It is again documented in the SQL Tuning Guide.

Automatic SQL Plan Management

There is a lot to say about this feature, so my intention is to post a blog in the next few weeks. We have included more automation in the SQL evolve advisor task, so see the new features guide and the section on managing SQL plan baselines. Documentation for the SQL plan management (SPM) API can be found in the DBMS_SPM API section. Note that the following SQL plan management task parameters have new default values in Oracle Database 19c:

PARAMETER_NAME                   DEFAULT
-------------------------------- ----------
ALTERNATE_PLAN_BASELINE          AUTO
ALTERNATE_PLAN_LIMIT             UNLIMITED
ALTERNATE_PLAN_SOURCE            AUTO

If you currently use SQL plan management, especially if you use it to micro-manage individual SQL execution plans, then you need to be aware of the new default behavior when upgrading to Oracle Database 19c. You do have a choice: you can use the new defaults immediately or, if you prefer, you can set these parameters back to the pre-19c defaults very easily. An additional piece of SQL plan management functionality was added late in the day, so that will make its way through soon. I will post an update here when it's documented.

Hint Usage Reporting

Diagnosing SQL statement hint issues can be difficult. More information on how we have made it easier can be found in the documentation and in this blog post. Using this feature to diagnose issues with SQL plan baselines is presented in this blog post.

SQL Plan Comparison

We've made it much easier to compare a 'reference' SQL execution plan with multiple plans from a variety of different sources using a single API call. It's a great tool for diagnosis and you can find details in the section on comparing execution plans in the documentation. The API is DBMS_XPLAN.COMPARE_PLANS.

SQL Quarantine

This feature allows you to prevent runaway SQL statements from consuming all your system resources. It is covered in detail in the Database Administrator's Guide chapter on Diagnosing and Resolving Problems.

Automatic Diagnosis and Repair

The process of diagnosing and repairing problems has been enhanced to make it easier to treat issues with the DBMS_SQLDIAG API and the addition of the V$SQL_TESTCASES view. There's more in the Database Administrator's Guide.
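Going back to high-frequency statistics gathering for a moment: the task is controlled through DBMS_STATS global preferences. The sketch below is a hedged example (the values are illustrative; check the SQL Tuning Guide for the documented defaults and limits):

-- Switch the high-frequency statistics gathering task on
exec dbms_stats.set_global_prefs('AUTO_TASK_STATUS', 'ON')

-- Cap each run at 10 minutes (value in seconds)
exec dbms_stats.set_global_prefs('AUTO_TASK_MAX_RUN_TIME', '600')

-- Attempt a run every 15 minutes (value in seconds)
exec dbms_stats.set_global_prefs('AUTO_TASK_INTERVAL', '900')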
Licensing

Not all of these features will be available on all database platforms. Consult the Database Licensing Information User Manual for details.


19c

Oracle Database 19c and SQL Plan Management Diagnostics

A popular enhancement request I see is to provide an easier way to diagnose issues with SQL plan baselines; in particular, the situation where a SQL plan baseline is ACCEPTED but is not being used for whatever reason. This is rare, but it can happen if changes are made to the database such as dropping indexes or changing partitioning schemes. If a SQL plan baseline can't be used, you will see something like this in Oracle Database 19c:

So why did it fail? In this example I captured a plan that uses an index and then I made the index invisible. There's no way the index plan can be used anymore. However, let's pretend that we don't know what happened.

There is now a really nice way to help diagnose issues with SQL plan baselines. It relies on an Oracle Database 19c feature (hint usage reporting) and a hidden parameter to force the SQL statement to use the outline in the SQL plan baseline even if the resulting plan doesn't match the SQL plan baseline (a pretend match). This is how you do it:

alter session set "_sql_plan_management_control"=4;

explain plan for select /* MYTESTSQL */ sum(num) from mytest1 where id = 10;
select * from table(DBMS_XPLAN.DISPLAY(FORMAT=>'typical'));

alter session set "_sql_plan_management_control"=0;

And then you will see something like this:

How beautiful is that? The hint report tells us that INDEX_RS_ASC is not used - a really strong clue. There's a worked example on GitHub if you want to try it yourself. There's also a spool file if you want to look at the example but don't have access to a version 19c database at the moment. I'm afraid that LiveSQL doesn't allow you to use SQL plan management yet.

Hey, what? Forcing the plan baseline plan?

Some of you might jump on the idea that a parameter can be used to "force a SQL plan baseline to be used" (another popular request). This is not the case! As I said, it forces the outline in the SQL plan baseline to be used even if the plan is not the one we want. The parameter is not a magic bullet that will somehow force the optimizer to use the plan in the SQL plan baseline. You can see from this example that it is just not possible: the index is not available for use. In other words, if the outline in the SQL plan baseline can be used successfully, then it will be used. Telling the optimizer to pretend-match the outline in the SQL plan baseline won't somehow fix the issue. This is demonstrated above - the outline is applied to the SQL statement but the hints it uses cannot be honored. The FULL plan is still used.
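As a complementary check (a hedged sketch, not part of the original example), you can confirm whether an executed cursor actually matched a baseline by looking at the SQL_PLAN_BASELINE column in V$SQL; it is NULL when no baseline plan was reproduced:

select sql_id, child_number, plan_hash_value, sql_plan_baseline
from   v$sql
where  sql_text like '%MYTESTSQL%'
and    sql_text not like '%v$sql%';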


19c

LiveSQL Now Live on Oracle Database 19c

Since LiveSQL is running on Oracle Database 19c, now is the time to be one of the first to take a look at the new Hint Usage Reporting feature. Trying to figure out why a hint is not being used is not on my top-100-fun-things-to-do list. Luckily, there's now a way to do it much more easily than before. Here is a very simple example - but it's just the tip of the iceberg...

create table mytab (a number(10) primary key);
insert into mytab values (1);
commit;
exec dbms_stats.gather_table_stats(null,'mytab');

explain plan for select /*+ FULLL(t) */ * from mytab t;

SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'TYPICAL'));
SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));

Here's some example output. Hint - take a look at the bottom of the report. :-)

--------------------------------------------------------
| Id | Operation                    | Name            |
--------------------------------------------------------
|  0 | SELECT STATEMENT             |                 |
|  1 | INDEX STORAGE FAST FULL SCAN | SYS_C0010006659 |
--------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------
1 - SEL$1
      E - FULLL
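For comparison, here is what you might run with the hint spelled correctly. This is a hedged sketch rather than output captured from LiveSQL, but you should see the FULL hint listed in the report without the E (syntax error) flag, and a full table scan in the plan:

explain plan for select /*+ FULL(t) */ * from mytab t;

SELECT * FROM table(DBMS_XPLAN.DISPLAY(FORMAT=>'BASIC +HINT_REPORT'));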


18c

Plan Stability in Oracle Database 18c Standard Edition

Oracle Database 18c Standard Edition includes a new way to control SQL execution plans. Stored outlines were deprecated in Oracle Database 11g Release 1 and SQL plan management (SPM) was only available in Oracle Database Enterprise Edition. This meant that many DBAs continued to use stored outlines in Standard Edition, and they continue to work to this day. At some point in the future we will begin to remove support for stored outlines so, to avoid leaving a hole in SE, we have made a subset of SQL plan management available from Oracle Database 18c onwards.

The license guide remains the single source of truth, so you should always refer to this for full details, but the idea is to give you functionality that's similar to stored outlines using SQL plan baselines. In SE, you can now store a single SQL plan baseline per SQL statement and you have the option to migrate stored outlines directly to SQL plan baselines.

I've provided some example SPM scripts in GitHub for Enterprise Edition and Standard Edition. The SE approach is very similar to the EE approach (as outlined in this post). If you have any questions or feedback regarding these scripts, then let me know by leaving a comment below.
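If you do decide to move off stored outlines, the migration mentioned above is driven through DBMS_SPM. Here is a hedged sketch of the call that migrates all stored outlines to SQL plan baselines (check the PL/SQL Packages reference for the full set of parameters and overloads before relying on it):

SET SERVEROUTPUT ON
DECLARE
  my_report CLOB;
BEGIN
  -- Migrate all stored outlines to SQL plan baselines and print the report
  my_report := DBMS_SPM.MIGRATE_STORED_OUTLINE(attribute_name => 'ALL');
  DBMS_OUTPUT.PUT_LINE(my_report);
END;
/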


12c

Fetch First Rows Just Got Faster

Many applications need to paginate rows fetched from the database, or at least retrieve the first N rows. In most cases the data needs to be returned in some kind of order too. If you are an old-school developer, then you are likely to use a variation on this theme:

select *
from   (select *
        from   the_table
        order by object_id)
where  rownum <= 10;

It's not pretty, but it is effective. In fact, it is very effective if the column (or columns) in the ORDER BY have a b-tree index. Oracle reads the index entries in order so that it can avoid having to sort the entire result set. This can speed things up very considerably.

If you are new-school, then this is what you probably use instead:

select *
from   the_table
order by object_id
fetch first 10 rows only;

This is much prettier, but I'm afraid it has not always been as effective. The snag has been that the optimizer did not always cost this type of query correctly. This meant that the index access 'trick' was sometimes missed, potentially resulting in a large sort.

Now for the good news: a one-off patch (22174392) is available, and it works with OFFSET too. This change can have a dramatic effect on the performance of these types of queries, so it's well worth a look if you have FETCH FIRST ROWS queries. I've uploaded a demo with before and after examples to GitHub. Comments welcome!
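Since the patch covers OFFSET as well, pagination beyond the first page uses the same syntax. A small sketch (the page size and offset values are just for illustration):

-- Third page of 10 rows, ordered by object_id
select *
from   the_table
order  by object_id
offset 20 rows fetch next 10 rows only;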


12c

How to Gather Optimizer Statistics Fast!

There are a number of ways to speed up the process of gathering optimizer statistics, but I'm not sure that it's common knowledge just how much of an effect some simple changes can make. If you have been asking yourself, "why is stats gathering taking so long and what can I do about it?", then this post is for you. If you are already familiar with the different methods of gathering optimizer statistics, you might want to jump to the end of this post where I compare them and make some recommendations.

Overview

The perception that it's difficult to speed up statistics gathering has sometimes motivated DBAs to manipulate the number of rows sampled on a table-by-table basis using the ESTIMATE_PERCENT parameter (or DBMS_STATS preference). For example, large tables may have estimate percent set to 1% and small tables, 100%. Legacy scripts play their part too: some systems are still using procedures established before the performance enhancements available with auto sample size. One of the reasons we recommend ESTIMATE_PERCENT=>AUTO_SAMPLE_SIZE is that it includes number of distinct value (NDV) optimizations that yield high performance and accurate statistics. Gathering statistics using a 1% sample of rows might complete very quickly, but inaccurate statistics are the likely result, along with sub-optimal SQL execution plans.

Instead of manipulating ESTIMATE_PERCENT, the time taken to gather statistics can be reduced by using more machine resources. This post compares some before-and-after scenarios to demonstrate how you can do this. Fully worked examples are available in GitHub. I will concentrate on using automatic optimizer statistics gathering, but the lessons are broadly applicable to manual statistics gathering too (there's an example at the end of the post). The examples are intended for use on Oracle Database 12c and Oracle Database 18c. The same techniques are applicable to Oracle Database 11g, but note that the resource consumer groups have different names in that release.

Option #1 - Default Statistics Gathering

Consider the following trace of CPU consumption over time:

It shows my CPU utilization while the automatic statistics gathering job is running and there's not much else happening on the system. Notice that about 75% of the CPU is not utilized. This fact is easy to understand once you know that the environment has a 4-core CPU with one thread per core. By default, statistics gathering uses a single process (with a single worker-thread) and this will utilize the processing power of a single CPU core. In my case, this equates to a utilization of 25% (one quarter of the 4-core chip). For systems with a higher core count, the single process will utilize an even smaller proportion of the available CPU.

Gathering statistics like this is not necessarily a problem. If stats gathering runs to completion most nights and there's no urgent need to have fresh statistics by a certain time, then there's no need to do anything more. Always keep things as simple as possible and only make changes if you need to. If your environment is large and/or volatile, the auto statistics job might regularly fail to run to completion in the batch window. In other words, the window might close before all tables considered stale have fresh statistics. If this is the case, then some tables might remain stale for a long time. Fortunately, this situation is easy to see. If you view the statistics advisor report available in Oracle Database 12c Release 2, then it will tell you.
The data dictionary stores this information too. In the example below, my batch window is 20 minutes long and the auto stats job has sometimes failed to complete (status STOPPED). The JOB_INFO column reveals the reason: auto statistics collection is occasionally taking longer than 20 minutes and terminates when the batch window closes.

How can we fix this? We could (a) make the batch window longer and/or (b) speed up statistics gathering. I am going to consider option b (because option a is less interesting). How do you speed up statistics gathering? If you have resources on your database server, then you could dedicate more of them to gathering statistics. You can reduce the elapsed time of gathering statistics at the cost of a more fully utilized database server. It is of course necessary to identify a window of time where there's spare system resource, so this solution requires that the system is not running at 100% all of the time. It is worth noting that other techniques are available to reduce the time required to maintain statistics (such as incremental statistics maintenance), but this is out of scope for the purposes of this blog post.

Option #2 - Gathering Statistics in Parallel - AUTO_DEGREE

Gathering statistics with auto sample size initiates full table scans to inspect table data. We can leverage parallel execution to make these scans complete in less time. To do this you can, for example, identify large tables and define a specific degree of parallelism (DOP):

exec dbms_stats.set_table_prefs(user, 'BIG_TABLE', 'DEGREE', 16)

There is an easier, set-and-forget approach where you can let Oracle decide on the DOP for you:

exec dbms_stats.set_table_prefs(user, 'BIG_TABLE', 'DEGREE', DBMS_STATS.AUTO_DEGREE)

A clean and simple approach is to set the preference at the global level:

exec dbms_stats.set_global_prefs('DEGREE', DBMS_STATS.AUTO_DEGREE)

With parallel execution in play, statistics gathering has the potential to consume lots of system resource, so you need to consider how to control this. When the auto stats gathering job executes, it (by default) uses the resource management plan DEFAULT_MAINTENANCE_PLAN and a consumer group called ORA$AUTOTASK. This makes it very easy to make some adjustments and control just how much resource you want to dedicate to gathering statistics.

You first need to decide what priority to attribute to auto stats gathering depending on what other processes are likely to be running at the same time. In the following example, the auto stats job has a minimum of 5% CPU if other tasks running in the database are competing for CPU. However, if the system is not busy, we will allow the job to consume up to 80% of the CPU (this will protect processes that must run outside the control of the database). The maximum degree of parallelism an individual session can use is four in this case. It is useful to control the maximum DOP because you will want to make sure that you do not reach the maximum number of parallel server processes allowed for the system (this will become more relevant later in this post).

For completeness, the example above includes all plan directives for the DEFAULT_MAINTENANCE_PLAN, but it is only necessary to specify the plan directives you want to modify. By default, when the maintenance window opens, it will activate the DEFAULT_MAINTENANCE_PLAN. If you prefer, you can create your own resource management plan and associate it with any maintenance windows of your choosing.
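The plan directive settings described above can be applied with DBMS_RESOURCE_MANAGER. This is a rough, hedged sketch of the kind of adjustment involved (5% guaranteed CPU, an 80% utilization limit and a DOP limit of four for ORA$AUTOTASK); it modifies only one directive rather than listing the whole plan, so adapt it to your own environment:

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.UPDATE_PLAN_DIRECTIVE(
     plan                         => 'DEFAULT_MAINTENANCE_PLAN',
     group_or_subplan             => 'ORA$AUTOTASK',
     new_mgmt_p1                  => 5,    -- minimum share of CPU when the system is busy
     new_utilization_limit        => 80,   -- cap when the system is otherwise idle
     new_parallel_degree_limit_p1 => 4);   -- maximum DOP per session
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/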
If you also set the RESOURCE_MANAGER_PLAN initialization parameter, then you can use the same resource management plan when the batch window is both open and closed. Here's an example:

When AUTO_DEGREE is used, the resource utilization can look very different. In this example, the tables are all identical so there's a very regular pattern:

We are now using much more CPU, and consequently the job completes in only 12 minutes and 26 seconds (where, previously, it failed to complete within the 20-minute window):

Remember that database resource management (DBRM) is in force during the batch window, so it is very easy to adjust CPU utilization even while the job is running. For example, consider what happens when I adjust the utilization limit down from 80% to 40% and then back again:

Let's look at a more realistic AUTO_DEGREE scenario. In the following example we have a schema containing tables that have a wide variation in size. The CPU profile is now less consistent:

The DOP is changing in response to the size of each individual table. The job runs serially at first (about 25% CPU), then DOP 2 for a while, then DOP 3 and then back to serial. We could micro-manage DOP on a table-by-table basis, but it is much better to avoid approaches like this because we should always aim to avoid too much manual intervention. The global AUTO_DEGREE solution will be good enough in many cases, so there will be no need for any further manual intervention.

Option #3 - Gathering Statistics Concurrently - CONCURRENT

Parallel statistics gathering has enabled us to increase CPU utilization significantly, but what if we have spare machine resources and want to go even faster? In the previous example, the CPU could be more fully utilized. If you want to achieve that, then how do you go about it? Firstly, disable parallel execution (we will come back to that later):

exec dbms_stats.set_global_prefs('DEGREE', 1)

The CONCURRENT preference allows DBMS_SCHEDULER to initiate multiple statistics gathering jobs at once, so that the database will gather statistics on multiple tables and partitions concurrently. We can choose to enable this behavior for auto stats gathering only:

exec dbms_stats.set_global_prefs('CONCURRENT','AUTOMATIC')

The database will now gather statistics using multiple scheduler jobs. In my case, auto stats initiated 16 job processes and the CPU profile looked like this:

I am using max_utilization_limit set to 80%, and the job completes in 11 minutes and 39 seconds:

Concurrent statistics gathering works very well if tables are of a similar size, but without parallel execution, serial jobs running on very large tables can take a long time and they might not complete before the batch window closes. If this is a problem for you, you can use a combination of concurrent processing and parallel execution.

Option #4 - Gathering Statistics Concurrently and in Parallel - CONCURRENT and AUTO_DEGREE

Care is required when implementing concurrency and parallel execution because there's scope to execute a very large number of concurrent parallel execution servers and generate a very high system load. Multiple jobs will start and each has the potential to initiate a number of parallel query servers. As a very general rule of thumb, you want to have no more than about 2*CPUCoreCount to 4*CPUCoreCount parallel servers executing at any one time.
You can mitigate the risk of initiating too many parallel execution servers as follows:

- Reduce the number of concurrent jobs by 'turning down' the job_queue_processes setting. There is currently no way to cap the number of job queue processes allocated to concurrent stats gathering, so this is the only way to do it. I have created an enhancement request with respect to this limitation.
- Enable concurrent stats gathering for the automatic statistics gathering job:

  exec dbms_stats.set_global_prefs('CONCURRENT','AUTOMATIC')

- Set AUTO_DEGREE globally:

  exec dbms_stats.set_global_prefs('DEGREE', DBMS_STATS.AUTO_DEGREE)

  Or, for individual large tables:

  exec dbms_stats.set_table_prefs(user, 'BIG_TABLE', 'DEGREE', DBMS_STATS.AUTO_DEGREE)

On my system, the auto stats job initiated 16 job processes and the resource management plan I used limited DOP to four. This resulted in 64 parallel execution servers executing concurrently, so a DOP limited to two might have been a better choice in my case. Here is the new CPU profile:

The job completed in 13 minutes 16 seconds:

In this case, why was there no benefit in run time using CONCURRENT and AUTO_DEGREE? It is because CONCURRENT without AUTO_DEGREE consumed CPU up to the 80% limit imposed by DBRM. In the more general case where there is a good mix of table sizes, some very large tables and a faster IO subsystem, AUTO_DEGREE used in combination with CONCURRENT has the potential to yield the shortest gather stats times. If you are in a position to decrease job_queue_processes to limit the number of jobs that execute concurrently, then you will be able to increase the DOP limit to a higher value. The same is true if there are only a small number of tables where parallelism is used.

Oracle Multitenant

At the time of writing there's a bug with parallel statistics gathering in a multitenant database when used with CONCURRENT (unpublished bug# 27249531). Parallel execution servers initiated by gather stats are not constrained by max_utilization_limit, and this can result in high CPU consumption. Using DEGREE above 1 or AUTO_DEGREE is OK if CONCURRENT is not used. Until a fix is available for your platform and version, the best solution is to use DEGREE=>1 if you want to use CONCURRENT in multitenant environments.

Manual Statistics Gathering

If you want to initiate stats gathering manually, and still make full use of parallel and concurrent settings, then you can use the following approach:

Performance Comparisons

A small test system was used for the examples above, so it will be useful to see what an enterprise-class system looks like (let's say 72 cores with HT). The Oracle Real World Performance Group ran some tests to check out the different techniques. The relative performance of the stats gathering methods will be different on every system you try, so treat this as entertainment rather than science. For example, the test tables were all large and all the same size, so this will work in favor of AUTO_DEGREE (used without CONCURRENT) because a high degree of parallelism was used for every table. A large number of CPU cores will make the default method look exceptionally underpowered. In this case the CPU is only 1.5% utilized; a single core in an enterprise-class system:

Here are the results plotted:

Summary

Remember that you will need spare machine capacity to gain benefit from the techniques outlined in this blog post. Generally speaking, option #2 is most likely to give you a quick and easy win if there are a number of very large tables.
Option #3 is great if you have plenty of spare machine resource and a large number of smaller tables. Option #4 requires more care to avoid initiating too many parallel execution servers. Options #3 and #4 are particularly useful if you need to get a one-off stats gathering task done very quickly: perhaps when you are commissioning a new deployment or gathering statistics after an upgrade. Here's a high-level summary: I've uploaded self-contained test scripts to GitHub. Comments welcome!
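To round things off, here is a hedged sketch of the manual statistics gathering approach mentioned above, combining CONCURRENT and AUTO_DEGREE for a one-off schema-level gather (the schema name is hypothetical and the settings are illustrative):

-- Allow manually-initiated gathers to run concurrent scheduler jobs
exec dbms_stats.set_global_prefs('CONCURRENT', 'MANUAL')

-- Let the database choose the DOP for each table
exec dbms_stats.set_global_prefs('DEGREE', DBMS_STATS.AUTO_DEGREE)

-- One-off gather for the whole schema
exec dbms_stats.gather_schema_stats('MYSCHEMA')

-- Revert when the one-off task is complete
exec dbms_stats.set_global_prefs('CONCURRENT', 'OFF')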


Fine-Grained Cursor Invalidation

This feature made its debut in Oracle Database 12c Release 2, and enhancements in Oracle Database 18c allow it to be used in a broader range of scenarios. SQLMaria mentioned it in a blog post, so I realized that I am overdue in producing a blog post for this cool feature.

What is Fine-Grained Cursor Invalidation?

You might be familiar with the concept of rolling or deferred cursor invalidation because DBMS_STATS uses it if you gather statistics with NO_INVALIDATE=>AUTO_INVALIDATE (this is the default). Fine-grained cursor invalidation is similar except that it is relevant when DDL commands are used to change schema objects; adding indexes or changing partitions and so on.

Consider a SALES table that has a number of SQL statements in the shared pool that access it. These statements are referred to as dependent cursors. If we make any changes to SALES (such as adding an index or refreshing statistics) then we should invalidate and re-parse dependent cursors to take these changes into account. If SALES is a 'hot' application table and it has a large number of dependent cursors, then our DDL might cause a sudden surge in hard parses. An increase in hard parse rate on a busy system can have a detrimental effect on database performance, but if we spread re-parse activity over a period of time then performance is much more likely to remain steady. Prior to Oracle Database 12c Release 2, DDL invalidated dependent cursors immediately, but with fine-grained cursor invalidation the database can often defer invalidation and spread the cost of re-parsing dependent cursors.

What is Rolling Invalidation?

A cursor marked for rolling invalidation will become eligible for re-parse after a random amount of time (up to a maximum). A newly generated random time is assigned to each dependent cursor. Randomization therefore spreads the overhead of re-parsing dependent cursors over time.

How does Fine-Grained Cursor Invalidation Work?

During cursor compilation the database annotates cursors with fine-grained dependency information (from Oracle Database 12c Release 2 onwards). For example, "this cursor reads index SALES_IDX" or "this cursor does static partition pruning and only accesses partitions in the range [P1, P10]" or "this cursor does a partition-related optimization that depends on the partition scheme of the tables involved, e.g. partition-wise join". Commonly-used DDLs define a set of rules that consult each cursor's fine-grained dependencies to decide how the cursor should be processed. There are four actions that we can take for a cursor:

1. Cursor remains valid -- this is used when we know that the cursor is safe to execute and that the cursor's plan is still optimal. For example, modifying a table partition to read only.
2. Cursor is marked for rolling invalidation -- this is used when we know that the cursor is safe to execute but its plan may be sub-optimal. For example, if we create a visible index, then the cursor is safe to execute but we want to recompile it sometime in the future so the new index can be considered for the plan.
3. Cursor is marked for rolling invalidation with metadata refresh -- this is used when the cursor is safe to execute after refreshing some of its metadata at runtime. For example, if we rebuild an index that is used by the cursor, then the cursor can be executed if we refresh the metadata for the index so the cursor uses the new index segment.
4. Cursor is invalidated -- this is used if the cursor is no longer safe to execute.
For example, if we drop an index that is used by the cursor, we must invalidate the cursor so it is recompiled to get a new plan that does not use the index.

Note that actions 1, 2, and 3 correspond to the DDL_NO_INVALIDATE, IS_ROLLING_INVALID, and IS_ROLLING_REFRESH_INVALID columns in V$SQL. Action 4 is simply the existing invalidation action that was used prior to the fine-grained invalidation feature. Action 1 is enabled by default; i.e. we will use action 1 when applicable without any syntax or parameter required. Actions 2 and 3 are only used if the DEFERRED INVALIDATION syntax is used, or if the CURSOR_INVALIDATION parameter is set to DEFERRED (see below).

Summary

Fine-grained cursor invalidation reduces the risk of performance degradation associated with DDL operations on objects in your application schema.

More

As mentioned above, the feature is controlled using DDL statements that have an additional DEFERRED INVALIDATION clause (for example, ALTER TABLE). There is also a CURSOR_INVALIDATION initialization parameter. More detail is available in the SQL Tuning Guide. Note that we have not documented the multitude of factors that affect where this feature will or will not kick in. This is where the differences between Oracle Database 12c and Oracle Database 18c are, so the changes between these two releases will not be obvious from the documentation. Nevertheless, if you want to experiment, I have linked to some scripts below that will give you a good place to start. You will see that DML cursors are treated differently to queries because queries are generally less sensitive to change than DML. A SQL statement's eligibility for deferred invalidation also depends on the access method (e.g. whether its SQL execution plan uses a modified index) and on whether partitions are statically pruned or accessed via KEY.

Big thanks to George Eadon for putting together the 'how does it work?' section. If you want to try this feature out, I've uploaded examples to GitHub. The tests are in a very raw state, but I explain that in the README file.
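If you would like to see the actions reflected in V$SQL for yourself before digging into the scripts, a hedged sketch along these lines is a reasonable starting point (the table and index names are hypothetical):

-- Ask for deferred (fine-grained) invalidation explicitly on the DDL
alter index sales_idx rebuild online deferred invalidation;

-- Or make deferred invalidation the session default for eligible DDL
alter session set cursor_invalidation = deferred;

-- Check how the dependent cursors were treated
select sql_id, child_number, ddl_no_invalidate,
       is_rolling_invalid, is_rolling_refresh_invalid
from   v$sql
where  sql_text like '%SALES%';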


ADWC

The Oracle Optimizer and ADWC - Hints

This is Part 3 of a series on the Oracle Optimizer in the Oracle Autonomous Data Warehouse Cloud. You can find part 1 here and part 2 here. It's time to take a look at optimizer hints. Here's our test query:

select sum(t1.num), sum(t2.num)
from   table1 t1
join   table2 t2 on (t1.id = t2.id);

Executing on an ADW database (using the LOW consumer group) yields this plan:

----------------------------------------------------------------------------------------
| Id | Operation             | Name                       | Rows  | Bytes | Cost (%CPU) |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT      |                            |       |       |     4 (100) |
|  1 |  RESULT CACHE         | 86m6ud7jmfq443pumuj63z1bmd |       |       |             |
|  2 |   SORT AGGREGATE      |                            |     1 |    52 |             |
|* 3 |    HASH JOIN          |                            |     1 |    52 |     4   (0) |
|  4 |     TABLE ACCESS FULL | TABLE2                     |     1 |    26 |     2   (0) |
|  5 |     TABLE ACCESS FULL | TABLE1                     |  1000 | 26000 |     2   (0) |
----------------------------------------------------------------------------------------

There are of course no indexes on the tables, so this is the best plan (we get a single row from TABLE2 so it leads the HASH join). I will now try to make the plan worse using a hint :-)

select /*+ LEADING(t1 t2) USE_NL(t2) */ sum(t1.num), sum(t2.num)
from   table1 t1
join   table2 t2 on (t1.id = t2.id);

This doesn't work - the plan does not change. Take my word for it for now; there is a link to test scripts at the bottom of this post. Autonomous Data Warehouse Cloud ignores optimizer hints and PARALLEL hints in SQL statements by default. If your application relies on them, you can set OPTIMIZER_IGNORE_HINTS to FALSE at the session or system level using ALTER SESSION or ALTER SYSTEM. You can also enable PARALLEL hints in your SQL statements by setting OPTIMIZER_IGNORE_PARALLEL_HINTS to FALSE at the session or system level. For this example, I used ALTER SESSION to give me the sub-optimal plan I wanted (TABLE1 is now the leading table and it's a NESTED LOOPS join):

alter session set optimizer_ignore_hints = false;

select /*+ LEADING(t1 t2) USE_NL(t2) */ sum(t1.num), sum(t2.num)
from   table1 t1
join   table2 t2 on (t1.id = t2.id);

----------------------------------------------------------------------------------------
| Id | Operation             | Name                       | Rows  | Bytes | Cost (%CPU) |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT      |                            |       |       |    73 (100) |
|  1 |  RESULT CACHE         | db11srrdf8ar4d06x4b1j674pp |       |       |             |
|  2 |   SORT AGGREGATE      |                            |     1 |    52 |             |
|  3 |    NESTED LOOPS       |                            |     1 |    52 |    73   (3) |
|  4 |     TABLE ACCESS FULL | TABLE1                     |  1000 | 26000 |     2   (0) |
|* 5 |     TABLE ACCESS FULL | TABLE2                     |     1 |    26 |     0   (0) |
----------------------------------------------------------------------------------------

Why is ADWC set up like this? It's pretty simple: the Oracle Optimizer's job is to find good SQL execution plans without manual intervention. It is not the application developer's or DBA's job, so hints should be avoided as much as possible. Over time, they can prevent applications from taking advantage of new optimization techniques, so try and leave the heavy-lifting to the database. Think autonomous. If you looked at part 1 of this series, then you will know that we are careful with this restriction and allow INSERT /*+ APPEND */ by default. To try this example for yourself, it's uploaded to GitHub. Comments and suggestions welcome!
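If you are not sure how a session is currently configured, a quick hedged check of the two parameters mentioned above looks like this:

select name, value
from   v$parameter
where  name in ('optimizer_ignore_hints', 'optimizer_ignore_parallel_hints');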


ADWC

The Oracle Optimizer and ADWC - Statistics-Based Query Transformation

This is Part 2 of a series on the Oracle Optimizer in the Oracle Autonomous Data Warehouse Cloud. You can find Part 1 here. In this post, I'm going to cover a clever optimization included in ADWC: Statistics-Based Query Transformation. I'm sure that you can guess what it is already, but let's take it step-by-step. Consider the following query:

SELECT max(sale_value)
FROM   huge_fact_table;

To find MAX, the Oracle Database would normally need to scan all of the rows in the huge fact table to find the maximum value (I'm assuming that there's no index on SALE_VALUE). This scan can be very time-consuming, but there is a potential short-cut. When statistics are gathered on a table, the database retains information on how many rows there are as well as details of column data such as minimum and maximum values and the number of distinct values (NDV). If the database is sure that statistics are up-to-date, then it is clearly in a position to answer some queries using the information it has already gathered. Here is an example I executed on ADWC using the HIGH consumer group:

select max(num0),min(num1) from fact1;

---------------------------------------------------------------
| Id | Operation                | Name                       |
---------------------------------------------------------------
|  0 | SELECT STATEMENT         |                            |
|  1 |  RESULT CACHE            | cpuu29wy174jfbu3a7zcxzkw12 |
|  2 |   VIEW                   | VW_SQT_65BBF4BE            |
|  3 |    SORT AGGREGATE        |                            |
|  4 |     PX COORDINATOR       |                            |
|  5 |      PX SEND QC (RANDOM) | :TQ10000                   |
|  6 |       SORT AGGREGATE     |                            |
|  7 |        PX BLOCK ITERATOR |                            |
|* 8 |         TABLE ACCESS FULL| FACT1                      |
---------------------------------------------------------------

The first thing of note is that we use the server result cache, but I'm going to stick to the topic at hand and concentrate on operation Id 2: VW_SQT_65BBF4BE. This tells us that the statistics-based query transformation has kicked in and, potentially, the values of max(num0) and min(num1) can be retrieved from the statistics stored in the data dictionary. We are in a position to avoid the need to scan FACT1.

The word 'potentially' might sound rather vague, but bear in mind that the query transformation is established when the query is parsed, so when it is executed it is possible that the data in FACT1 has been updated in some way. This will prevent us from being able to use the dictionary statistics because the data might no longer match the statistics. In ADWC, the easiest way to see that statistics have actually been used is to observe the much-reduced query response time.

As you might expect, there are restrictions regarding when this transformation can and cannot be used. I've included some queries in the GitHub repository that use the transformation and some that don't. Hopefully, this will be enough to get you started if you want to explore the boundaries. There's also a spool file if you can't run the example yourself. Part 3 of this series is here. As ever, comments welcome!
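To see the raw material the transformation relies on, you can look at the column statistics themselves. A hedged sketch (note that LOW_VALUE and HIGH_VALUE are stored in an internal raw format):

select column_name, num_distinct, low_value, high_value
from   user_tab_col_statistics
where  table_name = 'FACT1'
and    column_name in ('NUM0', 'NUM1');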


ADWC

The Oracle Optimizer and ADWC - Statistics and Bulk Load

It's time for some posts on the Oracle Autonomous Data Warehouse Cloud and the enhancements we've made to Statistics Management and the Oracle Optimizer. This is Part 1, and it covers statistics maintenance during bulk load. I'll add links to the other posts when they become available. My scripts on the topic of autonomous are stored in GitHub here. The scripts for this post are here.

Statistics and Bulk Loading

Consider an empty fact table called FACT1. Let's populate it with 10,000 rows using a bulk transform and load operation from FACT1_SOURCE:

SQL> insert /*+ APPEND */ into fact1 select num0,1,txt1 from fact1_source;

10,000 rows inserted.
Elapsed: 00:00:00.519

-- Take a look at stats...
select table_name,num_rows,sample_size,stale_stats
from   user_tab_statistics
where  table_name = 'FACT1';

TABLE_NAME                     NUM_ROWS   SAMPLE_SIZE STALE
------------------------------ ---------- ----------- -----
FACT1                          10000      10000       NO

We can see that there are valid statistics on FACT1, and this will come as no surprise if you are familiar with online statistics gathering. Now let's bulk load another 10,000 rows...

SQL> insert /*+ APPEND */ into fact1 select num0,2,txt1 from fact1_source;

10,000 rows inserted.
Elapsed: 00:00:00.414
...

TABLE_NAME                     NUM_ROWS   SAMPLE_SIZE STALE
------------------------------ ---------- ----------- -----
FACT1                          20000      20000       NO

This is new behavior: ADWC maintains statistics on each subsequent direct path load. The demo script continues to insert more rows (from FACT1 so that we rapidly increase the row count) until finally...

SQL> insert /*+ APPEND */ into fact1 select num0,16,txt1 from fact1;

81,920,000 rows inserted.
Elapsed: 00:00:49.970
...

TABLE_NAME                     NUM_ROWS   SAMPLE_SIZE STALE
------------------------------ ---------- ----------- -----
FACT1                          163840000  163840000   NO

What about histograms?

SQL> select table_name,column_name,low_value,high_value,sample_size,histogram
  from user_tab_col_statistics
  where table_name = 'FACT1';

TABLE_NAME  COLUMN_NAME  LOW_VALUE  HIGH_VALUE      SAMPLE_SIZE HISTOGRAM
----------- ------------ ---------- --------------- ----------- ---------------
FACT1       NUM0         C102       C302            163840000   HYBRID
FACT1       NUM1         C102       C111            163840000   FREQUENCY
FACT1       TXT1         58585831   58585839393939  163840000   HYBRID

We have histograms too!

Summary

I connected to ADWC using the HIGH consumer group and within two or three minutes I populated FACT1 with over 160 million rows using a series of bulk load operations. Statistics (including histograms) were maintained automatically. Not bad! Part 2 of this series is here.
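If you want to confirm that a particular direct path load maintained the statistics, you can look at the plan of the INSERT itself. Here is a minimal sketch using the same demo tables as above (run with SERVEROUTPUT off so the plan shown is for the insert):

insert /*+ APPEND */ into fact1 select num0, 1, txt1 from fact1_source;

-- The plan for the insert should include an OPTIMIZER STATISTICS GATHERING
-- row source when statistics were gathered online during the load
select * from table(dbms_xplan.display_cursor());

commit;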


How do I

How to Generate a Useful SQL Execution Plan

Introduction There is an old joke the unreconstructed comic Dave Allen used to tell, where a traveler asks a passer-by for directions to a particular town and the passer-by simply says, “Well I wouldn’t start from here if I were you.” When it comes to SQL execution plans, if you start from the wrong place, then you probably won't make it to your destination. The purpose of this blog post is to take stock for a moment and present what I consider to be the best 'default' methods for collecting SQL execution plans. This post is intended for those of you that don't have an established method already and want to make sure that you capture something that is actually useful. To clarify what I mean by 'useful': I mean a plan that will help you to learn how SQL execution plans work (if you don't know already) and one that is suitable for figuring out if there is a problem that makes the SQL statement take longer to execute than it should. A SQL execution plan reveals a great deal about how the Oracle Database plans to execute (or has executed) a SQL statement. Do you need to understand SQL execution plans to be an effective Oracle Database expert? No - but most of us like to learn new things, and it's fun to take a look inside the machine sometimes. There's a lot of opinion in the post, so remember that comments are very welcome. Yet Another Article on SQL Execution Plans? I know that there are a LOT of articles and chapters in books about generating SQL execution plans. There is no single 'right way', but I want to distill things down to a few cases that will be good-to-go in most scenarios. Why? Well, I get sent quite a large number of SQL execution plans, and I often find myself wishing for that vital piece of information that's missing from the plan I've been sent. In addition, there seem to be some blind spots – useful methods that are often mentioned but often missed. Finally, when I wanted to learn to read plans myself, I found it confusing and frustrating until I realized that there's a lot of incredibly helpful information provided by the Oracle Database, but you won't see it if you don't ask for it! It is perhaps easy to believe that you are the only one to think that SQL execution plans are difficult to understand. Often they are difficult to understand – their sheer size can be daunting. Some are almost impossible to evaluate if certain details are missing. They can be confusing because some query transformations and operations will result in reported numbers (such as Rows) being at odds with what you might expect. This won't prevent you from understanding how queries are executed, but when you start out, it can give you some tough hurdles to leap. The examples below generate a lot of information that is useful but potentially overwhelming (and probably unnecessary at first). Nevertheless, the information is broken down into sections (or available through an Enterprise Manager UI) so it is easy to digest piecemeal or simply ignore until you want to consider it. I have not listed the output of all the examples below because it would take up too much space, so I uploaded some self-contained scripts to GitHub.
Examples Here are my suggestions … Example A If you can run the query stand-alone using (for example) SQL Plus or SQLcl: select e.ename,r.rname from   employees  e join   roles       r on (r.id = e.role_id) join   departments d on (d.id = e.dept_id) where  e.staffno <= 10 and    d.dname in ('Department Name 1','Department Name 2'); SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALL +OUTLINE')); Or, if you don’t want to execute the query: explain plan for select e.ename,r.rname from   employees  e join   roles       r on (r.id = e.role_id) join   departments d on (d.id = e.dept_id) where  e.staffno <= 10 and    d.dname in ('Department Name 1','Department Name 2'); SELECT * FROM table(DBMS_XPLAN.DISPLAY (FORMAT=>'ALL +OUTLINE')); The important feature of this example is that I am using FORMAT=>'ALL +OUTLINE'. Some of you might have come across the undocumented option, FORMAT=>'ADVANCED'. I am not using it here because the content of its output has the potential to be different between releases, but there's no fundamental reason why you can't use it. The 'ALL' format is documented and 'OUTLINE' is mentioned briefly; its basic content is unlikely to change between releases. Example B If you cannot run a query stand-alone, you can still get plan information from the cursor cache using a query like this: SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(   SQL_ID=>'the_SQL_ID',   CHILD_NUMBER=>the_child_number,   FORMAT=>'ALL +OUTLINE')); You will need the SQL_ID and CHILD_NUMBER of the query you want. There are many ways of doing this, but if you have DBA privilege then you can search for the statement in V$SQL: select /* MY_TEST_QUERY */ e.ename,r.rname from employees e join roles r on (r.id = e.role_id) join departments d on (d.id = e.dept_id) where e.staffno <= 10 and d.dname in ('Department Name 1','Department Name 2'); select sql_id, child_number, sql_text from v$sql where sql_text like '%MY_TEST_QUERY%' and sql_text not like '%v$sql%'; The plans above do not include any runtime information, so you will not see how long each part of the plan took to execute or how many rows were actually processed. For example, 'Rows' is an estimate; it does not tell you how many rows were actually processed. If you gather and examine runtime information, it is likely that your level of understanding will be enhanced significantly. How do you go about getting it? Example C You can use a hint to gather runtime information: select /*+ gather_plan_statistics */        e.ename,r.rname from   employees  e join   roles       r on (r.id = e.role_id) join   departments d on (d.id = e.dept_id) where  e.staffno <= 10 and    d.dname in ('Department Name 1','Department Name 2'); SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST ALL +OUTLINE')); This will show you statistics such as the actual number of rows processed (A-Rows), rather than just the estimates (E-Rows). It also includes a column called Starts, which tells you how many times each step was executed. A-Rows, E-Rows and Starts are all incredibly useful if you want to understand a plan. 
Example D If you don’t want to change the query text to add the hint, there is a parameter you can set instead: alter session set statistics_level='ALL'; select e.ename,r.rname from employees e join roles r on (r.id = e.role_id) join departments d on (d.id = e.dept_id) where e.staffno <= 10 and d.dname in ('Department Name 1','Department Name 2'); SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST ALL +OUTLINE')); Example E DBMS_XPLAN 'ALLSTATS LAST' does not give you a continuous view of runtime statistics while a query is executing, but SQL Monitor solves this problem. It requires the Oracle Tuning Pack, so always check the licence user guide for your database version. This tool is fantastic for generating plans and monitoring SQL, and it is available via Enterprise Manager in the Performance Hub. Before I cover that, you can use it on the command line too (a fact that is often missed or forgotten for some reason): select /*+ MONITOR */ e.ename,r.rname from employees e join roles r on (r.id = e.role_id) join departments d on (d.id = e.dept_id) where e.staffno <= 10 and d.dname in ('Department Name 1','Department Name 2'); -- Get the SQL ID of the query we just executed select prev_sql_id from v$session where sid=userenv('sid') and username is not null and prev_hash_value <> 0; PREV_SQL_ID ------------- an05rsj1up1k5 set linesize 250 pagesize 0 trims on tab off long 1000000 column report format a220 select DBMS_SQL_MONITOR.REPORT_SQL_MONITOR (sql_id=>'an05rsj1up1k5',report_level=>'ALL') report from dual; The SQL_ID parameter is optional, but I usually set it explicitly because there might be multiple long-running queries in the system, so the default report will sometimes pick up a different SQL statement to the one I am experimenting with. The database automatically makes long-running queries available to SQL Monitor, but I used a MONITOR hint in this case because the query is very fast and wouldn't normally show up. It can be useful to monitor a query while it is executing because you can watch its progress and learn from that. This is where SQL Monitor is really useful because you can watch a query in another session and see its statistics updating continuously. You don’t necessarily have to wait for it to complete to figure out what part of the query is taking a long time, so you can sometimes avoid having to wait for completion. Note that you can get 'ALL +OUTLINE' plan details while a query is executing - just use Example B, above. You can even generate an active HTML report using the command line! This is a great way to capture a SQL execution plan and explore it interactively later on. Just run the report like this: -- spool output to a file, then… select DBMS_SQL_MONITOR.REPORT_SQL_MONITOR (sql_id =>'an05rsj1up1k5', report_level =>'all', type =>'ACTIVE') report from dual; If you spool the output and open it in a browser, you get an interactive HTML page like this: Bear in mind that the browser requires Internet access because the HTML report downloads some external assets. Example F I know that many of you love the command line (and I am the same) but you should check out using SQL Monitor in the Oracle Enterprise Manager Performance Hub. It’s much easier to access interactive SQL Monitor reports and they will refresh continuously as query execution progresses. In addition, it is easy to save these reports and send them to others. Just use the Save button (circled in red, below). 
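Going back to the command-line active report in Example E, here is a minimal sketch of the full spool sequence for capturing the interactive HTML report to a file. The SQL ID and file name are just the ones used in this example:

set pagesize 0 linesize 250 trims on tab off long 1000000 longchunksize 1000000
set echo off feedback off
spool sqlmon_active.html

select dbms_sql_monitor.report_sql_monitor(
         sql_id       => 'an05rsj1up1k5',
         report_level => 'ALL',
         type         => 'ACTIVE') report
from dual;

spool off

Open the spooled file in a browser (with Internet access, as noted above) to explore the report interactively.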
If you hit the 'Plan' tab, it can be enlightening to look at a graphical view if the plan is not too large. I like to select 'Rotate' to give me a tree that is oriented vertically. Aha - now I can see what the left side and right side of a join actually mean! Very broadly speaking, you read trees from the bottom left up. I might blog about this later. In the following example, and in common with the examples above, the database reads DEPARTMENTS first, then joins the rows with EMPLOYEES and then joins these rows with ROLES. Example G Finally, there is SQL Developer too! With DBMS_XPLAN: Summary If you want to save and share a plan, then... More Check out the self-contained test scripts for this post. If you want more detail and more options for looking at plans, then check out Maria’s blog posts on DBMS_XPLAN and SQL Monitor. If you want to generate plans and send a runnable test case to someone else, then check out Test Case Builder and the Oracle Support tool SQLT. Comments and @vldbb welcome!


Should You Gather System Statistics?

Introduction Should you gather system statistics? If you deploy Oracle Databases then you will have an opinion on the answer, but what does Oracle recommend? Before I get to that, I'll present a brief survey of existing Oracle collateral and then cover how system statistics affect SQL execution plans. If you want to skip directly to the point, then check out the recommendation section, below. A Brief Survey of Existing Recommendations Oracle white papers and documentation include some recommendations, but there are some differences and there has been some variation over time. Oracle introduced DBMS_STATS.GATHER_SYSTEM_STATS back in 2001 with Oracle Database 9i, and in the Database Performance Guide and Reference it clearly states that gathering system statistics is highly recommended. Later versions of the documentation have been, until recently, a little more equivocal. Oracle added an EXADATA option to take into account the characteristics of Oracle Exadata Database Machines. Oracle Optimizer white papers, on the other hand, recommend using system statistics defaults and not gathering them manually. Recent updates to the documentation have aligned with this message. If system statistics are not gathered, the optimizer measures CPU speed when the database starts and IO costing is calculated using default metrics. How System Statistics Affect SQL Execution Plans The concept behind system statistics is to measure the performance of system CPU and the storage subsystem (such as NAS, SAN, JBOD or flash) and use this information when costing alternative SQL execution plans. A query consumes CPU and (in many cases) storage resources when it executes. A typical query will have many potential SQL execution plans and each plan may consume a different proportion of CPU versus storage IO. The Oracle Optimizer’s job is (broadly speaking) to find the plan that is estimated to take the least amount of time to execute. In reality, it compares alternative plans using an internal metric called cost. The optimizer chooses the plan with the lowest estimated cost (out of all the plans it considers). If the database knows how fast the storage and CPU actually are, then it can make finer judgements about the cost of each alternative plan. Consider a query that has three possible plans. Each plan uses a different amount of CPU and IO. The diagram below illustrates that Plan 1 is expected to use a large amount of CPU and very little IO. For the sake of this example, we will assume that Plan 1 is deemed to have the lowest estimated cost and has duly been chosen by the optimizer. Consider now a system that has a particularly high performance storage infrastructure. It has the potential to shift the relative performance balance between the CPU and IO. Gathering system statistics will allow the optimizer to take this into account. In our example, the presence of high performance storage lowers the relative cost of Plan 2 and Plan 3 significantly, and Plan 1 by a small amount (because it uses less IO). This shift is enough to make Plan 3 the best choice; it now has the lowest estimated cost. On systems with fast IO infrastructure, we can expect the use of system statistics to tip the balance a little towards using table scans in preference to indexes. Recommendation If gathering system statistics benefits your workload and you are happy to manage them, then you have no reason to change.
Alternatively, if you are at a decision point and you need to choose whether to gather them or not, then in most cases you should use the defaults and not gather system statistics. There is an exceptional case to consider. Databases supporting a pure data warehouse workload on an Oracle Exadata Database Machine can benefit from system statistics gathered using the EXADATA option. It will make table scans more likely and, of course, this plays directly into one of the architecture’s major strengths. Nevertheless, if the workload is mixed or you are not in a position to test the effect of using EXADATA system statistics, then stick to the defaults even on this platform. It is important to remember that if you change the way you use system statistics, then this can affect SQL execution plans. For this reason, you should only make a change if you are at a decision point and you are able to test the effect of the change (or at least in a position to mitigate the risk of undesirable changes in workload performance). Justifying the Recommendation The recommendation is always going to be controversial. I am well aware that some DBAs and architects are very happy using system statistics and consider them highly beneficial. There is nothing wrong with this position and there are undoubtedly cases where system statistics have tipped the balance and improved the performance of some queries. However, this tipping point is not in the same place for all queries, so it is quite possible that it will not work out well for some. It is also possible that a particular set of system statistics will not benefit all workloads handled by the database (such as ETL/ELT and OLTP). The net result is that gathering system statistics is unlikely to be the silver bullet that delivers ideal performance across the board in a consistent manner: you might still have to tune some queries. Of course, you probably never expected system statistics to be a silver bullet, but there is at least some management or procedural overhead required to maintain them and, depending on how you use them, they can introduce some variability in SQL execution plans (between different systems or over time on a single system). The recommendation allows you to avoid potential plan variability and management overhead. If you are testing a workload where gathering system statistics has improved performance, it is worth spending some time to find out why. It is unlikely that there has been an across-the-board improvement. It is more likely that the performance of a small number of significant queries has improved, or perhaps there is a group of queries with a common pattern (maybe accessing a particular set of tables or using a particular combination of predicates). You might find that it is better to free yourself from managing system statistics and, instead, use the tools that Oracle provides you with to tune the queries that are not performing as well as you want. To the best of my knowledge, relevant MOS notes and the docs have been aligned with this message. If you spot anything contradicting this, then please post a comment. All comments welcome!
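For reference, here is a minimal sketch of the two DBMS_STATS calls relevant to this recommendation: gathering EXADATA system statistics for the exceptional case described above, and removing manually gathered system statistics so that the defaults are used again. Test the effect on execution plans before doing either on a production system.

-- Exceptional case: pure data warehouse workload on Exadata (test first)
exec dbms_stats.gather_system_stats('EXADATA')

-- Revert to the default system statistics
exec dbms_stats.delete_system_stats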


Gathering Statistics Only When Stale

Oracle recommends that you use the Automatic Statistics Gathering Job to maintain optimizer statistics, but sometimes you might want to gather statistics on a select number of tables rather than (for example) all tables in a particular schema. Perhaps you want to make sure that stats are fresh on a group of tables when a batch run has completed, or maybe at key points during the run. In situations like this, DBAs don't always have the luxury of knowing exactly which tables will have stale statistics and which ones won't, so it can be very useful to have a method for gathering statistics on some tables, skipping the ones that don't have stale stats. Is there an easy way to do that? Of course, 'easy' is a subjective term, but I think you will like the technique I've outlined below. If you are using Oracle Database 12c or later, you might assume that something like this will work:

exec dbms_stats.gather_table_stats(user, 't1',options=>'gather auto')
exec dbms_stats.gather_table_stats(user, 't2',options=>'gather auto')
exec dbms_stats.gather_table_stats(user, 't3',options=>'gather auto')

It doesn't work because 'GATHER AUTO' in the context of GATHER_TABLE_STATS is for creating additional statistics (such as histograms) after a bulk load operation has taken advantage of online statistics gathering. When I was talking to the folks in the Real World Performance Group, a neat solution was presented. They told me that they use GATHER_SCHEMA_STATS with 'GATHER AUTO' to implement something like this:

DECLARE
   filter_lst  dbms_stats.objecttab := dbms_stats.objecttab();
BEGIN
   filter_lst.extend(2);
   filter_lst(1).ownname := 'hr';
   filter_lst(1).objname := 'employees';
   filter_lst(2).ownname := 'hr';
   filter_lst(2).objname := 'departments';
   dbms_stats.gather_schema_stats(
           ownname         => 'hr',
           obj_filter_list => filter_lst,
           options         => 'gather auto');
END;
/

The 'GATHER AUTO' option in the context of GATHER_SCHEMA_STATS tells the database to skip tables and partitions that do not have stale statistics. If you want to try it out for yourself, I've uploaded a couple of examples to GitHub. There is a test case for both partitioned and non-partitioned tables. In 2012, Maria Colgan wrote about using obj_filter_list in the context of concurrent stats gathering. You can check out that blog post here. It includes some additional information on the specifics of obj_filter_list too. As always, if you have any corrections or suggestions for improvements, leave a comment below.
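If you want to see which tables the call will actually touch, you can check the staleness flag beforehand. Here is a minimal sketch using the HR tables from the example above:

-- Make sure the staleness information is current
exec dbms_stats.flush_database_monitoring_info

select table_name, stale_stats
from   dba_tab_statistics
where  owner = 'HR'
and    table_name in ('EMPLOYEES', 'DEPARTMENTS');

Tables showing STALE_STATS = 'NO' will simply be skipped by the GATHER AUTO call.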


The Oracle 12.1.0.2 October 2017 BP and the Adaptive Optimizer

A comment on my previous post indicated that it might help if I clarify the behavior of the adaptive optimizer settings in the Oracle 12.1.0.2 bundle patches (from October 2017 onwards). MOS Note 2312911.1 and Mike Dietrich's blog tell you what you need to know to get the changes up and running. Nevertheless, from the questions I receive, I know that some of you like to dig a bit deeper and take a closer look at what's going on under the hood. If that's you, then I'm going to go over what you can expect to see. If you just want to know what optimizer adaptive parameters to set (with or without the new optimizer behavior enabled) then you can jump to the bullet list at the end of this post. The Oracle 12.1.0.2 October 2017 and January 2018 BPs include some changes to the adaptive optimizer. It is worth knowing that Oracle has a strict policy precluding the inclusion of most optimizer fixes in BPs. The policy allows you to apply BPs without changing SQL execution plans, and it is why the adaptive changes are not active by default (although there are exceptions covered shortly). Check out MOS Note: 1962125.1 – Overview of Database Patch Delivery Methods if you want to know more about patching policies. Note that the policy for patch inclusion and activation for Windows is different, so I'm covering activation for non-Windows systems in this post. The difference is covered in the MOS note and another blog from Mike. Here I'm going to concentrate on the patch for bug 22652097. It is included in the BP and allows you to control adaptive features in the same way as Oracle Database 12c Release 2 (by adding the database parameters optimizer_adaptive_plans and optimizer_adaptive_statistics). The adaptive optimizer changes included in the BP are not active by default unless you've previously applied a matching patch. For example, if you applied 22652097 before the October BP, the BP will detect this and keep the changes active. In this way, you can continue to use the new parameters post-BP without interruption. From this point on, I will assume that patch 22652097 was not applied before the BP. This means that the adaptive changes will not be active by default. Note also that I'm going to ignore the optimizer_adaptive_reporting_only parameter here because it is not relevant to this discussion. Once the BP is applied, the new adaptive optimizer behavior can be enabled using ‘fix control’ 26664361 (Mike Dietrich’s blog post shows you how). The first thing I want to clarify is that you will see two new database parameters even if you have not enabled the adaptive feature changes with fix control 26664361:

SQL> show parameter optimizer_adaptive

NAME                                 TYPE        VALUE
------------------------------------ ----------- -------
...
optimizer_adaptive_plans             boolean     TRUE
optimizer_adaptive_statistics        boolean     TRUE
...

These parameters are intended to replace optimizer_adaptive_features but only if you choose to use them. They will be TRUE or FALSE depending on whether optimizer_adaptive_features is TRUE or FALSE, but they are benign if you don’t enable them and don't set them explicitly. In other words, it is safe to apply the BP, leave the fix control unset and then continue to use optimizer_adaptive_features (OAF) in the same way as before. The behavior remains the same and we obey the BP policy. If you set fix control 26664361 to activate patch 22652097, the database will acquire the Oracle Database 12c Release 2 defaults.
This is what you will see if you have not set any optimizer_adaptive_* parameters:

SQL> show parameter optimizer_adaptive

NAME                                 TYPE        VALUE
------------------------------------ ----------- -------
...
optimizer_adaptive_plans             boolean     TRUE
optimizer_adaptive_statistics        boolean     FALSE
...

You should be aware of a potentially confusing issue here. The parameter values will be reported incorrectly if you start a database instance and use the same session to view them. Here is an example of what you will see if you have not set any optimizer_adaptive_* parameters:

SQL> startup
...don't disconnect here...
SQL> show parameter optimizer_adaptive

NAME                              TYPE        VALUE
--------------------------------- ----------- -------
...
optimizer_adaptive_plans          boolean     TRUE
optimizer_adaptive_statistics     boolean     TRUE [instead of FALSE]
...

The correct values are shown if you exit the startup session, reconnect, and then use show parameter. What happens if you have not activated the adaptive changes with _fix_control but go ahead and set the new parameters anyway? The database will not stop you from doing this, but it is not recommended. Setting the new parameters explicitly will override both optimizer_adaptive_features and the fix control settings. For example, if optimizer_adaptive_features is FALSE and optimizer_adaptive_statistics is set to TRUE, the database will assume that you want to use adaptive statistics even if you have not set the fix control. Combining the old and the new approach is confusing and it is even possible to configure illogical combinations of functionality. So, assuming that you did not install 22652097 before the BP, this is the advice:

- If you have not set fix control 26664361 to enable the patch for bug 22652097, then you should continue to control adaptive features using optimizer_adaptive_features.
- If you want to use the adaptive feature settings available in Oracle Database 12c Release 2, set _fix_control as per MOS Note 2312911.1 or Mike’s blog. Then, if you want to use the recommended defaults, you should set optimizer_adaptive_statistics=FALSE [**see below] and leave optimizer_adaptive_plans unset (or set to TRUE).
- If you do not want to use the recommended defaults, change the adaptive feature settings using optimizer_adaptive_plans and/or optimizer_adaptive_statistics.
- If you did install 22652097 before the BP, then I’m assuming here that you will be using the new parameters already.

**Note - For both the October and January BP, the intention was to allow you to leave optimizer_adaptive_statistics unset and it would take the default value FALSE. However, leaving this parameter unset enables adaptive statistics in some deployments even though the parameter will show the default value of FALSE. This issue only occurs if the _fix_control is set to enable the new 12.2 parameter settings. This is unpublished bug 27626925 and our intention is to include a fix in BPs ASAP. You can avoid the issue by explicitly setting optimizer_adaptive_statistics in the pfile/spfile rather than leaving it to default.
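Putting the advice above into practice, this is roughly what the recommended-default configuration looks like once the fix control from MOS Note 2312911.1 has been set. This is a sketch, not a substitute for the MOS note, and it assumes the instance uses an spfile:

-- Set the recommended default explicitly rather than leaving it unset,
-- which avoids unpublished bug 27626925 (see the note above)
alter system set optimizer_adaptive_statistics = false scope=both;

-- optimizer_adaptive_plans defaults to TRUE; setting it explicitly is optional
alter system set optimizer_adaptive_plans = true scope=both;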


What Adaptive Feature Settings Should You Use In Oracle Database 12c?

Happy new year everyone! Let's start the year with a simple question: what adaptive feature settings does Oracle recommend for the optimizer in Oracle Database 12c? I've covered this a couple of times in the past (links below) but I've tended to dive into the details and unintentionally buried the short answer. Here's the Short Answer Oracle Database 12c Release 2 For the majority of systems: use the defaults. There is no need to specifically set optimizer adaptive feature parameters. By default, adaptive statistics are disabled and adaptive plans are enabled. If you are in a position to test the benefits of adaptive statistics on your system, then it's something to consider using of course. Oracle Database 12c Release 1 If you have a production environment that's not encountering performance issues related to the adaptive optimizer, then it's more than reasonable to keep things as they are. In general, if you want to change adaptive feature settings in a production environment (or enable adaptive feature patches included in a bundle patch), then you should test before implementation. If you are currently testing (or about to test) an upgrade to Oracle Database 12c Release 1, then it's a good idea to apply the latest bundle patch and specifically enable the adaptive features patches. See Mike Dietrich's blog post on this topic. Here's the Long Answer Not everyone likes a short answer; the real world can be very complicated. To that end, I have blogged in more detail about upgrades here and there are some specifics on the adaptive feature changes in Oracle Database 12c Release 2 here. There's also Mike's blog post as I mentioned earlier. It's ironic that in this post, the short answer looks longer than the long answer. Oh well.


How are dynamic sampling results managed for SQL plan directives?

At DOAG this year, the question of how Oracle Database 12c Release 2 manages dynamic statistics for SQL plan directives came up a number of times. SQL plan directives tell the Oracle Optimizer when to use dynamic sampling (DS) to improve cardinality estimates. When queries are parsed and relevant SQL plan directives exist, DS queries sample data to obtain better cardinality estimates (assuming that optimizer_adaptive_statistics is TRUE). In Oracle Database 12c Release 1, DS queries included a RESULT_CACHE hint to store the results in the server result cache to reduce the overhead of continuously executing them against application data. This didn’t always work well for systems with very high hard-parse rates and significant data churn, so in Oracle Database 12c Release 2 the result cache is no longer used for this purpose. Instead, DS query results are stored in the data dictionary. They can be seen as follows:

SELECT * FROM DBA_SQL_PLAN_DIRECTIVES WHERE TYPE = 'DYNAMIC_SAMPLING_RESULT';

In Oracle Database 12c Release 2, the optimizer checks to see whether the result of a particular DS query is available in the data dictionary before it attempts to execute it and fetch the results. Executing DS queries consumes system resources, even though the database uses sampling to reduce the overhead. 'Caching' reduces the overhead further. The optimizer needs up-to-date DS query results to derive good cardinality estimates, but since the DS queries sample application data that's subject to change, the obvious question is: “how do we make sure that the results held in the data dictionary are up-to-date?” Prior to Oracle Database 12c Release 2, the result cache made things easy for the optimizer because staleness was managed by the database. The optimizer itself didn’t need to do anything special to keep the DS results up-to-date. Now that the results are cached in the data dictionary, the optimizer has to take care of things itself. How does it do it? It’s pretty intuitive. The optimizer updates (at query parse time) the dynamic sampling result that's stored in the data dictionary if: optimizer statistics for any of the base tables in the dynamic sampling query have been re-gathered since the dynamic sampling result was created or modified, or the number of rows in the tables referenced by the directive has changed by more than a threshold since the directive was created or modified. The threshold is the same one used by DBMS_STATS, which is based on the STALE_PERCENT preference. When a dynamic sampling result is stored in DBA_SQL_PLAN_DIRECTIVES, the CREATED column will record the creation time and, initially, LAST_MODIFIED will be NULL. When a new dynamic sampling result is obtained by a DS query, the LAST_MODIFIED column is updated accordingly.
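A slightly more useful variation of the query above shows when each cached result was created and last refreshed, which is handy if you want to watch the two refresh conditions described above in action. A minimal sketch:

select directive_id, state, created, last_modified
from   dba_sql_plan_directives
where  type = 'DYNAMIC_SAMPLING_RESULT'
order by created;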


Adaptive Cursor Sharing (ACS) and Bind Sensitivity

There’s a common misconception that queries with equality predicates will only become bind sensitive if there’s a histogram on relevant table columns. For example, it’s often thought that the following query will not be marked bind sensitive if SK_ID lacks a histogram: SELECT sum(amount) FROM   sales WHERE  sk_id = :b1; This looks like a very reasonable assumption. After all, if there’s no skew in SK_ID values then where’s the benefit of knowing the value of :b1? Wouldn’t we expect the cardinality estimate to be the same no matter what value :b1 takes? I’ll come to that shortly. Let’s first examine where the misconception came from. I think it probably originates from the Oracle Database 11g documentation, so it's unfortunate that the misconception is probably held by the people that read the documentation most carefully. The 11g Performance Tuning Guide mentions the following prerequisite for marking a query bind sensitive: “A histogram exists on the column containing the bind value”. This statement was carried forward into the Oracle Database 12c documentation but was subsequently corrected. The reason it was there in the first place is interesting, and it required some detective work to figure out because it all happened a century ago in I.T. years (10 human-years). I’m pretty sure that the documentation represents the behavior that was originally specified, but refinements to adaptive cursor sharing during development changed things. For example, Oracle Database 11.1.0.7 will mark the example query (above) bind sensitive, even if there’s no histogram. Those of you with version museums can check out whether or not it works like this in the base release (so go ahead and leave a comment if you’ve tried it out). Why does it work like this? In the following example (where we use range predicates) it’s pretty easy to see why it’s useful to peek at the bind variable values even if there’s no skew or histogram: SELECT SUM(amount) FROM   sales WHERE  sk_id > :b1 AND    sk_id < :b2; The number of rows matched is highly sensitive to the values of :b1 and :b2 and, in an extreme case, if :b1 is greater than :b2, we’ll match zero rows. Back to the original query: SELECT sum(amount) FROM   sales WHERE  sk_id = :b1; Assuming that there’s no histogram on SK_ID and no skew, why would bind peeking be useful? We might expect the cardinality estimate to be about the same no matter what value :b1 takes. Not quite. Consider the case where the minimum value of SK_ID is 1 and the maximum value is 10,000. If :b1 is 20,000, then the predicate will be out of range and we will match zero rows. It’s worth accounting for this possibility. In summary: Queries with bind variables in predicates will be marked bind sensitive. ACS might mark a query bind aware if bind variable values significantly affect the number of rows processed. This may result in multiple SQL execution plans for a particular SQL statement. Histograms are not a prerequisite for ACS. Always bear in mind that if there is data skew, ACS needs to be able to discern that different bind variable values have different selectivity. For this reason, the use of histograms is highly recommended. Fortunately, they are managed automatically by the database if statistics are gathered using default settings.
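If you want to check this behavior on your own system, the cursor flags in V$SQL show whether a statement has been marked bind sensitive or bind aware. A minimal sketch; the LIKE pattern just needs to match your test statement:

select sql_id, child_number, is_bind_sensitive, is_bind_aware, is_shareable
from   v$sql
where  sql_text like 'SELECT sum(amount) FROM%sales%'
and    sql_text not like '%v$sql%';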


12c

How does AUTO_SAMPLE_SIZE work in Oracle Database 12c?

The ESTIMATE_PERCENT parameter in DBMS_STATS.GATHER_*_STATS procedures controls the percentage of rows to sample when gathering optimizer statistics. What percentage of rows should you sample to achieve accurate statistics? 100% will ensure that statistics are accurate, but it could take a long time. A 1% sample will finish much more quickly but it could result in poor statistics. It’s not an easy question to answer, which is why it is best practice to use the default: AUTO_SAMPLE_SIZE. In this post, I’ll cover how the AUTO_SAMPLE_SIZE algorithm works in Oracle Database 12c and how it affects the accuracy of the statistics being gathered. If you want to learn more of the history prior to Oracle Database 12c, then this post on Oracle Database 11g is a good place to look. I will indicate below where there are differences between Oracle Database 11g and Oracle Database 12c. It’s not always appreciated that (in general) a large proportion of the time and resource cost required to gather statistics is associated with evaluating the number of distinct values (NDVs) for each column. Calculating NDV using an exact algorithm can be expensive because the database needs to record and sort column values while statistics are being gathered. If the NDV is high, retaining and sorting column values can become resource-intensive, especially if the sort spills to TEMP. Auto sample size instead uses an approximate (but accurate) algorithm to calculate NDV that avoids the need to sort column data or spill to TEMP. In return for this saving, the database can afford to use a full table scan to ensure that the other basic column statistics are accurate. Similarly, it can be resource-intensive to generate histograms, but the Oracle Database mitigates this cost as follows: Frequency and top frequency histograms are created as the database gathers basic column statistics (such as NDV, MIN, MAX) from the full table scan mentioned above. This is new to Oracle Database 12c. If a frequency or top frequency histogram is not feasible, then the database will collect hybrid histograms using a sample of the column data. Top frequency is only feasible when the top 254 values constitute more than 99% of the entire non-null column values, and frequency is only feasible if NDV is 254 or less. When the user has specified 'SIZE AUTO' in the METHOD_OPT clause for automatic histogram creation, the Oracle Database chooses which columns to consider for histogram creation based on column usage data that’s gathered by the optimizer. Columns that are not used in WHERE-clause predicates or joins are not considered for histograms. Both Oracle Database 11g and Oracle Database 12c use the following query to gather basic column statistics (it is simplified here for illustrative purposes):

SELECT COUNT(c1), MIN(c1), MAX(c1) FROM t;

The query reads the table (T) and scans all rows (rather than using a sample). The database also needs to calculate the number of distinct values (NDV) for each column, but the query does not use COUNT(DISTINCT c1) and so on. Instead, during execution, a special statistics gathering row source is injected into the query. The statistics gathering row source uses a one-pass, hash-based distinct algorithm to gather NDV. The algorithm requires a full scan of the data, uses a bounded amount of memory and yields a highly accurate NDV that is nearly identical to a 100 percent sampling (a fact that can be proven mathematically).
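In practice, "using AUTO_SAMPLE_SIZE" simply means leaving ESTIMATE_PERCENT at its default when you call DBMS_STATS. A minimal sketch, where the table name T matches the simplified query above:

-- Default behavior: AUTO_SAMPLE_SIZE is used when ESTIMATE_PERCENT is not set
exec dbms_stats.gather_table_stats(user, 't')

-- Explicit equivalent
exec dbms_stats.gather_table_stats(user, 't', estimate_percent=>dbms_stats.auto_sample_size)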
The statistics gathering row source also gathers the number of rows, number of nulls and average column length. Since a full scan is used, the number of rows, average column length, minimum and maximum values are 100% accurate. Effect of auto sample size on histogram gathering Hybrid histogram gathering is decoupled from basic column statistics gathering and uses a sample of column values. This technique was used in Oracle Database 11g to build height-balanced histograms. More information on this can be found in this blog post. Oracle Database 12c replaced height-balanced histograms with hybrid histograms. Effect of auto sample size on index stats gathering AUTO_SAMPLE_SIZE affects how index statistics are gathered. Index statistics gathering is sample-based and it can potentially go through several iterations if the sample contains too few blocks or the sample size was too small to properly gather the number of distinct keys (NDKs). The algorithm has not changed since Oracle Database 11g, so I’ve left it to the previous blog to go into more detail. There is one other thing to note: at the time of writing, there are some cases where index sampling can lead to NDV mis-estimates for composite indexes. The best work-around is to create a column group on the relevant columns and use gather_table_stats (there is a short sketch of this at the end of the post). Alternatively, there is a one-off fix - 27268249. This patch changes the way NDV is calculated for indexes on large tables (and no column group is required). It is available for 12.2.0.1 at the moment, but note that it cannot be backported. As you might guess, it's significantly slower than index block sampling, but it's still very fast. At the time of writing, if you find a case where index NDV is causing an issue with a query plan, then the recommended approach is to add a column group rather than attempting to apply this patch. Summary:

1. Note that top frequency and hybrid histograms are new to Oracle Database 12c. Oracle Database 11g had frequency and height-balanced histograms only. Hybrid histograms replaced height-balanced histograms.
2. The auto sample size algorithm uses a full table scan (a 100% sample) to gather basic column statistics.
3. The cost of a full table scan (versus row sampling) is mitigated by the approximate NDV algorithm, which eliminates the need to sort column data. The approximate NDV gathered by AUTO_SAMPLE_SIZE is close to the accuracy of a 100% sample.
4. Other basic column statistics, such as the number of nulls, average column length, minimum and maximum values, have an accuracy equivalent to 100% sampling.
5. Frequency and top frequency histograms are created using a 100%* sample of column values and are created when basic column statistics are gathered. This is different to Oracle Database 11g, which decoupled frequency histogram creation from basic column statistics gathering (and used a sample of column values).
6. Hybrid histograms are created using a sample of column values. Internally, this step is decoupled from basic column statistics gathering.
7. Index statistics are gathered using a sample of column values. The sample size is determined automatically.

*There is an exception to case 5, above. Frequency histograms are created using a sample if OPTIONS=>'GATHER AUTO' is used after a bulk load where statistics have been gathered using online statistics gathering.
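Here is a minimal sketch of the column group work-around mentioned above for composite index NDV; the table and column names are illustrative:

-- Create a column group covering the columns of the composite index
select dbms_stats.create_extended_stats(user, 't1', '(col1,col2)') from dual;

-- Re-gather table statistics so the column group statistics are populated
exec dbms_stats.gather_table_stats(user, 't1')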


12c

Using SQL Plan Management to Control SQL Execution Plans

SQL plan management (SPM) is designed to prevent performance regression for all SQL statements used by an application (assuming that the SQL statements in question are used more than once). SPM uses SQL plan baselines that are associated with individual SQL statements to control what execution plans they are permitted to use. It’s a simple but powerful idea that opens the door to the possibility of using SQL plan baselines in a more selective and reactive way: to influence the SQL execution plans of individual queries without having to modify application queries or change the application itself. The technique is covered in blogs and in the SPM white paper, but it deserves a post of its own along with a full example. If you want to avoid reading all the background below and jump to a real example, I've added some new scripts to GitHub here. They are similar to SQL profile examples published by Oracle Support some years ago and they demonstrate how you can retrieve plans from SQL tuning sets and AWR. They will work in Oracle Database 12c Release 2 onwards (and even on Oracle Database 18c Standard Edition). The other code example below will work in Oracle Database 11g onwards. Consider the scenario where you have a SQL statement used by an application that’s got a sub-optimal plan and you need to do something about it. For the sake of argument, let’s assume that you know that there’s a hint you can use to achieve a better plan. I’m going to assume from now on that you want to apply a hint but the application code cannot be changed in any way. Take a look at the following SQL execution plan. It’s an application query that filters SALES rows using an index:

SQL> SELECT *
  2  FROM table(DBMS_XPLAN.DISPLAY_CURSOR(sql_id=>'f23qunrkxdgdt'));

PLAN_TABLE_OUTPUT
-----------------
SQL_ID f23qunrkxdgdt, child number 2
-------------------------------------
select sum(num) from sales where id < :idv

Plan hash value: 2327341677

-------------------------------------------------------
| Id  | Operation                            | Name   |
-------------------------------------------------------
|   0 | SELECT STATEMENT                     |        |
|   1 |  SORT AGGREGATE                      |        |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SALES  |
|*  3 |    INDEX RANGE SCAN                  | SALESI |
-------------------------------------------------------

What if this plan isn’t optimal? It is optimal in this case, but for the sake of example I’m going to assume that I want the Oracle Optimizer to pick a full table scan instead. All we need is the FULL hint:

PLAN_TABLE_OUTPUT
-----------------
SQL_ID  82x4tj3z2vg23, child number 0
-------------------------------------
select /*+ FULL(sales) */ sum(num) from sales where id < :idv

Plan hash value: 1047182207

------------------------------------
| Id  | Operation          | Name  |
------------------------------------
|   0 | SELECT STATEMENT   |       |
|   1 |  SORT AGGREGATE    |       |
|*  2 |   TABLE ACCESS FULL| SALES |
------------------------------------

The hinted test query gives us an example of the TABLE ACCESS FULL plan we want to use. At this point we are in a position to use SPM to associate our preferred plan with the application query. Here are the steps. Step zero acknowledges that we have a SQL statement with a plan we want to change. The remaining steps are: Create one or more initial disabled SQL plan baselines for the application query. I’m using the term “one or more” because a query might have more than one SQL execution plan. We will create a SQL plan baseline for each plan used by the SQL statement, but only one is actually needed.
Execute (or parse) a hinted test query to generate the preferred plan. Load the preferred plan into the SQL plan baseline created earlier (this time with enabled=’YES’). The hinted statement’s text is of course different to the application statement’s text, but that’s just fine: we’re simply using the plan and not the SQL text. Our application query will use the plan as long as it can reproduce it and it’s valid. What do I mean by that? Here’s an example: Imagine a CUSTOMERS query that happens to perform a FULL scan:

select sum(num) from CUSTOMERS;

If we use the plan for this query in an attempt to influence our SALES query, it’s not going to work. We would be asking the SQL plan baseline to influence the SALES plan like this:

select /*+ FULL(customers) */ sum(num) from SALES where id < :idv

Under the covers, SQL plan baselines use a complete set of hints to control execution plans. So, for our SALES query, FULL(customers) is not a valid hint and is not going to yield the desired result! If you’ve got some time on your hands, you can try loading a plan for a CUSTOMERS query into a SQL plan baseline associated with a SALES query. There won’t be an error message, but you won’t be able to reproduce the plan you want either (unless it’s just by luck). Worked Example I have uploaded an example procedure and a fully worked example to GitHub so you can see how the steps above can be implemented. Based on a comment below, I added this procedure too. It loads all existing plans in a disabled state and adds a new enabled SQL plan baseline (rather than replacing an existing one). You should adapt the procedures to meet your specific requirements. For example, you might not want to drop pre-existing SQL plan baselines. I’ll be using the SQL IDs and plan hash value that I highlighted in bold, above. Here’s how to use my example procedures set_my_plan and add_my_plan (see proc.sql and proc2.sql in GitHub): Executing the Procedures Note that “SPB” stands for SQL plan baseline:

SQL> set serveroutput on
SQL> set linesize 200
SQL> exec set_my_plan('f23qunrkxdgdt','82x4tj3z2vg23',1047182207)
No existing SQL plan baselines to drop
Created 1 disabled SPBs for SQLID f23qunrkxdgdt
SPB Detail: SQL_PLAN_3yr9p97b3j5gbfaa7aab3 handle SQL_3f5d3549d63895eb
Associating plan SQLID/PHV 82x4tj3z2vg23/1047182207 with SPB SQL Handle SQL_3f5d3549d63895eb
Enabled SPB - Name: SQL_PLAN_3yr9p97b3j5gb35032dee SQL handle: SQL_3f5d3549d63895eb
SQL> set serveroutput off

Here’s the execution plan for the application query after the procedure was executed. The non-hinted SQL statement now uses the FULL scan and you can see from the Note section that the SQL plan baseline is being used.

SQL> SELECT *
  2  FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'LAST'));

PLAN_TABLE_OUTPUT
-----------------
SQL_ID  f23qunrkxdgdt, child number 0
-------------------------------------
select sum(num) from sales where id < :idv

Plan hash value: 1047182207

------------------------------------
| Id  | Operation          | Name  |
------------------------------------
|   0 | SELECT STATEMENT   |       |
|   1 |  SORT AGGREGATE    |       |
|*  2 |   TABLE ACCESS FULL| SALES |
------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<:IDV)

Note
-----
   - SQL plan baseline SQL_PLAN_3yr9p97b3j5gb35032dee used for this statement

Usage Notes SPM matches a SQL statement using a signature, not a SQL ID.
The signature is generated from the normalized SQL text. For this reason, if there are multiple SQL IDs that have the same signature then they will all share the same SQL plan baseline. For example, the following queries have the same signature: select sum(num) from sales where id < :idv SELECT SUM(num) FROM sales WHERE id < :idv select      sum(num) from sales where id < :idv The example procedures (above) will drop any pre-existing SQL plan baselines for SQL statements that have the same signature as the application SQL statement. The newer scripts will generate an error if there are existing SQL plan baselines unless you use the FORCE parameter. As always, comments and corrections are welcome. Just post a comment at the bottom.
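For reference, here is a minimal sketch of the underlying DBMS_SPM calls that the example procedures wrap. The SQL IDs, plan hash value and SQL handle are the ones from the worked example above; in practice you would look the SQL handle up in DBA_SQL_PLAN_BASELINES after the first call.

declare
  l_plans pls_integer;
begin
  -- Capture the application statement's existing plan(s) from the cursor
  -- cache as disabled SQL plan baselines
  l_plans := dbms_spm.load_plans_from_cursor_cache(
               sql_id  => 'f23qunrkxdgdt',
               enabled => 'NO');

  -- Attach the hinted test query's plan to the application statement's
  -- SQL handle as an enabled SQL plan baseline
  l_plans := dbms_spm.load_plans_from_cursor_cache(
               sql_id          => '82x4tj3z2vg23',
               plan_hash_value => 1047182207,
               sql_handle      => 'SQL_3f5d3549d63895eb',
               enabled         => 'YES');
end;
/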


Expression Tracking

A new feature called expression tracking was added in Oracle Database 12c Release 2. It allows the Oracle Optimizer to keep track of what expressions are used in SQL statements and how often they are used. Why does the optimizer do that? I’ll cover that at the end of the post, but before I do, let’s take a quick look under the hood. Consider the following query:

select value_of_sale - cost_of_sale
from   sales
where  transaction_id = 10;

The query evaluates an expression: “value_of_sale - cost_of_sale” and it accesses a number of columns. We can take a look at what’s tracked using a data dictionary view:

exec dbms_stats.flush_database_monitoring_info  -- do this or wait 15 minutes!

select expression_text,
       fixed_cost,
       evaluation_count,
       snapshot
from   user_expression_statistics
where  table_name = 'SALES'
order by snapshot;

EXPRESSION_TEXT                          FIXED_COST EVALUATION_COUNT SNAPSHOT
---------------------------------------- ---------- ---------------- ----------
"VALUE_OF_SALE"                          1.3170E-08                1 LATEST
"COST_OF_SALE"                           1.3170E-08                1 LATEST
"TRANSACTION_ID"                         1.3170E-08                1 LATEST
"VALUE_OF_SALE"-"COST_OF_SALE"           6.5849E-07                1 LATEST

This expression data is persisted to the data dictionary every 15 minutes, so it will be available post shutdown/startup. This is done without any need to configure or gather anything, but you can pre-empt persistence by executing flush_database_monitoring_info (if you’re running a test and impatient like me). Notice that there are evaluation_count and fixed_cost columns. The fixed cost is an estimate of the computational cost per evaluation. The evaluation count is self-explanatory, but it's also an estimate. The snapshot column tells you if the value you’re seeing is from the last 24 hours (LATEST), or if it's a long-term cumulative figure (CUMULATIVE). A few more things to bear in mind: If you take a look at the dictionary view definition, you will see that there is also a column called dynamic_cost. This is currently not implemented, so expect to see NULL or zero in this column. The optimizer tracks expressions that include columns from a single table only. For example, “t1.col1 + t1.col2”, but not “t1.col1 + t2.col2”. Currently, there are some limitations on inline select statements such as:

select a.sal, (select sal+9999 from empt b where b.empno = a.empno) from ...

What is this feature used for? You can imagine that it is useful for optimization in general, but right now it has a single “client”: Oracle Database In-Memory, where it’s referred to as the Expression Statistics Store (ESS). It’s used to support In-Memory Expressions, so if you've not heard about that already then you should definitely go and check it out.


Are my SQL plan baselines being re-enabled?

A number of times people have said to me something along the lines of, “Our SQL plan baselines keep getting re-enabled. Why is that?”. For some time I’ve been puzzled by this question because the database doesn't automatically re-enable SQL plan baselines once they've been disabled. Nevertheless, it gradually dawned on me that there is probably a misunderstanding behind it. That's what this post is about. Consider the following SQL execution plan; it has an associated SQL plan baseline:

Execution Plan #1

select /* SPMTEST */ sum(n) from sales where region_id = 10

------------------------------------
| Id | Operation          | Name   |
------------------------------------
|  0 | SELECT STATEMENT   |        |
|  1 |  SORT AGGREGATE    |        |
|  2 |  TABLE ACCESS FULL | SALES  |
------------------------------------

Note
-----
- SQL plan baseline SQL_PLAN_b8dhx84x2wh9n35032dee used for this statement

The "Note" section above shows that we are using a SQL plan baseline. Let’s disable it:

ret := dbms_spm.alter_sql_plan_baseline(
          plan_name       => 'SQL_PLAN_b8dhx84x2wh9n35032dee',
          attribute_name  => 'ENABLED',
          attribute_value => 'NO');

You can see that our SQL plan baseline has ENABLED=NO:

SQL> SELECT plan_name, sql_text, enabled, accepted
  2  FROM   dba_sql_plan_baselines;

PLAN_NAME                      SQL_TEXT                          ENABLED  ACCEPTED
------------------------------ --------------------------------- -------- --------
SQL_PLAN_b8dhx84x2wh9n35032dee select /* SPMTEST */ sum(n)       NO       YES
                               from sales where region_id = 10

The SQL execution plan for this statement is no longer constrained. The data in the table will change over time and at some point we might get a new plan. For the purposes of this example, imagine that this business sells to more and more regions and eventually the optimizer establishes that the index on region_id is a favorable access path:

Execution Plan #2

select /* SPMTEST */ sum(n) from sales where region_id = 10

---------------------------------------------------------
| Id | Operation                            | Name      |
---------------------------------------------------------
|  0 | SELECT STATEMENT                     |           |
|  1 |  SORT AGGREGATE                      |           |
|  2 |  TABLE ACCESS BY INDEX ROWID BATCHED | SALES     |
|  3 |   INDEX RANGE SCAN                   | SALES_IX  |
---------------------------------------------------------

Our SQL plan baseline is disabled, so the optimizer is not constrained; it is free to use this new plan. There is no "Note" section either, so we can be sure that we are not using a SQL plan baseline. At this point, everything is as you might expect: the SQL plan baseline is disabled so the execution plan is not constrained in any way. Nevertheless, a few days later the DBA notices this:

Execution Plan #3

---------------------------------------------------------
| Id | Operation                            | Name      |
---------------------------------------------------------
|  0 | SELECT STATEMENT                     |           |
|  1 |  SORT AGGREGATE                      |           |
|  2 |  TABLE ACCESS BY INDEX ROWID BATCHED | SALES     |
|  3 |   INDEX RANGE SCAN                   | SALES_IX  |
---------------------------------------------------------

Note
-----
- SQL plan baseline SQL_PLAN_b8dhx84x2wh9n4f85717c used for this statement

At first glance you could be forgiven for thinking that the SQL plan baseline has been “switched back on”. Not quite – take a look at the SQL plan baseline name – it is different. So what happened? Firstly, bear in mind that when you disable a SQL plan baseline you disable an individual SQL plan baseline and not the ability of a particular SQL statement to acquire new SQL plan baselines. 
When the optimizer sees a new SQL execution plan for a SQL statement it will create a new SQL plan baseline if a SQL plan baseline for the statement already exists. It does this even if auto capture is disabled (optimizer_capture_sql_plan_baselines=FALSE) and even if existing SQL plan baselines associated with the statement are disabled. If we had taken a look at SQL plan baselines the moment after the new “index plan” was chosen by the optimizer (execution plan #2), this is what you would have seen:

PLAN_NAME                      SQL_TEXT                          ENABLED  ACCEPTED
------------------------------ --------------------------------- -------- --------
SQL_PLAN_b8dhx84x2wh9n35032dee select /* SPMTEST */ sum(n)       NO       YES
                               from sales where region_id = 10
SQL_PLAN_b8dhx84x2wh9n4f85717c select /* SPMTEST */ sum(n)       YES      NO
                               from sales where region_id = 10

When the optimizer generated the new SQL execution plan it created a new SQL plan baseline for it (SQL_PLAN_b8dhx84x2wh9n4f85717c). It does this even though our first SQL plan baseline (SQL_PLAN_b8dhx84x2wh9n35032dee) was disabled (ENABLED=NO). The new SQL plan baseline is enabled but it has not yet been verified and accepted by SQL plan management evolution, so we see that ACCEPTED=NO. At this point, our SQL statement will not use either of our SQL plan baselines and there is no “Note” section shown in the plan (that’s what we saw above with execution plan number 2). If the auto evolution task is enabled in your database (which is the default in Oracle Database 12c) or if you run evolution manually, then the new SQL plan baseline is subject to verification and evolution because it is not disabled. When a SQL plan baseline is verified, evolution might accept it. This is what you will see after evolution:

PLAN_NAME                      SQL_TEXT                          ENABLED  ACCEPTED
------------------------------ --------------------------------- -------- --------
SQL_PLAN_b8dhx84x2wh9n35032dee select /* SPMTEST */ sum(n)       NO       YES
                               from sales where region_id = 10
SQL_PLAN_b8dhx84x2wh9n4f85717c select /* SPMTEST */ sum(n)       YES      YES
                               from sales where region_id = 10

We now have a SQL plan baseline that is enabled and accepted. The first one remains disabled but the optimizer will be able to use the new SQL plan baseline. That’s what we see in execution plan number 3, above.

Summary

When you disable a SQL plan baseline you disable an individual SQL plan baseline but not the mechanics of plan capture and evolution. This behavior is benign and correct because, after all, the new plan was verified before it was accepted. However, I get the impression DBAs sometimes disable a SQL plan baseline with the expectation that they have somehow switched off SQL plan management for an individual SQL statement. As you’ve seen, that isn’t the case. If your intention is to do this, then I recommend that you drop the SQL plan baseline using DBMS_SPM.DROP_SQL_PLAN_BASELINE. If you want to keep a “backup copy”, then use DBMS_SPM.PACK_STGTAB_BASELINE to pack the relevant baseline into a staging table before you drop it. You can always put it back using DBMS_SPM.UNPACK_STGTAB_BASELINE. There's more on this too in My Oracle Support, Doc ID 1520337.1.
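To make that last recommendation concrete, here is a minimal sketch of the pack-then-drop approach (the staging table name SPM_STAGE is my own choice; the plan name is taken from the example above):

DECLARE
  n PLS_INTEGER;
BEGIN
  -- Create a staging table and pack the baseline into it as a backup copy
  DBMS_SPM.CREATE_STGTAB_BASELINE(table_name => 'SPM_STAGE', table_owner => USER);
  n := DBMS_SPM.PACK_STGTAB_BASELINE(
         table_name  => 'SPM_STAGE',
         table_owner => USER,
         plan_name   => 'SQL_PLAN_b8dhx84x2wh9n35032dee');
  -- Then drop the baseline (repeat for any other baselines on the same statement)
  n := DBMS_SPM.DROP_SQL_PLAN_BASELINE(
         plan_name => 'SQL_PLAN_b8dhx84x2wh9n35032dee');
END;
/

If you change your mind later, DBMS_SPM.UNPACK_STGTAB_BASELINE will restore the packed baseline from the staging table.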


Adding and Disabling Hints Using SQL Patch

If you’re a DBA, it’s likely that you’ve encountered systems where a lot of SQL statements have been hinted almost as a matter of policy. Perhaps you’d like to figure out if these hints are actually helping. You might like to demonstrate to a development team that they should probably dial down their enthusiasm for micro-managing the Oracle Optimizer. Sometimes, you might want to apply a hint on the fly. A while ago, Maria Colgan wrote a couple of posts (here and here) on SQL Patch and how you can add hints to SQL in a packaged application. In other words, you can apply hints to SQL statements without having to change any application code. From Oracle Database 12c Release 2, the interface to SQL Patch is greatly improved and easier to use. In particular, it’s now part of the public API and the hint text is a CLOB (because VARCHAR2 can be too limiting if you want to specify a complete query outline). The API includes a new SQL_ID parameter too. Check out the documentation for the details, but here’s an example of the new look:

patch_name := dbms_sqldiag.create_sql_patch(
                 sql_id=>'amz7zfdk33czb',
                 hint_text=>'FULL(@"SEL$1" "T1"@"SEL$1")');

For the rest of this post I’ll use the new DBMS_SQLDIAG API, but everything here will work in Oracle Database 11g and Oracle Database 12c Release 1 if you use the internal DBMS_SQLDIAG API documented in Maria’s post. Take a look at the following example. The query really should use indexes I1 and I2 on T1.V and T2.V, but I’ve hinted it to use a FULL scan of T2.

select /* QUERY2 */
       /*+ FULL(t2) */
       sum(t1.id)
from   t1,t2
where  t1.id = t2.id
and    t1.v = 1000
and    t2.v = 1000;

----------------------------------------------
| Id  | Operation                     | Name |
----------------------------------------------
|   0 | SELECT STATEMENT              |      |
|   1 |  SORT AGGREGATE               |      |
|   2 |   NESTED LOOPS                |      |
|   3 |    NESTED LOOPS               |      |
|*  4 |     TABLE ACCESS FULL         | T2   |
|*  5 |     INDEX RANGE SCAN          | I1   |
|*  6 |    TABLE ACCESS BY INDEX ROWID| T1   |
----------------------------------------------

If you create a SQL patch with an additional hint for a FULL scan on T1 (using the create_sql_patch example above), the new plan will include the FULL hints for both T1 and T2. In other words, the new hint has been merged with the preexisting hint:

SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'OUTLINE BASIC NOTE'));

------------------------------------
| Id  | Operation           | Name |
------------------------------------
|   0 | SELECT STATEMENT    |      |
|   1 |  SORT AGGREGATE     |      |
|*  2 |   HASH JOIN         |      |
|*  3 |    TABLE ACCESS FULL| T1   |
|*  4 |    TABLE ACCESS FULL| T2   |
------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "T2"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
      USE_HASH(@"SEL$1" "T2"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."ID"="T2"."ID")
   3 - filter("T1"."V"=1000)
   4 - filter("T2"."V"=1000)

Note
-----
   - SQL patch "PATCH1" used for this statement

Hints specified using SQL patch must include query block names (such as @"SEL$1"). 
If you’ve not done this before it can appear quite difficult, but if you use example queries and display the SQL execution plans using the OUTLINE format (see above), it becomes pretty easy to figure out what you need to do. What if you want to disable hints for an individual SQL statement? Well, there’s a neat trick to do this. If you look closely, you’ll see it in the OUTLINE above. Let’s do it for our query:

patch_name := dbms_sqldiag.create_sql_patch(
                 sql_id=>'amz7zfdk33czb',
                 hint_text=>'IGNORE_OPTIM_EMBEDDED_HINTS');

Here’s the new plan – it’s the one we should have had in the first place:

-------------------------------------------------------
| Id  | Operation                              | Name |
-------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |
|   1 |  SORT AGGREGATE                        |      |
|   2 |   NESTED LOOPS                         |      |
|   3 |    NESTED LOOPS                        |      |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1   |
|   5 |      INDEX RANGE SCAN                  | I1   |
|   6 |     INDEX RANGE SCAN                   | I2   |
|   7 |    TABLE ACCESS BY INDEX ROWID         | T2   |
-------------------------------------------------------

You can even disable all existing hints and supply new ones. Here the FULL(t2) hint has been removed and I've added FULL(t1):

... hint_text=>'IGNORE_OPTIM_EMBEDDED_HINTS LEADING (t1 t2) FULL(@"SEL$1" "T1"@"SEL$1")');

----------------------------------------------
| Id  | Operation                     | Name |
----------------------------------------------
|   0 | SELECT STATEMENT              |      |
|   1 |  SORT AGGREGATE               |      |
|   2 |   NESTED LOOPS                |      |
|   3 |    NESTED LOOPS               |      |
|   4 |     TABLE ACCESS FULL         | T1   |
|   5 |     INDEX RANGE SCAN          | I2   |
|   6 |    TABLE ACCESS BY INDEX ROWID| T2   |
----------------------------------------------

In common with using hints in general, if you supply invalid hints via SQL patch then the effects can be confusing if you fail to notice your error – so take care. Patch #17203284 is currently available for Oracle Database 12.1.0.2 if you want to make the new public interface available in this release.
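If you later want to remove a patch you've created, it's straightforward. A minimal sketch, reusing the patch name shown in the Note section above:

exec dbms_sqldiag.drop_sql_patch(name => 'PATCH1')

-- The statement reverts to its original (unpatched) plan the next time it is parsed.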


12c

Efficient Statistics Maintenance for Partitioned Tables Using Incremental Statistics – Part 3

Introduction

This post covers how Oracle has improved incremental statistics for partitioned tables in Oracle Database 12c Release 2. If you’re not already familiar with synopses in the context of incremental statistics then take a look at Part 1 and Part 2 before you read on. Beginning with Oracle Database 12c Release 2 there's a new synopsis format that’s significantly more compact than the format used in earlier releases. For brevity, I’ll refer to the Oracle Database 12c Release 2 format as new and the previous format as old. This enhancement is particularly relevant if you looked at incremental statistics maintenance in the past but decided not to use it because of the additional space usage in SYSAUX. Oracle Database 12c Release 2 resolves this issue and, in many cases, reduces the amount of system resource required to manage synopses.

A New Type of Synopsis

A synopsis is metadata stored in a couple of tables in the data dictionary (SYS.WRI$_OPTSTAT_SYNOPSIS$ and SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$). The metadata is managed automatically by the Oracle Database, so there’s generally no reason to be aware of the underlying implementation. Nevertheless, I’ll cover some of the details here because it will help you to see why the change is so significant. Synopses in Oracle Database 12c Release 2 are now created (by default) using an algorithm called HyperLogLog (HLL). This is a state-of-the-art algorithm that calculates the approximate number of distinct values in table columns. Even though it is an approximation, it is nevertheless very accurate, with a typical error rate of less than 2%. Prior to Oracle Database 12c Release 2, synopses consisted of rows stored in the WRI$_OPTSTAT_SYNOPSIS$ table. The number of rows in this table can be very large if there are a large number of partitions and table columns, and if the columns contain a large number of distinct values. New-style synopses do not store rows in this table. Instead, some additional (and compact) hash data is stored in the WRI$_OPTSTAT_SYNOPSIS_HEAD$ table (in the SPARE2 column). How much smaller are the new synopses? Well, as you've probably guessed, it's the consultants' answer: "It Depends". As outlined above, the space used by synopses is a function of the number of partitions, columns and distinct values in columns. I can give you an example from one of our test systems containing an 8TB test table with 84 partitions. The total size of the old-style synopses was around 160MB and the new-style synopses totaled only 6MB. The example I created in GitHub (see below) was contrived to generate a particularly large amount of synopsis data. In the example I've given, new-style synopses take up virtually no space at all and the old-style synopses take up about 160MB (in SYSAUX).

How to Use Them

What do you need to do to use new-style synopses? Nothing! They are used by default if you choose to use incremental statistics and you don’t need to do anything different in Oracle Database 12c Release 2 compared to earlier releases.

Controlling Synopses

You can control the type of synopsis that will be created using a DBMS_STATS preference called APPROXIMATE_NDV_ALGORITHM. The default is REPEAT OR HYPERLOGLOG: if a table is using old-style synopses then it will continue to do so, and tables using new-style synopses will continue to use those! There’s no reason to use anything other than the default unless you are upgrading a database to Oracle Database 12c Release 2. If this is the case then you might want to consider the options. 
That's covered next.

Upgrading

If you are upgrading a database that’s using incremental statistics, then you will want to migrate to using the new-style synopses. How do you go about doing that? It’s worth noting from the outset that it’s possible to have partitions with old-style and new-style synopses in the same table. Also, the good news is that you can control when and how to transition from one type of synopsis to the other. There is a DBMS_STATS preference called INCREMENTAL_STALENESS. It controls whether or not you want to allow partitions within an individual table to have different types of synopses during the transition period from old-style to new-style. Let’s look at the different scenarios and how to proceed after you have upgraded to Oracle Database 12c Release 2. There is a spectrum of choice from “very conservative” (i.e., maintaining old behaviors) to “aggressive” (i.e., taking advantage of new features immediately). The chart below describes the different scenarios from the most conservative cases to the most aggressive cases.

Use-case: Initially, you want to continue to use old-format synopses for all tables. We recommend that you use the new-style synopses, but you can choose to use them later on if you prefer. The algorithm used prior to HLL is called adaptive sampling.
Action:
EXEC DBMS_STATS.SET_TABLE_PREFS(
            <table_owner>,
            '<table-name>',
            'APPROXIMATE_NDV_ALGORITHM',
            'ADAPTIVE SAMPLING')

Use-case: You want tables using old-style synopses to continue to use them. Newly created incrementally-managed tables will use new-style synopses. Incrementally-managed tables without synopses will use new-style synopses when statistics are gathered. Incrementally-managed tables with old-style synopses will continue to use them.
Action: No action. This is the default behavior. The APPROXIMATE_NDV_ALGORITHM is, by default, REPEAT OR HYPERLOGLOG.

Use-case: You have some very large partitioned tables. They are using old-style synopses and you want to gradually replace the old with the new. Old-format synopses are not immediately replaced and new partitions will have synopses in the new format. Mixed formats will yield less accurate statistics, but the advantage is that there is no need to re-gather all table statistics in the foreground. The statistics auto job will gradually re-gather statistics on partitions with old-format synopses and generate new-format synopses. Eventually, new-format synopses will be used for all partitions and statistics will be accurate.
Action:
EXEC DBMS_STATS.SET_TABLE_PREFS(
            <table_owner>,
            <table_name>,
            'APPROXIMATE_NDV_ALGORITHM',
            'HYPERLOGLOG')
Note that the INCREMENTAL_STALENESS preference must have the value ALLOW_MIXED_FORMAT, but it does not need to be set explicitly (unless you've changed it) because it is the default setting.

Use-case: You have time to re-gather all statistics. Incrementally managed tables are using old-style synopses and you want to replace the old-style with the new immediately. If you have a window of time to completely re-gather statistics for partitioned tables, then this is the recommended approach.
Action:
EXEC DBMS_STATS.SET_TABLE_PREFS(                 <table_owner>,                 <table_name>,                 'APPROXIMATE_NDV_ALGORITHM',                 'HYPERLOGLOG') You also need to specify that you don’t want a mix of old synopses and new synopses in the same table: EXEC DBMS_STATS.SET_TABLE_PREFS (             <table_owner>,             <table_name>,            'INCREMENTAL_STALENESS',            'NULL') You need to take some care here. The preference value should be set to 'NULL' (in quotes) and not NULL (without quotes). NULL (without quotes) sets a preference to its default value, which in this case is ALLOW_MIXED_FORMAT. Once these preferences are set you will need to re-gather the table’s statistics.   Remember that you can also set DBMS_STATS preferences (such as APPROXIMATE_NDV_ALGORITHM) at the database, global and schema-level as well as at the table level (as per the examples above). Summary The synopsis format in Oracle Database 12c Release 2 is much more compact than the previous format. If your database is very large, expect to save a lot of space in SYSAUX while maintaining very good accuracy for your statistics. You can expect  the system overhead required to manage synopses to drop too (for example, when you exchange partitions). For more on this and some example scripts, take a look at GitHub. If you have comments on this post or the scripts in GitHub, please go ahead below.
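Before you re-gather anything, it can be useful to read the relevant preferences back and confirm what a given table will actually do. A small sketch (SALES is a placeholder table name):

select dbms_stats.get_prefs('APPROXIMATE_NDV_ALGORITHM', user, 'SALES') as ndv_algorithm,
       dbms_stats.get_prefs('INCREMENTAL_STALENESS',     user, 'SALES') as incremental_staleness
from   dual;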


SQL Plan Management

Upgrade to Oracle Database 12c and Avoid Query Regression

Those of you that made it to the demo grounds at Oracle Open World this year (it’s still just about 2015) will have had the chance to chat with the Oracle developers and throw lots of difficult questions at them! For everyone in the Optimizer development team it was a great opportunity to get lots of interaction and feedback, so thanks to all of you that took the time to come along and talk to us. We’re not all lucky enough to get a trip out to San Francisco, so I’ve been collating the main topics that came up to steer the subject matter of the next few blog posts. In this way I hope that we can all benefit from these interactions. I can tell you right away that the number one demo ground question at OOW 2015 was …drum roll… “How can I reduce the risk of query regression when I upgrade to Oracle Database 12c?”. Well, maybe not those words exactly, but words to that effect. There is quite a lot of information out there on this topic, but people seem to struggle to find it… so we put our heads together and we realized that we should publish a 20,000ft view of this topic with pointers down into the detail. That’s the purpose of this post and, for the sake of brevity, I’m going to make the assumption that you are upgrading your Enterprise Edition database from Oracle Database 11g Release 2 to Oracle Database 12c. The bottom line is this: if you want to mitigate the risk of query regression when you upgrade to Oracle Database 12c, then use SQL Plan Management (SPM). This is the recommended approach for the vast majority of systems out there, where the most critical SQL statements are reusable or, in other words, they are executed more than once. Here are a couple of common scenarios:

Scenario 1: You want to use all Oracle Database 12c Optimizer features right away, but you need to “repair” any regressed queries quickly and with minimum effort.

Scenario 2: You’re upgrading and want to keep your “tried-and-tested” Oracle Database 11g execution plans. Nevertheless, you do not want your application to be frozen in time: you want to evolve and use improved execution plans that are available in the new release, and you need to do this in a proven and controlled way.

Scenario 1

This is something you’ll want to think about before your production system goes live, particularly if you have not been able to test a realistic workload on all of your production data. It’s also very relevant if you are running a benchmark or proof of concept, where the time that’s available to resolve problem queries can be pretty limited (I’m using some understatement there!). Ideally you will have captured SQL plan baselines before you’ve upgraded, because then you’ll have a set of “good” execution plans at-the-ready. It isn’t absolutely necessary to do this, though. As long as you can reproduce or find an example of a good plan, then this can be used to create a SQL plan baseline on-demand. For example, you may find a better plan:

By running the problem query in a pre-upgrade environment (remembering that you can export and import SQL plan baselines to copy them between databases).
Inside an existing SQL Tuning Set (STS).
By reproducing the good plan in the post-upgrade environment using (for example) “alter session set optimizer_features_enabled = 11…”, adjusting other Optimizer database parameters or by using hints. Yes, setting this parameter to an older version will give you the plan of the older version; that’s the whole purpose of it (and if it doesn’t work for you then it usually means that there’s a bug). 
The next step is the clever part, but I get the impression that a lot of Oracle experts don’t know that it’s even possible. When you’ve found a good plan and captured the details in a SQL plan baseline, you can use SPM to associate it with a regressed SQL statement without having to change the existing query or the existing application code. For details, take a look in the section, “Creating an accepted plan by modifying the SQL text” in an earlier Optimizer blog entry and also page 24 of SQL Plan Management with Oracle Database 12c. In both cases, an improved SQL execution plan is found using a hint. This plan is associated with a regressed SQL statement so that, in future, the better plan is used.

Scenario 2

You should capture SQL plan baselines in your Oracle Database 11g environment and export them so that they can be imported into the upgraded database. If you are upgrading in-place, then existing SQL plan baselines will be available without the need to export and import them. If you neglected to capture baselines in the pre-upgrade environment, then you still have the option to capture 11.2 execution plans in an Oracle Database 12c environment by executing your queries in a session where the database parameter optimizer_features_enabled is set to “11.2.0.4” (or whatever version you like). Once SQL plan baselines are established in the upgraded database, you will enjoy plan stability while you get to know your system in the critical early days after the upgrade. Once you are happy with your shiny new database, you can evolve SQL plan baselines either automatically or at your own pace. Then you will be able to take advantage of all the new Optimizer features available in Oracle Database 12c.

Licensing

SPM is an Oracle Database Enterprise Edition (EE) feature. You don’t need to buy an additional license to use it with EE. Optionally, you might choose to use SPM with SQL Tuning Sets (STS). If you do, then you will need to have purchased the Oracle Tuning Pack for Oracle Database (PDF), because STS requires the SQL Tuning Advisor.

Conclusion

Whenever you plan to upgrade, check out Oracle’s Database Upgrade Blog. It’s full of really great information and it will hook you up with the latest advice at the finest level of detail. For example, here are some useful specifics on SPM to get you started. So now is the time to upgrade, unless you’re a retailer like Amazon who’s heating up its systems for the big Christmas push, or perhaps you’re heading into your end-of-year financial reporting. Nevertheless, even for you, the “now” is pretty close…
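To make the capture step concrete, here is a minimal sketch of loading plans into SQL plan baselines, either straight from the cursor cache or from a SQL Tuning Set (the SQL ID and STS name below are placeholders):

DECLARE
  n PLS_INTEGER;
BEGIN
  -- Load the plan currently in the cursor cache for one statement
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '1234567890abc');
  -- Or load all of the plans captured in a SQL Tuning Set
  n := DBMS_SPM.LOAD_PLANS_FROM_SQLSET(sqlset_name => 'MY_UPGRADE_STS');
END;
/

Remember that the SQL Tuning Set route requires the Oracle Tuning Pack, as noted in the Licensing section above.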


12c

Optimizer Adaptive Features and Upgrading to Oracle Database 12c Release 2 or Oracle Database 18c

Introduction

Here's a reminder of the changes we’ve made to the optimizer’s adaptive feature settings in Oracle Database 12c Release 2 onwards. The new default behavior is covered in an earlier post, but I want to cover the various options available to you depending on what you're upgrading from. Since the October 2017 database bundle patch, Oracle has included fixes for the adaptive features. This is covered in a more recent post. I have kept this post in place for those that are upgrading and have not applied the proactive bundle patches.

Scenario 1: Upgrading from Oracle Database 11g (or an earlier release)

Once you’ve upgraded the database to Oracle Database 12c Release 2 or Oracle Database 18c, use the default adaptive feature settings. To do this, simply don't include any adaptive feature parameters in your database's initialization parameter file. In other words, there's no need to set optimizer_adaptive_plans or optimizer_adaptive_statistics. Keep it simple!

Scenario 2: Upgrading from Oracle Database 12c Release 1 where patches for bugs 21171382 and 22652097 have been applied

These two patches enable an Oracle Database 12c Release 1 database to use the same adaptive feature settings as those in Oracle Database 12c Release 2 (see MOS note 2187449.1). There is no requirement to apply them before upgrading. They have been provided because some customers want to use the new adaptive parameter settings in Oracle Database 12c Release 1 production environments. Oracle Database 12c Release 1 databases with these patches can be upgraded without changing any adaptive feature settings. That's it! Alternatively, if you were not using the recommended defaults pre-upgrade and you want to use them post-upgrade, then:

Remove references to optimizer_adaptive_plans and optimizer_adaptive_statistics from the database initialization parameter file.
Ensure that the DBMS_STATS preference AUTO_STAT_EXTENSIONS is set to OFF using DBMS_STATS.SET_GLOBAL_PREFS.

Scenario 3: Upgrading from Oracle Database 12c Release 1 where patches for bugs 21171382 and 22652097 have not been applied

If you disabled adaptive features in Oracle Database 12c Release 1 (by setting, for example, optimizer_adaptive_features to FALSE) you should use the new recommended defaults once you've upgraded. To do this, you will need to check your initialization parameter file as follows:

Remove references to the optimizer_adaptive_features parameter (it is obsolete in Oracle Database 12c Release 2 onwards).
Remove any fix control and hidden parameter settings that were used to disable various adaptive features. Fix controls like 12914055 and 7452863 were typically used along with hidden parameters like _optimizer_dsdir_usage_control and _sql_plan_directive_mgmt_control.
There is no need to set optimizer_adaptive_plans and optimizer_adaptive_statistics because the default values are the recommended values.

If adaptive features are enabled in your Oracle Database 12c Release 1 database and you want to continue to use these features in the same way once the database has been upgraded, then:

Remove references to optimizer_adaptive_features from the initialization file (it is obsolete in Oracle Database 12c Release 2 onwards).
Add optimizer_adaptive_statistics=TRUE to the initialization parameter file (and there's no need to set optimizer_adaptive_plans because the default is TRUE). 
Set the DBMS_STATS preference AUTO_STAT_EXTENSIONS to ON using DBMS_STATS.SET_GLOBAL_PREFS.

Testing and SQL Plan Management

SQL execution plans are subject to change when a database is upgraded, so critical systems should be tested to make sure that there are no performance regressions. If comprehensive testing is not practical but there is a strong requirement to mitigate the risk of performance regression, then use SQL plan management (SPM). This feature allows you to keep the same execution plans post-upgrade as the ones you had pre-upgrade. It can do this without "freezing" SQL execution plans: SPM will identify, verify and enable SQL execution plans that perform better in the new release. In this way it is possible to take advantage of new optimizer features that yield improved performance without risking poor performance associated with regressed SQL execution plans. There's more on SPM in the context of upgrades here and a four-part series here. You can of course use the blog's search facility (above) to find more. If anything is not clear, please don't hesitate to post a comment. Thanks.
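As a sketch of the "keep the adaptive features enabled" option in Scenario 3 (parameter and preference names as described above; adjust the scope to suit how you manage your parameter file):

alter system set optimizer_adaptive_statistics = true scope=spfile;

exec dbms_stats.set_global_prefs('AUTO_STAT_EXTENSIONS', 'ON')

If instead you want the recommended defaults, leave both adaptive parameters unset and set AUTO_STAT_EXTENSIONS to OFF.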


Oracle Optimizer

Efficient Statistics Maintenance for Partitioned Tables Using Incremental Statistics – Part 2

Introduction

This post covers how you can manage optimizer statistics efficiently when you use partition exchange load (PEL). This technique is used when large volumes of data must be loaded and maximum performance is paramount. It’s common to see it used in decision support systems and large operational data stores. Make sure you’ve taken a look at Part 1, or you are at least familiar with the concept of incremental statistics so that you know what a synopsis is in the context of a partitioned table.

Partition Exchange Load

Most of you will be familiar with partition exchange load, but I’ll summarize it briefly to introduce you to the terminology I’ll be using here. The graphic below represents the process. Firstly, the LOAD table is filled with new data and then exchanged with a partition in the “live” application table (SALES). SALES has partitions for quarter 1 and quarter 2 (Q1 and Q2) and LOAD is exchanged with the empty Q2 partition. The effect of the exchange is to incorporate all of the data in LOAD into SALES by swapping the “identity” of LOAD with Q2. The exchange is a logical operation: a change is made in the Oracle data dictionary and no data is moved. The data in LOAD is published to SALES “at the flick of a switch”. Typically, the exchange step looks like this:

alter table SALES exchange partition Q2 with table LOAD
including indexes without validation;

Operationally, this approach is more complex than inserting data directly into SALES but it offers some advantages. For example, new data can be inserted into LOAD before any indexes have been created on this table. If the volume of data is large, creating indexes at the end of the load is very efficient and avoids the need to bear the higher cost of index maintenance during the load. The performance benefit is especially impressive if data is loaded at very high rates in parallel. The exact steps you need to execute for a partition exchange load will vary depending on the type of partitioning you use, whether there are local or global indexes on the table and what constraints are being used. For the purposes of this blog post I’m going to stick to how you manage statistics, but you can find details on how to deal with indexes and constraints in the Database VLDB and Partitioning Guide. When new data is loaded into a table, optimizer statistics must be updated to take this new data into account. In the example above, the global-level statistics for SALES must be refreshed to reflect the data incorporated into the table when LOAD is exchanged with Q2. To make this step as efficient as possible SALES must use incremental statistics maintenance. I expect you'll have guessed from the title of this post that I’m going to assume that from now on! I'm also going to assume that the statistics on SALES are up-to-date prior to the partition exchange load.

Oracle Database 11g

The moment after LOAD has been exchanged with Q2 there will be no synopsis on Q2; it won’t have been created yet. Incremental statistics requires synopses to update the global-level statistics for SALES efficiently, so a synopsis for Q2 will be created automatically when statistics on SALES are gathered. For example:

EXEC dbms_stats.gather_table_stats(ownname=>null,tabname=>'SALES')

Statistics will be gathered on Q2 and the synopsis will be created so that the global-level statistics for SALES will be updated. 
Once the exchange has taken place, Q2 will need fresh statistics and a synopsis, and it might also need extended statistics and histograms (if SALES has them). For example, if SALES has a column group, "(COL1, COL2)", then Q2 will need these statistics too. The database takes care of this automatically, so there's no requirement to create histograms and extended column statistics on LOAD prior to the exchange because they are created for you when statistics are gathered on SALES. There is nevertheless a scenario where you might want to gather statistics on LOAD prior to the exchange. For example, if it’s likely that Q2 will be queried before statistics have been gathered on SALES then you might want to be sure that statistics are available on Q2 as soon as the exchange completes. This is easy to do because any statistics gathered on LOAD will be associated with Q2 after the exchange. However, bear in mind that this will ultimately mean that statistics for the new data will be gathered twice: once before the exchange (on the LOAD table) and once again after the exchange (for the Q2 partition when SALES statistics are re-gathered). Oracle Database 12c gives you an alternative option, so I’ll cover that below. If you want to know more about extended statistics and column usage then check out this post. It covers how you can seed column usage to identify where there's a benefit in using extended statistics. Note that some column usage information is always captured to help identify columns that can benefit from histograms. This happens even if you don't choose to seed column usage.

Oracle Database 12c

Oracle Database 12c includes an enhancement that allows you to create a synopsis on LOAD prior to the exchange. This means that a synopsis will be ready to be used as soon as the exchange has taken place without requiring statistics to be gathered on Q2 post-exchange. The result of this is that the global-level statistics for SALES can be refreshed faster in Oracle Database 12c than they can be in Oracle Database 11g. This is how to prepare the LOAD table before the exchange:

begin
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL','TRUE');
   dbms_stats.set_table_prefs (null,'load','INCREMENTAL_LEVEL','TABLE');
   dbms_stats.gather_table_stats (null,'load');
end;
/

Q2 will have fresh statistics and a synopsis as soon as the exchange completes. This isn't quite the end of the story though. Statistics on Q2 will be gathered again after the exchange (when statistics are gathered on SALES) unless you have created appropriate histograms and extended statistics on LOAD before the exchange. The list_s.sql script in GitHub displays extended statistics and histograms for a particular table if you want to take a look at what you have. If you are using METHOD_OPT to specify exactly what histograms to create on SALES then you can use the same METHOD_OPT for gathering statistics on LOAD. For example, set the table preference:

dbms_stats.set_table_prefs(
   ownname=>null,
   tabname=>'SALES',
   method_opt=>'for all columns size 1 for columns sales_area size 254');

Then...
dbms_stats.set_table_prefs (null,'load','INCREMENTAL','TRUE');
dbms_stats.set_table_prefs (null,'load','INCREMENTAL_LEVEL','TABLE');

select dbms_stats.create_extended_stats(null,'load','(col1,col2)') from dual;

dbms_stats.gather_table_stats(
   ownname=>null,
   tabname=>'LOAD',
   method_opt=>'for all columns size 1 for columns sales_area size 254');

Alternatively, if you are using the default 'FOR ALL COLUMNS SIZE AUTO' to gather statistics on SALES, then it's usually best to preserve automation and exchange without creating histograms on LOAD. This allows stats gathering on SALES to figure out what histograms are needed for Q2 post-exchange. Statistics on Q2 will be gathered post-exchange if SALES has column usage information indicating that there are columns in Q2 that don't have a histogram but might benefit from having one. Also, as mentioned above, extended statistics will be maintained automatically too.

Summary of Steps

If you are using Oracle Database 12c then you can minimize the statistics gathering time for SALES (post-exchange) if you create a synopsis on LOAD along with appropriate histograms and extended statistics. For Oracle Database 11g, statistics will always be gathered on Q2 once the exchange has completed. Here are the steps (bearing in mind I’m sticking to statistics maintenance and not including steps to manage indexes and constraints etc):

1. Create LOAD table and insert new data (or CREATE TABLE load AS SELECT…)
2. Create a new (empty) partition for SALES (Q2)
3. Populate LOAD with data
4. Optionally (Oracle Database 12c) - follow these steps if you want Q2 to have valid statistics immediately after the exchange:
   Set INCREMENTAL to 'TRUE' and INCREMENTAL_LEVEL to 'TABLE' for LOAD table
   Create extended statistics on LOAD to match SALES
   Gather statistics on LOAD using METHOD_OPT parameter to match histograms with SALES
5. Optionally (Oracle Database 11g) - follow these steps if you want Q2 to have valid statistics immediately after the exchange:
   Create extended statistics on LOAD to match SALES
   Gather statistics on LOAD using METHOD_OPT parameter to match histograms with SALES
6. Exchange LOAD with Q2 (this will exchange synopses in Oracle Database 12c, basic column statistics and histograms)
7. Gather statistics for SALES. Oracle Database 12c will complete this step more quickly if you implemented "4", above.

If, in the past, you have used partition exchange load and gathered statistics in an ad-hoc manner then you should probably check that the histograms you have match your expectations when comparing table-level histograms with histograms on partitions and sub-partitions. I’ve included a script in GitHub to help you do that.

Composite Partitioned Tables

If you are using a composite partitioned table, partition exchange load works in the same way as described above. If you would like to experiment with a complete example, I’ve created a script called example.sql here.
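After the exchange, it's easy to confirm that Q2 picked up the statistics you prepared on LOAD. A quick sketch using the table names from the example:

select partition_name,
       num_rows,
       last_analyzed
from   user_tab_statistics
where  table_name = 'SALES'
order by partition_position;

-- Q2 should show NUM_ROWS and a LAST_ANALYZED time from the pre-exchange gather on LOAD.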


Statistics

Efficient Statistics Maintenance for Partitioned Tables Using Incremental Statistics – Part 1

Introduction

It’s common to have multi-terabyte partitioned tables in an Oracle database these days. If you are not there yet but you’re heading that way, then you need to know about how to maintain statistics on large tables. I will cover this in a series of posts as follows:

Part 1 (this post) – Concepts and implementation
Part 2 – Incremental statistics and partition exchange loading
Part 3 – New to Oracle Database 12c Release 2

There are some additional details and some overlap with an earlier Optimizer blog post, so you should check that out too.

Concepts

What statistics are there on partitioned tables? Consider a partitioned table called SALES that has date-range partitions for financial quarters 1 and 2 (Q1 and Q2). The partition key is on the column QUARTER. The database gathers statistics for individual partitions so that cardinality can be estimated for queries that are pruned to a single partition. These are called partition-level statistics. To illustrate this with an example, I’m going to consider just a couple of statistics and ignore the others. The number of rows in Q1 and Q2 are 600 and 550 respectively. The number of distinct values (NDVs) for SALE_TYPE in Q1 is 30 and in Q2 it’s 50. When a query is compiled, if the Oracle Optimizer determines that it will need to access a single partition (using partition pruning, for example) then the statistics at the partition level will be enough to help determine the execution plan. Here’s a query that reads Q1 only:

SELECT SUM(amount)
FROM   sales
WHERE  quarter = 'Q1'
AND    sale_type = 'DIRECT';

If the Optimizer determines at compile-time that a query has the potential to access more than one partition, then individual partition statistics are not enough. In the next example, the query needs to access more than one partition:

SELECT SUM(amount)
FROM   sales
WHERE  sale_type = 'DIRECT';

For queries that might access multiple partitions, the Optimizer must consider statistics at the table level. These statistics are known as global-level statistics. You will know that the Oracle database can further subdivide partitions into subpartitions, a feature known as composite partitioning. For now I’m only going to talk about partitions, and later on I’ll say something about subpartition statistics.

How does Oracle manage statistics information at the partition and table level?

Now that we have established the need for both partition and table level statistics, how does Oracle collect them? Can the table-level statistics be derived from partition-level statistics? It is very easy to derive NUM_ROWS at the global level from individual partitions; simply sum NUM_ROWS for each partition (e.g. 600+550=1150 in the example). Unfortunately, it isn’t that simple for the number of distinct values (denoted as NDVs). In the example above, the NDV for SALE_TYPE at the global level (55) can’t be calculated using the values 30 and 50 at the partition level. There’s insufficient information: the basic values 30 and 50 don’t tell us anything about the overlap of SALE_TYPE values in Q1 and Q2. Consider two identical tables, TAB1 and TAB2, that contain different SALE_TYPE values in their Q1 and Q2 partitions. In TAB1, the Q1 partition has SALE_TYPE values A and B, so the NDV is 2. Q2 has the same values, A and B, so the overall table NDV is 2. In the TAB2 case, there is no overlap in values between Q1 and Q2, so even though the partition NDV values are also 2, the overall NDV is 4. 
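If you want to see this overlap effect for yourself, here is a small sketch using the illustrative TAB2 table and its Q1/Q2 partitions from above (the partition-extended names assume the partitions are literally called Q1 and Q2):

select 'Q1'  as scope, count(distinct sale_type) as ndv from tab2 partition (q1)
union all
select 'Q2',           count(distinct sale_type)        from tab2 partition (q2)
union all
select 'ALL',          count(distinct sale_type)        from tab2;

-- For TAB2 this returns 2, 2 and 4: the global NDV cannot be derived
-- from the two partition-level NDVs alone.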
To calculate the global-level NDV value, the database must examine all table partitions (assuming that we don’t have some additional information at our disposal). This can become very time-consuming as tables grow large, especially if there are hundreds or thousands of partitions.

Synopses to the rescue

How does the Oracle Database resolve this problem? Tables can be configured to instruct the statistics gathering procedures to store additional information about each individual partition. Each table partition has a new data structure called a synopsis. Collectively, these structures are called synopses. If data changes in one partition, there is no need to read the contents of all other partitions when recalculating the global-level NDV values. In the following example, a change has been made to the data in Q2 (the star symbols indicate where change is occurring). When statistics are re-gathered, there is no need to read the contents of the Q1 partition because the information contained in the Q1 and Q2 synopses can be used instead. Synopses allow the database to maintain accurate table statistics in a scalable manner: as tables grow in size and the number of partitions increases, the performance benefit of this feature will become more apparent.

Synopses storage

Synopses are maintained automatically by the database. They store additional information in the SYSAUX tablespace about the data stored in every table partition. For tables with large numbers of columns and high NDVs, the amount of data can become large, so space usage in SYSAUX should be monitored. Statistics gathering procedures must maintain the synopsis information, so this can add a performance overhead for some operations. I will return to this topic in Part 3 of this series.

Staleness and DML Change

If statistics are not gathered periodically and the data in the database changes over time, then statistics will become out of date, stale and potentially inaccurate. Statistics need to be accurate to generate good SQL execution plans, so the database must detect when they are stale. It does this by tracking the number of DML row insert, update and delete operations for tables, partitions and sub-partitions. Once the number of DML operations exceeds a certain threshold, the statistics status for the table, partition or sub-partition is changed to stale. By default, incremental maintenance does not use the staleness status to decide when to update statistics. This scenario is covered in an earlier blog post for Oracle Database 11g. If a partition or sub-partition is subject to even a single DML operation, statistics will be re-gathered, the appropriate synopsis will be updated and the global-level statistics will be re-calculated from the synopses. This behavior can be changed in Oracle Database 12c, allowing you to use the staleness threshold to define when incremental statistics will be re-calculated. This is covered in Staleness and DML thresholds, below.

Implementation

Enabling synopses

To enable the creation of synopses, a table must be configured to use incremental maintenance. This feature is switched on using a DBMS_STATS preference called ‘INCREMENTAL’. 
For example:

EXEC dbms_stats.set_table_prefs(null,'SALES','INCREMENTAL','TRUE')

Checking that incremental maintenance is enabled

The value of the DBMS_STATS preference can be checked as follows:

SELECT dbms_stats.get_prefs(pname=>'INCREMENTAL', tabname=>'SALES') FROM dual;

Staleness and DML thresholds

As mentioned above, Optimizer statistics are considered stale when the number of changes made to data exceeds a certain threshold. This threshold is expressed as a percentage of row changes for a table, partition or subpartition and is set using a DBMS_STATS preference called STALE_PERCENT. The default value for stale percent is 10 so, for example, a partition containing 100 rows would be marked stale if more than 10 rows are updated, added or deleted. Here is an example of setting and inspecting the preference:

EXEC dbms_stats.set_table_prefs(null, 'SALES', 'STALE_PERCENT','5')

select dbms_stats.get_prefs('STALE_PERCENT',null,'SALES') from dual;

It is easy to check if a table or partition has been marked as stale:

select partition_name,
       subpartition_name,
       stale_stats               /* YES or NO */
from   dba_tab_statistics
where  table_name = 'SALES';

The database tracks DML operations to measure when data change has caused a table to exceed its staleness threshold. If you want to take a look at this information, bear in mind that the statistics are approximate and they are automatically flushed to disk periodically. If you want to see the figures change immediately during your tests then you will need to flush them manually (you must have ‘ANALYZE ANY’ system privilege), like this:

EXEC dbms_stats.flush_database_monitoring_info

select *
from   dba_tab_modifications
where  table_name = 'SALES';

Remember that if you are using incremental statistics in Oracle Database 11g, a single DML operation on a partition or sub-partition will make it a target for a statistics refresh, even if it is not marked stale. In other words, we might update one row in a partition containing 1 million rows. The partition won't be marked stale (if we assume a 10% staleness threshold) but fresh statistics will be gathered. Oracle Database 12c exhibits the same behavior by default, but this release gives you the option to allow multiple DML changes to occur against a partition or sub-partition before it is a target for incremental refresh. You can enable this behavior by changing the DBMS_STATS preference INCREMENTAL_STALENESS from its default value (NULL) to 'USE_STALE_PERCENT'. For example:

exec dbms_stats.set_global_prefs('INCREMENTAL_STALENESS', 'USE_STALE_PERCENT')

Once this preference is set, a table’s STALE_PERCENT value will be used to define the threshold of DML change in the context of incremental maintenance. In other words, statistics will not be re-gathered for a partition if the number of DML changes is below the STALE_PERCENT threshold.

Locking statistics

Incremental statistics does work with locked partition statistics as long as no DML occurs on the locked partitions. However, if DML does occur on the locked partitions then we can no longer guarantee that the global statistics built from the locked statistics will be accurate, so the database will fall back to using the non-incremental approach when gathering global statistics. However, if for some reason you must lock the partition-level statistics and still want to take advantage of incremental statistics gathering, you can set the 'INCREMENTAL_STALENESS' preference to include ‘USE_LOCKED_STATS’. 
Once set, the locked partition/subpartition statistics are NOT considered stale as long as they have synopses, regardless of DML changes. Note that ‘INCREMENTAL_STALENESS’ accepts multiple values, such as:

BEGIN
  dbms_stats.set_table_prefs(
      ownname=>null,
      tabname=>'SALES',
      pname =>'INCREMENTAL_STALENESS',
      pvalue=>'USE_STALE_PERCENT, USE_LOCKED_STATS');
END;
/

Checking for staleness

You can check for table/partition/subpartition staleness very easily using the statistics views. For example:

EXEC dbms_stats.flush_database_monitoring_info

select partition_name, subpartition_name, stale_stats
from   dba_tab_statistics
where  table_name = 'SALES'
order by partition_position, subpartition_position;

Database monitoring information is used to identify stale statistics, so you’ll need to call FLUSH_DATABASE_MONITORING_INFO if you’re testing this out and you want to see immediately how the staleness status is affected by data change.

Checking for Synopses

Oracle Support maintains Note 1953961.1, which includes a query that lists objects with synopses.

Gathering statistics

How do you gather statistics on a table using incremental maintenance? Keep things simple! Let the Oracle Database work out how best to do it. Use these procedures:

EXEC dbms_stats.gather_table_stats(null,'SALES')

or

EXEC dbms_stats.gather_schema_stats(…)

or, even better

EXEC dbms_stats.gather_database_stats()

For the DBMS_STATS.GATHER... procedures you must use ESTIMATE_PERCENT set to AUTO_SAMPLE_SIZE. Since this is the default, that is what will be used in the examples above unless you have overridden it. If you use a percentage value for ESTIMATE_PERCENT, incremental maintenance will not kick in.

Regathering statistics when data hasn’t changed

From time to time you might notice that statistics are gathered on partitions that have not been subject to any DML changes. Why is this? There are a number of reasons:

1. Statistics have been unlocked.
2. Table column usage has changed (this is explained below).
3. New columns are added. This includes hidden columns created from statistics extensions such as column groups and column expressions.
4. Synopses are not in sync with the column statistics. It is possible that you have gathered statistics in incremental mode at time T1, then disabled incremental and regathered statistics at time T2. The synopses’ timestamp T1 is then out of sync with the basic column statistics’ timestamp T2.
5. Unusual cases, such as column statistics having been deleted using delete_column_statistics.

Reason "2" has some implications. The database tracks how columns are used in query predicates and stores this information in the data dictionary (sys.col_usage$). It uses this information to help it figure out which columns will benefit from a histogram to improve query cardinality estimates and, as a result, improve SQL execution plans. If column usage changes, then the database might choose to re-gather statistics and create a new histogram.

Locally partitioned index statistics

For locally partitioned index statistics, we first check their corresponding table partitions (or subpartitions). If the table (sub)partitions have fresh statistics and the index statistics have been gathered after the table (sub)partition-level statistics, then they are considered fresh and their statistics are not regathered.

Composite partitioned tables

Statistics at the subpartition level are gathered and stored by the database, but note that synopses are created at the partition level only. 
This means that if the statistics for a subpartition become stale due to data changes, then the statistics (and synopsis) for the parent partition will be refreshed by examining all of its subpartitions. The database only regathers subpartition-level statistics on subpartitions that are stale.

More information

There is more on this topic in the Database SQL Tuning Guide. Please feel free to post questions and comments.
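As a final quick check, you can read back the preferences mentioned in this post for a table to confirm how it is configured (a sketch; SALES is the example table used throughout):

select dbms_stats.get_prefs('INCREMENTAL',      user, 'SALES') as incremental,
       dbms_stats.get_prefs('STALE_PERCENT',    user, 'SALES') as stale_percent,
       dbms_stats.get_prefs('ESTIMATE_PERCENT', user, 'SALES') as estimate_percent
from   dual;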


Optimizer Adaptive Features in Oracle Database 12c Release 2

Introduction In Oracle Database 12c Release 2 we have changed the way optimizer adaptive features can be controlled. In this post, I'll present what has changed and give you guidance on how you can choose what settings to use. These changes are also relevant for Oracle Database 12c Release 1. If you want to know more about that, there's information at the end of this post. What’s Changed In Oracle Database 12c Release 1, a single database parameter, optimizer_adaptive_features, controls all of the adaptive features. In Oracle Database 12c Release 2, this parameter has been made obsolete and replaced with two new parameters that control adaptive plans and adaptive statistics separately. What are the Optimizer Adaptive Features? Adaptive plans and adaptive statistics are umbrella terms for a number of optimizer features, summarized below. The following features are controlled by optimizer_adaptive_plans and are enabled by default (the parameter defaults to TRUE): Nested loop join/hash join selection - the optimizer chooses between nested loops or hash joins at query runtime. Adaptive parallel distribution method - the parallel distribution method is determined at runtime. Star transformation bitmap pruning - certain bitmap indexes may be removed from the SQL execution plan at runtime if selectivity is significantly poorer than the estimate. The following features are controlled by optimizer_adaptive_statistics and are disabled by default (the parameter defaults to FALSE): SQL plan directives - SQL plan directives are created and used to adapt SQL execution plans. Statistics feedback for joins - cardinality from table joins is used to improve SQL execution plans. Performance feedback - improves the degree of parallelism chosen when PARALLEL_DEGREE_POLICY is set to ADAPTIVE. Adaptive dynamic sampling for parallel execution - dynamic statistics are gathered for certain parallel queries to improve cardinality estimates; the sample size is determined automatically. SQL Plan Directives Setting optimizer_adaptive_statistics to FALSE does not disable the creation of SQL plan directives. Instead, it prevents them from being used in conjunction with dynamic statistics to influence SQL execution plans. Statistics Feedback Setting optimizer_adaptive_statistics to FALSE disables Oracle Database 12c statistics feedback for joins. Note that statistics feedback for single table cardinality (introduced in Oracle Database 11g as cardinality feedback) is always enabled. Adaptive Dynamic Sampling If optimizer_adaptive_statistics is set to TRUE then dynamic statistics using an adaptive sample size will be used in response to SQL plan directives. In addition, adaptive dynamic sampling will be used for certain parallel queries if optimizer_dynamic_sampling is 2 (the default value). If optimizer_adaptive_statistics is set to FALSE, then adaptive dynamic sampling will not be used in these scenarios. Changes to Auto Creation of Column Group Statistics In Oracle Database 12c Release 1, SQL plan directives trigger the creation of column group statistics when statistics are gathered. You can see the extended statistics that were created automatically using a query like this: select owner, table_name, extension, extension_name from dba_stat_extensions where creator = 'SYSTEM' order by owner,table_name,extension_name; We received feedback that some DBAs wanted to be able to control this feature, so in Oracle Database 12c Release 2 automatic column group statistics creation is controlled by a DBMS_STATS preference, AUTO_STAT_EXTENSIONS.
By default, the preference is OFF so that extended statistics are not created automatically. You can re-enable this feature using: EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_STAT_EXTENSIONS','ON') Choosing What Settings to Use The defaults have been chosen on the basis of what works best for the widest range of workloads, so it is recommended that you start with them on most systems. If you’re already using all the adaptive features in Oracle Database 12c Release 1, and you want the same in Oracle Database 12c Release 2, then override the defaults by setting the parameter optimizer_adaptive_statistics to TRUE and the preference AUTO_STAT_EXTENSIONS to ON. We have received very positive feedback from customers with respect to adaptive plans. They have proved to be very effective at improving system performance for a huge range of workloads. Wrongly choosing a nested loop join rather than a hash join (or vice versa) will often cause a query to experience very poor performance. Adaptive plans give the optimizer a way of avoiding this mistake; the final plan is based on the number of rows actually seen rather than an estimate that was made prior to execution. In other words, adapting the execution plan at run time is likely to benefit all types of systems, so we elected to set the default of optimizer_adaptive_plans to TRUE. Of course, if you prefer not to use it initially then you can set the parameter to FALSE. Adaptive statistics features are more useful in environments where queries, data distributions and schemas are very complex. In systems like this, where query execution times are longer, it makes especially good sense for the optimizer to invest time and resources to improve the overall quality of the execution plans. The parameter optimizer_adaptive_statistics can be set to TRUE in these environments. What about Oracle Database 12c Release 1? Oracle has made improvements to the way adaptive features are controlled. If you are currently using or upgrading to Oracle Database 12c Release 1, then check out the previous post. It shows you where you can find out more information on how to control adaptive features in this release. If you need to find out more about the adaptive features, there is a white paper that covers Oracle Database 12c Release 1 and Release 2.
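To make the above concrete, here is a minimal sketch of the settings described in this post for a system that wants Release 1-style behavior in Release 2. The parameter and preference names come from the post itself; the scope and timing of the change are choices for your own change control process:
-- Adaptive plans are already enabled by default
alter system set optimizer_adaptive_plans = true scope=both;
-- Re-enable the adaptive statistics features
alter system set optimizer_adaptive_statistics = true scope=both;
-- Re-enable automatic creation of column group statistics
EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_STAT_EXTENSIONS','ON')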


Oracle OpenWorld

Oracle OpenWorld 2016

It’s nearly that time of the year again: Oracle OpenWorld kicks off on September 18th in San Francisco. I’m hoping to meet some of you out there, but for those of you not going I intend to keep you up-to-date with the latest optimizer news by posting updates here during OOW week and more when the dust has settled a bit. For those of you going, the Oracle Optimizer session is on Monday 19th, 1:45pm to 2:30pm in Moscone South—303. The title is Oracle Optimizer: Upgrading Without Pain [CON6450] and I'll be covering some important changes and enhancements we've made to make things easier for you. Also, check out the panel session Optimizing SQL for Performance and Maintainability [CON6557] on Thursday 22nd, 1:15pm - 2:00pm in Moscone South—103. The developer advocates hosting the panel, Chris and Connor, have asked me to join them so I'll be there as an official gatecrasher. Please try and take the opportunity to come and meet the Oracle Optimizer developers at the Oracle Optimizer and Statistics demo ground booth (number 1633). You can ask them in person about all the new features they’ve added and the enhancements they’ve made. I’ll be at the booth on Tuesday afternoon, and probably on-and-off throughout the week. Finally, my fellow product managers in the data warehousing and big data arena are presenting a bunch of sessions. Check them out in the full searchable OOW catalog. If you search using keywords like partitioning, warehousing, warehouse, parallel and analytics you’ll find them.


12c

Setting a Session Parameter Overrides OFE

I received an email recently that demonstrated something the author considered strange when the init.ora parameter optimizer_features_enable (OFE) is set in a database session. I thought I'd mention it here because the behavior he spotted is expected, but I don't think that it is entirely obvious. Let's assume that you're logged into Oracle Database 12c. Now check the value of a hidden parameter applicable to this database version: select ksppinm name,        ksppstvl value from   sys.x$ksppi x,        sys.x$ksppcv y where (x.indx = y.indx) and   ksppinm = '_optimizer_aggr_groupby_elim'; NAME                            VALUE =============================   ===== _optimizer_aggr_groupby_elim    TRUE Next, set OFE to 11.2.0.4: alter session set optimizer_features_enable = '11.2.0.4'; And you will then see this: NAME                             VALUE =============================   ===== _optimizer_aggr_groupby_elim     FALSE That's not surprising - we have turned off an Oracle Database 12c feature by setting OFE down to 11.2.0.4. You can probably guess that the use of hidden parameters is not something I normally recommend, but there are circumstances where you might want to set them. Now, check this out - set the hidden parameter explicitly in the session and then set OFE: alter session set "_optimizer_aggr_groupby_elim" = true; alter session set optimizer_features_enable = '11.2.0.4'; NAME                            VALUE =============================   ===== _optimizer_aggr_groupby_elim    TRUE This time we see "TRUE", and it's because we do not cascade OFE to a parameter that has been changed in the session. Hints are different because they override session settings. The group-by and aggregation elimination feature will not be available to the query in this example: alter session set "_optimizer_aggr_groupby_elim"=true; SELECT /*+ optimizer_features_enable('11.2.0.4') */ ...
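If you want to double-check which OFE value actually applied to a given cursor, one approach (a sketch, not part of the original email exchange) is to display the plan outline, which records the optimizer_features_enable setting that was in effect when the plan was built:
-- Run the query of interest first, then display its plan with the outline
select * from table(dbms_xplan.display_cursor(format => '+OUTLINE'));
-- The OUTLINE section includes an OPTIMIZER_FEATURES_ENABLE('...') entry
-- showing the OFE value used at parse time.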


12c

Global Temporary Tables and Upgrading to Oracle Database 12c - Don't Get Caught Out

Prior to Oracle Database 12c, global temporary tables (GTTs) shared statistics between sessions, so each GTT had one set of statistics visible to all sessions using it. Of course, gathering statistics on GTTs can be very useful because it will help the optimizer find better execution plans, but this advantage was sometimes tempered by the fact that multiple sessions would see the same set of stats. To resolve this dilemma, Oracle Database 12c included the ability to create session-private statistics, allowing sessions to create and use their own set of statistics for each GTT. This new behavior is controlled by a DBMS_STATS preference GLOBAL_TEMP_TABLE_STATS, which can be set to SHARED or SESSION. It is important to remember that the default in Oracle Database 12c is now SESSION, and that this has some implications if you're upgrading to this release. Firstly, you should consider whether your database application depends on SHARED GTT statistics. For example, you might have an ETL job that consists of multiple processes, and perhaps one process uses a database session to gather statistics on a GTT for other sessions to make use of later on. If your database applications are upgraded without taking into account the new default, then workloads relying on shared statistics won't see statistics on GTTs where they had seen them before. Dynamic sampling might kick in where it hadn't before (depending on your optimizer_dynamic_sampling level), and this could result in new or sub-optimal execution plans. So - if you're experiencing SQL execution plan degradations in Oracle Database 12c after an upgrade, check to see if any of the problem queries are associated with GTTs. What should you do about this? Firstly, you can use the following query to see if GTT preferences match your expectations (you'll need to modify it to include your own application schema names): select owner, table_name, dbms_stats.get_prefs('GLOBAL_TEMP_TABLE_STATS',owner,table_name) gtt_stats_preference from dba_tables where temporary = 'Y' and owner in ('my_application_owner') and duration = 'SYS$SESSION' order by owner,table_name; Note that the predicate on the duration column is used to identify GTTs created with "on commit preserve rows". If you want to be very selective, it is easy to change preferences on individual GTTs using a PL/SQL block similar to this. Just adjust the query to select the GTTs you want to change: declare cursor c1 is select owner, table_name from dba_tables where temporary = 'Y' and duration = 'SYS$SESSION' and owner in ('my_application_owner'); begin for r in c1 loop dbms_stats.set_table_prefs   (r.owner,r.table_name, 'GLOBAL_TEMP_TABLE_STATS','SHARED'); end loop; end; / You can use a bigger and easier switch to set the preference at the global, database and schema level too. For example: exec dbms_stats.set_global_prefs('GLOBAL_TEMP_TABLE_STATS','SHARED'); exec dbms_stats.set_schema_prefs('my_application_owner','GLOBAL_TEMP_TABLE_STATS','SHARED'); There is an "upgrade gotcha" you need to be aware of. If you have upgraded to Oracle Database 12c and are using optimizer_features_enable (OFE) set to a pre-12.1.0.1 version (let's say, for the sake of argument, it is '11.2.0.4') then be aware that the Oracle Optimizer will not "see" session private statistics.
You can see this in action if you do something like this: -- Set dynamic sampling to the default value to help us identify when stats are missing alter session set optimizer_dynamic_sampling = 2; -- Create a GTT create global temporary table gtt1 (id number(10)) on commit preserve rows; insert into gtt1 values (10); commit; -- Just to make sure you're using the default preference value, "SESSION": exec dbms_stats.set_table_prefs(user,'GTT1','GLOBAL_TEMP_TABLE_STATS','SESSION') -- Gather stats exec dbms_stats.gather_table_stats(user,'gtt1'); Execute a query on the GTT: set autotrace on select * from gtt1 where id = 1; The execution plan will include the following comment in the "Note" section: Global temporary table session private statistics used If you now do the same with OFE set to 11.2.0.4: alter session set optimizer_features_enable = '11.2.0.4'; set autotrace on select * from gtt1 where id = 1; You will see the following note in the execution plan, where the use of dynamic sampling indicates that the session-private statistics are not being seen: dynamic statistics used: dynamic sampling (level=2) Note also that if you do this: exec dbms_stats.set_table_prefs(user,'GTT1','GLOBAL_TEMP_TABLE_STATS','SESSION') alter session set optimizer_features_enable = '11.2.0.4'; exec dbms_stats.gather_table_stats(user,'gtt1'); ...then you will gather session private statistics, which will not be seen by queries executed against the GTT if OFE remains set to a version prior to 12.1.0.1. Conclusions If you're setting OFE to a version prior to 12.1.0.1 and you have "on commit preserve rows" GTTs, you will probably want to change your GTT table preferences to SHARED so that statistics will be created and seen as expected. If you are upgrading to Oracle Database 12c or developing new functionality on this release, you need to consider whether this change of behavior is relevant to your application. Do you have GTTs created with "on commit preserve rows" and does the application gather statistics on those tables? Has your application been modified to accommodate the new default? If not, you might need to change your GTT table preferences to SHARED.
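As a footnote, it can be handy to confirm whether the statistics you are looking at are shared or session-private. The following sketch relies on the SCOPE column that the statistics views gained in Oracle Database 12c, as I understand it; check the view description in your release before relying on it:
-- SCOPE is SHARED for ordinary statistics and SESSION for session-private GTT statistics
select table_name, num_rows, last_analyzed, scope
from   user_tab_statistics
where  table_name = 'GTT1';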


12c

How to Use SQL Plan Management

Introduction In December I posted an entry on the subject of SQL plan management (SPM). It was inspired by questions raised at Oracle Open World 2015 and since then I’ve been talking to more customers about it, particularly in the context of upgrades. As you might guess, I get a lot of questions on the topic of how to use SPM in practice, so it seems like a good time to collate my answers and give you some pointers on how to get the best from the feature. If you're not familiar with SPM, there’s a four-part blog series to get you up to speed. I’m going to make the assumption that your system is a suitable candidate for implementing SPM. This implies that you have a finite set of repeatable queries (or at least a finite set of critical queries). In SPM terminology, a repeatable query is one that is executed multiple times without the query text being modified, so this will make it practical and worthwhile to capture and stabilize its execution plan. This usually means that the application will need to use bind variables and not incorporate literal values in query strings (unless cursor sharing is being used to address that). If your application does use literal values for some critical queries, consider using SQL profiles with the “force_match” parameter. Choosing between a strategic or tactical approach SPM has been designed so that it can be used strategically. In other words, it will stabilize the execution plans of all your repeatable SQL statements and prevent them from encountering performance regression. If you need to avoid regression at all costs then a strategic, “SQL-plan-baseline-everything” approach is often the most appropriate choice. You should definitely consider creating SQL plan baselines for all queries if you are upgrading and want to continue to use the same SQL execution plans. After upgrade, you can evolve SQL plan baselines once the system is established, taking advantage of new optimizations and better execution plans in an incremental and verified way. SPM can be implemented in a more tactical manner, using it to target a subset of your workload. This is appropriate if you are confident that you can identify a core set of critical queries and you can be sure that regressions outside this set are unlikely to significantly harm your service levels. Using SPM proactively or reactively You might have seen the image in our documentation comparing SPM with SQL profiles: SPM is usually used proactively to prevent the use of suboptimal plans, whereas SQL Profiles are usually used reactively to correct problems once they have occurred. But don’t assume that this guidance is cast in stone. If you’ve read the earlier SPM series you’ll know that you can use SPM to “repair” queries that have regressed. In my experience, many DBAs use SPM reactively to correct individual queries, avoiding the need to change the application in any way (something that every DBA wants to have in their tool-box). For the rest of this post, I’ll assume that SPM is being used proactively and that there are SQL plan baselines for all (or a significant part) of your workload. When to capture SQL plan baselines Capture queries from your workload when the system is performing well so that, by definition, all of the captured SQL execution plans will be “good” execution plans.
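For reference, and not as a recommendation for any particular system, automatic plan capture is switched on and off with a standard database parameter. A minimal sketch:
-- Capture plans automatically for repeatable statements (session or system scope)
alter session set optimizer_capture_sql_plan_baselines = true;
-- ... run the workload you want to capture ...
alter session set optimizer_capture_sql_plan_baselines = false;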
SPM auto capture is a very popular choice with DBAs and it's particularly easy to use, but be aware that it will add some overhead to your workload because the execution plans are captured and stored at (hard) parse time. This isn't usually a problem because hard parse rates are generally low of course, but if you're concerned about it you can always capture in bulk from the cursor cache or from SQL tuning sets at any time. Make sure that you create accepted SQL plan baselines for every captured query and if a query has multiple execution plans (because of adaptive cursor sharing) go ahead and accept them all. This approach requires little manual intervention and there is no need to micro-manage which queries to accept. SQL plan baseline capture is an ongoing process, but Oracle recommends that you don’t switch on auto-capture indefinitely. Instead, capture new queries when changes have been made to applications and workloads. Bulk capture (from the cursor cache or from SQL tuning sets) is a good ongoing approach too. For all types of capture, SQL plan baselines will be created without requiring you to figure out which ones are new and which ones already exist – SPM takes care of that for you. If you are upgrading a critical system to Oracle Database 12c, then this is a perfect time to consider creating SQL plan baselines for all your queries. You can do this using the pre- or post-upgrade system (see here for details). Implementing SPM and backing out SQL plan baselines There is always risk associated with making changes to a production system, and change management procedures will usually demand that you have a back-out procedure. It is reasonable to consider what would happen if you add or enable a large number of SQL plan baselines and you encounter problems of some kind. To help you in these circumstances, the DBA_SQL_PLAN_BASELINES view includes timestamps for creation and modification (as well as the origin of the SQL plan baseline). You can use this information to identify recently added and changed SQL plan baselines. The DROP_SQL_PLAN_BASELINE and ALTER_SQL_PLAN_BASELINE procedures are very efficient so, for example, most systems will be capable of enabling and disabling hundreds of SQL plan baselines per second. Fixing SQL plan baselines Individual SQL plan baselines can be fixed. This means that new plans will not be captured automatically for SQL statements that have fixed SQL plan baselines (although you can still load new plans manually and make them fixed if you want to). Be aware that if a SQL plan baseline is fixed for a particular query, but other non-accepted plans in the SQL plan history were previously captured for this query, then these plan history entries will be subject to evolution. I have heard of DBAs encountering this, but it is a rare scenario - it is more usual to fix all baselines for a given SQL statement if you are considering fixing them at all. Some DBAs use SPM very tactically, creating and fixing SQL plan baselines for a carefully chosen set of queries with the intention of never evolving them. However, before fixing SQL plan baselines, consider the possibility of using a slightly different approach. Instead, don't fix them, but allow SPM to capture alternative plans in the SQL plan history. The trick then is to use the SPM auto evolve task to report on the new SQL execution plans without accepting them by default.
Like this: DBMS_SPM.SET_EVOLVE_TASK_PARAMETER( task_name => 'SYS_AUTO_SPM_EVOLVE_TASK' , parameter => 'ACCEPT_PLANS' , value => 'FALSE' ); Now you have the option to view the evolve task report at any time and experiment with new plans, accepting the better ones individually and manually if you prefer. How and when to evolve If SQL plan baselines have been comprehensively captured for a representative workload, the number of new plans identified is usually very low. In this case, the motivation to evolve plans will also be low. Nevertheless, evolution is still worthwhile to take into account new plans that are generated by change (such as changes to underlying schema objects, application changes or configuration changes). In Oracle Database 12c, the evolve process is fully automated and enabled by default. We want you to be confident of using SPM with very little manual intervention, but I know that automation has surprised a number of DBAs that had originally intended to implement evolution down-the-line. If you want to postpone evolution for whatever reason, then the best approach is usually to set the evolve auto task parameter ACCEPT_PLANS to FALSE (see above). New execution plans will be verified and you can view the auto task report at any time to identify improvements and then be as selective as you like about which ones to accept using the evolve API on a case-by-case basis. For example: DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE( … sql_handle=>'...', plan_name=>'...' …); When upgrading a database, a strategic SPM approach will enable you to carry forward your existing SQL execution plans into the new release. Once the new environment is established, it is likely that you will capture a large number of new execution plans in the SQL plan history. After all, the Optimizer will be different. The motivation to evolve baselines is likely to be stronger because it makes good sense to make full use of the most up-to-date Optimizer, but you will probably want evolution to verify that each new plan really is better (and find out why if it's not). For this scenario, it is not unreasonable to use ACCEPT_PLANS = FALSE to help you identify improved SQL execution plans to evolve individually. This will give you the opportunity to gain trust in evolution before you set ACCEPT_PLANS to TRUE. Troubleshooting Systems are rarely static. Schema changes or environment changes can sometimes make it impossible to continue to use a previously-valid execution plan defined in a SQL plan baseline. For example, if a SQL plan baseline specifies an index range scan but the corresponding index has been dropped, then it will become impossible for the SQL plan baseline execution plan to be used. This will not cause the corresponding SQL statement to fail; there is no error condition or error message, but it can mean that there will be a SQL statement that is no longer “protected” by a SQL plan baseline, so it could conceivably suffer from a performance regression. If evolve is used frequently then this scenario can be avoided to a large extent because plan changes are captured in the SQL plan history where they can be verified and accepted when necessary. I have seen cases where a customer evolves their SQL plan history infrequently, and subsequently some SQL statements have stopped matching their intended SQL plan baselines without being noticed. Also, this can sometimes happen because of a bug or where there has been a schema change without the DBAs realizing that it could make some execution plans in SQL plan baselines non-viable.
Fortunately, it is not difficult to identify queries in the SQL cursor cache that have a corresponding SQL plan baseline that (for whatever reason) is not being successfully matched and used. Take a look at the utility scripts I’ve uploaded to GitHub – in particular, “nomatch.sql” shows you how it’s possible to identify SQL statements in the cursor cache that are failing to match their SQL plan baselines. As usual, if you have any suggestions on how I could improve the scripts, just let me know in the comments.
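The script itself is the definitive reference, but the underlying idea can be sketched along these lines (an illustrative approximation, not the script from GitHub): compare the signatures of cursors in the cursor cache with the accepted, enabled SQL plan baselines and look for cursors that executed without reporting a baseline.
-- Cursors whose signature matches an accepted baseline but that executed without one
select s.sql_id, s.child_number, b.plan_name
from   v$sql s,
       dba_sql_plan_baselines b
where  s.exact_matching_signature = b.signature
and    b.enabled  = 'YES'
and    b.accepted = 'YES'
and    s.sql_plan_baseline is null;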


Oracle Optimizer

Group-by and Aggregation Elimination

I get a fair number of questions on query transformations, and it’s especially true at the moment because we’re in the middle of the Oracle Database 12c Release 2 beta program. Sometimes people notice “something missing” or unusual in an execution plan and realize that a transformation is happening. For example, Join Elimination (thanks, Christian) can take you by surprise if you notice that a table in your query doesn’t appear in the SQL execution plan at all (and neither do any of its indexes). I’m sure you’re aware that query transformations are an important step in query optimization because it’s often possible to reduce the cost of a query by eliminating steps like joins or sorts. Sometimes changing the shape of the query will allow it to use different access paths, different types of join, and entirely different types of query blocks. We cover most transformations in our “What to expect from the Optimizer” collateral and, in particular, release-specific Optimizer white papers (here's the Oracle Database 12c one). In Oracle Database 12.1.0.2, we added a new transformation called Group-by and Aggregation Elimination and it slipped through without a mention in our collateral. It happens to be one of the simplest transformations in the Oracle Optimizer’s repertoire and I know that some of you are very well-informed and know about it already. You might have seen it getting a mention in Mike Dietrich’s upgrade blog. Let’s take a look at what the transformation does… Many applications contain queries of a form where an outer group-by query block selects from a single “group-by view”. Here’s an example: SELECT v.column1, v.column2, MAX(v.sm), SUM(v.sm) FROM (SELECT t1.column1, t1.column2, SUM(t1.item_count) AS sm       FROM   t1, t2       WHERE  t1.column4 > 3       AND    t1.id = t2.id       AND    t2.column5 > 10       GROUP BY t1.column1, t1.column2) V GROUP BY v.column1, v.column2; For the transformation to be possible, the outer query must have a single group-by view, but the inner query (that forms the view) can include multiple tables. Under certain conditions, we can transform these types of query to eliminate group-by aggregations, which are expensive operations. The resulting query is simpler and contains fewer group-by clauses and aggregate functions. For the query above, we can transform it into something like this: SELECT t1.column1, t1.column2, SUM(t1.item_count) AS "MAX(V.SM)", SUM(t1.item_count) AS "SUM(V.SM)" FROM   t1, t2 WHERE  t1.column4 > 3 AND        t1.id = t2.id AND        t2.column5 > 10 GROUP BY t1.column1, t1.column2;  Here’s what the SQL execution plan looks like without the transformation. In my case there are 100,000 rows in each table and the elapsed time is 2.09 seconds: ------------------------------------------------------    Id | Operation             | Name | Rows  | Bytes | ------------------------------------------------------ |   0 | SELECT STATEMENT      |      |       |       | |   1 |  HASH GROUP BY        |      | 66521 |  1494K| |   2 |   VIEW                |      | 66521 |  1494K| |   3 |    HASH GROUP BY      |      | 66521 |  2143K| |   4 |     HASH JOIN         |      | 99800 |  3216K| |   5 |      TABLE ACCESS FULL| T2   | 99800 |   877K| |   6 |      TABLE ACCESS FULL| T1   | 99998 |  2343K| ------------------------------------------------------ You can see from the plan above that there are two HASH GROUP BY steps, one for the view and one for the outer query block.
I used database version 12.1.0.2 for this test, where the transformation is enabled by default, so I generated the plan above by disabling it with the hidden parameter _optimizer_aggr_groupby_elim set to FALSE. Looking at the plan for the transformed query, you can see that there is only a single HASH GROUP BY step. The elapsed time was much lower too – only 1.29 seconds in my case: ----------------------------------------------------    Id | Operation           | Name | Rows  | Bytes | ---------------------------------------------------- |   0 | SELECT STATEMENT    |      |       |       | |   1 |  HASH GROUP BY      |      | 66521 |  2143K| |   2 |   HASH JOIN         |      | 99800 |  3216K| |   3 |    TABLE ACCESS FULL| T2   | 99800 |   877K| |   4 |    TABLE ACCESS FULL| T1   | 99998 |  2343K| ---------------------------------------------------- The example above is relatively easy to understand because the GROUP BY columns in the view and the outer query are the same. This doesn’t have to be the case. It’s sometimes possible to transform the query even if the outer query’s GROUP BY is a subset of the view’s GROUP BY. For example: SELECT v.column1, v.column3, MAX(v.column1), SUM(v.sm) FROM (SELECT t1.column1, t1.column2, t1.column3, SUM(t1.item_count) AS sm       FROM   t1, t2       WHERE  t1.column4 > 3 AND              t1.id = t2.id  AND              t2.column5 > 10       GROUP BY t1.column1, t1.column2, t1.column3) V GROUP BY v.column1, v.column3; ----------------------------------------------------    Id | Operation           | Name | Rows  | Bytes | ---------------------------------------------------- |   0 | SELECT STATEMENT    |      |       |       | |   1 |  HASH GROUP BY      |      | 49891 |  1607K| |*  2 |   HASH JOIN         |      | 99800 |  3216K| |*  3 |    TABLE ACCESS FULL| T2   | 99800 |   877K| |*  4 |    TABLE ACCESS FULL| T1   | 99998 |  2343K| ---------------------------------------------------- You don’t need to do anything special to enable this query transformation. It is enabled by default and it happens automatically whenever a particular query passes the validation criteria. In real-world, enterprise-class systems you can of course expect to see much more significant time-savings when this transformation can be applied. Note that the transformation is not applied for queries that use ROLLUP or CUBE grouping functions. Were there problems with this transformation? Yes, there were (and this is why Mike Dietrich mentioned it). With any transformation, the Oracle Optimizer has to figure out when it can and can’t be applied and the logic behind this can be incredibly complex. The bottom line is that there were some cases where the transformation was being applied and it shouldn’t have been. Generally, this was where the outer group-by query was truncating or casting columns used by the inner group-by. This is now fixed and it’s covered by patch number 21826068. Please use MOS to check availability for your platform and database version.
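If you want to reproduce the comparison above, the transformation can be switched off and on at the session level in the same way as the test in this post. The usual caveat applies: hidden parameters should only be changed for experiments like this or under the guidance of Oracle Support.
-- Disable the transformation to see the two HASH GROUP BY steps
alter session set "_optimizer_aggr_groupby_elim" = false;
-- Re-enable it (the default) to see the single HASH GROUP BY step
alter session set "_optimizer_aggr_groupby_elim" = true;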


SQL Plan Management

Upgrade to Oracle Database 12c and Avoid Query Regression

Those of you that made it to the demo grounds at Oracle Open World this year (BTW - it’s still 2015 - just) will have had the chance to chat with the Oracle developers and throw lots of difficult questions at them! For everyone in the Optimizer development team it was a great opportunity to get lots of interaction and feedback, so thanks to all of you that took the time to come along and talk to us. We’re not all lucky enough to get a trip out to San Francisco, so I’ve been collating the main topics that came up to steer the subject matter of the next few blog posts. In this way I hope that we can all benefit from these interactions. I can tell you right away that the number one demo ground question at OOW 2015 was …drum roll… “How can I reduce the risk of query regression when I upgrade to Oracle Database 12c?”. Well, maybe not those words exactly, but words to that effect. There is quite a lot of information out there on this topic, but people seem to struggle to find it… so we put our heads together and we realized that we should publish a 20,000ft view of this topic with pointers down into the detail. That’s the purpose of this post and, for the sake of brevity, I’m going to make the assumption that you are upgrading your Enterprise Edition database from Oracle Database 11g Release 2 to Oracle Database 12c. The bottom line is this: if you want to mitigate the risk of query regression when you upgrade to Oracle Database 12c, then use SQL Plan Management (SPM). This is the recommended approach for the vast majority of systems out there, where the most critical SQL statements are reusable or, in other words, they are executed more than once. Here are a couple of common scenarios: Scenario #1: You want to use all Oracle Database 12c Optimizer features right away, but you need to “repair” any regressed queries quickly and with minimum effort. Scenario #2: You’re upgrading and want to keep your “tried-and-tested” Oracle Database 11g execution plans. Nevertheless, you do not want your application to be frozen in time: you want to evolve and use improved execution plans that are available in the new release, and you need to do this in a proven and controlled way. Scenario 1: This is something you’ll want to think about before your production system goes live, particularly if you have not been able to test a realistic workload on all of your production data. It’s also very relevant if you are running a benchmark or proof of concept, where the time that’s available to resolve problem queries can be pretty limited (I’m using some understatement there!). Ideally you will have captured SQL plan baselines before you’ve upgraded, because then you’ll have a set of “good” execution plans at-the-ready. It isn’t absolutely necessary to do this, though. As long as you can reproduce or find an example of a good plan, then this can be used to create a SQL plan baseline on-demand. For example, you may find a better plan: By running the problem query in a pre-upgrade environment (remembering that you can export and import SQL plan baselines to copy them between databases). Inside an existing SQL Tuning Set (STS). By reproducing the good plan in the post-upgrade environment using (for example) “alter session set optimizer_features_enable = 11…”, adjusting other Optimizer database parameters or by using hints. Yes, setting this parameter to an older version will give you the plan of the older version; that’s the whole purpose of it (and if it doesn’t work for you then it usually means that there’s a bug).
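Once a good plan is in the cursor cache, loading it into a SQL plan baseline is straightforward. Here is a minimal sketch; the SQL_ID and plan hash value are placeholders for the statement and plan you identified:
declare
  n pls_integer;
begin
  -- Load the specified plan for the specified statement from the cursor cache
  n := dbms_spm.load_plans_from_cursor_cache(
         sql_id          => '&good_sql_id',
         plan_hash_value => &good_plan_hash_value);
  dbms_output.put_line(n || ' plan(s) loaded');
end;
/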
The next step is the particularly clever part, but I get the impression that a lot of Oracle experts don’t know that it’s even possible. When you’ve found a good plan and captured the details in a SQL plan baseline, you can use SPM to associate it with a regressed SQL statement without having to change the existing query or the existing application code. For details, take a look at the section “Creating an accepted plan by modifying the SQL text” in an earlier Optimizer blog entry and the section entitled "Using SPM to Correct Regressed SQL Statements" in the SQL Plan Management with Oracle Database 12c Release 2 white paper. In both cases, an improved SQL execution plan is found using a hint. This plan is associated with a regressed SQL statement so that, in future, the better plan is used. Scenario 2: You should capture SQL plan baselines in your Oracle Database 11g environment and export them so that they can be imported into the upgraded database. If you are upgrading in-place, then existing SQL plan baselines will be available without the need to export and import them. If you neglected to capture baselines in the pre-upgrade environment, then you still have the option to capture 11.2 execution plans in an Oracle Database 12c environment by executing your queries in a session where the database parameter optimizer_features_enable is set to “11.2.0.4” (or whatever version you like). Once SQL plan baselines are established in the upgraded database, you will enjoy plan stability while you get to know your system in the critical early days after the upgrade. Once you are happy with your shiny new database, you can evolve SQL plan baselines either automatically or at your own pace. Then you will gain the advantage of all the new Optimizer features available in Oracle Database 12c. Licensing SQL Plan Management is an Oracle Database Enterprise Edition (EE) feature (you can see this here). You don’t need to buy an additional license to use SPM on EE. Optionally, you might choose to use SPM with SQL Tuning Sets (STS). If you do, then you will need to have purchased the Oracle Tuning Pack for Oracle Database (PDF) in addition to EE because STS requires this pack. Top Tip Whenever you plan to upgrade, check out Oracle’s Database Upgrade Blog. It’s full of really great information and it will hook you up with the latest advice at the finest level of detail. For example, here are some useful specifics on SPM to get you started. So now is the time to upgrade, unless you’re a retailer like Amazon who’s heating up its systems for the big Christmas push, or perhaps you’re heading into your end-of-year financial reporting period. Nevertheless, even for you, the “now” is pretty close…
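Scenario 2 mentions exporting and importing SQL plan baselines between databases. The mechanics use a staging table; here is a minimal sketch, where the schema name and staging table name are placeholders you would choose yourself:
-- On the source (for example, the 11g) database: create and fill a staging table
exec dbms_spm.create_stgtab_baseline(table_name => 'SPM_STAGE', table_owner => 'SPM_ADMIN')
declare
  n number;
begin
  n := dbms_spm.pack_stgtab_baseline(table_name => 'SPM_STAGE', table_owner => 'SPM_ADMIN');
end;
/
-- Move SPM_STAGE to the target database with Data Pump (or a database link), then:
declare
  n number;
begin
  n := dbms_spm.unpack_stgtab_baseline(table_name => 'SPM_STAGE', table_owner => 'SPM_ADMIN');
end;
/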


Oracle Optimizer

Tips on SQL Plan Management and Oracle Database In-Memory – Part 3

In Part 1 of this series of tips on SQL Plan Management (SPM) and Oracle Database In-Memory I covered an example where a full table scan query made use of the In-Memory column store immediately without changing the SQL execution plan. In Part 2 I presented an example where the In-Memory column store made an alternative SQL execution plan viable, and where there was a corresponding SQL plan baseline already in place so that this plan could be used immediately. In this post I will consider a slightly different scenario: SQL plan management is used to stabilize the SQL execution plans for our critical application (that until now has not been using the In-Memory column store). The In-Memory column store is subsequently enabled and populated with application tables. The Optimizer identifies new SQL execution plans. Most of the new execution plans have never been chosen by the Optimizer before. Let me say right away: SPM behaves in a business-as-usual manner: New plans for existing baselines are captured by SPM but they will not be used until they are accepted. Existing SQL plan baselines are used, so queries continue to use "approved" SQL execution plans. The database administrator chooses how and when to evolve the SQL plan baselines to take full advantage of In-Memory SQL execution plans. This is probably one of the most common scenarios you’ll encounter if you use SPM and you start to use Oracle Database In-Memory. As Andy Rivenes pointed out in his blog post, SPM is a very good way to avoid query regressions by controlling how and when queries are affected as you populate the In-Memory column store with more and more tables. I'll use the following example to show you how SPM behaves: There is a query called Q3. Q3 queries a table called MYSALES. MYSALES is not yet populated into the In-Memory column store. Q3 filters rows in MYSALES using a column called SALE_TYPE. SALE_TYPE has relatively low cardinality, but an index is still useful. There is a SQL plan baseline for Q3 to ensure that it will use an index range scan and not a full table scan.
This is the plan before the In-Memory column store is populated with MYSALES: PLAN_TABLE_OUTPUT ----------------- SQL_ID  8xkx5abshb4rz, child number 2 ------------------------------------- select /* SPM */ count(*),sum(val) from mysales where sale_type in (2,3)   Plan hash value: 719460714   ------------------------------------------------------------------------------------------------- |     | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT                      |         |       |       |   246 (100)|          | |   1 |  SORT AGGREGATE                       |         |     1 |     7 |            |          | |   2 |   INLIST ITERATOR                     |         |       |       |            |          | |   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MYSALES | 20000 |   136K|   246   (0)| 00:00:01 | |*  4 |     INDEX RANGE SCAN                  | SI      | 20000 |       |    40   (0)| 00:00:01 | -------------------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------      4 - access(("SALE_TYPE"=2 OR "SALE_TYPE"=3))   Note -----    - SQL plan baseline SQL_PLAN_bk42daz2f53zwb9fe04b5 used for this statement   It's an index range scan, and the “note” section (above) shows us that a SQL plan baseline is being used (its name ending in "4b5"). Let's take a look at our baselines - there's just one:  SELECT plan_name,sql_handle,sql_text,enabled, accepted FROM   dba_sql_plan_baselines WHERE  sql_text LIKE '%SPM%'; PLAN_NAME                       SQL_HANDLE           SQL_TEXT                            ENA ACC ------------------------------- -------------------- ----------------------------------- --- --- SQL_PLAN_bk42daz2f53zwb9fe04b5  SQL_b9104d57c4e28ffc select /* SPM */ count(*),sum(val)  YES YES                                                      from mysales                                                      where sale_type in (2,3) Now, populate the In-Memory column store:   -- Mark MYSALES with the In-Memory attribute ALTER TABLE mysales INMEMORY; -- Access MYSALES to trigger population into In-Memory column store SELECT count(*) FROM mysales; If we re-run Q3, we still get an index range scan rather than the INMEMORY FULL scan we might have anticipated (because an In-Memory scan can be more efficient than an index range scan in some cases):  ------------------------------------------------------------------------------------------------- |     | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT                      |         |       |       |   246 (100)|          | |   1 |  SORT AGGREGATE                       |         |     1 |     7 |            |          | |   2 |   INLIST ITERATOR                     |         |       |       |            |          | |   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MYSALES | 20000 |   136K|   246   (0)| 00:00:01 | |*  4 |     INDEX RANGE SCAN                  | SI      | 20000 |       |    40   (0)| 00:00:01 | ------------------------------------------------------------------------------------------------- Has the Optimizer decided that an index range scan is still the best option?
We can answer that if we take another look at the SQL plan baselines:  SELECT plan_name,sql_handle,sql_text,enabled, accepted FROM   dba_sql_plan_baselines WHERE  sql_text LIKE '%SPM%'; PLAN_NAME                       SQL_HANDLE           SQL_TEXT                            ENA ACC ------------------------------- -------------------- ----------------------------------- --- --- SQL_PLAN_bk42daz2f53zwc69cec1f  SQL_b9104d57c4e28ffc select /* SPM */ count(*),sum(val)  YES NO                                                      from mysales                                                      where sale_type in (2,3) SQL_PLAN_bk42daz2f53zwb9fe04b5  SQL_b9104d57c4e28ffc select /* SPM */ count(*),sum(val)  YES YES                                                      from mysales                                                      where sale_type in (2,3)   OK. There’s a new baseline, but it isn’t accepted (the value “NO” appears in the accepted column). This is exactly what SPM is supposed to do: we continue to use accepted and "approved" plans until we have verified or chosen to use alternatives. What is the new baseline plan in this case? SELECT PLAN_TABLE_OUTPUT FROM   V$SQL s, DBA_SQL_PLAN_BASELINES b,         TABLE(           DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE(b.sql_handle,b.plan_name,'basic')         ) t WHERE  s.EXACT_MATCHING_SIGNATURE=b.SIGNATURE AND    b.PLAN_NAME='SQL_PLAN_bk42daz2f53zwc69cec1f';   PLAN_TABLE_OUTPUT ----------------- SQL handle: SQL_b9104d57c4e28ffc SQL text: select /* SPM */ count(*),sum(val) from mysales where sale_type in          (2,3) --------------------------------------------------------------------------------   -------------------------------------------------------------------------------- Plan name: SQL_PLAN_bk42daz2f53zwc69cec1f         Plan id: 3332172831 Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE Plan rows: From dictionary --------------------------------------------------------------------------------   Plan hash value: 3292460164   ----------------------------------------------- | Id  | Operation                   | Name    | ----------------------------------------------- |   0 | SELECT STATEMENT            |         | |   1 |  SORT AGGREGATE             |         | |   2 |   TABLE ACCESS INMEMORY FULL| MYSALES | -----------------------------------------------   There it is! The Optimizer has established that the In-Memory full table scan is a good choice, but it will not be used until the new SQL plan baseline has been accepted. Let's go ahead and accept it, but take note that in this example there’s a good chance that the difference in performance will be very small because, after all, it’s only a simple query on a small dataset. If the performance difference is small then automatic plan evolution won’t deem the performance improvement to be sufficient to trigger automatic acceptance of the new baseline. It's worth remembering this if you find that you have a bunch of new baselines that are not accepted automatically. 
I'll use “verify=>’NO’” to force acceptance:  cVal := dbms_spm.evolve_sql_plan_baseline(sql_handle=>'SQL_b9104d57c4e28ffc',verify=>'NO');   SELECT plan_name,sql_handle,sql_text,enabled, accepted FROM   dba_sql_plan_baselines WHERE  sql_text LIKE '%SPM%';   PLAN_NAME                       SQL_HANDLE           SQL_TEXT                            ENA ACC ------------------------------- -------------------- ----------------------------------- --- --- SQL_PLAN_bk42daz2f53zwc69cec1f  SQL_b9104d57c4e28ffc select /* SPM */ count(*),sum(val)  YES YES                                                      from mysales                                                      where sale_type in (2,3) SQL_PLAN_bk42daz2f53zwb9fe04b5  SQL_b9104d57c4e28ffc select /* SPM */ count(*),sum(val)  YES YES                                                      from mysales                                                      where sale_type in (2,3) Now, re-run the query: PLAN_TABLE_OUTPUT ----------------- SQL_ID  8xkx5abshb4rz, child number 0 ------------------------------------- select /* SPM */ count(*),sum(val) from mysales where sale_type in (2,3)   Plan hash value: 3292460164   --------------------------------------------------------------------------------------- |     | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     | --------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |         |       |       |    24 (100)|          | |   1 |  SORT AGGREGATE             |         |     1 |     7 |            |          | |*  2 |   TABLE ACCESS INMEMORY FULL| MYSALES | 20000 |   136K|    24   (9)| 00:00:01 | ---------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------      2 - inmemory(("SALE_TYPE"=2 OR "SALE_TYPE"=3))        filter(("SALE_TYPE"=2 OR "SALE_TYPE"=3))   Note -----    - SQL plan baseline SQL_PLAN_bk42daz2f53zwc69cec1f used for this statement   That’s more like it! We have accepted the new In-Memory execution plan and we’ve done it in a controlled manner. We are using the new SQL plan baseline (the name ends in "c1f"). In reality, you might have thousands of SQL plan baselines to evolve, but you can use the SPM evolve advisor task to automate the process of verification and acceptance. If you use this feature, then any SQL statements in your baseline that don’t benefit significantly from the In-Memory column store will continue to use their existing SQL execution plans.   In this series of posts I don’t pretend to have covered every possible scenario, but I hope that this has given some idea of how SPM will behave if you choose to use Oracle Database In-Memory. I'm still not absolutely sure that "Part 3" will be the last part, so this might end up being a trilogy in four or five parts (to steal a quote from a famous author). You can take and develop the scripts I wrote to try out scenarios of your own. They are available on GitHub. So go ahead and check them out, and post any questions you have in the comments section below.


Oracle Optimizer

Tips on SQL Plan Management and Oracle Database In-Memory - Part 2

In Part 1 of this series of tips on SQL Plan Management (SPM) and Oracle Database In-Memory, I covered what would happen if we have a SQL plan baseline for a full table scan query when the table is populated into the In-Memory column store.  In this part I’m going to cover a scenario where a query has more than one SQL plan baseline:  There is a query (called Q2, for short). Q2 queries a table called MYSALES, which has not yet been populated into the In-Memory column store. Q2 filters rows in MYSALES using a predicate on the SALE_TYPE column. Data in SALE_TYPE is skewed, so there’s an index and a histogram on this column. Because there is data skew, Q2 has two accepted SQL plan baselines: one with a full table scan and one with an index range scan. You’ve probably come across this situation many times: the Oracle Optimizer must choose between a full table scan or an index range scan depending on predicate selectivity. The ability to change the execution plan based on the value of bind variables is called adaptive cursor sharing. If you’ve not come across that, then you’ll find it useful to check out the section on this topic in the Database SQL Tuning Guide. What’s great about SPM is that it allows you to have multiple SQL plan baselines for individual queries, so you're not forced to pick one plan in preference to another. This capability is most relevant in environments where SQL statements use bind variables and there is a good deal of data skew. Queries like this are likely to have their plans affected by Oracle Database In-Memory because in-memory full table scans will have a lower cost than storage-resident table scans. Clearly, the In-Memory column store will affect the point of inflection where a full table scan will become more efficient than an index range scan. How is this going to work with SPM?  Take a look at the following example. Q2 executes and matches 2 million rows because I picked the value of bind variable “:val” to do just that. The Optimizer chooses a full table scan:  PLAN_TABLE_OUTPUT ----------------- SQL_ID  d3u63rk540w0r, child number 1 ------------------------------------- select /* SPM */ count(*),sum(val) from mysales where sale_type = :val   Plan hash value: 3292460164   ------------------------------------------------------------------------------   Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------ |   0 | SELECT STATEMENT   |         |       |       |  2475 (100)|          | |   1 |  SORT AGGREGATE    |         |     1 |    17 |            |          | |*  2 |   TABLE ACCESS FULL| MYSALES |  2000K|    32M|  2475   (1)| 00:00:01 | ------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------   2 - filter("SALE_TYPE"=:VAL) Note -----   - SQL plan baseline SQL_PLAN_93ct9zmnvtbuhc69cec1f used for this statement For the second execution, the value “:val” is set so that it would match only 20,001 rows.
This time the Optimizer chooses an index range scan:  PLAN_TABLE_OUTPUT ----------------- SQL_ID  d3u63rk540w0r, child number 2 ------------------------------------- select /* SPM */ count(*),sum(val) from mysales where sale_type = :val   Plan hash value: 1266559460   ------------------------------------------------------------------------------------------------   Id  | Operation                            | Name    | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------------------------ |   0 | SELECT STATEMENT                     |         |       |       |   133 (100)|          | |   1 |  SORT AGGREGATE                      |         |     1 |    17 |            |          | |   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| MYSALES | 20001 |   332K|   133   (0)| 00:00:01 | |*  3 |    INDEX RANGE SCAN                  | SI      | 20001 |       |    44   (0)| 00:00:01 | ------------------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------   3 - access("SALE_TYPE"=:VAL)   Note -----   - SQL plan baseline SQL_PLAN_93ct9zmnvtbuh5d8bf80c used for this statement   As you will have figured out, the Optimizer has calculated that the index is less efficient than a full table scan when Q2 matches a large number of rows (2 million in this case) so we have two viable SQL execution plans for this query. Before I ran the queries above, I accepted two SQL plan baselines for Q2. You can see in the “note” sections above that two different baselines are used (one ending in “80c” and one ending in “c1f”). They can be seen in the dba_sql_plan_baselines view:  SELECT plan_name,sql_text,enabled, accepted  FROM   dba_sql_plan_baselines WHERE  sql_text LIKE '%SPM%';   PLAN_NAME                           SQL_TEXT                                ENA ACC ----------------------------------- ----------------------------------      --- --- SQL_PLAN_93ct9zmnvtbuhc69cec1f      select /* SPM */ count(*),sum(val)      YES YES                                     from mysales where sale_type = :val            SQL_PLAN_93ct9zmnvtbuh5d8bf80c      select /* SPM */ count(*),sum(val)      YES YES                                     from mysales where sale_type = :val   We’re in good shape here. The Optimizer is adapting the query execution plan to take into account bind variable values and data skew. What’s more, SPM is working with us and not against us because it is not forcing Q2 to use a single SQL execution plan. What happens if we populate MYSALES into the In-Memory column store?
-- Mark MYSALES with the In-Memory attribute ALTER TABLE mysales INMEMORY;   -- Access MYSALES to trigger population into In-Memory column store SELECT count(*) FROM mysales;   If we execute Q2 to match 2 million rows, the Optimizer continues to choose a full table scan:  PLAN_TABLE_OUTPUT ----------------- SQL_ID  d3u63rk540w0r, child number 1 ------------------------------------- select /* SPM */ count(*),sum(val) from mysales where sale_type = :val   Plan hash value: 3292460164   ---------------------------------------------------------------------------------------   Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     | --------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |         |       |       |   115 (100)|          | |   1 |  SORT AGGREGATE             |         |     1 |    17 |            |          | |*  2 |   TABLE ACCESS INMEMORY FULL| MYSALES |  2000K|    32M|   115  (20)| 00:00:01 | ---------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------   2 - inmemory("SALE_TYPE"=:VAL)       filter("SALE_TYPE"=:VAL)   Note -----   - SQL plan baseline SQL_PLAN_93ct9zmnvtbuhc69cec1f used for this statement   The full table scan is now annotated with INMEMORY, so we know that some or all of the data for MYSALES is scanned via the In-Memory column store. The “note” section reports that the same baseline is being used as before (ending in “c1f”). This is good news, and it’s the scenario that was covered in Part 1 of this series.   What if we executed the query to match 20,001 rows? You can probably guess what’s coming; the Optimizer judges that the In-Memory scan is more efficient than the index range scan:  PLAN_TABLE_OUTPUT ----------------- SQL_ID  d3u63rk540w0r, child number 2 ------------------------------------- select /* SPM */ count(*),sum(val) from mysales where sale_type = :val   Plan hash value: 3292460164   ---------------------------------------------------------------------------------------   Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     | --------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |         |       |       |   115 (100)|          | |   1 |  SORT AGGREGATE             |         |     1 |    17 |            |          | |*  2 |   TABLE ACCESS INMEMORY FULL| MYSALES | 20001 |   332K|   115  (20)| 00:00:01 | ---------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------   2 - inmemory("SALE_TYPE"=:VAL)       filter("SALE_TYPE"=:VAL)   Note -----   - SQL plan baseline SQL_PLAN_93ct9zmnvtbuhc69cec1f used for this statement   Since there is a SQL plan baseline that allows a full table scan to be used, Q2 can use this access method straight away and we get immediate benefit from scanning the In-Memory column store! Hold on a minute! Wasn’t that just a little bit too convenient? I arranged it so that there was a handy full-table-scan SQL plan baseline ready and waiting for when I "flipped the switch" and started using the In-Memory column store. 
This example might seem a little contrived, but it is a real-world example and I chose it to illustrate how SPM works together with both Oracle In-Memory Database and adaptive cursor sharing (and if you want more, there's an earlier blog on how adaptive cursor sharing interacts with SPM). If, instead, I had started out with a single baseline that specified an index range scan, then this is the plan that would have been used even after MYSALES populated the In-Memory column store (and we would not have had an INMEMORY FULL scan). That’s not a bad thing; it is exactly what plan stability means and it is how SPM is meant to work. In the example above I made use of a couple of SQL execution plans that were validated and accepted before I initiated the In-Memory column store. In the more general case, where the Optimizer identifies a brand new execution plan for use with the In-Memory column store, we might want to validate it before we allow the database to use it in our critical application. How can we do that? Happily, it's what SPM evolution was built for, and it goes all the way back to the initial scenario I mentioned in Part 1. I'll cover the details in Part 3.  If you want to try out this example for yourself, the scripts are in GitHub.
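In the meantime, if you want to preview the evolution step, here is a minimal sketch using the DBMS_SPM evolve function. The SQL handle shown is illustrative only; substitute the handle reported for your own statement in DBA_SQL_PLAN_BASELINES, and with commit set to 'NO' the call simply produces a verification report for you to review:

SET LONG 1000000
SELECT DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE(
         sql_handle => 'SQL_7219349d287a6343',   -- illustrative handle
         verify     => 'YES',                    -- test-execute the unaccepted plan
         commit     => 'NO')                     -- report only; accept manually after review
FROM   dual;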


Oracle Optimizer

Tips on SQL Plan Management and Oracle Database In-Memory Part 1

If you follow Oracle's In-Memory blog then you probably came across a post mentioning how you should use SQL Plan Management when you're upgrading to Oracle Database In-Memory. Whether you have read that post or not, you might be wondering what will happen if you have some SQL plan baselines and you begin to populate the In-Memory column store with a bunch of tables used by those baselines. That's what this post is about. Well, in fact, I'm going to break the topic up into a few posts because (as ever!) there is a little bit of subtlety to cover. Luckily, this will make your life easier rather than more difficult because you can get immediate benefit from In-Memory even if you don't evolve SQL plan baselines on day one.   When I started to think about this post I thought that I would start with the first scenario that probably comes to mind if you're familiar with SQL Plan Management (SPM):  The Optimizer comes up with a new execution plan for a SQL statement because something has changed, and Oracle Database In-Memory would be a very good example of that!  If there's a SQL plan baseline for the statement, the database will use the baseline execution plan and capture the new plan. Where appropriate, the new plan will be validated and accepted using SQL plan evolution.  I will get to that, but first it's better to start with a couple of more subtle points. With this information in our back pocket it will be easier to understand (and explain) the more traditional aspects of SQL plan evolution in the context of Oracle Database In-Memory.  Here, I will cover the following example: There is a table called MYSALES that's not yet populated into the In-Memory column store.  A query (called "Q1") includes a full table scan of MYSALES. There is no index on the table that's useful to Q1.  Q1 has an active SQL plan baseline. MYSALES is subsequently populated into the In-Memory column store. Let's take a look at Q1 and its SQL execution plan before populating MYSALES into the In-Memory column store (I'll explain the significance of the plan hash value and the "note" section further down)... SQL_ID  4ss4zbb813250, child number 0 ------------------------------------- SELECT /* SPM */ COUNT(*) FROM   mysales WHERE  val = 'X'   Plan hash value: 3292460164   ------------------------------------------------------------------------------   Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------ |   0 | SELECT STATEMENT   |         |       |       |    69 (100)|          | |   1 |  SORT AGGREGATE    |         |     1 |     2 |            |          | |*  2 |   TABLE ACCESS FULL| MYSALES | 99991 |   195K|    69   (2)| 00:00:01 | ------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------   2 - filter("VAL"='X')   Note -----   - SQL plan baseline SQL_PLAN_7469nmnn7nsu3c69cec1f used for this statement   Q1 performs a full table scan of MYSALES. The "note" section makes it clear that a SQL plan baseline is used. 
This is what that looks like:   SELECT PLAN_TABLE_OUTPUT FROM   V$SQL s, DBA_SQL_PLAN_BASELINES b,       TABLE(         DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE(b.sql_handle,b.plan_name,'basic')       ) t WHERE  s.EXACT_MATCHING_SIGNATURE=b.SIGNATURE AND    b.PLAN_NAME=s.SQL_PLAN_BASELINE AND    s.SQL_ID='4ss4zbb813250';   -------------------------------------------------------------------------------- SQL handle: SQL_7219349d287a6343 SQL text: SELECT /* SPM */ COUNT(*) FROM   mysales WHERE  val = 'X' --------------------------------------------------------------------------------   -------------------------------------------------------------------------------- Plan name: SQL_PLAN_7469nmnn7nsu3c69cec1f         Plan id: 3332172831 Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE Plan rows: From dictionary --------------------------------------------------------------------------------   Plan hash value: 3292460164   --------------------------------------   Id  | Operation          | Name    | -------------------------------------- |   0 | SELECT STATEMENT   |         | |   1 |  SORT AGGREGATE    |         | |   2 |   TABLE ACCESS FULL| MYSALES | --------------------------------------   What happens if MYSALES is now populated into the In-Memory column store?  -- Mark MYSALES with the In-Memory attribute ALTER TABLE mysales INMEMORY;   -- Access MYSALES to trigger population into In-Memory column store SELECT count(*) FROM mysales;   Let’s rerun our query and examine the execution plan: SQL_ID  4ss4zbb813250, child number 1 ------------------------------------- SELECT /* SPM */ COUNT(*) FROM   mysales WHERE  val = 'X'   Plan hash value: 3292460164   ---------------------------------------------------------------------------------------   Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     | --------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |         |       |       |     3 (100)|          | |   1 |  SORT AGGREGATE             |         |     1 |     2 |            |          | |*  2 |   TABLE ACCESS INMEMORY FULL| MYSALES |   100K|   195K|     3   (0)| 00:00:01 | ---------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------    2 - inmemory("VAL"='X')        filter("VAL"='X')   Note -----    - SQL plan baseline SQL_PLAN_7469nmnn7nsu3c69cec1f used for this statement   There is still a full table scan, but this time the query will read data from MYSALES via the In-Memory column store rather than the storage-resident table and, even better, the same SQL plan baseline is used. That was pretty easy! The Optimizer chose a full table scan in both cases, so the same SQL plan baseline was used both cases. The INMEMORY annotation for the full table scan is “for your information only”; it tells you that the query scanned some or all of the data for your table via the In-Memory column store but as far as the Optimizer is concerned it is “just” a full table scan, as the keyword INMEMORY does not affect the plan hash value, so it will match the existing the SQL plan baseline (above, you can see that the plan hash value is always "3292460164" ). Why do I say the INMEMORY keyword indicates some or all of the data for your table is scanned via the In-Memory column store? 
Remember until all of the data belonging to MYSALES has been populated into the In-Memory column store, Oracle will automatically pick up the rest of the data from wherever it resides. That could be from memory (e.g. the buffer cache) or from flash or from disk. It should be pretty obvious by now that if we decide to remove MYSALES from the In-Memory column store, the query will revert to scanning the storage-resident table and the plan will display “TABLE ACCESS FULL”.  This example is very simple, but the principle applies to queries that have the same execution plan for In-Memory versus non-In-Memory. What happens if there are execution plan changes and, in particular, if indexes are involved? Start by looking at Part 2. If you want to try out this example for yourself, the scripts are in GitHub.
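If you want to check how much of the table has actually been populated (and therefore how much of a scan must still be satisfied from outside the column store), V$IM_SEGMENTS is a handy place to look. A quick sketch, reusing the MYSALES example:

-- Population progress: BYTES_NOT_POPULATED drops to zero once the whole
-- segment is in the In-Memory column store.
SELECT segment_name, populate_status, bytes_not_populated
FROM   v$im_segments
WHERE  segment_name = 'MYSALES';

-- Removing the table from the column store simply reverses the earlier step:
ALTER TABLE mysales NO INMEMORY;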


Oracle Optimizer

What you need to know about SQL Plan Management and Auto Capture

Introduction SQL Plan Management (SPM) is an Oracle database feature that allows you to establish a set of SQL execution plans that will be used even if the database is subject to changes that would otherwise cause execution plan changes to occur. For example, you might have an end-of-day batch run that operates in a business context where there are extreme peaks and troughs in daily volume, or perhaps you are upgrading a database and want to be sure that plans are carried over (at least initially). You do not have to fix execution plans in stone with SPM, you can use plan evolution to automate the process of finding improved plans, improving performance in a controlled way and at your own pace. If you’re not familiar with SPM, a very good place to start is to take a look at Maria Colgan’s four-part blog post on the subject. It gives you all the tools you need to get started. If you are using SPM in Oracle Database 11gR2 or 12c, or if you are considering whether you should use it, then this blog post is for you. I decided to publish this post because I recently encountered a couple of environments that ran into, let’s say, “difficulties” with SPM when capturing SQL plan baselines automatically and continuously over a very long period of time (more than a year in fact). I’d like to give you a few pointers to avoid running into the same problems and why automatic SQL baseline capture was never intended to be used in that way. Capturing Baselines The first step in using SPM is to create SQL plan baselines. You can capture SQL plan baselines in a number of different ways, but the key idea is that you should capture a representative set of SQL statements for the workloads you want to have baselines for. Typically, this means that you will capture SQL plan baselines for a fixed period of time – perhaps during the end of quarter batch run or for a weeks-worth of query processing. One way to capture SQL plan baselines is to set the database parameter optimizer_capture_sql_plan_baselines to TRUE so that all previously unseen repeatable SQL statements executed against the database will be captured automatically to create new “accepted” SQL plan baselines. This “auto capture” mechanism is very easy to use, but there is often a temptation to leave it enabled for an indefinite period of time. Don’t do this! It is of course possible to do it, but it rarely offers any significant advantage over a more targeted approach and it has implications for space usage in the data dictionary that you need to be aware of. Why shouldn’t you continuously capture SQL plan baselines indefinitely? Capturing SQL plan baselines indefinitely means that the plan for every repeatable SQL statement must be stored in the data dictionary. Over the course of a year or more, a large database application can generate hundreds of thousands or even millions of repeatable SQL statements. This will not only consume a significant amount of space in the data dictionary, but you will probably collect a lot of “dead wood” that is unlikely to benefit you as much as you think. Also, if there are hundreds of thousands of baselines captured, it can take many hours to purge them if you are not regularly doing this. Checking how much space in SYSAUX tablespace is being used by SPM Check your alert logs to see if you have messages like this, because it is the first indication that you have exceeded the configured space budget percent for SPM: SPM: SMB space usage (99215979367) exceeds 10.000000% of SYSAUX size (1018594954366). 
This message doesn't necessarily signify that there is an urgent issue, and the database will continue to work normally. That's why it can go unnoticed! Note also that it is only a warning, and the database does not enforce any hard upper limit for the space consumed by SPM (this might be changed or enhanced in a future version, but don't take this as any product commitment or base any product purchase decision on it!). The data dictionary contains a number of tables that store all of the data associated with SPM. These tables and their associated indexes are stored in an area known as the SQL Management Base (SMB). This is the meaning of "SMB" in the alert log message above. You can execute the following query to find out how much space the SMB is using: SELECT  occupant_desc, space_usage_kbytes FROM    v$sysaux_occupants WHERE   occupant_name='SQL_MANAGEMENT_BASE'; The tables and indexes associated with the SMB might be among the top space consumers inside the data dictionary if there are a very large number of baselines. Here's an example of what that can look like: SELECT  * FROM (    SELECT  bytes/(1024*1024) MBytes, tablespace_name, segment_name, segment_type    FROM    dba_segments WHERE owner = 'SYS' ORDER BY bytes DESC) WHERE   rownum <= 20 /     MBYTES TABLESPACE_NAME   SEGMENT_NAME       SEGMENT_TYPE ---------- ----------------- ------------------ -------------      66735 SYSAUX            SQL$TEXT           TABLE      25420 SYSAUX            SQLOBJ$AUXDATA     TABLE      10089 SYSAUX            SQLOBJ$_PKEY       INDEX       9447 SYSAUX            I_SQLOBJ$NAME_TYPE INDEX       7665 SYSAUX            SQLOBJ$DATA_PKEY   INDEX          … Auto Purging SQL Plan Baselines The Oracle database includes an automatic mechanism for purging SPM baselines that have not been used for longer than the plan retention period, as identified by the LAST_EXECUTED column in the DBA_SQL_PLAN_BASELINES view (see below). Here's an example of how to set the retention period to 10 weeks: EXECUTE DBMS_SPM.CONFIGURE('plan_retention_weeks',10); The default retention period is 53 weeks (see sub-section 23.7.2 in the Oracle Database SQL Tuning Guide). Some 11g versions seem to suffer from a problem where auto-purging does not work successfully if there are a very large number of SQL plan baselines. If you have a large number of SQL plan baselines you might want to check out bug #16996646 (see My Oracle Support), or simply use the workaround of executing the purge manually: VARIABLE ret NUMBER; BEGIN  :ret := sys.dbms_spm_internal.auto_purge_sql_plan_baseline; END; / PRINT ret; Identifying When Baselines Were Last Used The SMB records a LAST_EXECUTED timestamp against every baseline, making it possible to find out when each one was last used by a SQL statement. For example, this query counts the number of baselines, aggregating by the age of last execution in weeks: SELECT FLOOR(EXTRACT(DAY FROM (SYSDATE - last_executed)) / 7) AS "Weeks Since Last Execute",        COUNT(*) AS "Num Baselines" FROM   dba_sql_plan_baselines WHERE  accepted = 'YES' AND    autopurge = 'YES' GROUP BY FLOOR(EXTRACT(DAY FROM (SYSDATE - last_executed)) / 7) ORDER BY 1; Note that the "last executed" timestamp is not refreshed continuously but on a weekly basis, so expect the timestamps for active baselines to take at least a week to reflect that they have been used. The following query counts how many baselines have not been used for a period exceeding the plan retention period. 
For example, if you find SQL plan baselines that have not been used for longer than the retention time, then this can indicate that they are not being auto-purged successfully: SELECT count(*) AS "Purgeable Baselines Count" FROM   dba_sql_plan_baselines WHERE  autopurge = 'YES' AND    EXTRACT(DAY FROM (SYSDATE - last_executed)) / 7 >          (SELECT parameter_value           FROM   dba_sql_management_config           WHERE  parameter_name = 'PLAN_RETENTION_WEEKS'); If you find that you have a large number of SQL plan baselines that have not been used for a significant time, then it is worth considering purging them individually and (optionally) reclaiming the space that was used in the SYSAUX tablespace. See Doc IDs 790039.1 and 1499542.1 in My Oracle Support, which cover how to do this. Alternatively, consider setting an appropriate retention period for the auto purge process and executing the purge procedure manually if you are using Oracle Database 11gR2. Don't only clean up, but also spend a couple of minutes thinking about how you got such a large number of SQL plan baselines and why they were not used for a long time. Maybe blindly using auto-capture is the culprit, in which case you will find yourself repeating this exercise a year from now at the latest. Summary If you want to use automatic baseline capture then it is best to capture SQL plan baselines for representative and smoothly-running workloads for fixed periods of time. This is usually sufficient to achieve very good results. Nevertheless, if you do want to use it for periods of days or weeks, then you particularly need to be aware of your SPM storage management settings: monitor your alert log for SPM space usage messages and use manual purge procedures if the auto purge job is not clearing down expired SQL plan baselines successfully. More can be found in the Oracle 12c Documentation and the white paper, SQL Plan Management with Oracle Database 12c.
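Finally, here is a sketch that pulls together the housekeeping steps discussed above. The threshold values are examples only, so choose settings that suit your own system, and review the candidate list before dropping anything in production:

-- Example settings: warn at 20% of SYSAUX and keep unused plans for 10 weeks.
EXECUTE DBMS_SPM.CONFIGURE('space_budget_percent', 20);
EXECUTE DBMS_SPM.CONFIGURE('plan_retention_weeks', 10);

-- Drop individual baselines that have not been executed for (say) 10 weeks.
DECLARE
  n PLS_INTEGER;
BEGIN
  FOR b IN (SELECT sql_handle, plan_name
            FROM   dba_sql_plan_baselines
            WHERE  autopurge = 'YES'
            AND    last_executed < SYSDATE - 70)
  LOOP
    n := DBMS_SPM.DROP_SQL_PLAN_BASELINE(sql_handle => b.sql_handle,
                                         plan_name  => b.plan_name);
  END LOOP;
END;
/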


Oracle Optimizer

Space Management and Oracle Direct Path Load

Most of you will be familiar with the concept of direct path load and how it’s an efficient way to load large volumes of data into an Oracle database as well as being a great technique to use when moving and transforming data inside the database. It’s easy to use in conjunction with parallel execution too, so you can scale out and use multiple CPUs and database servers if you need to process more data in less time. Probably less well known is how the Oracle database manages space during direct path load operations. This is understandable because the Oracle database uses a variety of approaches and it has not always been very obvious which one it has chosen. The good news is that Oracle Database 12c from version 12.1.0.2 onwards makes this information visible in the SQL execution plan and it is also possible to understand what’s happening inside earlier releases with a bit of help from this post and some scripts I’ve linked to at the end. Why isn’t there a one-size-fits-all approach for space management? Simply put, direct path load has to work well in a wide variety of different circumstances. It is used to fill non-partitioned and partitioned tables, which means that it must work well with a small or large number of database segments. It must operate serially or in parallel, be confined to a single database instance or distributed across an entire RAC database.There are some subtle complexities to take into account too, such as data skew. For example, some table partitions might contain much more data than others. Successful parallel execution depends on the even distribution of workloads across a number of parallel execution servers. If this isn’t the case then some of the servers will finish early, leaving them with no useful work to do. This results in a low effective degree of parallelism, poor machine resource utilization and an extended elapsed time. Avoiding issues associated with data skew increases the complexity of space management because it’s usually not appropriate to simply map each parallel execution server to an individual database segment. The strategies Oracle uses are designed to achieve excellent scalability in a wide variety of circumstances, avoiding the extended run times associated with skewed datasets. In this post I will focus on how the strategies work without delving too deeply into how the database makes its choice. I will explain space management using Oracle Database 11g Release 2 as the baseline and then I will introduce the changes we’ve made in Oracle Database 12c. To put things in perspective, a full rack Oracle Exadata Database Machine is capable of loading data at over 20 terabytes an hour, so you can be sure that space management has received some attention! Even if you don’t have a system like that at your disposal, you will still benefit from the improvements and optimizations Oracle makes to keep your database operating at its full potential. High Water Mark (HWM) Loading High water mark (HWM) loading is perhaps the simplest and most intuitive data load mechanism. For example, it is the default approach if you load data serially into a table like this: ALTER SESSION DISABLE PARALLEL DML; INSERT /*+ APPEND */ INTO sales_copy t1 SELECT ...; The new data is inserted into extents allocated above the table (or partition) high water mark.  
Rows above the high water mark will not be scanned by queries, so the new data remains invisible until the transaction is committed, whereupon the high water mark is moved to incorporate the extents containing the new data. High water mark loading can be used whenever an Oracle process has exclusive access to the segment. This happens for serial load, parallel loads which distribute by the partition key (PKEY distribution) and “equi-partition” loads.  For example, an equi-partition load will map parallel execution servers to particular partitions in the target table, where it will be responsible for loading all of the data from the corresponding partition in the source table. Equi-partition loading will be chosen automatically if you copy data in parallel between two equi-partitioned tables, but only if the tables have a relatively large number of partitions. There’s more on the producer/consumer model for parallel execution and the meaning of all of the terminology used here in the Oracle documentation and the parallel execution fundamentals white paper. The following diagram represents a serial high water mark load on a 2-node RAC cluster. Rows are shown filling from left-to-right and on commit, the high water mark is moved up to the right to incorporate the new data: Temp Segment Merge (TSM) Loading Temp segment merge (TSM) loading was the first mechanism used to achieve parallel data loading in Oracle and is the default mechanism for parallel loads into single segments in Oracle Database 11g Release 2 (although AutoDOP uses something different, see later). For example, the following parallel load statements in Oracle Database 11g will result in a temp segment merge load: Parallel Create Table As Select (PCTAS) for a non-partitioned table: CREATE TABLE sales_copy PARALLEL 8 AS SELECT * FROM sales; Parallel Insert Direct Load (PIDL) into a non-partitioned table or a single partition: INSERT /*+ APPEND PARALLEL(t1,8) */ INTO sales_copy_nonpart t1 SELECT ...; INSERT /*+ APPEND PARALLEL(t1,8) */ INTO sales_copy partition (part_p1) t1 SELECT ...; When a parallel data load begins, each parallel execution server producer is allocated to a single temporary segment to populate. The temporary segments will reside in the same tablespace as the table or partition being loaded, and at commit time the temporary segments will be merged with the base segment by manipulating the extent map. In other words, the temporary segments will be incorporated into the table or partition without moving the data a second time. The following diagram represents a parallel degree 4, temp segment merge load on a 2-node RAC cluster: Temp segment merge loading scales very well with degree of parallelism (DOP) because each parallel execution server (PX server) is dedicated to its own temporary segment, so there is a good deal of isolation between each of the processes that are loading data. You can expect that a higher DOP will give you more performance until (usually) CPU or storage becomes the limiting factor. However, there is a potential downside; each temporary segment will become at least one table extent when it is made part of the table or partition. For example, if we load at DOP 16 then we’ll usually add at least 16 new extents to the table even if the number of rows is relatively small. 
It is important to be aware of this for the following reasons: Segment extent maps are read by query coordinators when parallel execution is initiated, so tables or partitions that have an extremely large number of extents can take longer to query and update in parallel than an identical table or partition with fewer extents. Extent trimming can sometimes leave gaps between extents that may be too small for new extents to be created and used. This is commonly referred to as external fragmentation. If you are using UNIFORM extent allocation then extents will not be trimmed. In this case, the final extent created for each temporary segment will be partially empty (unless the new rows just happen to fit exactly into the allocated extent). Statistically, 50% of the space for the last extent in each temporary segment will be empty when the load completes. For example, a temp segment merge load of DOP 8 into 1MB uniform extents will, on average, under populate the final extents in each of the 8 temporary segments by 0.5MB (1MB*50%) -  or 4MB in total (0.5MB * 8). If data is only loaded into a table using direct path load, then this space will remain unused. This is commonly referred to as internal fragmentation and in this example, 4MB of space per load is effectively wasted unless rows are also loaded into the segment using conventional path INSERT. These downsides can be mitigated if you load large volumes of data relatively infrequently rather than small amounts of data frequently. High Water Mark Brokering (HWMB) High water mark brokering (HWMB) may be used if multiple PX servers can potentially fill the same table or partition segment. It is often used when executing direct path loads into multiple partitions for both Oracle Database 11g and 12c, like this: INSERT /*+ APPEND PARALLEL(t1,8) */ INTO sales_copy_partitioned t1 SELECT ...; It is also the default for single-segment loads in Oracle Database 11g if Auto DOP is used (i.e. a load into a non-partitioned table or a specific table partition). In common with high water mark loading, new row data is added directly to base segments above the high water mark. Depending on the DOP and the type of parallel execution plan chosen by the Oracle Optimizer, multiple PX servers may need to insert row data into the same database segment. In this situation, it becomes necessary for the new position of the high water mark to be coordinated or “brokered” between multiple processes and even multiple database servers. Brokering is implemented using database HV enqueues. Each segment has its own HV enqueue, which is used to record the position that the high water mark must be moved to once the transaction is committed. The enqueue ensures that multiple processes can’t update the high water mark position value at the same time. The following diagram represents a DOP 4 high water mark brokered load on a single instance. Rows are shown filling from left-to-right, and on commit the high water mark is moved up to the right to incorporate the new data: In general, HWMB results in fewer extents being added to the table (or partition) segment than temp segment merging. This advantage is especially significant in systems that load data frequently and those that leverage high DOP to reduce data load elapsed times. 
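If you want to see the difference for yourself, a simple way to compare the strategies is to count the extents added by each load. A sketch, assuming the SALES_COPY table used in the examples above:

-- Count the extents in the freshly loaded segment; repeat the load with a
-- different strategy (or database release) and compare the results.
SELECT COUNT(*) AS extent_count, SUM(bytes)/1024/1024 AS mbytes
FROM   user_extents
WHERE  segment_name = 'SALES_COPY';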
Oracle Database 12c:  Hybrid TSM/HWMB The changes we have made in Oracle Database 12c release 12.1.0.1 improve the scalability of parallel direct path load, particularly when it’s used to populate single database segments such as non-partitioned tables or individual table partitions. This change is particularly important in high performance environments which move and transform data between non-partitioned tables and individual table partitions. It is also beneficial in environments that make use of partition exchange loading, where non-partitioned tables are populated before being incorporated into a partitioned table using a partition exchange operation. Before we look at how the new approach works, here’s an Oracle Database 12c example that demonstrates the new behavior: INSERT /*+ APPEND PARALLEL(t1,8) */ INTO   sales_copy t1              /* sales_copy is not partitioned */ SELECT /*+ PARALLEL(t2,8) */ * FROM sales t2; The execution plan in 12.1.0.2 will look something like this: ---------------------------------------------------------------------------------------------------------------------------- | Id  | Operation                          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- |   0 | INSERT STATEMENT                   |          |       |       |    74 (100)|          |        |      |            | |   1 |  PX COORDINATOR                    |          |       |       |            |          |        |      |            | |   2 |   PX SEND QC (RANDOM)              | :TQ10000 |   999K|  8789K|    74   (2)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  | |   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)|          |       |       |            |          |  Q1,00 | PCWP |            | |   4 |     OPTIMIZER STATISTICS GATHERING |          |   999K|  8789K|    74   (2)| 00:00:01 |  Q1,00 | PCWP |            | |   5 |      PX BLOCK ITERATOR             |          |   999K|  8789K|    74   (2)| 00:00:01 |  Q1,00 | PCWC |            | |*  6 |       TABLE ACCESS FULL            | SALES    |   999K|  8789K|    74   (2)| 00:00:01 |  Q1,00 | PCWP |            | ---------------------------------------------------------------------------------------------------------------------------- There is a decoration in the SQL execution plan that you won’t have seen prior to Oracle Database 12c: HYBRID TSM/HWMB. As the name suggests, this is a hybrid solution that combines the beneficial characteristics of temp segment merge and high water mark brokering. Note that Oracle database release 12.1.0.1 will use the same execution plan but it won't show the new decoration. The primary motivation for this new approach is that high water mark brokering has a disadvantage if a large DOP is used to insert data into a single database segment. The single HV enqueue will need to be updated frequently, introducing a potential point of contention which will become more significant if PX servers are distributed across a Real Application Clusters (RAC) environment. Serializing access to a “busy” HV enqueue over an entire cluster can be time consuming, so in Oracle Database 12c we combine high water mark brokering with temp segment merge to eliminate cross-instance contention for HV enqueues while keeping down the number of new extents that need to be created per load operation. 
I will illustrate how Hybrid TSM/HWMB works when parallel loading data into a non-partitioned table or single partition, since this is where you are most likely to encounter the optimization. When the parallel load operation is initiated, a temporary segment is created for each instance involved in the load (there will be two segments for a 2-node RAC cluster if the parallel insert operation is distributed across the cluster). Each temporary segment can be loaded by multiple PX servers, but the database will ensure that each temporary segment will be loaded by PX servers from a single instance. In this way, the HV enqueue associated with each temporary segment is brokered locally for each database instance and does not need to be brokered across the entire cluster. The following diagram represents a DOP 4 Hybrid TSM/HWMB load on a 2-node RAC cluster: To summarize, the benefits of Oracle Database 12c Hybrid TSM/HWMB are: Improved scalability; especially significant in high DOP and RAC deployments. Fewer table extents; especially significant if frequent, high DOP data loads are used. Reduced risk of “bloating” table extent maps and suffering from an associated degradation in PX performance. In Oracle Database 12c: Parallel create table as select, INSERT and MERGE operations use Hybrid TSM/HWMB instead of temp segment merge for single segment loads. It is used in both Auto DOP and manual DOP environments. Temp segment merge continues to be available for some partitioned table parallel create table as select operations because this approach is highly scalable and (since it is a one-time operation) it avoids the potential downsides related to space fragmentation and extent bloat. Hybrid TSM/HWMB is used instead of temp segment merge for some partitioned table parallel create table as select operations. Space management decorations are clearly shown in execution plans for parallel load operations from database version 12.1.0.2. LOAD AS SELECT is decorated with “TEMP SEGMENT MERGE”, “HIGH WATER MARK BROKERED”, “HIGH WATER MARK”, “HYBRID TSM/HWMB” or “EQUI-PARTITION”. Examples and Proof Points If you’d like to see some examples, or would like to try this out for yourself, I’ve uploaded some scripts to GitHub. In particular you will see a comparison between Oracle Database 11g and 12c, showing the reduced number of extents created per load in 12c. If you have any questions or problems with the scripts, or if you think they can be improved then let me know in the comments and I will update them. Final Word This post is in memory of my colleague Allen Brumm.  


12c

Controlling Access To The In-Memory Column Store Via Hints

It's been almost a year since I've had an opportunity to write a new blog post here due to my crazy schedule now that I'm the In-Memory Maven (thanks to Doug Burns & Graham Wood for the new and improved title). But this morning while I was writing a post for the In-Memory blog about controlling the use of the In-Memory column store (IM column store), I realized that the content on Optimizer hints really belonged over here. So, I decided to split the blog post in two. I've put the information on the initialization parameters that control the use of the IM column store on the In-Memory blog and the information about the Optimizer hints that control the use of the IM column store here. The first hint we need to discuss is the (NO_)INMEMORY hint. This hint is one of the most misunderstood hints in recent releases. I've heard a number of different explanations about what this hint is supposed to do, most of which aren't true. The only thing the INMEMORY hint does is enable the IM column store to be used when the INMEMORY_QUERY parameter is set to DISABLE. It won't force a table or partition without the INMEMORY attribute to be populated into the IM column store. If you specify the INMEMORY hint in a SQL statement where none of the tables referenced in the statement are populated into memory, the hint will be treated as a comment since it's not applicable to this SQL statement. Nor will the INMEMORY hint force a full table scan via the IM column store to be chosen if the default plan (lowest cost plan) is an index access plan. You will need to specify the FULL hint to see that plan change take effect. Let's take a look at the INMEMORY hint in action. In this example I'm using a simple query that asks the question, "What is the most expensive order we have received to date?" SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorder; The LINEORDER table has 23,996,604 rows and has been fully populated into the IM column store. In memory it is made up of 44 IMCUs (In-Memory Compression Units). I've started a new session and set the INMEMORY_QUERY parameter to DISABLE. If I execute our simple query, you'll notice the elapsed time is over 36 seconds and the execution plan chosen is a full table scan, but it wasn't executed via the IM column store (the keyword INMEMORY is missing from the TABLE ACCESS FULL operation). Now in the same session, let's execute the query again, this time using the In-Memory hint. By specifying the hint we were able to reduce the response time to under a second and force the full table scan to occur in the IM column store. The NO_INMEMORY hint does the same thing in reverse. It will prevent an object from being accessed via the IM column store, even if the object is fully populated into the column store and the plan with the lowest cost is a full table scan. Let's look at another example, just to ensure I've cleared up any confusion about what the INMEMORY hint actually controls. In this example we're looking for a specific order in the LINEORDER table based on the order key. SELECT lo_orderkey, lo_custkey, lo_revenue FROM lineorder WHERE  lo_orderkey = 5000000;  A b-tree index exists on the LO_ORDERKEY column. The LINEORDER table is fully populated into the IM column store and the INMEMORY_QUERY parameter is set to the default, ENABLE.  The lowest cost plan (default plan) is an INDEX RANGE SCAN followed by a TABLE ACCESS BY ROWID RANGE, as shown below. Now let's add the INMEMORY hint to see if we can change the plan, so we can take advantage of the In-Memory column store. 
As I said before, specifying the INMEMORY hint is not the way to influence the access method chosen in a plan. To change the access method to a FULL TABLE SCAN we need to specify the FULL hint. With the FULL hint the plan changes to allow the IM column store to be used, but if we examine the cost of this new plan it is far more expensive than the original plan, which is why it wasn't chosen by default. The second hint introduced with Oracle Database In-Memory is (NO_)INMEMORY_PRUNING, which controls the use of In-Memory storage indexes. By default every query executed against the IM column store can take advantage of the In-Memory storage indexes (IM storage indexes), which enable data pruning to occur based on the filter predicates supplied in a SQL statement. As with most hints, the INMEMORY_PRUNING hint was introduced to help test the new functionality. In other words, the hint was originally introduced to disable the IM storage indexes. Let's look at an example, so you can see the hint in action. If we examine the new In-Memory session statistics (IM session statistics) for our previous query, we can see that 41 of the 44 IMCUs were pruned and only 3 IMCUs (44 - 41) were actually scanned to answer the query. Let's start a new session, rerun the query with the NO_INMEMORY_PRUNING hint and examine the IM session statistics again. After disabling the IM storage indexes with the NO_INMEMORY_PRUNING hint, the elapsed time for the query has gone up, and if we examine the IM statistics you can see that neither the IM scan CUs pruned nor the IM scan segments minmax eligible statistics have been incremented for this execution. Hopefully this post has helped clear up some of the confusion around what the new In-Memory hints control.
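If you want to check these storage index statistics for your own session, the session statistics views are the place to look. A minimal sketch (the statistic names shown are from my environment and may vary slightly between releases):

-- In-Memory session statistics for the current session.
SELECT n.name, s.value
FROM   v$mystat s, v$statname n
WHERE  s.statistic# = n.statistic#
AND    n.name IN ('IM scan CUs pruned',
                  'IM scan segments minmax eligible');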


Oracle OpenWorld

Oracle OpenWorld 2013 - Time to plan your schedule

There are only two weeks to go until Oracle OpenWorld, the largest gathering of Oracle customers, partners, developers, and technology enthusiasts, which begins on September 22nd in San Francisco. Of course the Optimizer development group will be there and you will have multiple opportunities to meet the team, in one of our technical sessions, or at the Oracle demogrounds. This year the Optimizer team has two technical sessions: Monday, September 23rd at 3:15pm  Oracle Optimizer: What's New in Oracle Database 12c? Session CON8622 at Moscone North - room 131 This session provides a deep-dive explanation of how the new adaptive approach to query optimization works. We will use a real-world demo to show the different components of the adaptive model and how they help the optimizer learn additional information during query execution that can be used to improve not only the performance of the current SQL statement but that of all statements. Tuesday, September 24th at 5:15pm Oracle Optimizer Bootcamp: 10 Optimizer tips you can't do without. Session CON8643 at Moscone South - room 104 Jonathan Lewis will join us to explain the process of analyzing and solving 10 of the most common SQL execution performance problems. These problems include poor cardinality estimates, bind peeking issues, the selection of sub-optimal access methods, and many more. Through clear how-to examples, you will learn how to identify and quickly resolve these issues and add 10 new tricks to your SQL tuning arsenal. If you have burning Optimizer or statistics related questions, you can ask them at the Optimizer demo booth in the Database area of the demogrounds. Members of the Optimizer development team will be there Monday to Wednesday from 9:45am until 5:30pm. You may even be able to pick up an optimizer bumper sticker. The full searchable OOW catalog is on-line, or you can browse the speakers by name. So start planning your trip today! +Maria Colgan


12c

What's new in 12c: Adaptive joins

As we promised in our previous post, we are starting a blog series describing all of the new Optimizer and statistics related functionality in Oracle Database 12c. We begin the series with an in-depth look at adaptive plans, one of the key features in the new adaptive query optimization framework.  I want to thank Allison Lee for all her help with this post. The goal of adaptive plans is to avoid catastrophic behavior of bad plans on the first execution. If we detect during execution that the optimizer's cardinality estimates were wrong, then some plan choices can be changed "on the fly" to better options. While we can't completely change the plan during execution, there are certain local decisions that can be changed, like join method. In this post, we'll introduce the concepts and terminology related to adaptive plans, and then go through an example in detail. Concepts and Terminology An adaptive plan allows certain decisions in a plan to be postponed until runtime, in case runtime conditions differ from optimizer assumptions. For the purposes of explaining concepts, we will consider a plan for a simple two-table join, where the join method is adapted. The diagram below shows the two options for this plan. An adaptive plan consists of a default plan, which is the plan that the optimizer picks based on the current statistics, as well as alternatives to various portions of the plan. In our example join, let's assume the default is the nested loops plan, and the alternative is the hash join. Each alternative portion of a plan is referred to as a "subplan". A subplan is a set of related operations in a plan. For instance, the nested loops operation and the index scan make up one subplan; the alternative subplan consists of the hash join node and the table scan to the right of it. For each decision that can be adapted, the plan contains two or more alternative subplans. During execution, one of those alternatives is chosen to be used, in a process called "adaptive plan resolution". Adaptive plan resolution occurs on the first execution of a plan; once the plan is resolved, future executions of the same child cursor will use the same plan. In order to resolve the plan, statistics are collected at various points during execution. The statistics collected during one part of execution are used to resolve parts of the plan that run later. For instance, statistics can be collected during the scan of table T1, and based on those statistics, we can choose the right join method for the join between T1 and T2. The statistics are collected using a "statistics collector". Since the join of T1 to T2 and the scan of T1 would typically be pipelined, buffering is required in order to collect the statistics, resolve the choice of join method, and then perform the join. Some plan decisions can be adapted without buffering rows, but for adaptive joins, we require a "buffering statistics collector". The optimizer determines what statistics are to be collected, and how the plan should be resolved for different values of the statistics. The optimizer computes an "inflection point", which is the value of the statistic where the two plan choices are equally good. For instance, if the nested loops join is optimal when the scan of T1 produces fewer than 10 rows, and the hash join is optimal when the scan of T1 produces more than 10 rows, then the inflection point for these two plans is 10. The optimizer computes this value, and configures a buffering statistics collector to buffer and count up to 10 rows. 
If at least 10 rows are produced by the scan, then the join method is resolved to hash join; otherwise it is resolved to nested loops join. We refer to the plan that is chosen by resolution as the “final plan”. The plan that the optimizer expects to be chosen (based on its estimates) is the “default plan”. The physical plan actually contains all of the operations from all of the subplan options; we refer to this as the “full plan”. As the plan is resolved, the plan hash value changes to indicate the new choice of plan. The plan that is displayed by our plan display APIs (in DBMS_XPLAN) changes as the plan is resolved. At any given point, some plan decisions may have been resolved, while others have not. For the unresolved plan choices, the plan display APIs show the plan that is expected by the optimizer (based on its estimates). When EXPLAIN PLAN generates a query plan, none of the adaptive subplans have been resolved, so we see the default plan when displaying the plan through DBMS_XPLAN.DISPLAY. For example, if the optimizer thinks that nested loops join plan is best, then EXPLAIN PLAN and DBMS_XPLAN.DISPLAY would display the nested loops join plan as shown below. During execution, suppose that the plan resolves to a hash join instead. Then the plan displayed by DBMS_XPLAN.DISPLAY_CURSOR would show the final plan, containing the hash join with full table scan on the right. Our next post will contain a real-world example of Adaptive joins in action.
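Incidentally, if you want to see the full plan, including the subplan operations that were discarded during resolution, DBMS_XPLAN accepts an ADAPTIVE format option. A quick sketch:

-- Display the full adaptive plan for the last statement executed in this
-- session; operations that are not part of the final plan are marked inactive.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => '+ADAPTIVE'));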


Upcoming events

Upcoming event: Kscope 13

Kscope 13 is the annual conference of the Oracle Development Tools User Group, taking place June 23 - June 27 in New Orleans. This is a great conference for Oracle developers and architects, offering some of the best content by renowned experts. I am lucky enough to be involved in five sessions this year around the Oracle Optimizer and performance. Below are details on the sessions I will be presenting. I hope you have an opportunity to check out some of these sessions if you plan to attend the conference! Sunday, June 23rd at 2:00pm What's new with the Optimizer This session is part of the Database and Developers SIG that is happening on Sunday. In this session I will demonstrate what you can expect from the latest version of the Optimizer and provide details on how the different plan influencing features work together. Monday, June 24th at 9:45 am Optimizer Hints: Top Tips for Understanding and Using Them I am kicking things off early on Monday morning with this session on Optimizer hints. I plan to discuss in detail how Optimizer hints are interpreted, when they should be used, why they appear to be ignored and what you can do if you have inherited a hint-ridden application. Wednesday June 26th at 1:45 pm Hands-on Training: How to Prevent Suboptimal Execution Plans Come join me and get your hands dirty with this great Optimizer lab where you will have an opportunity to analyze and resolve some of the most common SQL execution performance problems in 4 detailed exercises. The VM for this lab can be downloaded here. Wednesday June 26th at 3:00 pm Oracle Optimizer: An Insider's View of How it Works In this session I will discuss how, with each new release, the Optimizer evolves as we strive to find the optimal execution plan for every SQL statement, and will provide a detailed explanation of how the latest version of the Optimizer works and the best ways you can influence its decisions. Thursday June 27th at 9:45 am Exadata & the Optimizer: The Untold Story Since the introduction of the Exadata platform 4 years ago, people have been asking how they should tune for it. This session explains, with the use of detailed examples, how the Optimizer behaves on Exadata and what information you should supply to ensure an optimal execution plan is selected every time. I hope to see you there! +Maria Colgan


Statistics

How does the METHOD_OPT parameter work?

In last week's post we addressed the most popular question surrounding statistics gathering: what sample size should be used? Once that decision has been made, folks typically turn their attention to histograms and the time-honored question of, do I need histograms or not? Regardless of which side of this religious debate you come down on, you are going to need to figure out what to set the METHOD_OPT parameter to. This post explains in detail what the METHOD_OPT parameter controls and how it can be used to influence which columns get statistics and what type of statistics they get. The METHOD_OPT parameter is probably the most misunderstood parameter in the DBMS_STATS.GATHER_*_STATS procedures. It's most commonly known as the parameter that controls the creation of histograms but it actually does so much more than that. The METHOD_OPT parameter actually controls the following: which columns will or will not have base column statistics gathered on them, the creation of histograms, and the creation of extended statistics. The METHOD_OPT parameter syntax is made up of multiple parts. The first two parts are mandatory and are broken down in the diagram below. The leading part of the METHOD_OPT syntax controls which columns will have base column statistics (min, max, NDV, number of nulls, etc.) gathered on them. The default, FOR ALL COLUMNS, collects base column statistics for all of the columns (including hidden columns) in the table.  The alternative values limit the collection of base column statistics as follows: FOR ALL INDEXED COLUMNS limits base column statistics gathering to only those columns that are included in an index. This value is not recommended as it is highly unlikely that only index columns will be used in the select list, where clause predicates, and group by clause of all of the SQL statements executed in the environment. FOR ALL HIDDEN COLUMNS limits base column statistics gathering to only the virtual columns that have been created on a table. This means none of the actual columns in the table will have any column statistics gathered on them. Again, this value is not recommended for general statistics gathering purposes. It should only be used when statistics on the base table columns are accurate and a new virtual column (or columns) has been created (e.g. a new column group is created). Gathering statistics in this mode will then gather statistics on the new virtual columns without re-gathering statistics on the base columns. Note that if a column is not included in the list to have statistics gathered on it, then only its average column length is gathered. The average column length is used to correctly compute the average row length and is discarded (i.e., not saved to disk) after use. The SIZE part of the METHOD_OPT syntax controls the creation of histograms and can have the following settings: AUTO means Oracle automatically determines the columns that need histograms based on the column usage information (SYS.COL_USAGE$) and the presence of data skew. An integer value indicates that a histogram will be created with at most the specified number of buckets; this must be in the range [1,254]. To force histogram creation it is recommended that the number of buckets be left at 254. Note that SIZE 1 means no histogram will be created. REPEAT ensures a histogram will only be created for any column that already has one. If the table is a partitioned table, REPEAT ensures a histogram will be created for a column that already has one on the global level.  
However, this is not a recommended setting, as the number of buckets currently in each histogram will limit the maximum number of buckets used for the newly created histograms. Let's assume there are currently 5 buckets in a histogram. When the histogram is re-gathered with SIZE REPEAT, the newly created histogram will use at most 5 buckets and may not be of good quality.

SKEWONLY automatically creates a histogram on any column that shows a skew in its data distribution.

If the default value of the METHOD_OPT parameter, FOR ALL COLUMNS SIZE AUTO, doesn't work in your particular environment, then you most likely fall into one of the following categories: you want to automatically create a histogram on any column in the table except a specific column, or you want to create a histogram only on a specific column (or columns). In an earlier blog post we showed an example of how to prevent Oracle from creating a histogram on a specific column, so let's look at option 2, creating a histogram only on specific columns.

Let's assume we only want to create a histogram on the CUST_ID column of the SALES table. Remember, the METHOD_OPT parameter controls which columns get basic statistics as well as which columns get histograms, so we need to think about setting the METHOD_OPT parameter in two parts. The first part specifies which columns will have base statistics gathered on them. In this case we want all of the columns in the table to have base statistics, so we should use FOR ALL COLUMNS. But what about the SIZE part of the parameter? We only want a histogram on one column, so for this leading part of the parameter setting we need to specify SIZE 1 to prevent a histogram from being created on any column.

The second part of the parameter setting needs to specify that a histogram is required on the CUST_ID column. This is achieved using an additional part of the METHOD_OPT syntax that we haven't mentioned yet, called FOR COLUMNS. The FOR COLUMNS syntax allows us to provide explicit instructions for the specific columns listed in this part of the parameter setting. In this case we would use FOR COLUMNS SIZE 254 CUST_ID to specify that we need a histogram on the CUST_ID column. So the final METHOD_OPT parameter setting will be FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 CUST_ID.

Below is the DBMS_STATS.GATHER_TABLE_STATS command in action. The only column that has a histogram created on it is CUST_ID, even though several of the columns in the SALES table were used in the WHERE clause of queries executed on the system.

BEGIN
  dbms_stats.gather_table_stats('SH', 'SALES',
    method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 CUST_ID');
END;
/
PL/SQL procedure successfully completed.

SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'SALES';

There are several other cases where you may need to provide a more complex setting for the METHOD_OPT parameter. Take, for example, a scenario where you don't want any form of statistics on a column. There is no way to tell Oracle not to gather statistics on a particular column, so you need to do the reverse and explicitly list the columns you do want statistics gathered on using the FOR COLUMNS syntax. Let's assume we don't want statistics of any kind on the PROD_ID column of the SALES table. Then the DBMS_STATS commands would be as follows:

BEGIN
  dbms_stats.delete_column_stats('SH', 'SALES', 'PROD_ID');
END;
/
PL/SQL procedure successfully completed.
BEGIN
  dbms_stats.gather_table_stats('SH', 'SALES',
    method_opt => 'FOR COLUMNS SIZE 254 CUST_ID TIME_ID CHANNEL_ID PROMO_ID QUANTITY_SOLD AMOUNT_SOLD');
END;
/
PL/SQL procedure successfully completed.

-- The average row length is still recorded accurately even though we did not gather statistics on the PROD_ID column.
SELECT num_rows, avg_row_len
FROM   user_tables
WHERE  table_name = 'SALES';

SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'SALES';

Note that the FOR COLUMNS syntax is only valid for the GATHER_TABLE_STATS procedure.

Finally, at the start of this post I mentioned that the METHOD_OPT parameter can also be used to create extended statistics. Extended statistics encompass two additional types of column statistics: column groups and expression statistics. In the example below, a column group will be automatically created on the PROD_ID and CUST_ID columns in the SALES table. It will be given a system-generated name and will have all of the base column statistics gathered on it.

BEGIN
  dbms_stats.gather_table_stats('SH', 'SALES',
    method_opt => 'FOR ALL COLUMNS SIZE 254 FOR COLUMNS SIZE 254 (PROD_ID, CUST_ID)');
END;
/
PL/SQL procedure successfully completed.

SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'SALES';

Rather than specifying the METHOD_OPT parameter in the statistics gathering command, it is highly recommended that you specify any non-default value for METHOD_OPT via DBMS_STATS.SET_TABLE_PREFS.

BEGIN
  dbms_stats.set_table_prefs('SH', 'SALES', 'METHOD_OPT',
    'FOR ALL COLUMNS SIZE 254 FOR COLUMNS SIZE 1 PROD_ID');
END;
/

In summary: the METHOD_OPT parameter in the GATHER_DICTIONARY_STATS, GATHER_DATABASE_STATS, and GATHER_SCHEMA_STATS procedures only accepts the 'FOR ALL [INDEXED | HIDDEN] COLUMNS' syntax; no specific column names can be specified. When used in the GATHER_TABLE_STATS procedure, the METHOD_OPT parameter can accept an additional argument in the form of 'FOR COLUMNS ...'. Using this syntax allows you to control which columns to gather basic statistics on, which columns to gather histograms on (and with how many buckets), and which extended statistics to create.

+Maria Colgan


Statistics

How does AUTO_SAMPLE_SIZE work in Oracle Database 11g?

Note that if you're interested in learning about Oracle Database 12c, there's an updated version of this post here. When it comes to gathering statistics, one of the most critical decisions you have to make is: what sample size should be used? A 100% sample will ensure accurate statistics but could take a really long time, whereas a 1% sample will finish quickly but could result in poor statistics. The ESTIMATE_PERCENT parameter in the DBMS_STATS.GATHER_*_STATS procedures controls the sample size used when gathering statistics, and its default value is AUTO_SAMPLE_SIZE.

In an earlier blog post, we talked about the new implementation of AUTO_SAMPLE_SIZE in Oracle Database 11g in terms of its improvements in the speed and accuracy of statistics gathering compared to the old AUTO_SAMPLE_SIZE used prior to Oracle Database 11g. In this post, we will offer a closer look at how the new AUTO_SAMPLE_SIZE algorithm works and how it affects the accuracy of the statistics being gathered. Before we delve into how the new algorithm works, let us briefly recap how the old algorithm worked and its downsides. The old AUTO_SAMPLE_SIZE used the following approach:

Step 1: Oracle starts with a small sampling percentage. If histograms need to be gathered, Oracle might materialize the sample, depending on the sampling percentage.

Step 2: Oracle gathers basic column statistics on the sample. For example, suppose a table T has only one column, C1; then the basic statistics gathering query looks something like the one below (this is not the exact syntax we use but a simplified version for illustration purposes).

Query 1: Query gathering basic column statistics using AUTO_SAMPLE_SIZE prior to Oracle Database 11g

The select list items in the query correspond to the number of rows in table T, the number of non-null values, the number of distinct values, the total column length, and the minimum and maximum values of column C1, respectively. "X.0000000000" in the FROM clause is the sampling percentage determined by Oracle.

Step 3: If histograms need to be gathered, Oracle issues a SQL query on the sample for each column that requires a histogram.

Step 4: For each column that requires a histogram, Oracle uses several metrics to determine whether the current sample is sufficient: the non-null value metric (whether the sample contains sufficient non-null values for this column) and the NDV metric (whether the number of distinct values (NDV) can be properly scaled up from the sample).

Step 5: If all the metrics in step 4 pass, Oracle concludes that the current sample size is sufficient and the histogram creation for that column is complete. Otherwise, it bumps up the sample size and goes through the above steps again until it finds a satisfactory sample or reaches 100% sampling.

Note that steps 3 to 5 are done per column. For example, suppose there are 3 columns in the table that require histograms. In the first iteration, we get a sample and materialize it. We issue 3 queries, one per column, on the same materialized sample to gather histograms. Suppose Oracle determines that the sample is sufficient for columns 1 and 2 but insufficient for column 3. Then we bump up the sample size. In the second iteration, only 1 query is issued on the sample, to gather the histogram for column 3. As you can see, the old AUTO_SAMPLE_SIZE can be inefficient if several iterations are required. A major contributor to the need for several iterations is the inability to gather accurate NDVs using a small sample. If there is skew in the data, a lot of low-frequency values may not make it into the sample and thus the sample fails the NDV metric.
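Before moving on to the new approach, here is a rough, hedged sketch of the shape of the pre-11g sampled query referred to above as Query 1. It is only an illustration of the select list described in step 2: the real internal statement uses Oracle-internal functions and hints, VSIZE stands in for the internal column-length calculation, and the literal sample percentage stands in for the X.0000000000 value Oracle picks.

SELECT count(*)           nrows,          -- number of rows in T
       count(c1)          non_nulls,      -- number of non-null values of C1
       count(DISTINCT c1) ndv,            -- number of distinct values of C1
       sum(vsize(c1))     total_col_len,  -- total column length
       min(c1)            min_val,
       max(c1)            max_val
FROM   t SAMPLE (1);                      -- "1" stands in for X.0000000000

The COUNT(DISTINCT c1) is exactly the part that makes the old approach so sensitive to the sample size.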
In Oracle Database 11g, we use a completely different approach to gather basic column statistics. We issue the following query (again, this is a simplified version for illustration purposes).

Query 2: Query gathering basic column statistics using AUTO_SAMPLE_SIZE in Oracle Database 11g

You will notice that in the new basic column statistics gathering query no sampling clause is used. Instead we do a full table scan. Also, there is no longer a count(distinct C1) to gather the NDV for C1. Instead, during execution we inject a special statistics gathering row source into this query. The special statistics gathering row source uses a one-pass, hash-based distinct algorithm to gather the NDV. More information on how this algorithm works can be found in the paper "Efficient and Scalable Statistics Gathering for Large Databases in Oracle 11g". The algorithm requires a full scan of the data, uses a bounded amount of memory, and yields a highly accurate NDV that is nearly identical to a 100 percent sample (this can be proven mathematically). The special statistics gathering row source also gathers the number of rows, the number of nulls, and the average column length on the side. Since we do a full scan of the table, the number of rows, average column length, and minimum and maximum values are 100% accurate.

AUTO_SAMPLE_SIZE also affects histogram gathering and index statistics gathering in the following ways.

Effect of AUTO_SAMPLE_SIZE on histogram gathering

With the new AUTO_SAMPLE_SIZE, histogram gathering is decoupled from basic column statistics gathering (they used to be gathered on the same sample). Therefore, when determining whether we need to bump up the sample size, the new AUTO_SAMPLE_SIZE algorithm no longer performs the "NDV metric" check (see step 4 in the description above) because we do not derive the NDV from the sample. The sample size needs to be bumped up for a histogram only when the sample contains too many nulls or too few rows. This helps to reduce the number of iterations of histogram creation. More information on this can be found in this blog post. If the minimum (resp. maximum) value that appears in the sample used for gathering the histogram is not the minimum (resp. maximum) value gathered in the basic statistics, we modify the histogram so that the minimum (resp. maximum) value gathered in the basic statistics appears as the endpoint of the first (resp. last) bucket in the histogram.

Effect of AUTO_SAMPLE_SIZE on index statistics gathering

The new AUTO_SAMPLE_SIZE also affects how index statistics are gathered. The flow chart below shows how index statistics are gathered in 11g when AUTO_SAMPLE_SIZE is specified. Index statistics gathering is sampling based. It could potentially go through several iterations, because either the sample contained too few blocks or the sample size was too small to properly gather the number of distinct keys (NDKs). With the new AUTO_SAMPLE_SIZE algorithm, however, if the index is defined on a single column, or if the index is defined on multiple columns that correspond to a column group, then the NDV of the column or column group will be used as the NDK of the index. The index statistics gathering query will NOT gather the NDK in such cases. This helps to alleviate the need to bump up the sample size for index statistics gathering.
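For comparison with the sketch of Query 1 above, here is a similarly hedged sketch of the shape of the 11g basic column statistics query (Query 2) described earlier in this post. The real internal statement differs, but the important points are visible: there is no SAMPLE clause and no COUNT(DISTINCT); the NDV comes from the statistics gathering row source injected into the execution plan rather than from the SQL text.

SELECT /*+ full(t) */
       count(*)       nrows,
       count(c1)      non_nulls,
       sum(vsize(c1)) total_col_len,
       min(c1)        min_val,
       max(c1)        max_val
FROM   t;   -- full table scan; NDV for c1 is computed by the injected statistics gathering row source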
Summary:

The new AUTO_SAMPLE_SIZE algorithm does a full table scan to gather basic column statistics.

The NDV gathered by the new AUTO_SAMPLE_SIZE has an accuracy close to that of 100% sampling.

The other basic column statistics, such as the number of nulls, average column length, and minimum and maximum values, have an accuracy equivalent to 100% sampling.

Histogram and index statistics gathering under the new AUTO_SAMPLE_SIZE algorithm still use sampling, but the new algorithm helps to alleviate the need to bump up the sample size.


Cursor Sharing

How do adaptive cursor sharing and SQL Plan Management interact?

We've received a lot of questions about how adaptive cursor sharing (ACS) and SQL plan management (SPM) interact. We discussed this briefly in one of the original SPM posts, but in this post we'll explain the concepts of how the two features interact in more detail, and show an example.

The simplest way to reason about the interaction is to remember that they are responsible for two different tasks. ACS controls whether or not a child cursor is shared on a particular execution. For each execution of the query, ACS considers the current bind values and decides if an existing child cursor can be shared or if the optimizer should be given the chance to find a better plan for the current bind values. SPM controls which plans the optimizer may choose. If a child cursor is bind-aware, the decision to share or not is made irrespective of whether the query is controlled by SPM. But once the query and its current bind values are sent to the optimizer for optimization, SPM constrains the optimizer's choice of plans, without regard to whether this query is being optimized due to ACS.

Let's look at a small example. There are many different ways to load plans into SPM, but for simplicity we will manually load the plans from the cursor cache. I am using a modified version of the EMPLOYEES table from the sample HR schema. The table has been modified so that it has more rows, with more skew on the job column (there is only one president, and relatively few VPs), and there is also an index defined on it. This modified table is called EMPLOYEES_ACS (and can be created using this script). Just to give you an idea of the data distribution, here are the row counts and job distribution:

We will be working with a simple query that joins this table, filtered on job_id, to DEPARTMENTS and aggregates the results:

We are using the BIND_AWARE hint to expedite the process of getting bind-aware cursors into the cursor cache. If we run the query with three different bind values, AD_PRES, AD_VP, and SA_REP, the optimizer chooses three different plans.

AD_PRES:

AD_VP:

SA_REP:

To make things interesting, let's consider what happens if we load only two of these plans into SPM. After running the query with the bind values AD_PRES and SA_REP, there are two child cursors with different plans. Let's load these plans into SPM. Now if we run the query with the three different bind values, SPM will constrain the optimizer to pick from the two accepted plans in the SQL plan baseline. Let's run the query with the same sequence of bind values again (AD_PRES, AD_VP, SA_REP) and see the plans that we pick for each:

AD_PRES: For this bind value, we pick the same plan that we picked without the SQL plan baseline in the mix. This is because this was one of the plans we loaded into the SQL plan baseline as an accepted plan, so the optimizer is allowed to choose it.

AD_VP: For this bind value, the optimizer comes up with a plan that is not in the SQL plan baseline. So instead we pick the best accepted plan, which uses a hash join. The cost-based plan the optimizer came up with is added to the SQL plan baseline, but it will not be considered until it has been evolved.

Finally, let's run with the last bind value, SA_REP.

SA_REP: As you would expect, we get the same plan here that we originally got for this bind value, since that was one of the plans we loaded into SPM.

Since the second and third executions picked the same plan, there is now only one shareable cursor in the cursor cache for this plan hash value.
And that cursor will now match bind values with a selectivity similar to AD_VP or SA_REP (or anything in between).

If you are playing around with SPM and ACS, either with our demo or your own, there are a few potential surprises to keep in mind:

The plans that are chosen by ACS impact the number of child cursors and the number of hard parses that you will see for a particular sequence of bind values. Thus, the presence of a SQL plan baseline that constrains the possible plans can cause a different number of child cursors and hard parses. For instance, with our example, we end up with one child cursor that can be used for AD_VP, SA_REP, or bind values whose selectivity falls somewhere in between. Without SPM in the picture, a bind value that falls in between may generate an additional hard parse and a new child cursor (and possibly a new plan).

New cost-based plans chosen by the optimizer that are not in the SQL plan baseline are automatically added to it as unaccepted plans. However, the optimizer only has the opportunity to choose a plan when cursor sharing fails to match a cursor and a hard parse is triggered. As mentioned in the point above, the presence of a SQL plan baseline can reduce the number of hard parses.

When SPM updates a SQL plan baseline, the cursor built using that baseline is invalidated. A couple of common reasons for updating the SQL plan baseline include: when a new (cost-based) plan is added to the SQL plan baseline, or when a plan in the SQL plan baseline is marked as reproduced (when it is successfully reproduced for the first time). So if you are tinkering with this and you can't find the cursor that you just ran with, this may be why. In the examples I showed above, after I loaded the plans into the SQL plan baseline, I ran the queries with each bind value multiple times to get over this hump and ensure that I could display the plan after I ran the query. This is not likely to have a significant impact on a running system, but when you are running small test cases like this, it can be a bit confusing.

We hope that this small example clears up some of the confusion about how these two features interact. There are many different ways to load plans into a SQL plan baseline, and to get bind-aware cursors into the cursor cache, which can cause small changes in the behavior. If you have specific questions, please post them as a comment.
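If you want to recreate the walk-through, the step of loading the two plans from the cursor cache into a SQL plan baseline can be done with DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE. Here is a minimal sketch; the SQL_ID shown is a placeholder, and you would look up the real value for your statement in V$SQL first.

SELECT sql_id, plan_hash_value, sql_text
FROM   v$sql
WHERE  sql_text LIKE '%EMPLOYEES_ACS%';

DECLARE
  l_plans_loaded PLS_INTEGER;
BEGIN
  -- 'abcd1234efgh5678' is a placeholder SQL_ID taken from the query above
  l_plans_loaded := dbms_spm.load_plans_from_cursor_cache(sql_id => 'abcd1234efgh5678');
  dbms_output.put_line('Plans loaded: ' || l_plans_loaded);
END;
/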


Upcoming events

Upcoming events : RMOUG 2013

I'm really looking forward to the Rocky Mountain Oracle User Group's Training Days next week at the Colorado Convention Center, as it will bring several new challenges for me. I am going to be involved in three very different sessions at this year's conference:

On Monday, February 11th at 1-5pm, in room 4e: Oracle Database 11.2 Performance Tips. This will be my first ever OTN lab, so hopefully it will go well! I plan to use step-by-step tutorials to show you how to answer questions like why an index wasn't used, why partition elimination didn't occur, why statistics were ignored, and many more. You will learn to identify and quickly resolve these issues without the need for optimizer hints or changing initialization parameters.

On Tuesday, February 12th at 1:15pm, in room 4a: Women in Technology. I am delighted to be taking part in this panel discussion on women in technology led by Kellyn Pot'Vin. The panel plans to discuss topics like how to get more women involved in the Oracle community, how to get more young women interested in technical careers, and whatever else gets brought up!

On Wednesday, February 13th at 1:30pm, in room 4b: Inside the Oracle 11g Optimizer. My final session at RMOUG will be a more traditional presentation that will discuss in detail how Optimizer hints are interpreted, when they should be used, why they appear to be ignored, and what you can do if you have inherited a hint-ridden application.

I hope to see you there, as it is always a great conference with over 700 attendees and a cast of super speakers.

+Maria Colgan


SQL Plan Management

How does SQL Plan Management match SQL statements to SQL plan baselines?

Happy New Year to all of our readers! As more and more of your systems migrate to Oracle Database 11g, SQL Plan Management (SPM) is becoming increasingly popular as a mechanism to maintain plan stability and, consequently, system performance. However, one of the key frustrations folks encounter when using SPM is that the SQL plan baseline they painstakingly created is not used. So, how does Oracle decide whether or not to use an existing SQL plan baseline?

SQL statements are matched to SQL plan baselines using the signature of the SQL statement. A signature is a unique SQL identifier generated from the normalized SQL text (case differences and white space are removed). This is the same technique used by SQL profiles and SQL patches. This means that if you issue identical SQL statements from two different schemas, they resolve to the same SQL plan baseline. Now, you might be thinking that this approach could lead to all sorts of performance problems, because you may not want the same plan used by each schema. But remember, a SQL plan baseline can have multiple plans for the same SQL statement, and SPM records the parsing schema along with each plan. Let's take a look at this in action to fully understand why it is not a problem, using the following SQL text:

Select count(s.quantity_sold) total_sales
From   Sales s
Where  s.prod_id=30;

We begin in the SH sample schema and execute the statement to see the cost-based plan. Let's create a SQL plan baseline for this SQL statement. We can do this by capturing the plan directly from the cursor cache into a SQL plan baseline using the DBMS_SPM package. But first we need the SQL_ID for our statement, which we can get from V$SQL. We can confirm that our SQL plan baseline exists and that our preferred plan is both enabled and accepted by querying DBA_SQL_PLAN_BASELINES. By re-executing the query we can see that the SQL plan baseline is now being used, even when we change the case and white space in our statement.

Let's now connect as the sample schema SCOTT. We have created an identical copy of the sales table in the SCOTT schema, but it does not have the IND_SALES_PROD_QTY_SOLD index. If we execute the identical SQL statement from the SCOTT schema, should it use the existing SQL plan baseline or not?

Well, we didn't get the same plan or a note to say the SQL plan baseline was used. The plan used in this case was a fast full scan of the bitmap index SALES_PROD_BIX. So does this mean the SQL statement wasn't matched to our existing SQL plan baseline? No, the SQL statement was in fact matched to the existing SQL plan baseline, but the accepted plan in the SQL plan baseline couldn't be reproduced in the SCOTT schema. Here is exactly what happened. When the statement was issued in the SCOTT schema, Oracle determined the best cost-based plan for the SQL statement to be a fast full index scan on the SALES_PROD_BIX index. Before executing this plan, we checked to see if the SQL statement matched an existing SQL plan baseline using its SQL signature. We found a corresponding SQL plan baseline, so we checked to see if the cost-based plan we came up with matches an accepted plan in the SQL plan baseline. It does not, so our cost-based plan was added to the SQL plan baseline as a non-accepted plan, and we tried to use the existing accepted plan. However, the existing plan requires a b-tree index on the prod_id and quantity_sold columns, which doesn't exist in the SCOTT schema.
Since we can't reproduce the accepted plan, we use the cost-based plan that the Optimizer came up with at parse. If we check the SQL plan baseline, we will see that this new plan using the SALES_PROD_BIX index has been added to the SQL plan baseline, but it is not accepted. If the new plan is accepted using the DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE procedure, we will see that it is used by the identical SQL statement in the SCOTT schema. Now that the new plan is accepted, let's try running the SQL statement again from the SCOTT schema.

So, as you can see, the same SQL plan baseline is being used for an identical SQL statement issued from two different schemas. This is not a problem, since a SQL plan baseline can have multiple accepted execution plans. You just need to manage the plans in the SQL plan baseline to ensure your preferred plan for each schema is an accepted plan.

You can get a copy of the script I used to generate this post here.

+Maria Colgan
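If you want to see the signature matching described above for yourself, you can compute the signature of the normalized SQL text directly. The sketch below uses DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE, which applies the same kind of normalization (case and white space) discussed in this post; both queries should return the same value even though the literal text differs.

SELECT dbms_sqltune.sqltext_to_signature(
         'Select count(s.quantity_sold) total_sales From Sales s Where s.prod_id=30') sig
FROM   dual;

SELECT dbms_sqltune.sqltext_to_signature(
         'select COUNT(s.quantity_sold)   total_sales   from sales s where s.prod_id=30') sig
FROM   dual;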


How do I

How do I create statistics to make ‘small’ objects appear ‘large’ to the Optimizer?

I recently spoke with a customer who has a development environment that is a tiny fraction of the size of their production environment. His team has been tasked with identifying problem SQL statements in this development environment before new code is released into production. The problem is that the objects in the development environment are so small that the execution plans selected in the development environment rarely reflect what actually happens in production.

To ensure the development environment accurately reflects production in the eyes of the Optimizer, the statistics used in the development environment must be the same as the statistics used in production. This can be achieved by exporting the statistics from production and importing them into the development environment. Even though the underlying objects are a fraction of the size of production, the Optimizer will see them as the same size and treat them the same way as it would in production. Below are the necessary steps to achieve this in their environment. I am using the SH sample schema as the application schema whose statistics we want to move from production to development.

Step 1. Create a staging table, in the production environment, where the statistics can be stored.

Step 2. Export the statistics for the application schema, from the data dictionary in production, into the staging table.

Step 3. Create an Oracle directory on the production system where the export of the staging table will reside, and grant the SH user the necessary privileges on it.

Step 4. Export the staging table from production using Data Pump export.

Step 5. Copy the dump file containing the staging table from production to development.

Step 6. Create an Oracle directory on the development system where the export of the staging table resides, and grant the SH user the necessary privileges on it.

Step 7. Import the staging table into the development environment using Data Pump import.

Step 8. Import the statistics from the staging table into the dictionary in the development environment.

You can get a copy of the script I used to generate this post here.

+Maria Colgan
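As a supplement to the script linked above, here is a hedged sketch of the DBMS_STATS calls behind steps 1, 2 and 8. The staging table name STATS_STAGE is purely illustrative, and the Data Pump steps are shown only as comments.

-- Step 1 (production): create the staging table
EXEC dbms_stats.create_stat_table(ownname => 'SH', stattab => 'STATS_STAGE');

-- Step 2 (production): export the SH schema statistics into the staging table
EXEC dbms_stats.export_schema_stats(ownname => 'SH', stattab => 'STATS_STAGE');

-- Steps 4-7: expdp the STATS_STAGE table, copy the dump file, and impdp it into development

-- Step 8 (development): import the statistics from the staging table into the dictionary
EXEC dbms_stats.import_schema_stats(ownname => 'SH', stattab => 'STATS_STAGE');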


Oracle OpenWorld

More on Oracle OpenWorld 2012

With only two weeks to go until Oracle OpenWorld, it is time to start planning your schedule. Every year folks ask me which Optimizer-related sessions they should go and see at OpenWorld. Below are my top two picks for each day of the conference, to get your schedule started.

Sunday, September 30th at 9am: Beginning Performance Tuning, Session UGF3320, Moscone West, room 2022

Sunday, September 30th at 12:30pm: Ten Surprising Performance Tactics, Session UGF10426, Moscone West, room 2016

Monday, October 1st at 12:15pm: The Evolution of Histograms in Oracle Database, Session CON2803, Moscone South, room 302

Monday, October 1st at 1:45pm: A Day in the Life of a Real-World Performance Engineer, Session CON8404, Moscone South, room 303

Tuesday, October 2nd at 11:45am: Oracle Partitioning: It's Getting Even Better, Session CON8421, Moscone South, room 101

Tuesday, October 2nd at 1:15pm: Oracle Optimizer: Harnessing the Power of Optimizer Hints, Session CON8455, Moscone South, room 103

Wednesday, October 3rd at 3:30pm: SQL Plan Stability: Post 11g Upgrade - Verizon Wireless' Experience, Session CON4485, Moscone South, room 302

Wednesday, October 3rd at 5pm: Five SQL and PL/SQL Things in the Latest Generation of Database Technology, Session CON8432, Moscone South, room 103

Thursday, October 4th at 11:15am: How the Query Optimizer Learns from Its Mistakes, Session CON3330, Moscone West, room 3016

Thursday, October 4th at 12:45pm: Oracle Optimizer: An Insider's View of How the Optimizer Works, Session CON8457, Moscone South, room 104

Don't forget to pick up an Optimizer bumper sticker at the Optimizer demo booth. This year we are located in booth 3157, in the Database area of the demogrounds, in Moscone South. Members of the Optimizer development team will be there Monday through Wednesday from 9:45am until 6pm.

+Maria Colgan


Oracle OpenWorld

Oracle OpenWorld 2012

I can't believe it's time for OpenWorld again! Oracle OpenWorld is the largest gathering of Oracle customers, partners, developers, and technology enthusiasts. This year it will take place between September 30th and October 4th in San Francisco. Of course, the Optimizer development group will be there, and you will have multiple opportunities to meet the team, in one of our technical sessions or at the Oracle Database demogrounds. This year the Optimizer team has two technical sessions, as well as a booth in the Oracle Database demogrounds.

Tuesday, October 2nd at 1:15pm: Oracle Optimizer: Harnessing the Power of Optimizer Hints, Session CON8455, Moscone South, room 103. In this session we will discuss in detail how optimizer hints are interpreted, when they should be used, and why they sometimes appear to be ignored.

Thursday, October 4th at 12:45pm: Oracle Optimizer: An Insider's View of How the Optimizer Works, Session CON8457, Moscone South, room 104. This session explains how the latest version of the optimizer works and the best ways you can influence its decisions to ensure you get optimal execution every time. It will also include a full history of the Cost Based Optimizer, so make sure you stick around for this one!

If you have burning Optimizer or statistics related questions, or if you just want to pick up an Optimizer bumper sticker, you can stop by the Optimizer demo booth. This year we are located in booth 3157, in the Database area of the demogrounds, in Moscone South. Members of the Optimizer development team will be there Monday through Wednesday from 9:45am until 6pm. The full Oracle OpenWorld catalog is online, or you can browse speakers by name. So start planning your trip today!

+Maria Colgan


How do I

How do I restrict concurrent statistics gathering to a small set of tables from a single schema?

I got an interesting question from one of my colleagues in the performance team last week about how to restrict a concurrent statistics gather to a small subset of tables from one schema, rather than the entire schema. I thought I would share the solution we came up with because it was rather elegant, and it took advantage of concurrent statistics gathering, incremental statistics, and the not so well known obj_filter_list parameter in the DBMS_STATS.GATHER_SCHEMA_STATS procedure. You should note that the solution outlined below with obj_filter_list still applies even when concurrent statistics gathering and/or incremental statistics gathering is disabled.

The reason my colleague had asked the question in the first place was because he wanted to enable incremental statistics for 5 large partitioned tables in one schema. The first time you gather statistics after you enable incremental statistics on a table, you have to gather statistics for all of the existing partitions so that a synopsis can be created for them. If the partitioned table in question is large and contains a lot of partitions, this could take a considerable amount of time. Since my colleague only had the Exadata environment at his disposal overnight, he wanted to re-gather statistics on the 5 partitioned tables as quickly as possible to ensure that it all finished before morning.

Prior to Oracle Database 11g Release 2, the only way to do this would have been to write a script with an individual DBMS_STATS.GATHER_TABLE_STATS command for each partition in each of the 5 tables, as well as another one to gather global statistics on each table, then run each script in a separate session and manually manage how many of these sessions could run concurrently. Since each table has over one thousand partitions, that would definitely be a daunting task and would most likely have kept my colleague up all night!

Beginning with Oracle Database 11g Release 2 we can take advantage of concurrent statistics gathering, which enables us to gather statistics on multiple tables in a schema (or database), and on multiple (sub)partitions within a table, concurrently. By using concurrent statistics gathering we no longer have to run individual statistics gathering commands for each partition. Oracle will automatically create a statistics gathering job for each partition, and one for the global statistics on each partitioned table. With the use of concurrent statistics, our script can now be simplified to just five DBMS_STATS.GATHER_TABLE_STATS commands, one for each table. This approach would work just fine, but we really wanted to get this down to just one command. So how can we do that?

You may be wondering why we didn't just use the DBMS_STATS.GATHER_SCHEMA_STATS procedure with the OPTIONS parameter set to 'GATHER STALE'. Unfortunately the statistics on the 5 partitioned tables were not stale, and enabling incremental statistics does not mark the existing statistics stale. Plus, how would we limit the schema statistics gather to just the 5 partitioned tables? So we went to ask one of the statistics developers if there was an alternative way. The developer told us about the advantage of the obj_filter_list parameter in the DBMS_STATS.GATHER_SCHEMA_STATS procedure. The obj_filter_list parameter allows you to specify a list of objects that you want to gather statistics on within a schema or database. The parameter takes a collection of type DBMS_STATS.OBJECTTAB.
Each entry in the collection has 5 fields: the schema name or object owner, the object type (i.e., 'TABLE' or 'INDEX'), the object name, the partition name, and the subpartition name. You don't have to specify all five fields for each entry. An empty field in an entry is treated as a wildcard (similar to the '%' character in LIKE predicates). Each entry corresponds to one set of filter conditions on the objects. If you have more than one entry, an object qualifies for statistics gathering as long as it satisfies the filter conditions in one entry. You first must create the collection of objects, and then gather statistics for the specified collection. It's probably easier to explain this with an example.

I'm using the SH sample schema, but needed a couple of additional partitioned tables to recreate my colleague's scenario of 5 partitioned tables. So I created SALES2, SALES3, and COSTS2 as copies of the SALES and COSTS tables respectively (setup.sql). I also deleted statistics on all of the tables in the SH schema beforehand to more easily demonstrate our approach.

Step 0. Delete the statistics on the tables in the SH schema.

Step 1. Enable concurrent statistics gathering. Remember, this has to be done at the global level.

Step 2. Enable incremental statistics for the 5 partitioned tables.

Step 3. Create the DBMS_STATS.OBJECTTAB and pass it to the DBMS_STATS.GATHER_SCHEMA_STATS command.

Here, you will notice that we defined two variables of DBMS_STATS.OBJECTTAB type. The first, filter_lst, will be used to pass the list of tables we want to gather statistics on, and will be the value passed to the obj_filter_list parameter. The second, obj_lst, will be used to capture the list of tables that have had statistics gathered on them by this command, and will be the value passed to the objlist parameter. In Oracle Database 11g Release 2, you need to specify the objlist parameter in order to get the obj_filter_list parameter to work correctly, due to bug 14539274. We also needed to define the number of objects we would supply in the obj_filter_list. In our case we were specifying 5 tables (filter_lst.extend(5)). Finally, we need to specify the owner name and object name for each of the objects in the list. Once the list definition is complete, we can issue the DBMS_STATS.GATHER_SCHEMA_STATS command; a condensed sketch of this step appears at the end of this post.

Step 4. Confirm statistics were gathered on the 5 partitioned tables.

Here are a couple of other things to keep in mind when specifying the entries for the obj_filter_list parameter. If a field in an entry is empty, i.e., null, it means there is no condition on this field. In the above example, suppose you remove the statement Obj_filter_lst(1).ownname := 'SH'; you will get the same result, since the schema has already been specified in the GATHER_SCHEMA_STATS call, so there is no need to specify ownname again in the obj_filter_lst. All of the names in an entry are normalized, i.e., uppercased, if they are not double quoted. So in the above example it is OK to use Obj_filter_lst(1).objname := 'sales';. However, if you have a table called 'MyTab' instead of 'MYTAB', then you need to specify Obj_filter_lst(1).objname := '"MyTab"';

As I said before, although we have illustrated the usage of the obj_filter_list parameter for partitioned tables with concurrent and incremental statistics gathering turned on, the obj_filter_list parameter is generally applicable to any GATHER_DATABASE_STATS, GATHER_DICTIONARY_STATS, or GATHER_SCHEMA_STATS command.
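As promised, here is a condensed, hedged sketch of step 3. The table names match the example above; the exact script the post references may differ in its details.

DECLARE
  filter_lst dbms_stats.objecttab := dbms_stats.objecttab();
  obj_lst    dbms_stats.objecttab := dbms_stats.objecttab();
BEGIN
  -- list the 5 partitioned tables we want to gather statistics on
  filter_lst.extend(5);
  filter_lst(1).ownname := 'SH'; filter_lst(1).objname := 'SALES';
  filter_lst(2).ownname := 'SH'; filter_lst(2).objname := 'SALES2';
  filter_lst(3).ownname := 'SH'; filter_lst(3).objname := 'SALES3';
  filter_lst(4).ownname := 'SH'; filter_lst(4).objname := 'COSTS';
  filter_lst(5).ownname := 'SH'; filter_lst(5).objname := 'COSTS2';

  -- obj_filter_list restricts the gather; objlist captures what was actually gathered
  dbms_stats.gather_schema_stats(ownname         => 'SH',
                                 objlist         => obj_lst,
                                 obj_filter_list => filter_lst);
END;
/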
You can get a copy of the script I used to generate this post here.  


Upcoming events

Upcoming events: INOUG Oracle Training Day!

The Indiana Oracle User Group (INOUG) has been kind enough to ask me to present an all-day Optimizer workshop at their annual training day on July 26th at Eli Lilly & Co. The workshop will consist of four 90-minute sessions and will explain the fundamentals of the cost-based Optimizer and the statistics that feed it, as well as providing a methodology for diagnosing and resolving SQL execution performance problems.

The workshop begins with an in-depth look at statistics and statistics maintenance, a challenge all DBAs must face in order to prevent suboptimal execution plans. It continues with a detailed explanation of how to interpret an execution plan. Each aspect of an execution plan will be examined, from cardinality estimates to parallel execution, and you will learn what information you should glean from the plan and how it affects the execution performance of a SQL statement. After lunch the workshop focuses on the process of analyzing and resolving the most common SQL execution performance problems, including poor cardinality estimates, bind peeking issues, selecting the wrong access method, and much, much more.

You can register for the event here. I hope you can join us!

+Maria Colgan


Upcoming events

Upcoming events : ODTUG Kscope 12

Kscope 12 is the annual conference of the Oracle Development Tools User Group, which is taking place June 24 - June 28 in San Antonio, Texas this year. This is a great conference for Oracle developers and architects, offering the best content by renowned experts. I am lucky enough to be involved in five sessions this year around the Oracle Optimizer and performance. Below are details on the sessions I will be presenting or co-presenting. I hope you have an opportunity to check out some of these sessions if you plan to attend the conference!

Sunday, June 24th at 3:45pm: Oracle Optimizer: Top Tips to Get Optimal SQL Execution All the Time. This session is part of the Database SIG that is happening on Sunday. In this session I will show you how to identify and resolve common SQL execution performance problems, such as poor cardinality estimates, bind peeking issues, and selecting the wrong access method, join type, or join order.

Monday, June 25th at 4:15pm: DB Experts Panel at ODTUG. This session provides an open Q&A for attendees to pose questions about the Oracle Database and Oracle Exadata to a panel of database experts including Cary Millsap, Jonathan Lewis, Dominic Delmolino, Tim Gorman, Dan Norris, and myself.

Tuesday, June 26th at 11:30am: Implement Extensible In-Database Analytics by Programming MapReduce in SQL & PL/SQL. I play just a small role in this session, which will actually be delivered by Bryn Llewellyn. This session uses real-world examples to illustrate how to implement MapReduce-style solutions in the Oracle Database using standard SQL and PL/SQL.

Wednesday, June 27th at 8:30am: SQL Tuning in a Data Warehouse Environment. This session will discuss a straightforward approach to tackling SQL tuning in a warehouse environment. Using real-world examples, it demonstrates how you should navigate the complex execution plans found in a data warehouse environment and determine what steps are necessary to improve performance.

Wednesday, June 27th at 9:45am: Understanding Optimizer Statistics. In this session I will discuss the features introduced in Oracle Database 11g to improve the quality and efficiency of statistics gathering, as well as strategies for managing statistics in various database environments.

I hope to see you there!

+Maria Colgan


How do I

How Do I Compare Optimizer Statistics?

This question came up recently when I was helping a customer upgrade a large data warehouse. Prior to the upgrade, they were using an ESTIMATE_PERCENT of 0.000001, the smallest possible sample size allowed, when gathering statistics on their larger tables. When I asked why they picked such a tiny sample size, they said it was because they needed statistics to be gathered extremely quickly after their daily load, both at the partition and at the global level.

Since these large tables were partitioned, I thought they would be an excellent candidate for incremental statistics. However, in order to use incremental statistics gathering you have to let the ESTIMATE_PERCENT parameter default to AUTO_SAMPLE_SIZE. Although the customer saw the benefit of using incremental statistics, they were not keen on changing the value of ESTIMATE_PERCENT. So I argued that with AUTO_SAMPLE_SIZE they would get statistics that were equivalent to a 100% sample but with the speed of a 10% sample. Since we would be using incremental statistics, we would only have to gather statistics on the freshly loaded partition, and the global statistics (table level statistics) would be automatically aggregated correctly. So, with much skepticism, they tried incremental statistics in their test system and they were pleasantly surprised at the elapsed time. However, they didn't trust the statistics that were gathered and asked me, "how do I compare the statistics I got with AUTO_SAMPLE_SIZE to the statistics I normally get with an ESTIMATE_PERCENT of 0.000001?" The answer to that was easy: use DBMS_STATS.DIFF_TABLE_STATS. Although the answer was easy, it wasn't an easy process to help them work out how to use the DBMS_STATS.DIFF_TABLE_STATS functions correctly. In this post I hope to share some of the gotchas you may encounter using DIFF_TABLE_STATS that are not so obvious.

Let's take the SALES table from the SH schema as an example. The SALES table has 28 partitions and 918,843 rows. First we will gather statistics on the SALES table with their original setting for ESTIMATE_PERCENT, 0.000001. These statistics can now be backed up into a newly created stats table. Now that we have an export of the statistics from the manually specified ESTIMATE_PERCENT run, let's re-gather statistics on the SALES table using the default, AUTO_SAMPLE_SIZE.

So, we are now ready to compare the two sets of statistics for the SALES table using the DBMS_STATS.DIFF_TABLE_STATS function. There are actually three versions of this function, depending on where the statistics being compared are located:

DBMS_STATS.DIFF_TABLE_STATS_IN_HISTORY
DBMS_STATS.DIFF_TABLE_STATS_IN_PENDING
DBMS_STATS.DIFF_TABLE_STATS_IN_STATTAB

In this case we will be using the DBMS_STATS.DIFF_TABLE_STATS_IN_STATTAB function. The functions also compare the statistics of the dependent objects (indexes, columns, partitions) and will only display statistics for the object(s) from both sources if the difference between the statistics exceeds a certain threshold (%). The threshold can be specified as an argument to the function; the default value is 10%. The statistics corresponding to the first source will be used as the basis for computing the differential percentage. The DBMS_STATS.DIFF_TABLE_STATS functions are table functions, so you must use the keyword TABLE when you are selecting from them, otherwise you will receive an error saying the object does not exist. The table function returns a report (a CLOB) and maxdiffpct (a number).
In order to display the report correctly you must use the SET LONG command so that the full CLOB report is displayed. Now that we know how to generate the report, let's look at what it says.

The report has three sections. It begins with a comparison of the basic table statistics. In this case the table statistics (number of rows, number of blocks, etc.) are the same. The results so far are to be expected, since we can accurately extrapolate the table statistics from a very small sample.

The second section of the report examines column statistics. Each of the columns where the statistics vary is listed (AMOUNT_SOLD, CUST_ID, TIME_ID) along with a copy of the statistics values from each source. Source A is the STATTAB, which in our case holds the statistics gathered with an ESTIMATE_PERCENT of 0.000001. Source B is the statistics currently in the dictionary, which in our case is the AUTO_SAMPLE_SIZE set. You will notice quite a significant difference in the statistics, especially in the NDV (number of distinct values) and the minimum and maximum values for each of the columns. If we compare these results with the actual number of distinct values for these columns (below), we see that the statistics reported by source B, AUTO_SAMPLE_SIZE, are the most accurate.

The report then goes on to list the column statistics differences for each of the partitions. In this section you will see that the problems occur only in the AMOUNT_SOLD and CUST_ID columns. The SALES table is range partitioned on TIME_ID, so there is a limited number of TIME_IDs in each partition; thus the percentage difference between the results at this level is not enough to meet the threshold of 10%.

Finally, the report looks at the index statistics. In this case the index statistics were different, but by less than 10%, so the report doesn't show them. Along with the report, the function returns the MAXDIFFPCT. This is the maximum percentage difference between the statistics. These differences can come from the table, column, or index statistics. In this case the MAXDIFFPCT was 96%!

So the statistics were significantly different with AUTO_SAMPLE_SIZE, but the proof is in the pudding. We put the new statistics to the test and found that the Optimizer chose much better execution plans with the new statistics. So much so that they were able to remove a ton of hints from their code that had previously been necessary due to poor statistics. You can't ask for anything better than that!

You can get a copy of the script I used to generate this post here.
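For reference, the shape of the comparison call used above looks roughly like this. It is a hedged sketch: MY_STATS_TAB is a hypothetical name for the statistics table the original statistics were exported to, and the 10% threshold shown is simply the default made explicit.

set long 100000
set pagesize 0

SELECT report, maxdiffpct
FROM   table(dbms_stats.diff_table_stats_in_stattab(
               ownname      => 'SH',
               tabname      => 'SALES',
               stattab1     => 'MY_STATS_TAB',
               statown1     => 'SH',
               pctthreshold => 10));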


SQL Plan Management

What is the difference between SQL Profiles and SQL Plan Baselines?

I'm frequently asked about the difference between SQL profiles and SQL plan baselines, and why SQL profiles can be shared but SQL plan baselines can't. So I thought it would be a good idea to write a post explaining the differences between them and how they interact. But first, let's briefly recap each feature.

The query optimizer normally uses information like object and system statistics, the compilation environment, bind values, and so on to determine the best plan for a SQL statement. In some cases, defects in either these inputs or the optimizer can lead to a sub-optimal plan. A SQL profile contains auxiliary information that mitigates these problems. When used together with its regular inputs, a SQL profile helps the optimizer minimize mistakes and thus makes it more likely to select the best plan.

A SQL plan baseline for a SQL statement consists of a set of accepted plans. When the statement is parsed, the optimizer will only select the best plan from among this set. If a different plan is found using the normal cost-based selection process, the optimizer will add it to the plan history, but this plan will not be used until it is verified to perform better than the existing accepted plan and is evolved. We described this behavior in more detail in a previous post.

So, SQL profiles provide additional information to the optimizer to help select the best plan; they don't constrain the optimizer to any specific plan, which is why they can be shared. SQL plan baselines, on the other hand, constrain the optimizer to only select from a set of accepted plans. The cost-based approach is still used to choose a plan, but only within this set of plans. SQL plan baselines are a more conservative plan selection strategy than SQL profiles.

So when should you use SQL profiles versus SQL plan baselines? You should use a SQL profile if you just want to help the optimizer a little in its costing process without constraining it to any specific plan. This approach can be extremely useful when you want the system to adapt immediately to changes like new object statistics. You should use SQL plan baselines if you are more conservative and want to control which plans are used. If you are using SQL plan baselines and find that the optimizer sometimes does not select the best plan from the accepted list, or does not find a best-cost plan to add to the plan history, then you can always use a SQL profile as well.

What happens if a SQL statement has both a SQL profile and a SQL plan baseline? If you recall, SQL Plan Management (SPM) has three components: plan capture, plan selection, and plan evolution. The presence of a SQL profile affects all three components of SPM, and we will describe each of those interactions below.

SPM plan capture and SQL profiles. When the statement is executed it will be hard parsed and a cost-based plan will be generated. That plan will be influenced by the SQL profile. Once the cost-based plan is determined, it will be compared to the plans that exist in the SQL plan baseline. If the plan matches one of the accepted plans in the SQL plan baseline, we will go ahead and use it. However, if the cost-based plan doesn't match any of the accepted plans in the SQL plan baseline, it will be added to the plan baseline as an unaccepted plan.

SPM plan selection and SQL profiles. When a SQL statement with a SQL plan baseline is parsed, the accepted plan with the best cost will be chosen. This process uses the regular optimizer.
The presence of a SQL profile will affect the estimated cost of each of these plans and thus, potentially, the plan that is finally selected.

SPM plan evolution and SQL profiles. The third sub-component of SPM is verification, or evolution, of non-accepted plans. The evolution process test-executes the non-accepted plan against the best of the accepted plans. The best accepted plan is selected based on cost. Again, if a SQL profile exists, it will influence the estimated cost and thus the accepted plan chosen for comparison against the non-accepted plan.

Hopefully this information gives you a clear picture of how SQL profiles and SQL plan baselines differ and how they interact with one another! There's more on using SQL plan baselines to control SQL execution plans here.
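Incidentally, if you want to check whether a particular cursor was built using a SQL profile, a SQL plan baseline, or both, V$SQL exposes this directly. A small sketch (the filter on SQL_TEXT is just for illustration):

SELECT sql_id,
       child_number,
       sql_profile,        -- name of the SQL profile used, if any
       sql_plan_baseline   -- name of the SQL plan baseline plan used, if any
FROM   v$sql
WHERE  sql_text LIKE '%your statement text here%';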


Statistics

Lies, damned lies, and statistics Part 2

There was huge interest in our OOW session last year on Managing Optimizer Statistics. It seems statistics and the maintenance of them continue to baffle people. In order to help dispel the mysteries surrounding statistics management, we have created a two-part white paper series on Optimizer statistics. Part one of this series was released in November last year and describes in detail, with worked examples, the different concepts of Optimizer statistics. Today we have published part two of the series, which focuses on the best practices for gathering statistics and examines specific use cases, including the fears that surround histograms and the management of statistics for volatile tables such as Global Temporary Tables. Here is a quick look at the Introduction and the start of the paper. You can find the full paper here. Happy Reading!

Introduction

The Oracle Optimizer examines all of the possible plans for a SQL statement and picks the one with the lowest cost, where cost represents the estimated resource usage for a given plan. In order for the Optimizer to accurately determine the cost for an execution plan, it must have information about all of the objects (tables and indexes) accessed in the SQL statement, as well as information about the system on which the SQL statement will be run. This necessary information is commonly referred to as Optimizer statistics. Understanding and managing Optimizer statistics is key to optimal SQL execution. Knowing when and how to gather statistics in a timely manner is critical to maintaining acceptable performance. This whitepaper is the second of a two-part series on Optimizer statistics. The first part of this series, Understanding Optimizer Statistics, focuses on the concepts of statistics and will be referenced several times in this paper as a source of additional information. This paper will discuss in detail when and how to gather statistics for the most common scenarios seen in an Oracle Database. The topics are:

· How to gather statistics
· When to gather statistics
· Improving the efficiency of gathering statistics
· When not to gather statistics
· Gathering other types of statistics

How to gather statistics

The preferred method for gathering statistics in Oracle is to use the supplied automatic statistics-gathering job.

Automatic statistics gathering job

The job collects statistics for all database objects that are missing statistics or have stale statistics, by running an Oracle AutoTask task during a predefined maintenance window. Oracle internally prioritizes the database objects that require statistics, so that the objects which most need updated statistics are processed first. The automatic statistics-gathering job uses the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure, which uses the same default parameter values as the other DBMS_STATS.GATHER_*_STATS procedures. The defaults are sufficient in most cases. However, it is occasionally necessary to change the default value of one of the statistics gathering parameters, which can be accomplished by using the DBMS_STATS.SET_*_PREF procedures. Parameter values should be changed at the smallest scope possible, ideally on a per-object basis.

You can find the full paper here. Happy Reading!

+Maria Colgan
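To make that last recommendation from the excerpt concrete, here is a minimal sketch of changing a single statistics preference at table scope rather than globally. The SH.SALES table and the STALE_PERCENT value of 5 are purely illustrative.

BEGIN
  dbms_stats.set_table_prefs(ownname => 'SH',
                             tabname => 'SALES',
                             pname   => 'STALE_PERCENT',
                             pvalue  => '5');
END;
/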

Statistics

I thought the new AUTO_SAMPLE_SIZE in Oracle Database 11g looked at all the rows in a table so why do I see a very small sample size on some tables?

There's now an updated post on auto sample size that covers Oracle Database 12c, but this one is still relevant to Oracle Database 11g... I recently got asked this question and thought it was worth a quick blog post to explain in a little more detail what is going on with the new AUTO_SAMPLE_SIZE in Oracle Database 11g and what you should expect to see in the dictionary views. Let’s take the SH.CUSTOMERS table as an example. There are 55,500 rows in the SH.CUSTOMERS table. If we gather statistics on SH.CUSTOMERS using the new AUTO_SAMPLE_SIZE but without collecting histograms, we can check what sample size was used by looking in the USER_TABLES and USER_TAB_COL_STATISTICS dictionary views. The sample size shown in USER_TABLES is 55,500 rows, or the entire table, as expected. In USER_TAB_COL_STATISTICS most columns show 55,500 rows as the sample size except for four columns (CUST_SRC_ID, CUST_EFF_TO, CUST_MARITAL_STATUS, CUST_INCOME_LEVEL). The CUST_SRC_ID and CUST_EFF_TO columns have no sample size listed because there are only NULL values in these columns and the statistics gathering procedure skips NULL values. The CUST_MARITAL_STATUS (38,072) and CUST_INCOME_LEVEL (55,459) columns show fewer than 55,500 rows as their sample size because of the presence of NULL values in these columns. In the SH.CUSTOMERS table 17,428 rows have a NULL as the value for the CUST_MARITAL_STATUS column (17,428 + 38,072 = 55,500), while 41 rows have a NULL value for the CUST_INCOME_LEVEL column (41 + 55,459 = 55,500). So we can confirm that the new AUTO_SAMPLE_SIZE algorithm will use all non-NULL values when gathering basic table and column level statistics. Now that we have a clear understanding of what sample size to expect, let's include histogram creation as part of the statistics gathering. Again we can look in the USER_TABLES and USER_TAB_COL_STATISTICS dictionary views to find the sample size used. The sample size seen in USER_TABLES is 55,500 rows, but if we look at the column statistics we see that it is the same as in the previous case except for the CUST_POSTAL_CODE and CUST_CITY_ID columns. You will also notice that these columns now have histograms created on them. The sample size shown for these columns is not the sample size used to gather the basic column statistics. AUTO_SAMPLE_SIZE still uses all of the non-NULL values in the table to gather the basic column statistics (55,500 rows in this case). The size shown is the sample size used to create the histogram on the column. When we create a histogram we try to build it on a sample that has approximately 5,500 non-null values for the column. Typically all of the histograms required for a table are built from the same sample. In our example the histograms created on CUST_POSTAL_CODE and CUST_CITY_ID were built on a single sample of ~5,500 (5,450 rows) as these columns contained only non-null values. However, if one or more of the columns that require a histogram have NULL values then the sample size may be increased in order to achieve a sample of 5,500 non-null values for those columns. In addition, if the number of NULLs in the columns varies greatly, we may create multiple samples: one for the columns that have a low number of NULL values and one for the columns with a high number of NULL values. This scheme enables us to get close to 5,500 non-null values for each column. You can get a copy of the script I used to generate this post here.
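A minimal sketch of the checks described above, assuming the SH sample schema is installed and you are connected as SH; the views and columns are standard dictionary objects, but the counts you see will depend on your data.

exec dbms_stats.gather_table_stats(user, 'CUSTOMERS', method_opt => 'FOR ALL COLUMNS SIZE AUTO')

-- table-level sample size (the whole table under AUTO_SAMPLE_SIZE)
select table_name, num_rows, sample_size from user_tables where table_name = 'CUSTOMERS';

-- per-column sample size, NULL counts and any histograms created
select column_name, sample_size, num_nulls, histogram
from   user_tab_col_statistics
where  table_name = 'CUSTOMERS'
order  by column_name;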

Upcoming events

Upcoming events : OBUG Connect Conference 2012

The Oracle Benelux User Group (OBUG) have given me an amazing opportunity to present a one-day Optimizer workshop at their annual Connect Conference in Maastricht on April 24th. The workshop will run as one of the parallel tracks at the conference and consists of three 45-minute sessions. Each session can be attended stand-alone, but they build on each other to allow someone new to the Oracle Optimizer or SQL tuning to come away from the conference with a better understanding of how the Optimizer works and what techniques they should deploy to tune their SQL. Below is a brief description of each of the sessions. Session 7 - 11:30 Oracle Optimizer: Understanding Optimizer Statistics. The workshop opens with a discussion on Optimizer statistics and the features introduced in Oracle Database 11g to improve the quality and efficiency of statistics gathering. The session will also provide strategies for managing statistics in various database environments. Session 27 - 14:30 Oracle Optimizer: Explain the Explain Plan. The workshop continues with a detailed examination of the different aspects of an execution plan, from selectivity to parallel execution, and explains what information you should be gleaning from the plan. Session 47 - 15:45 Top Tips to get Optimal Execution Plans. Finally, I will show you how to identify and resolve the most common SQL execution performance problems, such as poor cardinality estimates, bind peeking issues, and selecting the wrong access method. Hopefully I will see you there! +Maria Colgan

Statistics

Incremental Statistics Maintenance – what statistics will be gathered after DML occurs on the table?

Hi visitor. This blog post was originally written in 2012. It still serves as a good introduction to incremental statistics. Once you've read it, there's now more on this topic here (part one of a three-part series). Incremental statistics maintenance was introduced in Oracle Database 11g to improve the performance of gathering statistics on large partitioned tables. When incremental statistics maintenance is enabled for a partitioned table, Oracle accurately generates global-level statistics by aggregating partition-level statistics. As more people adopt this functionality we have received more questions about how incremental statistics is expected to behave in a given scenario. For example, last week we got a question about which partitions should have statistics gathered on them after DML has occurred on the table. The person who asked the question assumed that statistics would only be gathered on partitions that had stale statistics (where 10% of the rows in the partition had changed). However, what they actually saw when they ran DBMS_STATS.GATHER_TABLE_STATS was that all of the partitions affected by the DML had statistics re-gathered on them. This is the expected behavior: incremental statistics maintenance is supposed to yield the same statistics as gathering table statistics from scratch, just faster. This means incremental statistics maintenance needs to gather statistics on any partition that will change the global or table-level statistics. For instance, the min or max value for a column could change after just one row is inserted or updated in the table. It might be easier to demonstrate this using an example. Let’s take the ORDERS2 table, which is partitioned by month on ORDER_DATE. We will begin by enabling incremental statistics for the table and gathering statistics on the table. After the statistics gather, the last_analyzed date for the table and all of the partitions now shows 13-Mar-12. And we now have the following column statistics for the ORDERS2 table. We can also confirm that we really did use incremental statistics by querying the dictionary table SYS.HIST_HEAD$, which should have an entry for each column in the ORDERS2 table. So, now that we have established a good baseline, let’s move on to the DML. Information is loaded into the latest partition of the ORDERS2 table once a month. Existing orders may also be updated to reflect changes in their status. Let’s assume that a large number of update transactions take place on the ORDERS2 table this month. After these transactions have occurred we need to re-gather statistics, since the partition ORDERS_MAR_2012 now has rows in it and the number of distinct values and the maximum value for the STATUS column have also changed. Now if we look at the last_analyzed date for the table and the partitions, we will see that the global statistics and the statistics on the partitions where rows have changed due to the update (ORDERS_FEB_2012) and the data load (ORDERS_MAR_2012) have been updated. The column statistics also reflect the changes, with the number of distinct values in the STATUS column increasing to reflect the update. So, incremental statistics maintenance will gather statistics on any partition whose data has changed, whenever that change will impact the global-level statistics. You can get a copy of the script I used to generate this post here. As I mentioned at the beginning, there's more on this topic here (this is part one of a three-part series).
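A minimal sketch of the setup described above, assuming a partitioned SH.ORDERS2 table as in the post; the INCREMENTAL preference and the gather call are standard DBMS_STATS usage.

begin
  -- enable incremental statistics maintenance for the partitioned table
  dbms_stats.set_table_prefs('SH', 'ORDERS2', 'INCREMENTAL', 'TRUE');
  -- gather statistics; after DML, only the changed partitions are re-gathered
  dbms_stats.gather_table_stats('SH', 'ORDERS2');
end;
/

-- check which partitions were (re-)analyzed
select partition_name, last_analyzed
from   dba_tab_partitions
where  table_owner = 'SH' and table_name = 'ORDERS2';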

Upcoming events

Upcoming events : Oracle User Group Norway Conference

On March 22nd the Oracle user group in Norway will take to the sea for their annual conference! This year, for the first time, they are starting the conference on dry land on Wednesday March 21st, followed by a two-day trip from Oslo to Germany and back on a ferry. I'm presenting the following three sessions at the conference: Wednesday, March 21st at 11:00 am Oracle Optimizer: Upgrading to 11g without pain. In this session I will describe the different approaches you can take to use SQL Plan Management to prevent plan regressions during a database upgrade and beyond. Thursday, March 22nd at 5:00 pm Oracle Optimizer: Understanding Optimizer Statistics. In this session I will discuss the features introduced in Oracle Database 11g to improve the quality and efficiency of statistics gathering, as well as strategies for managing statistics in various database environments. Friday, March 23rd at 3:45 pm Tips to Prevent Suboptimal Execution Plans. In this session I use 'how-to' examples to answer questions like why an index wasn't used, why partition elimination did not occur, why statistics were ignored, and many more. You will learn to identify and quickly resolve these issues without the need for optimizer hints or changing initialization parameters. Hopefully I will see you there, and I'll make it to and from Germany without getting seasick! +Maria Colgan

Diagnostic tools

Additional Information on SQL Patches

Our last post on SQL patches generated a lot of comments and questions. Today I will address a couple of questions that are quick to answer. For those that require more discussion, we'll post more details in the next few weeks. 1) Do we need additional licenses or the SQL diag pack to use this? No. No additional licenses are needed to use SQL Repair Advisor or SQL patches. SQL Repair Advisor is available as part of Oracle Database Enterprise Edition 11g. 2) Can I supply a SQL_ID instead of SQL text? The API requires that the SQL text be passed in. Keep in mind that the SQL patch will be applied to SQL statements that match this SQL text after normalization. We use the same normalization that is used for SQL profiles, where the whitespace and the case (of non-literals) are normalized. Here's a quick example to show what I mean. In the last post, we showed an example SQL patch created with an index hint, and saw the patch kick in for this query: If we add some whitespace, and change the case of some of the query, we will still use the SQL patch: So even though the SQL ID of the two queries is different, they both use the SQL patch. This is why you cannot supply a SQL ID -- we aren't associating the patch with a SQL ID, but with a (normalized) SQL text. You can get a copy of the script I used to generate this post here.
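To illustrate the normalization point, here is a small sketch with hypothetical table and column names: both statements below collapse to the same normalized text (whitespace and the case of non-literals are ignored, literals are not), so a SQL patch created against the first would also apply to the second, even though their SQL_IDs differ.

select count(*) from t where col1 = 'A';

SELECT Count(*)
FROM   t
WHERE  col1 = 'A';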

Diagnostic tools

Using SQL Patch to add hints to a packaged application

From Oracle Database 12c Release 2 onwards there's a public API call to create SQL patches using DBMS_SQLDIAG.CREATE_SQL_PATCH. If you're using this release you should check out this post too. In the last post, I showed how you can use the BIND_AWARE hint to skip the monitoring phase of adaptive cursor sharing. If you have a packaged application, you might be wondering how you can use this hint if you can't edit a query directly. In this post I'll show you how to do just that, for any hint (or set of hints). In order to do this, we will create a SQL patch for a query. We haven't said much, if anything, about SQL patches here on the blog before. A SQL patch is a SQL manageability object that can be generated by the SQL Repair Advisor in order to circumvent a plan which causes a failure. In essence, a SQL patch tells the optimizer to change the plan in some way or avoid an optimization so that the failure does not occur. For instance, if a failure happens during index access, then generating a plan without that index (or without any index) may work around the failure. You can read more about this in the documentation here. What we don't tell you in the documentation is that you can create your own SQL patch, which will tell the optimizer to use a set of hints when it optimizes a particular query. You can use this trick to essentially insert hints into a query whose text you cannot edit. You can do this with dbms_sqldiag_internal.i_create_patch. This call lets you specify a query text and a set of hints to apply to the query. I'll start with an example using the same query from the BIND_AWARE hint post. Recall from the last post, we saw that if we ran the query without the BIND_AWARE hint, the first child cursor to be generated would not be bind aware. But with the BIND_AWARE hint, we get a bind-aware child cursor from the start: Now let's create the patch for this query. The hint that we want to apply is simply "BIND_AWARE". And then see what happens when we run the original query, without the hint. I cleared the cursor cache before that last step, to make it easier to read. If I hadn't done that, you would still see a new cursor, due to STB_OBJECT_MISMATCH. The BIND_AWARE hint is pretty simple, but you can use all kinds of hint text with this call. The hint text just needs to work at the top level of the query. So, for instance, if you want to hint an access path inside of a view, you can use global query block names to do this. Consider this query, which picks a full table scan by default: Let's say for some crazy reason you want to use an index instead. If you wanted to hint the query with global query block names, it would look like this: So if you want to create a patch for this query instead, you would use that same hint text: And now if we run the query or use explain plan to generate the plan, we pick the index: And we even include a note in the plan that tells you that the SQL patch was used. So now you have enough rope to hang yourself. As you may have noticed, we generally aren't proponents of hinting your queries; we've talked about some of the perils of hints here before.
But we know there are legitimate cases where you need to add hints, and now you can do this even with a packaged application in which you cannot edit the queries directly. You can get a copy of the script I used to generate this post here.  
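As a rough sketch of the approach described above (the 11g-era internal API used in this post; from 12.2 onwards you would use the public DBMS_SQLDIAG.CREATE_SQL_PATCH instead), with the query text taken from the BIND_AWARE example and a made-up patch name. The exact signature of i_create_patch differs between releases, so treat this as illustrative only.

begin
  -- create a SQL patch that injects the BIND_AWARE hint into the unhinted query text
  sys.dbms_sqldiag_internal.i_create_patch(
    sql_text  => 'select count(*), max(empno) from emp where deptno = :deptno',
    hint_text => 'BIND_AWARE',
    name      => 'bind_aware_patch');   -- hypothetical patch name
end;
/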

Upcoming events

Upcoming events : Hotsos Symposium 2012

I have been lucky enough to be invited to the tenth annual Hotsos Symposium in Irving, Texas on March 4-8. The Hotsos Symposium differs from other Oracle conferences because it is a dedicated Oracle performance conference and attracts some of the top performance speakers, like Tom Kyte, Cary Millsap, and Jonathan Lewis. If you get a chance you should really check it out. I have two sessions at this year's symposium: Inside the Oracle 11g Optimizer. With every new release, the Oracle Optimizer and the statistics that feed it are enhanced in the hope of producing more optimal execution plans. Navigating the new features and behavior changes can be a daunting task for developers and DBAs. This session describes what changes were made in the Optimizer in 11g and how you can take advantage of the new features to achieve more optimal execution plans. Tips to Prevent Suboptimal Execution Plans. In this session I use how-to examples to answer questions like why an index wasn't used, why partition elimination did not occur, why statistics were ignored, and many more. You will learn to identify and quickly resolve these issues without the need for optimizer hints or changing initialization parameters. Hope to see you there! +Maria Colgan

How do I

How do I force a query to use bind-aware cursor sharing?

Back in our original post on adaptive cursor sharing, a commenter asked if there was a way to skip the monitoring for a query and go straight to bind-aware cursor sharing.  We mentioned then that we would add a hint for this purpose, but we never addressed the new hint here on the blog.  From Oracle Database 11g, the BIND_AWARE hint can be used to force bind-aware cursor sharing from the first execution.  The hint only works if all of the other criteria required for bind-aware cursor sharing are met; for instance, the query must have binds, the binds must appear in where-clause predicates, and the columns in those predicates must have the proper statistics (e.g. a histogram) to allow the plan to actually change when the query is executed with different bind values. Let's look again at the query we used in that original post on ACS. select count(*), max(empno)   from emp   where deptno = :deptno; Recall that in that example, we had to execute the query twice, with different bind values, before we switched to bind-aware cursor sharing.  So we had to run the query a total of at least four times in order to populate the cursor cache with the two plans we want to use, depending on bind value.  This is what the cursor cache looked like after 5 executions: Now let's look at the behavior with the BIND_AWARE hint: After two executions with two different bind values, there are two bind-aware child cursors.  And as expected, the very selective bind value (9) used an index scan, while the not-so-selective bind value (10) used a full table scan: Now that the cursor cache is populated with these two bind-aware child cursors, they will behave just as if they were created due to adaptive cursor sharing. For instance, if I run the query again with another very selective bind value, the first child cursor will be used. Before you start using this hint all over the place, keep in mind that there is a reason why we monitor queries before deciding to use bind-aware cursor sharing by default.  There are various forms of overhead from using this form of cursor sharing.  These include:
· A small amount of additional cursor memory, to store the information we need to pick the child cursor to use for a particular bind value.
· Additional overhead for soft parse, since we have to compute the selectivity with the current bind value in order to decide which child cursor, if any, is right for that bind value.
· Additional hard parses, which may not actually generate different plans for different bind values.
· More child cursors, which may cause cursor cache contention.
These are all of the reasons why we take an adaptive approach to cursor sharing.  But for certain queries, which you know will benefit from different plans for different bind values, the hint can be helpful. Some of you might be wondering how you can use this hint for queries in a packaged application (where you cannot edit the query text).  I will discuss this in my next post. You can get a copy of the script I used to generate this post here.
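A small sketch of using the hint and checking the result, following the post's EMP example and bind values; the V$SQL columns shown are standard, but the child cursor numbers and plans you see will depend on your data and statistics.

var deptno number
exec :deptno := 9

select /*+ BIND_AWARE */ count(*), max(empno)
from   emp
where  deptno = :deptno;

-- confirm the child cursor is bind aware from the first execution
select child_number, is_bind_sensitive, is_bind_aware, executions
from   v$sql
where  sql_text like 'select /*+ BIND_AWARE */ count(*)%';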

Cursor Sharing

My cursor wasn’t shared because of STB_OBJECT_MISMATCH. What does that mean?

I got a question recently about cursor sharing and why a cursor wasn’t shared due to an STB_OBJECT_MISMATCH. And I thought the question and subsequent investigation might make an interesting blog post. Let me begin by outlining the question in detail. The SQL statement in question is a simple update: UPDATE /*test1*/ test a SET a.name='zjt' WHERE a.id > 6000 AND rownum < 2; After the statement is executed for the first time a cursor is generated as expected. Then the statement is executed again and we can see from V$SQL that the same cursor (child cursor 0) is used for the statement (the execution count has gone up to 2). So far so good. However, after the statement is executed a third time a new child cursor (child cursor 1) appears in V$SQL for this SQL statement. So the question is: why wasn’t the initial cursor used on the third execution? We should be able to get some sort of hint at what is going on by looking up the SQL_ID (cuqcxr9jurqgb) in V$SQL_SHARED_CURSOR. From the output above we can see that the reason we can’t share child cursor 0 on the third execution is ‘SQL Tune Base Object dif’. You will also notice that there is a ‘Y’ in column 45 of the view. This column is STB_OBJECT_MISMATCH. If you look up the explanation for the STB_OBJECT_MISMATCH column in the Oracle Database 11g Reference Guide it says ‘STB has come into existence since cursor was compiled’. So the database has a genuine reason for not sharing the cursor even if it’s not crystal clear to us what it is. The question now becomes: what is a SQL Tune Base Object or STB_OBJECT? STB_OBJECTs are what we officially refer to as SQL Management Objects, such as SQL profiles, SQL plan baselines, or SQL patches. The reason the cursor was not shared on the third execution was that a new SQL Management Object was created between the second and third executions of this SQL statement. When a SQL Management Object is created for a SQL statement the corresponding cursor is marked invalid to trigger a hard parse the next time it is executed. The hard parse is necessary so that the metadata about the SQL Management Object can be added to the cursor. Remember, cursors are read-only objects, so if something changes or new information becomes available about the objects accessed in the SQL statement, a new child cursor is required. We still don’t know what SQL Management Object was created for our SQL statement. But we can find out by querying V$SQL, which shows which SQL Management Objects are associated with each child cursor. So the SQL Management Object is a SQL plan baseline, but where did it come from? If we look in the ORIGIN column of DBA_SQL_PLAN_BASELINES we can see it was automatically captured. Most likely the baseline was created because automatic plan capture was enabled via the init.ora parameter OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES. We can confirm this by checking the value of the init.ora parameter. So the SQL plan baseline was created automatically because automatic plan capture was enabled (OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = TRUE). When automatic plan capture is enabled, Oracle will create a SQL plan baseline for any repeatable SQL statement. So the first time the statement was executed its SQL handle was recorded. The second time it was executed it was marked as a repeatable SQL statement and a SQL plan baseline was created. The original child cursor 0 was marked invalid to trigger a hard parse the next time it was executed so Oracle could include information about the new SQL plan baseline.
Hence, on the third execution, a new child cursor (child cursor 1) was created. You can get a copy of the script I used to generate this post here. +Maria Colgan
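A minimal sketch of the diagnostic queries walked through above; the SQL_ID is the one from the post, and the dictionary views and columns are standard, though the exact output depends on your system.

-- why wasn't child cursor 0 shared?
select child_number, stb_object_mismatch
from   v$sql_shared_cursor
where  sql_id = 'cuqcxr9jurqgb';

-- which SQL Management Object is attached to each child cursor?
select child_number, sql_plan_baseline, sql_profile, sql_patch
from   v$sql
where  sql_id = 'cuqcxr9jurqgb';

-- where did the baseline come from?
select sql_handle, plan_name, origin, enabled, accepted
from   dba_sql_plan_baselines
where  sql_text like 'UPDATE /*test1*/ test%';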
