Efficient Statistics Maintenance for Partitioned Tables Using Incremental Statistics – Part 3

Introduction

This post covers how Oracle has improved incremental statistics for partitioned tables in Oracle Database 12c Release 2. If you’re not already familiar with synopses in the context of incremental statistics then take a look at Part 1 and Part 2 before you read on.

Beginning with Oracle Database 12c Release 2 there's a new synopsis format that’s significantly more compact than the format used in earlier releases.  For brevity, I’ll refer to the Oracle Database 12c Release 2 format as new and the previous format as old.

This enhancement is particularity relevant if you looked at incremental statistics maintenance in the past but decided not to use it because of the additional space usage in SYSAUX. Oracle Database 12c Release 2 resolves this issue and, in many cases, reduces the amount of system resource required to manage synopses.

A New Type of Synopsis

A synopsis is metadata stored in a couple of tables in the data dictionary (SYS.WRI$_OPTSTAT_SYNOPSIS$ and SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$). The metadata is managed automatically by the Oracle Database, so there’s generally no reason to be aware of the underlying implementation. Nevertheless, I’ll cover some of the details here because it will help you to see why the change is so significant.

Synopses in Oracle Database 12c Release 2 are now created (by default) using an algorithm called HyperLogLog (HLL). This is a state-of-the-art algorithm that calculates the approximate number of distinct values for table column values. Even though it is an approximation, it is nevertheless very accurate with a typical error rate of less than 2%. Prior to Oracle Database 12c Release 2, synopses consisted of rows stored in the WRI$_OPTSTAT_SYNOPSIS$ table. The number of rows in this table can be very large if there are a large number of partitions and table columns, and if the columns contain a large number of distinct values. New-style synopses do not store rows in this table. Instead, some additional (and compact) hash data is stored in the WRI$_OPTSTAT_SYNOPSIS_HEAD$ table (in the SPARE2 column).

How much smaller are the new synopses? Well, as you've probably guessed, it's the consultants' answer: "It Depends". As outlined above, the space used by synopses is a function of the number of partitions, columns and distinct values in columns. I can give you an example from one of our test systems containing an 8TB test table with 84-partitions. The total size of the old-style synopses was around 160MB and the new-style synopses totaled only 6MB. The example I created in GitHub (see below) was contrived to generate a particularly large amount of synopsis data. In the example I've given, new-style synopses take up virtually no space at all and the old-style synopses take up about 160MB (in SYSAUX).

How to Use Them

What do you need to do to use new-style synopses? Nothing! They are used by default if you choose to use incremental statistics and you don’t need to do anything different in Oracle Database 12c Release 2 compared to earlier releases.

Controlling Synopses

You can control the type of that will be created using a DBMS_STATS preference called APPROXIMATE_NDV_ALGORITHM.

The default is REPEAT OR HYPERLOGLOG: if a table is using old-style synopses then it will continue to do so, and tables using new-style synopses will continue to use those!

There’s no reason to use anything other than the default unless you are upgrading a database to Oracle Database 12c Release 2. If this is the case then you might want to consider the options. That's covered next.

Upgrading

If you are upgrading a database that’s using incremental statistics, then you will want to migrate to using the new-style synopses. How do you go about doing that? It’s worth noting from the outset that it’s possible to have partitions with old-style and new-style synopses in the same table. Also, the good news is that you can control when and how to transition from one type of synopses to the other.

There is a DBMS_STATS preference called INCREMENTAL_STALENESS. It the controls whether or not you want to allow partitions within an individual table to have different types of synopses during the transition period from old-style to new-style. Let’s look at the different scenarios and how to proceed after you have upgraded to Oracle Database 12c Release 2. There is a spectrum of choice from “very conservative” (i.e., maintaining old behaviors) to “aggressive” (i.e., taking advantage of new features immediately). The chart below describes the different scenarios from the most conservative cases to most aggressive cases.

 Use-Case
Action

Initially, you want to continue to use old-format synopses for all tables.

We recommend that you use the new-style synopses, but can choose to use them later on if you prefer.

The algorithm used prior to HLL is called adaptive sampling.

EXEC DBMS_STATS.SET_TABLE_PREFS(
                <table_owner>,
                '<table-name>',
                'APPROXIMATE_NDV_ALGORITHM',
                'ADAPTIVE SAMPLING')

You want tables using old-style synopses to continue to use them.

Newly created incrementally-managed tables will use new-style synopses. Incrementally-managed tables without synopses will use new-style when statistics are gathered.

Incrementally-managed tables with new-style synopses will continue to use them.

No action. This is the default behavior.


The APPROXIMATE_NDV_ALGORYTHM is, by default, REPEAT OR HYPERLOGLOG.

You have some very large partitioned tables. They are using old-style synopses and you want to gradually replace the old with the new.

Old-format synopses are not immediately replaced and new partitions will have synopses in the new format. Mixed formats will yield less accurate statistics but the advantage is that there is no need to re-gather all table statistics in the foreground. The statistics auto job will gradually re-gather statistics on partitions with old format synopses and generate new format synopses. Eventually, new format synopses will be used for all partitions and statistics will be accurate.


EXEC DBMS_STATS.SET_TABLE_PREFS(
                <table_owner>,
                <table_name>,
                'APPROXIMATE_NDV_ALGORITHM',
                'HYPERLOGLOG')


Note that INCREMENTAL_STALENESS preference must have the value ALLOW_MIXED_FORMAT, but it does not need to be set explicity (unless you've changed it) because it is the default setting.

You have time to re-gather all statistics. Incrementally managed tables are using old-style synopses and you want to replace the old-style with the new immediately.

If you have a window of time to completely re-gather statistics for partitioned tables, then this is the recommended approach.


EXEC DBMS_STATS.SET_TABLE_PREFS(
   <table_owner>,
   <table_name>,
   'APPROXIMATE_NDV_ALGORITHM',
   'HYPERLOGLOG')


You also need to specify that you don’t want a mix of old synopses and new synopses in the same table:

EXEC DBMS_STATS.SET_TABLE_PREFS (
   <table_owner>,
   <table_name>,
   'INCREMENTAL_STALENESS',
   'NULL')

You need to take some care here. The preference value should be set to 'NULL' (in quotes) and not NULL (without quotes). NULL (without quotes) sets a preference to its default value, which in this case is ALLOW_MIXED_FORMAT.

Once these preferences are set you will need to re-gather the table’s statistics.

Remember that you can also set DBMS_STATS preferences (such as APPROXIMATE_NDV_ALGORITHM) at the database, global and schema-level as well as at the table level (as per the examples above).

Summary

The synopsis format in Oracle Database 12c Release 2 is much more compact than the previous format. If your database is very large, expect to save a lot of space in SYSAUX while maintaining very good accuracy for your statistics. You can expect  the system overhead required to manage synopses to drop too (for example, when you exchange partitions). For more on this and some example scripts, take a look at GitHub.

If you have comments on this post or the scripts in GitHub, please go ahead below.


Comments:

Team - Thanks for this post.

gone through the scripts at the github, very difficult to follow, expecially dont know in which order should i go through them.

It would be really cool if you could walk through the set of scripts available in the gituhub, one by one, with some brief explanation.

Posted by Rajeshwaran on March 27, 2017 at 06:14 PM PDT #

Hi nbayliss-oracle,

Are You plan to add some possibility (for user) to tune HLL algorithm for example how much hash values can be stored in base HLL structure (for example by adding some parameters to dbms_stats and APPROX_COUNT_DISTINCT) ? As if we use bigger internal structure for HLL we can have much accurate data and for some tables (for example smaller tables) we can use much smaller structure to have very close estimate.

Another question, APPROX_COUNT_DISTINCT use HLL or Adaptive Sampling algorithms as it was introduced in 12.1 but HLL as You said in 12.2 ?

Best Regards

Posted by guest on April 07, 2017 at 01:45 AM PDT #

Hi - I agree with you. We've been migrating the scripts to a new location and I can see that I need to brush up a lot of the README files. Expect me to do that in the next week or two.

This is the new location:

https://github.com/oracle/oracle-db-examples/tree/master/optimizer

I will update the links in the blog posts when I've made the changes.

Thanks,
Nigel.

Posted by NigelBayliss on April 11, 2017 at 02:55 AM PDT #

Hi -

Regarding HLL - we don't have plans to make HLL tunable at the moment. It works well for large data sets and incremental stats maintenance is not usually needed for small datasets. The hash structures are not particularly large anyway.

Correct: approx count distinct was introduced in 12.1 and uses HLL, but new in 12.2 is HLL used for synopses.

Regards,
Nigel

Posted by NigelBayliss on April 11, 2017 at 03:11 AM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

Search


Archives
« April 2017
SunMonTueWedThuFriSat
      
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
      
Today