This post covers how Oracle has improved incremental statistics for partitioned tables in Oracle Database 12c Release 2. If you’re not already familiar with synopses in the context of incremental statistics then take a look at Part 1 and Part 2 before you read on.
Beginning with Oracle Database 12c Release 2 there's a new synopsis format that’s significantly more compact than the format used in earlier releases. For brevity, I’ll refer to the Oracle Database 12c Release 2 format as new and the previous format as old.
This enhancement is particularity relevant if you looked at incremental statistics maintenance in the past but decided not to use it because of the additional space usage in SYSAUX. Oracle Database 12c Release 2 resolves this issue and, in many cases, reduces the amount of system resource required to manage synopses.
A synopsis is metadata stored in a couple of tables in the data dictionary (SYS.WRI$_OPTSTAT_SYNOPSIS$ and SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$). The metadata is managed automatically by the Oracle Database, so there’s generally no reason to be aware of the underlying implementation. Nevertheless, I’ll cover some of the details here because it will help you to see why the change is so significant.
Synopses in Oracle Database 12c Release 2 are now created (by default) using an algorithm called HyperLogLog (HLL). This is a state-of-the-art algorithm that calculates the approximate number of distinct values for table column values. Even though it is an approximation, it is nevertheless very accurate with a typical error rate of less than 2%. Prior to Oracle Database 12c Release 2, synopses consisted of rows stored in the WRI$_OPTSTAT_SYNOPSIS$ table. The number of rows in this table can be very large if there are a large number of partitions and table columns, and if the columns contain a large number of distinct values. New-style synopses do not store rows in this table. Instead, some additional (and compact) hash data is stored in the WRI$_OPTSTAT_SYNOPSIS_HEAD$ table (in the SPARE2 column).
How much smaller are the new synopses? Well, as you've probably guessed, it's the consultants' answer: "It Depends". As outlined above, the space used by synopses is a function of the number of partitions, columns and distinct values in columns. I can give you an example from one of our test systems containing an 8TB test table with 84-partitions. The total size of the old-style synopses was around 160MB and the new-style synopses totaled only 6MB. The example I created in GitHub (see below) was contrived to generate a particularly large amount of synopsis data. In the example I've given, new-style synopses take up virtually no space at all and the old-style synopses take up about 160MB (in SYSAUX).
What do you need to do to use new-style synopses? Nothing! They are used by default if you choose to use incremental statistics and you don’t need to do anything different in Oracle Database 12c Release 2 compared to earlier releases.
You can control the type of that will be created using a DBMS_STATS preference called APPROXIMATE_NDV_ALGORITHM.
The default is REPEAT OR HYPERLOGLOG: if a table is using old-style synopses then it will continue to do so, and tables using new-style synopses will continue to use those!
There’s no reason to use anything other than the default unless you are upgrading a database to Oracle Database 12c Release 2. If this is the case then you might want to consider the options. That's covered next.
If you are upgrading a database that’s using incremental statistics, then you will want to migrate to using the new-style synopses. How do you go about doing that? It’s worth noting from the outset that it’s possible to have partitions with old-style and new-style synopses in the same table. Also, the good news is that you can control when and how to transition from one type of synopses to the other.
There is a DBMS_STATS preference called INCREMENTAL_STALENESS. It the controls whether or not you want to allow partitions within an individual table to have different types of synopses during the transition period from old-style to new-style. Let’s look at the different scenarios and how to proceed after you have upgraded to Oracle Database 12c Release 2. There is a spectrum of choice from “very conservative” (i.e., maintaining old behaviors) to “aggressive” (i.e., taking advantage of new features immediately). The chart below describes the different scenarios from the most conservative cases to most aggressive cases.
|Initially, you want to continue to use old-format synopses for all tables.
We recommend that you use the new-style synopses, but can choose to use them later on if you prefer.
|The algorithm used prior to HLL is called adaptive sampling.
|You want tables using old-style synopses to continue to use them.
Newly created incrementally-managed tables will use new-style synopses. Incrementally-managed tables without synopses will use new-style when statistics are gathered.
Incrementally-managed tables with old-style synopses will continue to use them.
This is the default behavior.
|You have some very large partitioned tables. They are using old-style synopses and you want to gradually replace the old with the new.||Old-format synopses are not immediately replaced and new partitions will have synopses in the new format. Mixed formats will yield less accurate statistics but the advantage is that there is no need to re-gather all table statistics in the foreground. The statistics auto job will gradually re-gather statistics on partitions with old format synopses and generate new format synopses. Eventually, new format synopses will be used for all partitions and statistics will be accurate.
Note that INCREMENTAL_STALENESS preference must have the value ALLOW_MIXED_FORMAT, but it does not need to be set explicity (unless you've changed it) because it is the default setting.
|You have time to re-gather all statistics. Incrementally managed tables are using old-style synopses and you want to replace the old-style with the new immediately.||
If you have a window of time to completely re-gather statistics for partitioned tables, then this is the recommended approach.
Remember that you can also set DBMS_STATS preferences (such as APPROXIMATE_NDV_ALGORITHM) at the database, global and schema-level as well as at the table level (as per the examples above).
The synopsis format in Oracle Database 12c Release 2 is much more compact than the previous format. If your database is very large, expect to save a lot of space in SYSAUX while maintaining very good accuracy for your statistics. You can expect the system overhead required to manage synopses to drop too (for example, when you exchange partitions). For more on this and some example scripts, take a look at GitHub.
If you have comments on this post or the scripts in GitHub, please go ahead below.