Rethinking Oracle Optimizer Statistics for P6
By Brian Diehl on Sep 17, 2013
This is the first in a series of posts about Oracle Optimizer Statistics and the P6 application. Over the past six months, I've done a lot of rethinking about the best process for maintaining optimizer statistics given the type of data in the PM database and the variety of queries coming from the application. I need to emphasize that this by no means constitutes general guidance with regard to Oracle Optimizer Statistics for any other applications. These recommendations are only applicable to the Primavera products and, in this case, only to P6 8.x. My hope is for a more nuanced approach to optimizer statistics to better achieve performance, scalability and stability goals.
Optimizer Statistics Purpose
Statistics capture (by calculation or estimation) the number of rows in each table and the cardinality of each column. (Additional histogram data may also be collected about the distribution of data within a column.) The stored statistics are used by the query optimizer to find the most efficient way to execute a given query. As a general rule, the more accurate the statistics--the closer they reflect the actual contents of the table--the more optimal the resulting query plan.
Of all the statistics, it is column selectivity that has the largest impact on the query plan and its subsequent execution. Selectivity is, put simply, the fraction of a table's rows expected to satisfy a given column predicate (such as proj_id=101); applied to the table's row count, it yields the expected number of rows returned (the cardinality). Cardinality plays a huge role in determining the sequence of table access in a query plan.
In an ideal optimization, the predicate references a literal value (proj_id=101), which allows a specific estimate of rows (proj_id=101 will return 2010 rows). More often, though, the optimizer must work from an estimated cardinality because the predicate contains a bind variable (proj_id = :pid) or is part of a join (t.proj_id = p.proj_id). In these cases, it is the average cardinality that comes into play. For the bind variable, we do not know what the value will be, so the best choice is the average cardinality (number of rows / unique column values). In the join case, it is the average cardinality multiplied by the cardinality of the joined table. While this is a broad simplification of the optimization process, my point is that the critical element is the average cardinality.
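To make the arithmetic above concrete, here is a toy sketch in Python. This is not Oracle's implementation, and all statistic values are invented for illustration; it only shows how average cardinality (number of rows / number of distinct values) drives both the bind-variable and join estimates.

```python
def avg_cardinality(num_rows: int, num_distinct: int) -> float:
    """Expected rows per distinct value of a column: rows / unique values."""
    return num_rows / num_distinct

# Hypothetical statistics: a table with 1,000,000 rows and 500
# distinct proj_id values.
table_rows, proj_distinct = 1_000_000, 500

# Bind-variable predicate (proj_id = :pid): the value is unknown at
# optimization time, so the optimizer assumes the average cardinality.
bind_estimate = avg_cardinality(table_rows, proj_distinct)
print(bind_estimate)  # 2000.0 rows expected

# Join predicate (t.proj_id = p.proj_id): the average cardinality of the
# joined column multiplied by the cardinality of the other row source.
driving_rows = 10  # rows expected from the joined table
join_estimate = avg_cardinality(table_rows, proj_distinct) * driving_rows
print(join_estimate)  # 20000.0 rows expected
```

The takeaway is that a single stored ratio, not the actual bound value, determines both estimates.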
P6 Transactional and Reporting
The typical P6 query involves mostly bind predicates or, most commonly, a primary/foreign key join condition. Against the larger application tables, most predicates are some form of join condition where the number of matching values is not known until execution time. This means the cardinality estimate is really two levels of estimation: the rows produced by the join condition, and the average cardinality of the joined column. Take the following query as an example:
select task_id, task_name, task_code... from task t, projshar s where t.proj_id = s.proj_id and s.session_id = :sess
The first estimate is the number of rows returned from projshar based on the bind predicate session_id = :sess. The second is the cardinality of task: the average cardinality of proj_id multiplied by the previously calculated cardinality of the projshar row source.
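The two-level estimate can be walked through numerically. The statistic values below are invented purely for illustration; the point is how the bind-predicate estimate from projshar compounds with the average cardinality of task.proj_id.

```python
# Hypothetical statistics for the task/projshar query above.
projshar_rows, projshar_sess_distinct = 50_000, 1_000
task_rows, task_proj_distinct = 1_000_000, 2_000

# Level 1: rows from projshar for session_id = :sess. The bind value is
# unknown, so the optimizer uses the average cardinality of session_id.
projshar_est = projshar_rows / projshar_sess_distinct
print(projshar_est)  # 50.0

# Level 2: task rows = average cardinality of task.proj_id multiplied
# by the level-1 estimate. An error at level 1 is multiplied here.
task_est = (task_rows / task_proj_distinct) * projshar_est
print(task_est)  # 25000.0
```

Because the second estimate multiplies the first, any inaccuracy in the stored averages is magnified as the plan grows deeper.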
The point of this is to emphasize that the calculated average cardinality is the most relevant statistic for transactional and OLTP reporting queries. This is the key insight I want readers to take away from this post. While rows are deleted from and inserted into the database every day, the averages behave the way averages always do: they don't change in significant ways. In fact, the cardinalities reflect the shape of your particular P6 implementation, which is unlikely to change unless you greatly change the way you are using P6.
Next up I will discuss our real goals for query optimization and the impact on how statistics should be captured.