Optimizing the PL/SQL Challenge IV: More OR Condition Woes
By Chris Saxon-Oracle on May 21, 2015
In the previous post in the PL/SQL Challenge optimization series, we'd made huge performance gains simply by removing optional bind parameter clauses. The main body of the query is still doing a significant amount of work however and taking several seconds to execute. Fortunately there are simple changes we can make to improve its performance. Let's take a look.
Reviewing our trusty autotrace report we can see the following:
The red boxes highlight key issues in the execution plan. The three index scans are all accessing over 86,000 rows. The index lookups against qdb_comp_events are done over 21,000 times. There's something more insidious that's not apparent from looking at this plan though:
All three indexes are also on the qdb_comp_events table!
Essentially the optimizer has combined three indexes from one table. It's then filtered these results along with data from other tables. Several steps later it goes back and accessing the original table via its primary key. Ouch!
Clearly something's going wrong here.
Let's take a look at the join conditions for the comp events table:
and ce.competition_id = c.competition_id and ce.comp_event_id = qz.comp_event_id and ce.hide_quizzes = 'N' and ( ce.approval_needed = 'N' or ( ce.approval_needed = 'Y' and ce.approved_by_user_id is not null)) and ( ce.end_date <= sysdate or ( qt.text = 'Free-Form Text' and ce.start_date <= sysdate) or ( trunc (ce.end_date) = trunc (sysdate) and c.frequency = 'D') or ( ce.start_date <= sysdate and trunc (ce.end_date) = trunc (next_day (sysdate, 'SAT') - 1) and c.frequency = 'W') or ( ce.start_date <= sysdate and trunc (ce.end_date) = trunc (add_months (sysdate, 1), 'MON') - 1 and c.frequency = 'M'))
Hmmm, another thorny set of OR conditions. No wonder the optimizer is struggling! Is there anything we can do to simplify or remove these?
Let's tackle clause the approval_needed clause first.
Currently there are no indexes including these columns. We could create an index on (approval_needed, approved_by_user_id). It's then possible Oracle could do an OR expansion using this index. The optimizer isn't guaranteed to select such an expansion however. Even if it does, we'll be accessing the table or its index multiple times. This is likely to be less efficient than a single AND condition (with supporting index).
Can we rewrite this OR clause as a single condition?
Let's review what it's actually doing. In English, we're saying:
If the event requires approval (approval_needed = 'Y'), then ensure that someone has done so (approved_by_user_id is not null). Otherwise we don't care whether approved_by_user_id has a value or not.
This gives us a clue to how we can rewrite this. Firstly, we can check if approval_needed equals 'N'. If it does, then map to any non-null value. Otherwise use the value for approved_by_user_id.
How can we do this?
Using decode() (or case), like so:
and decode(ce.approval_needed, 'N', 'X', 'Y', ce.approved_by_user_id) is not null
We can replace the previous OR clause in our query with the above and create a (function based) index including this expression. The optimizer can then use this index without having to use OR expansions!
Applying just this condition to qdb_comp_events returns all except a handful of rows however. Therefore an index that only includes this expression is unlikely to be useful. Let's keep looking at the query to see if there are any other improvements we can make.
The second set of OR expressions are complex, referencing the start_date and end_date columns alternately along with other tables. This means there's no easy way to rewrite it to enable index use.
We can still make some changes here to improve the indexability of the query however. Look carefully at the subexpressions:
and ( ce.end_date <= sysdate or ( qt.text = 'Free-Form Text' and ce.start_date <= sysdate) or ( trunc (ce.end_date) = trunc (sysdate) and c.frequency = 'D') or ( ce.start_date <= sysdate and trunc (ce.end_date) = trunc (next_day (sysdate, 'SAT') - 1) and c.frequency = 'W') or ( ce.start_date <= sysdate and trunc (ce.end_date) = trunc (add_months (sysdate, 1), 'MON') - 1 and c.frequency = 'M'))
Most of these check whether start_date <= sysdate. The two that don't validate that end_date <= sysdate and trunc(end_date) = trunc(sysdate). Checking the data we can see that, as we'd expect, start_date <= end_date for all rows. Therefore we can infer that everything returned by the query will have a start_date <= sysdate. This enables us to add this predicate outside the OR branches. Doing so permits Oracle to use an index including start_date without fancy OR expansions or other transformations.
Finally, hiding quizzes is deprecated functionality. All the rows have the value 'N' for the hide_quizzes column. Therefore we can remove the hide_quizzes = 'N' clause without affecting the results.
Putting this all together we can rewrite where clause above as:
and ce.competition_id = c.competition_id and ce.comp_event_id = qz.comp_event_id and decode(ce.approval_needed, 'N', 'X', 'Y', ce.approved_by_user_id) is not null and ce.start_date <= sysdate and ( <big or branch trimmed for brevity> );
Having simplified the query, it's time to re-evaluate the indexes to see if we can create one that better suits our query. We now have checks against the following columns or expressions on qdb_comp_events:
- decode(ce.approval_needed, 'N', 'X', 'Y', ce.approved_by_user_id)
Comp_event_id is the primary key for the table. Therefore including this in the index is unlikely to give much benefit. Of the three remaining three conditions, competition_id is the only one with an equality (=) check against it, so this should be the leading column in the index. This leaves us with two candidate indexes:
- competition_id, decode(approval_needed, 'N', 'X', 'Y', approved_by_user_id), start_date
- competition_id, start_date, decode(approval_needed, 'N', 'X', 'Y', approved_by_user_id)
Let's try both out and see what impact they have on performance. First up, the index with start_date listed last:
Hmm, we're still accessing qdb_comp_events in the same way. There are some other (unshown) differences in the execution plan. These have only led to a slight drop in the work the query does though.
Let's try the second ordering:
Aha! Now we're getting somewhere!
The explain plan looks completely different. Gone are those crazy index fast full scans. Oracle's using the index we've just created, qdb_coev_comp_start_apprv_i, to return just 1,744 rows from the comp events table instead of over twenty thousand. Additionally the total gets performed by the query is down from ninety thousand to just under six thousand, an order of magnitude less work!
What have we learned here?
When you have or conditions in your query it's worth taking a moment to ask yourself the following questions:
- Can you move any of the checks outside of the OR into the main body of the query?
- Can the OR be rewritten using a function such as case or decode?
Moving checks outside the OR is particularly beneficial when the branches check different columns on a table. If you're can find a condition to apply against a single column outside of the OR expressions it increases the chances that Oracle will use an index (against that column). To do this you need good knowledge of the data and logical deduction skills or you may find your updated query is wrong!
Changing an OR condition into a function requires similar logical reasoning skills. Additionally you'll have to create an index including an exact copy of the new expression for the optimizer to use an index access path. In some cases you may need to make a trade off. Changing your query and creating a targeted, more efficient function-based index may be the best solution for your query. Sticking with the original OR can be better however if a plain index including the columns involved will be usable by many other queries in your system.
We've made great improvements to the query so far. There's still more to come however, so stay tuned for more posts in this series!