Yet Another Case Study: The over-commit trap!
By gaurav.verma on Mar 01, 2008
IntroductionIts always rewarding to talk about lesser known performance tuning scenarios, which are not all that well appreciated in the DBA or developer world. Ask anyone as to what performance tuning has one worked upon and quite likely, you may hear - Ah well, I made sure the stats were gathered latest, the right index was being used or I used a hint to use the right index. How often do we look at other aspects like the frequencey of commits?
An atypical performance problem scenarioAt a particular client site, a custom concurrent progam had a history of taking more than 8-10 hrs to complete.
The Applicaton load nature was a mixture of high OLTP during the regular business day hours and more than moderate batch processing during the evening to early morning hours. Needless to say, a lot of long running batch jobs were contending for shared resources like Rollback & temp segments.
A lot of diagnosis was done as to why the custom concurrent program would take so long. Quite often, the program would terminate with the notorious ORA-1555 (snapshot too old error), even though the data volume involved was quite less (7000 records) and abundant rollback segment space -- 120g.
Even setting a high rollback segment retention initialization parameter undo_retention as 8 hours did not help out in ironing out the performance issue.
Examining the possibilities...There could have been several reasons why the performance lag was manifesting:
- The query had a bad plan and the compile time plan was different than runtime plan. This happens many times, but it didnt seem to be the case
- A hint for using unique index (from using index range scans) to avoid db sequential file read wait event seemed to make no difference
- The level of DML activity against tables within the process could be simply too much for the batch load to complete successfully in a suitable time window. This theory did not sound right since rollback retention period ~ 8 hours
- The program was not doing commit or rollback often enough to be stay within the rollback retention period?
Foxy observations..While sql tracing (event 10046) identified that a simple select on a custom table having not so much data was having a hard time, the corresponding SQL plan was using a unique index (!!). What more could one ask for? The situation was almost wierd.
At the same time, pl/sql profiling (dbms_profiler) identified that FETCH phase of that very simple select was taking long. The HTML output generated from plsql profiler runs using the profiler.sql script of Metalink Note 243755.1 was especially helpful in getting to this stage of analysis to determine where was the maximum delay in the entire process.
Getting to the bottom of itAgain, one more thing identified, which could be missed easily, in the tkprof'ed output of raw sql trace was that the CR blocks read for the fetch phase of the cursor were high. The sql engine had to read a huge number of block reads for forming a read consistent image for the FETCH phase. This aspect is not readily discernable since the regular thought process is to look at elapsed time and SQL plan.
What could the reason for such a high number of block reads for forming a read consistent image ? Too few commits/rollbacks or too much commits/rollbacks? Overdoing anything is usually not a rewarding experience.
Looking closer into the code...On re-examining the PL/SQL profiler HTML output, it was seen that the program logic was executing a commit statement within a loop for every input record processed.
This would obviously overtax the rollback segments and causing the user server process to record a read consistent image of the corresponding transaction data at every commit.
Taking cue from this observation, when commits were done in a batch fashion at some threshold, the entire load ran in 4 hours (Input volume: 7000 lines).
Neither is it good to do commits less frequently, nor is it good to over-commit. Moderation is the key for a happy medium and at times, it can have bearings on performance problems as well.
Who said software cannot learn from Philosophy?