Reducing KGL latch contention while querying v$ internal views in 9.2.x RDBMS

Preface


Latch contention can be the hidden killer that slows down your system in a very subtle way: creating a new database connection can take far longer than usual (up to 2 minutes) when there is moderate to extreme contention for the various KGL latches. While no amount of tracing will uncover it for you, this is one area where statspack analysis can help.

In one such customer production instance, there were inexplicable occasions when the instance would suddenly slow down. During these times, it was noticed that spawning a new SQL*Plus connection would take a long time (sometimes up to 1-2 minutes) as compared to the usual 2-3 seconds or less. How is one even supposed to handle such kinds of "freezes"?

Could we tell when this DB freeze was happening?

A very simple Unix shell script was written to check the time taken to log in to SQL*Plus:

check_response()
{
#
# get the latch free waits (the query itself is elided here); this
# local sysdba connection also initializes the output file
#
sqlplus -s "/ as sysdba" > ${script}/apps/tempmsg.out << EOF
exit
EOF

#
# time a login through the listener and capture the elapsed seconds
# so that notify() can compare it against the threshold
#
start_time=`date +%s`
sqlplus -s system/${PASSWD}@${ORACLE_SID} > /dev/null << EOF
exit
EOF
end_time=`date +%s`
time_taken_in_seconds=`expr $end_time - $start_time`

echo "\n\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n\n" >> ${script}/apps/tempmsg.out
}

notify()
{
##################################
# the threshold is so many seconds
##################################
threshold=10

echo "--------------------------------------------------------------" >> ${script}/apps/tempmsg.out
echo "PLEASE KEEP THIS EMAIL FOR FUTURE REFERENCE....\n\n" >> ${script}/apps/tempmsg.out
echo "This is an attempt to de-mystify and automatically record the occurrence of so-called system lockups/freezes.\n\n" >> ${script}/apps/tempmsg.out
echo "THINGS YOU MIGHT HEAR ABOUT SOON:" >> ${script}/apps/tempmsg.out
echo "++++++++++++++++++++++++++++++++" >> ${script}/apps/tempmsg.out
echo " 1) The customer service reps complain of freezing or hanging or locking up or blue screens." >> ${script}/apps/tempmsg.out
echo " 2) A simple DB login for a new session might take ~ 1 minute as opposed to being instant.\n\n" >> ${script}/apps/tempmsg.out
echo "--------------------------------------------------------------" >> ${script}/apps/tempmsg.out

if [ $time_taken_in_seconds -gt $threshold ]
then
  ###################################################################
  # send out a message to all the people who want to be notified
  # (DBA_LIST is a placeholder for your distribution list)
  ###################################################################
  mailx -s "Slow DB login detected on ${ORACLE_SID}" ${DBA_LIST} < ${script}/apps/tempmsg.out
fi
}

The Statspack story

The statspack analysis told a very interesting story. Latch contention and misses were quite high during these "freezing" times, and the largest number of sleeps and waiter sleeps were attributed to the caller kglic.

Now, kglic is the code which iterates through the library cache and row cache to answer queries on various dictionary fixed views and tables. It is the function which returns data for the fixed views and tables that scan the SQL area.
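
For illustration, one way to see whether kglic dominates the latch-miss picture outside of the statspack report is to query v$latch_misses directly (a minimal sketch, assuming the 9.2 column names; these are the same numbers that feed the "Latch Miss Sources" section of statspack):

select parent_name, location, sleep_count, wtr_slp_count
  from v$latch_misses
 where location like 'kglic%'
 order by sleep_count desc;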

Therefore, it was highly possible that such queries were also coming from monitoring tools used by DBAs; they are not restricted to the two views specifically mentioned in the bug by Joan. Any monitoring job which looks at v$open_cursor would also use the kglic iterator.
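
For example, even an innocuous-looking monitoring query like the one below walks the library cache under the KGL latches (the APPS schema is just a placeholder):

select sid, user_name, sql_text
  from v$open_cursor
 where user_name = 'APPS';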

One of the contributors would be dbms_pipe. A quick test using dbms_pipe was done: one session loops putting messages into a pipe while a separate session reads messages off the same pipe (a sketch of this test follows below).

The loop was run 100,000 times. It was found that library cache latch gets in 9i were roughly double those in 10g (approx. 400,000 in 10g vs. 800,000+ in 9i). If anything, dbms_pipe performance seems to be more optimized in 10g.
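
A minimal sketch of that test, assuming an arbitrary pipe name of 'latch_test' (library cache latch gets can be compared by querying v$latch before and after the run):

-- session 1: pump 100,000 messages into the pipe
begin
  for i in 1 .. 100000 loop
    dbms_pipe.pack_message('message ' || i);
    if dbms_pipe.send_message('latch_test') != 0 then
      raise_application_error(-20001, 'send_message failed');
    end if;
  end loop;
end;
/

-- session 2: drain the same pipe
declare
  msg varchar2(30);
begin
  for i in 1 .. 100000 loop
    if dbms_pipe.receive_message('latch_test') = 0 then
      dbms_pipe.unpack_message(msg);
    end if;
  end loop;
end;
/

-- before and after the run, in either session:
select gets, misses, sleeps from v$latch where name = 'library cache';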

Potential Workarounds

One could possibly reduce the frequency of queries against the v$sql views, x$kglob and the like to see if that reduced the contention. While this was possible, the SQL monitoring scripts were certainly not the only contributors to this kind of latch contention.

Resolving it through RDBMS code fix

After a lot of research, the following 9.2.0.7 one-off fixes seemed to address this dependency graph issue:

1) Patch 4451759   [ Base Bug(s): 4451759 ]

   This bug is fixed in 9.2.0.8 as per bug 4635723.

2) Patch 5094515   [ Base Bug(s): 4450964 5094515 4339128 ]

   Bug 4450964 is fixed in 9.2.0.8 as per bug 4509067.
   Bug 4339128 is fixed in 9.2.0.8 as per bug 4482601.

These latch contention issues are also fixed in the 9.2.0.8 patchset. Granted, 9.2.0.7 has been desupported by Oracle Support and is now supported only on a limited basis; at the same time, I am aware that there are still a lot of people on 9.2.0.7, and this article might be of interest to them.

Seeing the advantage after the RDBMS patches

To simulate this freeze issue, a bunch of SQL scripts monitoring the x$kglob, v$open_cursor and v$sql views were run continuously for about 2 hours across 20 threads simultaneously (a sketch of the load driver follows below). This did reproduce the hang situation for the database login. The same regression was run after the patches, and statspack data (level 5) was collected.
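
A minimal sketch of such a load driver, assuming a hypothetical mon_kgl.sql that queries x$kglob, v$open_cursor and v$sql:

#!/bin/sh
# level-5 statspack snapshot before the run
sqlplus -s perfstat/${PERFSTAT_PASSWD} << EOF
execute statspack.snap(i_snap_level => 5);
EOF

# 20 concurrent monitoring loops for ~2 hours (7200 seconds)
i=1
while [ $i -le 20 ]
do
  ( end=`expr \`date +%s\` + 7200`
    while [ `date +%s` -lt $end ]
    do
      sqlplus -s "/ as sysdba" @mon_kgl.sql > /dev/null
    done ) &
  i=`expr $i + 1`
done
wait

# and another snapshot after it
sqlplus -s perfstat/${PERFSTAT_PASSWD} << EOF
execute statspack.snap(i_snap_level => 5);
EOF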

Statspack top events and latch wait/contention information before and after the RDBMS patches:

1) Comparison of the latch-miss sources before and after the patches:

   [Figure: latch misses sources for DB - before and after.GIF]

2) Comparison of the latch sleeps before and after the patches:

   [Figure: latch sleep breakdown for DB - before and after.GIF]

Conclusion

The moral of the story is that over-querying the v$ and x$ views can sometimes be detrimental and cause login slowdowns which may be hard to even debug or analyze. More analysis is always possible through hang analyze commands, but that mainly serves as a confirmation point for checking the depth of the session dependency graphs and whether there is a genuine deadlock/race condition. Statspack analysis and simple SQL scripts can also be used as complementary tools for checking whether the latching condition has improved.
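
For reference, a hang analysis can be captured from a surviving sysdba session roughly as follows (level 3 is a common starting level; the trace file is written to user_dump_dest):

SQL> oradebug setmypid
SQL> oradebug hanganalyze 3
SQL> -- wait a minute or two while the hang persists, then take a second pass
SQL> oradebug hanganalyze 3
SQL> oradebug tracefile_name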

The 9.2.0.8 patchset delivers important latch contention fixes and hence should be applied as soon as possible.

