Monday Apr 29, 2013

Parse and Parse again.

Since the first release of JDBC and event longer ago with Pro* Precompilers to this day I still see the following issue :

Please notice below in this abstract of a 30 min AWR the number of Parse Calls vs Executions, the SQL Module and in the case of the first SQL, the SQL Text too. 

Parse Calls Executions % Total Parses SQL Id SQL Module SQL Text
339,567 339,566 15.93 0mr7azgm9psws JDBC Thin Client SELECT COUNT(*) FROM DUAL
101,574 101,880 4.77 5uta56d6syjv3 JDBC Thin Client select TRIM(emp_fname || ' ' |...
46,800 46,799 2.20 6vw7kzhpb1dm3 JDBC Thin Client select "STATUS_ID" as "status...
35,509 35,509 1.67 648yjr5s1w7uc JDBC Thin Client SELECT "PROTOCOL_GROUP_ID" as...
31,292 32,540 1.47 0jvyqkxaytndr bvlab.exe select crf.CRF_LABEL from COMP...

Since the introduction of the shared pool in Oracle7 it was available the capability for clients to have a SQL Statement Cache in order to do the famous "Parse once, Execute Many" best practice. This feature is controlled on the RDBMS by the SESSION_CACHED_CURSORS parameter and this makes many people believe that the feature is managed at the RDBMS side alone but this far from the truth.

When this feature is not used then the client software parses the statement for each execution does not matter if the parameter is set in the database.
Many take this feature for granted because SQL*Plus uses it, so they assume it is true for other client software but it is a feature that has to be explicitly managed by the client software and works in conjunction with the RDBMS when the parameter is set.

In JDBC it is necessary to explicitly enable it and has an implicit setImplicitCachingEnabled and an explicit setExplicitCachingEnabled way to manage the cache

It is a little easier with the precompilers by setting are precompile time the HOLD_CURSOR and RELEASE_CURSOR parameters.

Take a moment to look at your own AWR or Statspack and verify your application is using the Client Statement Cache as intended.

Now the cherry of the sundae is the SQL "SELECT COUNT(*) FROM DUAL".

Over the years I've seen variations of queries over DUAL coming from JDBC in large quantities. I have been given the explanation from Customers and Developers that these queries are used as a heartbeat, to confirm the connection to the database is stil open.

I can understand that was necessary when jdbc and connection pooling was a young technology but isn't it anything better nowadays that will do the same thing without parsing and parsing the same useless statement without end ?

I would like to hear from people about what other techniques/features are used that can avoid the need to parse a bogus SQL to the database. 

swing out sister - windmills of your mind

Tuesday Apr 16, 2013

Disabled Fixes ... Hidden gems of the RDBMS

When making changes to the RDBMS,the developers of Oracle are always wondering if the new change will introduce a regression.
Every change is checked against a suite of regressions tests to try to trap if the change introduced a regression and fix it before it is released to our clients.
Just in case that change does introduce a regression the developers add parameters and events to disable them. These are the workarounds used when a defect is encounter.
But sometimes the developers are overly cautious and decide that a change may be better to be introduced in a more controlled manner and decide to leave it disabled and only very few people would know how to enable them.
Another reason is that these changes are introduced to help one specific customer and there is no eviedence to support it might help or hurt someone else.

These disabled changes are still tested for regressions anyway.

With the introduction of FIX CONTROL in 10.2.0.2 there is now discover-able information about these disabled fixes.

SQL> desc v$session_fix_control
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SESSION_ID                                         NUMBER
 BUGNO                                              NUMBER
 VALUE                                              NUMBER
 SQL_FEATURE                                        VARCHAR2(64)
 DESCRIPTION                                        VARCHAR2(64)
 OPTIMIZER_FEATURE_ENABLE                           VARCHAR2(25)
 EVENT                                              NUMBER
 IS_DEFAULT                                         NUMBER

SQL> select value,count(*) from v$system_fix_control group by value order by value;
     VALUE   COUNT(*)  -- 11.2.0.3
---------- ----------
         0         47  <-- Disabled
         1        600  <-- Enabled
         3          4  ] 
         7          1  ]__ Configurable
        15          1  ]
        20          1  ]

So in 11.2.0.3 there are 47 fixes that dormant, waiting for an application to take advantage of them.

Lets see an example :

Bug 6708183 - CBO cannot get good cardinality estimate for TABLE() functions [ID 6708183.8]
Introduced in 11.1.0.7 and 11.2.0.1.

This fix allows for a limited version of dynamic sampling to look inside a Collection Table in an effort to get a better cardinality estimation.
By default the CBO makes a guess that the collection has as many rows of 100 bytes that can fit in 100 block and that usually translates to 8168 rows.

There are exception when this fix is enabled where still cardinality is not better estimated and are corrected in posterior fixes or are still to be corrected but all of them them depend on fix 6708183 to be enabled first.

The fix is enabled by "_fix_control" = '6708183:ON' or by adding a DYNAMIC_SAMPLING hint in the query.

If you wish to know more about one particular disabled (or enabled) fix, try its bugtag first.
if it does not have a bugtag let me know and I'll see if one can be made.

Friday Apr 12, 2013

The Best Kept Secret in Support That is Not Suppose to be a Secret

BugTags 

What are Bugtags ?

The bug database keeps dialogue and fix information for suspected product defects for various Oracle products. 
Some bugs are true defects and get fixed in some code line, sometimes with detailed information about the defect and sometimes with very little comment.
Some times they are enhancements to features or new features altogehter, also with or without details.

A bugtag is a separate entity which can be created for any fixed defect or enhancement and includes summary information and categorizations of the defect
or enhacement.
A bugtag is an optional extension to a fixed defect or added enhancement.

See : Introduction to "Bug Description" Articles [ID 245840.1]) for more details.

What products BugTags cover ?

Currently a bugtags are only created for server products, and mostly only the RDBMS.

What information is in a BugTag ?

A bugtag include attributes such as:
  • Bug number that the bugtag describes
  • A summary title (intended to be a customer friendly bug title)
  • A short description of the bug issue (a few paragraphs at most)
  • A range of versions believed to be affected by the bug
  • A list of specific versions known to be affected by the bug
  • A list of specific versions where the bug is fixed (eg: Patch set versions / Patch set update versions etc. where the fix is included)
  • Some categorization of the bug (to give an indication of product area / feature affected, type/s of symptom seen etc..)
  • Hooks to indicate any key errors, parameters etc  which may be relevant
  • Optional link to a more specific document about the bug
  • Some indication of if an issue is particularly notable or not
  • For port specific issues which platforms are affected
  • A simple testcase,if available, to reproduce the issue as example of what to look for.

Where Can I find the BugTags ?

Bugtag data is propogated to customers via MOS Notes in several ways:

  • Each bugtag is represented by a single MOS Note (document id = bugno.8) 
Eg (Defect 16475397.8  / Enh 10248538.8)
  • Certain reference notes include automatically maintained lists of bugs derived from bugtags inlined into them. 
Eg: ORA-600/7445 reference notes include a "Known Bugs" section which is automatically populated using the bugtag data for bugtags marked with a HOOK for that ORA-600, etc..
ORA-600 [kclchkblkdma_3] [ID 468202.1]
  • Certain notes have sections created from reports against the bugtag data.
Eg: Init.ora Parameter "CURSOR_SHARING" Reference Note [ID 94036.1]
Eg: 11.2.0.3 Patch Set Updates - List of Fixes in each PSU [ID 1449750.1]

Who writes the BugTags ?

Bugtags are created and maintained internally by Oracle Support.
Among them, Me (Abel Macias). I specially like writing the ones for CBO and performance related bugs and fixes included in the Exadata Bundle Patches.

What is the lifecycle of BugTag ?

When a defect is fixed then it becomes eligible for making a bugtag.
Support Analysts will pickup the task to make the bugtag note and posted in MOS.
In the following events the note becomes elibile for an update (some updates are automatic, some are manual):
  • Someone adds a comment to the Note.
  • A new base version includes the fix in its code.
  Lets assume the fix for bug 1 is scheduled to be included in a future version like 12.1.0.1. While the future version is not officially released the note will indicate 12g (future release)
  •   When 12.1.0.1 is released and the fix is confirmed to be included in it then the note is updated indicating it is fixed in 12.1.0.1.
  • A regression was introduced by this fix or has been superseded with another fix.
  • A second defect filed is closed as duplicate of the defect and has new symptoms not observed before.
  Like a defect may address one particular ora-600 but later a different ora-600 is identified that is fixed by the same fix.
  Eg: Bug 10408668 - Assorted Dumps with Queries with distinct aggregates having subqueries as arguments [ID 10408668.8]

Thursday Apr 11, 2013

how to identify "think time" in a SQL trace ?

There are multiple blogs posts, presentations and documentation about "think time" and "service time" and some tools that would translate that for you 
but if you do not have those tools available or you need to understand a pattern of behavior from the client app, how can you identify "think time" in a SQL trace ?

This case is of a customer which reported a performance problem from its JDBC connections running reports that are constantly timing out and needed to identify where in the stack the problem is. 

From database point of view we got a 10046 level 12 trace (SQL Trace with Wait events and Bind variables)


=====================
PARSING IN CURSOR #4... len=50 dep=0 uid=888 oct=3 lid=888 tim=1365524355043575 
SELECT STATUS FROM <table> WHERE ID = :1
END OF STMT
PARSE #4...:c=1000,e=1126,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1365524355043574
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=2999,e=3049,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=3860417426,tim=1365524355046681
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355046730
WAIT #4...: nam='SQL*Net message from client' ela= 274 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047023
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047055
WAIT #4...: nam='SQL*Net message from client' ela= 259 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047330
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047385
FETCH #4...:c=0,e=41,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524355047402

*** 2013-04-09 12:19:17.047
WAIT #4...: nam='SQL*Net message from client' ela= 2000161 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047634
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047707
WAIT #4...: nam='SQL*Net message from client' ela= 258 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047973
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524357048051
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357048086
FETCH #4...:c=0,e=26,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524357048098

*** 2013-04-09 12:19:19.054
WAIT #4...: nam='SQL*Net message from client' ela= 2006775 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359054902
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055015
WAIT #4...: nam='SQL*Net message from client' ela= 294 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055324
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524359055435
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055482
FETCH #4...:c=0,e=40,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524359055503

*** 2013-04-09 12:19:21.055
WAIT #4...: nam='SQL*Net message from client' ela= 2000065 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055603
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055688
WAIT #4...: nam='SQL*Net message from client' ela= 267 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055967
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
.....
.....
Pattern repeats 30 times.
.....
.....
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415383807
FETCH #4...:c=0,e=45,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524415383827
WAIT #4...: nam='SQL*Net message from client' ela= 282 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384145
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384170
WAIT #4...: nam='SQL*Net message from client' ela= 317 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384497


The trace shows a query that is looking for the STATUS of a specific ID over an over again.
In between each roundtrip there is a delay of 2 seconds (Notice the ela= 2000000+ in 'SQL*Net message from client')

I asked the customer if he considered normal for the session to execute the same query with same binds over and over again.
The answer was no. And there was neither an explanation about why there is a 2 seconds delay between calls.

From the point of view of the database, the query was parsed, bound , executed and rows fetched in microseconds.
After that, the Oracle Process sits idle waiting for the client software to issue the next statement and that came after 2 seconds.

That is the application "think time" and it is reflected by the 'SQL*Net message from client' in the SQL trace.

After some digging, the customer recognized that this was a session that comes from a JDBC program that was
in a loop keeping taps on the execution of a report from another session and it was doing it by looking 
at the STATUS of the report over and over after sleeping for 2 seconds.

The objective of my post is to show how "think time" looks like but just to complete the root cause,

Other traces show the reports were finishing in a timely manner but these "tracking" sessions were on the same connection pool as the reports.
The connection pooling was not recognizing that these "tracking" sessions were "idle" for those 2 seconds so the connection was being considered active and not yielding it for other incoming report so effectively half the slots were used by useful work and half by tracking sessions.

To make things worst, for some reason the "tracking" were not going away after the reports finished and new ones were being created as new reports came in so eventually the connection pool got exhausted and no work was being done.



About

I am Abel Macias email, an Oracle support engineer that specialized in Performance that belongs to Exadata Support.
Disclaimer This blog looks to broadcast my experiences while diagnosing performance problems on Oracle's RDBMS, mostly on Exadata.
The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Search

Archives
April 2013 »
SunMonTueWedThuFriSat
 
1
2
3
4
5
6
7
8
9
10
13
14
15
17
18
19
20
21
22
23
24
25
26
27
28
30
    
       
Today