Thursday Apr 11, 2013

how to identify "think time" in a SQL trace ?

There are multiple blogs posts, presentations and documentation about "think time" and "service time" and some tools that would translate that for you 
but if you do not have those tools available or you need to understand a pattern of behavior from the client app, how can you identify "think time" in a SQL trace ?

This case is of a customer which reported a performance problem from its JDBC connections running reports that are constantly timing out and needed to identify where in the stack the problem is. 

From database point of view we got a 10046 level 12 trace (SQL Trace with Wait events and Bind variables)


=====================
PARSING IN CURSOR #4... len=50 dep=0 uid=888 oct=3 lid=888 tim=1365524355043575 
SELECT STATUS FROM <table> WHERE ID = :1
END OF STMT
PARSE #4...:c=1000,e=1126,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1365524355043574
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=2999,e=3049,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=3860417426,tim=1365524355046681
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355046730
WAIT #4...: nam='SQL*Net message from client' ela= 274 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047023
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047055
WAIT #4...: nam='SQL*Net message from client' ela= 259 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047330
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047385
FETCH #4...:c=0,e=41,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524355047402

*** 2013-04-09 12:19:17.047
WAIT #4...: nam='SQL*Net message from client' ela= 2000161 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047634
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047707
WAIT #4...: nam='SQL*Net message from client' ela= 258 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047973
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524357048051
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357048086
FETCH #4...:c=0,e=26,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524357048098

*** 2013-04-09 12:19:19.054
WAIT #4...: nam='SQL*Net message from client' ela= 2006775 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359054902
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055015
WAIT #4...: nam='SQL*Net message from client' ela= 294 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055324
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524359055435
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055482
FETCH #4...:c=0,e=40,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524359055503

*** 2013-04-09 12:19:21.055
WAIT #4...: nam='SQL*Net message from client' ela= 2000065 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055603
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055688
WAIT #4...: nam='SQL*Net message from client' ela= 267 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055967
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
.....
.....
Pattern repeats 30 times.
.....
.....
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415383807
FETCH #4...:c=0,e=45,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524415383827
WAIT #4...: nam='SQL*Net message from client' ela= 282 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384145
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384170
WAIT #4...: nam='SQL*Net message from client' ela= 317 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384497


The trace shows a query that is looking for the STATUS of a specific ID over an over again.
In between each roundtrip there is a delay of 2 seconds (Notice the ela= 2000000+ in 'SQL*Net message from client')

I asked the customer if he considered normal for the session to execute the same query with same binds over and over again.
The answer was no. And there was neither an explanation about why there is a 2 seconds delay between calls.

From the point of view of the database, the query was parsed, bound , executed and rows fetched in microseconds.
After that, the Oracle Process sits idle waiting for the client software to issue the next statement and that came after 2 seconds.

That is the application "think time" and it is reflected by the 'SQL*Net message from client' in the SQL trace.

After some digging, the customer recognized that this was a session that comes from a JDBC program that was
in a loop keeping taps on the execution of a report from another session and it was doing it by looking 
at the STATUS of the report over and over after sleeping for 2 seconds.

The objective of my post is to show how "think time" looks like but just to complete the root cause,

Other traces show the reports were finishing in a timely manner but these "tracking" sessions were on the same connection pool as the reports.
The connection pooling was not recognizing that these "tracking" sessions were "idle" for those 2 seconds so the connection was being considered active and not yielding it for other incoming report so effectively half the slots were used by useful work and half by tracking sessions.

To make things worst, for some reason the "tracking" were not going away after the reports finished and new ones were being created as new reports came in so eventually the connection pool got exhausted and no work was being done.



About

I am Abel Macias email, an Oracle support engineer that specialized in Performance that belongs to Exadata Support.
Disclaimer This blog looks to broadcast my experiences while diagnosing performance problems on Oracle's RDBMS, mostly on Exadata.
The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Search

Archives
April 2013 »
SunMonTueWedThuFriSat
 
1
2
3
4
5
6
7
8
9
10
13
14
15
17
18
19
20
21
22
23
24
25
26
27
28
30
    
       
Today