how to identify "think time" in a SQL trace ?

There are multiple blogs posts, presentations and documentation about "think time" and "service time" and some tools that would translate that for you 
but if you do not have those tools available or you need to understand a pattern of behavior from the client app, how can you identify "think time" in a SQL trace ?

This case is of a customer which reported a performance problem from its JDBC connections running reports that are constantly timing out and needed to identify where in the stack the problem is. 

From database point of view we got a 10046 level 12 trace (SQL Trace with Wait events and Bind variables)


=====================
PARSING IN CURSOR #4... len=50 dep=0 uid=888 oct=3 lid=888 tim=1365524355043575 
SELECT STATUS FROM <table> WHERE ID = :1
END OF STMT
PARSE #4...:c=1000,e=1126,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1365524355043574
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=2999,e=3049,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=3860417426,tim=1365524355046681
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355046730
WAIT #4...: nam='SQL*Net message from client' ela= 274 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047023
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047055
WAIT #4...: nam='SQL*Net message from client' ela= 259 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047330
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047385
FETCH #4...:c=0,e=41,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524355047402

*** 2013-04-09 12:19:17.047
WAIT #4...: nam='SQL*Net message from client' ela= 2000161 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047634
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047707
WAIT #4...: nam='SQL*Net message from client' ela= 258 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047973
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524357048051
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357048086
FETCH #4...:c=0,e=26,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524357048098

*** 2013-04-09 12:19:19.054
WAIT #4...: nam='SQL*Net message from client' ela= 2006775 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359054902
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055015
WAIT #4...: nam='SQL*Net message from client' ela= 294 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055324
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524359055435
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055482
FETCH #4...:c=0,e=40,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524359055503

*** 2013-04-09 12:19:21.055
WAIT #4...: nam='SQL*Net message from client' ela= 2000065 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055603
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055688
WAIT #4...: nam='SQL*Net message from client' ela= 267 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055967
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
.....
.....
Pattern repeats 30 times.
.....
.....
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415383807
FETCH #4...:c=0,e=45,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524415383827
WAIT #4...: nam='SQL*Net message from client' ela= 282 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384145
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384170
WAIT #4...: nam='SQL*Net message from client' ela= 317 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384497


The trace shows a query that is looking for the STATUS of a specific ID over an over again.
In between each roundtrip there is a delay of 2 seconds (Notice the ela= 2000000+ in 'SQL*Net message from client')

I asked the customer if he considered normal for the session to execute the same query with same binds over and over again.
The answer was no. And there was neither an explanation about why there is a 2 seconds delay between calls.

From the point of view of the database, the query was parsed, bound , executed and rows fetched in microseconds.
After that, the Oracle Process sits idle waiting for the client software to issue the next statement and that came after 2 seconds.

That is the application "think time" and it is reflected by the 'SQL*Net message from client' in the SQL trace.

After some digging, the customer recognized that this was a session that comes from a JDBC program that was
in a loop keeping taps on the execution of a report from another session and it was doing it by looking 
at the STATUS of the report over and over after sleeping for 2 seconds.

The objective of my post is to show how "think time" looks like but just to complete the root cause,

Other traces show the reports were finishing in a timely manner but these "tracking" sessions were on the same connection pool as the reports.
The connection pooling was not recognizing that these "tracking" sessions were "idle" for those 2 seconds so the connection was being considered active and not yielding it for other incoming report so effectively half the slots were used by useful work and half by tracking sessions.

To make things worst, for some reason the "tracking" were not going away after the reports finished and new ones were being created as new reports came in so eventually the connection pool got exhausted and no work was being done.



Comments:

I think that that most important instrumentation that an application can have is to set module, action, and client-id to uniquely identify each user experience. ...In essence, like this, for each distinct task (business function) in your application:

1. Set module, action, client-id to something identifiable. For example, identify the name of the business function with module="GL" and action="Posting", and distinguish this execution experience from all other execution experiences with client-id set to a concatenation of the userid, IP address, and a GUID.

2. Execute the task (that is, execute the code that implements the business function).

3. Set module, action, client-id to "" (empty).

The effect of this is that every block of trace data gets labeled with "***" lines that will allow you to associate each group of database calls (PARSE, EXEC, FETCH, CLOSE, ...) and OS calls (WAIT) with a distinct end-user response time experience. In your analysis, you can discard the groups of calls with module+action+clientid="", because these are executed between user experiences, which nobody is experiencing as response time (your think time). However, you MUST pay attention to all the calls in a group with module+action+clientid≠"", even if they're "idle" events like "SQL*Net message from client," because they are part of someone's response time.

With just this little bit of instrumentation, (1) you can identify exactly which calls are affecting people and which are not; and (2) you can <shameless-plug>use tools like our Method R Tools software package ( http://method-r.com/software/mrtools ) to arrange calls into groups that exactly match people's response time experiences.</shameless-plug>.

Posted by Cary Millsap on April 11, 2013 at 05:12 PM EDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

I am Abel Macias email, an Oracle support engineer that specialized in Performance that belongs to Exadata Support.
Disclaimer This blog looks to broadcast my experiences while diagnosing performance problems on Oracle's RDBMS, mostly on Exadata.
The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today