
An Oracle blog about Oracle Enterprise Manager and Oracle Management Cloud

Quickly Diagnose the Root Cause of Stuck Threads using Oracle Enterprise Manager 12c JVM Diagnostics


One of the hidden gems in Oracle Enterprise Manager 12c is JVM Diagnostics (JVMD). If you purchased the WebLogic Management Pack license, you already own it. JVMD allows administrators to diagnose performance problems in production Java applications. By eliminating the need to reproduce these “production only” problems in QA, it reduces the time required to resolve them. It does not require complex instrumentation or an application restart to get in-depth application details. Application administrators can identify the Java problems or database issues that are causing application downtime without detailed knowledge of the application internals. JVMD is also very well suited to diagnosing issues with “Stuck Threads”, which is the focus of this blog.

What is a [STUCK] Thread?

In a WebLogic server, all incoming requests are handled by a thread pool that is controlled by a work manager. Worker threads that are taken out of the pool and not returned within a specified time period are marked as [STUCK] by the work manager. This time period is 600 seconds (10 minutes) by default, but it is configurable on a per work manager basis using the "StuckThreadMaxTime" parameter.


Note that some of your threads may be doing legitimate work for over 10 minutes with no issues. If you have such threads, consider placing them in another work manager with an appropriate setting for the "StuckThreadMaxTime" parameter.
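
The marking rule described above can be sketched in a few lines of Java. This is only an illustration of the idea, not WebLogic's implementation: the class StuckThreadTracker and its methods are hypothetical names, and the simplified thread names are made up for the demo. It records when each worker thread was taken out of the pool and reports those that have been busy longer than the configured limit.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of stuck-thread bookkeeping; not WebLogic's actual code.
class StuckThreadTracker {
    private final Duration stuckThreadMaxTime;
    // Worker thread name -> instant at which it left the pool to serve a request.
    private final Map<String, Instant> busySince = new ConcurrentHashMap<>();

    StuckThreadTracker(Duration stuckThreadMaxTime) {
        this.stuckThreadMaxTime = stuckThreadMaxTime;
    }

    void requestStarted(String threadName, Instant now) {
        busySince.put(threadName, now);
    }

    void requestFinished(String threadName) {
        busySince.remove(threadName);
    }

    // Sorted names of threads that have been busy longer than the limit,
    // each prefixed with [STUCK].
    List<String> findStuck(Instant now) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : busySince.entrySet()) {
            Duration busyFor = Duration.between(entry.getValue(), now);
            if (busyFor.compareTo(stuckThreadMaxTime) > 0) {
                stuck.add("[STUCK] " + entry.getKey());
            }
        }
        Collections.sort(stuck);
        return stuck;
    }

    public static void main(String[] args) {
        // 600 seconds mirrors the default mentioned in the text.
        StuckThreadTracker tracker = new StuckThreadTracker(Duration.ofSeconds(600));
        Instant start = Instant.now();
        tracker.requestStarted("ExecuteThread-1", start);                  // long-running request
        tracker.requestStarted("ExecuteThread-2", start.plusSeconds(500)); // started much later
        // Pretend 601 seconds have passed since the first request began.
        System.out.println(tracker.findStuck(start.plusSeconds(601)));
        // prints "[[STUCK] ExecuteThread-1]"
    }
}
```

A thread that legitimately runs for 15 minutes would be flagged by this tracker with a 600 second limit, which is why such work belongs in a work manager with a larger limit.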

Why JVMD is Well Suited to Diagnosing [STUCK] Threads

Traditionally, developers use a stack trace generated by jstack or kill -3 to try to determine the cause of a stuck thread. However, in my experience, most of the time the code in this stack is not even the culprit. The problem often lies in another tier of the application or even in another thread of the same application. JVMD can provide additional context, such as the name of the request and which tier it called out to (e.g. RDBMS servers, LDAP servers, web servers, RMI servers). Using fine-grained thread states (e.g. DB, Network, IO, CPU, RMI, Lock) and the ability to see additional details about the thread, JVMD users can quickly pinpoint the root cause of the problem. Since JVMD is always on, it can also debug issues that happened in the past, and it can proactively notify you about stuck threads, e.g. by sending an email at 1am when stuck threads appear. Lastly, developers sometimes have no access to the target host because they lack the credentials needed to run command line tools, whereas JVMD is used from the Enterprise Manager console.


Sometimes a thread is marked as stuck but is doing legitimate work. In such scenarios JVMD lets you scan back and forth through a large number of samples to see what work the thread is processing. In addition, you can look at other threads that serviced the same request to see whether they behaved similarly. This allows you to quickly determine whether there really is a problem.
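
For comparison, here is a minimal sketch of the traditional thread-dump approach mentioned above, done from inside the JVM with the standard java.lang.management API rather than with jstack or kill -3. The class StuckThreadDump and the helper startSleeper are hypothetical names used only for this demo. Note that it can only inspect the JVM it runs in, and it yields a single point-in-time stack with none of the request or tier context that JVMD adds.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: an in-process equivalent of grepping a thread dump.
class StuckThreadDump {

    // Snapshot of every live thread in this JVM whose name contains the marker.
    static List<ThreadInfo> findByName(String marker) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<ThreadInfo> matches = new ArrayList<>();
        // false, false: skip lock details so the call works on any JVM.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadName().contains(marker)) {
                matches.add(info);
            }
        }
        return matches;
    }

    // Starts a daemon thread that just sleeps, standing in for a stuck worker.
    static Thread startSleeper(String name) {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Interrupted: let the thread end.
            }
        }, name);
        sleeper.setDaemon(true);
        sleeper.start();
        return sleeper;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread sleeper = startSleeper("[STUCK] ExecuteThread: '1'");
        Thread.sleep(200); // give the thread time to reach its sleep call
        for (ThreadInfo info : findByName("[STUCK]")) {
            System.out.println(info.getThreadName() + " state=" + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
        sleeper.interrupt();
    }
}
```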

Real-Time [STUCK] Thread Analysis

With JVMD there are two use cases for stuck thread analysis. If you are notified about a stuck thread in real time (via email, etc.), you can perform a real-time stuck thread analysis. Alternatively, if you are investigating a thread that was stuck in the past but is no longer present, you can perform a historical stuck thread analysis. In either case, the first thing to do is navigate to the JVM (or JVM pool) where the thread is stuck. We do this by clicking on Targets -> Middleware.

From here we can filter the list of targets by target type or by target name. Your most recent filter request will be remembered the next time you visit the page. Select the Target Type of JVM to see all of the JVM targets.

Pick the JVM for the WebLogic server that is having the stuck thread issues and click on it. This takes you to the target home page. Click the “Live Thread Analysis” button at the top. Type the word "stuck" into the thread name search box and click on the arrow to filter the table. You should now see all the stuck threads. In this case we can see a thread that is stuck in the “Network Wait” state. It is stuck on line 358, in the writeBuffer() method of OutputRecord.java, in package com.sun.net.ssl.internal.ssl. This makes it clear that the thread made an SSL call and the remote server has not responded in a reasonable amount of time, so the client thread is stuck.
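
The reasoning in this example, going from a stack frame to a wait state, can be illustrated with a small heuristic. This is a sketch under stated assumptions and not how JVMD derives its thread states: the class WaitStateGuess, its Kind enum and the package prefixes it checks are choices made for this demo, and a socket or SSL frame on the stack does not by itself prove that the thread is blocked.

```java
// Hypothetical heuristic: guess a coarse wait category from class names on the stack.
class WaitStateGuess {
    enum Kind { DB_WAIT, NETWORK_WAIT, OTHER }

    static Kind classify(StackTraceElement[] stack) {
        boolean sawNetworkFrame = false;
        for (StackTraceElement frame : stack) {
            String className = frame.getClassName();
            // A JDBC driver frame anywhere on the stack means the socket
            // activity above it belongs to a database call.
            if (className.startsWith("oracle.jdbc.") || className.startsWith("java.sql.")) {
                return Kind.DB_WAIT;
            }
            if (className.startsWith("java.net.") || className.contains(".ssl.")) {
                sawNetworkFrame = true;
            }
        }
        return sawNetworkFrame ? Kind.NETWORK_WAIT : Kind.OTHER;
    }

    public static void main(String[] args) {
        // A frame like the one in the example: writeBuffer() at OutputRecord.java line 358.
        StackTraceElement[] stack = {
            new StackTraceElement("com.sun.net.ssl.internal.ssl.OutputRecord",
                                  "writeBuffer", "OutputRecord.java", 358)
        };
        System.out.println(classify(stack)); // prints "NETWORK_WAIT"
    }
}
```

Checking for JDBC frames first matters because a database call usually ends in a socket read, so the same stack would otherwise look like a plain network wait.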

Here is another example of stuck threads, this time in the “DB Wait” state. Notice how the tool tip over the SQL ID field shows the SQL being executed. Click on it to view longer SQL statements. Also try clicking on the DB Wait link, which takes you directly to this specific database session in the Oracle Database Diagnostics section of EM for further analysis. The columns displayed are controlled by the “View” drop-down menu. Here we added the “User” column to show the logged-in user who executed the request.

Historical [STUCK] Thread Analysis

To start a historical stuck thread analysis, navigate to the JVM target home page in the same way as described in the real-time section. From the target home page, click the “JVM Performance Diagnostics” button at the top of the page. On the performance diagnostics page you can filter the data to make it more relevant to your task. The first filter to apply is time. If you know the exact time, use the “Edit Date and Time” button to specify it. Otherwise, use the handy shortcut links for preset ranges such as a day, an hour or 15 minutes.

The next thing to filter is the Thread Name. Expand the filter options region if necessary and set the Thread Name filter to “[STUCK]*” so that you only see threads whose name starts with [STUCK].
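
To make the effect of the “[STUCK]*” filter concrete, here is a minimal sketch of a prefix-style wildcard match. It assumes that `*` is the only wildcard and that every other character, including the square brackets, is matched literally. How JVMD interprets the pattern internally is not documented here, so treat this as an illustration. The class ThreadNameFilter is a hypothetical name and the sample thread names are illustrative.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns a "*" wildcard filter into a regular expression.
class ThreadNameFilter {

    static Pattern fromWildcard(String wildcard) {
        // Keep trailing empty pieces so a pattern ending in "*" still gets its ".*".
        String[] literals = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each "*" matches any run of characters
            }
            // Quote the literal text so "[STUCK]" is not read as a character class.
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean matches(String wildcard, String threadName) {
        return fromWildcard(wildcard).matcher(threadName).matches();
    }

    public static void main(String[] args) {
        String filter = "[STUCK]*";
        System.out.println(matches(filter,
            "[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'")); // prints "true"
        System.out.println(matches(filter,
            "[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'")); // prints "false"
    }
}
```

Quoting the literal part is the important design choice: passed straight to a regular expression engine, [STUCK] would mean "one of the letters S, T, U, C, K" rather than the literal prefix.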

Below the filter region, the “General” tab shows the Thread States, Top Requests, Top Methods, Top SQLs, Top DB Wait Events and Top Databases for the filtered data only, i.e. only for threads that are stuck. Try clicking on method names to see the call stack for the method. The charts are all interactive and fetch additional data about the item clicked.

If you want to find a specific thread, move from the “General” tab to the “Threads” tab. This is fine-grained data with each sample and state transition visible. You can click on any sample to view it in the sample analyzer, which should look familiar if you saw the threads in real time. Details about SQLs, wait states, etc. are all available here too, along with the complete call stack, which can also be exported to a CSV file.

In conclusion, we can see that JVMD provides a rich set of additional details, only a mouse click away, that help you diagnose the root causes of your stuck threads.

NOTE: Many of the screenshots shown here were taken using test and debug code that deliberately creates stuck threads. This does not and should not reflect on the nature of any Oracle products being shipped to customers.


