Capture JVM thread dump automatically

In one of my recent projects, I was faced with an issue in a WebLogic 10.3.5 domain running on JRockit, where the JVM was sporadically reporting a flood of threads being spawned. When I say flood it means exactly 16383 threads, always!

The log entries looked like these:

<Feb 3, 2014 11:28:25 AM EST> <Notice> <LoggingService> <BEA-320401> <The log file has been rotated to /oracle/logs/fchost_domain/host_server1/access.log00248. Log messages will continue to be logged in /oracle/logs/fchost_domain/host_server1/access.log.>
[WARN ] Thread table can't grow past 16383 threads.
[ERROR][thread ] Could not start thread Timer-167910. errorcode -1
[WARN ] Thread table can't grow past 16383 threads.
[ERROR][thread ] Could not start thread Thread-169389. errorcode -1
[WARN ] Thread table can't grow past 16383 threads.
[ERROR][thread ] Could not start thread process reaper. errorcode -1
[WARN ] Thread table can't grow past 16383 threads.
[ERROR][thread ] Could not start thread process reaper. errorcode -1
.
.
.
<Feb 3, 2014 6:28:51 PM> <FINEST> <NodeManager> <Waiting for the process to die: 26228>
<Feb 3, 2014 6:28:51 PM> <INFO> <NodeManager> <Server was shut down normally>
<Feb 3, 2014 6:28:51 PM> <FINEST> <NodeManager> <runMonitor returned, setting finished=true and notifying waiters>

Having these errors was one issue which I will deal with in another post. Identifying what was creating these was another. Easiest and most accurate way was to take a thread dump. Challenge was, when! Notice that these error messages do not report any time or date stamp. So, I created a UNIX shell script which did following:

  1.  Ran in the background polling the JVM for the count of live threads
  2. As soon as the number of live threads exceeded a configured threshold, took a thread dump
  3. Waited for a configured interval and then took another thread dump
  4. The total number of thread dumps did not exceed a predetermined and configured value
  5. The script was unaffected even if the JVM was restarted

Here is the script that I used. Note that it may or may not work exactly as intended as a lot depends on the shell environment. Hence, feel free to tweak it according to your requirement.

#!/bin/bash

######################################################
# Overview: This script captures thread dump of a JRockit
# JVM in case the number of live threads exceeds a
# certain threshold. This can help in diagnosing
# which code possibly creates those number of threads.
######################################################


# export JRockit home
varJrockitHome=/oracle/app/jrockit-jdk1.6.0_33/
# set JVM live thread threshold. Script takes TDump only when live threads exceed this number
varLiveThreadThold=5000
# set thread dump counter
varTdumpCounter=0
# set thread dump threshold. Script takes these many TDumps before exiting
varTdumpThold=3
# thread dump interval in seconds
varTdumpIntvl=15
# set WLS name. Script takes TDump this server
varWlsName=host_server1


# check the TD counter value with TD threshold value
while [ $varTdumpCounter -lt $varTdumpThold ]
do

# sleep for specified time before capturing TD or checking the process again
/bin/sleep $varTdumpIntvl

# find PID of JVM process
varPid=`ps -ef | grep java | grep $varWlsName | grep -v grep | awk '{print $2}'`
# check if the PID returned is actually a WLS java process
$varJrockitHome/bin/jrcmd -p | grep $varPid | grep "weblogic.Server"

if [ $? == 0 ]; then
        # calculate number of live threads of JVM
        varLiveThreads=`$varJrockitHome/bin/jrcmd $varPid -l | grep -iw java.threads.live | awk -F"=" '{print $2}'`

        if [ $varLiveThreads -gt $varLiveThreadThold ]; then
                /bin/date
                /bin/echo "Number of threads = $varLiveThreads"
                /bin/echo "Taking thread dump..."
                ## UNCOMMENT THE LINES BELOW TO ENABLE T-DUMPS
                ## /bin/echo "=== Threads Details $varTdumpCounter ===" >> thread_cpu_util_$varPid.log               
                ## /bin/echo "Number of Live Threads = $varLiveThreads" >> thread_cpu_util_$varPid.log
                ## /bin/ps -Le -o pid,user,s,lwp,pcpu | grep $varPid >> thread_cpu_util_$varPid.log
                ## /bin/kill -3 $varPid
                varTdumpCounter=`expr $varTdumpCounter + 1`
        else
                /bin/date
                /bin/echo "Number of threads = $varLiveThreads"
        fi
else
        /bin/date
        /bin/echo "Could not identify $varPid as a WLS JRockit JVM"
        exit 1
fi
done


Comments:

Thanks! This is very helpful. Hope, you will keep posting.

Posted by Ashish Goel on March 28, 2014 at 12:10 AM EST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

Prateek is a technology consultant in Oracle and thorugh this blog he shares his experiences on different features of products that are used by Oracle customers worldwide

Search

Categories
Archives
« July 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today