The title is not about doing a "complete" performance analysis in 30 minutes. Performance analysis is more of an art than a science; you don't know what you are getting into or its magnitude. Whether it is "art or science" is the subject of a great debate, but let me stick to my title. I am addressing situations where you have only 30 minutes to do your analysis because your production system is crawling. You get called into a production issue, and your system or customer is experiencing a performance problem and wants a solution in 30 minutes. What do you do?
Here, I am going to talk about what options you have and how to approach the performance issue without having to go through the application code. This is called system performance analysis. Most experts say that the application code is usually the culprit, but can you blame the application code without first eliminating possible systemic performance problems? If you have only half an hour, my recommendation is to start with the systemic approach. If you want to dive deep, then I recommend using "DTrace":
Solaris Dynamic Tracing Guide
DTrace Community. The DTrace framework enables you to trace any point of interest, with thousands of built-in probes ready for use, including probes for tracing Java methods and JVM issues.
While most of the following information applies directly to Unix derivatives such as Solaris or Linux, the principles can be applied generically. So, what do you need and where do you start? In terms of what you need, the answer is tools: built-in tools on the system that you can use quickly. "vmstat" is a good utility that gives a full overview of processes, memory, paging/swap, I/O, system, and CPU activity. In terms of where to start, you need to divide the problem into areas, and I will start with the areas below. I will also point out what I think are good general guidelines based on my experience, but this is highly subjective, so your mileage may vary.
CPU: The first thing to do is look at CPU utilization. Provided that you have loaded up the users gradually, you should see a steady increase in CPU utilization. If you are not able to max out the CPUs in spite of loading up users, you could have locking and synchronization issues.
Use "vmstat" and pay attention to the procs and cpu statistics.
1. Look for runnable processes, reported as 'r' under procs, which counts processes in the run queue. If you have processes in the run queue and idle time on the CPU, then you have a scalability issue. It is OK to have a run queue equal to, or a little more than, the number of CPUs as long as the system is running at full CPU utilization. In fact, this is the ideal scalability situation: it means we are able to utilize the system to its full potential. Also, make sure that you do not have any processes reported as 'b' or 'w' under procs ('b' - blocked for resources such as I/O or paging; 'w' - process swapped out).
2. You want the CPUs to spend more time in user land (us) than in kernel land (sy). A good rule of thumb is about 4:1 (us:sy). Look at 'us' and 'sy' under cpu.
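As a quick sketch of the run-queue check above, the 'r' and idle columns from a captured vmstat data line can be compared automatically. This is a hypothetical helper, not a standard tool; the field positions (r first, idle last) follow the Solaris vmstat layout, and the sample line is made up for illustration:

```shell
# Hypothetical helper: flag a scalability problem when processes wait in the
# run queue (r > 0) while the CPUs still report idle time.
# Assumes a Solaris-style vmstat data line: 'r' is the 1st field, idle the last.
check_vmstat_line() {
  line="$1"
  ncpu="$2"
  r=$(echo "$line" | awk '{print $1}')
  idle=$(echo "$line" | awk '{print $NF}')
  if [ "$r" -gt 0 ] && [ "$idle" -gt 0 ]; then
    echo "possible scalability issue: $r runnable, ${idle}% idle"
  elif [ "$r" -le "$ncpu" ]; then
    echo "ok: run queue within CPU count at full utilization"
  else
    echo "cpu saturated: run queue $r > $ncpu cpus"
  fi
}

# Made-up sample line (r=3, id=40) on a 2-CPU box:
check_vmstat_line "3 0 0 482012 81964 2 10 5 0 0 0 0 1 0 1 0 520 380 210 55 5 40" 2
```

In a real session you would feed it the last data line of `vmstat 5` output rather than a hard-coded string.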
Use "mpstat" to get more details. A typical mpstat output looks like this:
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  378   0 4422   403  300  162    3   13   23    0   966    0   4   0  95
  1  366   0 3374   111  100  283   10   34    8    0  1188    0   5   0  95
Lock contention is one of the major reasons for performance problems. There are four types of locking available in Solaris:
1. Mutexes
2. Semaphores
3. Condition variables
4. Multiple readers, single writer (read-write) locks
1. Look at the "smtx" value, which indicates the number of times a CPU failed to obtain a mutex immediately. If the smtx value exceeds 500 for any CPU and system time is greater than 20%, then it is possible that mutex contention is happening on the system.
2. Look at the "srw" value, which indicates the number of times a CPU failed to obtain a read-write lock immediately.
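A saved mpstat capture can be scanned for that smtx/sys pattern with a one-liner. This is a hypothetical sketch, not part of mpstat itself; the field positions (smtx is column 10, sys is column 14) match the header shown above, and the data rows in the here-document are made up:

```shell
# Hypothetical sketch: flag possible mutex contention in an mpstat capture.
# Per the rule of thumb above: smtx > 500 on a CPU with sys time > 20%.
# Column positions match the mpstat header: smtx = $10, sys = $14.
awk 'NR > 1 && $10 > 500 && $14 > 20 {
       printf "CPU %s: smtx=%s sys=%s%% -> possible mutex contention\n", $1, $10, $14
     }' <<'EOF'
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys  wt idl
  0  378   0 4422  403  300  162    3   13  612   0   966  40  35   0  25
  1  366   0 3374  111  100  283   10   34    8   0  1188   0   5   0  95
EOF
```

Against a real capture, replace the here-document with the spooled mpstat output file.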
You can use the "lockstat sleep 15" command to help identify contention on your system. Look for large counts (indv) with long locking times (nsec) for adaptive mutex block events.
You could also have CPU cycles chewed up by other subsystems such as I/O ("iostat -xnz 5"), the network ("netstat -an"), or memory swapping ("vmstat -S"; the -S switch shows swap statistics). Run these commands at regular intervals and spool the output to a file so that you can review the files for potential problems.
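One way to do that interval spooling is a small loop. This is a minimal sketch under my own assumptions: the helper name, interval, count, and output paths are arbitrary choices, and each sample is prefixed with a timestamp so the files can be correlated later:

```shell
# Minimal sketch: run a command COUNT times, INTERVAL seconds apart,
# appending a timestamp before each sample so runs can be correlated.
spool_stats() {
  cmd="$1"; outfile="$2"; interval="$3"; count="$4"
  i=0
  while [ "$i" -lt "$count" ]; do
    date >> "$outfile"                # timestamp each sample
    sh -c "$cmd" >> "$outfile" 2>&1   # capture stdout and stderr
    i=$((i + 1))
    [ "$i" -lt "$count" ] && sleep "$interval"
  done
}

# Example: two quick samples of a cheap command into /tmp/stats.log
spool_stats "date" /tmp/stats.log 1 2
```

In practice you would run it in the background for each tool, e.g. `spool_stats "iostat -xnz 5 1" /tmp/iostat.log 60 30 &`, and review the files afterwards.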
Okay, after all this, let's assume that everything looks good, but you have a situation where your CPU is being consumed by a single thread in your Java program. How do you figure out what that thread is running? Use "prstat" to find the CPU-hogging process, and "pstack pid" to note the LWP number that is hogging the CPU at that particular time. Once you know the LWP number, you can use the "kill -3 pid" command and look for the corresponding LWP number as the "nid" value (in hex) to find the stack trace and the method consuming the CPU in that Java process. I will cover this in more detail once I have some example snapshots.
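One small wrinkle in that matching step: pstack reports the LWP id in decimal, while the thread dump produced by "kill -3" shows the nid in hexadecimal. A trivial conversion (a sketch; the helper name is my own) bridges the two:

```shell
# pstack shows the hot LWP in decimal; the "kill -3" thread dump shows the
# same thread's nid in hex. Convert the LWP id to match them up.
lwp_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

# Example: LWP 42 from pstack corresponds to nid=0x2a in the thread dump
lwp_to_nid 42
```

You can then grep the thread dump for that nid string to land directly on the offending stack trace.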
Performance Analysis in 30 minutes