This article is a continuation of the previous article (link here), which defined a simple project, simulated a performance issue, and outlined the steps for generating the IWS reports for the project.
In this article we will do a step-by-step analysis of the IWS reports and try to identify the root cause of the slowdown for this project. We will reference the two IWS reports generated in the previous article:
1 - Benchmark IWS Report (click here) - the report for the period when things are working as expected
2 - IWS Report for the time period under review/analysis (click here)
Analyzing the Issue
Before analyzing the reports, make sure that the reports cover the test period under analysis. The top of the IWS report indicates the snapshot timestamps and intervals. The Begin Timestamp and End Timestamp represent the time window for which data was collected and shown in the report.
Step 1: Review the Resource Usage (Section 1 of report)
Compare the two reports and check the CPU, Memory, Data Source, and Work Manager stats to rule out slowness due to resource constraints.
Java Virtual Machine Stats
The stats reveal that for Test 2 the memory usage has increased slightly, but the average load (CPU usage) has dropped significantly compared to the benchmark.
Data Source Stats
For Test 2 the average number of active connections is significantly higher, but it is still within the configured max capacity, so the data source is not the bottleneck.
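The headroom check described above can be sketched in a few lines of Python. The numbers below are hypothetical stand-ins mirroring the Data Source stats in the report, not values from an actual IWS API:

```python
# Minimal sketch: check how close a data source is to its configured
# max capacity. All figures are hypothetical examples.
def connection_headroom(avg_active, max_capacity):
    """Return the fraction of configured capacity still available."""
    return 1.0 - (avg_active / max_capacity)

# Test 2: active connections are up sharply, but headroom remains,
# so the data source itself is not the bottleneck.
baseline = connection_headroom(avg_active=5, max_capacity=50)
test2 = connection_headroom(avg_active=35, max_capacity=50)
print(f"baseline headroom: {baseline:.0%}, test 2 headroom: {test2:.0%}")
```

When headroom approaches zero, requests start queuing for connections and the data source becomes a suspect; here it stays comfortably positive.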
Work Manager Stats
The stats reveal that default_dspInvoke (the Work Manager pool used by the BPEL Service Engine to invoke downstream services) shows a much higher number of active threads and pending requests compared to the benchmark case.
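This kind of baseline-versus-test comparison across Work Manager pools is easy to mechanize. The sketch below flags pools whose active threads or pending requests grew sharply; the pool names echo the report, but the numbers and the 3x threshold are hypothetical choices:

```python
# Sketch: flag Work Manager pools whose activity jumped vs. baseline.
# Each entry maps pool name -> (active_threads, pending_requests).
def stressed_pools(baseline, current, factor=3.0):
    flagged = []
    for pool, (active, pending) in current.items():
        base_active, base_pending = baseline.get(pool, (0, 0))
        # max(..., 1) avoids dividing attention to pools idle at baseline
        if active > factor * max(base_active, 1) or pending > factor * max(base_pending, 1):
            flagged.append(pool)
    return flagged

baseline = {"default_dspInvoke": (2, 0), "default_dspSystem": (1, 0)}
test2 = {"default_dspInvoke": (40, 120), "default_dspSystem": (1, 0)}
print(stressed_pools(baseline, test2))  # → ['default_dspInvoke']
```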
For our example, it is clear that for Test 2 the resources are more stressed than in the benchmark case. Let's review the report further to look for clues as to why this might be happening.
Step 2: Check composite rollup stats (Section 2)
These are derived stats, obtained by treating all the inbounds as a single input and all the outbounds as a single output of the composite. This allows for quick analysis and is useful for larger projects that consist of multiple inbounds and outbounds. By quickly comparing the latency data of the report for the slow period against the baseline, we can narrow down the offending composite.
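To illustrate the rollup idea, here is one plausible way to collapse per-endpoint figures into a single composite-level number (a message-count-weighted average); this is an illustrative assumption, not necessarily how IWS computes its derived stats:

```python
# Sketch: roll per-endpoint stats up into one composite-level latency.
# Each endpoint is (message_count, avg_latency_ms); values hypothetical.
def rollup_latency(endpoints):
    """Message-count-weighted average latency across endpoints (ms)."""
    total_msgs = sum(count for count, _ in endpoints)
    return sum(count * latency for count, latency in endpoints) / total_msgs

# Two outbound endpoints: one fast, one very slow
outbounds = [(100, 12), (100, 39_000)]
print(f"composite outbound latency: {rollup_latency(outbounds):.0f} ms")
```

A single slow endpoint dominates the rollup, which is exactly why the composite-level number is a good first filter before drilling into endpoints.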
For our scenario we have a single composite, and looking at the Composite Outbound stats it is clear that latency has jumped many fold - from ~90 ms to ~39 secs!
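Putting a number on that jump makes the comparison concrete. A tiny sketch, using the ~90 ms and ~39 s figures from the reports above:

```python
# Sketch: quantify the outbound latency regression between the
# baseline report (~90 ms) and the Test 2 report (~39 s).
def regression_factor(baseline_ms, current_ms):
    """How many times slower the current period is vs. baseline."""
    return current_ms / baseline_ms

factor = regression_factor(baseline_ms=90, current_ms=39_000)
print(f"outbound latency is ~{factor:.0f}x slower than baseline")
```

A ~433x regression is far beyond normal variance, so the outbound path clearly deserves the drill-down that follows.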
Clearly there is some issue with the composite outbound. But there are two outbounds from this composite, so let's drill down to isolate the slow one.
IWS Report - Test 1 (Baseline)
IWS Report - Test 2
Step 3: Review the Slowest Composite Endpoints (Section 3)
Section 3 of the report breaks down the data for each endpoint of the composite. Reviewing the two reports, it is clear that while the latency of the file adapter (the time taken by the adapter to write the file) is similar in both cases, the latency of the external service has jumped from ~90 ms to almost 39 secs!
We can therefore safely conclude that the slowness of the composite is due to external service slowdown.
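The per-endpoint comparison above can be sketched as a small ranking over latency ratios. The endpoint names mirror the scenario, but the numbers are hypothetical figures consistent with the article:

```python
# Sketch: given per-endpoint average latencies (ms) from the baseline
# and test reports, find the endpoint with the largest slowdown.
def slowest_regression(baseline_ms, test_ms):
    """Return the endpoint whose latency grew most relative to baseline."""
    return max(test_ms, key=lambda ep: test_ms[ep] / baseline_ms[ep])

baseline_ms = {"FileAdapter": 12, "ExternalService": 90}
test_ms = {"FileAdapter": 13, "ExternalService": 39_000}
print(slowest_regression(baseline_ms, test_ms))  # → ExternalService
```

Ranking by ratio rather than absolute latency keeps a naturally slow but unchanged endpoint (like a bulk file write) from masking the real regression.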
The above discussion showcases a step-by-step, evidence-based analysis that can be performed to isolate the root cause using IWS reports. The stats presented in this article are a small subset of the metrics collected in an IWS report. The next article in this series will outline the rest.
Click below to download the IWS reports referenced in this article.