IWS Reports, introduced in 12.2.1, are a new diagnostics tool for analyzing performance and scaling issues in SOA environments. For an IWS overview, refer to this blog entry; for a more detailed discussion of how to enable IWS, the methodology, and report analysis, refer to these links - article 1, article 2, article 3.
In this article I will talk about the IWS design and architecture and provide some internal details.
IWS Schema - Persistence Tables
IWS uses dedicated tables to store the snapshots and generate reports. These tables reside in the soainfra schema and are created as part of the repository creation step during SOA installation. Note that IWS does not query or rely on information from any other SOA persistence tables of any SOA components, such as MDS or the BPEL engine, for generating IWS Reports.
The IWS tables, as listed below, are divided into two groups:
Currently (12.2.1) there are three supported report formats - CSV, XML and HTML.
You can load the CSV file into programs like MS Excel. XML is more suitable if you want to write tools for automated testing and analysis. HTML is best suited for human analysis.
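As an illustration of the automation use case, here is a minimal sketch of consuming a CSV report programmatically. The column names and values are hypothetical - check the header of an actual IWS CSV export before relying on them:

```python
import csv
import io

# Hypothetical IWS CSV excerpt; real reports have different/additional columns.
sample = """Endpoint,Composite,Count,AvgLatencyMs
EP1,Composite1,120,35.0
EP2,Composite1,80,50.0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Flag endpoints whose average latency exceeds a 40 ms threshold.
slow = [r["Endpoint"] for r in rows if float(r["AvgLatencyMs"]) > 40.0]
print(slow)  # -> ['EP2']
```

The same approach works against a file object opened on the exported report instead of the inline string.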
IWS Data Purging
Data in the IWS tables older than the specified retention period is purged by the common purge framework. The default retention is 7 days. Refer to this page (TODO) for details on how to change the retention period.
Performance Overhead of IWS
As a guiding principle, IWS was designed with minimal overhead so it can be left on in production. IWS does not record individual executions in the database. The executions for the various metrics are aggregated in memory and persisted only at snapshot points.
Alternate mechanisms of generating reports
Besides the dedicated IWS EM page, you can also use the System MBean Browser and WLST commands to generate IWS reports.
Refer to the “About IWSReport MBean” section on the IWS documentation page (link here).
Latency Calculations (Section 3 of the report)
Latency is defined as the round-trip time for the thread to return to the measurement point. For inbound endpoints, a timer is started as soon as the client thread enters the Binding/Adapter layer. The client thread then goes through the Service Infrastructure to downstream components as defined in the flow. The timer is stopped when the thread returns from the Service Infrastructure to the measurement point. The time spent in this round trip represents the inbound latency statistic for this endpoint for one execution. Similarly, for outbound endpoints the timer is started just before the call to the external service and stopped when the thread returns.
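The measurement idea can be sketched as follows. This is a conceptual illustration only, not IWS code - the function and names are invented for the example:

```python
import time

def measure_round_trip(call_downstream):
    """Conceptual sketch: start a timer when the thread passes the
    measurement point, stop it when the thread returns there."""
    start = time.perf_counter()
    result = call_downstream()  # thread traverses Service Infrastructure / downstream components
    latency = time.perf_counter() - start
    return result, latency

result, latency = measure_round_trip(lambda: "done")
print(result, latency >= 0.0)
```

One such round-trip time is recorded per execution; the per-endpoint statistics aggregate these values over the report interval.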
Throughput (TPS) Calculations: The TPS column represents transactions per second. This is calculated by dividing the total number of executions by the report interval.
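The calculation itself is simple; for example, with illustrative numbers:

```python
def tps(total_executions, interval_seconds):
    # TPS = total executions during the snapshot interval / interval length
    return total_executions / interval_seconds

# 1800 executions over a 5-minute (300 s) report interval:
print(tps(1800, 300))  # -> 6.0
```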
Active and Backlog Metrics
The active and backlog counts for the BPEL engine, as indicated in the report, are derived metrics based on the status of message flow to the BPEL engine. The IWS framework is updated as messages are received by the engine, removed for processing, and closed or faulted as a result of processing. The IWS framework calculates the active and backlog counts based on these updates.
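As an illustration, the derivation described above can be modeled with simple counters. This is an assumption about the bookkeeping for explanatory purposes, not actual IWS internals:

```python
class FlowStats:
    """Sketch: backlog = received but not yet picked up for processing;
    active = picked up but not yet closed/faulted."""
    def __init__(self):
        self.backlog = 0
        self.active = 0

    def received(self):            # message arrives at the engine
        self.backlog += 1

    def removed_for_processing(self):
        self.backlog -= 1
        self.active += 1

    def closed_or_faulted(self):   # processing finished
        self.active -= 1

s = FlowStats()
s.received()
s.received()
s.removed_for_processing()
print(s.backlog, s.active)  # -> 1 1
```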
For EDN, the backlog data is provided by the EDN component using APIs from the underlying providers. The reason is that external clients can add events to the EDN queue directly (outside of SOA). The active value shown in the report is derived from message flow updates to the IWS framework.
Composite Rollup Stats Calculation
Section 2 of the report contains this data. The information below is also embedded in the IWS report under this section.
Composite (rollup) endpoint statistics are derived by aggregating the individual endpoint statistics (shown in Section 3), taking a weighted average of the latency values across all endpoints (inbound and outbound) of a composite. In the example below, a composite (Composite1) has two inbound endpoints (EP1 and EP2) with their counts (total number of executions during the report interval) and latencies:
Endpoint   Composite    Count   Latency
EP1        Composite1   C1      L1
EP2        Composite1   C2      L2
Composite (Rollup) Inbound Latency = ((C1 * L1) + (C2 * L2)) / (C1 + C2)
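The same formula in code, with illustrative counts and latencies (the numbers are made up for the example):

```python
def rollup_latency(endpoints):
    """Count-weighted average of endpoint latencies.
    endpoints: list of (count, avg_latency) tuples."""
    total = sum(c for c, _ in endpoints)
    return sum(c * l for c, l in endpoints) / total

# EP1: C1=100 executions at L1=20 ms; EP2: C2=300 executions at L2=40 ms
print(rollup_latency([(100, 20.0), (300, 40.0)]))  # -> 35.0
```

Note that the busier endpoint (EP2) pulls the rollup value toward its own latency, which is the point of weighting by count.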
For a given endpoint, the latencies for all such executions in a report interval are aggregated and averaged by taking the sum of the execution times and dividing by the number of executions.
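A minimal sketch of this per-endpoint averaging, with illustrative execution times:

```python
def endpoint_avg_latency(execution_times_ms):
    # Average latency = sum of execution times in the interval / number of executions
    return sum(execution_times_ms) / len(execution_times_ms)

print(endpoint_avg_latency([10.0, 20.0, 30.0]))  # -> 20.0
```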
The Total column represents the total number of successful executions, either during the report interval or cumulatively.
The Fault column indicates the total number of failed executions, either during the report interval or cumulatively - that is, calls that resulted in exceptions being thrown from a downstream component.
Report for Cluster
For cluster environments, you can generate a consolidated report that incorporates data from all nodes into a single report, or you can pick an individual node and generate a report for it alone.
In the consolidated report for a cluster, the resource utilization stats - JVM, memory, datasource and work manager - are listed for each server. The latency data (composite rollup, endpoints, wire) and activity execution data are weighted averages of the corresponding values across nodes. The backlog values are derived by summing the corresponding values.
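The consolidation rules above (count-weighted latency, summed backlogs) can be sketched as follows. The server names and numbers are purely illustrative:

```python
def consolidate(nodes):
    """nodes: per-server stat dicts with 'count', 'latency', 'backlog'.
    Latency is consolidated as a count-weighted average across nodes;
    backlog values are simply added."""
    total = sum(n["count"] for n in nodes)
    latency = sum(n["count"] * n["latency"] for n in nodes) / total
    backlog = sum(n["backlog"] for n in nodes)
    return {"count": total, "latency": latency, "backlog": backlog}

nodes = [
    {"count": 200, "latency": 30.0, "backlog": 5},  # e.g. soa_server1 (hypothetical)
    {"count": 100, "latency": 60.0, "backlog": 3},  # e.g. soa_server2 (hypothetical)
]
print(consolidate(nodes))  # -> {'count': 300, 'latency': 40.0, 'backlog': 8}
```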