Tuesday Dec 04, 2012

Using BPEL Performance Statistics to Diagnose Performance Bottlenecks

Tuning the performance of Oracle SOA 11G applications can be challenging. Because SOA is a platform for building composite applications that connect many applications and "services", when overall performance is slow the bottleneck could be anywhere in the system: the applications/services that SOA connects to, the infrastructure database, or the SOA server itself. Quickly identifying the bottleneck is therefore crucial to tuning overall performance.

Fortunately, the BPEL engine in Oracle SOA 11G (and 10G, for that matter) collects BPEL Engine Performance Statistics, which show the latencies of low level BPEL engine activities. The BPEL engine performance statistics can make it a bit easier for you to identify the performance bottleneck.

Although the BPEL engine performance statistics are always available, accessing and interpreting them is somewhat obscure in the early and current (PS5) 11G versions.

This blog offers instructions to help you enable, retrieve and interpret the performance statistics until future versions provide a more pleasant user experience.

Overview of BPEL Engine Performance Statistics 

SOA BPEL can collect performance statistics and store them in memory.

One MBean attribute, StatLastN, configures the size of the memory buffer that stores the statistics. This memory buffer is a "moving window": old statistics are flushed out by new ones once the amount of data exceeds the buffer size. Since the buffer size is limited by StatLastN, the impact of statistics collection on performance is minimal. By default StatLastN=-1, which means no performance data is collected.
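The "moving window" idea can be pictured with a small bounded buffer. The sketch below is only an illustration, not the engine's actual implementation: the class name StatWindow and its methods are hypothetical, and the capacity simply plays the role of StatLastN.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of a "last N" moving window; not Oracle's implementation.
class StatWindow {
    private final int capacity;                        // plays the role of StatLastN
    private final Deque<Long> samples = new ArrayDeque<>();

    StatWindow(int capacity) {
        this.capacity = capacity;
    }

    // Records one latency sample; a non-positive capacity means collection is off.
    void record(long latencyMs) {
        if (capacity <= 0) {
            return;
        }
        if (samples.size() == capacity) {
            samples.removeFirst();                     // the oldest sample is flushed out
        }
        samples.addLast(latencyMs);
    }

    // Returns a copy of the samples currently held, oldest first.
    List<Long> snapshot() {
        return new ArrayList<>(samples);
    }

    public static void main(String[] args) {
        StatWindow window = new StatWindow(3);
        for (long ms : new long[] {10, 20, 30, 40}) {
            window.record(ms);
        }
        System.out.println(window.snapshot());         // the first sample has been dropped
    }
}
```

Because memory use is capped by the window size, a larger value only costs a proportionally larger buffer.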

Once the statistics are collected in the memory buffer, they can be retrieved via another MBean: oracle.as.soainfra.bpel:Location=[Server Name],name=BPELEngine,type=BPELEngine.
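If you want to look the MBean up yourself from a JMX client, the first step is building its object name. The following is a minimal sketch using only the standard javax.management.ObjectName class; the helper class BpelEngineMBeanName is hypothetical, and which attribute or operation of the MBean returns the statistics is not covered here.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper that builds the BPELEngine MBean name for one SOA server.
class BpelEngineMBeanName {

    // serverName is the WebLogic server name, for example SOA_MS1.
    static ObjectName forServer(String serverName) {
        try {
            return new ObjectName("oracle.as.soainfra.bpel:Location=" + serverName
                    + ",name=BPELEngine,type=BPELEngine");
        } catch (MalformedObjectNameException e) {
            // Names containing characters such as ',' or '=' would need quoting.
            throw new IllegalArgumentException("Invalid server name: " + serverName, e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = forServer("SOA_MS1");
        System.out.println(name.getDomain());
        System.out.println(name.getKeyProperty("Location"));
    }
}
```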

My friend in Oracle SOA development wrote this simple 'bpelstat' web app that looks up and retrieves the performance data from the MBean and displays it in a human readable form. It does not have a beautiful UI but it is fairly useful.

Although in Oracle SOA onwards the same statistics can be viewed via a more elegant UI under "request break down" at EM -> SOA Infrastructure -> Service Engines -> BPEL -> Statistics, some unsophisticated minds like mine may still prefer the simplicity of the 'bpelstat' JSP. One thing that simple JSP does do well is that you can save the page and send it to someone for further analysis.

The following are the instructions for installing and invoking the BPEL statistics JSP. My friend in SOA Development will soon blog about interpreting the statistics. Stay tuned.

Step 1: Enable BPEL Engine Statistics for Each SOA Server via Enterprise Manager

First you need to set StatLastN to some number to enable the collection of BPEL Engine Performance Statistics:

  • EM Console -> soa-infra(Server Name) -> SOA Infrastructure -> SOA Administration -> BPEL Properties
  • Click on "More BPEL Configuration Properties"
  • Click on attribute "StatLastN", set its value to some integer number. Typically you want to set it to 1000 or more.

Step 2: Download and Deploy bpelstat.war File to Admin Server

Note: the WAR file contains a JSP that does NOT have any security restriction. You do NOT want to keep it on your production server for a long time, as it is a security hazard. Deactivate the WAR once you are done.
  • Download the bpelstat.war to your local PC
  • At WebLogic Console, Go to Deployments -> Install
  • Click on the "upload your file(s)"
  • Click the "Browse" button to upload the deployment to Admin Server
  • Accept the uploaded file as the path, click next
  • Check the default option "Install this deployment as an application"
  • Check "AdminServer" as the target server
  • Finish the rest of the deployment with default settings

  • Console -> Deployments
  • Check the box next to "bpelstat" application
  • Click on the "Start" button. It will change the state of the app from "prepared" to "active"

Step 3: Invoke the BPEL Statistic Tool

  • The BPELStat tool merely calls the MBean of the BPEL server, then collects and displays the in-memory performance statistics. You usually want to do that after some peak loads.
  • Go to http://<admin-server-host>:<admin-server-port>/bpelstat
  • Enter the correct admin hostname, port, username and password
  • Enter the SOA Server Name from which you want to collect the performance statistics. For example, SOA_MS1, etc.
  • Click Submit
  • Keep doing the same for all SOA servers.

Step 4: Interpret the BPEL Engine Statistics

You will see a few categories of BPEL Statistics from the JSP Page.

First it starts with the overall latency of BPEL processes, grouped by synchronous and asynchronous processes. Then it provides the further break down of the measurements through the life time of a BPEL request, which is called the "request break down".

1. Overall latency of BPEL processes

The top of the page shows that the elapsed time of executing the synchronous process TestSyncBPELProcess from the composite TestComposite averages about 1543.21 ms, while the elapsed time of executing the asynchronous process TestAsyncBPELProcess from the composite TestComposite2 averages about 1765.43 ms. The maximum and minimum latencies are also shown.

Synchronous process statistics
    <stats key="default/TestComposite!2.0.2-ScopedJMSOSB*soa_bfba2527-a9ba-41a7-95c5-87e49c32f4ff/TestSyncBPELProcess" min="1234" max="4567" average="1543.21" count="1000">

Asynchronous process statistics
    <stats key="default/TestComposite2!2.0.2-ScopedJMSOSB*soa_bfba2527-a9ba-41a7-95c5-87e49c32f4ff/TestAsyncBPELProcess" min="2234" max="3234" average="1765.43" count="1000">
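If you save the page for later analysis, lines like the ones above are easy to pick apart programmatically. Here is a minimal sketch, assuming the attribute order shown in these samples (key, min, max, average, count); the class StatsLine is hypothetical and uses a regular expression rather than an XML parser because the lines are shown without closing tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one <stats .../> line copied from the bpelstat page.
class StatsLine {
    private static final Pattern STATS = Pattern.compile(
            "<stats key=\"([^\"]*)\" min=\"(\\d+)\" max=\"(\\d+)\""
            + " average=\"([0-9.]+)\" count=\"(\\d+)\"");

    final String key;
    final long min;
    final long max;
    final double average;
    final long count;

    private StatsLine(String key, long min, long max, double average, long count) {
        this.key = key;
        this.min = min;
        this.max = max;
        this.average = average;
        this.count = count;
    }

    // Parses a single line; leading indentation is ignored.
    static StatsLine parse(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a stats line: " + line);
        }
        return new StatsLine(m.group(1), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)),
                Long.parseLong(m.group(5)));
    }

    // For process-level keys, the process name is the text after the last '/'.
    String processName() {
        return key.substring(key.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        StatsLine s = parse("    <stats key=\"default/TestComposite!2.0.2/TestSyncBPELProcess\""
                + " min=\"1234\" max=\"4567\" average=\"1543.21\" count=\"1000\">");
        System.out.println(s.processName() + " averages " + s.average + " ms over " + s.count + " requests");
    }
}
```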

2. Request break down

Under the overall latency categorized by synchronous and asynchronous processes is the "Request breakdown". Organized by statistic keys, the Request breakdown gives finer-grained performance statistics through the lifetime of the BPEL requests. It uses indentation to show the hierarchy of the statistics.

Request breakdown
    <stats key="eng-composite-request" min="0" max="0" average="0.0" count="0">
        <stats key="eng-single-request" min="22" max="606" average="258.43" count="277">
            <stats key="populate-context" min="0" max="0" average="0.0" count="248">

Please note that in SOA, the statistics under Request breakdown are aggregated across all the BPEL processes based on statistic keys. They do not differentiate between BPEL processes. If two BPEL processes happen to have statistics that share the same statistic key, the statistics from the two BPEL processes will be aggregated together. Keep this in mind when we go through more details below.

2.1 BPEL process activity latencies

A very useful measurement in the Request Breakdown is the performance statistics of the BPEL activities you put in your BPEL processes: Assign, Invoke, Receive, etc. The names of the measurements in the JSP page come directly from the names you assign to each BPEL activity. These measurements are under the statistic key "actual-perform".

Example 1: 
The following is the measurement for the BPEL activity "AssignInvokeCreditProvider_Input", which looks like the Assign activity in a BPEL process that assigns an input variable before passing it to the invocation:

                               <stats key="AssignInvokeCreditProvider_Input" min="1" max="8" average="1.9" count="153">
                                    <stats key="sensor-send-activity-data" min="0" max="1" average="0.0" count="306">
                                    <stats key="sensor-send-variable-data" min="0" max="0" average="0.0" count="153">
                                    <stats key="monitor-send-activity-data" min="0" max="0" average="0.0" count="306">

Note: because, as previously mentioned, the statistics across all BPEL processes are aggregated based on statistic keys, if two BPEL processes happen to give their Invoke activities the same name, they will show up as one measurement (i.e. one statistic key).
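To see why shared activity names can mislead, consider this small sketch of key-based aggregation. It is an illustration only: the class KeyAggregation is hypothetical, and it assumes the aggregate keeps a running min, max, count and mean per key, which matches the fields shown on the page but is not confirmed engine behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of statistics aggregated by key only, ignoring the process.
class KeyAggregation {

    static final class Stat {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long count;
        double totalMs;

        void add(long latencyMs) {
            min = Math.min(min, latencyMs);
            max = Math.max(max, latencyMs);
            totalMs += latencyMs;
            count++;
        }

        double average() {
            return count == 0 ? 0.0 : totalMs / count;
        }
    }

    private final Map<String, Stat> byKey = new LinkedHashMap<>();

    // The process name is deliberately not part of the key.
    void record(String processName, String activityName, long latencyMs) {
        byKey.computeIfAbsent(activityName, k -> new Stat()).add(latencyMs);
    }

    Stat get(String activityName) {
        return byKey.get(activityName);
    }

    public static void main(String[] args) {
        KeyAggregation agg = new KeyAggregation();
        agg.record("ProcessA", "InvokeService", 2);    // a fast call in one process
        agg.record("ProcessB", "InvokeService", 100);  // a slow call in another
        Stat s = agg.get("InvokeService");
        System.out.println("count=" + s.count + " average=" + s.average());
    }
}
```

The merged average describes neither process well, which is a good reason to give activities distinctive names.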

Example 2:
The following is the measurement of the BPEL activity called "InvokeCreditProvider". Not only can you see that on average it takes 3.31 ms to finish this call (pretty fast), but you can also see from the further breakdown that most of this 3.31 ms was spent on "invoke-service".

                                <stats key="InvokeCreditProvider" min="1" max="13" average="3.31" count="153">
                                    <stats key="initiate-correlation-set-again" min="0" max="0" average="0.0" count="153">
                                    <stats key="invoke-service" min="1" max="13" average="3.08" count="153">
                                        <stats key="prep-call" min="0" max="1" average="0.04" count="153">
                                    <stats key="initiate-correlation-set" min="0" max="0" average="0.0" count="153">
                                    <stats key="sensor-send-activity-data" min="0" max="0" average="0.0" count="306">
                                    <stats key="sensor-send-variable-data" min="0" max="0" average="0.0" count="153">
                                    <stats key="monitor-send-activity-data" min="0" max="0" average="0.0" count="306">
                                    <stats key="update-audit-trail" min="0" max="2" average="0.03" count="153">
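The "most of the time" reading is just a ratio of averages. A small sketch of the arithmetic, using the numbers from Example 2 (the class BreakdownShare is hypothetical):

```java
// Hypothetical helper: what fraction of a parent key's average latency a child key accounts for.
class BreakdownShare {

    static double share(double parentAverageMs, double childAverageMs) {
        if (parentAverageMs <= 0.0) {
            return 0.0;                       // avoid dividing by zero for idle keys
        }
        return childAverageMs / parentAverageMs;
    }

    public static void main(String[] args) {
        // InvokeCreditProvider averages 3.31 ms; its child invoke-service averages 3.08 ms.
        double s = share(3.31, 3.08);
        System.out.println(String.format("invoke-service share: %.1f%%", 100 * s));
    }
}
```

Here invoke-service accounts for roughly 93% of the activity's average latency, so the remote call itself, not the engine overhead around it, is where to look.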

2.2 BPEL engine activity latency

Another type of measurement under Request breakdown is the latency of underlying system-level engine activities. These activities are not directly tied to a particular BPEL process or process activity, but they are critical factors in the overall engine performance. They include the latency of saving asynchronous requests to the database and the latency of process dehydration.

My friend Malkit Bhasin is working on providing more information on interpreting the statistics on engine activities on his blog (https://blogs.oracle.com/malkit/). I will update this blog once the information becomes available.

Update on 2012-10-02: My friend Malkit Bhasin has published a detailed interpretation of the BPEL service engine statistics at his blog http://malkit.blogspot.com/2012/09/oracle-bpel-engine-soa-suite.html.

Configure Oracle SOA JmsAdapter to Work with WLS JMS Topics

We will walk through how to configure the JMS Topic, the JmsAdapter connection factory, and the composite so that the JMS Topic messages are evenly distributed to the same composite running on different SOA cluster nodes without causing duplication.

[Read More]

Retrieve Performance Data from SOA Infrastructure Database

Here I would like to offer examples of some basic SQL queries you can run against the infrastructure database of Oracle SOA Suite 11G to acquire the performance statistics for a given period of time. The final version of the script will prompt for the start and end time of the period of your interest. [Read More]

Thursday Oct 18, 2012

A brief note for customers running SOA Suite on AIX platforms

When running Oracle SOA Suite with IBM JVMs on the AIX platform, we have seen performance slowdowns and/or memory leaks. On occasion, we have even encountered some OutOfMemoryError conditions and the concomitant Java coredump. If you are experiencing this issue, the resolution may be to configure -Dsun.reflect.inflationThreshold=0 in your JVM startup parameters.

https://www.ibm.com/developerworks/java/library/j-nativememory-aix/ contains a detailed discussion of the IBM AIX JVM memory model, but I will summarize my interpretation and understanding of it in the context of SOA Suite, below.

Java ClassLoaders on IBM JVMs are allocated a native memory area into which they are anticipated to map such things as jars loaded from the filesystem. This is an excellent memory optimization, as the file can be loaded into memory once and then shared amongst many JVMs on the same host, allowing for excellent horizontal scalability on AIX hosts.

However, Java ClassLoaders are not used exclusively for loading files from disk. A performance optimization by the Oracle Java language developers lets reflective access be converted from a JNI call into Java bytecode, which is then amenable to HotSpot optimizations, amongst other things.

This performance optimization is called inflation, and it is executed by generating a sun.reflect.DelegatingClassLoader instance dynamically to inject the Java bytecode into the virtual machine. It is generally considered an excellent optimization. However, it interacts very negatively with the native memory area allocated by the IBM JVM, effectively locking out memory that could otherwise be used by the Java process.

SOA Suite and WebLogic are both very heavy users of reflection. They reflectively use many code paths, generating lots of DelegatingClassLoaders in normal operation. The IBM JVM slowdown and subsequent OutOfMemoryError are a direct result of the Java memory consumed by the DelegatingClassLoader instances generated by SOA Suite and WebLogic. Java garbage collection runs more frequently to try to keep memory available, until it can no longer do so and throws OutOfMemoryError.

The setting sun.reflect.inflationThreshold=0 disables this optimization entirely, never allowing the JVM to generate the optimized reflection code.
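After changing the startup parameters, it can be worth confirming that the running JVM actually picked up the flag. A minimal sketch that only reads the system property (the class InflationCheck is hypothetical, and it says nothing about how a given JVM interprets the value):

```java
import java.util.Properties;

// Hypothetical check that the inflation threshold property is set to 0.
class InflationCheck {
    static final String KEY = "sun.reflect.inflationThreshold";

    // Returns true only when the property is present and equal to "0".
    static boolean isThresholdZero(Properties props) {
        String value = props.getProperty(KEY);
        return value != null && "0".equals(value.trim());
    }

    public static void main(String[] args) {
        // Run with -Dsun.reflect.inflationThreshold=0 to see "true".
        System.out.println(KEY + " set to 0: " + isThresholdZero(System.getProperties()));
    }
}
```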

IBM JVMs are susceptible to this issue primarily because all Java ClassLoaders have this native memory allocation, which is shared with the regular Java heap. Oracle JVMs don't automatically give all ClassLoaders a native memory area, and my understanding is that jar files are never mapped completely from shared memory in the same way as IBM does it. This results in different behaviour characteristics on IBM vs Oracle JVMs.

Monday Oct 15, 2012

BPM 11g - Dynamic Task Assignment with Multi-level Organization Units

I've seen several requirements for a more granular level of task assignment in BPM 11g based on some value in the data passed to the process. Parametric Roles are normally the first port of call to try to satisfy this requirement, but in this blog we will show how many use cases can be satisfied by the more flexible and easier-to-implement Organization Unit. [Read More]

Friday Sep 28, 2012

OSB, Service Callouts and OQL - Part 2

This section of the "OSB, Service Callouts and OQL" blog posting will delve into thread dump analysis of the OSB server and detecting threading issues relating to Service Callouts using ThreadLogic. We will also use heap dumps and OQL to identify the related Proxies and Business Services. The previous section dealt with the threading model used by OSB to handle Route and Service Callouts.

[Read more...]

OSB, Service Callouts and OQL - Part 1

Oracle Fusion Middleware customers use Oracle Service Bus (OSB) for virtualizing Service endpoints and implementing stateless service orchestrations. Behind the performance and speed of OSB, there are a couple of key design implementations that can affect application performance and behavior under heavy load. One of the heavily used features in OSB is the Service Callout pipeline action for message enrichment and invoking multiple services as part of one single orchestration. Overuse of this feature, without understanding its internal implementation, can lead to serious problems.

This post will delve into OSB internals, the problem associated with usage of Service Callout under high loads, diagnosing it via thread dump and heap dump analysis using tools like ThreadLogic and OQL (Object Query Language) and resolving it. The first section in the series will mainly cover the threading model used internally by OSB for implementing Route Vs. Service Callouts.

[Read more...]

OSB, Service Callouts and OQL - Part 3

In the previous sections of the "OSB, Service Callouts and OQL" series, we analyzed the threading model used by OSB for Service Callouts, analyzed OSB Server threads hung in Service Callouts, and identified the Proxies and Remote services involved in the hang using OQL.

This final section of the series will focus on the corrective action to avoid Service Callout related OSB Server hangs.

[Read more...]

Monday Sep 24, 2012

2 way SSL between SOA and OSB

This blog describes all the steps to set up 2 way SSL between SOA and OSB. The steps should also apply, with adjustments where appropriate, if the external service is hosted on a server other than OSB. [Read More]

Wednesday Sep 19, 2012

BPM ADF Task forms. Checking whether the current user is in a BPM Swimlane

So this blog will focus on BPM Swimlane roles and users from an ADF context.

So we have an ADF Task Details Form and we are in the process of making it richer and more dynamic in functionality. A common requirement could be to dynamically show different areas based on the user logged into the workspace. Perhaps we even want to know which swim-lane role the user belongs to.

It is a little bit harder to achieve than one thinks, unless you know the trick.

[Read More]

Wednesday Sep 12, 2012

The curious case of SOA Human tasks' automatic completion

A large south-Asian insurance industry customer using Oracle BPM and SOA ran into this. I have survived this ordeal previously myself but didn't think to blog it then. However, it seems like a good idea to share this knowledge with this reader community, so here goes...

Symptom: A human task (in a SOA/BPEL/BPM process) completes automatically when it should have been assigned to a proper user. There are no stack traces and no related exceptions in the logs.

Why: The product is designed to treat human tasks that don't have assignees as eligible for completion. Hence no warning/error messages are recorded in the logs.

Usecase variant: A variant of this usecase, where an assignee doesn't exist in the repository, is treated as a recoverable error. One can find this in the 'pending recovery' instances in EM and reactivate the task by changing the assignees in the BPM workspace as a process owner/administrator.

But back to the usecase when tasks get completed automatically...

When: This happens when the users/groups assigned to a task are 'empty' or null. This has been seen only on tasks whose assignees are derived from an assignment expression, i.e. at runtime an XPath is used to determine who to assign the task to. (This should not happen if task assignees are populated via swim-lane roles.)

How to detect this in EM

For instances that are auto-completed thus, one will notice in the Audit Trail of such instances, that the 'outcome' of the task is empty. The 'acquired by' element will also show as empty/null.

Enabling the oracle.soa.services.workflow.* logger in EM should print more verbose messages about this.

How to fix this

The application code needs two fixes:

  1. Input to HT: The XSLT/XPath used to set the task 'assignee', and the process itself, should be enhanced to handle nulls better. For example: if no data is found, set assignees to an alternate value, force default assignees, etc.

  2. Output from HT: Additionally, in the application code, check that the 'outcome' of the HT is not null. If it is null, route the task to be performed again after setting the assignee correctly. Beginning with PS4FP, one should be able to use 'grab' to route back to the task to fire again.
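The two fixes boil down to two null checks. The sketch below expresses them in plain Java purely for illustration; in a real composite the same logic would live in the XPath/XSLT assignment and in a branch after the human task. The class and method names are hypothetical.

```java
// Hypothetical illustration of the two guards described above.
class AssigneeFallback {

    // Fix 1: never hand an empty assignee to the human task.
    static String resolveAssignee(String fromExpression, String defaultAssignee) {
        if (fromExpression == null || fromExpression.trim().isEmpty()) {
            return defaultAssignee;
        }
        return fromExpression.trim();
    }

    // Fix 2: an empty outcome means the task auto-completed and must be redone.
    static boolean needsRework(String outcome) {
        return outcome == null || outcome.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(resolveAssignee(null, "default-approvers"));
        System.out.println(needsRework(""));
    }
}
```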

    Hope this helps. 

Thursday Aug 30, 2012

SOA Suite 11g Asynchronous Testing with soapUI

Although there are various write-ups on the topic of testing asynchronous web services using soapUI, this blog is intended to provide a very simple guide to setting this up in the context of SOA Suite 11g. With this knowledge you can use soapUI free edition to go beyond the test harness that comes bundled with SOA Suite 11g Enterprise Manager. This also serves as a nice introduction to another blog of mine: SOA Suite 11g Dynamic Payload Testing with soapUI Free Edition. [Read More]

SOA Suite 11g Dynamic Payload Testing with soapUI Free Edition

When running various tests like smoke tests, unit tests, load tests, etc., tools like soapUI are frequently used. Although soapUI is a very easy to use tool, there are things that can be done to expand beyond the basics which many may or may not consider "easy to use". For example, how do you create dynamic payloads for stress tests without using the Pro version of soapUI? Well, this blog will show one way to use soapUI free edition to run tests with dynamic payloads. [Read More]

Monday Aug 13, 2012

Automatically Disable Proxy Service to avoid overloading OSB

A frequently asked question about Oracle Service Bus (OSB) is how to keep OSB from being overloaded once an endpoint becomes temporarily unreachable. The same question arises when an endpoint doesn't respond as quickly as expected and, in the worst case, the situation deteriorates gradually. The concern is the risk of overwhelming OSB if the relevant proxy service continues to receive requests from callers.

From a design point of view, it is always recommended to make your endpoint highly available. With the help of either a load balancer or OSB's failover capability across multiple endpoints, you should be able to make your OSB safer and less vulnerable to endpoint errors. In addition, you should follow OSB best practice by configuring a Work Manager attached to the Proxy Service and enabling throttling control on the Business Service. The details of these two features are not covered in this blog. You can find the details of using Work Managers with OSB, and of enabling and using throttling control, in the documentation.

However, this is not the whole story. Once an unexpected problem occurs on an endpoint, you certainly want to be notified of the incident and take appropriate action, especially when you have only one endpoint or the endpoint systems are external and out of your control.

Now, let's see how you can get notified when the following two common problems occur at an endpoint:

  • the endpoint becomes unreachable
  • the endpoint is still alive but the response becomes slow, resulting in an SLA violation

OSB provides a nice feature to take an unresponsive endpoint URI offline after a communication error. A Service Level Agreement (SLA) rule can be defined to trigger an alert once an endpoint URI is marked as offline. You can also define an SLA on response time: if the response time is longer than the defined value, OSB fires the alert automatically.

Once alerted, it is up to you to decide what actions are appropriate. If the problem on the endpoint persists and cannot be resolved quickly, one possible action is to stop the Proxy Service so that it no longer receives requests until the problem is sorted out. You can manually disable the Proxy Service via the OSB console or WLST to achieve this. This blog shows an alternative approach: disabling the Proxy Service automatically via OSB's Java API when the SLA is violated due to endpoint abnormalities.

The following steps highlight how to implement the solution:

  1. create an SLA rule which triggers an alert if the response time is longer than expected or the endpoint is unreachable
  2. configure alert to be sent to a designated JMS queue
  3. create an "administrative" Proxy Service to listen on the JMS queue
  4. make a Java callout within the "administrative" Proxy Service to disable the relevant Proxy Service once the alert is received from JMS queue.  

 The details are as follows:

Create an alert destination to send alert to a designated JMS queue.

 Create the Business Service and enable the offline endpoint URI. If the endpoint URI is unreachable, OSB changes its status to offline automatically.

Define the SLA. An alert will be triggered if the max response time > 5000 ms OR the endpoint is marked as offline.
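The SLA condition itself is configured in the OSB console, not written in code, but its logic is a simple OR of two conditions. A sketch of that logic in Java, for illustration only (the class SlaRule is hypothetical):

```java
// Hypothetical model of the SLA condition: slow response OR endpoint offline.
class SlaRule {
    static final long MAX_RESPONSE_MS = 5000;

    static boolean shouldAlert(long maxResponseTimeMs, boolean endpointOffline) {
        return endpointOffline || maxResponseTimeMs > MAX_RESPONSE_MS;
    }

    public static void main(String[] args) {
        System.out.println(shouldAlert(10000, false)); // slow endpoint
        System.out.println(shouldAlert(120, true));    // unreachable endpoint
        System.out.println(shouldAlert(120, false));   // healthy endpoint
    }
}
```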

Create the "administrative" Proxy Service listening on the designated JMS queue

The Proxy Service, upon receipt of the alert message, gets the JMS headers set by OSB when it sent the alert to the JMS queue. In this blog, we simply use the alert message's ServiceName header to find out which Business Service fired the alert. The XPath expression to achieve this is:


Based on the Business Service name, you can configure some parameters to find out which Proxy Service is supposed to be disabled. Finally, make a Java Callout to disable that Proxy Service. In this blog, the way to determine the Proxy Service is simply hard coded.

The following is the sample Java code which disables the Proxy Service. It shows how to programmatically enable or disable a Proxy Service via the Java API. The same code can be applied to a Business Service as well.
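If you would rather not hard code the Proxy Service, one option is a simple lookup from Business Service name to Proxy Service path. This is a sketch under assumptions: the class ProxyLookup and the service paths are hypothetical, and the "/"-separated path format is chosen only to match what the sample code splits on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup from the alerting Business Service to the Proxy Service to disable.
class ProxyLookup {
    private final Map<String, String> proxyByBusinessService = new HashMap<>();

    void register(String businessService, String proxyService) {
        proxyByBusinessService.put(businessService, proxyService);
    }

    // Returns null when no Proxy Service is registered for the given Business Service.
    String proxyFor(String businessService) {
        return proxyByBusinessService.get(businessService);
    }

    public static void main(String[] args) {
        ProxyLookup lookup = new ProxyLookup();
        lookup.register("MyProject/BusinessServices/CreditBS", "MyProject/ProxyServices/CreditPS");
        System.out.println(lookup.proxyFor("MyProject/BusinessServices/CreditBS"));
    }
}
```

Returning null for an unknown service lets the caller skip the disable step rather than act on the wrong Proxy Service.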

package com.oracle.ateam;

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.Hashtable;

import com.bea.wli.sb.management.configuration.SessionManagementMBean;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;
import weblogic.management.mbeanservers.domainruntime.DomainRuntimeServiceMBean;
import weblogic.management.jmx.MBeanServerInvocationHandler;

import com.bea.wli.config.Ref;
import com.bea.wli.sb.management.configuration.ProxyServiceConfigurationMBean;

public class ServiceManager {

    // Opens a JMX connection to the Domain Runtime MBean server over t3.
    private static JMXConnector initConnection(String hostname, int port,
            String username, String password)
    throws IOException, MalformedURLException {
        JMXServiceURL serviceURL =
        new JMXServiceURL("t3", hostname, port,
        "/jndi/" + DomainRuntimeServiceMBean.MBEANSERVER_JNDI_NAME);

        Hashtable<String, String> h = new Hashtable<String, String>();
        h.put(Context.SECURITY_PRINCIPAL, username);
        h.put(Context.SECURITY_CREDENTIALS, password);
        h.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");

        return JMXConnectorFactory.connect(serviceURL, h);
    }

    // Converts a "/"-separated service path into an OSB Ref; returns null for an empty path.
    private static Ref convertServiceURI2Ref(String resType, String serviceuri) {
        Ref ref = null;
        if (null == serviceuri || serviceuri.trim().length() == 0) {
            return ref;
        }

        String[] uriData = serviceuri.split("/");
        ref = new Ref(resType, uriData);
        return ref;
    }

    // Enables (status == true) or disables (status == false) the given Proxy Service
    // inside its own OSB session, then activates the session.
    public static void changeProxyServiceStatus(String serviceref, boolean status) throws Exception {
        JMXConnector conn = null;
        SessionManagementMBean sm = null;
        String sessionName = "Session.ByApp." + System.currentTimeMillis();

        try {
            // Sample connection details; replace with your own admin server and credentials.
            conn = initConnection("localhost", 7001, "weblogic", "welcome1");
            MBeanServerConnection mbconn = conn.getMBeanServerConnection();
            DomainRuntimeServiceMBean domainService = (DomainRuntimeServiceMBean) MBeanServerInvocationHandler.
                 newProxyInstance(mbconn, new ObjectName(DomainRuntimeServiceMBean.OBJECT_NAME));

            sm = (SessionManagementMBean) domainService.
                     findService(SessionManagementMBean.NAME,
                             SessionManagementMBean.TYPE, null);

            sm.createSession(sessionName);

            ProxyServiceConfigurationMBean proxyConfigMBean = (ProxyServiceConfigurationMBean) domainService.
             findService(ProxyServiceConfigurationMBean.NAME + "." + sessionName,
                     ProxyServiceConfigurationMBean.TYPE, null);
            Ref ref = convertServiceURI2Ref("ProxyService", serviceref);
            String msg = "";
            if (!status) {
                proxyConfigMBean.disableService(ref);
                msg = "Disabled the Proxy Service : " + serviceref;
            } else {
                proxyConfigMBean.enableService(ref);
                msg = "enabled the Proxy Service : " + serviceref;
            }

            sm.activateSession(sessionName, msg);
        } catch (Exception ex) {
            // Roll back the uncommitted session before rethrowing.
            if (null != sm) {
                try {
                    sm.discardSession(sessionName);
                } catch (Exception e) {
                    System.out.println("discard session error");
                }
            }
            throw ex;
        } finally {
            if (null != conn) {
                try {
                    conn.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

With a 10-second sleep at the endpoint, which exceeds the 5 seconds defined in the SLA, the Proxy Service is automatically disabled.

In another round of testing, killing the endpoint produces the same result: the Proxy Service is automatically disabled, as expected. At the same time, we can also see that OSB automatically marks the endpoint offline.

This blog demonstrates a way to automatically turn off a Proxy Service when the endpoint is unreachable or the SLA is violated. However, you need to be cautious when considering this automation, because there are many factors and criteria to evaluate when deciding whether or when the Proxy Service should be disabled. Alternatively, you can break the above sample into two steps, being alerted and disabling the Proxy Service via the Java API, and add a manual decision-making step in between.

Monday Jul 23, 2012

BPM 11g Task Form Version Considerations

This post discusses version considerations of ADF BPM Task Forms in a runtime context. How to have multiple versions of the same Task Form available for different versions of BPM processes.

[Read More]


This is the blog for the Oracle FMW Architects team fondly known as the A-Team. The A-Team is the central, technical, outbound team as part of the FMW Development organization working with Oracle's largest and most important customers. We support Oracle Sales, Consulting and Support when deep technical and architectural help is needed from Oracle Development.
Primarily this blog is tailored for SOA issues (BPEL, OSB, BPM, Adapters, CEP, B2B, JCAP) that are encountered by our team. Expect real solutions to customer problems, encountered during customer engagements.
We will highlight best practices, workarounds, architectural discussions, and discuss topics that are relevant in the SOA technical space today.

