Using Transaction Timeouts in BPM and SOA
Introduction
Misconfigured timeouts in an enterprise application can lead to a cascade of problems, impacting performance, availability, and data integrity. Essentially, timeouts are safety mechanisms designed to prevent indefinite waits and resource exhaustion. When they are set too high, resources can be held unnecessarily, leading to bottlenecks and deadlocks. When set too low, legitimate operations may be prematurely aborted, resulting in errors, data inconsistencies, and poor user experience.
Let’s examine the timeouts involved in a typical SOA implementation, their relationship and proper configuration.
Timeouts Relationship
It’s important to highlight the interdependency of various application timeouts. They don’t exist in isolation; instead, they form a hierarchy or a nested structure. For a SOA or BPM application to function robustly, these timeouts must have a definitive and logical relationship, with some always being larger than others.
The general principle is that outermost timeouts should be larger than inner timeouts. This allows inner operations a chance to complete or fail gracefully within the scope of the larger encompassing operation. If this relationship is broken, it can lead to a host of confusing and difficult-to-diagnose problems.
General Consequences of Misaligned Timeouts:
- Unreliable Application Behavior: Transactions frequently fail or succeed in unexpected ways.
- Data Inconsistencies: Partial updates or uncommitted data can be left behind, requiring manual cleanup.
- Resource Exhaustion: Connections, threads, and memory can be held indefinitely.
- Performance Degradation: Operations take longer or fail, reducing throughput.
- Debugging Nightmares: Error messages are misleading, pointing to the wrong component or cause.
In general, the following relationships need to be followed:
syncMaxWaitTime < EJB’s transaction timeout <= XA transaction timeout
JTA timeout <= XA transaction timeout < distributed lock timeout
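These orderings can be checked mechanically before a configuration change goes live. The following Python sketch is only an illustration: the Timeouts class, its field names, and check_hierarchy are hypothetical helpers, not part of any Oracle tooling, and all values are assumed to be in seconds. An XA timeout configured as 0 would need to be replaced by the value it actually resolves to before running the check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Timeouts:
    """Timeout values in seconds. Field names are illustrative only."""
    sync_max_wait: int
    ejb_tx: int
    jta: int
    xa_tx: int
    distributed_lock: int


def check_hierarchy(t: Timeouts) -> List[str]:
    """Return one message per relationship from the article that is violated."""
    violations = []
    if not t.sync_max_wait < t.ejb_tx:
        violations.append("syncMaxWaitTime must be < EJB transaction timeout")
    if not t.ejb_tx <= t.xa_tx:
        violations.append("EJB transaction timeout must be <= XA transaction timeout")
    if not t.jta <= t.xa_tx:
        violations.append("JTA timeout must be <= XA transaction timeout")
    if not t.xa_tx < t.distributed_lock:
        violations.append("XA transaction timeout must be < distributed lock timeout")
    return violations


if __name__ == "__main__":
    # 45, 300 and 900 come from the article; the XA value of 300 is an assumption.
    aligned = Timeouts(sync_max_wait=45, ejb_tx=300, jta=300, xa_tx=300,
                       distributed_lock=900)
    print(check_hierarchy(aligned))

    # Raising only the JTA timeout breaks the JTA/XA relationship.
    skewed = Timeouts(sync_max_wait=45, ejb_tx=300, jta=600, xa_tx=300,
                      distributed_lock=900)
    print(check_hierarchy(skewed))
```

Returning a list of messages rather than a single boolean makes it easier to see which relationship is broken when several values are changed at once.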
Let’s look into some of these timeouts in more detail.
BPEL/BPM syncMaxWaitTime
The syncMaxWaitTime property is the maximum time a synchronous BPEL process waits for a response from another BPEL process, a web service, or a breakpoint activity within the process before it times out. Here is an example from the documentation:
“When the client (or another BPEL process) calls the process, the wait (breakpoint) activity is executed. However, since the wait is processed after some time by an asynchronous thread in the background, the executing thread returns to the client side. The client (actually the delivery service) tries to pick up the reply message, but it is not there since the reply activity in the process has not yet executed. Therefore, the client thread waits for the syncMaxWaitTime seconds value. If this time is exceeded, then the client thread returns to the caller with a timeout exception. If the wait is less than the syncMaxWaitTime value, the asynchronous background thread then resumes at the wait and executes the reply. The reply is placed in the HashMap and the waiter (the client thread) is notified. The client thread picks up the reply message and returns.”
The default value for this property is 45 seconds. Thus, if an invocation times out after 45 seconds, this is likely the property that needs to be modified.
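The waiting behavior described in the quoted passage can be imitated with a small Python simulation. This is a toy model, not Oracle code: ReplyStore, call_process, and the conversation identifiers are hypothetical names, a timer thread stands in for the asynchronous background thread, and a dictionary stands in for the HashMap that holds the reply.

```python
import threading


class ReplyStore:
    """Toy stand-in for the map in which the engine places reply messages."""

    def __init__(self):
        self._replies = {}
        self._cond = threading.Condition()

    def put(self, conversation_id, reply):
        # Called by the background thread once the reply activity has run.
        with self._cond:
            self._replies[conversation_id] = reply
            self._cond.notify_all()

    def wait_for(self, conversation_id, sync_max_wait_time):
        # Called by the client thread: block until the reply arrives
        # or sync_max_wait_time seconds have passed.
        with self._cond:
            arrived = self._cond.wait_for(
                lambda: conversation_id in self._replies,
                timeout=sync_max_wait_time,
            )
            if not arrived:
                raise TimeoutError(
                    "no reply within %s seconds" % sync_max_wait_time)
            return self._replies.pop(conversation_id)


def call_process(store, conversation_id, work_seconds, sync_max_wait_time):
    """Start the simulated background work, then wait like the client thread."""
    worker = threading.Timer(work_seconds, store.put,
                             args=(conversation_id, "reply"))
    worker.daemon = True
    worker.start()
    try:
        return store.wait_for(conversation_id, sync_max_wait_time)
    finally:
        worker.cancel()  # no effect if the timer has already fired


if __name__ == "__main__":
    # Work finishes inside the wait window: the reply is returned.
    print(call_process(ReplyStore(), "c1", 0.05, 1.0))
    # Work takes longer than the wait window: the caller sees a timeout.
    try:
        call_process(ReplyStore(), "c2", 0.5, 0.05)
    except TimeoutError as exc:
        print("timed out:", exc)
```

The second call shows the situation this section is about: the background work would have completed, but the client gave up first because the wait limit was shorter than the work.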
From version 12.2.1.4.0 onward, the BPM Service Engine also includes a SyncMaxWaitTime property. Without this property, a call to a BPM process would not honor the configured SyncMaxWaitTime. Therefore, SyncMaxWaitTime needs to be set in two locations: the BPEL Service Engine properties and the BPM Service Engine properties.
For more details on the property, see the documentation: 7.3 Specifying Transaction Timeout Values in Durable Synchronous Processes
BPEL/BPM EJB transaction timeouts
Enterprise JavaBeans (EJB) transaction timeouts are configurable from the WebLogic console by opening the Deployments section and expanding soa-infra. There are a number of EJBs for BPEL and BPM. By default, the EJBs with configurable timeouts are set to 300 seconds (or 600 seconds, in some cases). To modify the timeout for one of these EJBs, click the EJB, select the Configuration tab, and edit the “Transaction Timeout” under the “Enterprise Bean Configuration” section. Note that not every EJB’s timeout can be edited.
EJB timeouts override the JTA timeout. This can cause confusion when the JTA timeout is raised to a large value, say 600 seconds, but transactions still time out after 300 seconds (the default timeout for most EJBs). In that case, check the stack trace of the timeout exception to find out which participating EJB caused the timeout.
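The override rule can be stated in a few lines of Python. This is a minimal sketch under the assumption that an EJB either has its own timeout configured or has none (represented here as None); effective_tx_timeout is a hypothetical helper name, and values are in seconds.

```python
from typing import Optional


def effective_tx_timeout(jta_timeout: int, ejb_timeout: Optional[int]) -> int:
    """Return the timeout that applies to a transaction started by an EJB.

    An EJB-level timeout, when configured, takes precedence over the
    domain-wide JTA timeout, as described in the paragraph above.
    """
    if ejb_timeout is not None:
        return ejb_timeout
    return jta_timeout


if __name__ == "__main__":
    # The scenario from the text: JTA raised to 600, EJB still at 300.
    print(effective_tx_timeout(jta_timeout=600, ejb_timeout=300))
    # No EJB-level value configured: the JTA timeout applies.
    print(effective_tx_timeout(jta_timeout=600, ejb_timeout=None))
```

This is why raising only the JTA timeout often appears to have no effect: the transactions in question are bounded by the EJB value instead.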
For more details, see: SOA 11g and SOA 12c: How to configure transaction timeouts for BPEL (Note 880313.1), Transactions in EJB Applications, and How to Configure Transaction Timeouts for BPM EJBs (Note 1475462.1).
JTA timeout (or global transaction timeout)
The Java Transaction API (JTA) timeout is the global transaction timeout property of a WebLogic domain. It is configurable through the WebLogic console at the domain level. For more details, see: Configuring Transactions
The default JTA timeout value for a new domain is 30 seconds. This value is too small for most system configurations. As processes are deployed and the instance volume increases, a small JTA timeout could lead to a large number of transactions timing out that could have completed if they were given more time. This causes a chain reaction of widespread timeouts in many different classes since:
- These transactions will be rolled back (which has time and resource costs on the DB side).
- The transactions will be re-attempted multiple times.
- The transactions will most likely fail again, since the time required to complete them actually increases under the additional load caused by rollbacks and retries.
For these reasons, we strongly recommend setting the JTA timeout to at least 300 seconds.
JDBC XA transaction timeout
The final timeout configurable within the WebLogic console is the JDBC XA Transaction Timeout (or XA timeout, for short). It is configurable on every data source that participates in the transaction, including application-specific data sources.
As an example, here is an exception that results from a misconfigured timeout on an application-specific JDBC connection:
Caused by: weblogic.transaction.TimedOutException: Transaction timed out after 341 seconds BEA1-1CB4156557436868781B
    at weblogic.jdbc.jta.DataSource.enlist(DataSource.java:1721)
    ... 72 more
    at weblogic.jdbc.jta.DataSource.refreshXAConnAndEnlist(DataSource.java:1629)
    at weblogic.jdbc.jta.DataSource.getConnectionInternal(DataSource.java:499)
    at weblogic.jdbc.jta.DataSource.getConnection(DataSource.java:483)
    at weblogic.jdbc.common.internal.RmiDataSource.getConnectionInternal(RmiDataSource.java:527)
    at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:513)
    at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:506)
    at com.<custom-package>.submeterfulfillmentisbpm.FaultHandler.saveFault(FaultHandler.java:90)
    at com.<custom-package>.submeterfulfillmentisbpm.FaultHandler.handleFault(FaultHandler.java:73)
    at com.collaxa.cube.engine.fp.RecoveryActionJava.execute(RecoveryActionJava.java:74)
    at com.collaxa.cube.engine.fp.BPELRecoverFault.recover(BPELRecoverFault.java:87)
    at oracle.fabric.CubeServiceEngine.recoverFault(CubeServiceEngine.java:1954)
    at oracle.integration.platform.faultpolicy.RecoverFault.recoverAndChain(RecoverFault.java:161)
    at oracle.integration.platform.faultpolicy.RecoverFault.resolveAndRecover(RecoverFault.java:124)
    at oracle.integration.platform.faultpolicy.FaultRecoveryManagerImpl.resolveAndRecover(FaultRecoveryManagerImpl.java:121)
    at com.collaxa.cube.engine.ext.common.FaultPolicyHandler.resolveAndRecover(FaultPolicyHandler.java:90)
    at com.collaxa.cube.engine.ext.common.InvokeHandler.handleException(InvokeHandler.java:697)
    ...
In the above exception, you can see that the engine attempted a fault recovery and probably entered a customer’s custom class to perform it, which ended up making a JDBC call that timed out.
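Spotting the custom class in a long trace can be automated. The Python sketch below scans a trace for the first frame that does not belong to a platform package. The function name, the regular expression, and the prefix list are assumptions made for this illustration; the prefix list in particular would need adjusting for a real environment.

```python
import re
from typing import Optional

# Matches "at <fully.qualified.method>(<source location>)".
FRAME = re.compile(r"\bat\s+([\w.$<>-]+)\(([^)]*)\)")

# Packages treated as platform code (an assumed, non-exhaustive list).
PLATFORM_PREFIXES = ("weblogic.", "oracle.", "com.collaxa.", "java.", "javax.")


def first_custom_frame(trace: str) -> Optional[str]:
    """Return the first frame in the trace that is not platform code, if any."""
    for match in FRAME.finditer(trace):
        method = match.group(1)
        if not method.startswith(PLATFORM_PREFIXES):
            return method
    return None


if __name__ == "__main__":
    # Three frames copied from the exception shown above.
    excerpt = (
        "at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:506) "
        "at com.<custom-package>.submeterfulfillmentisbpm.FaultHandler.saveFault(FaultHandler.java:90) "
        "at com.collaxa.cube.engine.fp.RecoveryActionJava.execute(RecoveryActionJava.java:74)"
    )
    print(first_custom_frame(excerpt))
```

For the excerpt, the function returns the FaultHandler.saveFault frame, which is the custom code that made the JDBC call.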
In some scenarios, you might need to disable the XA Transaction Timeout (to disable it, set XATransactionTimeout to 0). Such is the case when you face a “java.sql.SQLException: XA error: XAResource.XAER_NOTA start() failed on resource '<resource-name>': XAER_NOTA : The XID is not valid.” error.
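As we understand the WebLogic data source documentation, a value of 0 removes the separate XA limit rather than removing all limits: the transaction manager then passes the global JTA timeout to the resource, and no value is passed at all unless “Set XA Transaction Timeout” is enabled. The Python sketch below captures that reading; xa_branch_timeout and its parameter names are hypothetical, and the behavior should be confirmed against the documentation for the WebLogic version in use.

```python
from typing import Optional


def xa_branch_timeout(set_xa_tx_timeout: bool,
                      xa_tx_timeout: int,
                      jta_timeout: int) -> Optional[int]:
    """Return the timeout (seconds) handed to the XA resource, or None.

    Assumed semantics, based on our reading of the documentation:
    - flag disabled: no timeout is passed, the resource default applies;
    - flag enabled and value 0: the global JTA timeout is passed;
    - flag enabled and value > 0: the configured value is passed.
    """
    if not set_xa_tx_timeout:
        return None
    if xa_tx_timeout == 0:
        return jta_timeout
    return xa_tx_timeout


if __name__ == "__main__":
    print(xa_branch_timeout(True, 0, 300))     # follows the JTA timeout
    print(xa_branch_timeout(True, 120, 300))   # shorter than JTA: a misalignment
    print(xa_branch_timeout(False, 120, 300))  # nothing passed to the resource
```

Under this reading, the second case is the risky one: the database branch can expire after 120 seconds while the transaction manager still considers the transaction alive.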
To review the JDBC XA timeout values, open each data source in the WebLogic console and check its Configuration: Transaction page.
For more details, see these articles:
- Unexpected Exception while Enlisting XAConnection java.sql.SQLException:… ‘The XID is not valid’ (Note 1389691.1)
- XAER_NOTA: The XID is not valid, message: null Occurs Randomly (Note 1138284.1)
- Resolving XAER_NOTA : The XID is not valid.
- JDBC Data Source: Configuration: Transaction
Oracle DB distributed lock timeout
The distributed lock timeout is the distributed transaction timeout property on the Oracle DB side. For more details, see: DISTRIBUTED_LOCK_TIMEOUT documentation
The recommended value is 900 seconds. This allows the WebLogic transaction manager to stay in control of transaction timeouts and prevents DB resources from timing out before the JTA global transaction timeout expires.
Transaction timeout stack trace example
Here is an example from the WebLogic .out log that shows a transaction timeout and specifically identifies which resource caused the issue:
<Mar 30, 2015 3:12:42 PM BRT> <Error> <EJB> <BEA-010026> <Exception occurred during commit of transaction Name=[EJB oracle.bpm.bpmn.engine.ejb.impl.BPMNDeliveryBean.handleInvoke(com.collaxa.cube.engine.dispatch.message.invoke.InvokeInstanceMessage)],Xid=BEA1-068AA4A6EB866868781B(298385209),Status=Rolled back. [Reason=oracle.jdbc.xa.OracleXAException],numRepliesOwedMe=0,numRepliesOwedOthers=0,seconds since begin=68,seconds left=60,XAServerResourceInfo[SOADataSource_bpmnqa_domain]=(ServerResourceInfo[SOADataSource_bpmnqa_domain]=(state=rolledback,assigned=soa1),xar=SOADataSource,re-Registered = false),XAServerResourceInfo[IntegracaoDS_bpmnqa_domain]=(ServerResourceInfo[IntegracaoDS_bpmnqa_domain]=(state=rolledback,assigned=soa1),xar=IntegracaoDS,re-Registered = false),SCInfo[bpmnqa_domain+soa1]=(state=rolledback),properties=({weblogic.transaction.name=[EJB oracle.bpm.bpmn.engine.ejb.impl.BPMNDeliveryBean.handleInvoke(com.collaxa.cube.engine.dispatch.message.invoke.InvokeInstanceMessage)], weblogic.jdbc.mp.SOADataSource=SOADataSource-rac0}),local properties=({weblogic.jdbc.jta.IntegracaoDS=[ No XAConnection is attached to this TxInfo ], weblogic.jdbc.jta.SOADataSource=[ No XAConnection is attached to this TxInfo ]}),OwnerTransactionManager=ServerTM[ServerCoordinatorDescriptor=(CoordinatorURL=soa1+<server-ip>:8000+bpmnqa_domain+t3+, XAResources={eis/Apps/Apps, WLStore_bpmnqa_domain_BPMJMSFileStore_auto_1, eis/activemq/Queue, eis/AQ/aqSample, EDNDataSource_bpmnqa_domain, eis/tibjmsDirect/Queue, eis/jbossmq/Queue, SOADataSource-rac0_bpmnqa_domain, eis/aqjms/Topic, eis/webspheremq/Queue, WLStore_bpmnqa_domain_UMSJMSFileStore_auto_1, IntegracaoDS-rac0_bpmnqa_domain, eis/aqjms/Queue, SOADataSource_bpmnqa_domain, WLStore_bpmnqa_domain_AGJMSFileStore_auto_1, eis/sunmq/Queue, eis/pramati/Queue, WLStore_bpmnqa_domain__WLS_soa1, WLStore_bpmnqa_domain_SOAJMSFileStore_auto_1, eis/tibjms/Topic, IntegracaoDS-rac1_bpmnqa_domain, IntegracaoDS_bpmnqa_domain, eis/wls/Queue, eis/tibjmsDirect/Topic, EDNDataSource-rac0_bpmnqa_domain, eis/wls/Topic, eis/tibjms/Queue, SOADataSource-rac1_bpmnqa_domain, WSATGatewayRM_soa1_bpmnqa_domain, EDNDataSource-rac1_bpmnqa_domain, WLStore_bpmnqa_domain_PS6SOAJMSFileStore_auto_1, eis/fioranomq/Topic},NonXAResources={})],CoordinatorURL=soa1+<server-ip>:8000+bpmnqa_domain+t3+): weblogic.transaction.RollbackException: Could not prepare resource 'IntegracaoDS_bpmnqa_domain
    at weblogic.transaction.internal.TransactionImpl.throwRollbackException(TransactionImpl.java:1884)
    at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:376)
    at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:268)
    at weblogic.ejb.container.internal.BaseLocalObject.__WL_postInvokeTxRetry(BaseLocalObject.java:455)
    at weblogic.ejb.container.internal.SessionLocalMethodInvoker.invoke(SessionLocalMethodInvoker.java:52)
    at oracle.bpm.bpmn.engine.ejb.impl.BPMNDeliveryBean_of8dk6_ICubeDeliveryLocalBeanImpl.handleInvoke(Unknown Source)
    at com.collaxa.cube.engine.dispatch.message.invoke.InvokeInstanceMessageHandler.handle(InvokeInstanceMessageHandler.java:30)
    at com.collaxa.cube.engine.dispatch.DispatchHelper.handleMessage(DispatchHelper.java:141)
    at com.collaxa.cube.engine.dispatch.BaseDispatchTask.process(BaseDispatchTask.java:89)
    at com.collaxa.cube.engine.dispatch.BaseDispatchTask.run(BaseDispatchTask.java:66)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
    at com.collaxa.cube.engine.dispatch.Dispatcher$ContextCapturingThreadFactory$2.run(Dispatcher.java:933)
    at java.lang.Thread.run(Thread.java:682)
Caused by: oracle.jdbc.xa.OracleXAException
    at oracle.jdbc.xa.OracleXAResource.checkError(OracleXAResource.java:1657)
    at oracle.jdbc.xa.client.OracleXAResource.prepare(OracleXAResource.java:947)
    at weblogic.jdbc.jta.DataSource.prepare(DataSource.java:1035)
    at weblogic.transaction.internal.XAServerResourceInfo.prepare(XAServerResourceInfo.java:1346)
    at weblogic.transaction.internal.XAServerResourceInfo.prepare(XAServerResourceInfo.java:516)
    at weblogic.transaction.internal.ServerSCInfo$1.run(ServerSCInfo.java:373)
    at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:545)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
Caused by: java.sql.SQLException: ORA-24756: transaction does not exist
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:462)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:397)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:389)
    at oracle.jdbc.driver.T4CTTIfun.processError(T4CTTIfun.java:689)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:481)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:205)
    at oracle.jdbc.driver.T4CTTIOtxen.doOTXEN(T4CTTIOtxen.java:171)
    at oracle.jdbc.driver.T4CXAResource.doTransaction(T4CXAResource.java:773)
    at oracle.jdbc.driver.T4CXAResource.doPrepare(T4CXAResource.java:534)
    at oracle.jdbc.xa.client.OracleXAResource.prepare(OracleXAResource.java:916)
    ... 7 more
.>
In this situation, the transaction attempted to prepare the resource IntegracaoDS_bpmnqa_domain (this was actually a data source), and the operation failed because its timeout had already been exceeded. To resolve this issue, it is recommended to set “Set XA Transaction Timeout” to true and “XA Transaction Timeout” to 0.
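Because these log entries are long, it can help to pull out just the fields that matter for a timeout investigation. The Python sketch below extracts the EJB method that owned the transaction, the elapsed and remaining seconds, and the resource that could not be prepared. The function name and the regular expressions are assumptions based solely on the sample entry in this article; other WebLogic versions or message types may be laid out differently.

```python
import re
from typing import Dict, Optional, Union

Field = Optional[Union[str, int]]


def summarize_tx_failure(log_entry: str) -> Dict[str, Field]:
    """Extract the key timeout-related fields from a BEA-010026 style entry."""

    def find(pattern: str) -> Optional[str]:
        match = re.search(pattern, log_entry)
        return match.group(1) if match else None

    elapsed = find(r"seconds since begin=(\d+)")
    left = find(r"seconds left=(\d+)")
    return {
        "ejb_method": find(r"Name=\[EJB ([\w.$]+)\("),
        "seconds_since_begin": int(elapsed) if elapsed is not None else None,
        "seconds_left": int(left) if left is not None else None,
        "failed_resource": find(r"Could not prepare resource '([\w.$-]+)"),
    }


if __name__ == "__main__":
    # A shortened excerpt of the log entry shown in this section.
    excerpt = (
        "Name=[EJB oracle.bpm.bpmn.engine.ejb.impl.BPMNDeliveryBean.handleInvoke("
        "com.collaxa.cube.engine.dispatch.message.invoke.InvokeInstanceMessage)],"
        "Xid=BEA1-068AA4A6EB866868781B(298385209),Status=Rolled back. "
        "[Reason=oracle.jdbc.xa.OracleXAException],seconds since begin=68,seconds left=60,"
        " weblogic.transaction.RollbackException: Could not prepare resource "
        "'IntegracaoDS_bpmnqa_domain at weblogic.transaction.internal.TransactionImpl"
    )
    for key, value in summarize_tx_failure(excerpt).items():
        print(key, "=", value)
```

Fields that are absent from an entry come back as None, so the same helper can be run over mixed log content without raising errors.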
Conclusion
In conclusion, managing timeouts in an enterprise application is a subtle art. It’s not just about setting individual values, but about understanding their hierarchical relationship and ensuring that outer timeouts are always sufficiently larger than inner ones. This provides a buffer, allowing internal components to fail or complete their work gracefully before the broader operation is aborted. Constant monitoring, profiling, and iterative tuning are essential to strike the right balance and ensure system stability and performance.
Related Articles
- Health Check List when Configuring XA Transactions in Oracle SOA 11g (Note 1201244.1)
- SOA 11g and SOA 12c: How to configure transaction timeouts for BPEL (Note 880313.1)
- Troubleshooting ORA-24756 while Running an XA Program or MSDTC with the Oracle RDBMS (Doc ID 1076242.6)
- Unexpected Exception while Enlisting XAConnection java.sql.SQLException:… ‘The XID is not valid’ (Note 1389691.1)
- XAER_NOTA: The XID is not valid, message: null Occurs Randomly (Note 1138284.1)