Fault Management Framework by Example

Overview

The purpose of the Fault Management Framework is to provide error handling that is external to the SOA composite and does not affect the SOA/BPEL design or runtime. The framework is implemented using policies defined in XML. These policies are reusable across composites and components and can catch both runtime and business faults. Once a fault is caught, the policy defines the action to apply to the SOA instance, such as retry, human intervention, replay scope, rethrow fault, abort, or a custom Java action. When human intervention comes into play, Enterprise Manager (EM) provides a GUI for managing the faulted instance.

When the policies have been defined and bound to composites and/or components, the framework intercepts a fault before the standard fault handler comes into play. For example, if a BPEL process defines standard BPEL fault handling and a fault policy is also bound to that process, the framework intercepts the fault when it occurs, allowing any of the supported actions to be applied to the instance.

The fault policy files are loaded at startup, so any change to them requires a server restart. The files can reside in the same directory as composite.xml or at a location identified by properties in composite.xml:

<property name="oracle.composite.faultPolicyFile">
   oramds:/apps/faultpolicyfiles/fault-policies.xml
</property>

<property name="oracle.composite.faultBindingFile">
   oramds:/apps/faultpolicyfiles/fault-bindings.xml
</property>


When you use these properties in composite.xml, the files can have names other than the defaults (fault-policies.xml and fault-bindings.xml).
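For tooling or troubleshooting it can help to resolve these locations programmatically. The following is a minimal Java sketch, not part of SOA Suite: the class FaultPolicyLocator and its locate method are hypothetical, and the fallback to fault-policies.xml and fault-bindings.xml when a property is absent is an assumption based on the description above. It reads the two properties from a composite.xml string with the JDK's DOM parser.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not an Oracle API): reports where the policy files are expected to be.
final class FaultPolicyLocator {
    static final String POLICY_PROP = "oracle.composite.faultPolicyFile";
    static final String BINDING_PROP = "oracle.composite.faultBindingFile";

    // Returns a map with the keys "policies" and "bindings".
    static Map<String, String> locate(String compositeXml) {
        Map<String, String> props = new HashMap<>();
        NodeList nodes = parse(compositeXml).getElementsByTagNameNS("*", "property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            // The value may be wrapped in whitespace, as in the snippet above.
            props.put(p.getAttribute("name"), p.getTextContent().trim());
        }
        // Assumed fallback: the default file names, next to composite.xml.
        Map<String, String> result = new HashMap<>();
        result.put("policies", props.getOrDefault(POLICY_PROP, "fault-policies.xml"));
        result.put("bindings", props.getOrDefault(BINDING_PROP, "fault-bindings.xml"));
        return result;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```

A composite.xml that sets only oracle.composite.faultPolicyFile would resolve to the oramds: path for the policies and to fault-bindings.xml for the bindings.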

Fault Policies (fault-policies.xml / fault-policies.xsd)

Two XML policy files are required to set up the Fault Management Framework in SOA. The first is fault-policies.xml, which contains one or more fault policy definitions, fault definitions (which can also include conditions), and actions.

NOTE: Pay close attention to the case of element names in the policy files. If you don't have an editor that enforces the schema, it's very easy to write an element like <action> where <Action> is required.
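Because a mis-cased element is still well-formed XML, nothing complains unless the schema is enforced. A small pre-deployment check can catch the most common slips. This is a hypothetical Java sketch using the JDK's DOM parser; the parent-to-child table covers only the elements shown in the snippets in this post and is not the full fault-policies.xsd.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical checker: flags child elements whose names differ from the expected ones only by case.
final class ElementCaseChecker {
    // Expected children per parent, taken from this post's snippets (assumption, not the full schema).
    static final Map<String, Set<String>> EXPECTED = Map.of(
        "faultPolicy", Set.of("Conditions", "Actions"),
        "Conditions", Set.of("faultName"),
        "faultName", Set.of("condition"),
        "condition", Set.of("test", "action"),
        "Actions", Set.of("Action"));

    static List<String> check(String xml) {
        List<String> problems = new ArrayList<>();
        walk(parse(xml).getDocumentElement(), problems);
        return problems;
    }

    private static void walk(Element parent, List<String> problems) {
        Set<String> expected = EXPECTED.getOrDefault(parent.getLocalName(), Set.of());
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element child = (Element) n;
            String name = child.getLocalName();
            for (String want : expected) {
                // Same letters, different case: almost certainly a typo.
                if (!want.equals(name) && want.equalsIgnoreCase(name)) {
                    problems.add("<" + name + "> under <" + parent.getLocalName() + "> should be <" + want + ">");
                }
            }
            walk(child, problems);
        }
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse policy XML", e);
        }
    }
}
```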

<faultPolicy> Element

To make the many faults an enterprise deployment can encounter easier to manage, you can group your faults logically into multiple fault policies. Each policy is defined using the <faultPolicy> element, and each policy definition must contain a unique policy id:

<faultPolicy version="0.0.1" id="FusionMidFaults"
 xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns="http://schemas.oracle.com/bpel/faultpolicy"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
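Because each policy id must be unique, a quick check before deployment can catch duplicates (a reader reports exactly this kind of slip in the comments below). A hypothetical Java sketch using the JDK DOM parser, assuming all <faultPolicy> elements live in one XML document:

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical check (not an Oracle tool): reports faultPolicy ids that appear more than once.
final class PolicyIdCheck {
    static Set<String> duplicateIds(String faultPoliciesXml) {
        NodeList policies = parse(faultPoliciesXml).getElementsByTagNameNS("*", "faultPolicy");
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (int i = 0; i < policies.getLength(); i++) {
            String id = ((Element) policies.item(i)).getAttribute("id");
            // Set.add returns false when the id was already seen.
            if (!seen.add(id)) duplicates.add(id);
        }
        return duplicates;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse policy XML", e);
        }
    }
}
```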

<faultName> Element

Within the <faultPolicy> element, you define all the faults associated with the policy, wrapped in a <Conditions> element. Each <faultName> definition must identify a fault by its QName (e.g., bpelx:remoteFault) and reference an associated action. You can further refine the fault with an XPath expression that tests for values (e.g., $fault.code="3220"):

<Conditions>
   <faultName
      xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
      name="bpelx:remoteFault">
      <condition>
         <test>$fault.code="3220"</test>
         <action ref="ora-retry"/>
      </condition>
   </faultName>

...

</Conditions>
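The <test> string is an XPath expression, and in XPath 1.0 a variable name may contain a '.', so $fault.code is simply a reference to a variable named "fault.code". The following Java sketch evaluates such a test with the JDK's javax.xml.xpath API. How the SOA engine actually binds $fault is not described in this post, so the Map of fault fields is a hypothetical stand-in; the sketch only shows how a test like the one above evaluates to true or false.

```java
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Illustration only: the binding of $fault.* below is an assumption, not the SOA engine's mechanism.
final class ConditionTestSketch {
    static boolean matches(String testExpression, Map<String, String> faultFields) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "$fault.code" is the XPath variable whose local name is "fault.code".
            xpath.setXPathVariableResolver((QName name) -> {
                String value = faultFields.get(name.getLocalPart());
                if (value == null) throw new IllegalArgumentException("Unbound variable: " + name);
                return value;
            });
            // An empty document serves as the context node; the test does not navigate any XML.
            Document empty = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            return (Boolean) xpath.evaluate(testExpression, empty, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate test: " + testExpression, e);
        }
    }
}
```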

<Action> Element

Following the </Conditions> element, you define the individual actions associated with the policy, wrapped in an <Actions> element. Each action definition must contain a unique action id and an action specification:

...

   <Actions>

      <Action id="ora-retry">
         <retry>
            <retryCount>3</retryCount>
            <retryInterval>5</retryInterval>
            <exponentialBackoff/>
            <retryFailureAction ref="ora-terminate"/>
         </retry>
      </Action>

      <Action id="ora-terminate">
         <abort/>
      </Action>

   </Actions>

</faultPolicy>
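To see what the ora-retry action above means in time, here is a Java sketch that lists the wait before each retry. It assumes that <retryInterval> is in seconds and that <exponentialBackoff/> doubles the interval after each attempt; check both assumptions against the documentation for your release. The class RetrySchedule is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the <retry> action: the wait (in seconds) before each retry attempt.
final class RetrySchedule {
    static List<Long> delays(int retryCount, long retryIntervalSeconds, boolean exponentialBackoff) {
        List<Long> waits = new ArrayList<>();
        long next = retryIntervalSeconds;
        for (int attempt = 1; attempt <= retryCount; attempt++) {
            waits.add(next);
            // Assumed backoff rule: double the interval for the following attempt.
            if (exponentialBackoff) next *= 2;
        }
        return waits;
    }

    public static void main(String[] args) {
        // Values from ora-retry above: 3 retries, 5-second interval, exponential backoff.
        System.out.println(delays(3, 5, true) + " then retryFailureAction -> ora-terminate");
    }
}
```

Under these assumptions the main method prints [5, 10, 20] followed by the failure action, i.e. roughly 35 seconds of retrying before ora-terminate aborts the instance.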

One thing you will notice in the product documentation is that the action ids follow a naming convention in which everything is prefixed with “ora-”. It is a common misunderstanding that these ids are reserved; in reality you can use any name you wish. The action specification elements define what an action does, and the ids are only used as references. For example, in the snippet above, ora-retry contains a retryFailureAction that references ora-terminate, another action definition.
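Since ids are only references, nothing stops a ref from pointing at an action that does not exist. Here is a hypothetical Java sketch that lists, for each <faultPolicy>, every ref attribute (on <action>, <retryFailureAction>, or any other element) that does not match an <Action id> in the same policy. It assumes action ids are scoped to their enclosing policy, as in the snippets above.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical check (not an Oracle tool): finds refs with no matching <Action id> in their policy.
final class ActionRefCheck {
    static Set<String> danglingRefs(String xml) {
        Set<String> dangling = new TreeSet<>();
        NodeList policies = parse(xml).getElementsByTagNameNS("*", "faultPolicy");
        for (int p = 0; p < policies.getLength(); p++) {
            Element policy = (Element) policies.item(p);
            // Collect the action ids declared in this policy.
            Set<String> ids = new HashSet<>();
            NodeList actions = policy.getElementsByTagNameNS("*", "Action");
            for (int i = 0; i < actions.getLength(); i++) {
                ids.add(((Element) actions.item(i)).getAttribute("id"));
            }
            // Check every element in this policy that carries a ref attribute.
            NodeList all = policy.getElementsByTagNameNS("*", "*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (e.hasAttribute("ref") && !ids.contains(e.getAttribute("ref"))) {
                    dangling.add(policy.getAttribute("id") + ": " + e.getAttribute("ref"));
                }
            }
        }
        return dangling;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse policy XML", e);
        }
    }
}
```

For example, deleting the ora-terminate action from the snippet above would make the check report the retryFailureAction reference as dangling.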

Fault Bindings (fault-bindings.xml / fault-bindings.xsd)

The second policy file required by the Fault Management Framework is fault-bindings.xml. This file binds (or maps) the policies defined in fault-policies.xml to levels within the composite. These levels include:

  • Composite Application
  • Component
    • Reference
    • BPEL Process
    • Mediator

Composite Application Binding

When binding to a composite application, use the <composite> element with an attribute called faultPolicy.  The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml:

  <composite faultPolicy="FusionMidFaults"/>

Reference Binding

When binding to a reference, use the <reference> element with an attribute called faultPolicy.  The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml.  You will also need to specify a <name> or <portType> element:

  <reference faultPolicy="FusionMidFaults">
    <name>creditRatingService</name>
    <portType xmlns:credit="http://services.otn.com">credit:CreditRatingService</portType>
  </reference>

  <reference faultPolicy="FusionMidFaults">
    <name>CreditApprovalService</name>
  </reference>

BPEL Process Binding

When binding to a BPEL Process, use the <component> element with an attribute called faultPolicy.  The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml.  You will also need to specify a <name> element containing the name of the BPEL process:

  <component faultPolicy="FusionMidFaults">
    <name>HelloWorld</name>
    <name>ShippingComponent</name>
  </component>

Mediator Binding

When binding to a mediator, use the <component> element with an attribute called faultPolicy. The value of the faultPolicy attribute must match a policy id defined in the fault-policies.xml. You will also need to specify a <name> element containing the name of the mediator:

  <component faultPolicy="FusionMidFaults">
    <name>RouteToShippingMediator</name>
  </component>
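A binding that names a policy id that does not exist is an easy mistake to make (one reader found such a case in the example, see the comments below). A hypothetical Java sketch that cross-checks the two files, assuming the element and attribute names shown in this post: <composite>, <component>, and <reference>, each with a faultPolicy attribute, where a component needs a <name> and a reference needs a <name> or <portType>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical cross-check of fault-bindings.xml against fault-policies.xml (not an Oracle tool).
final class BindingCheck {
    static List<String> problems(String policiesXml, String bindingsXml) {
        // Every policy id declared in fault-policies.xml.
        Set<String> policyIds = new HashSet<>();
        NodeList policies = parse(policiesXml).getElementsByTagNameNS("*", "faultPolicy");
        for (int i = 0; i < policies.getLength(); i++) {
            policyIds.add(((Element) policies.item(i)).getAttribute("id"));
        }

        List<String> problems = new ArrayList<>();
        Document bindings = parse(bindingsXml);
        for (String level : List.of("composite", "component", "reference")) {
            NodeList nodes = bindings.getElementsByTagNameNS("*", level);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element binding = (Element) nodes.item(i);
                String policy = binding.getAttribute("faultPolicy");
                if (!policyIds.contains(policy)) {
                    problems.add("<" + level + "> refers to unknown policy '" + policy + "'");
                }
                boolean hasName = binding.getElementsByTagNameNS("*", "name").getLength() > 0;
                boolean hasPortType = binding.getElementsByTagNameNS("*", "portType").getLength() > 0;
                // Components need a <name>; references need a <name> or a <portType>.
                if ((level.equals("component") && !hasName)
                        || (level.equals("reference") && !hasName && !hasPortType)) {
                    problems.add("<" + level + " faultPolicy=\"" + policy + "\"> has no target");
                }
            }
        }
        return problems;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```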

The Example (bpel-300-FaultHandlingFramework_rev1.0.jar)

I'm a big proponent of following up written text with a working example to help readers better understand what the text is trying to convey. The example I put together here demonstrates all of the actions the Fault Management Framework provides out of the box, including custom Java, as well as something I call throw vs. reply.

To use the example, save sca_bpel-300-FaultHandlingFramework_rev1.0.zip somewhere on your file system and extract the .jar. Then, in JDeveloper, create a new SOA project and import the .jar using the “SOA Archive into SOA Project” option.

You should see something like the following:


At this point, open the fault-policies.xml and fault-bindings.xml files and review their contents. You will see that there are multiple policies defined and that those policies are bound to various levels within the composite application. Once you have reviewed the policy files, deploy the composite to a running SOA server. Then open EM, select the deployed composite, and navigate to the Test page. You will see that there is only one value to provide, called faultAction.

To test the various scenarios, the following are the values you can provide for the faultAction:

  • ora-retry

  • ora-human-intervention

  • ora-terminate

  • ora-rethrow-fault

  • ora-replay-scope

  • ora-java

  • mediator

  • throw-vs-reply (see more details below)

  • reply-with-fault (see more details below)

After entering one of the faultAction values above and clicking the “Test Web Service” button, review the instance by clicking the “Launch Flow Trace” button. You can examine the Trace and Audit Trails to see how the Fault Management Framework behaves. Once you have a better feel for what's going on, try changing the policies, redeploy the composite, restart the server, and run the test(s) again to see how your updates compare to what I provided.

throw-vs-reply / reply-with-fault

The throw-vs-reply and reply-with-fault faultAction values require a bit of explanation. I ran across a situation where a “poor design decision” caused confusion about the Fault Management Framework. The scenario was as follows: a BPEL process invokes another, synchronous BPEL process. The second BPEL process contained an asynchronous flow: it invoked a JMS adapter followed by a receive. Even though the BPEL process is defined as synchronous and the response from the JMS adapter was almost immediate, the JMS adapter caused a dehydration, so a new thread picked up the JMS response to deliver the results. If the new thread throws a fault, the BPEL engine handles it, because the original thread is gone due to the dehydration (i.e., the correlation between the first thread and the Fault Management Framework is lost). Furthermore, the “invoke” in the originating BPEL process times out because no response or fault flows back to it. For this scenario there is an option for regaining the correlation: instead of “throwing” the exception, you can “reply” with the exception as the payload.

To keep my example simple, I did not implement the JMS adapter scenario. Instead, I used a sleep in FaultGeneratorBPELProcess to force a dehydration, which surfaces the same behavior. To see the “point of confusion”, enter throw-vs-reply as the faultAction value and watch the timeout exception happen. To see the “solution/expected behavior”, enter reply-with-fault as the faultAction value. I would also recommend looking at the logic in FaultGeneratorBPELProcess to see what it does for throw-vs-reply and reply-with-fault.
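The lost correlation can be illustrated outside of SOA with a plain Java analogy. This is not how the BPEL engine is implemented: a CompletableFuture stands in for the caller's pending synchronous reply, and a separate worker thread stands in for the thread that resumes the instance after dehydration. A fault thrown on that worker never reaches the waiting caller, which times out, while a reply that carries the fault as its payload does arrive. The class ThrowVsReply is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Analogy only: a synchronous caller waiting for a reply that is produced on another thread.
final class ThrowVsReply {
    // The reply message: carries either a result or a fault as its payload.
    static final class Reply {
        final String result;
        final String fault;
        Reply(String result, String fault) { this.result = result; this.fault = fault; }
    }

    static String invoke(boolean replyWithFault, long timeoutMillis) {
        CompletableFuture<Reply> pendingReply = new CompletableFuture<>();
        ExecutorService resumedThread = Executors.newSingleThreadExecutor();
        try {
            resumedThread.submit(() -> {
                if (replyWithFault) {
                    // reply-with-fault: the fault travels back inside the reply.
                    pendingReply.complete(new Reply(null, "remoteFault"));
                } else {
                    // throw-vs-reply: the exception stays on this thread; nobody completes the reply.
                    throw new IllegalStateException("remoteFault");
                }
                return null;
            });
            Reply reply = pendingReply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return reply.fault != null ? "fault: " + reply.fault : "result: " + reply.result;
        } catch (TimeoutException e) {
            return "timeout";
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            resumedThread.shutdownNow();
        }
    }
}
```

invoke(false, ...) corresponds to the throw-vs-reply test and returns "timeout"; invoke(true, ...) corresponds to reply-with-fault and returns the fault to the caller.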

Hopefully this provides some valuable insight into the Fault Management Framework and its capabilities.

Greg


Comments:

Is there any documentation about using 'conditions' how could we access elements from the detail payload to check for condition

Posted by guest on September 01, 2011 at 07:37 AM PDT #

A cool thing is that this kind of policy can be reused for other SOA Components, like Mediators. In this case though, it is important to keep in mind that the policy will work only if the routing rule of the Mediator is parallel (aka deferred), as mentioned in the documentation at http://download.oracle.com/docs/cd/E17904_01/integration.1111/e10224/med_faulthandling.htm#CHDJAADA, see the Note in paragraph 22.1.1.

Posted by olivier.lediouris on September 19, 2011 at 09:27 AM PDT #

Regarding the "Is there any documentation about using 'conditions' how could we access elements from the detail payload to check for condition" question: there isn't any detailed documentation because the string you provide in the <test> element is an XPath expression, and XPath is already well documented. The only thing to remember is that you access the fault payload like this: $fault.[some path] (e.g., $fault.code="2011").

Posted by Greg on September 29, 2011 at 03:15 AM PDT #

Do you have to use a Java Action within the fault policy to call a composite that could be used for error handling? For example I could configure a composite that generates an email, writes to a log, etc instead of having my java class do that.

Is it possible to call a specific composite from the fault policy?

Any info would be appreciated.

Thanks,

--M

Posted by Mike G on October 04, 2011 at 05:53 AM PDT #

The Fault Management Framework (FMF) doesn't have an action that will invoke a SOA component directly (e.g., BPEL Process). You could either write your own Java Action Fault Policy that invokes the BPEL Process using Java from within the FMF or rethrow the fault from the FMF and have the standard BPEL fault handlers invoke the error handling BPEL Process.

Posted by Greg on October 04, 2011 at 10:26 AM PDT #

Thanks for the info Greg. I really appreciate it.

Couple more questions if you don't mind:

1. Can you raise events from FMF? Or does that go back to the composite question I asked earlier? I was just thinking that instead of calling a composite if I could raise an event from FMF it would have the same effect.

2. Do you know, when calling a composite from Java, what permissions the calling profile needs in order to make the call?

For instance when I am building my Hashtable:

import java.util.Hashtable;
import javax.naming.Context;

Hashtable<String, String> jndiProps = new Hashtable<>();
jndiProps.put(Context.PROVIDER_URL, "t3://localhost:80/soa-infra/");
jndiProps.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
jndiProps.put(Context.SECURITY_PRINCIPAL, "weblogicJavaUser");
jndiProps.put(Context.SECURITY_CREDENTIALS, "password");
jndiProps.put("dedicated.connection", "true");

Thanks Greg!

--M

Posted by Mike G on October 12, 2011 at 04:39 AM PDT #

Mike, when you say "raise events" from FMF, are you talking about raising events through the Event Delivery Network feature of 11g? Assuming that's what you are referring to, this is not an available action, but it seems like it would make sense to provide something like that. I'll pass this along to Product Management and see where it goes ;)

But again, similar to my previous comment, you could invoke a mediator component that will raise the event using the Java Action Fault Policy.

With regard to the permissions to use in your example, obviously users in the administrator group will work. I have to admit that every example of this I've seen uses the weblogic/admin user. It's easy enough to test whether or not other groups would work for your weblogicJavaUser (e.g., OracleSystemGroup).

Posted by Greg on October 14, 2011 at 04:16 AM PDT #

Thanks Greg!

Sorry about that; I should have been more specific. I was referring to the Event Delivery Network. Thanks for passing that along.

All of the examples I found were using the weblogic user as well. I am having some authentication issues with the user(s) I have been trying. Our admin doesn't want to give me the weblogic password, so I had him create another WebLogic account with admin rights, and it is still not working. That is why I was asking: I am trying to figure out why it keeps saying "Login failed for unknown reason".

Thanks again for all the help. I will proceed forward with the Java Action path.

Posted by Mike G on October 14, 2011 at 07:31 AM PDT #

Hi Greg,

it seems as if the fault-bindings.xml file is referencing a non-existent fault policy "ora-rethrow-fault-sample", while the fault policy "ora-terminate-sample" occurs twice (lines 82, 113) in fault-policies.xml.
I don't think this is intended.

Kind regards
Volker

Posted by voreiche on April 29, 2012 at 04:19 AM PDT #

Hey Volker,

Good catch! It's great to see that someone is looking closely at the example. Where there are things like this in an example and it's not behaving as one would expect, digging into why it's not working helps the learning experience ;) Make the appropriate fix(es) and retest :D

Posted by Greg on May 03, 2012 at 08:42 AM PDT #

hi

Can anyone tell me how to reference two actions at a time in the fault-policies.xml file? To make it clear, I want to call both the rethrow action and the human intervention action after retry. Please refer to the fault policy file below:

<?xml version="1.0" encoding="windows-1252" ?>
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="BpelFaultMechanism"
               xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns="http://schemas.oracle.com/bpel/faultpolicy"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Conditions>
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
                 name="bpelx:bindingFault">
        <condition>
          <action ref="retry-action"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <Action id="retry-action">
        <retry>
          <retryCount>4</retryCount>
          <retryInterval>2</retryInterval>
          <!-- I need to call human intervention as well -->
          <retryFailureAction ref="rethrow-action"/>
        </retry>
      </Action>
      <Action id="human-intervention-action">
        <humanIntervention/>
      </Action>
      <Action id="rethrow-action">
        <rethrowFault/>
      </Action>
    </Actions>
  </faultPolicy>
</faultPolicies>

please reply asap

Posted by guest on September 11, 2012 at 04:17 AM PDT #

Can you expand on why you would need to do both a rethrow and human intervention? I don't see how this can work, because the rethrow sends control back to the instance to handle the exception, while human intervention provides access to the faulted instance for manual actions, etc.

Posted by Greg on September 14, 2012 at 02:46 AM PDT #

Is it possible to make retryCount and retryInterval variable?
If I build policies to be used across most of my composites, having a good fault-policies file, I run into the problem that in some situations I would like 5 seconds for retryCount and in others 10 seconds. The same applies to all the parameters that configure the retry, at least the timing ones.
Right now I have to deploy the fault-policies file with the composite, so the files are only reusable as a format.

Posted by guest on September 24, 2012 at 05:17 AM PDT #

You can add a policy as fine grained as a reference. What is the granularity of your fault-bindings? I'm trying to understand your scenario that would require the policies to leverage variables. Also, where would you define the variables?

Posted by Greg on October 08, 2012 at 10:59 AM PDT #

The context:
This page appears to be the most detailed explanation of how to use the fault management fault-policies.xml file in SOA Suite 11g. http://docs.oracle.com/cd/E23943_01/dev.1111/e10224/bp_faults.htm#autoId14

The section titled "Creating a Fault Policy File for Automated Fault Recovery" describes the <Conditions><faultName><condition><test> XML structure and gives an example of using $fault.code.

An actual SOAP fault looks like this:
<soap:Fault>
   <faultcode>soap:Server</faultcode>
   <faultstring>com.viasat.wildblue.common.exception.WebServiceException</faultstring>
   <detail>
      <ns1:MyCustomDetail xmlns:ns1="http://my.custom.fault.detail.type.namespace.uri">
         <ns1:someDetailAttribute></ns1:someDetailAttribute>
      </ns1:MyCustomDetail>
   </detail>
</soap:Fault>
Assuming the wsdl defined the message type used as the fault as follows:

<wsdl:message name="MyWebServiceException">
<wsdl:part name="parameterPart" element="ns2:MyCustomDetail"/>
</wsdl:message>

There is some speculation that the part-name can be referenced relative to the $fault variable that is in scope when the SOA Suite fault management system evaluates the fault-policies.xml "tests".

For instance, it isn't clear whether the following is a valid test expression, but some forum posts and information in third-party Oracle SOA Suite books lead to the conclusion that it would be valid:
<test>$fault.parameterPart/someDetailAttribute="doh"</test>

Also, the information at docs.oracle.com implies that $fault.code maps to the SOAP fault's "faultcode" element, but it does not say this clearly or directly.

Also, the docs.oracle.com page says...
• Each condition has one test section (an XPath expression) and one action section.
• The test section (XPath expression) is evaluated for the fault variable available in the fault.
... which suggests that XPath functions might work in a test like...
<test>$fault.code[contains(.,'doh')]</test>
This is also suggested in this forum post: https://forums.oracle.com/forums/thread.jspa?threadID=909716

The questions:
1. What attribute of $fault would be mapped to a SOAP fault's faultstring element? $fault.string ?? $fault.summary ??
2. Are xpath functions like contains() officially supported?
3. Where is the reference documentation that explains how the $fault variable is mapped to a SOAP fault (since that would be the most likely case in a SOA orchestration tool)?

Posted by Dave Willard on January 08, 2013 at 01:34 PM PST #

cool stuff!!! thanks for explanation and post

Posted by Phani Kumar on March 31, 2013 at 12:07 AM PDT #

I have used fault policies in an async service to retry a sync service invoke 2 times within 30 seconds. It successfully retried two times, plus the 1 try of the normal flow, but I have seen a fourth try after 10 hours. I don't understand where this fourth try comes from. In my fault policy I am using a rethrow after the two retries.

Posted by guest on April 25, 2013 at 08:13 AM PDT #

About


This is the blog for the Oracle FMW Architects team fondly known as the A-Team. The A-Team is the central, technical, outbound team as part of the FMW Development organization working with Oracle's largest and most important customers. We support Oracle Sales, Consulting and Support when deep technical and architectural help is needed from Oracle Development.
Primarily this blog is tailored for SOA issues (BPEL, OSB, BPM, Adapters, CEP, B2B, JCAP) that are encountered by our team. Expect real solutions to customer problems encountered during customer engagements.
We will highlight best practices, workarounds, architectural discussions, and discuss topics that are relevant in the SOA technical space today.
