Monday Dec 30, 2013

Going Native with JCA Adapters

Formatting JCA Adapter Binary Contents

Sometimes you just need to go native and play with binary data rather than XML.  This commonly occurs when using JCA adapters, for example when the file to be written is in binary format or when the TCP messages written by the Socket Adapter are binary.  Although the adapter has no problem converting Base64 data into raw binary, it is a little tricky to get the data into Base64 format in the first place, so this blog entry explains how.

Adapter Creation

When creating most adapters (application & DB being the exceptions) you have the option of choosing the message format.  By making the message format “opaque” you are telling the adapter wizard that the message data will be provided as a base-64 encoded string and the adapter will convert this to binary and deliver it.

This results in a WSDL message defined as shown below:

<wsdl:types>
<schema targetNamespace="http://xmlns.oracle.com/pcbpel/adapter/opaque/"
        xmlns="http://www.w3.org/2001/XMLSchema" >
  <element name="opaqueElement" type="base64Binary" />
</schema>
</wsdl:types>
<wsdl:message name="Write_msg">
    <wsdl:part name="opaque" element="opaque:opaqueElement"/>
</wsdl:message>

The Challenge

The challenge now is to convert our data into a Base64-encoded string.  For this we have to turn to the service bus and MFL.

Within the service bus we use the MFL editor to define the format of the binary data.  In our example we will have variable length strings that start with a 1 byte length field as well as 32-bit integers and 64-bit floating point numbers.

The example below shows a sample MFL file to describe the above data structure:

<?xml version='1.0' encoding='windows-1252'?>
<!DOCTYPE MessageFormat SYSTEM 'mfl.dtd'>
<!--   Enter description of the message format here.   -->
<MessageFormat name='BinaryMessageFormat' version='2.02'>
    <FieldFormat name='stringField1' type='String' delimOptional='y' codepage='UTF-8'>
        <LenField type='UTinyInt'/>
    </FieldFormat>
    <FieldFormat name='intField' type='LittleEndian4' delimOptional='y'/>
    <FieldFormat name='doubleField' type='LittleEndianDouble' delimOptional='y'/>
    <FieldFormat name='stringField2' type='String' delimOptional='y' codepage='UTF-8'>
        <LenField type='UTinyInt'/>
    </FieldFormat>
</MessageFormat>

Note that we can define the endianness of the multi-byte numbers; in this case they are specified as little-endian (Intel format).

I also created an XML version of the MFL that can be used in interfaces.

The XML version can then be imported into a WSDL document to create a web service.

Full Steam Ahead

We now have all the pieces we need to convert XML to binary and deliver it via an adapter using the process shown below:

  • We receive the XML request; in the sample code this arrives via a web service.
  • We then convert the request data into MFL format XML using an XQuery and store the result in a variable (mflVar).
  • We then convert the MFL formatted XML into binary data (internally this is held as a java byte array) and store the result in a variable (binVar).
  • We then convert the byte array to a base-64 string using javax.xml.bind.DatatypeConverter.printBase64Binary and store the result in a variable (base64Var).
  • Finally we replace the original $body contents with the output of an XQuery that matches the adapter expected XML format.

The diagram below shows the OSB pipeline that implements the above.
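The steps above can be sketched in plain Java.  The hypothetical helper below builds the binary record described by the MFL (1-byte length-prefixed UTF-8 strings, a little-endian int and a little-endian double) and Base64-encodes it; this is an illustration of what the pipeline produces, not code from the sample project.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class OpaquePayloadSketch {
    // Build the binary record per the MFL layout above.
    public static byte[] encode(String s1, int i, double d, String s2) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, s1);
        out.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(i).array(), 0, 4);
        out.write(ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(d).array(), 0, 8);
        writeString(out, s2);
        return out.toByteArray();
    }

    // A UTF-8 string preceded by a 1-byte (UTinyInt) length field, so max 255 bytes.
    private static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b.length); // write(int) emits the low byte only
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        byte[] bin = encode("First String", 16909060, 1.5, "Second String");
        System.out.println(Base64.getEncoder().encodeToString(bin));
    }
}
```

The Base64 string printed by main is what the opaque adapter element expects as its content.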

A Wrinkle

Unfortunately we can only call static Java methods that reside in a jar file imported into the service bus, so we have to provide a wrapper for the printBase64Binary call.  The Java code below provides this wrapper:

package antony.blog;

import javax.xml.bind.DatatypeConverter;

public class Base64Encoder {
    public static String base64encode(byte[] content) {
        return DatatypeConverter.printBase64Binary(content);
    }
    public static byte[] base64decode(String content) {
        return DatatypeConverter.parseBase64Binary(content);
    }
}
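One wrinkle for modern readers: javax.xml.bind was removed from the JDK in Java 11, so on newer JVMs an equivalent wrapper can be written against java.util.Base64 (available since Java 8).  This is a sketch of that alternative, not what the original project used:

```java
import java.util.Base64;

public class Base64Encoder {
    public static String base64encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }
    public static byte[] base64decode(String content) {
        return Base64.getDecoder().decode(content);
    }
    public static void main(String[] args) {
        System.out.println(base64encode("abc".getBytes())); // round-trips via base64decode
    }
}
```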

Wrapping Up

Sample code is available here and consists of the following projects:

  • BinaryAdapter – JDeveloper SOA Project that defines the JCA File Adapter
  • OSBUtils – JDeveloper Java Project that defines the Java wrapper for DatatypeConverter
  • BinaryFileWriter – Eclipse OSB Project that includes everything needed to try out the steps in this blog.

The OSB project needs to be customized to have the logical directory name point to something sensible.  The project can be tested using the normal OSB console test screen.

The following sample input (note that 16909060 is 0x01020304):

<bin:OutputMessage xmlns:bin="http://www.example.org/BinarySchema">
    <bin:stringField1>First String</bin:stringField1>
    <bin:intField>16909060</bin:intField>
    <bin:doubleField>1.5</bin:doubleField>
    <bin:stringField2>Second String</bin:stringField2>
</bin:OutputMessage>

Generates the following binary data file, displayed using "hexdump -C".  You can pick out the little-endian int (04 03 02 01), the little-endian double (00 00 00 00 00 00 f8 3f), and the two strings, each preceded by its 1-byte length (0c and 0d).

$ hexdump -C 2.bin
00000000  0c 46 69 72 73 74 20 53  74 72 69 6e 67 04 03 02  |.First String...|
00000010  01 00 00 00 00 00 00 f8  3f 0d 53 65 63 6f 6e 64  |........?.Second|
00000020  20 53 74 72 69 6e 67                              | String|

Although we used a web service writing through to a file adapter, we could equally well have used the socket adapter to send the data to a TCP endpoint.  Similarly, the source of the data could be anything.  The same principle can be applied to decode binary data: just reverse the steps and use the Java method parseBase64Binary instead of printBase64Binary.
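Reversing the steps can also be sketched in plain Java.  The hypothetical decoder below walks the byte layout shown in the hexdump (after parseBase64Binary has produced the byte array); it is an illustration only:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class OpaqueDecodeSketch {
    // The bytes from the hexdump above.
    public static final byte[] SAMPLE = {
        0x0c, 'F', 'i', 'r', 's', 't', ' ', 'S', 't', 'r', 'i', 'n', 'g',
        0x04, 0x03, 0x02, 0x01,
        0, 0, 0, 0, 0, 0, (byte) 0xf8, 0x3f,
        0x0d, 'S', 'e', 'c', 'o', 'n', 'd', ' ', 'S', 't', 'r', 'i', 'n', 'g'
    };

    public static String decode(byte[] bin) {
        ByteBuffer buf = ByteBuffer.wrap(bin).order(ByteOrder.LITTLE_ENDIAN);
        String s1 = readString(buf);
        int i = buf.getInt();       // little-endian 4-byte int
        double d = buf.getDouble(); // little-endian IEEE 754 double
        String s2 = readString(buf);
        return s1 + " / " + i + " / " + d + " / " + s2;
    }

    // Read a string preceded by a 1-byte (UTinyInt) length field.
    private static String readString(ByteBuffer buf) {
        int len = buf.get() & 0xff;
        byte[] b = new byte[len];
        buf.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(SAMPLE)); // First String / 16909060 / 1.5 / Second String
    }
}
```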

Wednesday Oct 10, 2012

Following the Thread in OSB

Threading in OSB

The Scenario

I recently led an OSB POC where we needed to get high throughput from an OSB pipeline that had the following logic:

1. Receive Request
2. Send Request to External System
3. If Response has a particular value
  3.1 Modify Request
  3.2 Resend Request to External System
4. Send Response back to Requestor

All looks very straightforward and no nasty wrinkles along the way.  The flow was implemented in OSB as follows (see diagram for more details):

  • Proxy Service to Receive Request and Send Response
  • Request Pipeline
    •   Copies Original Request for use in step 3
  • Route Node
    •   Sends Request to External System exposed as a Business Service
  • Response Pipeline
    •   Checks Response to Check If Request Needs to Be Resubmitted
      • Modify Request
      • Callout to External System (same Business Service as Route Node)

The Proxy and the Business Service were each assigned their own Work Manager, effectively giving each of them their own thread pool.

The Surprise

Imagine our surprise when, on stressing the system, we saw it lock up with large numbers of blocked threads.  The lock-up was caused by some subtleties in the OSB thread model, which are the topic of this post.

 

Basic Thread Model

OSB goes to great lengths to avoid holding on to threads.  Let's start by looking at how OSB deals with a simple request/response routing to a business service in a route node.

Most Business Services are implemented by OSB in two parts.  The first part uses the request thread to send the request to the target.  In the diagram this is represented by the thread T1.  After sending the request to the target (the Business Service in our diagram) the request thread is released back to whatever pool it came from.  A multiplexor (muxer) is used to wait for the response.  When the response is received the muxer hands off the response to a new thread that is used to execute the response pipeline, this is represented in the diagram by T2.

OSB allows you to assign different Work Managers, and hence different thread pools, to each Proxy Service and Business Service.  In our example we have the “Proxy Service Work Manager” assigned to the Proxy Service and the “Business Service Work Manager” assigned to the Business Service.  Note that the Business Service Work Manager is only used to assign the thread to process the response, it is never used to process the request.

This architecture means that while waiting for a response from a business service there are no threads in use, which makes for better scalability in terms of thread usage.

First Wrinkle

Note that if the Proxy and the Business Service both use the same Work Manager then there is potential for starvation.  For example:

  • Request Pipeline makes a blocking callout, say to perform a database read.
  • Business Service response tries to allocate a thread from thread pool but all threads are blocked in the database read.
  • New requests arrive and contend with responses arriving for the available threads.

Similar problems can occur if the response pipeline blocks for some reason, maybe a database update for example.

Solution

The solution to this is to make sure that the Proxy and Business Service use different Work Managers so that they do not contend with each other for threads.

Do Nothing Route Thread Model

So what happens if there is no route node?  In this case OSB just echoes the Request message as a Response message, but what happens to the threads?  OSB still uses a separate thread for the response, but in this case the Work Manager used is the Default Work Manager.

So this is really a special case of the Basic Thread Model discussed above, except that the response pipeline will always execute on the Default Work Manager.

 

Proxy Chaining Thread Model

So what happens when the route node is actually calling a Proxy Service rather than a Business Service, does the second Proxy Service use its own Thread or does it re-use the thread of the original Request Pipeline?

Well, as you can see from the diagram, when a route node calls another proxy service the original Work Manager is used for both request pipelines.  Similarly, the response pipeline uses the Work Manager associated with the ultimate Business Service invoked via a Route Node.  This fits with the earlier description I gave of Business Services, and by extension Route Nodes: they use “… the request thread to send the request to the target”.

Call Out Threading Model

So what happens when you make a Service Callout to a Business Service from within a pipeline?  The documentation says that “The pipeline processor will block the thread until the response arrives asynchronously” when using a Service Callout.  What this means is that the target Business Service is called using the pipeline thread and the response is also handled by the pipeline thread.  This implies that the pipeline thread blocks waiting for a response.  It is the handling of this response that behaves in an unexpected way.

When a Business Service is called via a Service Callout, the calling thread is suspended after sending the request, but unlike the Route Node case the thread is not released, it waits for the response.  The muxer uses the Business Service Work Manager to allocate a thread to process the response, but in this case processing the response means getting the response and notifying the blocked pipeline thread that the response is available.  The original pipeline thread can then continue to process the response.

Second Wrinkle

This leads to an unfortunate wrinkle.  If the Business Service is using the same Work Manager as the Pipeline then it is possible for starvation or a deadlock to occur.  The scenario is as follows:

  • Pipeline makes a Callout and the thread is suspended but still allocated
  • Multiple Pipeline instances using the same Work Manager are in this state (common for a system under load)
  • Response comes back but all Work Manager threads are allocated to blocked pipelines.
  • Response cannot be processed and so pipeline threads never unblock – deadlock!
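The scenario above can be reproduced outside OSB with a plain thread pool.  In this sketch (an analogy, not OSB code) a single shared pool plays the role of the shared Work Manager: the "pipeline" task blocks waiting on a "response notification" task that can only run on the same, already-exhausted pool.

```java
import java.util.concurrent.*;

public class CalloutDeadlockDemo {
    public static void main(String[] args) throws Exception {
        // One shared "Work Manager": the pipeline thread and the
        // response-notification thread come from the same pool.
        ExecutorService sharedWm = Executors.newFixedThreadPool(1);

        Future<String> pipeline = sharedWm.submit(() -> {
            // Service Callout: the pipeline thread blocks waiting for a
            // notification that must run on the SAME pool.
            Future<String> notification = sharedWm.submit(() -> "response");
            // The pool's only thread is blocked right here, so the
            // notification task never runs and this get() times out.
            return notification.get(2, TimeUnit.SECONDS);
        });

        try {
            System.out.println(pipeline.get());
        } catch (ExecutionException e) {
            System.out.println("Deadlocked: " + e.getCause());
        } finally {
            sharedWm.shutdownNow();
        }
    }
}
```

Give the notification its own pool (i.e. a separate Work Manager) and the same code completes normally.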

Solution

The solution to this is to make sure that any Business Services used by a Callout in a pipeline use a different Work Manager to the pipeline itself.

The Solution to My Problem

Looking back at my original workflow we see that the same Business Service is called twice, once in a Routing Node and once in a Response Pipeline Callout.  This was what was causing my problem: the response pipeline was using the Business Service Work Manager, but the Service Callout wanted to use the same Work Manager to handle its responses, so eventually my Response Pipeline hogged all the available threads and no callout responses could be processed.

The solution was to create a second Business Service pointing to the same location as the original Business Service, the only difference was to assign a different Work Manager to this Business Service.  This ensured that when the Service Callout completed there were always threads available to process the response because the response processing from the Service Callout had its own dedicated Work Manager.

Summary

  • Request Pipeline
    • Executes on Proxy Work Manager (WM) Thread so limited by setting of that WM.  If no WM specified then uses WLS default WM.
  • Route Node
    • Request sent using Proxy WM Thread
    • Proxy WM Thread is released before getting response
    • Muxer is used to handle response
    • Muxer hands off response to Business Service (BS) WM
  • Response Pipeline
    • Executes on Routed Business Service WM Thread so limited by setting of that WM.  If no WM specified then uses WLS default WM.
  • No Route Node (Echo functionality)
    • Proxy WM thread released
    • New thread from the default WM used for response pipeline
  • Service Callout
    • Request sent using proxy pipeline thread
    • Proxy thread is suspended (not released) until the response comes back
    • Notification of response handled by BS WM thread so limited by setting of that WM.  If no WM specified then uses WLS default WM.
      • Note this is a very short lived use of the thread
    • After the notification the BS WM thread is released and execution continues on the original pipeline thread.
  • Route/Callout to Proxy Service
    • Request Pipeline of callee executes on requestor thread
    • Response Pipeline of caller executes on response thread of requested proxy
  • Throttling
    • Request message may be queued if limit reached.
    • Requesting thread is released (route node) or suspended (callout)

So what this means is that you may get deadlocks caused by thread starvation if you use the same thread pool for the business service in a route node and for a business service called out to from the response pipeline, because the callout will need a notification thread from the same pool as the response pipeline.  This was the problem we were having.
You get a similar problem if you use the same work manager for the proxy request pipeline and a business service callout from that request pipeline.
It also means you may want to have different work managers for the proxy and the business service in the route node.
Basically you need to think carefully about how threading impacts your proxy services.

References

Thanks to Jay Kasi, Gerald Nunn and Deb Ayers for helping to explain this to me.  Any errors are my own and not theirs.  Also thanks to my colleagues Milind Pandit and Prasad Bopardikar who travelled this road with me.

Wednesday Dec 28, 2011

Too Much Debug

Too Much Debug

Remains of a Roast Turkey

Well it is Christmas and as is traditional, in England at least, we had roast turkey dinner.  And of course no matter how big your family, turkeys come in only two sizes: massively too big or enormously too big!  So by the third day of Christmas you are ready never to eat turkey again until Thanksgiving.  Your trousers no longer fit around the waist, your sweater is snug around the midriff, and your children start talking about the return of the Blob.

And my point?  Well just like the food world, sometimes in the SOA world too much of a good thing is bad for you.  I had just extended my BPM domain with OSB only to discover that I could no longer start the BAM server, or the newly configured OSB server.  The error message I was getting was:

starting weblogic with Java version:
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:690]
Starting WLS with line:
C:\app\oracle\product\FMW\JDK160~2\bin\java -client -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=8453,server=y,suspend=n

The mention of JDWP points to a problem with debug settings, and sure enough in a development domain the setDomainEnv script is set up to enable debugging of OSB.  The problem is that the settings apply to all servers started with the setDomainEnv script, when they should really only apply to the OSB servers.  There is a blog entry by Jiji Sasidharan that explains this and provides a fix.  However that fix disables the debug flag for all servers, not just the non-OSB servers.  So I offer my extension to the fix, which modifies the setDomainEnv script, changing:

set debugFlag=true

to:

rem Added so that only OSB server starts in debug mode
if "%SERVER_NAME%"=="osb_server1" (
    set debugFlag=true
)

This enables debugging to occur on managed server osb_server1 (this should match the name of one of your OSB servers to enable debugging).  It does not enable the debug flag for any other server, including other OSB servers in a cluster.  After making this change it may be necessary to restart the Admin Server because it is probably bound to the debug port.
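If in doubt about which server holds the debug port (8453 in the startup line above), a quick check is to try binding it yourself: if the bind fails with "Address already in use", another JVM still has it.  This small diagnostic is my own addition, not part of the original fix:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class DebugPortCheck {
    // Returns true if the port is free, i.e. we could bind it ourselves.
    public static boolean isFree(int port) {
        try (ServerSocket s = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // "Address already in use" - another JVM holds the port
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8453;
        System.out.println("Port " + port + (isFree(port) ? " is free" : " is already in use"));
    }
}
```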

So the moral of this tale is don’t eat too much turkey, don’t abuse the debug flag, but make sure you can get the benefits of debugging.

Have a great new year!

Thursday Sep 22, 2011

Coping with Failure

Handling Endpoint Failure in OSB

Recently I was working on a POC and we had demonstrated stellar performance with OSB fronting a BPEL composite calling back-end EJBs.  The final test was a failover test: killing an OSB server and bringing it back online, then killing a SOA (BPEL) server and bringing it back online, and finally killing a back-end EJB server and bringing it back online.  All was going well until the BPEL failover test, when for some reason OSB refused to mark the BPEL server as down.  It turned out we had forgotten to set a very important setting, so this entry outlines how to handle endpoint failure in OSB.

Step 1 – Add Multiple End Points to Business Service

The first thing to do is create multiple end points for the business service, pointing to all available backends.  This is required for HTTP/SOAP bindings.  In theory if using a T3 protocol then a single cluster address is sufficient and load balancing will be taken care of by T3 smart proxies.  In this scenario though we will focus on HTTP/SOAP endpoints.

Navigate to the Business Service->Configuration Details->Transport Configuration and add all your endpoint URIs.  Make sure that Retry Count is greater than 0 if you don’t want to pass failures back to the client.  In the example below I have set up links to three back-end web service instances.  Go to Last and Save the changes.

[Figure: MultiOSBEndpoint]

Step 2 – Enable Offlining & Recovery of Endpoint URIs

When a back end service instance fails we want to take it offline, meaning we want to remove it from the pool of instances to which OSB will route requests.  We do this by navigating to the Business Service->Operational Settings and selecting the Enable check box for Offline Endpoint URIs in the General Configuration section.  This causes OSB to stop routing requests to a backend that returns errors (if the transport setting Retry Application Errors is set) or fails to respond at all.

Offlining the service is good because we won’t send any more requests to a broken endpoint, but we also want to add the endpoint again when it becomes available.  We do this by setting the Enable with Retry Interval in General Configuration to some non-zero value, such as 30 seconds.  Then every 30 seconds OSB will add the failed service endpoint back into the list of endpoints.  If the endpoint is still not ready to accept requests then it will error again and be removed again from the list.  In the example below I have set up a 30 second retry interval.  Remember to hit update and then commit all the session changes.

[Figure: OfflineOSBEndpoint]
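To make the offline/retry behaviour concrete, here is a rough sketch of a URI pool that skips offline endpoints and re-admits them once the retry interval has elapsed.  This is an illustration of the behaviour described above, not OSB's actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EndpointPool {
    private final Map<String, Long> offlineUntil = new LinkedHashMap<>();
    private final String[] uris;
    private final long retryIntervalMillis;
    private int next = 0;

    public EndpointPool(long retryIntervalMillis, String... uris) {
        this.retryIntervalMillis = retryIntervalMillis;
        this.uris = uris;
    }

    // Pick the next URI round-robin, skipping ones taken offline
    // until their retry time has elapsed.
    public String choose(long now) {
        for (int i = 0; i < uris.length; i++) {
            String uri = uris[(next + i) % uris.length];
            Long until = offlineUntil.get(uri);
            if (until == null || now >= until) { // online, or retry interval elapsed
                next = (next + i + 1) % uris.length;
                offlineUntil.remove(uri);
                return uri;
            }
        }
        return null; // all endpoints offline
    }

    // Mark a failed endpoint offline for the retry interval.
    public void markOffline(String uri, long now) {
        offlineUntil.put(uri, now + retryIntervalMillis);
    }

    public static void main(String[] args) {
        EndpointPool pool = new EndpointPool(30_000, "http://host1/svc", "http://host2/svc");
        System.out.println(pool.choose(System.currentTimeMillis()));
    }
}
```

An offlined URI is retried after the interval; if it fails again, markOffline simply pushes its retry time out once more, which mirrors OSB repeatedly erroring and removing a still-broken endpoint.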

Considerations on Retry Count

A couple of things to be aware of on retry count.

If you set retry count to greater than zero then endpoint failures will be transparent to OSB clients, other than the additional delay they experience.  However, if the request is mutative (changes the backend) then there is no guarantee that the request was not executed before the endpoint failed to return the result, in which case you may end up submitting the mutative operation twice.  If your back-end service can’t cope with this then don’t set retries.

If your back-end service can’t cope with retries then you can still get the benefit of transparent retries for non-mutative operations by creating two business services, one with retry enabled that handles non-mutative requests, and the other with retry set to zero that handles mutative requests.

Considerations on Retry Interval for Offline Endpoints

If you set the retry interval too small, it is very likely that your failed endpoint will not yet have recovered, so you will waste time on a request failing to contact that endpoint before failing over to a healthy one, increasing the client response time.  Work out a typical unplanned outage time for a node (such as that caused by a JVM failure and subsequent restart) and set the retry interval to, say, half of this as a compromise between adding client response time delays and adding the endpoint back into the mix as soon as possible.

Conclusion

Always remember to set the Operational Setting to enable offlining of endpoint URIs, and then you won’t be surprised in a failover test!

About

Musings on Fusion Middleware and SOA.  Antony works with customers across the US and Canada in implementing SOA and other Fusion Middleware solutions.  Antony is the co-author of the SOA Suite 11g Developers Cookbook, the SOA Suite 11g Developers Guide and the SOA Suite Developers Guide.
