June 2008 Archives

June 11, 2008

More on Batch Processing in BPEL

I was recently asked a follow-up question on my entry about batch processing in BPEL.  The individual had implemented a BPEL process similar to the one shown below.

A Use Case

The process needs to refresh the corporate phone book.  It does this by reading a file with the new phone details, deleting the old phone details from a database, and then inserting the new phone details into the database.
A file is read into the process via the ReceivePhoneBook activity.  The ClearPhoneBook activity uses custom SQL to truncate the phonebook table.  Finally, after the file format has been transformed to the database format, the InsertIntoPhoneBookActivity inserts the new phone details into the table as multiple records in a single call.

A Fly in the Ointment

This process works fine as long as all the records are received in a single message.  If the file is delivered in multiple messages, each message starts its own process instance.  Each instance receives only part of the input file, yet every instance truncates the table.  Records inserted by one instance can therefore be deleted by another instance that starts a little later, so not all the records end up in the table.  This is illustrated in the diagram below, which shows three processes each processing a portion of the file.

Note that records 1-10 are written to the database at the same time as the table is truncated by the process receiving records 11-20, so the write may complete only to be wiped out by the truncate.  If records 1-10 survive the second process, they will definitely be deleted by the third process, which only starts truncating the table after records 1-10 have been inserted.
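
To make the interleaving concrete, here is a toy model in Python (not BPEL).  It replays one possible ordering in which the three instances run one after another, each truncating before it inserts.  The run_schedule helper and the in-memory list standing in for the phonebook table are purely illustrative assumptions, not part of the sample project.

```python
def run_schedule(schedule):
    """Replay truncate/insert steps against an in-memory stand-in for the table."""
    table = []
    for operation, records in schedule:
        if operation == "truncate":
            table.clear()          # every instance wipes the whole table
        else:
            table.extend(records)  # bulk insert of one batch
    return table


batch_1 = list(range(1, 11))    # records 1-10
batch_2 = list(range(11, 21))   # records 11-20
batch_3 = list(range(21, 31))   # records 21-30

# Hypothetical ordering: each instance truncates, then inserts its own batch.
interleaved = [
    ("truncate", []), ("insert", batch_1),   # instance 1
    ("truncate", []), ("insert", batch_2),   # instance 2 deletes records 1-10
    ("truncate", []), ("insert", batch_3),   # instance 3 deletes records 11-20
]

surviving = run_schedule(interleaved)
print(len(surviving), "of 30 records survive")
```

In this ordering only the last batch is left in the table.  Other orderings lose different records, which is why the count varies from run to run.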

A Solution

What we need to do is to separate the deletion of the existing data from the insertion of the new data.  We can do this by moving the truncation of the table (deleting the existing data) into a separate BPEL process.  Such a process is shown below.

For this to work we need to know when a new file starts to be loaded, so that this process can be invoked before any record processing is performed.  There are lots of complicated ways we could do this, with singleton processes to maintain state and complex logic to make sure it all works.  Or we could use a newish feature of the BPEL Process Manager (introduced in 10.1.3.3, I believe) to get it to invoke our process.

Batch Manager

In the previous discussion I ignored the partner link that initiates the record deletion process.  This partner link implements the Batch Manager interface as specified in $ORACLE_HOME/bpel/system/xmllib/jca/BatchManager.wsdl.  To create the above process, first create a new empty BPEL process, then right-click in the services stream and select new partner link.  Click on the icon to browse for files from the local file system, select the BatchManager.wsdl file and choose to implement the BatchManagerInterfaceRole (i.e. choose this as "My Role").  This is the interface used by BPEL Process Manager to notify a process that a file has started to be read.  There are several methods available to receive different notifications:
  • onBatchReadStart - called when the adapter starts reading a file
  • onBatchReadComplete - called when the adapter has finished reading a file
  • onBatchReadFailure - called when a record or records cannot be processed by the adapter framework
For this scenario we are interested in onBatchReadStart.  On receiving this notification we know it is time to truncate the table, ready to receive the records.

Who to Tell?

How does the BPEL Process Manager know to call the notification process?  The answer is that it is configured as an activation agent property in the bpel.xml file of the process receiving the records from the file adapter.  To set up the notification it is necessary to add the following property tag to the bpel.xml at XPath location BPELSuitcase/BPELProcess/activationAgents/activationAgent :
  • <property name="batchNotificationHandler">bpel://default|FileNotificationProcess</property>
Note that default is the name of your domain and FileNotificationProcess is the name of your process.  Adding this property will cause the BatchManager interface on the given process to be invoked when a file is read.

Final Steps

With notification configured we now need to modify our BPEL process so that it no longer truncates the table, because this is now done by a separate process.  We also need to add a delay to the process to avoid race conditions that could cause records to be inserted into the database before the table has been truncated.  The modified process is shown below, complete with a 30-second delay to avoid problems with multiple processes being invoked at the same time.
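
A rough way to reason about the delay is to put the events on a timeline.  The sketch below is a Python model, not BPEL.  All the timings are made-up numbers and final_count is a hypothetical helper.  It assumes a single truncate triggered by the start-of-file notification, and batch instances that each wait insert_delay seconds before inserting.

```python
def final_count(truncate_done_at, batch_arrivals, insert_delay, batch_size=10):
    """Return the number of rows left after replaying all events in time order."""
    # One truncate, performed by the separate notification process.
    events = [(truncate_done_at, 0, "truncate")]
    # Each record-processing instance waits insert_delay seconds before inserting.
    events += [(arrival + insert_delay, 1, "insert") for arrival in batch_arrivals]

    rows = 0
    for _time, _priority, operation in sorted(events):
        rows = 0 if operation == "truncate" else rows + batch_size
    return rows


arrivals = [1, 2, 3]  # hypothetical arrival times (seconds) of three batches
print(final_count(truncate_done_at=5, batch_arrivals=arrivals, insert_delay=0))   # 0
print(final_count(truncate_done_at=5, batch_arrivals=arrivals, insert_delay=30))  # 30
```

The model also shows the limit of the approach: the delay only helps while it is comfortably longer than the time the notification process needs to finish the truncate.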

A Worked Example

I have created a sample to let you explore how all this works.  To set it up do the following.
  1. Download the project files in FileManipulation.zip.
  2. Unzip FileManipulation.zip - this will create a FileManipulation directory with 3 sub-directories and a JDev 10.1.3.3 workspace
  3. Open the workspace FileManipulation.jws in JDev 10.1.3.3
  4. Create the following directories or modify the file adapter partner links in BigBatchProcess1.0 and BigBatchProcess1.1
    • C:\Files\Inbound
    • C:\Files\Outbound
  5. Create a test user in the database by running the script in FileManipulation/BigBatchProcess1.0/src/CreateUser.sql as a system user.
  6. Create a database connection in JDev called TestDS to connect to user Test (password test) in the database.  You may need to rerun the adapter wizard if you are not running an XE database on port 1521.
  7. As user test, run the script in FileManipulation/BigBatchProcess1.0/database/CreateTable.sql to create the phonebook table.
  8. Deploy the BigBatchProcess1.0 to the BPEL process manager.
  9. Test it by copying file FileManipulation/BigBatchProcess1.0/src/PhoneBook1.csv to C:\Files\Inbound.
  10. Verify the number of records stored by executing the command "select count(*) from phonebook" in the database.  There are 1000 records in the source file; you will probably receive a count of less than 1000 due to race conditions around truncating the table and inserting records into it.  This is the problem we are trying to avoid.
  11. Deploy the BigBatchProcess1.1 to the BPEL process manager as version 1.1.
  12. Deploy the FileNotificationProcess1.0 to the BPEL process manager.
  13. Test it by copying file FileManipulation/BigBatchProcess1.0/src/PhoneBook1.csv to C:\Files\Inbound.
  14. Verify the number of records stored by executing the command "select count(*) from phonebook" in the database.  There are 1000 records in the source file; you should now receive a count of 1000 in the database table, indicating that we have solved the problem.
  15. Pat yourself on the back and have a nice drink.
Hope that some of you find the above useful.

June 13, 2008

Throttling Files

I'm sure everyone has been tempted to grab a file by the throat and squeeze it until it departs for that great filing cabinet in the sky, but that is something to discuss quietly with your psychiatrist.  In this entry I would like to investigate how to limit the concurrency of file handling procedures.

A Concurrency Problem

By default the interaction with the file adapter is a one-way interaction.  The file adapter posts a message into the queue for a process and then carries on about its business.  This is good because it allows multiple files, or multiple batches from a single file, to be processed concurrently.  However, if a large number of concurrent processes are started, this can cause performance problems as the process manager starts to worry about which process to execute when.  One symptom of this is undelivered messages visible on the BPEL console.

This is bad because it means we are feeding BPEL process manager faster than it can consume, and so it gags a little and throughput is reduced as it gags.

Stopping Force Feeding

So how do we tell the file adapter that BPEL's mouth is full and to please wait until it has eaten this bit?  Well, the answer is in the observation that interactions with the file adapter are usually one-way.  They do not need to be!  It is possible to modify the WSDL generated by the file adapter wizard to support a two-way interaction.  This causes the file adapter thread to wait for the reply before sending another message.  To see why changing from a one-way to a two-way interaction causes this behaviour, look at my earlier entry on BPEL threading; working out the answer is left as an exercise for the reader.
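
The effect on the backlog can be sketched with a toy model in Python (this is not adapter code).  It assumes the worst case for one-way delivery, where the engine does not get to any message until the adapter has posted them all.  The peak_backlog helper and the batch names are hypothetical.

```python
from collections import deque


def peak_backlog(batches, two_way):
    """Return the largest number of messages waiting for the engine at any moment."""
    waiting = deque()
    peak = 0
    for batch in batches:
        waiting.append(batch)        # the adapter posts the message
        peak = max(peak, len(waiting))
        if two_way:
            waiting.popleft()        # the adapter blocks until the process replies
    # One-way: the engine works through the backlog after the adapter has moved on.
    waiting.clear()
    return peak


batches = [f"records {start}-{start + 9}" for start in range(1, 51, 10)]
print(peak_backlog(batches, two_way=False))  # 5
print(peak_backlog(batches, two_way=True))   # 1
```

With the two-way interaction the adapter thread never has more than one message outstanding, which is the throttling we are after.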

Changing the File Adapter WSDL

To change the one-way interaction to a two-way interaction we do the following in the generated WSDL file:
  • Make sure that the definitions root element has a namespace reference to XML schema by adding the following
    • xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  • Add a dummy message that will be used in a reply.
    • <message name="Dummy_msg">
          <part name="PhoneRecords" type="xsd:string"/>
      </message>
  • Add a reply to the read operation
    • <output message="tns:Dummy_msg"/>
  • Add a reply to the binding
    • <output/>
This gives a file like the one below
<definitions
     name="ReadPhoneBookFileSvc"
     targetNamespace="http://xmlns.oracle.com/pcbpel/adapter/file/ReadPhoneBookFileSvc/"
     ...
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    >
     ...
    <message name="PhoneRecords_msg">
        <part name="PhoneRecords" element="imp1:PhoneRecords"/>
    </message>
    <message name="Dummy_msg">
        <part name="PhoneRecords" type="xsd:string"/>
    </message>
    <portType name="Read_ptt">
        <operation name="Read">
            <input message="tns:PhoneRecords_msg"/>
            <output message="tns:Dummy_msg"/>
        </operation>
    </portType>
    <binding name="Read_binding" type="tns:Read_ptt">
        <pc:inbound_binding  />
        <operation name="Read">
            <jca:operation
                ...
                OpaqueSchema="false" >
            </jca:operation>
            <input>
                <jca:header message="hdr:InboundHeader_msg" part="inboundHeader"/>
            </input>
            <output/>
        </operation>
    </binding>
     ...
</definitions>

Modifying the BPEL Process

Having modified the WSDL we must now modify the BPEL process to send a reply.  The reply releases the file handling thread in the adapter to do further work.  The modified process is available as project BigBatchProcess1.2 in the FileManipulation.zip file.  Details of using this process can be found in an earlier entry on batch processing of files.  Note that there is an additional test file included in the src directory - phonebook2.csv - that has 5000 records in it to give a larger test case.

Other Scenarios

The above scenario is not the only one where we may wish to use a two-way interaction with the file adapter.  The following additional scenarios may also occur.

Reading a File Sequentially

We may need to process a file in strict record order yet still want to have it batched because it is a large file.  The problem here is that multiple batches may be submitted concurrently, and there is no guarantee of the order in which they will be processed.  Just making the interaction two-way is not enough, because there may be more than one file processing thread.

Controlling the Number of File Processing Threads

The number of threads used to process files in the file adapter is controlled by a setting in the file $ORACLE_HOME/bpel/system/services/config/pc.properties.  To alter the number of threads change the following property:
  • oracle.tip.adapter.file.numProcessorThreads=1
Setting it to 1 limits file processing to a single thread.  Note that this is a global setting and so may impact other file activation agents that have been configured in the system.  By combining a single thread with a two-way interaction we can single-thread the file processing and so guarantee that the records in a file are processed in order.
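
The ordering argument can be illustrated with an ordinary thread pool in Python.  This is an analogy only, not the file adapter itself.  The processing_order helper and the per-batch delays are assumptions chosen so that earlier batches take longer than later ones.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def processing_order(batch_delays, threads):
    """Process batches on a thread pool and return the order in which they finished."""
    finished = []

    def handle(batch_number, delay):
        time.sleep(delay)              # stand-in for the work done on one batch
        finished.append(batch_number)  # list.append is thread-safe in CPython

    # Leaving the with-block waits for every submitted batch to complete.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch_number, delay in enumerate(batch_delays, start=1):
            pool.submit(handle, batch_number, delay)
    return finished


# Earlier batches take longer, so with several threads later batches can overtake them.
delays = [0.03, 0.02, 0.01]
print(processing_order(delays, threads=1))  # [1, 2, 3]
print(processing_order(delays, threads=3))  # order is not guaranteed
```

With a single worker the batches complete in submission order.  With three workers the completion order depends on timing, which is exactly what strict record ordering cannot tolerate.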

Processing Files in Date Order

Another scenario where these techniques are applied is when there is a requirement to process files in order of last modification date, for example when later files may contain reversing transactions for earlier files.  This can currently only be achieved in 10.1.3.1 via a patch but it should also be available at some point in 10.1.3.3 and later releases.

Final Thoughts

The use of two-way interactions and limiting threads processing files is very powerful but it needs to be considered in the context of other file interactions in the system.  One way to limit the impact is to use singleton processes to receive the processing requests, enabling messages to be queued in order, releasing the file processing threads for other processes to use whilst still ensuring correct ordering in the required processes.
Hope this helped.

About June 2008

This page contains all entries posted to Antony Reynolds' Blog in June 2008. They are listed from oldest to newest.
