June 4, 2007 Archives

June 4, 2007

Batch Processing with BPEL

Last week a consultant asked me how to efficiently process a large number of records in a file within BPEL.  He needed to take a large file and insert all of its records into a database.  Let's look at the challenges and gotchas along the way.

Reading the File

The first task is to create a new BPEL project within JDeveloper.  I decided to create the project with an empty BPEL process because I wanted to control the partner links.  I needed that control because I was not going to use the SOAP over HTTP bindings that are generated for a default synchronous or asynchronous process.  Rather I wanted to use file and database adapters for the bindings.  This gave me the blank process as shown below.

I added a partner link to this blank process and then used the file adapter wizard to specify how I wanted it to behave.  To begin with I specified that I wanted to read from a file.

Having specified the read operation I then told the adapter where to find the input files.  I used a physical path, but better practice is to use a logical name which can be altered via the console, making it easier to move the process between environments.  In addition to the directory I also specified that I wanted the file to be copied to an archive directory after processing.  Finally I specified that I wanted the file deleted from its original location after processing.  These choices are reflected in the screen below.

I now needed to provide some more information about the input files.  First I provided a pattern to identify which files I wanted to process; I chose a simple wildcard expression rather than a regular expression.  Note that the file adapter uses the term message rather than record.  This doesn't entirely make sense, as a message may contain many records (as determined by the batch size), but I will use message to be consistent with the screens in the adapter wizard, even though this blurs the distinction between records and messages.  It is at this point in the wizard that I specified whether a file contains one message or many, and how messages are batched.  To start with I set the number of messages in a batch to 1.  This means that a new BPEL process instance is created for every message (record) in the file.  More on this later.
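To make the difference between the two pattern styles concrete, here is a small Python sketch (not adapter configuration).  The file names and both patterns are made up for illustration; it only shows that a wildcard and a regular expression can select the same files, with the regular expression needing more care.

```python
import fnmatch
import re

# Hypothetical file names sitting in the input directory.
names = ["orders_001.txt", "orders_002.txt", "orders_001.bak", "readme.md"]

# Wildcard style: '*' matches any run of characters.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "orders_*.txt")]

# Regular-expression style: the same selection needs '.*' and an escaped dot.
pattern = re.compile(r"orders_.*\.txt")
regex_hits = [n for n in names if pattern.fullmatch(n)]

print(wildcard_hits)
print(regex_hits)
```

Both lists contain the same two .txt files, which is why a wildcard is usually enough for simple cases.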

Having specified batch sizes and file matching expressions I now needed to set the frequency of checking the directory for new files (the Polling Frequency) and also the minimum file age.  Minimum file age means that a file must be at least this old before it can be processed.  This allows for files that take a long time to create: the minimum file age can be set to a large value to give the source system time to finish writing the file.
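The effect of the minimum file age can be sketched in a few lines of Python.  This is only an illustration of the idea, not the adapter's implementation; in particular, measuring age from the last-modified time is my assumption, and the file names and timestamps are invented.

```python
def old_enough(last_modified, now, minimum_age_seconds):
    """A file qualifies once it has gone untouched for the minimum age."""
    return now - last_modified >= minimum_age_seconds


def files_to_process(candidates, now, minimum_age_seconds):
    """candidates maps file name -> last-modified time in seconds (assumed)."""
    return sorted(
        name
        for name, last_modified in candidates.items()
        if old_enough(last_modified, now, minimum_age_seconds)
    )


# One polling pass at time 1000 with a minimum age of 120 seconds.
candidates = {"done.txt": 700, "still_writing.txt": 990}
ready = files_to_process(candidates, now=1000, minimum_age_seconds=120)
print(ready)
```

The file that was modified ten seconds ago is skipped on this pass and would be picked up on a later poll once it is old enough.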

Finally I needed to specify the format of the records/messages in the file.  To do this I used the "Define Schema for Native Format" wizard.

I said that I wanted to create a new native format.  In my case the file was made up of fixed length fields, but it could have been in a delimited format such as CSV (comma separated values).

I then provided a sample input file that I could use to identify the field boundaries.  I then chose which parts of the file I wanted to read in as a sample. 

My file had more than one record type; in fact it had a header, an arbitrary number of records and a trailer.  With this in mind I chose the appropriate type of file organisation.

Before specifying the individual fields I needed to choose a namespace and a root element name.

I could now identify the discriminator for each record type - in this case it was the first character of the record.  The wizard identified the three record types and I changed the record type names to sensible names that meant something to me.

I now used the wizard to identify the individual record field boundaries.  Each type of record needs specifying separately.  I found it best to do this with a mouse as I had some problems when entering numbers directly into the list.

Having identified field boundaries I then needed to identify the type of the fields.  Again this has to be done for each record type.
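What the wizard captures in these steps (a discriminator character, field boundaries and field types for each record type) can be sketched in Python as below.  This is not the generated schema; the record layouts, field names and the H/D/T discriminator values are all hypothetical, chosen only to show the idea.

```python
# Hypothetical layouts: discriminator -> (record name, [(field, start, end, type)]).
# Offsets are 0-based slice positions; position 0 holds the discriminator.
LAYOUTS = {
    "H": ("Header", [("fileDate", 1, 9, str)]),
    "D": ("Data", [("id", 1, 7, int), ("name", 7, 17, str.strip), ("amount", 17, 25, float)]),
    "T": ("Trailer", [("recordCount", 1, 7, int)]),
}


def parse_line(line):
    """Pick the layout from the first character, then slice and convert each field."""
    record_name, fields = LAYOUTS[line[0]]
    values = {name: convert(line[start:end]) for name, start, end, convert in fields}
    return record_name, values


sample = [
    "H20070604",
    "D000001Widget    00012.50",
    "D000002Gadget    00007.25",
    "T000002",
]
records = [parse_line(line) for line in sample]
for record in records:
    print(record)
```

Each record type has its own field list, which mirrors why the boundaries and types have to be specified once per record type in the wizard.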

Finally we can review the generated XML Schema and select a filename to save it in.

We have now defined our file format and mapped it onto an XML record structure.  We can now return to the file adapter wizard.

The file adapter is now set up to read a single record at a time from the file.
I could now add a Receive to the process and use the newly created FileInputService partner link.  When creating the receive I marked the "Create Instance" check box as I needed to create a new process when a message was received.

Writing to the Database

Having created a partner link and a receive activity to read the input file I now needed to create a partner link to write it to the database.  To do this I used the database adapter and then created a simple assign statement to set up the call to the database.  The resultant process shown below was then deployed.

Running the Process

After deploying the process to the BPEL Process Manager server the next step is to run it.  To do this I dropped a sample file into the specified input directory.  The file adapter detected the file, read it, and created a process for each message in the file.  This resulted in a large number of processes being created.  This approach had two major problems:
  • Two processes had errors because they had no data record (the header and trailer records were passed to these processes)
  • A single file had created more than 200 processes which made it difficult to see what was happening
An obvious solution to these problems would be to batch multiple records into a single process.

Batching the Data

In order to receive multiple records into a single process I went back to the File Adapter wizard and set the "Publish Messages in Batches of" field to a number larger than one.  I chose fifty.
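The arithmetic behind the batch size is simple, and a short Python sketch makes it explicit.  This only models the grouping, not the adapter itself; the record names are invented, and header and trailer records are ignored for simplicity.

```python
def batches(messages, publish_size):
    """Group messages into consecutive batches of at most publish_size."""
    for start in range(0, len(messages), publish_size):
        yield messages[start:start + publish_size]


# 200 hypothetical records in one input file.
messages = [f"record-{n}" for n in range(1, 201)]

one_per_process = list(batches(messages, 1))
fifty_per_process = list(batches(messages, 50))

# Each batch corresponds to one process instance being created.
print(len(one_per_process), len(fifty_per_process))
```

With a batch size of 1 every record starts its own process; with a batch size of 50 the same file starts far fewer.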

If I had deployed and tested the process without any other changes I would have found only one record being processed in each batch.  This is because I had yet to modify the rest of the process to iterate over multiple records.
The first step is to figure out how many records I need to process.  To calculate this I created a new integer variable "NumberOfRecords" to hold the number of records and initialised it with an assign statement as shown below.
The assignment counts the number of Data elements underneath the document root element Records.
I then created another integer variable "CurrentRecord", initialised to 0, to act as a loop counter.  I then created a while statement to loop over all the Data records in the message.  The while condition was "CurrentRecord<NumberOfRecords" as shown below.

Within the while loop I then created an assignment statement to set up the write to the database.  Unfortunately it is not possible to use the wizard to completely set up the copy operation.  I followed this process:
  • Create the assignment as if there were a single record rather than multiple.  This gave me the following XPath expression:
    • /ns3:Records/ns3:Data
  • Manually edit the assignment statement to add an array index to identify the specific record.  The expression now looked like this:
    • /ns3:Records/ns3:Data[bpws:getVariableData('CurrentRecord')]
When incrementing the CurrentRecord variable it is important to remember that, unlike Java and C/C++, XPath indices start at 1, not 0, so the CurrentRecord value must be incremented before using it as an index into the record set.
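The counting and indexing logic can be tried outside BPEL with a small Python sketch using the standard library's ElementTree, which also uses 1-based positions in its path predicates.  The namespace URI and the sample message are hypothetical, and the paths are relative to the Records root element rather than absolute as in the BPEL expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the ns3 prefix.
NS = {"ns3": "http://example.com/records"}

message = ET.fromstring(
    '<Records xmlns="http://example.com/records">'
    "<Data><id>1</id></Data>"
    "<Data><id>2</id></Data>"
    "<Data><id>3</id></Data>"
    "</Records>"
)

# Equivalent of counting the Data elements under Records.
number_of_records = len(message.findall("ns3:Data", NS))

current_record = 0  # initialised to 0, as in the process
written = []

while current_record < number_of_records:
    # Increment first so the index runs from 1 to number_of_records.
    current_record += 1
    record = message.find(f"ns3:Data[{current_record}]", NS)
    written.append(record.findtext("ns3:id", namespaces=NS))

print(written)
```

Because the counter is incremented before it is used, the first lookup uses index 1 and the last uses the record count, so every record is visited exactly once.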
After deploying this process a 200-record file would now produce just 4 processes, making it much more manageable.

Reasons for Batching

So why would you want to batch messages?
  • Easier to track what is happening as fewer processes to manage.
  • Slightly more efficient within the BPEL PM - some testing I did indicated it was a little more efficient to batch, presumably because of the reduced process creation overhead.
How do I process a file with multiple records as a single process?
  • Treat it as a single message by not checking the "Files contain Multiple Messages" check box in step 5 of the File Adapter Configuration wizard.
  • Individual records in the message are indexed in the same way as multiple records in the multiple records example above.
Why would you not batch?
  • Really want each individual record to initiate a new process that is to be tracked individually, for example equipment provisioning requests, each of which is part of a separate customer order.
  • Want to simplify process by only handling a single record.
  • Too idle to deal with looping over sets of records in a message (oops, that was the same as the last one!)

Key Points to Remember

  • Use XPath to index into multiple records
    • Use wizard to create copy rules then add indexing afterwards
  • Use a while loop to iterate over records
  • XPath indices start at 1
So happy batching!

About June 2007

This page contains all entries posted to Antony Reynolds' Blog in June 2007. They are listed from oldest to newest.
