Batch Processing with BPEL

Last week I had a
question from a consultant about how to efficiently process a large
number of records in a file within BPEL.  He needed to take a large
file and insert all the records in it into a database.  Let's look at
the challenges and gotchas along the way.

Reading the File

The first task is to create a new BPEL project
within JDeveloper.  I decided to create a new BPEL project with an
empty BPEL process.  This was because I wanted to control the partner
links.  I needed control of the partner links because I was not going
to use the SOAP over HTTP bindings that are generated for a default
synchronous or asynchronous process.  Rather, I wanted to use file and database adapters
for the bindings.  This gave me the blank process as shown
below.

I
added a partner link to this blank process and then used the file
adapter wizard to specify how I wanted it to behave.  To begin with I
specified that I wanted to read from a file.

Having
specified the read operation I then told the adapter where to find the
input files.  I used a physical path, but better practice is to use a
logical name which can be altered via the console, making it easier to
move the process between environments.  In addition to the directory I
also specified that I wanted the file to be copied to an archive
directory after processing.  Finally I specified that I wanted the file
deleted from its original location after processing.  These choices are
reflected in the screen below.

I
now needed to provide some more information about the input files.  I
needed to provide a pattern to identify which files I wanted to
process, I chose a simple wildcard expression rather than a regular
expression.  Note that the file adapter uses the term message rather
than record, this doesn't entirely make sense as a message may contain
many records (as determined by the batch size) - nevertheless I will
use message to be consistent with the screens in the adapter wizard
even though this confuses records and messages.  It is at this point in
the wizard that the number of messages (records) in the file is
identified (one or many) and also the batching of messages is
determined.  To start with I set the number of messages in a batch to
be 1.  This means that for every message (record) in the file a new
BPEL process will be created.  More on this later.

Having
specified batch sizes and file matching expressions I now need to set
the frequency of checking the directory for new files (the Polling
Frequency) and also the minimum file age.  Minimum file age means that
a file must be at least this old before it can be processed.  This
allows for files that take a long time to create: the minimum file age
can be set to a large value to give the source system time to finish
creating the file.
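
The directory, archiving, polling and age settings all end up as attributes on the file adapter's activation spec in the generated WSDL.  The sketch below shows roughly what the wizard writes out; the attribute names are those used by the 10g file adapter, but the paths and values are my own examples, so check them against your generated file.

```xml
<!-- Illustrative inbound file adapter binding; paths and values are examples -->
<jca:operation
    ActivationSpec="oracle.tip.adapter.file.inbound.FileActivationSpec"
    PhysicalDirectory="/data/input"
    PhysicalArchiveDirectory="/data/archive"
    DeleteFile="true"
    IncludeFiles=".*\.dat"
    PollingFrequency="60"
    MinimumAge="120"
    OpaqueSchema="false"/>
```

PollingFrequency and MinimumAge are specified in seconds here.  If you opt for a logical name instead of a physical path, the wizard generates a logical directory property whose value is resolved via the console at deployment time.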

Finally
I needed to specify the format of the records/messages in the file.  To
do this I used the "Define Schema for Native Format"
wizard.

I
said that I wanted to create a new native format, in my case the file
was made up of fixed length fields, but it could have been in delimited
format such as CSV (comma separated values).

I
then provided a sample input file that I could use to identify the
field boundaries.  I then chose which parts of the file I wanted to
read in as a sample. 

My
file had more than one record type, in fact it had a header, an
arbitrary number of records and a trailer.  With this in mind I chose
the appropriate type of file organisation.

Before
specifying the individual fields I needed to choose a namespace and a
root element name.

I
could now identify the discriminator for each record type - in this
case it was the first character of the record.  The wizard identified
the three record types and I changed the record type names to sensible
names that meant something to me.

I
now used the wizard to identify the individual record field
boundaries.  Each type of record needs specifying separately.  I found
it best to do this with a mouse as I had some problems when entering
numbers directly into the list.

Having
identified field boundaries I then needed to identify the type of the
fields.  Again this has to be done for each record
type.

Finally
we can review the generated XML Schema and select a filename to save it
in.
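
The generated schema uses Oracle's native format (nXSD) annotations to describe how the flat file maps onto XML.  A heavily abbreviated sketch of the sort of schema produced for a fixed-length file with a one-character record discriminator is shown below; the element names, field names and lengths are my own invented examples, not what the wizard will generate for you.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
            targetNamespace="http://example.com/batchfile"
            elementFormDefault="qualified"
            nxsd:version="NXSD" nxsd:stream="chars">
  <xsd:element name="Records">
    <xsd:complexType>
      <!-- the first character of each record selects Header, Data or Trailer -->
      <xsd:choice maxOccurs="unbounded"
                  nxsd:choiceCondition="fixedLength" nxsd:length="1">
        <xsd:element name="Data" nxsd:conditionValue="D">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="AccountId" type="xsd:string"
                           nxsd:style="fixedLength" nxsd:length="10"/>
              <xsd:element name="Amount" type="xsd:string"
                           nxsd:style="fixedLength" nxsd:length="8"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <!-- Header and Trailer are declared the same way with
             nxsd:conditionValue="H" and nxsd:conditionValue="T" -->
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```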

We
have now defined our file format and mapped it onto an XML record
structure.  We can now return to the file adapter
wizard.

The
file adapter is now set up to read a single record at a time from the
file.
I could now add a Receive to the process and use the newly
created FileInputService partner link.  When creating the receive I
marked the "Create Instance" check box as I needed to create a new
process
when a message was received.
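
In the BPEL source the receive looks something like the sketch below; the partner link name matches the one created above, but the port type and variable names are illustrative.

```xml
<!-- createInstance="yes" makes each received message start a new process -->
<receive name="receiveInput"
         partnerLink="FileInputService"
         portType="ns1:Read_ptt"
         operation="Read"
         variable="InputFile"
         createInstance="yes"/>
```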

Writing to the Database

Having created a partner link and a receive activity
to read the input file I now needed to create a partner link to write
it to the database.  To do this I used the database adapter and then
created a simple assign statement to set up the call to the database. 
The resultant process shown below was then
deployed.
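
A sketch of the assign and invoke pair, assuming a database adapter partner link called WriteDBRecord and a generated input message with a DataCollection part (all of these names are illustrative):

```xml
<!-- Copy the file record into the database adapter's input message -->
<assign name="AssignDbInput">
  <copy>
    <from variable="InputFile" part="Records"
          query="/ns3:Records/ns3:Data"/>
    <to variable="DbInsertInput" part="DataCollection"
        query="/ns4:DataCollection/ns4:Data"/>
  </copy>
</assign>
<!-- Call the database adapter to perform the insert -->
<invoke name="InsertRecord"
        partnerLink="WriteDBRecord"
        portType="ns4:WriteDBRecord_ptt"
        operation="insert"
        inputVariable="DbInsertInput"/>
```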

Running the Process

After
deploying the process to the BPEL Process Manager server the next step
is to run it.  To do this I dropped a sample file into the specified
input directory.  The file adapter detected the file, read it, and
created a process for each message in the file.  This resulted in a
large number of processes being created.  The process had two major
problems:
  • Two processes had errors because
    they had no data record (the header and trailer records were passed to
    these processes)
  • A single file had created more
    than 200 processes which made it difficult to see what was
    happening
An obvious solution to these
problems would be to batch multiple records into a single
process.

Batching the Data

In order to
receive multiple records into a single process I went back to the File
Adapter wizard and altered the "Publish Messages in Batches of" field
to a number larger than one; I chose fifty.
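
In the generated activation spec this batch size appears as a single attribute; the attribute name below is the one I recall the 10g wizard generating, so verify it against your own WSDL.

```xml
<!-- Other activation spec attributes omitted for brevity -->
<jca:operation
    ActivationSpec="oracle.tip.adapter.file.inbound.FileActivationSpec"
    PublishSize="50"/>
```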

If I had deployed and
tested the process without any other changes I would have found only
one record being processed in each batch.  This is because I have yet
to modify the rest of the process to iterate over multiple
records.
The first step is to figure out how many records I
need to process.  To calculate this I created a new integer variable
"NumberOfRecords" to hold the number of records and initialised it with
an assign statement as shown below. 
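
The assign uses the XPath count() function.  A sketch, assuming the input variable is called InputFile with a Records part (your variable, part and prefix names will differ):

```xml
<!-- Count the Data elements in the received batch -->
<assign name="InitNumberOfRecords">
  <copy>
    <from expression="count(bpws:getVariableData('InputFile','Records','/ns3:Records/ns3:Data'))"/>
    <to variable="NumberOfRecords"/>
  </copy>
</assign>
```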

The assignment calculates the number of Data elements underneath the
document root element Records.
I then created another integer
variable "CurrentRecord", initialised to 0, to act as a for loop
counter.  I then created a while statement to loop over all the Data
records in the message.  The while condition was
"CurrentRecord<NumberOfRecords" as shown below.

Within
the while loop I then created an assignment statement to set up the
write to the database.  Unfortunately it is not possible to use the wizard to completely set up the copy operation.  I followed this process:
  • Create the assignment as if there were a single record rather than multiple.  This gave me the following XPath expression:
    • /ns3:Records/ns3:Data
  • Manually edit the assignment statement to add an array index to identify the specific record.  The expression now looked like this:
    • /ns3:Records/ns3:Data[bpws:getVariableData('CurrentRecord')]
When incrementing the CurrentRecord variable it is important to remember that, unlike Java and C/C++, XPath indices start at 1, not 0, so the CurrentRecord value must be incremented before using it as an index into the record set.
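
Putting the loop together, a sketch of the while with the increment-then-index ordering described above (variable, part and prefix names follow the examples in this post but are otherwise illustrative):

```xml
<while name="ProcessRecords"
       condition="bpws:getVariableData('CurrentRecord') &lt; bpws:getVariableData('NumberOfRecords')">
  <sequence>
    <!-- increment first: XPath indices start at 1, not 0 -->
    <assign name="IncrementCounter">
      <copy>
        <from expression="bpws:getVariableData('CurrentRecord') + 1"/>
        <to variable="CurrentRecord"/>
      </copy>
    </assign>
    <!-- copy the current record into the database adapter's input message -->
    <assign name="AssignCurrentRecord">
      <copy>
        <from variable="InputFile" part="Records"
              query="/ns3:Records/ns3:Data[bpws:getVariableData('CurrentRecord')]"/>
        <to variable="DbInsertInput" part="DataCollection"
            query="/ns4:DataCollection/ns4:Data"/>
      </copy>
    </assign>
    <!-- invoke of the database adapter goes here -->
  </sequence>
</while>
```
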
After deploying this process, a 200-record file would now produce just 4 processes, making it much more manageable.

Reasons for Batching

So why would you want to batch messages?
  • Easier to track what is happening, as there are fewer processes to manage.
  • Slightly more efficient within the BPEL PM - some testing I did indicated it was a little more efficient to batch, presumably because of the reduced process creation overhead.
How do I process a file with multiple records as a single process?
  • Treat it as a single message by not checking the "Files contain Multiple Messages" check box in step 5 of the File Adapter Configuration wizard.
  • Individual records in the message are indexed in the same way as multiple records in the multiple records example above.
Why would you not batch?
  • Really want each individual record to initiate a new process that is to be tracked individually, for example equipment provisioning requests, each of which is part of a separate customer order.
  • Want to simplify process by only handling a single record.
  • Too idle to deal with looping over sets of records in a message (oops, that was the same as the last one!)

Key Points to Remember

  • Use XPath to index into multiple records
    • Use wizard to create copy rules then add indexing afterwards
  • Use a while loop to iterate over records
  • XPath indices start at 1
So happy batching!

Comments:

Antony- unless I am mistaken, you could have also batched up the database adapter and eliminated the for loop. In other words, your batched approach still makes 1 database adapter call for each row. It would be more efficient (but a bit less individually trackable) to send a set of rows at once to the database adapter. You'll have to swap out the assign for a more flexible transform in order to merge the file adapter schema into the database adapter schema, but you'll get good sql performance by batching up database activity.

Posted by Dave on June 04, 2007 at 02:34 AM MDT #

Absolutely correct.  The only downside is that the process needs more memory to create the inbound message into the database adapter.  When I played around with benchmarks I was surprised to find that the overhead of memory management (on a machine eating a lot of memory) was actually worse than the overhead of multiple database calls.  The overhead is mitigated by the transaction boundaries in BPEL.  I think in general your approach should be better and I have actually used that approach.

Posted by Antony Reynolds on June 04, 2007 at 08:14 AM MDT #

We have some problems with the file adapter when the file contains special characters (like �). They appear as upside-down question marks in the database, or diamond-shaped characters in the BPEL process. Have you seen this problem too?

Posted by Jerry on June 04, 2007 at 10:41 PM MDT #

If I have the complete SOA Suite as my infrastructure, should I use the ESB to read the input file with the file adapter and trigger a BPEL process, or would you still prefer the all-in-BPEL solution?

Posted by Harald Reinmueller on June 11, 2007 at 11:58 PM MDT #

Hi, thanks for this great blog. I would like to know more about how you managed compatibility issues in the XML schemas that were generated for the write and the read operations. Further, it would be very useful if you could give more information on compatibility issues and how to handle them when the XSD files are generated. How do I solve such issues? Will I have to use transformations? Thanks -Pradip

Posted by Pradip on July 10, 2008 at 05:46 AM MDT #

Pradip, Whenever you use generated schemas then you tend to have to map them onto something else. In this case I did it through assign statements, although if I was batching the database writes I would probably use an XSLT as it is easier to handle multiple records that way. Antony

Posted by Antony Reynolds on July 10, 2008 at 07:12 PM MDT #

About

Musings on Fusion Middleware and SOA. Antony works with customers across the US and Canada in implementing SOA and other Fusion Middleware solutions. Antony is the co-author of the SOA Suite 11g Developers Cookbook, the SOA Suite 11g Developers Guide and the SOA Suite Developers Guide.
