Wednesday Jan 02, 2013

Batch Applications in Java EE 7 - Understanding JSR 352 Concepts: TOTD #192


Batch processing is the execution of a series of "jobs" and is suitable for non-interactive, bulk-oriented, and long-running tasks. Typical examples are end-of-month bank statement generation, end-of-day jobs such as interest calculation, and ETL (extract-transform-load) in a data warehouse. These tasks are typically data- or computationally intensive, execute sequentially or in parallel, and may be initiated through various invocation models, including ad-hoc, scheduled, and on-demand.

JSR 352 will define a programming model for batch applications and a runtime for scheduling and executing jobs. This blog post explains the main concepts in JSR 352.

The diagram below highlights the key concepts of a batch processing architecture.



  • A Job is an instance that encapsulates an entire batch process. A job is typically put together using a Job Specification Language and consists of multiple steps. The Job Specification Language for JSR 352 is implemented with XML and is referred to as "Job XML".
  • A Step is a domain object that encapsulates an independent, sequential phase of a job. A step contains all of the information necessary to define and control the actual batch processing.
  • JobOperator provides an interface to manage all aspects of job processing, including operational commands, such as start, restart, and stop, as well as job repository commands, such as retrieval of job and step executions.
  • JobRepository holds information about jobs currently running and jobs that ran in the past. JobOperator provides access to this repository.
  • The Reader-Processor-Writer pattern is the primary pattern and is called chunk-oriented processing. An ItemReader reads one item at a time, an ItemProcessor applies business logic to the item (for example, calculating an account balance) and hands it to an ItemWriter for aggregation. Once a chunk's worth of items has been aggregated, the items are written out and the transaction is committed.

    JSR 352 also defines a roll-your-own batch pattern called a Batchlet. A Batchlet is invoked once, runs to completion, and returns an exit status. It must implement and honor a "cancel" callback so that an operator can terminate it.
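
The Batchlet contract described above (invoked once, returns an exit status, honors a cancel request) can be sketched in plain Java. The SimpleBatchlet interface, the CountingBatchlet class, and the "COMPLETED" and "STOPPED" status strings below are hypothetical stand-ins chosen for illustration, not the JSR 352 API. The sketch only shows that cancellation is cooperative: the work loop checks a flag that the cancel callback sets.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the Batchlet contract; not the JSR 352 API.
interface SimpleBatchlet {
    String process(); // runs to completion and returns an exit status
    void cancel();    // asks a running process() to stop early
}

class CountingBatchlet implements SimpleBatchlet {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final int totalUnits;
    private int unitsDone;

    CountingBatchlet(int totalUnits) {
        this.totalUnits = totalUnits;
    }

    @Override
    public String process() {
        // Check the flag between units of work so a cancel request is honored.
        while (unitsDone < totalUnits) {
            if (cancelled.get()) {
                return "STOPPED"; // hypothetical exit status
            }
            unitsDone++; // placeholder for one unit of real work
        }
        return "COMPLETED"; // hypothetical exit status
    }

    @Override
    public void cancel() {
        cancelled.set(true);
    }

    int getUnitsDone() {
        return unitsDone;
    }
}

class BatchletDemo {
    public static void main(String[] args) {
        CountingBatchlet batchlet = new CountingBatchlet(1000);
        System.out.println(batchlet.process());
    }
}
```

In a real runtime the cancel callback would normally arrive on a different thread from the one running the work, which is why the sketch keeps the flag in an AtomicBoolean.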

The Job XML for a chunk-oriented step may look like this:

<job id="myJob" xmlns="http://batch.jsr352/jsl">
    <step id="myStep">
        <chunk reader="MyItemReader"
               writer="MyItemWriter"
               processor="MyItemProcessor"
               buffer-size="5"
               checkpoint-policy="item"
               commit-interval="10" />
    </step>
</job>

  • The <job> has an "id" attribute that defines the logical name of the job and is used for identification purposes.
  • Each <job> can contain multiple <step>s, where each <step> identifies a job step and its characteristics. Each <step> has an "id" attribute that defines the logical name of the step and is used for identification purposes.
  • A <step> may have a <chunk> or a <batchlet> element; this <step> has a <chunk>. A <chunk> identifies a chunk-type step and implements the reader-processor-writer pattern of batch.
  • The "reader", "processor", and "writer" attributes specify the class names of an item reader, processor, and writer respectively.
  • "buffer-size" specifies the number of items to read and buffer before writing. When enough items have been read to fill the buffer, the buffer is emptied into a list and the configured ItemWriter is invoked with that list of items.
  • The "checkpoint-policy" attribute specifies the checkpoint policy that governs commit behavior for this chunk. Valid values are "item", "time", and "custom". The "item" policy means the chunk is checkpointed after a specified number of items are processed. The "time" policy means the chunk is checkpointed after a specified amount of time. The "custom" policy means the chunk is checkpointed according to a checkpoint algorithm implementation. The default policy is "item".
  • "commit-interval" specifies the commit interval for the specified checkpoint policy. Its unit depends on that policy: for the "item" policy it is a number of items, and for the "time" policy it is a number of seconds. The commit-interval attribute is ignored for the "custom" policy.

    When the configured checkpoint policy indicates that it is time to checkpoint, all the items read and processed so far are passed to the "writer".
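
To make the "item" and "time" policies more concrete, here is a minimal sketch of how such policies could decide when a checkpoint is due. The CheckpointPolicy interface and the ItemPolicy and TimePolicy classes are hypothetical names invented for this illustration and are not part of the JSR 352 API. The clock is passed in as a parameter (also an assumption of this sketch) so that the time-based variant can be exercised without waiting.

```java
import java.util.function.LongSupplier;

// Hypothetical interface, for illustration only; not part of the JSR 352 API.
interface CheckpointPolicy {
    void itemProcessed();          // called after each item is read and processed
    boolean isReadyToCheckpoint(); // true when the chunk should be committed
    void checkpointTaken();        // resets the policy for the next chunk
}

// "item" policy: a checkpoint is due after commit-interval items.
class ItemPolicy implements CheckpointPolicy {
    private final int commitInterval;
    private int itemsSinceCheckpoint;

    ItemPolicy(int commitInterval) {
        this.commitInterval = commitInterval;
    }

    public void itemProcessed() {
        itemsSinceCheckpoint++;
    }

    public boolean isReadyToCheckpoint() {
        return itemsSinceCheckpoint >= commitInterval;
    }

    public void checkpointTaken() {
        itemsSinceCheckpoint = 0;
    }
}

// "time" policy: a checkpoint is due after commit-interval seconds.
class TimePolicy implements CheckpointPolicy {
    private final long intervalMillis;
    private final LongSupplier clockMillis; // injected so the policy is testable
    private long lastCheckpointMillis;

    TimePolicy(int commitIntervalSeconds, LongSupplier clockMillis) {
        this.intervalMillis = commitIntervalSeconds * 1000L;
        this.clockMillis = clockMillis;
        this.lastCheckpointMillis = clockMillis.getAsLong();
    }

    public void itemProcessed() {
        // Elapsed time, not the item count, drives this policy.
    }

    public boolean isReadyToCheckpoint() {
        return clockMillis.getAsLong() - lastCheckpointMillis >= intervalMillis;
    }

    public void checkpointTaken() {
        lastCheckpointMillis = clockMillis.getAsLong();
    }
}
```

A production caller would typically pass System::currentTimeMillis as the clock; a "custom" policy would simply be another implementation of the same hypothetical interface.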

Here is a simple reader:

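@ItemReader
public class MyItemReader {
    private static int id;
    MyCheckPoint checkpoint = null;

    // Receives the last saved checkpoint when the step starts.
    @Open
    void open(MyCheckPoint checkpoint) {
        // Defensive assumption: no checkpoint may be supplied on a first run.
        this.checkpoint = (checkpoint != null) ? checkpoint : new MyCheckPoint();
        System.out.println(getClass().getName() + ".open: " + this.checkpoint.getItemCount());
    }

    // Returns the next record and counts it in the checkpoint.
    @ReadItem
    MyBatchRecord read() {
        checkpoint.incrementByOne();
        return new MyBatchRecord(++id);
    }

    // Exposes the current checkpoint to the runtime.
    @CheckpointInfo
    MyCheckPoint getCheckPoint() {
        return checkpoint;
    }
}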

Methods marked with @Open, @ReadItem, and @CheckpointInfo are required.
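
One way to see why the reader both accepts a checkpoint when it is opened and exposes one afterwards is a restart scenario. The sketch below is plain Java with the annotations left out so that it compiles without the batch API. The ResumableReader class, the use of an Integer position as the checkpoint, and the convention of returning null at the end of input are all assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java reader; annotations are omitted on purpose.
class ResumableReader {
    private final List<String> source;
    private int position; // number of items already read; doubles as the checkpoint

    ResumableReader(List<String> source) {
        this.source = source;
    }

    // Counterpart of the @Open method: a null checkpoint means a fresh start.
    void open(Integer checkpoint) {
        position = (checkpoint == null) ? 0 : checkpoint;
    }

    // Counterpart of the @ReadItem method: null signals end of input (sketch convention).
    String read() {
        return position < source.size() ? source.get(position++) : null;
    }

    // Counterpart of the @CheckpointInfo method.
    Integer getCheckpoint() {
        return position;
    }
}

class ResumableReaderDemo {
    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d");

        ResumableReader firstRun = new ResumableReader(data);
        firstRun.open(null);
        firstRun.read();
        firstRun.read();
        Integer saved = firstRun.getCheckpoint(); // state kept across a failure

        ResumableReader restart = new ResumableReader(data);
        restart.open(saved); // resumes after the items already read
        System.out.println(restart.read());
    }
}
```

The second reader continues from the saved position instead of re-reading the items that the first run already handled.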

Here is a simple processor that rejects every other item:

@ItemProcessor
public class MyItemProcessor {
    // Keeps records with an even id; returning null rejects the record.
    @ProcessItem
    MyBatchRecord process(MyBatchRecord record) {
        return (record.getId() % 2 == 0) ? record : null;
    }
}

And here is a simple writer:

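import java.util.List;

@ItemWriter
public class MyItemWriter {
    MyCheckPoint checkpoint = null;

    // Receives the last saved checkpoint when the step starts.
    @Open
    void open(MyCheckPoint checkpoint) {
        // Defensive assumption: no checkpoint may be supplied on a first run.
        this.checkpoint = (checkpoint != null) ? checkpoint : new MyCheckPoint();
        System.out.println(getClass().getName() + ".open: " + this.checkpoint.getItemCount());
    }

    // Writes one chunk of records and adds its size to the checkpoint.
    @WriteItems
    void write(List<MyBatchRecord> list) {
        System.out.println("Writing the chunk...");
        for (MyBatchRecord record : list) {
            System.out.println(record.getId());
        }
        checkpoint.increment(list.size());
        System.out.println("... done.");
    }

    // Exposes the current checkpoint to the runtime.
    @CheckpointInfo
    MyCheckPoint getCheckPoint() {
        return checkpoint;
    }
}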

Finally, a simple implementation of MyCheckPoint:

public class MyCheckPoint {
    int itemCount;

    public int getItemCount() {
        return itemCount;
    }

    public void setItemCount(int itemCount) {
        this.itemCount = itemCount;
    }

    void incrementByOne() {
        itemCount++;
    }

    void increment(int size) {
        itemCount += size;
    }
}

Together, MyItemReader, MyItemProcessor, MyItemWriter, MyCheckPoint, and the Job XML (batch.xml) read, process, and write items in buffers of 5 and commit a checkpoint after every 10 items processed.
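
The interplay of reader, processor, writer, and commit interval can be simulated in plain Java without the batch runtime. The sketch below is a simplification under stated assumptions: the ChunkLoop class is hypothetical, it ignores "buffer-size", it counts every item read toward the commit interval (including items the processor rejects), and it does not manage real transactions. It uses the same numbers as the example: a commit interval of 10 and a processor that rejects odd ids.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.IntStream;

// Hypothetical driver that mimics an "item" checkpoint policy; not the JSR 352 runtime.
class ChunkLoop {
    // Returns the number of checkpoints taken.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        List<O> buffer = new ArrayList<>();
        int readSinceCheckpoint = 0;
        int checkpoints = 0;
        while (reader.hasNext()) {
            O result = processor.apply(reader.next());
            if (result != null) { // a null result means the item was rejected
                buffer.add(result);
            }
            readSinceCheckpoint++;
            if (readSinceCheckpoint == commitInterval) {
                writer.accept(new ArrayList<>(buffer)); // hand the chunk to the writer
                buffer.clear();
                readSinceCheckpoint = 0;
                checkpoints++; // a real runtime would commit the transaction here
            }
        }
        if (readSinceCheckpoint > 0) { // flush the final, partial chunk
            writer.accept(new ArrayList<>(buffer));
            checkpoints++;
        }
        return checkpoints;
    }
}

class ChunkLoopDemo {
    public static void main(String[] args) {
        Iterator<Integer> ids = IntStream.rangeClosed(1, 20).iterator();
        Function<Integer, Integer> rejectOdd = id -> (id % 2 == 0) ? id : null;
        Consumer<List<Integer>> printChunk = chunk -> System.out.println("chunk: " + chunk);
        int checkpoints = ChunkLoop.run(ids, rejectOdd, printChunk, 10);
        System.out.println("checkpoints: " + checkpoints);
    }
}
```

With ids 1 through 20, this simulation hands the writer two chunks of five even ids each and takes two checkpoints.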

The JSR 352 specification defines several other concepts, such as how Job XML can define the sequencing of jobs, listeners to interpose on job execution, transaction management, and running jobs in partitioned and concurrent modes. A subsequent blog post will explain some of those concepts.

A complete replay of Java Batch for Cost-Optimized Business Efficiency from JavaOne 2012 can be seen here (click on CON4105_mp4_4105_001 in Media).

Each feature will be added to the JSR subject to EG approval. You can send your feedback to public@jbatch.java.net.

The APIs and implementation of JSR 352 are not integrated in GlassFish 4 promoted builds yet.

Here are some more references for you:

Here are some other Java EE 7 primers published so far:


And of course, more are on their way! Do you want to see any particular one first?

About

Arun Gupta is a technology enthusiast, a passionate runner, author, and a community guy who works for Oracle Corp.

