Batch Applications in Java EE 7 - Understanding JSR 352 Concepts: TOTD #192


Batch processing is the execution of a series of "jobs" and is suitable for non-interactive, bulk-oriented, and long-running tasks. Typical examples are end-of-month bank statement generation, end-of-day jobs such as interest calculation, and ETL (extract-transform-load) in a data warehouse. These tasks are typically data- or compute-intensive, execute sequentially or in parallel, and may be initiated through various invocation models, including ad hoc, scheduled, and on-demand.

JSR 352 will define a programming model for batch applications and a runtime for scheduling and executing jobs. This blog will explain the main concepts in JSR 352.

The diagram below highlights the key concepts of a batch processing architecture.



  • A Job is an instance that encapsulates an entire batch process. A job is typically put together using a Job Specification Language and consists of multiple steps. The Job Specification Language for JSR 352 is implemented with XML and is referred to as "Job XML".
  • A Step is a domain object that encapsulates an independent, sequential phase of a job. A step contains all of the information necessary to define and control the actual batch processing.
  • JobOperator provides an interface to manage all aspects of job processing, including operational commands, such as start, restart, and stop, as well as job repository commands, such as retrieval of job and step executions.
  • JobRepository holds information about jobs currently running and jobs that ran in the past. JobOperator provides access to this repository.
  • The Reader-Processor-Writer pattern is the primary pattern and is called chunk-oriented processing. In this pattern, ItemReader reads one item at a time, ItemProcessor processes the item based upon the business logic (for example, calculating an account balance), and hands it to ItemWriter for aggregation. Once the 'chunk' number of items has been aggregated, they are written out and the transaction is committed.

    JSR 352 also defines a roll-your-own batch pattern, called a Batchlet. This batch pattern is invoked once, runs to completion, and returns an exit status. A Batchlet must implement and honor a "cancel" callback to enable operational termination.
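The Batchlet contract described above (invoked once, runs to completion, returns an exit status, honors a cancel request) can be sketched in plain Java. This is only an illustration, not the JSR 352 API: the class name SketchBatchlet, its method names, and the "COMPLETED" and "STOPPED" status strings are all assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a Batchlet-style task; not the JSR 352 API.
class SketchBatchlet {
    // Set by cancel(); checked by the work loop so the task can stop early.
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private int unitsDone = 0;

    // Invoked once; runs to completion (or until cancelled) and returns an exit status.
    String process(int totalUnits) {
        for (int i = 0; i < totalUnits; i++) {
            if (cancelled.get()) {
                return "STOPPED"; // assumed status name for this sketch
            }
            unitsDone++; // placeholder for one unit of real work
        }
        return "COMPLETED"; // assumed status name for this sketch
    }

    // The "cancel" callback: an operator would typically call this from another thread.
    void cancel() {
        cancelled.set(true);
    }

    int getUnitsDone() {
        return unitsDone;
    }
}
```

The point of the flag is that the long-running loop checks it between units of work, so a cancel request takes effect at the next safe point rather than interrupting work midway.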
A Job XML for chunk-oriented processing may look like:

<job id="myJob" xmlns="http://batch.jsr352/jsl">
    <step id="myStep">
        <chunk reader="MyItemReader"
               writer="MyItemWriter"
               processor="MyItemProcessor"
               buffer-size="5"
               checkpoint-policy="item"
               commit-interval="10" />
    </step>
</job>

  • The <job> has an "id" attribute that defines the logical name of the job and is used for identification purposes.
  • Each <job> can contain multiple <step>s, where each <step> identifies a job step and its characteristics. Each <step> has an "id" attribute that defines the logical name of the step and is used for identification purposes.
  • A <step> may have a <chunk> or a <batchlet> element; this <step> has a <chunk>. A <chunk> identifies a chunk-type step and implements the reader-processor-writer pattern of batch.
  • The "reader", "processor", and "writer" attributes specify the class names of an item reader, processor, and writer respectively.
  • "buffer-size" specifies the number of items to read and buffer before writing. When enough items have been read to fill the buffer, the buffer is emptied to a list and the configured ItemWriter is invoked with the list of items.
  • The "checkpoint-policy" attribute specifies the checkpoint policy that governs commit behavior for this chunk. Valid values are "item", "time", and "custom". The "item" policy means the chunk is checkpointed after a specified number of items are processed. The "time" policy means the chunk is committed after a specified amount of time. The "custom" policy means the chunk is checkpointed according to a checkpoint algorithm implementation. The default policy is "item".
  • "commit-interval" specifies the commit interval for the specified checkpoint policy. The unit of the commit-interval depends on the checkpoint policy: for the "item" policy it is a number of items, and for the "time" policy it is a number of seconds. The commit-interval attribute is ignored for the "custom" policy.

    When the configured checkpoint policy indicates that it is time to checkpoint, all the items read and processed so far are passed to the "writer".
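To make the "item" and "time" policies concrete, here is a minimal plain-Java sketch of how a runtime might decide that a checkpoint is due. It is an assumption-laden model, not the JSR 352 implementation: the class CheckpointDecider, its method itemProcessed, and the injected clock are hypothetical names introduced for this example, and the "custom" policy is left out.

```java
import java.util.function.LongSupplier;

// Hypothetical model of the "item" and "time" checkpoint policies; not the JSR 352 API.
class CheckpointDecider {
    private final String policy;            // "item" or "time"
    private final int commitInterval;       // items for "item", seconds for "time"
    private final LongSupplier clockMillis; // injected clock so the sketch can be tested
    private int itemsSinceCheckpoint = 0;
    private long lastCheckpointMillis;

    CheckpointDecider(String policy, int commitInterval, LongSupplier clockMillis) {
        if (!policy.equals("item") && !policy.equals("time")) {
            throw new IllegalArgumentException("unsupported policy in this sketch: " + policy);
        }
        this.policy = policy;
        this.commitInterval = commitInterval;
        this.clockMillis = clockMillis;
        this.lastCheckpointMillis = clockMillis.getAsLong();
    }

    // Call after each processed item; returns true when a checkpoint is due.
    boolean itemProcessed() {
        itemsSinceCheckpoint++;
        long now = clockMillis.getAsLong();
        boolean due = policy.equals("item")
                ? itemsSinceCheckpoint >= commitInterval
                : now - lastCheckpointMillis >= commitInterval * 1000L;
        if (due) {
            // Start a new interval after every checkpoint.
            itemsSinceCheckpoint = 0;
            lastCheckpointMillis = now;
        }
        return due;
    }
}
```

With the "item" policy and a commit-interval of 10, as in the Job XML above, itemProcessed() in this sketch would return true on every tenth call.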

Here is a simple reader:

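@ItemReader
public class MyItemReader {
    // Shared counter used to give each record a new id.
    private static int id;
    MyCheckPoint checkpoint = null;

    // Called when the reader is opened; keeps the checkpoint that is passed in.
    @Open
    void open(MyCheckPoint checkpoint) {
        this.checkpoint = checkpoint;
        System.out.println(getClass().getName() + ".open: " + checkpoint.getItemCount());
    }

    // Returns the next record and counts it in the checkpoint.
    @ReadItem
    MyBatchRecord read() {
        checkpoint.incrementByOne();
        return new MyBatchRecord(++id);
    }

    // Exposes the current checkpoint data.
    @CheckpointInfo
    MyCheckPoint getCheckPoint() {
        return checkpoint;
    }
}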

Methods marked with @Open, @ReadItem, and @CheckpointInfo are required.

Here is a simple processor that rejects every other item:

@ItemProcessor
public class MyItemProcessor {
    // Records with an even id pass through; returning null rejects the item.
    @ProcessItem
    MyBatchRecord process(MyBatchRecord record) {
        return (record.getId() % 2 == 0) ? record : null;
    }
}

And here is a simple writer:

import java.util.List;

@ItemWriter
public class MyItemWriter {
    MyCheckPoint checkpoint = null;

    // Called when the writer is opened; keeps the checkpoint that is passed in.
    @Open
    void open(MyCheckPoint checkpoint) {
        this.checkpoint = checkpoint;
        System.out.println(getClass().getName() + ".open: " + checkpoint.getItemCount());
    }

    // Writes one chunk of records and adds its size to the checkpoint count.
    @WriteItems
    void write(List<MyBatchRecord> list) {
        System.out.println("Writing the chunk...");
        for (MyBatchRecord record : list) {
            System.out.println(record.getId());
        }
        checkpoint.increment(list.size());
        System.out.println("... done.");
    }

    // Exposes the current checkpoint data.
    @CheckpointInfo
    MyCheckPoint getCheckPoint() {
        return checkpoint;
    }
}

Finally, a simple implementation of MyCheckPoint:

public class MyCheckPoint {
    int itemCount;

    public int getItemCount() {
        return itemCount;
    }

    public void setItemCount(int itemCount) {
        this.itemCount = itemCount;
    }

    void incrementByOne() {
        itemCount++;
    }

    void increment(int size) {
        itemCount += size;
    }
}

Together, MyItemReader, MyItemProcessor, MyItemWriter, MyCheckPoint, and the Job XML (batch.xml) read, process, and write items in buffers of 5, and commit once 10 such items have been processed.
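The read-process-write flow that these classes plug into can be approximated in plain Java, without any JSR 352 API. The sketch below is only a rough model under stated assumptions: ChunkLoopSketch and its run method are hypothetical names, a single chunkSize stands in for the buffering and commit settings, and each call to the writer stands in for a commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical chunk loop in plain Java; these names are not part of JSR 352.
class ChunkLoopSketch {
    // Reads items one at a time, processes each, and writes once chunkSize items are buffered.
    // A null result from the processor means "reject this item".
    // Returns the number of writer calls (each one stands in for a commit).
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<O> buffer = new ArrayList<>();
        int writes = 0;
        while (reader.hasNext()) {
            O result = processor.apply(reader.next());
            if (result == null) {
                continue; // rejected item, nothing to buffer
            }
            buffer.add(result);
            if (buffer.size() == chunkSize) {
                writer.accept(new ArrayList<>(buffer)); // hand over a copy of the chunk
                buffer.clear();
                writes++;
            }
        }
        if (!buffer.isEmpty()) {
            writer.accept(new ArrayList<>(buffer)); // flush the final, partial chunk
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        Iterator<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).iterator();
        Function<Integer, Integer> keepEven = id -> id % 2 == 0 ? id : null;
        Consumer<List<Integer>> printChunk = chunk -> System.out.println("Writing chunk " + chunk);
        System.out.println("Writer calls: " + run(ids, keepEven, printChunk, 2));
    }
}
```

With ids 1 to 10, the even-id filter from MyItemProcessor, and a chunk size of 2, the writer in this sketch is called three times, with [2, 4], [6, 8], and [10].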

The JSR 352 specification defines several other concepts, such as how Job XML can define sequencing of jobs, listeners to interpose on job execution, transaction management, and running jobs in partitioned and concurrent modes. Subsequent blogs will explain some of those concepts.

A complete replay of Java Batch for Cost-Optimized Business Efficiency from JavaOne 2012 can be seen here (click on CON4105_mp4_4105_001 in Media).

Each feature will be added to the JSR subject to EG approval. You can send your feedback to public@jbatch.java.net.

The APIs and implementation of JSR 352 are not integrated in GlassFish 4 promoted builds yet.

Here are some more references for you:

Here are some other Java EE 7 primers published so far:


And of course, more are on their way! Do you want to see any particular one first?

Comments:

Why did it take so long for this JSR? Spring Batch and Terracotta Quartz have been using these methods for over a decade.

Posted by guest on January 03, 2013 at 12:51 PM PST #

At least you renamed the image you lifted from the spring batch reference documentation before you used it on your blog without attribution, clever. http://static.springsource.org/spring-batch/reference/html-single/index.html#domain

Posted by guest on January 09, 2013 at 03:18 PM PST #

The image is from JSR 352 specification which has an explicit attribution to Spring Batch.

Posted by Arun Gupta on January 09, 2013 at 03:21 PM PST #

I don't want to be offensive, but isn't it easier to promote all the Spring APIs to Java EE 3000, or Java-G, like "genius" ;-)

Posted by guest on January 10, 2013 at 08:46 AM PST #

most of the non-JSR specific content looks lifted from spring batch in one way or another as well.

Posted by guest on January 10, 2013 at 09:40 AM PST #

@guest
Before thinking you know things others wouldn't, you should double-check your information.

Some members of the Spring team actually also worked on that very-JSR. That JSR was composed of members from IBM, SpringSource & RedHat for example.

In this area, they certainly just agreed that the naming used in Spring was right and so kept that part, and why redo a diagram that's already OK? They're concrete technical guys, not doing politics. Doing things for no value would have certainly been even more pointless.

And by the way, you should be happy to see standardization in Java come after real world experiences (Spring batch, IBM/mainframe job control language, and so on), so that the spec is actually usable in the end.

If you were willing to ask for clarification, you could also have asked for them. Spreading FUD is not the best way to proceed to receive sensible answers.

Cheers
PS: "I don't want to be offensive[...]" => There's a word for that kind of sentence : Apophasis.

Posted by Baptiste on January 16, 2013 at 03:32 AM PST #

* @Baptiste, many thanks for bringing some reasoning back to this dialog. The best way to respond to FUD is calling it out and voicing your opinion against it.

* Let's remember that Spring Batch is only one influence on JBatch. Just to name some others, there are heavy influences from WebSphere Compute Grid (WCG) and z/OS Batch. In particular, the WCG programming model influences are evident in checkpointing, partition processing, and property handling. Spring Batch influences are most evident in chunk processing, listeners, splits, and operational interfaces. z/OS batch influence is most evident in more fundamental ways: separation of job definition from application and multiple concurrently executing instances of the same job definition. That's what standardization has always been about - taking the best of vendor specific technologies and making them vendor neutral.

* Technology adoption between Spring and Java EE cuts both ways. It is very easy to point to the myriad ideas that have wound up in Spring from the Java EE ecosystem (and they are some pretty big ones).

* It's never a goal to standardize the kitchen sink. It's easy to demonstrate that Batch processing is well outside the 80% use case for most server-side Java applications. Personally, I am not entirely convinced there was a need to include JBatch into Java EE even now...

Posted by Reza Rahman on January 16, 2013 at 11:17 AM PST #

How can I find out the current development status of this JSR? If the development has already been completed, where can I find the downloads to do a pilot project? If it is not completed, is there any possibility to join the team / community to contribute?

Please can someone help me to find out the answers?

Posted by guest on February 03, 2013 at 10:54 PM PST #

IBM is implementing JSR 352 Reference Implementation and you can track the status at: http://java.net/projects/jbatch/downloads. The latest download is available at:

http://java.net/projects/jbatch/downloads/download/JSR352.RI.TCK.SE.20121212a.zip

The integration work in GlassFish has already started and you'll hear on this blog once it's ready to be tried out. I've already sent an email to IBM folks asking if they would be interested in taking contributions. Will let you know if I hear back.

Posted by Arun Gupta on February 03, 2013 at 11:40 PM PST #

Hi arun,

thanks for your post. Do you know if there is a working sample out there?

Posted by guest on February 04, 2013 at 05:13 AM PST #

There is no working sample on GlassFish because it's not integrated there yet. But I'm tracking it closely and something should come up in the next couple of weeks or so.

Arun

Posted by Arun Gupta on February 04, 2013 at 05:22 AM PST #

Regarding the question concerning community contributions to the RI/TCK... we might have been open to the idea a few months ago. But calendar and legal constraints preclude it now.

An updated RI/TCK will be released shortly that will be up to date with the proposed final draft of the spec. Announcements for this sort of thing are made via the public mailing list at http://java.net/projects/jbatch.

Posted by guest on February 04, 2013 at 06:54 PM PST #

Hey Arun,
Thank you for your post.
This comes at a very opportune moment for me as I am starting work on a batch application. I have been developing web services over the last few years, so this is the first time I am doing batch processing in java, so have all sorts of questions in my mind.

I like your other tutorials. So it would be awesome if you could do a reference tutorial/screencast for the community of a batch program that reads from a flat file or a csv and inserts them into a database using JSR 352 annotations so that folks can follow the most current/modern methodology for developing a batch process.

I looked up some spring batch examples online over the last few days, but they look so.... hard to get started. And also look a bit dated with too much xml configuration. I am not a stranger to spring, but have mostly used annotations with minimal xml configuration, so the spring batch examples seem a bit hard to get started on. I was hoping to use SpringData and hibernate as the JPA provider in this application.

My plan is to do a simple implementation and then once my business requirements and logic have been fleshed out thoroughly, come back and use JBatch APIs to "batch-ify" the process by doing a rewrite or refactoring as needed. I expect to be done with my simple (naive) implementation by end of February, so am hoping to find some tutorials or webinars of JBatch soon.

Since the new app is expected to deploy in a Java 7 JVM in Q3 2013, I expect I should be able to use JSR 352 APIs.

Thanks for all your posts and tutorials.
-SGB

Posted by SGB on February 18, 2013 at 08:39 AM PST #

SGB,

JSR 352 is not integrated in GlassFish yet but you do give me a good idea about the tutorial. I'm watching the integration closely and will use your described scenario to build the tutorial. The app will use GlassFish 4 exclusively though with no dependency on third-party frameworks or libraries.

Arun

Posted by Arun Gupta on February 18, 2013 at 09:22 AM PST #

About

Arun Gupta is a technology enthusiast, a passionate runner, author, and a community guy who works for Oracle Corp.

