Batch Applications in Java EE 7 - Understanding JSR 352 Concepts: TOTD #192



Batch processing is the execution of a series of "jobs" and is suitable
for non-interactive, bulk-oriented, and long-running tasks. Typical
examples are end-of-month bank statement generation, end-of-day jobs
such as interest calculation, and ETL (extract-transform-load) in a
data warehouse. These tasks are typically data or computationally
intensive, execute sequentially or in parallel, and may be initiated
through various invocation models, including ad-hoc, scheduled, and
on-demand.



JSR 352 will define a programming model for batch applications and a
runtime for scheduling and executing jobs. This blog entry explains the
main concepts in JSR 352.



The diagram below highlights the key concepts of a batch processing
architecture.



[Figure: jsr352_schematic.png]


  • A Job is an instance that encapsulates an entire batch
    process. A job is typically put together using a Job
    Specification Language and consists of multiple steps. The
    Job Specification Language for JSR 352 is implemented with XML
    and is referred to as "Job XML".

  • A Step is a domain object that encapsulates an
    independent, sequential phase of a job. A step contains all of
    the information necessary to define and control the actual batch
    processing.
  • JobOperator provides an interface to manage all aspects
    of job processing, including operational commands, such as
    start, restart, and stop, as well as job repository commands,
    such as retrieval of job and step executions.
  • JobRepository holds information about jobs currently
    running and jobs that ran in the past. JobOperator provides
    access to this repository.

  • The Reader-Processor-Writer pattern is the primary pattern
    and is called chunk-oriented processing. In this pattern,
    ItemReader reads one item at a time, and ItemProcessor
    processes the item based upon the business logic, such as
    calculating an account balance, and hands it to ItemWriter for
    aggregation. Once the 'chunk' number of items has been aggregated,
    they are written out, and the transaction is committed.



    JSR 352 also defines a roll-your-own batch pattern, called a Batchlet.
    This batch pattern is invoked once, runs to completion, and
    returns an exit status. This pattern must implement and honor a
    "cancel" callback to enable operational termination of the
    Batchlet.
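
    To make the Batchlet contract described above a little more concrete,
    here is a minimal plain-Java sketch. The SimpleBatchlet interface, its
    method names, and the "COMPLETED"/"STOPPED" status strings are
    hypothetical stand-ins invented for this illustration; they are not
    the JSR 352 API. The sketch only shows the shape of the idea: the
    task runs once, returns an exit status, and checks a cancel flag
    between units of work so that it can be stopped operationally.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BatchletSketch {

    /** Hypothetical contract (not the JSR 352 API): run once, return an exit status, honor cancel. */
    interface SimpleBatchlet {
        String process();
        void cancel();
    }

    /** Illustrative task that works in small units and checks for cancellation between them. */
    static class CleanupBatchlet implements SimpleBatchlet {
        private final AtomicBoolean cancelled = new AtomicBoolean(false);
        private final int units;
        private int done;

        CleanupBatchlet(int units) {
            this.units = units;
        }

        @Override
        public String process() {
            for (done = 0; done < units; done++) {
                if (cancelled.get()) {
                    return "STOPPED";      // cancel was requested: stop early
                }
                // one unit of real work would go here
            }
            return "COMPLETED";            // ran to completion
        }

        @Override
        public void cancel() {
            cancelled.set(true);           // may be called from another thread
        }

        int unitsDone() {
            return done;
        }
    }

    public static void main(String[] args) {
        CleanupBatchlet normal = new CleanupBatchlet(3);
        System.out.println(normal.process());

        CleanupBatchlet stopped = new CleanupBatchlet(3);
        stopped.cancel();                  // request cancellation before the run
        System.out.println(stopped.process());
    }
}
```

    The flag is an AtomicBoolean because, in a real runtime, the cancel
    request would most likely arrive on a different thread from the one
    running the task.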


The Job XML for chunk-oriented processing may look like:

<job id="myJob" xmlns="http://batch.jsr352/jsl">
    <step id="myStep">
        <chunk reader="MyItemReader"
               writer="MyItemWriter"
               processor="MyItemProcessor"
               buffer-size="5"
               checkpoint-policy="item"
               commit-interval="10" />
    </step>
</job>

  • The <job> has an "id" attribute that defines the logical
    name of the job and is used for identification purposes.

  • Each <job> can contain multiple <step>s, where each
    <step> identifies a job step and its characteristics.
    Each <step> has an "id" attribute that defines the logical
    name of the step and is used for identification purposes.

  • A <step> may have a <chunk> or a <batchlet>
    element; this <step> has a <chunk>. A <chunk>
    identifies a chunk-type step and implements the
    reader-processor-writer pattern of batch.

  • The "reader", "processor", and "writer" attributes specify the
    class names of an item reader, processor, and writer
    respectively.
  • "buffer-size" specifies the number of items to read and buffer
    before writing. When enough items have been read to fill the
    buffer, the buffer is emptied to a list and the configured
    ItemWriter is invoked with the list of items.
  • "checkpoint-policy" attribute specifies the checkpoint policy
    that governs commit behavior for this chunk. Valid values are
    "item", "time", and "custom". The "item" policy means the chunk is
    checkpointed after a specified number of items are processed.
    The "time" policy means the chunk is committed after a specified
    amount of time. The "custom" policy means the chunk is
    checkpointed according to a checkpoint algorithm implementation.
    The default policy is "item".
  • "commit-interval" specifies the commit interval for the
    specified checkpoint policy. The unit of the
    commit-interval depends on the specified checkpoint
    policy. For "item" policy, commit-interval specifies a number of
    items. For "time" policy, commit-interval specifies a number of
    seconds. The commit-interval attribute is ignored for "custom"
    policy.



    When the configured checkpoint policy directs that it is time to
    checkpoint, all the items read and processed so far are passed
    to the "writer".
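
    The difference between the "item" and "time" policies described above
    can be sketched in plain Java. The CheckpointDecider class and its
    method names are hypothetical and are not part of JSR 352; the sketch
    only illustrates how the same commit-interval number can be read as a
    count of items or as a number of seconds. The clock is passed in so
    that the "time" case can be exercised deterministically. The "custom"
    policy is not covered here.

```java
import java.util.function.LongSupplier;

public class CheckpointPolicySketch {

    enum Policy { ITEM, TIME }

    /** Hypothetical helper (not the JSR 352 API): decides when a chunk should be checkpointed. */
    static class CheckpointDecider {
        private final Policy policy;
        private final int commitInterval;        // items for ITEM, seconds for TIME
        private final LongSupplier clockMillis;  // injected clock, in milliseconds
        private int itemsSinceCheckpoint;
        private long lastCheckpointMillis;

        CheckpointDecider(Policy policy, int commitInterval, LongSupplier clockMillis) {
            this.policy = policy;
            this.commitInterval = commitInterval;
            this.clockMillis = clockMillis;
            this.lastCheckpointMillis = clockMillis.getAsLong();
        }

        /** Call after each item is processed; returns true when it is time to checkpoint. */
        boolean itemProcessed() {
            itemsSinceCheckpoint++;
            long now = clockMillis.getAsLong();
            boolean ready;
            if (policy == Policy.ITEM) {
                ready = itemsSinceCheckpoint >= commitInterval;
            } else {
                ready = now - lastCheckpointMillis >= commitInterval * 1000L;
            }
            if (ready) {
                // start counting afresh for the next checkpoint
                itemsSinceCheckpoint = 0;
                lastCheckpointMillis = now;
            }
            return ready;
        }
    }

    public static void main(String[] args) {
        // "item" policy with commit-interval="10", as in the Job XML above
        CheckpointDecider byItem =
                new CheckpointDecider(Policy.ITEM, 10, System::currentTimeMillis);
        for (int i = 1; i <= 25; i++) {
            if (byItem.itemProcessed()) {
                System.out.println("checkpoint after item " + i);
            }
        }
    }
}
```

    With the "item" policy and an interval of 10, the loop in main reports
    a checkpoint after items 10 and 20.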

Here is a simple reader:

@ItemReader
public class MyItemReader {
    // Shared counter used to generate record ids
    private static int id;
    MyCheckPoint checkpoint = null;

    // Receives the checkpoint data when the reader is opened
    @Open
    void open(MyCheckPoint checkpoint) {
        this.checkpoint = checkpoint;
        System.out.println(getClass().getName() + ".open: " + checkpoint.getItemCount());
    }

    // Returns the next item and records progress in the checkpoint
    @ReadItem
    MyBatchRecord read() {
        checkpoint.incrementByOne();
        return new MyBatchRecord(++id);
    }

    // Supplies the current checkpoint data
    @CheckpointInfo
    MyCheckPoint getCheckPoint() {
        return checkpoint;
    }
}

Methods marked with @Open, @ReadItem,
and @CheckpointInfo are required.

Here is a simple processor that rejects every other item:

@ItemProcessor
public class MyItemProcessor {
    // Keeps records with an even id; returning null rejects the item
    @ProcessItem
    MyBatchRecord process(MyBatchRecord record) {
        return (record.getId() % 2 == 0) ? record : null;
    }
}

And here is a simple writer:

import java.util.List;

@ItemWriter
public class MyItemWriter {
    MyCheckPoint checkpoint = null;

    // Receives the checkpoint data when the writer is opened
    @Open
    void open(MyCheckPoint checkpoint) {
        this.checkpoint = checkpoint;
        System.out.println(getClass().getName() + ".open: " + checkpoint.getItemCount());
    }

    // Writes out one chunk of items and records how many were written
    @WriteItems
    void write(List<MyBatchRecord> list) {
        System.out.println("Writing the chunk...");
        for (MyBatchRecord record : list) {
            System.out.println(record.getId());
        }
        checkpoint.increment(list.size());
        System.out.println("... done.");
    }

    // Supplies the current checkpoint data
    @CheckpointInfo
    MyCheckPoint getCheckPoint() {
        return checkpoint;
    }
}

Finally, a simple implementation of MyCheckPoint:

public class MyCheckPoint {
    int itemCount;

    public int getItemCount() {
        return itemCount;
    }

    public void setItemCount(int itemCount) {
        this.itemCount = itemCount;
    }

    void incrementByOne() {
        itemCount++;
    }

    void increment(int size) {
        itemCount += size;
    }
}

    Together, MyItemReader, MyItemWriter,
    MyItemProcessor, MyCheckPoint, and batch.xml
    will read, process, and write items 5 at a time and commit once 10
    such items have been processed.
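
    The interplay of buffer-size="5" and commit-interval="10" can be
    simulated in plain Java without any batch runtime. This is only a
    sketch under stated assumptions: the ChunkLoopSketch class and its
    Trace type are hypothetical, the "reader" is a simple counter, and
    the rule used here (write whenever 5 items have been read since the
    last write or a checkpoint is due, commit after every 10 items
    processed) is one reading of the attribute descriptions above, not
    behavior confirmed by the specification.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkLoopSketch {

    /** Records what the loop did so that its behavior can be inspected. */
    static class Trace {
        final List<List<Integer>> writes = new ArrayList<>();
        int commits;
    }

    /** Simulates a chunk step; the write/commit rule is an assumption, see the text above. */
    static Trace run(int totalItems, int bufferSize, int commitInterval) {
        Trace trace = new Trace();
        List<Integer> buffer = new ArrayList<>();
        int readSinceWrite = 0;
        int processedSinceCommit = 0;

        for (int id = 1; id <= totalItems; id++) {       // "reader": ids 1..totalItems
            Integer item = (id % 2 == 0) ? id : null;    // "processor": keeps even ids, null rejects
            if (item != null) {
                buffer.add(item);
            }
            readSinceWrite++;
            processedSinceCommit++;

            boolean checkpointDue = processedSinceCommit >= commitInterval;
            if (readSinceWrite >= bufferSize || checkpointDue) {
                flush(buffer, trace);                    // "writer" receives the buffered items
                readSinceWrite = 0;
            }
            if (checkpointDue) {
                trace.commits++;                         // commit at the checkpoint
                processedSinceCommit = 0;
            }
        }
        // End of input: write and commit whatever is left over
        if (readSinceWrite > 0 || processedSinceCommit > 0) {
            flush(buffer, trace);
            trace.commits++;
        }
        return trace;
    }

    private static void flush(List<Integer> buffer, Trace trace) {
        if (!buffer.isEmpty()) {
            trace.writes.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        Trace trace = run(20, 5, 10);
        System.out.println("writes: " + trace.writes + ", commits: " + trace.commits);
    }
}
```

    For 20 input items this prints
    writes: [[2, 4], [6, 8, 10], [12, 14], [16, 18, 20]], commits: 2,
    that is, the writer is invoked four times and the transaction is
    committed twice.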

    The JSR 352 specification defines several other concepts, such as
    how Job XML can define sequencing of jobs, listeners to interpose on
    job execution, transaction management, and running jobs in partitioned
    and concurrent modes. Subsequent blog entries will explain some of
    those concepts.

    A complete replay of Java Batch for Cost-Optimized Business Efficiency
    from JavaOne 2012 can be seen here (click on CON4105_mp4_4105_001 in Media):
    https://oracleus.activeevents.com/connect/sessionDetail.ww?SESSION_ID=4105

    Each feature will be added to the JSR subject to EG approval. You
    can share your feedback at public@jbatch.java.net (http://java.net/projects/jbatch/lists/public/archive).

    The APIs and implementation of JSR 352 are not yet integrated in GlassFish
    4 promoted builds (http://dlc.sun.com.edgesuite.net/glassfish/4.0/promoted/).


    Here are some more references for you:
    • Java API for Batch Processing Public Review Downloads: http://jcp.org/aboutJava/communityprocess/pr/jsr352/index.html
    • Specification Project (jbatch.java.net)
    • JSR Expert Group Discussion Archive (public@jbatch.java.net)
    • Java EE 7 Specification Status: https://wikis.oracle.com/display/GlassFish/PlanForGlassFish4.0#PlanForGlassFish4.0-SpecificationStatus

    Here are some other Java EE 7 primers published so far:

    • Simple JMS 2.0 Sample (TOTD #191): https://blogs.oracle.com/arungupta/entry/simple_jms_2_0_sample
    • What's New in Servlet 3.1?: https://blogs.oracle.com/arungupta/entry/what_s_new_in_servlet
    • Concurrency Utilities for Java EE (JSR 236): https://blogs.oracle.com/arungupta/entry/concurrency_utilities_for_java_ee
    • Collaborative Whiteboard using WebSocket in GlassFish 4 (TOTD #189): https://blogs.oracle.com/arungupta/entry/collaborative_whiteboard_using_websocket_in
    • Non-blocking I/O using Servlet 3.1 (TOTD #188): https://blogs.oracle.com/arungupta/entry/non_blocking_i_o_using
    • What's New in EJB 3.2?: https://blogs.oracle.com/arungupta/entry/what_s_new_in_ejb
    • JPA 2.1 Schema Generation (TOTD #187): https://blogs.oracle.com/arungupta/entry/jpa_2_1_schema_generation
    • WebSocket Applications using Java (JSR 356): https://blogs.oracle.com/arungupta/entry/websocket_applications_using_java_jsr
    • Jersey 2 in GlassFish 4 (TOTD #182): https://blogs.oracle.com/arungupta/entry/jersey_2_in_glassfish_4
    • WebSocket and Java EE 7 (TOTD #181): https://blogs.oracle.com/arungupta/entry/websockets_and_java_ee_7
    • Java API for JSON Processing (JSR 353): https://blogs.oracle.com/arungupta/entry/json_p_java_api_for
    • JMS 2.0 Early Draft (JSR 343): https://blogs.oracle.com/arungupta/entry/jms_2_0_early_draft




    And of course, more are on their way! Do you want to see any particular
    one first?


    Comments ( 14 )
    • guest Thursday, January 3, 2013

      Why did it take so long for this JSR? Spring Batch and Terracotta Quartz have been using these methods for over a decade.


    • guest Wednesday, January 9, 2013

      At least you renamed the image you lifted from the spring batch reference documentation before you used it on your blog without attribution, clever. http://static.springsource.org/spring-batch/reference/html-single/index.html#domain


    • Arun Gupta Wednesday, January 9, 2013

      The image is from JSR 352 specification which has an explicit attribution to Spring Batch.


    • guest Thursday, January 10, 2013

      I don't want to be offensive, but isn't it easier to promote all the Spring APIs to Java EE 3000, or Java-G, like "genius" ;-)


    • guest Thursday, January 10, 2013

      most of the non-JSR specific content looks lifted from spring batch in one way or another as well.


    • Baptiste Wednesday, January 16, 2013

      @guest

      Before thinking you know things other wouldn't, you should double-check your informations.

      Some members of the Spring team actually also worked on that very-JSR. That JSR was composed of members from IBM, SpringSource & RedHat for example.

      In this area, they certainly just agreed that the naming used in Spring was right and so kept that part, and why redo a diagram that's already OK? They're concrete technical guys, not doing politics. Doing things for no value would have certainly been even more pointless.

      And by the way, you should be happy to see standardization in Java come after real world experiences (Spring batch, IBM/mainframe job control language, and so on), so that the spec is actually usable in the end.

      If you were willing to ask for clarification, you could also have asked for them. Spreading FUD is not the best way to proceed to receive sensible answers.

      Cheers

      PS: "I don't want to be offensive[...]" => There's a word for that kind of sentence : Apophasis.


    • Reza Rahman Wednesday, January 16, 2013

      * @Baptiste, many thanks for bringing some reasoning back to this dialog. The best way to respond to FUD is calling it out and voicing your opinion against it.

      * Let's remember that Spring Batch is only one influence on JBatch. Just to name some others, there are heavy influences from WebSphere Compute Grid (WCG) and z/OS Batch. In particular, the WCG programming model influences are evident in checkpointing, partition processing, and property handling. Spring Batch influences are most evident in chunk processing, listeners, splits, and operational interfaces. z/OS batch influence is most evident in more fundamental ways: separation of job definition from application and multiple concurrently executing instances of the same job definition. That's what standardization has always been about - taking the best of vendor specific technologies and making them vendor neutral.

      * Technology adoption between Spring and Java EE cuts both ways. It is very easy to point to the myriad ideas that have wound up in Spring from the Java EE ecosystem (and they are some pretty big ones).

      * It's never a goal to standardize the kitchen sink. It's easy to demonstrate that Batch processing is well outside the 80% use case for most server-side Java applications. Personally, I am not entirely convinced there was a need to include JBatch into Java EE even now...


    • guest Monday, February 4, 2013

      How to know the current development status of this JSR? If the development has already been completed then where can I found the downloads like to do a pilot project? If not completed is there any possibility to join team / community to contribute?

      Please can someone help me to find out the answers?


    • Arun Gupta Monday, February 4, 2013

      IBM is implementing JSR 352 Reference Implementation and you can track the status at: http://java.net/projects/jbatch/downloads. The latest download is available at:

      http://java.net/projects/jbatch/downloads/download/JSR352.RI.TCK.SE.20121212a.zip

      The integration work in GlassFish has already started and you'll hear on this blog once its ready to be tried out. I've already sent an email to IBM folks asking if they would be interested in taking contributions. Will let you know if I hear back.


    • guest Monday, February 4, 2013

      Hi arun,

      thanks for your post. Do you know if there is a working sample out there?


    • Arun Gupta Monday, February 4, 2013

      There is no working sample on GlassFish because its not integrated there yet. But I'm tracking it closely and something should come up in the next couple of weeks or so.

      Arun


    • guest Tuesday, February 5, 2013

      Regarding the question concerning community contributions to the RI/TCK... we might have been open to the idea a few months ago. But calendar and legal constraints preclude it now.

      An updated RI/TCK will be released shortly that will be up to date with the proposed final draft of the spec. Announcements for this sort of thing is done via the public mailing list at http://java.net/projects/jbatch.


    • SGB Monday, February 18, 2013

      Hey Arun,

      Thank you for your post.

      This comes at a very opportune moment for me as I am starting work on a batch application. I have been developing web services over the last few years, so this is the first time I am doing batch processing in java, so have all sorts of questions in my mind.

      I like your other tutorials. So it would be awesome if you could do a reference tutorial/screencast for the community of a batch program that reads from a flat file or a csv and inserts them into a database using JSR 352 annotations so that folks can follow the most current/modern methodology for developing a batch process.

      *

      I looked up some spring batch examples online over the last few days, but they look so.... hard to get started. And also look a bit dated with too much xml configuration. I am not a stranger to spring, but have mostly used annotations with minimal xml configuration, so the spring batch examples seem a bit hard to get started on. I was hoping to use SpringData and hibernate as the JPA provider in this application.

      *

      My plan is to do a simple implementation and then once my business requirements and logic has been flushed out thoroughly, come back and use JBatch APIs to "batch-ify" the process by doing a rewrite or refactoring as needed. I expect to be done with my simple (naive) implementation by end of February, so am hoping to find some tutorials or webinars of JBatch soon.

      Since the new app is expected to deploy in a Java 7 JVM in Q3 2013, I expect I should be able to use JSR 352 APIs.

      Thanks for all your posts and tutorials.

      -SGB


    • Arun Gupta Monday, February 18, 2013

      SGB,

      JSR 352 is not integrated in GlassFish yet but you do give me a good idea about the tutorial. I'm watching the integration closely and will use your described scenario to build the tutorial. The app will use GlassFish 4 exclusively though with no dependency on third-party frameworks or libraries.

      Arun

