By Acshorten-Oracle on Jan 10, 2012
One of the major features of the Oracle Utilities Application Framework is the Batch Framework. A Batch process is a background process that processes data in bulk. Typically, online transactions process single objects in the foreground. When a business process requires transactions to be processed in bulk and in the background, you create a Batch process to do this. Batch processes are also called background processes to emphasize that they run without interaction with end users. For example, in Oracle Utilities Customer Care and Billing and Oracle Enterprise Taxation and Policy Management, there is a BILLING batch process that generates bills in bulk. For Oracle Utilities Meter Data Management, it might be the loading and processing of Meter Read data. For Oracle Utilities Mobile Workforce Management, it may be to process data for crews. You get the point.
The Oracle Utilities Application Framework comes with the development and runtime components to build and execute Batch processes. Typically, batch processes are written in Java (in older products we supported COBOL as well) and consist of some driver SQL that identifies the object instances to process, followed by calls to the underlying objects in the right sequence to process the records. It is now possible to use ConfigTools to build batch processes (for ad hoc processing) using the Request Object, but this will be the focus of another blog entry in the future.
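To make the "driver SQL, then process each object" shape a little more concrete, here is a minimal sketch in plain Java. It is not framework code: the class, the method names and the PENDING/COMPLETE statuses are all made up for illustration, and an in-memory map stands in for the database that the real driver SQL would query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Oracle Utilities Application Framework.
class BatchDriverSketch {

    // Stand-in for the "driver SQL": pick the keys of the records that still need processing.
    static List<String> selectPendingIds(Map<String, String> statusById) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : statusById.entrySet()) {
            if ("PENDING".equals(entry.getValue())) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Stand-in for calling the underlying business object: here it only flips the status.
    static void processOne(Map<String, String> statusById, String id) {
        statusById.put(id, "COMPLETE");
    }

    // The batch loop: drive from the selected keys and process each record in turn.
    static int run(Map<String, String> statusById) {
        int processed = 0;
        for (String id : selectPendingIds(statusById)) {
            processOne(statusById, id);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        Map<String, String> accounts = new LinkedHashMap<>();
        accounts.put("A1", "PENDING");
        accounts.put("A2", "COMPLETE");
        accounts.put("A3", "PENDING");
        System.out.println("Processed " + run(accounts) + " records");
    }
}
```

A real batch program would let the framework handle commits, restart and threading around this loop; the sketch only shows the select-then-process pattern.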
To execute Batch with the Oracle Utilities Application Framework you must obviously have the batch program written but you also need to define it to the product using a Batch Control. The Batch Control object defines the batch process to the product with its technical details (like the program to execute) and the parameters to execute it against. Batch Controls are used by the product to define the process but also at runtime the object is used to seed instance information for the batch run tree (more about this later). The Batch Control also holds the sequence number for any extracts (Batch Run Number) if required.
To execute the batch process we have a number of methods, including an interactive mode (used mainly by developers), online submission (used for testing) and an external scheduler (via a command line) for production systems. Each of the submission methods ultimately instructs the product to execute the batch process (the techniques are slightly different but ultimately the batch process runs the same way each time). For the purposes of this blog entry I will assume the last of these, the external scheduler, is what we are executing.
Now for batch a few things are different from online, obviously. An online transaction runs within the J2EE Web Application Server software (Oracle WebLogic or IBM WebSphere), which is a JVM running continuously. In the context of batch, the J2EE Web Application Server does not participate (except for the online submission but that is not the focus of this blog entry) but we still need to house the process in a JVM. In the Batch Framework the JVM is created and managed as a threadpoolworker. The threadpoolworker process is a started JVM that runs continually (until it is shut down), ready for batch processes (threads) to be executed within the JVM. In batch terms, the threadpoolworker acts like the J2EE Web Application Server in that it is a managed JVM waiting for work.
In the Batch framework, the threadpoolworker can be managed in a number of ways, which we term execution modes. The current batch framework supports DISTRIBUTED, CLUSTERED or EXTENDED mode. Each of the modes uses a different technique for managing work within and across threadpools. For an in-depth discussion of each of the modes, as well as advice on the use of each mode, refer to the Batch Best Practices for Oracle Utilities Application Framework based products (Doc Id: 836362.1) whitepaper available from My Oracle Support. For the purposes of this blog entry I will assume that we are using CLUSTERED mode, which is the most used mode of all and the recommended mode for most sites.
The CLUSTERED mode uses the named cache and clustering features of Oracle Coherence (bundled with Oracle Utilities Application Framework). When a threadpoolworker is started from the command line, it is named (you choose the name) and the named threadpoolworker joins a cluster as defined by the configuration parameters for Coherence specified at installation time (these can be changed at any time). As part of the definition and starting of the threadpoolworker you can specify the capacity of the worker node as the maximum number of threads supported. The number of threads specified is different from the Java threads you might be familiar with. These are the maximum batch instances of a job (or jobs) supported concurrently by the threadpoolworker, and as batch is typically heavier than online, the number of threads supported by a JVM is much smaller. If you start another threadpoolworker (hence another JVM) with the same name on the same cluster, then the capacity of that named threadpool increases by the number of concurrent threads the new worker supports.
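The capacity arithmetic is easy to picture with a toy model. The sketch below is purely illustrative: it uses a plain map rather than Coherence, and the class, the method names and the pool names (DEFAULT, NIGHTLY) are assumptions of mine, not framework identifiers. It only shows that workers started under the same name add up to one pool of capacity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a toy registry, not the Coherence-based cluster the framework uses.
class ThreadPoolCapacitySketch {
    private final Map<String, Integer> capacityByPool = new HashMap<>();

    // Each started worker JVM contributes its maximum thread count to the named pool.
    void startWorker(String poolName, int maxThreads) {
        capacityByPool.merge(poolName, maxThreads, Integer::sum);
    }

    // Total concurrent batch threads the named pool can run across all of its workers.
    int capacityOf(String poolName) {
        return capacityByPool.getOrDefault(poolName, 0);
    }

    public static void main(String[] args) {
        ThreadPoolCapacitySketch cluster = new ThreadPoolCapacitySketch();
        cluster.startWorker("DEFAULT", 5);
        cluster.startWorker("DEFAULT", 5); // a second JVM started with the same name
        cluster.startWorker("NIGHTLY", 3);
        System.out.println(cluster.capacityOf("DEFAULT")); // prints 10
    }
}
```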
Now, to initiate a batch process you use a submitter node (another JVM) which specifies the threadpoolworker to use, the batch control to use and any relevant parameters for the batch process (as per the Batch Control). These can be specified on the command line or in a configuration file (for reuse). The submitter can be initiated from a command line or, most commonly, via a batch submission product such as Oracle Scheduler, UC4 AppWorx etc. The submitter will be active for the length of the batch process and will submit the batch process to the specified threadpoolworker.
If there is capacity in the threadpoolworker specified then the thread is initiated and a Batch Instance record is created within the product using the Batch Control as a reference. This is the same object used by the Batch run tree to display the progress and status. The Batch Instance is updated at every commit point (controlled by a configuration parameter). If there is no capacity at the moment, the process will wait until capacity becomes available.
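The "run if there is capacity, otherwise wait" behaviour can be sketched with a counting semaphore from the standard Java library. To be clear, this is an analogy under my own assumptions, not how the framework implements it; the class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: models a worker's thread capacity with java.util.concurrent.Semaphore.
class WorkerCapacitySketch {
    private final Semaphore slots;

    WorkerCapacitySketch(int maxThreads) {
        // Fair semaphore, so waiting submissions are served in arrival order.
        this.slots = new Semaphore(maxThreads, true);
    }

    // Blocks until a slot is free, runs the batch thread, then hands the slot back.
    void runWhenCapacityAvailable(Runnable batchThread) {
        slots.acquireUninterruptibly();
        try {
            batchThread.run();
        } finally {
            slots.release();
        }
    }

    // Number of batch threads that could start right now without waiting.
    int freeSlots() {
        return slots.availablePermits();
    }

    public static void main(String[] args) {
        WorkerCapacitySketch worker = new WorkerCapacitySketch(2);
        worker.runWhenCapacityAvailable(
                () -> System.out.println("Free slots while running: " + worker.freeSlots()));
        System.out.println("Free slots afterwards: " + worker.freeSlots());
    }
}
```

With a capacity of 2, a third concurrent submission would simply block in runWhenCapacityAvailable until one of the first two finishes.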
While the process is executing it can be monitored using a JMX utility (jconsole or jmxbatchclient) which is updated more frequently than the Batch Instance. You specify the JMX port and credentials as part of the startup of the threadpoolworker. You can also cancel the process using the JMX capability. Remember each threadpoolworker has its own JMX port to connect to.
When the process completes, the threadpoolworker informs the submitter node, which updates the Batch Instance. If the batch process is run multi-threaded then the last executed thread will update the Batch Instance to a completed status. The submitter will return a return code when it completes (0 = successful, 1 = error) to the operating system (and to the batch scheduler, to use for dependency processing). In CLUSTERED mode, if the submitter is killed by the operator or operating system then the threadpoolworker will update the Batch Instance with an error code. If the threadpoolworker is killed by the operator or operating system, then all the submitters are informed. In this scenario the restart capability can be used by the batch scheduler of choice to restart the job.
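One common way to get "the last thread to finish marks the run complete" is an atomic countdown, sketched below. This is my own illustration rather than the framework's implementation; the class name, the method names and the status strings are hypothetical. Only the 0 and 1 return codes come from the description above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the last of N threads to finish flips the run to a completed status.
class BatchCompletionSketch {
    private final AtomicInteger remainingThreads;
    private volatile String status = "IN PROGRESS";

    BatchCompletionSketch(int threadCount) {
        this.remainingThreads = new AtomicInteger(threadCount);
    }

    // Called by each batch thread when it ends; returns true only for the final thread.
    boolean threadFinished() {
        if (remainingThreads.decrementAndGet() == 0) {
            status = "COMPLETE";
            return true;
        }
        return false;
    }

    String status() {
        return status;
    }

    // Return code handed to the operating system and scheduler: 0 = successful, 1 = error.
    static int exitCodeFor(String status) {
        return "COMPLETE".equals(status) ? 0 : 1;
    }

    public static void main(String[] args) throws InterruptedException {
        BatchCompletionSketch run = new BatchCompletionSketch(3);
        Thread[] workers = new Thread[3];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(run::threadFinished);
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(run.status() + " -> exit code " + exitCodeFor(run.status()));
    }
}
```

A scheduler can then branch on the return code: run the dependent jobs on 0, or hold them and restart the job on 1.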
To summarize, a long-running JVM is started for each threadpoolworker to run the batch threads that submitter nodes send to it. The threadpoolworkers each have a capacity that can be extended by clustering the threadpoolworkers, if necessary. Communication to the product is via the Batch Instance at commit points, and Coherence is used for clustering and for inter-process communication between threadpoolworkers and submitters.