Enterprise Batch Processing
Brief introduction
IBM WebSphere Application Server V8 adds a new container for batch processing, providing an environment for running Java EE-based batch applications. This new batch container provides comprehensive functionality that makes it well suited to serve as an enterprise batch infrastructure. The Modern Batch feature provided in WebSphere Application Server V7 offers a good head start with a consistent programming model and tools, but does not include the advanced batch capabilities that an enterprise environment requires. WebSphere Application Server V8.5 fills these gaps and provides:
Parallel processing of batch workloads across the entire enterprise infrastructure. An enterprise application server environment consists of multiple servers that collaborate to provide a high-performance, highly available infrastructure. Simple batch processing runs on a single server, so it does not make optimal use of the available application server infrastructure. The batch feature of WebSphere Application Server provides a parallel job manager that supports container-managed parallel processing, which can improve efficiency by distributing batch work across multiple servers. Running the work as a single job preserves operational control from a batch perspective, while dividing its tasks internally among the servers makes better use of hardware resources and reduces the time needed to complete the job.
Skip record processing, which lets batch programs omit "bad" records during processing. This capability is essential because bad data does occur in real-world scenarios, and the failure of an entire batch job caused by a few bad records can ripple through the whole batch schedule. A batch failure can also require manual intervention before any further processing takes place. In most scenarios, it is preferable for the application to log the incident, skip the "bad" records, and continue processing the "good" ones.
Retry step processing, which reruns a failed job step within the same job run. This is useful for handling transient exceptions in job steps. If retry step processing is enabled and an exception occurs during a job step, the job step ends and all resources return to the state they were in when the job step started. The job step is then attempted again.
Integration with a variety of enterprise schedulers, such as IBM Tivoli Workload Scheduler.
COBOL support, which lets you reuse COBOL modules in a WebSphere batch application.
This article explores some of these important features, which allow batch developers to focus on solving business problems without building a custom batch infrastructure.
Technical Concepts
Before walking through a sample implementation, you need to know some details about these key features.
Skip Record Processing
The skip record processing feature can be used when reading or writing records. It is defined per batch data stream and requires the following additional components:
xJCL properties
com.ibm.batch.bds.skip.count: A non-zero value for this property tells the batch container to skip bad records. The value is the maximum number of records that can be skipped; once that many records have been skipped, the next bad record interrupts processing.
com.ibm.batch.bds.skip.include.exception.class.<n>: "Bad" records generate an exception. This property defines an exception class that can be skipped.
com.ibm.batch.bds.skip.exclude.exception.class.<n>: Defines the name of an exception class that cannot be skipped. Note that this property and the previous property, com.ibm.batch.bds.skip.include.exception.class.<n>, are mutually exclusive and should not be defined together.
Skip listeners: These listeners are used to trigger an action each time a record is skipped.
Record metrics: The number of skipped records and record processing rates are saved by the batch framework and can be used to report metrics about the execution of batch jobs.
In the exercise provided here, you implement skip record processing when reading data from an input file. Figure 1 shows a simplified flowchart of skip record processing at read time.
Figure 1. Skip record processing flow (when reading)
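The read-time flow described above can be illustrated with a small, self-contained Java sketch. It does not use the WebSphere batch API: the class SkipPolicySketch, its methods, and the integer-parsing "reader" are all hypothetical stand-ins. The constructor arguments play the roles of the skip count and the include exception list, and the exclude list is left out for brevity. The sketch also assumes that processing stops at the first bad record after the limit has been reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Conceptual sketch only: these are NOT WebSphere classes or APIs.
class SkipPolicySketch {
    private final int skipLimit;              // stands in for the skip count property
    private final List<Class<?>> skippable;   // stands in for the include exception list
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private int skippedCount = 0;             // record metric: number of skipped records

    SkipPolicySketch(int skipLimit, Class<?>... skippable) {
        this.skipLimit = skipLimit;
        this.skippable = Arrays.asList(skippable);
    }

    // Register a callback that receives each skipped raw record.
    void addSkipListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    int getSkippedCount() {
        return skippedCount;
    }

    // Parses each raw record as an integer, skipping bad records while the policy allows it.
    List<Integer> readAll(List<String> rawRecords) {
        List<Integer> good = new ArrayList<>();
        for (String raw : rawRecords) {
            try {
                good.add(Integer.parseInt(raw.trim()));
            } catch (RuntimeException e) {
                if (!canSkip(e)) {
                    throw e; // limit reached or exception not skippable: processing stops
                }
                skippedCount++;
                for (Consumer<String> listener : listeners) {
                    listener.accept(raw);
                }
            }
        }
        return good;
    }

    // A skip limit of 0 disables skipping; otherwise the exception type must be listed.
    private boolean canSkip(Exception e) {
        if (skippedCount >= skipLimit) {
            return false;
        }
        for (Class<?> type : skippable) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SkipPolicySketch reader = new SkipPolicySketch(2, NumberFormatException.class);
        reader.addSkipListener(raw -> System.out.println("skipped: " + raw));
        System.out.println(reader.readAll(Arrays.asList("10", "oops", "30")));
        System.out.println("skipped count = " + reader.getSkippedCount());
    }
}
```

In a real batch application the container applies this policy from the xJCL properties, so the application code does not need a loop like this one.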
Retry Step Processing
Retry step processing is defined per job step, and each job step has its own retry step policy configuration. You enable retry step processing by specifying a non-zero value for the com.ibm.batch.step.retry.count job step property in the xJCL.
Retry step processing requires the following additional components:
xJCL properties
com.ibm.batch.step.retry.count: A non-zero value for this property tells the batch container to retry the job step if a failure occurs.
com.ibm.batch.step.retry.delay.time: Specifies the number of milliseconds to wait before the step is attempted again.
com.ibm.batch.step.retry.include.exception.class.<n>: Defines an exception for which the step can be retried when it fails.
com.ibm.batch.step.retry.exclude.exception.class.<n>: Defines an exception for which the step cannot be retried when it fails. Note that this property and the previous property, com.ibm.batch.step.retry.include.exception.class.<n>, are mutually exclusive and should not be defined together.
Retry Listener
You can register a retry listener through JobStepContext to listen for exceptions that qualify for a retry. Whenever such an exception occurs, the retry listener gets control, and the step is then retried.
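The retry policy described in this section can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the WebSphere API: RetryStepSketch, runStep, and RetryListener are hypothetical names, and the parameters stand in for the retry count, the retry delay, and a single included exception class. The sketch does not model the rollback of resources to their state at the start of the step, which the batch container handles in the real product.

```java
import java.util.function.Supplier;

// Conceptual sketch only: these are NOT WebSphere classes or APIs.
class RetryStepSketch {
    // Hypothetical listener, called before each retry attempt.
    interface RetryListener {
        void onRetry(int retryNumber, RuntimeException cause);
    }

    // Runs the step and retries it up to retryCount times for retryable exceptions.
    static <T> T runStep(Supplier<T> step, int retryCount, long delayMillis,
                         Class<? extends RuntimeException> retryable,
                         RetryListener listener) {
        int retries = 0;
        while (true) {
            try {
                return step.get();
            } catch (RuntimeException e) {
                // A retry count of 0 disables retries; other exception types fail at once.
                if (retries >= retryCount || !retryable.isInstance(e)) {
                    throw e;
                }
                retries++;
                listener.onRetry(retries, e);
                pause(delayMillis);
            }
        }
    }

    // Waits before the next attempt, preserving the interrupt flag if interrupted.
    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = runStep(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "step complete";
        }, 3, 100L, IllegalStateException.class,
           (n, cause) -> System.out.println("retry " + n + " after: " + cause.getMessage()));
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The step passed to runStep must be safe to run again from the beginning, which is the property that the container's restore-to-step-start behavior provides for real job steps.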
Parallel Job Management
The batch container determines from the xJCL run property that a job is to run in parallel mode. The Parallel Job Manager (PJM) component of the batch container creates and manages the subordinate jobs. PJM uses the following APIs:
Parameterizer: Decomposes a job into subordinate jobs and lets you add new properties to each subordinate job's xJCL that partition the workload. This ensures that the subordinate jobs work on different records.
LogicalTX.Synchronization: Used to control jobs during the life cycle of parallel jobs.
SubJobCollector: Collects status information for each subordinate job at each checkpoint.
SubJobAnalyzer: Determines the overall return code for a job based on the data from SubJobCollector.
The remainder of the programming model is similar to the transactional batch programming model and uses input readers, batch processors, and output writers.
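The division of labor among the partitioning, collecting, and analyzing roles can be sketched with standard Java concurrency utilities. This is a simplified illustration, not the PJM API: ParallelJobSketch, its methods, and the return codes 0 and 8 are hypothetical, threads in one JVM stand in for subordinate jobs on separate servers, and status is collected once per subordinate job instead of at each checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

// Conceptual sketch only: these are NOT the PJM classes or APIs.
class ParallelJobSketch {
    static final int RC_OK = 0;      // hypothetical return codes
    static final int RC_FAILED = 8;

    // Parameterizer role: split records [0, total) into non-overlapping ranges.
    static List<int[]> partition(int total, int subJobs) {
        List<int[]> ranges = new ArrayList<>();
        int base = total / subJobs;
        int extra = total % subJobs;
        int start = 0;
        for (int i = 0; i < subJobs; i++) {
            int size = base + (i < extra ? 1 : 0);
            ranges.add(new int[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    // One subordinate job: processes its own range and reports a return code.
    static int runSubJob(int from, int to, IntPredicate processRecord) {
        for (int record = from; record < to; record++) {
            if (!processRecord.test(record)) {
                return RC_FAILED;
            }
        }
        return RC_OK;
    }

    // Analyzer role: the overall return code is the worst sub-job return code.
    static int analyze(List<Integer> returnCodes) {
        int overall = RC_OK;
        for (int rc : returnCodes) {
            overall = Math.max(overall, rc);
        }
        return overall;
    }

    // Runs all subordinate jobs in parallel; subJobs must be greater than 0.
    static int runJob(int total, int subJobs, IntPredicate processRecord) {
        ExecutorService pool = Executors.newFixedThreadPool(subJobs);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] range : partition(total, subJobs)) {
                Callable<Integer> subJob = () -> runSubJob(range[0], range[1], processRecord);
                futures.add(pool.submit(subJob));
            }
            List<Integer> returnCodes = new ArrayList<>(); // collector role
            for (Future<Integer> future : futures) {
                try {
                    returnCodes.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    returnCodes.add(RC_FAILED);
                } catch (ExecutionException e) {
                    returnCodes.add(RC_FAILED);
                }
            }
            return analyze(returnCodes);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("all good: rc=" + runJob(100, 4, record -> true));
        System.out.println("one bad record: rc=" + runJob(100, 4, record -> record != 42));
    }
}
```

Taking the maximum return code is only one possible analysis rule; it is chosen here because a single failed partition then marks the whole job as failed.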
Figure 2 depicts how the parallel job manager and scheduler run jobs across servers in parallel.
Figure 2. Using WebSphere Application Server for parallel job management
With this background, you will create a batch job that supports skip record processing and parallel job processing.