Partitioning is a master/slave step configuration that allows partitions of the data to be processed in parallel. Each partition is described by some metadata. For example, if you were processing a database table, partition 1 could be IDs 0-100, partition 2 IDs 101-200, and so on. In Spring Batch, a master step uses a Partitioner to generate ExecutionContexts that contain the metadata for each partition. These ExecutionContexts are distributed to slave steps for processing by a PartitionHandler (for remote partitioning, the MessageChannelPartitionHandler is typically used). The slaves execute their step and return the resulting statuses to the master for aggregation.
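To make the "metadata only" idea concrete, here is a minimal plain-Java sketch of the ID-range example above. It deliberately does not use Spring Batch types: the class name, the `partition0`/`partition1` keys, and the `minId`/`maxId` entries are all assumptions made for illustration. In Spring Batch itself the equivalent hook is the `Partitioner` interface, whose `partition(int gridSize)` method returns a `Map<String, ExecutionContext>`; the nested map below only stands in for that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; names and keys are hypothetical, not Spring Batch API.
final class RangePartitionerSketch {

    // Splits the inclusive ID range [minId, maxId] into at most gridSize
    // contiguous partitions. Each entry carries only the range bounds
    // (metadata), never the rows themselves.
    static Map<String, Map<String, Object>> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be >= 1 and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        long start = minId;
        for (int i = 0; start <= maxId; i++) {
            long end = Math.min(start + size - 1, maxId);
            Map<String, Object> context = new LinkedHashMap<>();
            context.put("minId", start);
            context.put("maxId", end);
            partitions.put("partition" + i, context);
            start = end + 1;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Prints:
        // partition0 -> {minId=0, maxId=100}
        // partition1 -> {minId=101, maxId=200}
        partition(0, 200, 2).forEach((name, ctx) -> System.out.println(name + " -> " + ctx));
    }
}
```

Each slave would receive one of these small maps and then read its own rows, which is why the slaves need their own access to the input and output.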
Things to note about remote partitioning:
Input and output are local to the slaves. For example, if the input is a file, every slave needs access to that file.
Slaves need access to the JobRepository, because each slave is a fully defined Spring Batch step.
Remote Chunking
Remote chunking is similar to remote partitioning in that it is a master/slave configuration. However, with remote chunking the data is read by the master and sent over the wire to the slaves for processing. Once the processing is done, the result of the ItemProcessor is returned to the master for writing.
Things to note about remote chunking:
All I/O is done by the master.
The slaves handle processing only and therefore do not need JobRepository access.
Remote chunking is more I/O intensive than remote partitioning, since the actual data is sent over the wire instead of metadata describing it.
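The flow described above can be sketched in plain Java as follows. This is an analogy under stated assumptions, not Spring Batch code: a local thread pool stands in for the remote slaves and the messaging middleware, and the class name `RemoteChunkingSketch` and its `run` method are hypothetical. The point it illustrates is that the master reads and writes, while whole chunks of real items (not metadata) travel to the workers and back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java analogy only; a thread pool stands in for remote slaves.
final class RemoteChunkingSketch {

    // Master side: reads chunks from `input`, hands each chunk to a worker
    // for processing, then "writes" the processed items into the returned list.
    static <I, O> List<O> run(List<I> input, int chunkSize, int workers, Function<I, O> processor) {
        if (chunkSize < 1 || workers < 1) {
            throw new IllegalArgumentException("chunkSize and workers must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<O>>> pending = new ArrayList<>();
            for (int from = 0; from < input.size(); from += chunkSize) {
                int to = Math.min(from + chunkSize, input.size());
                // The chunk holds the actual items, which is what makes
                // remote chunking heavier on the wire than partitioning.
                List<I> chunk = new ArrayList<>(input.subList(from, to));
                // Worker side: processing only, no reading or writing.
                Callable<List<O>> task = () -> {
                    List<O> processed = new ArrayList<>();
                    for (I item : chunk) {
                        processed.add(processor.apply(item));
                    }
                    return processed;
                };
                pending.add(pool.submit(task));
            }
            // Master side again: collect results in submission order and write them.
            List<O> written = new ArrayList<>();
            for (Future<List<O>> result : pending) {
                written.addAll(result.get());
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4, 5), 2, 2, (Integer i) -> i * 10));
    }
}
```

Because the workers in this sketch never touch the input source, the output, or any job state, it mirrors why remote chunking slaves do not need JobRepository access.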
I did a talk on scaling Spring Batch that includes a demonstration of remote partitioning; you can watch it here: http://www.youtube.com/watch?v=cytj5yt7czu
Differences between Spring Batch remote partitioning and remote chunking