Batch Processing System Architecture Based on AWS Cloud Services

Source: Internet
Author: User

AWS lets you build batch processing architectures that run many jobs on demand. Such an architecture can deploy heterogeneous systems immediately or on a delayed schedule, and it can scale out to a "grid" of worker nodes that finish large batches of tasks quickly by processing them in parallel. Batch-oriented applications that suit this on-demand style include claims processing, large-scale transformation, media transcoding, and multi-part data processing.

Batch processing architectures typically have a highly variable usage pattern: long periods of low usage followed by a significant peak (for example, month-end processing). There are many ways to build one. This article presents a basic batch processing architecture that supports job scheduling, job status checks, raw data upload, job result output, grid management, and job performance reporting.

1. The job manager is deployed on an EC2 instance, and users interact with it through an Elastic IP address. The job manager controls how batch jobs are received, scheduled, started, managed, and completed. It also provides access to the final results, job and worker statuses, and job progress information.

2. The raw job data is uploaded to highly available, durable storage: Amazon S3.

3. In response to user actions, the job manager inserts the individual job tasks into an Amazon SQS input queue.

4. Worker nodes are EC2 instances that belong to an Auto Scaling group. The group acts as a container that keeps the worker nodes healthy and scalable. Each worker node automatically pulls job parts from the input queue and executes the individual tasks in the list of batch processing steps.

5. The intermediate data generated by the worker node is stored in Amazon S3.

6. Job progress information and statistics are stored in an analytics store, which can be either Amazon SimpleDB or an RDS instance.

7. Optionally, completed tasks can be inserted into another Amazon SQS queue for further processing by nodes in a chained structure.
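
The flow in steps 1 to 7 can be sketched locally without any AWS account. The following is a minimal simulation in Python, not an AWS implementation: a dict stands in for S3, `queue.Queue` objects stand in for the SQS queues, a second dict stands in for the SimpleDB/RDS analytics store, and a pool of threads stands in for the Auto Scaling group of worker nodes. All names (`BatchSystem`, `submit_job`, `worker`, `run_workers`) and the key layout are hypothetical and chosen only for illustration.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class BatchSystem:
    """In-memory stand-ins (assumptions) for the services named in the steps."""
    object_store: dict = field(default_factory=dict)              # stands in for S3
    task_queue: queue.Queue = field(default_factory=queue.Queue)  # SQS input queue
    done_queue: queue.Queue = field(default_factory=queue.Queue)  # optional SQS output queue (step 7)
    stats: dict = field(default_factory=dict)                     # SimpleDB/RDS analytics store
    lock: threading.Lock = field(default_factory=threading.Lock)  # guards shared dicts


def submit_job(system, job_id, chunks):
    """Job manager role: upload raw data (step 2) and enqueue one task per chunk (step 3)."""
    system.stats[job_id] = {"total": len(chunks), "completed": 0}
    for index, chunk in enumerate(chunks):
        system.object_store[f"{job_id}/input/{index}"] = chunk
        system.task_queue.put((job_id, index))


def worker(system, process):
    """Worker node role: pull tasks until the input queue is empty (step 4)."""
    while True:
        try:
            job_id, index = system.task_queue.get_nowait()
        except queue.Empty:
            return  # a real worker would keep polling instead of exiting
        result = process(system.object_store[f"{job_id}/input/{index}"])
        with system.lock:
            system.object_store[f"{job_id}/output/{index}"] = result  # step 5
            system.stats[job_id]["completed"] += 1                    # step 6
        system.done_queue.put((job_id, index))                        # step 7


def run_workers(system, worker_count, process):
    """Start a fixed pool of workers and wait for all of them to finish."""
    threads = [
        threading.Thread(target=worker, args=(system, process))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    system = BatchSystem()
    submit_job(system, "job-1", ["a b", "c d e", "f"])
    # The per-task work here is a word count, purely as a placeholder.
    run_workers(system, worker_count=3, process=lambda text: len(text.split()))
    print(system.stats["job-1"])  # {'total': 3, 'completed': 3}
```

Because the queue decouples the job manager from the workers, the worker count can change without touching the submission code, which is the property the Auto Scaling group relies on in the architecture described here.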
