Application Architecture of big data and large-scale computing based on AWS cloud services

AWS is widely used for large-scale computing workloads such as scientific computing, simulation, and research projects. These workloads typically begin by collecting large datasets from scientific instruments, measurement devices, or other computing jobs. The collected data is then processed by large-scale analysis jobs to produce result datasets, which are usually made available to a wider audience.

1. When uploading large datasets to AWS, the key is to use as much of the available bandwidth as possible. Multiple clients can upload data to Amazon S3 in parallel, and each client can use multiple threads or multipart uploads to push data concurrently. TCP settings such as window scaling and selective acknowledgment can also be tuned to further improve throughput. With proper optimization, several terabytes of data can be uploaded per day. Another option for very large datasets is AWS Import/Export, which lets you ship physical storage devices to AWS so the data can be loaded directly into Amazon S3 or Amazon EBS.
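
As a concrete illustration, the following minimal sketch uses boto3's managed transfer to perform a parallel multipart upload to S3. The bucket name, key, local file path, and tuning values are placeholders, not values from the article.

```python
# Minimal sketch: parallel multipart upload to S3 with boto3's managed transfer.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split objects larger than 64 MB into 64 MB parts and upload up to 10 parts in parallel.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=10,
)

# Placeholder file, bucket, and key names.
s3.upload_file(
    Filename="/data/measurements/run-0001.dat",
    Bucket="example-research-input",
    Key="raw/run-0001.dat",
    Config=config,
)
```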

2. Parallelism is critical for large-scale jobs. Existing parallel processing applications can run across multiple EC2 instances. Whether they access Amazon S3 directly over HTTP or, when a POSIX file system is required, through a FUSE layer (for example, S3FS or SubCloud), parallel applications can efficiently read input data from and write results to S3 on all nodes.
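
One possible way to organize such direct S3 access is sketched below: each worker node pulls its share of the input objects over HTTP, processes them, and writes partial results back to S3. The WORKER_ID/NUM_WORKERS sharding scheme, the bucket names and prefixes, and the process() stub are illustrative assumptions, not part of the original architecture description.

```python
# Minimal sketch: each EC2 worker reads its share of the input objects directly
# from S3 (no shared POSIX file system) and writes its partial result back.
import os
import boto3

s3 = boto3.client("s3")

# Hypothetical worker identity, e.g. injected via instance user data.
WORKER_ID = int(os.environ.get("WORKER_ID", "0"))
NUM_WORKERS = int(os.environ.get("NUM_WORKERS", "1"))

def process(raw: bytes) -> bytes:
    # Placeholder for the actual compute kernel.
    return raw

# List all input objects under a placeholder prefix.
paginator = s3.get_paginator("list_objects_v2")
keys = []
for page in paginator.paginate(Bucket="example-research-input", Prefix="raw/"):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

# Simple static sharding: worker i handles every NUM_WORKERS-th object.
for key in keys[WORKER_ID::NUM_WORKERS]:
    body = s3.get_object(Bucket="example-research-input", Key=key)["Body"].read()
    result = process(body)
    s3.put_object(
        Bucket="example-research-output",
        Key=f"results/{WORKER_ID}/{key.split('/')[-1]}",
        Body=result,
    )
```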

3. Once the computation is complete and the result data is stored in S3, the EC2 instances can be shut down. The result dataset can then be downloaded, or shared with others by granting read permission to specific AWS users or by handing out a time-limited URL.
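
A time-limited URL of this kind can be produced as an S3 pre-signed URL, for example as in the minimal sketch below; the bucket, key, and expiry value are placeholders.

```python
# Minimal sketch: share an output object via a time-limited pre-signed URL.
import boto3

s3 = boto3.client("s3")

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-research-output", "Key": "results/final-dataset.csv"},
    ExpiresIn=3600,  # the link stops working after one hour
)
print(url)
```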

4. If S3 is not used, Amazon EBS can serve as a staging area for the input data and as storage for the output results. The same parallel-stream and TCP tuning techniques should be applied during upload, and UDP-based transfer tools can be used to accelerate it further. The result dataset can be written to an EBS volume, and a point-in-time snapshot of that volume can then be shared with others.
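
A minimal sketch of that snapshot-based sharing, assuming a boto3 EC2 client and placeholder volume and account IDs, might look like this:

```python
# Minimal sketch: snapshot the EBS volume holding the results and share the
# snapshot with another AWS account (placeholder volume and account IDs).
import boto3

ec2 = boto3.client("ec2")

snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Result dataset from large-scale computing run",
)

# Wait until the point-in-time snapshot is complete before sharing it.
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# Allow the other account to create volumes from this snapshot.
ec2.modify_snapshot_attribute(
    SnapshotId=snapshot["SnapshotId"],
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=["111122223333"],
)
```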
