A simple narrative description of the Shuffle process

Source: Internet
Author: User

Shuffle is the stage that sits between the map and reduce phases. Let's walk through its steps. My understanding of the topic is not deep, so there may be mistakes; corrections are welcome.

1. Map

The map function emits key/value pairs by calling context.write(key, value). This step writes each key/value pair to an in-memory buffer, whose default size is 100 MB (configurable through mapreduce.task.io.sort.mb).
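To make the buffering step concrete, here is a minimal Python sketch, not Hadoop's actual implementation. The class MapOutputBuffer, its byte accounting, and the word_count_map function are hypothetical names invented for illustration; the real buffer stores serialized records in a circular byte array.

```python
class MapOutputBuffer:
    """Hypothetical in-memory buffer that collects map output records."""

    def __init__(self, capacity_bytes=100 * 1024 * 1024):
        self.capacity_bytes = capacity_bytes  # 100 MB default from the text
        self.records = []
        self.bytes_used = 0

    def write(self, key, value):
        """Append one record and track its approximate serialized size."""
        self.records.append((key, value))
        self.bytes_used += len(str(key).encode("utf-8"))
        self.bytes_used += len(str(value).encode("utf-8"))


def word_count_map(line, context):
    """Example map function: emit (word, 1) for every word in the line."""
    for word in line.split():
        context.write(word, 1)


buffer = MapOutputBuffer()
word_count_map("to be or not to be", buffer)
```

After the call, buffer.records holds six (word, 1) pairs in emission order; nothing has been sorted or written to disk yet.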

2. Sort

When the data in the buffer exceeds 80% of its capacity (the default threshold, mapreduce.map.sort.spill.percent), that portion of the data is sorted, first by partition and then by key. The partition determines which reducer a record will be sent to.
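The threshold check and the sort order can be sketched as follows. This is an illustration under assumptions: partition_for uses a CRC32 hash as a stand-in for a hash-based partitioner, and all function names are hypothetical.

```python
import zlib

BUFFER_CAPACITY = 100 * 1024 * 1024  # 100 MB default buffer size
SPILL_PERCENT = 80                   # 80% default spill threshold


def partition_for(key, num_reducers):
    """Stand-in partitioner: stable hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def should_spill(bytes_used, capacity=BUFFER_CAPACITY):
    """True once the buffered data exceeds the spill threshold."""
    return bytes_used * 100 > capacity * SPILL_PERCENT


def sort_for_spill(records, num_reducers):
    """Tag each (key, value) with its partition, then sort by (partition, key)."""
    tagged = [(partition_for(k, num_reducers), k, v) for k, v in records]
    return sorted(tagged, key=lambda r: (r[0], r[1]))
```

Sorting on the pair (partition, key) keeps all records for one reducer contiguous, and keeps keys ordered within each of those runs.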

3. Spill

After sorting, the data is written to disk as a spill file.

4. Merge

The map output usually cannot be written out in a single spill, so several spills may occur. As a result, multiple files are generated on disk, and they need to be merged at this point.
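Because every spill file is already sorted, the merge can be a streaming k-way merge rather than a full re-sort. The sketch below assumes each spill is an in-memory list of (partition, key, value) tuples; merge_spills is a hypothetical helper, and real spill files live on disk.

```python
import heapq


def merge_spills(spills):
    """Merge several (partition, key)-sorted runs into one sorted run."""
    return list(heapq.merge(*spills, key=lambda r: (r[0], r[1])))


spill_1 = [(0, "apple", 1), (1, "banana", 1)]
spill_2 = [(0, "apple", 1), (0, "cherry", 1), (1, "date", 1)]
merged = merge_spills([spill_1, spill_2])
```

The merged result still lists every partition 0 record before any partition 1 record, so each reducer's share remains one contiguous, key-sorted segment.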

5. Copy

The data written to disk in the previous step is copied to the appropriate reduce side over HTTP.
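The idea that each reducer pulls only its own partition from every map output can be sketched like this. The sketch deliberately replaces the HTTP transfer with in-memory lists, and fetch_partition and copy_phase are hypothetical names.

```python
def fetch_partition(map_output, partition):
    """Return the (key, value) pairs of one map output that belong to `partition`."""
    return [(k, v) for p, k, v in map_output if p == partition]


def copy_phase(map_outputs, partition):
    """Collect one key-sorted segment per map task for the given reducer."""
    return [fetch_partition(out, partition) for out in map_outputs]


map_outputs = [
    [(0, "apple", 1), (1, "banana", 1)],
    [(0, "cherry", 1), (1, "date", 1)],
]
segments_for_reducer_1 = copy_phase(map_outputs, 1)
```

Each reducer ends up with as many segments as there are map tasks, and each segment keeps the key order it had on the map side.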

6. Merge sort

Each map output file is already sorted by key. In this step the files from multiple map tasks are sorted together by key, merging and sorting at the same time.
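A small sketch of this reduce-side merge, assuming each fetched segment is a key-sorted list of (key, value) pairs. The helper merge_and_group is hypothetical; it also groups values by key, which is the form in which a reduce function receives its input.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_and_group(segments):
    """Merge key-sorted segments and yield (key, [values]) in key order."""
    merged = heapq.merge(*segments, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in group]


segments = [
    [("apple", 1), ("cherry", 1)],
    [("apple", 1), ("banana", 1)],
]
grouped = list(merge_and_group(segments))
```

Because the inputs are already sorted, the merge only ever compares the head of each segment, which is why merging and sorting can happen in the same pass.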

7. Reduce

See this blog post for more specific information

http://blog.csdn.net/nwpuwyk/article/details/37904657


Copyright notice: This is an original article by the blog author and may not be reproduced without permission.
