Shuffle is the phase that sits between the map and the reduce stages. Let's walk through its steps; the material is not deep, but there may be mistakes, and corrections are welcome.
1. Map
The mapper emits key/value pairs with context.write(key, value). Each pair is first written to an in-memory buffer, whose default size is 100 MB.
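As each pair is buffered, it is also assigned a partition that determines which reducer will receive it. As an illustrative sketch (plain Java, not actual Hadoop source), the default HashPartitioner computes the partition from the key's hash; the class name, reducer count, and sample keys below are invented for the example:

```java
public class PartitionSketch {
    // Same idea as Hadoop's default HashPartitioner: mask off the sign
    // bit, then take the hash modulo the number of reduce tasks.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reducer "
                    + partitionFor(key, numReducers));
        }
    }
}
```

Records with the same key always hash to the same partition, which is what guarantees that one reducer sees all values for a given key.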
2. Sort
When the data in the buffer exceeds 80% of its capacity (the default spill threshold), that portion of the data is sorted, first by partition and then by key. The partition determines which reducer the record will be sent to.
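The effect of sorting by (partition, key) is that each reducer's slice of the spill file is itself key-sorted. A small sketch of that ordering, using invented record and class names rather than Hadoop's internal buffer structures:

```java
import java.util.*;

public class SpillSortSketch {
    // A buffered map output: destination partition, key, value.
    record Rec(int partition, String key, int value) {}

    // Order records first by partition, then by key within a partition,
    // which is the order they are written to the spill file.
    static List<Rec> sortForSpill(List<Rec> buffer) {
        List<Rec> sorted = new ArrayList<>(buffer);
        sorted.sort(Comparator.comparingInt(Rec::partition)
                              .thenComparing(Rec::key));
        return sorted;
    }

    public static void main(String[] args) {
        List<Rec> buffer = List.of(
                new Rec(1, "b", 1), new Rec(0, "c", 1),
                new Rec(0, "a", 1), new Rec(1, "a", 1));
        for (Rec r : sortForSpill(buffer)) {
            System.out.println("partition " + r.partition + ", key " + r.key);
        }
    }
}
```

After sorting, all partition-0 records come first (keys a, c), followed by partition-1 records (keys a, b).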
3. Spill
The sorted data is then written (spilled) to disk.
4. Merge
A single spill is usually not enough to hold all of the map output, so several spills occur, producing multiple sorted files on disk. These files then need to be merged into one.
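Because each spill file is already sorted, the merge is a k-way merge of sorted runs. A sketch of that idea using a priority queue, with in-memory lists standing in for the sorted on-disk spill files (the class and method names are invented for the example):

```java
import java.util.*;

public class MergeSketch {
    // Merge several already-sorted lists into one sorted list.
    // Each heap entry is {spillIndex, positionInThatSpill}.
    static List<String> mergeSorted(List<List<String>> spills) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] e) -> spills.get(e[0]).get(e[1])));
        for (int i = 0; i < spills.size(); i++) {
            if (!spills.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            merged.add(spills.get(e[0]).get(e[1]));
            // Advance within the spill file the record came from.
            if (e[1] + 1 < spills.get(e[0]).size()) {
                heap.add(new int[] {e[0], e[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> spills = List.of(
                List.of("a", "d"), List.of("b", "c"), List.of("e"));
        System.out.println(mergeSorted(spills)); // [a, b, c, d, e]
    }
}
```

The same merge-of-sorted-runs idea reappears on the reduce side in step 6, where the sorted outputs of many map tasks are merged.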
5. Copy
The merged map output files from the previous step are copied from disk to the appropriate reduce tasks over HTTP.
6. MergeSort
Each map output file is already sorted by key, so at this stage the files fetched from multiple map tasks are merge-sorted by key: merging and sorting happen at the same time.
7. Reduce
For more detail, see this blog post:
http://blog.csdn.net/nwpuwyk/article/details/37904657
Copyright notice: this is the blogger's original article; it may not be reproduced without permission.