The shuffle phase is divided into two parts: the map side and the reduce side.
The sort stage refers to sorting the keys output by the map side.
Part I: Map-Side Shuffle
The input file is divided into splits, and each split is processed by one map task. Each map task has an in-memory buffer into which its output is written; within this buffer the output is pre-processed (that is, sorted and, optionally, combined) to improve efficiency.
The default size of the buffer is 100 MB (configurable via the io.sort.mb property). When the data in the buffer reaches a threshold (io.sort.mb * io.sort.spill.percent, where io.sort.spill.percent defaults to 0.80), a background thread starts spilling the buffer contents to a temporary file on disk; that is, the 80% of the buffer that crossed the threshold becomes a temporary spill file. While that 80% is being spilled, the map continues writing output into the remaining 20% of the buffer.
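A minimal sketch of the spill trigger described above, assuming the classic defaults of io.sort.mb = 100 MB and io.sort.spill.percent = 0.80 (the `emit` helper and the counters are illustrative, not Hadoop API):

```python
# Sketch of the map-side spill trigger with the classic defaults.
IO_SORT_MB = 100 * 1024 * 1024       # total in-memory buffer (bytes)
SPILL_PERCENT = 0.80                 # io.sort.spill.percent default
SPILL_THRESHOLD = int(IO_SORT_MB * SPILL_PERCENT)

buffered = 0   # bytes currently in the buffer
spills = 0     # number of temporary spill files produced

def emit(record_size):
    """Simulate a map task writing one record into its buffer."""
    global buffered, spills
    buffered += record_size
    if buffered >= SPILL_THRESHOLD:
        # A background thread would now spill ~80% of the buffer to a
        # temporary file while the map keeps writing into the rest.
        spills += 1
        buffered -= SPILL_THRESHOLD

# Writing 250 MB of map output with these defaults causes three spills.
for _ in range(250):
    emit(1024 * 1024)
```

With these numbers the threshold is 80 MB, so 250 MB of output produces three spill files and leaves 10 MB still buffered.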
Before writing the buffered data to disk, the spill thread sorts it twice (using quicksort): first by the partition each record belongs to, then by key within each partition. The output consists of an index file and a data file. If a combiner is configured, it runs on the sorted output.
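The two-level sort can be sketched as a single sort keyed on the pair (partition, key); the byte-sum partition function below is an illustrative stand-in for Hadoop's default hash partitioner:

```python
# Sketch of the spill thread's sort order: records are ordered first by
# the partition they belong to, then by key within each partition.
NUM_REDUCES = 2

def partition(key):
    # Deterministic stand-in for Hadoop's default hash partitioner.
    return sum(key.encode()) % NUM_REDUCES

records = [("banana", 1), ("apple", 1), ("cherry", 1), ("apple", 1)]
spill = sorted(records, key=lambda kv: (partition(kv[0]), kv[0]))
```

After the sort, all records destined for the same reduce task sit contiguously in the spill file, which is what makes the per-partition index file possible.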
The combiner is a mini reducer that runs on the same node as the map task. It performs a local reduce on the map's output, making that output more compact so that less data is written to disk and sent to the reduce side.
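For instance, in a word-count job the combiner typically reuses the reducer's summing logic on the map node. A sketch (the `combine` function is illustrative, not Hadoop API):

```python
from collections import defaultdict

def combine(map_output):
    """Mini-reduce over a single map task's output: sum counts per key
    so fewer records are written to disk and shipped to reduce."""
    totals = defaultdict(int)
    for key, value in map_output:
        totals[key] += value
    return sorted(totals.items())

# Four map output records shrink to two after the combiner runs.
compact = combine([("apple", 1), ("apple", 1), ("pear", 1), ("apple", 1)])
```

A combiner is only safe when the reduce function is commutative and associative (as summing is), since Hadoop may run it zero, one, or several times.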
A map task produces multiple spill files, and before the task completes they are all merged, in sorted order, into a single index file and data file. Once the spill files are merged, the map task deletes all temporary files and notifies the TaskTracker that it has finished.
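Merging the spill files is essentially a k-way merge of already-sorted runs, which can be sketched with Python's `heapq.merge` (the literal spill contents are illustrative):

```python
import heapq

# Each spill file is already sorted, so producing the final map output
# file is a k-way merge of sorted runs.
spill_1 = [("apple", 1), ("cherry", 1)]
spill_2 = [("apple", 1), ("banana", 1)]
spill_3 = [("banana", 1)]

merged = list(heapq.merge(spill_1, spill_2, spill_3))
```

Because every input run is sorted, the merge is a single streaming pass; no spill file ever needs to be fully re-sorted in memory.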
The data written to disk can optionally be compressed; to enable compression, set mapred.compress.map.output to true.
There is also the concept of partitions: each temporary file is partitioned, the number of partitions is determined by the number of reduce tasks, and each partition is passed to a different reduce task.
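The link between partition count and reduce count can be sketched as follows, modelled on Hadoop's default hash partitioning (the byte-sum hash stands in for Java's `String.hashCode()` and is illustrative):

```python
NUM_REDUCES = 3

def get_partition(key, num_reduces=NUM_REDUCES):
    # Modelled on Hadoop's HashPartitioner: hash(key) mod numReduceTasks.
    return sum(key.encode()) % num_reduces

# Every record lands in exactly one of NUM_REDUCES partitions, and each
# partition is later fetched by exactly one reduce task.
partitions = {r: [] for r in range(NUM_REDUCES)}
for key in ["apple", "banana", "cherry", "apple"]:
    partitions[get_partition(key)].append(key)
```

Note that all occurrences of the same key hash to the same partition, which is what guarantees a single reduce task sees every value for that key.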
Part II: Reduce-Side Shuffle
The reduce side fetches data from the map side over HTTP; as soon as a map task completes, reduce tasks begin copying its output. This is called the copy phase.
The JobTracker knows the mapping between map outputs and TaskTrackers. The reduce side has a thread that periodically asks the JobTracker for the locations of map outputs until all the data has been fetched.
If the map outputs are small, they are copied into the reduce task's memory; when there is not enough buffer space, they are copied to disk instead. A background thread merges the copied data on disk into larger sorted files; compressed files are automatically decompressed into memory to make merging easier.
When all the map outputs have been copied, the reduce task enters the sort stage (more precisely, the merge phase), which repeats over multiple rounds. Merging takes three forms: memory to memory, memory to disk, and disk to disk.
Memory-to-memory merging is not enabled by default. Memory-to-disk merging also spills; if a combiner is configured, it runs at this point, and multiple spill files are produced on disk. Disk-to-disk merging then produces a single final file that serves as the input to reduce.
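A rough sketch of the disk-to-disk merge rounds: sorted runs are merged a few at a time until one final sorted file remains as reduce's input. The fan-in of 2 here is illustrative; in Hadoop the fan-in is governed by a merge-factor setting.

```python
import heapq

def merge_rounds(runs, fan_in=2):
    """Repeatedly merge `fan_in` sorted runs at a time until a single
    sorted run remains; that run becomes the input to reduce."""
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0]

# Four sorted on-disk runs collapse to one final sorted input.
final = merge_rounds([[1, 4], [2, 5], [3], [0, 6]])
```

With a fan-in of 2, four runs need two rounds; a larger merge factor trades memory for fewer rounds of disk I/O.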
The shuffle phase and sort stage of MapReduce