The shuffle phase is divided into two parts: the map side and the reduce side.
The sort stage refers to sorting the keys output by the map side.
Part I: Map-Side Shuffle
The input file is divided into splits, and each split is processed by one map task. Each map task has an in-memory buffer into which its output is written; within this buffer the output is pre-processed (that is, sorted and, optionally, combined) to improve efficiency.
The default size of the buffer is 100 MB (configurable via the io.sort.mb property). When the data in the buffer reaches a threshold (io.sort.mb * io.sort.spill.percent, where io.sort.spill.percent defaults to 0.80), a background thread starts spilling the buffer contents to a temporary file on disk; that is, the 80% of the buffer that crossed the threshold becomes a temporary spill file. While that 80% is being spilled, the map continues writing output into the remaining 20% of the buffer.
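A minimal sketch of the spill trigger described above, assuming the classic defaults of io.sort.mb = 100 MB and io.sort.spill.percent = 0.80 (the `emit` helper and the counters are illustrative, not Hadoop API):

```python
# Sketch of the map-side spill trigger with the classic defaults.
IO_SORT_MB = 100 * 1024 * 1024       # total in-memory buffer (bytes)
SPILL_PERCENT = 0.80                 # io.sort.spill.percent default
SPILL_THRESHOLD = int(IO_SORT_MB * SPILL_PERCENT)

buffered = 0   # bytes currently in the buffer
spills = 0     # number of temporary spill files produced

def emit(record_size):
    """Simulate a map task writing one record into its buffer."""
    global buffered, spills
    buffered += record_size
    if buffered >= SPILL_THRESHOLD:
        # A background thread would now spill ~80% of the buffer to a
        # temporary file while the map keeps writing into the rest.
        spills += 1
        buffered -= SPILL_THRESHOLD

# Writing 250 MB of map output with these defaults causes three spills.
for _ in range(250):
    emit(1024 * 1024)
```

With these numbers the threshold is 80 MB, so 250 MB of output produces three spill files and leaves 10 MB still buffered.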
Before writing the buffered data to disk, the spill thread sorts it twice (using quicksort): first by the partition each record belongs to, then by key within each partition. The output consists of an index file and a data file. If a combiner is configured, it runs on the sorted output.
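The two-level sort can be sketched as a single sort keyed on the pair (partition, key); the byte-sum partition function below is an illustrative stand-in for Hadoop's default hash partitioner:

```python
# Sketch of the spill thread's sort order: records are ordered first by
# the partition they belong to, then by key within each partition.
NUM_REDUCES = 2

def partition(key):
    # Deterministic stand-in for Hadoop's default hash partitioner.
    return sum(key.encode()) % NUM_REDUCES

records = [("banana", 1), ("apple", 1), ("cherry", 1), ("apple", 1)]
spill = sorted(records, key=lambda kv: (partition(kv[0]), kv[0]))
```

After the sort, all records destined for the same reduce task sit contiguously in the spill file, which is what makes the per-partition index file possible.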
The combiner is a mini reducer that runs on the same node as the map task. It performs a local reduce on the map's output, making that output more compact so that less data is written to disk and sent to the reduce side.
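For instance, in a word-count job the combiner typically reuses the reducer's summing logic on the map node. A sketch (the `combine` function is illustrative, not Hadoop API):

```python
from collections import defaultdict

def combine(map_output):
    """Mini-reduce over a single map task's output: sum counts per key
    so fewer records are written to disk and shipped to reduce."""
    totals = defaultdict(int)
    for key, value in map_output:
        totals[key] += value
    return sorted(totals.items())

# Four map output records shrink to two after the combiner runs.
compact = combine([("apple", 1), ("apple", 1), ("pear", 1), ("apple", 1)])
```

A combiner is only safe when the reduce function is commutative and associative (as summing is), since Hadoop may run it zero, one, or several times.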
A map task produces multiple spill files, and before the task completes they are all merged, in sorted order, into a single index file and data file. Once the spill files are merged, the map task deletes all temporary files and notifies the TaskTracker that it has finished.
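Merging the spill files is essentially a k-way merge of already-sorted runs, which can be sketched with Python's `heapq.merge` (the literal spill contents are illustrative):

```python
import heapq

# Each spill file is already sorted, so producing the final map output
# file is a k-way merge of sorted runs.
spill_1 = [("apple", 1), ("cherry", 1)]
spill_2 = [("apple", 1), ("banana", 1)]
spill_3 = [("banana", 1)]

merged = list(heapq.merge(spill_1, spill_2, spill_3))
```

Because every input run is sorted, the merge is a single streaming pass; no spill file ever needs to be fully re-sorted in memory.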
The data written to disk can optionally be compressed; to enable compression, set mapred.compress.map.output to true.
There is also the concept of partitions: each temporary file is partitioned, the number of partitions is determined by the number of reduce tasks, and each partition is passed to a different reduce task.
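The link between partition count and reduce count can be sketched as follows, modelled on Hadoop's default hash partitioning (the byte-sum hash stands in for Java's `String.hashCode()` and is illustrative):

```python
NUM_REDUCES = 3

def get_partition(key, num_reduces=NUM_REDUCES):
    # Modelled on Hadoop's HashPartitioner: hash(key) mod numReduceTasks.
    return sum(key.encode()) % num_reduces

# Every record lands in exactly one of NUM_REDUCES partitions, and each
# partition is later fetched by exactly one reduce task.
partitions = {r: [] for r in range(NUM_REDUCES)}
for key in ["apple", "banana", "cherry", "apple"]:
    partitions[get_partition(key)].append(key)
```

Note that all occurrences of the same key hash to the same partition, which is what guarantees a single reduce task sees every value for that key.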
Part II: Reduce-Side Shuffle
The reduce side fetches data from the map side over HTTP; as soon as a map task completes, reduce tasks begin copying its output. This is called the copy phase.
The JobTracker knows the mapping between map outputs and TaskTrackers. The reduce side has a thread that periodically asks the JobTracker for the locations of map outputs until all the data has been fetched.
If the map outputs are small, they are copied into the reduce task's memory; when there is not enough buffer space, they are copied to disk instead. A background thread merges the copied data on disk into larger sorted files; compressed files are automatically decompressed into memory to make merging easier.
When all the map outputs have been copied, the reduce task enters the sort stage (more precisely, the merge phase), which repeats over multiple rounds. Merging takes three forms: memory to memory, memory to disk, and disk to disk.
Memory-to-memory merging is not enabled by default. Memory-to-disk merging also spills; if a combiner is configured, it runs at this point, and multiple spill files are produced on disk. Disk-to-disk merging then produces a single final file that serves as the input to reduce.
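A rough sketch of the disk-to-disk merge rounds: sorted runs are merged a few at a time until one final sorted file remains as reduce's input. The fan-in of 2 here is illustrative; in Hadoop the fan-in is governed by a merge-factor setting.

```python
import heapq

def merge_rounds(runs, fan_in=2):
    """Repeatedly merge `fan_in` sorted runs at a time until a single
    sorted run remains; that run becomes the input to reduce."""
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0]

# Four sorted on-disk runs collapse to one final sorted input.
final = merge_rounds([[1, 4], [2, 5], [3], [0, 6]])
```

With a fan-in of 2, four runs need two rounds; a larger merge factor trades memory for fewer rounds of disk I/O.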
The shuffle phase and sort stage of MapReduce