A Detailed Description of How MapReduce Works


Preface:

A while back, our cloud computing team studied Hadoop together, and everyone pitched in and learned a great deal. Afterwards, though, everyone got busy with their own affairs and the cloud computing group went quiet for a while. Recently, at Boss Hu's call, our team has rallied again, and we hope everyone will keep fighting under the slogan "cloud in hand, follow me". This post marks our team's "restart" of cloud computing, and we hope more good articles will follow. Tang, Liang Zi, Xie: time to get moving!

Now, down to business. This article mainly analyzes the following two points:

Contents:
1. The MapReduce job run process
2. The shuffle and sort process in the map and reduce tasks

Main text:

1. The MapReduce Job Run Process


The following is a schematic diagram of the process, which I drew in Visio 2010:

Process Analysis:


1. A job is started on the client (a minimal submission sketch follows step 5 below).


2. The client requests a new job ID from the JobTracker.


3. The resource files required to run the job are copied to HDFS, including the JAR file containing the packaged MapReduce program, the configuration files, and the input split information computed by the client. These files are stored in a folder the JobTracker creates specifically for the job, named after the job ID. The JAR file is replicated 10 times by default (controlled by the mapred.submit.replication property), and the input split information tells the JobTracker how many map tasks should be started for the job.


4. When the JobTracker receives the job, it places it in a job queue to wait for the job scheduler to dispatch it (much like process scheduling in an operating system). When the job scheduler picks the job according to its own scheduling algorithm, it creates a map task for each input split and assigns the map tasks to TaskTrackers for execution. Each TaskTracker has a fixed number of map slots and reduce slots, based on the number of cores and the amount of memory on its host. It is worth emphasizing here that map tasks are not assigned to TaskTrackers at random; there is a concept called data locality: a map task is assigned, where possible, to a TaskTracker that holds the data block the map will process, and the program JAR is copied to that TaskTracker to run there. This is called "moving the computation, not the data." Data locality is not considered when assigning reduce tasks.


5. Each TaskTracker sends a heartbeat to the JobTracker at regular intervals, telling the JobTracker that it is still alive. The heartbeat also carries a lot of information, such as the progress of the current map task. When the JobTracker receives the completion notice for the job's last task, it marks the job as "successful". When the JobClient polls the job status, it learns that the job is complete and displays a message to the user.
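
To make these five steps concrete, here is a minimal, hypothetical job driver using the classic Hadoop 1.x mapred API (the JobTracker/TaskTracker era this article describes). It is only a sketch: the class name MyJob is made up, and the identity mapper/reducer are stand-ins for your own logic. JobClient.runJob() is the call that sets steps 1 through 5 in motion.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class MyJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyJob.class);
            conf.setJobName("my-job");

            // Identity mapper/reducer just pass records through; swap in your own.
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);

            // With the default TextInputFormat, keys are byte offsets into the
            // input files and values are the lines themselves.
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Submits the job: requests a job ID from the JobTracker, copies the
            // JAR, config, and split info to HDFS (steps 1-3), then blocks until
            // the JobTracker reports completion (step 5).
            JobClient.runJob(conf);
        }
    }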

The above analyzes how MapReduce works at the level of the client, the JobTracker, and the TaskTracker. Below, we go into a bit more detail and analyze it at the level of the map and reduce tasks.

2. The Shuffle and Sort Process in the Map and Reduce Tasks


Here, too, is a diagram of the process that I drew in Visio:

Process Analysis:

Map side:


1. Each input split is processed by one map task. By default, one HDFS block (64 MB by default) forms one split, though the block size is configurable. The map output is first written to a circular in-memory buffer (100 MB by default, controlled by the io.sort.mb property). When the buffer approaches its spill threshold (80% of the buffer size by default, controlled by the io.sort.spill.percent property), a spill file is created on the local file system and the buffered data is written to it.

2. Before writing to disk, a thread first divides the data into partitions, one per reduce task, so that each reduce task receives the data for exactly one partition. This avoids the embarrassment of some reduce tasks being handed huge amounts of data while others get little or none. In fact, partitioning is simply hashing the keys (see the partitioner sketch after this list). The data in each partition is then sorted, and if a combiner has been set, it is run on the sorted output so that as little data as possible is written to disk.

3. By the time the map task writes its last record, there may be many spill files, and these files need to be merged. The merging process repeatedly sorts and combines, with two goals: first, to minimize the amount of data written to disk in each pass; second, to minimize the amount of data transferred over the network in the following copy phase. The spills are finally merged into a single partitioned and sorted file. To further reduce the amount of data sent over the network, the map output can also be compressed by setting mapred.compress.map.output to true (a small configuration sketch follows below).

4. The data in each partition is copied to the corresponding reduce task. One might ask: how does the data in a partition know which reduce task it corresponds to? In fact, each map task stays in touch with its parent TaskTracker, and each TaskTracker keeps a heartbeat with the JobTracker, so the global state of the whole cluster is held by the JobTracker. A reduce task simply asks the JobTracker for the locations of the corresponding map outputs.
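
As a sketch of the hash partitioning described in step 2, here is what the default behavior looks like written out against the classic mapred API. The logic mirrors Hadoop's built-in HashPartitioner; the Text/IntWritable key and value types are just an assumption for illustration.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class WordPartitioner implements Partitioner<Text, IntWritable> {
        @Override
        public void configure(JobConf job) { }

        // Same idea as Hadoop's built-in HashPartitioner: mask off the sign
        // bit and take the key's hash modulo the number of reduce tasks, so
        // every record with the same key lands in the same partition.
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }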

That wraps up the map side. So what exactly is shuffle? "Shuffle" literally means to shuffle cards, and seen this way the name fits: the data produced by one map is distributed, via hashing, across different reduce tasks. Is that not a process of shuffling the data?
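
The map-side buffer, spill, and compression properties mentioned in steps 1 and 3 can all be set on the job configuration. A minimal tuning sketch follows; the numeric values shown are just the Hadoop 1.x defaults quoted above, and compression (off by default) is switched on as step 3 suggests.

    import org.apache.hadoop.mapred.JobConf;

    public class MapSideTuning {
        public static void apply(JobConf conf) {
            conf.set("io.sort.mb", "100");              // ring buffer size, in MB
            conf.set("io.sort.spill.percent", "0.80");  // spill when 80% full
            conf.setBoolean("mapred.compress.map.output", true); // compress map output
        }
    }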

Reduce side:

1. The reduce side receives data from the different map tasks, and the data from each map is already sorted. If the amount of data received is small enough, it is kept directly in memory (the buffer size is controlled by the mapred.job.shuffle.input.buffer.percent property, which specifies the fraction of the heap to use for this purpose). Once the data exceeds a certain fraction of that buffer (controlled by mapred.job.shuffle.merge.percent), it is merged and spilled to disk.

2. As the spill files accumulate, a background thread merges them into larger, sorted files to save time in later merge rounds. In fact, on both the map side and the reduce side, MapReduce performs sort and merge operations over and over; now you can finally see why some people say that sorting is the soul of Hadoop.

3. Merging produces many intermediate files written to disk, but MapReduce keeps the amount of data written to disk as small as possible, and the result of the final merge round is not written to disk at all: it is fed directly into the reduce function (see the reducer sketch below).
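
As a sketch of how that merged, sorted stream finally reaches user code, here is a minimal summing reducer in the classic mapred API (the name SumReducer and the Text/IntWritable types are assumptions for illustration). By the time reduce() is called, the framework has already merged the sorted map outputs, so all the values for one key arrive together.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        // Called once per key; the values iterator yields every value that
        // the shuffle and merge phases grouped under this key.
        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }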
