MapReduce Working Principles


1. MapReduce job running process

Process Analysis:

1. The client starts a job (a minimal driver sketch that does this with the classic API follows this list).

 

2. The client requests a job ID from the JobTracker.

 

3. The client copies the resource files the job needs to HDFS, including the JAR packaged from the MapReduce program, the configuration files, and the input split information computed by the client. These files are stored in a folder that the JobTracker creates specifically for this job, named after the job ID. By default the JAR file is stored with 10 replicas (controlled by the mapred.submit.replication property). The input split information tells the JobTracker how many map tasks should be started for this job.

 

4. After receiving the job, the JobTracker puts it into a job queue and waits for the job scheduler to schedule it (much like process scheduling in an operating system). When the job scheduler picks the job according to its own scheduling algorithm, it creates one map task for each input split and assigns the map tasks to TaskTrackers for execution. Each TaskTracker has a fixed number of map slots and reduce slots, determined by the number of cores and the amount of memory on its host. It should be emphasized that map tasks are not assigned to TaskTrackers at random; the idea here is data locality: a map task is assigned to a TaskTracker that holds the data block the map will process, and the program's JAR is copied to that TaskTracker to run there, so the computation moves to the data rather than the data to the computation. Data locality is not considered when reduce tasks are assigned.

 

5. Each TaskTracker sends a heartbeat to the JobTracker at regular intervals, telling the JobTracker that it is still running and carrying plenty of other information, such as the progress of the current map task. When the JobTracker receives the completion notice for the job's last task, it marks the job as "successful". When the JobClient polls the job status, it learns that the job has completed and displays a message to the user.
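To make steps 1 through 3 concrete, here is a minimal, self-contained driver sketch modeled on the classic WordCount example, using the old org.apache.hadoop.mapred API from the JobTracker era. The class name, job name, and the assumption that input and output paths arrive as command-line arguments are illustrative choices, not something taken from the original post.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

    // Map: emit (word, 1) for every word in the input split assigned to this task.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reduce: sum the counts for each word; input arrives sorted and grouped by key.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);  // step 1: the client builds the job
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);          // optional combiner (see the map side below)
        conf.setReducerClass(Reduce.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));  // splits are computed from this input
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Steps 2-3 happen here: JobClient asks the JobTracker for a job ID,
        // copies the JAR, configuration, and split information to HDFS,
        // submits the job, and then polls its progress until it finishes.
        JobClient.runJob(conf);
    }
}
```

It would typically be launched with something like hadoop jar wordcount.jar WordCount <input path> <output path>; from that point on, steps 4 and 5 are handled entirely by the JobTracker and the TaskTrackers.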

 

The above analyzes how MapReduce works at the client, JobTracker, and TaskTracker level. Now let's take a closer look at the same flow at the map task and reduce task level.

 

2. Shuffle and sorting in map and reduce tasks


As before, I drew the flow in Visio; the analysis follows.

Process Analysis: 

Map side: 

1. Each input split is processed by one map task. By default, the size of one HDFS block (64 MB by default) is used as the split size, although the block size can also be adjusted. The map output is first written to a circular memory buffer (100 MB by default, controlled by the io.sort.mb property). When the buffer is close to overflowing (80% of the buffer size by default, controlled by the io.sort.spill.percent property), a spill file is created on the local file system and the data in the buffer is written to it (a configuration sketch for these properties follows this list).

 

2. Before writing to disk, the thread first divides the data into as many partitions as there are reduce tasks, so that each partition corresponds to one reduce task. This avoids the awkward situation where some reduce tasks are assigned a large amount of data while others get very little or even none. Partitioning is really just hashing the data. The data in each partition is then sorted, and if a combiner has been set, the combine operation is applied to the sorted result, so that as little data as possible is written to disk.

 

3. When the map task writes its last record, there may be many spill files, and these files need to be merged. During the merge, sorting and combining are performed repeatedly, for two purposes:

1. To minimize the amount of data written to disk each time;

2. To minimize the amount of data transferred over the network during the subsequent copy phase.

In the end, the spills are merged into a single partitioned and sorted file. To further reduce the amount of data sent over the network, the map output can be compressed by setting mapred.compress.map.output to true.

 

4. The data in each partition is copied to the corresponding reduce task. Someone may ask: how does the data in a partition know which reduce task it belongs to? In fact, each map task keeps in touch with its parent TaskTracker, and each TaskTracker keeps a heartbeat with the JobTracker, so the JobTracker holds the macro-level information for the whole cluster. A reduce task only needs to ask the JobTracker for the locations of the corresponding map outputs.
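As a rough illustration of the map-side knobs mentioned in points 1 and 3 above, the classic MRv1 property names could be set on the job configuration like this. The helper class and method names are hypothetical, and the numeric values are simply the commonly documented defaults (compression is off by default and is switched on here).

```java
import org.apache.hadoop.mapred.JobConf;

public class MapSideTuning {
    // Map-side spill and compression settings, using the classic MRv1 property names.
    public static JobConf applyMapSideTuning(JobConf conf) {
        // Size in MB of the circular in-memory buffer that holds map output.
        conf.setInt("io.sort.mb", 100);
        // Start spilling to disk once the buffer reaches this fraction of io.sort.mb.
        conf.setFloat("io.sort.spill.percent", 0.80f);
        // Compress the map output before it is copied to the reducers.
        conf.setBoolean("mapred.compress.map.output", true);
        return conf;
    }
}
```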

 

At this point the analysis of the map side is complete. So what exactly is shuffle? Shuffle literally means shuffling a deck of cards. Look at it this way: the data produced by a map task is distributed to different reduce tasks through hash partitioning; isn't that a process of shuffling the data?
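The hash partitioning described above is essentially what Hadoop's default HashPartitioner does. A hand-written equivalent against the old mapred API would look roughly like the sketch below; the class name is illustrative, and such a class would be registered on the driver with conf.setPartitionerClass(...).

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Roughly what the default HashPartitioner does: hash the key and take it modulo
// the number of reduce tasks, so every record with the same key lands in the same
// partition and therefore goes to the same reduce task.
public class MyHashPartitioner implements Partitioner<Text, IntWritable> {

    @Override
    public void configure(JobConf job) {
        // Nothing to configure in this sketch.
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Mask off the sign bit so the partition index is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```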

 

Reduce side: 

1. The reduce task receives data from different map tasks, and the data from each map is already sorted. If the amount of data received on the reduce side is small, it is kept in memory (the buffer size is controlled by the mapred.job.shuffle.input.buffer.percent property, which specifies the percentage of heap space to use for this purpose). If the data volume exceeds a certain proportion of that buffer (determined by mapred.job.shuffle.merge.percent), the data is merged and spilled to disk (a tuning sketch for these properties follows this list).

 

2. As the number of spill files grows, a background thread merges them into a larger, sorted file, which saves time in later merges. In fact, MapReduce repeatedly sorts and merges on both the map side and the reduce side; now it is clear why some people say that sorting is the soul of Hadoop.

 

3. Many intermediate files are written to disk during the merge process, but MapReduce tries to write as little data to disk as possible, and the result of the final merge is not written to disk at all: it is fed directly into the reduce function.
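As with the map side, the two reduce-side properties mentioned in point 1 could be adjusted on the job configuration roughly as follows. The helper class is hypothetical, and the values shown are only illustrative (they are close to the usual MRv1 defaults).

```java
import org.apache.hadoop.mapred.JobConf;

public class ReduceSideTuning {
    // Reduce-side shuffle settings, using the classic MRv1 property names.
    public static JobConf applyReduceSideTuning(JobConf conf) {
        // Fraction of the reduce task's heap used to buffer map outputs during the copy phase.
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
        // Once the in-memory buffer is this full, merge the buffered map outputs and spill to disk.
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
        return conf;
    }
}
```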

 

 

 

From: http://weixiaolu.iteye.com/blog/1474172
