How MapReduce Works

Contents
    • Input phase
    • Map phase
    • Sort phase
    • Combine phase
    • Partition phase
    • Reduce phase
    • Output phase
    • Job submission
    • Job Initialization
    • Task Assignment
    • Task execution
    • Progress and status updates
    • Job completion

Original article: http://blog.endlesscode.com/2010/06/24/how-mapreduce-works/

 

I. From map to reduce

MapReduce is essentially an implementation of the divide-and-conquer algorithm, and its processing flow is very similar to a pipeline of commands. Some simple text processing can even be replaced by Unix pipeline commands; the flow looks roughly like this:

cat input | grep | sort | uniq -c | cat > output
# input -> map -> shuffle & sort -> reduce -> output

The simple flowchart is as follows:

In the shuffle step, the map output is routed by some algorithm to the appropriate reducer for processing. The sort step orders the intermediate results by key, because the reducer's input is strictly required to be sorted by key.

Input -> map -> shuffle & sort -> reduce -> output is only a macro-level description of MapReduce. Within the MapReduce framework, from the programming point of view, the processing flow is input -> map -> sort -> combine -> partition -> reduce -> output. The rest of this part walks through that flow using the earlier example of computing temperature statistics.

Input phase

Input data must be delivered to the mapper in a specific format, such as TextInputFormat, DBInputFormat, or SequenceFileInputFormat, which is set with JobConf.setInputFormat. This phase also includes dividing the input data into splits at task granularity and handing each split to a mapper. In the temperature example, all of the input is plain text, so the default TextInputFormat is used.
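
As a sketch using the classic org.apache.hadoop.mapred API that the JobConf calls in this article belong to, the relevant lines of the job driver might look as follows; the MaxTemperature driver class and the input path are illustrative only (a complete driver sketch appears after the Output phase below):

    // Inside the job driver (see the full driver sketch after the Output phase):
    JobConf conf = new JobConf(MaxTemperature.class);        // hypothetical driver class
    conf.setInputFormat(TextInputFormat.class);               // the default for plain text, shown explicitly
    FileInputFormat.addInputPath(conf, new Path("/input/temperature"));  // example path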

Map phase

This phase processes the input key/value pairs and outputs a set of key/value pairs, i.e. map(K1, V1) -> list(K2, V2). Your own mapper is set with JobConf.setMapperClass. In this example, the input key/value is (line offset, one line of temperature text). The mapper extracts the year and that day's temperature from the line and emits a new key/value pair, so the output is a list of (year, temperature) pairs, for example [(1950, 10), (1960, 40), (1960, 5)].
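
A minimal sketch of such a mapper, again using the classic org.apache.hadoop.mapred API; the one-reading-per-line "<year><TAB><temperature>" input layout is an assumption made only for illustration:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Turns one line of input into a (year, temperature) pair.
    // The input key is the offset of the line in the file (TextInputFormat);
    // the "<year>\t<temperature>" line layout is a made-up example format.
    public class MaxTemperatureMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String[] fields = value.toString().split("\t");
            String year = fields[0];
            int temperature = Integer.parseInt(fields[1].trim());
            // Emit one element of the list(K2, V2) described above.
            output.collect(new Text(year), new IntWritable(temperature));
        }
    }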

Sort phase

This phase sorts the data output by the mapper. You can use JobConf.setOutputKeyComparatorClass to set your own sorting rule. In this example, after sorting, the output is the list of (year, temperature) pairs ordered by year, such as [(1950, 10), (1950, 5), (1960, 40)].
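
For illustration only, the driver line below explicitly sets the stock comparator for Text keys, which is what the framework would use by default anyway; a custom RawComparator would be supplied here instead to change the sort order:

    // Inside the job driver: choose how intermediate keys are compared during sorting.
    // Text.Comparator is the standard comparator for Text keys and is the default;
    // it is set explicitly here only to show the hook.
    conf.setOutputKeyComparatorClass(Text.Comparator.class);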

Combine phase

In this phase, the <key, value> pairs in the intermediate results that share the same key are merged into one pair. The combine step works much like reduce, and it even uses the Reducer interface. Combining reduces the number of <key, value> pairs and therefore the network traffic. Combine is only an optional optimization: no matter how many times the combiner runs (>= 0), the reducer must produce the same output. You can use JobConf.setCombinerClass to set a custom combiner class. In this example, suppose map1 produces [(1950, 0), (1950, 20), (1950, 10)] and map2 produces [(1950, 15), (1950, 25)]. Fed directly to the reducer, these two sets yield a maximum temperature per year of (1950, 25). If a combiner that keeps only the maximum temperature is added after each mapper, map1 outputs [(1950, 20)] and map2 outputs [(1950, 25)]; although the other three pairs are discarded, the reducer still outputs the same maximum, (1950, 25).
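
Because taking a maximum is associative and commutative, the reducer class itself can safely be reused as the combiner. A sketch of the corresponding driver line, assuming the MaxTemperatureReducer class sketched in the Reduce phase below:

    // Inside the job driver: run the reducer logic as a combiner on the map side.
    // This is valid here because max() gives the same answer regardless of how
    // often, or in what grouping, it is applied.
    conf.setCombinerClass(MaxTemperatureReducer.class);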

Partition phase

This phase divides the intermediate results output by the mapper tasks into R partitions by key (R is the pre-defined number of reduce tasks). The default partitioning algorithm is (key.hashCode() & Integer.MAX_VALUE) % numPartitions, which guarantees that a given range of keys is always processed by a particular reducer and so simplifies the reducer's work. Use JobConf.setPartitionerClass to set a custom partitioner class. In this example, the year key is simply hashed and taken modulo the partition count by the default partitioner.
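
A partitioner that spells out this default rule might look like the sketch below; the stock HashPartitioner already behaves exactly this way, so a custom class like this is only needed when the rule has to change:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Reproduces the default rule: (key.hashCode() & Integer.MAX_VALUE) % numPartitions.
    // With the year as the key, all records for one year land in the same partition,
    // and therefore go to the same reducer.
    public class YearPartitioner implements Partitioner<Text, IntWritable> {
        @Override
        public void configure(JobConf job) {
            // No configuration needed for this sketch.
        }

        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }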

Reduce phase

The reducer takes the intermediate results output by the mappers, receiving one key range as its input, and processes them. It is set with JobConf.setReducerClass. In this example, the processing is the same as in the combine phase: the maximum temperature per year is computed from the data passed in by each mapper.
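
A minimal sketch of the reducer for this example, matching the mapper sketched earlier; as noted in the Combine phase, the same class can also be registered as the combiner:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // For each year (the key), scans all temperatures received for that key and
    // emits the maximum, as described in the text.
    public class MaxTemperatureReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int max = Integer.MIN_VALUE;
            while (values.hasNext()) {
                max = Math.max(max, values.next().get());
            }
            output.collect(key, new IntWritable(max));
        }
    }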

Output phase

The reducer's output format is the counterpart of the mapper's input format (for example, TextOutputFormat mirrors TextInputFormat and is the default). Of course, the reducer's output can in turn be processed as the input of another mapper in a follow-up job.
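
Putting the phases together, a minimal driver for the temperature example might look like the following sketch. Class names and paths are illustrative; JobClient.runJob, which the next part starts from, submits the job and waits for it to finish:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;

    // A minimal driver tying the phases together. Paths are examples; the mapper,
    // combiner/reducer, and partitioner classes are the sketches shown above.
    public class MaxTemperature {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MaxTemperature.class);
            conf.setJobName("max temperature");

            FileInputFormat.addInputPath(conf, new Path("/input/temperature"));     // example path
            FileOutputFormat.setOutputPath(conf, new Path("/output/max-temperature")); // example path

            conf.setInputFormat(TextInputFormat.class);          // Input phase (default)
            conf.setMapperClass(MaxTemperatureMapper.class);     // Map phase
            conf.setCombinerClass(MaxTemperatureReducer.class);  // Combine phase (optional)
            conf.setPartitionerClass(YearPartitioner.class);     // Partition phase (same as the default rule)
            conf.setReducerClass(MaxTemperatureReducer.class);   // Reduce phase
            conf.setOutputFormat(TextOutputFormat.class);        // Output phase (default)

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            JobClient.runJob(conf);   // submit the job and wait for completion
        }
    }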

II. Details of a job run

The map and reduce phases above describe what happens while the tasks themselves run. In fact, many more details are involved from the moment "hadoop jar" is executed. The entire job-running process is shown in the figure below:

As the figure shows, running a MapReduce job involves four independent entities:

    • The client, which submits the MapReduce job.
    • The JobTracker, which coordinates the job's execution.
    • The TaskTrackers, which run the tasks the job is split into; the tasks are mainly responsible for running the mapper and the reducer.
    • The distributed filesystem, which stores the job files (such as intermediate result files) shared by the entities above while the job runs.
Job submission

The job is submitted once JobClient.runJob() is called. In this step, the job goes through the following process:

    1. The client asks the JobTracker for a new job ID (step 2), in the format job_200904110811_0002, which is composed of a timestamp from the JobTracker and an auto-incrementing counter maintained by the JobTracker (starting from 1).
    2. The client checks the job's output specification, for example whether the output directory already exists (if it does, an exception is raised) and whether it has permission to write to the output directory.
    3. The client computes the input splits for the job, which will serve as the mapper inputs.
    4. The client copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the JobTracker's filesystem in a directory named after the job ID (step 3).
    5. The client tells the JobTracker that the job is ready for execution (step 4).
Job Initialization

When the JobTracker receives the job submission request, it puts the job into an internal queue from which the job scheduler will pick it up and initialize it. Initialization involves creating an object to represent the job, which encapsulates its tasks and the bookkeeping needed to track their status and progress (step 5). To create the list of tasks to run, the job scheduler first retrieves from the filesystem the input splits computed by the JobClient (step 6) and then creates one map task for each split.

Task Assignment

Each TaskTracker runs a simple loop that periodically sends heartbeat calls to the JobTracker. The interval is about five seconds, and in practice depends on cluster size, load, and network congestion. The heartbeat tells the JobTracker that the TaskTracker is alive, and it also serves as the communication channel between the two: based on the heartbeat's return value, the TaskTracker carries out the appropriate operations (step 7).

To choose a reduce task, the JobTracker simply takes the next one in its list of yet-to-be-run reduce tasks, since there are no data-locality considerations. For a map task, however, it takes account of the TaskTracker's network location and picks a task whose input split is as close as possible to the TaskTracker. In the optimal case the task is data-local, that is, it runs on the same node on which the split resides. Alternatively, the task may be rack-local: on the same rack as the split, but not on the same node.

Task execution

After a TaskTracker has been assigned a task, it runs it. First, it copies the required job JAR file from the shared filesystem to the local filesystem. It then creates a local working directory and un-jars the copied JAR file into that directory. Finally, it creates a TaskRunner instance to run the task.

The TaskRunner launches a new JVM to run each task (step 10), so that an exception or hang in the user-defined mapper cannot bring down the TaskTracker itself. The child process communicates with the TaskTracker through the umbilical interface and reports its progress to the TaskTracker every few seconds.

Mappers created with Streaming and Pipes also run as child processes of the TaskTracker. Streaming communicates with its process through standard input and output, while Pipes communicates over a socket.

Progress and status updates

Progress and status are updated and propagated through the heartbeat. For a map task, progress is the ratio of the input data already processed to the total input. For a reduce task the situation is a bit more complicated: copying the intermediate result files, sorting, and running the reduce calls each account for one third of the progress.

Job completion

When the job is complete, the JobTracker receives a job-completion notification and updates the job's status to successful. The JobClient likewise learns that the submitted job has finished and displays the information to the user. Finally, the JobTracker cleans up and reclaims the job's resources and tells the TaskTrackers to do the same (for example, deleting intermediate result files).

 
