"Turn" MapReduce operation mechanism

Reposted from http://langyu.iteye.com/blog/992916, which is very well written!

The operation mechanism of MapReduce can be described from many different angles: from the job's running flow, from the logic of the computational model, or, for a deeper understanding, from some other perspective. Whatever the angle, though, you cannot avoid the entities involved in a MapReduce run, both the physical entities that execute the job and the logical entities defined by the computational model. So rather than walking through one particular flow, I will explain the mechanism in terms of these objects, physical and logical.

Starting with the physical entities, four independent entities participate in the execution of a MapReduce job:

    1. Client: writes the MapReduce program, configures the job, and submits it; this is the programmer's work;

    2. JobTracker: initializes the job, assigns tasks, communicates with the TaskTrackers, and coordinates the execution of the whole job;

    3. TaskTracker: keeps in contact with the JobTracker and runs map or reduce tasks on its assigned slice of data. An important difference between the TaskTracker and the JobTracker is that a cluster can run many TaskTrackers while there is only one JobTracker (the JobTracker, like the NameNode in HDFS, is a single point of failure; I will come back to this in the related questions at the end);

    4. HDFS: stores the job's input data and configuration information, and the final results are also written back to HDFS.

So how does MapReduce actually run a job?

   First, the client writes the MapReduce program and configures it as a job, then submits the job to the JobTracker. The JobTracker builds the job: it assigns the job a new ID and then runs a series of checks. It checks whether the output directory already exists (if it does, the job cannot run and an error is returned to the client), and whether the input directory exists (if it does not, an error is likewise thrown). If the input exists, the JobTracker computes the input splits from it; if the splits cannot be computed, that also raises an error (input splits are explained later). Once these checks pass, the JobTracker allocates the resources the job needs.

   After the resources are allocated, the JobTracker initializes the job. Initialization mainly means placing the job in an internal queue so that the configured job scheduler can schedule it; the scheduler then initializes the job by creating a running job object that encapsulates its tasks and bookkeeping information, so that the JobTracker can track the job's status and progress. When initialization finishes, the scheduler obtains the input split information and creates one map task per split.

   Next comes task assignment. Each TaskTracker runs a simple loop that sends a heartbeat to the JobTracker at a fixed interval; the default is 5 seconds, and the interval is configurable. The heartbeat is the communication bridge between the JobTracker and the TaskTrackers: through it the JobTracker learns whether a TaskTracker is still alive and what its status and problems are, and the TaskTracker receives its operating instructions from the JobTracker in the heartbeat's return value. Tasks run once they have been assigned.

   While tasks are executing, the JobTracker monitors the state and progress of each TaskTracker through the heartbeat mechanism and can therefore compute the status and progress of the whole job, while each TaskTracker monitors its own tasks locally. When the JobTracker is notified that the TaskTracker running the last task has finished successfully, it marks the whole job as successful; when the client next queries the job's status (note: this is an asynchronous operation), it learns that the job is complete. If the job fails partway through, MapReduce has mechanisms for handling the failure; in general, unless the bug is in the programmer's own code, the error-handling mechanism ensures that the submitted job still completes properly.
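As a rough illustration of the client side of this flow, here is a minimal driver sketch, assuming the classic JobTracker-era (MRv1) Java API; WordCountMapper and WordCountReducer stand for the programmer's own map and reduce classes (minimal versions of them are sketched after the stages list below), and the input/output paths are taken from the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Configure the job: this is the client-side work described above.
        // (Job.getInstance(conf, name) replaces this constructor in newer releases.)
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // The input path must exist and the output path must NOT already exist,
        // otherwise submission fails with an error, as described above.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and poll its progress until it completes
        // (waitForCompletion wraps the asynchronous status queries).
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```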

Next, I explain the MapReduce operation mechanism from the perspective of the logical entities, which in chronological order are: the input split, the map stage, the combiner stage, the shuffle stage, and the reduce stage.

  1. Input split: before the map computation starts, MapReduce computes input splits from the input files, and each input split corresponds to one map task. An input split does not store the data itself; it records the split's length and an array of locations where the data is stored. Input splits are closely tied to the HDFS block size. Suppose the HDFS block size is set to 64MB and we have three input files of 3MB, 65MB and 127MB. MapReduce then turns the 3MB file into one input split, the 65MB file into two input splits, and the 127MB file into two input splits as well. In other words, if we do not adjust the input before the map computation, for example by merging the small files, then 5 map tasks will run and the amount of data each map processes will be uneven; this is a key point when optimizing MapReduce jobs.

  2. Map stage: this is the map function written by the programmer, so its efficiency is largely under the programmer's control; map operations are generally localized, that is, they run on the node where the data is stored (a minimal mapper appears in the sketch after this list);

  3. Combiner stage: the combiner stage is optional. A combiner is essentially a local reduce operation, which is why the WordCount example can reuse its reduce class as the combiner. It follows the map operation on the same node and does a simple merge of values that share a key before the map output is written out. For example, when we count word frequencies in a file, the map emits (hadoop, 1) every time it sees the word "hadoop"; if "hadoop" appears many times in the article, the map output file becomes bloated, so merging identical keys before the reduce computation makes the files smaller and improves network transfer efficiency. After all, bandwidth is often the bottleneck of a Hadoop computation and its most precious resource. Using a combiner does carry risk, though: the rule is that the combiner must not change the final input to the reduce computation. For example, a combiner works if you are computing a total, a maximum, or a minimum, but if you use a combiner while computing an average, the final reduce result will be wrong.

  4. Shuffle stage: the process that turns the map output into the reduce input is the shuffle, and it is the focal point of MapReduce optimization. I will not discuss how to optimize the shuffle here, only how it works, because most books do not explain this stage clearly.

  The shuffle begins on the output side of the map stage. A MapReduce computation usually processes a huge amount of data, so the map cannot keep all of its output in memory, and writing that output to disk is a fairly involved process, especially since the map output must also be sorted, which is memory intensive. The map therefore writes its output into a circular in-memory buffer dedicated to output; its default size is 100MB, and a spill threshold is set for it in the configuration, 0.80 by default (both the size and the threshold are configurable; the classic property names are shown in the second sketch after this list). The map also starts a background thread for the output: when the buffer reaches 80% of its capacity, this thread writes the buffered content to disk, a process called a spill. While the spill is in progress, the remaining 20% of the buffer can keep receiving new map output; writing to disk and writing to memory do not interfere with each other. If the buffer fills up completely, the map blocks further writes to memory until the spill has finished. Before the data is written to disk it is sorted; the sort happens as part of the spill, not while writing into the buffer, and if a combiner function is defined, it runs on the sorted data before it is written out. Each spill produces one spill file on disk, so however many spills the map output triggers, that many spill files are produced; when the map has produced all of its output, it merges these files into a single output file.

  This merge also involves the partitioner, which confuses many people. The partitioner plays a role for reduce very similar to the one the input split plays for map: one partition corresponds to one reduce task. If our MapReduce job has only one reduce task, there is only one partition; if it has several reduce tasks, there are correspondingly many partitions. The partitioner is therefore effectively the input split of reduce. It can be controlled in code, usually based on the actual keys and values, the nature of the business, or the goal of balancing load better across the reduce tasks, and it is a key lever for improving reduce efficiency (a minimal custom partitioner also appears in the sketch after this list).

  On the reduce side, the shuffle merges the map output files: the reduce locates its corresponding map output partitions and copies them over. For the copy, reduce starts several copier threads, 5 by default, and the programmer can change this number in the configuration. The copy process resembles the map-side spill process: it also has a memory buffer and a threshold, the threshold is configurable, and the buffer size comes directly from the memory of the TaskTracker running the reduce. When the copy finishes, reduce sorts and merges the copied files, and then the reduce computation itself is performed on the result.

  5. Reduce stage: like the map function, the reduce function is written by the programmer, and the final result is stored in HDFS (the word-count sketch below also includes a minimal reducer).
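To make the map, combiner, shuffle and reduce stages above concrete, here is a minimal word-count sketch, assuming the classic org.apache.hadoop.mapreduce Java API. The class names are my own illustration, not the original article's code: the reducer doubles as the combiner (safe here because summing is associative, unlike an average), and the custom partitioner simply reproduces the default hash partitioning to show where that hook lives.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

// Map stage: runs where the data lives and emits a (word, 1) pair per token.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce stage: sums the counts for each word. Because summation is associative,
// this same class can also be registered as the combiner, merging duplicate keys
// locally on the map side before the data crosses the network.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

// Shuffle stage hook: one partition per reduce task. This version reproduces the
// default hash partitioning; a real job overrides it only to balance load or to
// group keys according to business rules.
class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

In the driver sketched earlier these would be registered with job.setCombinerClass(WordCountReducer.class) and job.setPartitionerClass(WordPartitioner.class) in addition to the mapper and reducer settings already shown.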
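The 100MB spill buffer, the 0.80 spill threshold and the 5 reduce-side copier threads mentioned above correspond, under the classic MRv1 configuration, to the property names in the sketch below; the values shown are just the defaults quoted in the text, and in MRv2/YARN the names have changed (for example mapreduce.task.io.sort.mb).

```java
import org.apache.hadoop.conf.Configuration;

// Classic MRv1 property names for the shuffle parameters described above.
// In a real cluster these normally live in mapred-site.xml rather than in code.
public class ShuffleTuning {
    public static Configuration withShuffleDefaults(Configuration conf) {
        conf.setInt("io.sort.mb", 100);                   // map-side ring buffer size, in MB
        conf.setFloat("io.sort.spill.percent", 0.80f);    // buffer fill level that triggers a spill
        conf.setInt("mapred.reduce.parallel.copies", 5);  // reduce-side copier threads
        return conf;
    }
}
```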

Some questions about MapReduce

Here I want to discuss a few questions I thought about while studying MapReduce. They are all questions I tried to answer for myself, but whether some of my answers are actually correct still needs to be confirmed by readers.

    1. JobTracker single point of failure: like the HDFS NameNode, the JobTracker is a single point of failure, and single points of failure have long been a big problem in Hadoop. Why are Hadoop's file system and MapReduce framework so fault tolerant while the failure handling of their most important management nodes is so weak? I think the main reason is that both the NameNode and the JobTracker do their work in memory, and making in-memory state fault tolerant is complicated; fault tolerance can only be applied once the in-memory data has been persisted. Both the NameNode and the JobTracker can back up their persisted files, but that persistence lags behind the live state, so if a failure really occurs, a complete recovery is still not possible. In addition, the Hadoop ecosystem includes the ZooKeeper framework; ZooKeeper can be combined with the JobTracker by deploying the JobTracker on several machines at once, so that if one fails another can take over immediately, but even this approach cannot recover the MapReduce tasks that were running at the time of the failure.

    2. When doing a MapReduce computation, the output is generally a directory, and that directory must not already exist. I mentioned this point earlier: the check is performed very early, when the job is submitted. The reason is that MapReduce is designed to guarantee data reliability: if the output directory already existed, reduce would not know whether to append to it or overwrite it, and either appending or overwriting could lead to problems. MapReduce jobs process large amounts of data and a production run is very expensive, for example a single job may take hours to execute, so MapReduce has zero tolerance for anything that could corrupt the result.

    3. MapReduce also has InputFormat and OutputFormat. When we write the map function we find that the parameters of the map method are already individual rows of data and never touch an InputFormat; that is because when we create the input Path, the MapReduce framework applies the InputFormat for us, and likewise the OutputFormat is applied on the reduce side. Which InputFormat we use depends on the type of input file: the ones commonly used in MapReduce are FileInputFormat for plain text files, SequenceFileInputFormat for Hadoop's serialized SequenceFiles, and KeyValueTextInputFormat. The OutputFormat is the format of the files we finally want to store in HDFS and is chosen according to your needs; Hadoop supports many file formats that I will not enumerate here, so search for them if you want to know more (a short sketch of how these formats are selected in the driver follows).
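As a hedged illustration of that last point, here is a small sketch, assuming the same classic mapreduce API as the driver above, showing where the input and output formats are selected; the helper name configureFormats and the inputKind switch are my own, and TextInputFormat is what the framework uses for plain text by default when nothing is set.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class FormatSelection {
    // Choose the InputFormat to match the kind of file the job reads.
    static void configureFormats(Job job, String inputKind) {
        switch (inputKind) {
            case "text":        // plain text files (the default behaviour)
                job.setInputFormatClass(TextInputFormat.class);
                break;
            case "sequence":    // Hadoop's serialized SequenceFile format
                job.setInputFormatClass(SequenceFileInputFormat.class);
                break;
            case "keyvalue":    // text lines already split into key and value
                job.setInputFormatClass(KeyValueTextInputFormat.class);
                break;
        }
        // OutputFormat: plain text here; other formats, such as
        // SequenceFileOutputFormat, are set the same way depending on
        // what should finally be stored in HDFS.
        job.setOutputFormatClass(TextOutputFormat.class);
    }
}
```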

"Turn" MapReduce operation mechanism

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.