MapReduce architecture and lifecycle


Overview: MapReduce is one of the core components of Hadoop. Through MapReduce, it is easy to perform distributed computing and programming on the Hadoop platform. This article first outlines the MapReduce architecture and basic principles, and then discusses the lifecycle of an entire MapReduce job in detail.

References: Dong Xicheng's Hadoop Technology Insider and several forum articles whose original links can no longer be found.

Overview of MapReduce architecture and basic principles

MapReduce is mainly divided into two processes: map and reduce. It adopts a master/slave (M/S) architecture. In the 1.0 series, the main roles are the client, JobTracker, TaskTracker, and Task.

Client: the job to be executed is configured on the client, for example, writing the MapReduce program, specifying the input/output paths, and specifying the compression settings. After configuration is completed, the client submits the job to the JobTracker. Each job is divided into several tasks.

  • JobTracker: mainly responsible for job scheduling and resource monitoring. It also provides interfaces so that users can view the job's running status from the client.
  • The JobTracker initializes the jobs submitted by the client and assigns them to TaskTrackers. It communicates with the TaskTrackers and coordinates the entire job. Communication between the TaskTracker and the JobTracker, including task allocation, is done through the heartbeat mechanism: the TaskTracker actively asks the JobTracker whether there is work to do, and if it has free capacity it applies for a task, which may be a map task or a reduce task. If a task fails, it is transferred to another node for execution.
  • TaskTracker: maintains communication with the JobTracker, periodically reports the resource usage and task execution status of the current node through the heartbeat mechanism, and accepts commands sent by the JobTracker, such as starting or terminating a task. The TaskTracker uses slots to divide the resources (CPU and memory) of the current node equally; a task can run only after it has been allocated a slot. The slots on a node are divided into map slots and reduce slots, which are allocated to map tasks and reduce tasks respectively. In this way, the TaskTracker also limits task concurrency through slot allocation.
  • Task: tasks are divided into map tasks and reduce tasks, corresponding to the map and reduce phases; both are started by the TaskTracker. The number of map tasks is determined by the number of input splits.

During MapReduce execution, data is processed by the mapper, the shuffle stage, and the reducer.

Mapper

The mapper executes the map task: it parses the split assigned to it into a number of key-value (K-V) pairs, which serve as the input of the map function. For example, in the WordCount program, K is the byte offset of a line and V is the line of text. map() is then called for each pair, and the output is again a K-V pair: <K = word, V = 1>.
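As a concrete illustration, here is a minimal WordCount mapper sketch using the classic Hadoop 1.x org.apache.hadoop.mapred API (class and variable names are only for this example):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// WordCount mapper: input key = byte offset of the line, value = the line itself;
// output key = word, value = 1.
public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            output.collect(word, ONE);   // emit <word, 1> for each token
        }
    }
}
```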

Intermediate process: shuffle

The intermediate process is split between the map side and the reduce side.

Mapper end

On the mapper side, each map task has a memory buffer that stores the output of the map function. When the buffer fills up, its contents are written to the disk as a temporary file. After the entire map task is completed, all the temporary files it generated on disk are merged into the final output file, which then waits for the reduce tasks to pull the data.

Partition

The partitioner determines which reduce task will process each K-V pair output by the map function, based on the key and the number of reduce tasks. By default, the key is hashed and then taken modulo the number of reduce tasks. The modulo spreads the load evenly across reduce tasks; a custom partitioner can also be written and set on the job.
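The default hash-and-modulo behavior can be reproduced with a custom partitioner roughly like the following sketch (old mapred API; the class name is hypothetical):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Partitioner;

// Mirrors the default HashPartitioner: hash the key, then take it modulo the
// number of reduce tasks so a given key always goes to the same reducer.
public class WordPartitioner extends MapReduceBase
        implements Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Mask off the sign bit before the modulo so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```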

Spill

The K-V pairs output by the map() function are first written to the memory buffer. The purpose of the buffer is to collect map results in batches and reduce the impact of disk I/O before writing to a disk file. The K-V pairs are serialized into byte arrays and written to the buffer together with their partition information. The buffer size is limited to 100 MB by default, so when a map task produces a large amount of output, the buffer must be flushed and its data written to disk. This process of writing buffered data to disk is called spill (overflow writing). It is performed by a separate thread and does not block the thread that writes map results. This is achieved through the spill threshold io.sort.spill.percent (0.8 by default): when the data in the buffer reaches 80% of its capacity, the spill thread starts and locks that 80% of the buffer while writing it out; meanwhile, other threads can continue writing map output into the remaining 20% of the buffer, so the two do not interfere with each other.
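In the Hadoop 1.x configuration these two knobs are exposed as io.sort.mb (buffer size, 100 MB by default) and io.sort.spill.percent (spill threshold, 0.8 by default). A small sketch of tuning them on a job, with purely illustrative values:

```java
import org.apache.hadoop.mapred.JobConf;

public class SpillTuning {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Map-side sort buffer size in MB (Hadoop 1.x default: 100).
        conf.setInt("io.sort.mb", 200);
        // Fraction of the buffer at which the spill thread starts (default: 0.80).
        conf.setFloat("io.sort.spill.percent", 0.80f);
        System.out.println("io.sort.mb = " + conf.get("io.sort.mb"));
    }
}
```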

Sort

After the spill thread starts, it sorts the data to be written to disk by key (on the serialized bytes). Because the results of the map task must be handed over to different reduce tasks, data destined for the same reduce task must be grouped together. This grouping is not performed while the buffer is being filled, but when the spill thread writes the data to disk. If there are many K-V pairs destined for the same reduce task, they are concatenated together to reduce the number of partition index records, e.g. <K = word, V = 1, V = 1, V = 1>.

Merge

Each spill operation generates an overflow file on disk. If the buffer is not large enough, or the map output is large enough, spilling happens several times, producing several overflow files. These overflow files therefore need to be merged into a single file; this operation is called merge. Merge groups K-V pairs with the same key from the different spill files into one group of the form K-[V1, V2, ...]. Because multiple files are merged into one, the same key may appear in several of them; if a combiner is set on the client side, it is called to combine the values of identical keys.
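For WordCount the combiner is usually just the reduce logic run on the map side. Assuming the WordCountReducer class sketched in the Reducer section below, it could be registered on the job configuration like this:

```java
import org.apache.hadoop.mapred.JobConf;

public class CombinerSetup {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Run the reduce logic as a combiner during spill/merge so that
        // <word, 1, 1, 1> is collapsed to <word, 3> before leaving the map side.
        conf.setCombinerClass(WordCountReducer.class);
    }
}
```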

At this point, all the work on the map side is complete, and the final file is placed in a local directory that the TaskTracker can access. Each reduce task continuously asks the JobTracker via RPC whether the map tasks have completed; once the map tasks on a TaskTracker are done, the second half of the shuffle process starts.

Reducer end

The intermediate process on the reduce side is the work carried out before reduce() is executed: the final output of each map task is continuously pulled over and then merged.

Copy

This step simply pulls data. The reduce process starts several data copy threads (Fetchers), which request the output files of the completed map tasks from the corresponding TaskTrackers over HTTP. Because the map tasks have already finished, these files are managed by the TaskTracker on the local disk.

Merge

The merge action here is similar to the merge on the map side, except that the values being merged are copies pulled from different map tasks. The copied data is first put into a memory buffer. The buffer size here is more flexible than on the map side: it is set as a fraction of the JVM heap, because the reducer itself does not run during the shuffle stage, so most of the memory can be devoted to the shuffle. Merge takes three forms: 1) memory to memory, 2) memory to disk, 3) disk to disk. The first form is disabled by default. Sorting takes place in memory, and when the amount of data in memory reaches a certain threshold, the memory-to-disk merge starts. As on the map side, this is also a spill process, and if a combiner is set it is applied here as well, producing a number of spill files on disk. The memory-to-disk merge keeps running until all the map-side data has been received, and then the disk-to-disk merge is started to generate the final file.
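The reduce-side copy and merge behavior is controlled by a few Hadoop 1.x properties; the names below are the 1.x keys as commonly documented and the values are only illustrative assumptions, not recommendations:

```java
import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuning {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Number of parallel copy (Fetcher) threads per reduce task (default: 5).
        conf.setInt("mapred.reduce.parallel.copies", 10);
        // Fraction of the reduce JVM heap reserved for the in-memory shuffle buffer (default: 0.70).
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
        // Fraction of that buffer at which the memory-to-disk merge starts (default: 0.66).
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
    }
}
```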


Reducer

The reduce task reads the final file produced by the shuffle stage from the local disk and repeatedly calls the reduce() function to process the input data. The input format is <K = word, V = N, V = M, ...>, and the output is written to HDFS.
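A minimal WordCount reducer sketch in the same old mapred API (the class name matches the combiner registration shown earlier):

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// WordCount reducer: receives <word, [n, m, ...]> from the shuffle and emits <word, total>.
public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();   // add up the (possibly pre-combined) counts
        }
        output.collect(key, new IntWritable(sum));   // written to the job's output path on HDFS
    }
}
```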

MapReduce job lifecycle

This section describes the process of a MapReduce job from submission to completion. The entire process includes:

Job submission and initialization → task scheduling and monitoring → preparation of the running environment → task execution → job completion

  • Job submission and initialization

The job to be executed is configured on the client, for example, writing the MapReduce program, specifying the input and output paths, and the compression settings. After configuration, the client submits the job to the JobTracker. When a user submits a job, the JobClient instance first uploads the job-related information (such as the program jar package, the job configuration file, and the split metadata file) to HDFS; the split metadata file records the logical location of each input split. The JobClient then notifies the JobTracker through RPC. After receiving the new job submission, the JobTracker has its job scheduling module initialize the job: it creates a JobInProgress object to track the job's running status, and the JobInProgress creates a TaskInProgress object for each task to track that task's status; a TaskInProgress may need to manage multiple "task attempts". A driver sketch that performs these client-side steps is shown after this list.

  • Task scheduling and monitoring

The TaskTracker periodically reports the resource usage of its node to the JobTracker through heartbeats. Once there are idle resources, the JobTracker selects a suitable task to use them according to certain policies; this is done by the task scheduler. The task scheduler is a pluggable, independent module with a two-layer architecture: it first selects a job, and then selects a task from that job, giving priority to data locality. In addition, the JobTracker tracks the entire running process of a job to ensure it completes successfully. First, when a TaskTracker or a task fails, the task is transferred to another node. Second, when a task's progress lags far behind the other tasks of the same job, a speculative copy of the task is started on another node, and whichever attempt finishes first provides the final result.

  • Prepare the task running environment

Preparing the runtime environment includes JVM startup and resource isolation, both implemented by the TaskTracker. The TaskTracker starts a separate JVM for each task so that tasks do not affect each other while running; at the same time, it uses operating-system processes to isolate resources and prevent a task from abusing them.

  • Task execution

After the TaskTracker has prepared the running environment, it starts the task. While running, each task reports its latest progress to the TaskTracker through RPC, and the TaskTracker forwards it to the JobTracker.

  • Job completed

After the last task finishes, the JobTracker marks the whole job as successful. The system then deletes intermediate results and performs other follow-up operations.
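A minimal driver that wires together the mapper, combiner, partitioner, and reducer sketched earlier and submits the job through the JobClient (the paths and job name are placeholders):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// WordCount driver: configures the job on the client and submits it to the JobTracker.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);       // mapper sketched earlier
        conf.setCombinerClass(WordCountReducer.class);    // reduce logic reused as combiner
        conf.setReducerClass(WordCountReducer.class);
        conf.setPartitionerClass(WordPartitioner.class);  // hash-mod partitioner sketched earlier

        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // input path on HDFS
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // output path on HDFS

        // Submits the job (uploading the jar, configuration, and split metadata to HDFS)
        // and waits for completion.
        JobClient.runJob(conf);
    }
}
```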


This article has briefly discussed and summarized the MapReduce architecture and job lifecycle. If there are any errors, corrections are welcome.
