Hadoop MapReduce YARN Runtime Mechanism


Problems with the original Hadoop MapReduce framework

The MapReduce framework diagram of the original Hadoop

From this diagram, the process and design ideas of the original MapReduce framework can be seen clearly:

    1. First, the user program (JobClient) submits a job, and the job information is sent to the JobTracker. The JobTracker is the center of the MapReduce framework; it communicates with the machines in the cluster via heartbeats, decides which programs should run on which machines, and manages job failures, restarts, and so on.
    2. A TaskTracker runs on every machine in the MapReduce cluster; its main job is to monitor the resources of its own machine.
    3. The TaskTracker also monitors the health of the tasks running on the current machine. It sends this information to the JobTracker via heartbeats, and the JobTracker collects it to decide on which machines newly submitted jobs should run. The dashed arrows in the diagram represent the sending and receiving of these messages.

As can be seen, the original MapReduce architecture is simple and clear. In its first few years it produced many success stories and gained wide support and recognition in the industry. However, as distributed clusters and their workloads grew, the problems of the original framework gradually surfaced. The main issues are as follows:

    1. The JobTracker is the central processing point of MapReduce, and it is a single point of failure.
    2. The JobTracker takes on too many responsibilities, resulting in excessive resource consumption. When there are very many jobs, the memory overhead becomes significant, which also increases the risk of JobTracker failure. This is why the industry generally concluded that the MapReduce of old Hadoop can only scale to an upper limit of about 4,000 nodes.
    3. On the TaskTracker side, using the number of map/reduce tasks as the representation of resources is too simplistic; it does not account for how much CPU or memory each task actually uses. If two tasks with large memory consumption are scheduled onto the same node, OOM (out-of-memory) errors can easily occur.
    4. Also on the TaskTracker side, resources are rigidly divided into map task slots and reduce task slots. If only map tasks or only reduce tasks are running in the system, the other kind of slot sits idle, which causes the cluster resource utilization problems mentioned above.
    5. At the source-code level, the code is very difficult to read, often because a single class does too many things and runs to more than 3,000 lines. The responsibilities of such classes are unclear, which increases the difficulty of bug fixing and version maintenance.
    6. From an operational point of view, the current Hadoop MapReduce framework forces a system-level upgrade whenever there is any change, important or not, such as a bug fix, a performance improvement, or a new feature. Worse, regardless of users' preferences, it forces every client of the distributed cluster to upgrade at the same time, and users then waste a lot of time verifying that their existing applications still work with the new version of Hadoop.

Principles and Operating Mechanism of the New Hadoop YARN Framework

Given the industry's changing trends in the use of distributed systems and the long-term development of the Hadoop framework, the JobTracker/TaskTracker mechanism of MapReduce needed large-scale adjustment to fix its flaws in scalability, memory consumption, threading model, reliability, and performance. Over the past several years the Hadoop development team made various fixes, but their cost kept increasing, showing that changing the original framework was becoming ever more difficult.

To fundamentally address the performance bottlenecks of the old MapReduce framework, and to promote the longer-term development of the Hadoop framework, starting with the 0.23.0 release Hadoop's MapReduce framework was completely refactored and changed fundamentally. The new Hadoop MapReduce framework is named MapReduce V2, or YARN, as shown in the following architecture diagram:

The fundamental idea of the refactoring is to split the two main functions of the JobTracker into separate components: resource management (ResourceManager) and task scheduling/monitoring (ApplicationMaster). The new ResourceManager globally manages the allocation of compute resources for all applications, while each application's ApplicationMaster is responsible for the corresponding scheduling and coordination. An application is either a single traditional MapReduce job or a DAG (directed acyclic graph) of jobs. The ResourceManager, together with the node management server (NodeManager) on each machine, manages the user processes on that machine and organizes the computation.

  The ResourceManager coordinates the resource usage of the cluster; any client or running ApplicationMaster that wants to run a job or task must request a certain amount of resources from the RM. The ApplicationMaster is a framework-specific library: the MapReduce framework has its own AM implementation, and users can also implement their own AM. At run time, the AM combines the resources obtained from the ResourceManager with the NodeManagers to launch and monitor tasks.

ResourceManager

As the resource coordinator, the ResourceManager has two main components: the Scheduler and the ApplicationsManager (ASM).

The Scheduler is responsible for allocating to each application the minimum amount of resources it needs to run. Scheduling is based purely on the applications' resource requirements; resources include memory, CPU, disk, network, and so on. This is a significant departure from the fixed slot-based resource model of the old MapReduce, which had a negative impact on cluster utilization. In a sense the Scheduler is a pure scheduler: it is not responsible for monitoring or tracking application state, and it does not handle failed tasks. The RM manages cluster resources using the concept of a resource container; different applications need different types and amounts of resources and therefore need different containers. A resource container is an abstraction of resources: each container encapsulates a certain amount of memory, I/O, network, and other resources, although the current implementation covers only memory. The ResourceManager provides a plug-in mechanism for scheduling policies that is responsible for dividing cluster resources among multiple queues and applications; scheduling plug-ins can be based on the existing Capacity Scheduler and Fair Scheduler models. The ResourceManager supports hierarchical application queues, each of which is entitled to a certain percentage of the cluster's resources.

The ApplicationsManager (ASM) is responsible for handling jobs submitted by clients and for negotiating the first container in which the ApplicationMaster runs; it also restarts the ApplicationMaster when it fails. The following describes the specific functions the RM performs.

    1. Resource scheduling: the Scheduler collects the resource requests it receives into a global resource allocation plan and then allocates resources subject to application-specific restrictions and global constraints.
    2. Resource monitoring: the Scheduler periodically receives resource-usage monitoring information from the NMs; in addition, an ApplicationMaster can obtain from the Scheduler the status of the completed containers that belong to it.
    3. Application submission (a client-side code sketch in Java follows this list):
      1. The client obtains an ApplicationID from the ASM.
      2. The client uploads the ApplicationID and the required JAR packages and other files to the staging directory in HDFS specified by the yarn.app.mapreduce.am.staging-dir configuration item.
      3. The client constructs a resource-request object and the application submission context and sends them to the ASM.
      4. The ASM receives the application submission context.
      5. Based on the application information, the ASM negotiates a container for the ApplicationMaster and then launches the ApplicationMaster.
      6. The Scheduler sends a LaunchContainer message to the NM that owns the container; the NM starts the container, which is the ApplicationMaster itself, and the ASM then provides the client with status information about the running AM.

    4. AM lifecycle: the ASM manages the lifecycle of every AM in the system. It is responsible for starting the AM; once started, the AM periodically sends heartbeats to the ASM (every 1 s by default) so the ASM knows it is alive, and the ASM restarts the AM when it fails. If no heartbeat is received within a certain period (10 minutes by default), the ASM considers the AM to have failed.
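The application submission steps above map fairly directly onto code. The sketch below is a minimal, hedged illustration of the client side of that flow, written against the YarnClient convenience API (org.apache.hadoop.yarn.client.api) found in later Hadoop 2.x releases; the 0.23-era code paths talk to the RM protocols directly, so treat this as an outline rather than the exact 0.23 API. The AM main class com.example.MyApplicationMaster is a hypothetical placeholder, and the upload of JARs to the staging directory (step 2) is omitted.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SimpleYarnClient {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Step 1: ask the ASM for a new ApplicationId.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        ApplicationId appId = appContext.getApplicationId();

        // Step 3: describe the first container, i.e. the one the AM will run in.
        // A real client would first upload its JARs to the HDFS staging directory
        // (step 2) and reference them here as LocalResources; this sketch only sets
        // the launch command. The AM class name below is a hypothetical placeholder.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
                "$JAVA_HOME/bin/java com.example.MyApplicationMaster"
                + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
                + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));

        appContext.setApplicationName("simple-yarn-app");
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1)); // memory (MB) and vcores for the AM
        appContext.setQueue("default");

        // Steps 4-6: hand the submission context to the ASM, which negotiates the AM
        // container with the Scheduler and has the owning NM launch it.
        yarnClient.submitApplication(appContext);

        ApplicationReport report = yarnClient.getApplicationReport(appId);
        System.out.println(appId + " is in state " + report.getYarnApplicationState());
    }
}
```

The key point is that the client never starts tasks itself: it only describes the AM container and hands the submission context to the ASM, which negotiates and launches everything else.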

  

NodeManager

The NM is the framework agent on each machine. It is primarily responsible for starting the containers the RM allocates, both the container the AM itself runs in and the containers the AM requests for its tasks, and for monitoring how those containers run. When starting a container, the NM sets up the necessary environment variables and downloads the JAR packages, files, and other resources the container needs from HDFS to the local machine, so-called resource localization. When all the preparation is done, it runs the script that represents the container to start the program. Once the container is running, the NM periodically monitors the resources it uses (CPU, memory, disk, network) and reports them to the Scheduler; if the container exceeds the amount of resources it declared, the NM kills the process the container represents.

In addition, the NM provides a simple service for managing the local directories on its machine. An application can continue to access its local directories even when it no longer has any containers running on that machine. For example, the MapReduce application uses this service to store map outputs and shuffle them to the corresponding reduce tasks.

You can also extend the NM with your own services: YARN provides the yarn.nodemanager.aux-services configuration item, which lets users plug in custom auxiliary services. MapReduce's shuffle function, for example, is implemented in exactly this way.

For each running application, the NM creates a per-application directory tree under its configured local directories (yarn.nodemanager.local-dirs), which holds the application's localized files such as JARs and job configuration.

Under that application directory there is one subdirectory per container; it contains the container's generated launch scripts, its tokens file, and symlinks to the localized resources the container needs.

When a container starts, the NM executes the container's default_container_executor.sh, which in turn runs launch_container.sh. launch_container.sh sets some environment variables and finally runs the command that starts the program. For MapReduce, starting the AM executes org.apache.hadoop.mapreduce.v2.app.MRAppMaster, while starting a map/reduce task executes org.apache.hadoop.mapred.YarnChild.

ApplicationMaster

The ApplicationMaster is a framework-specific library. Each application's ApplicationMaster is responsible for requesting the appropriate resource containers from the Scheduler, running tasks, tracking the application's status and monitoring the progress of its tasks, and handling the causes of task failures.

The MapReduce computing model has its own ApplicationMaster implementation. Any other computing model that wants to run on YARN must implement its own ApplicationMaster to request resources from the RM and run its tasks; the Spark framework running on YARN, for example, has a corresponding ApplicationMaster implementation. In the final analysis, YARN is a resource management framework, not a computing framework: to run an application on YARN, a concrete implementation of the computing framework is required. Since YARN appeared together with MRv2, the following is a brief overview of how MRv2 runs on YARN.

MRv2 running process (a small client-side sketch in Java follows this list):
    1. The MR JobClient submits a job to the ResourceManager (ASM).
    2. The ASM asks the Scheduler for a container in which the MR AM will run, and then launches it.
    3. The MR AM starts up and registers with the ASM.
    4. The MR JobClient obtains information about the MR AM from the ASM and then communicates with the MR AM directly.
    5. The MR AM computes the input splits and constructs resource requests for all the maps.
    6. The MR AM does the necessary preparation for the MR OutputCommitter.
    7. The MR AM sends resource requests to the RM (Scheduler), obtains a set of containers in which the map/reduce tasks will run, and, together with the NMs, performs the necessary work for each container, including resource localization.
    8. The MR AM monitors the running tasks until they finish; when a task fails, it requests a new container in which to rerun the failed task.
    9. As each map/reduce task completes, the MR AM runs the cleanup code of the MR OutputCommitter, that is, does some finishing work.
    10. When all maps and reduces are complete, the MR AM runs the OutputCommitter's job commit or abort APIs as appropriate.
    11. The MR AM exits.
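To make step 1 concrete, the sketch below shows a minimal word-count driver that kicks off this whole flow. It assumes a working YARN cluster whose configuration is on the classpath; setting mapreduce.framework.name to yarn in code is shown only for illustration, since it would normally be set in mapred-site.xml.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountOnYarn {

    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Run on YARN (MRv2) rather than the local runner; normally set in mapred-site.xml.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "word count on yarn");
        job.setJarByClass(WordCountOnYarn.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion() submits the job to the ResourceManager, which launches
        // the MR ApplicationMaster (MRAppMaster); the AM then drives steps 2-11 above.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Once waitForCompletion() has submitted the job, everything from launching MRAppMaster onward happens inside the cluster; the client only polls progress.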
Writing Applications on YARN

Writing an application on YARN is unlike writing the MapReduce applications we are used to. It is important to keep in mind that YARN is a resource management framework, not a computing framework; computing frameworks are things that run on YARN. All YARN lets us do is request containers from the RM and start them via the NM. Just as in MRv2, the JobClient requests the container in which the MR AM runs, sets up its environment variables and start command, and then has the NM start the MR AM; from then on, the map/reduce tasks are entirely the MR AM's responsibility, and those tasks are likewise started by the MR AM requesting containers from the RM and launching them via the NM. So to run a program from some other computing framework on YARN, we have to implement our own client and ApplicationMaster. In addition, our custom AM needs to be placed on the classpath of every NM, because the AM may run on the same machine as any of the NMs.
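For the ApplicationMaster side, here is a minimal, heavily simplified sketch of the request/launch cycle (the client side looks like the YarnClient sketch shown earlier under application submission). It uses the AMRMClient and NMClient convenience libraries from org.apache.hadoop.yarn.client.api in later Hadoop 2.x releases rather than the raw 0.23 protocols, it launches only a trivial shell command in each container, and it skips error handling and completed-container tracking, so it is an illustration rather than a production AM.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SimpleApplicationMaster {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        // Register this AM with the ResourceManager (the ASM tracks its liveness).
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, "");

        // Client for asking NodeManagers to actually launch containers.
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Ask the Scheduler for 3 containers of 512 MB / 1 vcore each.
        Priority priority = Priority.newInstance(0);
        Resource capability = Resource.newInstance(512, 1);
        int requested = 3;
        for (int i = 0; i < requested; i++) {
            rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
        }

        // Heartbeat/allocate loop: containers arrive asynchronously over several rounds.
        int launched = 0;
        while (launched < requested) {
            for (Container container : rmClient.allocate(0.1f).getAllocatedContainers()) {
                ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
                // The "task" here is just a shell command; a real framework would
                // localize its own JARs and run its own worker class instead.
                ctx.setCommands(Collections.singletonList(
                        "echo hello from " + container.getId() + " >> /tmp/am-demo.log"));
                nmClient.startContainer(container, ctx);
                launched++;
            }
            Thread.sleep(1000);
        }

        // Tell the RM we are done so it can reclaim the resources.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    }
}
```

A real framework AM would also localize its own JARs into each container, track completed and failed containers from the allocate() responses, and re-request containers for failed tasks, which is exactly the kind of work MRAppMaster does for MapReduce.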

Comparison of the New and Old Hadoop MapReduce Frameworks

Comparing the old and new MapReduce frameworks in detail, we can see the following notable changes:

First, the client is unchanged; its calling APIs and interfaces remain largely compatible. This is deliberate so that the framework stays transparent to developers, who do not need to make large changes to their existing code. However, the core JobTracker and TaskTracker of the original framework are gone, replaced by three parts: the ResourceManager, the ApplicationMaster, and the NodeManager.

Let's explain these three parts in detail. First, the ResourceManager is a central service; what it does is schedule and start the ApplicationMaster that each job belongs to, and additionally monitor whether that ApplicationMaster is still alive. Careful readers will notice that monitoring and restarting the tasks inside a job are no longer mentioned here; that is exactly why the ApplicationMaster exists. The ResourceManager is responsible for scheduling jobs and resources: it receives jobs submitted by the JobSubmitter and, based on the job's context information and the status information collected from the NodeManagers, starts the scheduling process and assigns a container to serve as the ApplicationMaster.

The NodeManager's function is narrower: it is responsible for maintaining container state and keeping up the heartbeat to the RM.

The ApplicationMaster is responsible for all the work within a job's life cycle, similar to the JobTracker in the old framework. Note, however, that every job (not every type of job) has its own ApplicationMaster, which can run on a machine other than the one hosting the ResourceManager.

  

What advantages does the YARN framework have over the old MapReduce framework? We can see the following:

    1. This design greatly reduces the resource consumption of the JobTracker (now the ResourceManager), and distributing the programs that monitor the status of each job's subtasks (tasks) makes the system safer and more elegant.
    2. In the new YARN, the ApplicationMaster is a replaceable part: users can write their own AM for different programming models, so that many more kinds of programming models can run in a Hadoop cluster; see the mapred-site.xml configuration in the official Hadoop YARN configuration template for reference.
    3. Representing resources as memory (the current version of YARN does not take CPU usage into account) is more reasonable than counting the remaining slots.
    4. In the old framework, a major burden on the JobTracker was monitoring the health of the tasks under each job; this work is now handed over to the ApplicationMaster, while the ResourceManager has a module called the ApplicationsManager (note: not the ApplicationMaster) that monitors the health of each ApplicationMaster and restarts it on another machine if there is a problem.
    5. Container is the mechanism YARN proposes for future resource isolation. It likely draws on the work of Mesos; it is currently only a framework, providing isolation of Java virtual machine memory alone, but the Hadoop team's design should be able to support more kinds of resource scheduling and control. Since resources are expressed as an amount of memory, the earlier awkward situation where the map slot/reduce slot split left cluster resources idle no longer arises.

Compared with the old MapReduce framework, the new YARN framework's configuration files, start/stop scripts, and global variables have also changed. The main changes are as follows:

Table 1. New and old Hadoop script/variable/location changes

    1. Configuration file location. Original framework: ${hadoop_home_dir}/conf; new framework (YARN): ${hadoop_home_dir}/etc/hadoop/. The YARN framework is also compatible with the old ${hadoop_home_dir}/conf location: at startup it checks whether an old conf directory exists and, if so, loads the configuration from there; otherwise it loads the configuration under etc.
    2. Start/stop scripts. Original framework: ${hadoop_home_dir}/bin/start(stop)-all.sh; new framework (YARN): ${hadoop_home_dir}/sbin/start(stop)-dfs.sh and ${hadoop_home_dir}/bin/start(stop)-all.sh. In the new YARN framework, starting the distributed file system and starting YARN are separated: the commands to start/stop the distributed file system are in the ${hadoop_home_dir}/sbin directory, while the scripts to start/stop the YARN framework are under the ${hadoop_home_dir}/bin/ directory.
    3. JAVA_HOME global variable. Original framework: set in ${hadoop_home_dir}/bin/start-all.sh; new framework (YARN): ${hadoop_home_dir}/etc/hadoop/hadoop-env.sh and ${hadoop_home_dir}/etc/hadoop/yarn-env.sh. Because the YARN framework separates starting the HDFS distributed file system from starting the MapReduce framework, JAVA_HOME needs to be configured separately in hadoop-env.sh and yarn-env.sh.
    4. HADOOP_LOG_DIR global variable. Original framework: no configuration required; new framework (YARN): ${hadoop_home_dir}/etc/hadoop/hadoop-env.sh. In the old framework the log, conf, and tmp directories defaulted to log, conf, and tmp subdirectories of the directory from which the scripts were started. In the new YARN framework, logs are created by default in the log subdirectory of the Hadoop user's home directory, so it is best to set HADOOP_LOG_DIR in ${hadoop_home_dir}/etc/hadoop/hadoop-env.sh; otherwise the log location may be thrown off by the .bashrc or .bash_profile of the user who started Hadoop, and startup may fail with errors if that location is not accessible.

Because the new YARN framework differs significantly from the original Hadoop MapReduce framework, many of the original key configuration items are deprecated in the new framework, and many new configuration items have been added, as shown in the following table:

Table 2. New and old Hadoop framework configuration item changes

    1. core-site.xml, default distributed file system URI. Hadoop 0.20.x: fs.default.name; Hadoop 0.23.x: fs.defaultFS.
    2. hdfs-site.xml, directory where the DFS NameNode stores the name table. Hadoop 0.20.x: dfs.name.dir; Hadoop 0.23.x: dfs.namenode.name.dir. In the new framework the NameNode storage is split into dfs.namenode.name.dir (storing the name table) and dfs.namenode.edits.dir (storing the edits file), which by default point to the same directory.
    3. hdfs-site.xml, directory where the DFS DataNode stores data blocks. Hadoop 0.20.x: dfs.data.dir; Hadoop 0.23.x: dfs.datanode.data.dir. The new framework adds more fine-grained DataNode configuration items under dfs.datanode.*, such as dfs.datanode.data.dir.perm (default permissions of the DataNode local directories) and dfs.datanode.address (the port the DataNode listens on).
    4. hdfs-site.xml, number of replicas of distributed file system data blocks. dfs.replication in both the old and the new framework; the value is recommended to be configured in line with the actual number of DataNode hosts in the distributed cluster.
    5. mapred-site.xml, job monitoring address and port. Hadoop 0.20.x: mapred.job.tracker; no equivalent in Hadoop 0.23.x. In the new framework this is replaced by the ResourceManager and NodeManager configuration items in yarn-site.xml, and the query of historical jobs has been stripped out of the JobTracker and moved into separate mapreduce.jobtracker.jobhistory related configuration items.
    6. mapred-site.xml, third-party MapReduce framework. Not present in Hadoop 0.20.x; Hadoop 0.23.x: mapreduce.framework.name. The new framework supports third-party MapReduce development frameworks, such as SMARTTALK/DGSG, to support non-YARN architectures. Note that this value is usually set to yarn; if it is not configured, submitted jobs will run only in local mode rather than in distributed mode.
    7. yarn-site.xml, address of the applications manager interface in the RM. Not present in Hadoop 0.20.x; Hadoop 0.23.x: yarn.resourcemanager.address. This is the interface address that clients use to communicate with the RM, for example to submit applications.
    8. yarn-site.xml, address of the scheduler interface. Not present in Hadoop 0.20.x; Hadoop 0.23.x: yarn.resourcemanager.scheduler.address. ApplicationMasters need to know the Scheduler service interface address of the RM host in order to request resources.
    9. yarn-site.xml, address of the RM web application. Not present in Hadoop 0.20.x; Hadoop 0.23.x: yarn.resourcemanager.webapp.address. In the new framework, resource scheduling and the health status of each task are viewed through this web interface.
    10. yarn-site.xml, address of the resource tracker interface. Not present in Hadoop 0.20.x; Hadoop 0.23.x: yarn.resourcemanager.resource-tracker.address. In the new framework, NodeManagers report node and task status to the RM for resource tracking, so each NodeManager host needs to know the resource tracker interface address of the RM host.
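For readers who set configuration programmatically, the hedged sketch below shows the renamed keys in use through Hadoop's Configuration API. The host names, paths, and ports are illustrative placeholders rather than required defaults, and in practice these values live in core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml rather than in code.

```java
import org.apache.hadoop.conf.Configuration;

public class NewConfigKeys {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Keys from the new (0.23 / YARN) framework; the values below are placeholders
        // for an imaginary cluster, not defaults.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000");           // was fs.default.name
        conf.set("dfs.namenode.name.dir", "/data/hdfs/name");                   // was dfs.name.dir
        conf.set("dfs.datanode.data.dir", "/data/hdfs/data");                   // was dfs.data.dir
        conf.set("mapreduce.framework.name", "yarn");                           // new in MRv2
        conf.set("yarn.resourcemanager.address", "rm.example.com:8032");        // new in YARN
        conf.set("yarn.resourcemanager.scheduler.address", "rm.example.com:8030");
        conf.set("yarn.resourcemanager.resource-tracker.address", "rm.example.com:8031");
        conf.set("yarn.resourcemanager.webapp.address", "rm.example.com:8088");

        // Hadoop's Configuration maps many deprecated keys onto the new ones, so
        // reading an old key such as fs.default.name typically returns the value
        // set under fs.defaultFS (with a deprecation warning in the logs).
        System.out.println("fs.default.name -> " + conf.get("fs.default.name"));
    }
}
```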


