Introduction to the Yarn framework

Source: Internet
Author: User
Tags hadoop mapreduce
Principles and operating mechanism of the new Hadoop Yarn framework

The fundamental idea of the refactoring is to split the two main functions of JobTracker, resource management and task scheduling/monitoring, into separate components. The new ResourceManager globally manages the allocation of computing resources to all applications, and each application's ApplicationMaster is responsible for the corresponding scheduling and coordination. An application is either a single traditional MapReduce job or a DAG (directed acyclic graph) of jobs. The ResourceManager and the NodeManager on each machine manage the user's processes on that machine and organize the computation.

New Hadoop MapReduce Framework (Yarn) architecture

The ResourceManager supports hierarchical application queues, each of which is entitled to a certain proportion of the cluster's resources. In a sense it is a pure scheduler: it does not monitor or track the status of applications during execution. Likewise, it does not restart tasks that fail because of application failures or hardware errors.
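The hierarchical queue idea can be illustrated with a small calculation. The sketch below is not Yarn code: the queue names and percentages are invented, and it assumes that a queue's share of the whole cluster is the product of the percentages along its path from the root, which is how capacity-style schedulers are commonly configured.

```python
# Hypothetical queue tree: each entry maps a queue path to its share
# (in percent) of its PARENT queue. Names and numbers are illustrative only.
QUEUE_SHARES = {
    "root": 100.0,
    "root.prod": 70.0,
    "root.dev": 30.0,
    "root.dev.etl": 50.0,
    "root.dev.adhoc": 50.0,
}


def absolute_share(path, shares=QUEUE_SHARES):
    """Return the fraction of the whole cluster that `path` is entitled to."""
    parts = path.split(".")
    fraction = 1.0
    # Multiply the shares of every queue on the path from the root down.
    for depth in range(1, len(parts) + 1):
        prefix = ".".join(parts[:depth])
        fraction *= shares[prefix] / 100.0
    return fraction


def entitled_memory_mb(path, cluster_memory_mb):
    """Memory (MB) a queue is entitled to in a cluster of the given size."""
    return absolute_share(path) * cluster_memory_mb
```

With these made-up numbers, `root.dev.etl` is entitled to half of the 30% given to `root.dev`, that is, 15% of the cluster.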

The ResourceManager schedules resources according to each application's requirements. Each application needs different types of resources and therefore needs different containers. Resources include memory, CPU, disk, network, and so on. This is significantly different from the fixed-type resource usage model of the existing MapReduce, a model that hurts cluster utilization. The ResourceManager provides a plug-in interface for scheduling policies, which are responsible for assigning cluster resources to multiple queues and applications. Scheduling plug-ins can be based on the existing capacity scheduling and fair scheduling models.
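A minimal sketch of what a pluggable scheduling policy could look like is given below. It is a toy model, not the Yarn scheduler API: `Request`, `fifo_policy`, `fair_policy`, and `allocate` are hypothetical names, and the "fair" policy is a deliberately simplified stand-in that favors the application currently holding the least memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A container request: which application wants how much of each resource."""
    app: str
    memory_mb: int
    vcores: int


def fifo_policy(pending, held_mb):
    """Consider requests strictly in arrival order."""
    return list(pending)


def fair_policy(pending, held_mb):
    """Consider first the application that currently holds the least memory."""
    return sorted(pending, key=lambda r: held_mb.get(r.app, 0))


def allocate(pending, free_mb, free_vcores, policy):
    """Grant requests one at a time, re-asking the policy after every grant."""
    remaining = list(pending)
    held_mb = {}   # memory already granted to each application
    granted = []
    while remaining:
        fitting = [r for r in policy(remaining, held_mb)
                   if r.memory_mb <= free_mb and r.vcores <= free_vcores]
        if not fitting:
            break  # nothing left that fits into the free resources
        choice = fitting[0]
        remaining.remove(choice)
        granted.append(choice)
        free_mb -= choice.memory_mb
        free_vcores -= choice.vcores
        held_mb[choice.app] = held_mb.get(choice.app, 0) + choice.memory_mb
    return granted


# Illustrative workload: application A asks twice, application B once,
# and the cluster only has room for two containers.
requests = [Request("A", 2048, 1), Request("A", 2048, 1), Request("B", 2048, 1)]
fifo_apps = [r.app for r in allocate(requests, 4096, 4, fifo_policy)]
fair_apps = [r.app for r in allocate(requests, 4096, 4, fair_policy)]
```

Swapping the `policy` argument changes who gets the scarce containers without touching the allocation loop, which is the point of keeping the policy a plug-in.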

In the architecture figure, NodeManager is the per-machine agent of the framework. It is responsible for the containers that execute applications, monitors their resource usage (CPU, memory, disk, network), and reports it to the scheduler.

Each application's ApplicationMaster is responsible for requesting suitable resource containers from the scheduler, running tasks, tracking the application's status, monitoring its progress, and handling the causes of task failures.

Comparison of the new and old Hadoop MapReduce frameworks

First, the client is unchanged: its calling API and interfaces remain mostly compatible. This keeps the change transparent to developers, who do not need to make major changes to existing code. However, the JobTracker and TaskTracker at the core of the original framework are gone, replaced by three components: ResourceManager, ApplicationMaster, and NodeManager.

These three components are explained in detail below:

First, ResourceManager is a central service. It schedules and starts the ApplicationMaster that belongs to each job, and it also monitors whether each ApplicationMaster is still alive. Attentive readers will notice that monitoring and restarting the tasks inside a job are missing from this list. That is why ApplicationMaster exists. ResourceManager is responsible for scheduling jobs and resources: it receives the jobs submitted by JobSubmitter and, based on the job's context information and the status information collected from the NodeManagers, starts the scheduling process and allocates a container to serve as the ApplicationMaster.
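As a rough illustration of that last step, the sketch below picks a node for a job's ApplicationMaster container from the free memory reported per node. The function name, the job fields, and the "most free memory wins" rule are assumptions made for this example; the text does not say which placement rule ResourceManager actually applies.

```python
def place_app_master(job, node_free_mb):
    """Choose a node for the job's ApplicationMaster container.

    job: dict with hypothetical keys "id" and "am_memory_mb".
    node_free_mb: dict of node name -> free memory (MB), as it might be
    assembled from NodeManager status reports. Updated in place.
    Returns a description of the allocated container, or None if no node fits.
    """
    need = job["am_memory_mb"]
    candidates = {n: free for n, free in node_free_mb.items() if free >= need}
    if not candidates:
        return None
    # Assumed rule: most free memory first, ties broken by node name.
    node = max(sorted(candidates), key=lambda n: candidates[n])
    node_free_mb[node] -= need
    return {"job": job["id"], "node": node, "memory_mb": need}


# Illustrative cluster state and job.
free = {"n1": 1024, "n2": 4096, "n3": 4096}
am_container = place_app_master({"id": "job_1", "am_memory_mb": 1536}, free)
```

Once the container is placed, everything else in the job's lifecycle is the ApplicationMaster's business rather than ResourceManager's.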

NodeManager has a narrower role: it maintains the state of containers and keeps a heartbeat with the ResourceManager (RM).
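The heartbeat relationship can be sketched as follows. This is a simplified model with explicit timestamps instead of real timers; the class name, the 10-second expiry, and the container state strings are hypothetical and not taken from Yarn.

```python
class ResourceManagerView:
    """What an RM-like service might remember about its NodeManagers."""

    def __init__(self, expiry_seconds):
        self.expiry_seconds = expiry_seconds
        self.last_seen = {}          # node -> time of the latest heartbeat
        self.container_states = {}   # node -> {container id: state}

    def heartbeat(self, node, now, containers):
        """Record a heartbeat carrying the node's current container states."""
        self.last_seen[node] = now
        self.container_states[node] = dict(containers)

    def live_nodes(self, now):
        """Nodes whose latest heartbeat is within the expiry window."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen <= self.expiry_seconds)


# Illustrative timeline: node2 stops sending heartbeats after t=0.
rm = ResourceManagerView(expiry_seconds=10)
rm.heartbeat("node1", now=0, containers={"c1": "RUNNING"})
rm.heartbeat("node2", now=0, containers={})
rm.heartbeat("node1", now=8, containers={"c1": "COMPLETE"})
alive_at_15 = rm.live_nodes(now=15)
```

Here `node2` has been silent for longer than the expiry window at t=15, so only `node1` is still considered alive.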

ApplicationMaster is responsible for all work within the lifecycle of one job, similar to JobTracker in the old framework. Note, however, that every job (not every type of job) has its own ApplicationMaster, and that it can run on a machine other than the one hosting ResourceManager.


What are the advantages of the Yarn framework over the old MapReduce framework?

1. This design greatly reduces the resource consumption of JobTracker (now ResourceManager), and the programs that monitor the status of each job's subtasks (tasks) are now distributed, which is safer and more elegant.

2. In the new Yarn, ApplicationMaster is a replaceable component: users can write their own ApplicationMaster for different programming models, allowing more types of programming models to run on a Hadoop cluster. See the mapred-site.xml configuration in the official Hadoop Yarn configuration templates.

3. Resources are expressed as amounts of memory (the current version of Yarn does not take CPU usage into account), which is more reasonable than counting remaining slots.

4. In the old framework, a heavy burden on JobTracker was monitoring the status of the tasks under each job. That work is now handed to ApplicationMaster. ResourceManager has a module called ApplicationsMasters (note: not ApplicationMaster) that monitors the health of each ApplicationMaster and, if one has a problem, restarts it on another machine.

5. Container is a framework that Yarn proposes for future resource isolation. It appears to borrow from the work of Mesos. At present it provides only Java virtual machine memory isolation, but the Hadoop team's design should be able to support more kinds of resource scheduling and control. Because resources are expressed as amounts of memory, the awkward situation in which separate map slots and reduce slots left cluster resources idle no longer arises.
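The difference between counting slots and counting memory (points 3 and 5) can be made concrete with a little arithmetic. The numbers below are invented for illustration: a node with 8192 MB that the old model would split into 4 map slots and 4 reduce slots of 1024 MB each, facing a workload that currently has only map tasks.

```python
def slot_model_running(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Tasks that can run at once when map and reduce slots are separate."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def memory_model_running(pending_tasks, task_memory_mb, node_memory_mb):
    """Tasks that can run at once when the node is one shared memory pool."""
    return min(pending_tasks, node_memory_mb // task_memory_mb)


# Hypothetical workload: 8 map tasks waiting, no reduce tasks yet.
old_running = slot_model_running(map_tasks=8, reduce_tasks=0,
                                 map_slots=4, reduce_slots=4)
new_running = memory_model_running(pending_tasks=8, task_memory_mb=1024,
                                   node_memory_mb=8192)
idle_mb_old = 8192 - old_running * 1024
```

In this example the slot model leaves half of the node's memory idle because the reduce slots cannot be lent to map tasks, while the memory model keeps the whole node busy.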

New and old Hadoop script/variable/position change table


New and old Hadoop Framework configuration Item Change table
