YARN Source Code Analysis Tour: Overview and Overall Architecture


Everyone is welcome to join the discussion. I have not been working with this for long either, so if there are any mistakes, please correct me. Reposting is welcome; please credit the source when reposting.

Shortcomings of Hadoop 1.0 and the Birth of Hadoop 2.0

Anyone who has studied Hadoop 1.0 knows that it uses a master/slave architecture in which the JobTracker runs as a single point and takes on both resource management and job control. This makes the JobTracker the biggest bottleneck of the system and restricts the scalability of a Hadoop cluster, and the single-point NameNode is likewise a single point of failure that can make the whole cluster unavailable. In addition, MRv1 uses the slot as its resource allocation model; a slot is a coarse-grained unit of resource division. A task typically does not use all of a slot's resources, yet other tasks cannot use the idle portion, so resources are not shared and utilization is low. Most importantly, MRv1 cannot support multiple computing frameworks. For these reasons, Hadoop 2.0 was born.

The core of Hadoop 2.0 is YARN, an independent resource management and allocation system. YARN is responsible only for resource management and allocation, and a variety of different computing frameworks can run on top of it.

The Basic Components of YARN

Overall, YARN still uses a master/slave structure: across the cluster the ResourceManager acts as the master and the NodeManagers act as slaves, and the ResourceManager is responsible for the unified management and scheduling of the resources on every NodeManager. The following figure shows the basic structure of YARN, which mainly consists of the ResourceManager, the NodeManager, the ApplicationMaster, the Container, and a few other components.


The ResourceManager is a standalone process on the master node, responsible for unified resource management, scheduling, and allocation for the whole cluster. The NodeManager is an independent process running on each slave node, responsible for reporting that node's status. The ApplicationMaster and the Container are components that run on slave nodes: a Container is the unit in which YARN allocates resources, encapsulating memory, CPU, and so on, and YARN allocates resources in units of Containers.

Every application that a client submits to the ResourceManager must have an ApplicationMaster, which, after the ResourceManager allocates resources for it, runs inside a Container on some slave node. The tasks that do the actual work also run inside Containers on slave nodes. The RM, the NMs, the AM, and even ordinary Containers all communicate with each other through the RPC mechanism.
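As an illustration of this client-side submission path, here is a minimal sketch built on the standard YARN client library (org.apache.hadoop.yarn.client.api.YarnClient). The application name, AM command, and resource sizes are placeholders I chose for the example; a real client would also set up local resources, environment variables, and security tokens.

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MinimalYarnSubmitter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();

    // YarnClient talks to the ResourceManager on the client's behalf over RPC.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the RM for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("hello-yarn");

    // The RM only launches the ApplicationMaster container directly;
    // all other containers are requested later by the AM itself.
    ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
        Collections.emptyMap(),                        // local resources (placeholder)
        Collections.emptyMap(),                        // environment (placeholder)
        Collections.singletonList("/bin/echo hello"),  // AM launch command (placeholder)
        null, null, null);
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(Resource.newInstance(512, 1)); // AM container: 512 MB, 1 vcore

    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);

    yarnClient.stop();
  }
}
```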

YARN Detailed Architecture Design

The following figure, which I drew based on some of the reference material, shows the detailed architecture:


Let me walk through the figure. All of the lines are RPC protocols. The Client and the Admin sit outside the YARN system: the Client is the user, who can submit new applications to YARN; the Admin is the YARN administrator, who operates and manages the YARN system through the command line. The RM (ResourceManager, referred to as RM in later articles) is responsible for managing and scheduling resources. The NM (NodeManager, referred to as NM in later articles) is responsible for managing a slave node and periodically reports its status to the RM; a cluster has many slave nodes. The AM (ApplicationMaster, hereafter AM) is responsible for the second-level allocation of resources for a single application (it obtains resources from the RM and then assigns them to the application's tasks) and periodically reports the application's status to the RM. The AM also sits outside the YARN system, and every submitted application has to implement its own AM.

You may be puzzled why ApplicationMasterProtocol and ContainerManagementProtocol are drawn with dashed lines even though they are implemented inside the YARN system. Both RPC protocols are indeed implemented, but they have to be invoked by the AM itself. For example, ApplicationMasterProtocol merely defines the register, stop, and allocate interfaces and contains no heartbeat code; the AM must call allocate in its own code on a regular schedule to request resources and report its status back to the RM.

MRClientProtocol and TaskUmbilicalProtocol are two RPC protocols implemented inside the MRv2 application, so strictly speaking they are outside the YARN system, because MRv2 is itself just an application running on the YARN platform. From this point of view, YARN looks more and more like a cloud operating system.
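To make the point about allocate concrete, here is a minimal sketch of an ApplicationMaster's resource-request loop, assuming the standard YARN client-side wrapper around ApplicationMasterProtocol (org.apache.hadoop.yarn.client.api.AMRMClient). The container size and loop logic are illustrative only; a real AM would also launch the allocated containers through the NodeManagers and handle failures.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MinimalAppMaster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();

    // Client-side wrapper around ApplicationMasterProtocol (register/allocate/finish).
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();

    // register: tell the RM that this AM is up.
    rmClient.registerApplicationMaster("", 0, "");

    // Ask the RM for one container of 1024 MB and 1 vcore.
    Resource capability = Resource.newInstance(1024, 1);
    rmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // The AM must drive allocate() itself: each call is both a resource
    // request and a heartbeat/status report to the RM.
    int allocated = 0;
    while (allocated < 1) {
      AllocateResponse response = rmClient.allocate(0.1f /* progress */);
      List<Container> containers = response.getAllocatedContainers();
      allocated += containers.size();
      // A real AM would now start tasks in these containers via the
      // NodeManagers (ContainerManagementProtocol / NMClient).
      Thread.sleep(1000);
    }

    // finish: unregister when the application is done.
    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    rmClient.stop();
  }
}
```

This is the division of labor the dashed lines hint at: YARN implements the protocol and the client-side wrapper, but the heartbeat loop itself lives in each application's own AM code.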

Code Directory

In this series of articles, we will focus only on the hadoop-yarn-project directory.


hadoop-yarn-applications contains two example applications that run on the YARN system. What each specific package is used for will be covered gradually in later articles.

That's all for today's article.
