Diagram of how YARN works

Source: Internet
Author: User

YARN is the basis of MapReduce V2 (MRv2). It has many advantages over MapReduce V1:

1. The JobTracker's responsibilities are split up. Resource management is handled by the ResourceManager, while starting, running, and monitoring jobs is handled by ApplicationMasters distributed across the cluster nodes. This greatly reduces the single-point bottleneck and single point of failure that the JobTracker represented in MapReduce V1, and greatly improves the scalability and availability of the cluster.

2. The ApplicationMaster is a user-customizable part of MapReduce V2, so users can write their own ApplicationMaster for their own programming model. This greatly broadens the range of applications that MapReduce V2 can serve.

3. ZooKeeper is used to implement failover for the ResourceManager. When the active ResourceManager fails, the standby ResourceManager starts quickly, based on the cluster state saved in ZooKeeper. MapReduce V2 also lets an application specify checkpoints, so that a failed ApplicationMaster can be restarted quickly from the state saved in HDFS. These two measures greatly improve the availability of MapReduce V2.

4. Cluster resources are organized uniformly into resource containers, unlike the separate map slots and reduce slots of MapReduce V1. Whenever a task requests resources, the scheduler can assign any available resources in the cluster to it, regardless of whether the task is a map or a reduce. This greatly improves resource utilization.
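The utilization argument in point 4 can be illustrated with a small toy model. This is only a sketch: the function names and the slot and container counts are made up for illustration and are not Hadoop APIs, and real YARN containers are sized by memory and CPU rather than counted as equal units.

```python
# Toy comparison of MRv1-style fixed slots and YARN-style generic containers.
# All names and numbers are illustrative assumptions, not Hadoop APIs.

def schedule_with_slots(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """MRv1 style: map tasks may only use map slots, reduce tasks only reduce slots."""
    running_maps = min(map_slots, map_tasks)
    running_reduces = min(reduce_slots, reduce_tasks)
    return running_maps + running_reduces


def schedule_with_containers(total_containers, map_tasks, reduce_tasks):
    """YARN style: any free container can host any kind of task."""
    return min(total_containers, map_tasks + reduce_tasks)


# A map-heavy phase: 8 pending map tasks and no reduce tasks yet.
slots_busy = schedule_with_slots(4, 4, map_tasks=8, reduce_tasks=0)
containers_busy = schedule_with_containers(8, map_tasks=8, reduce_tasks=0)
print(slots_busy, containers_busy)  # 4 8
```

With fixed slots, the four reduce slots sit idle during the map phase; with generic containers, the same capacity can be used by whichever tasks are waiting.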

YARN has many more advantages that are not listed here. The rest of this article focuses on the YARN workflow.


The parts that make up YARN:

In total, YARN consists of the ResourceManager, NodeManager, JobHistoryServer, containers, ApplicationMaster, job, task, and client.

[Image 1.png: http://s3.51cto.com/wyfs02/M00/6F/52/wKiom1WY43ayjjsPAAGqIBnXW80270.jpg]

> ResourceManager: There is only one per cluster. It is responsible for resource scheduling, resource allocation, and related work.

> JobHistoryServer: Responsible for answering queries about job run progress and for managing job metadata.

> NodeManager: Runs on each DataNode node. It is responsible for launching applications and managing the node's resources.

> Containers: A container is allocated by the ResourceManager. It bundles the CPU, memory, and other resources assigned to it.

> ApplicationMaster: If the ResourceManager is the manager, the ApplicationMaster is the contractor. The ResourceManager first hands the job to the ApplicationMaster, and the ApplicationMaster passes the work on to each NodeManager (the equivalent of a worker). Each application has exactly one ApplicationMaster, which runs in a container on a NodeManager node; that container is allocated by the ResourceManager.

> Job: A mapper, a reducer, and a list of inputs to process. A job can also be called an application.

> Task: A single unit of work that actually performs a map or a reduce. A task runs in a container on a NodeManager.

> Client: An application program that submits jobs to the ResourceManager.
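To make the relationship between the ResourceManager, NodeManagers, and containers more concrete, here is a minimal sketch. The class names mirror the component names above, but the fields, the first-fit placement rule, and the node sizes are assumptions for illustration only. In particular, the toy lets the ResourceManager launch the container directly, whereas in the workflow described below it is the NodeManager that is asked to start it.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    node: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def can_fit(self, memory_mb, vcores):
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores

    def launch(self, memory_mb, vcores):
        # Reserve the resources on this node and record the new container.
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        container = Container(self.name, memory_mb, vcores)
        self.containers.append(container)
        return container


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb, vcores):
        # Hypothetical first-fit policy: take the first node with enough room.
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                return node.launch(memory_mb, vcores)
        return None  # no node can satisfy the request right now


rm = ResourceManager([NodeManager("node1", 4096, 4), NodeManager("node2", 8192, 8)])
am = rm.allocate(1024, 1)  # container for the ApplicationMaster
t1 = rm.allocate(4096, 2)  # too large for what is left on node1
t2 = rm.allocate(2048, 2)  # still fits on node1
print(am.node, t1.node, t2.node)  # node1 node2 node1
```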


Now that we know which units YARN consists of, here is the overall process of how a job is handled.

[Image 2.png: http://s3.51cto.com/wyfs02/M01/6F/52/wKiom1WY6kGTyleXAALeVXQYYi0085.jpg]

The user submits a program/job to YARN, including the ApplicationMaster program, the command that starts the ApplicationMaster, and the user program. The ResourceManager allocates the first container for the job and communicates with the corresponding NodeManager, asking it to start the application's ApplicationMaster in this container.

The ApplicationMaster first registers with the ResourceManager, which allows the user to query the running state of the job directly through the ResourceManager. It then requests resources for each task and monitors the running status of the tasks until the run is over. The ApplicationMaster requests and receives resources from the ResourceManager through RPC.
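The submission sequence described above can be sketched as a small simulation. Everything here is a simplified assumption: the method names, the round-robin node choice, and the container ids are invented for illustration and are not the YARN client or RPC API.

```python
class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes
        self.app_state = {}
        self.events = []
        self._next_id = 0

    def _allocate(self, purpose):
        # Hypothetical round-robin placement across the known nodes.
        node = self.nodes[self._next_id % len(self.nodes)]
        self._next_id += 1
        cid = f"container_{self._next_id}"
        self.events.append(f"allocate {cid} on {node} for {purpose}")
        return cid, node

    def submit(self, app_id):
        # Step 1: the client submits; the first container hosts the ApplicationMaster.
        self.events.append(f"submit {app_id}")
        self.app_state[app_id] = "ACCEPTED"
        cid, _node = self._allocate("ApplicationMaster")
        return ApplicationMaster(app_id, cid, self)

    def register(self, app_id):
        # Step 2: after registration the job state can be queried through the RM.
        self.app_state[app_id] = "RUNNING"
        self.events.append(f"register {app_id}")

    def request_containers(self, app_id, count):
        # Step 3: the ApplicationMaster asks for one container per task.
        return [self._allocate(f"task of {app_id}") for _ in range(count)]

    def unregister(self, app_id):
        self.app_state[app_id] = "FINISHED"
        self.events.append(f"unregister {app_id}")


class ApplicationMaster:
    def __init__(self, app_id, container_id, rm):
        self.app_id = app_id
        self.container_id = container_id
        self.rm = rm

    def run(self, num_tasks):
        self.rm.register(self.app_id)
        containers = self.rm.request_containers(self.app_id, num_tasks)
        finished = [cid for cid, _node in containers]  # pretend every task ran
        self.rm.unregister(self.app_id)
        return finished


rm = ResourceManager(["node1", "node2"])
am = rm.submit("app_1")
done = am.run(num_tasks=2)
print("\n".join(rm.events))
```

The printed event list follows the order in the text: submit, allocate the ApplicationMaster's container, register, allocate task containers, and finally unregister when the run is over.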

[Image 3.png: http://s3.51cto.com/wyfs02/M01/6F/52/wKiom1WY7VfA19qwAALtP283jlU576.jpg]

Then the ApplicationMaster asks the specified NodeManager nodes to start the tasks.

[Image 4.png: http://s3.51cto.com/wyfs02/M00/6F/50/wKioL1WY8FPzpC8cAAL-1TQ8Rcg452.jpg]

Once started, each node runs the map task it was given, in the container that the ResourceManager allocated.

[Image 5.png: http://s3.51cto.com/wyfs02/M02/6F/50/wKioL1WY8NWDv3m7AAMCMXP8ctk774.jpg]

After a map task is done, it notifies the ApplicationMaster, which in turn tells the ResourceManager. Next, the ResourceManager allocates new resources to the ApplicationMaster, which uses them to hand the next stage of work to other nodes.
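The hand-over from the map stage to the reduce stage can be sketched as follows. This toy follows the article's simplified description, in which reduce resources are requested only after every map task has reported completion; the class, method, and request names are hypothetical.

```python
class ApplicationMaster:
    """Toy tracker that asks for reduce containers once all map tasks are done."""

    def __init__(self, num_maps, num_reduces):
        self.pending_maps = set(range(num_maps))
        self.num_reduces = num_reduces
        self.reduce_requests = []

    def on_map_finished(self, map_id):
        # Called when a map task notifies the ApplicationMaster that it is done.
        self.pending_maps.discard(map_id)
        if not self.pending_maps and not self.reduce_requests:
            # All maps reported in: request one container per reduce task.
            self.reduce_requests = [f"reduce_{i}" for i in range(self.num_reduces)]
        return list(self.reduce_requests)


am = ApplicationMaster(num_maps=2, num_reduces=1)
first = am.on_map_finished(0)
second = am.on_map_finished(1)
print(first, second)  # [] ['reduce_0']
```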

[Image 6.png: http://s3.51cto.com/wyfs02/M01/6F/53/wKiom1WY7_jhGoZMAAL5sya7sCk243.jpg]

Next, the ApplicationMaster notifies a NodeManager to start a new container and get it ready. The input of this stage is the output of the finished map tasks.

[Image 7.png: http://s3.51cto.com/wyfs02/M01/6F/53/wKiom1WY8K3ygmZXAALgs7uJjWs563.jpg]

The reduce task starts running.

[Image 8.png: http://s3.51cto.com/wyfs02/M02/6F/50/wKioL1WY8tqCgp88AAJ-c3A-MOs153.jpg]

When the reduce task on each node has finished, the results from the NodeManagers that did the work are synchronized, and the final reduce task runs.

[Image 9.png: http://s3.51cto.com/wyfs02/M01/6F/53/wKiom1WY8avzq6MCAAKqzpwxOs0598.jpg]

When all tasks have finished, the final result is written to HDFS. The job is complete.
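The data flow of the whole job, where the output of the map tasks becomes the input of the reduce tasks, can be illustrated with a word count in plain Python. This is a single-process sketch, not Hadoop code: the function names and the partitioning rule are assumptions, and the `result` dictionary merely stands in for the output written to HDFS.

```python
from collections import defaultdict


def map_task(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle(map_outputs, num_reducers):
    """Group map output by key and route each key to exactly one reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            # Deterministic toy partitioner: byte sum of the key modulo reducer count.
            index = sum(key.encode()) % num_reducers
            partitions[index][key].append(value)
    return partitions


def reduce_task(partition):
    """Sum the counts collected for every key in one partition."""
    return {key: sum(values) for key, values in partition.items()}


splits = ["yarn runs map tasks", "yarn runs reduce tasks"]
map_outputs = [map_task(split) for split in splits]     # map stage
partitions = shuffle(map_outputs, num_reducers=2)       # map output feeds reduce
result = {}
for partition in partitions:                            # reduce stage
    result.update(reduce_task(partition))

print(sorted(result.items()))
# [('map', 1), ('reduce', 1), ('runs', 2), ('tasks', 2), ('yarn', 2)]
```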


Together, the diagrams give a clearer picture of the entire YARN workflow.

This article is from the "David" blog; please keep this source link: http://davidbj.blog.51cto.com/4159484/1671064
