Diagram of How YARN Works

Last Update: 2015-07-06  Source: Internet

Author: User


YARN (Yet Another Resource Negotiator) is often called MapReduce V2. It has many advantages over MapReduce V1:

1. The JobTracker's duties are split up. Resource management is handled by the ResourceManager, while starting, running, and monitoring each job is handled by an ApplicationMaster running on one of the cluster nodes. This largely removes the JobTracker single-point bottleneck and single point of failure of MapReduce V1, and greatly improves the scalability and availability of the cluster.

2. The ApplicationMaster is a user-customizable part of MapReduce V2, so users can write their own ApplicationMaster for their programming model. This greatly broadens the range of applications MapReduce V2 can run.

3. ZooKeeper is used to implement ResourceManager failover. When the active ResourceManager fails, a standby ResourceManager quickly takes over, using the cluster state saved in ZooKeeper. MapReduce V2 also lets applications specify checkpoints, so that an ApplicationMaster can be restarted quickly after a failure from the state saved in HDFS. Together, these two measures greatly improve the availability of MapReduce V2.

4. Cluster resources are organized uniformly into resource containers, unlike the separate map slots and reduce slots of MapReduce V1. Whenever a task requests resources, the scheduler can assign any available resources in the cluster to it, whether it is a map or a reduce task. This greatly improves resource utilization.
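Point 4 can be illustrated with a small Python sketch. It assumes a toy cluster in which every slot or container is one identical unit of capacity; the cluster size and task counts are made up for illustration, and real YARN containers are sized by memory and virtual cores rather than counted as equal units.

```python
def slot_model_running(map_slots, reduce_slots, pending_maps, pending_reduces):
    """MRv1-style: map tasks can only use map slots, reduce tasks only reduce slots."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def container_model_running(total_units, pending_maps, pending_reduces):
    """YARN-style: any free unit of capacity can host either kind of task."""
    return min(total_units, pending_maps + pending_reduces)


# Hypothetical cluster with 10 equal units of capacity, split 6/4 in the slot model.
# Map-heavy phase: 9 map tasks waiting, no reduce tasks yet.
print(slot_model_running(6, 4, pending_maps=9, pending_reduces=0))     # 6 (4 reduce slots idle)
print(container_model_running(10, pending_maps=9, pending_reduces=0))  # 9
```

With a map-heavy workload, the slot model cannot lend its idle reduce slots to the waiting map tasks, while the container model can.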

YARN has many other advantages that are not listed here; this article focuses on the YARN workflow.

What is YARN made of?

YARN consists of the ResourceManager, NodeManager, JobHistoryServer, containers, ApplicationMaster, jobs, tasks, and the client.

[Figure 1: http://s3.51cto.com/wyfs02/M00/6F/52/wKiom1WY43ayjjsPAAGqIBnXW80270.jpg]

> ResourceManager: There is only one per cluster; it is responsible for resource scheduling, resource allocation, and related work.

> JobHistoryServer: Responsible for queries about job runs and for managing job metadata.

> NodeManager: Runs on each DataNode; responsible for launching application containers and managing the node's resources.

> Container: Allocated by the ResourceManager; a container bundles resources such as CPU and memory on a node.

> ApplicationMaster: If the ResourceManager is the manager, the ApplicationMaster is the contractor. The ResourceManager first assigns the job to an ApplicationMaster, which passes the ResourceManager's instructions on to each NodeManager (the workers). Each application has exactly one ApplicationMaster; it runs on a NodeManager node, in a container assigned by the ResourceManager.

> Job: A set of mappers and reducers together with their input list, or more generally any process submitted to YARN. A job is also called an application.

> Task: A single unit of work that runs either a map or a reduce. Tasks run inside containers on NodeManagers.

> Client: The program that submits an application to the ResourceManager.
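The components listed above can be modeled roughly as follows. This is a hypothetical, heavily simplified Python sketch, not the Hadoop API: the class names only mirror the roles described above, and the first-fit placement policy is an assumption for illustration (YARN's real schedulers, such as the Capacity Scheduler and the Fair Scheduler, are far more involved).

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical resource bundle; YARN sizes containers mainly by memory and vcores."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass
class Container:
    container_id: int
    node: str
    resource: Resource


@dataclass
class NodeManager:
    host: str
    free: Resource
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy scheduler for one cluster: the first node with enough free capacity wins."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next_id = 1

    def allocate(self, request: Resource):
        for nm in self.nodes:
            if request.fits_in(nm.free):
                # Deduct the request from the node's free capacity.
                nm.free = Resource(nm.free.memory_mb - request.memory_mb,
                                   nm.free.vcores - request.vcores)
                container = Container(self._next_id, nm.host, request)
                self._next_id += 1
                nm.containers.append(container)
                return container
        return None  # no node can satisfy the request right now


rm = ResourceManager([NodeManager("node1", Resource(4096, 4)),
                      NodeManager("node2", Resource(2048, 2))])
first = rm.allocate(Resource(3072, 2))   # fits on node1
second = rm.allocate(Resource(2048, 2))  # node1 has only 1024 MB left, so node2
print(first.node, second.node)           # node1 node2
```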

Now that we know which components make up YARN, let's look at the overall process of how a job is handled.

[Figure 2: http://s3.51cto.com/wyfs02/M01/6F/52/wKiom1WY6kGTyleXAALeVXQYYi0085.jpg]

The user submits a program (job) to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, and the user program. The ResourceManager allocates the first container for the job and tells the corresponding NodeManager to start the ApplicationMaster in that container. The ApplicationMaster first registers with the ResourceManager, so that the user can query the job's running status directly through the ResourceManager. It then requests resources for each task and monitors the tasks' status until they finish. The ApplicationMaster requests and collects these resources from the ResourceManager through RPC calls.
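The submission sequence above can be traced with a toy simulation. It is a sketch under stated assumptions: ToyResourceManager, ToyApplicationMaster, and the application ID app_0001 are hypothetical, and plain method calls stand in for the RPC calls real YARN uses.

```python
class ToyResourceManager:
    """Hypothetical stand-in for the ResourceManager; records every step it takes."""

    def __init__(self):
        self.log = []
        self.registered = set()
        self.next_container = 1

    def _new_container(self):
        cid = self.next_container
        self.next_container += 1
        return cid

    def submit_application(self, app_id):
        # Step 1: allocate the first container and have a NodeManager start the AM in it.
        cid = self._new_container()
        self.log.append(f"RM: container {cid} -> start ApplicationMaster for {app_id}")
        return ToyApplicationMaster(app_id, self)

    def register(self, app_id):
        # Step 2: once registered, the job's status can be queried through the RM.
        self.registered.add(app_id)
        self.log.append(f"RM: {app_id} registered")

    def allocate(self, app_id, num_containers):
        # Step 3: the AM asks for containers for its tasks (an RPC call in real YARN).
        if app_id not in self.registered:
            raise RuntimeError("ApplicationMaster must register first")
        granted = [self._new_container() for _ in range(num_containers)]
        self.log.append(f"RM: granted containers {granted} to {app_id}")
        return granted


class ToyApplicationMaster:
    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run(self, num_tasks):
        self.rm.register(self.app_id)
        return self.rm.allocate(self.app_id, num_tasks)


rm = ToyResourceManager()
am = rm.submit_application("app_0001")
containers = am.run(num_tasks=3)
for line in rm.log:
    print(line)
```

The log shows the order described in the text: the ApplicationMaster's container is allocated first, the ApplicationMaster registers, and only then are containers granted for the tasks. Calling allocate before register raises an error, mirroring the requirement that an ApplicationMaster register before requesting resources.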

[Figure 3: http://s3.51cto.com/wyfs02/M01/6F/52/wKiom1WY7VfA19qwAALtP283jlU576.jpg]

Next, the ApplicationMaster asks the designated NodeManager to start the task.
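In YARN, the ApplicationMaster hands the NodeManager a launch description (the command to run, its environment, and the files it needs), and the NodeManager starts the process. The sketch below imitates that with Python's subprocess module. LaunchContext, ToyNodeManager, and the task ID m_000000 are hypothetical names; Hadoop's own record for this is called ContainerLaunchContext, but this is not its API.

```python
import os
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class LaunchContext:
    """Hypothetical launch request: the command to run and extra environment variables."""
    command: list
    env: dict = field(default_factory=dict)


class ToyNodeManager:
    """Hypothetical NodeManager that starts each 'container' as a local process."""

    def __init__(self, host):
        self.host = host

    def start_container(self, container_id, ctx):
        # The NodeManager, not the ApplicationMaster, spawns the task process.
        result = subprocess.run(
            ctx.command,
            env={**os.environ, **ctx.env},
            capture_output=True,
            text=True,
        )
        return container_id, result.returncode, result.stdout.strip()


# The ApplicationMaster's side: describe the task, then ask a chosen NodeManager to run it.
nm = ToyNodeManager("node1")
ctx = LaunchContext(
    command=[sys.executable, "-c", "import os; print('map task', os.environ['TASK_ID'], 'done')"],
    env={"TASK_ID": "m_000000"},
)
print(nm.start_container(2, ctx))  # (2, 0, 'map task m_000000 done')
```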

[Figure 4: http://s3.51cto.com/wyfs02/M00/6F/50/wKioL1WY8FPzpC8cAAL-1TQ8Rcg452.jpg]

Once the container has started, it runs the map task it was assigned.

[Figure 5: http://s3.51cto.com/wyfs02/M02/6F/50/wKioL1WY8NWDv3m7AAMCMXP8ctk774.jpg]

When a map task finishes, it notifies the ApplicationMaster, which in turn informs the ResourceManager. The ResourceManager then allocates new resources to the ApplicationMaster so that it can hand the next stage of the work to another worker.
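The notification chain above (map task to ApplicationMaster to ResourceManager) can be sketched as simple bookkeeping. ToyAppMaster and its message tuples are hypothetical; a real ApplicationMaster reports progress and asks for containers through its periodic allocate calls to the ResourceManager.

```python
class ToyAppMaster:
    """Hypothetical ApplicationMaster bookkeeping for one MapReduce job."""

    def __init__(self, num_maps, num_reduces):
        self.pending_maps = set(range(num_maps))
        self.num_reduces = num_reduces
        self.messages_to_rm = []  # what this AM would report to the ResourceManager

    def on_map_finished(self, map_id):
        if map_id not in self.pending_maps:
            return  # ignore duplicate or unknown completion reports
        # A map task reports completion to the AM...
        self.pending_maps.discard(map_id)
        # ...and the AM passes the news on to the ResourceManager.
        self.messages_to_rm.append(("completed", f"map_{map_id}"))
        if not self.pending_maps:
            # Every map has finished, so ask the RM for containers to run the reduces.
            self.messages_to_rm.append(("request_containers", self.num_reduces))


am = ToyAppMaster(num_maps=2, num_reduces=1)
am.on_map_finished(0)
am.on_map_finished(1)
print(am.messages_to_rm)
# [('completed', 'map_0'), ('completed', 'map_1'), ('request_containers', 1)]
```

Waiting for every map keeps the sketch aligned with the simplified flow in the text; real MapReduce can start requesting reduce containers earlier, controlled by the `mapreduce.job.reduce.slowstart.completedmaps` property.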

[Figure 6: http://s3.51cto.com/wyfs02/M01/6F/53/wKiom1WY7_jhGoZMAAL5sya7sCk243.jpg]

Next, the ApplicationMaster tells a NodeManager to start a new container in preparation for the reduce task, whose input is the output of the finished map tasks.
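The statement that the reduce input is the map output can be made concrete with the classic word count. This is a minimal single-process Python sketch, assuming one map task per input line; the grouping step stands in for MapReduce's shuffle, which in a real cluster moves map output across the network to the reducers.

```python
from collections import defaultdict


def map_task(line):
    """Word-count mapper: emit (word, 1) for each word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle(map_outputs):
    """Group all mappers' (key, value) pairs by key; this grouped data is the reduce input."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    return key, sum(values)


lines = ["yarn runs map tasks", "yarn runs reduce tasks"]
map_outputs = [map_task(line) for line in lines]  # one map task per input line
reduce_input = shuffle(map_outputs)
result = dict(reduce_task(k, v) for k, v in reduce_input.items())
print(result)  # {'yarn': 2, 'runs': 2, 'map': 1, 'tasks': 2, 'reduce': 1}
```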

[Figure 7: http://s3.51cto.com/wyfs02/M01/6F/53/wKiom1WY8K3ygmZXAALgs7uJjWs563.jpg]

The reduce task then starts doing its work.

[Figure 8: http://s3.51cto.com/wyfs02/M02/6F/50/wKioL1WY8tqCgp88AAJ-c3A-MOs153.jpg]

When the reduce task on each node finishes, the results of the tasks on the NodeManagers are gathered and the final reduce is performed.
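In Hadoop's default setup, each reducer is responsible for a disjoint set of keys chosen by a partitioner, so combining the per-node results amounts to a simple union. The sketch below assumes a hypothetical crc32-based partitioner so that the run is deterministic; Hadoop's default HashPartitioner uses the key's Java hashCode instead.

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    # Deterministic stand-in for Hadoop's HashPartitioner.
    return zlib.crc32(key.encode()) % num_reducers


def run_reducers(pairs, num_reducers):
    """Each 'node' sums only the keys routed to it by the partitioner."""
    outputs = [Counter() for _ in range(num_reducers)]
    for key, value in pairs:
        outputs[partition(key, num_reducers)][key] += value
    return outputs


def merge(outputs):
    """Reducers own disjoint keys, so the final result is a plain union."""
    final = {}
    for part in outputs:
        final.update(part)
    return final


pairs = [("yarn", 1), ("map", 1), ("yarn", 1), ("reduce", 1)]
parts = run_reducers(pairs, num_reducers=2)
print(merge(parts))
```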

[Figure 9: http://s3.51cto.com/wyfs02/M01/6F/53/wKiom1WY8avzq6MCAAKqzpwxOs0598.jpg]

Once all of this has finished, the final result is written to HDFS, and the job is complete.
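Writing the final result can be imitated on a local file system. This sketch assumes a temporary local directory as a stand-in for HDFS and copies Hadoop's usual output layout (one part-r-NNNNN file per reducer, tab-separated key/value lines, and an empty _SUCCESS marker once the job completes); it skips the temporary attempt directories that a real job's output committer uses.

```python
import os
import tempfile


def write_job_output(out_dir, reducer_outputs):
    """Write one part file per reducer, then a _SUCCESS marker once everything is written."""
    os.makedirs(out_dir, exist_ok=True)
    for i, part in enumerate(reducer_outputs):
        path = os.path.join(out_dir, f"part-r-{i:05d}")
        with open(path, "w") as f:
            for key in sorted(part):
                f.write(f"{key}\t{part[key]}\n")
    # Written last, so a reader that sees _SUCCESS knows every part file is complete.
    open(os.path.join(out_dir, "_SUCCESS"), "w").close()


out_dir = os.path.join(tempfile.mkdtemp(), "wordcount_out")
write_job_output(out_dir, [{"map": 1, "yarn": 2}, {"reduce": 1}])
print(sorted(os.listdir(out_dir)))  # ['_SUCCESS', 'part-r-00000', 'part-r-00001']
```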

The diagrams give a clearer picture of the entire YARN workflow.

This article is from the "David" blog; please keep this source attribution: http://davidbj.blog.51cto.com/4159484/1671064


This article is an English version of an article originally written in Chinese on aliyun.com.
