YARN is a distributed resource management system.
It was created to address several shortcomings of the original MapReduce framework:
1. The JobTracker is a single point of failure.
2. The JobTracker takes on too many responsibilities: maintaining job status, job task status, and so on.
3. On the TaskTracker side, resources are modeled too simply as map/reduce task slots, without accounting for CPU, memory, or other usage. Problems arise when multiple memory-hungry tasks are scheduled onto the same node.
Basic components after evolution
Each component is explained in detail below:
YARN is a resource management framework, not a computation framework; this distinction is important to keep in mind.
The "application" in the figure corresponds to a map/reduce job in the 1.x versions.
The "container" in the figure is a logical concept: a collective term for a set of resources (memory, CPU, etc.).
AM (ApplicationMaster): each application has its own AM.
ResourceManager: coordinates cluster resources. It has two important components:
Scheduler: handles resource scheduling. It builds a global allocation plan from the resource requests of all running applications, then allocates resources subject to each application's specific constraints and some global constraints. It also periodically receives resource usage reports from the NMs. Note that the Scheduler has nothing to do with job execution; it deals only with resources. It can also provide the AM with status information about containers that have completed.
ASM (ApplicationsManager): receives application submissions, requests a container from the Scheduler for the AM, and starts the AM. It also reports the AM's run status to the client. In short, it manages the life cycle of every AM.
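As a concrete illustration of the container concept, here is a minimal sketch using YARN's Java client API: a container request is little more than a resource bundle (memory and virtual cores) plus a priority. The specific values are arbitrary assumptions.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerSpec {
    public static ContainerRequest buildRequest() {
        // A "container" is just a bundle of resources:
        // here, 1024 MB of memory and 2 virtual cores.
        Resource capability = Resource.newInstance(1024, 2);

        // Priority lets the Scheduler order competing requests
        // from the same application.
        Priority priority = Priority.newInstance(0);

        // nodes/racks are null: we accept placement anywhere.
        return new ContainerRequest(capability, null, null, priority);
    }
}
```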
YARN workflow:
In summary there are two phases: the client submits the job to the ASM, and the ASM requests the resources to run the AM; the AM then takes over, computing the input splits, requesting resources, running the tasks with the NMs, monitoring the tasks, and so on.
1. The job client submits the job to the ASM (see the client-side sketch after this list):
1) Obtain an ApplicationID.
2) Upload the application definition and the required jar packages to the HDFS directory specified by yarn.app.mapreduce.am.staging-dir in yarn-site.xml.
3) Construct the resource request object and the application submission context, and send them to the ASM.
2. The ASM asks the Scheduler for a container to run the AM, sends a launch-container message to that container's NM, and the NM starts the container (see the AM-side sketch after this list).
3. Once started by the NM, the AM registers with the ASM.
4. The job client obtains the AM's information from the ASM and communicates with the AM directly.
5. The AM computes the input splits and constructs resource requests for all the map tasks.
6. The AM performs some OutputCommitter preparation work.
7. The AM requests resources (a group of containers) from the Scheduler, then works with the NMs to perform the necessary per-container setup, such as resource localization.
8. The AM monitors the tasks: if a task fails, it requests a new container for it; once everything completes, it runs the OutputCommitter's cleanup and commit actions.
9. The AM exits.
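Here is a minimal client-side sketch of step 1 using the YarnClient API. The application name, AM command, and resource sizes are placeholder assumptions; a real client would also upload its jars to the HDFS staging directory (step 1.2) and wire them in as LocalResources.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

import java.util.Collections;

public class SubmitApp {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        // Step 1.1: ask the ASM for a new ApplicationID.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        System.out.println("Got ApplicationID: " + ctx.getApplicationId());

        // Step 1.3: build the submission context. The command below is a
        // placeholder; a real AM jar would be uploaded to HDFS (step 1.2)
        // and referenced via LocalResources.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("java -Xmx256m MyAppMaster"),
                null, null, null);
        ctx.setApplicationName("demo-app");
        ctx.setAMContainerSpec(amContainer);
        ctx.setResource(Resource.newInstance(512, 1)); // resources for the AM itself

        yarnClient.submitApplication(ctx);
    }
}
```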
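And a skeletal AM-side sketch of steps 3 through 9, using the AMRMClient and NMClient APIs: register with the RM, request containers from the Scheduler, launch tasks on the NMs, watch for completions, and deregister. The task command and task count are placeholders, and step 8's handling of failed containers is only hinted at in a comment.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;

public class AppMasterSkeleton {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Step 3: register with the RM (the ASM side).
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, "");

        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Steps 5 and 7: request one container per task from the Scheduler.
        Priority priority = Priority.newInstance(0);
        Resource capability = Resource.newInstance(1024, 1);
        int numTasks = 2; // placeholder; a real AM derives this from the splits
        for (int i = 0; i < numTasks; i++) {
            rmClient.addContainerRequest(
                    new ContainerRequest(capability, null, null, priority));
        }

        // Steps 7 and 8: poll for allocated containers, launch tasks on the NMs,
        // and watch completion statuses.
        int completed = 0;
        while (completed < numTasks) {
            AllocateResponse response = rmClient.allocate(0.1f);
            for (Container container : response.getAllocatedContainers()) {
                ContainerLaunchContext taskCtx = ContainerLaunchContext.newInstance(
                        null, null,
                        java.util.Collections.singletonList("echo hello-task"), // placeholder task
                        null, null, null);
                nmClient.startContainer(container, taskCtx);
            }
            for (ContainerStatus status : response.getCompletedContainersStatuses()) {
                completed++; // a real AM would re-request containers for failed tasks here
            }
            Thread.sleep(1000);
        }

        // Step 9: deregister and exit.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```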
How the client obtains monitoring information:
Task status is fetched from the AM.
The AM's information is obtained from the ASM.
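A minimal sketch of the second lookup, assuming the client kept the ApplicationId from submission: the ApplicationReport returned by the RM carries the AM's host, tracking URL, and state, which the client uses to talk to the AM directly.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class QueryApp {
    // appId would be the ID obtained at submission time.
    public static void printAmInfo(ApplicationId appId) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        // The ASM side of the RM reports where the AM is running.
        ApplicationReport report = yarnClient.getApplicationReport(appId);
        System.out.println("AM host:      " + report.getHost());
        System.out.println("Tracking URL: " + report.getTrackingUrl());
        System.out.println("State:        " + report.getYarnApplicationState());
    }
}
```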
The NM also has the job of monitoring the resources used by each task; if a task exceeds the bounds of its requested container, the NM kills the task's process.
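This enforcement is configurable; a yarn-site.xml fragment like the following (the values shown are the usual defaults, quoted here only as an illustration) controls the memory checks the NM applies:

```xml
<!-- yarn-site.xml: NodeManager resource enforcement (values are illustrative defaults) -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value> <!-- kill tasks that exceed their physical memory allocation -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value> <!-- also enforce a virtual memory limit -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- allowed virtual memory per unit of physical memory -->
</property>
```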
YARN is a resource framework; compute frameworks run on top of it. MapReduce is one computation model, implemented as a specific ApplicationMaster that runs on YARN. Other computation models likewise need their own specific ApplicationMaster implementations to run on YARN.
Extended reading: http://www.aboutyun.com/thread-7678-1-3.html