Hadoop Learning Notes 14: A Brief Look at Hadoop YARN

Source: Internet
Author: User

YARN is a distributed resource management system.

It was created to address several shortcomings of the original MapReduce framework:

1. The JobTracker is a single point of failure.

2. The JobTracker takes on too much work: it maintains the status of every job, the status of every task within each job, and so on.

3. On the TaskTracker side, resources are expressed only as a number of map/reduce task slots, which is too coarse: actual CPU and memory usage is not considered. Problems arise when several memory-hungry tasks are scheduled together.
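To make the third point concrete, here is a minimal Python sketch (a toy model, not Hadoop code) that contrasts slot-based admission with memory-aware admission. The node capacity, the slot count, and the task sizes are made-up figures, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    memory_mb: int  # memory the task will actually need (made-up figure)


def admit_by_slots(tasks, free_slots):
    """Slot-style admission: one task per free slot, memory is ignored."""
    return tasks[:free_slots]


def admit_by_memory(tasks, free_memory_mb):
    """Resource-aware admission: accept a task only if its memory still fits."""
    admitted = []
    for task in tasks:
        if task.memory_mb <= free_memory_mb:
            admitted.append(task)
            free_memory_mb -= task.memory_mb
    return admitted


NODE_MEMORY_MB = 8192  # assumed capacity of one node
tasks = [Task("t1", 4096), Task("t2", 4096), Task("t3", 4096)]

by_slots = admit_by_slots(tasks, free_slots=3)
by_memory = admit_by_memory(tasks, NODE_MEMORY_MB)

slots_demand = sum(t.memory_mb for t in by_slots)
memory_demand = sum(t.memory_mb for t in by_memory)
print(f"slots:  {len(by_slots)} tasks, {slots_demand} MB demanded of {NODE_MEMORY_MB}")
print(f"memory: {len(by_memory)} tasks, {memory_demand} MB demanded of {NODE_MEMORY_MB}")
```

With three free slots, all three tasks are admitted even though together they need more memory than the node has; the memory-aware version stops at two.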

Basic components after the redesign

Each component is explained below:

YARN is a resource management framework, not a computation framework; it is important to keep this distinction in mind.

The application in the figure corresponds to a map/reduce job in version 1.x.

The container in the figure is a logical concept: a collective term for a set of resources (memory, CPU, etc.).

AM (ApplicationMaster): Each application has its own AM.

ResourceManager: Mainly coordinates resources. It has two important components:

Scheduler: For resource scheduling, it builds a global allocation plan after receiving resource requests from all running applications, then allocates resources according to each application's specific constraints and the constraints of the global environment. For resource monitoring, it periodically receives resource usage information from the NMs (NodeManagers). Note that this has nothing to do with job execution; it only monitors resources. It can also tell an AM the status of the containers that have completed.

ASM (ApplicationsManager): Receives the client's request, asks the Scheduler for a container in which the AM will run, and starts the AM. It also reports the AM's running status to the client. In short, it manages the life cycle of all AMs.
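The idea of a container as a bundle of resources, and of a Scheduler that only hands out such bundles, can be sketched in a few lines of Python. This is a toy first-fit allocator written for illustration, not the algorithm of any real YARN scheduler; `Resource`, `Container`, `allocate`, and the node and application names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


@dataclass(frozen=True)
class Container:
    node: str
    resource: Resource


def allocate(requests, node_capacity):
    """Grant each request on the first node with enough free memory and vcores.

    requests: list of (application_id, Resource) in arrival order.
    node_capacity: dict mapping a node name to its total Resource.
    Returns a dict mapping application_id to the list of Containers granted.
    Requests that fit nowhere are simply left out.
    """
    free = {node: [cap.memory_mb, cap.vcores] for node, cap in node_capacity.items()}
    granted = {}
    for app_id, wanted in requests:
        for node, (mem, cores) in free.items():
            if wanted.memory_mb <= mem and wanted.vcores <= cores:
                free[node][0] -= wanted.memory_mb
                free[node][1] -= wanted.vcores
                granted.setdefault(app_id, []).append(Container(node, wanted))
                break
    return granted


nodes = {"nm1": Resource(4096, 4), "nm2": Resource(2048, 2)}
requests = [
    ("app_1", Resource(2048, 1)),
    ("app_2", Resource(2048, 2)),
    ("app_1", Resource(2048, 1)),
    ("app_3", Resource(1024, 1)),
]
granted = allocate(requests, nodes)
for app_id, containers in granted.items():
    print(app_id, [c.node for c in containers])
```

Note that the allocator knows nothing about what the applications will run inside their containers; it only tracks memory and vcores, which matches the point that scheduling is independent of job execution.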

YARN workflow:

In summary, there are two stages. First, the client submits the job to the ASM, and the ASM requests resources to run the AM. Then the AM takes over: it computes the splits, requests resources, works with the NMs to run the tasks, monitors the tasks, and so on.

1. The job client submits the job to the ASM:

1) Obtain an ApplicationID.

2) Upload the application definition and the required JAR files to the specified HDFS directory (the yarn.app.mapreduce.am.staging-dir property; it is a MapReduce setting, so it is normally configured in mapred-site.xml).

3) Construct the resource request object and the application submission context, and send them to the ASM.

2. The ASM asks the Scheduler for a container in which the AM will run, then sends the container launch information to the corresponding NM, which starts the container.

3. Once the NM has started it, the AM registers with the ASM.

4. The job client obtains the AM's information from the ASM and then communicates with the AM directly.

5. The AM computes the splits and constructs resource requests for all map tasks.

6. The AM does some OutputCommitter preparation work.

7. The AM requests resources (a group of containers) from the Scheduler, then works with the NMs to carry out the necessary preparation for each container, such as resource localization.

8. The AM monitors the tasks. If a task fails, it requests a new container for it; when the tasks complete, it runs the OutputCommitter's cleanup and commit actions.

9. The AM exits.
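The nine steps above can be walked through with a small Python simulation. It is only a sketch of the message flow: the classes are hypothetical stand-ins for the ASM, the Scheduler, and the AM, nothing here calls a real Hadoop API, and the `fail_once` argument is an invented way to force one task failure so that the retry in step 8 is visible.

```python
class Scheduler:
    """Hands out numbered containers; stands in for the RM Scheduler."""

    def __init__(self):
        self.next_id = 0

    def allocate(self):
        self.next_id += 1
        return f"container_{self.next_id}"


class ApplicationMaster:
    def __init__(self, app_id, splits, fail_once, log):
        self.app_id, self.splits, self.log = app_id, splits, log
        self.fail_once = set(fail_once)  # splits whose first attempt fails

    def run(self, scheduler):
        pending = list(self.splits)                      # step 5: one map task per split
        self.log.append("AM set up OutputCommitter")     # step 6
        while pending:                                   # steps 7 and 8
            split = pending.pop(0)
            container = scheduler.allocate()
            if split in self.fail_once:
                self.fail_once.remove(split)
                self.log.append(f"{split} failed in {container}, retrying")
                pending.append(split)                    # ask for a new container later
            else:
                self.log.append(f"{split} succeeded in {container}")
        self.log.append("AM committed output and exited")  # steps 8 and 9


class ApplicationsManager:
    """Stands in for the ASM: accepts submissions and tracks registered AMs."""

    def __init__(self, scheduler, log):
        self.scheduler, self.log = scheduler, log
        self.registered = {}
        self.next_app = 0

    def new_application_id(self):                        # step 1, sub-step 1
        self.next_app += 1
        return f"application_{self.next_app}"

    def submit(self, app_id, splits, fail_once=()):      # step 1 sub-step 3, and step 2
        container = self.scheduler.allocate()
        self.log.append(f"ASM launched AM for {app_id} in {container}")
        am = ApplicationMaster(app_id, splits, fail_once, self.log)
        self.registered[app_id] = am                     # step 3: AM registers with the ASM
        return am                                        # step 4: client talks to the AM


log = []
scheduler = Scheduler()
asm = ApplicationsManager(scheduler, log)
app_id = asm.new_application_id()
am = asm.submit(app_id, ["split_0", "split_1"], fail_once=["split_1"])
am.run(scheduler)
print("\n".join(log))
```

The upload to HDFS in sub-step 2 and the NM's role in launching containers are left out to keep the sketch short.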

How the client obtains monitoring information:

Task information is fetched from the AM.

AM information is obtained from the ASM.

The NM also monitors the resources used by each task; if a task exceeds the limits of the container it requested, the NM kills the task's process.
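The check the NM performs can be pictured as a comparison between what each container was granted and what its process is measured to use. The Python sketch below covers memory only and uses invented container ids and numbers; `containers_to_kill` is a hypothetical helper, not a NodeManager API.

```python
def containers_to_kill(limits_mb, usage_mb):
    """Return the ids of containers whose measured memory exceeds their limit."""
    return sorted(cid for cid, used in usage_mb.items() if used > limits_mb[cid])


limits_mb = {"container_1": 1024, "container_2": 2048}  # memory granted per container
usage_mb = {"container_1": 1500, "container_2": 1900}   # memory measured by the monitor

over_limit = containers_to_kill(limits_mb, usage_mb)
print(over_limit)  # → ['container_1']
```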

YARN is a resource framework, and computation frameworks run on top of it. MapReduce is a computation model that implements its own ApplicationMaster in order to run on YARN. Any other computation model likewise needs to implement its own ApplicationMaster to run on YARN.
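One way to picture this separation is an abstract base class that the resource layer drives without knowing what the job computes. The Python sketch below is an analogy only: `ApplicationMasterBase`, `WordCountMaster`, and `UppercaseMaster` are hypothetical names, and the real ApplicationMaster contract in Hadoop is a set of Java protocols, not this interface.

```python
from abc import ABC, abstractmethod


class ApplicationMasterBase(ABC):
    """Hypothetical contract a computation model fulfils to run on the resource layer."""

    @abstractmethod
    def plan_tasks(self, job_input):
        """Split the job into units of work, one per container."""

    @abstractmethod
    def run_task(self, task):
        """Execute one unit of work inside an allocated container."""

    def run(self, job_input):
        # The driver only iterates over tasks; it never inspects what they compute.
        return [self.run_task(task) for task in self.plan_tasks(job_input)]


class WordCountMaster(ApplicationMasterBase):
    """A MapReduce-flavoured model: count the words on each input line."""

    def plan_tasks(self, job_input):
        return job_input.splitlines()

    def run_task(self, task):
        return len(task.split())


class UppercaseMaster(ApplicationMasterBase):
    """A different computation model that reuses the same driver."""

    def plan_tasks(self, job_input):
        return job_input.splitlines()

    def run_task(self, task):
        return task.upper()


text = "yarn manages resources\nmapreduce computes"
counts = WordCountMaster().run(text)
shouted = UppercaseMaster().run(text)
print(counts)   # → [3, 2]
print(shouted)  # → ['YARN MANAGES RESOURCES', 'MAPREDUCE COMPUTES']
```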

Extended reading: http://www.aboutyun.com/thread-7678-1-3.html
