YARN Framework Analysis (Hadoop)

Source: Internet
Author: User

YARN Framework

YARN is Hadoop's resource management framework. Its core idea is to split the two responsibilities of the old JobTracker, resource management and job scheduling, into separate processes: the ResourceManager handles the former and a per-application ApplicationMaster handles the latter.

The four core components of YARN are the ResourceManager, the NodeManager, the ApplicationMaster, and the Container.

(1) ResourceManager (RM): Controls the cluster and manages the allocation of underlying resources to applications.

Overall, the RM has the following responsibilities:

1) Processing client requests

2) Starting and monitoring ApplicationMasters

3) Monitoring NodeManagers

4) Allocating and scheduling resources

(2) ApplicationMaster (AM): Manages one instance of an application running within YARN.

In general, the AM has the following responsibilities:

1) Splitting the input data

2) Requesting resources for the application and assigning them to its internal tasks

3) Monitoring tasks and handling fault tolerance

(3) NodeManager (NM): Manages a single node in the YARN cluster.

In general, the NM has the following responsibilities:

1) Managing the resources of its node

2) Handling commands from the ResourceManager

3) Handling commands from the ApplicationMaster

(4) Container: The abstraction of resources in YARN.

Generally speaking, a container serves the following purpose:

It abstracts the environment a task runs in. It encapsulates multidimensional resources such as CPU and memory, together with the environment variables, launch command, and other information needed to run the task.
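To make the container abstraction concrete, here is a minimal sketch in Python. It is a toy model, not YARN's actual API: the names Resource, ContainerSpec, and can_host are hypothetical, and only two resource dimensions (memory and virtual cores) are modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """Multidimensional resource vector (only memory and vcores modeled here)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        # A request fits only if it fits in every dimension.
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass
class ContainerSpec:
    """Hypothetical stand-in for a container: resources plus launch context."""
    resource: Resource
    command: list
    environment: dict = field(default_factory=dict)


def can_host(node_free: Resource, spec: ContainerSpec) -> bool:
    """True if the node has enough free capacity for the container."""
    return spec.resource.fits_in(node_free)


spec = ContainerSpec(
    resource=Resource(memory_mb=1024, vcores=1),
    command=["java", "-Xmx800m", "org.example.Task"],  # illustrative command
    environment={"JAVA_HOME": "/usr/lib/jvm/default"},  # illustrative value
)
```

The point of the sketch is that a container bundles a resource vector with everything needed to launch the task, so a node can decide whether to host it by comparing vectors alone.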

Running a job on YARN:

1. Job submission

1) The client calls the Job.waitForCompletion() method to submit the MapReduce job to the cluster

2) The ResourceManager assigns a job ID

3) The client verifies the job's output specification, computes the input splits, and copies the job resources (JAR file, configuration, and split information) to HDFS

4) The client calls submitApplication() on the ResourceManager to submit the job
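The split computation in the submission step can be sketched as follows. This is a simplified Python model loosely based on the logic in Hadoop's FileInputFormat, as I understand it; the function names are hypothetical, and the sketch ignores block locations, non-splittable files, and empty files.

```python
SPLIT_SLOP = 1.1  # allow the last split to be up to 10% larger than split_size


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the block size between the configured minimum and maximum."""
    return max(min_size, min(max_size, block_size))


def compute_splits(file_length, block_size, min_size=1, max_size=2**63 - 1):
    """Return a list of (offset, length) pairs covering the file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_length
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    # Whatever is left (possibly slightly over split_size) is the final split.
    if remaining > 0:
        splits.append((file_length - remaining, remaining))
    return splits
```

With a 128 MB block size, a 300 MB file yields three splits in this model, while a 130 MB file yields a single split because the small tail is folded into it.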

2. Job initialization

1) The ResourceManager receives the submitApplication() request and forwards it to the scheduler. The scheduler allocates a container, and the ResourceManager starts the ApplicationMaster in that container

2) The ApplicationMaster creates bookkeeping objects to track the job's progress; it will receive progress and completion reports from the tasks

3) The ApplicationMaster retrieves from HDFS the input splits that the client computed, creates one map task per split, and creates the number of reduce tasks given by mapreduce.job.reduces
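The initialization steps above can be sketched as a small Python model. Everything here is hypothetical (the function names, the task dictionaries, the ID format); only the property name mapreduce.job.reduces comes from the text. Real MapReduce reports map and reduce progress separately, so the single average below is a simplification.

```python
def create_tasks(splits, conf):
    """Create one map task per split and the configured number of reduce tasks."""
    num_reduces = int(conf.get("mapreduce.job.reduces", 1))
    maps = [
        {"id": f"m_{i:06d}", "type": "map", "split": split, "progress": 0.0}
        for i, split in enumerate(splits)
    ]
    reduces = [
        {"id": f"r_{i:06d}", "type": "reduce", "split": None, "progress": 0.0}
        for i in range(num_reduces)
    ]
    return maps + reduces


def job_progress(tasks):
    """Bookkeeping: average the per-task progress values (each in 0.0..1.0)."""
    if not tasks:
        return 0.0
    return sum(task["progress"] for task in tasks) / len(tasks)
```

For example, three splits with mapreduce.job.reduces set to 2 produce five tasks, and finishing one of them moves the job to 20% in this model.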

3. Task Assignment

If the job is small, the ApplicationMaster may choose to run the tasks in its own JVM.

If the job is not small, the ApplicationMaster requests containers from the ResourceManager for all map and reduce tasks. The requests are piggybacked on heartbeat calls and include each map task's data locality information, in particular the hostnames and racks where its input split is stored. The scheduler uses this information to make placement decisions: it tries to assign a task to a node that holds the split or, failing that, to a node in the same rack as one that does.
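Both branches above can be sketched in Python. This is a toy model, not the scheduler's real algorithm: the function names and the rack_of mapping are hypothetical, and the "small job" thresholds (at most 9 maps, at most 1 reduce, input no larger than one HDFS block) are what I believe Hadoop's defaults to be, so treat them as assumptions.

```python
def is_small_job(num_maps, num_reduces, input_bytes, block_size,
                 max_maps=9, max_reduces=1):
    """Assumed thresholds for running the whole job inside the AM's JVM."""
    return (num_maps <= max_maps
            and num_reduces <= max_reduces
            and input_bytes <= block_size)


def pick_node(split_hosts, rack_of, free_nodes):
    """Prefer a node holding the split, then a node in the same rack, then any."""
    for node in free_nodes:
        if node in split_hosts:
            return node, "node-local"
    split_racks = {rack_of[host] for host in split_hosts if host in rack_of}
    for node in free_nodes:
        if rack_of.get(node) in split_racks:
            return node, "rack-local"
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, None
```

The ordering of the three loops encodes the locality preference: data-local placement avoids network transfer entirely, and rack-local placement at least keeps the transfer within one rack.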

4. Task Run

Once the ResourceManager's scheduler has assigned a container to a task, the ApplicationMaster contacts the NodeManager to start the container. Before the task runs, the resources it needs are localized, including the job configuration, the JAR files, and any files from the distributed cache. Finally the map or reduce task is run. The task executes in a YarnChild process with its own dedicated JVM; YARN does not support JVM reuse.

5. Progress and status updates

Tasks in YARN report their progress and status to the ApplicationMaster, and the client polls the ApplicationMaster for progress updates every second.

6. Job completion

The client checks whether the job has completed every 5 seconds by calling waitForCompletion(); the interval is configurable through mapreduce.client.completion.pollinterval. When the job completes, the ApplicationMaster and the task containers clean up their working state, and the OutputCommitter's job cleanup method is called. The job's information is archived by the job history server so that users can inspect it later.
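The client-side completion check can be sketched as a simple polling loop. This Python version is illustrative only: wait_for_completion, is_complete, and max_polls are hypothetical names, the sleep function is injectable so the loop can be exercised without real waiting, and the 5-second default reflects my understanding of Hadoop's default polling interval.

```python
import time


def wait_for_completion(is_complete, poll_interval_s=5.0,
                        sleep=time.sleep, max_polls=None):
    """Poll is_complete() until it returns True; return the number of polls."""
    polls = 0
    while True:
        polls += 1
        if is_complete():
            return polls
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job not complete after {polls} polls")
        # Wait for the configured interval before asking again.
        sleep(poll_interval_s)
```

A shorter interval makes the client notice completion sooner at the cost of more requests, which is why the interval is a tunable setting.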
