Author: Liu Xuhui Raymond. If reprinted, please indicate the source.
Email: colorant at 163.com
Blog: http://blog.csdn.net/colorant/
More paper reading notes: http://blog.csdn.net/colorant/article/details/8256145
=Target Question=
How the next-generation Hadoop framework can support clusters of more than 10,000 nodes as well as more flexible programming models.
=Core Idea=
The fixed programming model and the single-point resource-scheduling and task-management design of Hadoop 1.0 increasingly show their limitations in both programming model and cluster scale.
YARN adopts a two-level distributed resource-scheduling and task-management framework. It supports pluggable scheduling components and custom task-management modules, so it can adapt to a variety of programming models and to growing cluster sizes.
YARN schedules resources and tasks in units of containers; the only schedulable resource type at this point is memory (long-term targets include CPU, disk, and I/O). By allocating and sharing resources among the various task-management frameworks, it improves cluster utilization. The overall idea is very close to Mesos.
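As a minimal sketch of the container abstraction described above (illustrative classes, not the real YARN API; node and memory figures are made up), a container binds a granted resource capability to a node, and the node's capacity is shared among containers from different frameworks:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    memory_mb: int  # memory is the only dimension YARN schedules at this point

@dataclass
class Container:
    container_id: int
    node: str             # NM host the container is bound to
    capability: Resource  # resources granted to this container

# A node's capacity is shared among containers from different frameworks.
node_capacity = Resource(memory_mb=8192)
running = [Container(1, "node-17", Resource(2048)),
           Container(2, "node-17", Resource(1024))]
free_mb = node_capacity.memory_mb - sum(c.capability.memory_mb for c in running)
```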
=Implementation=
The main components of YARN are: one global RM (ResourceManager), one AM (ApplicationMaster) per job, and one NM (NodeManager) per node.

The RM is further divided into a scheduling module (Scheduler) and an application management module (ApplicationsManager). The scheduling module is responsible for allocating resources among jobs; the application management module listens for client job-submission requests and starts the per-job AM.
After the application management module starts the AM, the AM takes over management of its own job: it negotiates with the scheduling module to obtain the resources the job needs, launches task processes on those resources through the NMs, and monitors task completion.
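The two-level flow above can be simulated with a toy model (the class and job names are illustrative, not the real YARN classes): the RM's application-management module starts a per-job AM, and the AM then pulls containers from the RM scheduler and launches its tasks on them:

```python
class Scheduler:
    """RM scheduling module: allocates free containers among jobs."""
    def __init__(self, free_containers):
        self.free = list(free_containers)  # (node, memory_mb) slots

    def allocate(self, n):
        # Pull model: hand out up to n free containers on request.
        grant, self.free = self.free[:n], self.free[n:]
        return grant

class ApplicationsManager:
    """RM application-management module: starts the per-job AM."""
    def submit(self, job, scheduler):
        return ApplicationMaster(job, scheduler)

class ApplicationMaster:
    """Takes over management of its own job after being started."""
    def __init__(self, job, scheduler):
        self.job, self.scheduler = job, scheduler

    def run(self):
        containers = self.scheduler.allocate(self.job["tasks"])
        # In real YARN the AM would now ask each container's NM to
        # launch a task process, then monitor completion.
        return [(node, f"{self.job['name']}-task-{i}")
                for i, (node, _mem) in enumerate(containers)]

sched = Scheduler([("node-1", 1024), ("node-2", 1024), ("node-3", 1024)])
am = ApplicationsManager().submit({"name": "wordcount", "tasks": 2}, sched)
launched = am.run()
```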
From the perspective of the AM-RM communication protocol, the resource-scheduling interface is reduced to a list of container specifications, quantities, and locations requested by the AM, which gives it great generality. Of course, because the scheduling module simply allocates resources according to job requirements and priorities, without knowing the details or execution state of any task, information that could otherwise serve as a basis for scheduling is lost. Taking MapReduce as an example, information about map splits is unknown to the scheduling module, so locality and similar requirements must be ensured by the AM itself.
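The request list the AM sends can be sketched as follows. The real YARN protocol's ResourceRequest record carries roughly these fields (priority, location name, capability, number of containers); the dataclass and the example values here are our own illustration:

```python
from dataclasses import dataclass

@dataclass
class ResourceRequest:
    priority: int
    location: str        # a host, a rack, or "*" for anywhere
    memory_mb: int       # requested container capability
    num_containers: int

# The scheduler sees only this list -- nothing about map splits -- so the
# AM must encode locality itself by asking for specific hosts or racks.
requests = [
    ResourceRequest(priority=0, location="node-5",  memory_mb=1024, num_containers=1),
    ResourceRequest(priority=0, location="/rack-2", memory_mb=1024, num_containers=1),
    ResourceRequest(priority=0, location="*",       memory_mb=1024, num_containers=2),
]
total_asked = sum(r.num_containers for r in requests)
```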
=Related Research and Projects=
Mesos addresses a problem, and takes an overall approach, very similar to YARN's: the same two-level, pluggable resource scheduling in which the specific computing framework handles second-level scheduling, similar resource-isolation mechanisms, and similar task-execution methods. However, for first-level resource scheduling, Mesos uses a push model while YARN uses a pull model. Mesos claims this makes its interface simpler and more universal; YARN's pull approach seems more flexible. But from the API perspective, my own understanding is that the AM still needs to learn the global resource state before it can make scheduling requests, which may incur a higher communication cost.
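The push/pull contrast can be illustrated with two toy functions (illustrative only, not either system's API): in the push style the master offers resources and the framework picks from the offers, while in the pull style the framework states a need and the scheduler does the matching:

```python
free = [("node-1", 2048), ("node-2", 1024)]  # (node, free memory_mb)

def push_offer(offers, need_mb):
    """Push (Mesos-style): master offers everything; framework filters."""
    return [o for o in offers if o[1] >= need_mb]

def pull_request(free, need_mb):
    """Pull (YARN-style): framework states a need; scheduler matches."""
    for node, mem in free:
        if mem >= need_mb:
            return (node, mem)
    return None

accepted = push_offer(free, 2048)   # the framework chose from the offers
granted = pull_request(free, 2048)  # the scheduler chose for the framework
```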
Facebook's Corona was also developed for Hadoop. It essentially splits the MapReduce 1.0 JobTracker on a per-job basis, and likewise uses a pull model to request resources from a central Cluster Manager. However, its scope is smaller than YARN's: its purpose appears to be solving the cluster-scale problem through distribution, whereas YARN also aims to flexibly accommodate different computing frameworks.