Apache Hadoop YARN: Background and overview


Apache Hadoop YARN (YARN = Yet Another Resource Negotiator) has been a sub-project of Apache Hadoop since August 2012. Since then, Apache Hadoop has consisted of the following four sub-projects:

    • Hadoop Common: core libraries and services shared by the other parts
    • Hadoop HDFS: distributed storage system
    • Hadoop MapReduce: open-source implementation of the MapReduce model
    • Hadoop YARN: a new generation of Hadoop data processing framework

In summary, the purpose of Hadoop YARN is to take Hadoop's data processing capabilities beyond MapReduce. As is well known, Hadoop HDFS is Hadoop's data storage layer and Hadoop MapReduce is its processing layer. However, MapReduce can no longer meet today's wide range of data processing needs, such as real-time/near-real-time computation and graph processing. Hadoop YARN provides a more general framework for resource management and distributed applications. Within this framework, users can implement customized data processing applications according to their own needs, and Hadoop MapReduce becomes just one application running on YARN. We will see that MPI, graph processing, online services, and so on (for example Spark, Storm, and HBase) can run as YARN applications just like Hadoop MapReduce. The following sections describe the traditional Hadoop MapReduce architecture and the next-generation Hadoop YARN architecture.

The traditional Apache Hadoop MapReduce architecture

The traditional Apache Hadoop MapReduce system consists of a JobTracker and TaskTrackers. The JobTracker is the master, of which there is only one; the TaskTrackers are slaves, one deployed on each node.

Figure 1: Apache Hadoop MapReduce system architecture

The JobTracker is responsible for resource management (by managing the TaskTracker nodes), tracking resource consumption and release, and job lifecycle management (scheduling each task of a job, tracking task progress, providing fault tolerance for tasks, and so on). The TaskTracker's responsibilities are simpler: starting and stopping tasks assigned by the JobTracker, and periodically reporting task progress and status information back to it.
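To make the division of labor concrete, here is a toy model (plain Python, not actual Hadoop code; all class and method names are illustrative) of the JobTracker/TaskTracker split. It shows the coupling that YARN later separates: the single master both tracks cluster resources (slots) and schedules tasks, while slaves only run tasks and send heartbeats.

```python
class TaskTracker:
    """Slave node: runs tasks and reports progress (toy model)."""
    def __init__(self, node_id, slots):
        self.node_id = node_id
        self.free_slots = slots
        self.running = {}          # task_id -> progress in [0.0, 1.0]

    def start_task(self, task_id):
        assert self.free_slots > 0
        self.free_slots -= 1
        self.running[task_id] = 0.0

    def heartbeat(self):
        """Periodic report sent back to the JobTracker."""
        return {"node": self.node_id,
                "free_slots": self.free_slots,
                "progress": dict(self.running)}

class JobTracker:
    """Single master: resource management AND job scheduling (toy model)."""
    def __init__(self):
        self.trackers = {}

    def register(self, tracker):
        self.trackers[tracker.node_id] = tracker

    def schedule(self, task_id):
        # Pick any tracker with a free slot (real Hadoop also weighs
        # data locality, which this sketch omits).
        for t in self.trackers.values():
            if t.free_slots > 0:
                t.start_task(task_id)
                return t.node_id
        return None  # cluster full

jt = JobTracker()
tt = TaskTracker("node-1", slots=2)
jt.register(tt)
assigned = jt.schedule("map-0001")
```

Because one process carries both responsibilities, every heartbeat and every scheduling decision in a large cluster flows through the single JobTracker, which is exactly the pressure point discussed below.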

Apache Hadoop YARN architecture

YARN's most basic idea is to split the JobTracker's two main responsibilities, resource management and job scheduling/management, into two separate roles: a global ResourceManager, and a per-application ApplicationMaster. The ResourceManager, together with a NodeManager on each node, forms a new general-purpose system for managing applications in a distributed manner.

Figure 2: Apache Hadoop YARN architecture

The ResourceManager is the ultimate authority that arbitrates resources among the applications in the system. Each application's ApplicationMaster is responsible for negotiating resources with the ResourceManager and working with the NodeManagers to execute and monitor tasks. The ResourceManager has a pluggable scheduler that allocates resources to individual applications subject to constraints such as capacities and queues. It is a pure scheduler: it neither manages nor tracks application status, nor restarts tasks that fail due to hardware errors or application problems. The scheduler performs scheduling purely according to applications' resource requirements, and the unit it schedules is an abstract resource Container, which bundles resource elements such as memory, CPU, network, and disk.
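The "pure scheduler" idea can be sketched in a few lines. This is an illustrative model, not the real YARN scheduler API: it matches abstract resource Containers (here simplified to memory and vcores) against capacity limits and does nothing else, i.e. no task tracking and no failure handling.

```python
from dataclasses import dataclass

@dataclass
class Container:
    """Abstract resource unit (simplified to memory and vcores)."""
    memory_mb: int
    vcores: int

class PureScheduler:
    """Grants or denies container requests against capacity; nothing more."""
    def __init__(self, total_memory_mb, total_vcores):
        self.free_memory = total_memory_mb
        self.free_vcores = total_vcores

    def allocate(self, request):
        """Grant the container if capacity allows, else return None."""
        if (request.memory_mb <= self.free_memory
                and request.vcores <= self.free_vcores):
            self.free_memory -= request.memory_mb
            self.free_vcores -= request.vcores
            return request
        return None

    def release(self, container):
        """Return a finished container's resources to the pool."""
        self.free_memory += container.memory_mb
        self.free_vcores += container.vcores

sched = PureScheduler(total_memory_mb=8192, total_vcores=4)
granted = sched.allocate(Container(memory_mb=2048, vcores=1))
denied = sched.allocate(Container(memory_mb=16384, vcores=1))
```

Note what is absent: the scheduler never learns whether a granted container's task succeeded. That responsibility belongs to the ApplicationMaster, as described below.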

The NodeManager is the per-node slave. It is responsible for launching the applications' containers, monitoring their resource usage (memory, CPU, network, disk), and reporting overall resource usage to the ResourceManager.

Each application's ApplicationMaster is responsible for negotiating appropriate resource containers from the ResourceManager's scheduler, tracking their status, and monitoring progress. From the system's point of view, the ApplicationMaster itself runs as an ordinary container.
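The negotiation between an ApplicationMaster and the ResourceManager is incremental: the ResourceManager may grant only part of a request in each allocation round, so the ApplicationMaster loops until it has what it needs. The sketch below models that loop with made-up names (this is not the real YARN client API, and `StubResourceManager` is a stand-in that grants at most two containers per round).

```python
def negotiate(resource_manager, wanted, max_rounds=10):
    """ApplicationMaster side: accumulate containers over rounds."""
    granted = []
    for _ in range(max_rounds):
        # Ask only for what is still missing.
        granted.extend(resource_manager.allocate(wanted - len(granted)))
        if len(granted) >= wanted:
            break
    return granted

class StubResourceManager:
    """Stand-in RM that grants at most 2 containers per round."""
    def __init__(self, capacity):
        self.capacity = capacity

    def allocate(self, n):
        grant = min(n, 2, self.capacity)
        self.capacity -= grant
        return ["container"] * grant

rm = StubResourceManager(capacity=5)
containers = negotiate(rm, wanted=5)
```

In real YARN this exchange is piggybacked on the ApplicationMaster's heartbeat to the ResourceManager, and the granted containers are then launched via the NodeManagers.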

Summary

Because of MapReduce's limitations as a computational model, Hadoop implemented YARN as a more general resource management system, with MapReduce as just one application on it. Applications for various computational models can be implemented on YARN to meet business needs. In addition, because YARN splits up the JobTracker's main work, the pressure on the master is greatly reduced (the ResourceManager bears a much smaller workload than the JobTracker did), so a YARN system can support a larger cluster.

Reprint address: http://blog.csdn.net/liangliyin/article/details/20729281


