Apache Hadoop YARN (YARN = Yet Another Resource Negotiator) has been a sub-project of Apache Hadoop since August 2012. Since then, Apache Hadoop has consisted of the following four sub-projects:
- Hadoop Common: core libraries and services for the other modules
- Hadoop HDFS: the distributed storage system
- Hadoop MapReduce: the open-source implementation of the MapReduce model
- Hadoop YARN: the next-generation Hadoop data processing framework
In short, the purpose of Hadoop YARN is to extend Hadoop's data processing capabilities beyond MapReduce. As we all know, Hadoop HDFS is Hadoop's data storage layer, and Hadoop MapReduce is its processing layer. However, MapReduce alone can no longer meet today's wide range of data processing needs, such as real-time/near-real-time computation and graph computation. Hadoop YARN provides a more general framework for resource management and distributed applications, on top of which users can implement custom data processing applications to suit their own needs; Hadoop MapReduce then becomes just one application running on YARN. We will see MPI, graph processing, online services, and so on (for example Spark, Storm, HBase) running as YARN applications alongside Hadoop MapReduce. The following sections describe the traditional Hadoop MapReduce architecture and the next-generation Hadoop YARN architecture.
The traditional Apache Hadoop MapReduce architecture
The traditional Apache Hadoop MapReduce system consists of a JobTracker and TaskTrackers. The JobTracker is the master, and there is only one per cluster; the TaskTrackers are the slaves, with one deployed on each node.
Figure 1: Apache Hadoop MapReduce system architecture
The JobTracker is responsible for resource management (through its management of the TaskTracker nodes), for tracking resource consumption and release, and for job lifecycle management (scheduling each task of a job, tracking task progress, providing fault tolerance for tasks, and so on). The TaskTracker's responsibilities are simple: start and stop tasks as assigned by the JobTracker, and periodically report task progress and status information back to the JobTracker.
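To make this division of labor concrete, here is a minimal sketch of submitting a job in the classic MRv1 style, using the old org.apache.hadoop.mapred API. The driver class name and argument paths are illustrative, and no Mapper or Reducer is set, so the old API's identity defaults apply:

```java
// A minimal MRv1 driver; input/output paths are assumed to come from args.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Mrv1Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Mrv1Driver.class);
        conf.setJobName("mrv1-demo");
        // No Mapper/Reducer is configured, so the identity defaults apply.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // runJob submits the job to the JobTracker and polls it for progress
        // until completion; scheduling onto TaskTrackers and task fault
        // tolerance are entirely the JobTracker's business.
        JobClient.runJob(conf);
    }
}
```

Note that the client talks only to the JobTracker, which is exactly why that single master becomes the bottleneck discussed below.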
The Apache Hadoop YARN architecture
YARN's most basic idea is to split the JobTracker's two main responsibilities, resource management and job scheduling/management, between two separate roles: a global ResourceManager and a per-application ApplicationMaster. Together, the ResourceManager and the NodeManager on each node form a new general-purpose system for managing applications in a distributed manner.
Figure 2: Apache Hadoop YARN architecture
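As a rough illustration of how an application enters this system, the sketch below uses YARN's client API to ask the ResourceManager for a new application and submit it, describing the container that will run the ApplicationMaster. The class name, the /bin/date command, and the 512 MB / 1 vcore figures are all illustrative:

```java
// A minimal client-side submission sketch; names and values are illustrative.
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitToYarn {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");

        // Describe the container that will run the ApplicationMaster.
        // "/bin/date" stands in for a real ApplicationMaster launch command.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                Collections.emptyMap(),  // local resources (jars, files)
                Collections.emptyMap(),  // environment variables
                Collections.singletonList("/bin/date"),
                null, null, null);
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(512, 1)); // 512 MB, 1 vcore

        // From here on, the ResourceManager schedules and tracks the application.
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted " + appId);
    }
}
```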
The ResourceManager is the ultimate authority in the system for arbitrating resources among applications. Each application's ApplicationMaster is responsible for negotiating resources with the ResourceManager and for working with the NodeManagers to execute and manage tasks. The ResourceManager has a pluggable Scheduler that allocates resources to individual applications subject to constraints such as capacities and queues. This Scheduler is a pure scheduler: it is not responsible for managing or tracking application status, nor for restarting tasks that fail due to hardware errors or application bugs. The Scheduler performs scheduling purely according to each application's resource requirements, and the unit it hands out is an abstract Resource Container, which bundles resource elements such as memory, CPU, network, and disk.
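This Resource Container abstraction appears directly in YARN's Java API. A minimal sketch follows (the numbers are arbitrary; the stock Resource record exposes memory and virtual cores as first-class fields, while dimensions such as network and disk belong to the broader resource model rather than to this record):

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerSpec {
    public static void main(String[] args) {
        // A container ask is an abstract Resource capability plus a Priority;
        // the Scheduler matches these against available cluster capacity.
        Resource capability = Resource.newInstance(1024, 2); // 1024 MB, 2 vcores
        Priority priority = Priority.newInstance(0);
        System.out.println("capability=" + capability + ", priority=" + priority);
    }
}
```

The ApplicationMaster sketch further below passes exactly such a capability when requesting containers.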
The NodeManager is the per-node slave. It is responsible for launching the applications' containers, monitoring their resource usage (memory, CPU, network, disk), and reporting overall resource usage to the ResourceManager.
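Assuming a container has already been allocated by the ResourceManager (as in the ApplicationMaster sketch below), an ApplicationMaster talks to that container's NodeManager to actually start work in it. A minimal sketch using the NMClient API, with /bin/date standing in for a real task command:

```java
// Launching work inside an already-allocated container; illustrative command.
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LaunchOnNode {
    // 'container' must come from a prior ResourceManager allocation
    // (see the ApplicationMaster sketch below).
    static void launch(Container container) throws Exception {
        Configuration conf = new YarnConfiguration();
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Tell the container's NodeManager what process to run inside it;
        // the NodeManager enforces the container's resource limits.
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                Collections.emptyMap(), Collections.emptyMap(),
                Collections.singletonList("/bin/date"),
                null, null, null);
        nmClient.startContainer(container, ctx);
    }
}
```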
Each application's ApplicationMaster is responsible for negotiating appropriate Resource Containers from the ResourceManager's Scheduler, tracking their status, and monitoring their progress. From the system's point of view, the ApplicationMaster itself runs as an ordinary container.
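A bare-bones ApplicationMaster might look like the following sketch, using the AMRMClient API: register with the ResourceManager, request a container, and heartbeat via allocate(). The empty host/tracking-URL strings and the 1 GB / 1 vcore capability are illustrative:

```java
// A bare-bones ApplicationMaster; host/port/URL and sizes are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();

        // Register with the ResourceManager. The AM itself is already running
        // inside an ordinary container that the ResourceManager launched.
        rmClient.registerApplicationMaster("", 0, "");

        // Negotiate one container: 1 GB of memory, 1 virtual core.
        Resource capability = Resource.newInstance(1024, 1);
        rmClient.addContainerRequest(
                new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // allocate() doubles as the heartbeat; its argument reports progress.
        AllocateResponse response = rmClient.allocate(0.0f);
        for (Container allocated : response.getAllocatedContainers()) {
            System.out.println("Got container " + allocated.getId());
        }

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```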
Summary
Because of the limitations of MapReduce as a computational model, Hadoop implemented YARN, a more general resource management system, and turned MapReduce into just one application on top of it. Applications for all kinds of computational models can be implemented on YARN to meet business needs. In addition, YARN splits up the JobTracker's main work, greatly reducing the pressure on the master (the ResourceManager bears a much smaller workload than the JobTracker did), so a YARN system can support a larger cluster size.
Reprinted from: http://blog.csdn.net/liangliyin/article/details/20729281