YARN is essentially a new operating system for Hadoop that breaks through the performance bottlenecks of the MapReduce framework. With YARN managing cluster resource requests, Hadoop moves from being a single-application system to a multi-application operating system.
The application types it supports include machine learning, image analysis, streaming analysis, and interactive queries. Once YARN is fully in place, developers can use this "operating system" to run such applications against the data stored in HDFS, going beyond the MapReduce framework to include a graph processing framework (Apache Giraph), a distributed computing framework based on the BSP model (Apache Hama), a high-performance computing library (Open MPI), HBase, and others. YARN is a true Hadoop resource manager that allows multiple applications to run simultaneously and efficiently on one cluster. With YARN, Hadoop becomes a genuinely multi-application platform that can serve the entire enterprise.
The following is the YARN architecture diagram:
The ResourceManager (which takes over the resource management role of the JobTracker in earlier versions) is responsible for resource management and assigns resources to individual applications, much like the resource management of an operating system such as Windows. It has two main components: the Scheduler and the ApplicationsManager.
The Scheduler is responsible for allocating resources to applications. It is a pure scheduler: it does not monitor or track application state, and it is not responsible for restarting failed tasks. It allocates resources according to priority and according to each application's resource requirements, which are expressed through the abstract notion of a resource container (covering memory, CPU, disk, network, and so on). If relax_locality is set to true (the default), matching resources are searched for in this order: first the requested node, then other nodes on the same rack, then nodes on other racks. If it is set to false, resources can be allocated only on the requested node.
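The locality fallback described above can be sketched in a few lines of Python. This is only an illustrative model, not YARN code: the Node class, the pick_node function and the memory-only capacity check are all assumptions made for the example, and a real scheduler also weighs queues, priorities and scheduling delays.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    rack: str
    free_memory_mb: int


def pick_node(nodes: List[Node], preferred: str, memory_mb: int,
              relax_locality: bool = True) -> Optional[Node]:
    """Return the node that should host the container, or None if nothing fits."""
    target = next((n for n in nodes if n.name == preferred), None)
    if target is None:
        return None  # unknown node name in this toy model

    def fits(node: Node) -> bool:
        return node.free_memory_mb >= memory_mb

    # 1. Node-local: the requested node itself.
    if fits(target):
        return target
    # With relax_locality off, no other node may be used.
    if not relax_locality:
        return None
    # 2. Rack-local: another node on the same rack.
    for node in nodes:
        if node.name != preferred and node.rack == target.rack and fits(node):
            return node
    # 3. Off-rack: any node on a different rack.
    for node in nodes:
        if node.rack != target.rack and fits(node):
            return node
    return None


if __name__ == "__main__":
    cluster = [Node("n1", "r1", 512), Node("n2", "r1", 4096), Node("n3", "r2", 8192)]
    print(pick_node(cluster, "n1", 1024).name)                    # n2 (same rack)
    print(pick_node(cluster, "n1", 6000).name)                    # n3 (other rack)
    print(pick_node(cluster, "n1", 1024, relax_locality=False))   # None
```

With relax_locality left at its default, a request that does not fit on n1 falls through to the rack and then to the rest of the cluster; with it disabled, the same request simply stays unsatisfied.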
The ApplicationsManager is responsible for accepting job submissions, negotiating the first container in which each application's own ApplicationMaster (AM) runs, and restarting the AM if it fails.
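The restart responsibility can be pictured as a bounded retry loop. The sketch below is a simplified assumption about how such a loop might look, not the ResourceManager's actual logic: run_with_restarts and launch_am are hypothetical names, and the max_attempts parameter only stands in for the kind of limit a real cluster sets through yarn.resourcemanager.am.max-attempts.

```python
from typing import Callable


def run_with_restarts(launch_am: Callable[[int], bool], max_attempts: int = 2) -> int:
    """Launch the AM, relaunching it after each failure.

    launch_am receives the attempt number and returns True on success.
    Returns the attempt number that succeeded; raises RuntimeError once
    max_attempts launches have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if launch_am(attempt):
            return attempt
    raise RuntimeError(f"ApplicationMaster failed after {max_attempts} attempts")


if __name__ == "__main__":
    # Hypothetical AM that crashes on its first attempt and succeeds on the second.
    print(run_with_restarts(lambda attempt: attempt >= 2))  # 2
```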
The AM is in effect a framework-specific library that negotiates resources from the ResourceManager and works with the NodeManagers to run and monitor tasks. It is responsible for requesting appropriate resource containers from the Scheduler, tracking their status, and monitoring their progress.
The NodeManager (the counterpart of the TaskTracker in earlier versions) is the framework agent on each node. It is primarily responsible for launching the containers an application needs, monitoring resource usage inside those containers, and reporting it to the Scheduler.
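To make the division of labour concrete, here is a toy Python model of the interaction between the three roles. Every class and method name in it is invented for illustration and none of it is the YARN API; it assumes memory is the only resource and that containers never finish.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Container:
    container_id: int
    node: str
    memory_mb: int
    used_mb: int = 0


@dataclass
class NodeManager:
    """Per-node agent: starts containers and reports their resource usage."""
    name: str
    capacity_mb: int
    containers: Dict[int, Container] = field(default_factory=dict)

    def start_container(self, container: Container) -> None:
        self.containers[container.container_id] = container

    def heartbeat(self) -> Dict[int, int]:
        # Usage report: container id -> memory currently used.
        return {cid: c.used_mb for cid, c in self.containers.items()}


class Scheduler:
    """Pure allocator: hands out containers, does no monitoring or restarts."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.free_mb = {nm.name: nm.capacity_mb for nm in nodes}
        self.next_id = 1

    def allocate(self, memory_mb: int) -> Optional[Container]:
        for name, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[name] = free - memory_mb
                container = Container(self.next_id, name, memory_mb)
                self.next_id += 1
                return container
        return None  # cluster is full


class ApplicationMaster:
    """Requests containers from the Scheduler and tracks the ones it got."""

    def __init__(self, scheduler: Scheduler, nodes: List[NodeManager]) -> None:
        self.scheduler = scheduler
        self.nodes = {nm.name: nm for nm in nodes}
        self.running: List[Container] = []

    def run_tasks(self, task_count: int, memory_mb: int) -> int:
        for _ in range(task_count):
            container = self.scheduler.allocate(memory_mb)
            if container is None:
                break
            self.nodes[container.node].start_container(container)
            self.running.append(container)
        return len(self.running)


if __name__ == "__main__":
    nm1, nm2 = NodeManager("nm1", 2048), NodeManager("nm2", 1024)
    am = ApplicationMaster(Scheduler([nm1, nm2]), [nm1, nm2])
    print(am.run_tasks(4, 1024))  # 3: only three 1024 MB containers fit
    print(nm1.heartbeat())        # {1: 0, 2: 0}
```

The Scheduler here only decides where a container goes; starting it and reporting on it is left to the NodeManager, and keeping track of the application's containers is left to the ApplicationMaster, mirroring the split described above.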