A Little Understanding of YARN in Hadoop

Source: Internet
Author: User

YARN is essentially a new operating system for Hadoop that breaks through the performance bottlenecks of the MapReduce framework. By using YARN to manage cluster resource requests, Hadoop evolves from a single-application system into a multi-application operating system.

Supported application types include machine learning, image analysis, streaming analysis, and interactive queries. Once YARN is fully operational, developers can use the YARN "operating system" to run these applications against data stored in HDFS, using frameworks beyond MapReduce, including a graph-processing framework (Apache Giraph), a distributed computing framework based on the BSP model (Apache Hama), a high-performance computing library (Open MPI), HBase, and so on. YARN is a true Hadoop resource manager that allows multiple applications to run simultaneously and efficiently on one cluster. With YARN, Hadoop becomes a truly multi-application platform that can serve the entire enterprise.

The following is a YARN architecture diagram:

[Figure: YARN architecture diagram]

The ResourceManager (the counterpart of the JobTracker in earlier versions) is responsible for resource management: it allocates resources to individual applications, much like the resource manager of the Windows operating system. It has two main components, the Scheduler and the ApplicationsManager.

The Scheduler allocates resources to applications. It is a pure scheduler: it does not monitor or track application state, and it is not responsible for restarting failed tasks. It allocates resources according to each application's resource requirements, which are expressed through an abstract resource concept called the container (encapsulating memory, CPU, disk, network, and so on), and according to priority. If relax_locality is set to true (the default), matching resources are searched in this order: the requested node, then other nodes on the same rack, then other racks. If it is set to false, resources can be allocated only on the requested node.
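The priority ordering and the node, rack, off-rack search described above can be sketched as a toy allocator. This is a simplified illustration, not YARN's scheduler code: the `Node`, `Request`, `fits`, `candidate_nodes`, and `allocate` names are hypothetical, a container is reduced to memory and vcores only, and "smaller priority number is served first" is a convention chosen for this sketch. Real YARN schedulers (Capacity Scheduler, Fair Scheduler) add queues, delay scheduling, and much more.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    rack: str
    free_mb: int      # unallocated memory on this node
    free_vcores: int  # unallocated virtual cores on this node


@dataclass
class Request:
    app: str
    priority: int     # sketch convention: a smaller number is served first
    mem_mb: int
    vcores: int
    node: str         # preferred node for the container
    relax_locality: bool = True


def fits(node: Node, req: Request) -> bool:
    """True if the node still has room for the requested container."""
    return node.free_mb >= req.mem_mb and node.free_vcores >= req.vcores


def candidate_nodes(nodes: List[Node], req: Request) -> List[Node]:
    """Order nodes by locality: preferred node, same rack, then other racks."""
    # Assumes the preferred node exists; next() raises StopIteration otherwise.
    preferred = next(n for n in nodes if n.name == req.node)
    if not req.relax_locality:
        return [preferred]  # strict locality: only the requested node
    same_rack = [n for n in nodes if n.rack == preferred.rack and n is not preferred]
    other_racks = [n for n in nodes if n.rack != preferred.rack]
    return [preferred] + same_rack + other_racks


def allocate(nodes: List[Node], requests: List[Request]) -> List[Tuple[str, Optional[str]]]:
    """Serve requests in priority order; return (app, node name or None) pairs."""
    placements = []
    for req in sorted(requests, key=lambda r: r.priority):
        chosen = None
        for node in candidate_nodes(nodes, req):
            if fits(node, req):
                node.free_mb -= req.mem_mb     # reserve the container's memory
                node.free_vcores -= req.vcores  # and its vcores on that node
                chosen = node.name
                break
        placements.append((req.app, chosen))  # None means "not placed this round"
    return placements
```

For example, with n1 (4096 MB) and n2 (8192 MB) on rack r1 and n3 on rack r2, a 6144 MB request preferring n1 falls back to n2 when relax_locality is true, but stays unplaced when it is false.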

The ApplicationsManager accepts job submissions, assigns each application to its own ApplicationMaster (AM), and is responsible for restarting the AM if it fails.
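The submit-then-restart behavior can be illustrated with a small simulation. Everything here is hypothetical: the `ApplicationsManager` class, the `launch_am` callback that stands in for starting an AM container, and the `application_0001`-style IDs are inventions for this sketch. In real Hadoop the number of AM attempts is bounded by the `yarn.resourcemanager.am.max-attempts` setting; the `max_attempts` parameter below only mimics that idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AppRecord:
    app_id: str
    attempts: List[int] = field(default_factory=list)  # attempt numbers tried so far
    state: str = "SUBMITTED"


class ApplicationsManager:
    """Toy model: accept submissions and restart a failed AM up to a limit."""

    def __init__(self, max_attempts: int = 2):
        self.max_attempts = max_attempts
        self.apps: Dict[str, AppRecord] = {}
        self._next_id = 1

    def submit(self) -> str:
        # Hypothetical ID format; real YARN IDs also embed a cluster timestamp.
        app_id = f"application_{self._next_id:04d}"
        self._next_id += 1
        self.apps[app_id] = AppRecord(app_id)
        return app_id

    def run(self, app_id: str, launch_am: Callable[[int], bool]) -> str:
        """Launch the AM; on failure, restart it until max_attempts is used up."""
        record = self.apps[app_id]
        for attempt in range(1, self.max_attempts + 1):
            record.attempts.append(attempt)
            if launch_am(attempt):  # True means this AM attempt succeeded
                record.state = "FINISHED"
                return record.state
        record.state = "FAILED"
        return record.state
```

Keeping the restart loop in the ApplicationsManager rather than in the Scheduler mirrors the split described above: the Scheduler only hands out resources, while AM lifecycle handling lives elsewhere.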

The AM is in effect a framework-specific library: it negotiates resources from the ResourceManager and works with the NodeManagers to run and monitor tasks. It requests appropriate resource containers from the Scheduler, tracks their status, and monitors their progress.
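The AM's request, track, and monitor cycle can be sketched as a loop over heartbeats. This is an illustrative model only: `ApplicationMaster`, `heartbeat`, `on_container_completed`, and the `rm_grant_limit` stand-in for the ResourceManager's answer are hypothetical names, and each task is assumed to need exactly one container. A real Java AM would typically talk to the ResourceManager through `AMRMClient`, whose `allocate(float progressIndicator)` call also carries the progress value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    task_id: str
    state: str = "PENDING"  # PENDING -> RUNNING -> DONE


class ApplicationMaster:
    """Toy AM: ask for containers for pending tasks, track them, report progress."""

    def __init__(self, task_ids: List[str]):
        self.tasks: Dict[str, Task] = {t: Task(t) for t in task_ids}

    def pending(self) -> List[Task]:
        return [t for t in self.tasks.values() if t.state == "PENDING"]

    def heartbeat(self, rm_grant_limit: int) -> int:
        """One round: request one container per pending task; return how many were asked."""
        pending = self.pending()
        asked = len(pending)
        granted = min(asked, rm_grant_limit)  # stand-in for the Scheduler's reply
        for task in pending[:granted]:
            task.state = "RUNNING"  # the task would be launched via a NodeManager here
        return asked

    def on_container_completed(self, task_id: str) -> None:
        self.tasks[task_id].state = "DONE"

    def progress(self) -> float:
        """Fraction of tasks finished, as an AM might report it."""
        done = sum(1 for t in self.tasks.values() if t.state == "DONE")
        return done / len(self.tasks)
```

Tasks that are not granted a container in one round simply stay pending and are requested again on the next heartbeat.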

The NodeManager (the counterpart of the TaskTracker in earlier versions) is the framework agent on each node. It launches the containers that applications need, monitors the containers' resource usage, and reports it to the Scheduler.
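A NodeManager's launch, monitor, and report duties can be modeled in a few lines. This sketch is hypothetical throughout: the `NodeManager` class, its memory-only capacity check, and the shape of the `heartbeat_report` dictionary are assumptions for illustration, not the real NodeManager protocol. The real NodeManager can also kill containers that exceed their memory limits; here over-limit containers are only flagged in the report.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Container:
    container_id: str
    limit_mb: int     # memory the Scheduler granted to this container
    used_mb: int = 0  # latest observed usage


class NodeManager:
    """Toy per-node agent: launch containers, track usage, build a report."""

    def __init__(self, node_id: str, capacity_mb: int):
        self.node_id = node_id
        self.capacity_mb = capacity_mb
        self.containers: Dict[str, Container] = {}

    def launch(self, container_id: str, limit_mb: int) -> bool:
        """Start a container if its memory limit still fits on this node."""
        allocated = sum(c.limit_mb for c in self.containers.values())
        if allocated + limit_mb > self.capacity_mb:
            return False
        self.containers[container_id] = Container(container_id, limit_mb)
        return True

    def record_usage(self, container_id: str, used_mb: int) -> None:
        """Store one monitoring sample for a running container."""
        self.containers[container_id].used_mb = used_mb

    def heartbeat_report(self) -> dict:
        """Summarize usage and flag containers that exceed their limits."""
        used = sum(c.used_mb for c in self.containers.values())
        over = sorted(c.container_id for c in self.containers.values()
                      if c.used_mb > c.limit_mb)
        return {"node": self.node_id, "containers": len(self.containers),
                "used_mb": used, "over_limit": over}
```

Checking against granted limits (rather than observed usage) in `launch` keeps the node from being overcommitted by containers that have not yet reached their peak usage.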
