Analysis of YARN, the Second-Generation MapReduce Architecture

Tags: resource, shuffle, hadoop ecosystem

Background

I recently began studying YARN, the next-generation resource management system. Hadoop 2.0 is composed of three main parts: MapReduce, YARN, and HDFS. HDFS mainly gains HDFS Federation and HDFS HA; MapReduce becomes a programming model that runs on YARN; and YARN itself is a unified resource management system. YARN can be thought of as the cloud operating system of the Hadoop ecosystem: a variety of computing frameworks can run on it, such as traditional MapReduce, Giraph graph computation, Spark iterative computation, and the Storm real-time stream computation model. Introducing YARN can greatly improve cluster resource utilization, reduce operations and maintenance costs, and let frameworks share the underlying data. Hadoop 1.0 has had six years of development and is stable enough, but in the two years since YARN appeared it has shown even greater advantages, and the major Internet companies are all moving toward it; YARN is surely the future.

The architecture evolution from Hadoop 1.0 to 2.0:

YARN Stack:

Requirements

When redesigning the Hadoop MapReduce framework, the most important requirements include:
1. Reliability, chiefly of the JobTracker / ResourceManager
2. Availability
3. Scalability: support for clusters of 10,000 to 20,000 nodes
4. Backward compatibility: existing MapReduce applications must run on the new framework without modification
5. Evolution: let users upgrade the pieces of the software stack (Hive, Pig, HBase, Storm, etc.) independently while staying compatible
6. Predictable latency
7. Cluster utilization
Other requirements include:
1. Support for programming models other than MapReduce, such as graph computation and stream computation
2. Support for short-lived services
Given these requirements, it was clear that the Hadoop architecture needed rethinking: the existing MapReduce framework could not meet future needs, and a two-tier scheduler was required.

Next Generation MapReduce (YARN)
MRv2 splits the JobTracker's two most important functions: resource management and job scheduling/monitoring. There is one global ResourceManager (RM) and one ApplicationMaster (AM) per application, where an application can be a single MapReduce job or a DAG of jobs. The ResourceManager together with the NodeManager on each slave node forms the computation framework: the RM has ultimate authority over resource allocation for all applications, while the AM is a framework-specific library that negotiates resources with the RM and works with the NodeManagers to execute and monitor tasks.

The ResourceManager has two components:

1. Scheduler

2. ApplicationsManager (ASM)

MRv2 introduces a new concept, the Resource Container, which bundles CPU, memory, disk, and network. It differs from the first generation's map slots and reduce slots: a slot partitions a node's resources only at a coarse granularity, so with N slots each slot is 1/N of the whole machine, whereas with containers an application can dynamically request exactly the resources it needs.
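As a minimal sketch of what a container capability looks like to an application, using the Resource record from the YARN client API (Hadoop 2.x; treat the exact factory method as an assumption for other versions):

    import org.apache.hadoop.yarn.api.records.Resource;

    // A container capability: 2048 MB of memory and 2 virtual cores.
    // Unlike a slot, this is sized per request rather than fixed at 1/N of the node.
    Resource capability = Resource.newInstance(2048, 2);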

The Scheduler is pluggable and is responsible for allocating cluster resources; the CapacityScheduler and the FairScheduler are currently available.
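For illustration, the scheduler is selected by a single ResourceManager property (names as in Hadoop 2.x, normally set in yarn-site.xml; shown here through the Java Configuration API, to be verified against your version):

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Pluggable scheduling: swap in the FairScheduler in place of the default.
    conf.set("yarn.resourcemanager.scheduler.class",
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");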

The ApplicationsManager is responsible for accepting job submissions, negotiating the first container in which to run the ApplicationMaster, and restarting the AM on failure.


The NodeManager is the daemon on each slave node. It launches the application's containers, monitors their resource usage (CPU, memory, disk, network), and reports it to the Scheduler.

The ApplicationMaster obtains the containers it is entitled to from the Scheduler and tracks their status and progress.

YARN v1.0

YARN 1.0 considers only memory. Each node offers multiples of a minimum memory size (such as 512 MB or 1 GB), and an ApplicationMaster can request multiples of that minimum size. (Note: YARN now supports isolation of both CPU and memory; memory is enforced by thread-based monitoring, CPU by cgroups.)
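A sketch of the rounding this implies, with the Hadoop 2.x property name noted in a comment (the arithmetic below illustrates the idea and is not RM source code):

    // yarn.scheduler.minimum-allocation-mb = 1024 (an example value)
    int minAllocMb = 1024;
    int requestedMb = 1500;
    // The scheduler hands out memory in multiples of the minimum allocation,
    // so a 1500 MB request is rounded up to 2 * 1024 = 2048 MB.
    int grantedMb = ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;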

The AM is responsible for computing the application's resource requirements (for example, from the input splits) and translating them into a protocol the Scheduler understands, such as <priority, (host, rack, *), memory, #containers>.
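That tuple maps naturally onto a ResourceRequest in the YARN records API (Hadoop 2.x signatures, to be treated as assumptions for other versions):

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.api.records.ResourceRequest;

    // <priority, host|rack|*, memory, #containers>:
    // ask for 3 containers of 1 GB anywhere in the cluster ("*").
    ResourceRequest ask = ResourceRequest.newInstance(
        Priority.newInstance(1),        // priority
        ResourceRequest.ANY,            // resource name: a host, a rack, or "*"
        Resource.newInstance(1024, 1),  // capability
        3);                             // number of containers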

For example, in MapReduce the AM takes the input splits and presents the RM Scheduler with an inverted table keyed on host address, bounded in size, along with the number of containers required.

The Scheduler tries to match the requested host; if the specified host has no resources to offer, it falls back to a node on the same rack, and then to a different rack. The AM may accept or reject the resources it is offered.

Scheduler

There is only one API between the Scheduler and the AM:

Response allocate(List<ResourceRequest> ask, List<Container> release)

The AM uses a list of ResourceRequests to ask for resources and to release previously allocated containers it no longer needs.

The returned Response contains a list of newly allocated containers, the states of the containers that have completed since the AM and RM last communicated, and the amount of resources currently available to the application (the headroom). The AM uses the completion information to respond to failed tasks, and the headroom to adjust its policy for subsequent resource requests, for example rebalancing the numbers of maps and reduces to prevent deadlock (all resources held by maps while the reduces starve).
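A minimal AM heartbeat loop around this API, sketched with the AMRMClient convenience library (org.apache.hadoop.yarn.client.api in Hadoop 2.x; method names are assumptions to check against your version, and conf, done, and progress are assumed to be defined elsewhere):

    import java.util.List;
    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerStatus;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;

    AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");

    while (!done) {
        // A single allocate() call both asks for new resources and releases
        // containers the AM no longer wants.
        AllocateResponse response = rmClient.allocate(progress);
        List<Container> granted = response.getAllocatedContainers();
        List<ContainerStatus> completed = response.getCompletedContainersStatuses();
        Resource headroom = response.getAvailableResources();
        // React to failures via `completed`; use `headroom` to rebalance future
        // map/reduce requests and avoid the all-maps-no-reduces deadlock.
    }

    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");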

Resource Monitoring

The Scheduler periodically obtains the resource usage of the allocated containers from the NodeManagers, and can then make the resources of completed containers available to AMs again.
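The resources an NM advertises and enforces are configurable; a hedged illustration with Hadoop 2.x property names (normally set in yarn-site.xml):

    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    YarnConfiguration conf = new YarnConfiguration();
    // Total resources this NodeManager advertises to the Scheduler.
    conf.setInt("yarn.nodemanager.resource.memory-mb", 8192);
    conf.setInt("yarn.nodemanager.resource.cpu-vcores", 8);
    // Kill containers whose physical memory use exceeds their allocation.
    conf.setBoolean("yarn.nodemanager.pmem-check-enabled", true);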

Application Submission

An application is submitted as follows (a client-side sketch follows the list):

1. The user (usually on a gateway machine) submits the job to the ASM:

1) The client first obtains an ApplicationId

2) It packages the application description and uploads it to HDFS at ${user}/.staging/${application_id}

3) It submits the application to the ASM

2. The ASM accepts the application submission

3. The ASM negotiates with the Scheduler to obtain the first container and starts the AM in it

4. At the same time, the ASM gives the AM's details to the client so that it can monitor progress
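These steps correspond roughly to the following client-side sketch using the YarnClient API (Hadoop 2.x; the upload of the application package and the AM launch context are elided, and method names should be treated as assumptions):

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Step 1.1: obtain an ApplicationId from the RM.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
    ApplicationId appId = context.getApplicationId();

    // Step 1.2 (elided): package the application description, upload it to
    // HDFS, and fill in the ContainerLaunchContext that starts the AM.

    // Step 1.3: submit to the ASM; steps 2 and 3 then happen inside the RM.
    yarnClient.submitApplication(context);

    // Step 4: poll the report so the client can monitor progress.
    ApplicationReport report = yarnClient.getApplicationReport(appId);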

The Life Cycle of the ApplicationMaster

The ASM manages the life cycle of the AM: it starts the AM and then monitors it. The AM heartbeats to the ASM periodically to show that it is still alive, and the ASM restarts it on failure.

ApplicationsManager Components

1. The SchedulerNegotiator coordinates with the Scheduler to obtain the container in which the AM is started

2. The AMContainerManager starts and stops the AM's container by communicating with the appropriate NM

3. The AMMonitor manages the AM's liveness and restarts the AM if necessary

Availability

The ResourceManager saves its own state in ZooKeeper, which also enables HA; with the ZooKeeper-based state-saving policy, the RM can be restarted quickly.
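A hedged sketch of the ZooKeeper-backed state store configuration (Hadoop 2.x property names, normally placed in yarn-site.xml; shown through the Java API for illustration, with placeholder hostnames):

    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    YarnConfiguration conf = new YarnConfiguration();
    // Persist RM state so that a restarted RM can recover running applications.
    conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");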

NodeManager

Once the Scheduler allocates containers to an application, the NM is responsible for starting them; it also guarantees that the allocated containers do not exceed the machine's total resources.

The NM is also responsible for setting up the environment when a task starts, including its binaries, JAR packages, and so on.

The NM also provides a service that manages the local node's storage. For example, a MapReduce application uses the shuffle service to store temporary map outputs on local disk and shuffle them to the reduce tasks.
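The shuffle service is wired in as a NodeManager auxiliary service; a hedged sketch of the usual Hadoop 2.x settings (normally placed in yarn-site.xml):

    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    YarnConfiguration conf = new YarnConfiguration();
    // Register the MapReduce shuffle handler as an NM auxiliary service so it
    // can serve map outputs from local disk to the reduce tasks.
    conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
    conf.set("yarn.nodemanager.aux-services.mapreduce_shuffle.class",
        "org.apache.hadoop.mapred.ShuffleHandler");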

ApplicationMaster

The AM is responsible for negotiating resources with the Scheduler, executing and monitoring tasks through the NMs, and requesting replacement resources from the Scheduler when a container fails.
