Background
I have recently started researching YARN, the next-generation resource management system. Hadoop 2.0 consists mainly of three parts: MapReduce, YARN, and HDFS. On the HDFS side the main additions are HDFS Federation and HDFS HA; MapReduce becomes a programming model that runs on YARN; and YARN itself is a unified resource management system. YARN can be thought of as the cloud operating system of the Hadoop ecosystem: a variety of computation frameworks can run on top of it, such as traditional MapReduce, Giraph for graph computation, Spark for iterative computation, and Storm for real-time stream computation. Introducing YARN greatly improves cluster resource utilization, reduces operations cost, and lets frameworks share the underlying data. Hadoop 1.0 has had six years of development and is stable enough, but in the two years since YARN appeared it has shown clear advantages, and the major Internet companies are moving toward it. YARN is certainly the future.
The architecture evolution from Hadoop 1.0 to 2.0:
YARN Stack:
Requirements
When we consider the Hadoop map-reduce framework, the most important requirements include:
1. Reliability: above all, reliability of the JobTracker / ResourceManager
2. Availability
3. Scalability: support clusters of 10,000 to 20,000 nodes
4. Backward compatibility: MapReduce applications already written can run on the new framework without modification
5. Evolution: allow users to upgrade the software stack (Hive, Pig, HBase, Storm, etc.) while staying compatible
6. Predictable latency
7. Cluster utilization
Other requirements include:
1. Support for programming models other than map-reduce, such as graph computation and stream computation
2. Support for short-lived services
Based on the above requirements, it is clear that the Hadoop architecture needs rethinking: the current MapReduce framework can hardly meet these future needs, and what is needed is a two-tier scheduler.
Next Generation MapReduce (YARN)
MRv2 splits the two most important functions of the JobTracker: resource management and job scheduling/monitoring. There is one global ResourceManager (RM) and one ApplicationMaster (AM) per application, where an application can be a single MapReduce job or a DAG of jobs. The ResourceManager and the NodeManager on each slave node together form the computation framework. The RM has ultimate authority over resource allocation across all applications, while the AM is a framework-specific library that negotiates resources with the RM and works with the NodeManagers to execute and monitor tasks.
The ResourceManager has two components:
1. Scheduler
2. ApplicationsManager (ASM)
MRv2 introduces a new concept, the Resource Container, which bundles CPU, memory, disk, and network. This differs from the first generation's map slots and reduce slots: slots partition a node's resources only at a coarse granularity (with N slots, each slot is 1/N of the machine's resources), whereas with containers an application can dynamically request resources according to its actual needs.
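To illustrate the difference, here is a minimal sketch, using the AMRMClient helper API from later Hadoop 2.x releases, of how an application describes a right-sized container request instead of occupying a fixed slot (the memory and core numbers are illustrative assumptions):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerRequestSketch {
    public static ContainerRequest buildRequest() {
        // Ask for a container sized to the work, not a fixed slot:
        // 2048 MB of memory and 1 virtual core (numbers are illustrative).
        Resource capability = Resource.newInstance(2048, 1);
        Priority priority = Priority.newInstance(0);
        // Null node/rack lists mean "anywhere in the cluster".
        return new ContainerRequest(capability, null, null, priority);
    }
}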
The Scheduler is pluggable and is responsible for allocating cluster resources; currently the CapacityScheduler and the FairScheduler are supported.
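As a minimal sketch, and assuming the yarn.resourcemanager.scheduler.class property name (in practice this is set in yarn-site.xml rather than in code), switching the pluggable scheduler looks roughly like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerConfigSketch {
    public static Configuration withCapacityScheduler() {
        Configuration conf = new YarnConfiguration();
        // Normally configured in yarn-site.xml; shown in code only for illustration.
        conf.set("yarn.resourcemanager.scheduler.class",
            "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
        return conf;
    }
}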
The ApplicationsManager is responsible for accepting job submissions, negotiating the first container in which to run the ApplicationMaster, and restarting the AM container when it fails.
The NodeManager is the daemon on each slave node; it is responsible for launching application containers, monitoring their resource usage (CPU, memory, disk, network), and reporting it to the Scheduler.
The ApplicationMaster obtains the appropriate containers from the Scheduler and tracks their status and progress.
YARN v1.0
YARN 1.0 considers only memory. Each node offers multiples of a minimum memory size (such as 512 MB or 1 GB), and the ApplicationMaster requests containers as multiples of that minimum size. (Note: YARN now supports isolation of both CPU and memory, using thread-based monitoring for memory and cgroups for CPU.)
The AM is responsible for computing the application's resource requirements (for example from its input splits) and translating them into a protocol the Scheduler understands, such as <priority, host, rack, *, memory, #containers>.
For example, in MapReduce, once the AM has obtained the input splits, it presents to the RM Scheduler an inverted table keyed by host address together with a bound on the number of containers.
The Scheduler tries to match the requested host; if the specified host cannot provide resources, it offers resources on the same rack, or on a different rack. The AM can accept or reject these resources.
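A minimal sketch of what such a request could look like with the ResourceRequest record from the public YARN API, expressing the node-local / rack-local / anywhere fallback for a single split (the priority and container size are illustrative assumptions):

import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class SplitRequestSketch {
    // One request per locality level for the same split:
    // <priority, host, rack, *, memory, #containers>.
    public static List<ResourceRequest> forSplit(String host, String rack) {
        Priority mapPriority = Priority.newInstance(20);  // illustrative priority
        Resource oneGb = Resource.newInstance(1024, 1);   // illustrative container size
        return Arrays.asList(
            ResourceRequest.newInstance(mapPriority, host, oneGb, 1),               // node-local
            ResourceRequest.newInstance(mapPriority, rack, oneGb, 1),               // rack-local
            ResourceRequest.newInstance(mapPriority, ResourceRequest.ANY, oneGb, 1) // anywhere ("*")
        );
    }
}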
Scheduler
There is only one API between the Scheduler and the AM:
Response allocate(List<ResourceRequest> ask, List<Container> release)
The AM uses a list of ResourceRequests to ask for resources, and a list of Containers to release previously allocated containers it no longer needs.
The returned response contains a list of newly allocated containers, the statuses of containers that have completed since the last AM-RM communication, and the amount of resources still available to the application in the cluster. The AM uses this information to respond to failed tasks, and it can use the headroom information to adjust the policy for subsequent resource requests, for example rebalancing the number of maps and reduces to prevent deadlock (all containers occupied by maps while the reduces starve).
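A rough sketch of this request/heartbeat loop, assuming the AMRMClient convenience library from later Hadoop 2.x rather than the raw protocol (the container size and the exit condition are illustrative):

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmAllocateLoopSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();

        // Register with the RM so the ASM knows this AM is alive.
        rm.registerApplicationMaster("", 0, "");

        // Ask for one container anywhere in the cluster (size is illustrative).
        rm.addContainerRequest(new ContainerRequest(
                Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

        boolean done = false;
        while (!done) {
            // allocate() doubles as the periodic heartbeat to the RM.
            AllocateResponse response = rm.allocate(0.0f);

            for (Container c : response.getAllocatedContainers()) {
                // Newly granted containers: launch tasks on them via the NM (omitted here).
            }
            for (ContainerStatus status : response.getCompletedContainersStatuses()) {
                // Containers finished since the last heartbeat: react to failures here.
                done = true; // illustrative exit condition
            }
            // Headroom left for this application, useful for adjusting further requests.
            System.out.println("Headroom: " + response.getAvailableResources());

            Thread.sleep(1000);
        }

        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
        rm.stop();
    }
}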
Resource Monitoring
The Scheduler periodically obtains the resource usage of allocated containers from the NodeManagers, and can then mark containers as available again to the AM.
Application Submission
The application submission process is as follows (a client-side sketch follows the list):
1. The user (usually on a gateway machine) submits the job to the ASM:
1) The client first obtains an ApplicationId.
2) The application description is packaged and uploaded to HDFS under ${user}/.staging/${application_id}.
3) The application is submitted to the ASM.
2. The ASM accepts the application submission.
3. The ASM negotiates with the Scheduler to obtain the first container, in which it starts the AM.
4. At the same time the ASM provides the AM's details to the client so that it can monitor progress and status.
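Below is a minimal client-side sketch of these steps using the YarnClient API from later Hadoop 2.x releases; the staging upload to HDFS is omitted, and the AM launch command (run_my_app_master.sh) is a hypothetical placeholder:

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());
        yarn.start();

        // Step 1: the client obtains a new ApplicationId from the RM.
        YarnClientApplication app = yarn.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ApplicationId appId = ctx.getApplicationId();

        // Step 2: describe the AM container; the real staging of jars and the job
        // description to HDFS under ${user}/.staging/${application_id} is omitted.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                Collections.<String, LocalResource>emptyMap(),
                Collections.<String, String>emptyMap(),
                Collections.singletonList("./run_my_app_master.sh"), // hypothetical AM command
                null, null, null);
        ctx.setApplicationName("yarn-submission-sketch");
        ctx.setAMContainerSpec(amContainer);
        ctx.setResource(Resource.newInstance(1024, 1));

        // Step 3: submit to the ASM, which negotiates the first container for the AM.
        yarn.submitApplication(ctx);

        // Step 4: the client can poll the RM for the AM's progress and status.
        ApplicationReport report = yarn.getApplicationReport(appId);
        System.out.println("State: " + report.getYarnApplicationState());

        yarn.stop();
    }
}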
The Life Cycle of the ApplicationMaster
The ASM manages the AM's life cycle: it is responsible for starting the AM and then monitoring it. The AM periodically heartbeats to the ASM to show that it is still alive, and the ASM restarts it on failure.
ApplicationsManager Components
1. The SchedulerNegotiator coordinates with the Scheduler to obtain the container in which the AM is started.
2. The AMContainerManager starts and stops the AM's container by communicating with the appropriate NodeManager.
3. The AMMonitor tracks the AM's liveness and restarts the AM when necessary.
Availability
The ResourceManager saves its own state in ZooKeeper to ensure HA; based on the state saved in ZK, it can be restarted quickly.
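A minimal sketch of enabling this, assuming the recovery-related property names found in later yarn-default.xml (in practice these would be set in yarn-site.xml, and the ZooKeeper quorum address here is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmRecoverySketch {
    public static Configuration withZkStateStore() {
        Configuration conf = new YarnConfiguration();
        // Persist RM state so a restarted RM can recover quickly.
        conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
        conf.set("yarn.resourcemanager.store.class",
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
        conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
        return conf;
    }
}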
NodeManager
Once the Scheduler allocates containers to an application, the NM is responsible for starting those containers, and it also guarantees that the allocated containers do not exceed the machine's total resources.
The NM is also responsible for setting up the environment at task startup, including binaries, jar packages, and so on.
The NM also provides a service for managing the node's local storage; for example, a map-reduce application uses the shuffle service to store the temporary map outputs locally and to shuffle them to the reduce tasks.
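For illustration, a rough sketch of how an AM might ask an NM to launch a task container, using the NMClient helper from later Hadoop 2.x; the jar name, HDFS path, and launch command are hypothetical:

import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class NmLaunchSketch {
    // Launch a task in a container that was already allocated to this AM;
    // the NM localizes the jar named below before running the command.
    public static void launch(Container container, Path jarOnHdfs) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        FileStatus jarStatus = FileSystem.get(conf).getFileStatus(jarOnHdfs);

        LocalResource jar = LocalResource.newInstance(
                ConverterUtils.getYarnUrlFromPath(jarOnHdfs),
                LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
                jarStatus.getLen(), jarStatus.getModificationTime());

        Map<String, LocalResource> localResources =
                Collections.singletonMap("task.jar", jar);
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                localResources,
                Collections.<String, String>emptyMap(),
                Collections.singletonList("java -cp task.jar MyTask"), // hypothetical command
                null, null, null);

        NMClient nm = NMClient.createNMClient();
        nm.init(conf);
        nm.start();
        nm.startContainer(container, ctx);
    }
}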
ApplicationMaster
The AM is responsible for negotiating resources with the Scheduler, executing and monitoring tasks via the NMs, and requesting replacement resources from the Scheduler when a container fails.