YARN is a distributed resource management system.
It was created to address several shortcomings of the original MapReduce framework:
1. The JobTracker is a single point of failure.
2. The JobTracker takes on too many responsibilities: maintaining job status, job task status, and so on.
3. On the TaskTracker side, resources are modeled too simply as map/reduce task slots, without accounting for CPU, memory, or other usage. Problems arise when multiple memory-hungry tasks are scheduled onto the same node.
Basic components after evolution
Each component is explained in detail below:
YARN is a resource management framework, not a computation framework; this distinction is important to keep in mind.
The "application" in the figure corresponds to a map/reduce job in the 1.x versions.
The "container" in the figure is a logical concept: a collective term for a set of resources (memory, CPU, etc.).
AM (ApplicationMaster): each application has its own AM.
ResourceManager: coordinates cluster resources. It has two important components:
Scheduler: handles resource scheduling. It builds a global allocation plan from the resource requests of all running applications, then allocates resources subject to each application's specific constraints and some global constraints. It also periodically receives resource usage reports from the NMs. Note that the Scheduler has nothing to do with job execution; it deals only with resources. It can also provide the AM with status information about containers that have completed.
ASM (ApplicationsManager): receives application submissions, requests a container from the Scheduler for the AM, and starts the AM. It also reports the AM's run status to the client. In short, it manages the life cycle of every AM.
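As a concrete illustration of the container concept, here is a minimal sketch using YARN's Java client API: a container request is little more than a resource bundle (memory and virtual cores) plus a priority. The specific values are arbitrary assumptions.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerSpec {
    public static ContainerRequest buildRequest() {
        // A "container" is just a bundle of resources:
        // here, 1024 MB of memory and 2 virtual cores.
        Resource capability = Resource.newInstance(1024, 2);

        // Priority lets the Scheduler order competing requests
        // from the same application.
        Priority priority = Priority.newInstance(0);

        // nodes/racks are null: we accept placement anywhere.
        return new ContainerRequest(capability, null, null, priority);
    }
}
```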
YARN workflow:
In summary there are two phases: the client submits the job to the ASM, and the ASM requests the resources to run the AM; the AM then takes over, computing the input splits, requesting resources, running the tasks with the NMs, monitoring the tasks, and so on.
1. The job client submits the job to the ASM (see the client-side sketch after this list):
1) Obtain an ApplicationID.
2) Upload the application definition and the required jar packages to the HDFS directory specified by yarn.app.mapreduce.am.staging-dir in yarn-site.xml.
3) Construct the resource request object and the application submission context, and send them to the ASM.
2. The ASM asks the Scheduler for a container to run the AM, sends a launch-container message to that container's NM, and the NM starts the container (see the AM-side sketch after this list).
3. Once started by the NM, the AM registers with the ASM.
4. The job client obtains the AM's information from the ASM and communicates with the AM directly.
5. The AM computes the input splits and constructs resource requests for all the map tasks.
6. The AM performs some OutputCommitter preparation work.
7. The AM requests resources (a group of containers) from the Scheduler, then works with the NMs to perform the necessary per-container setup, such as resource localization.
8. The AM monitors the tasks: if a task fails, it requests a new container for it; once everything completes, it runs the OutputCommitter's cleanup and commit actions.
9. The AM exits.
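Here is a minimal client-side sketch of step 1 using the YarnClient API. The application name, AM command, and resource sizes are placeholder assumptions; a real client would also upload its jars to the HDFS staging directory (step 1.2) and wire them in as LocalResources.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

import java.util.Collections;

public class SubmitApp {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        // Step 1.1: ask the ASM for a new ApplicationID.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        System.out.println("Got ApplicationID: " + ctx.getApplicationId());

        // Step 1.3: build the submission context. The command below is a
        // placeholder; a real AM jar would be uploaded to HDFS (step 1.2)
        // and referenced via LocalResources.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("java -Xmx256m MyAppMaster"),
                null, null, null);
        ctx.setApplicationName("demo-app");
        ctx.setAMContainerSpec(amContainer);
        ctx.setResource(Resource.newInstance(512, 1)); // resources for the AM itself

        yarnClient.submitApplication(ctx);
    }
}
```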
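And a skeletal AM-side sketch of steps 3 through 9, using the AMRMClient and NMClient APIs: register with the RM, request containers from the Scheduler, launch tasks on the NMs, watch for completions, and deregister. The task command and task count are placeholders, and step 8's handling of failed containers is only hinted at in a comment.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;

public class AppMasterSkeleton {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Step 3: register with the RM (the ASM side).
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, "");

        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Steps 5 and 7: request one container per task from the Scheduler.
        Priority priority = Priority.newInstance(0);
        Resource capability = Resource.newInstance(1024, 1);
        int numTasks = 2; // placeholder; a real AM derives this from the splits
        for (int i = 0; i < numTasks; i++) {
            rmClient.addContainerRequest(
                    new ContainerRequest(capability, null, null, priority));
        }

        // Steps 7 and 8: poll for allocated containers, launch tasks on the NMs,
        // and watch completion statuses.
        int completed = 0;
        while (completed < numTasks) {
            AllocateResponse response = rmClient.allocate(0.1f);
            for (Container container : response.getAllocatedContainers()) {
                ContainerLaunchContext taskCtx = ContainerLaunchContext.newInstance(
                        null, null,
                        java.util.Collections.singletonList("echo hello-task"), // placeholder task
                        null, null, null);
                nmClient.startContainer(container, taskCtx);
            }
            for (ContainerStatus status : response.getCompletedContainersStatuses()) {
                completed++; // a real AM would re-request containers for failed tasks here
            }
            Thread.sleep(1000);
        }

        // Step 9: deregister and exit.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```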
How the client obtains monitoring information:
Task status is fetched from the AM.
The AM's information is obtained from the ASM.
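A minimal sketch of the second lookup, assuming the client kept the ApplicationId from submission: the ApplicationReport returned by the RM carries the AM's host, tracking URL, and state, which the client uses to talk to the AM directly.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class QueryApp {
    // appId would be the ID obtained at submission time.
    public static void printAmInfo(ApplicationId appId) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        // The ASM side of the RM reports where the AM is running.
        ApplicationReport report = yarnClient.getApplicationReport(appId);
        System.out.println("AM host:      " + report.getHost());
        System.out.println("Tracking URL: " + report.getTrackingUrl());
        System.out.println("State:        " + report.getYarnApplicationState());
    }
}
```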
The NM also has the job of monitoring the resources used by each task; if a task exceeds the bounds of its requested container, the NM kills the task's process.
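This enforcement is configurable; a yarn-site.xml fragment like the following (the values shown are the usual defaults, quoted here only as an illustration) controls the memory checks the NM applies:

```xml
<!-- yarn-site.xml: NodeManager resource enforcement (values are illustrative defaults) -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value> <!-- kill tasks that exceed their physical memory allocation -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value> <!-- also enforce a virtual memory limit -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- allowed virtual memory per unit of physical memory -->
</property>
```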
YARN is a resource framework; compute frameworks run on top of it. MapReduce is one computation model, implemented as a specific ApplicationMaster that runs on YARN. Other computation models likewise need their own specific ApplicationMaster implementations to run on YARN.
Extended reading: http://www.aboutyun.com/thread-7678-1-3.html