YARN Framework
YARN is Hadoop's resource management framework. Its core idea is to split the JobTracker's two responsibilities, resource management and job scheduling/monitoring, into separate processes: the ResourceManager and the ApplicationMaster.
YARN has four core components: ResourceManager, NodeManager, ApplicationMaster, and Container.
(1) ResourceManager (RM): controls the whole cluster and manages the allocation of the underlying resources to applications.
Overall, the RM has the following responsibilities:
1) Handling client requests
2) Starting and monitoring ApplicationMasters
3) Monitoring NodeManagers
4) Allocating and scheduling resources
(2) ApplicationMaster (AM): manages one instance of an application running within YARN.
In general, the AM has the following responsibilities:
1) Splitting the input data
2) Requesting resources for the application and assigning them to its internal tasks
3) Monitoring tasks and handling fault tolerance
(3) NodeManager (NM): manages a single node in the YARN cluster.
In general, the NM has the following responsibilities:
1) Managing the resources of its node
2) Processing commands from the ResourceManager
3) Processing commands from the ApplicationMaster
(4) Container: the abstraction of resources in YARN.
Generally speaking, a container serves the following purpose:
It abstracts a task's running environment, encapsulating multi-dimensional resources such as CPU and memory, as well as environment variables, launch commands, and other information related to running the task.
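The resource dimensions a container bundles together can be sketched as a small data class. This is a toy model for illustration, not YARN's actual API; the field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    """Toy model of a YARN container: multi-dimensional resources plus launch context."""
    memory_mb: int                             # memory allocation, in megabytes
    vcores: int                                # virtual CPU cores
    env: dict = field(default_factory=dict)    # environment variables for the task
    launch_command: str = ""                   # command used to start the task process

# A container allocated for a map task might look like:
c = Container(memory_mb=1024, vcores=1,
              env={"JAVA_HOME": "/usr/lib/jvm/default"},
              launch_command="java ... YarnChild")
```

The key point is that a container is not just a memory/CPU quota: it also carries everything needed to launch the task process on a node.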
Running a job on YARN proceeds as follows:
1. Job submission
1) The client calls the Job.waitForCompletion() method, which submits the MapReduce job to the cluster
2) The ResourceManager assigns a job ID
3) The client verifies the job's output specification, computes the input splits, and copies the job resources (jar file, configuration, split information) to HDFS
4) The client calls submitApplication() on the ResourceManager to submit the job
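The four submission steps above can be sketched as a toy simulation. The class and method names are illustrative only; the real flow goes through the Hadoop client library and YARN RPC interfaces.

```python
class ResourceManager:
    """Toy RM that hands out job IDs and records submitted applications."""
    def __init__(self):
        self._next_id = 0
        self.submitted = []

    def get_new_job_id(self):
        self._next_id += 1
        return f"job_{self._next_id:04d}"

    def submit_application(self, job_id, staging_path):
        self.submitted.append((job_id, staging_path))

def submit_job(rm, input_files, hdfs):
    job_id = rm.get_new_job_id()                  # step 2: RM assigns a job ID
    splits = list(input_files)                    # step 3: compute input splits (toy: one per file)
    staging = f"/staging/{job_id}"
    hdfs[staging] = {"jar": "app.jar",            # step 3: copy job resources to HDFS
                     "conf": {}, "splits": splits}
    rm.submit_application(job_id, staging)        # step 4: submit the application
    return job_id

hdfs = {}
rm = ResourceManager()
jid = submit_job(rm, ["part-0", "part-1"], hdfs)
```

Note that the job resources are staged in HDFS before submission, so that the ApplicationMaster and tasks can later localize them on whichever nodes they run.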
2. Job initialization
1) The ResourceManager receives the submitApplication() request and hands it to the scheduler; the scheduler allocates a container, and the ResourceManager launches the ApplicationMaster in that container
2) The ApplicationMaster creates bookkeeping objects to monitor the job's progress and to receive task progress and completion reports
3) The ApplicationMaster retrieves from HDFS the split information the client computed, creates a map task for each split, and creates reduce tasks according to mapreduce.job.reduces
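Task creation from splits can be sketched as follows. This is a simplification: the real FileInputFormat split logic also honors minimum/maximum split sizes and record boundaries.

```python
def compute_splits(file_size, block_size=128 * 1024 * 1024):
    """One split per HDFS block, mirroring the default behaviour for a splittable file."""
    offsets = range(0, file_size, block_size)
    return [(off, min(block_size, file_size - off)) for off in offsets]

def create_tasks(file_size, num_reduces):
    # One map task per split; reduce count comes from mapreduce.job.reduces.
    map_tasks = [("map", off, length) for off, length in compute_splits(file_size)]
    reduce_tasks = [("reduce", r) for r in range(num_reduces)]
    return map_tasks, reduce_tasks

maps, reduces = create_tasks(file_size=300 * 1024 * 1024, num_reduces=2)
# A 300 MB input with 128 MB blocks yields 3 map tasks (the last split is ~44 MB).
```

This is why the number of map tasks is driven by the input data, while the number of reduce tasks is an explicit job configuration.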
3. Task assignment
If the job is small, the ApplicationMaster chooses to run the tasks in its own JVM (a so-called "uber" job).
Otherwise, the ApplicationMaster requests containers from the ResourceManager for all map and reduce tasks. The requests, transmitted via heartbeats, include each map task's data locality information, such as the hostnames and racks of the nodes storing its split. The scheduler uses this information to place tasks: ideally on a node that holds the split, otherwise on a node in the same rack as one storing the split.
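The locality preference described above (node-local, then rack-local, then off-rack) can be sketched as a simple selection function. This is a toy illustration of the policy, not the actual scheduler code.

```python
def schedule(split_hosts, rack_of, free_nodes):
    """Pick a node for a map task, preferring data locality.

    split_hosts: nodes holding a replica of the task's split
    rack_of:     mapping node -> rack
    free_nodes:  nodes with a free container, in scheduler order
    """
    # 1. Node-local: a free node that itself holds the split.
    for node in free_nodes:
        if node in split_hosts:
            return node, "node-local"
    # 2. Rack-local: a free node on the same rack as some replica.
    split_racks = {rack_of[h] for h in split_hosts}
    for node in free_nodes:
        if rack_of[node] in split_racks:
            return node, "rack-local"
    # 3. Off-rack: any free node at all.
    return free_nodes[0], "off-rack"

rack = {"n1": "r1", "n2": "r1", "n3": "r2"}
```

Node-local placement lets the map task read its split from local disk instead of over the network, which is the payoff of shipping locality information in the container requests.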
4. Task execution
Once the scheduler has assigned a container to a task, the ApplicationMaster contacts the NodeManager to start the container. Before the task runs, its required resources are localized, such as the job configuration, jar files, and any files in the distributed cache. The map or reduce task then runs in YarnChild, a dedicated JVM; YARN does not support JVM reuse.
5. Progress and status updates
Tasks in YARN report their progress and status back to the ApplicationMaster, and the client polls the ApplicationMaster for progress updates every second (the interval is configurable).
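This reporting flow can be sketched as the ApplicationMaster aggregating per-task progress that the client then polls. The class and method names below are illustrative, not the real MRAppMaster interfaces.

```python
class ApplicationMaster:
    """Toy AM that aggregates per-task progress for client polling."""
    def __init__(self):
        self.task_progress = {}          # task id -> fraction complete (0.0 to 1.0)

    def report(self, task_id, fraction):
        """Called on each task's progress update to the AM."""
        self.task_progress[task_id] = fraction

    def job_progress(self):
        """Called by the client's periodic poll; averages over known tasks."""
        if not self.task_progress:
            return 0.0
        return sum(self.task_progress.values()) / len(self.task_progress)

am = ApplicationMaster()
am.report("map_0", 1.0)
am.report("map_1", 0.5)
# A client poll of am.job_progress() now sees the job half-way through its second map.
```

The design point is that progress flows through the AM, not the RM: the ResourceManager only tracks resource usage, while per-task status lives with the job's own master.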
6. Job completion
In addition to polling for progress, the client calling waitForCompletion() checks every 5 seconds whether the job has completed; this interval is configurable. When the job completes, the ApplicationMaster and the task containers clean up their working state, the OutputCommitter's job cleanup method is called, and the job's information is archived by the job history server for later inspection by the user.