Hadoop has three core components: HDFS, YARN, and MapReduce. We have already gone through the basic HDFS components, so let's look at the main roles in YARN and what they do, and then walk through how YARN executes a job once a client submits one. YARN (Yet Another Resource Negotiator) is Hadoop's resource scheduling coordinator; it is the new resource management layer introduced in Hadoop 2.x.
(1) Main roles and their functions
1. Client: the client accesses YARN through its interfaces to submit a job and to start or stop a job.
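To make the client role concrete, here is a minimal sketch using the YarnClient API that connects to the ResourceManager, lists the applications it knows about, and stops the running ones purely to demonstrate the kill call. The class name is invented for illustration, and it assumes a reachable cluster configured through yarn-site.xml on the classpath.

```java
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientDemo {
    public static void main(String[] args) throws Exception {
        // Connect to the ResourceManager named in yarn-site.xml on the classpath.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // List every application the cluster knows about.
        for (ApplicationReport report : yarnClient.getApplications()) {
            System.out.println(report.getApplicationId() + " -> " + report.getYarnApplicationState());
            // Stop running applications (only to demonstrate the kill call).
            if (report.getYarnApplicationState() == YarnApplicationState.RUNNING) {
                yarnClient.killApplication(report.getApplicationId());
            }
        }
        yarnClient.stop();
    }
}
```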
2. ResourceManager: the global resource manager. ① Allocates resources across the entire cluster. ② Schedules and starts the ApplicationMaster that belongs to each job. ③ Monitors the ApplicationMasters.
The ResourceManager consists of two components: ① the Scheduler ② the Applications Manager (ASM)
Scheduler: allocates cluster resources to running applications subject to constraints such as capacity and queues. The scheduler does no application-specific work: it does not monitor or track an application's execution status, and it does not restart tasks that fail because of application errors or hardware faults. It allocates resources purely according to each application's resource requirements.
Applications Manager: responsible for every application in the system. This includes accepting application submissions, negotiating with the scheduler for the resources to start the ApplicationMaster, monitoring the ApplicationMaster's running status, and restarting it on failure.
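As a rough illustration of the queue and capacity constraints the scheduler works with, the sketch below asks the RM for its queue hierarchy and each queue's configured and current capacity. The class name is invented for the example, and it again assumes a running cluster whose configuration is on the classpath.

```java
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListSchedulerQueues {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // The scheduler hands out resources per queue; print each queue's share.
        for (QueueInfo queue : yarnClient.getAllQueues()) {
            System.out.println(queue.getQueueName()
                + " capacity=" + queue.getCapacity()
                + " currentCapacity=" + queue.getCurrentCapacity());
        }
        yarnClient.stop();
    }
}
```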
3. ApplicationMaster: ① negotiates with the RM scheduler to obtain resources (expressed as containers) ② further assigns the resources it obtains to its internal tasks ③ communicates with the NM to start and stop tasks ④ monitors the status of all tasks and, when a task fails, re-applies for resources to restart it
YARN ships with two ApplicationMaster implementations: one is the example program DistributedShell, which demonstrates how to write an AM and can request a number of containers to run shell commands or shell scripts in parallel; the other is MRAppMaster, the AM that runs MapReduce applications.
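Below is a minimal, simplified sketch of the AM side of this negotiation using the AMRMClient API. The class name, resource sizes, and polling interval are placeholders, and a real AM such as MRAppMaster does considerably more.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // ① register with the RM so it can track and expose this AM's status
        rmClient.registerApplicationMaster("", 0, "");

        // ② ask the RM scheduler for one container of 1 GB memory and 1 vcore
        Resource capability = Resource.newInstance(1024, 1);
        rmClient.addContainerRequest(
            new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // ③ poll the RM until the container has been allocated
        List<Container> allocated = new ArrayList<>();
        while (allocated.isEmpty()) {
            AllocateResponse response = rmClient.allocate(0.0f);
            allocated.addAll(response.getAllocatedContainers());
            Thread.sleep(1000);
        }
        // ...start tasks in the allocated containers via NMClient (see the Container sketch below)...

        // ④ when all tasks are done, unregister and shut down
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "all tasks finished", "");
        rmClient.stop();
    }
}
```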
4. NodeManager: the resource and task manager on each node. ① Periodically reports the resource usage of the current node and the running status of its containers to the RM. ② Receives and processes container start and stop requests from the AM.
5. Container: a resource abstraction; the resources the RM grants to the AM are expressed in the form of containers. ① A container encapsulates multi-dimensional resources on a node, such as CPU, memory, disk, and network.
There are two kinds of containers in YARN: one is the container the AM itself runs in, and the other is the containers the AM requests from the RM to execute tasks.
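To show how an AM turns an allocated container into a running task on a NodeManager, here is a simplified sketch using the NMClient API. The class name and the echo command are placeholders, and the stop call is included only to illustrate it.

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LaunchTaskOnNodeManager {

    // 'container' would be one of the containers handed back by AMRMClient#allocate.
    public static void runTask(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // The launch context describes what runs inside the container: the command,
        // environment variables, and files to localize (only the command is set here).
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
            null, null,
            Collections.singletonList("echo running inside " + container.getId()),
            null, null, null);

        // Ask the NodeManager that owns this container to start the task...
        nmClient.startContainer(container, ctx);

        // ...and it can later be stopped the same way (done immediately here only as a demo).
        nmClient.stopContainer(container.getId(), container.getNodeId());
        nmClient.stop();
    }
}
```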
(2) How YARN submits and executes an MR job
1. The user submits an application to YARN, including the MRAppMaster program, the command to start MRAppMaster, and the user program.
2. The RM allocates the first container for the application and communicates with the corresponding NM, asking it to start the application's MRAppMaster inside that container.
3. MRAppMaster first registers with the RM so that the user can check the application's running status through the RM. It then requests resources for each task and monitors their running status until the job finishes.
4. MRAppMaster requests and receives resources from the RM by polling over an RPC protocol.
5. Once MRAppMaster obtains a resource, it communicates with the corresponding NM to start the task.
6. After the NM has set up the environment for the task, it writes the task's startup command into a script and launches the task through that script.
7. Each task reports its status and progress to MRAppMaster over an RPC protocol, so that MRAppMaster always knows the running status of every task and can restart a task when it fails.
8. After the application has finished, MRAppMaster unregisters from the RM and shuts itself down.
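Putting the first steps of this flow into code, the sketch below uses the YarnClient API to package a launch command for an ApplicationMaster, submit it to the RM, and poll the application report until it reaches a final state. The class name, the AM command, and the resource sizes are placeholders, not what a real MapReduce job submission uses.

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitToYarn {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Step 1: describe the application, including the command that starts the AM.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");
        appContext.setAMContainerSpec(ContainerLaunchContext.newInstance(
            null, null,
            Collections.singletonList("java com.example.MyAppMaster"), // placeholder AM command
            null, null, null));
        appContext.setResource(Resource.newInstance(1024, 1)); // container for the AM itself

        // Step 2: the RM allocates a container and asks an NM to launch the AM in it.
        ApplicationId appId = yarnClient.submitApplication(appContext);

        // Step 3 onwards: the RM exposes the AM's progress, so the client can poll it.
        ApplicationReport report = yarnClient.getApplicationReport(appId);
        while (report.getYarnApplicationState() != YarnApplicationState.FINISHED
            && report.getYarnApplicationState() != YarnApplicationState.FAILED
            && report.getYarnApplicationState() != YarnApplicationState.KILLED) {
            Thread.sleep(1000);
            report = yarnClient.getApplicationReport(appId);
        }
        System.out.println("final status: " + report.getFinalApplicationStatus());
        yarnClient.stop();
    }
}
```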
YARN: the fourth story in the big data series