Several state machines in YARN

Source: Internet
Author: User
1 Overview

To increase concurrency, YARN uses an event-driven concurrency model: it abstracts the various kinds of processing logic into events and event dispatchers, and expresses the handling of each event as a state machine. What is a state machine?

If an object is composed of several states and of events that trigger transitions between those states, the object is called a state machine.

When a request enters the system as an event, a central dispatcher passes it to the corresponding event handler. When the handler has finished, it sends the resulting event back to the central dispatcher, and the cycle continues until processing is complete.
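The dispatch loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not YARN's Java implementation: the `Dispatcher` class, the `SUBMIT` and `START` event names, and the `app_1` identifier are all assumptions made for the example.

```python
from collections import deque


class Dispatcher:
    """Hypothetical central dispatcher: routes each event to the handler for its type."""

    def __init__(self):
        self.handlers = {}    # event type -> handler function
        self.queue = deque()  # pending (event type, payload) pairs

    def register(self, event_type, handler):
        self.handlers[event_type] = handler

    def post(self, event_type, payload=None):
        self.queue.append((event_type, payload))

    def run(self):
        # Drain the queue; handlers may post follow-up events while we loop.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.handlers[event_type](payload)


log = []
dispatcher = Dispatcher()


def on_submit(app):
    log.append(f"submitted {app}")
    dispatcher.post("START", app)  # hand the next step back to the dispatcher


def on_start(app):
    log.append(f"started {app}")


dispatcher.register("SUBMIT", on_submit)
dispatcher.register("START", on_start)
dispatcher.post("SUBMIT", "app_1")
dispatcher.run()
```

The handler never calls the next handler directly. It posts a new event, so every step passes through the central queue again, which is the property the paragraph above describes.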

The ResourceManager, YARN's resource management module, contains four types of state machines (based on version 2.4):

(1) RMApp: maintains the lifecycle of an application;

(2) RMAppAttempt: maintains the lifecycle of one application attempt (a single run of the application);

(3) RMContainer: maintains the lifecycle of a container, the smallest unit of allocated resources;

(4) RMNode: maintains the lifecycle of a NodeManager.

These four state machines live in the org.apache.hadoop.yarn.server.resourcemanager package of the YARN source code, in a form that inherits the EventHandler interface. The concrete implementation of each is the corresponding XXXImpl class.

A job submitted to YARN is called an application. It may be run several times, and each run is called an "application attempt". If an attempt fails, RMApp creates another one and keeps going until the maximum number of failures is reached. A container is an abstraction of the runtime environment: it runs either the ApplicationMaster or a specific task.

2 RMApp state machine

The implementation class of this state machine is org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl. It records all the states of an application (RMAppState, 11 in total), the events that trigger transitions between those states (RMAppEvent, 14 in total), and other basic information about the application. Its job is to receive RMAppEventType events from other objects, move from the current state to another state based on the current state and the event type, and trigger an action at the same time.
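To make "current state plus event type gives next state plus action" concrete, here is a minimal table-driven sketch in Python. It is not how RMAppImpl is written: the `StateMachine` class is invented for illustration, the table is heavily reduced, and the event names `START`, `SAVED`, and `ACCEPT` are assumptions. Only the state names and APP_REJECTED are taken from this article.

```python
class StateMachine:
    """Table-driven state machine: (state, event) -> (next_state, action)."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions

    def handle(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} is not valid in state {self.state!r}")
        next_state, action = self.transitions[key]
        action()  # the behavior triggered by the transition
        self.state = next_state
        return self.state


actions = []  # records which actions ran, for illustration only

# Hypothetical, reduced transition table (not the real RMApp table).
transitions = {
    ("NEW", "START"): ("NEW_SAVING", lambda: actions.append("store app info")),
    ("NEW_SAVING", "SAVED"): ("SUBMITTED", lambda: actions.append("ask scheduler")),
    ("SUBMITTED", "APP_REJECTED"): ("FINAL_SAVING", lambda: actions.append("record rejection")),
    ("SUBMITTED", "ACCEPT"): ("ACCEPTED", lambda: actions.append("create attempt")),
}

app = StateMachine("NEW", transitions)
app.handle("START")
app.handle("SAVED")
app.handle("ACCEPT")
```

Keeping the transitions in one table means an event that is not valid in the current state is rejected in a single place instead of being handled ad hoc.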

[Figure: state transition diagram of RMApp]

Among these, NEW_SAVING is the state in which the application's basic information is being written to a log. This is the first thing the RM does when it receives an application, so that the application can be restarted after a failure. When a RECOVER event is received, the state machine can move from NEW to SUBMITTED, ACCEPTED, FINISHED, FAILED, KILLED, or FINAL_SAVING. Recovery is disabled by default; it is controlled by the yarn.resourcemanager.recovery.enabled parameter.

The APP_REJECTED event is triggered in several situations, for example when an exception occurs while the client submits the application, or when the RM's validity check finds the application invalid.

An application can fail in many ways, but an ATTEMPT_FAILED event does not necessarily move it straight to the FAILED state. The system first checks whether the application's failure count has reached the upper limit. If it has not, a new RMAppAttemptImpl object is created and the state machine returns to the ACCEPTED state. Otherwise, it enters FINAL_SAVING and performs failure handling, such as releasing resources.
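The retry decision can be sketched as follows. This is a simplified Python illustration, not the logic of RMAppImpl: the `App` class and its fields are hypothetical, and the real code involves more states and bookkeeping.

```python
class App:
    """Hypothetical application object that retries failed attempts up to a limit."""

    def __init__(self, max_attempts):
        self.max_attempts = max_attempts
        self.failed = 0          # number of failed attempts so far
        self.attempts = 1        # the first attempt already exists
        self.state = "RUNNING"

    def on_attempt_failed(self):
        self.failed += 1
        if self.failed < self.max_attempts:
            self.attempts += 1   # stands in for creating a new attempt object
            self.state = "ACCEPTED"
        else:
            self.state = "FINAL_SAVING"  # failure handling follows from here
        return self.state


app = App(max_attempts=2)
first = app.on_attempt_failed()   # limit not reached: back to ACCEPTED
second = app.on_attempt_failed()  # limit reached: FINAL_SAVING
```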

3 RMAppAttempt state machine

The implementation class of this state machine is org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl. It records all the states of an application attempt (RMAppAttemptState, 13 in total) and the events that trigger transitions between those states (RMAppAttemptEvent, 15 in total). Its job is to receive RMAppAttemptEventType events from other objects, move from the current state to another state based on the current state and the event type, and trigger an action at the same time.

[Figure: state transition diagram of RMAppAttempt]

After an RMAppAttemptImpl is created, the ResourceManager adds it to the ResourceScheduler. Once the validity check passes, the state becomes SCHEDULED, and resources are now being allocated for the ApplicationMaster. When a container has been allocated, the container information is written to disk. While it is being stored the state is ALLOCATED_SAVING, and once storage is complete the state changes to ALLOCATED.

Next, the ApplicationMasterLauncher in the ResourceManager communicates with the corresponding NodeManager to start the ApplicationMaster, and the state changes to LAUNCHED. Once it has started, the ApplicationMaster immediately registers with the ResourceManager and the state changes to RUNNING.

In addition, YARN allows the ApplicationMaster to be started on the client side (an unmanaged ApplicationMaster). Logs for such an ApplicationMaster still have to be recorded for fault recovery, and while they are being recorded the RMAppAttemptImpl is in the LAUNCHED_UNMANAGED_SAVING state. Recovery works much as it does for the RMApp state machine described earlier.

There are several important events:

(1) CONTAINER_ALLOCATED: After the ResourceManager assigns a container on a NodeManager node to the RMAppAttemptImpl, it creates an RMContainerImpl and sends a start event to that object, then sends a CONTAINER_ALLOCATED event to the RMAppAttemptImpl. The RMAppAttemptImpl now obtains the allocated container, raises a log-record event, and writes the resource allocation information to disk for fault recovery.

(2) UNREGISTERED: When the ApplicationMaster has finished running, it notifies the ResourceManager. On receiving the notification, the ResourceManager sends an UNREGISTERED event to the RMAppAttemptImpl, which enters the FINISHING state. After the container exits, its resources are reclaimed and the state becomes FINISHED. If the ApplicationMaster was started by the client, however, the UNREGISTERED event moves the state directly to FINISHED.

(3) CONTAINER_FINISHED: When the container of the ApplicationMaster exits, the NodeManager it ran on reports the container's status to the ResourceManager. The ResourceManager then sends a FINISHED event to the RMContainerImpl, which in turn sends a CONTAINER_FINISHED event to the RMAppAttemptImpl.

(4) EXPIRE: If the ApplicationMaster does not send a heartbeat for a period of time, the ResourceManager sends an EXPIRE event to the RMAppAttemptImpl and cleans up the ApplicationMaster and its containers.

(5) CONTAINER_ACQUIRED: After the ApplicationMaster obtains the resource, the RMContainerImpl is notified and in turn sends a CONTAINER_ACQUIRED event to the RMAppAttemptImpl. The RMAppAttemptImpl saves the NodeManager information to make later cleanup easier.

(6) STATUS_UPDATE: The ApplicationMaster reports its heartbeat to the ResourceManager.
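Items (4) and (6) together describe a liveness check: each heartbeat refreshes a timestamp, and an attempt whose timestamp is too old gets an EXPIRE event. The Python sketch below shows that idea under stated assumptions. The `LivenessMonitor` class, the attempt ids, and the 10-unit timeout are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class LivenessMonitor:
    """Hypothetical heartbeat tracker that reports attempts whose heartbeat is overdue."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # attempt id -> time of last heartbeat

    def heartbeat(self, attempt_id, now):
        # Corresponds to a STATUS_UPDATE arriving from the ApplicationMaster.
        self.last_seen[attempt_id] = now

    def expired(self, now):
        # Attempts that would receive an EXPIRE event at time `now`.
        gone = [a for a, t in self.last_seen.items() if now - t > self.timeout]
        for a in gone:
            del self.last_seen[a]  # stop tracking once expired
        return gone


monitor = LivenessMonitor(timeout=10)
monitor.heartbeat("attempt_1", now=0)
monitor.heartbeat("attempt_2", now=5)
overdue = monitor.expired(now=12)  # attempt_1 is 12 units old, attempt_2 only 7
```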

4 RMContainer state machine

The implementation class of this state machine is org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl. It records all the states of a container (RMContainerState, 9 in total) and the events that trigger transitions between those states (RMContainerEvent, 8 in total). Its job is to receive RMContainerEventType events from other objects, move from the current state to another state based on the current state and the event type, and trigger an action at the same time.

[Figure: state transition diagram of RMContainerImpl]

When the resources on a NodeManager are not enough to satisfy an application's request but still have to be allocated to that application, the node reserves resources for it and gradually accumulates freed resources until the request can be met. The resources are then wrapped in a container and handed to the ApplicationMaster. While the resources are being accumulated, a container object already exists and is in the RESERVED state. Once the container has been assigned to the ApplicationMaster but the ApplicationMaster has not yet confirmed that it has obtained the resource, the container is in the ALLOCATED state. When the ApplicationMaster notifies the ResourceManager that it has obtained the resource, the state changes to ACQUIRED.
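The RESERVED, ALLOCATED, and ACQUIRED progression can be sketched like this, following the description above rather than the actual scheduler code. The `ReservedContainer` class, its method names, and the memory figures are hypothetical.

```python
class ReservedContainer:
    """Hypothetical container that stays RESERVED until enough resources are freed."""

    def __init__(self, required_mb):
        self.required_mb = required_mb
        self.reserved_mb = 0
        self.state = "RESERVED"

    def add_freed(self, mb):
        # Accumulate resources freed on the node while still reserved.
        if self.state == "RESERVED":
            self.reserved_mb = min(self.required_mb, self.reserved_mb + mb)
            if self.reserved_mb == self.required_mb:
                self.state = "ALLOCATED"  # request satisfied, assigned to the AM
        return self.state

    def acquire(self):
        # The ApplicationMaster confirms it has obtained the container.
        if self.state == "ALLOCATED":
            self.state = "ACQUIRED"
        return self.state


container = ReservedContainer(required_mb=4096)
after_first = container.add_freed(1024)   # still short of 4096 MB
after_second = container.add_freed(3072)  # request now satisfied
final_state = container.acquire()
```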

Next, the ApplicationMaster communicates with the NodeManager to start these containers, and the NodeManager reports container status to the ResourceManager through its heartbeat. The ResourceManager sends a LAUNCHED event to each container reported in the heartbeat, and the RMContainerImpl removes that container from the list of containers being monitored for expiry, indicating that its status is normal. If the ApplicationMaster does not use a container for a period of time, the ResourceManager issues an EXPIRE event for the container and reclaims its resources.

5 RMNode state machine

This state machine maintains the lifecycle of a NodeManager. Its implementation class is org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl, which records the states (NodeState, 6 in total) and events (RMNodeEvent, 9 in total) of a NodeManager node. As with the other state machines, each state transition also triggers an action.

[Figure: state transition diagram of RMNodeImpl]

If a NodeManager node is added to the blacklist, its state is set to DECOMMISSIONED, that is, it is taken offline and the NodeManager process exits. If a NodeManager becomes unhealthy (for example, because of disk damage), it reports this to the ResourceManager through its heartbeat and enters the UNHEALTHY state, and the ResourceManager stops assigning new tasks to it. If the ResourceManager stops receiving heartbeats from a node, the node changes to the LOST state.

After an application finishes, a CLEANUP_APP event is triggered to clear the memory occupied by the program. When a container finishes, a CLEANUP_CONTAINER event is triggered to clear the resources occupied by that container. If a NodeManager registers with the ResourceManager again, the ResourceManager triggers a RECONNECTED event, and the RMNodeImpl updates its information when it receives the event.
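The node behavior described in this section can be summarized in a short Python sketch. It is a simplification, not RMNodeImpl: the `Node` class is hypothetical, the `STATUS_UPDATE`, `DECOMMISSION`, and `EXPIRE` event names are assumptions for the health report, blacklist, and lost-heartbeat cases, and DECOMMISSIONED and LOST are treated as final states. CLEANUP_APP and CLEANUP_CONTAINER are taken from the text.

```python
TERMINAL = {"DECOMMISSIONED", "LOST"}


class Node:
    """Hypothetical node tracker: some events change state, others only queue cleanup."""

    def __init__(self):
        self.state = "RUNNING"
        self.apps_to_clean = []
        self.containers_to_clean = []

    def handle(self, event, arg=None):
        if self.state in TERMINAL:
            return self.state  # simplification: no way back in this sketch
        if event == "STATUS_UPDATE":  # arg is True when the node reports healthy
            self.state = "RUNNING" if arg else "UNHEALTHY"
        elif event == "DECOMMISSION":
            self.state = "DECOMMISSIONED"
        elif event == "EXPIRE":
            self.state = "LOST"
        elif event == "CLEANUP_APP":
            self.apps_to_clean.append(arg)
        elif event == "CLEANUP_CONTAINER":
            self.containers_to_clean.append(arg)
        else:
            raise ValueError(f"unknown event {event!r}")
        return self.state

    def schedulable(self):
        # Only a healthy, running node receives new tasks.
        return self.state == "RUNNING"


node = Node()
node.handle("STATUS_UPDATE", False)        # disk problem reported via heartbeat
unhealthy_schedulable = node.schedulable()
node.handle("STATUS_UPDATE", True)         # node recovers
node.handle("CLEANUP_CONTAINER", "container_1")
```

The cleanup events leave the state untouched and only record work to be done, which mirrors the point that a transition can trigger an action without moving to a different state.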
