The state machines of RM have been analyzed before. Next we analyze the state machines of NM. NM contains three state machines: Container, Application, and LocalizedResource. The Container state machine is the most complicated of the three, so this article analyzes it in detail. Figure 3 shows the state machine diagram of LocalizedResource; because that state machine is relatively simple, it is not analyzed in detail here, and you can read the relevant code on your own. This article is based on Apache Hadoop 2.3.0, the latest community version at the time of writing.

NodeManager maintains the tasks (containers) executed on the current node. As shown in Figure 1, a Container holds information such as the containerID, user, and resource. The Container implementation class is ContainerImpl. Figure 2 shows the Container state machine.

Figure 1: Container interface
Figure 2: Container state machine diagram
Figure 3: LocalizedResource state machine diagram

Container state transitions and interpretation

NEW: In NM, ContainerManagerImpl implements the ContainerManagementProtocol protocol, so RM and AM can call the startContainers method through RPC to instruct NM to start the corresponding containers. In the startContainers method of NM, a ContainerImpl object is created and its state is initialized to NEW.

LOCALIZING: Two places move ContainerImpl into the LOCALIZING state.
1. AppInitDoneTransition is called at the end of ApplicationImpl initialization. It creates a ContainerEventType.INIT_CONTAINER event for each container under the Application. ContainerImpl handles this event and sets its state to LOCALIZING (or LOCALIZED, or LOCALIZATION_FAILED).
2. When the startContainers method of NM is called through RPC to start a container, and the Application that the container belongs to is already RUNNING (that is, its initialization is complete), NM creates the ContainerEventType.INIT_CONTAINER event directly. ContainerImpl handles this event and sets its state to LOCALIZING (or LOCALIZED, or LOCALIZATION_FAILED).
In both cases, the transition handler that ContainerImpl calls for the INIT_CONTAINER event decides the target state. If resources need to be localized, it creates the INIT_CONTAINER_RESOURCES event and enters the LOCALIZING state. If no resources need to be localized, it creates the LAUNCH_CONTAINER event and enters the LOCALIZED state. If it encounters an invalid resource request, it enters the LOCALIZATION_FAILED state.

LOCALIZED: After all the resources that need localization have been localized, ContainerImpl calls LocalizedTransition, which creates the LAUNCH_CONTAINER event and enters the LOCALIZED state.

LOCALIZATION_FAILED: If an exception occurs while resources are being localized, ContainerImpl calls ResourceFailedTransition to process the RESOURCE_FAILED event. After the resources are cleaned up, it enters the LOCALIZATION_FAILED state.

RUNNING: When ContainerImpl enters LOCALIZED, the LAUNCH_CONTAINER event is created. This event is handled by ContainersLauncher, which generates the container startup command, sets the environment variables, creates the CONTAINER_LAUNCHED event, and starts the container process. ContainerImpl calls LaunchTransition to process the CONTAINER_LAUNCHED event and enters the RUNNING state.

EXITED_WITH_FAILURE: If an exception occurs while ContainersLauncher is starting the container, or the container process ends with a non-zero return value, a CONTAINER_EXITED_WITH_FAILURE event is created. ContainerImpl calls ExitedWithFailureTransition to process the event and enters the EXITED_WITH_FAILURE state.

EXITED_WITH_SUCCESS: In ContainersLauncher, if the container ends normally with a return value of 0, a CONTAINER_EXITED_WITH_SUCCESS event is created. ContainerImpl calls ExitedWithSuccessTransition to process the event and enters the EXITED_WITH_SUCCESS state.

KILLING: When ContainerImpl encounters a KILL_CONTAINER event, it starts the cleanup and enters the KILLING state.
CONTAINER_CLEANEDUP_AFTER_KILL: When ContainersLauncher is about to start a container but finds that the container is already in the KILLING state, it creates the CONTAINER_KILLED_ON_REQUEST event. When ContainerImpl encounters this event, it enters the CONTAINER_CLEANEDUP_AFTER_KILL state.

DONE: When ContainerImpl encounters the CONTAINER_RESOURCES_CLEANEDUP event, it enters the DONE state, which means the cleanup is complete.

The state machines related to ResourceManager and NodeManager have now all been analyzed. From these state machines we can follow a job from its submission by the client to its completion, including every state it passes through. Compared with JobTracker, YARN splits the job management function out, which greatly reduces the time ResourceManager spends processing NodeManager heartbeats: ResourceManager no longer needs to maintain job information, whereas JobTracker, which did maintain it, required many lock operations. In addition, because job management is now handled by the ApplicationMaster, a custom ApplicationMaster can be written to support a given computing model, such as MR. A major point of the code refactoring is the introduction of state machines, which makes asynchronous operation easier and improves the performance of ResourceManager.

The MRApplicationMaster also contains three state machines: Job, Task, and TaskAttempt. Because they fall outside the scope of YARN itself, they are not analyzed in detail here; interested readers can read the code and analyze those state machines on their own.

Appendix: events related to ContainerImpl:

    public enum ContainerEventType {

      // Producer: ContainerManager
      INIT_CONTAINER,
      KILL_CONTAINER,
      UPDATE_DIAGNOSTICS_MSG,
      CONTAINER_DONE,

      // DownloadManager
      CONTAINER_INITED,
      RESOURCE_LOCALIZED,
      RESOURCE_FAILED,
      CONTAINER_RESOURCES_CLEANEDUP,

      // Producer: ContainersLauncher
      CONTAINER_LAUNCHED,
      CONTAINER_EXITED_WITH_SUCCESS,
      CONTAINER_EXITED_WITH_FAILURE,
      CONTAINER_KILLED_ON_REQUEST,
    }
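
To make the transitions described in this article easier to follow, here is a minimal table-driven sketch of the Container state machine in plain Java. It is not the Hadoop code: the class name ContainerStateMachineSketch, the add and handle helpers, and the nested State and Event enums are hypothetical names chosen for illustration, and ContainerImpl itself is built on Hadoop's own state machine utilities rather than a map like this. The sketch also simplifies in three ways that should be read as assumptions: INIT_CONTAINER always leads to LOCALIZING (as if resources always need localization), a single RESOURCE_LOCALIZED event completes localization, and KILL_CONTAINER is accepted only from LOCALIZING, LOCALIZED, and RUNNING.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified sketch of the Container state machine described
// in this article. It is NOT the Hadoop ContainerImpl implementation.
final class ContainerStateMachineSketch {

    enum State {
        NEW, LOCALIZING, LOCALIZATION_FAILED, LOCALIZED, RUNNING,
        EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, KILLING,
        CONTAINER_CLEANEDUP_AFTER_KILL, DONE
    }

    // Subset of the event names listed in the appendix.
    enum Event {
        INIT_CONTAINER, KILL_CONTAINER, RESOURCE_LOCALIZED, RESOURCE_FAILED,
        CONTAINER_LAUNCHED, CONTAINER_EXITED_WITH_SUCCESS,
        CONTAINER_EXITED_WITH_FAILURE, CONTAINER_KILLED_ON_REQUEST,
        CONTAINER_RESOURCES_CLEANEDUP
    }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TABLE =
            new EnumMap<State, Map<Event, State>>(State.class);

    private static void add(State from, Event on, State to) {
        Map<Event, State> row = TABLE.get(from);
        if (row == null) {
            row = new EnumMap<Event, State>(Event.class);
            TABLE.put(from, row);
        }
        row.put(on, to);
    }

    static {
        // Assumption: resources always need localization in this sketch.
        add(State.NEW, Event.INIT_CONTAINER, State.LOCALIZING);
        // Assumption: one RESOURCE_LOCALIZED event finishes localization.
        add(State.LOCALIZING, Event.RESOURCE_LOCALIZED, State.LOCALIZED);
        add(State.LOCALIZING, Event.RESOURCE_FAILED, State.LOCALIZATION_FAILED);
        add(State.LOCALIZED, Event.CONTAINER_LAUNCHED, State.RUNNING);
        add(State.RUNNING, Event.CONTAINER_EXITED_WITH_SUCCESS, State.EXITED_WITH_SUCCESS);
        add(State.RUNNING, Event.CONTAINER_EXITED_WITH_FAILURE, State.EXITED_WITH_FAILURE);
        // Assumption: the source states for KILL_CONTAINER are limited to these three.
        add(State.LOCALIZING, Event.KILL_CONTAINER, State.KILLING);
        add(State.LOCALIZED, Event.KILL_CONTAINER, State.KILLING);
        add(State.RUNNING, Event.KILL_CONTAINER, State.KILLING);
        add(State.KILLING, Event.CONTAINER_KILLED_ON_REQUEST, State.CONTAINER_CLEANEDUP_AFTER_KILL);
        // Every terminal-path state reaches DONE once its resources are cleaned up.
        add(State.LOCALIZATION_FAILED, Event.CONTAINER_RESOURCES_CLEANEDUP, State.DONE);
        add(State.EXITED_WITH_SUCCESS, Event.CONTAINER_RESOURCES_CLEANEDUP, State.DONE);
        add(State.EXITED_WITH_FAILURE, Event.CONTAINER_RESOURCES_CLEANEDUP, State.DONE);
        add(State.CONTAINER_CLEANEDUP_AFTER_KILL, Event.CONTAINER_RESOURCES_CLEANEDUP, State.DONE);
    }

    private State state = State.NEW;

    State getState() {
        return state;
    }

    // Applies one event and returns the new state; rejects events that the
    // table does not allow in the current state.
    State handle(Event event) {
        Map<Event, State> row = TABLE.get(state);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                    "Invalid event " + event + " in state " + state);
        }
        state = next;
        return state;
    }

    public static void main(String[] args) {
        // Walk the successful path: localize, launch, exit with 0, clean up.
        ContainerStateMachineSketch container = new ContainerStateMachineSketch();
        Event[] events = {
            Event.INIT_CONTAINER, Event.RESOURCE_LOCALIZED, Event.CONTAINER_LAUNCHED,
            Event.CONTAINER_EXITED_WITH_SUCCESS, Event.CONTAINER_RESOURCES_CLEANEDUP
        };
        for (Event e : events) {
            System.out.println(e + " -> " + container.handle(e));
        }
    }
}
```

Keeping the transitions in one table mirrors the way the state diagram in Figure 2 is read: each row is one arrow. An event that has no arrow from the current state is rejected instead of being silently ignored, which is a choice made for this sketch only.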