Blog Address: http://blog.csdn.net/yueqian_zhu/
The architecture diagram for standalone mode is as follows:
Failure analysis 1: worker exits abnormally
- The worker exits abnormally, for example when someone deliberately kills it with the kill command.
- Before exiting, the worker kills all of the little-brother executors under its control.
- The worker is supposed to send heartbeat messages to the master at regular intervals. Now that the worker process has ended, no heartbeats arrive, and the master discovers during its timeout check that one of its "branches" has gone.
- The master is very sad, and the sad master reports the situation to the corresponding driver.
- The driver learns in two ways that an executor assigned to it has unfortunately been lost: first, through the notice sent by the master; second, because it receives no StatusUpdate from the executor within the specified time. The driver then removes the registered executor.
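The heartbeat timeout described above can be sketched as a small simulation. This is a toy model, not Spark's source code: the class ToyMaster, its methods, and the 60-second value of WORKER_TIMEOUT_S are all assumptions made for illustration, and time is passed in explicitly so the example stays deterministic.

```python
# Toy model of heartbeat-based worker failure detection.
# All names here (ToyMaster, WORKER_TIMEOUT_S, ...) are hypothetical, not Spark APIs.

WORKER_TIMEOUT_S = 60  # assumed timeout in seconds, chosen for illustration


class ToyMaster:
    def __init__(self):
        self.last_heartbeat = {}   # worker_id -> time of the last heartbeat
        self.executors = {}        # worker_id -> list of (driver_id, executor_id)
        self.driver_notices = []   # (driver_id, executor_id) notices sent to drivers

    def register_worker(self, worker_id, now):
        self.last_heartbeat[worker_id] = now
        self.executors[worker_id] = []

    def launch_executor(self, worker_id, driver_id, executor_id):
        self.executors[worker_id].append((driver_id, executor_id))

    def heartbeat(self, worker_id, now):
        # Ignore heartbeats from workers that were already removed.
        if worker_id in self.last_heartbeat:
            self.last_heartbeat[worker_id] = now

    def check_timeouts(self, now):
        """Remove workers whose last heartbeat is too old and notify drivers."""
        dead = [w for w, t in self.last_heartbeat.items()
                if now - t > WORKER_TIMEOUT_S]
        for worker_id in dead:
            del self.last_heartbeat[worker_id]
            # Tell each affected driver that its executor is gone.
            for driver_id, executor_id in self.executors.pop(worker_id):
                self.driver_notices.append((driver_id, executor_id))
        return dead


master = ToyMaster()
master.register_worker("worker-1", now=0)
master.register_worker("worker-2", now=0)
master.launch_executor("worker-1", "driver-A", "exec-0")
master.heartbeat("worker-2", now=50)   # worker-2 is alive; worker-1 was killed
dead = master.check_timeouts(now=70)   # worker-1 has been silent for 70 s
```

In this run only worker-1 exceeds the timeout, so the toy master removes it and records one notice for driver-A, which mirrors the first of the two ways the driver learns about the loss.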
Failure analysis 2: executor exits abnormally
The executor is the bottom-level employee of a standalone cluster deployment. What happens when it exits abnormally?
- The executor exits abnormally. ExecutorRunner notices the exception and reports the situation to the master through ExecutorStateChanged.
- The master receives the notice and is very unhappy: one of its little brothers has run off, and that will not do. It asks the worker that the executor belonged to to start it again.
- The worker receives the LaunchExecutor instruction and starts the executor again.
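The three steps above amount to a short message exchange, sketched below as a toy model. Only the message names ExecutorStateChanged and LaunchExecutor come from the text; the classes ToyWorker and RelaunchingMaster, their methods, and the integer executor ids are hypothetical. The sketch also assumes, as the text describes, that the replacement is started on the same worker.

```python
# Toy model of the executor relaunch flow.
# Class and method names are hypothetical; only the message names
# ExecutorStateChanged and LaunchExecutor are taken from the text.

class ToyWorker:
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.running = set()   # executor ids currently running on this worker
        self.log = []          # (message, executor_id) pairs received so far

    def receive(self, message, executor_id):
        self.log.append((message, executor_id))
        if message == "LaunchExecutor":
            self.running.add(executor_id)

    def executor_exited(self, executor_id, master):
        # Plays the role of ExecutorRunner: notice the exit and report it.
        self.running.discard(executor_id)
        master.receive("ExecutorStateChanged", self, executor_id)


class RelaunchingMaster:
    def __init__(self):
        self.next_id = 0

    def launch(self, worker):
        executor_id = self.next_id
        self.next_id += 1
        worker.receive("LaunchExecutor", executor_id)
        return executor_id

    def receive(self, message, worker, executor_id):
        if message == "ExecutorStateChanged":
            # Assumption: ask the same worker to start a replacement executor.
            self.launch(worker)


master = RelaunchingMaster()
worker = ToyWorker("worker-1")
first = master.launch(worker)            # executor 0 starts
worker.executor_exited(first, master)    # executor 0 dies, executor 1 replaces it
```

After the exit is reported, the worker's log holds two LaunchExecutor messages and only the replacement executor is running.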
Failure analysis 3: master exits abnormally
If the boss is gone, what are the consequences?
- The worker has no one to report to. That is, if an executor exits again, the worker will not restart it, because the boss has given no instruction.
- New tasks cannot be submitted to the cluster.
- Even when old tasks finish, the resources they used cannot be released, because the cleanup instructions are issued by the master.
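The three consequences can be illustrated with a deliberately simplified model in which every cluster action is gated on the master being alive. The class ToyCluster, its fields, and the application and executor names are all hypothetical; the sketch only restates the bullets above and does not model how Spark actually tracks resources.

```python
# Toy model of a standalone cluster whose master has exited.
# All names are hypothetical and exist only to restate the three bullets.

class ToyCluster:
    def __init__(self):
        self.master_alive = True
        self.running = {"exec-0"}        # executors currently running
        self.held_cores = {"app-1": 4}   # cores still reserved per application

    def submit(self, app_id, cores):
        # New work is accepted only while the master is alive.
        if not self.master_alive:
            return False
        self.held_cores[app_id] = cores
        return True

    def executor_exited(self, executor_id):
        self.running.discard(executor_id)
        if self.master_alive:
            # Only the master can order a relaunch.
            self.running.add(executor_id + "-restarted")

    def app_finished(self, app_id):
        if self.master_alive:
            # Cleanup instructions are issued by the master.
            self.held_cores.pop(app_id, None)


cluster = ToyCluster()
cluster.master_alive = False             # the boss is gone
accepted = cluster.submit("app-2", 2)    # rejected: nobody to submit to
cluster.executor_exited("exec-0")        # nobody orders a restart
cluster.app_finished("app-1")            # nobody releases the cores
```

With the master down, the submission is refused, no executor is running any more, and the cores of app-1 stay reserved.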
Personally, I think this author explains it very vividly!
Reference: http://www.cnblogs.com/hseagle/p/3791779.html
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
Spark Core Source Code Analysis 13: Fault Tolerance in Exceptional Cases