Spark Core Source Analysis 13: Fault Tolerance in Exceptional Cases


Blog Address: http://blog.csdn.net/yueqian_zhu/



The architecture diagram for standalone mode is as follows:


Exception analysis 1: Worker exits abnormally

    1. A worker exits abnormally, for example when someone deliberately kills it with the kill command.
    2. Before exiting, the worker kills all of the little-brother executors under its control.
    3. The worker is supposed to send heartbeat messages to the master at regular intervals. Now that the worker process has ended, no more heartbeats arrive, so the master realizes during its timeout check that one of its branches has left.
    4. The master is very sad, and the sad master reports the situation to the corresponding driver.
    5. The driver learns in two ways that an executor assigned to it has unfortunately departed: first, the notification sent by the master; second, the driver has not received a StatusUpdate from the executor within the specified time. The driver then removes the registered executor.
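
The heartbeat timeout in the steps above can be sketched as a small simulation. This is only a toy model written for illustration, not Spark's actual code: the class names ToyMaster and WorkerInfo, the 60-second timeout, and the notifications list standing in for messages sent to drivers are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerInfo:
    worker_id: str
    last_heartbeat: float
    # executor_id -> driver_id that owns the executor
    executors: dict = field(default_factory=dict)


class ToyMaster:
    """Toy model of heartbeat-based worker failure detection (hypothetical)."""

    def __init__(self, timeout_s=60.0):
        self.timeout_s = timeout_s   # assumed timeout, chosen for the example
        self.workers = {}            # worker_id -> WorkerInfo
        self.notifications = []      # (driver_id, executor_id) pairs "sent" to drivers

    def register_worker(self, worker_id, now):
        self.workers[worker_id] = WorkerInfo(worker_id, now)

    def launch_executor(self, worker_id, executor_id, driver_id):
        self.workers[worker_id].executors[executor_id] = driver_id

    def heartbeat(self, worker_id, now):
        # Record the time of the latest heartbeat from a known worker.
        if worker_id in self.workers:
            self.workers[worker_id].last_heartbeat = now

    def check_timeouts(self, now):
        """Drop workers whose heartbeat is overdue and notify affected drivers."""
        dead = [w for w in self.workers.values()
                if now - w.last_heartbeat > self.timeout_s]
        for w in dead:
            for executor_id, driver_id in w.executors.items():
                self.notifications.append((driver_id, executor_id))
            del self.workers[w.worker_id]
        return [w.worker_id for w in dead]


master = ToyMaster(timeout_s=60.0)
master.register_worker("worker-1", now=0.0)
master.register_worker("worker-2", now=0.0)
master.launch_executor("worker-1", "exec-0", "driver-A")
master.heartbeat("worker-2", now=50.0)   # worker-1 was killed and stays silent
lost = master.check_timeouts(now=70.0)
print(lost, master.notifications)
```

Time is passed in explicitly as `now` so the example runs instantly. A real master would run the check on a timer.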

Exception analysis 2: Executor exits abnormally

The executor is the lowest-level employee in a standalone cluster deployment. What happens if it exits abnormally?

    1. The executor exits abnormally. The ExecutorRunner notices the exception and reports it to the master through ExecutorStateChanged.
    2. The master receives the notice and is very unhappy that one of its brothers has run off. It asks the worker that the executor belonged to to start it again.
    3. The worker receives the LaunchExecutor instruction and starts the executor again.
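
The report-and-relaunch loop described above might look roughly like the following sketch. It is a simplified model, not Spark's implementation: RelaunchMaster and SketchWorker are hypothetical names, the messages are plain method calls, and giving the replacement executor a fresh ID is an assumption made for the example.

```python
class SketchWorker:
    """Hypothetical worker that launches executors and reports their exits."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.running = set()
        self.launch_count = 0

    def launch_executor(self, executor_id):
        # Stands in for handling a LaunchExecutor instruction.
        self.running.add(executor_id)
        self.launch_count += 1

    def executor_exited(self, executor_id, master):
        # Plays the ExecutorRunner role: notice the exit, then report it.
        self.running.discard(executor_id)
        master.executor_state_changed(self, executor_id, state="EXITED")


class RelaunchMaster:
    """Hypothetical master that asks the same worker to start a replacement."""

    def __init__(self):
        self.next_id = 0

    def schedule(self, worker):
        executor_id = f"exec-{self.next_id}"
        self.next_id += 1
        worker.launch_executor(executor_id)
        return executor_id

    def executor_state_changed(self, worker, executor_id, state):
        # On an abnormal exit, tell the owning worker to launch again.
        if state == "EXITED":
            self.schedule(worker)


master = RelaunchMaster()
worker = SketchWorker("worker-1")
first = master.schedule(worker)
worker.executor_exited(first, master)   # the first executor crashes
print(worker.running, worker.launch_count)
```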

Exception analysis 3: Master exits abnormally

If the boss is gone, what are the consequences?

    • The worker has no one to report to. If an executor exits again, the worker will not restart it, because the boss has given no instruction.
    • New tasks cannot be submitted to the cluster.
    • Even when the old tasks finish, the resources they used cannot be released, because the resource cleanup instructions are issued by the master.
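
The three consequences listed above can be made concrete with a tiny model. Everything here is illustrative only: ToyCluster, MasterDown, the master_alive flag, and the reserved core count are assumptions, not Spark APIs.

```python
class MasterDown(Exception):
    """Raised when a request needs the master but the master has exited."""


class ToyCluster:
    """Hypothetical cluster state used to show what depends on the master."""

    def __init__(self):
        self.master_alive = True
        self.executors = {"exec-0"}
        self.reserved_cores = 4   # assumed resources held by a running application

    def executor_exited(self, executor_id):
        """Return True if a replacement executor was launched."""
        self.executors.discard(executor_id)
        if not self.master_alive:
            return False          # nobody is there to issue LaunchExecutor
        self.executors.add(executor_id + "-relaunched")
        return True

    def submit_application(self, name):
        if not self.master_alive:
            raise MasterDown(f"cannot submit {name}: no master")
        return f"{name} accepted"

    def application_finished(self):
        """Return the number of cores still reserved after the application ends."""
        if self.master_alive:
            self.reserved_cores = 0   # the master issues the cleanup instruction
        return self.reserved_cores


cluster = ToyCluster()
cluster.master_alive = False            # the master has exited abnormally
relaunched = cluster.executor_exited("exec-0")
try:
    cluster.submit_application("new-app")
    submitted = True
except MasterDown:
    submitted = False
leftover = cluster.application_finished()
print(relaunched, submitted, leftover)
```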

Personally, I think the author of the reference below explains this vividly!


Reference: http://www.cnblogs.com/hseagle/p/3791779.html

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
