Storm documentation: fault tolerance

Source: Internet
Author: User

Source Address: http://storm.apache.org/documentation/Fault-tolerance.html


This page explains the design details that make Storm a fault-tolerant system.



What happens when a worker dies?


When a worker dies, the supervisor restarts it. If the worker keeps failing on startup and cannot send heartbeats to Nimbus, Nimbus reassigns the worker to another machine.
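
The two recovery levels described above can be sketched as a small decision function. This is an illustrative model only, not Storm's implementation: the names `WorkerState` and `recovery_action` and the 30-second timeout are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    """Hypothetical view of one worker, for illustration only."""
    process_alive: bool     # what the local supervisor observes
    last_heartbeat: float   # time of the last heartbeat seen by Nimbus, in seconds


def recovery_action(worker: WorkerState, now: float, heartbeat_timeout: float = 30.0) -> str:
    """Return which component acts on the worker in this simplified model."""
    if now - worker.last_heartbeat > heartbeat_timeout:
        # No heartbeat for too long: Nimbus gives up on this placement
        # and reassigns the worker to another machine.
        return "nimbus_reassigns"
    if not worker.process_alive:
        # The process died recently: the supervisor restarts it in place.
        return "supervisor_restarts"
    return "none"
```

In this model a local restart is always tried first, and reassignment happens only once heartbeats have been missing for longer than the timeout.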



What happens when a node dies?


The tasks assigned to that machine time out, and Nimbus reassigns them to other machines.
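
As a rough sketch of the reassignment step, the function below moves the tasks of a dead node onto the surviving nodes. The least-loaded placement rule and the name `reassign_tasks` are assumptions for illustration; Storm's actual scheduler is more involved.

```python
from typing import Dict, List


def reassign_tasks(assignments: Dict[str, List[int]], dead_node: str) -> Dict[str, List[int]]:
    """Move every task of dead_node onto the remaining nodes, least-loaded first."""
    survivors = {node: list(tasks) for node, tasks in assignments.items() if node != dead_node}
    if not survivors:
        raise RuntimeError("no live nodes left to take over the tasks")
    for task in assignments.get(dead_node, []):
        # Pick the node with the fewest tasks; ties go to the alphabetically first name.
        target = min(sorted(survivors), key=lambda node: len(survivors[node]))
        survivors[target].append(task)
    return survivors
```

For example, with `{"node-a": [1, 2], "node-b": [3], "node-c": [4, 5]}` and `node-c` dead, task 4 goes to `node-b` and task 5 goes to `node-a`.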



What happens when the Nimbus or Supervisor daemons die?


The Nimbus and Supervisor daemons are designed to be fail-fast (the process kills itself whenever it encounters an unexpected situation) and stateless (all state is kept in ZooKeeper or on disk). As described in the guide to setting up a Storm cluster, the Nimbus and Supervisor daemons must be run under supervision, using a tool such as daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart as if nothing had happened.


Most notably, no worker processes are affected by the death of Nimbus or the Supervisors. This is in contrast to Hadoop, where all running jobs are lost if the JobTracker dies.
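
The fail-fast, stateless pattern described above can be illustrated with a toy daemon. This is a minimal sketch under stated assumptions, not Storm code: `run_daemon`, `supervise`, the JSON state file, and the `transient_faults` set are all hypothetical, and the `supervise` loop merely stands in for an external tool such as daemontools or monit.

```python
import json
import os
import tempfile


def run_daemon(state_path: str, events: list, transient_faults: set) -> None:
    """Process events, persisting progress after each one; crash on a fault."""
    state = {"processed": 0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)  # all state comes from disk, none from memory
    for index in range(state["processed"], len(events)):
        if index in transient_faults:
            transient_faults.discard(index)  # the fault occurs only once in this demo
            # Fail fast: no in-process recovery, just exit.
            raise RuntimeError("unexpected situation, exiting")
        state["processed"] = index + 1
        with open(state_path, "w") as f:
            json.dump(state, f)


def supervise(state_path: str, events: list, transient_faults: set, max_restarts: int = 3) -> int:
    """Restart the daemon after each crash and return how many restarts were needed."""
    restarts = 0
    while True:
        try:
            run_daemon(state_path, events, transient_faults)
            return restarts
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise
```

Because progress lives only in the state file, the restarted process resumes where the crashed one stopped, which is the property that lets a supervised daemon come back as if nothing had happened.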



Is Nimbus a single point of failure?


If the Nimbus node is lost, the workers continue to run. In addition, supervisors still restart workers when they die. However, without Nimbus, workers are not reassigned to other machines when necessary, for example when a worker's machine goes down.


So the answer is that Nimbus is "sort of" a single point of failure. In practice it is not a big deal, because nothing catastrophic happens when the Nimbus daemon dies. There are plans to make Nimbus highly available in the future.



How does Storm guarantee data processing?


Storm provides mechanisms to guarantee data processing even if nodes die or messages are lost. For details, see the page on guaranteeing message processing.
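
As a rough illustration of how completion of a tuple tree can be tracked cheaply, the sketch below uses the XOR-checksum idea described in Storm's message-processing guide. The class `AckTracker` and its methods are hypothetical and heavily simplified; in Storm the ids are random 64-bit values, and a tuple that is not fully acked within the message timeout is reported as failed so the spout can replay it.

```python
class AckTracker:
    """Tracks one spout tuple's tree with a single XOR checksum (simplified)."""

    def __init__(self) -> None:
        self.checksum = 0
        self.seen_any = False

    def update(self, tuple_id: int) -> None:
        # Each tuple id is XORed in once when the tuple is emitted and once
        # when it is acked, so a fully processed tree cancels out to zero.
        self.checksum ^= tuple_id
        self.seen_any = True

    def fully_processed(self) -> bool:
        return self.seen_any and self.checksum == 0
```

The tracker needs a fixed amount of memory per spout tuple regardless of how many downstream tuples are created.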











