Cassandra 3.x Official Documentation (3)---The Gossip Communication Protocol, Failure Detection, and Recovery

Source: Internet
Author: User

Foreword

This is an unofficial translation of the Cassandra 3.x official documentation. The quality of the translation depends entirely on my English proficiency and my understanding of Cassandra, so I strongly recommend also reading the English version of the Cassandra 3.x official documentation. About half of this document is translation and half is my own understanding of Cassandra; I try to annotate my own commentary so it can be distinguished from the translated text. Translating the documentation is a long-term and challenging task, and if you would like to join the Cassandra Git book project, you can send me a message. You are also welcome to join our QQ group, 104822562, to learn about and discuss Cassandra together.

Gossip

Gossip is a peer-to-peer network communication protocol in which nodes periodically exchange state information about themselves and about the other nodes they know of. Each node gossips with up to three other nodes in the cluster once per second. Nodes exchange not only their own information but also information about other nodes learned through previous gossip exchanges, so every node quickly learns the state of all the other nodes in the cluster. Each gossip message carries a version number, so during an exchange, older information about a particular node is overwritten by the most recent state.
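The version-number rule can be sketched in a few lines. This is a toy illustration under simplified assumptions, not Cassandra's actual implementation (which tracks heartbeat and application state per endpoint); the function and data layout here are hypothetical.

```python
# Illustrative sketch of version-based gossip state merging.
# State maps look like {node_address: (version, status)}.

def merge_gossip(local, remote):
    """Merge remote gossip info into local state: for each node,
    keep whichever entry carries the higher version number."""
    for node, (version, status) in remote.items():
        if node not in local or local[node][0] < version:
            local[node] = (version, status)
    return local

local = {"10.0.0.1": (5, "UP"), "10.0.0.2": (3, "UP")}
remote = {"10.0.0.2": (7, "DOWN"), "10.0.0.3": (1, "UP")}
merged = merge_gossip(local, remote)
# The stale view of 10.0.0.2 (version 3) is replaced by version 7,
# and the previously unknown node 10.0.0.3 is learned secondhand.
```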

To prevent problems with gossip communication, all nodes in a cluster should use the same list of seed nodes. This matters most when a node first starts up; by default, a node remembers the other nodes it has gossiped with across subsequent restarts. Seed nodes are used only during the bootstrap process, to introduce new nodes joining the cluster to the gossip process. They are not a single point of failure, and they serve no other special purpose.
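The seed list is set in cassandra.yaml through the seed_provider setting; a minimal sketch (the IP addresses are placeholders, two seeds in each of two data centers):

```yaml
# cassandra.yaml -- should be identical on every node in the cluster.
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.1.0.1,10.1.0.2,10.2.0.1,10.2.0.2"
```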

Note:

In a multiple data center cluster, make sure the seed list includes at least one node from each data center. For fault tolerance, it is recommended to designate more than one seed node per data center; otherwise, a bootstrapping node would have to gossip with a seed in another data center.

It is not recommended to make every node a seed node, because doing so increases maintenance cost and reduces gossip performance. Gossip optimization is not critical, but a small seed list is recommended (about three nodes per data center is ideal).
Failure detection and recovery

Failure detection is a method of locally determining, from gossip state and history, whether a node in the system is down or has recovered. Cassandra uses this information to avoid routing client requests to unreachable nodes whenever possible. (Cassandra can also use the dynamic snitch to avoid routing client requests to nodes that are alive but performing poorly.)

The gossip process tracks the state of other nodes both directly (nodes it gossips with) and indirectly (nodes heard about secondhand, thirdhand, and so on). Rather than comparing against a fixed threshold to mark a node as failed, Cassandra uses an accrual failure detection mechanism that computes a per-node threshold, taking into account network conditions, workload, historical state, and other factors. During gossip exchanges, every node maintains a sliding window of inter-arrival times of gossip messages from every other node. You can adjust the sensitivity of failure detection by configuring the phi_convict_threshold property: the lower the value, the more likely an unresponsive node is to be marked as down; the higher the value, the less likely a transient failure will cause a node to be marked as down. The default value is fine in most cases, but in an unstable network environment with frequent congestion (such as Amazon EC2), raising the value to 10 or 12 helps avoid spurious failure detection. Values above 12 or below 5 are not recommended.
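The idea behind accrual failure detection can be sketched as follows. This is a simplified toy model, not Cassandra's FailureDetector: it assumes exponentially distributed heartbeat arrivals, and the class name and window size are made up for illustration.

```python
import math
from collections import deque

class AccrualDetector:
    """Toy phi accrual failure detector: suspicion of a node grows
    continuously as silence stretches beyond its historical rhythm."""

    def __init__(self, window=1000):
        # Sliding window of inter-arrival times of gossip heartbeats.
        self.intervals = deque(maxlen=window)
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record that a gossip message arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level. Assuming exponential arrivals:
        phi = -log10(P(next heartbeat arrives later than `elapsed`))."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

d = AccrualDetector()
for t in range(10):          # heartbeats once per second
    d.heartbeat(float(t))
d.phi(10.0)                  # on schedule  -> low suspicion
d.phi(40.0)                  # 31 s silence -> high suspicion
# A node would be convicted once phi exceeds phi_convict_threshold
# (default 8 in cassandra.yaml).
```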

Node failures can be caused by a variety of reasons, such as hardware faults and network outages. Outages are often short-lived but can last a long time. Because a node outage rarely means a permanent departure from the cluster, a down node is not automatically removed from the ring. Instead, the other nodes periodically attempt to re-establish contact with the failed node to see whether it has returned. To permanently change the membership of the cluster, an administrator must explicitly add or remove nodes using the nodetool utility.
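For example, permanent membership changes use the standard nodetool subcommands; a sketch (the host ID is a placeholder taken from the output of `nodetool status`):

```shell
# Gracefully remove a live node: run on the node that is leaving.
nodetool decommission

# Remove a node that is down and will not return: run on any live node,
# passing the Host ID reported by `nodetool status`.
nodetool status
nodetool removenode <host-id>
```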

When a node comes back up after being down, it may have missed writes for the replica data it maintains. Repair mechanisms such as hinted handoff and manual repair can restore this data; how long the node was down determines which mechanism is used to bring the data back into consistency.

Note:

Hinted handoff has a time window of three hours by default; if a node stays down longer than this window, hints stop being stored for it, and the missed writes must be recovered by running a manual repair.
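The hint window is controlled in cassandra.yaml; a sketch showing the shipped default (three hours, expressed in milliseconds):

```yaml
# cassandra.yaml -- how long hints are collected for a down node.
# Once a node has been down longer than this, hints stop accumulating
# and a manual repair is required after it returns.
max_hint_window_in_ms: 10800000   # 3 hours
```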
