Cassandra 1.0.x inter-node communication, part 2 (draft)

Source: Internet
Author: User
Tags: cassandra, datastax

This second part translates the sections on communication between nodes, cluster membership, and failure detection and recovery.

Original

About internode communications (gossip)

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.

In Cassandra, the gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.

 

Translation

About inter-node communication (gossip protocol)

Cassandra uses a protocol called gossip to discover the location and state of the other nodes participating in the cluster. Gossip is a peer-to-peer protocol in which nodes periodically exchange state information about themselves and about the other nodes they know about.

In Cassandra, the gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. Nodes exchange information about themselves and about the nodes they have already gossiped about, so every node quickly learns about all the others.

Each gossip message carries a version, so during an exchange older information is overwritten with the most current state for a given node.
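
The versioned exchange described above can be sketched in a few lines of Python. This is only a toy model, not Cassandra's implementation: the names `merge_state` and `gossip_round`, the `(version, status)` tuples, and the five node names are all made up for illustration. The fan-out of three peers per round comes from the text.

```python
import random

def merge_state(local, remote):
    """Merge remote gossip state into local, keeping the higher version per node."""
    for node, (version, status) in remote.items():
        if node not in local or local[node][0] < version:
            local[node] = (version, status)
    return local

def gossip_round(states, fanout=3, rng=random):
    """One round: every node exchanges state with up to `fanout` random peers."""
    nodes = list(states)
    for node in nodes:
        peers = [p for p in nodes if p != node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            # The exchange is two-way: both sides end up with the newest entries.
            merge_state(states[peer], states[node])
            merge_state(states[node], states[peer])

# Hypothetical five-node cluster; each node starts out knowing only itself.
rng = random.Random(7)
states = {n: {n: (1, "UP")} for n in ["A", "B", "C", "D", "E"]}

rounds = 0
while any(len(view) < len(states) for view in states.values()):
    gossip_round(states, rng=rng)
    rounds += 1

print(f"every node knows the full cluster after {rounds} round(s)")
```

Because an entry is replaced only when the incoming version is higher, a stale message that arrives late cannot overwrite newer state.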

 

Translator's note

China has a mysterious traditional "technology" whose name is also translated as "gossip" (the Eight Trigrams). It is said to be a kind of science in which every element is connected to every other, so one might joke that Cassandra's gossip and China's gossip must be causally related. Both, at least, claim practical applications and workable solutions.

There is also such a thing as "gossip news", and it spreads in the same way: I pass it to you, and you pass it to someone else. It resembles a 1-to-n data relationship in which n is unknown. It could be 10 or it could be 10000, just as you cannot estimate how many people will receive a piece of gossip. Still, gossip travels fast, because everyone is interested.

 

Original

About cluster membership and seed nodes

When a node first starts up, it looks at its configuration file to determine the name of the Cassandra cluster it belongs to and which node(s), called seeds, to contact to obtain information about the other nodes in the cluster. These cluster contact points are configured in the cassandra.yaml configuration file for a node.

To prevent partitions in gossip communications, all nodes in a cluster should have the same list of seed nodes listed in their configuration file. This is most critical the first time a node starts up. By default, a node will remember other nodes it has gossiped with between subsequent restarts.

 

Note

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.

 

To know what range of data it is responsible for, a node must also know its own token and those of the other nodes in the cluster. When initializing a new cluster, you should generate tokens for the entire cluster and assign an initial token to each node before starting up. Each node will then gossip its token to the others. See "About data partitioning in Cassandra" for more information about partitioners and tokens.

 

 

Translation

About cluster members and seed nodes

When a node starts for the first time, it reads its configuration file to determine the name of the cluster it belongs to and which nodes, called seeds, it should contact to obtain information about the other nodes in the cluster. These contact points are set in each node's cassandra.yaml file.

To prevent partitions in gossip communication, every node in the cluster should have the same list of seed nodes in its configuration file. This matters most the first time a node starts up. By default, a node remembers the other nodes it has gossiped with across subsequent restarts.
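
For reference, the relevant part of cassandra.yaml might look like the fragment below. The cluster name and the IP addresses are placeholders, and the exact layout should be checked against the cassandra.yaml shipped with your version; the point is only that every node carries the same `cluster_name` and the same `seeds` list.

```yaml
# Placeholder values; use the same cluster_name and seeds list on every node.
cluster_name: 'Test Cluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"
```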

 

Remarks

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, and they have no other special role in cluster operations.

 

To know which range of data it is responsible for, a node must know its own token and the tokens of the other nodes in the cluster. When initializing a new cluster, generate tokens for the whole cluster and assign an initial token to each node before starting it. Each node then gossips its token to the others. For more about partitioners and tokens, see "About data partitioning in Cassandra".
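
As an illustration of generating tokens for a whole cluster up front, the sketch below spaces tokens evenly over the range 0 to 2**127, which is the range used by RandomPartitioner. That partitioner choice is an assumption here, since the text does not name one, and the helper name `initial_tokens` is made up. Each printed value would go into the `initial_token` setting of one node's cassandra.yaml.

```python
def initial_tokens(node_count):
    """Evenly spaced tokens over the assumed RandomPartitioner range [0, 2**127)."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]

# Example: a hypothetical four-node cluster.
for i, token in enumerate(initial_tokens(4)):
    print(f"node {i}: initial_token: {token}")
```

Even spacing gives each node an equally sized slice of the ring, which is why the tokens are best computed for the whole cluster before any node starts.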

 

Original

About failure detection and recovery

Failure detection is a method for locally determining, from gossip state, if another node in the system is up or down. Failure detection information is also used by Cassandra to avoid routing client requests to unreachable nodes whenever possible. (Cassandra can also avoid routing requests to nodes that are alive, but performing poorly, through the dynamic snitch.)

The gossip process tracks heartbeats from other nodes both directly (nodes gossiping directly to it) and indirectly (nodes heard about secondhand, thirdhand, and so on). Rather than have a fixed threshold for marking nodes without a heartbeat as down, Cassandra uses an accrual detection mechanism to calculate a per-node threshold that takes into account network conditions, workload, or other conditions that might affect perceived heartbeat rate. During gossip exchanges, every node maintains a sliding window of inter-arrival times of gossip messages from other nodes in the cluster. The value of phi is based on the distribution of inter-arrival time values across all nodes in the cluster. In Cassandra, configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. The default value is fine for most situations, but DataStax recommends increasing it to 12 for Amazon EC2 due to the network congestion frequently experienced on that platform.

Node failures can result from various causes such as hardware failures, network outages, and so on. Node outages are often transient but can last for extended intervals. A node outage rarely signifies a permanent departure from the cluster, and therefore does not automatically result in permanent removal of the node from the ring. Other nodes will still try to periodically initiate gossip contact with failed nodes to see if they are back up. To permanently change a node's membership in a cluster, administrators must explicitly add or remove nodes from a Cassandra cluster using the nodetool utility.

When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Once the failure detector marks a node down, missed writes are stored by other replicas if hinted handoff is enabled (for a period of time, anyway). However, it is possible that some writes were missed in the interval between a node actually going down and when it is detected as down. Or, if a node is down for longer than max_hint_window_in_ms (one hour by default), hints will no longer be saved. For that reason, it is best practice to routinely run nodetool repair on all nodes to ensure they have consistent data, and to also run repair after recovering a node that has been down for an extended period.

 

Translation

About failure detection and recovery

Failure detection is a method for determining locally, from gossip state, whether another node in the system is up or down. Cassandra also uses failure detection information to avoid routing client requests to unreachable nodes whenever possible. (Through the dynamic snitch, Cassandra can also avoid routing requests to nodes that are alive but performing poorly.)

The gossip process tracks heartbeats from other nodes both directly (nodes gossiping directly to it) and indirectly (nodes heard about secondhand, thirdhand, and so on). Instead of using a fixed threshold to mark nodes without a heartbeat as down, Cassandra uses an accrual detection mechanism that calculates a per-node threshold, taking into account network conditions, workload, and other factors that might affect the perceived heartbeat rate. During gossip exchanges, every node maintains a sliding window of the inter-arrival times of gossip messages from the other nodes. The value of phi is based on the distribution of those inter-arrival times across all nodes in the cluster. In Cassandra, the phi_convict_threshold property adjusts the sensitivity of the failure detector. The default value is fine for most situations, but DataStax recommends raising it to 12 on Amazon EC2 because of the network congestion frequently experienced on that platform.
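
To make the role of phi more concrete, here is a minimal Python sketch of an accrual-style detector. It is not Cassandra's code. It assumes heartbeat gaps follow an exponential distribution, so phi is the negative base-10 logarithm of the probability that a live node would stay silent this long. The class name, the window size of 1000, and the timings are illustrative; the default threshold of 8 matches the usual default for phi_convict_threshold.

```python
import math
from collections import deque

class AccrualFailureDetector:
    """Toy phi-accrual detector for a single monitored node."""

    def __init__(self, phi_convict_threshold=8.0, window_size=1000):
        self.threshold = phi_convict_threshold
        self.intervals = deque(maxlen=window_size)  # sliding window of gaps
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: grows the longer the node stays silent."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # Assumed model: P(gap >= elapsed) = exp(-elapsed / mean),
        # so phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        return self.phi(now) > self.threshold

default_fd = AccrualFailureDetector()                         # threshold 8
ec2_fd = AccrualFailureDetector(phi_convict_threshold=12.0)   # EC2 suggestion
for t in range(11):               # one heartbeat per second, t = 0..10
    default_fd.heartbeat(float(t))
    ec2_fd.heartbeat(float(t))

# After 20 seconds of silence the default detector convicts the node,
# while the more tolerant threshold of 12 still waits.
print(default_fd.is_down(30.0), ec2_fd.is_down(30.0))
```

Raising the threshold trades slower detection for fewer false convictions on a congested network, which is the reasoning behind the EC2 recommendation.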

Node failures can have many causes, such as hardware faults and network outages. Node outages are often short, but they can last for extended periods. An outage rarely means the node has left the cluster for good, so it does not automatically lead to the node's permanent removal from the ring. Other nodes still try periodically to initiate gossip contact with failed nodes to see whether they are back up. To permanently change a node's membership in a cluster, an administrator must explicitly add or remove the node using the nodetool utility.

When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Once the failure detector marks a node down, the missed writes are stored by other replicas if hinted handoff is enabled (for a period of time, at least). However, some writes may have been missed in the interval between the node actually going down and its being detected as down. And if a node is down for longer than max_hint_window_in_ms (one hour by default), hints are no longer saved. For that reason, the best practice is to run nodetool repair routinely on all nodes to ensure they hold consistent data, and also to run repair after recovering a node that has been down for an extended period.
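
The two gaps described above, writes that arrive before the node is detected as down and writes that arrive after the hint window closes, can be shown with a small decision function. This is a simplified model and not Cassandra's logic: the function name, its parameters, and the inclusive window boundary are assumptions. Only the one-hour default for max_hint_window_in_ms comes from the text.

```python
ONE_HOUR_MS = 60 * 60 * 1000  # default max_hint_window_in_ms per the text

def should_store_hint(marked_down_at_ms, write_at_ms,
                      max_hint_window_in_ms=ONE_HOUR_MS,
                      hinted_handoff_enabled=True):
    """Return True if, in this toy model, a hint is saved for the write."""
    if not hinted_handoff_enabled:
        return False
    if write_at_ms < marked_down_at_ms:
        # The node has failed but is not yet detected as down: no hint.
        return False
    # Hints are kept only while the node has been down within the window.
    return write_at_ms - marked_down_at_ms <= max_hint_window_in_ms

marked_down = 10_000
print(should_store_hint(marked_down, 9_000))                      # before detection
print(should_store_hint(marked_down, 10_000 + 30 * 60 * 1000))    # 30 minutes in
print(should_store_hint(marked_down, 10_000 + 2 * ONE_HOUR_MS))   # window expired
```

The writes in the first and third cases are exactly the ones that only a repair can bring back to the recovered node.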

 

Note: This translation is rough. The main problem is that I still need to learn more about Cassandra's fairly complex failure-handling mechanisms, and my wording is also weak. I hope more experienced readers can help.
