Cassandra study Note 5

Last Update:2018-12-03 Source: Internet

Author: User

Tags cassandra random seed


A Cassandra cluster has no central node; every node has the same status. Nodes maintain the cluster state through a protocol called gossip. Through gossip, each node learns which nodes belong to the cluster and what state they are in. This lets any node in the cluster route any key, and the failure of any single node does not have disastrous consequences.

I. Gossip algorithm background

The gossip algorithm, as its name suggests, is inspired by office gossip: once one person starts gossiping, everyone will know the news within a bounded period of time. Because this resembles the way a virus spreads, gossip has many aliases: "gossip algorithm", "epidemic propagation algorithm", "virus infection algorithm", and "rumor propagation algorithm". Gossip is not a new idea, however. Earlier flooding searches and routing algorithms all belong to this category. The difference is that gossip gives these algorithms clear semantics, concrete implementation methods, and a proof of convergence.

II. Gossip algorithm features

The gossip algorithm is also called anti-entropy. Entropy is a concept from physics that represents disorder, and anti-entropy means seeking consistency within disorder. This captures the character of gossip well: in a bounded network, each node communicates with randomly chosen other nodes, and after a period of seemingly chaotic communication the states of all nodes converge to agreement. Each node may know all other nodes or only a few neighbors. As long as the nodes are connected through the network, their states will eventually be consistent.
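The convergence described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the original: a single piece of state with one version number, every node able to reach every other node, and a hypothetical helper named `gossip_until_agreed`.

```python
import random

def gossip_until_agreed(n_nodes, seed=0, max_rounds=1000):
    """Each round, every node merges state with one random peer (higher version wins)."""
    rng = random.Random(seed)
    # Node 0 starts with version 1; every other node still has version 0.
    versions = [0] * n_nodes
    versions[0] = 1
    rounds = 0
    while len(set(versions)) > 1 and rounds < max_rounds:
        for node in range(n_nodes):
            peer = rng.choice([p for p in range(n_nodes) if p != node])
            newest = max(versions[node], versions[peer])
            versions[node] = versions[peer] = newest
        rounds += 1
    return rounds, versions

rounds, versions = gossip_until_agreed(50)
print(f"all 50 nodes agreed after {rounds} rounds")
```

Although no node coordinates the exchange, the newest version reaches every node after a small number of rounds.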

III. Nature of gossip

Gossip is a fault-tolerant algorithm with redundancy. More precisely, it is an eventual consistency algorithm: it cannot guarantee that all nodes are consistent at any given moment, but it does guarantee that all nodes are consistent "eventually". That "eventual" point is reached in practice, but in theory no specific time for it can be proven. Gossip is therefore suited to scenarios that do not require strong consistency.

Because gossip does not require a node to know all other nodes, it is also decentralized: nodes are completely equal and no central node is required. In practice, gossip can be used in many areas that tolerate eventual consistency, such as failure detection, route synchronization, pub/sub, and dynamic load balancing.

The disadvantages of gossip are just as clear. Redundant communication places a heavy load on network bandwidth and CPU resources. That load is bounded by the communication frequency, which in turn determines how fast the algorithm converges.

IV. Communication methods and convergence of gossip nodes

Each node in gossip maintains a set of states. Each state can be represented as a key/value pair plus a version number, where a state with a higher version number is newer than one with a lower version number. Two nodes (A and B) can communicate in three ways:

Communication method	Description
Push	Node A pushes its data (key, value, version) to node B. B updates any local entries for which A's data is newer.
Pull	A sends only a digest (key, version) to B, without the values. B compares versions and returns to A the entries (key, value, version) that are newer on B than on A. A updates its local data.
Push/pull	Like pull, except that B also asks A for the entries that are newer on A than on B. A then sends those entries to B, and B updates its local data.
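The three methods in the table can be sketched as follows. This is a minimal illustration, not Cassandra's code: each node's state is assumed to be a plain dictionary mapping a key to a `(value, version)` tuple, versions are assumed to be non-negative integers, and the function names are hypothetical.

```python
# Each node's state: key -> (value, version); a higher version is newer.

def push(a, b):
    """A sends its full entries to B; B keeps whatever is newer."""
    for key, (value, version) in a.items():
        if key not in b or b[key][1] < version:
            b[key] = (value, version)

def pull(a, b):
    """A sends only a digest (key -> version); B replies with entries newer than A's."""
    digest = {key: version for key, (_, version) in a.items()}
    reply = {k: e for k, e in b.items() if digest.get(k, -1) < e[1]}
    a.update(reply)

def push_pull(a, b):
    """Like pull, but B also requests the entries that are newer on A."""
    digest = {key: version for key, (_, version) in a.items()}
    # Message 2 (B -> A): entries newer on B, plus the keys B wants from A.
    reply = {k: e for k, e in b.items() if digest.get(k, -1) < e[1]}
    wanted = [k for k, v in digest.items() if k not in b or b[k][1] < v]
    # Message 3 (A -> B): the requested entries.
    to_send = {k: a[k] for k in wanted}
    a.update(reply)
    b.update(to_send)

a = {"x": ("a1", 2), "y": ("a0", 1)}
b = {"y": ("b1", 3), "z": ("b0", 1)}
push_pull(a, b)
print(a == b)  # both nodes now hold the newest entry for every key
```

After a single push/pull cycle both dictionaries are identical, whereas `push` alone would leave A without B's newer entries and `pull` alone would leave B without A's.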

If one data synchronization between two nodes is defined as a cycle, push needs one message per cycle, pull needs two, and push/pull needs three. Although push/pull exchanges the most messages, it works best: in theory, the two nodes can become completely consistent within a single cycle. Intuitively, push/pull also converges fastest.

Assume that in each communication cycle every node can select (infect) a new node. The gossip algorithm then degenerates into a binary-search-like process in which each cycle adds one more level to a balanced binary tree, with a convergence speed of O(n^2) and a corresponding time overhead of O(log n). This is the optimal convergence speed for gossip in theory, but it is hard to achieve in practice.
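The O(log n) time overhead follows from the doubling in this ideal case: if every informed node infects one new node per cycle, the informed population doubles each cycle. A short sketch (the function name is hypothetical) makes the arithmetic concrete:

```python
import math

def cycles_to_infect_all(n_nodes):
    """Cycles needed when every informed node infects one new node per cycle."""
    informed, cycles = 1, 0
    while informed < n_nodes:
        informed = min(n_nodes, informed * 2)  # informed population doubles
        cycles += 1
    return cycles

for n in (8, 1000, 1024):
    print(n, cycles_to_infect_all(n), math.ceil(math.log2(n)))
```

For each n, the simulated cycle count matches the ceiling of log2(n).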

The convergence speed of pull is clearly higher than that of push. In each cycle, every node is infected with a fixed probability p (0 < p < 1), so the gossip algorithm converges quadratically in p. This is also called probabilistic convergence, which sets gossip apart from most consistency algorithms.
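One way to see why pull converges faster than push is to track the fraction u of nodes that are still uninformed. The recurrences below are the usual large-cluster approximations from epidemic analysis, not something stated in the original, so treat them as assumptions: under pull, a node stays uninformed only if it asks another uninformed node (u becomes u * u); under push, a node stays uninformed only if none of the informed nodes happens to pick it (u becomes roughly u * exp(-(1 - u))).

```python
import math

def rounds_until(update, u0=0.5, target=1e-6):
    """Count rounds until the uninformed fraction drops to `target` or below."""
    u, rounds = u0, 0
    while u > target:
        u = update(u)
        rounds += 1
    return rounds

# Pull: an uninformed node stays uninformed only if its chosen peer is uninformed.
pull_rounds = rounds_until(lambda u: u * u)
# Push (approximation): an uninformed node is missed by every informed pusher.
push_rounds = rounds_until(lambda u: u * math.exp(-(1 - u)))

print("pull:", pull_rounds, "push:", push_rounds)
```

Starting from half the cluster informed, the pull recurrence squares the uninformed fraction every round, which is the quadratic behaviour described above, while push only shrinks it by a roughly constant factor once most nodes are informed.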

Gossip nodes can work in the following two ways:

Anti-entropy: transmits all data with a fixed probability.

Rumor-mongering: only new data is transmitted.

Anti-entropy mode is fully fault tolerant but imposes a heavy network and CPU load. Rumor-mongering mode imposes a light network and CPU load, but it requires a definition of how recent data must be to count as "new", and full fault tolerance is hard to guarantee. For a node that fails and restarts only after that "new" window has passed, eventual consistency cannot be guaranteed unless an additional mechanism is introduced to handle the inconsistency.
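The difference between the two modes comes down to which entries are put on the wire. The sketch below assumes, purely for illustration, that each entry records an `updated_at` timestamp and that "new" means "updated within the last `newest_window` seconds"; both names are hypothetical.

```python
def anti_entropy_payload(state):
    """Anti-entropy: send every entry, regardless of age."""
    return dict(state)

def rumor_payload(state, now, newest_window):
    """Rumor-mongering: send only entries updated within the last `newest_window` seconds."""
    return {key: entry for key, entry in state.items()
            if now - entry["updated_at"] <= newest_window}

state = {
    "token": {"value": "t1", "updated_at": 100},  # old update
    "load": {"value": 0.7, "updated_at": 995},    # recent update
}

print(sorted(anti_entropy_payload(state)))
print(sorted(rumor_payload(state, now=1000, newest_window=10)))
```

A node that was down while `token` changed and came back after the window closed would never receive that entry through rumor-mongering alone, which is the fault-tolerance gap described above.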

V. Node synchronization rules for gossip in Cassandra

When a node starts, it reads the seeds setting from the configuration file. As a decentralized distributed system, Cassandra has no central node. However, so that a node can reach the cluster at startup, at least one seed node must still be configured for it.

Cassandra has a Gossiper that runs once per second (see the start method in Gossiper.java) and sends synchronization messages to other nodes according to the following rules:

1. Randomly pick a live node and send it a synchronization request.

2. Randomly pick an unreachable node and send it a synchronization request.

3. If the node selected in step 1 is not a seed, or the number of currently live nodes is less than the number of seeds, send a synchronization request to a random seed.

The first step synchronizes with the currently live nodes. The second step detects as early as possible that an offline node has come back online. The first condition in step 3 exists because a seed, in theory, always holds more node state information, so if the node chosen in step 1 is not a seed, the node synchronizes with a seed as well. The second condition in step 3 prevents seed islands.
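The three rules can be sketched as a single selection function. This follows the rules as stated in this article rather than the actual Gossiper source; the function name and the use of plain sets of node names (which are assumed to exclude the local node) are assumptions.

```python
import random

def pick_gossip_targets(live, unreachable, seeds, rng=random):
    """Return the nodes to contact in one gossip round, following rules 1 to 3."""
    targets = []
    # Rule 1: one random live node, if any.
    first = rng.choice(sorted(live)) if live else None
    if first is not None:
        targets.append(first)
    # Rule 2: one random unreachable node, to notice when it comes back.
    if unreachable:
        targets.append(rng.choice(sorted(unreachable)))
    # Rule 3: a random seed if rule 1 did not hit a seed,
    # or if fewer nodes are live than there are seeds.
    if seeds and (first not in seeds or len(live) < len(seeds)):
        targets.append(rng.choice(sorted(seeds)))
    return targets

print(pick_gossip_targets({"X"}, {"Y"}, {"S"}))  # ['X', 'Y', 'S']
```

When no node is live yet, as at startup, only rule 3 fires, which is why at least one seed must be configured.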

To see why this second condition is needed, consider the following scenario. There are four machines, {A, B, C, D}, all configured as seeds. If they start at the same time, the following can happen:

Node A starts, finds no live node, goes to step 3, and synchronizes with a random seed; assume it picks B. Once B has synchronized with A, B considers A alive and from then on synchronizes with A. Because A is a seed, B never synchronizes with any other seed. Node C starts, finds no live node, also goes to step 3, and synchronizes with a random seed; assume it picks D. Once C has synchronized with D, C considers D alive and from then on synchronizes with D. Because D is also a seed, C never synchronizes with any other seed.

Two isolated islands have now formed. A and B synchronize with each other, and C and D synchronize with each other, but {A, B} and {C, D} never synchronize and do not know of each other's existence.

With the second condition added, A and B synchronize and each finds only one live node while there are four seeds. They then also communicate with a random seed, which breaks up the island.
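The island scenario can be checked with a small predicate for step 3. The function name and the `check_live_count` switch are hypothetical; the switch exists only to compare the behaviour with and without the second condition.

```python
def should_contact_seed(first_target, live_count, seeds, check_live_count=True):
    """Step 3: decide whether to also gossip with a random seed this round."""
    if first_target not in seeds:
        return True  # first condition: the node chosen in step 1 is not a seed
    # second condition: fewer live nodes than seeds
    return check_live_count and live_count < len(seeds)

seeds = {"A", "B", "C", "D"}

# Node A sees only B alive, and B is a seed.
print(should_contact_seed("B", 1, seeds, check_live_count=False))  # False: island persists
print(should_contact_seed("B", 1, seeds))                          # True: A reaches out to a seed
```

Without the second condition, A never looks beyond B. With it, A keeps contacting random seeds until it knows at least as many live nodes as there are seeds.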

VI. Implementation of gossip in Cassandra

Cassandra uses the push/pull communication mode. As described above, push/pull has three stages, and in each stage state information must be transmitted between nodes. This state information is wrapped in a message whose format differs per stage, as shown in the following table:

Message name	Description
GossipDigestSynMessage	A sends B a synchronization request
GossipDigestAckMessage	B returns to A the data that is newer on B
GossipDigestAck2Message	A returns to B the data that is newer on A

Gossip nodes communicate using the messages in the preceding table to carry the information to be transmitted. The state information exchanged between nodes is mainly of the following three types:

Status information	Description
HeartBeatState	Heartbeat information, which consists of a generation and a version. The generation is incremented at each system startup to distinguish the state before a restart from the state after it.
ApplicationState	Indicates the system status and stores system load information.
EndpointState	Maintains the global version of the node's data and encapsulates HeartBeatState and ApplicationState.
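The relationship between the heartbeat, application state, and endpoint state can be sketched with plain data classes. This is a Python illustration only, not Cassandra's Java classes: the field names, the `is_newer_than` helper, and the representation of application states as a dictionary of `(value, version)` tuples are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class HeartBeatState:
    generation: int   # bumped at every startup
    version: int = 0  # bumped while the node is running

    def is_newer_than(self, other):
        # Generation is compared first, so any state from after a restart
        # beats every state from before it.
        return (self.generation, self.version) > (other.generation, other.version)

@dataclass
class EndpointState:
    heartbeat: HeartBeatState
    # application state name -> (value, version), for example load information
    application_states: dict = field(default_factory=dict)

    def max_version(self):
        """Highest version found in the heartbeat or any application state."""
        versions = [self.heartbeat.version]
        versions += [version for _, version in self.application_states.values()]
        return max(versions)

before_restart = HeartBeatState(generation=1, version=500)
after_restart = HeartBeatState(generation=2, version=3)
print(after_restart.is_newer_than(before_restart))  # True
```

Comparing the generation before the version is what lets a freshly restarted node, whose version counter has started over, still be recognized as newer.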

Each Cassandra node implements the IEndpointStateChangeSubscriber interface, which handles received messages. The interface includes the following methods:

Method name	Description
onJoin	A machine joins the cluster
onChange	A status changed
onAlive	A machine became available
onDead	A machine became unavailable
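The subscriber pattern behind this interface can be sketched in Python as an abstract base class with one callback per event. The parameter lists here are assumptions made for the sketch and are not taken from Cassandra's Java interface; `RecordingSubscriber` is a hypothetical implementation that just records what it is told.

```python
from abc import ABC, abstractmethod

class EndpointStateChangeSubscriber(ABC):
    """Python stand-in for the subscriber interface; signatures are assumed."""

    @abstractmethod
    def on_join(self, endpoint, state): ...

    @abstractmethod
    def on_change(self, endpoint, state): ...

    @abstractmethod
    def on_alive(self, endpoint, state): ...

    @abstractmethod
    def on_dead(self, endpoint, state): ...

class RecordingSubscriber(EndpointStateChangeSubscriber):
    """Keeps a list of (event, endpoint) pairs in arrival order."""

    def __init__(self):
        self.events = []

    def on_join(self, endpoint, state):
        self.events.append(("join", endpoint))

    def on_change(self, endpoint, state):
        self.events.append(("change", endpoint))

    def on_alive(self, endpoint, state):
        self.events.append(("alive", endpoint))

    def on_dead(self, endpoint, state):
        self.events.append(("dead", endpoint))

subscriber = RecordingSubscriber()
subscriber.on_join("192.168.1.2", {})
subscriber.on_dead("192.168.1.2", {})
print(subscriber.events)
```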

The complete process of state synchronization between two Cassandra nodes through gossip is as follows.

Suppose that 192.168.1.1 (the source node) decides to synchronize with 192.168.1.2 (the target node). First, the source node sends a GossipDigestSynMessage to the target node. This message carries a digest of the latest state information, as maintained locally, for all nodes. The digest contains only keys and versions, not the values, which reduces the bandwidth consumed by synchronization.

When the target node receives the GossipDigestSynMessage, it does two things:

1. It finds the states whose versions in the received message are newer than the local versions, sorts them by version difference, and puts digests of these states into a GossipDigestAckMessage.

2. It finds the states whose local versions are newer than those in the received message, puts them into the same GossipDigestAckMessage, and sends the message back to the source node.

Sorting by version difference is necessary because the number of states that a single message can carry is limited (see MAX_GOSSIP_PACKET_SIZE, defined in Gossiper.java). Sorting ensures that the stalest states (those with the largest version difference) are updated first.
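The prioritization can be sketched as follows. The limit is modelled here as a hypothetical `max_entries` count rather than a packet size in bytes, and versions of unknown nodes are assumed to be 0; neither detail comes from the Cassandra source.

```python
def digests_to_request(remote_digest, local_versions, max_entries):
    """Pick the states to request, stalest first, up to `max_entries`."""
    diffs = []
    for endpoint, remote_version in remote_digest.items():
        local_version = local_versions.get(endpoint, 0)
        if remote_version > local_version:
            diffs.append((remote_version - local_version, endpoint))
    # Largest version difference first, so the stalest state is refreshed first.
    diffs.sort(reverse=True)
    return [endpoint for _, endpoint in diffs[:max_entries]]

remote = {"n1": 10, "n2": 5, "n3": 7}
local = {"n1": 2, "n2": 4, "n3": 7}
print(digests_to_request(remote, local, max_entries=1))  # ['n1']
```

With room for only one entry, `n1` (eight versions behind) is requested ahead of `n2` (one version behind), and `n3` is never requested because it is already current.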

After the source node receives the GossipDigestAckMessage, it also does two things:

1. It uses the newer states sent by the target node to update its local state. The source node now has the updates from the target node.

2. It packs the state information corresponding to the digests that the target node requested in the GossipDigestAckMessage into a GossipDigestAck2Message and sends it to the target node.

The target node then updates its local state, so it now has the updates from the source node. After this exchange, the states on the source and target nodes are synchronized.
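The whole three-message exchange can be sketched end to end. This is an illustration under simplifying assumptions, not Cassandra's implementation: each node's view is a dictionary mapping an endpoint to a `(value, version)` tuple, versions start at 1, messages are plain dictionaries, and the handler names are hypothetical.

```python
def make_syn(state):
    """Message 1 (source -> target): digest of endpoint -> version, no values."""
    return {endpoint: version for endpoint, (_, version) in state.items()}

def handle_syn(target, syn):
    """Message 2 (target -> source): requests for stale entries plus newer local entries."""
    requests = [ep for ep, version in syn.items()
                if version > target.get(ep, (None, 0))[1]]
    newer = {ep: entry for ep, entry in target.items()
             if entry[1] > syn.get(ep, 0)}
    return {"requests": requests, "states": newer}

def handle_ack(source, ack):
    """Message 3 (source -> target): apply the target's newer entries, answer its requests."""
    ack2 = {ep: source[ep] for ep in ack["requests"]}
    source.update(ack["states"])
    return ack2

def handle_ack2(target, ack2):
    """The target applies the entries it requested."""
    target.update(ack2)

source = {"192.168.1.1": ("load=0.3", 8), "192.168.1.3": ("load=0.9", 2)}
target = {"192.168.1.2": ("load=0.5", 4), "192.168.1.3": ("load=0.7", 5)}

ack = handle_syn(target, make_syn(source))
ack2 = handle_ack(source, ack)
handle_ack2(target, ack2)
print(source == target)  # both views are now identical
```

Only the digest crosses the wire in the first message; values travel only for entries that one side has actually found to be stale.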

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
