The gossip algorithm became widely known through Cassandra. Gossip looks simple, but truly understanding its nature is far harder than it appears. To get at the essence of gossip, the following content draws mainly on the original gossip paper: "Efficient Reconciliation and Flow Control for Anti-Entropy Protocols".
1. Gossip background
The gossip algorithm, as its name suggests, is inspired by office gossip: once one person starts gossiping, everyone knows the news within a limited time. This is similar to the way a virus spreads, so gossip has several aliases: "gossip algorithm", "epidemic algorithm", "virus infection algorithm", and "rumor-spreading algorithm".
Gossip is not new, however. Earlier flooding search and routing algorithms belong to the same category. The difference is that gossip gives this class of algorithms clear semantics, concrete implementation methods, and a proof of convergence.
2. Gossip features
The gossip algorithm is also known as anti-entropy. Entropy is a concept from physics that represents disorder, and anti-entropy means seeking agreement within disorder. This captures the character of gossip well: in a bounded network, each node communicates with other nodes at random, and after a round of seemingly chaotic communication the states of all nodes end up in agreement. Each node may know all other nodes or only a few neighbors. As long as the nodes are connected through the network, they eventually reach the same state, which is also how an epidemic spreads.
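The random exchange described above can be sketched as a small simulation. This is only an illustration under assumptions that are not in the original: each node's state is a dict of key to version, a merge keeps the higher version per key, and the names `merge` and `simulate` are hypothetical.

```python
import random


def merge(a, b):
    """Return the union of two states, keeping the higher version per key."""
    merged = dict(a)
    for key, version in b.items():
        if version > merged.get(key, 0):
            merged[key] = version
    return merged


def simulate(n_nodes=50, seed=1, max_rounds=1000):
    """Gossip until every node holds the same state (requires n_nodes >= 2).

    Returns (rounds_needed, final_states).
    """
    rng = random.Random(seed)
    # Assumption: node i starts out knowing only its own entry.
    states = [{f"node{i}": 1} for i in range(n_nodes)]
    for round_no in range(1, max_rounds + 1):
        for a in range(n_nodes):
            # Each node picks one random peer other than itself.
            b = rng.choice([i for i in range(n_nodes) if i != a])
            merged = merge(states[a], states[b])
            states[a] = merged
            states[b] = dict(merged)
        if all(s == states[0] for s in states):
            return round_no, states
    raise RuntimeError("did not converge within max_rounds")


if __name__ == "__main__":
    rounds, states = simulate()
    print("converged after", rounds, "rounds;", len(states[0]), "keys per node")
```

No node is special in this sketch: every node runs the same loop, and agreement emerges only from repeated pairwise merges.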
Note that even if some nodes restart after a failure, or new nodes join, their states agree with those of the other nodes after a while. In other words, gossip is naturally tolerant of faults in a distributed system.
3. Gossip essence
Gossip is a fault-tolerant algorithm built on redundancy. More precisely, gossip is an eventual consistency algorithm. It cannot guarantee that all nodes are consistent at any given moment, but it does guarantee that all nodes are consistent in the end. That "end" is a point in time that exists in practice but cannot be pinned down theoretically.
Because gossip does not require a node to know all other nodes, it is decentralized: the nodes are completely equal and no central node is needed. In fact, gossip can be used in many areas that tolerate eventual consistency: failure detection, route synchronization, pub/sub, and dynamic load balancing.
The drawback of gossip is just as obvious: redundant communication puts a heavy load on network bandwidth and CPU resources. This load is bounded by the communication frequency, which in turn affects how fast the algorithm converges. Optimizations for various situations are discussed later.
4. Communication mode and convergence of gossip nodes
According to the original paper, two nodes (A, B) can communicate in three ways:
Push: node A pushes its data (key, value, version) to node B, and B updates the entries for which A's version is newer.
Pull: A sends only (key, version) to B, B pushes back the entries (key, value, version) for which its local version is newer than A's, and A updates its local data.
Push/pull: like pull, with one more step: A then pushes the entries for which it is newer than B, and B updates its local data.
If one data synchronization between two nodes is defined as a cycle, then within one cycle push needs 1 message, pull needs 2, and push/pull needs 3. In terms of effect push/pull is the best: in theory a single cycle makes the two nodes fully consistent. Intuitively, push/pull also converges the fastest.
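The three communication modes can be sketched on two in-memory stores. This is a minimal illustration, not the paper's or Cassandra's code: a store is assumed to be a dict mapping key to (value, version), each function returns the number of messages it models, and all function names are hypothetical.

```python
def versions(store):
    """Digest of a store: key -> version, without the values."""
    return {key: ver for key, (_, ver) in store.items()}


def newer_than(store, digest):
    """Entries of store that the digest's owner lacks or holds at an older version."""
    return {k: (v, ver) for k, (v, ver) in store.items() if ver > digest.get(k, -1)}


def apply_updates(store, updates):
    """Install every update whose version is newer than the local one."""
    for key, (value, ver) in updates.items():
        if key not in store or store[key][1] < ver:
            store[key] = (value, ver)


def push(a, b):
    apply_updates(b, a)                      # message 1: A -> B full entries
    return 1


def pull(a, b):
    digest = versions(a)                     # message 1: A -> B (key, version)
    apply_updates(a, newer_than(b, digest))  # message 2: B -> A newer entries
    return 2


def push_pull(a, b):
    digest = versions(a)                     # message 1: A -> B (key, version)
    reply = newer_than(b, digest)            # message 2: B -> A newer entries ...
    b_versions = versions(b)
    wanted = [k for k, ver in digest.items() if ver > b_versions.get(k, -1)]
    apply_updates(a, reply)                  # ... plus the keys B is behind on
    apply_updates(b, {k: a[k] for k in wanted})  # message 3: A -> B those entries
    return 3


if __name__ == "__main__":
    a = {"x": ("a1", 2), "y": ("a0", 1)}
    b = {"x": ("b0", 1), "y": ("b2", 3), "z": ("b1", 1)}
    print(push_pull(a, b), a == b)
```

In this sketch the second push/pull message is assumed to carry both B's newer entries and the list of keys B is missing, which is how A learns what to send in the third message.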
Assuming every node can select (infect) one new node in each communication cycle, gossip degenerates into a binary-search-like process: the infected nodes form a balanced binary tree, their number doubles every cycle (2^i after i cycles), and all n nodes are reached in O(log n) cycles. This is the theoretical optimum for gossip. In practice the optimal convergence speed is hard to achieve. Let p_i be the probability that a node has still not been infected after cycle i, and p_{i+1} the same probability after cycle i+1. Then for pull:
p_{i+1} = p_i^2
And for push:
p_{i+1} = p_i (1 - 1/n)^{n (1 - p_i)}
Pull clearly converges faster than push. Since p is a probability with 0 < p < 1, gossip under pull converges quadratically in p. This is also called probabilistic convergence, which is quite unusual among consistency algorithms.
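To make the pull versus push comparison concrete, the sketch below iterates two recurrences that are assumed here from the classic epidemic analysis, with p_i read as the probability that a node is still uninfected after cycle i: p_{i+1} = p_i^2 for pull and p_{i+1} = p_i (1 - 1/n)^{n (1 - p_i)} for push. The function names, the threshold, and n = 1000 are illustrative choices.

```python
def pull_step(p):
    """Pull: a node stays uninfected only if the peer it contacts is also uninfected."""
    return p * p


def push_step(p, n=1000):
    """Push: a node stays uninfected if none of the n(1 - p) infected nodes picks it."""
    return p * (1 - 1 / n) ** (n * (1 - p))


def cycles_until(p0, step, threshold=1e-9, limit=10_000):
    """Count cycles until the uninfected probability drops to the threshold."""
    p, cycles = p0, 0
    while p > threshold and cycles < limit:
        p = step(p)
        cycles += 1
    return cycles


if __name__ == "__main__":
    print("pull cycles:", cycles_until(0.5, pull_step))
    print("push cycles:", cycles_until(0.5, push_step))
```

Starting from the same p, the pull recurrence squares the residue every cycle, while the push recurrence shrinks it by at most a factor of roughly e per cycle, so push needs noticeably more cycles.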
A gossip node works in one of two modes:
Anti-entropy: propagates all data with a fixed probability.
Rumor-mongering: propagates only newly arrived data.
Anti-entropy mode is fully fault tolerant but imposes a heavy network and CPU load. Rumor-mongering mode imposes a lighter network and CPU load, but it must define a boundary for what counts as "latest" data, and it is hard to guarantee full fault tolerance: for a node that fails, restarts, and has missed the "latest" window, eventual consistency cannot be guaranteed unless an additional mechanism is introduced to handle the inconsistency. The following discussion focuses on optimizing the anti-entropy mode.
5. Reconciliation mechanism of anti-entropy
The reconciliation mechanism concerns how two nodes should exchange data each time they communicate, so that they reach agreement as quickly as possible and remove the inconsistency between them. Push, pull, and push/pull described above are communication modes. Reconciliation is the data exchange mechanism used within a communication mode. The biggest problem in reconciliation is that, because of network load, one node cannot send all of its data to another in a single exchange: every gossip message has a maximum size. Exchanging all the needed information efficiently within that limited space is the main problem reconciliation has to solve.
Before the discussion, a few definitions. Let N = {p, q, s, ...} be the finite set of servers that take part in gossip communication, and let (p1, p2, ...) be the data hosted on node p, where each item is a (key, value, version) tuple. Node q is analogous to p.
To ensure consistency, the value and version of a data item may be modified only by its host node. Other nodes can only ask the host node to modify the data indirectly, through the gossip protocol.
5.1 Precise reconciliation
Precise reconciliation aims to remove the inconsistency between the two sides exactly in every communication cycle, with each side sending the other precisely the data it needs to update. Because every node communicates with several nodes concurrently, precise reconciliation is hard to achieve in theory. It requires each data item to maintain its own version independently, and each interaction sends all (key, value, version) tuples to the target for comparison, so that the differences can be found and updated. However, since a gossip message has a size limit, choosing which data to send each time becomes a problem. The data can be selected at random or chosen deterministically. For a deterministic choice there are two options, ordered by version: oldest first and newest first. Oldest first sends the items that have gone longest without an update, while newest first does the opposite, which can leave old data without any chance to be updated, that is, starvation.
A selection algorithm can also be built for a specific business scenario, but the problem of excessive message volume can never be avoided entirely.
5.2 Scuttlebutt reconciliation
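The selection problem under a message size limit can be sketched as follows. This is an illustration under assumptions that are not in the original: a store maps key to (value, version), the size limit is expressed as a maximum number of items, the version doubles as the age of an update, and `select_deltas` is a hypothetical name.

```python
def select_deltas(local, peer_versions, max_items, order="oldest"):
    """Pick at most max_items entries the peer is missing or holds at an older version.

    order="oldest" sends the lowest versions first; order="newest" sends the
    highest versions first, which can starve old entries when updates keep arriving.
    """
    if order not in ("oldest", "newest"):
        raise ValueError("order must be 'oldest' or 'newest'")
    deltas = [
        (key, value, ver)
        for key, (value, ver) in local.items()
        if ver > peer_versions.get(key, 0)
    ]
    deltas.sort(key=lambda d: d[2], reverse=(order == "newest"))
    return deltas[:max_items]


if __name__ == "__main__":
    local = {"a": ("v", 1), "b": ("v", 5), "c": ("v", 9)}
    peer_versions = {"b": 5}  # the peer already has "b" at version 5
    print(select_deltas(local, peer_versions, 1, "oldest"))
    print(select_deltas(local, peer_versions, 1, "newest"))
```

With a limit of one item, the two orderings pick opposite ends of the pending updates, which shows why newest first can postpone an old entry indefinitely while new updates keep arriving.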
Scuttlebutt reconciliation differs from precise reconciliation in that it does not maintain a separate version number for each data item. Instead, each node maintains one unified version sequence over all the data it hosts. For example, node p maintains a single global version for (p1, p2, ...), which amounts to treating all of its hosted data as a whole. When comparing with another node, only the highest version of that hosted data needs to be compared. If the highest versions are equal, this part of the data is fully consistent. Otherwise, precise reconciliation is carried out.
Scuttlebutt reconciliation also has two ways to select the data to send:
Breadth first: order by the overall version, also known as fair selection.
Depth first: order by the number of data items included, also known as unfair selection.
Because the latter is more practical, the original paper recommends it.
6. The implementation in Cassandra
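A minimal sketch of this idea follows, assuming each node keeps one counter that stamps every update to the data it hosts, and that a digest carries only the highest version known per host node. The `Node` class and its method names are hypothetical, and message size limits are ignored.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.clock = 0
        # state[owner][key] = (value, version); versions come from the owner's counter
        self.state = {name: {}}

    def set(self, key, value):
        """Only the host node modifies its own data, bumping its single counter."""
        self.clock += 1
        self.state[self.name][key] = (value, self.clock)

    def digest(self):
        """Highest version this node has seen for each owner's data."""
        return {
            owner: max((ver for _, ver in items.values()), default=0)
            for owner, items in self.state.items()
        }

    def deltas_for(self, peer_digest):
        """Entries whose version exceeds what the peer has seen for that owner."""
        out = []
        for owner, items in self.state.items():
            seen = peer_digest.get(owner, 0)
            for key, (value, ver) in items.items():
                if ver > seen:
                    out.append((owner, key, value, ver))
        out.sort(key=lambda d: d[3])  # lowest versions first
        return out

    def apply(self, deltas):
        for owner, key, value, ver in deltas:
            items = self.state.setdefault(owner, {})
            if key not in items or items[key][1] < ver:
                items[key] = (value, ver)


def reconcile(a, b):
    """Exchange digests, then send each side only what it has not seen."""
    digest_a, digest_b = a.digest(), b.digest()
    b.apply(a.deltas_for(digest_b))
    a.apply(b.deltas_for(digest_a))


if __name__ == "__main__":
    p, q = Node("p"), Node("q")
    p.set("k1", "x")
    p.set("k2", "y")
    q.set("k1", "z")
    reconcile(p, q)
    print(p.state == q.state, p.digest())
```

When two digests agree on an owner's highest version, nothing is sent for that owner, which is the saving over comparing every (key, version) pair.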
It has been verified that Cassandra implements the push/pull mode based on Scuttlebutt reconciliation, with several components.
Three messages correspond to the three stages of push/pull:
GossipDigestSynMessage
GossipDigestAckMessage
GossipDigestAck2Message
There are also three states:
EndpointState: maintains the global version of the host data and encapsulates HeartBeat and ApplicationState
HeartBeat: heartbeat information
ApplicationState: system load information (disk usage)
Cassandra mainly uses gossip for three functions:
Failure detection
Dynamic load balancing
Decentralized elastic scaling
7. Summary
Gossip is an excellent algorithm that is decentralized, fault tolerant, and eventually consistent. Its convergence is not only provable but also exponentially fast. A system built on gossip can easily be extended to more nodes, so elastic scaling comes easily.
The only drawback is that convergence gives eventual consistency, so gossip is not suitable for scenarios that require strong consistency, such as those handled by 2PC.