Introduction to gossip

A Cassandra cluster has no central node, and every node has the same status. The nodes maintain the cluster state through a protocol called gossip. Through gossip, each node learns which nodes the cluster contains and what state those nodes are in.
This allows any node in the Cassandra cluster to route any key.
As a result, the failure of any single node does not have catastrophic consequences.

Introduction to gossip

The formal name of gossip is anti-entropy. It is suitable for synchronizing information in scenarios without strong consistency requirements. The time for information to become synchronized is roughly log(N) gossip rounds, where N is the number of nodes.
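
The logarithmic spreading time can be checked with a small simulation. This is only a sketch under simplifying assumptions that are not from Cassandra: a single update, synchronous rounds, and every informed node pushing to one uniformly random peer per round. The function name rounds_to_spread is made up for this example.

```python
import math
import random

def rounds_to_spread(n, rng):
    """Count push-gossip rounds until all n nodes hold one update."""
    informed = {0}                      # node 0 starts with the update
    rounds = 0
    while len(informed) < n:
        # each informed node pushes to one random peer (possibly itself)
        informed |= {rng.randrange(n) for _ in informed}
        rounds += 1
    return rounds

rng = random.Random(42)                 # fixed seed so runs are repeatable
for n in (16, 256, 4096):
    # print cluster size, simulated rounds, and log2(n) for comparison
    print(n, rounds_to_spread(n, rng), round(math.log2(n), 1))
```

The number of informed nodes can at most double per round, so log2(N) is a lower bound; the simulated round count stays within a small multiple of it as N grows.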

Gossip has two forms: Anti-entropy and rumor-mongering.

Each node in gossip maintains a set of states, each of which can be represented by a key/value pair plus a version number. The state with the larger version number is the newer one.
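
As a minimal sketch of this rule, a state set can be modeled as a dictionary from key to (value, version), and two sets can be reconciled by keeping the entry with the larger version. The function name merge and the sample keys and values are illustrative assumptions, not Cassandra code.

```python
def merge(local, incoming):
    """Return a copy of local in which newer incoming entries win."""
    merged = dict(local)
    for key, (value, version) in incoming.items():
        # an entry is replaced only by a strictly larger version number
        if key not in merged or version > merged[key][1]:
            merged[key] = (value, version)
    return merged

a = {"load": (1.2, 20), "status": ("NORMAL", 3)}
b = {"load": (0.9, 18), "status": ("LEAVING", 5)}
print(merge(a, b))  # {'load': (1.2, 20), 'status': ('LEAVING', 5)}
```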

There are three ways to exchange messages. Cassandra adopts the third, push-pull-gossip.

Push-gossip: Node A sends its state set S(A) to Node B. Node B compares it with its local state set S(B) and keeps the newer version of each entry, so B ends up with the merge of S(A) and S(B).
Pull-gossip: Node A sends a digest (containing only keys and versions) to Node B. After comparing, B returns only the states that need to be updated on A.
Push-pull-gossip: The same as pull-gossip, except that B also requests from A the states that are out of date locally on B.
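
The difference between the three styles can be sketched as follows. This is an illustration only: state sets are dictionaries from key to (value, version), versions are assumed to be non-negative integers, push is modeled as B merging A's full state, and all function names are hypothetical.

```python
def digest(state):
    """Keys and versions only, no values."""
    return {key: version for key, (_value, version) in state.items()}

def newer_than(state, dig):
    """Entries of state that are missing from, or newer than, the digest."""
    return {k: e for k, e in state.items() if e[1] > dig.get(k, -1)}

def apply_updates(state, updates):
    """Overwrite local entries that have a smaller version."""
    for key, (value, version) in updates.items():
        if key not in state or version > state[key][1]:
            state[key] = (value, version)

def push(a, b):
    apply_updates(b, a)                         # A ships everything to B

def pull(a, b):
    apply_updates(a, newer_than(b, digest(a)))  # B answers A's digest

def push_pull(a, b):
    dig_a, dig_b = digest(a), digest(b)
    reply = newer_than(b, dig_a)                # what B has that A lacks
    wanted = [k for k, v in dig_a.items() if v > dig_b.get(k, -1)]
    to_b = {k: a[k] for k in wanted}            # what B asked A to send
    apply_updates(a, reply)
    apply_updates(b, to_b)

a = {"x": ("a1", 2), "y": ("a2", 1)}
b = {"x": ("b1", 1), "y": ("b2", 3), "z": ("b3", 1)}
push_pull(a, b)
print(a == b)  # True
```

Only push-pull leaves both sides up to date after a single exchange; push updates only B, and pull updates only A.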

For more details about the gossip protocol, refer to this paper.

Cassandra implementation

When a node starts, it reads the seeds setting from the configuration file (storage-conf.xml) to learn all the seed nodes in the cluster.

Cassandra has a Gossiper that runs once every second (see the start method in Gossiper.java) and sends synchronization messages to other nodes according to the following rules:

  1. Pick a random live node and send it a synchronization request.
  2. Send a synchronization request to a random unreachable node.
  3. If the node selected in step 1 is not a seed, or the number of live nodes is less than the number of seeds, send a synchronization request to a random seed.

Steps 1 and 2 are easy to understand. Step 1 synchronizes state with a currently live node to update the local state. Step 2 detects as early as possible that an unreachable node has become available again.

Consider the first condition in step 3. If the node chosen in step 1 is not a seed, it is better to also send a synchronization request to a random seed, because in theory a seed always holds more complete node state information.

The second condition in step 3 is harder to understand: when the number of live nodes is less than the number of seeds, a synchronization message must also be sent to a random seed. This check exists to prevent seed islands.

Without this check, consider a scenario with four machines, {A, B, C, D}, all configured as seeds. If they are started at the same time, the following may happen:

  1. Node A starts and finds no live node, so it goes to step 3 and synchronizes with a random seed. Assume that node B is selected.
  2. Node B, having synchronized with A, considers A alive and synchronizes with A. Because A is a seed, B does not synchronize with any other seed.
  3. Node C starts and finds no live node, so it also goes to step 3 and synchronizes with a random seed. Assume that node D is selected this time.
  4. Node D, having synchronized with C, considers C alive and synchronizes with C. Because C is also a seed, D does not synchronize with any other seed.

At this point two isolated islands have formed. A and B synchronize with each other, and C and D synchronize with each other, but {A, B} and {C, D} never synchronize with each other and do not know of each other's existence.

With the second condition added, once A and B have synchronized, each finds only one live node while there are four seeds. Each therefore also contacts a random seed, which breaks the island.
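
The three selection rules, including the island check, can be sketched as a single function. It follows the rules as stated in this article rather than the actual Gossiper code; pick_targets and its parameters are hypothetical names, and the random source is passed in so the behavior can be reproduced.

```python
import random

def pick_targets(self_addr, live, unreachable, seeds, rng=random):
    """Return the peers to contact in one gossip round."""
    targets = []
    first = rng.choice(sorted(live)) if live else None
    if first is not None:
        targets.append(first)                            # rule 1: random live node
    if unreachable:
        targets.append(rng.choice(sorted(unreachable)))  # rule 2: random unreachable node
    other_seeds = sorted(s for s in seeds if s != self_addr)
    # rule 3: the first pick was not a seed, or too few live nodes are known
    if other_seeds and (first not in seeds or len(live) < len(seeds)):
        targets.append(rng.choice(other_seeds))
    return targets

# Node A in the island scenario: it only knows B, but four seeds are configured.
seeds = {"A", "B", "C", "D"}
print(pick_targets("A", {"B"}, set(), seeds, random.Random(0)))
```

In the island scenario the function always returns a second target chosen among the other seeds. A given round may pick B again, but over repeated rounds C or D is eventually contacted and the two islands merge.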

For more details about this issue, refer to here.

Each Cassandra node has subscribers that implement the IEndPointStateChangeSubscriber interface and are responsible for handling the received messages. The interface includes the following methods:

Method      Description
onJoin      A machine joins the cluster
onChange    State has changed
onAlive     A machine has become available
onDead      A machine has become unavailable

Meanwhile, Gossiper implements IEndPointStateChangePublisher, which provides two methods, register and unregister, for adding and removing a subscriber.
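
The publisher and subscriber relationship can be sketched as below. The classes are hypothetical Python stand-ins, not Cassandra's Java interfaces; the callback names mirror the four methods listed above in Python naming style, and notify_alive is an invented helper for the demonstration.

```python
class RecordingSubscriber:
    """Hypothetical subscriber that records which callbacks fired."""
    def __init__(self):
        self.events = []
    def on_join(self, endpoint, state):
        self.events.append(("join", endpoint))
    def on_change(self, endpoint, state):
        self.events.append(("change", endpoint))
    def on_alive(self, endpoint, state):
        self.events.append(("alive", endpoint))
    def on_dead(self, endpoint, state):
        self.events.append(("dead", endpoint))

class Publisher:
    """Keeps a subscriber list, as register and unregister imply."""
    def __init__(self):
        self._subscribers = []
    def register(self, subscriber):
        if subscriber not in self._subscribers:
            self._subscribers.append(subscriber)
    def unregister(self, subscriber):
        if subscriber in self._subscribers:
            self._subscribers.remove(subscriber)
    def notify_alive(self, endpoint, state):
        # iterate over a copy so a callback may unregister safely
        for subscriber in list(self._subscribers):
            subscriber.on_alive(endpoint, state)

publisher = Publisher()
subscriber = RecordingSubscriber()
publisher.register(subscriber)
publisher.notify_alive("192.168.1.2", None)
publisher.unregister(subscriber)
publisher.notify_alive("192.168.1.3", None)   # no longer delivered
print(subscriber.events)  # [('alive', '192.168.1.2')]
```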

Gossip communication involves three main kinds of state:

  1. HeartBeatState
  2. ApplicationState
  3. EndPointState

HeartBeatState consists of a generation and a version. The generation changes each time the node starts and is used to distinguish the state before a restart from the state after it.

ApplicationState represents a piece of system state; each object represents one state. For example, the current load might look like (1.2, 20), meaning that at version 20 the load of the node is 1.2.

EndPointState encapsulates the ApplicationState and HeartBeatState of a node.

The state of a node can only be changed by the node itself. The state it holds for other nodes can only be updated through synchronization.
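
The relationship between the three kinds of state can be sketched with plain data classes. These are simplified, hypothetical stand-ins rather than the real Cassandra classes, and the comparison in is_newer (generation first, then the highest version) is an assumption made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class HeartBeat:
    generation: int          # changes on every restart
    version: int = 0         # assumed to increase while the node runs

@dataclass
class AppState:
    value: object
    version: int

@dataclass
class EndpointInfo:
    heartbeat: HeartBeat
    app_states: dict = field(default_factory=dict)

    def max_version(self):
        versions = [self.heartbeat.version]
        versions += [s.version for s in self.app_states.values()]
        return max(versions)

def is_newer(remote, local):
    """A higher generation (a restart) outranks any version count."""
    return ((remote.heartbeat.generation, remote.max_version())
            > (local.heartbeat.generation, local.max_version()))

before = EndpointInfo(HeartBeat(generation=100, version=50),
                      {"load": AppState(1.2, 20)})
after = EndpointInfo(HeartBeat(generation=101, version=1))
print(is_newer(after, before))  # True
```

Although the restarted node reports a much smaller version, its larger generation marks its state as the newer one.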

Synchronization Process

A synchronization process between two nodes can be expressed as follows:

Assume that 192.168.1.1 (the source node) synchronizes with 192.168.1.2 (the target node). The source node sends a GossipDigestSynMessage, which includes a digest of the latest version of the state of every node known to the local machine. The digest contains only the key and version, not the value, to reduce bandwidth consumption.

When the target node receives the GossipDigestSynMessage, it does two things:

  1. It finds the states in the received digest that are newer than its local versions, sorts them by version number difference, and puts the digests of those states into a GossipDigestAckMessage.
  2. It finds the states for which its local version is newer and puts those states into the same GossipDigestAckMessage.

After the GossipDigestAckMessage is constructed, it is sent to the source node.

The reason for sorting by version number difference is that the number of states each message can carry is limited (see MAX_GOSSIP_PACKET_SIZE in Gossiper.java). Sorting ensures that the most out-of-date states (those with the largest version number difference) are updated first.
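
The ordering can be sketched as below. For simplicity the limit is expressed as a number of entries rather than a packet size in bytes, and select_requests with its parameters is a hypothetical name, not part of Cassandra.

```python
def select_requests(remote_digest, local_digest, max_entries):
    """Pick the stale keys to request first, largest version gap first."""
    gaps = {}
    for key, remote_version in remote_digest.items():
        local_version = local_digest.get(key, 0)
        if remote_version > local_version:
            gaps[key] = remote_version - local_version
    ordered = sorted(gaps, key=lambda k: gaps[k], reverse=True)
    return ordered[:max_entries]      # whatever does not fit waits for a later round

remote = {"a": 10, "b": 50, "c": 7}
local = {"a": 9, "b": 5, "c": 7}
print(select_requests(remote, local, 1))  # ['b']
```

Key b is 45 versions behind, so it is requested before a, which is only one version behind; c is already current and is never requested.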

After the source node receives the GossipDigestAckMessage, it updates its local state using the newer states sent by the target node, so that the source node obtains the states that were updated on the target node.

At the same time, the source node sends the states corresponding to the digests contained in the GossipDigestAckMessage to the target node in a GossipDigestAck2Message. The target node then updates its local state, so that it also obtains the states that were updated on the source node.

After one such exchange, the states on the source and target nodes are consistent with each other. The exchange is economical because full states are sent only for the entries that actually differ.
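
The whole three-message exchange can be sketched as follows. This is a simplified model, not the Cassandra implementation: each node's view is a dictionary from node address to (value, version), messages are plain Python values, the packet size limit is ignored, and every function name is hypothetical.

```python
def make_syn(state):
    """Digest of everything the source knows: address -> version."""
    return {addr: version for addr, (_value, version) in state.items()}

def make_ack(target_state, syn):
    """Target's reply: addresses it wants, plus entries the source lacks."""
    requests = [a for a, v in syn.items()
                if v > target_state.get(a, (None, -1))[1]]
    updates = {a: e for a, e in target_state.items()
               if e[1] > syn.get(a, -1)}
    return requests, updates

def make_ack2(source_state, requests):
    """Full entries for the addresses the target asked about."""
    return {addr: source_state[addr] for addr in requests}

def apply_updates(state, updates):
    for addr, (value, version) in updates.items():
        if version > state.get(addr, (None, -1))[1]:
            state[addr] = (value, version)

def synchronize(source, target):
    syn = make_syn(source)                     # source -> target
    requests, updates = make_ack(target, syn)  # target -> source
    ack2 = make_ack2(source, requests)         # source -> target
    apply_updates(source, updates)
    apply_updates(target, ack2)

source = {"192.168.1.1": ("load=1.2", 20), "192.168.1.3": ("load=0.4", 7)}
target = {"192.168.1.2": ("load=0.8", 12), "192.168.1.3": ("load=0.5", 9)}
synchronize(source, target)
print(source == target)  # True
```

In this run the target requests only the entry for 192.168.1.1, and the source receives the entries for 192.168.1.2 and 192.168.1.3, so each side transfers just the entries the other is missing or holds in an older version.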
