Overview of the Gossip protocol

Nodes in a Cassandra cluster have no primary/secondary distinction. They communicate through a protocol called gossip, which lets each node learn which nodes are in the cluster and what state they are in. Every piece of gossiped state carries a version number. When a node receives a message, it compares versions to see which information it needs to update and which information it has that the other node lacks. The two nodes then exchange the difference so that both end up with the same view. This works like real-life gossip: one person tells ten, ten tell a hundred, and soon everyone knows. When Cassandra starts, the gossip service starts too. It schedules a task, GossipTask, that communicates with other nodes periodically (once per second). GossipTask is an inner class of org.apache.cassandra.gms.Gossiper, and its run method is as follows:
public void run()
{
    // Block until MessagingService has started listening
    MessagingService.instance().waitUntilListening();

    /* Update the local heartbeat counter. */
    endpointStateMap.get(FBUtilities.getBroadcastAddress()).getHeartBeatState().updateHeartBeat();

    final List<GossipDigest> gDigests = new ArrayList<GossipDigest>();
    Gossiper.instance.makeRandomGossipDigest(gDigests);

    if (gDigests.size() > 0)
    {
        GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
                                                               DatabaseDescriptor.getPartitionerName(),
                                                               gDigests);
        MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
                                                                              digestSynMessage,
                                                                              GossipDigestSyn.serializer);
        /* Gossip to some random live member */
        boolean gossipedToSeed = doGossipToLiveMember(message);

        /* Gossip to some unreachable member with some probability to check if it is back up */
        doGossipToUnreachableMember(message);

        /* Gossip to a seed if we did not do so above, or if fewer nodes are live than there are seeds */
        if (!gossipedToSeed || liveEndpoints.size() < seeds.size())
            doGossipToSeed(message);

        doStatusCheck();
    }
}
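The target selection at the end of run() can be modeled on its own. The sketch below is a simplified illustration and not Cassandra's code. The class PeerSelector and its methods are hypothetical, endpoints are plain strings, and nothing is actually sent. The probability of contacting an unreachable node, unreachable / (live + 1), is our reading of Cassandra 2.0's doGossipToUnreachableMember and should be treated as an assumption. As far as we can tell, the real doGossipToSeed also applies extra checks of its own, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical, simplified model of how GossipTask picks gossip targets in one round.
// Endpoints are plain strings; nothing is actually sent.
final class PeerSelector {
    private final Random random;

    PeerSelector(Random random) {
        this.random = random;
    }

    /** Returns one element chosen uniformly at random, or null if the set is empty. */
    String pickRandom(Set<String> candidates) {
        if (candidates.isEmpty()) {
            return null;
        }
        List<String> list = new ArrayList<>(candidates);
        return list.get(random.nextInt(list.size()));
    }

    /** Returns the endpoints this node would send its SYN to in one round. */
    List<String> chooseTargets(Set<String> live, Set<String> unreachable, Set<String> seeds) {
        List<String> targets = new ArrayList<>();

        // 1. One random live member; remember whether it happened to be a seed.
        String liveTarget = pickRandom(live);
        boolean gossipedToSeed = false;
        if (liveTarget != null) {
            targets.add(liveTarget);
            gossipedToSeed = seeds.contains(liveTarget);
        }

        // 2. One random unreachable member, with probability unreachable / (live + 1)
        //    (assumed formula), to find out whether it has come back.
        if (!unreachable.isEmpty()) {
            double prob = (double) unreachable.size() / (live.size() + 1);
            if (random.nextDouble() < prob) {
                targets.add(pickRandom(unreachable));
            }
        }

        // 3. A random seed if step 1 missed the seeds or fewer nodes are live than seeds.
        if (!gossipedToSeed || live.size() < seeds.size()) {
            String seed = pickRandom(seeds);
            if (seed != null) {
                targets.add(seed);
            }
        }
        return targets;
    }
}
```

Consider one live non-seed node and one seed. chooseTargets returns the live node followed by the seed, because the live pick did not reach a seed.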
GossipTask.run() does the following:
1. GossipTask does not start working as soon as gossip starts. It blocks on the condition variable listenGate until MessagingService begins listening (its listen() method signals the gate), and only then proceeds.
2. It updates the heartbeat version of the local node. It then builds the digests to send to other nodes (gDigests).
3. It sends the SYN message to one randomly chosen live node. With some probability, it also sends it to one randomly chosen unreachable node. If the chosen live node was not a seed, or if there are fewer live nodes than seeds, it also sends the message to one of the seed nodes.
4. It checks the status of the nodes (doStatusCheck).

Seed nodes and node state are analyzed separately; this section focuses on the gossip protocol itself. GossipTask sends gossip messages to other nodes. For receiving, Cassandra runs a SocketThread, and the code that handles incoming messages is in org.apache.cassandra.net.IncomingTcpConnection. All gossip messages, including the replies sent in the later phases, are sent through the sendOneWay method of org.apache.cassandra.net.MessagingService. One round of gossip communication is divided into three phases, as shown in the figure:
Each phase corresponds to a message class, and each message class has a corresponding handler. The handlers are registered in org.apache.cassandra.service.StorageService:
MessagingService.instance().registerVerbHandlers(MessagingService.Verb.GOSSIP_DIGEST_SYN, new GossipDigestSynVerbHandler());
MessagingService.instance().registerVerbHandlers(MessagingService.Verb.GOSSIP_DIGEST_ACK, new GossipDigestAckVerbHandler());
MessagingService.instance().registerVerbHandlers(MessagingService.Verb.GOSSIP_DIGEST_ACK2, new GossipDigestAck2VerbHandler());
These three handlers correspond to the message verbs GOSSIP_DIGEST_SYN, GOSSIP_DIGEST_ACK and GOSSIP_DIGEST_ACK2, respectively.
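Registration and dispatch by verb can be sketched as a map from verb to handler. This is a simplified illustration, not Cassandra's MessagingService. The VerbHandler interface, the VerbRegistry class and the string payloads are hypothetical. The sketch's doVerb returns a reply string only so the dispatch is easy to observe. Cassandra's IVerbHandler.doVerb returns void and replies by sending a new message.

```java
import java.util.EnumMap;
import java.util.Map;

// The three gossip verbs from the text.
enum Verb { GOSSIP_DIGEST_SYN, GOSSIP_DIGEST_ACK, GOSSIP_DIGEST_ACK2 }

// Hypothetical, simplified stand-in for IVerbHandler.
interface VerbHandler {
    String doVerb(String payload, String from);
}

// Hypothetical registry: one handler per verb, looked up when a message arrives.
final class VerbRegistry {
    private final Map<Verb, VerbHandler> handlers = new EnumMap<>(Verb.class);

    void registerVerbHandlers(Verb verb, VerbHandler handler) {
        handlers.put(verb, handler);
    }

    /** Finds the handler registered for the incoming message's verb and invokes it. */
    String receive(Verb verb, String payload, String from) {
        VerbHandler handler = handlers.get(verb);
        if (handler == null) {
            throw new IllegalStateException("no handler registered for " + verb);
        }
        return handler.doVerb(payload, from);
    }
}
```

An EnumMap fits here because the set of verbs is fixed and known at compile time.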
Each phase is analyzed in detail below.
GossipDigestSynMessage

The run method of GossipTask sends a GOSSIP_DIGEST_SYN message (GossipDigestSynMessage). Its handler, GossipDigestSynVerbHandler, processes it in the doVerb() method. The core code is as follows:
public void doVerb(MessageIn<GossipDigestSyn> message, int id)
{
    InetAddress from = message.from;
    GossipDigestSyn gDigestMessage = message.payload;
    // ... checks elided

    List<GossipDigest> gDigestList = gDigestMessage.getGossipDigests();
    doSort(gDigestList);

    List<GossipDigest> deltaGossipDigestList = new ArrayList<GossipDigest>();
    Map<InetAddress, EndpointState> deltaEpStateMap = new HashMap<InetAddress, EndpointState>();
    Gossiper.instance.examineGossiper(gDigestList, deltaGossipDigestList, deltaEpStateMap);
    logger.trace("Sending {} digests and {} deltas", deltaGossipDigestList.size(), deltaEpStateMap.size());

    MessageOut<GossipDigestAck> gDigestAckMessage = new MessageOut<GossipDigestAck>(MessagingService.Verb.GOSSIP_DIGEST_ACK,
                                                                                  new GossipDigestAck(deltaGossipDigestList, deltaEpStateMap),
                                                                                  GossipDigestAck.serializer);
    Gossiper.instance.checkSeedContact(from);
    MessagingService.instance().sendOneWay(gDigestAckMessage, from);
}
1. Sort the received digests. The primary key is generation; when generations are equal, the secondary key is the difference between the digest's maxVersion and the local max version for that endpoint. Both keys sort in descending order, so the endpoints that are furthest out of sync come first.
2. examineGossiper compares the received digests with the local node's state. Digests for information this node still needs from the sender are saved in deltaGossipDigestList. Local state that the sender lacks is saved in deltaEpStateMap.
3. deltaGossipDigestList and deltaEpStateMap are wrapped in a GossipDigestAckMessage and sent back to the from node.
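The sorting and comparison steps above can be sketched with plain data. This is a simplified model, not Cassandra's code:
- Digest and SynExaminer are hypothetical classes.
- Each node's knowledge of an endpoint is reduced to a (generation, maxVersion) pair.
- Instead of real EndpointState objects, the toSend map records only the version above which local state should be sent.

The comparison rules follow our reading of Gossiper.examineGossiper in Cassandra 2.0:
- An unknown endpoint or a newer remote generation means "request everything".
- An older remote generation means "send everything".
- For the same generation, whichever side has the higher maxVersion supplies the difference.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical digest: what a node knows about one endpoint, (generation, maxVersion).
final class Digest {
    final String endpoint;
    final int generation;
    final int maxVersion;

    Digest(String endpoint, int generation, int maxVersion) {
        this.endpoint = endpoint;
        this.generation = generation;
        this.maxVersion = maxVersion;
    }

    @Override
    public String toString() {
        return endpoint + ":" + generation + ":" + maxVersion;
    }
}

// Hypothetical model of the SYN handler's doSort + examineGossiper steps.
final class SynExaminer {
    private final Map<String, Digest> local; // receiver's own view, keyed by endpoint

    SynExaminer(Map<String, Digest> local) {
        this.local = local;
    }

    private int localMaxVersion(String endpoint) {
        Digest d = local.get(endpoint);
        return d == null ? 0 : d.maxVersion;
    }

    /** Sorts by generation, then by |local - remote| version gap, largest first. */
    void sort(List<Digest> digests) {
        digests.sort(Comparator
                .comparingInt((Digest d) -> d.generation)
                .thenComparingInt(d -> Math.abs(localMaxVersion(d.endpoint) - d.maxVersion))
                .reversed());
    }

    /**
     * Fills requests with digests asking the sender for state newer than the given
     * version, and toSend with endpoint -> version above which local state is sent.
     */
    void examine(List<Digest> remote, List<Digest> requests, Map<String, Integer> toSend) {
        for (Digest r : remote) {
            Digest l = local.get(r.endpoint);
            if (l == null || r.generation > l.generation) {
                requests.add(new Digest(r.endpoint, r.generation, 0)); // request everything
            } else if (r.generation < l.generation) {
                toSend.put(r.endpoint, 0); // sender is on an old generation: send everything
            } else if (r.maxVersion > l.maxVersion) {
                requests.add(new Digest(r.endpoint, r.generation, l.maxVersion)); // request the gap
            } else if (r.maxVersion < l.maxVersion) {
                toSend.put(r.endpoint, r.maxVersion); // send what the sender lacks
            }
            // Same generation and maxVersion: already in sync, nothing to do.
        }
    }
}
```

Suppose the receiver knows B at (1, 3) and the sender advertises B at (1, 7). examine then adds the request B:1:3, meaning "send me B's state newer than version 3".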
GossipDigestAckMessage

A GOSSIP_DIGEST_ACK message is handled the same way as a GossipDigestSynMessage. Its handler, GossipDigestAckVerbHandler, processes it in the doVerb() method. All three handlers implement the same interface, IVerbHandler. The core code is as follows:
public void doVerb(MessageIn<GossipDigestAck> message, int id)
{
    InetAddress from = message.from;
    GossipDigestAck gDigestAckMessage = message.payload;
    List<GossipDigest> gDigestList = gDigestAckMessage.getGossipDigestList();
    Map<InetAddress, EndpointState> epStateMap = gDigestAckMessage.getEndpointStateMap();

    if (epStateMap.size() > 0)
    {
        /* Notify the failure detector, then apply the received states */
        Gossiper.instance.notifyFailureDetector(epStateMap);
        Gossiper.instance.applyStateLocally(epStateMap);
    }

    /* Collect the local state the sender asked for, to send back in ACK2 */
    Map<InetAddress, EndpointState> deltaEpStateMap = new HashMap<InetAddress, EndpointState>();
    for (GossipDigest gDigest : gDigestList)
    {
        InetAddress addr = gDigest.getEndpoint();
        EndpointState localEpStatePtr = Gossiper.instance.getStateForVersionBiggerThan(addr, gDigest.getMaxVersion());
        if (localEpStatePtr != null)
            deltaEpStateMap.put(addr, localEpStatePtr);
    }

    MessageOut<GossipDigestAck2> gDigestAck2Message = new MessageOut<GossipDigestAck2>(MessagingService.Verb.GOSSIP_DIGEST_ACK2,
                                                                                     new GossipDigestAck2(deltaEpStateMap),
                                                                                     GossipDigestAck2.serializer);
    MessagingService.instance().sendOneWay(gDigestAck2Message, from);
}
1. epStateMap holds the state that the from node wants to tell this node. After the failure detector is notified, applyStateLocally is called to apply it.
2. gDigestList holds the digests for which the from node wants newer information. For each digest, getStateForVersionBiggerThan collects the local state that is newer than the version the from node already has.
3. The state collected in step 2 is wrapped in a GOSSIP_DIGEST_ACK2 message and sent to the from node.
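Steps 2 and 3 can be modeled the same way. In this hypothetical sketch, NodeState and Versioned stand in for EndpointState and VersionedValue. The requested digests are reduced to a map of endpoint to maxVersion. The heartbeat state, which Cassandra's getStateForVersionBiggerThan also considers, is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for VersionedValue: a value tagged with its version.
final class Versioned {
    final int version;
    final String value;

    Versioned(int version, String value) {
        this.version = version;
        this.value = value;
    }
}

// Hypothetical stand-in for EndpointState: a generation plus versioned entries.
final class NodeState {
    final int generation;
    final Map<String, Versioned> entries = new TreeMap<>(); // key such as "STATUS"

    NodeState(int generation) {
        this.generation = generation;
    }
}

// Hypothetical model of the ACK handler's second half: building the ACK2 payload.
final class AckResponder {
    private final Map<String, NodeState> local;

    AckResponder(Map<String, NodeState> local) {
        this.local = local;
    }

    /** Local entries for endpoint with a version greater than version, or null if none. */
    NodeState stateForVersionBiggerThan(String endpoint, int version) {
        NodeState state = local.get(endpoint);
        if (state == null) {
            return null;
        }
        NodeState delta = null;
        for (Map.Entry<String, Versioned> e : state.entries.entrySet()) {
            if (e.getValue().version > version) {
                if (delta == null) {
                    delta = new NodeState(state.generation);
                }
                delta.entries.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }

    /** For each requested (endpoint, maxVersion), includes only the newer local entries. */
    Map<String, NodeState> buildAck2(Map<String, Integer> requested) {
        Map<String, NodeState> delta = new HashMap<>();
        for (Map.Entry<String, Integer> r : requested.entrySet()) {
            NodeState s = stateForVersionBiggerThan(r.getKey(), r.getValue());
            if (s != null) {
                delta.put(r.getKey(), s);
            }
        }
        return delta;
    }
}
```

Returning null when nothing is newer mirrors the null check in the handler loop above, so endpoints the requester is already current on are simply left out of the ACK2 payload.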
GossipDigestAck2Message

This message is handled by GossipDigestAck2VerbHandler. Its code is not reproduced here. It takes the state received from the from node and applies it locally, again through notifyFailureDetector and applyStateLocally.
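Applying received state can be sketched as a merge. The ACK2 handler does this, and so does the ACK handler before it. This is a simplified, hypothetical model of applyStateLocally with three rules:
- A higher remote generation (the node restarted) or an unknown endpoint replaces the local state.
- An older remote generation is ignored.
- For the same generation, the sketch keeps, per key, the value with the higher version.

Cassandra's implementation compares the overall max versions first, so the per-key rule here is a simplification. Cassandra also triggers change notifications and special-cases recently removed endpoints; neither is modeled here. NodeState and Versioned are hypothetical stand-ins for EndpointState and VersionedValue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for VersionedValue.
final class Versioned {
    final int version;
    final String value;

    Versioned(int version, String value) {
        this.version = version;
        this.value = value;
    }
}

// Hypothetical stand-in for EndpointState.
final class NodeState {
    final int generation;
    final Map<String, Versioned> entries = new TreeMap<>();

    NodeState(int generation) {
        this.generation = generation;
    }

    NodeState copy() {
        NodeState c = new NodeState(generation);
        c.entries.putAll(entries); // Versioned is immutable, so sharing values is safe
        return c;
    }
}

// Hypothetical, simplified model of applyStateLocally.
final class StateMerger {
    final Map<String, NodeState> local = new HashMap<>();

    void applyStateLocally(Map<String, NodeState> remote) {
        for (Map.Entry<String, NodeState> e : remote.entrySet()) {
            NodeState mine = local.get(e.getKey());
            NodeState theirs = e.getValue();
            if (mine == null || theirs.generation > mine.generation) {
                // Unknown endpoint, or the node restarted: take the remote state wholesale.
                local.put(e.getKey(), theirs.copy());
            } else if (theirs.generation == mine.generation) {
                // Same generation: per key, keep whichever value has the higher version.
                for (Map.Entry<String, Versioned> kv : theirs.entries.entrySet()) {
                    Versioned current = mine.entries.get(kv.getKey());
                    if (current == null || kv.getValue().version > current.version) {
                        mine.entries.put(kv.getKey(), kv.getValue());
                    }
                }
            }
            // Older remote generation: the sender's view is stale, so it is ignored.
        }
    }
}
```

The generation check is what lets a restarted node (new generation, versions starting over) overwrite the old state, while a stale view from a node that has not heard about the restart cannot roll it back.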
The next section covers Cassandra's failure detection.
References:
http://ddg.jaist.ac.jp/pub/HDY+04.pdf
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureintro_c.html