Leader election
Leader election in Ceph is a Paxos Lease process, and its purpose differs from that of Basic Paxos. Basic Paxos is used to resolve data consistency issues, while Paxos Lease elects a Leader to carry out the Monmap synchronization tasks and is responsible for selecting a new Leader after the current Leader goes offline. A Ceph cluster has only one Monitor acting as Leader at a time: the Monitor with the lowest rank value among all current Monitors. The election produces the Leader and the Quorum members, that is, all Monitors that supported the elected Leader. A quorum is a majority of the Monitors: it must contain more than n/2 members, in other words at least floor(n/2)+1, where n is the number of Monitor nodes.
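The majority rule and the lowest-rank rule described above can be illustrated with a small sketch. The function names `quorum_size`, `has_quorum` and `pick_leader` are hypothetical helpers written for this illustration; they are not part of Ceph.

```python
def quorum_size(n: int) -> int:
    """Smallest number of monitors that forms a strict majority of n."""
    return n // 2 + 1


def has_quorum(supporters: int, n: int) -> bool:
    """True if the supporters form a majority of the n monitors."""
    return supporters >= quorum_size(n)


def pick_leader(ranks: list[int]) -> int:
    """Return the lowest rank value among the given monitors."""
    return min(ranks)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "monitors need", quorum_size(n), "members for a quorum")
    print("leader rank:", pick_leader([2, 0, 1]))
```

With 4 monitors, 2 supporters are exactly half and therefore not enough; 3 are required.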
The Paxos Lease process can be summarized as follows:
Phase 1-a: proposer participation
- The proposer sends a Prepare message to each acceptor;
- It sets a timer Tp that expires after time R; if the timer expires, the Prepare message must be re-sent.
Phase 1-b: proposer and acceptor participation
- When an acceptor receives a Prepare message, it checks the version number V carried by the message. If V is larger than any version it has already accepted, it accepts the message and replies to the proposer with a Promise message, promising not to accept Prepare messages with a version number less than V;
- When the proposer receives the Promise messages, it counts the acceptors that approved version V. If more than half did, this version is considered the latest (it can be submitted).
Phase 2-a: proposer participation
- The proposer resets the timer Tp and sends an AcceptRequest message to each acceptor.
Phase 2-b: proposer, acceptor and learner participation
- When an acceptor receives an AcceptRequest message, it sends an Accepted message to the proposer;
- The proposer receives the Accepted messages, and if more than half of the acceptors sent one, it considers itself the Leader. It then asks the learners to synchronize to this version;
- After receiving the version message, a learner restarts Tp, ready to restart the election process once the timer fires.
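The two phases above can be sketched as a single in-memory round. This is a simplified illustration, not Ceph code: the `Acceptor` class and `run_round` function are hypothetical, the timer Tp and the learners are left out, and the check in `on_accept_request` is an added assumption (the text only says that the acceptor replies with Accepted).

```python
from dataclasses import dataclass


@dataclass
class Acceptor:
    promised: int = 0  # highest version number promised so far

    def on_prepare(self, version: int) -> bool:
        # Phase 1-b: promise only if the version is larger than any seen before.
        if version > self.promised:
            self.promised = version
            return True
        return False

    def on_accept_request(self, version: int) -> bool:
        # Phase 2-b (assumption): refuse if a larger version was promised meanwhile.
        return version >= self.promised


def run_round(version: int, acceptors: list[Acceptor]) -> bool:
    """Return True if the proposer gathers a majority in both phases."""
    majority = len(acceptors) // 2 + 1
    # Phase 1-a / 1-b: send Prepare and count the Promise replies.
    promises = sum(a.on_prepare(version) for a in acceptors)
    if promises < majority:
        return False  # a real proposer would wait for Tp and re-send Prepare
    # Phase 2-a / 2-b: send AcceptRequest and count the Accepted replies.
    accepted = sum(a.on_accept_request(version) for a in acceptors)
    return accepted >= majority
```

Running `run_round(1, acceptors)` twice on the same three fresh acceptors succeeds the first time and fails the second, because version 1 is no longer larger than what each acceptor has promised.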
Leader election in Ceph can be divided into three steps:
- The proposer makes a proposal by sending a propose message to all monitor nodes;
- Each monitor node receives the message and accepts or rejects the proposal;
- The proposer receives the ack messages and counts its supporters. If more than half of the monitors support it, it sends a victory message to the other nodes and has won the election.
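The three steps can be sketched as one proposal round. The function `run_election` is hypothetical, and two points are assumptions made for illustration: a peer acks only when the proposer has a lower rank value than its own (following the lowest-rank rule stated earlier), and the proposer counts itself as a supporter. Peers that reject simply stay silent here; counter-proposals are not modelled.

```python
def run_election(proposer_rank: int, peer_ranks: list[int]) -> tuple[bool, list[int]]:
    """Return (won, quorum) for a single proposal round."""
    total = len(peer_ranks) + 1  # all monitors, including the proposer
    # Step 1 and 2: the proposal reaches every peer; a peer acks only if the
    # proposer has a lower rank value than the peer itself (assumption).
    acks = [rank for rank in peer_ranks if proposer_rank < rank]
    supporters = sorted([proposer_rank] + acks)
    # Step 3: victory requires the support of more than half of all monitors.
    won = len(supporters) > total // 2
    return won, (supporters if won else [])


if __name__ == "__main__":
    print(run_election(0, [1, 2]))  # lowest rank proposes
    print(run_election(2, [0, 1]))  # highest rank proposes
```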
The epoch value plays a very important role in leader elections. Normally the epoch value is equal across all Monitors. When a Monitor goes offline, its epoch value is saved in the database; when it comes back online, its epoch is smaller than that of the other monitors. This lets an acceptor decide, from the epoch value sent by its peer, whether the proposal number (PN) is up to date. In addition, an odd epoch value means the monitor is in an election state. When the election completes, the epoch becomes even and is synchronized to all quorum members.
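The odd/even convention and the staleness check can be expressed in a few lines. The helper names below are hypothetical, and the rule that a new election moves to the next odd value is an assumption that follows from the parity convention described above.

```python
def is_electing(epoch: int) -> bool:
    """An odd epoch means an election is in progress."""
    return epoch % 2 == 1


def start_election(epoch: int) -> int:
    """Move to the next odd epoch strictly greater than the current one."""
    return epoch + 1 if epoch % 2 == 0 else epoch + 2


def finish_election(epoch: int) -> int:
    """Move from an odd (electing) epoch to the following even epoch."""
    if not is_electing(epoch):
        raise ValueError("no election in progress")
    return epoch + 1


def is_stale(local_epoch: int, peer_epoch: int) -> bool:
    """A monitor that was offline comes back with a smaller epoch than its peers."""
    return local_epoch < peer_epoch
```

For example, a monitor that went offline at epoch 4 and rejoins a cluster at epoch 6 satisfies `is_stale(4, 6)`.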
For the main process of the leader election, see:
Recovery stage
Once the OP_COLLECT message has been issued, the Ceph cluster enters the recovery phase. The purpose of this phase is to ensure that all monitors hold a consistent Monmap version, including the last committed proposal, the last uncommitted proposal, and the last accepted proposal. This applies to all members of the old quorum.
Starting from Paxos::collect(), the main activities of each network element at this stage are described as follows:
Lease stage
The Lease phase begins after the Leader issues a BEGIN message. The classical Paxos algorithm is divided into two stages, Prepare and Accept. During the Prepare phase, the proposer sends a PREPARE message with the content <snp, vp> to all acceptors; on receiving it, an acceptor checks sna, the maximum number among the prepare requests it has already replied to:
The proposer then begins sending Accept messages. An acceptor handles an Accept message in one of two ways:
- It checks the maximum number of the prepare requests it has replied to, and if sna > snp, it ignores the message;
- Otherwise it sends Accepted messages to the proposer and the learners. As for the learners, because the acceptors do not know from one another which resolution was passed in this round, it is usually the proposer that sends <snp, vp> to each learner to complete the learning process.
Ceph is consistent with the Paxos algorithm, with slight differences. The activities of the Prepare stage have already been completed in the recovery stage: a peon synchronizes the maximum proposal number it has accepted to the leader, which corresponds to the Paxos Promise message carrying the maximum proposal number accepted by the acceptor. Once the leader and the peons agree on the largest accepted proposal number, the leader's activity in the Ceph lease stage maps directly onto the Accept stage of Paxos.
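The two cases of acceptor handling can be sketched as follows. `PaxosAcceptor` is a hypothetical class for illustration of classical Paxos, not Ceph code; the returned tuple stands in for the Accepted message, and updating `sna` on acceptance is an assumption taken from the usual formulation of the algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaxosAcceptor:
    sna: int = 0  # highest prepare number this acceptor has replied to
    accepted: Optional[tuple[int, str]] = None  # last accepted <snp, vp>

    def on_accept(self, snp: int, vp: str) -> Optional[tuple[int, str]]:
        # Case 1: a higher-numbered prepare was already answered, so ignore.
        if self.sna > snp:
            return None
        # Case 2: accept the proposal; the return value stands in for the
        # Accepted message sent to the proposer and the learners.
        self.sna = snp
        self.accepted = (snp, vp)
        return self.accepted
```

An acceptor created with `PaxosAcceptor(sna=5)` ignores `on_accept(3, "x")` and accepts `on_accept(5, "y")`.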