Ceph Paxos Code Analysis

Source: Internet
Author: User

  • Leader election

    In Ceph, leader election is a Paxos Lease process, and its purpose differs from that of Basic Paxos. Basic Paxos is used to resolve data consistency issues, while Paxos Lease elects a leader that carries out the monmap synchronization tasks and takes care of selecting a new leader after the current one goes offline. A Ceph cluster has only one monitor acting as leader: the one with the lowest rank value among all current monitors. The election produces the leader and the quorum members, that is, all monitors that supported the elected leader. A quorum is a majority of the monitors: it must have more than n/2 members, i.e. at least floor(n/2)+1, where n is the number of monitor nodes.
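The quorum rule and the lowest-rank rule above can be written down in a few lines. This is only an illustrative sketch: the function names `quorum_size`, `has_quorum`, and `expected_leader` are made up for this example and are not Ceph APIs.

```python
def quorum_size(n: int) -> int:
    """Smallest number of monitors that is a strict majority of n."""
    return n // 2 + 1


def has_quorum(supporters: int, n: int) -> bool:
    """True if `supporters` monitors out of n form a majority."""
    return supporters >= quorum_size(n)


def expected_leader(ranks):
    """The monitor with the lowest rank value among the given monitors."""
    return min(ranks)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "monitors need", quorum_size(n), "members for a quorum")
    print("leader rank:", expected_leader([2, 0, 1]))
```

Note that with an even number of monitors, exactly half is not enough: 2 out of 4 does not form a quorum.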

    The Paxos Lease process can be summarized as follows:

    Phase 1-a: proposer participation

      • The proposer sends a Prepare message to each acceptor;
      • It sets a timer Tp that expires after time R; when the timer expires, the Prepare message must be re-sent.

    Phase 1-b: proposer and acceptor participation

      • When an acceptor receives the Prepare message, it checks the version number V carried by the message. If V is larger than any version it has already accepted, it accepts the message and replies to the proposer with a Promise message, promising not to accept Prepare messages with a version number less than V;
      • When the proposer receives the Promise messages, it counts the acceptors that approved version V. If they are more than half, this version is considered the latest one (it can be submitted).

    Phase 2-a: proposer participation

      • The proposer resets the timer Tp and sends an AcceptRequest message to each acceptor.

    Phase 2-b: proposer, acceptor, and learner participation

      • When an acceptor receives the AcceptRequest message, it sends an Accepted message to the proposer;
      • The proposer receives the Accepted messages, and if more than half of the acceptors sent one, it considers itself the leader. It then asks the learners to synchronize to this version;
      • After receiving the version message, a learner restarts Tp and is ready to restart the election process once the timer fires.
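The two phases above can be sketched as a small in-memory simulation. This is a simplified illustration, not Ceph code: the class `Acceptor` and the function `try_acquire_lease` are hypothetical names, messages are plain method calls, and the timer Tp and the learners are left out.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0  # highest version promised so far

    def on_prepare(self, version: int) -> bool:
        """Phase 1-b: promise only if the version is larger than any seen."""
        if version > self.promised:
            self.promised = version
            return True   # stands for a Promise message
        return False      # no reply

    def on_accept_request(self, version: int) -> bool:
        """Phase 2-b: keep the promise, refuse anything older."""
        return version >= self.promised  # True stands for an Accepted message


def try_acquire_lease(acceptors, version: int) -> bool:
    """Run both phases once; True means the proposer may act as leader."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: Prepare / Promise
    promises = sum(a.on_prepare(version) for a in acceptors)
    if promises < majority:
        return False

    # Phase 2: AcceptRequest / Accepted
    accepted = sum(a.on_accept_request(version) for a in acceptors)
    return accepted >= majority


if __name__ == "__main__":
    acceptors = [Acceptor() for _ in range(3)]
    print(try_acquire_lease(acceptors, 1))  # fresh acceptors, version 1
    print(try_acquire_lease(acceptors, 1))  # same version again is not larger
    print(try_acquire_lease(acceptors, 2))  # a larger version succeeds
```

In a real deployment a failed attempt would be retried with a larger version after Tp expires; the sketch leaves that retry loop to the caller.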

    Leader election in Ceph can be divided into three steps:

      • The proposer makes a proposal by sending a propose message to all monitor nodes;
      • Each monitor node receives the message and accepts or rejects the proposal;
      • The proposer receives the ack messages and counts its supporters. If they are more than half, it sends a victory message to the other nodes and has won the election.
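The three steps above can be sketched as follows. The acceptance rule used here (a monitor acks a proposer whose rank is lower than its own) is an assumption derived from the statement that the lowest-ranked monitor becomes leader; `handle_propose` and `elect` are hypothetical names, and the sketch ignores what a rejecting monitor does next.

```python
def handle_propose(my_rank: int, proposer_rank: int) -> bool:
    """Step 2: ack (True) or reject (False) a proposal.

    Assumption: a monitor supports any proposer ranked lower than itself.
    """
    return proposer_rank < my_rank


def elect(proposer_rank: int, reachable_peers, total_monitors: int):
    """Steps 1 and 3: propose to the reachable peers and count the acks.

    Returns (won, quorum), where quorum lists the ranks of the supporters
    including the proposer itself.
    """
    acks = [r for r in reachable_peers if handle_propose(r, proposer_rank)]
    quorum = sorted([proposer_rank] + acks)
    won = len(quorum) > total_monitors // 2
    if won:
        # Step 3: a victory message would be sent to every quorum member here.
        return True, quorum
    return False, []


if __name__ == "__main__":
    print(elect(0, [1, 2], total_monitors=3))  # lowest rank, everyone acks
    print(elect(2, [0, 1], total_monitors=3))  # highest rank, nobody acks
    print(elect(0, [1], total_monitors=5))     # too few reachable monitors
```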

    The epoch value plays a very important role in leader election. Normally, the epoch value is equal across monitors. When a monitor goes offline, its epoch value is saved in the database, so when it comes back online its epoch value is smaller than that of the other monitors. This way, an acceptor can determine whether the PN (proposal number) is up to date from the epoch value sent by the other side. In addition, an odd epoch value means that the monitor is in the election state. When the election is complete, the epoch becomes even and is synchronized to all quorum members.
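The odd/even convention can be illustrated with a few helpers. The names are hypothetical, and incrementing the epoch by exactly one at the start and at the end of an election is an assumption made for this sketch; the text only states that odd means "electing" and even means "election finished".

```python
def is_electing(epoch: int) -> bool:
    """An odd epoch means an election is in progress."""
    return epoch % 2 == 1


def start_election(epoch: int) -> int:
    """Move from a stable (even) epoch to the next odd epoch."""
    if is_electing(epoch):
        raise ValueError("election already in progress")
    return epoch + 1


def finish_election(epoch: int) -> int:
    """Move from an election (odd) epoch to the next even epoch."""
    if not is_electing(epoch):
        raise ValueError("no election in progress")
    return epoch + 1


def is_stale(sender_epoch: int, my_epoch: int) -> bool:
    """A message from a monitor that was offline carries a smaller epoch."""
    return sender_epoch < my_epoch


if __name__ == "__main__":
    epoch = 4
    epoch = start_election(epoch)
    print(epoch, is_electing(epoch))
    epoch = finish_election(epoch)
    print(epoch, is_electing(epoch))
    print(is_stale(sender_epoch=4, my_epoch=epoch))
```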

    [Figure: main flow of the leader election]

  • Recovery phase

    After the OP_COLLECT message is issued, the Ceph cluster enters the recovery phase. The purpose of this phase is to ensure that the monmap version is consistent across all monitors, including the last committed proposal, the last proposal that was not committed, and the last accepted proposal. For all members of the old quorum
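One way to picture the recovery phase is a leader gathering the state of every peon and working out what has to be brought in line. The sketch below is a loose illustration under stated assumptions, not the Ceph implementation: `MonitorState` and `collect` are hypothetical names, and the real protocol exchanges messages and shares the missing versions rather than returning a tuple.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MonitorState:
    last_committed: int                      # last committed version
    accepted_pn: int                         # last accepted proposal number
    uncommitted_value: Optional[str] = None  # accepted but not yet committed


def collect(leader: MonitorState,
            peons: List[MonitorState]) -> Tuple[int, int, Optional[str]]:
    """Summarise what the quorum knows.

    Returns the newest committed version, the highest accepted proposal
    number, and the uncommitted value (if any) that was accepted under the
    highest proposal number and therefore has to be proposed again.
    """
    states = [leader] + peons
    newest_commit = max(s.last_committed for s in states)
    highest_pn = max(s.accepted_pn for s in states)

    pending = [s for s in states if s.uncommitted_value is not None]
    to_repropose = None
    if pending:
        to_repropose = max(pending, key=lambda s: s.accepted_pn).uncommitted_value
    return newest_commit, highest_pn, to_repropose


if __name__ == "__main__":
    leader = MonitorState(last_committed=10, accepted_pn=100)
    peons = [MonitorState(11, 200, "v12"), MonitorState(10, 100)]
    print(collect(leader, peons))
```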

    Starting from Paxos::collect(), the main activities of each participant in this phase are:

  • Lease phase

    The lease phase begins after the leader issues a BEGIN message. The classical Paxos algorithm is divided into two phases, Prepare and Accept. During the Prepare phase, the proposer sends a PREPARE message with the content <SNp, Vp> to all acceptors. After receiving this message, an acceptor checks SNa, the largest number among the prepare requests it has already replied to:

      • If SNa > SNp, the request is ignored;
      • Otherwise (at this point SNp > SNa), it checks the last accept request it approved, <SNx, Vx>, and replies with it;
      • If it has not approved any accept request before, it replies directly with OK.
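The acceptor's three cases can be sketched as a single method. `Acceptor` and its fields are hypothetical names chosen to mirror SNa and <SNx, Vx>; treating SNp equal to SNa as "not ignored" is an assumption, since the text only covers the strictly greater and strictly smaller cases.

```python
class Acceptor:
    def __init__(self):
        self.sn_a = 0         # SNa: largest prepare number replied to so far
        self.accepted = None  # (SNx, Vx): last approved accept request, if any

    def on_prepare(self, sn_p: int):
        """Handle PREPARE <SNp, Vp>; only the number matters at this stage."""
        if self.sn_a > sn_p:
            return None           # case 1: stale request, ignored
        self.sn_a = sn_p          # remember the newest number replied to
        if self.accepted is not None:
            return self.accepted  # case 2: reply with <SNx, Vx>
        return "OK"               # case 3: nothing approved before


if __name__ == "__main__":
    a = Acceptor()
    print(a.on_prepare(5))   # nothing approved yet
    print(a.on_prepare(3))   # older than SNa, ignored
    a.accepted = (5, "x")    # pretend an accept request was approved
    print(a.on_prepare(7))   # replies with the approved request
```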

    When the proposer has received replies from a number of acceptors, the following situations can occur:

      • More than half of the acceptors reply and all replies are OK. This means that no value was proposed before, so the proposer's value can be submitted directly through the accept message;
      • More than half of the acceptors reply, but some replies carry previously accepted values, such as <SN2, V2> and <SN3, V3>. In this case the proposer must choose the value with the largest number, and the content of the Accept message is changed to <SNp, V3> (note that the number is unchanged);
      • Not more than half of the acceptors reply. The proposer increments SNp and keeps sending prepare messages; after a few rounds it is eventually able to submit the latest proposal.
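The proposer's decision can be sketched as a pure function over the replies it received. `choose_accept` is a hypothetical helper; a reply is either the string "OK" or a tuple (SN, V), matching the cases above.

```python
def choose_accept(sn_p, v_p, replies, n_acceptors):
    """Decide what to put into the Accept message.

    Returns (SNp, value) when more than half of the acceptors replied, or
    None when the proposer has to retry with a larger SNp.
    """
    if len(replies) <= n_acceptors // 2:
        return None                       # no majority: retry later

    previous = [r for r in replies if r != "OK"]
    if not previous:
        return (sn_p, v_p)                # all OK: submit our own value

    # Adopt the value with the largest number, but keep our own SNp.
    _, value = max(previous, key=lambda r: r[0])
    return (sn_p, value)


if __name__ == "__main__":
    print(choose_accept(5, "mine", ["OK", "OK"], 3))
    print(choose_accept(5, "mine", ["OK", (2, "v2"), (3, "v3")], 3))
    print(choose_accept(5, "mine", ["OK"], 3))
```

Keeping SNp while adopting the older value is what allows a value that may already have been chosen to survive a change of proposer.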

    The proposer then begins sending Accept messages. An acceptor handles an Accept message in one of two ways:

      • It checks the largest number among the prepare requests it has replied to, and if SNa > SNp, it ignores the message;
      • Otherwise, it sends Accepted messages to the proposer and the learners. As for the learners, because the acceptors do not know from one another which resolution was passed in this round, it is usually the proposer that sends <SNp, Vp> to each learner to complete the learning process.
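The accept step and the learning step can be sketched together. This is an illustration with hypothetical names (`Acceptor`, `run_accept`); learners are modelled as plain lists that record what the proposer sends them.

```python
class Acceptor:
    def __init__(self, sn_a: int = 0):
        self.sn_a = sn_a      # SNa: largest prepare number replied to
        self.accepted = None  # last accepted (SN, V)

    def on_accept(self, sn_p: int, v_p):
        """Ignore stale Accept messages, otherwise accept and acknowledge."""
        if self.sn_a > sn_p:
            return None
        self.accepted = (sn_p, v_p)
        return ("ACCEPTED", sn_p, v_p)


def run_accept(acceptors, learners, sn_p: int, v_p) -> bool:
    """Send Accept to every acceptor; on a majority, inform the learners."""
    acks = [a.on_accept(sn_p, v_p) for a in acceptors]
    accepted = sum(1 for ack in acks if ack is not None)
    if accepted <= len(acceptors) // 2:
        return False
    for learner in learners:
        learner.append((sn_p, v_p))  # the proposer sends <SNp, Vp> to learners
    return True


if __name__ == "__main__":
    acceptors = [Acceptor(5), Acceptor(5), Acceptor(9)]
    learners = [[], []]
    print(run_accept(acceptors, learners, 5, "v"))  # two of three accept
    print(learners)
```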

    Ceph's Paxos is consistent with this algorithm, with one small difference: the activities of the prepare stage have already been completed in the recovery phase. A peon reporting the largest proposal number it has accepted to the leader corresponds to the Paxos promise message, which carries the largest proposal number accepted by the acceptor. Once the leader and the peons agree on the largest accepted proposal number, the leader's activity in the Ceph lease phase maps directly onto the Accept stage of the Paxos process.
