etcd Raft Source Code Analysis II: The Election Process

Source: Internet
Author: User
Tags etcd

1.6 node's Tick and raft's tickElection

> Unless stated otherwise, the code in this section is in raft/raft.go (a method with the receiver r *raft is defined in raft.go).

The Tick() method of node drives the tick() function of the raft struct in raft/raft.go.

In section 1.2, raft.becomeFollower() set the step function of the raft struct, along with its tick function (tick = tickElection).

At the end of the previous section, node.run() takes a message from the n.tickc channel and calls raft.tick(), which at this point is raft.tickElection():

// tickElection is run by followers and candidates after r.electionTimeout.
func (r *raft) tickElection() {
    r.electionElapsed++

    if r.promotable() && r.pastElectionTimeout() {
        r.electionElapsed = 0
        r.Step(pb.Message{From: r.id, Type: pb.MsgHup})
    }
}

When a follower or candidate exceeds the election timeout, it sends itself a message of type MsgHup, and Step then calls r.campaign().

> r.step(r, m) is called only if the message type is not MsgHup, MsgVote, or MsgPreVote. For example, once becomeFollower has set r.step = stepFollower, it is stepFollower() that actually gets called. In stepFollower() you can see that MsgHup, MsgVote, and MsgPreVote are not among the message types it handles.
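The tick-driven timeout described above can be sketched in a few lines. This is a toy model, not etcd's code: toyRaft, msgHup, and runTicks are hypothetical names, the timeout is fixed (etcd compares against a randomized value in pastElectionTimeout), and the promotable() check is left out.

```go
package main

import "fmt"

// msgType is a hypothetical stand-in for pb.MessageType.
type msgType int

const (
	msgHup msgType = iota
	msgOther
)

// toyRaft keeps only the fields needed to illustrate the tick-driven timeout.
type toyRaft struct {
	electionElapsed int
	electionTimeout int
	campaigns       int // number of campaigns started via msgHup
	stepped         int // number of messages passed on to the role-specific step
}

// tickElection counts ticks and, once the timeout is reached,
// resets the counter and sends a local msgHup to itself.
func (r *toyRaft) tickElection() {
	r.electionElapsed++
	if r.electionElapsed >= r.electionTimeout {
		r.electionElapsed = 0
		r.Step(msgHup)
	}
}

// Step handles msgHup directly; every other type goes to the default branch.
func (r *toyRaft) Step(t msgType) {
	switch t {
	case msgHup:
		r.campaigns++
	default:
		r.stepped++
	}
}

// runTicks applies n ticks and returns how many campaigns were started.
func runTicks(timeout, n int) int {
	r := &toyRaft{electionTimeout: timeout}
	for i := 0; i < n; i++ {
		r.tickElection()
	}
	return r.campaigns
}

func main() {
	fmt.Println(runTicks(10, 9))  // 0: the timeout has not been reached yet
	fmt.Println(runTicks(10, 25)) // 2: the timeout fired at tick 10 and tick 20
}
```

The point of the sketch is that the timer never calls campaign itself; it only feeds a local message into Step, so timeouts and network messages share one entry point.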


When raft first starts, the node calls becomeFollower. The term of raft is initially 0 and is later updated to 1. A pb.Message of type MsgHup is a local message whose Term is left at 0, so the m.Term == 0 branch below is taken first, followed by the pb.MsgHup branch.

func (r *raft) Step(m pb.Message) error {
    // Handle the message term, which may result in our stepping down to a follower.
    switch {
    case m.Term == 0:
        // local message
    case m.Term > r.Term:
        ...
    case m.Term < r.Term:
        ...
        return nil
    }

    switch m.Type {
    case pb.MsgHup:
        if r.state != StateLeader {
            ents, err := r.raftLog.slice(r.raftLog.applied+1, r.raftLog.committed+1, noLimit)
            ... // the checks on err and ents are omitted in this excerpt
            if r.preVote {
                r.campaign(campaignPreElection)
            } else {
                r.campaign(campaignElection)
            }
        } else {
            r.logger.Debugf("%x ignoring MsgHup because already leader", r.id)
        }

    case pb.MsgVote, pb.MsgPreVote:
        ...
    default:
        // r.step must already have been assigned (r.step = xxx) for this call to run the role-specific function.
        r.step(r, m)
    }
    return nil
}
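The first switch in Step only compares terms. As a rough illustration, the hypothetical helper classifyTerm below names the possible outcomes; the labels are made up for this sketch and do not appear in etcd.

```go
package main

import "fmt"

// classifyTerm names the outcome of comparing a message term with the
// local term, in the same order as the first switch in Step.
func classifyTerm(msgTerm, localTerm uint64) string {
	switch {
	case msgTerm == 0:
		return "local" // local messages such as MsgHup carry no term
	case msgTerm > localTerm:
		return "newer" // the receiver may have to step down to follower
	case msgTerm < localTerm:
		return "stale" // Step returns early for these
	}
	return "current" // same term: fall through to the second switch
}

func main() {
	fmt.Println(classifyTerm(0, 1)) // local
	fmt.Println(classifyTerm(2, 1)) // newer
	fmt.Println(classifyTerm(1, 2)) // stale
	fmt.Println(classifyTerm(2, 2)) // current
}
```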

To recap: when raft calls becomeFollower, its state is updated to StateFollower (the code for the other two roles is listed below as well):

  • becomeFollower: the election timer starts, with tickElection as the tick function; once electionTimeout expires, the node becomes a candidate
  • becomeCandidate: increment the term and vote for yourself; the tick function is still tickElection
  • becomeLeader: the tick function is tickHeartbeat, which periodically sends heartbeats to the followers


// raft/raft.go
func (r *raft) becomeFollower(term uint64, lead uint64) {
    r.step = stepFollower
    r.reset(term)
    r.tick = r.tickElection
    r.lead = lead
    r.state = StateFollower
    r.logger.Infof("%x became follower at term %d", r.id, r.Term)
}

func (r *raft) becomeCandidate() {
    r.step = stepCandidate
    // Becoming a candidate increments Term by 1
    r.reset(r.Term + 1)
    r.tick = r.tickElection
    // Vote for yourself
    r.Vote = r.id
    r.state = StateCandidate
    r.logger.Infof("%x became candidate at term %d", r.id, r.Term)
}

func (r *raft) becomeLeader() {
    r.step = stepLeader
    r.reset(r.Term)
    r.tick = r.tickHeartbeat
    r.lead = r.id
    r.state = StateLeader
    r.pendingConfIndex = r.raftLog.lastIndex()
    r.appendEntry(pb.Entry{Data: nil})
    r.logger.Infof("%x became leader at term %d", r.id, r.Term)
}
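The three become* functions change behavior by reassigning two function-valued fields rather than by branching on the state everywhere. A minimal sketch of that pattern follows; toyRaft and lifecycle are hypothetical, and the step and tick functions return strings only so that the effect is visible.

```go
package main

import "fmt"

// toyRaft holds the two function fields that each role transition swaps.
type toyRaft struct {
	state string
	term  uint64
	step  func(r *toyRaft, msg string) string
	tick  func() string
}

func stepFollower(r *toyRaft, msg string) string  { return "stepFollower got " + msg }
func stepCandidate(r *toyRaft, msg string) string { return "stepCandidate got " + msg }
func stepLeader(r *toyRaft, msg string) string    { return "stepLeader got " + msg }

func (r *toyRaft) tickElection() string  { return "tickElection" }
func (r *toyRaft) tickHeartbeat() string { return "tickHeartbeat" }

func (r *toyRaft) becomeFollower(term uint64) {
	r.step = stepFollower
	r.tick = r.tickElection
	r.term = term
	r.state = "follower"
}

func (r *toyRaft) becomeCandidate() {
	r.step = stepCandidate
	r.tick = r.tickElection
	r.term++ // a candidate campaigns in the next term
	r.state = "candidate"
}

func (r *toyRaft) becomeLeader() {
	r.step = stepLeader
	r.tick = r.tickHeartbeat
	r.state = "leader"
}

// lifecycle walks follower -> candidate -> leader and records which
// tick and step functions are active in each role.
func lifecycle() []string {
	r := &toyRaft{}
	var trace []string
	r.becomeFollower(1)
	trace = append(trace, r.tick()+" / "+r.step(r, "MsgApp"))
	r.becomeCandidate()
	trace = append(trace, r.tick()+" / "+r.step(r, "MsgVoteResp"))
	r.becomeLeader()
	trace = append(trace, r.tick()+" / "+r.step(r, "MsgProp"))
	return trace
}

func main() {
	for _, line := range lifecycle() {
		fmt.Println(line)
	}
}
```

Callers always invoke r.tick() and r.step(r, m) the same way; only the assignment inside the become* function decides which code runs.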


1.7 Campaigning for leader

After electionTimeout expires, a follower campaigns and becomes a candidate. We leave the two-phase (PreVote) election aside here:

  • Call becomeCandidate and set the vote message type to MsgVote
  • If a majority of the votes has already been obtained, call becomeLeader()
  • Otherwise send voteMsg to every other node

The first time a follower runs campaign, the votes counted in step 2 cannot form a majority (unless the cluster has a single node), so the MsgVote message is sent to the other nodes.

func (r *raft) campaign(t CampaignType) {
    var term uint64
    var voteMsg pb.MessageType
    if t == campaignPreElection {
        r.becomePreCandidate()
        voteMsg = pb.MsgPreVote
        // PreVote RPCs are sent for the next term before we've incremented r.Term.
        term = r.Term + 1
    } else {
        r.becomeCandidate()
        voteMsg = pb.MsgVote
        term = r.Term
    }
    if r.quorum() == r.poll(r.id, voteRespMsgType(voteMsg), true) {
        // We won the election after voting for ourselves (which must mean that this is a single-node cluster). Advance to the next state.
        if t == campaignPreElection {
            r.campaign(campaignElection)
        } else {
            r.becomeLeader()
        }
        return
    }
    for id := range r.prs {
        // No need to send a vote request to ourselves
        if id == r.id {
            continue
        }
        var ctx []byte
        if t == campaignTransfer {
            ctx = []byte(t)
        }
        r.send(pb.Message{Term: term, To: id, Type: voteMsg, Index: r.raftLog.lastIndex(), LogTerm: r.raftLog.lastTerm(), Context: ctx})
    }
}
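The quorum check at the top of campaign can be illustrated with a small sketch. The ballot type and wonAfterSelfVote helper are hypothetical; the sketch assumes that a quorum is a simple majority (n/2 + 1) and that only the first answer from each node is recorded.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n voters.
func quorum(n int) int { return n/2 + 1 }

// ballot records at most one vote per node id.
type ballot struct {
	votes map[uint64]bool
}

// poll records the vote from id if none is recorded yet and returns
// the number of granted votes collected so far.
func (b *ballot) poll(id uint64, granted bool) int {
	if b.votes == nil {
		b.votes = make(map[uint64]bool)
	}
	if _, seen := b.votes[id]; !seen {
		b.votes[id] = granted
	}
	n := 0
	for _, v := range b.votes {
		if v {
			n++
		}
	}
	return n
}

// wonAfterSelfVote reports whether a candidate's own vote is already a
// quorum, which is only the case in a single-node cluster.
func wonAfterSelfVote(clusterSize int) bool {
	b := &ballot{}
	return b.poll(1, true) == quorum(clusterSize)
}

func main() {
	fmt.Println(wonAfterSelfVote(1)) // true
	fmt.Println(wonAfterSelfVote(3)) // false: one vote, quorum is 2
}
```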

After the candidate has sent voteMsg to the other nodes (the followers), it becomes leader once it has received vote responses from a majority of the nodes. Below we analyze how a follower handles the MsgVote request sent by the candidate. This involves an RPC call, which in etcd is handled by rafthttp.


1.8 rafthttp

Each etcdserver runs an HTTP server that receives the messages sent by other nodes and returns the response to the sender:


//etcdserver/api/rafthttp/http.go
func (h *pipelineHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    var m raftpb.Message
    // ... (reading the request body and unmarshalling it into m is omitted in this excerpt)
    if err := h.r.Process(context.TODO(), m); err != nil {
        switch v := err.(type) {
        case writerToResponse:
            v.WriteTo(w)
        }
        return
    }
}


EtcdServer implements the Raft interface defined in etcdserver/api/rafthttp/transport.go:

//etcdserver/api/rafthttp/transport.go
type Raft interface {
    Process(ctx context.Context, m raftpb.Message) error
    IsIDRemoved(id uint64) bool
    ReportUnreachable(id uint64)
    ReportSnapshot(id uint64, status raft.SnapshotStatus)
}

Q: s.r is the raftNode of etcdserver/raft.go, which is a struct. The Step method, however, is declared in the Node interface in raft/node.go.

So the question is: how does calling Step on the raftNode struct reach the Step method of the Node interface?

> A: First look at the Node interface and the node struct in raft/node.go. The node struct implements every method of the Node interface, so node can be seen as the implementation class of the Node interface.

The raftNode struct does not declare raft.Node directly, but its raftNodeConfig field does! This syntax is called struct embedding (here, an embedded interface): the methods of the embedded type are promoted to the outer struct.
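The embedding chain can be reproduced in isolation. In the sketch below the type names follow the article, but the interface is cut down to a single method with a simplified, hypothetical signature, and newRaftNode is a helper invented for the example.

```go
package main

import "fmt"

// Node is a cut-down stand-in for the raft.Node interface.
type Node interface {
	Step(m string) error
}

// node is the concrete implementation of Node.
type node struct {
	received []string
}

func (n *node) Step(m string) error {
	n.received = append(n.received, m)
	return nil
}

// raftNodeConfig embeds the Node interface, so Step is promoted to it.
type raftNodeConfig struct {
	Node
}

// raftNode embeds raftNodeConfig, so Step is promoted one level further.
type raftNode struct {
	raftNodeConfig
}

// newRaftNode wires a concrete Node into the embedded interface field.
func newRaftNode(n Node) *raftNode {
	return &raftNode{raftNodeConfig: raftNodeConfig{Node: n}}
}

func main() {
	impl := &node{}
	rn := newRaftNode(impl)
	// rn has no Step method of its own; the call is forwarded to impl.
	if err := rn.Step("MsgVote"); err != nil {
		fmt.Println("step failed:", err)
	}
	fmt.Println(impl.received) // [MsgVote]
}
```

rn.Step(...) is shorthand for rn.raftNodeConfig.Node.Step(...). If the embedded interface field were left nil, the call would panic at run time, so the field has to be set when the struct is constructed.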


EtcdServer's Process method calls the Step method of the Node interface in raft/node.go (implemented by the node struct in the same file):


// etcdserver/server.go
func (s *EtcdServer) Process(ctx context.Context, m raftpb.Message) error {
    return s.r.Step(ctx, m)
}

// raft/node.go
func (n *node) Step(ctx context.Context, m pb.Message) error {
    // ignore unexpected local messages receiving over network
    if IsLocalMsg(m.Type) {
        return nil
    }
    return n.step(ctx, m)
}
func (n *node) step(ctx context.Context, m pb.Message) error {
    return n.stepWithWaitOption(ctx, m, false)
}
// Step advances the state machine using msgs. The ctx.Err() will be returned, if any.
func (n *node) stepWithWaitOption(ctx context.Context, m pb.Message, wait bool) error {
    if m.Type != pb.MsgProp {
        select {
        case n.recvc <- m:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        case <-n.done:
            return ErrStopped
        }
    }
    ...
    return nil
}


The candidate sends a request to the follower node. When the follower receives the request, the message is put on the recvc channel of its node.

Note: every node in an etcd cluster starts a raftNode and a raft.Node, and each of them also runs the node.run() method.

// raft/node.go
func (n *node) run(r *raft) {
    ...
    for {
        select {
            ...
        case <-n.tickc:
            r.tick()
        case m := <-n.recvc:
            // filter out response message from unknown From.
            if pr := r.getProgress(m.From); pr != nil || !IsResponseMsg(m.Type) {
                r.Step(m)
            }
        }
    }
}
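The hand-off between Step and run is a plain channel send guarded by a select. The sketch below models just that part with hypothetical names (toyNode, deliver, stepTimesOutWithoutLoop) and string messages; it is not the etcd implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errStopped = errors.New("toy node stopped")

// toyNode has the two channels that matter for the hand-off.
type toyNode struct {
	recvc chan string
	done  chan struct{}
}

func newToyNode() *toyNode {
	return &toyNode{recvc: make(chan string), done: make(chan struct{})}
}

// Step hands m to the run loop, or gives up when ctx is cancelled or
// the node has been stopped.
func (n *toyNode) Step(ctx context.Context, m string) error {
	select {
	case n.recvc <- m:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	case <-n.done:
		return errStopped
	}
}

// run is the single goroutine that consumes messages; it reports each
// handled message on the handled channel.
func (n *toyNode) run(handled chan<- string) {
	for {
		select {
		case m := <-n.recvc:
			handled <- m
		case <-n.done:
			return
		}
	}
}

// deliver starts a run loop, steps one message, and returns what the
// loop handled.
func deliver(m string) (string, error) {
	n := newToyNode()
	handled := make(chan string, 1)
	go n.run(handled)
	defer close(n.done)
	if err := n.Step(context.Background(), m); err != nil {
		return "", err
	}
	return <-handled, nil
}

// stepTimesOutWithoutLoop shows that Step blocks until the context
// expires when no run loop is receiving.
func stepTimesOutWithoutLoop() bool {
	n := newToyNode()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	return errors.Is(n.Step(ctx, "MsgVote"), context.DeadlineExceeded)
}

func main() {
	got, err := deliver("MsgVote")
	fmt.Println(got, err)                  // MsgVote <nil>
	fmt.Println(stepTimesOutWithoutLoop()) // true
}
```

Because only the run goroutine touches the raft state, messages from the network and timer ticks are serialized without any locking in the state machine itself.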

To recap: the candidate earlier received the timer (electionTimeout) expiry from n.tickc, reached raft.go's Step() method through tickElection(), then campaigned and sent voteMsg to the follower nodes. Here the follower node receives the voteMsg request, takes the message from n.recvc, and calls r.Step(m). The two nodes play different roles but call the same raft.Step() method; the processing logic of the two is of course different.


1.9 Follower receives MsgVote and votes

The follower receives a message m with Term=1 while its own raft.Term=0. The steps are as follows:

  • m.Term > r.Term, so becomeFollower() is called
  • MsgVoteResp is returned to the candidate

// raft/raft.go
func (r *raft) Step(m pb.Message) error {
    // Handle the message term, which may result in our stepping down to a follower.
    switch {
    case m.Term == 0:
        // local message
    case m.Term > r.Term:
        switch {
        default:
            if m.Type == pb.MsgApp || m.Type == pb.MsgHeartbeat || m.Type == pb.MsgSnap {
                r.becomeFollower(m.Term, m.From)
            } else {
                // The follower receives the MsgVote request from the candidate and becomes a follower again (state change)
                // As before, this only sets raft's step and tick functions; no follower logic has run yet.
                r.becomeFollower(m.Term, None)
            }
        }
    case m.Term < r.Term:
        ...
        return nil
    }

    switch m.Type {
    case pb.MsgHup:
        ...
    case pb.MsgVote, pb.MsgPreVote:
        canVote := r.Vote == m.From || // We can vote if this is a repeat of a vote we've already cast...
            (r.Vote == None && r.lead == None) || // ...we haven't voted and we don't think there's a leader yet in this term...
            (m.Type == pb.MsgPreVote && m.Term > r.Term) // ...or this is a PreVote for a future term...
        if canVote && r.raftLog.isUpToDate(m.Index, m.LogTerm) { // ...and we believe the candidate is up to date.
            r.send(pb.Message{To: m.From, Term: m.Term, Type: voteRespMsgType(m.Type)}) // The returned message type is VoteResp
            if m.Type == pb.MsgVote {
                r.electionElapsed = 0 // Only record real votes. Reset election counter
                r.Vote = m.From // Vote for the node that sent the message, the candidate
            }
        } else {
            r.send(pb.Message{To: m.From, Term: r.Term, Type: voteRespMsgType(m.Type), Reject: true}) // Refuse to vote
        }
    default:
        err := r.step(r, m)
        if err != nil {
            return err
        }
    }
    return nil
}
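The voting rule in the MsgVote branch combines two independent checks: whether this node is still free to vote, and whether the candidate's log is at least as up to date as its own. A minimal sketch follows, leaving the PreVote case aside. The voter type and grant method are hypothetical, and the comparison in isUpToDate (higher last term wins, then higher or equal last index) is the usual Raft rule, stated here as an assumption about what raftLog.isUpToDate does.

```go
package main

import "fmt"

const none uint64 = 0

// voter holds the state consulted when a vote request arrives.
type voter struct {
	vote      uint64 // node voted for in the current term, or none
	lead      uint64 // known leader in the current term, or none
	lastTerm  uint64 // term of the last entry in the local log
	lastIndex uint64 // index of the last entry in the local log
}

// isUpToDate reports whether a log ending at (index, logTerm) is at
// least as up to date as the local log.
func (v *voter) isUpToDate(index, logTerm uint64) bool {
	return logTerm > v.lastTerm || (logTerm == v.lastTerm && index >= v.lastIndex)
}

// grant decides on a real (non-pre) vote request and records the vote
// when it is granted.
func (v *voter) grant(from, index, logTerm uint64) bool {
	canVote := v.vote == from || (v.vote == none && v.lead == none)
	if canVote && v.isUpToDate(index, logTerm) {
		v.vote = from
		return true
	}
	return false
}

func main() {
	v := &voter{lastTerm: 2, lastIndex: 5}
	fmt.Println(v.grant(1, 5, 2)) // true: free to vote, logs are equal
	fmt.Println(v.grant(1, 5, 2)) // true: repeat of the vote already cast
	fmt.Println(v.grant(2, 9, 3)) // false: already voted for node 1

	w := &voter{lastTerm: 2, lastIndex: 5}
	fmt.Println(w.grant(3, 4, 2)) // false: candidate's log is behind
}
```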


When the follower returns MsgVoteResp, its response to MsgVote, to the candidate, the candidate processes it much as the follower processed the incoming message: r.Step() is called again.


1.10 Candidate becomes leader

Because the term of the follower's reply equals the term of the message the candidate sent, Step goes straight to the default branch of the second switch:


// raft/raft.go
func (r *raft) Step(m pb.Message) error {
    // Handle the message term, which may result in our stepping down to a follower.
    switch {
    case m.Term == 0:
        // local message
    case m.Term > r.Term:
        ...
    case m.Term < r.Term:
        ...
        return nil
    }

    switch m.Type {
    case pb.MsgHup:
        ...
    case pb.MsgVote, pb.MsgPreVote:
        ...
    default:
        err := r.step(r, m) // Call raft's stepFunc: stepCandidate, set by becomeCandidate in section 1.7
        if err != nil {
            return err
        }
    }
    return nil
}


In the earlier walkthroughs Step(m) never reached the default branch. Here it does, and the r.step(r, m) function is called, which is stepCandidate:


// stepCandidate is shared by StateCandidate and StatePreCandidate; the difference is
// whether they respond to MsgVoteResp or MsgPreVoteResp.
func stepCandidate(r *raft, m pb.Message) error {
    // Only handle vote responses corresponding to our candidacy (while in
    // StateCandidate, we may get stale MsgPreVoteResp messages in this term from our pre-candidate state).
    var myVoteRespType pb.MessageType
    if r.state == StatePreCandidate {
        myVoteRespType = pb.MsgPreVoteResp
    } else {
        myVoteRespType = pb.MsgVoteResp
    }
    switch m.Type {
    case pb.MsgProp:
        r.logger.Infof("%x no leader at term %d; dropping proposal", r.id, r.Term)
        return ErrProposalDropped
    case pb.MsgApp:
        r.becomeFollower(m.Term, m.From) // always m.Term == r.Term
        r.handleAppendEntries(m)
    case pb.MsgHeartbeat:
        r.becomeFollower(m.Term, m.From) // always m.Term == r.Term
        r.handleHeartbeat(m)
    case pb.MsgSnap:
        r.becomeFollower(m.Term, m.From) // always m.Term == r.Term
        r.handleSnapshot(m)
    case myVoteRespType: // The candidate receives a follower's vote response and tallies the votes.
        gr := r.poll(m.From, m.Type, !m.Reject)
        r.logger.Infof("%x [quorum:%d] has received %d %s votes and %d vote rejections", r.id, r.quorum(), gr, m.Type, len(r.votes)-gr)
        switch r.quorum() {
        case gr:
            if r.state == StatePreCandidate {
                r.campaign(campaignElection)
            } else {
                r.becomeLeader() // Enough votes: become leader
                r.bcastAppend() // Send MsgApp requests to the other follower nodes
            }
        case len(r.votes) - gr:
            // pb.MsgPreVoteResp contains future term of pre-candidate m.Term > r.Term; reuse r.Term
            r.becomeFollower(r.Term, None)
        }
    case pb.MsgTimeoutNow:
        r.logger.Debugf("%x [term %d state %v] ignored MsgTimeoutNow from %x", r.id, r.Term, r.state, m.From)
    }
    return nil
}
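The switch on r.quorum() in the vote-response branch decides between three outcomes after each response: the election is won, it is lost, or it is still open. The hypothetical tally function below replays a sequence of responses (the candidate's own vote first) under the assumption that a quorum is n/2 + 1; the outcome labels are made up for this sketch.

```go
package main

import "fmt"

// tally feeds vote responses in order and returns the outcome as soon
// as the granted or the rejected count reaches the quorum.
func tally(clusterSize int, responses []bool) string {
	quorum := clusterSize/2 + 1
	granted, rejected := 0, 0
	for _, ok := range responses {
		if ok {
			granted++
		} else {
			rejected++
		}
		// Same shape as the switch in stepCandidate: the quorum is the
		// switch value and the two counters are the cases.
		switch quorum {
		case granted:
			return "won"
		case rejected:
			return "lost"
		}
	}
	return "pending"
}

func main() {
	fmt.Println(tally(3, []bool{true, true}))         // won
	fmt.Println(tally(3, []bool{true, false, false})) // lost
	fmt.Println(tally(5, []bool{true, true}))         // pending
}
```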


To summarize: every node calls the raft.Step(m) method when it receives an HTTP request. If the message type is not MsgHup, MsgVote, or MsgPreVote, the r.step(r, m) function is called. raft's stepFunc must be set before any HTTP request is sent or received; it is set in the three methods becomeFollower, becomeCandidate, and becomeLeader.


A: Follower1 becomes a candidate:

1. Call becomeFollower, which sets r.step = stepFollower

2. The election timer (electionTimeout) expires; call becomeCandidate, which sets r.step = stepCandidate

3. Call campaign, which sends voteMsg to all other nodes


B: Follower2:

1. Call becomeFollower, which sets r.step = stepFollower

2. Receive the candidate node's voteMsg request, vote for the candidate, and return MsgVoteResp


C: The candidate becomes leader:

1. Receive the MsgVoteResp sent by a follower and call the r.step function, which is the stepCandidate set in step 2 of A

2. Count the votes; on a majority, call becomeLeader, which sets r.step = stepLeader

3. Send MsgApp requests to the other follower nodes


D: Follower2 again:

1. Receive the leader's MsgApp request and call the r.step function, which is stepFollower

2. Process the append-entries request and return a MsgAppResp response to the leader
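
Flows A to C can be replayed with a small in-memory simulation. Everything in it is a simplification for illustration: peer, message, and runElection are hypothetical, messages travel through slices instead of HTTP, log comparison is skipped, and the MsgApp exchange of flow D is left out.

```go
package main

import "fmt"

type msgType int

const (
	msgVote msgType = iota
	msgVoteResp
)

type message struct {
	typ      msgType
	from, to uint64
	term     uint64
	reject   bool
}

type peer struct {
	id, term, vote uint64 // vote == 0 means "not voted in this term"
	state          string
	votes          map[uint64]bool // responses collected as candidate
	size           int             // number of nodes in the cluster
	out            []message       // messages waiting to be delivered
}

func (p *peer) becomeFollower(term uint64) {
	p.state, p.term, p.vote = "follower", term, 0
}

// campaign covers A.2 and A.3: become candidate, vote for yourself,
// and queue a vote request for every other node.
func (p *peer) campaign(ids []uint64) {
	p.state, p.term, p.vote = "candidate", p.term+1, p.id
	p.votes = map[uint64]bool{p.id: true}
	for _, id := range ids {
		if id != p.id {
			p.out = append(p.out, message{typ: msgVote, from: p.id, to: id, term: p.term})
		}
	}
}

func (p *peer) step(m message) {
	if m.term > p.term {
		p.becomeFollower(m.term)
	}
	switch m.typ {
	case msgVote: // flow B: decide and answer
		grant := m.term == p.term && (p.vote == 0 || p.vote == m.from)
		if grant {
			p.vote = m.from
		}
		p.out = append(p.out, message{typ: msgVoteResp, from: p.id, to: m.from, term: p.term, reject: !grant})
	case msgVoteResp: // flow C: tally the responses
		if p.state != "candidate" || m.term != p.term {
			return
		}
		p.votes[m.from] = !m.reject
		granted := 0
		for _, ok := range p.votes {
			if ok {
				granted++
			}
		}
		if granted >= p.size/2+1 {
			p.state = "leader"
		}
	}
}

// runElection starts three followers, lets node 1 time out first, and
// delivers messages until no outbox has anything left.
func runElection() map[uint64]string {
	ids := []uint64{1, 2, 3}
	peers := map[uint64]*peer{}
	for _, id := range ids {
		peers[id] = &peer{id: id, size: len(ids)}
		peers[id].becomeFollower(1)
	}
	peers[1].campaign(ids)
	for delivered := true; delivered; {
		delivered = false
		for _, id := range ids {
			pending := peers[id].out
			peers[id].out = nil
			for _, m := range pending {
				peers[m.to].step(m)
				delivered = true
			}
		}
	}
	states := map[uint64]string{}
	for _, id := range ids {
		states[id] = peers[id].state
	}
	return states
}

func main() {
	fmt.Println(runElection()) // map[1:leader 2:follower 3:follower]
}
```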
