MIT6.824 Lab2 Raft (1)



Introduction
This lab implements a simple version of the Raft algorithm in Go, to build familiarity with distributed consensus algorithms; the Raft implementation from this lab is the foundation for the later labs. It is recommended to first look at the website that introduces Raft with an animated visualization.
A replicated service (such as a key-value database) uses the Raft algorithm to help manage its replica nodes. The point of replication is that the system can keep providing service even when some replica nodes crash or lose their network connections. When nodes fail, the data on the replicas may become inconsistent, and the Raft algorithm determines which data is correct.
The basic idea of Raft is to implement a replicated state machine: client requests are organized into a log, and Raft ensures that the log is identical on every replica node. Each replica executes the requests in log order, applying each result to its state machine. Since all live replicas see the same log contents, they execute the requests in the same order and therefore arrive at the same service state. If a node crashes and later recovers, Raft carefully brings its log up to date. As long as a majority of the nodes in the system are alive and can communicate with each other, Raft can continue to provide service.
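
As a conceptual illustration (not part of the lab code), the reason replicas converge is that they apply the same log in the same order; a minimal sketch in Go:

// Conceptual sketch only: a replica applies committed entries strictly in
// log order, so any two replicas holding the same log reach the same state.
type entry struct {
    Term    int
    Command interface{}
}

func applyAll(log []entry, applyFn func(interface{})) {
    for _, e := range log {
        applyFn(e.Command) // same log + same order => same final state
    }
}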

There is not much code to write in this lab; the focus is on reading the extended Raft paper, filling in the corresponding structs, and implementing the corresponding methods. If the English is difficult, there is also a Chinese translation of the Raft paper.



Content
Raft decomposes the consensus problem into three relatively independent sub-problems:
1. Leader election: a new leader must be chosen when the existing leader fails (Section 5.2).
2. Log replication: the leader must accept log entries from clients, replicate them to the other nodes in the cluster, and force the other nodes' logs to agree with its own (Section 5.3).
3. Safety: the key safety property in Raft is the State Machine Safety property shown in Figure 3: if any server has applied a particular log entry to its state machine, then no other server may apply a different command at the same log index. Section 5.4 explains how Raft guarantees this property; the solution involves an additional restriction on the election mechanism described in Section 5.2.
The best way to work through the lab is to read the test code first and understand the overall flow from there.
This lab uses the provided labrpc framework (a simulated, channel-based RPC framework modeled on Go's net/rpc) for communication between nodes. Its usage is described in labrpc/labrpc.go, as the following summary shows:


net := MakeNetwork() -- create a new network to hold clients and servers
end := net.MakeEnd(endname) -- create a new client end-point to talk to one server
net.AddServer(servername, server) -- add a server to the network
net.DeleteServer(servername) -- remove a server from the network
net.Connect(endname, servername) -- connect a client end-point to a server
net.Enable(endname, enabled) -- enable/disable a client end-point
end.Call("Raft.AppendEntries", &args, &reply) -- send an RPC and wait for the reply; "Raft" is the name of the service struct, AppendEntries is a method on it, args holds the call's arguments, and reply receives the result. Call returns a bool: false means the network lost the request or the reply, or the server is down; true means reply is valid.
srv := MakeServer()
srv.AddService(svc) -- a server can provide multiple services, e.g. Raft and k/v
svc := MakeService(receiverObject) -- obj's methods will handle RPCs
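
Putting these calls together, a minimal sketch of wiring one Raft peer into a labrpc network (the names "server0" and "end0" are made up for illustration, and rf is assumed to be an existing *Raft):

// Sketch only: register a Raft instance as an RPC service and call it.
net := labrpc.MakeNetwork()
svc := labrpc.MakeService(rf) // rf's exported methods will handle RPCs
srv := labrpc.MakeServer()
srv.AddService(svc)
net.AddServer("server0", srv)

end := net.MakeEnd("end0")
net.Connect("end0", "server0")
net.Enable("end0", true)

args := RequestVoteArgs{Term: 1, CandidateId: 0}
var reply RequestVoteReply
ok := end.Call("Raft.RequestVote", args, &reply) // false: request/reply lost or server down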


First look at the TestInitialElection function (raft/test_test.go).


func TestInitialElection(t *testing.T) {
    servers := 3
    cfg := make_config(t, servers, false)
    defer cfg.cleanup()

    fmt.Printf("Test: initial election ...\n")

    // is a leader elected?
    cfg.checkOneLeader()

    // does the leader+term stay the same if there is no failure?
    term1 := cfg.checkTerms()
    time.Sleep(2 * RaftElectionTimeout)
    term2 := cfg.checkTerms()
    if term1 != term2 {
        fmt.Printf("warning: term changed even though there were no failures")
    }

    fmt.Printf("  ... Passed\n")
}


The first important function is make_config, defined in config.go. Its main job is to create and initialize the Raft system; the return value has type config, defined as follows:


type config struct {
    mu        sync.Mutex
    t         *testing.T
    net       *labrpc.Network
    n         int
    done      int32 // tell internal threads to die
    rafts     []*Raft
    applyErr  []string // from apply channel readers
    connected []bool   // whether each server is on the net
    saved     []*Persister
    endnames  [][]string    // the port file names each sends to
    logs      []map[int]int // copy of each server's committed entries
}


The main members are net, the labrpc.Network; rafts, the array of Raft nodes in the system; applyErr, error messages reported by the apply-channel readers; saved, each node's persistent storage (Persister); endnames, the RPC end-point names each node uses to reach the others; and logs, a copy of each node's committed entries.


func make_config(t *testing.T, n int, unreliable bool) *config {
    runtime.GOMAXPROCS(4)
    cfg := &config{}
    cfg.t = t
    cfg.net = labrpc.MakeNetwork()
    cfg.n = n
    cfg.applyErr = make([]string, cfg.n)
    cfg.rafts = make([]*Raft, cfg.n)
    cfg.connected = make([]bool, cfg.n)
    cfg.saved = make([]*Persister, cfg.n)
    cfg.endnames = make([][]string, cfg.n)
    cfg.logs = make([]map[int]int, cfg.n)

    cfg.setunreliable(unreliable)

    cfg.net.LongDelays(true)

    // create a full set of Rafts.
    for i := 0; i < cfg.n; i++ {
        cfg.logs[i] = map[int]int{}
        cfg.start1(i)
    }

    // connect everyone
    for i := 0; i < cfg.n; i++ {
        cfg.connect(i)
    }

    return cfg
}


The make_config function mainly initializes the config struct; the first for loop calls start1 to create and initialize each Raft node, and finally the nodes are all connected to one another.
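
start1 itself is not shown here; a condensed sketch of what it does, based on the description above (the real config.go version also re-uses saved state across crashes and checks apply-channel consistency; randstring is a small config.go helper that generates a random name):

func (cfg *config) start1(i int) {
    // fresh ClientEnd names and ends, so stale connections go away
    cfg.endnames[i] = make([]string, cfg.n)
    ends := make([]*labrpc.ClientEnd, cfg.n)
    for j := 0; j < cfg.n; j++ {
        cfg.endnames[i][j] = randstring(20)
        ends[j] = cfg.net.MakeEnd(cfg.endnames[i][j])
        cfg.net.Connect(cfg.endnames[i][j], j)
    }

    if cfg.saved[i] == nil {
        cfg.saved[i] = MakePersister()
    }

    applyCh := make(chan ApplyMsg)
    cfg.rafts[i] = Make(ends, i, cfg.saved[i], applyCh)

    // register the new Raft instance as an RPC service
    svc := labrpc.MakeService(cfg.rafts[i])
    srv := labrpc.MakeServer()
    srv.AddService(svc)
    cfg.net.AddServer(i, srv)
}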


func (cfg *config) checkOneLeader() int {
    for iters := 0; iters < 10; iters++ {
        time.Sleep(500 * time.Millisecond)
        leaders := make(map[int][]int)
        for i := 0; i < cfg.n; i++ {
            if cfg.connected[i] {
                if t, leader := cfg.rafts[i].GetState(); leader {
                    leaders[t] = append(leaders[t], i)
                }
            }
        }

        lastTermWithLeader := -1
        for t, leaders := range leaders {
            if len(leaders) > 1 {
                cfg.t.Fatalf("term %d has %d (>1) leaders", t, len(leaders))
            }
            if t > lastTermWithLeader {
                lastTermWithLeader = t
            }
        }

        if len(leaders) != 0 {
            return leaders[lastTermWithLeader][0]
        }
    }
    cfg.t.Fatalf("expected one leader, got none")
    return -1
}

The checkOneLeader function loops up to 10 times, sleeping 500 milliseconds on each iteration, and calls each Raft node's GetState function to ask whether it currently believes it is the leader. It checks that the system has one, and only one, leader.
At this point the main part of TestInitialElection is done. The checkTerms function returns the current term (and checks that all connected nodes agree on it); after waiting twice the election timeout, the test checks that the term is the same as the value obtained before. Under normal circumstances with no node failures, the leader stays the same and the term does not change.
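
Neither GetState nor checkTerms appears in the listing above; minimal sketches consistent with how they are used:

// GetState: the tester asks a peer for its current term and whether it
// thinks it is the leader.
func (rf *Raft) GetState() (int, bool) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    return rf.currentTerm, rf.state == LEADER
}

// checkTerms: all connected servers should agree on the current term.
func (cfg *config) checkTerms() int {
    term := -1
    for i := 0; i < cfg.n; i++ {
        if cfg.connected[i] {
            xterm, _ := cfg.rafts[i].GetState()
            if term == -1 {
                term = xterm
            } else if term != xterm {
                cfg.t.Fatalf("servers disagree on term")
            }
        }
    }
    return term
}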

Raft node
First, fill in the Raft struct in raft.go with the state each node must maintain, based on Section 5 (Figure 2) of the paper.


// log entry contains command for state machine, and term when entry
// was received by leader
type LogEntry struct {
    Command interface{}
    Term    int
}

//
// A Go object implementing a single Raft peer.
//
type Raft struct {
    mu        sync.Mutex
    peers     []*labrpc.ClientEnd
    persister *Persister
    me        int // index into peers[]

    // persistent state on all servers
    currentTerm int
    votedFor    int
    logs        []LogEntry

    // volatile state on all servers
    commitIndex int // index of highest log entry known to be committed
    lastApplied int // index of highest log entry applied

    // volatile state on leaders
    nextIndex  []int // index of the next log entry to send to that server
    matchIndex []int // index of highest log entry known to be replicated on server

    granted_votes_count int // granted vote number

    state   string
    applyCh chan ApplyMsg

    // logger *log.Logger
    timer *time.Timer
}

The timer drives the node's election/heartbeat timeouts, and granted_votes_count counts the votes received in the current election. Next, the Make function initializes a node, i.e. a Raft struct.


func Make(peers []*labrpc.ClientEnd, me int,
    persister *Persister, applyCh chan ApplyMsg) *Raft {
    rf := &Raft{}
    rf.peers = peers
    rf.persister = persister
    rf.me = me

    // Your initialization code here.
    rf.currentTerm = 0
    rf.votedFor = -1 // voted for nobody yet
    rf.logs = make([]LogEntry, 0)

    rf.commitIndex = -1
    rf.lastApplied = -1

    rf.nextIndex = make([]int, len(peers))
    rf.matchIndex = make([]int, len(peers))

    rf.state = FOLLOWER
    rf.applyCh = applyCh

    // initialize from state persisted before a crash
    rf.readPersist(persister.ReadRaftState())

    rf.persist()
    rf.resetTimer()

    return rf
}


Leader election
To begin an election, a follower first increments its current term and switches to the candidate state. It then sends RequestVote RPCs in parallel to the other servers in the cluster, asking them to vote for it. So the first step is to define the RPC argument and reply structures: RequestVoteArgs/RequestVoteReply for elections (sketched just below), and the AppendEntries structures used by the leader for heartbeats and log replication (shown after that).
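
The RequestVote structures do not appear in the original listing; a minimal sketch reconstructed from the fields used by handleTimer and RequestVote later in this post (fields are exported so labrpc can marshal them):

type RequestVoteArgs struct {
    Term         int // candidate's term
    CandidateId  int // candidate requesting the vote
    LastLogIndex int // index of candidate's last log entry
    LastLogTerm  int // term of candidate's last log entry
}

type RequestVoteReply struct {
    Term        int  // currentTerm, so the candidate can update itself
    VoteGranted bool // true means the candidate received this vote
}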


// AppendEntries RPC structures (field names are exported/capitalized so
// labrpc's gob encoding can marshal them)
type AppendEntryArgs struct {
    Term         int
    LeaderId     int
    PrevLogIndex int
    PrevLogTerm  int
    Entries      []LogEntry
    LeaderCommit int
}

type AppendEntryReply struct {
    Term        int
    Success     bool
    CommitIndex int
}


A key concept in leader election is the election timeout: when a Raft node's timer fires and it is not the leader, it converts to a candidate and issues RequestVote RPCs.


func (rf *Raft) handleTimer() {
    rf.mu.Lock()
    defer rf.mu.Unlock()

    if rf.state != LEADER {
        rf.state = CANDIDATE
        rf.currentTerm += 1
        rf.votedFor = rf.me
        rf.granted_votes_count = 1
        rf.persist()
        // rf.logger.Printf("New election, Candidate:%v term:%v\n", rf.me, rf.currentTerm)
        args := RequestVoteArgs{
            Term:         rf.currentTerm,
            CandidateId:  rf.me,
            LastLogIndex: len(rf.logs) - 1,
        }

        if len(rf.logs) > 0 {
            args.LastLogTerm = rf.logs[args.LastLogIndex].Term
        }

        for server := 0; server < len(rf.peers); server++ {
            if server == rf.me {
                continue
            }

            go func(server int, args RequestVoteArgs) {
                var reply RequestVoteReply
                ok := rf.sendRequestVote(server, args, &reply)
                if ok {
                    rf.handleVoteResult(reply)
                }
            }(server, args)

        }
    } else {
        rf.SendAppendEntriesToAllFollwer()
    }
    rf.resetTimer()
}


The handleTimer function handles a node's timer firing. If the node is not the leader, it transitions to the candidate state, increments its term by 1, and votes for itself. It then constructs the RequestVote arguments, sends a RequestVote RPC to every other node in parallel, processes the result of each call as replies arrive, and finally calls resetTimer to re-arm the timer. (If the node is the leader, the timeout instead triggers a round of heartbeats via SendAppendEntriesToAllFollwer.)
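
resetTimer is not shown in the original; a minimal sketch, assuming a randomized election timeout for non-leaders and a shorter fixed heartbeat interval for leaders (the exact durations are assumptions):

func (rf *Raft) resetTimer() {
    if rf.timer == nil {
        // time.AfterFunc runs handleTimer in its own goroutine when the
        // timer fires; the initial duration is immediately overwritten below.
        rf.timer = time.AfterFunc(time.Hour, rf.handleTimer)
    }
    // randomized election timeout, e.g. 150-300ms, so candidates rarely tie
    timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
    if rf.state == LEADER {
        timeout = 50 * time.Millisecond // heartbeat interval (assumed value)
    }
    rf.timer.Reset(timeout)
}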
Next, look at the sendRequestVote function, which performs the actual RPC call.


func (rf *Raft) sendRequestVote(server int, args RequestVoteArgs, reply *RequestVoteReply) bool {
    ok := rf.peers[server].Call("Raft.RequestVote", args, reply)
    return ok
}


The RequestVote function is the handler that a Raft node runs when it receives a vote request.


func (rf *Raft) RequestVote(args RequestVoteArgs, reply *RequestVoteReply) {
    // Your code here.
    rf.mu.Lock()
    defer rf.mu.Unlock()

    may_grant_vote := true

    // if the current server's log is newer than the candidate's, deny the vote
    if len(rf.logs) > 0 {
        if rf.logs[len(rf.logs)-1].Term > args.LastLogTerm ||
            (rf.logs[len(rf.logs)-1].Term == args.LastLogTerm &&
                len(rf.logs)-1 > args.LastLogIndex) {
            may_grant_vote = false
        }
    }

    // the current server's term is bigger than the candidate's
    if args.Term < rf.currentTerm {
        reply.Term = rf.currentTerm
        reply.VoteGranted = false
        return
    }

    // the current server's term is the same as the candidate's
    if args.Term == rf.currentTerm {
        // no voted candidate
        if rf.votedFor == -1 && may_grant_vote {
            rf.votedFor = args.CandidateId
            rf.persist()
        }
        reply.Term = rf.currentTerm
        reply.VoteGranted = (rf.votedFor == args.CandidateId)

        return
    }

    // the current server's term is smaller than the candidate's
    if args.Term > rf.currentTerm {
        rf.state = FOLLOWER
        rf.currentTerm = args.Term
        rf.votedFor = -1

        if may_grant_vote {
            rf.votedFor = args.CandidateId
            rf.persist()
        }
        rf.resetTimer()

        reply.Term = args.Term
        reply.VoteGranted = (rf.votedFor == args.CandidateId)

        return
    }
}


The main idea is as follows. First, determine whether the current node's log is newer than the candidate's: either the term of its last log entry is greater than the candidate's LastLogTerm, or the terms are equal but the index of its last entry is greater than LastLogIndex. If the current node's log is newer, it must not grant the vote. Next, compare terms. If the current node's term is greater than the candidate's, it refuses to vote. If the terms are equal, check whether the node has already voted in this term; if it has not voted and may grant the vote (its log is not newer), it votes for the candidate. If the current node's term is smaller than the candidate's, the node converts to the FOLLOWER state, updates its current term to the candidate's term, and, if it may grant the vote, votes for the candidate. Note that when the two terms are equal, the current node must not convert to FOLLOWER, because it may itself be a candidate waiting for vote results.
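
The log-newness test above is easy to get backwards; as a side note, it can be factored into a small helper (a sketch, not in the original code):

// true if a log whose last entry has (lastTerm, lastIndex) is strictly
// newer than one ending at (argsLastTerm, argsLastIndex) -- Section 5.4.1.
func logIsNewer(lastTerm, lastIndex, argsLastTerm, argsLastIndex int) bool {
    return lastTerm > argsLastTerm ||
        (lastTerm == argsLastTerm && lastIndex > argsLastIndex)
}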
Finally, look at the function with which the candidate processes vote results.


func (rf *Raft) handleVoteResult(reply RequestVoteReply) {
    rf.mu.Lock()
    defer rf.mu.Unlock()

    // old term ignore
    if reply.Term < rf.currentTerm {
        return
    }

    // a newer reply term pushes this peer back to follower
    if reply.Term > rf.currentTerm {
        rf.currentTerm = reply.Term
        rf.state = FOLLOWER
        rf.votedFor = -1
        rf.resetTimer()
        return
    }

    if rf.state == CANDIDATE && reply.VoteGranted {
        rf.granted_votes_count += 1
        if rf.granted_votes_count >= majority(len(rf.peers)) {
            rf.state = LEADER
            for i := 0; i < len(rf.peers); i++ {
                if i == rf.me {
                    continue
                }
                rf.nextIndex[i] = len(rf.logs)
                rf.matchIndex[i] = -1
            }
            rf.resetTimer()
        }
        return
    }
}


In the handleVoteResult function, if the term in the reply is smaller than the candidate's current term, the reply is stale and is ignored. If the term in the reply is larger than the candidate's, the node converts back to the FOLLOWER state. If the terms are equal and the reply grants a vote, the candidate checks whether the votes collected so far reach a majority; if so, it converts to the LEADER state and initializes the per-peer replication state (nextIndex and matchIndex).
This completes the leader election.
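
The majority helper used above is not shown in the original; a one-line sketch:

// smallest vote count that is a strict majority of n peers,
// e.g. majority(3) == 2, majority(5) == 3.
func majority(n int) int {
    return n/2 + 1
}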
Now look back at the second test in the test code, the TestReElection function.


func TestReElection(t *testing.T) {
    servers := 3
    cfg := make_config(t, servers, false)
    defer cfg.cleanup()

    fmt.Printf("Test: election after network failure ...\n")

    leader1 := cfg.checkOneLeader()

    // if the leader disconnects, a new one should be elected.
    cfg.disconnect(leader1)
    cfg.checkOneLeader()

    // if the old leader rejoins, that shouldn't
    // disturb the old leader.
    cfg.connect(leader1)
    leader2 := cfg.checkOneLeader()

    // if there's no quorum, no leader should
    // be elected.
    cfg.disconnect(leader2)
    cfg.disconnect((leader2 + 1) % servers)
    time.Sleep(2 * RaftElectionTimeout)
    cfg.checkNoLeader()

    // if a quorum arises, it should elect a leader.
    cfg.connect((leader2 + 1) % servers)
    cfg.checkOneLeader()

    // re-join of last node shouldn't prevent leader from existing.
    cfg.connect(leader2)
    cfg.checkOneLeader()

    fmt.Printf("  ... Passed\n")
}


The test creates a new Raft system with 3 nodes and checks that when the leader loses its connection, the system elects a new leader. When the previous leader rejoins, the existing leader should not change: the old leader's term is smaller than the new leader's, so it is forced to update itself. When 2 nodes are disconnected there is no quorum, so the system should have no leader. When 1 node rejoins, a quorum exists again and the system should once more have exactly 1 leader.
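
checkNoLeader is not shown in the original; a minimal sketch consistent with its use above:

// no connected server should claim to be the leader.
func (cfg *config) checkNoLeader() {
    for i := 0; i < cfg.n; i++ {
        if cfg.connected[i] {
            _, isLeader := cfg.rafts[i].GetState()
            if isLeader {
                cfg.t.Fatalf("expected no leader, but %v claims to be leader", i)
            }
        }
    }
}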

