MIT6.824 Lab2 Raft (3)


The previous parts of this series walked through the test functions in the test code. Picking up immediately where the last part left off, we now look at the TestFailNoAgree function, which tests that the cluster cannot reach agreement when too many nodes have failed.

func TestFailNoAgree(t *testing.T) {
    servers := 5
    cfg := make_config(t, servers, false)
    defer cfg.cleanup()

    fmt.Printf("Test: no agreement if too many followers fail ...\n")

    cfg.one(10, servers)

    // 3 of 5 followers disconnect
    leader := cfg.checkOneLeader()
    cfg.disconnect((leader + 1) % servers)
    cfg.disconnect((leader + 2) % servers)
    cfg.disconnect((leader + 3) % servers)

    index, _, ok := cfg.rafts[leader].Start(20)
    if ok != true {
        t.Fatalf("leader rejected Start()")
    }
    if index != 2 {
        t.Fatalf("expected index 2, got %v", index)
    }

    time.Sleep(2 * RaftElectionTimeout)

    n, _ := cfg.nCommitted(index)
    if n > 0 {
        t.Fatalf("%v committed but no majority", n)
    }

    // repair failures
    cfg.connect((leader + 1) % servers)
    cfg.connect((leader + 2) % servers)
    cfg.connect((leader + 3) % servers)

    // the disconnected majority may have chosen a leader from
    // among their own ranks, forgetting index 2.
    leader2 := cfg.checkOneLeader()
    index2, _, ok2 := cfg.rafts[leader2].Start(30)
    if ok2 == false {
        t.Fatalf("leader2 rejected Start()")
    }
    if index2 < 2 || index2 > 3 {
        t.Fatalf("unexpected index %v", index2)
    }

    cfg.one(1000, servers)

    fmt.Printf("  ... Passed\n")
}

Using the print statement added to the one function, we can observe the current leader of the system under different circumstances. In TestFailNoAgree, the one function is first called to confirm agreement (index=1, leader=1 in this run). Then 3 follower nodes are disconnected and the client issues a request (index=2). At this point only 2 nodes of the Raft system are reachable, so fewer than a majority of nodes acknowledge the AppendEntries, and the log entry at that index must not be committed. Immediately afterwards, the 3 disconnected nodes rejoin the network. Because the disconnected nodes themselves formed a majority, they may have elected a new leader among themselves in the meantime; calling checkOneLeader shows leader=2, so the original index=2 request will be discarded. The client then issues another request, and in this run the returned index is 2 (the test accepts either 2 or 3).
The TestConcurrentStarts function tests the consistency of concurrent requests within a single term.
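Why must the index=2 entry stay uncommitted? A Raft leader only advances its commit index once a majority of servers store the entry. The following is a minimal sketch of that rule, assuming illustrative field names (matchIndex, commitIndex, and so on) rather than the lab's actual code:

    // Illustrative types; the lab's Raft struct has more fields.
    type LogEntry struct {
        Term    int
        Command interface{}
    }

    type Raft struct {
        me          int
        peers       []int // stand-in for the lab's []*labrpc.ClientEnd
        currentTerm int
        log         []LogEntry
        commitIndex int
        matchIndex  []int // highest log index known replicated on each peer
    }

    // updateCommitIndex advances commitIndex to the largest n such that a
    // majority of servers store log[n] and log[n] is from the current term.
    // With 3 of 5 servers disconnected, count never exceeds 2 out of 5, so
    // the entry at index 2 in TestFailNoAgree is never committed.
    func (rf *Raft) updateCommitIndex() {
        for n := len(rf.log) - 1; n > rf.commitIndex; n-- {
            if rf.log[n].Term != rf.currentTerm {
                continue
            }
            count := 1 // the leader itself stores the entry
            for i := range rf.peers {
                if i != rf.me && rf.matchIndex[i] >= n {
                    count++
                }
            }
            if 2*count > len(rf.peers) {
                rf.commitIndex = n
                break
            }
        }
    }

The term check is Raft's extra safety condition: a leader only directly commits entries from its own term, never from earlier ones.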

func TestConcurrentStarts(t *testing.T) {
    servers := 3
    cfg := make_config(t, servers, false)
    defer cfg.cleanup()

    fmt.Printf("Test: concurrent Start()s ...\n")

    var success bool
loop:
    for try := 0; try < 5; try++ {
        if try > 0 {
            // give solution some time to settle
            time.Sleep(3 * time.Second)
        }

        leader := cfg.checkOneLeader()
        _, term, ok := cfg.rafts[leader].Start(1)
        if !ok {
            // leader moved on really quickly
            continue
        }

        iters := 5
        var wg sync.WaitGroup
        is := make(chan int, iters)
        for ii := 0; ii < iters; ii++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                i, term1, ok := cfg.rafts[leader].Start(100 + i)
                if term1 != term {
                    return
                }
                if ok != true {
                    return
                }
                is <- i
            }(ii)
        }

        wg.Wait()
        close(is)

        for j := 0; j < servers; j++ {
            if t, _ := cfg.rafts[j].GetState(); t != term {
                // term changed -- can't expect low RPC counts
                continue loop
            }
        }

        failed := false
        cmds := []int{}
        for index := range is {
            cmd := cfg.wait(index, servers, term)
            if ix, ok := cmd.(int); ok {
                if ix == -1 {
                    // peers have moved on to later terms
                    // so we can't expect all Start()s to
                    // have succeeded
                    failed = true
                    break
                }
                cmds = append(cmds, ix)
            } else {
                t.Fatalf("value %v is not an int", cmd)
            }
        }

        if failed {
            // avoid leaking goroutines
            go func() {
                for range is {
                }
            }()
            continue
        }

        for ii := 0; ii < iters; ii++ {
            x := 100 + ii
            ok := false
            for j := 0; j < len(cmds); j++ {
                if cmds[j] == x {
                    ok = true
                }
            }
            if ok == false {
                t.Fatalf("cmd %v missing in %v", x, cmds)
            }
        }

        success = true
        break
    }

    if !success {
        t.Fatalf("term changed too often")
    }

    fmt.Printf("  ... Passed\n")
}

A new Raft system with 3 nodes is created. checkOneLeader returns the current leader's index, and Start is called on that node to send one command (which lands at index 1). A channel that can hold 5 elements is then created, and 5 goroutines send commands 100-104 concurrently, with a sync.WaitGroup ensuring they all finish. Each goroutine checks that its Start call still happened in the original term; if the call succeeds, it writes the returned log index (here the indices are 2-6) into the channel. After all 5 goroutines complete, the test checks whether every node is still in the same term; if not, the whole attempt is retried, since the point is to exercise highly concurrent Start calls within one term. The indices are then read back from the channel, and for each one the wait function blocks until the log entry at that index has been committed by all nodes of the Raft system. If the term changed in the meantime, for example because of a system error, the channel is drained and the attempt is retried. Finally, the commands recovered from the committed log are compared against the commands originally sent, i.e. the test verifies that every concurrent request was committed correctly.
The TestRejoin function tests consistency when a network partition occurs.
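The wait helper used above can be pictured as a polling loop with exponential backoff. This sketch follows the shape of the lab's config.go, though the details here are reconstructed and may differ; nCommitted is assumed to report how many servers have committed an entry at a given index, along with the command stored there:

    // wait polls until at least n servers have committed an entry at index,
    // then returns the committed command. If any peer's term moves past
    // startTerm, it returns -1: the test can no longer expect its own
    // Start() to win that log slot.
    func (cfg *config) wait(index int, n int, startTerm int) interface{} {
        to := 10 * time.Millisecond
        for iters := 0; iters < 30; iters++ {
            nd, _ := cfg.nCommitted(index)
            if nd >= n {
                break
            }
            time.Sleep(to)
            if to < time.Second {
                to *= 2 // back off exponentially
            }
            if startTerm > -1 {
                for _, r := range cfg.rafts {
                    if t, _ := r.GetState(); t > startTerm {
                        // someone has moved on; give up
                        return -1
                    }
                }
            }
        }
        nd, cmd := cfg.nCommitted(index)
        if nd < n {
            cfg.t.Fatalf("only %d decided for index %d; wanted %d", nd, index, n)
        }
        return cmd
    }

This is where the -1 handled in TestConcurrentStarts comes from: it signals that the term changed while waiting, not that an entry was lost.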

func TestRejoin(t *testing.T) {
    servers := 3
    cfg := make_config(t, servers, false)
    defer cfg.cleanup()

    fmt.Printf("Test: rejoin of partitioned leader ...\n")

    cfg.one(101, servers)

    // leader network failure
    leader1 := cfg.checkOneLeader()
    cfg.disconnect(leader1)

    // make old leader try to agree on some entries
    cfg.rafts[leader1].Start(102)
    cfg.rafts[leader1].Start(103)
    cfg.rafts[leader1].Start(104)

    // new leader commits, also for index=2
    cfg.one(103, 2)

    // new leader network failure
    leader2 := cfg.checkOneLeader()
    cfg.disconnect(leader2)

    // old leader connected again
    cfg.connect(leader1)

    cfg.one(104, 2)

    // all together now
    cfg.connect(leader2)

    cfg.one(105, servers)

    fmt.Printf("  ... Passed\n")
}

A new Raft system with 3 nodes is created, and the one function sends a command request (index=1, cmd=101, term=1) and completes the commit. The leader node is then disconnected from the network, forming 2 partitions: the old leader (N1) on one side, and the 2 follower nodes (N2, N3) on the other. Three commands (term=1) are sent to N1, but they cannot be committed because N1's partition holds fewer than a majority of the nodes; N1's log now extends to index=4 with term=1. Meanwhile the 2 followers hold a new election, and a command request (index=2, cmd=103, term=2) is sent to the new leader (N2) and committed. Then N2 is disconnected and N1 is reconnected. Because N1's term is lower than N3's, and the last entry of N1's log carries an older term than the last entry of N3's log, N3 becomes the leader, and the conflicting entries in N1's log are discarded (the follower-side truncation is sketched below). The one function then sends a command request (index=3, cmd=104, term=3) and completes the commit. Finally, N2 rejoins the network; N3 is still the leader and brings N2's log up to date, and one last call to the one function sends a command request (index=4, cmd=105, term=3).
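How exactly do N1's uncommitted entries 102-104 get discarded? Through the consistency check in AppendEntries: a follower that finds an entry conflicting with the leader's (same index, different term) deletes that entry and everything after it, then appends the leader's entries. Here is a minimal sketch of that follower-side logic, reusing the illustrative LogEntry type from the earlier sketch and assuming a placeholder entry at index 0 so slice indices match Raft's 1-based log indices (argument names follow the Raft paper, not necessarily the lab code):

    type AppendEntriesArgs struct {
        Term         int
        PrevLogIndex int
        PrevLogTerm  int
        Entries      []LogEntry
        LeaderCommit int
    }

    // handleEntries implements the Raft paper's receiver steps 2-4: reject
    // on a prevLog mismatch, truncate at the first conflicting entry, and
    // append whatever is new. When N3 (term 2) replicates to N1, N1's entry
    // at index 2 (cmd=102, term=1) conflicts with the leader's (cmd=103,
    // term=2), so entries 2-4 of N1's old log are dropped.
    func handleEntries(log []LogEntry, args AppendEntriesArgs) ([]LogEntry, bool) {
        if args.PrevLogIndex >= len(log) ||
            log[args.PrevLogIndex].Term != args.PrevLogTerm {
            return log, false // leader will back up prevLogIndex and retry
        }
        for i, e := range args.Entries {
            idx := args.PrevLogIndex + 1 + i
            if idx >= len(log) || log[idx].Term != e.Term {
                log = append(log[:idx], args.Entries[i:]...)
                break
            }
        }
        return log, true
    }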
The specific sequence of changes is as follows (the term values are illustrative and may not match an actual run):
Leader is N1 after calling one(101):
N1: term=1, log is {101}
N2: term=1, log is {101}
N3: term=1, log is {101}
After disconnecting N1 and calling Start(102), Start(103), Start(104):
N1: term=1, log is {101, 102, 103, 104}
N2: term=1, log is {101}
N3: term=1, log is {101}
After one(103) is called, leader is N2:
N1: term=1, log is {101, 102, 103, 104}
N2: term=2, log is {101, 103}
N3: term=2, log is {101, 103}
After disconnecting N2, reconnecting N1, and calling one(104), the leader is N3. The term value may be 3 or 4, because N3 may have timed out and started an election in the window between N2's disconnection and N1's reconnection. N1 loses the election despite its longer log (see the election-restriction sketch after this walkthrough):
N1: term=3, log is {101, 103, 104}
N2: term=2, log is {101, 103} (still disconnected)
N3: term=3, log is {101, 103, 104}
Leader is still N3 after reconnecting N2 and calling one(105):
N1: term=3, log is {101, 103, 104, 105}
N2: term=3, log is {101, 103, 104, 105}
N3: term=3, log is {101, 103, 104, 105}
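The step where N3 beats N1 despite N1's longer log is decided by Raft's election restriction: a server grants its vote only if the candidate's log is at least as up-to-date as its own, comparing the terms of the last entries first and using log length only to break ties. A small sketch of that comparison (the function name is illustrative):

    // logUpToDate reports whether a candidate's log, identified by the
    // index and term of its last entry, is at least as up-to-date as the
    // voter's. N1's last entry (index 4, term 1) loses to N3's (index 2,
    // term 2), so N3 can gather votes while N1 cannot.
    func logUpToDate(candLastIndex, candLastTerm, myLastIndex, myLastTerm int) bool {
        if candLastTerm != myLastTerm {
            return candLastTerm > myLastTerm
        }
        return candLastIndex >= myLastIndex
    }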
  
The remaining tests can be analyzed by following the same process as above; they will be filled in later.
