Introduction
I have been working with Raft for a long time and know it well, but explaining Raft to someone who has no idea what it is still seems very hard to me. So I decided to fall back on my usual rambling style and try to explain Raft through metaphors and storytelling.
If you have watched Peppa Pig with your kids, you will probably know why I used that weird name. If you have not seen it, I highly recommend taking a look; it really is a very good children's cartoon.
Logs and State machines
Miss Rabbit is going to open a bank in Muddy Town (called the Mud Bank). To design the bank's savings system, Miss Rabbit went to Daddy Pig.
Miss Rabbit: "Daddy Pig, we have to make sure that, no matter what happens, the customers' money is never wrong. If a customer deposits 100 dollars, his account must grow by exactly 100 dollars, not 101 and not 99."
Daddy Pig: "OK, Miss Rabbit, I think we can do that. When a customer comes to deposit money, we first record the transaction in sequence, and then Mr. Rabbit puts the customer's actual money into the corresponding safe in the vault. Once Mr. Rabbit has put the money in, we can tell the customer that the transaction is complete."
Miss Rabbit: "Good idea, Daddy Pig, but why should we record the transaction first? Can't we just put the money in the vault?"
Daddy Pig: "First, if recording the transaction fails, we do not need to put the money in the vault at all. Second, suppose many people come to deposit money at the same time. If Mr. Rabbit handles them all at once, he might make a mistake. With a record, Mr. Rabbit can work through the transactions one by one. That is a little slower, but it will not go wrong. In addition, a transaction record can never be tampered with or overwritten. For example, if position n records that customer A deposited 100 dollars, then record n must always be that transaction; it can never turn into customer B withdrawing 100 dollars."
Miss Rabbit: "Well, that sounds reasonable, so let's do it."
The example above is a bit unrealistic (a real bank run this way would not be far from collapse), but you can still see that the mechanism works well. In Raft, the transaction record is called the log, and the vault is the state machine. For any operation, Raft first appends it to the log, waits for the log entry to be committed, and then applies the corresponding data to the state machine. Each log entry has a unique index that increases strictly by one: the leader only appends to the log and never overwrites or deletes entries in it. If the current log index is 10, the next one must be 11. Once an entry has been applied to the state machine, we say that the entry has been applied.
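The record-first, vault-second flow can be sketched in a few lines of Python. This is only a toy single-node model, not how any real Raft library is written; the `Bank` class and its method names are made up for illustration, and the `commit` call stands in for the replication step described later.

```python
class Bank:
    """Toy single-node model: an append-only log feeding a state machine."""

    def __init__(self):
        self.log = []           # entry with index i is stored at self.log[i - 1]
        self.commit_index = 0   # highest index known to be committed
        self.applied_index = 0  # highest index applied to the state machine
        self.balances = {}      # the state machine (the "vault")

    def append(self, customer, amount):
        """Record a transaction; indexes start at 1 and grow by exactly one."""
        self.log.append((customer, amount))
        return len(self.log)

    def commit(self, index):
        """Mark every entry up to `index` as committed."""
        if index > len(self.log):
            raise ValueError("cannot commit an entry that was never logged")
        self.commit_index = max(self.commit_index, index)

    def apply_committed(self):
        """Apply committed entries to the state machine, strictly in log order."""
        while self.applied_index < self.commit_index:
            customer, amount = self.log[self.applied_index]
            self.balances[customer] = self.balances.get(customer, 0) + amount
            self.applied_index += 1


bank = Bank()
index = bank.append("A", 100)   # logged, but not yet in the vault
bank.commit(index)              # in real Raft this waits for a quorum
bank.apply_committed()          # now the vault holds A's 100
```

Note that an entry that is logged but not committed never reaches `balances`, which is exactly why Daddy Pig insists on recording the transaction first.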
Quorum
Soon after, the first version of the Mud Bank savings system went live. Everything worked well until one day a thunderstorm knocked out the bank's power. Customers found that they could not make any transactions, and although Granddad Dog did his best, it took a long time before the whole bank was working properly again. Once the bank was back to normal, Miss Rabbit went to Daddy Pig.
Miss Rabbit: "Daddy Pig, it looks like we have to make sure that even if the bank in Muddy Town has a problem, such as a power outage, customers can still trade normally."
Daddy Pig: "Yes, Miss Rabbit, that is what we have to do. We have to set up banks in other places too, so that even if the Muddy Town bank cannot provide service, customers can still trade at another branch."
Miss Rabbit: "Good idea, Daddy Pig. Then let's set up a branch in Echo Valley, so that when the bank in Muddy Town fails, customers can still trade in Echo Valley."
Daddy Pig: "Miss Rabbit, I'm afraid that setting up a branch only in Echo Valley will not work."
Miss Rabbit: "Why? Daddy Pig, I'm a little confused."
Daddy Pig: "Suppose we have a system in both Muddy Town and Echo Valley. If a customer deposits money in Muddy Town, we first record the transaction in Muddy Town and then notify Echo Valley to record it as well. Only when we know that Echo Valley has recorded the transaction successfully can we proceed to the next step: put the customer's money into the vault, and tell Echo Valley to put the corresponding money into its vault too. That way, if something goes wrong in Muddy Town, the customer can still go to Echo Valley to withdraw money."
Miss Rabbit: "It sounds complicated, but it looks fine. So having the system in two places is no problem!"
Daddy Pig: "No, no, Miss Rabbit. What I described is the case where both sides work normally, but there are many abnormal situations. Suppose, for example, that the system in Muddy Town works but Echo Valley has a problem. Then when a customer comes to Muddy Town to deposit money, we cannot record the transaction in Echo Valley, so the customer still cannot make the deposit."
Miss Rabbit: "In that case, couldn't we let the customer deposit anyway, wait for the Echo Valley system to recover, and then send the missing transaction records over?"
Daddy Pig: "Of course not, Miss Rabbit, because we must guarantee that the customers' money is absolutely safe. Suppose a customer first deposits money in Muddy Town, and Echo Valley, because of its problem, does not know about the transaction. If the Muddy Town system then fails and the customer goes to Echo Valley to withdraw money, he will find that his balance in Echo Valley is still the old total, and that is a big problem. So if only two places have the system, we have to make sure that both are fully functional; if either side fails, the whole system is unavailable."
Miss Rabbit: "Oh, I think I understand. Then what do we do?"
Daddy Pig: "If we want to tolerate one location being unable to provide service while customers can still make transactions, we need to deploy the system in at least three places."
Miss Rabbit: "Oh, I'm a little confused. Can you explain carefully why it has to be three?"
Daddy Pig: "OK, Miss Rabbit. Say we deploy the system in three places: Muddy Town, Echo Valley, and Pirate Island. Suppose a customer comes to Muddy Town to deposit money. First we record the transaction in Muddy Town, and then we tell Echo Valley and Pirate Island to record it. As soon as either Echo Valley or Pirate Island replies to Muddy Town that the transaction has been recorded successfully, we can let the customer in Muddy Town put the money into the vault, and then tell Echo Valley and Pirate Island to put it into their vaults too."
Daddy Pig paused, took a sip of water, and went on: "Because we know that two places have successfully recorded the transaction, the deposit stays safe even if one place has a problem. For example, say we know that Muddy Town and Echo Valley have recorded the transaction, while Pirate Island, although it is working, has been slow to reply for some reason. Then Muddy Town suddenly fails and cannot serve customers. We can still provide service, because we know that the latest transaction has been recorded in Echo Valley, so we can get the correct amount of money from Echo Valley. At that point, however, only two places are working, so if a second place also fails, we can no longer provide service. So if we want to tolerate problems in two places while the system still provides service, we need..."
"We need to deploy the service in five places, right, Daddy Pig?" Miss Rabbit cut in.
"Yes, quite right, Miss Rabbit," Daddy Pig praised her heartily.
"Then I think we should start with three places and tolerate one of them failing. Let's set up branches in Echo Valley and Pirate Island as well."
"OK, Miss Rabbit, but I'm a little worried about Pirate Island..."
"It's decided, then, Daddy Pig." Miss Rabbit made the decision without waiting for Daddy Pig to finish.
Well, that was a lot of talk, so let's come back to reality. In the example above we assumed that money can be copied into different vaults, which real banks cannot do. To design a highly available system, we have to solve the single point of failure problem: if that single node fails, the whole system cannot serve. To solve it we deploy the system in multiple places, but that introduces another problem: data consistency.
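The arithmetic behind "three places tolerate one failure, five tolerate two" is short enough to write down. The helper names below are made up for illustration; the only assumption is that a quorum means a strict majority of the nodes.

```python
def quorum(n):
    """Smallest number of nodes that forms a strict majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes may fail while a quorum of the n nodes is still alive."""
    return n - quorum(n)


def nodes_needed(f):
    """Fewest nodes required to keep serving with f nodes down."""
    return 2 * f + 1


for n in (2, 3, 5):
    print(n, "nodes: quorum", quorum(n), "tolerates", tolerated_failures(n))
```

With two nodes the quorum is also two, so no failure can be tolerated, which is the problem Daddy Pig pointed out with running only Muddy Town and Echo Valley.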
CAP
Here is a brief word about CAP, which stands for consistency, availability, and partition tolerance. In a distributed system, partitions (P) cannot be avoided, so the only choice is between C and A. Availability can usually be achieved through HA (high availability) mechanisms, so for systems that require complete data safety we will definitely choose C. To guarantee C, when we write data we must make sure that at least a quorum of nodes has written it successfully before we consider the write successful. In Raft, once a log entry has been successfully received by a quorum of nodes, we consider that entry committed.
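One way to picture "committed once a quorum has it" is to look at the highest log index each node is known to have stored and pick the largest index that a majority has reached. The sketch below is a simplification under that assumption: `commit_index` and the node names are illustrative, and real Raft adds a further condition about the leader's current term that is left out here.

```python
def commit_index(match_index):
    """Largest log index stored on a quorum of nodes.

    match_index maps each node (the leader included) to the highest
    log index known to be stored on that node.
    """
    quorum = len(match_index) // 2 + 1
    ranked = sorted(match_index.values(), reverse=True)
    # The quorum-th highest value is stored on at least `quorum` nodes.
    return ranked[quorum - 1]


# Pirate Island lags behind, but Muddy Town and Echo Valley both hold
# entry 11, so entry 11 counts as committed.
print(commit_index({"muddy_town": 11, "echo_valley": 11, "pirate_island": 9}))
```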
Usually, the C we are talking about is linearizability: if I write a value at some point in time, then any read after that point returns the newest value, not an old one. Once data has been written to a quorum of nodes, a read is guaranteed to see the most recent value as long as the read also goes to a quorum of nodes. This is the approach taken by Amazon's Dynamo, but it puts the burden of guaranteeing linearizability on the client that reads the data. Raft takes a different, simpler approach, which we will come to later.
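The reason a quorum read sees a quorum write is that any two majorities of the same cluster share at least one node. The following is a rough sketch of that idea only, not Dynamo's actual protocol: the version numbers, the `quorum_read` helper, and the replica contents are all assumptions made for the example.

```python
from itertools import combinations


def quorum_read(replicas, nodes):
    """Return the value with the highest version among the nodes contacted.

    replicas maps a node name to a (version, value) pair.
    """
    version, value = max((replicas[n] for n in nodes), key=lambda pair: pair[0])
    return value


# Version 2 (balance 200) reached a write quorum of two nodes;
# Pirate Island still holds the older version 1 (balance 100).
replicas = {
    "muddy_town": (2, 200),
    "echo_valley": (2, 200),
    "pirate_island": (1, 100),
}

# Every possible read quorum of two nodes overlaps the write quorum,
# so each of them returns the newest balance.
for nodes in combinations(replicas, 2):
    print(nodes, quorum_read(replicas, nodes))
```

The cost is visible in the code: the reader has to contact several nodes and compare versions itself, which is the client-side burden mentioned above.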
Summary
After all that talk, we have actually covered only a few Raft concepts. To summarize: Raft uses log replication plus a state machine to keep distributed data consistent, which is now common practice. In Raft, the log index must increase monotonically. Once a log entry has been accepted by at least a quorum of nodes, we consider it committed, and it can then be applied; once it has been applied, the entry is marked as applied. Next, we will start discussing Raft's leader.