How do we solve the annoying data inconsistency problem? -- Achieving data consistency through "consensus"

Source: Internet
Author: User
Contents
    • What is "consensus"? Why does it arise?
    • The Byzantine Generals Problem
    • BFT algorithms
    • CFT algorithms
    • Conclusion

  

I am starting a new series on the key topics in distributed systems. The pace will be relaxed; I plan to publish roughly one article every two weeks.

This is the second article in the series. It follows on from the previous article, "Perhaps the Most Understandable Analysis of 'Data Consistency'".

The previous article may have been too plain-spoken and not sophisticated enough, so it was not very well received. This article will still try to be as easy to understand as possible, in the firm belief that the more people can understand it, the greater its value. That said, the content is somewhat more technical this time.

Having analyzed the data consistency problem, how do we solve the inconsistency caused by failures? This article focuses on "consensus".

01 What is "consensus"? Why does it arise?

The consistency problem is really a "result". Its root cause is data redundancy: without redundancy there would be no consistency problem.

The subsystems in a distributed system can cooperate because they hold redundant copies of the same data, which act as a shared "token". If I do not recognize your token, why should I cooperate with you? So when this "token" changes, you have to inform me, otherwise I will no longer recognize you. The process of agreeing on a change to the "token" is called "consensus". So:

Consistency is the result, and consensus is the process or means by which this result is achieved.

In distributed systems, redundant data appears in more scenarios than this. The larger the system, the less it can tolerate the butterfly effect of a single subsystem failing, so high availability is usually required. If Xiao Ming No. 1 goes down, there are thousands of other Xiao Ming X standing by, ideally providing service 24 hours a day. The essence of high availability is to store multiple copies of the same data and serve requests from all of them. For example, every Xiao Ming X holds a copy of the "Massage Technique White Paper", so whoever leaves, another Xiao Ming X can provide the same massage service. But when this white paper changes, everyone has to be notified, because it is the basis of the service. The data redundancy problem is therefore even more prominent in high-availability clusters.
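To make the white-paper example a little more concrete, here is a minimal Python sketch of how redundant copies drift apart when one holder misses a change notification. The `Replica` class, `broadcast_update`, and the version numbers are hypothetical names invented for illustration; they do not come from any real system.

```python
class Replica:
    """One 'Xiao Ming X' holding its own copy of the white paper (hypothetical model)."""

    def __init__(self, name, version=1):
        self.name = name
        self.version = version  # version of the locally stored copy
        self.online = True      # False simulates a crashed or unreachable replica

    def notify(self, new_version):
        """Apply a change notification; return False if the replica missed it."""
        if not self.online:
            return False
        self.version = new_version
        return True


def broadcast_update(replicas, new_version):
    """Naively notify every replica; return the names of those that missed the update."""
    return [r.name for r in replicas if not r.notify(new_version)]


def is_consistent(replicas):
    """The copies are consistent only if every replica holds the same version."""
    return len({r.version for r in replicas}) == 1


replicas = [Replica(f"xiao-ming-{i}") for i in range(1, 4)]
replicas[2].online = False  # one replica is unreachable while the change is announced
missed = broadcast_update(replicas, new_version=2)
print(missed, is_consistent(replicas))  # ['xiao-ming-3'] False
```

A plain broadcast like this has no way to recover from the missed notification, which is exactly the gap that consensus algorithms are meant to close.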

In fact, if every node in a distributed system could guarantee instantaneous responses and failure-free operation, reaching consensus would be easy. It is like people within earshot of each other: you shout, the sound travels through steady air, and the other person receives the message and responds almost "instantaneously". But as mentioned in the previous article, such a system exists only in the imagination. Responses are often delayed, the network can be interrupted, nodes fail, and malicious nodes may even deliberately try to sabotage the system. This gave rise to the classic "Byzantine Generals Problem" [1].

02 The Byzantine Generals Problem

We generally divide the Byzantine Generals Problem into 2 cases:

    1. Byzantine error. An error caused by a malicious response that forges information.

    2. Non-Byzantine error. An error caused by the absence of a response (for example, a crashed or unreachable node).

The core of the problem is:

How can a change in a distributed network be executed with a consistent result that is recognized by all participating parties, and that is final and cannot be overturned?

For example, how do we make sure every Xiao Ming X receives "Massage Technique White Paper II" and nothing else, and that the original copy is destroyed? This problem has given rise to many "consensus" algorithms. Those that handle "Byzantine errors" are called Byzantine Fault Tolerance (BFT) algorithms, and those that handle "non-Byzantine errors" are called Crash Fault Tolerance (CFT) algorithms. As the 2 names show, the essential job is "fault tolerance". In everyday work the importance of fault tolerance may not feel very strong, since the consequence is usually just a bug or some abnormal data. In aerospace, however, a small error can cause an entire launch to fail, at enormous cost.

If you want to understand the Byzantine Generals Problem in depth, please consult the relevant material; I will not expand on it here. The paper is listed at the end of the article.

"Byzantine errors" are generally not considered in ordinary software development, but handling them is a necessity for blockchain projects. "Non-Byzantine errors", on the other hand, show up in mainstream distributed databases, such as the Raft algorithm used by both TiDB and CockroachDB. In day-to-day coding, understanding the underlying principles of the database is not imperative. But "non-Byzantine errors" are a hurdle we must face as soon as we deal with high availability at the application level.

03 BFT algorithms

BFT algorithms fall into 2 branches: "deterministic" and "probabilistic".

Let's start with "deterministic". It means that once consensus on a result is reached, it is irreversible: the consensus is the final result. The representative algorithm is PBFT (Practical Byzantine Fault Tolerance) [2], which gained a wider reputation after being endorsed by the central bank (in a blockchain-based digital bill trading platform). The principle of the algorithm is shown in the figure:

▲ Figure: PBFT message flow between line C and lines 0 to 3. Image from the Internet; copyright belongs to the original author.

To use a military analogy, line C can be regarded as the "commander-in-chief", line 0 as the "army commander", and lines 1, 2 and 3 as "division commanders". Note that division commander 3 is the one who has defected, that is, the faulty node. The whole process works like this:

    1. "request": The commander-in-chief gives an order to the army commander: "Attack!"

    2. "pre-prepare": The army commander broadcasts the order to the 3 division commanders.

    3. "prepare": Each division commander that receives and accepts the order sends "received" to the army commander and to the other two division commanders.

    4. "commit": Once a division commander has collected "received" messages from 2f division commanders (the army commander does not send "prepare"), it sends "ready to attack at any time" to the army commander and to the other two division commanders. (f is the number of Byzantine nodes that can be tolerated.)

    5. "reply": Once a division commander has collected 2f+1 "ready to attack at any time" messages, it can conclude that the commander-in-chief's order has reached the "ready to attack" state among the relevant division commanders, and it opens fire directly.
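The thresholds in steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration of the counting, assuming the usual PBFT sizing of n = 3f + 1 nodes; the function names are hypothetical, and real PBFT additionally checks signatures, views and sequence numbers.

```python
def max_faulty(n):
    """Largest f such that n >= 3f + 1 (assumed PBFT sizing)."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return (n - 1) // 3


def pbft_quorums(n):
    """Message counts a node waits for in the 'commit' and 'reply' steps above."""
    f = max_faulty(n)
    return {"f": f, "prepare": 2 * f, "commit": 2 * f + 1}


def reached(senders, needed):
    """Count distinct senders, so a node repeating itself cannot fill the quorum."""
    return len(set(senders)) >= needed


quorums = pbft_quorums(4)  # 1 army commander + 3 division commanders
print(quorums)             # {'f': 1, 'prepare': 2, 'commit': 3}
print(reached([0, 1, 2], quorums["commit"]))  # True: three distinct senders
print(reached([1, 1, 2], quorums["commit"]))  # False: only two distinct senders
```

With 4 nodes, one traitor can be tolerated, which matches the figure's setup of one army commander and three division commanders.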

A truly in-depth understanding of PBFT involves much more, which I will not go into here. Interested readers can find the paper at the address listed at the end of the article, or follow the official account and reply "consistency" in the backend to download the papers as a bundle.

Now for "probabilistic". The consensus result of such algorithms is provisional: as time passes or through some form of reinforcement, the probability of the result being overturned becomes smaller and smaller, until it becomes the de facto final result. The representative algorithm is PoW (Proof of Work); Bitcoin, which once reached 20,000 USD per coin, is built on it. The principle can be illustrated with a simple analogy of "cultivating to become an immortal" (the actual algorithm in Bitcoin is more complex than this):

    • You cultivate through your own effort, and once more than half of the existing immortals recognize your cultivation, you are accepted as an immortal.

    • You then become an immortal yourself, and take part in judging whether the next person can become one.

    • If someone tries to get through by bribery, the cost of bribery grows as the group grows. The fewer people who can afford to bribe, the lower the probability of a misjudgment, and the more credible the final result.

The probability of a misjudgment is 0.5 ^ number. If number = 6, the probability is 1.5625%. If number = 10, it is already 0.09765625%, so it falls exponentially.
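The arithmetic above can be checked with a short Python snippet. The function name `misjudge_probability` is made up for this illustration; it simply evaluates the 0.5 ^ number formula from the text.

```python
def misjudge_probability(number):
    """Evaluate the article's formula 0.5 ^ number."""
    return 0.5 ** number


for number in (1, 6, 10):
    # The ".8%" format prints the value as a percentage with 8 decimal places.
    print(f"number = {number:2d}: {misjudge_probability(number):.8%}")
```

Each additional step halves the probability, which is why a handful of extra steps is enough to push it far below 1%.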

It is worth noting that the "deterministic" and "probabilistic" branches tolerate different proportions of non-cooperative nodes: the former tolerates fewer than 1/3 of the nodes, the latter fewer than 1/2.
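For a feel of the "work" in Proof of Work, here is a toy hash puzzle in Python. It is a sketch under stated assumptions, not Bitcoin's actual block format or difficulty rule: the `mine` and `verify` functions, the string payload, and the "leading zero hex digits" difficulty are all simplifications chosen for illustration.

```python
import hashlib


def puzzle_hash(data, nonce):
    """SHA-256 over the payload followed by the nonce (toy encoding, an assumption)."""
    return hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()


def mine(data, difficulty):
    """Search for a nonce whose hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = puzzle_hash(data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(data, nonce, difficulty):
    """Checking a solution costs one hash, while finding it costs many."""
    return puzzle_hash(data, nonce).startswith("0" * difficulty)


nonce, digest = mine("white paper II", difficulty=3)
print(nonce, digest)
print(verify("white paper II", nonce, difficulty=3))
```

The asymmetry is the point: finding the nonce is expensive, checking it is cheap, so rewriting an already accepted result means redoing the expensive part.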

04 CFT algorithms

As mentioned above, CFT algorithms solve the consensus problem in distributed systems where failures occur but no malicious nodes exist (that is, messages may be lost or duplicated, but no forged messages appear). Leslie Lamport, who posed the "Byzantine Generals Problem", also raised a similar "Paxos problem" in another paper [3]. That paper uses a story to model the problem, as follows:

"Legislators" on the Greek island of Paxos vote to pass "laws" in the "parliament hall" and exchange information through notes carried by "messengers". Each "legislator" records the passed "laws" in his own "ledger". The problem is that both "legislators" and "messengers" are unreliable: they may leave the parliament hall at any time, and new "legislators" may enter the hall at any time to vote on "laws".

The question is how to keep this voting process running normally so that the "laws" passed never conflict.

--Baidu Encyclopedia

The key objects here can be mapped onto our systems as follows:

    • Parliament hall = distributed system

    • Legislator = a program

    • Messenger = RPC channel

    • Ledger = database

    • Law = one change operation

Leslie Lamport himself also proposed an algorithm to solve this problem, the "Paxos" algorithm [3]. The key to this algorithm is captured by the following 3 rules:

    • Each "change" has a unique sequence number that can be used to tell old from new.

    • A "legislator" may only accept a "change" that is newer than the ones it already knows.

    • Any two "changes" must have at least one "legislator" in common among their participants.

These 3 points are only the most critical parts of ensuring consistency; there is much more. Interested readers can find the paper at the address listed at the end of the article, or follow the official account and reply "consistency" in the backend to download the papers as a bundle.
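The three key points listed above can be sketched in Python as follows. This is a deliberately simplified illustration and not a complete Paxos implementation: the `Acceptor` class, its method names, and the majority helper are assumptions for this example, and proposers, persistence and retries are left out.

```python
from itertools import combinations


class Acceptor:
    """One participant that only accepts changes newer than what it has seen."""

    def __init__(self):
        self.promised = 0     # highest sequence number promised so far
        self.accepted = None  # (number, value) of the last accepted change

    def prepare(self, number):
        """Phase 1: promise to ignore any change older than `number`."""
        if number > self.promised:
            self.promised = number
            return True, self.accepted
        return False, self.accepted

    def accept(self, number, value):
        """Phase 2: accept only if no newer number has been promised meanwhile."""
        if number >= self.promised:
            self.promised = number
            self.accepted = (number, value)
            return True
        return False


def majority(n):
    """Smallest group size that is more than half of n participants."""
    return n // 2 + 1


def majorities_always_overlap(n):
    """Brute-force check that any two majorities share at least one participant."""
    groups = list(combinations(range(n), majority(n)))
    return all(set(a) & set(b) for a in groups for b in groups)


acceptor = Acceptor()
print(acceptor.prepare(2))                  # (True, None): nothing accepted yet
print(acceptor.accept(1, "old change"))     # False: older than the promise
print(acceptor.accept(2, "white paper II")) # True
print(majorities_always_overlap(5))         # True
```

The overlap check is what ties the rules together: because any two majorities share a member, that member's sequence-number rule prevents two conflicting changes from both being accepted.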

The "Paxos" algorithm is leaderless and complex to implement, so many variants have appeared to simplify it. The best known is probably "Raft" [4], which only appeared in 2013. "Raft" is a leader-based algorithm. The following 2 mechanisms guarantee consensus:

    • There is only ever one live leader, which is responsible for synchronizing data to the followers.

    • If the leader loses contact, each follower can become a candidate. In the end, the candidate with the newest term that wins the vote becomes the new leader. The term is a self-incrementing number maintained inside each node.

Although followers vote on a first-come-first-served basis, candidates in the same term may still receive the same number of votes (the "split vote" problem). In that case a new round of voting is held until a winner is decided. Because Raft uses randomized timers to trigger the term increment, and network delays vary as well, the probability of a tie is greatly reduced.
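The voting rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a full Raft implementation: the `Node` class and helper names are hypothetical, the 150 to 300 ms timeout range is only an illustrative choice, and the log comparison that real Raft performs before granting a vote is omitted.

```python
import random


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0  # self-incrementing term kept by every node
        self.voted_for = None  # candidate this node voted for in current_term

    def request_vote(self, term, candidate_id):
        """Grant at most one vote per term, first come first served."""
        if term < self.current_term:
            return False              # stale candidate
        if term > self.current_term:
            self.current_term = term  # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def election_timeout(rng, low_ms=150, high_ms=300):
    """Randomized timeout so that nodes rarely become candidates simultaneously."""
    return rng.uniform(low_ms, high_ms)


def run_election(nodes, candidate):
    """The candidate bumps its term, votes for itself, and asks everyone else."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id
    votes = 1
    for peer in nodes:
        if peer is not candidate and peer.request_vote(candidate.current_term, candidate.node_id):
            votes += 1
    return votes > len(nodes) // 2


nodes = [Node(i) for i in range(5)]
rng = random.Random(42)
timeouts = {n.node_id: election_timeout(rng) for n in nodes}
first = min(nodes, key=lambda n: timeouts[n.node_id])  # shortest timer fires first
print(first.node_id, run_election(nodes, first))
```

Because every node has already voted in term 1, a second candidate asking for votes in the same term is refused and has to start a new term.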

The complete process is more complicated. There is an animation of the Raft algorithm that I recommend to anyone interested: http://thesecretlivesofdata.com/raft/.

As an aside, the "ZAB" (ZooKeeper Atomic Broadcast) algorithm in ZooKeeper, which we use often, is also a CFT algorithm, implemented on the basis of the Fast Paxos algorithm.

05 Conclusion

Looking back, we find that stricter consistency requires more rounds of mutual confirmation, which hurts performance, as with PBFT and Paxos. But distributed systems are like this: trade-offs are everywhere, and finding the best fit is what matters most.

Having covered "consensus" at the data level, next time we will talk about "distributed transactions", centered on the well-known CAP and BASE theories.

Finally, if you want to become a data consistency expert and wonder whether there is a shortcut: reading the papers of Leslie Lamport, the father of the field, is one. His personal homepage: http://www.lamport.org/.

Reply with the keyword "consistency" in the official account backend to download the papers as a bundle.

[1] "The Byzantine Generals Problem", ACM Transactions on Programming Languages and Systems, Leslie Lamport, Robert Shostak & Marshall Pease, 1982.

Link: www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Byzantine-Generals-Problem.pdf

[2] "Practical Byzantine Fault Tolerance", Miguel Castro & Barbara Liskov, 1999.

Link: http://101.96.10.63/pmg.csail.mit.edu/papers/osdi99.pdf

[3] "The Part-Time Parliament", Leslie Lamport, 1998.

Link: www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Part-Time-Parliament.pdf

[4] "In Search of an Understandable Consensus Algorithm", Diego Ongaro & John Ousterhout, 2013.

Link: raft.github.io/raft.pdf

Zachary (personal ID: ZACHARY-ZF)

Official account (first published): Cross-Border Architect.

Regularly publishes original content: architecture design 丨 distributed systems 丨 products 丨 operations 丨 some deep thinking.
