Blockchain Quick Start (II): Core Technologies of Distributed Systems
I. The Consistency Problem of Distributed Systems
1. The consistency problem of distributed systems
As Moore's Law runs into its bottleneck, more and more systems rely on scalable distributed architectures to achieve massive processing capacity. The first problem that arises when a single-point architecture evolves into a distributed one is data consistency: if the nodes of a distributed cluster cannot guarantee consistent processing results, the business systems built on top of it cannot work properly. A blockchain system is a typical distributed system, so the consistency problem must be considered in its design.
When facing large-scale, complex task scenarios, a single-point service can rarely satisfy both scalability and fault-tolerance requirements; multiple servers must instead be combined into a cluster, virtualized into a more powerful and stable "super server."
The larger the cluster, the greater its processing capacity, and the greater the complexity of managing it. Large clusters in operation today include Google's search system, which supports search over the entire Internet with hundreds of thousands of servers.
In general, the nodes of a cluster may be in different states and receive different requests at any given moment, yet the cluster must always appear consistent in the responses it presents to the outside world.
In the field of distributed systems, consistency means that, given a series of operations, multiple service nodes reach a certain degree of agreement on the processing results under the guarantee of an agreed protocol.
Ideally (ignoring node failures), if every service node strictly follows the same processing protocol (i.e., the same state-machine logic), then given the same initial state and input sequence, every node is guaranteed to produce the same result at each step. Therefore, when traditional distributed systems discuss consistency, they usually mean ensuring that, for requests initiated arbitrarily from the outside (e.g., different requests sent to different nodes), most nodes in the system process those requests in the same order; in other words, the requests are globally ordered.
Consistency is concerned with the state the system presents, not with the correctness of the results; for example, if all nodes reject a request, the system is still in a consistent state.
2. The challenges of distributed systems
The following challenges exist in a typical distributed system:
A. Nodes can only interact and synchronize through messages, and network communication is unreliable: messages may be arbitrarily delayed, reordered, or corrupted.
B. The time a node takes to process a request is not guaranteed, the result of processing may be wrong, and the node itself may fail at any time.
C. Synchronous calls can simplify the design and avoid conflicts, but they seriously reduce the scalability of the system and may even degrade it back into a single-point system.
3. Requirements for distributed system consistency
The process of reaching agreement in a distributed system should satisfy the following requirements:
A. Termination
A consistent result is reached within finite time.
B. Agreement
The results finally decided by the different nodes are the same. To guarantee agreement, a distributed system usually imposes a global, unique order on events that occur at different times and in different places, and this order must be recognized by all nodes.
C. Validity
The decided result must be a proposal made by one of the nodes.
4. Constrained consistency
Achieving the ideal of strict consistency is very costly: unless the system never fails and communication between nodes takes no time at all, the whole system is effectively equivalent to a single-point system. In practice, the stronger the consistency requirement, the weaker the processing performance and the worse the scalability tend to be. Depending on actual needs, consistency of different strengths can be chosen, including strong consistency and weak consistency.
Strong consistency consists of two categories:
A. Sequential consistency
Sequential consistency: proposed by Leslie Lamport in the classic 1979 paper "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs". It is a strong constraint guaranteeing that all processes see the same global execution order (total order), and that the order each process sees for its own operations (local order) matches the order in which they actually occurred. For example, if a process executes A and then executes B, the resulting global order should place A before B, not the other way round, and all other processes should see this same order globally. Sequential consistency only constrains the ordering of instructions within each process; it does not impose a global ordering by physical time across processes.
B. Linearizability
Linearizability (linearizable consistency): proposed by Maurice P. Herlihy and Jeannette M. Wing in the classic 1990 paper "Linearizability: A Correctness Condition for Concurrent Objects". On top of sequential consistency, it further requires the operations of different processes to form a single global order consistent with real time (the system then behaves as if it executed operations one at a time, and all processes see all operations in this same sequential order, which matches the actual order of occurrence); it is a strong atomicity guarantee. However, linearizability is difficult to implement: current approaches rely either on a global clock or locks, or on fairly complex algorithms, and performance is usually not high.
Achieving strong consistency often requires accurate timing devices. A high-precision quartz clock has a drift rate of about 10^-7, and the most accurate atomic oscillators have a drift rate of about 10^-13. Google's distributed database Spanner uses the TrueTime scheme, based on atomic clocks and GPS, to keep the time skew between different data centers within 10 ms. Cost aside, the TrueTime approach is simple, crude, but effective.
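As a rough illustration of what these drift rates mean, the short sketch below (not from the original text) converts them into worst-case clock drift per day:

```python
# Worked arithmetic for the drift rates quoted above: how far a clock may
# wander in one day at a given drift rate.
seconds_per_day = 24 * 60 * 60  # 86,400 s

for name, drift_rate in (("quartz clock", 1e-7), ("atomic oscillator", 1e-13)):
    drift_seconds = drift_rate * seconds_per_day
    print(f"{name}: up to {drift_seconds * 1e3:.4f} ms per day")

# quartz clock: up to 8.6400 ms per day
# atomic oscillator: up to 0.0000 ms per day (about 8.6 ns)
```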
Strongly consistent systems are often difficult to build, and many scenarios do not actually need such strong consistency, so the consistency requirement can be relaxed to reduce the difficulty of implementation. For example, under certain constraints a system may achieve so-called eventual consistency: there will eventually be some moment (though not immediately) at which the system reaches a consistent state.
For example, an item placed in a shopping cart on an e-commerce site may turn out to be sold out at final checkout. In fact, most web systems settle for eventual consistency in order to keep the service stable.
Weak consistency, in contrast to strong consistency, relaxes consistency in some respect; eventual consistency is one example.
II. Distributed Consensus Algorithms
1. Introduction to distributed consensus
Consensus is usually discussed together with consistency, but strictly speaking the two terms do not mean exactly the same thing.
Consistency has a broader meaning than consensus, and it means different things in different scenarios (transactional databases, distributed systems, etc.). In the distributed-system context, consistency refers to the state that multiple replicas present to the outside world; for example, sequential consistency and linearizability describe the ability of multiple nodes to maintain a common view of the data state. Consensus refers specifically to the process by which multiple nodes in a distributed system reach agreement on something (such as the execution order of multiple transaction requests). Therefore, reaching some consensus does not by itself guarantee consistency.
In practice, guaranteeing that a system satisfies a given degree of consistency usually requires a consensus algorithm.
A consensus algorithm addresses the process by which most nodes in a distributed system agree on some proposal. The meaning of "proposal" in distributed systems is very broad: the order in which events occur, the value corresponding to a key, and so on; anything that needs to be agreed upon can be a proposal. A distributed system usually models each node as the same deterministic state machine (the state-machine replication problem): starting from the same initial state and receiving the same sequence of instructions, every node is guaranteed to reach the same resulting state. The key to reaching consensus among multiple nodes is therefore to agree on the order of events, that is, on their sequencing.
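A minimal sketch of the state-machine replication idea, using a made-up command set (set/incr): any replica that applies the same log in the same order ends in the same state.

```python
# Toy deterministic state machine: identical initial state + identical
# ordered command log => identical final state on every replica.
def apply_log(initial_state, log):
    state = dict(initial_state)
    for op, key, value in log:  # commands are applied strictly in log order
        if op == "set":
            state[key] = value
        elif op == "incr":
            state[key] = state.get(key, 0) + value
    return state

log = [("set", "x", 1), ("incr", "x", 2), ("set", "y", 5)]
replica_a = apply_log({}, log)
replica_b = apply_log({}, log)
assert replica_a == replica_b == {"x": 3, "y": 5}  # same order -> same state
```

This is why consensus algorithms concentrate on agreeing about the order of entries in the log rather than on the application-level state itself.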
2. The challenge of distributed consensus
Reaching distributed consensus requires solving two basic problems:
A. How to propose a proposal to reach consensus on, for example through token passing, random selection, weight comparison, puzzle solving, and so on.
B. How to get multiple nodes to reach agreement on the proposal (accept or reject it), for example through voting or rule validation; a minimal voting sketch follows this list.
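A minimal sketch of step B, with hypothetical per-node validation rules: the proposal is accepted only when a strict majority of nodes vote for it.

```python
# Majority voting on a single proposal; each "node" is just a validation
# function here, standing in for a real rule check or vote.
def reach_agreement(proposal, validators):
    votes = sum(1 for validate in validators if validate(proposal))
    return votes > len(validators) // 2  # strict majority required

validators = [
    lambda p: p["value"] >= 0,  # node 1 accepts non-negative values
    lambda p: p["value"] >= 0,  # node 2 applies the same rule
    lambda p: False,            # node 3 is faulty and always rejects
]
print(reach_agreement({"value": 7}, validators))  # True: 2 of 3 nodes agree
```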
In a real distributed system, communication between nodes has latency (the physical limit of the speed of light, plus processing delays), and any link may fail (the larger the system, the higher the probability of failure). The communication network may be interrupted, nodes may fail, and nodes may even deliberately forge messages to undermine the normal consensus process.
In general, a failure in which a node crashes or stops responding (crash or fail-stop) but does not forge information is called a "non-Byzantine fault" or "crash fault"; the case in which a node forges information or responds maliciously is called a "Byzantine fault", and the corresponding node is a Byzantine node. Byzantine fault scenarios are harder to reach agreement in because of the presence of such troublemakers.
3. Common distributed consensus algorithms
Depending on whether the scenario must tolerate Byzantine faults, consensus algorithms can be divided into CFT (Crash Fault Tolerance) and BFT (Byzantine Fault Tolerance) algorithms.
For non-Byzantine faults, the classic consensus algorithms include Paxos (1990), Raft (2014), and their variants. CFT algorithms usually perform better and process requests faster, and they tolerate fewer than half of the nodes failing.
For cases where Byzantine faults must be tolerated, the algorithms include deterministic ones represented by PBFT (Practical Byzantine Fault Tolerance, 1999) and probabilistic ones represented by PoW (1997). Once a deterministic algorithm reaches consensus, the result is irreversible, i.e., the consensus is final; the result of a probabilistic algorithm is at first only provisional, and as time passes or the result is further reinforced, the probability of it being overturned becomes smaller and smaller until it becomes a de facto final result. Byzantine fault-tolerant algorithms usually perform worse and tolerate fewer than one third of the nodes being faulty.
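The tolerance bounds just mentioned can be written as simple arithmetic: with n nodes, a CFT algorithm needs n >= 2f + 1 and so tolerates at most floor((n - 1) / 2) crashed nodes, while a BFT algorithm needs n >= 3f + 1 and tolerates at most floor((n - 1) / 3) Byzantine nodes. A small sketch:

```python
# Maximum tolerable faulty nodes for a cluster of size n under the
# standard CFT (n >= 2f + 1) and BFT (n >= 3f + 1) bounds.
def max_crash_faults(n):
    return (n - 1) // 2

def max_byzantine_faults(n):
    return (n - 1) // 3

for n in (4, 7, 10):
    print(f"n={n}: CFT tolerates {max_crash_faults(n)}, BFT tolerates {max_byzantine_faults(n)}")
# n=4: CFT tolerates 1, BFT tolerates 1
# n=7: CFT tolerates 3, BFT tolerates 2
# n=10: CFT tolerates 4, BFT tolerates 3
```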
In addition, recently proposed improvements such as XFT (Cross Fault Tolerance, 2015) can provide response speeds similar to CFT algorithms while providing BFT protection as long as most nodes are working properly.
The Algorand algorithm (2017) improves on PBFT by introducing a verifiable random function to solve the proposal-selection problem; in theory it can achieve better performance (1000+ TPS) while still tolerating Byzantine faults.
In practice, clients usually have to obtain the consensus result themselves, typically by querying a sufficient number of service nodes and comparing their answers to make sure the result is accurate.
III. The FLP Impossibility Principle
1. Introduction to the FLP principle
FLP impossibility principle: in a minimal asynchronous model where the network is reliable but node failures are allowed (even just one), no deterministic consensus algorithm can solve the consistency problem.
The FLP impossibility principle was proposed and proved in 1985 by the three scientists Fischer, Lynch, and Paterson in the paper "Impossibility of Distributed Consensus with One Faulty Process".
The FLP impossibility principle tells us not to waste time trying to design a consensus algorithm for asynchronous distributed systems that works in arbitrary scenarios.
2. Synchronous and asynchronous distributed systems
Synchronous and asynchronous are defined as follows in distributed systems:
Synchronous means that the clock skew between nodes in the system is bounded, that message delivery must complete within a bounded time (otherwise it is considered to have failed), and that the time each node takes to process a message is bounded. A synchronous system can therefore easily determine whether a message has been lost.
Asynchronous means that the clocks of different nodes may diverge widely, that message transmission may take arbitrarily long, and that each node may take an arbitrarily long time to process a message. It is therefore impossible to determine why a message has gone unanswered (node failure or transmission failure).
In real life, distributed systems are usually asynchronous systems.
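The following sketch (illustrative only, using Python threads as stand-ins for remote nodes) shows why an asynchronous caller cannot tell a slow node from a crashed one: both simply fail to answer before the caller's timeout.

```python
import queue
import threading
import time

def rpc_with_timeout(handler, request, timeout):
    """Send a request to a 'remote' handler thread and wait up to `timeout`
    seconds for a reply. A timeout says nothing about *why* no reply came."""
    reply_q = queue.Queue()
    threading.Thread(target=lambda: reply_q.put(handler(request)), daemon=True).start()
    try:
        return reply_q.get(timeout=timeout)
    except queue.Empty:
        return None  # slow node? crashed node? the caller cannot tell

def slow_node(req):
    time.sleep(2)       # long but finite processing delay
    return f"ack:{req}"

def crashed_node(req):
    raise SystemExit    # the node stops without ever replying

print(rpc_with_timeout(slow_node, "ping", timeout=1))     # None
print(rpc_with_timeout(crashed_node, "ping", timeout=1))  # None -- indistinguishable
```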
3. Significance of FLP principle
The FLP impossibility principle actually shows that, for a purely asynchronous system that allows node failures, there is no way to guarantee that consensus completes within bounded time. Even under the assumption of no Byzantine faults, algorithms such as Paxos and Raft have extreme situations in which they cannot reach consensus, although in engineering practice the probability of hitting them is very small.
The FLP impossibility principle does not mean that research on consensus algorithms is pointless. Academic research usually considers mathematically and physically idealized models; the real world is often much more benign, and in engineering practice a failed round of consensus can simply be retried a few times and is very likely to succeed.
IV. The CAP Principle
1. Introduction to the CAP principle
The CAP principle was first presented in 2000 by Professor Eric Brewer of the University of California, Berkeley, at ACM's Symposium on Principles of Distributed Computing (PODC), and was later given a theoretical proof by MIT's Nancy Lynch and other scholars.
The CAP principle is considered one of the important principles in the field of distributed systems, and it has deeply influenced the development of distributed computing and system design.
CAP principle: a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance; in design, the requirement on one of these properties often has to be weakened.
Consistency: any transaction should be atomic, and the state on all replicas is the result of successfully committed transactions, remaining strongly consistent.
Availability: the system (a non-failed node) can complete its response to an operation request within bounded time.
Partition tolerance: the network in the system may suffer partition failures (splitting into multiple subnets, or nodes going online and offline), meaning that communication between nodes is not guaranteed; such network failures should not affect the system's ability to provide normal service.
The CAP principle holds that a distributed system can guarantee at most two of the three properties. When the network may partition, the system cannot guarantee consistency and availability at the same time: either a node refuses to answer a request that has not been acknowledged by the other nodes (sacrificing availability), or it answers with a result that may not be consistent (sacrificing consistency).
Since the network is considered reliable most of the time, the system can usually provide consistent and reliable service; when the network becomes unreliable, the system must sacrifice either consistency (in most scenarios) or availability.
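A toy sketch of that choice during a partition, with an invented Replica class: a CP-style replica refuses the request, while an AP-style replica answers from possibly stale local data.

```python
# Illustrative only: one replica cut off from its peers must choose
# between refusing the read (CP) and serving a possibly stale value (AP).
class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.local_value = "stale-v1"
        self.partitioned = True   # peers are unreachable

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm the latest value")
        return self.local_value   # AP: always reply, possibly with stale data

print(Replica("AP").read())       # "stale-v1" -- availability kept, consistency given up
# Replica("CP").read()            # raises -- consistency kept, availability given up
```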
Network partitions are always possible, and a partition is likely to cause split brain, in which multiple newly elected primary nodes may each try to shut the others down.
2. CAP principle application scenarios
Since the three CAP properties cannot all be guaranteed at the same time, a system design has to weaken its support for one of them. Three design scenarios can be defined based on the CAP principle:
A. Weakening consistency
Applications that are not sensitive to result consistency can tolerate a period after a new version goes online before the update finally takes effect everywhere, with no consistency guarantee during that period. Examples include static page content on websites and databases with weak real-time query requirements; simple distributed synchronization protocols such as Gossip, and databases such as CouchDB and Cassandra, weaken consistency.
B. Weakening availability
Applications that are sensitive to result consistency, such as bank ATMs, will refuse service when the system fails. MongoDB, Redis, MapReduce, and the like weaken availability.
Consensus algorithms such as Paxos and Raft mainly address such consistency-sensitive situations. Paxos-family algorithms may at times be unable to provide an available result, while still tolerating a minority of nodes going offline.
C. Weakening partition tolerance
In reality, the probability of a network partition is small, but it is difficult to rule out entirely.
The two-phase commit algorithm, some relational databases, and ZooKeeper mainly take this kind of design approach.
In practice, network reliability can be enhanced through mechanisms such as dual-channel links to achieve highly stable network communication.
V. The ACID Principle and Multi-Phase Commit
1. Introduction to the ACID principle
ACID is the abbreviation of four properties: Atomicity, Consistency, Isolation, and Durability.
ACID is a well-known principle describing consistency, usually applied in transaction-based systems such as distributed databases.
The ACID principle describes the consistency requirements a distributed database must meet, at the cost of some availability. The four properties are listed below, followed by a short sketch of atomicity.
Atomicity: each transaction is atomic; either all of the operations it contains succeed, or none of them are executed. If any operation fails, the system must roll back to the state before the transaction was executed.
Consistency: the database is in a consistent and complete state both before and after a transaction executes, with no intermediate state; that is, it can only be in the state left by a successfully committed transaction.
Isolation: transactions may execute concurrently, but they must not affect one another. The standard SQL specification defines four isolation levels, from weak to strong: read uncommitted, read committed, repeatable read, and serializable.
Durability: state changes are persistent and will not be lost; once a transaction commits, the state changes it causes are permanent.
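As a small, concrete illustration of atomicity (a sketch using Python's built-in sqlite3 module, not tied to any particular distributed database): either both updates of the transfer commit, or neither does.

```python
import sqlite3

# In-memory database with a CHECK constraint that forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 200 WHERE name = 'alice'")
        conn.execute("UPDATE account SET balance = balance + 200 WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass  # the CHECK constraint fails, so the whole transaction is rolled back

print(dict(conn.execute("SELECT name, balance FROM account")))
# {'alice': 100, 'bob': 0} -- no partial update survived
```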
A principle contrasted with ACID is BASE (Basic Availability, Soft state, Eventual consistency), proposed by eBay technical expert Dan Pritchett. The BASE principle, aimed at large, highly available distributed systems, gives up the pursuit of strong consistency and settles for eventual consistency in exchange for availability.
The ACID and BASE principles are in fact different trade-offs among the three properties of the CAP principle.
Research results on distributed transaction consistency include the well-known two-phase commit algorithm (2PC) and the three-phase commit algorithm (3PC).
2. Two-phase commit algorithm
The two-phase commit algorithm was first proposed by Jim Gray in the 1979 paper "Notes on Database Operating Systems". The basic idea is simple: since directly committing a transaction in a distributed setting can run into all kinds of failures and conflicts, the commit is decomposed into a pre-commit phase and a formal commit phase to avoid the risk of conflict.
Pre-commit: the coordinator (Coordinator) initiates a request to commit a transaction, and each participant (Participant) tries to prepare the commit and reports back whether it can complete it.
Formal commit: if the coordinator receives successful responses from all participants, it issues the formal commit request; if that completes successfully, the algorithm succeeds.
If anything goes wrong at any step in this process (for example, a participant replies in the pre-commit phase that it cannot complete the commit), a rollback is required.
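A minimal in-memory sketch of this flow; the Coordinator and Participant classes are illustrative stand-ins, not the API of any real transaction manager, and timeouts are not modeled.

```python
class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "init"

    def prepare(self, txn):
        """Phase 1: try to get the transaction ready and vote yes/no."""
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self, txn):
        """Phase 2: make the prepared changes permanent."""
        self.state = "committed"

    def rollback(self, txn):
        """Fallback: undo any work done during the prepare phase."""
        self.state = "aborted"

class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run(self, txn):
        votes = [p.prepare(txn) for p in self.participants]  # phase 1: pre-commit
        if all(votes):
            for p in self.participants:                      # phase 2: formal commit
                p.commit(txn)
            return "committed"
        for p in self.participants:                          # any "no" vote forces rollback
            p.rollback(txn)
        return "aborted"

nodes = [Participant("db1"), Participant("db2", will_succeed=False)]
print(Coordinator(nodes).run("txn-42"))  # aborted: one participant voted no
```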
The two-phase commit algorithm is widely used in relational database systems because it is simple and easy to implement. Its drawbacks are that the whole process requires synchronous blocking, which hurts performance, and that it suffers from single-point problems: in bad cases the commit may never complete, and data may even become inconsistent (for example, when the coordinator and a participant both fail in the second phase).
3. Three-phase commit algorithm
The three-phase commit optimizes the two-phase commit for the scenario in which some participants may block during the first phase, by further splitting the pre-commit phase into two steps: a try-pre-commit (inquiry) step and a pre-commit step.
The complete process is as follows:
Try pre-commit: the coordinator asks each participant whether it can commit a transaction. The participant must reply, but it does not need to perform the actual preparation, which prevents some participants from blocking needlessly.
Pre-commit: the coordinator checks the collected replies and, if all are positive, initiates the commit-transaction request; each participant (Participant) then attempts to prepare the commit and reports back whether it can complete it.
Formal commit: if the coordinator receives successful responses from all participants, it issues the formal commit request; if that completes successfully, the algorithm succeeds.
Neither two-phase nor three-phase commit fully solves the problem of commit conflicts; they only alleviate it to some extent and do not guarantee the consistency of the system in all cases. The first truly effective multi-phase commit algorithm is the Paxos algorithm.
VI. Reliability Metrics
1. Introduction to reliability metrics
Reliability (availability) is an important measure of a system's ability to provide service. Highly reliable distributed systems often require a variety of complex mechanisms to guarantee it.
In general, service availability can be measured with a Service Level Agreement (SLA), Service Level Indicators (SLIs), Service Level Objectives (SLOs), and so on. Reference values for how long a service with a given availability may be unavailable each year are as follows:
In general, a single-server system should reach at least two nines; three nines is sufficient for a typical enterprise information system; a system that reaches four nines is already at the leading level (comparable to cloud platforms such as AWS); carrier-grade applications generally require five nines, which allows at most about five minutes of unavailability per year; systems with six nines or more are rare, and achieving them usually comes at a very high price.
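The "nines" quoted above map to allowed downtime per year by simple arithmetic; the sketch below computes the usual reference values.

```python
# Allowed downtime per year for 2 to 6 nines of availability.
SECONDS_PER_YEAR = 365 * 24 * 3600

for nines in range(2, 7):
    availability = 1 - 10 ** (-nines)
    downtime_minutes = SECONDS_PER_YEAR * (1 - availability) / 60
    print(f"{nines} nines ({availability:.4%}): {downtime_minutes:.1f} minutes/year")

# 2 nines: ~5256 min (~3.65 days); 3 nines: ~526 min (~8.8 h);
# 4 nines: ~52.6 min; 5 nines: ~5.3 min; 6 nines: ~0.5 min (~32 s)
```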
2. Two core time metrics
In general, two basic indicators describe how often a system fails and how quickly it recovers from failure: MTBF and MTTR.
MTBF (Mean Time Between Failures) is the expected time for which the system runs without failure.
MTTR (Mean Time To Repair) is the expected time needed for the system to return to normal operation after a failure occurs.
MTBF measures how frequently the system fails: a short MTBF means low availability. MTTR reflects how quickly the system restores service after a failure: if MTTR is too long, the system takes a long time to recover service once it fails.
A highly available system should have as long an MTBF as possible and as short an MTTR as possible.
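Under the usual steady-state assumption, the two indicators combine into availability = MTBF / (MTBF + MTTR), as in this small sketch:

```python
# Steady-state availability from the two indicators above.
def availability(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)

# e.g. a node that fails on average every 1000 hours and takes 1 hour to repair
print(f"{availability(1000, 1):.4%}")  # 99.9001%
```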
3. Improving reliability
There are two ways to improve reliability: make the individual components of the system more reliable, or eliminate single points of failure.
The reliability attainable by relying on a single point is limited. To improve the system's reliability further, single points must be eliminated: letting multiple nodes do the work originally done by one (the distributed approach), through primary-backup or multi-active modes, improves the overall reliability of the service in a probabilistic sense, as the sketch below shows.
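Assuming replicas that fail independently, each with availability a (a simplifying assumption), the combined availability of n replicas is 1 - (1 - a)^n: the service is down only when all replicas are down at once.

```python
# Probabilistic effect of removing the single point with n independent replicas.
def combined_availability(a, n):
    return 1 - (1 - a) ** n

for n in (1, 2, 3):
    print(f"{n} replica(s): {combined_availability(0.99, n):.6f}")
# 1 replica: 0.990000; 2 replicas: 0.999900; 3 replicas: 0.999999
```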