Data consistency (consistency), service availability (availability), Partition fault tolerance (partition-tolerance)
Theoretical foundations of distributed systems: CAP
2016-04-04 18:27 by Bangerlee
Introduction
CAP is the most discussed theory in distributed systems, especially in distributed storage; "What is the CAP theorem?" ranks among the top FAQs in the Quora Distributed Systems category. CAP is also widely known among programmers, but there is more to it than "C, A and P cannot all be satisfied at once; pick at most 2 of 3". This article attempts to synthesize views on CAP theory from its development history, engineering practice and other angles. I hope it helps you learn more about CAP theory.
CAP theorem
CAP was proposed by Eric Brewer at the 2000 PODC conference. It is a conjecture about data consistency (consistency), service availability (availability) and partition tolerance (partition-tolerance) that Brewer arrived at while developing a search engine and a distributed web cache at Inktomi [3]:
It is impossible for a web service to provide the following three guarantees: consistency, availability and partition-tolerance.
The conjecture was proved two years later [4] and became known as the CAP theorem:
- Data consistency (consistency): if the system reports a write as successful, subsequent read requests must return the new data; if the write is reported as failed, no read operation may return that data. To the caller, the data is strongly consistent (strong consistency), also known as atomic or linearizable consistency [5]
- Service availability (availability): every read and write request is answered within a bounded time; it terminates rather than waiting forever
- Partition tolerance (partition-tolerance): when the network partitions, the separated nodes can still serve requests normally
At any given moment: if AP is satisfied, the separated nodes serve requests at the same time but cannot communicate with each other, so their state diverges and C cannot be met; if CP is satisfied, then to preserve C during a network partition requests can only wait, so A cannot be met; if CA is satisfied, the nodes must reach a consistent state within a bounded time, which requires that no network partition occurs, so P cannot be met.
Of C, A and P, only two can be satisfied at once. Like the FLP theorem, the CAP theorem states an impossibility result.
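The AP and CP cases above can be sketched with a toy model. This is only an illustration under stated assumptions, not a real replication protocol: the `Cluster` and `Replica` classes, the `mode` switch and the `partitioned` flag are all hypothetical names, and there are just two replicas of a single register.

```python
class Replica:
    """One copy of a single register."""
    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas of one register. `mode` is a hypothetical knob: "AP" or "CP"."""
    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot exchange messages

    def write(self, replica, value):
        peer = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: update both copies before acknowledging.
            replica.value = value
            peer.value = value
            return
        if self.mode == "CP":
            # Keep C: refuse the write rather than let the copies diverge,
            # so this request is not served (A is given up).
            raise TimeoutError("peer unreachable, write refused")
        # Keep A: accept the write locally; the peer is now stale (C is given up).
        replica.value = value

    def read(self, replica):
        return replica.value


ap = Cluster("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)
print(ap.read(ap.a), ap.read(ap.b))  # the two sides now disagree

cp = Cluster("CP")
cp.write(cp.a, 1)
cp.partitioned = True
try:
    cp.write(cp.a, 2)
except TimeoutError as err:
    print("CP write failed:", err)
print(cp.read(cp.a), cp.read(cp.b))  # both sides still agree on the old value
```

In the AP run the two replicas return different values after the partition; in the CP run they stay in agreement, but only because the write was rejected.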
Engineering implications of CAP
Seven or eight years after CAP theory was proposed, the NoSQL community used it as a basis for arguing against traditional relational databases, claiming that relaxing the requirement for data consistency (consistency) was justified [6]. This set off wide discussion of CAP theory.
CAP theory seems to offer a pick-2-of-3 choice, but engineering practice imposes many real constraints, and more consideration and trade-offs are needed to avoid falling into CAP misconceptions [7].
1. Understanding P
Partition literally means network partition: network factors split the system into several isolated parts. One might say that the probability of a network partition is very small, so P need not be considered and guaranteeing CA is enough [8]. To understand P, look back at its definition in the CAP proof [4]:
In order to model partition tolerance, the network would be allowed to lose arbitrarily many messages sent from one node to another.
A network partition fits this definition, and so does network packet loss. When a node goes down, packets that other nodes send to it are also lost, so that case fits the definition too. In reality the network is unreliable and devices fail with some probability; both factors lead to partitions, so for a distributed system P is mandatory rather than optional [9][10].
For distributed systems engineering practice, CAP theory is more accurately stated as: given that partition tolerance must be satisfied, no algorithm can provide both data consistency and service availability [11]:
In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request.
2. C and A are not a 0/1 choice
If P is mandatory, doesn't pick-2-of-3 become pick-1-of-2 between data consistency (consistency) and service availability (availability)? In engineering practice there are different degrees of consistency and different levels of availability. With partition tolerance guaranteed, relaxing the constraints lets a system balance consistency and availability; the two are not an either/or choice [12].
The consistency in the proof of the CAP theorem means strong consistency. Strong consistency requires that the callee, made up of multiple nodes, operate like a single node with atomic operations, and it places requirements on the data in both time and ordering. If these requirements are relaxed, other types of consistency arise:
- Sequential consistency (sequential consistency) [13]: does not require agreement in real-time ordering. If operation A precedes operation B, and after B every caller's read observes the result of A, sequential consistency is satisfied
- Eventual consistency (eventual consistency) [14]: relaxes the time requirement. At some point after the callee has completed the operation and responded, the data on the multiple nodes eventually reaches agreement
Availability, in the CAP theorem, means that all read and write operations must terminate. In real applications, availability means different things from the caller's and the callee's perspectives. When P (a network partition) occurs, callers can be limited to read operations, achieving data consistency by sacrificing part of availability.
In engineering practice, techniques such as asynchronous replication (asynchronous replication) and quorum/NRW are more commonly used, so that the data appears consistent to the caller while the callee side is eventually consistent, and the service appears available to the caller while the callee side tolerates some nodes being unavailable (or cut off by the network) [15].
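The quorum/NRW idea can be sketched as follows: with N replicas, a write waits for W acknowledgements and a read consults R replicas, and when R + W > N every read set overlaps the latest write set. This is a minimal sketch under assumptions, not any particular database's implementation: `QuorumStore` and `Versioned` are hypothetical names, versions come from a single in-process counter, and the `reachable` list stands in for whichever replicas the network currently lets the coordinator contact.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int = 0
    value: object = None


class QuorumStore:
    """N replicas of one key; writes need W acks, reads consult R replicas."""
    def __init__(self, n, w, r):
        self.n, self.w, self.r = n, w, r
        self.replicas = [Versioned() for _ in range(n)]
        self.next_version = 1  # assumption: one coordinator hands out versions

    def write(self, value, reachable):
        """`reachable` lists the replica indices that can be contacted right now."""
        if len(reachable) < self.w:
            raise TimeoutError("write quorum not reachable")
        version = self.next_version
        self.next_version += 1
        # Only W replicas are updated before the write is acknowledged;
        # the rest would catch up later via asynchronous replication.
        for i in reachable[:self.w]:
            self.replicas[i] = Versioned(version, value)

    def read(self, reachable):
        if len(reachable) < self.r:
            raise TimeoutError("read quorum not reachable")
        replies = [self.replicas[i] for i in reachable[:self.r]]
        # Return the reply carrying the highest version.
        return max(replies, key=lambda v: v.version).value


strict = QuorumStore(n=3, w=2, r=2)   # R + W > N
strict.write("v1", reachable=[0, 1])
print(strict.read(reachable=[1, 2]))  # overlaps the write set at replica 1

loose = QuorumStore(n=3, w=1, r=1)    # R + W <= N
loose.write("v1", reachable=[0])
print(loose.read(reachable=[2]))      # may hit a replica that has not caught up
```

With N=3, W=2, R=2 the store keeps answering when any single replica is unreachable, and the caller still reads the latest acknowledged write. With W=1, R=1 requests need fewer replicas, but a read can return stale data until replication catches up.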
3. Beyond CAP
CAP theory is instructive for implementing distributed systems, but it does not cover all the important factors in distributed engineering practice.
One example is latency (latency), an important metric of system availability that bears directly on user experience [16]. Availability in CAP theory requires only that an operation terminate rather than run forever; we also care how long the operation takes to finish, which is latency, and it deserves separate consideration when designing and implementing a distributed system.
Latency and data consistency pull against each other: achieving strong consistency, with multiple replicas of the data kept in agreement, inevitably increases latency. Adding latency to the picture yields a revision of CAP theory, PACELC [17]: if P (a network partition) occurs, how to choose between A (service availability) and C (data consistency); else, how to choose between L (latency) and C (data consistency).
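A rough sketch of the PACELC trade-off follows. Everything here is illustrative: the function names, the millisecond figures and the assumption that replicas are contacted in parallel are not from the cited paper, which only supplies the PA/PC and EL/EC labels.

```python
def write_latency_ms(local_ms, replica_rtts_ms, synchronous):
    """Latency the caller sees for one write.

    Assumes replicas are contacted in parallel, so a synchronous write
    waits for the slowest replica; an asynchronous write returns as soon
    as the local copy is updated and replicates in the background.
    """
    if synchronous:
        return local_ms + max(replica_rtts_ms, default=0)
    return local_ms


def pacelc(partitioned, favor_consistency):
    """Name the PACELC branch taken: PA/PC under partition, else EL/EC."""
    if partitioned:
        return "PC" if favor_consistency else "PA"
    return "EC" if favor_consistency else "EL"


rtts = [5, 40]  # hypothetical round-trip times to two remote replicas
print(pacelc(False, True), write_latency_ms(1, rtts, synchronous=True))
print(pacelc(False, False), write_latency_ms(1, rtts, synchronous=False))
```

Under these assumptions the consistent (EC) write pays for the slowest replica's round trip, while the low-latency (EL) write returns after the local update and leaves the replicas briefly out of date.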
Summary
This article has introduced the origin and development of CAP theory and what it implies for distributed systems engineering practice.
CAP theory has a significant influence on the implementation of distributed systems; we can choose between data consistency and service availability according to the characteristics of our business. By relaxing the constraints, we can even satisfy CAP at different points in time (the CAP here is not the CAP of the CAP theorem; for example, C is replaced by eventual consistency) [18][19][20].
A great many articles discuss and study CAP theory; hopefully this one helps you get to know and understand it.
[1] Harvest, Yield, and Scalable Tolerant Systems, Armando Fox, Eric Brewer, 1999
[2] Towards Robust Distributed Systems, Eric Brewer, 2000
[3] Inktomi's Wild Ride - A Personal View of the Internet Bubble, Eric Brewer, 2004
[4] Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, Seth Gilbert, Nancy Lynch, 2002
[5] Linearizability: A Correctness Condition for Concurrent Objects, Maurice P. Herlihy, Jeannette M. Wing, 1990
[6] Brewer's CAP Theorem - The Kool Aid Amazon and Ebay Have Been Drinking, Julian Browne, 2009
[7] CAP Theorem between Claims and Misunderstandings: What is to be Sacrificed?, Balla Wade Diack, Samba Ndiaye, Yahya Slimani, 2013
[8] Errors in Database Systems, Eventual Consistency, and the CAP Theorem, Michael Stonebraker, 2010
[9] CAP Confusion: Problems with 'partition tolerance', Henry Robinson, 2010
[10] You Can't Sacrifice Partition Tolerance, Coda Hale, 2010
[11] Perspectives on the CAP Theorem, Seth Gilbert, Nancy Lynch, 2012
[12] CAP Twelve Years Later: How the "Rules" Have Changed, Eric Brewer, 2012
[13] How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, Leslie Lamport, 1979
[14] Eventual Consistent Databases: State of the Art, Mawahib Elbushra, Jan Lindström, 2014
[15] Eventually Consistent, Werner Vogels, 2008
[16] Speed Matters for Google Web Search, Jake Brutlag, 2009
[17] Consistency Tradeoffs in Modern Distributed Database System Design, Daniel J. Abadi, 2012
[18] A CAP Solution (Proving Brewer Wrong), Guy's blog, 2008
[19] How to Beat the CAP Theorem, nathanmarz, 2011
[20] The CAP FAQ, Henry Robinson