The CAP principle, also known as the CAP theorem, states that a distributed system cannot provide Consistency, Availability, and Partition tolerance all at the same time.
The CAP principle is a cornerstone of NoSQL databases.
The CAP theory of distributed systems starts by summarizing three properties of a distributed system:
- Consistency (C): whether all replicas of the data in the distributed system hold the same value at the same moment (equivalently, every node accesses the same, most recent copy of the data).
- Availability (A): after some nodes in the cluster fail, the cluster can still respond to client read and write requests (high availability for data updates).
- Partition tolerance (P): in practical terms, a partition amounts to a time limit on communication. If the system cannot make the data consistent within that time limit, a partition has occurred, and the current operation must choose between C and A.
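The idea that a partition is really a missed communication deadline can be sketched in Python. This is a minimal illustration under assumptions, not a real replication protocol: `send_to_replica` fakes the network with `time.sleep`, and `PARTITION_TIMEOUT`, `write`, and the `prefer` switch are hypothetical names. When the replica's acknowledgement misses the deadline, the write has to pick C (refuse) or A (accept locally).

```python
import concurrent.futures
import time

# Hypothetical deadline: if the replica has not acknowledged within this many
# seconds, the writer treats the link as partitioned.
PARTITION_TIMEOUT = 0.1

def send_to_replica(value, network_delay):
    """Simulate shipping a write to a remote replica over a slow link."""
    time.sleep(network_delay)
    return "ack"

def write(value, network_delay, prefer):
    """Replicate synchronously; if the deadline passes, choose C or A.

    prefer="C": refuse the write so the replicas never diverge (CP behaviour).
    prefer="A": accept the write locally and reconcile later (AP behaviour).
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(send_to_replica, value, network_delay)
        try:
            future.result(timeout=PARTITION_TIMEOUT)
            return "committed on both nodes"
        except concurrent.futures.TimeoutError:
            # From the writer's side a slow link and a cut link look the same.
            if prefer == "C":
                return "rejected: partition, staying consistent"
            return "accepted locally: partition, staying available"
```

The timeout is the whole trick: the writer cannot tell "slow" from "disconnected", so any real system has to decide how long to wait before treating silence as a partition.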
Choosing between consistency and availability
CAP theory says that a distributed storage system can achieve at most two of the three properties above. Because today's network hardware inevitably suffers from delay, packet loss, and similar problems, partition tolerance is something we must provide. So we can only trade off between consistency and availability, and no NoSQL system can guarantee all three properties at the same time. For Web 2.0 websites, many of the main features of relational databases are often unnecessary:
- Transactional consistency requirements of the database
Many real-time web systems do not need strict database transactions. Their read-consistency requirements are low, and in some cases their write-consistency requirements are not high either; eventual consistency is acceptable.
- Real-time write and read requirements of the database
With a relational database, a row can certainly be read immediately after it is inserted, but many web applications do not need such strict real-time behaviour. For example, after I post a status update, it is perfectly acceptable for my followers to see it a few seconds, or even more than ten seconds, later.
- Requirements for complex SQL queries, especially multi-table joins
Any web system with large data volumes strongly avoids join queries across multiple large tables, as well as complex analytical report queries. SNS-type websites in particular are designed, from both the requirements and the product-design side, to avoid such queries. Most queries are single-table primary-key lookups or simple single-table conditional paging queries, so the role of SQL is greatly reduced.
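The follower example above (a post that followers see a few seconds later) is ordinary asynchronous replication. A minimal sketch, assuming a single primary, one follower, a fixed lag, and a logical clock in place of real time; `EventuallyConsistentStore` and its methods are hypothetical names:

```python
class EventuallyConsistentStore:
    """Hypothetical primary/follower store with a fixed replication lag.

    Time is a logical clock (integer ticks) so the behaviour is deterministic.
    """

    def __init__(self, lag):
        self.lag = lag        # ticks before a write reaches the follower
        self.primary = {}
        self.follower = {}
        self.pending = []     # (apply_at_tick, key, value) not yet replicated
        self.now = 0

    def write(self, key, value):
        # The write succeeds as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((self.now + self.lag, key, value))

    def tick(self, n=1):
        """Advance the clock and apply every replicated write that is due."""
        self.now += n
        still_pending = []
        for apply_at, key, value in self.pending:
            if apply_at <= self.now:
                self.follower[key] = value
            else:
                still_pending.append((apply_at, key, value))
        self.pending = still_pending

    def read_follower(self, key):
        return self.follower.get(key)
```

With `lag=3`, a post written to `"feed:alice"` is invisible on the follower after two ticks and visible after three: the readers are briefly behind, and nobody is harmed by it.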
Relationship with NoSQL
Traditional relational databases usually offer very broad functionality, from simple key-value queries to complex multi-table joins to transaction support. In contrast, NoSQL systems typically focus on performance and scalability rather than on transactions (transactions are the embodiment of strong consistency) [2].
Traditional SQL database transactions usually provide a strong ACID transaction mechanism. A stands for atomicity: the operations in a transaction are executed atomically, so either all of them take effect or none of them do. C stands for consistency: the data stays in a consistent state throughout the process, with no corrupted or half-applied data. I stands for isolation: two transactions do not interfere with each other, for example by overwriting each other's data. D stands for durability: once a transaction has completed, its data is safely written to persistent storage such as a disk.
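Atomicity and consistency can be seen with Python's built-in `sqlite3` module. The `accounts` table and the `transfer` and `balances` helpers are hypothetical; the `CHECK` constraint makes an overdraft fail halfway through the transaction, and `with conn:` rolls back the half that already ran:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both UPDATEs commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Crediting `dst` before debiting `src` is deliberate: the failure then happens after one UPDATE has already been applied, so the unchanged balances after a rejected transfer show the rollback at work.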
NoSQL systems typically provide atomicity only at the row level: two concurrent operations on data under the same key are executed serially, which ensures that no individual key-value pair is ever corrupted. What does this have to do with CAP?
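Row-level atomicity can be sketched as one lock per key. This is a hypothetical in-memory store (`KeyAtomicStore`, `incr`, and `get` are illustrative names), not how any particular NoSQL product is built: operations on the same key are serialized, different keys proceed independently, and nothing updates two keys atomically.

```python
import threading
from collections import defaultdict

class KeyAtomicStore:
    """Hypothetical key-value store with per-key (row-level) atomicity."""

    def __init__(self):
        self._data = {}
        self._locks = defaultdict(threading.Lock)  # one lock per key, created lazily
        self._locks_guard = threading.Lock()       # protects lazy lock creation

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks[key]

    def incr(self, key, delta=1):
        """Read-modify-write on a single key, serialized by that key's lock."""
        with self._lock_for(key):
            value = self._data.get(key, 0) + delta
            self._data[key] = value
            return value

    def get(self, key):
        with self._lock_for(key):
            return self._data.get(key, 0)
```

Eight threads each calling `incr("counter")` 1000 times end at exactly 8000, because every read-modify-write holds the key's lock. A transfer between two keys, by contrast, needs two separate `incr` calls, and another client can observe the state between them; closing that gap is what a multi-row ACID transaction is for.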
It states that, although it is desirable to have consistency, high availability, and partition tolerance in every system, unfortunately no system can achieve all three at the same time.
In distributed system design, no single design can satisfy all three properties: consistency, availability, and partition tolerance.
Note: do not confuse weak consistency or eventual consistency with the C in CAP (there are many easily confused concepts here).
You can treat weak consistency and eventual consistency as different from the C in CAP: CAP's C requires that, once an update completes, every node sees exactly the same data, which is precisely the guarantee that weak consistency gives up. Eventual consistency, by its nature, runs counter to CAP's C. This shows how absurd it is to claim that a system has all three CAP properties. Perhaps the more common scene in China is that as soon as an engineer steps onto the podium, they turn into a salesperson and abandon even the most basic principles.
Here is an excellent article, cap-twelve-years-later-how-the-rules-have-changed. What it describes as changed is mostly the way of thinking about CAP; the CAP theory itself has not changed.
Why is that?
Let's look at a simple scenario: a DB service deployed in two data centers (Beijing and Guangzhou), with both DB instances serving writes and reads.
1. Assume that a DB update returns success only after it has succeeded in both Beijing and Guangzhou.
When there is no network failure, this satisfies CA. C: any write or update that succeeds and returns to the client leaves every node of the distributed system holding exactly the same data at that moment. A: my reads and writes always succeed. But once a network failure occurs, CA can no longer be guaranteed, because the P condition is not met.
2. Assume that a DB update returns success as soon as it has been written in the local data center, and is then synchronized to the other data center by binlog/oplog replay.
This ensures that both data centers keep serving during a network failure and that reads and writes still succeed, so it satisfies AP. It does not satisfy C: after an update has returned successfully, the DBs in the two data centers can be briefly inconsistent, and during a network failure the inconsistency window can become very long (only eventual consistency is guaranteed).
3. Assume that a DB update returns success only after it has succeeded in both Beijing and Guangzhou, and that a degraded service is provided during a network failure.
Degrading the service, for example by stopping writes and serving reads only, keeps the data consistent while still providing service during a network failure. This satisfies CP, but it cannot satisfy availability.
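The three cases can be compared side by side in a toy model. Everything here is a hypothetical simplification: `TwoSiteDB`, its `policy` strings, and the `partitioned` flag stand in for real replication and failure detection, and `backlog` plays the role of binlog/oplog entries that have not been replayed yet.

```python
class TwoSiteDB:
    """Hypothetical two-site DB (Beijing, Guangzhou) with a pluggable write policy.

    "sync"         -> case 1: success only if both sites apply the write
    "async"        -> case 2: success after the local write; replay later
    "sync_degrade" -> case 3: like "sync", but read-only during a partition
    """

    def __init__(self, policy):
        self.policy = policy
        self.sites = {"beijing": {}, "guangzhou": {}}
        self.partitioned = False
        self.backlog = []  # (target_site, key, value) awaiting replay

    @staticmethod
    def _other(site):
        return "guangzhou" if site == "beijing" else "beijing"

    def write(self, site, key, value):
        other = self._other(site)
        if self.policy == "async":
            self.sites[site][key] = value              # return after the local write
            self.backlog.append((other, key, value))   # ship to the other site later
            return "ok"
        if self.partitioned:                           # both sync policies need the remote site
            if self.policy == "sync_degrade":
                return "rejected: read-only during partition"
            raise ConnectionError("cannot reach " + other)
        self.sites[site][key] = value
        self.sites[other][key] = value
        return "ok"

    def read(self, site, key):
        if self.partitioned and self.policy == "sync":
            raise ConnectionError("service unavailable during partition")
        return self.sites[site].get(key)

    def replay(self):
        """Apply queued writes to the remote site; impossible while partitioned."""
        if self.partitioned:
            return
        for target, key, value in self.backlog:
            self.sites[target][key] = value
        self.backlog.clear()
```

During a partition, case 1 raises `ConnectionError` for every request (no P), case 2 keeps answering but Guangzhou returns stale data until `replay()` runs (AP), and case 3 keeps serving reads while rejecting writes (CP). Cases 1 and 3 differ only in what they do once the link is down.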
Making the trade-off
The example above shows that we can never have all three CAP properties at the same time. So how do we choose?
The key is the business scenario.
For most Internet applications (such as the NetEase portal), the large number of machines and the widely scattered nodes make network failures the norm, and availability must be guaranteed, so the only option is to give up consistency and build AP services. The high-availability services that commonly advertise SLAs of five nines or six nines have essentially all given up C and chosen AP.
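To put those SLA figures in perspective, the yearly downtime budget for N nines is simple arithmetic. A sketch assuming a 365-day year; `allowed_downtime_seconds` is a hypothetical helper name:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year (no leap day)

def allowed_downtime_seconds(nines):
    """Yearly downtime budget for `nines` nines of availability (5 -> 99.999%)."""
    unavailability = 10 ** -nines
    return SECONDS_PER_YEAR * unavailability

for n in (3, 4, 5, 6):
    print(f"{n} nines: {allowed_downtime_seconds(n):.1f} s of downtime per year")
```

Five nines leaves about 315 seconds (roughly 5.3 minutes) of downtime per year, and six nines about 31.5 seconds. Budgets this small leave little room for refusing service while a partition lasts, which is why such services lean toward AP.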
For scenarios that require strong consistency, such as banks, the usual choice is between the CA and CP models. Under a network failure a CA model becomes completely unavailable, while a CP model remains partially available, and the actual choice must be weighed against the business scenario (CP is not better than CA in every case: from a product point of view, a service where users can view information but cannot update it is sometimes worse than refusing service outright).
Extension
BASE (Basically Available, Soft state, Eventual consistency) extends the AP side of CAP, and many systems, Redis among them, are built on this theory.
ACID is the common design philosophy of traditional databases. ACID and BASE represent two diametrically opposed design philosophies, sitting at opposite ends of the consistency-availability spectrum.
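One common way to get the "soft state, eventual consistency" part of BASE is a last-writer-wins (LWW) register merged by anti-entropy. This is a swapped-in illustrative technique, not a description of how Redis or any other specific system works; `Replica`, `put`, and `merge_from` are hypothetical names:

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """Hypothetical replica holding last-writer-wins (LWW) registers.

    Each key maps to (timestamp, node_id, value). The larger (timestamp, node_id)
    pair wins, so every replica resolves a conflict the same way.
    """
    node_id: str
    store: dict = field(default_factory=dict)

    def put(self, key, value, timestamp):
        # Accepted locally even when other replicas are unreachable.
        self._apply(key, (timestamp, self.node_id, value))

    def _apply(self, key, entry):
        current = self.store.get(key)
        if current is None or entry[:2] > current[:2]:
            self.store[key] = entry

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[2]

    def merge_from(self, other):
        """Anti-entropy: pull every entry from another replica and keep the winners."""
        for key, entry in other.store.items():
            self._apply(key, entry)
```

Two replicas that accept writes while apart are basically available, disagree for a while (soft state), and agree once they have exchanged state in both directions (eventual consistency). The price is that LWW silently discards the older of two concurrent writes.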