The content and structure of this post are largely taken from an article on InfoQ and from Wikipedia; the ideas are essentially theirs.
From a standalone RDBMS to a distributed database
Once upon a time, everyone used a single-node database, such as SQL Server, MySQL, or Oracle.
To improve overall performance, we had to scale that single node vertically (scale up). This is simple, but expensive, and it quickly hits an upper limit.
Later, people came up with a variety of approaches: master-slave replication, sharding, splitting data across multiple databases, and so on.
A choice: availability or consistency?
A database running on a single node has no consistency problem. Once you enter the distributed world, however, you face a choice.
Take MySQL master-slave replication as an example. When a write arrives at the master, should the master report success immediately, or only after the write has been replicated to the slave?
The former favors availability at the cost of strong consistency, although asynchronous replication still provides eventual consistency.
The latter guarantees consistency, but at an obvious cost in performance.
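The toy in-memory sketch below makes the trade-off concrete. It is not how MySQL replication actually works. `Master`, `Replica`, `write`, and `replicate` are hypothetical names, and a "replica" here is just a dictionary. With `synchronous=False`, the master acknowledges the write first and ships it later, so the replica is briefly stale. With `synchronous=True`, it updates the replica before acknowledging.

```python
from collections import deque


class Replica:
    """A toy replica holding a key-value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}


class Master:
    """A toy master that replicates writes synchronously or asynchronously."""

    def __init__(self, replica: Replica, synchronous: bool) -> None:
        self.data: dict[str, str] = {}
        self.replica = replica
        self.synchronous = synchronous
        # Writes already acknowledged to the client but not yet shipped to the replica.
        self.backlog: deque = deque()

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # Consistency first: the replica is updated before we acknowledge.
            self.replica.data[key] = value
        else:
            # Availability/latency first: acknowledge now, replicate later.
            self.backlog.append((key, value))

    def replicate(self) -> None:
        """Drain the backlog; once it is empty the replica has caught up."""
        while self.backlog:
            key, value = self.backlog.popleft()
            self.replica.data[key] = value


async_master = Master(Replica(), synchronous=False)
async_master.write("x", "1")
print(async_master.replica.data.get("x"))  # None: the replica is still stale
async_master.replicate()
print(async_master.replica.data.get("x"))  # 1: the replica has caught up (eventual consistency)
```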
CAP theory
Everyone in distributed systems talks about the CAP theorem. But do you really understand it?
Let's go over it again.
First, the key definitions:
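- C (Consistency): every read returns the most recent write, or an error.
- A (Availability): every request received by a non-failing node gets a (non-error) response.
- P (Partition tolerance): the system keeps operating even if the network drops or delays messages between nodes.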
The theorem began as a conjecture proposed by computer scientist Eric Brewer of the University of California, Berkeley, at the 2000 Symposium on Principles of Distributed Computing (PODC) [5]. In 2002, Seth Gilbert and Nancy Lynch of MIT published a proof of Brewer's conjecture, making it a theorem.
According to the theorem, a distributed system can satisfy at most two of the three properties, never all three [4]. The simplest way to understand this is to imagine two nodes on opposite sides of a partition. If at least one node is allowed to update its state, the data becomes inconsistent, so C is lost. If the node on one side of the partition is made unavailable to keep the data consistent, A is lost. Only if the two nodes can communicate can both C and A be guaranteed, and that means giving up P.
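The two-node thought experiment can be sketched as follows. This is a hypothetical model, not a real system. `TwoNodeSystem`, its `mode` values `"CP"` and `"AP"`, and the `partitioned` flag are illustrative names. During a partition, CP mode refuses the write and loses A. AP mode accepts the write on one side only and loses C.

```python
from typing import Optional


class TwoNodeSystem:
    """Two replicas of a single register (hypothetical model for the CAP thought experiment)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values: list[Optional[int]] = [None, None]  # value held by node 0 and node 1
        self.partitioned = False  # True means the nodes cannot exchange messages

    def write(self, node: int, value: int) -> bool:
        """Write through `node`; return True if the write was accepted."""
        other = 1 - node
        if not self.partitioned:
            # The nodes can talk, so both are updated: C and A both hold.
            self.values[node] = value
            self.values[other] = value
            return True
        if self.mode == "CP":
            # The peer is unreachable: refuse the write to stay consistent (A is lost).
            return False
        # AP: accept locally even though the peer cannot see it (C is lost).
        self.values[node] = value
        return True

    def consistent(self) -> bool:
        return self.values[0] == self.values[1]


for mode in ("CP", "AP"):
    system = TwoNodeSystem(mode)
    system.partitioned = True
    accepted = system.write(0, 42)
    print(mode, "accepted:", accepted, "consistent:", system.consistent())
# CP accepted: False consistent: True
# AP accepted: True consistent: False
```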
Clarification of the CAP theory
More than ten years after proposing CAP, Dr. Brewer published a clarification. He explained that his original "pick two of three" framing was deliberately oversimplified to open a discussion and help people move beyond ACID. However, this oversimplification led to many misunderstandings. According to him, the three CAP properties should not be treated as binary (0 or 1) values but as ranges.
First, AP and CP are easy to understand. But what does CA mean? It means the system cannot tolerate partitions, which in practice means it is not a distributed system. That is clearly outside the scope of this discussion.
So for any distributed system, the real choice is between AP and CP.
When the network is healthy and no partition exists, the choice is not between availability and consistency but between consistency and performance.
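A back-of-the-envelope model shows why, without a partition, the trade-off becomes consistency versus performance. The model assumes the master contacts all replicas in parallel. A synchronous write therefore waits for the slowest acknowledgement, while an asynchronous write waits only for the local commit. The function name and the latency figures are hypothetical, not measurements.

```python
def write_latency_ms(local_ms: float, replica_rtts_ms: list[float], synchronous: bool) -> float:
    """Client-visible write latency under a simple parallel-replication model (assumption).

    A synchronous write returns only after the slowest replica acknowledges;
    an asynchronous write returns right after the local commit.
    """
    if synchronous and replica_rtts_ms:
        return local_ms + max(replica_rtts_ms)
    return local_ms


# Hypothetical numbers: 1 ms local commit, replicas 5 ms and 20 ms away (round trip).
print(write_latency_ms(1.0, [5.0, 20.0], synchronous=True))   # 21.0
print(write_latency_ms(1.0, [5.0, 20.0], synchronous=False))  # 1.0
```

Even with a perfectly healthy network, consistency here costs the round trip to the farthest replica on every write.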
When the network fails and a partition occurs, you must choose between AP and CP.
For example, when the network between nodes goes down:
- If you choose consistency, the data stays unavailable until the network recovers and the nodes have synchronized. This is CP.
- If you choose availability, you give up immediate synchronization between nodes and keep serving requests. This is AP.
In practice, almost no one is willing to give up availability. The usual solution is AP, with eventual consistency reached once the network has recovered.
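Below is a sketch of how an AP system might converge after the network heals. It assumes last-writer-wins (LWW) reconciliation with timestamps that are comparable across nodes. This is only one possible strategy; real systems may instead use vector clocks or application-level merges. `Versioned` and `merge_lww` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int  # assumed comparable across nodes (e.g. a logical clock)


def merge_lww(a: dict[str, Versioned], b: dict[str, Versioned]) -> dict[str, Versioned]:
    """Last-writer-wins merge of two replicas' states; on a timestamp tie, `a` is kept."""
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry.timestamp > merged[key].timestamp:
            merged[key] = entry
    return merged


# During the partition, both sides kept accepting writes (AP).
node_a = {"cart": Versioned("book", timestamp=2)}
node_b = {"cart": Versioned("pen", timestamp=1), "user": Versioned("alice", timestamp=1)}

# When the network heals, both nodes adopt the merged state and converge.
healed = merge_lww(node_a, node_b)
node_a, node_b = dict(healed), dict(healed)
print({k: v.value for k, v in healed.items()})  # {'cart': 'book', 'user': 'alice'}
```

LWW is simple, but it silently discards the older of two concurrent writes (here, the cart containing `pen`). That loss is part of the price of staying available during the partition.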
The choices made by several NoSQL systems
We can see that:
MongoDB, HBase, and Redis choose strong consistency, which necessarily comes at some cost to overall performance.
Cassandra and DynamoDB, by contrast, choose availability plus eventual consistency. In theory, they should perform somewhat better than the systems above.
Cassandra, MongoDB, and HBase will be analyzed in detail in later blog posts.
Summary
From the discussion above, we can see that:
1. Of the three CAP properties, you can really only have two.
2. In a distributed system, the only real options are CP or AP.
3. When the network is healthy, a distributed system chooses between consistency and performance.
Revisiting CAP