CAP theory is often treated as a golden rule of distributed system design, yet there are many misconceptions about its three properties. So what is the CAP theorem? It began as a conjecture: in his keynote at the 2000 PODC conference, Eric Brewer argued that when designing a massively scalable network service you want three properties, consistency, availability, and partition tolerance, but cannot achieve all of them at once. Then, in 2002, Gilbert and Lynch of MIT formally proved that the three properties cannot all hold together.
CAP is an abbreviation for Consistency, Availability, and Partition tolerance, which refer to:
1) Consistency: every read operation is guaranteed to return the latest data.
2) Availability: every request to a non-failing node receives a normal response within a reasonable time.
3) Partition tolerance: the system can still provide service when a network partition occurs between nodes; that is, even when the network connection between nodes fails, requests can still be processed.
CAP theory states that a distributed system can satisfy at most two of these three properties, never all three at once. This is actually easy to see, for the following reasons:
1) First, a single machine can only guarantee CP, since the failure of its one node makes the whole service unavailable.
2) With two or more nodes, when a network partition occurs, the nodes in the cluster cannot communicate with each other. At this point, if we preserve data consistency C, at least one node must be marked as unavailable, which violates availability A; only CP can be guaranteed.
3) Conversely, if we preserve availability A, i.e. both sides of the partition continue to process requests, then because the network cannot synchronize the data, the replicas will inevitably become inconsistent; only AP can be guaranteed.
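The trade-off above can be sketched with a toy simulation. This is purely illustrative (not a real database): two replicas receive a write, and under a partition a CP-mode system rejects it while an AP-mode system accepts it locally and lets the replicas diverge.

```python
# Toy illustration of the CP vs. AP choice when a partition
# separates two replicas. Names and modes are hypothetical.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None

def write(replicas, partitioned, value, mode):
    """mode='CP': refuse writes that cannot reach every replica.
       mode='AP': accept the write locally and let replicas diverge."""
    if partitioned:
        if mode == "CP":
            return "rejected: cannot replicate (sacrifice availability)"
        # AP: only the local replica is updated; data now diverges
        replicas[0].value = value
        return "accepted locally (sacrifice consistency)"
    for r in replicas:
        r.value = value
    return "replicated to all"

a, b = Replica("A"), Replica("B")
print(write([a, b], partitioned=False, value=1, mode="CP"))  # replicated to all
print(write([a, b], partitioned=True, value=2, mode="CP"))   # rejected
print(write([a, b], partitioned=True, value=2, mode="AP"))   # accepted locally
print(a.value, b.value)  # 2 1 -> the replicas now disagree
```

Note that in CP mode the data stays consistent at the cost of refusing the write, while in AP mode every request succeeds at the cost of divergent replicas.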
I. Single instance
A stand-alone system obviously guarantees only CP, sacrificing availability A. Single-instance deployments of MySQL, Redis, and MongoDB are in this mode.
In practice, we need a highly available system that can continue to serve even after some of the machines have failed.
II. Multiple replicas
Compared to a single instance, one more node is added to replicate the data.
For read operations, availability increases, because either of the two nodes can serve the read.
For write operations, there are three scenarios depending on the update strategy:
1) Synchronous update: the write returns only after both nodes have been updated successfully. In this case, if a network partition occurs, the write operation is unavailable, sacrificing A.
2) Asynchronous update: the write returns immediately without waiting for the other node to be updated; the data is replicated to it asynchronously. (FastDFS storage nodes work this way: after writing one copy of the data, the result is returned immediately, and a synchronization thread writes the copy to the other nodes in the same group.) This sacrifices C to guarantee A: there is no guarantee that the replica will be updated successfully, and a network failure may leave the data inconsistent.
3) Compromise: the write returns once some subset of the nodes has been updated successfully.
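These three strategies can be viewed as settings of a single knob, the write quorum W: synchronous update is W = N, asynchronous update is W = 1 (return after the local write), and the compromise is 1 < W < N. A minimal sketch, with hypothetical numbers:

```python
# Toy sketch of write quorums: with N replicas, the write returns
# once W of them acknowledge. W = N is synchronous update, W = 1 is
# asynchronous update, and 1 < W < N is the compromise in between.
# (Pairing W with a read quorum R such that W + R > N keeps reads
# consistent, as in Dynamo-style systems.)

N = 3  # total replicas (hypothetical cluster size)

def quorum_write(replicas_up, w):
    """Succeed only if at least w replicas are reachable to acknowledge."""
    acks = min(replicas_up, N)  # we can gather at most N acknowledgements
    return acks >= w

print(quorum_write(replicas_up=3, w=2))  # True: healthy cluster
print(quorum_write(replicas_up=2, w=2))  # True: tolerates one node down
print(quorum_write(replicas_up=1, w=2))  # False: a partition blocks the write
print(quorum_write(replicas_up=1, w=1))  # True: asynchronous mode stays available
```

Lowering W raises availability and lowers consistency; raising it does the opposite, which is exactly the CP/AP dial described above.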
III. Sharding
Compared to a single instance, one more node is added, but to split the data across nodes rather than to replicate it.
Because there is only one copy of each piece of data, consistency is ensured; and since the nodes do not need to communicate with each other, partition tolerance is also satisfied.
However, when any one node goes down, the portion of the data it holds is lost, so system availability is not guaranteed.
In summary, like the single-instance scheme, sharding can only guarantee CP.
So what are the benefits?
1) A node failure affects only part of the service, i.e. service degradation rather than a full outage;
2) because the data is partitioned, the load can be balanced across nodes;
3) as the data volume grows or shrinks, the cluster can be scaled out or in accordingly.
Most database services provide sharding functionality, such as Redis's slots, Cassandra's partitions, and MongoDB's shards.
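Hash sharding can be sketched as follows. This is a simplified illustration, not Redis's actual algorithm (Redis Cluster uses CRC16(key) mod 16384); here an MD5-based hash and hypothetical node names stand in:

```python
# Simplified sketch of hash sharding: map each key to one of a fixed
# number of slots, then assign contiguous slot ranges to nodes.
# (Redis Cluster uses CRC16(key) mod 16384; MD5 is used here only
# for illustration. Node names are hypothetical.)

import hashlib

NUM_SLOTS = 16384
NODES = ["node-a", "node-b", "node-c"]

def slot_for(key):
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SLOTS

def node_for(key):
    # contiguous slot ranges per node, as in Redis Cluster
    slots_per_node = NUM_SLOTS // len(NODES)
    return NODES[min(slot_for(key) // slots_per_node, len(NODES) - 1)]

for k in ["user:1", "user:2", "order:99"]:
    print(k, "-> slot", slot_for(k), "on", node_for(k))
```

Because the mapping is deterministic, any client can route a request to the right node without coordination, which is why sharding alone needs no inter-node communication.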
Sharding solves the problem of large data volume, but we still want the system to be highly available. So how do we sacrifice some consistency to ensure availability?
IV. Cluster
As you can see, this approach combines the two previous methods: the data is both sharded and replicated. By the same analysis as before, different data synchronization strategies give the system different CAP guarantees. In general, database systems expose this as a configurable option, so we can choose different characteristics for different scenarios.
In fact, for most non-financial internet companies, the requirement is not strong consistency, but a guarantee of availability and eventual consistency. This is one of the main reasons NoSQL is popular in internet applications: it leans toward BASE rather than the ACID principles of strongly consistent systems:
- Basically Available: the system remains basically available even when partitions fail, possibly with degraded service;
- Soft state: intermediate states are allowed, i.e. replication may be asynchronous;
- Eventual consistency: the data is allowed to become consistent eventually, rather than being consistent at every moment.
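One common way AP systems reach eventual consistency after a partition heals is a last-write-wins (LWW) merge: each write carries a timestamp, and when replicas exchange state they keep the newest version they see. A minimal sketch (the timestamps and values are made up):

```python
# Minimal last-write-wins merge, one simple convergence strategy
# for eventually consistent replicas. Each value is a
# (timestamp, data) pair; on merge, the newer version wins.

def merge(local, remote):
    """Keep whichever (timestamp, data) pair is newer."""
    return local if local[0] >= remote[0] else remote

replica_a = (10, "written on A during the partition")
replica_b = (12, "written on B during the partition")

# After the partition heals, both replicas exchange state and converge:
converged_a = merge(replica_a, replica_b)
converged_b = merge(replica_b, replica_a)
print(converged_a == converged_b)  # True: the replicas agree again
```

LWW silently discards the older write, which is acceptable for many internet workloads but not for financial ones; that is precisely the strong-consistency vs. BASE trade-off described above.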
V. Summary
Basically, the schemes discussed above already cover most distributed storage systems.
In fact, for large-scale distributed systems, CAP is a firm constraint with only limited room to maneuver. It largely bounds what large-scale computing can achieve, and design approaches that work around the areas CAP governs may be key to the next generation of large-scale system design.
As you can see, each of these schemes must sacrifice one property to preserve another; none achieves 100% of C, A, and P. Which scheme to choose depends on which properties matter most in the given scenario.