Technical Experts on CAP: Misunderstandings and the Illusion of Choice
The CAP theorem is widely discussed and often misunderstood in the world of distributed systems. It states that any networked system that shares data can guarantee at most two of the following three properties: consistency, availability, and partition tolerance. I will not cover CAP in detail here, because the topic is broad, but the notion of "two of three", although conceptually easy to grasp, is deeply misleading. Brewer himself has pointed this out, and many others have agreed, yet the topic still causes a lot of confusion. The bottom line is that you cannot sacrifice partition tolerance, but CAP turns out to be somewhat more nuanced than that.
On the surface, CAP divides systems into three categories. CA denotes a system that maintains consistency and availability given a perfectly reliable network. CP provides consistency and partition tolerance at the cost of some availability, while AP provides availability and partition tolerance without linearizability. Clearly, CA implies that the system guarantees consistency and availability only when there are no network partitions. However, it is unrealistic to claim that network partitions will never occur. This is the root of much of the confusion.
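To make the CP and AP behaviors concrete, here is a minimal toy sketch of two replicas sharing a single register. Everything in it (the `Replica` class, the `Unavailable` exception, the `make_pair` and `partition` helpers) is hypothetical and invented for illustration; it is not how any real database is implemented. In particular, with only two replicas this toy "CP" mode refuses requests on both sides of a partition, whereas a real system would usually use a larger cluster and some form of quorum.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk inconsistency."""


class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels for this toy only)
        self.value = 0            # local copy of the single shared register
        self.peer = None          # the other replica, set by make_pair()
        self.partitioned = False  # True while the link to the peer is cut

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # CP: give up availability to avoid diverging copies.
                raise Unavailable("cannot reach peer; refusing write")
            # AP: accept the write locally; the peer's copy is now stale.
            self.value = value
            return
        # No partition: update both copies together.
        self.value = value
        self.peer.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot reach peer; refusing read")
        return self.value  # under AP this may be stale during a partition


def make_pair(mode):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b, cut=True):
    a.partitioned = b.partitioned = cut
```

A short usage example under the same assumptions:

```python
a, b = make_pair("AP")
partition(a, b)
a.write(1)
print(a.read(), b.read())  # 1 0  (both answer, but they disagree)

c, d = make_pair("CP")
partition(c, d)
try:
    c.write(1)
except Unavailable as err:
    print("CP replica refused:", err)
```

The AP pair keeps answering but its copies diverge, while the CP pair stays consistent by refusing to answer at all.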
Partitions happen, and they happen for many reasons. Switch failures, NIC failures, link-layer failures, server failures, process failures, and so on may all lead to partitions. Even when nothing fails, other causes such as GC pauses or prolonged latency can produce a partition. We need to accept this as fact before continuing the analysis. It means that a "CA" system is CA only until it is not. Once a partition occurs, all of your assumptions and all of your guarantees break down in a big way. Where does that leave us?
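The point that a GC pause or long latency can act like a partition is easy to see from the observer's side. The sketch below is a hypothetical timeout-based heartbeat detector (the `HeartbeatDetector` class and its methods are invented for this illustration, and timestamps are passed in explicitly so the example is deterministic). From the detector's point of view, a peer that is paused and a peer that is unreachable look exactly the same: in both cases no heartbeat arrives within the timeout.

```python
class HeartbeatDetector:
    """Suspects a node when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def suspected(self, node, now):
        last = self.last_seen.get(node)
        # A node we have never heard from is also treated as suspected.
        return last is None or now - last > self.timeout
```

For example, with a 2-second timeout and a peer that stalls for several seconds:

```python
d = HeartbeatDetector(timeout=2.0)
d.heartbeat("b", now=0.0)
d.heartbeat("b", now=1.0)
# "b" now enters a long GC pause: it is alive but sends nothing.
print(d.suspected("b", now=2.5))  # False: only 1.5 s since the last heartbeat
print(d.suspected("b", now=6.0))  # True: 5.0 s of silence exceeds the timeout
d.heartbeat("b", now=6.5)         # the pause ends and heartbeats resume
print(d.suspected("b", now=7.0))  # False
```

Nothing failed in this scenario, yet for several seconds the rest of the system had to treat "b" as if it were on the other side of a partition.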
At its core, CAP is about trade-offs, but it is an exclusion principle: it tells us what a system cannot do under given conditions. The catch is that not all systems fit neatly into these models. If Jepsen has taught us anything, it is that most systems do not fit these categories, even when their designers say they do. In practice, CAP is not black and white.
Nicolas Liochon recently wrote a very good series of articles on CAP. He explains this obscure and easily misunderstood terminology (much better than I could) and raises some very interesting points. Nicolas suggests that CA should really be seen as a specification of an operating range, while CP and AP are descriptions of behavior. I tend to agree, but my concern is that this sidesteps the trade-off that has to be made.
We know that network partitions are unavoidable. Suppose we specify an application like this: "This application does not handle network partitions. If a partition occurs, the application will be partly unavailable, the data may be corrupted, and you may have to repair the data manually." In other words, we are really asking for CA here, but if a partition occurs the system may behave as CP or, if we are unlucky, be neither available nor consistent.
As an operating range, CA essentially means that when a partition occurs, the system throws up its hands and announces, "I'm broken!" If we specify that the system does not work properly under network partitions, we are saying that partitions are outside its operating range. But what use, here on Earth, is a specification for a spaceship designed to fly in the atmosphere of some distant planet? We live in a world where partitions exist, so we must include them in the operating range. CA does define an operating range, but not one you can write into an SLA and hand to a customer. Put plainly, it amounts to "undefined behavior": the system is consistent and available only until a partition occurs.
CAP is not a perfect concept, but in my view it does highlight the fundamental trade-offs that have to be considered when building a distributed system. Either we have linearizable writes or we do not. If we do, we cannot guarantee availability. In the face of partitions, CAP seems to offer only a binary choice between consistency and availability. In fact, the choice is not binary: a system can be AP, CP, or neither. The problem with "neither" is that it is hard to reason about and even harder to define. In the end, it is more an illusion of choice, because we cannot sacrifice partition tolerance.