Understanding CAP again

Source: Internet
Author: User
Tags cassandra

The structure of this article is largely borrowed from an InfoQ article and from Wikipedia. The wording mostly reflects my own ideas.

From a standalone RDBMS to a distributed database

Once upon a time, everyone used a single-node database, for example SQL Server, MySQL, or Oracle.

To improve overall performance, we had to scale that single node vertically. This is simple but expensive, and it quickly hits an upper limit.

Later, people came up with a variety of approaches: master-slave replication, sharding, splitting data across multiple databases, and so on.


A choice: available or consistent?


A single-node database has no consistency problem. Once you enter the distributed world, you face a choice.

Take MySQL master-slave replication as an example: when a write reaches the master, should it report success immediately, or only after replication to the slave has completed?

The former preserves availability but gives up strong consistency, although asynchronous replication still provides eventual consistency.

The latter guarantees consistency, but it obviously loses some performance.
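The two acknowledgement strategies can be sketched in a few lines of Python. This is a toy in-memory model, not MySQL's actual replication protocol; the `Master` and `Replica` classes, the `synchronous` flag, and the `flush` method are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical in-memory slave node."""
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        self.data[key] = value


@dataclass
class Master:
    """A hypothetical master that replicates synchronously or asynchronously."""
    replica: Replica
    synchronous: bool
    data: dict = field(default_factory=dict)
    backlog: list = field(default_factory=list)  # writes not yet shipped

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after the replica has applied the write.
            self.replica.apply(key, value)
        else:
            # Acknowledge immediately and ship the write later.
            self.backlog.append((key, value))
        return "ok"

    def flush(self):
        """Ship queued writes; stands in for the asynchronous replication stream."""
        while self.backlog:
            self.replica.apply(*self.backlog.pop(0))


# Asynchronous: the write returns at once, but the replica lags behind.
async_master = Master(replica=Replica(), synchronous=False)
async_master.write("x", 1)
stale_read = async_master.replica.data.get("x")
async_master.flush()
caught_up_read = async_master.replica.data.get("x")

# Synchronous: the replica is up to date as soon as the write returns.
sync_master = Master(replica=Replica(), synchronous=True)
sync_master.write("x", 1)
sync_read = sync_master.replica.data.get("x")
```

In the asynchronous case a read from the replica right after the write sees nothing, and only sees the value once the backlog has been shipped. In the synchronous case the caller pays for the replica round trip on every write.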


CAP theory

CAP theory is all the rage in distributed systems, and everyone talks about it. But do you really understand it?

Let's go through it again.

First, the key definitions:

    • C: Consistency
    • A: Availability
    • P: Partition tolerance


This theorem originates from a conjecture proposed by the computer scientist Eric Brewer of the University of California, Berkeley, at the 2000 Symposium on Principles of Distributed Computing (PODC). [5] In 2002, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology (MIT) published a proof of Brewer's conjecture, making it a theorem.

According to the theorem, a distributed system can satisfy only two of the three properties at once, never all three [4]. The simplest way to understand CAP is to imagine two nodes on opposite sides of a partition. Allowing at least one node to update its state makes the nodes inconsistent, so C is lost. If, to preserve consistency, the node on one side of the partition is made unavailable, A is lost. Only when the two nodes can communicate can both C and A be preserved, which means giving up P.
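The two-node thought experiment can be played out in code. The following is a minimal sketch under assumed names: the `Node` class, the `"CP"`/`"AP"` mode labels, and the `Unavailable` exception are hypothetical and do not model any real database.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.value = 0
        self.peer = None
        self.partitioned = False

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the update: consistency is kept, availability is lost.
                raise Unavailable(f"{self.name} cannot reach its peer")
            # AP: accept the update locally; the peer is now out of date.
            self.value = value
            return
        # No partition: update both sides before returning.
        self.value = value
        self.peer.value = value


def make_pair(mode):
    a, b = Node("a", mode), Node("b", mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b, broken=True):
    a.partitioned = b.partitioned = broken


# AP: the write succeeds, but the two nodes now disagree.
a, b = make_pair("AP")
partition(a, b)
a.write(42)
ap_consistent = (a.value == b.value)

# CP: the write is rejected, so the two nodes still agree.
c, d = make_pair("CP")
partition(c, d)
try:
    c.write(42)
    cp_available = True
except Unavailable:
    cp_available = False
cp_consistent = (c.value == d.value)
```

With the partition in place, the AP pair accepts the write and diverges, while the CP pair stays identical but turns the client away.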

Clarification of the CAP theory


More than ten years after proposing CAP, Dr. Brewer published a clarification: his original "two out of three" formulation was a deliberate oversimplification, meant to stimulate discussion and help designers move beyond ACID. That oversimplification, however, has led to numerous misunderstandings. According to him, the three CAP properties should not be treated as binary (0 or 1) values but as ranges.

Let us analyze first: AP and CP are easy to understand. But what does CA mean? It means partitions cannot happen, which means the system is not distributed. That is clearly outside the scope of this discussion.

So a distributed system is either AP or CP.

When the network is healthy and there is no partition, the choice is not between availability and consistency but between consistency and performance.

When the network is unhealthy and a partition exists, you have to choose between AP and CP.

For example, when the network between nodes goes down:

If you choose consistency, the data stays unavailable until the network recovers and the nodes have synchronized again. That is CP.

If you choose availability, you give up immediate synchronization between the nodes. That is AP.

In the real world, hardly anyone gives up availability. The practical solution is to choose AP and then reach eventual consistency after the network is restored.
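How replicas converge after the network is restored depends on the system. One simple strategy, chosen here purely for illustration and not taken from the source, is last-write-wins: each value carries a timestamp and the newer one survives. The function name `merge_last_write_wins` and the integer timestamps are assumptions of this sketch.

```python
def merge_last_write_wins(left, right):
    """Merge two {key: (timestamp, value)} maps; the newer timestamp wins.

    On equal timestamps the entry from `left` is kept.
    """
    merged = dict(left)
    for key, (ts, value) in right.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# State accumulated independently on each side of the partition.
node_a = {"cart": (1, ["book"]), "name": (5, "Alice")}
node_b = {"cart": (3, ["book", "pen"]), "email": (2, "a@example.com")}

# After the network recovers, the nodes exchange state and merge it.
healed = merge_last_write_wins(node_a, node_b)
node_a = node_b = healed
```

Both nodes end up with the same state, which is what eventual consistency promises. Note that last-write-wins silently discards the older of two conflicting updates, so it only suits data where that loss is acceptable.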


Choices made by several NoSQL systems


As can be seen:

MongoDB, HBase, and Redis choose strong consistency, which inevitably compromises overall performance.

By contrast, Cassandra and DynamoDB choose availability plus eventual consistency. In theory, they should perform better than the former group.

Cassandra, MongoDB, and HBase will be analyzed in detail in follow-up blog posts.

Summary of CAP

From the description above, we can see:

1. Of the three CAP properties, you really can choose only two.
2. In a distributed system, you can only choose CP or AP.
3. In a distributed system whose network is healthy, the choice is between consistency and performance.
