Understanding the CAP theorem again

Last Update:2016-04-29 Source: Internet

Author: User

Tags cassandra

The structure and material of this post are largely borrowed from an InfoQ article and from Wikipedia. The wording mostly reflects my own understanding.

From a standalone RDBMS to a distributed database

Once upon a time, everyone used a single-node database, for example SQL Server, MySQL, or Oracle.

To improve overall performance, we had to scale vertically, that is, increase the capacity of the single node. This is simple but expensive, and it soon hits an upper limit.

Later, people came up with a variety of approaches: master-slave replication, sharding, splitting data across multiple databases, and so on.

A choice (available or consistent?)

The original database was a single node, so there was no consistency problem. Once you enter the distributed world, you face a choice.

Take MySQL master-slave replication as an example: when a write reaches the master node, should it return success immediately, or only after replication to the slave node has completed?

The former favors availability but gives up strong consistency, although asynchronous replication still provides eventual consistency.

The latter guarantees consistency, but it clearly costs some performance, because every write has to wait for the slave.
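The two options can be sketched with a toy in-memory model. This is not how MySQL implements replication; the `Replica` and `Master` classes, the `synchronous` flag, and `replicate_pending` are hypothetical names invented for illustration.

```python
class Replica:
    """A hypothetical in-memory node holding key/value data."""

    def __init__(self):
        self.data = {}


class Master(Replica):
    def __init__(self, slave, synchronous):
        super().__init__()
        self.slave = slave
        self.synchronous = synchronous
        self.pending = []  # writes not yet copied to the slave

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Copy to the slave before acknowledging: consistent but slower.
            self.slave.data[key] = value
        else:
            # Acknowledge at once and replicate later: fast but briefly stale.
            self.pending.append((key, value))
        return "ok"

    def replicate_pending(self):
        """Apply queued writes to the slave (the asynchronous catch-up step)."""
        for key, value in self.pending:
            self.slave.data[key] = value
        self.pending.clear()


# Asynchronous: the slave lags until the catch-up step runs.
async_slave = Replica()
async_master = Master(async_slave, synchronous=False)
async_master.write("x", 1)
stale_read = async_slave.data.get("x")  # slave has not seen the write yet
async_master.replicate_pending()
fresh_read = async_slave.data.get("x")  # slave has caught up

# Synchronous: the slave already has the value when write() returns.
sync_slave = Replica()
sync_master = Master(sync_slave, synchronous=True)
sync_master.write("x", 1)
sync_read = sync_slave.data.get("x")
```

In the asynchronous case a read from the slave between `write` and `replicate_pending` misses the new value, which is the window in which strong consistency is lost.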

CAP theory

CAP theory is all the rage in distributed systems, and everyone talks about it. But do you really understand it?

Let's go through it again.

First, the key definitions, which cannot be skipped:

C: Consistency
A: Availability
P: Partition tolerance

This theorem originates from a conjecture proposed by the computer scientist Eric Brewer of the University of California, Berkeley, at the 2000 Symposium on Principles of Distributed Computing (PODC). [5] In 2002, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology (MIT) published a proof of Brewer's conjecture, making it a theorem.

According to the theorem, a distributed system can satisfy only two of the three properties at the same time, never all three [4]. The simplest way to understand CAP is to imagine two nodes on opposite sides of a partition. Allowing at least one node to update its state makes the data inconsistent, i.e. C is lost. If, to preserve consistency, the node on one side of the partition is made unavailable, A is lost. Only when the two nodes can communicate with each other can both C and A be guaranteed, which means giving up P.
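The two-node thought experiment can be written down as a small simulation. This is a minimal sketch under stated assumptions: the `Node` class, the `"CP"`/`"AP"` mode labels, and the `Unavailable` exception are hypothetical and do not model any real database.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.value = 0
        self.peer = None
        self.partitioned = False

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Cannot reach the peer, so refuse: keeps C, loses A.
                raise Unavailable("partitioned from peer")
            # AP: accept locally and let the peer diverge: keeps A, loses C.
            self.value = value
            return
        # No partition: update both sides, so C and A both hold.
        self.value = value
        self.peer.value = value


def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, flag):
    a.partitioned = b.partitioned = flag


# AP pair: both sides accept writes during the partition and diverge.
ap_a, ap_b = make_pair("AP")
set_partition(ap_a, ap_b, True)
ap_a.write(1)
ap_b.write(2)
ap_diverged = ap_a.value != ap_b.value

# CP pair: the write is rejected, so both sides keep the old value.
cp_a, cp_b = make_pair("CP")
set_partition(cp_a, cp_b, True)
try:
    cp_a.write(1)
    cp_rejected = False
except Unavailable:
    cp_rejected = True
```

With the partition flag off, a write on either node reaches both, which is the case where C and A hold together because the nodes can communicate.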

Clarification of the CAP theory

Ten years after proposing the CAP conjecture, Dr. Brewer published a clarification: his original "two out of three" formulation was a great simplification, intended to spark discussion and to help people move beyond ACID. However, this simplification led to numerous misunderstandings. According to him, the three CAP properties should not be treated as binary (0 or 1) values but as ranges.

Let us analyze this first. AP and CP are easy to understand, but what does CA mean? It means partitions cannot occur, which means the system is not distributed. That is clearly outside the scope of our discussion.

So, since we are talking about a distributed system, it is either AP or CP.

When the network is healthy and there is no partition, the choice is not between availability and consistency but between consistency and performance.

When the network is unhealthy and a partition exists, you have to choose between AP and CP.

For example, when there is a network outage between nodes:

If you choose consistency, the data is unavailable until the network recovers and the nodes are synchronized again. That is CP.

If you choose availability, you give up immediate synchronization between the nodes. That is AP.

In the real world, hardly anyone is willing to give up availability. The practical solution is to choose AP and then reach eventual consistency after the network is restored.
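How replicas converge after the network is restored depends on the system. One common strategy is last-write-wins, sketched below as an assumption rather than as the approach of any specific database. The function name `merge_last_write_wins` and the `(timestamp, value)` layout are hypothetical.

```python
def merge_last_write_wins(left, right):
    """Merge two replica states; each maps key -> (timestamp, value).

    For every key, the entry with the larger timestamp is kept.
    On a tie, the entry from `left` is kept.
    """
    merged = dict(left)
    for key, (ts, value) in right.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two replicas that both accepted writes while partitioned (AP).
replica_a = {"cart": (10, ["book"]), "name": (5, "alice")}
replica_b = {"cart": (12, ["book", "pen"]), "email": (7, "a@example.com")}

# After the network heals, both replicas adopt the merged state.
healed = merge_last_write_wins(replica_a, replica_b)
replica_a = replica_b = healed
```

Last-write-wins is simple but silently discards the older of two conflicting updates, so it only suits data where losing a concurrent write is acceptable.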

Choices made by several NoSQL systems

As we can see:

MongoDB, HBase, and Redis choose strong consistency, which inevitably compromises overall performance.

By contrast, Cassandra and DynamoDB choose availability plus eventual consistency. In theory, their performance should be somewhat better than that of the former group.

Cassandra, MongoDB, and HBase will be analyzed in detail in follow-up blog posts.

Summary of CAP

From the discussion above, we can see:

1. Of the three CAP properties, we really can choose only two.
2. In a distributed system, we can only choose CP or AP.
3. When the network is healthy, a distributed system chooses between consistency and performance.

This article is an English version of an article originally written in Chinese on aliyun.com.