Some thoughts on CAP theory and the consistency and availability of MongoDB

Source: Internet
Author: User
Tags: failover, mongodb driver, mongodb sharding

About five or six years ago I heard of NoSQL for the first time, and it was already a hot topic. At the time I was using MySQL; NoSQL was still something new to me that I had never really used and knew little about. What stuck in my mind was a picture like the one below (I later found it again via Google, from here):

[Figure: diagram relating traditional relational databases and various NoSQL systems to the CAP properties]

This picture is about the relationship between databases (both traditional relational databases and NoSQL) and CAP theory. Because I lacked practical experience with NoSQL and had no real insight into it, my grasp of CAP theory was superficial, so it was never clear to me why a particular database was assigned to a particular camp.

After starting work I used MongoDB quite a lot and came to understand it to some degree. When I saw this picture again a while ago, I wanted to know: does MongoDB really belong to the CP camp, and why? The doubt arises because the classic (officially recommended) deployment architecture of MongoDB uses replica sets, and a replica set provides high availability through redundancy and automatic failover. So why is MongoDB said to sacrifice availability? I searched for "CAP" in MongoDB's official documentation and found nothing. So I wanted to work the question out and give myself an answer.

This article first clarifies what CAP theory is and surveys some articles about it, and then discusses the tradeoff between consistency and availability in MongoDB.

This article's address: http://www.cnblogs.com/xybaby/p/6871764.html

CAP theory

As for CAP theory, I only knew what the three letters stood for; the explanations I had seen came from various online articles and are not necessarily accurate. So the first thing to do is to find the origin and the exact statement of the theory. I think the best starting point is Wikipedia: it gives a fairly accurate introduction and, more importantly, many useful links, such as where CAP theory originated and how it evolved.

CAP theory states that for a distributed data store, it is impossible to simultaneously provide more than two of the following three guarantees: consistency (C), availability (A), and partition tolerance (P).

Consistency means that every read operation receives the most recently written data or an error.

Availability means that every request receives a timely, non-error response, but without the guarantee that the response contains the most recently written data.

Partition tolerance means that even when there are network problems between nodes and some messages are dropped or delayed, the system as a whole continues to provide service (either consistency or availability).

Consistency and availability are very broadly used terms whose precise meaning differs from one context to another. For example, in "CAP twelve years later: how the rules have changed" Brewer notes that "consistency in CAP is not the same as consistency in ACID". So, unless stated otherwise, consistency and availability in this article refer to the definitions in CAP theory. Discussion is only meaningful when everyone is talking in the same context.

For a distributed system, network partitions are unavoidable, and there is inevitably some delay in replicating data between nodes. If consistency is required (every read must return the most recently written data), then the system is bound to be unavailable (unreadable) for some period of time, that is, availability is sacrificed, and vice versa.
According to Wikipedia, the CAP idea dates back to 1998, and Brewer presented the CAP conjecture at PODC (Symposium on Principles of Distributed Computing) in 2000 [3][4]. In 2002, two other scientists, Seth Gilbert and Nancy Lynch, proved Brewer's conjecture, turning it from a conjecture into a theorem. CAP theory originated in "Towards Robust Distributed Systems", in which Brewer, the author of CAP theory, pointed out that in a distributed system computation is relatively easy; the really hard part is maintaining state. So for distributed storage, or data-sharing systems, guaranteeing data consistency is also harder. Traditional relational databases prioritize consistency over availability, which is why transactions have the ACID properties. Many distributed storage systems instead prioritize availability over consistency, and consistency is guaranteed through BASE (Basically Available, Soft state, Eventual consistency). The following diagram contrasts ACID and BASE:

[Figure: comparison of ACID and BASE; its last line reads "but I think it's a spectrum"]

In short: BASE ensures the availability of the service as much as possible through eventual consistency. Note the last sentence in the figure, "but I think it's a spectrum": ACID and BASE are a matter of degree, not two opposing extremes.

In 2002, in "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", the two authors proved the CAP conjecture using an asynchronous network model, upgrading Brewer's conjecture into a theorem. To tell the truth, though, I did not fully understand that paper.

In the 2009 article "brewers-cap-theorem", the author gives a relatively simple proof:

[Figure: two nodes N1 and N2 both holding value V (currently V0); N1 runs write algorithm A, N2 runs read algorithm B, and writes are replicated from N1 to N2 via message M]

As shown, the two nodes N1 and N2 store the same data V, whose current state is V0. A safe, reliable write algorithm A runs on node N1, and an equally reliable read algorithm B runs on node N2; in other words, N1 is responsible for writes and N2 for reads. Data written to N1 is automatically synchronized to N2, and the synchronization message is called M. If a partition occurs between N1 and N2, there is no guarantee that message M reaches N2 within a bounded time.

Looked at from the transaction's point of view:

[Figure: transaction α consists of operation α1 (a write on N1) followed by operation α2 (a read on N2)]

Transaction α consists of operations α1 and α2, where α1 writes data and α2 reads it. On a single node it is easy to guarantee that α2 can read the data written by α1. In the distributed case, there is no guarantee that α2 reads the data written by α1 unless you can control when α2 executes, and any such control (blocking, centralizing the data, and so on) either destroys partition tolerance or sacrifices availability.
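
To make the dilemma concrete, here is a toy Python sketch of the scenario above (not MongoDB, just two in-memory "nodes"; all names are invented for illustration): once the replication message M is lost to a partition, the read side must either answer with possibly stale data (giving up consistency) or refuse to answer (giving up availability).

```python
class Node:
    """A toy storage node holding a single value."""
    def __init__(self, value):
        self.value = value

n1, n2 = Node("v0"), Node("v0")  # both replicas start at v0
partitioned = True               # the link carrying message M is down

def write(value):
    """alpha1: the write is handled by N1; replication message M may be lost."""
    n1.value = value
    if not partitioned:
        n2.value = value         # message M delivered to N2

def read(require_consistency):
    """alpha2: the read is handled by N2."""
    if require_consistency and partitioned:
        # N2 cannot prove it is up to date: refuse to answer (give up availability).
        raise RuntimeError("unavailable during partition")
    return n2.value              # answer anyway (possibly give up consistency)

write("v1")
print(read(require_consistency=False))    # prints "v0": available but stale
try:
    print(read(require_consistency=True))
except RuntimeError as err:
    print("error:", err)                  # consistent but not available
```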

In addition, the article points out that in many cases availability matters more than consistency: for companies such as Facebook and Google, even brief unavailability means huge losses.

A 2010 article, "brewers-cap-theorem-on-distributed-systems", uses three examples to illustrate CAP. Example 1: a single MySQL instance. Example 2: two MySQL instances, each storing a different subset of the data (similar to sharding). Example 3: two MySQL instances, where an insert on A is considered complete only after it also executes successfully on B (similar to a replica set). The author argues that strong consistency can be guaranteed in both Example 1 and Example 2, but availability cannot; in the Example 3 case, because partitions can occur, there has to be a tradeoff between consistency and availability.

In my opinion, CAP theory is best discussed under the premise of a distributed storage system, and availability means not the availability of the service as a whole but the availability of individual nodes in the distributed system. So the examples above do not feel quite right to me.

CAP theory developed further in 2012, when Brewer, its author, revisited it in "CAP twelve years later: how the 'rules' have changed". The article is fairly long, but it is clear-minded and insightful, and very much worth reading first. There is also a Chinese translation ("CAP theory twelve years later: the rules have changed"), and the translation is good.

The article's main point is that CAP theory does not mean the three properties are a permanent two-out-of-three choice. First, although partitions can occur as long as the system is distributed, the probability of a partition actually occurring is very small (otherwise the network or hardware would need to be improved), so CAP allows perfect C and A most of the time; only during the period in which a partition exists does one need to trade off between C and A. Second, consistency and availability are matters of degree rather than 0 or 1: availability can vary continuously between 0% and 100%, and consistency has many levels (for example, in Cassandra you can set the consistency level). Therefore, the goal of contemporary CAP practice should be to maximize the consistency and availability that a specific application can achieve within reasonable bounds. The article also points out that a partition is a relative concept: when a predefined communication time limit is exceeded, that is, when the system cannot reach data consistency within that time limit, a partition is considered to be occurring, and the current operation must choose between C and A.

In terms of revenue targets and contractual requirements, system availability is the primary goal, so we routinely optimize availability with caches or after-the-fact update logs. Therefore, when designers choose availability, they need to restore, after the partition ends, whatever invariants were violated during it. In practice most teams assume that there are no partitions inside a single data center (a single site), so within a single data center CA can be chosen; this was the default design assumption before CAP theory, including for traditional databases. During a partition, an independent, self-consistent subset of nodes can continue to operate, but there is no guarantee that global invariants are not violated. Data sharding is an example: the designer partitions the data across nodes in advance, and during a partition each individual data shard can mostly keep operating. Conversely, if the state being partitioned is closely interrelated, or if some global invariants must be preserved, then in the best case only one side of the partition can operate, and in the worst case no operations can proceed at all.

The underlined part of the excerpt above is very similar to MongoDB's sharding situation: in MongoDB's sharded cluster mode, the shards normally have no need to communicate with each other.

In a 2013 article, "better-explaining-cap-theorem", the author points out that "it is really just A vs C!", because:

(1) Availability is typically achieved by replicating data across different machines;

(2) Consistency requires updating several of those nodes simultaneously before reads are allowed;

(3) Temporary partitions, i.e. communication delays between nodes, are bound to happen, so a tradeoff between A and C is required; but that tradeoff only needs to be considered while a partition is actually occurring.

In a distributed system, network partitions are bound to occur, which is why "it is really just A vs C!"

MongoDB and CAP

In the earlier article "Understanding MongoDB by creating a sharded cluster step by step" [9], I introduced MongoDB's features: high performance, high availability, and scalability (horizontal scaling), where MongoDB's high availability relies on the replica set's replication and automatic failover.

MongoDB can be used in three deployment modes: standalone, replica set, and sharded cluster; the previous article described the sharded cluster setup in detail. Standalone is a single mongod that the application connects to directly; in this case there is no partition tolerance to speak of, and consistency is necessarily strong. In a sharded cluster, each shard is also recommended to be a replica set. Shards in MongoDB maintain independent subsets of the data, so shards barely affect each other (chunk migration may have some impact), and the main thing to consider is the effect of partitions inside a shard's replica set. Therefore this article's discussion of MongoDB's consistency and availability is likewise about the replica set.

In a replica set there is only one primary node, which accepts both write and read requests, while the other, secondary, nodes accept only read requests. This is a single-write, multiple-read situation, which is much simpler than multiple-write, multiple-read. For the discussion that follows, assume the replica set consists of three nodes: one primary and two secondaries, all of which persist data (data-bearing).

MongoDB's tradeoff between consistency and availability depends on three things: write concern, read concern, and read preference. The discussion below mainly covers MongoDB 3.2, since read concern was introduced in version 3.2.

Write concern:

Write concern indicates when MongoDB acknowledges a write operation to the client. It includes the following three fields:

{ w: <value>, j: <boolean>, wtimeout: <number> }

w: the write is acknowledged to the client only after it has been processed by the specified number of MongoDB instances. Possible values:

1: the default; the write is acknowledged after it has been applied on a standalone mongod or on the primary of a replica set.

0: return to the client immediately without waiting for the write; high performance, but data may be lost. It can, however, be combined with j: true to improve durability.

>1: only meaningful in a replica set environment; if the value is greater than the number of nodes in the replica set, the operation may block.

'majority': the write is acknowledged after the data has been applied on a majority of the replica set's nodes, in which case it is generally used together with read concern:
After the write operation returns with a w: "majority" acknowledgement to the client, the client can read the result of that write with a "majority" readConcern.
j: the write is acknowledged to the client only after it has been recorded in the journal; the default is false. Two notes: (1) using j: true against a MongoDB instance that does not have journaling enabled causes an error; (2) in MongoDB 3.2 and later, for w > 1, all the counted instances must have written the journal before the write is acknowledged.

wtimeout: a timeout for the write, i.e. if the write has not been acknowledged to the client within the specified number of milliseconds (meaningful when w is greater than 1), an error is returned. The default is 0, which is equivalent to not setting the option.

MongoDB 3.4 adds the writeConcernMajorityJournalDefault option, which changes how different combinations of w and j behave.
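
As an illustration, here is how these write concern fields might be set from a driver. This is a minimal pymongo sketch, assuming a three-node replica set named rs0 on localhost; the hosts, database, and collection names are made up for the example, not taken from the article.

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

# Assumed three-node replica set; adjust hosts/replicaSet to your deployment.
client = MongoClient("mongodb://localhost:27017,localhost:27018,localhost:27019/"
                     "?replicaSet=rs0")

# w="majority", j=True, wtimeout in milliseconds -- the three fields described above.
coll = client["test"].get_collection(
    "orders",
    write_concern=WriteConcern(w="majority", j=True, wtimeout=5000),
)

coll.insert_one({"item": "abc", "qty": 1})  # acknowledged only after a majority applies it
```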

Read preference:

As explained in the previous article, a replica set consists of one primary and several secondaries. The primary accepts the write operations, so its data is necessarily the most recent; the secondaries synchronize those writes through the oplog, so their data lags behind by some amount. For queries that are not very sensitive to freshness, reading from the secondary nodes can reduce the load on the cluster.

MongoDB points out that choosing different read preferences for different scenarios gives great flexibility. MongoDB drivers support the following read preference modes:

primary: the default mode; all read operations are routed to the replica set's primary node.

primaryPreferred: normally routed to the primary node; only when the primary is unavailable (e.g. during failover) are reads routed to a secondary node.

secondary: all read operations are routed to the replica set's secondary nodes.

secondaryPreferred: normally routed to a secondary node; only when no secondary is available are reads routed to the primary node.

nearest: reads from the member with the lowest network latency, whether primary or secondary. For distributed applications with MongoDB deployed across multiple data centers, nearest gives the best data locality.

If you use secondary or secondaryPreferred, be aware of the following (a driver-level sketch follows these notes):

(1) Because of replication lag, the data read may not be the most recent, and different secondaries may return different data;

(2) For sharded collections, since the balancer is enabled by default, a secondary may return missing or duplicate documents because of a chunk migration that has not finished or has terminated abnormally;

(3) When there are multiple secondary nodes, which one is chosen? In short, the "closest" one, i.e. the node with the lowest average latency; see the Server Selection algorithm for details.
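
As a concrete illustration, here is a minimal pymongo sketch of setting a read preference on a collection handle. The connection string, database, and collection names are assumptions for the example.

```python
from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference

# Assumed three-node replica set named rs0; adjust to your deployment.
client = MongoClient("mongodb://localhost:27017,localhost:27018,localhost:27019/"
                     "?replicaSet=rs0")

# Route reads to a secondary when one is available, falling back to the primary.
coll = client["test"].get_collection(
    "orders",
    read_preference=ReadPreference.SECONDARY_PREFERRED,
)

doc = coll.find_one({"item": "abc"})  # may be slightly stale, see note (1) above
```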

Read concern:

Read concern is a new feature added in MongoDB 3.2. It determines what data is returned when reading from a replica set, including from shards backed by replica sets in a sharded cluster. Support for read concern differs between storage engines.

Read concern has the following three levels:

local: the default value; returns the current node's most recent data. Which node that is depends on the read preference.

majority: returns the most recent data that has been confirmed as written to a majority of the nodes. Using this option requires the WiredTiger storage engine, replica set election protocol version 1, and starting the MongoDB instance with the --enableMajorityReadConcern option.

linearizable: introduced in version 3.4; it is skipped here, and interested readers can refer to the documentation.

The article contains this sentence:

Regardless of the read concern level, the most recent data in a node may not reflect the most recent version of the data in the system.

That is, even with read concern majority, what is returned is not necessarily the latest data; this is not the same thing as NWR (quorum) theory. The root cause is that the value finally returned comes from only one MongoDB node, and the choice of that node depends on the read preference.

There is an article that describes in detail why readConcern was introduced and how it is implemented; only the core point is quoted here:

The original intention of readConcern is to solve the "dirty read" problem: for example, a user reads a piece of data from the MongoDB primary, but that data has not yet been synchronized to a majority of nodes; the primary then fails, and after recovery the primary rolls back the data that was never synchronized to a majority of nodes, so the user has read "dirty data".

When the readConcern level is specified as majority, it is guaranteed that the data the user reads "has been written to a majority of nodes"; such data will certainly not be rolled back, which avoids dirty reads.
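
For illustration, here is a minimal pymongo sketch of a majority read along the lines described above; the connection details, database, and collection names are assumptions for the example.

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

client = MongoClient("mongodb://localhost:27017,localhost:27018,localhost:27019/"
                     "?replicaSet=rs0")

# Only return data acknowledged by a majority of replica set members;
# such data cannot be rolled back after a failover, so no "dirty reads".
coll = client["test"].get_collection(
    "orders",
    read_concern=ReadConcern("majority"),
)

doc = coll.find_one({"item": "abc"})
```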

Consistency or availability?

Recall the definitions of consistency and availability in CAP theory:
Consistency means that every read operation receives the most recently written data or an error.
Availability means that every request receives a timely, non-error response, without the guarantee that the response contains the most recently written data.

As mentioned earlier, this article's discussion of consistency and availability is based on the replica set; whether or not a sharded cluster is used does not affect it. In addition, the discussion assumes a single client; with multiple clients the question becomes one of isolation, which does not fall within the scope of CAP theory. Based on the understanding of write concern, read concern, and read preference above, we can draw the following conclusions (a driver-level sketch of two of these configurations follows the list):

    • With the defaults (w: 1, readConcern: local), if the read preference is primary, then reads return the latest data and consistency is strong; but if the primary fails at that moment, an error is returned and availability is not guaranteed.
    • With the defaults (w: 1, readConcern: local), if the read preference is secondary (or secondaryPreferred, primaryPreferred), reads may return stale data, but a response comes back immediately, so availability is better.
    • writeConcern: majority guarantees that a write will not be rolled back; readConcern: majority guarantees that the data read will not be rolled back.
    • With (w: 1, readConcern: majority), even reading from the primary does not guarantee returning the latest data, so consistency is weak.
    • With (w: majority, readConcern: majority), if reading from the primary, the latest data can definitely be read and that data will not be rolled back, but write availability is worse; if reading from a secondary, the latest data is not guaranteed, so consistency is weak.

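To tie these combinations back to the driver, here is a pymongo sketch (connection details, database, and collection names again assumed) of how two of the configurations above would be expressed: the relaxed defaults of the first two bullets, and the strong-but-less-available settings of the last bullet when reading from the primary.

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.read_preferences import ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017,localhost:27018,localhost:27019/"
                     "?replicaSet=rs0")
db = client["test"]

# w: 1, readConcern: local, reads from a secondary when possible.
# Fast and available, but reads may be stale, and a write not yet replicated
# to a majority can still be rolled back after a failover.
relaxed = db.get_collection(
    "orders",
    write_concern=WriteConcern(w=1),
    read_concern=ReadConcern("local"),
    read_preference=ReadPreference.SECONDARY_PREFERRED,
)

# w: majority, readConcern: majority, reads from the primary.
# Reads see the latest majority-committed write and never observe rollbacks,
# at the cost of slower writes and reduced availability during failover.
strict = db.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority"),
    read_concern=ReadConcern("majority"),
    read_preference=ReadPreference.PRIMARY,
)
```
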

Looking back at the original question: the high availability MongoDB talks about is availability in a more general sense. Through data replication and automatic failover, even if a physical failure occurs, the cluster as a whole can recover in a short time and keep working, and the recovery is automatic. In that sense it is indeed highly available.

References "1" http://blog.nahurst.com/visual-guide-to-nosql-systems "2" Https://en.wikipedia.org/wiki/CAP_theorem "3" Towards Robust distributed Systems "4"Brewer ' s conjecture and the feasibility of consistent, available, partition-tolerant Web Services"5" Http://www.julianbrowne.com/article/viewer/brewers-cap-theorem"6" brewers-cap-theorem-on-distributed-systems/"7" cap-twelve-years-later-how-the-rules-have-changed "8" Better-explaining-cap-theorem "9" is created by step-by-step sharded Cluster to know MongoDB "10" Write-concern "11" Read-concern "12" read-preference
