A reflection on the CAP theory of MongoDB (reproduced)

Source: Internet
Author: User
Tags: failover, mongodb driver, mongodb, sharding, reflection

About five or six years ago I heard of NoSQL for the first time; it was already a hot topic. At that time I was using MySQL, so NoSQL was still something new that I had never really used and knew little about. My impression of it came from a picture like the following (which I later found via Google, originally from here):

This picture shows the relationship between databases (both traditional relational databases and NoSQL) and the CAP theory. Because I lacked practical experience with NoSQL and had only a superficial grasp of the CAP theory, it was unclear to me why a particular database was assigned to a particular camp.

After using MongoDB at work and gaining some understanding of it, I came across this picture again recently and wanted to figure out whether MongoDB really belongs to the CP camp, and why. The question arises because MongoDB's classic (officially recommended) deployment architecture uses a replica set, and a replica set provides high availability through redundancy and automatic failover. So why would anyone say MongoDB sacrifices availability? I searched for "CAP" in the official MongoDB documentation and found nothing. So I wanted to think the question through and give myself an answer.

This article first clarifies what the CAP theory is and reviews some articles about it, and then discusses the trade-off between consistency and availability in MongoDB.

I only knew the rough meaning of the three words in the CAP theory, and my explanations came from articles on the Internet, which are not necessarily accurate. So the first thing to do was to find the origin and exact statement of the theory. I think the best place to start is Wikipedia, which gives a fairly accurate introduction and, more importantly, links to many useful references, such as the origin of the CAP theory and how it has evolved.

The CAP theory states that a distributed data store can provide at most two of the following three guarantees: consistency (C), availability (A), and partition tolerance (P).

Consistency means that every read operation receives the most recently written data or an error.

Availability means that every request receives a timely, non-error response, but with no guarantee that the response is based on the most recently written data.

Partition tolerance means that the system continues to provide service (either consistency or availability) even when the network between nodes drops or delays messages.

Consistency and availability are very broad terms whose exact meanings differ between contexts. For example, in "CAP Twelve Years Later: How the 'Rules' Have Changed", Brewer points out that "consistency in CAP is not the same as consistency in ACID". So unless otherwise stated, consistency and availability in the rest of this article are as defined in the CAP theory; a discussion is only meaningful when everyone shares the same context.

For a distributed system, network partitions are unavoidable, and data replication between nodes inevitably involves some delay. If consistency is required (every read must see the most recently written data), then some part of the system must be unavailable (unreadable) for a period of time, i.e. availability is sacrificed; and vice versa.

The origin of the CAP theory

According to Wikipedia, the relationship between C, A, and P was first discussed around 1998. Brewer presented the CAP conjecture at PODC (Symposium on Principles of Distributed Computing) in 2000 [3], and in 2002 two other scientists, Seth Gilbert and Nancy Lynch, proved Brewer's conjecture, turning it from a conjecture into a theorem.

In "Towards Robust Distributed Systems", Brewer, the author of the CAP theory, points out that in distributed systems computation is relatively easy; the real difficulty is maintaining state. For distributed storage and data-sharing systems, therefore, guaranteeing data consistency is the hard part. Traditional relational databases favor consistency over availability, hence the ACID properties of transactions. Many distributed storage systems instead favor availability over consistency, and consistency is relaxed to BASE (Basically Available, Soft state, Eventual consistency). The following diagram shows the difference between ACID and BASE:

In short: BASE preserves the availability of the service as much as possible by settling for eventual consistency. Note the last sentence in the figure, "but I think it's a spectrum": ACID and BASE differ only in degree; they are not two opposite extremes.

In 2002, in "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services", the two authors proved the CAP conjecture using an asynchronous network model, upgrading Brewer's conjecture to a theorem. To be honest, I did not understand the paper very well.

In the 2009 article "Brewers-cap-theorem", the author gives a relatively simple proof:

As shown in the figure, two nodes N1 and N2 store the same data V, whose current state is V0. Node N1 runs a safe and reliable write algorithm A, and node N2 runs an equally reliable read algorithm B; that is, N1 handles writes and N2 handles reads. Data written to N1 is automatically synchronized to N2, and the synchronization message is called M. If a partition occurs between N1 and N2, there is no guarantee that message M will reach N2 within a bounded time.

From the perspective of a transaction:

A transaction α consists of operations α1 and α2, where α1 writes data and α2 reads it. On a single node it is easy to ensure that α2 reads the data written by α1. In the distributed case, α2 cannot be guaranteed to read the data written by α1 unless the timing of α2 is controlled, but any such control (blocking, centralizing the data, and so on) either destroys partition tolerance or sacrifices availability.

The article also points out that in many cases availability is more important than consistency: for sites such as Facebook and Google, even a brief period of unavailability causes huge losses.

The 2010 article "brewers-cap-theorem-on-distributed-systems/" uses three examples to illustrate CAP. Example 1: a single MySQL instance. Example 2: two MySQL instances, each storing a different subset of the data (similar to sharding). Example 3: two MySQL instances, where an insert on A is considered complete only once it has also succeeded on B (similar to a replica set). The author argues that strong consistency can be guaranteed in Example 1 and Example 2, but availability cannot, and that in Example 3, because partitions can occur, there is a trade-off between consistency and availability.

In my opinion, the CAP theory is best discussed under the premise of a "distributed storage system", where availability refers not to the availability of the overall service but to the availability of individual nodes in the distributed system. So the examples above do not feel quite right to me.

The development of the CAP theory

In 2012, Brewer, the inventor of the CAP theory, revisited it in "CAP Twelve Years Later: How the 'Rules' Have Changed". The article is fairly long, but the ideas are clear and insightful, and it is well worth reading. There is also a good Chinese translation titled "CAP theory twelve years on: the rules have changed".

The main point of the article is that the CAP theory does not mean you must permanently pick two out of the three. First, although partitions can always occur in a distributed system, the probability of a partition is very small (otherwise the network or hardware needs to be improved), so CAP allows perfect C and A most of the time; only while a partition exists does C have to be traded off against A. Second, consistency and availability are matters of degree, not 0 or 1: availability can vary continuously between 0% and 100%, and consistency comes in many levels (for example, in Cassandra you can set the consistency level). Therefore, the goal of modern CAP practice should be to maximize both data consistency and availability within a scope that is reasonable for the specific application.

The article also points out that a partition is a relative notion: when a predefined communication time limit is exceeded, that is, when the system cannot reach data consistency within that window, a partition is considered to be in effect and the current operation must choose between C and A.

In terms of revenue targets and contractual requirements, system availability is the primary goal, so we routinely use caching or after-the-fact update logs to optimize availability. Consequently, when designers choose availability, they must restore the invariants that were violated once the partition ends.

In practice, most teams assume that partitions do not occur inside a single data center (a single location), so within a single data center they design for CA; this was the default design assumption before the CAP theory, including for traditional databases.

During a partition, an independent and self-consistent subset of nodes can continue to execute operations, but there is no guarantee that global invariants will not be violated. Data sharding is an example: the designer partitions the data across nodes in advance, so during a partition each individual data shard can usually keep operating. Conversely, if the partitioned state is tightly coupled, or some global invariants must be preserved, then at best only one side of the partition can operate, and at worst no operation can proceed at all.

The excerpt above is very similar to the situation of MongoDB sharding: in MongoDB's sharded cluster mode, shards normally do not need to communicate with each other.

In the 2013 article "Better-explaining-cap-theorem", the author points out that "it is really just A vs C!", because:

(1) Availability is typically achieved by replicating data across different machines;

(2) Consistency requires updating several nodes simultaneously before reads are allowed;

(3) Temporary partitions, i.e. communication delays between nodes, can always occur, so a trade-off between A and C is needed; but the trade-off only matters while a partition is occurring.

Since partitions are bound to occur in a distributed system, "it is really just A vs C!"

The article "Creating sharded cluster to know MongoDB by step-by-step" describes MongoDB's features: high performance, high availability, and horizontal scalability, where MongoDB's high availability relies on replica set replication and automatic failover. MongoDB can be deployed in three modes: standalone, replica set, and sharded cluster; the previous article described how to build a sharded cluster in detail.

Standalone is a single mongod to which the application connects directly; in this case there is no partition tolerance to speak of, and the deployment is necessarily strongly consistent. In a sharded cluster, each shard is also recommended to be a replica set. Shards in MongoDB maintain independent subsets of the data, so shards have little effect on one another (chunk migration may or may not have some impact); the main thing to consider is therefore the effect of partitions within each shard's replica set. For this reason, the discussion of MongoDB's consistency and availability in this article focuses on the replica set.

A replica set has exactly one primary node, which accepts both write and read requests; the other, secondary nodes accept only read requests. This is a single-writer, multi-reader setup, much simpler than the multi-writer, multi-reader case. For the discussion below it is further assumed that the replica set consists of three nodes, one primary and two secondaries, and that all nodes persist data (are data-bearing).

MongoDB's trade-off between consistency and availability is governed by three settings: write concern, read concern, and read preference. The discussion below is based on MongoDB 3.2, since read concern was introduced in that version.
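
For illustration only, here is a minimal sketch of how the three settings can be combined in a driver connection string. pymongo is used as an arbitrary example (the article itself does not prescribe a driver), and the host names, replica set name, database, and collection are all hypothetical:

    from pymongo import MongoClient

    # Hypothetical hosts/replica set; the three settings map to the URI options below.
    client = MongoClient(
        "mongodb://node1:27017,node2:27017,node3:27017/"
        "?replicaSet=rs0"
        "&w=majority"                         # write concern
        "&readConcernLevel=majority"          # read concern (MongoDB >= 3.2)
        "&readPreference=secondaryPreferred"  # read preference
    )
    coll = client.testdb.testcoll  # hypothetical namespace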

Write concern determines when MongoDB acknowledges a write operation to the client. It includes the following three fields:

w: the write request is acknowledged to the client after this many MongoDB instances have processed it. Possible values:

1: the default; the write returns after the data has been written to a standalone mongod or to the replica set's primary

0: return to the client immediately without waiting for the write; high performance, but data may be lost. It can, however, be combined with j:true to improve durability

>1: meaningful only in a replica set environment; if the value is greater than the number of nodes in the replica set, the write may block

'majority': the write is acknowledged to the client only after the data has been written to a majority of the replica set's nodes; this is generally used together with read concern 'majority'

j: the write request is acknowledged to the client only after the write has been recorded in the journal; defaults to false. Two points to note:

If j:true is used against a MongoDB instance that does not have journaling enabled, an error is reported

From MongoDB 3.2 onwards, for w>1, all the instances involved must write to the journal before the write returns

wtimeout: the write timeout in milliseconds; if the write (with w greater than 1) has not been acknowledged within the specified time, an error is returned

The default is 0, which is equivalent to not setting this option

MongoDB 3.4 added the writeConcernMajorityJournalDefault option.
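
As a sketch of the write concern fields described above (pymongo again, with made-up host and collection names; this is one possible interface, not the only one), per-collection write concerns might look like this:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

    # w=1 (default): acknowledged by the primary alone; fast, but the write can be
    # rolled back if the primary fails before the data is replicated.
    fast_events = client.testdb.get_collection(
        "events", write_concern=WriteConcern(w=1))

    # w="majority", j=True, wtimeout=5000: acknowledged only after a majority of the
    # replica set has the write in its journal; an error is raised after 5 seconds.
    durable_events = client.testdb.get_collection(
        "events", write_concern=WriteConcern(w="majority", j=True, wtimeout=5000))

    fast_events.insert_one({"kind": "click"})
    durable_events.insert_one({"kind": "payment"})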

Read preference:

As explained in the previous article, a replica set consists of one primary and several secondaries. The primary accepts write operations, so its data is always the newest; the secondaries replay the writes via the oplog, so their data lags somewhat behind. For queries that are not very sensitive to freshness, reading from a secondary node reduces the load on the cluster.

MongoDB points out that different read preferences can be chosen flexibly for different scenarios. The MongoDB drivers support the following read preferences:

primary: the default mode; all read operations are routed to the primary node of the replica set

primaryPreferred: reads normally go to the primary node and are routed to a secondary only when the primary is unavailable (failover)

secondary: all read operations are routed to the secondary nodes of the replica set

secondaryPreferred: reads normally go to a secondary node and are routed to the primary only when no secondary is available

nearest: read from the node with the lowest latency, whether primary or secondary. For distributed applications with MongoDB deployed across multiple data centers, nearest gives the best data locality.

If you use secondary or secondaryPreferred, be aware that:

(1) Because of replication lag, the data read may not be the latest, and different secondaries may return different data;

(2) For sharded collections, with the balancer enabled by default, a secondary may return missing or duplicate documents because of a chunk migration that has not finished or was terminated abnormally;

(3) When there are several secondary nodes, the "closest" one, i.e. the node with the lowest average latency, is chosen; for details see the server selection algorithm.
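
A minimal pymongo sketch of choosing read preferences per collection, again with hypothetical host and collection names:

    from pymongo import MongoClient, ReadPreference

    client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

    # secondaryPreferred: offload reads to a secondary when one is available,
    # falling back to the primary otherwise (stale reads are possible).
    reporting = client.testdb.get_collection(
        "orders", read_preference=ReadPreference.SECONDARY_PREFERRED)

    # nearest: read from the member with the lowest measured latency,
    # whether it is the primary or a secondary.
    nearby = client.testdb.get_collection(
        "orders", read_preference=ReadPreference.NEAREST)

    doc = reporting.find_one({"status": "shipped"})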

Read concern is a new feature added in MongoDB 3.2. It specifies which data a replica set returns, including the replica sets used as shards in a sharded cluster. Different storage engines support read concern to different degrees.

Read concern has the following three levels:

local: the default value; returns the latest data on the current node, where the current node depends on the read preference.

majority: returns the most recent data that has been confirmed as written to a majority of nodes. Using this option requires the WiredTiger storage engine with replication protocol version 1, and majority read concern must be enabled when the MongoDB instance is started.

linearizable: introduced in version 3.4; it is skipped here, and interested readers can refer to the documentation.

The article contains the following sentence:

Regardless of the read concern level, the most recent data in a node may not reflect the most recent version of the data in the system.

That is, even with read concern majority, the data returned is not necessarily the latest; this is not the same thing as the NWR (quorum read/write) model. The root cause is that the value finally returned comes from only a single MongoDB node, and which node that is depends on the read preference.

There is an article that describes the motivation for and implementation of readConcern in detail; only the core points are quoted here:

readConcern was originally introduced to solve the "dirty read" problem: for example, a user reads a piece of data from the MongoDB primary, but that data has not yet been replicated to a majority of nodes; the primary then fails, and after recovery the former primary rolls back the data that never reached a majority of nodes, so the user has read dirty data.

When the readConcern level is set to majority, the data the user reads is guaranteed to have been written to a majority of nodes, and such data will never be rolled back, which avoids dirty reads.
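
Here is a hedged pymongo sketch contrasting the local and majority read concern levels (hypothetical names, and it assumes an instance on which majority read concern is available):

    from pymongo import MongoClient
    from pymongo.read_concern import ReadConcern

    client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

    # local (default): return the newest data visible on whichever node serves the
    # read, even if that write could still be rolled back after a failover.
    local_view = client.testdb.get_collection(
        "accounts", read_concern=ReadConcern("local"))

    # majority: only return data acknowledged by a majority of the replica set,
    # i.e. data that cannot be rolled back, which avoids the "dirty read" above.
    committed_view = client.testdb.get_collection(
        "accounts", read_concern=ReadConcern("majority"))

    doc = committed_view.find_one({"owner": "alice"})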

Consistency or availability?

Recall the definitions in the CAP theory: consistency means that every read operation receives the most recently written data or an error; availability means that every request receives a timely, non-error response, but with no guarantee that the response is based on the most recently written data. As mentioned earlier, the discussion of consistency and availability in this article is based on the replica set, and whether or not a sharded cluster is used does not change it. In addition, the discussion assumes a single client; with multiple clients the question becomes one of isolation, which falls outside the scope of the CAP theory. Based on the understanding of write concern, read concern, and read preference above, we can draw the following conclusions.

  • By default (w:1, readConcern: local), if the read preference is primary, the latest data can be read, i.e. strong consistency; but if the primary fails at that moment, an error is returned and availability is not guaranteed
  • By default (w:1, readConcern: local), if the read preference is secondary (or secondaryPreferred, primaryPreferred), stale data may be read, but data is returned immediately, so availability is better
  • writeConcern: majority guarantees that written data will not be rolled back; readConcern: majority guarantees that the data read will not be rolled back
  • With (w:1, readConcern: majority), even reading from the primary does not guarantee the latest data, so this is weak consistency
  • With (w: majority, readConcern: majority), a read from the primary is guaranteed to return the latest data and that data will not be rolled back, but write availability is worse; a read from a secondary is not guaranteed to return the latest data, i.e. weak consistency (see the sketch after this list)
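
Finally, a sketch contrasting a consistency-leaning configuration with an availability-leaning one, following the conclusions above (pymongo, hypothetical names; other combinations are of course possible):

    from pymongo import MongoClient, ReadPreference
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

    # Leaning towards C: writes wait for a majority, reads go to the primary and see
    # only majority-committed data; during a failover these operations may block or error.
    consistent = client.testdb.get_collection(
        "balances",
        write_concern=WriteConcern(w="majority", wtimeout=5000),
        read_concern=ReadConcern("majority"),
        read_preference=ReadPreference.PRIMARY,
    )

    # Leaning towards A: writes are acknowledged by the primary alone and reads may be
    # served by a lagging secondary, so responses keep coming but may be stale.
    available = client.testdb.get_collection(
        "balances",
        write_concern=WriteConcern(w=1),
        read_concern=ReadConcern("local"),
        read_preference=ReadPreference.SECONDARY_PREFERRED,
    )

Which of the two is appropriate is exactly the per-application trade-off the CAP discussion above describes.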

    Looking back, the high availability that MongoDB claims is availability in a broader, more everyday sense: through data replication and automatic failover, even if a physical failure occurs, the whole cluster can recover within a short time and continue working, and the recovery is automatic. In this sense it is indeed highly available.
