Big Data Learning 2: Distributed Foundations (1): the CAP Theorem, BASE, and Eventual Consistency

Last Update:2018-07-26 Source: Internet

Author: User


CAP, BASE, and eventual consistency are the three cornerstones of NoSQL databases.
CAP
C: Consistency
A: Availability (data can be fetched quickly)
P: Partition tolerance (the system tolerates network partitions; this is what makes it distributed)
In a football match, a player who scores three goals in one game is said to have scored a hat-trick. Distributed data systems have a "hat" principle of their own, the CAP theorem, although this cap is not a hat. The CAP theorem states that a system can satisfy at most two of these three properties at the same time; all three cannot be achieved together. Designing a distributed architecture therefore always involves a trade-off.
For a distributed data system, partition tolerance is a basic requirement; without it the system loses its value. Designing a distributed data system therefore comes down to a trade-off between consistency and availability.
Most web applications do not need strong consistency, so most distributed database products sacrifice consistency in exchange for high availability. Sacrificing consistency does not mean ignoring it altogether: if the data becomes chaotic, no amount of availability or distribution makes the system valuable.
Sacrificing consistency only means giving up the strong consistency that relational databases require, as long as the system still achieves eventual consistency. For the sake of the user experience, the window during which the data is inconsistent should be as invisible to users as possible; that is, the system needs to preserve "user-perceived consistency". Typically, high availability and eventual consistency are both achieved by asynchronously replicating the data to multiple copies, and the length of the "user-perceived consistency" window depends on how long replication takes to bring the copies to a consistent state.
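The inconsistency window created by asynchronous replication can be illustrated with a small in-memory sketch. This is only a toy model under stated assumptions: the class names `Replica` and `AsyncReplicatedStore` and the explicit `replicate()` step are hypothetical, and a real system would replicate in the background over a network.

```python
from collections import deque


class Replica:
    """One copy of the data (hypothetical, in-memory only)."""

    def __init__(self):
        self.data = {}


class AsyncReplicatedStore:
    """A primary plus replicas; a write returns before the replicas catch up."""

    def __init__(self, replica_count=2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to the replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        return self.replicas[index].data.get(key)

    def replicate(self):
        # Stand-in for background replication: drain the queue in order.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


store = AsyncReplicatedStore()
store.write("cart", ["book"])
stale = store.read_from_replica(0, "cart")  # inside the window: replica has no value yet
store.replicate()
fresh = store.read_from_replica(0, "cart")  # after replication: replica has the update
```

The time between `write` and `replicate` is the window the text describes; the shorter it is, the less likely a user is to observe the stale read.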

Eventual consistency
Consistency can be viewed from two perspectives: the client side and the server side. From the client's point of view, consistency is mainly about how updated data becomes visible under concurrent access. From the server's point of view, it is about how an update is replicated and distributed across the whole system so that the data eventually becomes consistent. Consistency problems arise only because reads and writes happen concurrently, so they have to be understood in the context of concurrent read and write scenarios. From the client's point of view, when multiple processes access the data concurrently, the policy that governs how each process obtains updated data determines the consistency level.
Strong consistency, as in relational databases, requires that updated data be visible to all subsequent accesses.
Weak consistency tolerates some or all subsequent accesses not seeing the update.
Eventual consistency requires that the updated data become visible after some period of time. Depending on when and how each process can see the data after an update, eventual consistency can be divided into the following variants:
Causal consistency. If process A notifies process B that it has updated a data item, subsequent accesses by process B return the updated value, and a later write is guaranteed to supersede the earlier one. Accesses by process C, which has no causal relationship with process A, follow the normal eventual consistency rules.
Read-your-writes consistency. After process A updates a data item itself, it always sees the updated value and never sees the old value. This is a special case of the causal consistency model.
Session consistency. This is a practical version of the previous model, in which a process accesses the storage system in the context of a session. As long as the session exists, the system guarantees read-your-writes consistency. If the session terminates because of a failure, a new session is established, and the guarantee does not carry over to the new session.
Monotonic read consistency. If a process has seen a particular value of a data object, subsequent accesses never return an earlier value.
Monotonic write consistency. The system guarantees that writes from the same process are executed in order. A system that does not guarantee this level of consistency is very difficult to program against.
These variants can be combined, for example monotonic reads with read-your-writes. From a practical point of view, this combination (a process reads its own updates, and once it has read the latest version it never reads an older one) saves a lot of trouble in application development.
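Read-your-writes and monotonic reads within a session can be sketched with version numbers. The following is a minimal illustration, assuming each replica carries a single version counter; the names `VersionedReplica`, `Session`, and `floor` are hypothetical and not taken from any particular database.

```python
class VersionedReplica:
    """A replica holding one value tagged with a version number (hypothetical)."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are older than what the replica already has.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Tracks the highest version this session has written or read."""

    def __init__(self, primary, read_order):
        self.primary = primary
        self.read_order = read_order  # replicas tried in this order on read
        self.floor = 0                # reads must never go below this version

    def write(self, value):
        version = self.primary.version + 1
        self.primary.apply(version, value)
        self.floor = max(self.floor, version)  # basis for read-your-writes
        return version

    def read(self):
        for replica in self.read_order:
            if replica.version >= self.floor:
                self.floor = replica.version   # basis for monotonic reads
                return replica.value
        raise RuntimeError("no replica is fresh enough for this session")


primary, lagging = VersionedReplica(), VersionedReplica()
session = Session(primary, read_order=[lagging, primary])
session.write("v1")
value = session.read()      # skips the lagging replica, reads from the primary
naive = lagging.value       # what a read without session tracking could return
```

Because `floor` lives in the `Session` object, the guarantee ends with the session, which matches the description of session consistency above.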
From the server's point of view, distributing updated data to the whole system as quickly as possible, and so shrinking the time window before eventual consistency is reached, is very important for the availability and user experience of the system. For a distributed data system, define:
N: the number of replicas of the data
W: the number of nodes that must complete a write before an update counts as successful
R: the number of nodes that must be read when reading data
If W+R>N, the set of written nodes and the set of read nodes overlap, which gives strong consistency. For example, in a typical relational database with synchronous primary-standby replication, N=2, W=2, R=1, so a read returns consistent data whether it goes to the primary or to the standby.
If W+R<=N, the result is weak consistency. For example, in a relational database with asynchronous primary-standby replication, N=2, W=1, R=1, so a read from the standby may not see data that has already been updated on the primary.
To ensure high availability, distributed systems generally set N>=3. Different combinations of N, W, and R trade availability against consistency to suit different scenarios.
If W=N and R=1, the failure of any node that must be written causes the write to fail, so availability is reduced; but because all N replicas are written synchronously, strong consistency is guaranteed.
If R=N and W=1, a write succeeds as soon as one node accepts it, so write performance and availability are relatively high. However, a process that reads from the other nodes may not get the updated data, so any read that does not reach all N nodes is only weakly consistent. In this case, if W<(N+1)/2 and the node sets of two writes do not overlap, a write conflict can occur.
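The N, W, R rules above can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical, and `quorums_always_overlap` simply enumerates every possible write set and read set to confirm that they intersect exactly when W+R>N.

```python
from itertools import combinations


def classify(n, w, r):
    """Return 'strong' if every read set must overlap every write set."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("need 1 <= w <= n and 1 <= r <= n")
    return "strong" if w + r > n else "weak"


def quorums_always_overlap(n, w, r):
    """Brute-force check: does every w-node write set meet every r-node read set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def write_conflict_possible(n, w):
    """Two write sets of size w can be disjoint when w < (n + 1) / 2."""
    return w < (n + 1) / 2


sync_primary_standby = classify(2, 2, 1)    # 'strong': W+R = 3 > N = 2
async_primary_standby = classify(2, 1, 1)   # 'weak':   W+R = 2 <= N = 2
majority_quorum = classify(3, 2, 2)         # 'strong': W+R = 4 > N = 3
```

For N=3, `write_conflict_possible(3, 1)` is true because two single-node writes can land on different nodes, while `write_conflict_possible(3, 2)` is false because any two 2-node write sets share a node.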

BASE
Interestingly, in English a base is an alkali and ACID is an acid; the two really are opposites.

Basically Available: the system is basically available.
Soft state: also called flexible transactions. "Soft state" can be understood as "connectionless", whereas "hard state" is "connection-oriented".
Eventual consistency: the data becomes consistent in the end, which is also the ultimate goal of ACID.

The BASE model is the opposite of the ACID model. It sacrifices strong consistency to gain availability or reliability:
Basically available: the system remains basically available and tolerates partition failures (for example, in a sharded database).
Soft state: state may be out of sync for a period of time, because updates are asynchronous.
Eventually consistent: the data ends up consistent, rather than being consistent at every moment.

BASE is mainly implemented in two ways:
1. Partitioning the database by function
2. Sharding
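Both approaches can be sketched as a small routing function. This is only an illustration under assumptions: the table-to-database mapping, the database names, and the shard count are hypothetical, and hash-modulo is just one simple way to choose a shard.

```python
import hashlib

# Hypothetical functional split: each group of tables lives in its own database.
FUNCTIONAL_DATABASES = {"users": "user_db", "orders": "order_db"}


def database_for(table):
    """Functional partitioning: pick the database by what the data is for."""
    return FUNCTIONAL_DATABASES[table]


def shard_for(key, shard_count):
    """Sharding: pick a shard from a stable hash of the row key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(table, key, shard_count=4):
    """Combine both steps into the name of the target shard."""
    return f"{database_for(table)}_shard{shard_for(key, shard_count)}"


target = route("users", "alice")
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same key to different shards after a restart.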

BASE focuses on basic availability. If you need high availability, which here means pure high performance, you have to sacrifice consistency or fault tolerance; BASE-style solutions still have untapped potential in performance.

Original from: http://blog.chinaunix.net/xmlrpc.php?r=blog/article&uid=29126521&id=3868927

