In-Depth Analysis of Distributed Algorithms for NoSQL Databases

Last Update: 2018-06-07 Source: Internet

Author: User

System scalability is the main driver behind the development of NoSQL, and it involves distributed system coordination, failover, resource management, and many other capabilities. This makes NoSQL sound like a big basket into which anything can be thrown. Although the NoSQL movement has not brought fundamental technological changes to distributed data processing, it has still triggered an overwhelming amount of research and practice on various protocols and algorithms. Through these efforts, a number of effective techniques for building databases have gradually emerged. In this article, I will systematically describe the distributed features of NoSQL databases.

Next, we will study several distributed strategies, such as replication and fault detection. These strategies are grouped into three parts:

Data Consistency

As we all know, distributed systems often encounter network partitions or latency, in which case the isolated parts of the system cannot reach the rest. It is therefore impossible to maintain high availability without sacrificing consistency, a fact commonly referred to as the "CAP theorem". Moreover, consistency is very expensive in distributed systems, so we often need to relax it, not only for the sake of availability but also for a variety of other trade-offs. To study these trade-offs, note that consistency problems in distributed systems are caused by the physical separation and replication of data, so we will start by examining the properties of replication:

Read/write consistency. From the read/write perspective, the basic goal of a database is to make the replica convergence time (the time it takes for an update to reach all replicas) as short as possible, which guarantees eventual consistency. Beyond this weak guarantee, there are several stronger consistency properties:

Read-your-writes consistency. The effect of a write operation on data item X is always visible to subsequent read operations on X.

Monotonic read consistency. After a read operation on data item X, subsequent read operations on X return either the same value as the first read or a newer one.
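The two session guarantees above can be enforced on the client side. The following is a minimal sketch, assuming every write carries a monotonically increasing version number and that the client may query replicas in any order; `Replica` and `SessionClient` are hypothetical names, not the API of any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data: key -> (version, value)."""
    store: dict = field(default_factory=dict)

    def write(self, key, version, value):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        return self.store.get(key, (0, None))


class SessionClient:
    """Hypothetical client helper enforcing read-your-writes and monotonic reads
    by remembering, per key, the lowest version this session may observe."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = {}

    def write(self, replica, key, version, value):
        replica.write(key, version, value)
        # Read-your-writes: later reads must reflect at least this version.
        self.min_version[key] = max(self.min_version.get(key, 0), version)

    def read(self, key):
        floor = self.min_version.get(key, 0)
        for replica in self.replicas:
            version, value = replica.read(key)
            if version >= floor:
                # Monotonic reads: raise the floor to what was just observed.
                self.min_version[key] = version
                return value
        raise RuntimeError(f"no replica is fresh enough for {key!r}")


# Usage: replica b has not yet received the write that went to replica a.
a, b = Replica(), Replica()
client = SessionClient([b, a])      # b is tried first
client.write(a, "x", 1, "hello")
print(client.read("x"))             # stale b is skipped -> prints: hello
```

A replica that is behind the session's floor is simply skipped; a real system might instead wait, redirect, or fail the read.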

Write consistency. Partitioned databases often encounter write conflicts. The database should be able to handle such conflicts and ensure that multiple writes to the same data are not processed independently by different partitions. Databases offer several consistency models for this:

Atomic writes. If the database API only allows a write to be an independent, atomic assignment of a value, one way to avoid write conflicts is to pick the "latest version" of each data item. This guarantees that all nodes end up with the same version once the updates have propagated, regardless of the order in which the updates arrive; network failures and delays often cause different nodes to receive updates in different orders. Data versions can be expressed as timestamps or as user-specified values. Cassandra uses this approach.
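The "latest version wins" rule can be sketched as a merge function. This is a minimal illustration, assuming each write carries an integer version; the tie-breaking rule and the names `Versioned` and `lww_apply` are choices made for this sketch, not Cassandra's actual code. Because the merge always keeps a maximum, applying the same writes in any order yields the same result:

```python
from dataclasses import dataclass
from functools import reduce
from itertools import permutations


@dataclass(frozen=True)
class Versioned:
    version: int   # assumed: a timestamp or application-supplied version number
    value: str


def lww_apply(current, update):
    """Last-write-wins: keep whichever update has the higher (version, value) pair.
    Comparing the value too breaks version ties deterministically (an assumption
    of this sketch), so the outcome never depends on arrival order."""
    if current is None or (update.version, update.value) > (current.version, current.value):
        return update
    return current


# Usage: every replica receives the same three writes, but in a different order.
updates = [Versioned(3, "c"), Versioned(1, "a"), Versioned(2, "b")]
final_states = {reduce(lww_apply, order, None) for order in permutations(updates)}
print(final_states)  # {Versioned(version=3, value='c')}
```

All six arrival orders converge on the same state, which is exactly the property the paragraph above relies on; the price is that the losing writes are silently discarded.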

Atomic read-modify-write. Applications sometimes need to perform a read-modify-write sequence rather than a single independent atomic write. Suppose two clients read the same version of a value, modify it, and write it back under the atomic-write model: the later update overwrites the earlier one. This behavior is incorrect in some cases (for example, when both clients add a new item to the same list value). Databases offer at least two solutions:

Conflict prevention. Read-modify-write can be viewed as a special case of a transaction, so distributed locking or consensus protocols such as Paxos can solve this problem. These techniques support atomic read-modify-write semantics and transactions at arbitrary isolation levels. Another approach is to avoid distributed concurrent writes altogether and route all writes to a given data item to a single node (either a global master or a partition master). To avoid conflicts, the database must then sacrifice availability during network partitions. This approach is used by many systems that provide strong consistency guarantees (such as most relational databases, HBase, and MongoDB).
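The single-master variant of conflict prevention can be sketched in a single process. This is a toy, assuming a `threading.Lock` stands in for the serialization that a partition master provides; in a real system the master is a separate node, and leader election and failover (where availability is lost) are not modeled. `PartitionMaster` and `Router` are hypothetical names. The first function reproduces the lost update described in the atomic read-modify-write paragraph; the router then shows that funnelling every read-modify-write on a key through one owner keeps all concurrent appends.

```python
import threading
import zlib


def naive_interleaved_appends():
    """Without coordination: two clients read the same version, then write back."""
    store = {"cart": ["apple"]}
    seen_by_1 = list(store["cart"])          # client 1 reads
    seen_by_2 = list(store["cart"])          # client 2 reads the same version
    store["cart"] = seen_by_1 + ["banana"]   # client 1 writes back
    store["cart"] = seen_by_2 + ["cherry"]   # client 2 overwrites client 1's update
    return store["cart"]


class PartitionMaster:
    """The single node that owns a partition; applies read-modify-writes one at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def read_modify_write(self, key, fn):
        with self._lock:  # serialize all read-modify-write operations on this partition
            self._data[key] = fn(self._data.get(key))
            return self._data[key]

    def get(self, key):
        with self._lock:
            return self._data.get(key)


class Router:
    """Hypothetical router: all writes for a key go to that key's partition master."""

    def __init__(self, num_partitions):
        self.masters = [PartitionMaster() for _ in range(num_partitions)]

    def master_for(self, key):
        # zlib.crc32 is stable across processes, unlike the salted built-in hash() of str.
        return self.masters[zlib.crc32(key.encode()) % len(self.masters)]

    def append(self, key, item):
        return self.master_for(key).read_modify_write(key, lambda old: (old or []) + [item])

    def get(self, key):
        return self.master_for(key).get(key)


print(naive_interleaved_appends())  # ['apple', 'cherry']: 'banana' was lost

router = Router(num_partitions=4)
workers = [threading.Thread(target=router.append, args=("cart", f"item{i}")) for i in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(router.get("cart")))  # 100: no append was lost
```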

Conflict detection. The database tracks conflicting concurrent updates and either rolls back one of them or keeps both versions and hands them to the client for resolution. Concurrent updates are usually tracked with vector clocks (a form of optimistic locking) or by maintaining a complete version history. This approach is used by Riak, Voldemort, and CouchDB.
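Vector-clock conflict detection can be sketched as follows. This is a minimal illustration, assuming clocks are plain dicts mapping a node name to a counter and that the store keeps concurrent versions side by side as "siblings" for the client to resolve; the function names are hypothetical and do not reproduce the APIs of Riak or Voldemort.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def compare(a, b):
    """Return 'equal', 'before' (a happened before b), 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def merge(a, b):
    """Element-wise maximum: a clock that descends from both a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def put(siblings, clock, value):
    """Store a write, keeping concurrent versions side by side for the client."""
    # Ignore the write if an existing version already supersedes (or equals) it.
    if any(compare(clock, c) in ("before", "equal") for c, _ in siblings):
        return siblings
    # Drop versions the new write supersedes; keep the concurrent ones.
    return [(c, v) for c, v in siblings if compare(c, clock) != "before"] + [(clock, value)]


# Usage: two clients update the same list through different nodes.
v0 = increment({}, "A")
store = put([], v0, ["apple"])
c1 = increment(v0, "A")                    # client 1's write, coordinated by node A
c2 = increment(v0, "B")                    # client 2's concurrent write, via node B
store = put(store, c1, ["apple", "banana"])
store = put(store, c2, ["apple", "cherry"])
print(compare(c1, c2), len(store))         # concurrent 2

# The client resolves the conflict (here: union) and writes back with a merged clock.
resolved = increment(merge(c1, c2), "A")
store = put(store, resolved, ["apple", "banana", "cherry"])
print(len(store))                          # 1
```

Unlike last-write-wins, neither concurrent append is discarded: the conflict is detected and surfaced, and the merged clock marks the resolved value as a successor of both siblings.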

Now let's take a closer look at common replication techniques and classify them according to the properties described above. The first figure depicts the logical relationships between the techniques and their trade-offs in terms of consistency, scalability, availability, and latency. The second figure illustrates each technique in detail.

The replication factor is 4. The read/write coordinator can be either an external client or an internal proxy node.

We will go through these techniques in order of consistency, from weakest to strongest:

(A, anti-entropy) This provides the weakest consistency and is based on the following strategy: a write updates any one node, and a read served by a node that the background anti-entropy protocol has not yet reached still returns the old data. (The next section describes the anti-entropy protocol in detail.) The main properties of this approach are:

High propagation latency makes this approach of limited use for data synchronization, so it is typically used only as an auxiliary mechanism to detect and repair unplanned inconsistencies. Cassandra uses an anti-entropy algorithm to propagate the database topology and other metadata between nodes.

Weak consistency guarantees: write conflicts and read/write inconsistencies can occur even in the absence of failures.

High availability and robustness under network partitions. Updates are propagated asynchronously in batches rather than one by one, which improves performance.

Weak durability guarantees, because new data initially exists as only a single copy.
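The weak guarantees of scheme A can be seen in a small simulation. This is a sketch, assuming whole-state push-pull gossip between randomly chosen peers and per-key version numbers; real systems usually exchange compact digests (for example, Merkle trees) rather than full state, and `Node` and `anti_entropy_round` are hypothetical names.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        if key not in self.data or version > self.data[key][0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]


def anti_entropy_round(nodes, rng):
    """Simplified gossip: each node exchanges its full state with one random peer,
    and both sides keep the newer version of every key (push-pull)."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        for src, dst in ((node, peer), (peer, node)):
            for key, (version, value) in list(src.data.items()):
                dst.write(key, version, value)


nodes = [Node(f"n{i}") for i in range(5)]
nodes[0].write("x", 1, "new")      # the write updates a single, arbitrary node
print(nodes[4].read("x"))          # None: a read elsewhere does not see the new data yet

rng = random.Random(7)
rounds = 0
while any(n.read("x") != "new" for n in nodes):
    anti_entropy_round(nodes, rng)
    rounds += 1
print("converged after", rounds, "round(s)")
```

Until the loop converges, reads on other nodes return stale data, and if node `n0` had failed before the first round the write would have been lost: the weak consistency and durability noted above.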

(B) An improvement on the scheme above: when any node receives an update request, it asynchronously sends the update to all available nodes. This can be regarded as targeted anti-entropy.

Compared with pure anti-entropy, this greatly improves consistency at only a small cost in performance. However, the formal consistency and durability guarantees remain the same.

If some nodes are unavailable because of network or node failures, the update eventually reaches them through the anti-entropy process.
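Scheme B's fan-out can be sketched with a thread pool standing in for asynchronous network sends. This is an illustration, assuming the receiving node acknowledges the client as soon as its local write succeeds; `Replica` and `write_and_fan_out` are hypothetical names, and an unreachable replica is modeled by raising `ConnectionError`.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    def __init__(self, name, up=True):
        self.name, self.up = name, up
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        if key not in self.data or version > self.data[key][0]:
            self.data[key] = (version, value)


def write_and_fan_out(receiver, replicas, key, version, value, executor):
    """Apply the write locally, then push it to every other replica in the background.
    The function returns without waiting for the pushes (asynchronous propagation)."""
    receiver.apply(key, version, value)
    missed = []  # replicas that must later be repaired by anti-entropy

    def push(replica):
        try:
            replica.apply(key, version, value)
        except ConnectionError:
            missed.append(replica.name)

    futures = [executor.submit(push, r) for r in replicas if r is not receiver]
    return futures, missed


replicas = [Replica("a"), Replica("b"), Replica("c", up=False)]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures, missed = write_and_fan_out(replicas[0], replicas, "x", 1, "v1", pool)
# Leaving the with-block waits for the background pushes to finish.
print(sorted(r.name for r in replicas if "x" in r.data))  # ['a', 'b']
print(missed)                                              # ['c']
```

Replica `c` is simply skipped; in this scheme only the background anti-entropy process will eventually bring it up to date.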

(C) The previous scheme can be extended with hinted handoff to better handle failed operations on a node. Updates intended for a failed node are recorded on an additional proxy node, together with a hint that they should be delivered to that node as soon as it becomes available again. This improves consistency and reduces replication convergence time.
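Hinted handoff can be sketched as follows. This is a minimal illustration, assuming the coordinator already knows which replicas are down and that the proxy is told when a node recovers (in practice, through failure detection); `Node`, `replicate`, and `handoff` are hypothetical names.

```python
from collections import defaultdict


class Node:
    def __init__(self, name):
        self.name, self.up = name, True
        self.data = {}                    # key -> (version, value)
        self.hints = defaultdict(list)    # target node name -> updates to deliver later

    def apply(self, key, version, value):
        if key not in self.data or version > self.data[key][0]:
            self.data[key] = (version, value)


def replicate(update, targets, proxy):
    """Send `update` to each target; for a failed target, store a hint on `proxy`."""
    for node in targets:
        if node.up:
            node.apply(*update)
        else:
            proxy.hints[node.name].append(update)


def handoff(proxy, recovered):
    """Once `recovered` is reachable again, replay and discard the hints held for it."""
    for update in proxy.hints.pop(recovered.name, []):
        recovered.apply(*update)


a, b, c, proxy = Node("a"), Node("b"), Node("c"), Node("d")
c.up = False
replicate(("x", 1, "v1"), [a, b, c], proxy)
print(c.data, dict(proxy.hints))   # {} {'c': [('x', 1, 'v1')]}
c.up = True
handoff(proxy, c)
print(c.data, dict(proxy.hints))   # {'x': (1, 'v1')} {}
```

The hints live only on the proxy, so if the proxy fails before calling `handoff`, node `c` stays stale until some other mechanism repairs it.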

(D, Read One Write One) Because the node holding the hinted handoff may itself fail before it passes the update on, consistency must in this case be enforced through so-called read repair. Each read operation starts an asynchronous process that requests a digest of the data (such as a signature or hash) from all nodes that store it; if the digests returned by the nodes differ, the data versions on those nodes are reconciled. We use the name "Read One Write One" for the combination of techniques A, B, C, and D: none of them provides strict consistency guarantees, but they can be used in practice as a self-contained approach.
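Read repair can be sketched as follows. This is an illustration, assuming replicas are plain dicts mapping a key to a `(version, value)` pair and that the highest version is the correct one; the SHA-256 digest and the names `digest` and `read_with_repair` are choices made for this sketch. For simplicity, the repair runs inline rather than as a separate asynchronous process.

```python
import hashlib


def digest(version, value):
    """Compact fingerprint of one replica's copy, so nodes need not ship full values."""
    return hashlib.sha256(f"{version}:{value!r}".encode()).hexdigest()


def read_with_repair(replicas, key):
    """Read One: answer from the first replica, then reconcile all copies by digest.
    (A real system would run the repair asynchronously; here it runs inline.)"""
    answer = replicas[0].get(key, (0, None))
    states = [r.get(key, (0, None)) for r in replicas]
    if len({digest(*s) for s in states}) > 1:           # digests disagree
        newest = max(states, key=lambda s: s[0])        # highest version wins
        for r in replicas:
            if r.get(key, (0, None)) != newest:
                r[key] = newest
    return answer[1]


r1, r2, r3 = {"x": (1, "old")}, {"x": (2, "new")}, {}   # r3 never got the write
print(read_with_repair([r1, r2, r3], "x"))  # old  (Read One can return stale data)
print(r1, r2, r3)  # {'x': (2, 'new')} {'x': (2, 'new')} {'x': (2, 'new')}
print(read_with_repair([r1, r2, r3], "x"))  # new
```

The first read still returns stale data, which is why this combination gives no strict guarantee, but it repairs every replica so that subsequent reads converge.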

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
