Consistency of Distributed Databases


Databases are generally divided into the following types: relational (transactional) databases, represented by Oracle and MySQL; key-value databases, represented by Redis and Memcached; document databases, such as MongoDB; column-oriented databases, represented by HBase, Cassandra, and Dynamo; as well as graph databases, object databases, and XML databases.

Some of these databases are distributed by design, such as the column-oriented databases above; others use a single-node architecture, such as Redis, MongoDB, and the relational databases.

However, for the availability of the whole system, the single-node databases also provide distributed deployment mechanisms, such as Redis master-slave replication, MongoDB master-slave and replica sets, and the master-slave structures of MySQL and Oracle.

The CAP theorem is often mentioned here: in the presence of a network partition, a system can guarantee only one of availability and consistency. In practice, distributed systems therefore often settle for eventual consistency (BASE).

The following is a brief summary.

1. Data Distribution Mode

A. Master-slave Replication

The master-slave mode is the most commonly used. Each slave holds a full copy of the master's data, and the typical scenario is that the master handles writes while multiple slaves handle reads. The consistency between master and slave can be either strong or eventual, depending on the specific requirements of the application (a trade-off between consistency and availability).

In weak-consistency scenarios, to meet availability requirements, asynchronous log shipping is used, such as the binlog of the MySQL replication mechanism, the archive log of Oracle, and the oplog in MongoDB.
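The following is a minimal sketch (in Python, with hypothetical class and method names, not the code of any of the databases above) of how asynchronous log shipping works: the master appends each write to a log and returns immediately, while a background thread ships log entries to the slaves, so a read from a slave may briefly lag behind.

```python
import queue
import threading
import time

class Master:
    """Accepts writes, appends them to a log, and ships the log asynchronously."""
    def __init__(self, slaves):
        self.data = {}
        self.log = queue.Queue()          # replication log, like a binlog/oplog
        self.slaves = slaves
        threading.Thread(target=self._ship_log, daemon=True).start()

    def write(self, key, value):
        self.data[key] = value            # apply locally
        self.log.put((key, value))        # append to the replication log
        return "ok"                       # return before any slave is updated

    def _ship_log(self):
        while True:
            key, value = self.log.get()   # next log entry (blocking)
            for slave in self.slaves:
                slave.apply(key, value)   # asynchronous apply on each slave

class Slave:
    """Serves reads; its data may lag the master until the log catches up."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

slaves = [Slave(), Slave()]
master = Master(slaves)
master.write("x", 1)
print(slaves[0].read("x"))   # may print None (replication lag) or 1
time.sleep(0.1)
print(slaves[0].read("x"))   # eventually prints 1
```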

B. Sharding Mechanism

When a single node cannot hold all the data, the data must be split; this is the sharding mechanism. There are many ways to split data, such as consistent hashing and custom routing policies.

Consistent hashing has the advantage that when data nodes join or leave, only a small range of data is affected. Data migration still has to be considered when nodes join or go down, and for availability multiple replicas of the data are kept on top of the sharding. The hinted handoff technique ensures that write operations are not greatly affected after a node failure. To distribute data evenly, each physical node can be virtualized into multiple logical (virtual) nodes, which are placed on the ring. Node health detection can be managed and maintained with gossip or ZooKeeper. Dynamo and Cassandra both distribute their shards using consistent hashing.
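A minimal consistent-hashing sketch with virtual nodes (hypothetical class and method names, not taken from Dynamo or Cassandra): each physical node is hashed onto the ring many times, and a key is served by the first virtual node clockwise from its hash, so removing a node only moves that node's keys.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.positions = []        # sorted ring positions
        self.owner = {}            # ring position -> physical node
        for node in nodes:
            self.add_node(node, vnodes)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node, vnodes=100):
        for i in range(vnodes):                   # virtual nodes smooth out the load
            pos = self._hash(f"{node}#{i}")
            self.owner[pos] = node
            bisect.insort(self.positions, pos)

    def remove_node(self, node):
        for pos in [p for p, n in self.owner.items() if n == node]:
            del self.owner[pos]
            self.positions.remove(pos)

    def get_node(self, key):
        """Walk clockwise from the key's hash to the first virtual node."""
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.positions)
        return self.owner[self.positions[idx]]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("user:42"))      # e.g. 'node-a'
ring.remove_node("node-b")
print(ring.get_node("user:42"))      # only keys owned by node-b move elsewhere
```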

Custom routing policies: a master node can be set up to route data according to table-based rules, such as date-based or contiguous range partitions. The routing metadata must be maintained and migrated by the master node, and node health status is also tracked by the master node (or by ZooKeeper). HBase distributes data with a meta table that records the key-range partitions, which is similar to this policy.
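A sketch of range-based routing in the spirit of HBase's meta table (the table contents and names below are hypothetical): the router keeps a sorted list of range start keys and looks up the partition whose contiguous range covers the key.

```python
import bisect

# Hypothetical meta table: (start_key, node serving [start_key, next_start_key)).
META_TABLE = [
    ("",  "node-1"),   # keys before "g"
    ("g", "node-2"),   # keys in ["g", "n")
    ("n", "node-3"),   # keys from "n" onward
]
START_KEYS = [start for start, _ in META_TABLE]

def route(key):
    """Return the node whose contiguous key range contains `key`."""
    idx = bisect.bisect_right(START_KEYS, key) - 1
    return META_TABLE[idx][1]

print(route("apple"))    # node-1
print(route("kiwi"))     # node-2
print(route("zebra"))    # node-3
```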

C. Multi-master Mode

The multi-master mode is rarely used in practice, because data conflicts make consistency and merging complicated. We will not elaborate on it here.

2. Consistency Solutions

A. Strong Consistency:

R + W> N: three nodes are created. When two nodes are read each time, the data is consistent. When two nodes are written, the data is successfully written. This is strongly consistent.

Two-phase commit (2PC) and three-phase commit (3PC): the operation succeeds only if all participating nodes vote to commit; otherwise, a rollback is performed on every node.
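A minimal two-phase-commit sketch (hypothetical classes, ignoring coordinator failures and timeouts): the coordinator first asks every participant to prepare, then commits only if all of them vote yes, otherwise it rolls everyone back.

```python
class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "init"

    def prepare(self):
        # Phase 1: do all the work except making it visible, then vote yes/no.
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if everyone voted yes, otherwise roll back everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"

print(two_phase_commit([Participant("a"), Participant("b")]))                     # committed
print(two_phase_commit([Participant("a"), Participant("b", will_succeed=False)])) # rolled_back
```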

Paxos, similar to 2PC, solves how a distributed system reaches agreement on a value (a resolution) through voting, and it has no fixed master node. The distributed coordination service ZooKeeper implements a Paxos-like algorithm (ZAB) to ensure consistency, and the replica set cluster solution in MongoDB also implements a similar election algorithm.
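A heavily simplified single-decree Paxos sketch (hypothetical, with no networking or failure handling): a proposer first obtains promises from a majority of acceptors, adopts any value already accepted under the highest proposal number, and then asks the majority to accept it, so a chosen value can never be overturned.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest proposal number promised
        self.accepted_n = -1        # proposal number of the accepted value
        self.accepted_v = None      # the accepted value, if any

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered lower than n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, v):
        """Phase 2b: accept the value unless a higher promise was made since."""
        if n >= self.promised:
            self.promised, self.accepted_n, self.accepted_v = n, n, v
            return True
        return False

def propose(acceptors, n, value):
    majority = len(acceptors) // 2 + 1
    # Phase 1a: send prepare(n) to all acceptors and collect promises.
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) < majority:
        return None                              # no majority of promises
    # If some acceptor already accepted a value, we must propose that value.
    prior = [(an, av) for an, av in granted if av is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a: ask all acceptors to accept (n, value).
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, n=1, value="A"))    # 'A' is chosen
print(propose(acceptors, n=2, value="B"))    # still 'A': later proposals keep the chosen value
```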

B. Weak Consistency (Eventual Consistency):

Due to factors such as network latency during data synchronization, the replicas in a distributed system cannot be consistent with the master at every moment. When inconsistencies occur, the following policies can be used to ensure eventual consistency:

Gossip (Cassandra, Dynamo) is a redundant, fault-tolerant, eventual-consistency algorithm: it cannot guarantee that all nodes are consistent at any given moment, and it is fully decentralized. Each node in the cluster maintains a set of states, where each state can be represented as a key, a value, and a version number; a larger version number means newer data. Nodes exchange version information with each other and update their data, spreading like a virus, so the data eventually becomes consistent. Cassandra uses this policy both to synchronize data and to maintain the health status of nodes.
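A simplified gossip-round sketch (hypothetical names, not Cassandra's implementation): each node keeps a map of key to (value, version); in every round it exchanges state with one random peer, and both sides keep the entry with the higher version number, so an update spreads epidemically.

```python
import random

class GossipNode:
    def __init__(self, name):
        self.name = name
        self.state = {}                 # key -> (value, version)

    def put(self, key, value, version):
        current = self.state.get(key, (None, 0))
        if version > current[1]:        # keep only the newer version
            self.state[key] = (value, version)

    def gossip_with(self, peer):
        # Push-pull exchange: both sides end up with the newer version of every key.
        for key, (value, version) in list(self.state.items()):
            peer.put(key, value, version)
        for key, (value, version) in list(peer.state.items()):
            self.put(key, value, version)

nodes = [GossipNode(f"n{i}") for i in range(5)]
nodes[0].put("config", "v2", version=2)        # only n0 knows the newest value at first

for _ in range(10):                            # a few random rounds spread the update
    a, b = random.sample(nodes, 2)
    a.gossip_with(b)

print([n.state.get("config") for n in nodes])  # most or all nodes converge to ('v2', 2)
```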

Vector clock (Dynamo) is a policy for resolving conflicts caused by inconsistent data. Because the system uses an optimistic policy when operating on the same value, multiple versions may appear, and vector clocks are used to reconcile them. Each element of the clock is a pair (node that updated the value, sequence number), and this information is carried with every update. When two versions have clocks that cannot be ordered, such as the concurrent versions D3 and D4 in the Dynamo paper's example, the conflict is detected and resolved during the next operation. Dynamo uses this policy to resolve conflicts.
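A minimal vector-clock sketch (hypothetical helper functions): each replica increments its own entry on every update, and two versions conflict when neither clock dominates the other, which is exactly the D3/D4 situation from the Dynamo example.

```python
def increment(clock, node):
    """Return a new clock with `node`'s counter incremented."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    """True if clock `a` is equal to or newer than clock `b` for every node."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def compare(a, b):
    if descends(a, b):
        return "a >= b"
    if descends(b, a):
        return "b >= a"
    return "conflict"          # concurrent updates: the application must reconcile

d1 = increment({}, "sx")                     # D1([Sx, 1])
d2 = increment(d1, "sx")                     # D2([Sx, 2])
d3 = increment(d2, "sy")                     # D3([Sx, 2], [Sy, 1])
d4 = increment(d2, "sz")                     # D4([Sx, 2], [Sz, 1])
print(compare(d2, d1))     # 'a >= b' - D2 descends from D1, no conflict
print(compare(d3, d4))     # 'conflict' - concurrent versions, like D3/D4 in Dynamo
```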

 

Timestamp (Cassandra): a timestamp is attached to every update, and the value with the latest timestamp wins when a conflict must be resolved (last-write-wins). Cassandra is the representative of this approach.
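A tiny last-write-wins sketch (hypothetical helper, not Cassandra code): each replica's copy carries a write timestamp, and the copy with the newest timestamp is chosen.

```python
def resolve_by_timestamp(replica_values):
    """Last-write-wins: pick the (value, timestamp) pair with the newest timestamp."""
    return max(replica_values, key=lambda pair: pair[1])[0]

# Three replicas hold different copies of the same column after a partition heals.
copies = [("blue", 1700000001), ("green", 1700000005), ("blue", 1699999999)]
print(resolve_by_timestamp(copies))   # 'green' - its write carries the latest timestamp
```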

Merkle tree (Cassandra, Dynamo): each node builds a Merkle tree over the data in each of its key ranges. When two nodes compare data, the comparison starts from the roots of their Merkle trees: if the roots match, the two replicas are consistent and no processing is needed; if they differ, the trees are traversed to quickly locate the inconsistent ranges, which greatly reduces comparison time and the amount of data transferred. Dynamo and Cassandra use this scheme for replica synchronization.
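A simplified Merkle-tree comparison sketch (hypothetical structure, assuming a power-of-two number of buckets): each leaf hashes one key-range bucket and each internal node hashes its children; the roots are compared first and, only if they differ, the leaf hashes are compared to find the divergent buckets (a real implementation would descend level by level).

```python
import hashlib

def h(data):
    return hashlib.sha256(data.encode()).hexdigest()

def build_merkle(buckets):
    """buckets: serialized contents of each key range; returns tree levels, leaves first."""
    level = [h(b) for b in buckets]           # assumes a power-of-two bucket count
    tree = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree                               # tree[-1][0] is the root hash

def diff_ranges(tree_a, tree_b):
    """Compare roots first; if they differ, compare leaf hashes to find divergent buckets."""
    if tree_a[-1][0] == tree_b[-1][0]:
        return []                             # roots match: replicas already in sync
    return [i for i, (x, y) in enumerate(zip(tree_a[0], tree_b[0])) if x != y]

replica_a = ["k1=1,k2=2", "k3=3,k4=4", "k5=5,k6=6", "k7=7,k8=8"]
replica_b = ["k1=1,k2=2", "k3=3,k4=9", "k5=5,k6=6", "k7=7,k8=8"]   # one bucket diverged

tree_a, tree_b = build_merkle(replica_a), build_merkle(replica_b)
print(diff_ranges(tree_a, tree_a))   # [] - identical replicas, nothing to transfer
print(diff_ranges(tree_a, tree_b))   # [1] - only bucket 1 needs to be repaired
```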

 

 
