Algorithms and Protocols in Dynamo: P2P Architecture, Consistent Hashing for Fault Tolerance, the Gossip Protocol for Cluster State, and Vector Clocks for Data Synchronization

Reposted from: http://www.letiantian.me/2014-06-16-dynamo-algorithm-protocol/

Dynamo is Amazon's distributed key-value store. It has no master/slave distinction, and its replicas are eventually consistent. Apache Cassandra borrows from its design.

Consistent hashing

For the specifics of consistent hashing, see a separate article on the topic.

Fault tolerance

Because Dynamo uses consistent hashing, the nodes of a cluster can be viewed logically as a ring. Suppose there are M nodes, numbered 1, 2, 3, ..., M clockwise starting from an arbitrary node. For fault tolerance, assume each piece of data is stored as 3 replicas. If consistent hashing maps a piece of data to node 2, its other two replicas are stored on node 3 and node 4. If node 3 goes down temporarily, new writes destined for it are stored on node 5 until node 3 recovers. Once node 5 learns through the gossip protocol that node 3 is back, it hands the temporarily stored data back to node 3. Deciding whether node 3's outage is temporary or permanent is fairly simple: look at how long the node has been down. If node 3 is down permanently, the full copy of this data must be synchronized to node 5 by some efficient mechanism.
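The ring placement and the temporary hand-off described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dynamo's implementation: the class name `Ring`, the MD5-based `key_position`, the explicit token positions, and the in-memory `hints` store are all hypothetical, and learning that a node is back (which the text attributes to gossip) is reduced to a direct `recover()` call. Permanent failures are not modeled.

```python
import bisect
import hashlib
from typing import Optional


def key_position(key: str, ring_size: int = 2**32) -> int:
    """Hash a key onto the ring (MD5 is an illustrative choice, not Dynamo's spec)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size


class Ring:
    def __init__(self, tokens: dict[str, int], n_replicas: int = 3):
        # tokens: node name -> position on the ring (hypothetical, chosen by the caller)
        self.nodes = sorted(tokens, key=tokens.get)  # clockwise order
        self.tokens = [tokens[n] for n in self.nodes]
        self.n = n_replicas
        self.down: set[str] = set()
        self.store: dict[str, dict[str, str]] = {n: {} for n in self.nodes}
        # stand-in node -> list of (intended owner, key, value) held temporarily
        self.hints: dict[str, list[tuple[str, str, str]]] = {}

    def preference_list(self, pos: int) -> list[str]:
        """The first node clockwise from pos, plus its successors, hold the N replicas."""
        start = bisect.bisect_left(self.tokens, pos) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.n)]

    def put(self, key: str, value: str, pos: Optional[int] = None) -> None:
        pos = key_position(key) if pos is None else pos
        owners = self.preference_list(pos)
        # Candidate stand-ins: nodes further clockwise past the last owner.
        last = self.nodes.index(owners[-1])
        candidates = (self.nodes[(last + 1 + i) % len(self.nodes)]
                      for i in range(len(self.nodes)))
        for owner in owners:
            if owner not in self.down:
                self.store[owner][key] = value
                continue
            stand_in = next((c for c in candidates
                             if c not in self.down and c not in owners), None)
            if stand_in is None:
                raise RuntimeError("no healthy node available to hold the write")
            self.hints.setdefault(stand_in, []).append((owner, key, value))

    def recover(self, node: str) -> None:
        """Called once `node` is known to be back: stand-ins return its data."""
        self.down.discard(node)
        for stand_in, hinted in self.hints.items():
            for owner, key, value in hinted:
                if owner == node:
                    self.store[node][key] = value
            self.hints[stand_in] = [h for h in hinted if h[0] != node]


ring = Ring({f"node{i}": i * 100 for i in range(1, 6)})  # node1..node5 clockwise
print(ring.preference_list(150))    # ['node2', 'node3', 'node4']
ring.down.add("node3")              # node3 goes down temporarily
ring.put("user:1", "jian", pos=150)
print(ring.hints)                   # {'node5': [('node3', 'user:1', 'jian')]}
ring.recover("node3")
print(ring.store["node3"])          # {'user:1': 'jian'}
```

The usage lines replay the scenario from the text: data owned by nodes 2, 3, and 4, node 3 temporarily down, node 5 standing in and later handing the write back.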

Gossip protocol

The gossip protocol (literally, a "rumor" protocol) is used mainly to let every node learn the latest state of the cluster. The idea is essentially:

With a given frequency, each machine picks another machine at random and shares any hot rumors.

Nodes exchange information at a fixed interval. In each exchange, a node randomly selects another node in the cluster; the two share what each knows about the cluster, and each updates its view to the latest (or at least a newer) cluster state.
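A toy simulation of this exchange, assuming each node's "state" is just a version number and that both sides of an exchange adopt the merged view (a push-pull style). The function names and the version-number model are hypothetical, not taken from Dynamo:

```python
import random


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Combine two views of the cluster, keeping the newer version of each node's state."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def gossip_round(views: dict[str, dict[str, int]], rng: random.Random) -> None:
    """One round: every node picks a random peer and both adopt the merged view."""
    names = list(views)
    for node in names:
        peer = rng.choice([p for p in names if p != node])
        merged = merge(views[node], views[peer])
        views[node], views[peer] = merged, dict(merged)


def simulate(num_nodes: int, seed: int = 0) -> tuple[int, dict[str, dict[str, int]]]:
    """Run rounds until every node knows the state of every other node."""
    rng = random.Random(seed)
    # Initially each node knows only its own state, at version 1.
    views = {f"node{i}": {f"node{i}": 1} for i in range(num_nodes)}
    rounds = 0
    while any(len(view) < num_nodes for view in views.values()):
        gossip_round(views, rng)
        rounds += 1
    return rounds, views


rounds, views = simulate(16)
print(f"all 16 nodes share the full cluster view after {rounds} rounds")
```

The round count depends on the random seed. In push-pull gossip it typically grows roughly logarithmically with the number of nodes, which is one reason the approach suits large clusters.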

NWR

N is the number of replicas of each piece of data. W is the minimum number of replicas that must be written for a write to succeed; that is, a write operation succeeds once at least W replicas have been written successfully. R is the minimum number of replicas that must be read for a read to succeed. Dynamo's guideline is to choose R + W > N, which guarantees that the set of replicas a read contacts overlaps the set that acknowledged the most recent successful write. The values of N, W, and R are configurable: if read performance matters most, set R smaller; if write performance matters most, set W smaller. Even so, NWR alone does not guarantee consistent data. If R = N and W = N, consistency can be guaranteed.
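The overlap argument behind R + W > N can be checked by brute force. The sketch assumes each replica stores a simple (version, value) pair, which is a simplification (Dynamo itself versions data with vector clocks); the function names are hypothetical:

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Exhaustively check that every R-replica read set meets every W-replica write set."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


def quorum_read(replicas: list[tuple[int, str]], read_set: list[int]) -> str:
    """Return the value with the highest version among the replicas actually read."""
    return max(replicas[i] for i in read_set)[1]


# The brute-force check agrees with the R + W > N rule for small clusters.
for n in range(1, 6):
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            assert quorums_always_overlap(n, w, r) == (r + w > n)

# N = 3, W = 2: version 2 ('fan') reached replicas 0 and 1; replica 2 still has version 1.
replicas = [(2, "fan"), (2, "fan"), (1, "jian")]
print(quorum_read(replicas, [1, 2]))  # fan  (R = 2, R + W > N: overlaps the write set)
print(quorum_read(replicas, [2]))     # jian (R = 1, R + W = N: may miss the latest write)
```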

Vector clock

In small distributed systems, or systems with modest requirements, timestamps can be used to keep data consistent across replicas. In the timestamp approach, clocks are synchronized with the NTP protocol, so the clock error between nodes is small, though not zero. In large-scale distributed systems, however, a different approach works better.
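A minimal sketch of the timestamp approach as last-write-wins, assuming each write carries the writer's wall-clock time. The `Versioned` type, the tie-break by value, and the 0.5 s skew are illustrative assumptions. It shows why even a small clock error matters: the write stamped later wins, whether or not it actually happened later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # writer's local wall-clock time (assumed NTP-synchronized)


def last_write_wins(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the later timestamp; break ties by value so all replicas agree."""
    return max(a, b, key=lambda v: (v.timestamp, v.value))


# Node A's clock runs 0.5 s fast. It writes 'jian' at true time 10.0 (stamped 10.5);
# node B writes 'fan' later, at true time 10.2, with an accurate clock.
kept = last_write_wins(Versioned("jian", 10.5), Versioned("fan", 10.2))
print(kept.value)  # jian: the clock error made the earlier write win
```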

The vector clock is the approach Amazon Dynamo uses to deal with data consistency. It is a logical clock. Suppose a piece of data has three replicas, named n1, n2, and n3. Each replica records the clocks of all replicas (including its own) as a vector, so the three replicas hold three vectors in total. The so-called clock is really just the version number of the stored data, usually starting at 0 and increasing. The rules for updating the clocks are as follows:

    • All clocks are initialized to 0.
    • When a replica updates its data, it increments its own entry in its own vector by a step, usually 1.
    • When a replica sends a message to another replica (typically to synchronize data), it sends its own vector along with the message.
    • When a replica receives a message, it compares its own vector with the received one. If the message is a data synchronization, the replica must decide whether to update its data; either way, it updates its own vector by taking the element-wise maximum of the two vectors. How is the data decision made? If every element of the replica's own vector is less than or equal to the corresponding element of the received vector (and the two are not identical), the received data is newer, so the replica updates its data. If every element is greater than or equal, the data does not need to be updated. A third case is that some elements are greater and others smaller; yet another is that the vectors are identical but the data differs. In these cases a conflict must be resolved, for example by comparing timestamps.
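The four rules can be written as a few small functions. This is a sketch under assumptions: clocks are plain dicts mapping replica names to counters, the names `increment`, `merge`, `compare`, and `receive` are hypothetical, and on a conflict `receive` simply returns both values and leaves resolution (for example by timestamp) to the caller.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"          # local vector is older than the incoming one
    AFTER = "after"            # local vector is newer than the incoming one
    CONCURRENT = "concurrent"  # some entries greater, some smaller: no order


def increment(clock: dict[str, int], replica: str, step: int = 1) -> dict[str, int]:
    """Rule 2: a replica that updates its data bumps its own entry by `step`."""
    new = dict(clock)
    new[replica] = new.get(replica, 0) + step
    return new


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Rule 4: take the element-wise maximum of the two vectors."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def compare(local: dict[str, int], incoming: dict[str, int]) -> Order:
    keys = local.keys() | incoming.keys()
    less = any(local.get(k, 0) < incoming.get(k, 0) for k in keys)
    greater = any(local.get(k, 0) > incoming.get(k, 0) for k in keys)
    if less and greater:
        return Order.CONCURRENT
    if less:
        return Order.BEFORE
    if greater:
        return Order.AFTER
    return Order.EQUAL


def receive(local_clock, local_data, msg_clock, msg_data):
    """Rule 4 applied to a synchronization message.

    Returns (merged_clock, data, conflict). On a conflict, `data` is the pair
    (local_data, msg_data) and the caller must resolve it, e.g. by timestamp.
    """
    order = compare(local_clock, msg_clock)
    merged = merge(local_clock, msg_clock)
    if order is Order.BEFORE:
        return merged, msg_data, False       # incoming data is newer: take it
    if order is Order.AFTER or (order is Order.EQUAL and local_data == msg_data):
        return merged, local_data, False     # local data is newer or identical
    return merged, (local_data, msg_data), True


print(compare({"n1": 2, "n2": 0, "n3": 0}, {"n1": 1, "n2": 0, "n3": 1}).value)  # concurrent
```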

Let me give you an example.

Suppose n1, n2, and n3 store the nickname of the user whose ID is 1.
Initially, the vector clocks and data of the three replicas are:

n1: { vector: {n1:0, n2:0, n3:0}, data: null }
n2: { vector: {n1:0, n2:0, n3:0}, data: null }
n3: { vector: {n1:0, n2:0, n3:0}, data: null }

At time 1, n1 updates the user's nickname to 'jian', and the vector clocks and data become:

n1: { vector: {n1:1, n2:0, n3:0}, data: 'jian' }
n2: { vector: {n1:0, n2:0, n3:0}, data: null }
n3: { vector: {n1:0, n2:0, n3:0}, data: null }

A read of the system at this point should return 'jian'. n1 then sends messages to n2 and n3, and the state becomes:

n1: { vector: {n1:1, n2:0, n3:0}, data: 'jian' }
n2: { vector: {n1:1, n2:0, n3:0}, data: 'jian' }
n3: { vector: {n1:1, n2:0, n3:0}, data: 'jian' }

A read at this point should also return 'jian'.

At time 2, n3 changes the user's nickname to 'fan', and the state becomes:

n1: { vector: {n1:1, n2:0, n3:0}, data: 'jian' }
n2: { vector: {n1:1, n2:0, n3:0}, data: 'jian' }
n3: { vector: {n1:1, n2:0, n3:1}, data: 'fan' }

A read should now return 'fan'. n3 first sends a message to n2, and the state becomes:

n1: { vector: {n1:1, n2:0, n3:0}, data: 'jian' }
n2: { vector: {n1:1, n2:0, n3:1}, data: 'fan' }
n3: { vector: {n1:1, n2:0, n3:1}, data: 'fan' }

Before n3 sends its message to n1, n1 changes the data, for example setting the user's nickname to 'ruan'. The state becomes:

n1: { vector: {n1:2, n2:0, n3:0}, data: 'ruan' }
n2: { vector: {n1:1, n2:0, n3:1}, data: 'fan' }
n3: { vector: {n1:1, n2:0, n3:1}, data: 'fan' }

From here, a conflict can surface in two ways:

    • On a read, the system finds that the vectors of n2 and n3 and the vector of n1 have no partial-order relation (neither is less than or greater than the other), and the stored values differ. The conflict must be resolved at this point.
    • n1 receives the message sent by n3, compares the two vectors, detects the conflict, and then has to resolve it.
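The walk-through can be replayed in code as a check. In this sketch the `Replica` class, the `send_to` method, and the per-write timestamps (1, 2, 3) are hypothetical; the timestamp comparison is only the example tie-break the text mentions, applied when neither vector dominates.

```python
class Replica:
    """One replica from the walk-through: a vector clock plus the stored nickname."""

    def __init__(self, name: str, names: list[str]):
        self.name = name
        self.clock = {n: 0 for n in names}
        self.data = None
        self.ts = 0  # hypothetical wall-clock time of the current value's write

    def update(self, value: str, ts: int) -> None:
        self.clock[self.name] += 1  # bump own entry by a step of 1
        self.data, self.ts = value, ts

    def send_to(self, other: "Replica") -> bool:
        """Deliver this replica's clock and data to `other`; return True on conflict."""
        mine, theirs = other.clock, self.clock
        older = any(mine[k] < theirs[k] for k in mine)  # receiver lacks an update
        newer = any(mine[k] > theirs[k] for k in mine)  # receiver has an update the sender lacks
        conflict = (older and newer) or (not older and not newer and other.data != self.data)
        if (older and not newer) or (conflict and self.ts > other.ts):
            other.data, other.ts = self.data, self.ts   # incoming is newer, or wins the tie-break
        other.clock = {k: max(mine[k], theirs[k]) for k in mine}
        return conflict


names = ["n1", "n2", "n3"]
n1, n2, n3 = (Replica(n, names) for n in names)

n1.update("jian", ts=1)              # time 1
n1.send_to(n2)
n1.send_to(n3)
n3.update("fan", ts=2)               # time 2
n3.send_to(n2)
n1.update("ruan", ts=3)              # n1 writes before n3's message arrives
print(n1.clock, n3.clock)            # {'n1': 2, 'n2': 0, 'n3': 0} {'n1': 1, 'n2': 0, 'n3': 1}
print(n3.send_to(n1))                # True: neither vector dominates
print(n1.data, n1.clock)             # ruan {'n1': 2, 'n2': 0, 'n3': 1}
```

The final exchange corresponds to the second conflict case above: n1 detects the conflict on receipt, and the timestamp tie-break keeps 'ruan' because it was written later.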
