Introduction to the vector clock algorithm: essentially similar to MVCC

Source: Internet
Author: User

Transferred from: http://blog.chinaunix.net/uid-27105712-id-5612512.html

I. Background

Let's start with a scenario that calls for a vector clock. When we write data, we often do not want it stored at a single point. For example, DB1 and DB2 can both accept writes at the same time, and each holds the full data set. The client does not have to worry about data getting scrambled, whichever DB it writes to. In real-world scenarios, however, concurrent changes are common, and they leave DB1 and DB2 with inconsistent data. Several solutions have been proposed, and the vector clock is one of them. It is easy to understand, but it is not a complete solution to conflicts; real distributed storage systems supplement it with many additional techniques.

This article works backwards: it first walks through an example so that the reader gets an intuitive feel for the vector clock, and then states the rules of the algorithm.

II. An example

A vector clock is really a set of version numbers (version number = logical clock). Suppose the data must be kept in 3 copies, which requires 3 DBs for storage (denoted a, b, c). The vector then has 3 dimensions: each DB has its own version number, starting from 0, which forms the version vector [a:0, b:0, c:0].
Step 1: In the initial state, every machine holds [a:0, b:0, c:0]:

db_a--> [a:0, b:0, c:0]

db_b--> [a:0, b:0, c:0]

db_c--> [a:0, b:0, c:0]

Step 2: Suppose the application is an online store, and someone enters the price of the iPhone 6 as 5888. The client randomly picks one DB machine to write to. Assume A is selected; the data then looks roughly like this:
{key=iphone_price; value=5888; vclk=[a:1, b:0,c:0]}
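
Step 2 can be sketched in Python as follows. This is only a minimal illustration, not the implementation of any particular database: the function names `new_clock` and `local_write` and the dict-based record layout are assumptions made for this example.

```python
def new_clock(nodes):
    """Build the initial vector clock: one counter per node, all zero."""
    return {node: 0 for node in nodes}


def local_write(clock, node, key, value):
    """Hypothetical helper: apply a client write on `node`.

    Only the writing node's own counter is incremented; the other
    components are copied unchanged.
    """
    updated = dict(clock)
    updated[node] += 1
    return {"key": key, "value": value, "vclk": updated}


initial = new_clock(["a", "b", "c"])
record = local_write(initial, "a", "iphone_price", 5888)
print(record)
# {'key': 'iphone_price', 'value': 5888, 'vclk': {'a': 1, 'b': 0, 'c': 0}}
```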

Step 3: Next, A synchronizes the data to B and C. The final result of the synchronization is as follows:

db_a--> {key=iphone_price; value=5888; vclk=[a:1, b:0, c:0]}

db_b--> {key=iphone_price; value=5888; vclk=[a:1, b:0, c:0]}

db_c--> {key=iphone_price; value=5888; vclk=[a:1, b:0, c:0]}

Step 4: A minute later the price rises to 6888, so a salesperson updates it. This time the system randomly chooses B to take the write, and the result looks like this:

db_a--> {key=iphone_price; value=5888; vclk=[a:1,b:0,c:0]}

db_b--> {key=iphone_price; value=6888; vclk=[a:1,b:1, c:0]}

db_c--> {key=iphone_price; value=5888; vclk=[a:1,b:0,c:0]}

Step 5: B then synchronizes the update to the other stores:

db_a--> {key=iphone_price; value=6888; vclk=[a:1, b:1, c:0]}

db_b--> {key=iphone_price; value=6888; vclk=[a:1,b:1, c:0]}

db_c--> {key=iphone_price; value=6888; vclk=[a:1, b:1, c:0]}

So far synchronization has worked normally. Now let's look at a slightly abnormal situation.

Step 6: The price changes again, to 4000, and this time C is chosen for the write:

db_a--> {key=iphone_price; value=6888; vclk=[a:1, b:1,c:0]}

db_b--> {key=iphone_price; value=6888; vclk=[a:1,b:1,c:0]}

db_c--> {key=iphone_price; value=4000; vclk=[a:1, b:1,c:1]}

Step 7: C synchronizes the update to A and B, but because of some problem it only reaches A. The result is as follows:

db_a--> {key=iphone_price; value=4000; vclk=[a:1, b:1, c:1]}

db_b--> {key=iphone_price; value=6888; vclk=[a:1,b:1,c:0]}

db_c--> {key=iphone_price; value=4000; vclk=[a:1, b:1,c:1]}

Step 8: The price changes again, to 6000, and the system chooses B for the write:

db_a--> {key=iphone_price; value=4000; vclk=[a:1, b:1, c:1]}

db_b--> {key=iphone_price; value=6000; vclk=[a:1,b:2, c:0]}

db_c--> {key=iphone_price; value=4000; vclk=[a:1, b:1,c:1]}

Step 9: The problem appears when B synchronizes its update to A and C. A's own vector clock is [a:1, b:1, c:1], while the vector clock carried by the incoming update message is [a:1, b:2, c:0]. Here b:2 is newer than b:1, but c:0 is older than c:1, so the two versions conflict. How should the inconsistency be resolved? The vector clock itself does not say. It only tells you that the current data is in conflict and leaves the resolution to the user.
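
The check in Step 9 can be sketched as a component-wise comparison. This is a minimal illustration under assumptions: the function name `compare` and its string return values are invented for this example and do not come from any specific library.

```python
def compare(local, remote):
    """Classify the local clock relative to the remote one.

    Returns "older", "newer", "equal" or "conflict". A conflict means
    some components are larger locally and others are larger remotely.
    """
    nodes = set(local) | set(remote)
    local_behind = any(local.get(n, 0) < remote.get(n, 0) for n in nodes)
    local_ahead = any(local.get(n, 0) > remote.get(n, 0) for n in nodes)
    if local_behind and local_ahead:
        return "conflict"
    if local_behind:
        return "older"
    if local_ahead:
        return "newer"
    return "equal"


a_clock = {"a": 1, "b": 1, "c": 1}    # A's clock in Step 9
msg_clock = {"a": 1, "b": 2, "c": 0}  # clock carried by B's update
print(compare(a_clock, msg_clock))    # conflict
```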

III. The rules

There are really only two rules for how the version numbers change, and both are fairly simple.

1. Each time a node modifies the data, it adds 1 to its own version number. In Step 8 above, for example, the write goes to B, so b:1 becomes b:2, and the version numbers of the other nodes do not change.

2. Each time data is synchronized (note that a synchronization is not the same thing as a modifying write), there are three possible cases:

Case A: The vector version of this node is lower than or equal to the vector version carried by the message, meaning every component is less than or equal. For example, this node is at [a:1, b:2, c:3] and the message carries [a:1, b:2, c:4] or [a:2, b:3, c:4]. The merge rule then takes the maximum of each component.

Case B: The vector version of this node is higher than the vector version carried by the message. The local data can be considered newer than the data being synchronized, so the incoming version is simply discarded.

Case C: There is a conflict, as in Step 9 above. Some components are larger and some are smaller, so it is impossible to tell which version is the latest. Conflict arbitration is needed.
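
The two rules above can be put together in a short Python sketch. It is an illustration under assumptions rather than a reference implementation: the names `local_write` and `on_sync`, the record layout, and the status strings are all hypothetical, and taking the incoming value in case A is the natural reading of the merge rule, not something the rules spell out.

```python
def local_write(record, node, value):
    """Rule 1: a modification bumps only the writing node's counter."""
    clock = dict(record["vclk"])
    clock[node] = clock.get(node, 0) + 1
    return {"key": record["key"], "value": value, "vclk": clock}


def on_sync(local, incoming):
    """Rule 2: handle a synchronization message; returns (record, status)."""
    lc, ic = local["vclk"], incoming["vclk"]
    nodes = sorted(set(lc) | set(ic))
    local_behind = any(lc.get(n, 0) < ic.get(n, 0) for n in nodes)
    local_ahead = any(lc.get(n, 0) > ic.get(n, 0) for n in nodes)
    if local_behind and local_ahead:
        return local, "conflict"    # case C: needs arbitration
    if local_ahead:
        return local, "discarded"   # case B: local data is newer
    # Case A: local <= incoming, so take the maximum of each component.
    merged = {n: max(lc.get(n, 0), ic.get(n, 0)) for n in nodes}
    accepted = {"key": incoming["key"], "value": incoming["value"], "vclk": merged}
    return accepted, "accepted"


# Replaying the end of the example: B missed C's update, then wrote 6000.
b_before = {"key": "iphone_price", "value": 6888, "vclk": {"a": 1, "b": 1, "c": 0}}
a_local = {"key": "iphone_price", "value": 4000, "vclk": {"a": 1, "b": 1, "c": 1}}
b_after = local_write(b_before, "b", 6000)

print(b_after["vclk"])                 # {'a': 1, 'b': 2, 'c': 0}
print(on_sync(b_before, a_local)[1])   # accepted
print(on_sync(a_local, b_before)[1])   # discarded
print(on_sync(a_local, b_after)[1])    # conflict
```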

IV. Conflict resolution

In fact, there is no particularly good way to resolve conflicts. As far as I know, adding a timestamp is one strategy. Concretely, one more dimension is added to the vector: the timestamp of the data update, for example [a:1, b:2, c:4, ts:123434354]. If a conflict occurs, the ts values of the two versions are compared. The larger value is the later update, so that version is chosen as the final data, and the vector clock is revised accordingly.
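
The timestamp strategy can be sketched as below. Several details are assumptions, since the text does not specify them: the function name `resolve_by_timestamp` is hypothetical, the vector clock is "revised" by taking the component-wise maximum, ties go to the local copy, and the ts values 100 and 200 are made up for the example. The approach also depends on the nodes' wall clocks being reasonably well synchronized.

```python
def resolve_by_timestamp(local, incoming):
    """Pick the version with the larger ts and merge the two clocks."""
    # Assumption: on equal timestamps the local copy is kept.
    winner = local if local["ts"] >= incoming["ts"] else incoming
    nodes = sorted(set(local["vclk"]) | set(incoming["vclk"]))
    # Assumption: the revised clock is the component-wise maximum.
    merged = {
        n: max(local["vclk"].get(n, 0), incoming["vclk"].get(n, 0))
        for n in nodes
    }
    return {
        "key": winner["key"],
        "value": winner["value"],
        "vclk": merged,
        "ts": winner["ts"],
    }


# The conflicting versions from Step 9, with invented timestamps.
on_a = {"key": "iphone_price", "value": 4000,
        "vclk": {"a": 1, "b": 1, "c": 1}, "ts": 100}
from_b = {"key": "iphone_price", "value": 6000,
          "vclk": {"a": 1, "b": 2, "c": 0}, "ts": 200}

resolved = resolve_by_timestamp(on_a, from_b)
print(resolved["value"], resolved["vclk"])
# 6000 {'a': 1, 'b': 2, 'c': 1}
```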
