Vector clock algorithm introduction

Source: Internet
Author: User

I. Background

First, consider the scenario in which a vector clock is needed. When writing data, we usually do not want it stored at a single point. For example, db1 and db2 both accept writes and each holds a full copy of the data, so the client can write to either one and, ideally, need not worry about the copies diverging. In real systems, however, concurrent modifications are common, and db1 and db2 end up inconsistent. Several solutions have been proposed, and the vector clock is one of them. It is easy to understand, but it does not resolve conflicts by itself; it only detects them. Real-world distributed storage systems combine it with many additional techniques.


Here we introduce the vector clock in reverse order: first a practical example to give readers an intuitive feel, then the rules of the algorithm.

II. Example

A vector clock is simply a set of version numbers (version number = logical clock). Assume the data must be stored in three copies, so three databases are required (called A, B, and C). The vector then has dimension 3, and each database has its own version number starting from 0, forming the version vector [A: 0, B: 0, C: 0].

Step 1: In the initial state, every machine holds [A: 0, B: 0, C: 0]:

DB_A --> [A: 0, B: 0, C: 0]

DB_B --> [A: 0, B: 0, C: 0]

DB_C --> [A: 0, B: 0, C: 0]

Step 2: Assume the application is an online store and the price of the iPhone 6 is currently 5888. The client picks a database at random to write to; assume A is selected. The stored data looks roughly like this:
{Key = iphone_price; value = 5888; vclk = [A: 1, B: 0, C: 0]}
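The write in step 2 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the dictionary layout and the function name `local_write` are assumptions made for this example and do not come from any particular database.

```python
def local_write(record, node, new_value):
    """Return a new record written at `node` (hypothetical helper).

    Only the writing node's own counter is incremented; the counters
    of the other nodes are copied unchanged.
    """
    vclk = dict(record["vclk"])          # copy so the old record is untouched
    vclk[node] = vclk.get(node, 0) + 1   # bump the local version number
    return {"key": record["key"], "value": new_value, "vclk": vclk}


# Initial state from step 1: no value yet, all counters at 0.
initial = {"key": "iphone_price", "value": None, "vclk": {"A": 0, "B": 0, "C": 0}}

# Step 2: the client writes the price 5888 to node A.
after_step2 = local_write(initial, "A", 5888)
print(after_step2["vclk"])
```

Returning a fresh record rather than mutating the old one keeps the earlier version available, which is convenient when versions have to be compared later.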


Step 3: A synchronizes the data to B and C. The result after synchronization is:

DB_A --> {Key = iphone_price; value = 5888; vclk = [A: 1, B: 0, C: 0]}

DB_B --> {Key = iphone_price; value = 5888; vclk = [A: 1, B: 0, C: 0]}

DB_C --> {Key = iphone_price; value = 5888; vclk = [A: 1, B: 0, C: 0]}


Step 4: A minute later the price rises to 6888, so a salesperson updates it. This time the system happens to choose B for the write, so the result looks like this:

DB_A --> {Key = iphone_price; value = 5888; vclk = [A: 1, B: 0, C: 0]}

DB_B --> {Key = iphone_price; value = 6888; vclk = [A: 1, B: 1, C: 0]}

DB_C --> {Key = iphone_price; value = 5888; vclk = [A: 1, B: 0, C: 0]}


Step 5: B synchronizes the update to the other replicas:

DB_A --> {Key = iphone_price; value = 6888; vclk = [A: 1, B: 1, C: 0]}

DB_B --> {Key = iphone_price; value = 6888; vclk = [A: 1, B: 1, C: 0]}

DB_C --> {Key = iphone_price; value = 6888; vclk = [A: 1, B: 1, C: 0]}


So far, synchronization has worked normally. The following steps show an abnormal case.


Step 6: The price changes again, to 4000. This time C is chosen for the write:

DB_A --> {Key = iphone_price; value = 6888; vclk = [A: 1, B: 1, C: 0]}

DB_B --> {Key = iphone_price; value = 6888; vclk = [A: 1, B: 1, C: 0]}

DB_C --> {Key = iphone_price; value = 4000; vclk = [A: 1, B: 1, C: 1]}


Step 7: C synchronizes the update to A and B, but because of some problem only A receives it. The result is:

DB_A --> {Key = iphone_price; value = 4000; vclk = [A: 1, B: 1, C: 1]}

DB_B --> {Key = iphone_price; value = 6888; vclk = [A: 1, B: 1, C: 0]}

DB_C --> {Key = iphone_price; value = 4000; vclk = [A: 1, B: 1, C: 1]}


Step 8: The price changes to 6000 RMB, and the system selects B for the write:

DB_A --> {Key = iphone_price; value = 4000; vclk = [A: 1, B: 1, C: 1]}

DB_B --> {Key = iphone_price; value = 6000; vclk = [A: 1, B: 2, C: 0]}

DB_C --> {Key = iphone_price; value = 4000; vclk = [A: 1, B: 1, C: 1]}


Step 9: B now synchronizes its update to A and C. A's vector clock is [A: 1, B: 1, C: 1], while the update message carries [A: 1, B: 2, C: 0]. B: 2 is newer than B: 1, but C: 0 is older than C: 1, so neither version can be ordered before the other, and a conflict occurs. (C holds the same clock as A and hits the same conflict.) How is the inconsistency resolved? The vector clock does not provide a resolution; it only tells you that the current data is in conflict and leaves the resolution to the user.
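The comparison made in step 9 can be expressed as a component-by-component check. The following is a small sketch under the assumption that a clock is a plain dictionary from node name to counter; the function name `compare` and its return labels are chosen for this example only.

```python
def compare(local, incoming):
    """Classify the `incoming` clock relative to the `local` clock.

    Returns "equal", "newer" (incoming is ahead somewhere and behind
    nowhere), "older" (the reverse), or "conflict" (each side is ahead
    on at least one component).
    """
    nodes = set(local) | set(incoming)
    incoming_ahead = any(incoming.get(n, 0) > local.get(n, 0) for n in nodes)
    local_ahead = any(local.get(n, 0) > incoming.get(n, 0) for n in nodes)
    if incoming_ahead and local_ahead:
        return "conflict"
    if incoming_ahead:
        return "newer"
    if local_ahead:
        return "older"
    return "equal"


# Step 9: A's clock versus the clock carried by B's update message.
a_clock = {"A": 1, "B": 1, "C": 1}
b_update = {"A": 1, "B": 2, "C": 0}
print(compare(a_clock, b_update))  # conflict

# Step 5, for contrast: B's update is strictly ahead of A's older clock.
print(compare({"A": 1, "B": 0, "C": 0}, {"A": 1, "B": 1, "C": 0}))  # newer
```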


III. Rules

The version numbers change according to just two rules, both fairly simple.

1. Each time data is modified, the version number of the current node is incremented by 1. For example, when data is written to B in step 8, B's version number changes from B: 1 to B: 2. The version numbers of the other nodes do not change.

2. Each time data is synchronized (note that synchronization and modification are different write operations), there are three scenarios:

a) The current node's vector is lower than or equal to the vector carried by the message in every component. For example, the node holds [A: 1, B: 2, C: 3] and the message carries [A: 1, B: 2, C: 4] or [A: 2, B: 3, C: 4]. The node accepts the incoming data, and the merged vector takes the maximum of each component.

b) The current node's vector is higher than the vector carried by the message. The local data is then newer than the synchronized data, and the incoming version is simply discarded.

c) There is a conflict. As in step 9 above, some components are larger and some are smaller, so it is impossible to tell which version is the latest. Conflict arbitration is required.
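The three synchronization scenarios can be put together in one receive routine. This is an illustrative sketch only: records are modelled as `(value, clock)` tuples, and the names `dominates` and `on_sync` are hypothetical helpers invented for this example.

```python
def dominates(x, y):
    """True if every component of clock x is >= the same component of y."""
    return all(x.get(n, 0) >= y.get(n, 0) for n in set(x) | set(y))


def on_sync(local, incoming):
    """Apply a synchronized record to the local one.

    Both arguments are (value, clock) tuples. Returns (record, status).
    """
    local_value, local_clock = local
    in_value, in_clock = incoming
    if dominates(in_clock, local_clock):
        # First scenario: local <= incoming. Accept the incoming value and
        # take the component-wise maximum (which here equals in_clock).
        nodes = set(local_clock) | set(in_clock)
        merged = {n: max(local_clock.get(n, 0), in_clock.get(n, 0)) for n in nodes}
        return (in_value, merged), "accepted"
    if dominates(local_clock, in_clock):
        # Second scenario: local data is newer, drop the incoming version.
        return local, "discarded"
    # Third scenario: neither side dominates, arbitration is needed.
    return local, "conflict"


# Clocks from the rule text: the message is ahead, so it is accepted.
record, status = on_sync((6888, {"A": 1, "B": 2, "C": 3}),
                         (4000, {"A": 2, "B": 3, "C": 4}))
print(status, record)

# Step 9 of the example: A receives B's update and detects a conflict.
_, status9 = on_sync((4000, {"A": 1, "B": 1, "C": 1}),
                     (6000, {"A": 1, "B": 2, "C": 0}))
print(status9)
```

In this sketch a conflicting update leaves the local record unchanged and only reports the status; a real system would typically keep both versions until one is chosen.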


IV. Conflict resolution

There is no ideal way to resolve a conflict. As far as I know, one policy is to add a timestamp: attach one more piece of information, the time of the data update, to the vector, for example [A: 1, B: 2, C: 4, ts: 123434354]. When two versions conflict, compare their ts values, choose the more recently updated one as the final data, and correct the vector clock accordingly.
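A timestamp tie-break of this kind might look like the sketch below. Several details are assumptions made for illustration rather than part of the text above: the record layout, the function name `resolve_by_timestamp`, the example `ts` values, keeping the local version on a tie, and "correcting" the clock by taking the component-wise maximum of the two conflicting clocks.

```python
def resolve_by_timestamp(local, incoming):
    """Pick the conflicting version with the larger ts (last write wins).

    The returned clock is the component-wise maximum of both clocks, so the
    result is not older than either input (an assumed correction rule).
    On equal timestamps the local version is kept (also an assumption).
    """
    winner = incoming if incoming["ts"] > local["ts"] else local
    nodes = set(local["vclk"]) | set(incoming["vclk"])
    merged = {n: max(local["vclk"].get(n, 0), incoming["vclk"].get(n, 0))
              for n in nodes}
    return {"value": winner["value"], "vclk": merged, "ts": winner["ts"]}


# The conflict from step 9, with made-up timestamps for illustration.
on_a = {"value": 4000, "vclk": {"A": 1, "B": 1, "C": 1}, "ts": 100}
from_b = {"value": 6000, "vclk": {"A": 1, "B": 2, "C": 0}, "ts": 160}

resolved = resolve_by_timestamp(on_a, from_b)
print(resolved)
```

Note that this policy relies on the machines' wall clocks being reasonably well synchronized; with skewed clocks, the version that was actually written later can lose.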


V. Other problems

1. The dimension of the vector clock equals the number of replicas that store the data, so with too many replicas the vector becomes too long. In practice this does not seem to be a problem: three replicas are generally enough, and the vector stays short even with a few more.
2. Conflict correction can be done in different places and at different times. Some systems correct on the backend server, while others hand the conflict to the client, which arbitrates and writes the result back to the server. As for timing, some systems correct an inconsistency when they discover it while reading data, and others when they discover it during synchronization. The choice depends on the actual implementation.


