Analysis of data consistency

Source: Internet
Author: User

What is data consistency?

When data is replicated across multiple copies, a network, server, or software failure can cause a write to succeed on some replicas and fail on others. The replicas then disagree and the data content conflicts. In practice there are many kinds of inconsistency; for example, an update operation may report failure to the client even though the data was in fact successfully updated on the storage server.

CAP theorem

The CAP theorem was introduced in 2000 by Eric Brewer. Brewer argued that when designing and deploying a system in a distributed environment, there are three core requirements that stand in a particular relationship to one another. "Distributed system" here means a physically distributed system, such as a typical web system.
The three core requirements are Consistency, Availability, and Partition tolerance, which give the theorem its other name: CAP.
Consistency: similar in spirit to the C in the database ACID properties, but here it refers to the consistency and correctness of the data across all data nodes, whereas ACID's consistency concerns constraints on data within a single transaction. After an operation completes, the system is still in a consistent state: in a distributed system, every user should be able to read the latest value once an update has been successfully executed.
Availability: every operation always returns a result within a bounded amount of time. Both parts matter: "a bounded amount of time" means the system must respond within a given deadline, and "returns a result" means the system reports either success or failure of the operation.
Partition tolerance: the system continues to operate even when the data is partitioned across nodes. Partitioning is driven by performance and scalability requirements.
The CAP theorem states that a storage system providing a data service cannot simultaneously satisfy data consistency, data availability, and partition tolerance.
Why can't all three be fully guaranteed? Informally: once the data is partitioned, nodes must communicate, and communication cannot be guaranteed to complete within a bounded time. If an operation must complete on two nodes, then there is inevitably an interval during which only one part of the operation has finished, and the data is inconsistent until the communication completes. If consistency is required, the data must be protected during that interval, which makes any operation that accesses it unavailable.
If you want to guarantee both consistency and availability, then the data cannot be partitioned. A simple reading: all the data must live in a single database and cannot be split across databases. For large data volumes and high-concurrency Internet applications, that is unacceptable.
In large-scale web applications the data volume keeps growing rapidly, so scalability makes partition tolerance essential. As the scale grows, the number of machines grows, and network and server failures occur more frequently; keeping the application available therefore requires a highly available distributed processing system. So large web sites usually choose to strengthen the availability (A) and partition tolerance (P) of the distributed storage system and, to some extent, give up consistency (C). Data inconsistency usually appears under highly concurrent writes or when the cluster state is unstable (failure recovery, cluster expansion, and so on); the application must understand the inconsistency exposed by the distributed data processing system and perform sensible compensation and error correction to avoid ending up with incorrect application data.

Data consistency model

Some distributed systems improve reliability and fault tolerance by replicating data, storing the different copies on different machines. Because maintaining strong consistency across replicas is expensive, many systems adopt weaker consistency to improve performance, and a number of consistency models have been proposed.

    1. Strong consistency: no matter which replica an update is executed on, every subsequent read obtains the latest data.
    2. Weak consistency: it takes a while before a user can read the data updated by an operation; this interval is called the "inconsistency window."
    3. Eventual consistency: a special case of weak consistency that guarantees the user will eventually be able to read an operation's update to the data.
Data consistency implementation techniques

Quorum system (NRW strategy)

This protocol involves three parameters: N, R, and W.

    • N is the number of replicas the data has.
    • R is the minimum number of replicas that must be read for a read operation, i.e. the minimum number of nodes participating in a read.
    • W is the minimum number of replicas that must be written for a write operation to complete, i.e. the minimum number of nodes participating in a write.

Under this strategy, strong consistency is guaranteed only when R + W > N.
For example, with N=3, W=2, R=2: the data has 3 replicas; a write returns success only after at least 2 replicas have completed it, and a read likewise consults at least 2 replicas. Because R + W > N, the system is strongly consistent.
R + W > N produces a quorum-like effect. The read (write) latency in this model is determined by the slowest of the R (W) replicas. Sometimes, to achieve higher performance and lower latency, R and W are chosen so that R + W ≤ N; the system then cannot guarantee that a read obtains the latest data.
If R + W > N, the system provides strong consistency because the set of nodes read always overlaps the set of nodes synchronously written. In a relational data management system with N=2, one can set W=2 and R=1; this is a strong-consistency configuration, but write performance is relatively low because the data on both nodes must be updated before the result is returned to the user.
If R + W ≤ N, the read and write sets need not overlap and the system can only guarantee eventual consistency. How long the replicas stay inconsistent depends on the system's asynchronous update implementation: the inconsistency window runs from the start of the update until all nodes have asynchronously completed it.
The settings of R and W directly affect the system's performance, scalability, and consistency. If W is set to 1, a write returns to the user as soon as one replica completes the change, and the remaining N−W replicas are updated through an asynchronous mechanism; if R is set to 1, a read completes as soon as one replica has been read. Smaller values of R and W hurt consistency, and larger values hurt performance, so these two values must be weighed against each other.

Here are a few special cases of different settings:
1. W=1, R=N: writes are fast, but reads are slow, and if any of the N nodes fails, reads cannot complete.
2. R=1, W=N: reads have high performance and availability, but writes are slow; this suits read-heavy systems, and if any of the N nodes fails, writes cannot complete.
3. R=Q, W=Q where Q=N/2+1: the system balances read and write performance, taking both performance and availability into account.
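The NRW rule can be illustrated with a small sketch. The class and method names here are hypothetical, not any particular system's API: writes go to W of the N replicas, reads consult R replicas and keep the value with the highest version, and R + W > N guarantees the read set overlaps the latest write set.

```python
import random

class QuorumStore:
    """Toy N-replica store illustrating the NRW rule (not production code)."""

    def __init__(self, n, w, r):
        assert w + r > n, "R + W > N is required for strong consistency"
        self.n, self.w, self.r = n, w, r
        # each replica holds a (version, value) pair
        self.replicas = [(0, None)] * n

    def write(self, value):
        # bump the version and write to any W replicas; the remaining
        # N - W replicas are (in this sketch) simply left stale
        version = max(v for v, _ in self.replicas) + 1
        for i in random.sample(range(self.n), self.w):
            self.replicas[i] = (version, value)

    def read(self):
        # read any R replicas and return the value with the highest version;
        # R + W > N guarantees at least one of them saw the latest write
        chosen = random.sample(range(self.n), self.r)
        _, value = max((self.replicas[i] for i in chosen), key=lambda t: t[0])
        return value

store = QuorumStore(n=3, w=2, r=2)
store.write("balance=$100")
store.write("balance=$50")
print(store.read())  # always "balance=$50": read and write sets overlap
```

With w=2 and r=2 over 3 replicas, any pair of replicas chosen by `read` must include one touched by the latest `write`, which is exactly the quorum argument in the text.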

Two-phase commit algorithm

In a two-phase commit protocol, the system consists of two kinds of machines (nodes): the coordinator, usually a single system, and the transaction participants (also called cohorts or workers), usually several; in a data storage system the participants can be understood as the replicas of the data. The protocol has two phases; under normal execution they proceed as follows:

    • Phase 1: request phase (commit-request phase, also called the voting phase).
      During the request phase, the coordinator notifies the transaction participants to prepare to commit or cancel the transaction, then starts the voting process. During voting, each participant informs the coordinator of its decision: agree (the participant's local job executed successfully) or cancel (the local job failed).
    • Phase 2: commit phase.
      In this phase, the coordinator makes a decision based on the first phase's voting result: commit or cancel. If and only if all participants agree to commit, the coordinator notifies all participants to commit the transaction; otherwise the coordinator notifies all participants to cancel it. Each participant performs the corresponding action after receiving the coordinator's message.

For example, A organizes a trip with B, C, and D to climb the Great Wall: the event is held if everyone agrees, and canceled if anyone disagrees. Solving this with the 2PC algorithm looks like this:

    1. A becomes the coordinator of the event; B, C, and D are the participants.
    2. Phase 1: A emails B, C, and D, proposing to climb the Great Wall next Wednesday and asking whether they agree. A then waits for replies. B, C, and D each check their schedules. B and C find they are free that day and email A that they agree. D, for some reason, does not check email during the day, so A, B, and C all wait. In the evening D finds the email, checks the schedule, discovers a conflict on Wednesday, and replies to A that the event should be canceled.
    3. Phase 2: A has now heard from every participant and finds that D cannot make it next Wednesday. A emails B, C, and D that next Wednesday's Great Wall trip is canceled. B and C reply "too bad," D replies "sorry." The transaction terminates.

The two-phase commit algorithm can be combined with a distributed system to implement single-writer modification of multiple replicas of a file (object) and to keep the replica data synchronized. The combination works as follows:

    1. The client (coordinator) sends to every storage host (participant) holding a replica the specific file name, offset, data, and length, requesting the modification; this is the phase-1 request message.
    2. On receiving the request, each storage host backs up the to-be-modified data for rollback, modifies the file data, and responds to the client with success. If a storage host cannot modify the data for some reason (disk corruption, insufficient space, and so on), it responds with failure.
    3. The client collects every response. If all storage hosts report success, it sends each of them a commit message confirming the change; if any storage host reports failure, or times out without responding, the client sends all storage hosts a message canceling the modification. This is the phase-2 commit message.
    4. On receiving the client's commit message, a storage host either confirms the modification and responds OK directly, or, on a cancel, reverts the data to the pre-modification backup and then responds OK to the cancellation.
    5. Once the client has received responses from all storage hosts, the whole operation has succeeded.
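The five steps above can be sketched roughly as follows. The class and function names are illustrative; a real implementation must also handle timeouts and crashes, which are discussed next.

```python
class Participant:
    """A storage host holding one replica; names are illustrative."""

    def __init__(self, value):
        self.value = value
        self._backup = None

    def prepare(self, new_value):
        # phase 1: back up the old data for rollback, apply the change,
        # and vote yes; a real host would also check disk space, etc.
        self._backup = self.value
        self.value = new_value
        return True  # vote "agree"

    def commit(self):
        self._backup = None  # the change is now final

    def rollback(self):
        self.value = self._backup  # restore the pre-modification data


def two_phase_commit(participants, new_value):
    """Coordinator side: returns True if the update committed everywhere."""
    votes = [p.prepare(new_value) for p in participants]  # phase 1
    if all(votes):                                        # phase 2: commit
        for p in participants:
            p.commit()
        return True
    for p in participants:                                # phase 2: cancel
        p.rollback()
    return False

replicas = [Participant("v1") for _ in range(3)]
two_phase_commit(replicas, "v2")
print([r.value for r in replicas])  # ['v2', 'v2', 'v2']
```

If any participant voted to cancel, `rollback` would restore every replica's backed-up data, matching step 4 of the text.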

Communication failures can occur during this process, such as network outages or host crashes. Any exception not defined in the algorithm is treated as a commit failure and triggers a rollback. The algorithm is driven by deterministic judgments on communication replies: as long as a participant's reply (whether failure or success) can be judged deterministically, a deterministic outcome can be reached.
Disadvantages: the coordinator is a serious single point of failure: there is no hot-standby mechanism, so a coordinator crash or a broken network link blocks the transaction. Throughput is also limited: once a transaction's phase-1 vote begins, it must hold an exclusive lock, and no other transaction can proceed until the current one commits or rolls back.

Distributed lock service

A distributed lock takes a conservative attitude toward data being modified: the data is locked for the whole process, and while one user modifies it, no other user is allowed to.
Using a distributed lock service for data consistency means obtaining an operation permit on the target before operating on it; other users who try to operate on the same target block until the previous user releases the permit. Note the cost of this approach: even when only one user operates on the target and there is no concurrent conflict, the permit must still be acquired, which consumes resources, wastes network communication, and adds latency.
When using distributed locks to keep multiple replicas consistent under modification, the lock can be requested at different content granularities. For example, to ensure that multiple copies of a file are modified identically, you can set one lock on the whole file: request the lock, modify all copies, and release the lock when done. Alternatively, a lock can be set on a file fragment or even a single byte, achieving finer-grained locking and reducing conflicts.
Common lock algorithms include the Lamport bakery algorithm, the Paxos algorithm, and optimistic locking. Their principles are briefly outlined below.

1. Lamport bakery algorithm

This is an algorithm, invented by Leslie Lamport, that solves the mutual-exclusion problem of multiple threads concurrently accessing a shared single-user resource.
The underlying idea can also be called a timestamp strategy, or the Lamport logical clock.
Here, let us state the content of this logical clock.
We use the happened-before relation between events in a distributed system, denoted "→"; for example, if event a occurs before event b, then a → b.
The relation must satisfy the following three conditions:

    1. If a and b are events in the same process and a occurs before b, then a → b.
    2. If event a is the sending of a message and b is the receipt of that message, then a → b.
    3. For events a, b, c: if a → b and b → c, then a → c.

Note that for any event a, a → a does not hold; that is, the relation → is irreflexive. With the definition above, we can also define the notion of concurrency (concurrent):

For events a and b: if neither a → b nor b → a holds, then a and b are concurrent.

Intuitively, the → relation is easy to understand: "x happens before y." In other words, for a system under input I1, if a → b, then no matter how many runs are repeated under the same input I1, a always occurs before b. If, under input I1, a and b are concurrent, then across different runs under that same input a may occur before b, after b, or at the same time. That is, concurrency does not mean simultaneous; it represents an uncertainty. Concurrency is one of the most fundamental concepts for understanding such a system.
With this in place, we can introduce a clock into the system: the Lamport logical clock. A clock is essentially a function from events to real numbers (assuming time is continuous); it maps each event to a number representing the time the event occurred. Formally, each process Pi has a clock Ci that maps an event a in that process to Ci(a), and the whole system's clock is C = <C0, C1, ..., Cn>: for an event b belonging to process Pj, C(b) = Cj(b).

From this definition you can see the master's understanding of distributed systems: no "global" entity exists in such a system. Each process is a relatively independent entity with its own local knowledge, and the information of the whole system is the aggregation of the information of the individual processes.
Having the "essential definition" of a clock is not enough; we must also consider what makes a clock meaningful, or correct. Given the definition of → above, the condition a correct clock must meet is already obvious:
Clock condition: for any two events a, b: if a → b, then C(a) < C(b).
Note that the converse is not required. If we demanded the converse, i.e. "if a → b does not hold, then C(a) < C(b) does not hold," that would amount to requiring concurrent events to occur at the same time, which is clearly unreasonable.
Combining this with the definition of →, we can refine the condition into the following two rules:

    1. If a and b are two events in process Pi and a precedes b in Pi, then Ci(a) < Ci(b);
    2. If a is Pi sending message m and b is Pj receiving message m, then Ci(a) < Cj(b).

This defines the logical clock. Obviously a system can have infinitely many reasonable logical clocks. Implementing one is also relatively simple; just follow two implementation rules:

    1. Each process Pi increments Ci between any two successive events of its own;
    2. If event a is Pi sending message m, then m carries the timestamp tm = Ci(a); when process Pj receives message m at event b, Pj sets Cj to a value greater than or equal to its current value and greater than tm (e.g. Cj = max(tm, Cj) + 1).
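The two implementation rules above can be sketched in a few lines of Python. The class and method names are illustrative; in a real system the timestamp would travel inside the message itself.

```python
class Process:
    """One process carrying a Lamport logical clock; a minimal sketch."""

    def __init__(self, pid):
        self.pid = pid
        self.clock = 0

    def local_event(self):
        # rule 1: tick between any two successive local events
        self.clock += 1
        return self.clock

    def send(self):
        # sending is itself an event; stamp the message with the new value
        self.clock += 1
        return self.clock  # tm, carried inside the message

    def receive(self, tm):
        # rule 2: advance past both the local clock and the message timestamp
        self.clock = max(self.clock, tm) + 1
        return self.clock

p1, p2 = Process(1), Process(2)
p1.local_event()           # C1 = 1
tm = p1.send()             # C1 = 2, message stamped tm = 2
p2.receive(tm)             # C2 = max(0, 2) + 1 = 3
print(p1.clock, p2.clock)  # 2 3
```

Note that the receive rule guarantees the clock condition across processes: the receipt event always gets a strictly larger reading than the send event.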

With the logical clock defined, we can now put all events in the system into a total order: sort by the clock reading at the time of the event, smaller readings first. Of course, two events may carry the same reading. Breaking such ties is simple: if a is in process Pi, b is in process Pj, Ci(a) = Cj(b), and i < j, then a comes before b. Formally, we can define the total order "⇒" on the system's events as follows:
Suppose a is an event in Pi and b is an event in Pj; then a ⇒ b if and only if one of the following two conditions holds:

    1. Ci(a) < Cj(b);
    2. Ci(a) = Cj(b) and i < j.
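If each event is represented as a (clock reading, process id) pair, this total order is just lexicographic tuple comparison, which Python's sort applies directly; a tiny illustration:

```python
# events as (clock_reading, process_id) pairs;
# tuple ordering implements the "⇒" relation: compare readings first,
# then break ties by process id
events = [(2, 1), (1, 2), (2, 0), (1, 1)]
total_order = sorted(events)
print(total_order)  # [(1, 1), (1, 2), (2, 0), (2, 1)]
```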

Lamport illustrated these logical-clock ideas with a very intuitive analogy: customers shopping at a bakery. The bakery can serve only one customer at a time. N customers enter the shop and register a number at the front desk in the order they arrive; each registration increments the number by 1. Customers enter the shop in order of registration number, smallest first. A customer who has finished purchasing returns the registration number (resets it to 0) at the front desk; to enter the store again, the customer must re-queue.
The customer in this analogy corresponds to a thread, and entering the shop to purchase corresponds to entering the critical section for exclusive access to the shared resource. Due to how computers execute, two threads can obtain the same registration number: they apply for a queue number almost simultaneously, read the same issued-number state, find the same maximum, and each add 1 to form their own number. For this reason, the algorithm specifies that if two threads' queue numbers are equal, the thread with the lower ID has priority.
The same principle, combined with a distributed system, can be used to implement a distributed lock.
Note that such a system needs to introduce clock synchronization; the blogger's opinion is that NTP could be used to achieve it (non-authoritative, for reference only).
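The analogy translates almost directly into code. Below is a toy Python-threads sketch of the bakery lock; it relies on CPython's interpreter making list element reads and writes appear sequentially consistent, which real shared-memory hardware does not guarantee without memory barriers.

```python
import threading

N = 3                        # number of threads ("customers")
ITERS = 100
choosing = [False] * N       # True while a thread is picking its ticket
number = [0] * N             # 0 means "not queuing"
counter = 0                  # the shared resource

def lock(i):
    # take a ticket one larger than any ticket currently issued
    choosing[i] = True
    number[i] = 1 + max(number)
    choosing[i] = False
    for j in range(N):
        # wait until thread j has finished picking its ticket
        while choosing[j]:
            pass
        # wait while j holds a smaller ticket (ties broken by thread id)
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def unlock(i):
    number[i] = 0            # hand the ticket back

def worker(i):
    global counter
    for _ in range(ITERS):
        lock(i)
        counter += 1         # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Two threads may still read the same maximum and take equal tickets; the `(number[j], j) < (number[i], i)` comparison is exactly the lower-thread-ID tie-break described above.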

2. Paxos algorithm

This algorithm is quite popular and resembles an upgraded version of the 2PC algorithm; it is not covered in depth here, and you can search for related material. (The blogger will organize a list later.)
Note that this algorithm was also proposed by Leslie Lamport, which again shows the greatness of the master!
The problem Paxos solves is how a distributed system can agree on a value (a resolution). A typical scenario: in a distributed database system, if every node starts in the same initial state and executes the same sequence of operations, they all end in a consistent state. To ensure every node executes the same command sequence, a "consensus algorithm" is run for each instruction so that every node sees the same instruction. A general consensus algorithm applies in many scenarios and is an important problem in distributed computing. There are two models of inter-node communication: shared memory and message passing. Paxos is a consensus algorithm based on the message-passing model. BigTable uses the distributed lock service Chubby, and Chubby uses the Paxos algorithm to keep its replicas consistent.
Paxos applies not only across machines in a distributed system but wherever multiple processes must reach agreement; agreement can be implemented through shared memory (which requires a lock) or through message passing, and Paxos uses the latter. Some scenarios where Paxos fits: multiple processes or threads on one machine agreeing on a piece of data, multiple clients concurrently reading and writing a distributed file system or distributed database, and multiple replicas in distributed storage serving read and write requests.

3. Synchronization using the optimistic locking principle

Let's motivate this approach with an example. In a financial system, when an operator reads a user's data and modifies it based on what was read (such as changing the account balance), using the distributed lock service described above means the database record stays locked for the entire process (from the moment the operator reads the data, through the modification, until the result is submitted), even while the operator steps away to make coffee. Imagine what happens when the system faces hundreds or thousands of concurrent operations.
The optimistic locking mechanism solves this problem to some extent. Optimistic locking is mostly implemented with a data version (versioning) mechanism. What is a data version? A version identifier is added to the data; in a database-table-based solution, this is typically a "version" field added to the table. When the data is read, the version number is read with it; when the data is updated, the version number is incremented by one. On submit, the submitted version is compared with the current version of the corresponding database record: if the submitted version is greater than the current version, the update is applied; otherwise the submitted data is considered stale.
For the account-modification example above, assume the account table has a version field whose current value is 1, and the current account balance field (balance) is $100.

  1. Operator A reads the record (version=1) and deducts $50 from the balance ($100 − $50 = $50).
  2. While operator A is working, operator B also reads the record (version=1) and deducts $20 from the balance ($100 − $20 = $80).
  3. Operator A finishes first, increments the version (version=2), and submits the record with the deducted balance (balance=$50). Because the submitted version is greater than the record's current version, the update is applied and the record's version becomes 2.
  4. Operator B then finishes, also increments the version (version=2), and attempts to submit (balance=$80). But the submitted version, 2, is not greater than the record's current version, also 2, so it fails the optimistic-locking rule that "the committed version must be greater than the record's current version for the update to be performed," and operator B's submission is rejected. This prevents operator B from overwriting operator A's result with a modification based on the stale version=1 data.
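The scenario above can be sketched as follows. The class is hypothetical, and the rule "the submitted version must be greater than the current version" is expressed equivalently as "the version that was read must still equal the record's current version," which is the usual compare-and-swap formulation.

```python
class AccountTable:
    """Toy table row with version-based optimistic locking (illustrative)."""

    def __init__(self, balance):
        self.balance = balance
        self.version = 1

    def read(self):
        # the version number is read together with the data
        return self.balance, self.version

    def update(self, new_balance, based_on_version):
        # the update succeeds only if nobody committed since we read;
        # otherwise the caller holds stale data and must re-read and retry
        if based_on_version != self.version:
            return False
        self.balance = new_balance
        self.version += 1
        return True

account = AccountTable(balance=100)

# operators A and B both read the record at version 1
bal_a, ver_a = account.read()
bal_b, ver_b = account.read()

print(account.update(bal_a - 50, ver_a))  # True:  balance 50, version 2
print(account.update(bal_b - 20, ver_b))  # False: B's read is now stale
```

Operator B's rejected update would then be retried: re-read the record (balance 50, version 2), deduct $20 again, and submit against version 2.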
