Linearizability and total order broadcast: "Designing Data-Intensive Applications" reading notes 12

Source: Internet
Author: User

The previous post discussed the difficulties of building a distributed system. This post focuses on algorithms and protocols for building fault-tolerant distributed systems. The best way to build a fault-tolerant system is to use general-purpose abstractions that let applications ignore some of the problems of a distributed system. We start with linearizability and the techniques related to it. The distributed coordination services we will look at later, such as ZooKeeper, are built on linearizability.

1. Stronger consistency

Most distributed databases provide at least eventual consistency, which means that if you stop writing to the database and wait for some length of time, eventually all read requests will return the same value. However, this is a very weak guarantee, and it says nothing about how long that wait will be. If you write a value and then read it immediately, there is no guarantee that you will see the value you just wrote.

Eventual consistency is hard on application developers. When working with a database that provides only weak guarantees, developers need to be constantly aware of its limitations. Bugs are often subtle, because the application may work well most of the time. The problems of eventual consistency only become apparent when there is a fault in the system (for example, a network interruption) or under high concurrency. A data system can therefore choose to provide a stronger consistency model, but this introduces a new trade-off: a system with stronger guarantees is easier to use correctly, but it may have worse performance or be less fault-tolerant than a system with weaker guarantees. We need to understand these trade-offs and choose the consistency model that best fits our needs.

Linearizability

The idea behind linearizability is simple. Consider a value x that starts at 0 while one client writes 1 to it and other clients read it concurrently:

In a linearizable system, there must be some point in time, between the start and end of the write operation, at which the value of x atomically changes from 0 to 1. Therefore, once any client's read has returned the new value 1, all subsequent reads must also return the new value, even if the write operation has not yet completed.
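The rule above can be checked mechanically on a recorded history of reads. The following is a minimal sketch, not a full linearizability checker: it tests only the one condition stated above (no read may return the old value if it started after another read had already returned the new value). The `Read` record and the `violates_recency` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    client: str
    start: float  # time the read request was sent
    end: float    # time the response was received
    value: int    # value of x returned (0 = old, 1 = new)


def violates_recency(reads):
    """Return the first (earlier, later) pair where `later` started after
    `earlier` had returned the new value, yet still returned the old one.
    Return None if no such pair exists."""
    for earlier in reads:
        for later in reads:
            if (earlier.value == 1 and later.value == 0
                    and later.start > earlier.end):
                return (earlier, later)
    return None


# Client B's read overlaps the write, so either value would be acceptable for it.
ok_history = [
    Read("A", 0.0, 1.0, 0),
    Read("B", 0.5, 2.0, 1),
    Read("A", 2.5, 3.0, 1),
]

# Client B reads the old value after client A has already seen the new one.
bad_history = [
    Read("A", 0.0, 1.0, 1),
    Read("B", 1.5, 2.0, 0),
]

print(violates_recency(ok_history))
print(violates_recency(bad_history))
```

A history that passes this check is not necessarily linearizable; the check only catches the "value flips back" anomaly described in the text.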

Linearizability and serializability

Linearizability differs from serializability in that it does not group operations into transactions. On its own, therefore, it does not prevent every problem caused by concurrent writes, such as write skew. A database can provide both serializability and linearizability: implementations based on two-phase locking are typically both serializable and linearizable, whereas serializable snapshot isolation is not linearizable.

What problems can linearizability solve?
    • Distributed locks and leader elections
      A single-leader system needs to ensure that there is only one leader, because multiple leaders result in split brain. Leader election is essentially a contest for a lock: every node tries to acquire the lock, and the node that succeeds becomes the leader. In any case, the lock must be linearizable: all nodes must agree on which node owns the lock and is therefore the leader.

    • Uniqueness constraints
      Uniqueness constraints are common in databases: for example, a user name or e-mail address must uniquely identify a user, and in a file storage service there cannot be two files with the same path and file name. If you want to enforce this constraint as the data is written (for example, if two people try to create a user or a file with the same name at the same time, one of them gets an error), you need linearizability.
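Both use cases above reduce to an atomic compare-and-set on a single logical copy of the data. The sketch below illustrates that idea in a single process, assuming a lock-protected in-memory dictionary stands in for the linearizable store. `SingleCopyStore`, `compare_and_set`, and `try_claim` are hypothetical names, not the API of any real coordination service.

```python
import threading


class SingleCopyStore:
    """One copy of the data, with every operation made atomic by a lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def compare_and_set(self, key, expected, new):
        """Set key to `new` only if its current value equals `expected`."""
        with self._lock:
            if self._data.get(key) == expected:
                self._data[key] = new
                return True
            return False

    def get(self, key):
        with self._lock:
            return self._data.get(key)


store = SingleCopyStore()
results = {}


def try_claim(user_id):
    # The claim succeeds only if nobody has registered the name yet.
    results[user_id] = store.compare_and_set("username:alice", None, user_id)


threads = [threading.Thread(target=try_claim, args=(uid,))
           for uid in ("u1", "u2", "u3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [uid for uid, ok in results.items() if ok]
print("winner:", winners)
```

Exactly one claimant wins regardless of thread scheduling. Replacing the key with something like "leader" gives the lock-style leader election described in the first bullet. A real deployment would need the store itself to be replicated and linearizable, which is the subject of the next section.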

How to implement a linearizable system?

Linearizability means that the system behaves as though there were only a single copy of the data and all operations on it are atomic. The simplest answer is to really use only a single copy of the data. However, that approach cannot tolerate faults: if the single node fails, the system becomes inaccessible. The most common way to make a system fault-tolerant is replication:

    • Single-leader, multi-follower replication
      With single-leader, multi-follower replication, the leader holds the primary copy of the data and the followers maintain backup copies on other nodes. If reads are served by the leader, or by followers that are updated synchronously, the system can be made linearizable.

    • Consensus algorithms
      A consensus protocol prevents split brain and stale reads, so consensus algorithms can implement linearizable storage safely. This is the underlying algorithm of distributed coordination services such as ZooKeeper and Chubby.

The CAP theorem and the price of consistency

Eric Brewer proposed the CAP theorem in 2000. In short: a data system has to trade off among consistency, availability, and partition tolerance, and no system can provide all three at the same time.

Choosing linearizability therefore naturally requires some compromise on availability. With single-leader, multi-follower replication, a client that needs linearizable writes and reads must be able to connect to the leader. If the leader becomes unreachable, clients can still read from the followers, but those reads are no longer guaranteed to be linearizable.

2. Total order broadcast

As mentioned above, a linearizable system can be implemented with single-leader, multi-follower replication or with a consensus algorithm, but there is another very important topic we need to explore: total order broadcast.
Before that, let's talk about ordering in distributed systems:

Lamport timestamps

A Lamport timestamp is a way of generating sequence numbers that are consistent with causality, which we can use to establish the order of operations in a distributed system. It was proposed by Leslie Lamport in 1978. The implementation is simple: each node has a unique identifier and keeps its own counter, and a timestamp is the pair (counter, node ID). Two nodes may sometimes have the same counter value, but because each timestamp includes the node ID, every timestamp is unique.

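A Lamport timestamp bears no relationship to physical time, but it provides an ordering across the distributed system: of two timestamps, the one with the greater counter value is the greater timestamp, and if the counter values are the same, the one with the greater node ID is the greater timestamp. To keep this ordering consistent with causality, every node and every client keeps track of the maximum counter value it has seen so far and includes it in every request; a node that receives a counter value greater than its own immediately raises its counter to that maximum.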
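A minimal sketch of such a clock follows. The class and method names (`LamportClock`, `tick`, `receive`) are illustrative choices, and the update rule on receipt (take the maximum of the local and received counters, then add one) is the standard Lamport rule rather than something spelled out in these notes.

```python
class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Local operation or message send: increment, return a timestamp."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, timestamp):
        """On receiving a message, move past the largest counter seen so far."""
        seen_counter, _ = timestamp
        self.counter = max(self.counter, seen_counter) + 1
        return (self.counter, self.node_id)


a = LamportClock(node_id=1)
b = LamportClock(node_id=2)

t1 = a.tick()        # node 1 performs an operation
t2 = b.tick()        # node 2 performs a concurrent operation
t3 = a.tick()        # node 1 sends a message stamped t3
t4 = b.receive(t3)   # node 2 receives it; t4 must be greater than t3

# Python compares tuples element by element: counter first, then node ID,
# which is exactly the ordering rule described above.
print(sorted([t4, t2, t3, t1]))
```

Here t1 and t2 have the same counter value and are ordered only by node ID, while t4 is ordered after t3 because the receive causally depends on the send.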

However, the total order of Lamport timestamps alone does not tell you whether two operations are concurrent or causally related. And although Lamport timestamps define an order that is consistent with causality, they are not enough to solve some common problems in distributed systems:
Consider a system that needs to ensure that a user name uniquely identifies a user account. If two users try to create an account with the same user name at the same time, one of them should succeed and the other should fail. At first glance this looks easy: if two accounts with the same user name are created, let the operation with the lower timestamp win, since Lamport timestamps are totally ordered and the comparison is valid. However, that only works after the fact. To decide right now, a node has to be sure that no other node is concurrently creating an account with the same user name, so it has to check with every other node. If one of the other nodes has failed or cannot be reached because of a network problem, the system can no longer make progress.

The problem with Lamport timestamps is that the total order of operations only emerges after all operations have been collected. If another node has generated operations that you do not yet know about, you cannot construct the final order.

Total order broadcast

The basic mechanism of total order broadcast is to use single-leader, multi-follower replication: all operations are sequenced on the leader node, which determines the total order of operations, and that order is then broadcast. Total order broadcast ensures that every node learns about every operation, which solves the problem with Lamport timestamps. However, it raises issues of its own: how to scale the system if the throughput exceeds what a single leader can handle, and how to fail over if the leader fails.

Total order broadcast requires that the following two properties always be satisfied:

    • Reliable delivery: no messages are lost. If a message is delivered to one node, it is delivered to all nodes.
    • Totally ordered delivery: messages are delivered to every node in the same order.

A correct total order broadcast algorithm must ensure that the reliability and ordering properties hold even when nodes or the network fail. During a network partition, the algorithm can keep retrying and still preserve the order of messages. Total order broadcast is very important for distributed systems: if each message represents a write to the database, and every replica processes the same writes in the same order, then the replicas remain consistent with each other. The state machine on every node stays in the same state, which is the principle known as state machine replication.
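The two properties above can be sketched with a leader that assigns consecutive sequence numbers and followers that deliver strictly in sequence order, buffering anything that arrives early. This is a single-process illustration under simplifying assumptions: no leader failover, and the "network" is just the order in which `receive` is called. `Leader`, `Follower`, and their methods are hypothetical names.

```python
class Leader:
    """Sequences every message by appending it to a log."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # sequence number of the message


class Follower:
    def __init__(self, name):
        self.name = name
        self.delivered = []   # messages in delivery order
        self._pending = {}    # seq -> message that arrived out of order
        self._next_seq = 0

    def receive(self, seq, message):
        """Buffer the message, then deliver everything that is next in line."""
        if seq < self._next_seq:
            return  # duplicate caused by a retry; already delivered
        self._pending[seq] = message
        while self._next_seq in self._pending:
            self.delivered.append(self._pending.pop(self._next_seq))
            self._next_seq += 1


leader = Leader()
packets = [(leader.append(m), m) for m in ["set x=1", "set x=2", "set x=3"]]

f1, f2 = Follower("f1"), Follower("f2")
for seq, msg in packets:             # f1 receives in order
    f1.receive(seq, msg)
for seq, msg in reversed(packets):   # f2 receives in reverse order
    f2.receive(seq, msg)
f2.receive(0, "set x=1")             # a retried duplicate is ignored

print(f1.delivered)
print(f2.delivered)
```

Both followers end up with the same delivery order even though the messages reached them in different orders, which is what lets each of them apply the writes to its own state machine and stay consistent.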

3. Linearizability via total order broadcast

Total order broadcast is asynchronous: messages are guaranteed to be delivered reliably in a fixed order, but there is no guarantee about when a message will be delivered (so one node may lag behind the others). Linearizability, by contrast, is a recency guarantee: every read sees the latest value written. We can build linearizable storage on top of total order broadcast. To claim a unique user name, for example:

    • 1. Append a message to the log, indicating the user name you want to claim.

    • 2. Each node checks the log against its in-memory state machine: if the first message claiming that user name is your own, the claim succeeds. Otherwise, the operation is aborted.

Because total order broadcast guarantees that messages are delivered to all nodes in the same order, all nodes will agree on which message was the first to claim the user name, even when there are concurrent writes. While this procedure guarantees linearizable writes, it does not guarantee linearizable reads: message delivery is asynchronous, so the node that serves a read may be lagging and the result may be stale.

Linearizable reads can still be achieved. One option is to fetch the position of the latest log message, wait until all entries up to that position have been delivered, and only then perform the read (this is the idea behind the sync() operation in ZooKeeper). Another option is to force reads to go to the leader node, whose data is necessarily the latest.
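The user name claim described above can be sketched as follows, assuming a plain Python list stands in for the totally ordered log that every node sees in the same order. `Replica`, `apply`, and `claim_succeeded` are hypothetical names used only for this illustration.

```python
class Replica:
    """Applies the shared log to an in-memory state machine."""

    def __init__(self):
        self.owners = {}   # user name -> node whose claim came first in the log
        self.applied = 0   # number of log entries applied so far

    def apply(self, log):
        for username, node in log[self.applied:]:
            # Only the first claim for a user name takes effect.
            self.owners.setdefault(username, node)
        self.applied = len(log)

    def claim_succeeded(self, username, node):
        return self.owners.get(username) == node


# Assumed: the broadcast layer has already decided this order.
log = []
log.append(("alice", "node-2"))   # node-2's claim was sequenced first
log.append(("alice", "node-1"))   # node-1's concurrent claim came second
log.append(("bob", "node-1"))

r1, r2 = Replica(), Replica()
r1.apply(log)
r2.apply(log)

print(r1.owners)
print(r2.claim_succeeded("alice", "node-1"))
```

Because both replicas apply the same entries in the same order, they reach the same verdict independently: node-2 owns "alice", node-1's claim on "alice" is aborted, and node-1 owns "bob".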
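The first approach (read only after catching up to the latest log position) can be sketched like this. It is a simplified, single-process illustration: the log is a local list, "waiting" is modelled by applying the missing entries on the spot, and `LaggingReplica` and its methods are hypothetical names. It does not use or imitate ZooKeeper's actual client API.

```python
class LaggingReplica:
    def __init__(self, log):
        self._log = log      # shared, totally ordered log of (key, value) writes
        self.state = {}
        self.applied = 0     # number of log entries applied so far

    def apply_up_to(self, position):
        """Apply log entries until `position` entries have been applied."""
        while self.applied < position:
            key, value = self._log[self.applied]
            self.state[key] = value
            self.applied += 1

    def stale_read(self, key):
        # Reads whatever has been applied locally; may be out of date.
        return self.state.get(key)

    def linearizable_read(self, key):
        # Fetch the latest log position, catch up to it, then read.
        position = len(self._log)
        self.apply_up_to(position)
        return self.state.get(key)


log = [("x", 0), ("x", 1)]
replica = LaggingReplica(log)
replica.apply_up_to(1)                  # the replica has only seen the first write

stale = replica.stale_read("x")
fresh = replica.linearizable_read("x")
print(stale, fresh)
```

In a real system, learning the latest log position is itself an operation that must be linearizable, and catching up means waiting for delivery rather than reading a local list.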

Summary:

With linearizability built on total order broadcast, we can already implement a distributed coordination service. The next post will cover consensus protocols, another core concept of distributed systems: how do we get distributed nodes to agree? As the saying goes, it is hard for those who do not know how and easy for those who do. See you in the next post.
