HybridTime - Accessible Global Consistency with High Clock Uncertainty

Source: Internet
Author: User

Amazon's Dynamo [9] and Facebook's Cassandra [] relax the consistency model and offer only eventual consistency.

Others such as HBase [1] and BigTable [4] offer strong consistency only for operations touching a single partition, but not across the database as a whole.

Dynamo and Cassandra are built on vector clocks and gossip algorithms; they emphasize high availability and can reach only eventual consistency.

HBase, by contrast, provides strong consistency within a single partition, but offers no guarantee across partitions or across the database as a whole.

For example, with a geo-replicated service, a user may first be routed to a local datacenter and perform some action such as sending an email message.
If they then reload their browser and are routed to another datacenter backend, they would still expect to see their sent message in their email outbox.

This is a typical example of the need for consistency across remote datacenters.

The simplest way to solve consistency problems in a distributed system is to use synchronized clocks across all servers, but this is practically impossible to achieve.

The traditional approach to global consistency has been to use logical clocks, such as Lamport clocks [] or vector clocks [10, 20].

So the traditional approach is to use logical clocks.

https://www.zhihu.com/question/19994133

Here is a good answer from that thread:

Sure, but not for a distributed database. Why? First, you need to decide where the timestamps come from.

1. All timestamps are taken from a single central node, so the timestamps are totally ordered. This is the simplest implementation and works if your distributed database is deployed in only one datacenter, but it adds a central node, which limits the scalability of the database, and every action that takes a timestamp pays the extra network latency. If the database must be deployed across datacenters, this scheme becomes untenable: cross-datacenter network latency is too high and hurts overall performance.

2. Timestamps are taken from the existing nodes themselves. This raises the clock synchronization problem. As Dongdong said, if physical time is used, the nodes never have exactly the same time and there is always some error. There are three approaches:
First, the logical clock proposed by Leslie Lamport, which defines a new kind of total order over distributed events. Vector clocks were later developed on the basis of logical clocks.
Second, physical clocks, as in Google Spanner, which uses GPS and atomic clocks for time synchronization. The clocks of different nodes can never be synchronized exactly; they can only be kept within some range of each other. So when a Spanner node takes a timestamp, it gets not a single time value but a time value plus an error range, that is, a time window. If two windows overlap, they cannot be compared, so the order of the events they represent cannot be determined. The time synchronization algorithm tries to keep the spread of time across nodes as small as possible. Taking a timestamp is then as cheap as it gets: the node simply reads its local physical clock. To keep clocks synchronized, however, each node must periodically communicate with other nodes (Google's time synchronization algorithm does not appear to be open source), and synchronization can only make clocks as accurate as possible, never exact. Distributed databases do not use NTP directly to synchronize time, because NTP may move a node's clock backwards, while the database requires the local clock to be monotonically increasing.
Third, hybrid logical clocks (Logical Physical Clocks), which also evolved from logical clocks, but whose values stay close to physical time and which do not depend on the physical clock synchronization algorithm described above. A timestamp can be taken locally and is a single time value rather than, as in Spanner, a time window. For details, see the paper "Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases".

About Lamport clocks and vector clocks:

The Lamport clock is a logical clock: it defines a partial order that does not depend on physical clocks, but on the causal relationship between events.
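A minimal Python sketch of the textbook Lamport rules, assuming the usual formulation (advance the counter on every local or send event; on receive, jump past the sender's counter). The class name `LamportClock` is illustrative and is not taken from the HybridTime paper.

```python
class LamportClock:
    """Minimal Lamport logical clock (illustrative sketch)."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Local or send event: advance the counter and return the new timestamp."""
        self.counter += 1
        return self.counter

    def receive(self, remote):
        """Receive event: move past the sender's timestamp, then advance."""
        self.counter = max(self.counter, remote) + 1
        return self.counter


# Process A sends a message to process B; B's receive is causally after A's send.
a, b = LamportClock(), LamportClock()
sent = a.tick()          # A's send event gets 1
b.tick()                 # B does two unrelated local events (1, 2)
b.tick()
received = b.receive(sent)
print(sent, received)    # → 1 3
```

If e happened before f, the clock guarantees C(e) < C(f), but not the converse: a smaller counter does not prove causality, which is why the clock alone yields only a partial (causal) order.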

Reference: total order and the nature of distributed consistency

The vector clock is one realization of the logical clock; for the specifics, see the following:

Why are vector clocks easy or hard?

An excerpt from the above answer

"The Dynamo paper was written seven or eight years ago, and Amazon's Dynamo has since abandoned version vectors in favor of synchronous replication (a Paxos-like protocol), with a leader per partition responsible for writes. In fact, version vectors do not scale: for a given key, the version vector grows as the number of writers increases."
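The size concern in the excerpt can be made concrete with a small Python sketch of a vector clock, assuming the common representation of one counter per writer stored in a dict. The function names are illustrative and are not Dynamo's actual version-vector code.

```python
from typing import Dict

VectorClock = Dict[str, int]  # writer id -> number of writes seen from that writer


def increment(vc: VectorClock, node: str) -> VectorClock:
    """Return a copy of vc with node's entry advanced by one (a local write)."""
    out = dict(vc)
    out[node] = out.get(node, 0) + 1
    return out


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock after seeing both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b iff a <= b in every component and a < b in at least one."""
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


# One key written by an increasing number of clients: one entry per writer.
clock: VectorClock = {}
for i in range(5):
    clock = increment(clock, f"writer-{i}")
print(len(clock))  # → 5

# Two concurrent writes are incomparable in either direction.
x = increment({}, "n1")
y = increment({}, "n2")
print(happened_before(x, y), happened_before(y, x))  # → False False
print(happened_before(x, merge(x, y)))               # → True
```

Every distinct writer adds an entry that must be stored and shipped with the key, which is the growth the excerpt objects to.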

The disadvantages of logical clocks:

First, they require that all clients propagate clock data to achieve consistent views,
and second, the assigned timestamps have no relation to physical time.

Spanner introduced commit-wait, a way of ensuring a physical-time-based consistent global state by forcing operations to wait long enough so that all participants agree that the operation's timestamp has passed, based on worst-case synchronization error bounds.

While innovative, the system's performance becomes highly dependent on the quality of the time synchronization infrastructure, and thus may be unacceptable absent specialized hardware such as atomic clocks and GPS receivers.

Spanner uses atomic clocks and GPS receivers to build a more accurate clock called TrueTime. Each call to the TrueTime API returns a time interval rather than a single value, and TrueTime guarantees that the real (absolute) time lies within this interval. The interval is usually about 14 ms wide, or even narrower.

So as long as the time intervals of different transactions do not overlap, a global order is relatively easy to achieve.
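Why disjoint intervals are enough can be shown with a short Python sketch. `TTInterval` and `order` are hypothetical names, not the TrueTime API; the sketch only assumes that each reading is an interval [earliest, latest] guaranteed to contain the real time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TTInterval:
    """A TrueTime-style reading: real time is assumed to lie in [earliest, latest]."""
    earliest: float
    latest: float


def order(a: TTInterval, b: TTInterval) -> Optional[str]:
    """Return 'before' or 'after' when the intervals are disjoint, else None."""
    if a.latest < b.earliest:
        return "before"
    if b.latest < a.earliest:
        return "after"
    return None  # overlapping windows: the real order cannot be determined


print(order(TTInterval(0, 7), TTInterval(10, 17)))  # → before
print(order(TTInterval(0, 7), TTInterval(5, 12)))   # → None
```

Intervals that merely touch (one ends exactly where the other begins) are treated as overlapping, since both could contain the same instant.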

So we need commit-wait here.

When I start a transaction, I take a TrueTime reading, [t1.earliest, t1.latest].

When can I commit this transaction, that is, release its locks so that someone else can start a new transaction?

I take another TrueTime reading, [t2.earliest, t2.latest], and may commit once t2.earliest > t1.latest.

This guarantees that the time intervals of two transactions cannot overlap. The cost, of course, is that each transaction has to wait roughly twice the error bound.
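The commit-wait rule described above can be sketched in Python. `SimulatedTrueTime` is a hypothetical stand-in that builds an interval from the local clock plus a fixed, assumed error bound; it is not Google's TrueTime API, whose real bound varies over time.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical TrueTime stand-in: local clock reading +/- a fixed error bound (seconds)."""

    def __init__(self, epsilon: float, clock: Callable[[], float] = time.time):
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)


def commit_wait(tt: SimulatedTrueTime, t1: TTInterval,
                sleep: Callable[[float], None] = time.sleep) -> TTInterval:
    """Block until a fresh reading's earliest bound is past t1.latest; return that reading."""
    while True:
        t2 = tt.now()
        if t2.earliest > t1.latest:
            return t2
        # At least this much real time must still elapse; the tiny extra avoids spinning.
        sleep(t1.latest - t2.earliest + 1e-6)


tt = SimulatedTrueTime(epsilon=0.007)   # ~7 ms error bound, a made-up figure
t1 = tt.now()                           # taken when the transaction starts
start = time.monotonic()
t2 = commit_wait(tt, t1)                # only now may the locks be released
print(f"waited {time.monotonic() - start:.3f}s")  # roughly 2 * epsilon
```

With a fixed epsilon the loop exits after about two error bounds, which is exactly the latency cost that makes commit-wait expensive when clock uncertainty is large.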

Of course, Spanner's implementation relies on specialized hardware infrastructure that ordinary users cannot replicate.

HybridTime (HT) is a hybrid between physical and logical clocks, and we show how HT can be used to achieve the same semantics as Spanner, but with good performance even with commonly available time synchronization.

Like Spanner, our approach relies on physical time measurements with bounded error to assign HybridTime timestamps to events that occur in the system.
However, unlike Spanner, our approach does not usually require the error to be waited out, thus allowing for usage in common deployment scenarios where clocks are synchronized through common protocols such as the Network Time Protocol, in which clock synchronization error is often higher than with Spanner's TrueTime.

The trade-off is that, in order to avoid commit-wait, HybridTime requires that timestamps be propagated across machines to achieve the same consistency semantics as Spanner.
Contrary to vector clocks, which can expand as the number of participants in the cluster grows, HybridTime timestamps have a constant and small size.

As the name says, HybridTime combines physical and logical clocks. Like Spanner, it can use physical time with an error bound.
But because it lacks Spanner's hardware, it does not have low-latency time synchronization, so it still relies on a logical clock to achieve consistency; and it solves the problem of vector clocks growing in size as the number of clients increases.

HybridTime clocks follow similar update rules to Lamport clocks, but the time values are not purely logical:
each time value has both a logical component, which helps in guaranteeing the same properties as a Lamport clock, and a physical component, which allows the event to be associated with a physical point in time.

One of Spanner's key benefits is that it is externally consistent, which is defined as fully linearizable, even in the presence of hidden channels.

Additionally, we use the term globally consistent to describe a system which provides the same linearizability semantics, provided that there are no hidden channels present.

The distinction lies in whether hidden channels exist. A hidden channel means the clients have a causal or ordering relationship that the system does not know about, for example through a message sent outside the system. Say A performs a write and then sends a message to B, after which B performs a write. Causally, B's write must come after A's write, but the database does not know this, so it cannot guarantee that A's write is ordered first.

Spanner is guaranteed to be externally consistent because commit-wait ensures that every transaction is globally ordered.

Global consistency is simpler by comparison, because it assumes there are no hidden channels.

Spanner's key innovation is that timestamps assigned by the system can be used to achieve external consistency, but also have physical meaning.

HybridTime is always globally consistent and, through selective application of commit-wait, externally consistent.

Spanner's main innovation is achieving external consistency while preserving the physical meaning of time.

HybridTime, by default, provides global consistency, that is, a partial order, because it uses a Lamport clock; when you choose commit-wait, it can also guarantee external consistency.

HybridTime Assumptions

1. HybridTime assumes that machines have a reasonably accurate physical clock, represented by the PC_i(e) function, which outputs the numeric timestamp returned by the physical clock as read by process i for event e, and that is able to provide absolute time measurements (usually in milli- or microseconds since 1 January 1970).

2. A time synchronization substrate keeps the physical clocks across different servers synchronized with regard to a reference server, the "reference" time, represented by the PC_ref(e) function, which outputs the numeric timestamp returned by the "reference" process for event e.

Additionally, we assume that such a substrate is able to provide an error bound along with each time measurement, denoted by the E_i(e) function, which outputs the numeric value ε, the error of process i at the time e occurred.

3. We make no assumptions on the actual accuracy of the clocks, i.e. the physical timestamps returned by servers' clocks may have an arbitrarily large but finite error, as long as this error's bound is known.

Put plainly, the assumptions are:

There is a reasonably reliable physical clock, an ideal reference clock, and a known error bound on the difference between them.

Finally, the error bound is not assumed to be small; it only needs to be finite.

Assumption 1: the error bound is finite.

Assumption 2: the physical clock is monotonically increasing; note that this is required only at the process level.
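One common way to obtain the per-process monotonicity that Assumption 2 asks for is to clamp wall-clock readings to the largest value seen so far. The Python sketch below assumes that technique; `MonotonicPhysicalClock` is a hypothetical helper, and the paper's own mechanism may differ. Python's `time.monotonic()` is monotonic but has no defined epoch, so it cannot serve directly as absolute time.

```python
import time


class MonotonicPhysicalClock:
    """Hypothetical helper: wall-clock readings that never go backwards in this process.

    If NTP steps the system clock back, read() keeps returning the largest value
    seen so far instead of a smaller one.
    """

    def __init__(self, source=time.time_ns):
        self._source = source
        self._last = 0

    def read(self) -> int:
        self._last = max(self._last, self._source())
        return self._last


readings = iter([1_000, 1_005, 990, 1_010])  # 990 simulates a backwards clock step
clock = MonotonicPhysicalClock(source=lambda: next(readings))
print([clock.read() for _ in range(4)])       # → [1000, 1005, 1005, 1010]
```

The guarantee is only per process: two machines wrapped this way can still disagree by up to the error bound.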

Based on the above assumptions, the HybridTime clock and protocol are proposed.

HybridTime Clock and Protocol

A HybridTime Clock (HTC) timestamp is a pair (physical, logical) where the first component is a representation of the physical time at which the event occurred and the second component is a logical sequence number.

The definition is fairly self-explanatory.

Algorithm 2 depicts the HTC algorithm.

The HTC algorithm has two parts, Now and Update:

Now returns the current HybridTime clock value.

Update advances the current clock according to the incoming timestamp, in.

Algorithm 2 implements a Lamport clock, with the additional advantage that generated timestamps have physical meaning and are accurate representations of physical time within a bounded error.

The algorithm itself is essentially a Lamport clock with a physical clock component added.
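The Now/Update pair can be sketched in Python under stated assumptions: Now uses the physical reading when it has moved past the last timestamp and otherwise bumps the logical counter, and Update adopts an incoming timestamp that is larger than the last one. This is an assumed form in the spirit of the description above, not a reproduction of the paper's Algorithm 2; `HT` and `HybridClock` are hypothetical names. A `NamedTuple` compares lexicographically, which matches the (physical, logical) ordering of Definition 1.

```python
import time
from typing import Callable, NamedTuple


class HT(NamedTuple):
    """HybridTime timestamp; tuple comparison is lexicographic on (physical, logical)."""
    physical: int
    logical: int


class HybridClock:
    """Sketch of HTC Now/Update rules (assumed form, not the paper's exact Algorithm 2)."""

    def __init__(self, physical_clock: Callable[[], int] = lambda: time.time_ns() // 1000):
        self._pc = physical_clock          # microseconds since the epoch by default
        self.last = HT(0, 0)

    def now(self) -> HT:
        """Use the physical clock if it moved ahead of the last timestamp, else bump logical."""
        pc = self._pc()
        if pc > self.last.physical:
            self.last = HT(pc, 0)
        else:
            self.last = HT(self.last.physical, self.last.logical + 1)
        return self.last

    def update(self, incoming: HT) -> None:
        """On receiving a timestamp, ensure the next now() is larger than it."""
        if incoming > self.last:
            self.last = incoming


# Server B's physical clock lags server A's by 50 units; values are made up.
a = HybridClock(physical_clock=lambda: 1_000)
b = HybridClock(physical_clock=lambda: 950)
t_a = a.now()            # HT(physical=1000, logical=0)
b.update(t_a)            # timestamp propagated from A to B
t_b = b.now()            # HT(physical=1000, logical=1)
print(t_a < t_b)         # → True, even though B's physical clock is behind
```

The timestamp stays a fixed-size pair no matter how many nodes participate, which is the contrast with vector clocks drawn earlier.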


To order the events timestamped using the HybridTime clock algorithm, we use Definition 1.

Definition 1. HTC(e) < HTC(f) is defined as the lexicographical ordering of the timestamp two-tuple (physical, logical).

Theorem 1. The HybridTime clock happened-before relation forms a total order of events.

Theorem 2. For all events in a causal chain F, the physical component of an HTC timestamp approximates the "real" time at which the event occurred, with an error defined and bounded by

Implementation

No consistency: in this mode there are no external consistency guarantees; transactions are assigned timestamps from each server's physical clock and no guarantee is made that reads are consistent or repeatable.

Local physical time is used directly, so consistency is not guaranteed.

HybridTime consistency: in this mode our implementation guarantees the same global consistency as Spanner, absent hidden channels, but using HybridTime instead of commit-wait.
Clients choosing this consistency mode on writes must make sure that the timestamp they receive from the server is propagated to other servers and/or clients.
Within the same client process, timestamps are automatically propagated on behalf of the user.

This is effectively a logical clock, no different in principle: it guarantees a partial order.
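A toy in-process model of the HybridTime consistency mode, sketched in Python under stated assumptions: `Server` and `Client` are hypothetical classes with no networking, the clock values are made up, and the server's clock rule is the same assumed Now/Update form as a plain hybrid clock. It shows why a client must carry its last timestamp forward, and why an out-of-band message (a hidden channel) breaks the order.

```python
from typing import Callable, NamedTuple, Optional


class HT(NamedTuple):
    physical: int
    logical: int


class Server:
    """Hypothetical server that stamps writes with a hybrid clock."""

    def __init__(self, physical_clock: Callable[[], int]):
        self._pc = physical_clock
        self.last = HT(0, 0)

    def _now(self) -> HT:
        pc = self._pc()
        if pc > self.last.physical:
            self.last = HT(pc, 0)
        else:
            self.last = HT(self.last.physical, self.last.logical + 1)
        return self.last

    def write(self, propagated: Optional[HT] = None) -> HT:
        """Apply a write; a propagated client timestamp is folded in first."""
        if propagated is not None and propagated > self.last:
            self.last = propagated
        return self._now()


class Client:
    """Hypothetical client that carries the latest timestamp it has received."""

    def __init__(self):
        self.seen: Optional[HT] = None

    def write(self, server: Server) -> HT:
        ts = server.write(self.seen)
        self.seen = ts  # the server guarantees ts > the propagated timestamp
        return ts


fast = Server(lambda: 2_000)   # clock runs ahead
slow = Server(lambda: 1_500)   # clock lags by 500 units
client = Client()
first = client.write(fast)
second = client.write(slow)    # timestamp propagated automatically
print(first < second)          # → True

# Hidden channel: A tells B out of band, but B's write carries no timestamp.
fast2, slow2 = Server(lambda: 2_000), Server(lambda: 1_500)
a_ts = Client().write(fast2)
b_ts = Client().write(slow2)
print(a_ts < b_ts)             # → False: the causal order is lost
```

Within one client the order holds despite the 500-unit skew; across clients that talk outside the system it does not, which is the case commit-wait mode is meant to cover.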

Commit-wait consistency: in this mode our implementation guarantees the same external consistency semantics as Spanner by also using commit-wait in the way described in the original paper.
However, instead of using TrueTime, which is a proprietary and private API, we implemented commit-wait on top of the widely used Network Time Protocol (NTP). Hence, in this consistency mode we support hidden channels.

This mode is probably of little practical use: without Spanner's hardware guarantees, the performance cost of commit-wait would be intolerable.
