A Detailed Look at the CockroachDB Transaction Processing System

Source: Internet
Author: User
Tags: cockroachdb

Some of the terms used in this article, such as serializability and linearizability, are explained in a separate article on linearizability, serializability, and strict serializability.

Most of the material in this article comes from the CockroachDB official blog, design documents, code, and related materials. Those sources are fragmented and in places not explained very clearly, so this article tries to tie them together. Reading this article alongside the official documents should make them easier to follow.

Introduction

CockroachDB is a distributed database that supports SQL and ACID distributed transactions, and it supports serializable, the highest isolation level in ANSI SQL.

In a distributed system it is difficult to support linearizability, because the clocks on different machines disagree and a global clock is required. TiDB chose the same solution as Percolator: a single-point timestamp oracle that serves as the clock source. Google Spanner went straight to hardware and built the TrueTime API to provide relatively accurate clocks. CockroachDB has no atomic clocks and does not use a single-point timestamp oracle. Instead it relies on NTP to keep the clock offsets between machines as small as possible, but NTP errors can reach 250ms or more and cannot be strictly bounded. As a result, guaranteeing linearizability in CockroachDB is difficult and performs poorly. In the end, although CockroachDB does support linearizability, it is not officially recommended. By default, CockroachDB provides the serializable isolation level but does not guarantee linearizability.

Serializable

A real database system executes many concurrent transactions at the same time. Making each of these transactions feel as if it were running alone in the database, free of interference from any other transaction, is the problem that isolation levels address. Serializable means no interference at all. The weaker isolation levels are repeatable read, read committed, read uncommitted, and snapshot isolation; under these, a transaction can notice interference from other transactions to a greater or lesser degree. For example, repeatable read has the phantom read problem and snapshot isolation has the write skew problem. The details are not repeated here; see "A Critique of ANSI SQL Isolation Levels".

Implementing a database that supports the serializable isolation level is difficult, and many databases do not support it. There are several reasons, and I think the most important is poor performance. Oracle 11g defaults to read committed (RC), and its highest isolation level is snapshot isolation. For the isolation levels supported by mainstream databases, see "When is 'ACID' ACID? Rarely." CockroachDB, however, put a lot of effort into achieving serializable.

A transaction typically contains multiple read and write operations on different rows and columns. The database system schedules the transactions in the system, and their operations interleave rather than running one after another.

Suppose there are three transactions and the database system produces some schedule for them. Is this schedule serializable? There is theory to answer this: the serializability graph. The theory introduces three kinds of conflict, each involving operations from different transactions on the same data:

    • RW: W overwrites a value that R read
    • WR: R reads a value that W wrote
    • WW: W overwrites a value that an earlier W wrote

For any schedule, if two transactions conflict, a directed edge is drawn between them (from the later transaction to the earlier one). The resulting directed graph is the serializability graph of the schedule.

It has been shown that if the serializability graph of a schedule contains no cycle, the schedule is serializable. So how does CockroachDB achieve this?
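As a rough illustration of the acyclicity test, the following Python sketch builds a conflict graph from a schedule and checks it for a cycle. The schedule encoding (tuples of transaction name, operation, key) and all function names are assumptions made for this example; none of them come from CockroachDB. Edges follow the convention used here, from the later transaction to the earlier one.

```python
from collections import defaultdict

def conflict_edges(schedule):
    """Return edges (later_txn, earlier_txn) for every RW, WR and WW conflict.

    schedule is a list of (txn, op, key) tuples in execution order,
    where op is "r" or "w". This encoding is hypothetical.
    """
    edges = set()
    for i, (t1, op1, k1) in enumerate(schedule):
        for t2, op2, k2 in schedule[i + 1:]:
            # Two operations conflict when they belong to different
            # transactions, touch the same key, and at least one is a write.
            if t1 != t2 and k1 == k2 and "w" in (op1, op2):
                edges.add((t2, t1))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every node starts WHITE

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # reached a node still on the DFS stack
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))

def is_serializable(schedule):
    return not has_cycle(conflict_edges(schedule))

# T1 runs entirely before T2: one edge, no cycle.
serial = [("T1", "r", "a"), ("T1", "w", "a"), ("T2", "r", "a"), ("T2", "w", "a")]
# Both read "a" before either writes it: edges in both directions.
interleaved = [("T1", "r", "a"), ("T2", "r", "a"), ("T1", "w", "a"), ("T2", "w", "a")]
```

Here `is_serializable(serial)` holds, while `is_serializable(interleaved)` does not, because each transaction overwrites a value the other one read.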

CockroachDB Transaction Processing System

    • Multiple Versions


CockroachDB transactions are lock-free: no read or write locks are taken. This naturally requires keeping multiple versions of the data, with each version identified by a timestamp.
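A minimal sketch of what "multiple versions identified by timestamp" can look like is given below. The class name `MVCCStore`, the integer timestamps, and the choice to treat a version whose timestamp equals the read timestamp as visible are all assumptions of this example, not CockroachDB's actual storage layout.

```python
import bisect

class MVCCStore:
    """Toy multi-version store: every write adds a new timestamped version."""

    def __init__(self):
        # key -> (sorted list of timestamps, parallel list of values)
        self._versions = {}

    def put(self, key, timestamp, value):
        stamps, values = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_left(stamps, timestamp)
        if i < len(stamps) and stamps[i] == timestamp:
            values[i] = value  # the same version is rewritten
        else:
            stamps.insert(i, timestamp)
            values.insert(i, value)

    def get(self, key, timestamp):
        """Return the newest value whose version timestamp is <= timestamp."""
        stamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(stamps, timestamp)
        return values[i - 1] if i else None

store = MVCCStore()
store.put("a", 10, "v1")
store.put("a", 20, "v2")
```

A reader at timestamp 15 sees `"v1"` and a reader at timestamp 25 sees `"v2"`; neither blocks the other, because old versions stay in place.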

The A and I in ACID are closely related, and both are guaranteed by the concurrency control protocol. The following first explains how A is guaranteed, and then how I is guaranteed under concurrency.

    • Atomicity

A distributed transaction may read and write data on multiple nodes, so how is atomicity guaranteed? As is well known, distributed transactions use 2PC. In the first phase (prepare), the transaction reads the data it needs (how it is guaranteed to read the latest data is covered later; assume for now that it can), performs its computation, and finally writes the data to the various nodes. These writes do not yet take effect externally: other transactions in the system cannot read them. Data that has been written to a node but has not yet taken effect is called a write intent. In CockroachDB, a write intent is stored alongside the actual data; it is just not readable from outside.

So where is the state of the transaction stored? At the beginning of a transaction, a record is written to the underlying storage system. This record, called the transaction record, holds the transaction ID and the transaction state: pending (running), committed, or aborted. Each write intent carries a key that points to the transaction record. To commit a transaction, simply change the state in the transaction record to committed; to roll back, change it to aborted. Once the state has been changed successfully, the result can be returned to the client, and the leftover write intents are handled asynchronously: on commit, the value in the write intent replaces the original value and the write intent is removed; on rollback, the write intent is simply deleted.

When a client later reads a key and encounters a write intent (as noted, write intents are removed asynchronously), it follows the write intent to the transaction record and checks the transaction state. If the state is committed, it returns the value in the write intent; if it is aborted, it returns the original value. If it is pending, the transaction is still running and the read has run into a conflict with an in-progress write. How is that conflict resolved? This involves isolation levels and the concurrency control protocol, described below.
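The interplay between transaction records and write intents can be sketched as follows. This is a simplified illustration that ignores timestamps and distribution; the names `TxnRecord`, `Intent`, `Store`, and `PendingIntentError` are hypothetical, and cleaning up an intent inside `read` is a shortcut taken for this example.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"

@dataclass
class TxnRecord:
    txn_id: str
    status: Status = Status.PENDING

@dataclass
class Intent:
    value: object
    record: TxnRecord  # pointer back to the transaction record

class PendingIntentError(Exception):
    """Raised when a read meets an intent of a still-running transaction."""

class Store:
    def __init__(self):
        self.committed = {}  # key -> committed value
        self.intents = {}    # key -> Intent that has not been cleaned up

    def write_intent(self, key, value, record):
        self.intents[key] = Intent(value, record)

    def read(self, key):
        intent = self.intents.get(key)
        if intent is None:
            return self.committed.get(key)
        if intent.record.status is Status.COMMITTED:
            # The transaction committed; promote the intent to a real value.
            self.committed[key] = intent.value
            del self.intents[key]
            return intent.value
        if intent.record.status is Status.ABORTED:
            # The transaction rolled back; drop the intent, keep the old value.
            del self.intents[key]
            return self.committed.get(key)
        raise PendingIntentError(key)

store = Store()
store.committed["a"] = 1
txn = TxnRecord("t1")
store.write_intent("a", 2, txn)
store.write_intent("b", 3, txn)
txn.status = Status.COMMITTED  # one state change commits both writes
```

The point of the design is visible in the last line: however many keys the transaction wrote, commit or rollback is a single update to one record, which is what makes it atomic.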

    • Isolation

As mentioned earlier, the data is multi-versioned, and each version is identified by a timestamp. When a read/write transaction starts, it takes the wall time at that moment as its timestamp (actually HLC, a logical clock built on the physical clock that captures causality). This timestamp is only a candidate for the transaction's final commit timestamp and is not necessarily the timestamp it commits with (the root cause is the clock offset between machines, which is discussed later); for now, assume it is final. The larger the timestamp, the newer the version. All data written by the transaction is marked with this timestamp as its version. In such a system, the serializability graph looks roughly like this:

[Figure: serializability graph without a cycle]

The graph above is acyclic. The following graph contains a cycle:

[Figure: serializability graph with a cycle]

Back to serializability: to achieve it, the schedule of transactions must be acyclic. CockroachDB prevents each of the three conflicts mentioned earlier from occurring against timestamp order, so that no edge points in the same direction as increasing timestamps, which guarantees there is no cycle. In the end, CockroachDB's serializability graph looks like this:

[Figure: CockroachDB's serializability graph]

CockroachDB guarantees the following constraints:

      • RW: the timestamp of W can only be larger than that of R, so only a back edge is produced (enforced by maintaining a read timestamp cache on each node).
      • WR: R only reads the version with the largest timestamp that is smaller than its own, so only a back edge is produced.
      • WW: the second W's timestamp is larger than the first W's timestamp, so only a back edge is produced.

In other words, if every transaction can only conflict with transactions that have smaller timestamps, the graph is guaranteed to be acyclic.
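The three constraints can be sketched on a single node as below. The class `Node`, its method names, and the integer timestamps are hypothetical. Reporting the lowest admissible timestamp when a write is rejected is an assumption of this sketch, as is treating a version with the reader's own timestamp as visible; the text above only states the constraints themselves.

```python
class Node:
    """Toy single-node store enforcing the RW, WR and WW timestamp rules."""

    def __init__(self):
        self.read_ts_cache = {}  # key -> highest timestamp that has read the key
        self.versions = {}       # key -> {timestamp: value}

    def read(self, key, ts):
        # Remember the read so that later writes cannot slip in below it (RW).
        self.read_ts_cache[key] = max(self.read_ts_cache.get(key, 0), ts)
        # WR: return the newest version that is not newer than the reader.
        visible = [t for t in self.versions.get(key, {}) if t <= ts]
        return self.versions[key][max(visible)] if visible else None

    def min_write_ts(self, key, ts):
        """Lowest timestamp >= ts at which a write to key is admissible."""
        last_read = self.read_ts_cache.get(key, 0)
        last_write = max(self.versions.get(key, {}), default=0)
        # RW: above every earlier read. WW: above every earlier write.
        return max(ts, last_read + 1, last_write + 1)

    def write(self, key, ts, value):
        """Write at ts if admissible; otherwise report the timestamp needed."""
        needed = self.min_write_ts(key, ts)
        if needed != ts:
            return False, needed
        self.versions.setdefault(key, {})[ts] = value
        return True, ts

node = Node()
node.write("a", 5, "x")          # accepted
node.read("a", 10)               # records a read of "a" at timestamp 10
rejected = node.write("a", 8, "y")  # below the cached read timestamp
```

The rejected write at timestamp 8 would have changed what the reader at timestamp 10 should have seen, which is exactly the RW conflict in the wrong direction.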

    • Recoverable

Guaranteeing an acyclic graph is enough to achieve serializability, but the consistency of the database, the C in ACID, must also be maintained. Consider the following scenario:

Take two transactions T1 and T2 with timestamp(T1) < timestamp(T2). T1 updates A but has not committed, and then T2 reads A. This is a WR conflict, but because it is a back edge, it is allowed. To maintain the RW constraint above, T2 must read T1's update: if T2 read the older value instead, T1's write would overwrite a value that T2 had read, which requires W's timestamp to be larger than R's, yet T1's timestamp is smaller than T2's. But what goes wrong if T2 reads T1's uncommitted update to A?

      • T2 reads T1's update. If T2 then commits and T1 later rolls back, the atomicity of T1 is violated: a value that T1 never successfully wrote has been read by T2.

CockroachDB uses a stricter schedule to handle this scenario: all operations may only act on data that has already been committed. The following explains how CockroachDB enforces this strict scheduling; it relies on the atomicity material above.

    • Strict Scheduling

As the previous section and the atomicity section show, when a transaction encounters a write intent, the transaction that wrote it may already have finished (write intents are cleaned up asynchronously) or may still be running, so the data may be uncommitted. At this point, the current transaction checks the state of the transaction that owns the write intent. If that transaction has committed, the current transaction overwrites the old value with the write intent's value and clears the write intent. If it has rolled back, the write intent is simply cleared. If it is pending, that is, still running, the priorities of the two transactions decide: the lower-priority transaction must abort. The priority assigned when a transaction starts is random, and CockroachDB ensures that an aborted transaction gets a higher priority after it restarts.
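The priority rule for a pending write intent can be sketched as follows. The `Txn` class, the priority range, the tie-breaking rule, and the exact way the restarted transaction's priority is raised are all assumptions of this example; the text above only says that the lower-priority transaction aborts and comes back with a higher priority.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Txn:
    name: str
    # A random priority is drawn when the transaction starts.
    priority: int = field(default_factory=lambda: random.randint(1, 1000))
    status: str = "pending"

def resolve_pending_conflict(current, intent_owner):
    """Abort the lower-priority transaction and return it.

    On a tie the intent owner is aborted; that choice is arbitrary here.
    """
    loser = current if current.priority < intent_owner.priority else intent_owner
    loser.status = "aborted"
    return loser

def restart(aborted, winner):
    """Return a fresh attempt whose priority is just above the winner's."""
    return Txn(aborted.name, priority=winner.priority + 1)

t1 = Txn("t1", priority=5)
t2 = Txn("t2", priority=9)
loser = resolve_pending_conflict(t1, t2)
retry = restart(loser, t2)
```

Raising the priority on restart keeps a transaction from being aborted indefinitely by the same competitor.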

That completes the explanation of how CockroachDB provides the serializable isolation level. Note the premise: each transaction is given a suitable timestamp. What counts as suitable? A distributed read/write transaction must be able to read the latest committed data.

    • How CockroachDB assigns timestamps to transactions

CockroachDB uses NTP for clock synchronization. NTP generally keeps the clock offset between machines under 250ms, but this is not absolute: it is affected by network delay, system load, and other factors. As shown above, CockroachDB's serializability relies on the clocks of the machines in the cluster staying within a range ε of one another. This range is configurable and defaults to 250ms. If a machine reads wall time t at some moment, the largest wall time that may exist anywhere in the cluster at that moment is t+ε.

When a transaction T starts, it takes the local wall time (actually HLC) as its timestamp t. Given the clock-offset bound, the largest wall time on any machine in the cluster is t+ε. If a data object read during the transaction has a timestamp in [t, t+ε], we cannot tell whether that value was committed before or after T started. T therefore has to restart, with t reset to the timestamp of the value it ran into.
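The restart rule can be sketched for a single key as below. The function name, the millisecond integer timestamps, and three details are assumptions of this example: the lower bound is treated as exclusive so that a restart always makes progress, the upper bound stays fixed at the original t+ε across restarts, and the transaction restarts at the newest uncertain version.

```python
MAX_OFFSET = 250  # ε in milliseconds, the default quoted above

def read_timestamp(start_ts, version_timestamps, max_offset=MAX_OFFSET):
    """Return (final timestamp, number of restarts) for a single-key read.

    version_timestamps lists the timestamps of the committed versions of
    the key being read.
    """
    upper = start_ts + max_offset  # kept fixed across restarts (assumption)
    ts, restarts = start_ts, 0
    while True:
        # Versions in (ts, upper] may or may not have committed before the
        # transaction started; the clocks cannot tell us which.
        uncertain = [v for v in version_timestamps if ts < v <= upper]
        if not uncertain:
            return ts, restarts
        ts = max(uncertain)  # restart at the timestamp that was hit
        restarts += 1

result = read_timestamp(1000, [900, 1100, 1200, 1300])
```

With a start timestamp of 1000, the versions at 1100 and 1200 fall inside the window ending at 1250, so the transaction restarts once at 1200. The version at 1300 lies beyond t+ε, is known to be later than the transaction, and is ignored.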

Summary

Overall, CockroachDB's concurrency control protocol is a lock-free, optimistic protocol. It is not well suited to applications with heavy data contention, which cause transactions to restart frequently. Also, NTP cannot always guarantee that the clock error between machines stays within the configured range, and once that range is exceeded, serializability can be violated.

References

Serializable, Lockless, Distributed: Isolation in CockroachDB

How CockroachDB Does Distributed, Atomic Transactions

CockroachDB beta-20160829

cockroachdb/cockroach

Living Without Atomic Clocks

Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases
