Quickly Understanding Omid: Yahoo!'s Distributed Transaction Scheme on HBase


Author: Liu Xuhui (Raymond). Please credit the source when reprinting.

Email: colorant at 163.com

Blog: http://blog.csdn.net/colorant/


What is it

To put it simply, Omid is a distributed transaction solution that Yahoo! built on top of HBase. It adds cross-row and cross-table transactions, which HBase does not support natively, and it targets OLTP-style transactions. There are many similar systems, and most of them borrow to some degree from the ideas in Google's Percolator; Omid differs from them considerably, as analyzed in detail below.

My understanding of Omid below is based mainly on Omid's documentation and the related paper. I have not checked the process described in those documents against Omid's latest code in detail, so please point out anything I have misunderstood.

Overall architecture

Like most distributed transaction schemes built on HBase-like NoSQL key-value stores, Omid's general idea is to rely on the multi-version data support that HBase provides and to implement distributed transactions through MVCC. In theory, any datastore that supports multi-versioned data could serve as the basis for an Omid implementation. Percolator implements transactions with a two-phase commit and stores the lock information in BigTable itself. Omid instead relies on a central server that arbitrates conflicts, so it needs no locks at all. Let us look at how this works concretely.


First, as in most MVCC schemes, Omid needs a TO (Timestamp Oracle) to provide the monotonically increasing timestamps that MVCC requires. Omid introduces a global, single-instance TSO (Transaction Status Oracle). Clients interact with HBase and the TSO through a client library, and HBase itself needs no modification. Because a global arbitration service judges transaction conflicts, locks are unnecessary as long as that service knows every historical change to the data. A transaction generally proceeds as follows:

  1. The client starts a transaction through the transactional client library and obtains a start timestamp, STS.
  2. The client reads and writes data based on that timestamp; every cell it writes is stored with STS as its version timestamp. Because each transaction writes under its own distinct version timestamp, different transactions do not overwrite each other when writing data, which is what makes updates possible without locks.
  3. After writing the data, the client sends the TSO a commit request containing the list of all rows it modified, and the TSO determines whether the transaction conflicts with others. (In Percolator-like systems, by contrast, the client detects conflicts itself, because the locks make that possible.) If the commit succeeds, the TSO returns a commit timestamp, CTS.
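The three steps can be sketched as a toy in-memory model. Everything here is hypothetical: `TimestampOracle`, `MultiVersionStore`, and `Transaction` are illustrative names rather than Omid's client API, and a dict stands in for HBase's versioned cells. The sketch shows the point of step 2: because each transaction writes with its own STS as the version timestamp, two concurrent writers to the same row produce two versions instead of overwriting each other. Conflict detection at commit time is deliberately left out.

```python
import itertools
from collections import defaultdict


class TimestampOracle:
    """Hands out strictly increasing timestamps (stand-in for Omid's TO)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_timestamp(self):
        return next(self._counter)


class MultiVersionStore:
    """Toy stand-in for HBase: each row keeps every version, keyed by timestamp."""

    def __init__(self):
        self.rows = defaultdict(dict)   # row -> {version_ts: value}

    def put(self, row, version_ts, value):
        self.rows[row][version_ts] = value


class Transaction:
    def __init__(self, oracle, store):
        self.store = store
        self.sts = oracle.next_timestamp()   # step 1: obtain the start timestamp
        self.write_set = set()

    def write(self, row, value):
        # Step 2: the cell version is this transaction's own STS, so concurrent
        # writers to the same row create distinct versions; no lock is taken.
        self.store.put(row, self.sts, value)
        self.write_set.add(row)

    def commit_request(self):
        # Step 3: what the client would send to the TSO (conflict check omitted).
        return {"sts": self.sts, "rows": sorted(self.write_set)}


oracle, store = TimestampOracle(), MultiVersionStore()
t1, t2 = Transaction(oracle, store), Transaction(oracle, store)
t1.write("row-a", "from t1")
t2.write("row-a", "from t2")
print(store.rows["row-a"])   # {1: 'from t1', 2: 'from t2'}
print(t1.commit_request())   # {'sts': 1, 'rows': ['row-a']}
```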

To judge whether transactions conflict, first consider the principle: two transactions conflict if their time ranges (from start to commit) overlap and the data they modify also overlaps. Otherwise they do not conflict.


Consider four transactions, T1 to T4, of which only T2 and T3 conflict. In the other cases, T1 and T2 overlap in time but modify no common data; T1 and T3 modify common data but do not overlap in time; and T4 modifies data in common with every other transaction but overlaps none of them in time.
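The conflict rule can be written as a small predicate. This is a sketch: the `Txn` type and the timestamps are hypothetical, chosen only to reproduce the four cases just described, with each transaction's lifetime taken as the interval from its STS to its CTS.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Txn:
    name: str
    sts: int             # start timestamp
    cts: int             # commit timestamp
    writes: frozenset    # rows the transaction modified


def conflicts(a: Txn, b: Txn) -> bool:
    """Conflict = overlapping [sts, cts] lifetimes AND overlapping write sets."""
    overlap_in_time = a.sts < b.cts and b.sts < a.cts
    overlap_in_rows = not a.writes.isdisjoint(b.writes)
    return overlap_in_time and overlap_in_rows


# Hypothetical timestamps chosen to reproduce the four cases in the text.
txns = [
    Txn("T1", 1, 4, frozenset({"x"})),
    Txn("T2", 3, 7, frozenset({"y"})),
    Txn("T3", 5, 9, frozenset({"x", "y"})),
    Txn("T4", 10, 12, frozenset({"x", "y"})),
]
conflicting = [(a.name, b.name) for a, b in combinations(txns, 2) if conflicts(a, b)]
print(conflicting)  # [('T2', 'T3')]
```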

To see how the TSO decides whether transactions conflict, we need to look at the TSO's internal data structures.


There are several data structures inside the TSO:

    • The first is the mapping from each transaction's start timestamp (STS) to its commit timestamp (CTS), called the commit table. Each time a transaction commits successfully, an STS-to-CTS record is added.

    • The second records, for each row, the timestamp of its last successful commit; it is called the conflict map.

The TSO uses the conflict map to decide whether a new commit conflicts, judging each committing transaction by its STS: if, for any row the transaction modified, the STS is smaller than that row's last commit timestamp in the conflict map, then another transaction modified the row after this one started, and the transaction is judged to conflict.
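The two tables and the commit check can be sketched as a toy TSO. This is a hypothetical model, not Omid's implementation: `TransactionStatusOracle` is an illustrative name, both tables are unbounded dicts, and a single counter supplies both STS and CTS values.

```python
import itertools


class TransactionStatusOracle:
    """Toy TSO keeping the commit table and conflict map (both unbounded here)."""

    def __init__(self):
        self._clock = itertools.count(1)   # shared source of STS and CTS values
        self.commit_table = {}             # STS -> CTS of committed transactions
        self.conflict_map = {}             # row -> CTS of its last successful commit

    def begin(self):
        return next(self._clock)

    def commit(self, sts, rows):
        """Return a CTS on success, or None if the transaction must abort."""
        for row in rows:
            # Someone committed this row after we started: write-write conflict.
            if self.conflict_map.get(row, 0) > sts:
                return None
        cts = next(self._clock)
        for row in rows:
            self.conflict_map[row] = cts
        self.commit_table[sts] = cts
        return cts


tso = TransactionStatusOracle()
a, b = tso.begin(), tso.begin()      # STS 1 and 2, running concurrently
print(tso.commit(a, ["x"]))          # 3
print(tso.commit(b, ["x", "y"]))     # None: x was committed at 3 > 2
c = tso.begin()                      # STS 4
print(tso.commit(c, ["x"]))          # 5
print(tso.commit_table)              # {1: 3, 4: 5}
```

Note that the losing transaction `b` is rejected only at commit time; its writes are already in the store and must be rolled back by the client.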

But this conflict map has a big problem: capacity. The TSO cannot possibly store the latest commit record of every row in history (that would be roughly the size of another HBase cluster). Moreover, for performance the conflict map should fit entirely in memory, so Omid truncates it and keeps only part of the data. Entries are stored by a hash of the row ID, which on the one hand introduces hash collisions and on the other hand bounds the total capacity. An entry is placed by linear probing from its hashed position, and when no slot is available, the oldest commit record in the table is replaced. In short, this design can cause false conflicts, and, because the record of data changes is incomplete, conflict detection may miss some real conflicts.
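A bounded, hash-addressed conflict map can be sketched as follows. The details are assumptions for illustration: `BoundedConflictMap`, `capacity`, and `probe_len` are hypothetical; `zlib.crc32` is used as a deterministic row hash; and eviction picks the oldest entry within the probe window, which equals the oldest entry in the whole table when the window covers the table, as in the demo. Only the hash is stored, so two colliding rows would share an entry (a false conflict), and `last_commit` returning `None` for an evicted row is exactly the lost history that could let a real conflict slip through.

```python
import zlib


class BoundedConflictMap:
    """Fixed-size, hash-addressed conflict map (illustrative sketch)."""

    def __init__(self, capacity, probe_len):
        if not 0 < probe_len <= capacity:
            raise ValueError("need 0 < probe_len <= capacity")
        self.capacity = capacity
        self.probe_len = probe_len
        self.slots = [None] * capacity      # each slot: (row_hash, cts) or None

    def _window(self, row_hash):
        start = row_hash % self.capacity    # linear probing from the hashed slot
        return [(start + i) % self.capacity for i in range(self.probe_len)]

    def last_commit(self, row):
        """CTS of the row's last commit, or None if unknown (never seen or evicted)."""
        row_hash = zlib.crc32(row.encode())
        for i in self._window(row_hash):
            slot = self.slots[i]
            if slot is not None and slot[0] == row_hash:
                return slot[1]
        return None

    def record_commit(self, row, cts):
        """Store the row's new CTS; return the evicted CTS, or None if nothing was evicted."""
        row_hash = zlib.crc32(row.encode())
        window = self._window(row_hash)
        for i in window:                    # reuse the row's slot or take a free one
            slot = self.slots[i]
            if slot is None or slot[0] == row_hash:
                self.slots[i] = (row_hash, cts)
                return None
        victim = min(window, key=lambda i: self.slots[i][1])   # oldest commit
        evicted_cts = self.slots[victim][1]
        self.slots[victim] = (row_hash, cts)
        return evicted_cts


cmap = BoundedConflictMap(capacity=2, probe_len=2)
cmap.record_commit("a", 1)
cmap.record_commit("b", 2)
print(cmap.record_commit("c", 3))   # 1: row "a"'s history is gone
print(cmap.last_commit("a"))        # None
```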

For the first problem, Omid assumes the probability is very low and does not handle it (the paper does not mention this part at all, so I do not know whether the hashing step was dropped from the algorithm). For the second problem, Omid introduces the concept of a low watermark.

    • Low watermark (LowWatermark)

Omid stores a timestamp called the low watermark. Whenever the oldest record in the conflict map is replaced, that record's CTS replaces the current low watermark. Any transaction submitted for commit afterwards whose STS is smaller than the low watermark is aborted directly and rolled back. This closes the hole of missed conflicts, but it can also produce more false positives. Omid's position is that transactions are short: within a transaction's short lifetime, even under massive concurrency, the conflict map is large enough to hold the change history of the relevant rows for that period, so the probability of a false abort is very low. (Essentially, transactions are assumed to last on the order of seconds; long-running transactions are not supported, and Omid actively aborts them.)
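The low-watermark rule can be sketched as a toy TSO. To keep the example self-contained, the bounded conflict map is simplified to an `OrderedDict` capped at `max_rows` entries that evicts the row with the oldest commit, instead of a hashed array; `TSOWithLowWatermark` and `max_rows` are hypothetical names. The demo shows the false positive mentioned above: a slow transaction is aborted even though nobody touched the row it wrote.

```python
import itertools
from collections import OrderedDict


class TSOWithLowWatermark:
    """Toy TSO whose conflict map keeps at most max_rows rows (hypothetical sketch)."""

    def __init__(self, max_rows):
        self._clock = itertools.count(1)
        self.max_rows = max_rows
        self.conflict_map = OrderedDict()   # row -> last CTS, oldest commit first
        self.low_watermark = 0

    def begin(self):
        return next(self._clock)

    def commit(self, sts, rows):
        """Return a CTS on success, or None if the transaction must abort."""
        if sts < self.low_watermark:
            # History this transaction would need may have been evicted:
            # abort conservatively (possibly a false positive).
            return None
        if any(self.conflict_map.get(row, 0) > sts for row in rows):
            return None                          # real write-write conflict
        cts = next(self._clock)
        for row in rows:
            self.conflict_map[row] = cts
            self.conflict_map.move_to_end(row)   # newest commit goes last
        while len(self.conflict_map) > self.max_rows:
            _, evicted_cts = self.conflict_map.popitem(last=False)
            self.low_watermark = max(self.low_watermark, evicted_cts)
        return cts


tso = TSOWithLowWatermark(max_rows=2)
old = tso.begin()                        # STS 1, a slow transaction
for row in ["a", "b", "c"]:
    tso.commit(tso.begin(), [row])       # commits at CTS 3, 5, 7
print(tso.low_watermark)                 # 3: row "a" was evicted
print(tso.commit(old, ["d"]))            # None, although nobody touched "d"
```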

    • Finally, there is a data structure that records the timestamps of aborted transactions. The main reason is, again, that the conflict map is truncated. Since a transaction's data is written to HBase regardless of whether the transaction eventually commits, recording the aborted transactions lets readers determine whether a particular version of the data is valid.

  4. If the TSO decides the transaction can commit, it returns the transaction's STS-to-CTS (start-to-commit timestamp) mapping to the client. The client then writes this mapping into the shadow cell of each row it modified; the shadow cell is an extra column per row that identifies the most recently committed data.

When a transaction reads data, it relies on these shadow-cell mappings. Based on its own start timestamp, it looks for the latest transaction that completed with a CTS smaller than that start timestamp, and uses that CTS to find the corresponding STS (the previous transaction's data was committed at its CTS, but the timestamp actually written to HBase is its STS). It then reads the version stored under that STS. That is roughly the process. A snapshot cannot be read directly by timestamp, because versions from incomplete or failed transactions may have been written concurrently.
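The read rule can be sketched with two dicts: the versions stored in HBase, keyed by the writer's STS, and the shadow cells mapping STS to CTS. This is a hypothetical model: `snapshot_read` and `naive_read` are illustrative names, and versions without a shadow cell are simply skipped. A real client would presumably need a fallback for versions whose shadow cell has not been written yet (for example a commit-table lookup), which is not modeled here. The `naive_read` function shows why reading by timestamp alone fails: it returns data from a failed writer.

```python
def snapshot_read(versions, shadow_cells, reader_sts):
    """Value visible to a transaction that started at reader_sts (sketch).

    versions:     {sts: value} as stored in HBase (version timestamp = writer's STS)
    shadow_cells: {sts: cts} written by clients after a successful commit
    """
    best_cts, best_value = None, None
    for sts, value in versions.items():
        cts = shadow_cells.get(sts)
        if cts is None or cts >= reader_sts:
            continue    # not (yet) committed, or committed after the reader started
        if best_cts is None or cts > best_cts:
            best_cts, best_value = cts, value
    return best_value


def naive_read(versions, reader_sts):
    """Wrong on purpose: newest version below reader_sts, ignoring commit status."""
    older = [sts for sts in versions if sts < reader_sts]
    return versions[max(older)] if older else None


versions = {1: "a", 3: "b", 4: "c"}   # STS 4 belongs to a failed/unfinished writer
shadow_cells = {1: 2, 3: 7}           # STS 1 committed at 2, STS 3 committed at 7
print(snapshot_read(versions, shadow_cells, reader_sts=5))   # a
print(naive_read(versions, reader_sts=5))                    # c
print(snapshot_read(versions, shadow_cells, reader_sts=8))   # b
```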

  5. If the transaction fails, the client rolls back its data by deleting the STS version it wrote in the corresponding HBase table.
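Because every version a transaction writes carries its own STS, rollback only has to delete versions with that exact timestamp, leaving other transactions' data untouched. A minimal sketch on a dict-of-dicts stand-in for HBase; `roll_back` is a hypothetical helper, and real HBase deletes are not modeled.

```python
def roll_back(store, sts, rows):
    """Delete only the versions this transaction wrote (version timestamp == its STS)."""
    for row in rows:
        versions = store.get(row)
        if versions is not None:
            versions.pop(sts, None)   # other transactions' versions are untouched


store = {"x": {1: "committed by T1", 4: "written by aborted T4"}, "y": {4: "also T4"}}
roll_back(store, sts=4, rows=["x", "y"])
print(store)  # {'x': {1: 'committed by T1'}, 'y': {}}
```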

Performance

As a single point, the TSO can also become a bottleneck for the whole system, in both capacity and throughput.

In fact, Omid's TSO builds on ideas from earlier papers, and its main theoretical contribution is essentially solving the single-point performance problem. As described above, on the capacity side it truncates the conflict map and adds a low watermark as an auxiliary check. On the throughput side, Omid adds further optimizations, such as replicating the commit map and the low watermark to clients as read-only caches, which reduces communication between clients and the TSO.
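The client-side caching idea can be sketched as a read-only replica of the TSO's commit information. Everything here is hypothetical: `ClientCommitCache`, `apply_update`, and the `ask_tso` callback are illustrative names, and how the TSO actually ships updates to clients is not modeled. The sketch shows the two savings: commit lookups that hit the cache need no round trip, and since the low watermark only grows, a client can tell locally that a transaction whose STS is below the cached value is bound to abort.

```python
class ClientCommitCache:
    """Hypothetical read-only client replica of the TSO's commit info."""

    def __init__(self, ask_tso):
        self.commits = {}           # STS -> CTS received from the TSO
        self.low_watermark = 0
        self.ask_tso = ask_tso      # fallback remote lookup: sts -> cts or None
        self.remote_calls = 0

    def apply_update(self, sts, cts, low_watermark):
        """Apply one update pushed by the TSO (delivery mechanism not modeled)."""
        self.commits[sts] = cts
        self.low_watermark = max(self.low_watermark, low_watermark)

    def commit_ts(self, sts):
        """CTS of the writer with this STS; only cache misses cost a round trip."""
        if sts in self.commits:
            return self.commits[sts]
        self.remote_calls += 1
        return self.ask_tso(sts)

    def doomed(self, sts):
        """True if a commit with this STS must abort (the low watermark never decreases)."""
        return sts < self.low_watermark


tso_commit_table = {3: 5, 10: 12}           # stand-in for the TSO's full table
cache = ClientCommitCache(tso_commit_table.get)
cache.apply_update(3, 5, low_watermark=2)
print(cache.commit_ts(3), cache.remote_calls)    # 5 0
print(cache.commit_ts(10), cache.remote_calls)   # 12 1
```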

Summary:

Overall, to implement lock-free distributed transactions, Omid introduces a central TSO node to judge transaction conflicts. In terms of efficiency, however, the data of transactions that end up conflicting is still written to the database and must be rolled back once the conflict is detected. Omid is therefore not necessarily more efficient than Percolator's approach of handling conflicts with locks.

However, Omid's motivation may not be efficiency alone. It avoids the stale-lock cleanup problem that Percolator's lock mechanism brings, since there are no locks to clean up; it partly avoids the latency of writing locks to HBase; and its centralized conflict-judgment mechanism also helps avoid the lock contention that can occur in locking schemes.

In practice, though, transactional interactions are logically more complex, and crash recovery may not be as simple as Omid originally planned. From the early Omid slides, to the Omid paper, to the latest Omid wiki documentation, the overall process has been adjusted in specific areas such as conflict judgment and whether additional data is written to HBase. Shadow cells are an example: the paper explicitly said no extra data is written to HBase, saving one interaction for efficiency, while the implementation documents on the wiki say shadow cells are used to determine whether a given data version was committed. These changes clearly reflect trade-offs among efficiency, reliability, feasibility, and other factors in a real engineering implementation.

As for which approach is better, locking or lock-free, I cannot judge without a large-scale comparison on real workloads. Purely in terms of theoretical completeness, I lean slightly toward the Percolator approach, but issues such as the locking logic between rows in two-phase commit and how to reduce lock contention must be worked out in a concrete implementation. I expect Xiaomi has gained some experience here in implementing and using Themis.

References:

https://github.com/yahoo/omid/wiki

Paper: Omid: Lock-free Transactional Support for Distributed Data Stores

http://yahoo.github.io/omid/docs/hadoop-summit-europe-2013.pdf

Related systems

Percolator is a similar system that implements distributed transactions on BigTable. If you are interested, see another study note I wrote: "Percolator: Google's Massive-Scale Incremental Data Processing System".


Copyright notice: This is an original article by the blogger and may not be reproduced without permission.

