Principles of HBase transactions and concurrency control

Source: Internet
Author: User

As an excellent non-in-memory database, HBase offers the same notion of a transaction as traditional databases, except that HBase transactions are row-level: they guarantee atomicity, consistency, isolation, and durability (the ACID properties) for the data in a single row. To provide these properties, HBase uses several concurrency control strategies, including various locking mechanisms and MVCC. This article first introduces the two lock-based synchronization mechanisms in HBase, then describes the implementation of row locks and the scenarios in which read-write locks are used, and finally covers how the MVCC mechanism is implemented.

HBase synchronization mechanism

HBase provides two synchronization mechanisms. The first is a mutex built on CountDownLatch; its typical use is the row lock held while a row's data is updated. The second is a read-write lock built on ReentrantReadWriteLock, which can place a read lock or a write lock on a critical resource. A read lock allows concurrent read operations, while a write lock is fully exclusive.

CountDownLatch

In Java, CountDownLatch is a synchronization helper class that lets one or more threads block until a set of operations performed by other threads completes. A CountDownLatch is initialized with a given count. Its two core methods are countDown() and await(): the former decrements the count, and the latter blocks until the count reaches 0. Combined with a thread-safe map and a test-and-set approach, CountDownLatch can implement a basic mutex as follows:

1. Initialize: create a CountDownLatch with a count of 1.

2. Test: the thread tries to insert an entry into a thread-safe map, with the critical resource as the key and the latch as the value. If the insert fails, another thread already holds the lock, so the thread calls await() on the existing latch and blocks until the holder releases the lock.

3. Set: if the insert succeeds, the thread holds the lock, and any other thread's insert must fail. The thread then performs its work and releases the lock when done: it first removes the corresponding key-value pair from the map, then calls countDown() on the latch, which reduces the count to 0 and wakes the blocked threads.
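The three steps above can be sketched in Java roughly as follows. This is a minimal illustration, not HBase code: the class name LatchMutex, the String key type, and the isLocked helper are assumptions made for the example. The sketch is not reentrant and does not check that the caller of unlock is the thread that holds the lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

/** Hypothetical illustration: a per-key mutex built from putIfAbsent plus CountDownLatch. */
class LatchMutex {
    // Thread-safe map: key = critical resource, value = latch owned by the current holder.
    private final ConcurrentMap<String, CountDownLatch> locks = new ConcurrentHashMap<>();

    /** Blocks until the calling thread holds the lock for the given key. */
    void lock(String key) throws InterruptedException {
        CountDownLatch mine = new CountDownLatch(1);   // step 1: count starts at 1
        while (true) {
            // step 2 (test): try to insert our latch; null means the key was absent
            CountDownLatch existing = locks.putIfAbsent(key, mine);
            if (existing == null) {
                return;                                // step 3 (set): insert succeeded, lock held
            }
            existing.await();                          // wait for the holder's countDown(), then retry
        }
    }

    /** Releases the lock: remove the map entry first, then wake the waiters. */
    void unlock(String key) {
        CountDownLatch latch = locks.remove(key);
        if (latch != null) {
            latch.countDown();                         // count reaches 0, await() returns
        }
    }

    /** Helper for the example: true while some thread holds the lock for the key. */
    boolean isLocked(String key) {
        return locks.containsKey(key);
    }
}
```

A caller would typically pair lock(key) with unlock(key) in a try/finally block so the entry is always removed, even when the protected work throws.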

ReentrantReadWriteLock

A read-write lock consists of a read lock and a write lock and offers more parallelism than a mutex. Multiple threads can hold the read lock at the same time, while the write lock can be held by only one thread. While the lock is held in write mode, every thread that tries to acquire it blocks until it is released. While it is held in read mode, other read requests proceed in parallel, but write requests block. Read-write locks are therefore best suited to workloads with many reads and few writes. Because the read lock is shared and the write lock is exclusive to one thread, read-write locks are also called shared-exclusive locks, often written as S locks and X locks.

In Java, ReentrantReadWriteLock is the implementation class for read-write locks. Its readLock() and writeLock() methods return the read lock and the write lock, respectively.
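As a small illustration of readLock() and writeLock(), the following sketch guards a counter with a ReentrantReadWriteLock. The class SharedCounter and its methods are hypothetical names chosen for the example; only the java.util.concurrent.locks calls are from the JDK.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical example: a counter guarded by a read-write lock. */
class SharedCounter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long value;

    /** Readers share the read lock, so concurrent get() calls do not block each other. */
    long get() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Writers take the exclusive write lock, which blocks both readers and other writers. */
    void increment() {
        lock.writeLock().lock();
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Releasing each lock in a finally block keeps the lock from leaking if the guarded code throws.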

Implementation of a row lock in HBase

HBase uses row locks to make updates atomic: either all of the changes in an update succeed or all of them fail. Every update to row-level data must first acquire the row lock and release it after the update completes, so that other waiting threads can acquire it. Updates to the same row in HBase are therefore serialized.

Row lock related data structures

The main structures associated with the HBase row lock are two classes, RowLock and RowLockContext. RowLockContext stores the context of a row lock, including the thread that holds the lock, the locked object, and the CountDownLatch object that implements mutual exclusion. RowLockContext is a property of RowLock. In addition, RowLock contains a field that indicates whether the row lock has been released.

Locking process

1. First, create a RowLockContext object (the row lock context) from the row key and the current thread object.

2. Call putIfAbsent to write an entry into the global map, with the row key as the key and the RowLockContext object as the value. Because keys are unique, the map holds at most one RowLockContext per row. putIfAbsent returns an existingContext object, which is the value that was associated with the key before the call. Depending on whether existingContext is null and whether it was created by the current thread, there are three cases:

(1) existingContext is null: no other thread holds the row lock, and the thread holds the lock using the context object it just created.
(2) existingContext was created by the current thread: the thread has already created a RowLockContext for this row and holds the lock directly through the existing object. This happens in batch updates, where one batch may update the same row several times and so needs to acquire that row's lock several times; HBase allows this.
(3) existingContext was created by another thread: the current thread blocks on that context's latch until the row lock is released or the wait times out. If the row lock is released, the thread competes again to write to the global map; it holds the row lock if it wins and continues to block otherwise. If the wait times out, an exception is thrown and the thread stops competing for the lock.

Release process

After the update completes, the thread must release the row lock in a finally block by calling RowLock.release(), which performs the following two operations:

1. Remove the row's RowLockContext from the global map lockedRows.

2. Call latch.countDown() to wake the other threads blocked in await() waiting for the row lock.
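The locking and release processes above can be modelled with a simplified sketch like the one below. It is not HBase's actual code: the class RowLockTable, the String row key, the holdCount field used to track reentrant acquisitions, and the use of IllegalStateException for the timeout are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified model of the row-lock flow; not HBase's implementation. */
class RowLockTable {
    /** Context of one row lock: owner thread, latch for waiters, reentrant hold count. */
    static final class RowLockContext {
        final String row;
        final Thread owner = Thread.currentThread();
        final CountDownLatch latch = new CountDownLatch(1);
        int holdCount = 0;                      // assumption: only the owner thread touches this
        RowLockContext(String row) { this.row = row; }
    }

    private final ConcurrentHashMap<String, RowLockContext> lockedRows = new ConcurrentHashMap<>();

    /** Acquires the row lock or throws IllegalStateException when the wait times out. */
    RowLockContext lock(String row, long timeoutMs) throws InterruptedException {
        RowLockContext mine = new RowLockContext(row);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            RowLockContext existing = lockedRows.putIfAbsent(row, mine);
            if (existing == null) {                          // case (1): lock was free
                mine.holdCount = 1;
                return mine;
            }
            if (existing.owner == Thread.currentThread()) {  // case (2): reentrant acquire
                existing.holdCount++;
                return existing;
            }
            long remaining = deadline - System.nanoTime();   // case (3): held by another thread
            if (remaining <= 0 || !existing.latch.await(remaining, TimeUnit.NANOSECONDS)) {
                throw new IllegalStateException("Timed out waiting for lock on row " + row);
            }
            // The holder released the lock: loop and compete for the map entry again.
        }
    }

    /** Releases one hold; the last release removes the map entry and wakes the waiters. */
    void release(RowLockContext ctx) {
        if (--ctx.holdCount > 0) {
            return;                                          // still held reentrantly
        }
        lockedRows.remove(ctx.row, ctx);
        ctx.latch.countDown();
    }

    /** Helper for the example: true while some thread holds the lock for the row. */
    boolean isLocked(String row) {
        return lockedRows.containsKey(row);
    }
}
```

In this sketch a batch that touches the same row twice gets the same context back on the second call, and the map entry is removed only after both holds are released.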

Use of read-write locks in HBase

In addition to using mutex-based row locks for row-level data consistency, HBase uses read-write locks for concurrency control of store-level and region-level operations. For example:

1. Region update lock: HBase takes a region-level read lock (shared lock) before performing a data update, so update threads do not block one another. HBase takes a region-level write lock (exclusive lock) when it flushes MemStore data to disk. While the flush holds the write lock, update threads (put, append, and delete operations) block until the write lock is released.

2. Region close protection lock: HBase first takes a region-level write lock (exclusive lock) when performing a close or split operation, which blocks other operations on the region, such as compaction, flush, and update operations. Each of those operations holds a read lock (shared lock).

3. Store snapshot protection lock: flushing a MemStore begins by taking a snapshot of it. A store-level write lock (exclusive lock) is taken for this step to block updates to the MemStore by other threads. Likewise, a write lock is taken when the snapshot is cleared, again blocking other updates to the MemStore.
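The pattern shared by these three locks, where frequent operations take the read lock and a rare structural operation takes the write lock, might look like the following sketch. The class MiniRegion, the field updatesLock, and the list-based memstore are hypothetical stand-ins for illustration and do not reproduce HBase's data structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: updates share a read lock, the flush snapshot takes the write lock. */
class MiniRegion {
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    // Stand-in for the MemStore; thread-safe because many puts run under the shared lock.
    private ConcurrentLinkedQueue<String> memstore = new ConcurrentLinkedQueue<>();

    /** Update path: shared lock, so concurrent puts do not block one another. */
    void put(String cell) {
        updatesLock.readLock().lock();
        try {
            memstore.add(cell);
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    /** Flush path: exclusive lock, so no put can run while the memstore is swapped out. */
    List<String> snapshotForFlush() {
        updatesLock.writeLock().lock();
        try {
            List<String> snapshot = new ArrayList<>(memstore);
            memstore = new ConcurrentLinkedQueue<>();
            return snapshot;
        } finally {
            updatesLock.writeLock().unlock();
        }
    }

    /** Helper for the example: number of cells currently buffered. */
    int memstoreSize() {
        updatesLock.readLock().lock();
        try {
            return memstore.size();
        } finally {
            updatesLock.readLock().unlock();
        }
    }
}
```

The write lock is held only for the swap, which keeps the window in which updates are blocked short.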

Implementation of MVCC mechanism in HBase

As described above, HBase provides row locks and read-write locks for concurrency control of row-level data and of store-level and region-level operations, respectively. In addition, HBase provides an MVCC mechanism to control concurrency between data reads and writes. With MVCC (multi-version concurrency control), the transaction engine no longer relies on row locks alone for read-write concurrency control. Instead, it combines row locks with multiple versions of a row, so that a simple algorithm makes lock-free reads possible and greatly improves the system's concurrency. HBase uses row locks plus MVCC to achieve both efficient concurrency and consistent reads and writes.

Introduction to the MVCC mechanism

Before looking at how HBase implements MVCC, we first need to understand how update operations that rely only on row locks affect read requests. The timing of data updates under a row-lock-only implementation is as follows:

The data update process is described only briefly here (subsequent articles will cover HBase data writes in depth). In short, an update goes through the following phases: acquire the row lock, update the WAL, write the data to the in-memory MemStore, and release the row lock.

In this timeline, two updates are applied to the same row one after the other. Suppose a read request arrives during the second update, after it has updated column family cf1 to t2_cf1 but before it has updated cf2. The first column the request reads holds the second update's value, t2_cf1, but the second column still holds the first update's value, t1_cf2. Clearly, locking only the update operations lets reads see inconsistent data. The simplest fix is to have read and write threads share the same row lock, which guarantees mutual exclusion between reads and writes, but having readers and writers contend for the row lock severely hurts performance.

For this reason, HBase uses MVCC so that read threads do not need to acquire row locks. MVCC modifies both the update timing above and the read operation, mainly by adding a write sequence number and a read sequence number, which in effect are version numbers for the data. The revised update sequence is:

The revised update adds two steps, 'get write sequence number' and 'end write sequence number', and every cell written to the MemStore carries its write sequence number. What changes are needed for read requests? HBase's approach is as follows:

(1) Each read operation is assigned a read sequence number when it starts, called the read point.
(2) The value of the read point is the largest sequence number among the write operations that have completed.
(3) The result of a read operation is the set of all cell values that correspond to the read point.

In the example, the first update gets write sequence number 1 and the second gets write sequence number 2. When the read request arrives, the largest completed write sequence number is WN = 1, so the read point is WN = 1, and the read returns the set of all cell values for WN = 1, namely t1_cf1 and t1_cf2. The read thus obtains consistent data without taking a lock.
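The example can be replayed with a short sketch in which a read ignores every cell whose write sequence number is above the read point. The class MvccRead, its Cell type, and the interpretation that a read returns, for each column, the newest value whose write number does not exceed the read point are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a lock-free read that filters cells by read point. */
class MvccRead {
    /** One cell version: column family name, value, and the write number that produced it. */
    static final class Cell {
        final String column;
        final String value;
        final long writeNumber;
        Cell(String column, String value, long writeNumber) {
            this.column = column;
            this.value = value;
            this.writeNumber = writeNumber;
        }
    }

    /** For each column, returns the newest value whose writeNumber is <= readPoint. */
    static Map<String, String> read(List<Cell> memstore, long readPoint) {
        Map<String, Cell> newest = new LinkedHashMap<>();
        for (Cell c : memstore) {
            if (c.writeNumber > readPoint) {
                continue;                        // written by an update that is not yet visible
            }
            Cell seen = newest.get(c.column);
            if (seen == null || seen.writeNumber < c.writeNumber) {
                newest.put(c.column, c);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Cell c : newest.values()) {
            result.put(c.column, c.value);
        }
        return result;
    }
}
```

With cells (cf1, t1_cf1, 1), (cf2, t1_cf2, 1), and (cf1, t2_cf1, 2) in the list, a read at read point 1 sees only t1_cf1 and t1_cf2, even though the second update has already written t2_cf1.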

MVCC implementation in HBase

In HBase, the class that implements MVCC is MultiVersionConsistencyControl. It maintains two long variables, uses WriteEntry objects, and keeps a writeQueue queue:

1. long memstoreRead: records the current global read point. An incoming read request first obtains this read point.

2. long memstoreWrite: records the current global write sequence number, from which a new write sequence number is assigned to the next update thread.

3. WriteEntry: an object that records the write sequence number of one update operation. It contains two variables: writeNumber, the write sequence number, and a boolean completed, which indicates whether the update is complete.

4. writeQueue: a queue holding the WriteEntry objects of all current update operations.

Get write sequence number

As the update sequence above shows, after acquiring the row lock the update thread must get a write sequence number. The corresponding method is beginMemstoreInsert, which increments memstoreWrite by 1, creates a WriteEntry object, inserts it into writeQueue, and returns it. Note that the WriteEntry object contains the write sequence number writeNumber, which the update thread sets as a property of each cell it writes.

End write sequence number

After the data update is complete and before releasing the row lock, the update thread calls completeMemstoreInsert to update the WriteEntry object and the memstoreRead variable. This takes two steps:

1. Mark the WriteEntry object as completed, then move the global read point memstoreRead forward as far as possible. The advance algorithm walks the WriteEntry objects in writeQueue from the head, removing those already marked as completed, until it meets one that is not complete. memstoreRead is then set to the writeNumber of the last completed entry removed.

2. After step 1, memstoreRead may still be smaller than the current update thread's writeNumber, in which case the update is not yet visible to readers. So that the update is visible once the operation returns, the thread must wait for memstoreRead to advance to its own writeNumber. It therefore blocks, waiting for other threads' WriteEntry objects to be marked completed, until memstoreRead reaches the current thread's writeNumber.
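The begin and complete steps can be sketched as below. This is a simplified model under stated assumptions, not the HBase source: the class SimpleMvcc, the split into the helper methods advanceMemstore and waitForRead, the memstoreReadPoint accessor, and the use of intrinsic locks with wait/notifyAll are choices made for the example. The field and method names memstoreRead, memstoreWrite, writeQueue, beginMemstoreInsert, and completeMemstoreInsert follow the description above.

```java
import java.util.LinkedList;

/** Hypothetical, simplified model of the MVCC bookkeeping described in this section. */
class SimpleMvcc {
    /** Write sequence object of one update: its write number and completion flag. */
    static final class WriteEntry {
        final long writeNumber;
        boolean completed = false;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private long memstoreRead = 0;    // global read point
    private long memstoreWrite = 0;   // last write sequence number handed out
    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

    /** Get write sequence number: increment memstoreWrite and enqueue a new WriteEntry. */
    synchronized WriteEntry beginMemstoreInsert() {
        WriteEntry e = new WriteEntry(++memstoreWrite);
        writeQueue.add(e);
        return e;
    }

    /** Step 1: mark e completed and advance the read point past completed entries. */
    synchronized boolean advanceMemstore(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
            memstoreRead = writeQueue.removeFirst().writeNumber;
        }
        notifyAll();                               // wake writers blocked in waitForRead
        return memstoreRead >= e.writeNumber;      // true if e is already visible to readers
    }

    /** Step 2: block until the read point has reached e's write number. */
    synchronized void waitForRead(WriteEntry e) throws InterruptedException {
        while (memstoreRead < e.writeNumber) {
            wait();
        }
    }

    /** End write sequence number: step 1 followed by step 2. */
    void completeMemstoreInsert(WriteEntry e) throws InterruptedException {
        advanceMemstore(e);
        waitForRead(e);
    }

    /** Read point that a new read request would use. */
    synchronized long memstoreReadPoint() {
        return memstoreRead;
    }
}
```

If two updates begin in order and the second finishes first, advanceMemstore for the second leaves the read point unchanged because the first entry at the head of the queue is still incomplete. Once the first completes, the read point jumps past both entries.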

Summary

HBase combines several locking mechanisms with MVCC to guarantee properties such as atomicity and consistency. Row locks implemented as mutexes ensure the atomicity of row-level updates; the read-write locks provided by the JDK provide store-level and region-level data consistency; and row locks plus MVCC deliver consistent data with high-performance, lock-free reads.
