HBase row lock and MVCC Analysis

Source: Internet
Author: User

This analysis has four parts:

    • Case scenario
    • Process analysis
    • Analysis of the 0.94 and 0.96 implementations
    • Simulation test and analysis

I. Case studies

A deleted Weibo post is displayed as: "Sorry, the Weibo has been deleted by the author. View help: http://t.cn/zWSudZc"

Ordinarily each short-link code is different. After deletion, however, the operations all act on the same code, namely zWSudZc.

Several operations are triggered:

Delete zWSudZc mid

Decr zWSudZc limit count

The problem is that these write operations all block on the single rowKey zWSudZc.

If the Weibo feed were stored in HBase with mid as the rowKey, the same problem would occur for operations on popular posts. Before analyzing it, we should first understand how HBase ensures write consistency:

Conflict prevention: avoid distributed concurrent writes by routing all writes for a given data item to a single node (either a global master node or a partition master node). To avoid conflicts, the database must sacrifice availability during a network partition. Many systems that provide strong consistency guarantees take this approach (for example most relational databases, HBase, and MongoDB).

Because updates go through a single node, we can make the following conjectures:

  • Write operations hold locks that also block reads.
  • Write operations are executed serially on that node, so waiting times are long.

II. Process Analysis

checkAndPut, append, and increment operations in HRegion (HBase 0.94.x):

    • startRegionOperation (lock.readLock().lock())
    • Acquire the row lock
    • updatesLock.readLock().lock()
    • MVCC begin
    • MVCC finish
    • updatesLock.readLock().unlock()
    • closeRegionOperation

Get and scan operations:

    • startRegionOperation
    • MultiVersionConsistencyControl.setThreadReadPoint(this.readPt);
    • closeRegionOperation

Differences among the three locks

The region lock (lock) and updatesLock are both ReentrantReadWriteLock instances. The read lock of a ReentrantReadWriteLock can be held by many threads at once, but while the write lock is held all other operations block. The write lock of updatesLock is taken only during a region flush. The write lock of the region lock does not appear to be taken on these paths, so it is suspected to play no role here. The row lock is a ConcurrentHashMap<HashedBytes, CountDownLatch> field named lockedRows (in the 0.94 source it is declared in HRegion rather than in MultiVersionConsistencyControl).
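The read/write lock behaviour described above can be checked with a small standalone sketch. It uses only java.util.concurrent; the class name RegionLockDemo and its helper methods are made up for illustration and are not HBase code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RegionLockDemo {
    // Stand-in for HRegion's updatesLock; only the name is borrowed from the text.
    static final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

    /** Probes the lock from another thread, because lock ownership is per thread. */
    static boolean tryFromOtherThread(boolean write) {
        return CompletableFuture.supplyAsync(() -> {
            boolean ok = write ? updatesLock.writeLock().tryLock()
                               : updatesLock.readLock().tryLock();
            if (ok) { // we only probe availability, so release at once
                if (write) updatesLock.writeLock().unlock();
                else updatesLock.readLock().unlock();
            }
            return ok;
        }).join();
    }

    /** Returns {otherReaderGotIn, otherWriterGotIn} while this thread holds the read lock. */
    static boolean[] probeWhileReadHeld() {
        updatesLock.readLock().lock();
        try {
            return new boolean[] { tryFromOtherThread(false), tryFromOtherThread(true) };
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    /** Returns {otherReaderGotIn, otherWriterGotIn} while this thread holds the write lock. */
    static boolean[] probeWhileWriteHeld() {
        updatesLock.writeLock().lock();
        try {
            return new boolean[] { tryFromOtherThread(false), tryFromOtherThread(true) };
        } finally {
            updatesLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] r = probeWhileReadHeld();
        boolean[] w = probeWhileWriteHeld();
        System.out.println("read held : other reader=" + r[0] + ", other writer=" + r[1]);
        System.out.println("write held: other reader=" + w[0] + ", other writer=" + w[1]);
    }
}
```

With the read lock held, a second reader gets in and a writer does not; with the write lock held (the flush case), neither gets in. That is why ordinary writes, which take only the read side of updatesLock, do not block one another on this lock.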

MVCC (MultiVersionConsistencyControl)

    • Manages read/write consistency of the memstore. From the source comments: "Use MVCC to make this set of increments/appends atomic to reads".
    • 0.94 (0.94.2): still to be implemented; increment, append, and checkAnd carry a TODO for it (MVCC is discussed later).
    • 0.96: implemented.
    • Put is the operation our project currently uses most.
    • In 0.94 it goes through HRegion internalPut.

III. 0.94-0.96 Implementation Analysis

In 0.94:

    • increment, append, and checkAndPut all use both row locks and MVCC, whereas the internalPut called by put uses only MVCC and takes no row lock.
    • Process:
    • startRegionOperation (lock.readLock().lock())
    • Acquire the row lock
    • updatesLock.readLock().lock()
    • MVCC begin
    • MVCC finish
    • updatesLock.readLock().unlock()
    • closeRegionOperation

0.96:

Process:

(1) Acquire RowLock

(1a) BeginMVCC + Finish MVCC

(2) Begin MVCC

(3) Do work

(4) Release RowLock

(5) Append to WAL

(6) Finish MVCC

Step (1a) waits for all prior MVCC transactions to finish while we hold the row lock, so that we are guaranteed to see the latest state.

After an upgrade to 0.96, increment operations may therefore become slower because of this extra MVCC round.
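To see why an empty begin-plus-finish pair acts as a barrier, here is a toy model of a write queue with a read point. It is a sketch of the general technique only: ToyMvcc, markCompleted, and awaitVisible are names invented for this example, and the real MultiVersionConsistencyControl differs in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyMvcc {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed; // guarded by the ToyMvcc monitor
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();
    private long lastWriteNumber = 0; // last write number handed out
    private long readPoint = 0;       // highest write number visible to readers

    /** Starts a write and queues it in write-number order. */
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(++lastWriteNumber);
        writeQueue.addLast(e);
        return e;
    }

    /** Marks the write done and moves the read point over the completed prefix of the queue. */
    synchronized void markCompleted(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.pollFirst().writeNumber;
        }
        notifyAll();
    }

    /** Waits until the read point has reached e, or the timeout expires. */
    synchronized boolean awaitVisible(WriteEntry e, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (readPoint < e.writeNumber) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) return false;
                wait(left);
            }
            return true;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized long readPoint() { return readPoint; }

    public static void main(String[] args) {
        ToyMvcc mvcc = new ToyMvcc();
        WriteEntry slow = mvcc.begin();    // write #1, still in progress
        WriteEntry barrier = mvcc.begin(); // write #2, the empty transaction of step (1a)
        mvcc.markCompleted(barrier);
        System.out.println("visible while #1 is pending: " + mvcc.awaitVisible(barrier, 50));
        mvcc.markCompleted(slow);
        System.out.println("visible after #1 finished:   " + mvcc.awaitVisible(barrier, 50));
    }
}
```

In this model the empty transaction cannot become visible until every earlier write number in the region has completed, whichever row those writes touch. That region-wide wait is the cost that step (1a) adds.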

Improvements planned for 0.96:

The committers also consider the two MVCC rounds unnecessary and propose an improved process in https://issues.apache.org/jira/browse/HBASE-7263

(1) Acquire RowLock

(1a) Grab + Release RowWriteLock (instead of BeginMVCC + Finish MVCC)

(1b) Grab RowReadLock (new step !)

(2) Begin MVCC

(3) Do work

(4) Release RowLock

(5) Append to WAL

(6) Finish MVCC

(7) Release RowReadLock (new step !)

In addition, the useless client-side lockId allocation method is removed.
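The proposed steps (1a), (1b), and (7) can be pictured with a per-row read/write lock. The sketch below is only an interpretation of the step list above, not the HBASE-7263 patch; RowBarrierSketch and its method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RowBarrierSketch {
    // Hypothetical lock for one row; a real design would keep one per row key.
    private final ReentrantReadWriteLock rowRwLock = new ReentrantReadWriteLock();

    /** Step (1b): mark this operation as in flight on the row. */
    void enter() { rowRwLock.readLock().lock(); }

    /** Step (7): the operation is complete, so later barriers need not wait for it. */
    void exit() { rowRwLock.readLock().unlock(); }

    /** Step (1a): grab and release the write lock; succeeds once nothing is in flight on this row. */
    boolean barrier(long timeoutMillis) {
        try {
            if (!rowRwLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
            rowRwLock.writeLock().unlock();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Runs barrier() on another thread, as a second writer to the same row would. */
    boolean barrierFromOtherThread(long timeoutMillis) {
        return CompletableFuture.supplyAsync(() -> barrier(timeoutMillis)).join();
    }

    public static void main(String[] args) {
        RowBarrierSketch row = new RowBarrierSketch();
        row.enter(); // an earlier operation on this row is still in flight
        System.out.println("barrier passes while in flight: " + row.barrierFromOtherThread(50));
        row.exit();
        System.out.println("barrier passes after exit:      " + row.barrierFromOtherThread(50));
    }
}
```

The difference from the MVCC barrier is scope: here the wait covers only operations on the same row, not every pending write in the region.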

IV. Simulation Test and Analysis

  • Construct the simulation code

The test class is HBaseInsertTest1. TestKeyValueSkipListSet is HBase's KeyValueSkipListSet extracted as a public class and is used to store the data.

    package com.daodao.hbase;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl;
    import org.apache.hadoop.hbase.util.Bytes;

    import java.util.concurrent.*;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    /**
     * Created with IntelliJ IDEA.
     *
     * @author guanpu
     * Date: 13-1-9
     * Analyzes 0.94 insert operation performance.
     */
    public class HBaseInsertTest1 {
        volatile TestKeyValueSkipListSet kvset;
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
        private final MultiVersionConsistencyControl mvcc = new MultiVersionConsistencyControl();
        private static AtomicInteger finishedCount;
        private static AtomicLong mvccTime = new AtomicLong(0L);
        private static AtomicLong rowlockTime = new AtomicLong(0L);
        private static AtomicLong lockTime = new AtomicLong(0L);
        private static AtomicLong updateLockTime = new AtomicLong(0L);
        private static AtomicLong insertTime = new AtomicLong(0L);
        private static AtomicLong releaseTime = new AtomicLong(0L);
        private final ConcurrentHashMap<String, CountDownLatch> lockedRows =
                new ConcurrentHashMap<String, CountDownLatch>();

        public HBaseInsertTest1() {
            kvset = new TestKeyValueSkipListSet(new KeyValue.KVComparator());
            finishedCount = new AtomicInteger(0);
        }

        class HBaseInsertTask implements Runnable {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    String key = "key" + i;
                    long time = System.nanoTime();
                    MultiVersionConsistencyControl.WriteEntry localizedWriteEntry = null;
                    try {
                        lock.readLock().lock(); // like startRegionOperation does
                        lockTime.addAndGet(System.nanoTime() - time);
                        time = System.nanoTime();
                        Integer lid = getLock(key); // acquire the rowKey lock
                        // NOTE: the row-lock wait is written into lockTime (overwriting it)
                        // and rowlockTime is never updated, so "rowlock" always prints 0.0.
                        lockTime.set(System.nanoTime() - time);
                        time = System.nanoTime();
                        updatesLock.readLock().lock();
                        updateLockTime.addAndGet(System.nanoTime() - time);
                        time = System.nanoTime();
                        localizedWriteEntry = mvcc.beginMemstoreInsert();
                        mvccTime.addAndGet(System.nanoTime() - time);
                        time = System.nanoTime();
                        kvset.add(new KeyValue(Bytes.toBytes(key), Bytes.toBytes("f"),
                                Bytes.toBytes("column"), 1L, Bytes.toBytes(1L)));
                        insertTime.addAndGet(System.nanoTime() - time);
                        time = System.nanoTime();
                        mvcc.completeMemstoreInsert(localizedWriteEntry);
                        mvccTime.addAndGet(System.nanoTime() - time);
                    } catch (Exception e) {
                        System.out.println(e);
                    } finally {
                        time = System.nanoTime();
                        updatesLock.readLock().unlock();
                        // release the row lock and wake up any waiters
                        CountDownLatch rowLatch = lockedRows.remove(key);
                        rowLatch.countDown();
                        lock.readLock().unlock();
                        releaseTime.addAndGet(System.nanoTime() - time);
                    }
                }
                finishedCount.incrementAndGet();
            }

            private Integer getLock(String key) {
                CountDownLatch rowLatch = new CountDownLatch(1);
                // loop until we acquire the row lock (unless !waitForLock)
                while (true) {
                    CountDownLatch existingLatch = lockedRows.putIfAbsent(key, rowLatch);
                    if (existingLatch == null) {
                        break;
                    } else {
                        try {
                            if (!existingLatch.await(30000, TimeUnit.MILLISECONDS)) {
                                System.out.println("some thing wrong in waiting");
                                return null;
                            }
                        } catch (InterruptedException ie) {
                            // Empty
                        }
                    }
                }
                return 1;
            }
        }

        private class DaodaoTestWatcher implements Runnable {
            @Override
            public void run() {
                long time = System.nanoTime();
                while (finishedCount.get() != 50) {
                    // busy-wait until all 50 insert tasks have finished
                }
                System.out.println("cost time:" + (System.nanoTime() - time) / 1000000000.0);
                System.out.println("cost time: mvcc" + mvccTime.get() / 1000000000.0 / 50);
                System.out.println("cost time: lock" + lockTime.get() / 1000000000.0 / 50);
                System.out.println("cost time: update" + updateLockTime.get() / 1000000000.0 / 50);
                System.out.println("cost time: rowlock" + rowlockTime.get() / 1000000000.0 / 50);
                System.out.println("cost time: release" + releaseTime.get() / 1000000000.0 / 50);
            }
        }

        public void test() {
            ExecutorService executorService = Executors.newFixedThreadPool(200);
            for (int i = 0; i < 50; i++)
                executorService.execute(new HBaseInsertTask());
            executorService.execute(new DaodaoTestWatcher());
        }

        public static void main(String[] args) {
            new HBaseInsertTest1().test();
        }
    }

 

Time consumed:

    cost time: 24.727145
    cost time: mvcc 22.98698292
    cost time: lock 0.0
    cost time: update 0.009690879999999999
    cost time: rowlock 0.0
    cost time: release 0.05001874

With MVCC removed:

    cost time: 5.190751
    cost time: mvcc 0.0073236
    cost time: lock 0.0
    cost time: update 0.017533220000000002
    cost time: rowlock 0.0
    cost time: release 1.3753079

For 0.96, the following code is added after updatesLock.readLock().lock():

    time = System.nanoTime();
    // wait for all prior MVCC transactions to finish - while we hold the row lock
    // (so that we are guaranteed to see the latest state)
    mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
    mvccTime.set(mvccTime.get() + (System.nanoTime() - time));

Time consumed:

    cost time: 43.04134
    cost time: mvcc 40.70520202
    cost time: lock 0.0
    cost time: update 0.00937416
    cost time: rowlock 0.0
    cost time: release 0.05023072
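The three runs above can be compared with a little arithmetic. The sketch below only divides the figures already reported. Going by the watcher code, the first figure of each run is wall-clock time and the others are per-thread averages, so the shares are approximate. CostBreakdown is a name made up for this example.

```java
class CostBreakdown {
    /** Ratio of two durations given in seconds. */
    static double ratio(double numeratorSeconds, double denominatorSeconds) {
        return numeratorSeconds / denominatorSeconds;
    }

    public static void main(String[] args) {
        // Figures copied from the three runs above, in seconds.
        double total094 = 24.727145, mvcc094 = 22.98698292;
        double totalNoMvcc = 5.190751;
        double total096 = 43.04134, mvcc096 = 40.70520202;

        System.out.printf("0.94 MVCC share of run time: %.1f%%%n", 100 * ratio(mvcc094, total094));
        System.out.printf("0.96 MVCC share of run time: %.1f%%%n", 100 * ratio(mvcc096, total096));
        System.out.printf("0.96 vs 0.94 total time:     %.2fx%n", ratio(total096, total094));
        System.out.printf("0.94 vs no-MVCC total time:  %.2fx%n", ratio(total094, totalNoMvcc));
    }
}
```

MVCC accounts for roughly 93% to 95% of the run time in both versions. The extra MVCC round in 0.96 makes the run about 1.7 times as long as in 0.94, and removing MVCC makes the 0.94 run nearly 5 times faster.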

 

In 0.94, increment, append, and checkAndPut all use both row locks and MVCC, whereas the internalPut called by put uses only MVCC and takes no row lock.

Optimization: for single-version workloads, these operations could rely on the row lock alone and drop MVCC, which would further improve write performance.

 

If the rowkey is changed to a single fixed rowkey, version 0.94 takes the following time (all figures are total times):

    cost time: 27.660935
    cost time: mvcc 3.888678
    cost time: lock 0.0
    cost time: insert 9.319777
    cost time: update 0.964697
    cost time: rowlock 0.0
    cost time: release 16.997803

However, in actual HBase inserts, writing to varying keys is four times faster than writing to a fixed key, whereas in this standalone test the two speeds are basically the same. The extra cost is therefore probably in region lookup or network transfer, which needs further verification.

Summary:

    • Region update time is spent mainly in MVCC.
    • For a single-version database, I think MVCC could be removed from the various update operations. A modification would instead acquire the rowkey's write lock up front, which avoids MVCC coordination across the whole region.
    • The end-to-end bottleneck for a single rowkey, from the client to HBase, still needs to be explored in a real distributed environment.

 

---------------------------------------- Extended ----------------------------------

MySQL MVCC, from @ A Fei (Jun Wei)

MySQL 5.6 optimizes read transactions. @ Yangwm @ slow half-beat de knife @ Kai Pan cobain @ jolestar @ Wei 1984

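MVCC principle

1. Compared with row-level locks

A row-level lock is a pessimistic lock. Lock compatibility (R = read, W = write):

        R    W
    R   y    n
    W   n    n

MVCC keeps multiple versions instead: while version 10 is being written, readers still read version 9.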

Extended knowledge: optimistic locking

Select, then update, then select again to check whether anything changed in between. If it did, roll back. This suits workloads with few conflicts.

Does the Redis server implement optimistic locking? Does a single-threaded, serial execution model need locking at all?
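A minimal in-memory sketch of the optimistic pattern follows. It swaps the "select again and roll back" check for a version number with compare-and-set, which is a common variant of the same idea. OptimisticCounter and Versioned are illustrative names, not part of any database API.

```java
import java.util.concurrent.atomic.AtomicReference;

class OptimisticCounter {
    /** Immutable snapshot: a value plus the version that produced it. */
    static final class Versioned {
        final long version;
        final int value;
        Versioned(long version, int value) {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicReference<Versioned> cell =
            new AtomicReference<>(new Versioned(0, 0));

    /** The "select": take a snapshot without locking. */
    Versioned read() { return cell.get(); }

    /** The "update": commits only if nobody replaced the snapshot we read. */
    boolean tryUpdate(Versioned seen, int newValue) {
        return cell.compareAndSet(seen, new Versioned(seen.version + 1, newValue));
    }

    /** Read, compute, try to commit; on conflict discard the work and retry. */
    int decrement() {
        while (true) {
            Versioned seen = read();
            int next = seen.value - 1;
            if (tryUpdate(seen, next)) return next;
        }
    }

    public static void main(String[] args) {
        OptimisticCounter counter = new OptimisticCounter();
        Versioned stale = counter.read();   // snapshot taken before a concurrent change
        counter.decrement();                // someone else commits first
        System.out.println("stale update accepted: " + counter.tryUpdate(stale, 100));
        System.out.println("current value: " + counter.read().value);
    }
}
```

No lock is held between the read and the write. A conflicting writer is detected only at commit time, so the retry loop stays cheap as long as conflicts are rare.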

2. InnoDB MVCC

Each record carries two extra control fields, tx_id and rollback_point.

    Table row: c1 | c2 | tx_id | rollback_point

rollback_point points to the previous version of the record.

MySQL has four isolation levels: READ UNCOMMITTED (can read uncommitted data), READ COMMITTED (reads only committed data, decided against the list of currently active transactions and by tracing back through the pointer), REPEATABLE READ, and SERIALIZABLE (every plain SELECT is implicitly turned into a locking read, with the locks taken behind the scenes; InnoDB uses SELECT ... LOCK IN SHARE MODE for this).

A version written by a transaction older than every active transaction in the Read View is read directly. A version whose transaction falls inside the Read View's range may not be visible, in which case the reader steps back to an earlier version that is.
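The visibility check can be sketched as a walk along the version chain. This is a simplified model that borrows the field names tx_id and rollback_point from the text. The classes are invented for illustration, and real InnoDB handles more cases (for example a transaction's own changes).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class ReadViewSketch {
    /** One version of a row; rollbackPoint links to the previous version. */
    static final class RowVersion {
        final String value;
        final long txId;
        final RowVersion rollbackPoint;
        RowVersion(String value, long txId, RowVersion rollbackPoint) {
            this.value = value;
            this.txId = txId;
            this.rollbackPoint = rollbackPoint;
        }
    }

    /** Snapshot of which transactions were active when the view was taken. */
    static final class ReadView {
        final Set<Long> activeTxIds;
        final long minActiveTxId;
        final long nextTxId; // first transaction id not yet assigned at snapshot time

        ReadView(Set<Long> activeTxIds, long nextTxId) {
            this.activeTxIds = activeTxIds;
            this.nextTxId = nextTxId;
            this.minActiveTxId = activeTxIds.isEmpty() ? nextTxId : Collections.min(activeTxIds);
        }

        boolean sees(long txId) {
            if (txId < minActiveTxId) return true; // committed before the view was taken
            if (txId >= nextTxId) return false;    // started after the view was taken
            return !activeTxIds.contains(txId);    // in the gap: visible only if already committed
        }
    }

    /** Follows rollback pointers until a visible version is found. */
    static String read(RowVersion newest, ReadView view) {
        for (RowVersion v = newest; v != null; v = v.rollbackPoint) {
            if (view.sees(v.txId)) return v.value;
        }
        return null; // the row did not exist for this view
    }

    public static void main(String[] args) {
        RowVersion v1 = new RowVersion("a", 5, null);
        RowVersion v2 = new RowVersion("b", 8, v1);
        RowVersion v3 = new RowVersion("c", 12, v2);
        ReadView older = new ReadView(new HashSet<>(Arrays.asList(8L, 10L)), 11);
        ReadView newer = new ReadView(new HashSet<>(Arrays.asList(10L)), 13);
        System.out.println("older view reads: " + read(v3, older));
        System.out.println("newer view reads: " + read(v3, newer));
    }
}
```

In the example the older view skips the versions written by transactions 12 and 8 and falls back to the one written by transaction 5. The newer view accepts transaction 12's version, because 12 lies in the gap but is no longer active.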

 

When the value is not Serializable, you must manually call

 

For comparison, see also the MVCC analysis by @ Wei 1984: http://boneylw.sinaapp.com/?p=16

 
