Four-part analysis:
- Case Studies
- Process Analysis
- 0.94 / 0.96 Implementation Analysis
- Simulation Test and Analysis
I. Case Studies
Sorry, this Weibo has been deleted by the author. View help: http://t.cn/zWSudZc | forward | favorites | comment
Normally each Weibo has its own distinct short code, but once a post is deleted, every subsequent operation on it lands on the same token, zWSudZc.
Deleting a post triggers several operations:
- delete zWSudZc mid
- decr zWSudZc limit count
The problem is that all of these write operations block on the single rowKey zWSudZc.
If the Weibo feed uses HBase with mid as the rowKey, the same problem occurs for operations on popular posts. Before analyzing it, we should first understand how HBase ensures write consistency:
Conflict prevention: avoid distributed concurrent writes by routing all write operations on a given data item to a single node (either a global master or the partition's master). To avoid conflicts, the database must sacrifice availability under network partition. This approach is used in many systems that provide strong consistency guarantees (most relational databases, HBase, MongoDB).
With all updates going through a single node, two conjectures follow:
- Write operations block on some lock.
- Write operations are serialized through a single point, so queueing time grows.
II. Process Analysis
checkAndPut / append / increment in HRegion (HBase 0.94.x):
- startRegionOperation (lock.readLock().lock())
- rowLock lock
- updatesLock.readLock().lock()
- MVCC begin
- MVCC finish
- updatesLock unlock
- closeRegionOperation
get / scan:
- startRegionOperation
- MultiVersionConsistencyControl.setThreadReadPoint(this.readPt)
- closeRegionOperation
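The write path above can be sketched with plain ReentrantReadWriteLocks. This is a minimal illustration of the lock ordering only, not HBase source; the class and method names are mine, and the row lock and MVCC steps are stubbed out as comments:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Minimal sketch (illustrative names, not HBase code) of the lock ordering
 * that HRegion uses for checkAndPut/append/increment in 0.94.x.
 */
public class IncrementLockOrder {
    // region lock: read side taken by every operation,
    // write side taken only when the region closes
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    // updatesLock: read side taken by writers,
    // write side taken only during a memstore flush
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

    public long increment(long current, long delta) {
        regionLock.readLock().lock();           // startRegionOperation
        try {
            // rowLock lock would happen here
            updatesLock.readLock().lock();
            try {
                // MVCC begin ... apply the edit ... MVCC finish
                return current + delta;
            } finally {
                updatesLock.readLock().unlock(); // updatesLock unlock
            }
        } finally {
            regionLock.readLock().unlock();      // closeRegionOperation
        }
    }

    public static void main(String[] args) {
        System.out.println(new IncrementLockOrder().increment(41, 1)); // prints 42
    }
}
```

Because both locks are taken on the read side, writers do not block each other here; serialization comes from the row lock and MVCC, which is what the rest of the analysis measures.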
Differences among the three locks:
- The region lock and updatesLock are both ReentrantReadWriteLock: multiple readers may hold the read lock concurrently, and while the write lock is held all other operations block. The write side of updatesLock is taken only during a region (memstore) flush. The write side of the region lock is never taken on this path, so it appears useless here.
- The row lock is a ConcurrentHashMap<HashedBytes, CountDownLatch> (the field is named lockedRows): a row is locked by installing a latch under its rowKey.
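The latch-based row lock can be sketched as follows. This is simplified from the 0.94 pattern (String keys instead of HashedBytes, illustrative class name); putIfAbsent either wins the lock or returns the current holder's latch to wait on:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the 0.94-style row lock: a map from rowKey to a CountDownLatch.
 * Simplified and renamed for illustration; not HBase source.
 */
public class RowLockSketch {
    private final ConcurrentHashMap<String, CountDownLatch> lockedRows =
            new ConcurrentHashMap<String, CountDownLatch>();

    /** Blocks until the row lock for key is acquired; false on timeout. */
    public boolean lockRow(String key, long timeoutMs) {
        CountDownLatch myLatch = new CountDownLatch(1);
        while (true) {
            CountDownLatch existing = lockedRows.putIfAbsent(key, myLatch);
            if (existing == null) {
                return true;  // no holder: our latch is installed, we own the row
            }
            try {
                // wait for the current holder to count down its latch
                if (!existing.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public void unlockRow(String key) {
        CountDownLatch latch = lockedRows.remove(key);
        if (latch != null) {
            latch.countDown();  // wake up all waiters; they race on putIfAbsent
        }
    }

    public static void main(String[] args) {
        RowLockSketch locks = new RowLockSketch();
        System.out.println(locks.lockRow("zWSudZc", 1000)); // prints true
        locks.unlockRow("zWSudZc");
    }
}
```

This is exactly why hot-rowKey writes queue up: every writer to zWSudZc waits on the same latch, one at a time.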
MVCC (MultiVersionConsistencyControl):
- Manages read/write consistency of the memstore; MVCC is what makes a set of increments/appends atomic with respect to reads.
- In 0.94 (up to 0.94.2) this is still a TODO for increment/append/checkAnd (the MVCC usage appears later).
- 0.96 implements it.
- Put: the operation used most heavily in our project.
- In 0.94, put goes through HRegion.internalPut.
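The begin/complete pair can be illustrated with a toy MVCC. This is a deliberately simplified model (illustrative names, in-order completion assumed), not the real MultiVersionConsistencyControl: writers take a write number on begin, and readers only see entries at or below the read point, which advances when a write completes:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy model of memstore MVCC: in-flight writes are invisible to readers
 * until completeMemstoreInsert advances the read point. Illustrative only;
 * the real class also handles out-of-order completion and waiting.
 */
public class ToyMvcc {
    private final AtomicLong nextWriteNumber = new AtomicLong(1);
    private volatile long readPoint = 0;

    /** Writer enters: gets a write number above the current read point. */
    public long beginMemstoreInsert() {
        return nextWriteNumber.getAndIncrement();
    }

    /** Writer finishes: its edits become visible to new readers. */
    public synchronized void completeMemstoreInsert(long writeNumber) {
        // simplified: assume writes complete in order
        readPoint = Math.max(readPoint, writeNumber);
    }

    /** Readers snapshot this point and ignore any KV with a higher number. */
    public long getReadPoint() {
        return readPoint;
    }

    public static void main(String[] args) {
        ToyMvcc mvcc = new ToyMvcc();
        long w = mvcc.beginMemstoreInsert();
        // a read starting now uses read point 0 and skips the in-flight write
        mvcc.completeMemstoreInsert(w);
        System.out.println(mvcc.getReadPoint()); // prints 1
    }
}
```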
III. 0.94 / 0.96 Implementation Analysis
In 0.94:
- increment/append/checkAndPut all use both the row lock and MVCC, but internalPut (called by put) does not take the row lock; it uses MVCC only.
- Process:
- startRegionOperation (lock.readLock().lock())
- rowLock lock
- updatesLock.readLock().lock()
- MVCC begin
- MVCC finish
- updatesLock unlock
- closeRegionOperation
0.96:
Process:
(1) Acquire RowLock
(1a) BeginMVCC + Finish MVCC
(2) Begin MVCC
(3) Do work
(4) Release RowLock
(5) Append to WAL
(6) Finish MVCC
Step (1a) waits for all prior MVCC transactions to finish while we hold the row lock (so that we are guaranteed to see the latest state).
So after upgrading to 0.96, increment may become slower because of this extra MVCC wait.
Expected improvement beyond 0.96: a committer also thinks the double MVCC round is unnecessary and proposes improving the process in https://issues.apache.org/jira/browse/HBASE-7263:
(1) Acquire RowLock
(1a) Grab + Release RowWriteLock (instead of BeginMVCC + Finish MVCC)
(1b) Grab RowReadLock (new step!)
(2) Begin MVCC
(3) Do work
(4) Release RowLock
(5) Append to WAL
(6) Finish MVCC
(7) Release RowReadLock (new step!)
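The row read/write-lock idea from HBASE-7263 can be sketched with a per-row ReentrantReadWriteLock and lock downgrading. This is my reading of the proposal, not actual HBase code; all names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch of the HBASE-7263 proposal: instead of an extra begin+finish MVCC
 * round, a writer grabs and releases the row *write* lock to wait out
 * earlier writers (step 1a), then holds the row *read* lock (step 1b)
 * until its edit is visible (step 7). Writers on other rows are unaffected.
 */
public class RowWriteLockSketch {
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> rowLocks =
            new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantReadWriteLock());
    }

    /** Steps (1a)+(1b): wait out prior writers on this row, keep a read lock. */
    public void beginWrite(String row) {
        ReentrantReadWriteLock l = lockFor(row);
        l.writeLock().lock();   // (1a) blocks until earlier read locks are released
        l.readLock().lock();    // (1b) downgrade: take the read lock ...
        l.writeLock().unlock(); // ... then drop the write lock
    }

    /** Step (7): edit is visible; later writers on this row may proceed. */
    public void finishWrite(String row) {
        lockFor(row).readLock().unlock();
    }

    public static void main(String[] args) {
        RowWriteLockSketch s = new RowWriteLockSketch();
        s.beginWrite("zWSudZc");
        // (2)-(6): begin MVCC, do work, release row lock, append WAL, finish MVCC
        s.finishWrite("zWSudZc");
        System.out.println("ok");
    }
}
```

The point of the downgrade is that a later writer's step (1a) blocks only on earlier writers of the same row, instead of on every in-flight MVCC transaction in the region.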
In addition, the now-useless client-side lockId allocation scheme was removed.
IV. Simulation Test and Analysis
The HBaseInsertTest1 class simulates the insert path. TestKeyValueSkipListSet is HBase's KeyValueSkipListSet extracted as a public class, used to store the data.
package com.daodao.hbase;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Created with IntelliJ IDEA.
 * @author guanpu
 * Date: 13-1-9
 * Analyzes 0.94 insert operation performance.
 */
public class HBaseInsertTest1 {
    volatile TestKeyValueSkipListSet kvset;
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    private final MultiVersionConsistencyControl mvcc = new MultiVersionConsistencyControl();
    private static AtomicInteger finishedCount;
    private static AtomicLong mvccTime = new AtomicLong(0L);
    private static AtomicLong rowlockTime = new AtomicLong(0L);
    private static AtomicLong lockTime = new AtomicLong(0L);
    private static AtomicLong updateLockTime = new AtomicLong(0L);
    private static AtomicLong insertTime = new AtomicLong(0L);
    private static AtomicLong releaseTime = new AtomicLong(0L);
    private final ConcurrentHashMap<String, CountDownLatch> lockedRows =
            new ConcurrentHashMap<String, CountDownLatch>();

    public HBaseInsertTest1() {
        kvset = new TestKeyValueSkipListSet(new KeyValue.KVComparator());
        finishedCount = new AtomicInteger(0);
    }

    class HBaseInsertTask implements Runnable {
        public void run() {
            for (int i = 0; i < 100000; i++) {
                String key = "key" + i;
                long time = System.nanoTime();
                MultiVersionConsistencyControl.WriteEntry localizedWriteEntry = null;
                try {
                    lock.readLock().lock(); // like startRegionOperation does
                    lockTime.addAndGet(System.nanoTime() - time);
                    time = System.nanoTime();
                    Integer lid = getLock(key); // get rowKey lock
                    // note: the original test recorded this span into lockTime by
                    // mistake, which is why "rowlock" prints as 0.0 below
                    rowlockTime.addAndGet(System.nanoTime() - time);
                    time = System.nanoTime();
                    updatesLock.readLock().lock();
                    updateLockTime.addAndGet(System.nanoTime() - time);
                    time = System.nanoTime();
                    localizedWriteEntry = mvcc.beginMemstoreInsert();
                    mvccTime.addAndGet(System.nanoTime() - time);
                    time = System.nanoTime();
                    kvset.add(new KeyValue(Bytes.toBytes(key), Bytes.toBytes("f"),
                            Bytes.toBytes("column"), 1L, Bytes.toBytes(1L)));
                    insertTime.addAndGet(System.nanoTime() - time);
                    time = System.nanoTime();
                    mvcc.completeMemstoreInsert(localizedWriteEntry);
                    mvccTime.addAndGet(System.nanoTime() - time);
                } catch (Exception e) {
                    System.out.println(e);
                } finally {
                    time = System.nanoTime();
                    updatesLock.readLock().unlock();
                    CountDownLatch rowLatch = lockedRows.remove(key);
                    rowLatch.countDown();
                    lock.readLock().unlock();
                    releaseTime.addAndGet(System.nanoTime() - time);
                }
            }
            finishedCount.incrementAndGet();
        }

        private Integer getLock(String key) {
            CountDownLatch rowLatch = new CountDownLatch(1);
            // loop until we acquire the row lock
            while (true) {
                CountDownLatch existingLatch = lockedRows.putIfAbsent(key, rowLatch);
                if (existingLatch == null) {
                    break;
                } else {
                    try {
                        if (!existingLatch.await(30000, TimeUnit.MILLISECONDS)) {
                            System.out.println("something wrong in waiting");
                            return null;
                        }
                    } catch (InterruptedException ie) {
                        // empty
                    }
                }
            }
            return 1;
        }
    }

    private class DaodaoTestWatcher implements Runnable {
        @Override
        public void run() {
            long time = System.nanoTime();
            while (finishedCount.get() != 50) {
                // spin until all 50 insert tasks finish
            }
            System.out.println("cost time:" + (System.nanoTime() - time) / 1000000000.0);
            System.out.println("cost time: mvcc" + mvccTime.get() / 1000000000.0 / 50);
            System.out.println("cost time: lock" + lockTime.get() / 1000000000.0 / 50);
            System.out.println("cost time: update" + updateLockTime.get() / 1000000000.0 / 50);
            System.out.println("cost time: rowlock" + rowlockTime.get() / 1000000000.0 / 50);
            System.out.println("cost time: release" + releaseTime.get() / 1000000000.0 / 50);
        }
    }

    public void test() {
        ExecutorService executorService = Executors.newFixedThreadPool(200);
        for (int i = 0; i < 50; i++)
            executorService.execute(new HBaseInsertTask());
        executorService.execute(new DaodaoTestWatcher());
    }

    public static void main(String[] args) {
        new HBaseInsertTest1().test();
    }
}
Time consumed:
cost time: 24.727145
cost time: mvcc 22.98698292
cost time: lock 0.0
cost time: update 0.009690879999999999
cost time: rowlock 0.0
cost time: release 0.05001874
With MVCC removed:
cost time: 5.190751
cost time: mvcc 0.0073236
cost time: lock 0.0
cost time: update 0.017533220000000002
cost time: rowlock 0.0
cost time: release 1.3753079
Simulating the 0.96 code path by adding, right after updatesLock.readLock().lock():

time = System.nanoTime();
// wait for all prior MVCC transactions to finish - while we hold the row lock
// (so that we are guaranteed to see the latest state)
mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
mvccTime.set(mvccTime.get() + (System.nanoTime() - time));
Time consumed:
cost time: 43.04134
cost time: mvcc 40.70520202
cost time: lock 0.0
cost time: update 0.00937416
cost time: rowlock 0.0
cost time: release 0.05023072
Recall that in 0.94 increment/append/checkAndPut use both the row lock and MVCC, while internalPut (called by put) uses only MVCC without the row lock.
Optimization: for single-version use cases, these operations could take the row lock and drop MVCC, which would improve write performance further.
If the rowkey is changed to a single fixed rowkey, version 0.94 takes (all numbers are total time):
cost time: 27.660935
cost time: mvcc 3.888678
cost time: lock 0.0
cost time: insert 9.319777
cost time: update 0.964697
cost time: rowlock 0.0
cost time: release 16.997803
However, in a real HBase insert test, varying keys are about four times faster than a fixed key, while in this standalone simulation the two are basically the same speed. The extra cost is presumably in region lookup or network transfer and needs further verification.
Summary:
- Region update time is concentrated mainly in MVCC.
- For a single-version database, MVCC could be removed from the various update operations; taking the rowKey write lock up front in each modify operation would avoid region-wide MVCC.
- The end-to-end bottleneck for a single rowKey, from client to HBase, still needs to be explored in a real distributed environment.
---------------------------------------- Extended ----------------------------------
MySQL MVCC (@A Fei / Jun Wei)
MySQL 5.6 optimizes read-only transactions. @yangwm @slow half-beat de knife @Kai Pan cobain @jolestar @Wei 1984
MVCC principle
1. Comparison with row-level locks
A row-level lock is a pessimistic lock. Its compatibility matrix (y = compatible, n = conflicting):
      R   W
  R   y   n
  W   n   n
MVCC instead keeps versions: a writer can be updating version 10 while readers still read version 9.
Extended knowledge: optimistic locking. Select, then update with a version check, then verify whether the row changed; if it did, roll back and retry. This suits workloads with few conflicts.
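The select-then-conditional-update pattern can be sketched in Java with a version-carrying cell. The names are mine and compareAndSet stands in for the SQL "update ... where version = ?" check; this illustrates the idea, not any particular database API:

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Sketch of optimistic locking: read a (value, version) pair, compute the
 * update outside any lock, commit only if the version is unchanged; on
 * conflict the caller retries. Illustrative names throughout.
 */
public class OptimisticLockSketch {
    private static final class Versioned {
        final long value;
        final long version;
        Versioned(long value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Versioned> cell =
            new AtomicReference<>(new Versioned(0, 0));

    /** One optimistic attempt; false means another writer won the race. */
    public boolean tryAdd(long delta) {
        Versioned read = cell.get();                 // "select" value + version
        Versioned updated = new Versioned(read.value + delta, read.version + 1);
        return cell.compareAndSet(read, updated);    // "update ... where version = ?"
    }

    /** Retry loop: cheap when conflicts are rare, wasteful when they are not. */
    public long addWithRetry(long delta) {
        while (!tryAdd(delta)) {
            // conflict detected: "roll back" (discard our computation) and retry
        }
        return cell.get().value;
    }

    public static void main(String[] args) {
        OptimisticLockSketch c = new OptimisticLockSketch();
        System.out.println(c.addWithRetry(42)); // prints 42
    }
}
```

Redis's WATCH/MULTI check-and-set follows the same shape, which is what the next question is getting at.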
Has the Redis server implemented optimistic locks? Given its single-threaded serial execution model, does it need locks at all?
2. InnoDB MVCC
Every record carries two hidden fields, tx_id and rollback_point, used for version control.
Table: row | c1 | c2 | tx_id | rollback_point
rollback_point points to the previous version of the record.
MySQL has four isolation levels: Read Uncommitted (uncommitted data can be read), Read Committed (only committed data is read, judged against the current active-transaction list and traced back through the rollback pointer), Repeatable Read, and Serializable (every statement effectively becomes select ... for update, locking behind the scenes).
With a read view, versions written by transactions older than the oldest active transaction are read directly; for versions that fall in the gap, the version chain is walked back to the correct intermediate version.
At levels below Serializable, locking reads must be requested manually (e.g. select ... for update).
@Wei 1984's MVCC analysis at http://boneylw.sinaapp.com/?p=16 is also worth reading for comparison.