Summary and architecture diagram of common HBase misunderstandings (personal study notes)

Source: Internet
Author: User

The replication/backup behavior of HDFS does not cover HDFS-based projects such as HBase. If HBase needs to be backed up, you have to use HBase's own backup (snapshot) functionality.
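For reference, a minimal sketch of taking a table snapshot with the HBase client Admin API; the table and snapshot names here are made up:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("my_table"); // hypothetical table name
            // Take a point-in-time snapshot of the table (metadata plus references to HFiles).
            admin.snapshot("my_table_snapshot_20240101", table);
            // A snapshot can later be cloned into a new table or restored, e.g.:
            // admin.cloneSnapshot("my_table_snapshot_20240101", TableName.valueOf("my_table_restored"));
        }
    }
}
```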
HMaster (and likewise Kafka and other master/slave architectures) does not run the election itself; a new master is elected through ZooKeeper's election mechanism.
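As an illustration only (this is not HBase's actual master-election code), a ZooKeeper-based election can be as simple as every candidate racing to create the same ephemeral znode; the znode path and server name below are made up, and the parent path is assumed to exist:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MasterElectionSketch {
    // Each candidate tries to create the same ephemeral znode; the one that
    // succeeds becomes the active master. If it dies, its session expires,
    // the znode disappears, and the standbys (watching the znode) retry.
    static boolean tryBecomeMaster(ZooKeeper zk, String serverName) throws Exception {
        try {
            zk.create("/election-sketch/master", serverName.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;   // we are the active master
        } catch (KeeperException.NodeExistsException e) {
            return false;  // someone else is already master; stay in standby
        }
    }
}
```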
When HBase creates a table it starts with a single region, which greatly limits insert performance (all writes hit one region server until the region splits); pre-splitting the table into several regions helps.
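A sketch of pre-splitting a table at creation time with the HBase 2.x Admin API, assuming a hypothetical table whose row keys start with two-digit numbers:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableDescriptorBuilder table =
                TableDescriptorBuilder.newBuilder(TableName.valueOf("my_table")) // hypothetical name
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"));
            // Pre-split into 4 regions so inserts are spread across region
            // servers from the start instead of all hitting a single region.
            byte[][] splitKeys = {
                Bytes.toBytes("25"), Bytes.toBytes("50"), Bytes.toBytes("75")
            };
            admin.createTable(table.build(), splitKeys);
        }
    }
}
```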
HFile writes are divided into blocks of roughly 64 KB each. This layout favors random access but not sequential access; if your workload mostly needs large sequential reads, you can set the block size larger.
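For example, the block size can be raised per column family when the table is defined; the 256 KB value below is just an illustrative choice:

```java
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class BlockSizeExample {
    public static ColumnFamilyDescriptor scanHeavyFamily() {
        // The default HFile block size of 64 KB favors random point reads.
        // For a column family that is mostly scanned sequentially, a larger
        // block (e.g. 256 KB) reduces block-index overhead per byte read.
        return ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
                .setBlocksize(256 * 1024)
                .build();
    }
}
```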
When HBase writes data, it writes to the MemStore first and then writes the HLog; the write only counts as successful once the log has been written.

Every time the MemStore writes data it goes through lock() ... finally { unlock() }. Does that mean the data cannot be accessed during a flush? No: the flush merely moves the current kvset into a snapshot, which takes a very short time, and the kvset can still be read while this happens. The actual flush work then writes the snapshot of the kvset out to disk; a simplified sketch follows.
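A much-simplified sketch of that idea (not the real MemStore class): writes share a read lock, the flush only takes the write lock for the brief kvset-to-snapshot swap, and the slow disk write happens afterwards against the snapshot:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MemStoreSketch {
    private volatile ConcurrentSkipListMap<String, String> kvset = new ConcurrentSkipListMap<>();
    private volatile ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void put(String key, String value) {
        lock.readLock().lock();           // writes share the read lock; only the
        try {                             // snapshot swap takes the write lock
            kvset.put(key, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The "locked" part of a flush is only this very short swap.
    void snapshotForFlush() {
        lock.writeLock().lock();
        try {
            snapshot = kvset;                       // move current data into the snapshot
            kvset = new ConcurrentSkipListMap<>();  // fresh set for new writes
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The slow part (writing the file) runs outside the lock, against the snapshot;
    // reads can still consult both kvset and snapshot in the meantime.
    void flushSnapshotToFile() {
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            // write e to the HFile ... (omitted in this sketch)
        }
    }
}
```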




How are Put and Delete handled on the server side?

The process is a bit messy; here is a summary of the steps:

1. Do the preparation work and instantiate the variables.

2. Check that the column families in the Put and Delete match the column families defined for the region.

3. Take the row locks: first compute a hash value to use as the key; if that key does not yet have a lock, create one. Then count how many actions will be written and record the count in numReadyToWrite.

4. Update timestamps: the timestamps of all the KVs in the actions are updated to the latest timestamp, including the ones that had not been processed before.

5. Lock the region. From this point on reads are not allowed; the wait time is calculated from numReadyToWrite.

6. After locking, the real work starts: the Puts, Deletes, and so on. A batch number is created for the data that will be written to the MemStore.

7. Write the KVs into the MemStore, then compute the new MemStore size after the data has been added (addedSize).

8. Add the KVs to the log and mark the status as successful. If the user configured the write to skip the log, nothing is written to it (see the example after this list).

9. Append the log asynchronously first.

10. Release the locks created earlier.

11. Sync the log.

12. End the batch operation.

Finally, if syncing the log fails, the operations in the MemStore are rolled back according to the batch number.
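Regarding step 8, the "user configured the write to skip the log" case corresponds to the client-side Durability setting; a minimal sketch (the table handle is assumed to be already opened, and the row/family/qualifier names are made up):

```java
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipWalExample {
    // The client can ask HBase not to write the WAL for a mutation. This makes
    // the write faster but unrecoverable if the region server crashes before
    // the MemStore is flushed.
    static void putWithoutWal(Table table) throws Exception {
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        put.setDurability(Durability.SKIP_WAL);
        table.put(put);
    }
}
```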


Put and Delete go through essentially the same write path, so how does Delete actually delete data?

Back in step 4, where timestamps are updated, there is something special for deletes: the prepareDeleteTimestamps method is executed. It first checks whether the latest version is meant; if only the rowkey was passed in (with no timestamp), the latest version is intended. This follows from the Delete constructor's semantics: all versions of all columns before this point in time are to be deleted. A Get operation is therefore done here to read out the multiple versions of the column family, and if the number of cells returned does not match expectations there will be problems. (A client-side example of the different Delete variants follows.)
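For context, a sketch of the client-side Delete variants; deleting only the latest version of a column without giving a timestamp is what forces the server into the extra Get described above (the row, family, and qualifier names are made up):

```java
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteExample {
    static void deleteVariants(Table table) throws Exception {
        byte[] row = Bytes.toBytes("row1");
        byte[] cf  = Bytes.toBytes("cf");
        byte[] q   = Bytes.toBytes("q");

        // Delete only the latest version of one column: no timestamp is given,
        // so the server has to look up the latest cell's timestamp first
        // (the prepareDeleteTimestamps / Get step described above).
        Delete latestOnly = new Delete(row);
        latestOnly.addColumn(cf, q);
        table.delete(latestOnly);

        // Delete all versions of a column up to "now": the timestamp range is
        // known up front, so no extra lookup is needed.
        Delete allVersions = new Delete(row);
        allVersions.addColumns(cf, q);
        table.delete(allVersions);

        // Delete the whole row (all columns, all versions before this point in time).
        table.delete(new Delete(row));
    }
}
```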

HBase files only become smaller after a compaction. So before the compaction happens, can a Get or Scan still read data that has already been deleted? No: the delete markers written by the Delete hide those cells from Get and Scan, and the data is only physically removed during a major compaction.
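If you want the space back (and the deleted cells physically gone) sooner than the background schedule, a major compaction can be requested through the Admin API; a minimal sketch with a made-up table name:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactionExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Request a major compaction of the table: delete markers and the
            // cells they hide are physically removed and the HFiles shrink.
            // The call only submits the request; the compaction runs in the background.
            admin.majorCompact(TableName.valueOf("my_table")); // hypothetical table name
        }
    }
}
```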








