An Analysis of HBase Principles


This article is a repost; the original is at http://www.aboutyun.com/thread-7199-1-1.html. It assumes the reader already understands HBase's basic concepts and components.

Let's start from the most familiar point, a client-initiated request, and work toward the underlying principles step by step. Suppose the client issues a put. It first needs to find the region server that should handle the request. The region-to-region-server mapping is recorded in the HBase system table .META., so once we can locate .META. we know the key range each region serves and the machine it lives on. But which machines hold .META. itself? That is in turn recorded in the -ROOT- table, and the location of -ROOT- is written into ZooKeeper by the master after -ROOT- is assigned. This is why a client is configured with the ZooKeeper address rather than the master's address.

Why is the catalog divided into -ROOT- and .META.? Because a cluster may contain many thousands of regions, .META. itself may not be able to hold all user-region information within a single region, so it can split. The number of regions in .META. is in turn quite limited, so -ROOT- never needs to split.

In summary: on its first request the client reads -ROOT-, uses the requested key range to find the corresponding .META. region, locates the specific region server in .META., and then sends the request there. Both the -ROOT- and .META. lookups can be cached.
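The lookup chain above can be simulated in a few lines. This is a conceptual sketch in Python, not the real client code; the table contents, server names, and keys here are all made up for illustration:

```python
# Conceptual sketch of the client's region lookup chain:
# ZooKeeper -> -ROOT- -> .META. -> region server.
import bisect

zookeeper = {"-ROOT-": "rs1"}                          # ZK records where -ROOT- lives
root_table = {"rs1": [("", ".META.,region1", "rs2")]}  # -ROOT- lists .META. regions
# .META. maps each user-region start key to the server hosting that region.
meta_table = {"rs2": [("", "rs3"), ("m", "rs4")]}      # regions [-inf,"m") and ["m",+inf)

cache = {}  # clients cache lookups to avoid repeating the whole chain

def locate_region_server(row_key):
    if row_key in cache:
        return cache[row_key]
    root_server = zookeeper["-ROOT-"]               # 1. ask ZooKeeper for -ROOT-
    meta_server = root_table[root_server][0][2]     # 2. -ROOT- points at a .META. server
    regions = meta_table[meta_server]               # 3. .META. maps key ranges to servers
    starts = [start for start, _ in regions]
    idx = bisect.bisect_right(starts, row_key) - 1  # last region whose start <= key
    server = regions[idx][1]
    cache[row_key] = server
    return server

print(locate_region_server("apple"))   # falls in [-inf, "m") -> rs3
print(locate_region_server("melon"))   # falls in ["m", +inf) -> rs4
```

The second request for a key in a cached range skips ZooKeeper, -ROOT-, and .META. entirely, which is exactly why the caching mentioned above matters.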

Now that we know which region server the client should send the put to, the request goes out. When the region server receives it, it must persist the put data, which brings us to HBase's storage model. HBase stores data by column family, and its basic data structure is the LSM tree (log-structured merge tree). The idea, briefly: every operation is recorded on a node of the tree, and nodes are merged in a timely fashion, so that even a delete of a key eventually collapses onto a single node; a read visits the nodes carrying operations on that key and returns the key's final value. The LSM tree thus turns random reads and writes into sequential ones, which suits sequential scans well and random reads less so.
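The LSM idea described above can be sketched in miniature: writes and deletes are just appended operations, and a read merges the newest operation per key across memory and flushed runs. This is a toy model, not HBase's implementation:

```python
# Conceptual LSM sketch: writes go to an in-memory table; reads take the
# newest operation per key across memory and flushed sorted runs.
memtable = {}          # key -> (seq, op, value); the newest op wins
flushed_runs = []      # list of immutable sorted runs, newest first
seq = 0

def write(key, value):            # a put is just an appended operation
    global seq
    seq += 1
    memtable[key] = (seq, "put", value)

def delete(key):                  # a delete is also an operation (a tombstone)
    global seq
    seq += 1
    memtable[key] = (seq, "del", None)

def read(key):
    # check the memtable first, then runs from newest to oldest
    for source in [memtable] + flushed_runs:
        if key in source:
            _, op, value = source[key]
            return value if op == "put" else None  # tombstone hides older puts
    return None

def flush():                      # turn random writes into one sequential run
    global memtable
    flushed_runs.insert(0, dict(sorted(memtable.items())))
    memtable = {}

write("a", 1); write("b", 2); flush()
delete("a"); write("b", 3)
print(read("a"), read("b"))   # None 3: tombstone masks old "a", newest "b" wins
```

Note that the delete of "a" returns None even though an older put of "a" still exists in a flushed run; the old value only disappears physically when the runs are merged, which is exactly the compaction story told later.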

So how does a put relate to the LSM tree? When the region server receives the request, it first appends the operation (the KeyValue, timestamp, and action type) to the HLog, then applies it to the memstore, and only then acknowledges the write as successful. The memstore lives in memory and is flushed to an HDFS file after it fills up. The HLog exists to prevent data loss: if the region server fails, whatever was only in the memstore would otherwise be gone. A client may disable the HLog to speed up writes, but that trades away data safety. Each time a memstore is flushed to HDFS, the sequence number of the oldest operation covered by the flush is recorded, and another thread uses that record to delete HLog files that are no longer needed.
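The write path just described — WAL first, then memstore, then the acknowledgment — can be sketched as follows. This is a conceptual sketch, assuming an in-memory list stands in for the durable HLog:

```python
# Sketch of the put path: append to the WAL first, then to the memstore,
# then acknowledge. Disabling the WAL trades durability for speed.
wal = []        # sequential log on durable storage (stands in for the HLog)
memstore = {}   # in-memory buffer, flushed to HDFS later

def put(row, value, write_to_wal=True):
    if write_to_wal:
        wal.append(("put", row, value))   # durable record survives an RS crash
    memstore[row] = value                 # fast in-memory write
    return "ok"                           # ack only after both steps

def recover():
    # after a crash, replay the WAL to rebuild the lost memstore contents
    rebuilt = {}
    for op, row, value in wal:
        if op == "put":
            rebuilt[row] = value
    return rebuilt

put("r1", "v1")
put("r2", "v2", write_to_wal=False)  # faster, but lost on a crash
memstore.clear()                     # simulate a region server crash
print(recover())                     # only r1 comes back; r2 is gone for good
```

The point of the simulation: the write that skipped the WAL simply does not exist after the crash, which is the exact trade-off a client makes by disabling the HLog.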

Next, what happens when the memstore is full. (Each column family of each region has its own memstore object, but they actually share one memory pool.) When a memstore fills up, the buffered operations are distributed to the column-family stores of the corresponding region, and each store persists its sequence of operations as a storage file (StoreFile).

In general, the important entities inside a region server relate as follows: RegionServer : Region = 1:N; Region : Store = 1:N; Store : StoreFile = 1:N. The data files of a column family are, in effect, the leaf nodes of the LSM tree: each file holds the most recent operations on the keys it covers.

When a column family accumulates too many files, a compaction is triggered, i.e. the files are merged. HBase has two kinds of compaction, minor and major. A minor compaction merges files on a small scale, touching only part of them; the goal is to consolidate small files into larger ones. Because it does not see the full data set, a delete of a key must keep its tombstone and cannot be applied physically. A major compaction merges all the files of a column family into one, so that key updates and deletes can finally take effect physically. Because a major compaction operates on the column family's full data, it is time-consuming to execute, so HBase places time-based limits on how often it runs.
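The tombstone rule above — minor compactions must keep deletes, major compactions may drop them — is worth seeing concretely. A conceptual sketch, with each "file" as a dict of key to (sequence, operation, value):

```python
# Sketch of minor vs. major compaction over storefiles.
# Each "file" maps key -> (seq, op, value); the newest seq wins on merge.

def merge(files):
    out = {}
    for f in files:
        for key, (s, op, val) in f.items():
            if key not in out or s > out[key][0]:
                out[key] = (s, op, val)
    return dict(sorted(out.items()))

def minor_compact(files, n=2):
    # merge only the n oldest/smallest files; tombstones MUST be kept, since
    # older versions of a key may live in files not covered by this merge
    merged = merge(files[:n])
    return [merged] + files[n:]

def major_compact(files):
    # merge everything; a surviving tombstone now provably masks all older
    # versions, so deleted keys can be physically dropped
    merged = merge(files)
    return [{k: v for k, v in merged.items() if v[1] != "del"}]

files = [
    {"a": (5, "del", None)},                         # newest: a delete of "a"
    {"a": (1, "put", "x"), "b": (2, "put", "y")},
]
print(minor_compact(files))   # the tombstone for "a" is retained
print(major_compact(files))   # "a" is gone for good; only "b" remains
```

If the minor compaction dropped the tombstone instead, the old put of "a" in an unmerged file would "resurrect" the deleted key — which is precisely why only a full-data merge can delete physically.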

When the total length of files in a store's StoreFile collection grows too large (past a configured threshold), the region is split in half. Because a split happens at the region level, some column families get split by association, simply because a sibling column family grew too big. Viewed roughly, then: puts trigger flushes, flushes trigger compactions, and compactions trigger splits. All of this runs in background threads and does not significantly block client requests.
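The "split by association" point can be sketched as well. This is a simplified model under assumed names (`SPLIT_THRESHOLD`, the `region` dict shape, and the halved sizes are all illustrative, not HBase's real config or on-disk layout):

```python
# Sketch of a region split: when any store's total file size crosses a
# threshold, the whole region is cut in half at a middle key -- every
# column family is split, even ones that are still small.
SPLIT_THRESHOLD = 100  # assumed byte limit, stands in for the real config value

def maybe_split(region):
    stores = region["stores"]               # column family -> total storefile bytes
    if max(stores.values()) <= SPLIT_THRESHOLD:
        return [region]                     # nothing to do
    mid = region["midkey"]                  # a middle key of the largest store
    halves = []
    for start, end in [(region["start"], mid), (mid, region["end"])]:
        halves.append({
            "start": start, "end": end, "midkey": mid,
            # all column families are halved together, even the small ones
            "stores": {cf: size // 2 for cf, size in stores.items()},
        })
    return halves

region = {"start": "a", "end": "z", "midkey": "m",
          "stores": {"cf_big": 150, "cf_small": 10}}
parts = maybe_split(region)
print(len(parts))   # 2: cf_small is split only because cf_big grew too large
```

Here `cf_small`, at 10 bytes, is nowhere near the threshold, yet it is split anyway — the decision is per region, not per column family.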

The size of a StoreFile is related to the size of the memstore, since each flush produces one StoreFile per column family. So the larger the memstore, the better the chance of producing large StoreFiles. When puts are unevenly distributed, some column families end up with many small StoreFiles, but more files simply trigger a compaction, and compacting small files is fast, so this is not a worry. A StoreFile is laid out roughly as follows:
------------------------------------------------
| data block                                   |
|----------------------------------------------|
| data block                                   |
| ...                                          |
|----------------------------------------------|
| meta block                                   |
|----------------------------------------------|
| block index and key-range information        |
|----------------------------------------------|
| Bloom filter                                 |
------------------------------------------------

One can roughly picture a StoreFile with this structure; the exact ordering and details of the trailer are not entirely clear to me. A block consists of multiple KeyValues, and keys are ordered within the file. (The original article included a figure showing the layout of a single KeyValue record; it is not reproduced here.) When data is read, a get request is converted into a scan inside the region server, which scans the StoreFiles of the relevant column families. The tail of a StoreFile contains the block index, the Bloom filter, the last update time, and so on, which speeds up filtering out files that do not need to be scanned. Reading one StoreFile therefore goes like this: check whether the get's row key falls within the range of keys the file covers; check via the Bloom filter whether the row key could be present (if it is a row-col filter, it can also tell whether a requested column is included); check whether the get's time range overlaps the time range of the data the file holds; look up the corresponding block in the block index; load that block into the block cache; and then scan the block. The results from multiple StoreFiles are merged, taking the newest entries until the number of versions requested by the get is satisfied. A get can thus be seen as a special case of scan.
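The per-file read path above can be sketched end to end. This is a toy model: the two-hash, 64-bit "Bloom filter" and the two-key blocks are deliberately tiny stand-ins for the real structures:

```python
# Sketch of the per-storefile read path: key-range check, Bloom filter
# check, then a block-index lookup before any block is actually scanned.
import bisect
import hashlib

class StoreFileSketch:
    def __init__(self, rows):
        self.rows = dict(rows)                        # row -> value
        keys = sorted(self.rows)
        self.first, self.last = keys[0], keys[-1]     # key range kept in the tail
        block_size = 2
        self.blocks = [keys[i:i + block_size]         # data blocks of sorted keys
                       for i in range(0, len(keys), block_size)]
        self.index = [b[0] for b in self.blocks]      # block index: first key per block
        self.bloom = set()                            # toy Bloom filter: set bit positions
        for k in keys:
            self.bloom.update(self._hashes(k))

    def _hashes(self, key):
        h = hashlib.md5(key.encode()).digest()
        return [h[0] % 64, h[1] % 64]                 # 2 hash functions over 64 bits

    def get(self, row):
        if not (self.first <= row <= self.last):      # 1. outside the file's key range
            return None
        if not all(b in self.bloom for b in self._hashes(row)):
            return None                               # 2. Bloom says "definitely absent"
        i = bisect.bisect_right(self.index, row) - 1  # 3. block index -> one block
        return self.rows.get(row) if row in self.blocks[i] else None

sf = StoreFileSketch({"a": 1, "c": 2, "e": 3})
print(sf.get("c"), sf.get("z"))   # 2 None
```

The order of the checks is the point: the range check and Bloom filter are cheap tail metadata, so most files that cannot contain the key are rejected before a single data block is loaded.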

The block cache is necessarily limited in size, so eviction happens. The block cache actually suits scans better, because a scan typically sweeps a range and the row keys within a block are ordered, so sequential reads are faster than random ones. HBase in general has a hard time with highly concurrent random reads, and the block cache design itself is one reason: it is a poor fit for caching random row keys. Random reads hash keys uniformly, so reads land across all blocks; each block is loaded into memory for one read and then quickly evicted as other blocks keep loading in and swapping it out, which largely wastes the effort.
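The thrashing effect can be demonstrated with a small experiment: an LRU cache of fixed size (a simplified stand-in for the real block cache and its eviction policy) gets many hits under a sequential scan and none at all under a spread-out access pattern:

```python
# Sketch of why the block cache favors scans: a fixed-size LRU cache gets
# hits on sequential reads but thrashes under uniformly spread reads.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def read_block(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)    # refresh LRU position
        else:
            self.misses += 1
            self.cache[block_id] = True
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used

seq = BlockCache(capacity=4)
for key in range(20):          # sequential scan: 5 adjacent keys share a block,
    seq.read_block(key // 5)   # so repeated reads of blocks 0..3 hit the cache

rnd = BlockCache(capacity=4)
for block in [0, 7, 3, 9, 1, 8, 2, 6, 4, 5] * 2:
    rnd.read_block(block)      # reads spread over 10 blocks: each block is
                               # evicted again before it is ever re-read
print(seq.hits, rnd.hits)
```

Under the sequential pattern every block is loaded once and then reused; under the spread pattern every single access is a miss, so each block load is pure wasted work — the behavior the paragraph above describes.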

The last two important operations are region open and close. These two are commonly used for failover and load balancing.

Close first. In a normal close, the memstore is flushed and then the master is notified that the close is complete. In an abnormal close, there is no time to flush. The master learns that a region server has gone down through ZooKeeper, via the heartbeat maintained between ZooKeeper and the region server.

Open is generally initiated by the master. The master first finds the HLog files containing the region's operations, then locates the region's directory, and then commands a region server to open it. During open, the region server replays the actions recorded in the HLog before loading the region's stores and StoreFiles.

Those are the more important principles. Once the principles are clear, you can approach an analysis of the code with a macro-level understanding already in hand.

