HBase: About Region

Source: Internet
Author: User
Tags: md5, encryption, zookeeper

Region

A region is the basic unit of data management in HBase. Moving data, balancing data, and splitting data are all carried out region by region.

Regions store the user's actual data, and HBase uses RegionServers to manage the regions.


Addressing Process

The general process of data addressing is as follows:

[Figure: ZooKeeper (hbase:meta location) ---> hbase:meta table (one row per table region) ---> table regions]

1. The location of hbase:meta is kept on a ZooKeeper znode.

2. The client obtains the address of the RegionServer hosting hbase:meta from ZooKeeper.

3. The client looks up the region information in hbase:meta; once the region information is obtained, the data can be located.

4. The query results are returned to the client.
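The four steps above can be sketched with a toy in-memory model. This is only an illustration of the lookup order, not the HBase client API: the dictionaries `ZOOKEEPER` and `META_TABLE`, the function `locate_region`, the znode path, and the host names are all assumptions made up for the example.

```python
import bisect

# Hypothetical stand-in for ZooKeeper: one znode holding the address of the
# RegionServer that hosts hbase:meta (path and address are illustrative).
ZOOKEEPER = {"/hbase/meta-region-server": "rs1.example.com:16020"}

# Hypothetical stand-in for hbase:meta: one row per table region,
# stored as (table name, region start key, hosting RegionServer).
META_TABLE = {
    "rs1.example.com:16020": [
        ("test1", "", "rs2.example.com:16020"),
        ("test1", "r5000000", "rs3.example.com:16020"),
    ]
}


def locate_region(table: str, row: str) -> str:
    """Return the RegionServer that would serve `row` of `table`."""
    # Steps 1-2: ask ZooKeeper where hbase:meta lives.
    meta_server = ZOOKEEPER["/hbase/meta-region-server"]
    # Step 3: read the table's region rows from hbase:meta, sorted by start key.
    regions = sorted(
        (start, server)
        for (name, start, server) in META_TABLE[meta_server]
        if name == table
    )
    start_keys = [start for start, _ in regions]
    # The owning region is the last one whose start key is <= row.
    index = bisect.bisect_right(start_keys, row) - 1
    return regions[index][1]


# Step 4 would be the actual read against the returned server.
print(locate_region("test1", "r6786520"))
```

In this model the row `r6786520` sorts after the start key `r5000000`, so the lookup resolves to the second region's server.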

Region Name

An HBase region name is made up of the following three parts, separated by commas:

UserTableName,StartKey,RegionID

The RegionID is generated by HBase and has the form TIMESTAMP.MD5 (the full region name ends with a trailing dot).

For example:

test1,r6786520,1456410376247.fc9bdcb4f88aec2e64b393fece99cf0e.

test1: table name

r6786520: start key

1456410376247.fc9bdcb4f88aec2e64b393fece99cf0e: RegionID

1456410376247: the timestamp, stored as a long value

fc9bdcb4f88aec2e64b393fece99cf0e: ID generated with the MD5 hash algorithm
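The breakdown above can be reproduced with a small parser. This is a minimal sketch based only on the format described here; `RegionName` and `parse_region_name` are hypothetical names, not part of HBase. It assumes the table name contains no comma, and it splits the RegionID off at the last comma so that a start key containing commas would still parse.

```python
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    timestamp: int
    md5_hex: str


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startkey,timestamp.md5.' into its parts."""
    # The table name ends at the first comma.
    table, rest = name.split(",", 1)
    # The RegionID starts after the last comma.
    start_key, region_id = rest.rsplit(",", 1)
    # Drop the trailing dot, then separate the timestamp from the MD5 hex string.
    timestamp, md5_hex = region_id.rstrip(".").split(".", 1)
    return RegionName(table, start_key, int(timestamp), md5_hex)


parsed = parse_region_name(
    "test1,r6786520,1456410376247.fc9bdcb4f88aec2e64b393fece99cf0e."
)
print(parsed)
```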


Number of Regions

In general, if the number of regions is not specified explicitly in advance through the HBase shell, there are usually only 3 regions:

1. hbase:meta

2. hbase:namespace

3. The user table's region

Because the default region size is 10 GB, in a small environment the amount of data rarely reaches the split threshold quickly.

When the number of regions is specified explicitly, it can be chosen according to the expected growth and peak load of the business. Combined with a well-designed rowkey, this keeps the data distribution balanced and gives the best performance for the whole cluster.

Open question:

1. When the number of regions is specified in advance, it is not clear to me how the start keys of the region split points are chosen. In an actual test with the number of regions set to 5, the startkey ~ endkey distribution was as follows:

region1: -INF ~ 33333333

region2: 33333333 ~ 66666666

region3: 66666666 ~ 99999999

region4: 99999999 ~ CCCCCCCC

region5: CCCCCCCC ~ +INF
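The boundaries observed in the test are what you get by dividing the 8-digit hexadecimal key space 00000000 to FFFFFFFF into 5 equal parts, which is the idea behind HBase's HexStringSplit algorithm. Whether that algorithm was the one used in the test is an assumption; the sketch below only shows the arithmetic, and `hex_split_points` and `region_ranges` are hypothetical helper names, not HBase code.

```python
from typing import List, Tuple


def hex_split_points(num_regions: int, width: int = 8) -> List[str]:
    """Evenly spaced split points over a hex key space of `width` digits."""
    max_key = int("F" * width, 16)  # FFFFFFFF for width 8
    return [
        format(i * max_key // num_regions, "0{}X".format(width))
        for i in range(1, num_regions)
    ]


def region_ranges(num_regions: int) -> List[Tuple[str, str]]:
    """Pair each region's start key with its end key."""
    points = hex_split_points(num_regions)
    return list(zip(["-INF"] + points, points + ["+INF"]))


for number, (start, end) in enumerate(region_ranges(5), 1):
    print("region{}: {} ~ {}".format(number, start, end))
```

For 5 regions, each part spans 0xFFFFFFFF / 5 = 0x33333333 keys, so the split points fall at 33333333, 66666666, 99999999, and CCCCCCCC.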


Rowkey

HBase distributes data and routes operations by rowkey. If the rowkey is poorly designed, the data ends up concentrated in a single region, which causes uneven load, heavier I/O pressure, higher latency, and a sharp drop in user experience. For this reason it is generally not recommended to build the rowkey from timestamps, letters, and similar values mixed together. It is better to use a hashed value, for example one generated with MD5, so that the data is evenly distributed.
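The effect can be illustrated with a small simulation. This is a sketch under stated assumptions, not a recommended production scheme: it assumes a table pre-split into 5 regions at the hex boundaries 33333333, 66666666, 99999999, and CCCCCCCC, and `region_index` and `hashed_rowkey` are hypothetical helpers. Timestamp-like keys are compared with keys that carry an 8-character MD5 prefix.

```python
import bisect
import hashlib
from collections import Counter

# Assumed split points of a table pre-split into 5 regions.
SPLIT_POINTS = ["33333333", "66666666", "99999999", "CCCCCCCC"]


def region_index(rowkey: str) -> int:
    """Index (0-4) of the region whose key range contains `rowkey`."""
    return bisect.bisect_right(SPLIT_POINTS, rowkey)


def hashed_rowkey(raw_key: str) -> str:
    """Prefix the raw key with the first 8 hex digits of its MD5 hash."""
    digest = hashlib.md5(raw_key.encode("utf-8")).hexdigest()
    return "{}_{}".format(digest[:8].upper(), raw_key)


# 1000 consecutive millisecond timestamps used directly as rowkeys.
raw_keys = [str(1456410376247 + i) for i in range(1000)]

raw_counts = Counter(region_index(key) for key in raw_keys)
hashed_counts = Counter(region_index(hashed_rowkey(key)) for key in raw_keys)

print("raw keys per region:   ", dict(raw_counts))
print("hashed keys per region:", dict(hashed_counts))
```

All the raw timestamp keys begin with the same digits and therefore sort into one region, while the MD5-prefixed keys spread across the regions. The hashed key keeps the raw key as a suffix so that a row can still be fetched by recomputing the prefix.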



