HBase: About Regions

Last Update: 2018-07-26  Source: Internet

Author: User

Tags: md5, encryption, zookeeper


Region

A region is the basic unit of data management in HBase: moving, balancing, and splitting data are all done region by region.

A region stores the user's actual table data. HBase manages regions through RegionServers, each of which hosts and serves a set of regions.

Addressing Process

The general data addressing process is as follows:

            ZooKeeper               hbase:meta table               User table
        +----------------+       +---------------------+       +----------------+
        |   location of  | ----> |    one row per      | ----> |   region that  |
        |   hbase:meta   |       |  user-table region  |       |  holds the row |
        +----------------+       +---------------------+       +----------------+

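1. The master publishes the location of hbase:meta (the RegionServer that hosts it) in ZooKeeper.

2. The client reads that location from ZooKeeper and queries hbase:meta on that RegionServer to find the region containing the requested row key, together with the RegionServer that hosts it.

3. With the region information in hand, the client sends the request directly to that RegionServer, which reads the data.

4. The RegionServer returns the query result to the client.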
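
The four steps above can be sketched as a toy lookup in Python. Everything in it is hypothetical: `ZOOKEEPER`, `META_ROWS`, `REGION_SERVERS`, the server names and the keys are made-up, in-memory stand-ins, not the HBase client API. The sketch only illustrates that hbase:meta is sorted by start key, so the region holding a row is the last one whose start key is less than or equal to the row key.

```python
from bisect import bisect_right

# Hypothetical cluster state; all names and keys are invented for illustration.
ZOOKEEPER = {"meta_location": "rs1:16020"}  # which server hosts hbase:meta

# hbase:meta as served by rs1: one row per user-table region.
# Each row: (table, start_key, hosting RegionServer); "" means "no lower bound".
META_ROWS = {
    "rs1:16020": [
        ("test1", "", "rs2:16020"),
        ("test1", "66666666", "rs3:16020"),
    ]
}

# Data held by each RegionServer: (table, row_key) -> value.
REGION_SERVERS = {
    "rs2:16020": {("test1", "12345678"): "a"},
    "rs3:16020": {("test1", "abcdef01"): "b"},
}


def locate_region_server(table: str, row_key: str) -> str:
    # Steps 1-2: ask ZooKeeper where hbase:meta lives, then read meta there.
    meta_server = ZOOKEEPER["meta_location"]
    rows = sorted(r for r in META_ROWS[meta_server] if r[0] == table)
    start_keys = [start for _, start, _ in rows]
    # The region holding row_key is the last one whose start key <= row_key.
    index = bisect_right(start_keys, row_key) - 1
    return rows[index][2]


def get(table: str, row_key: str):
    # Step 3: send the read directly to the RegionServer hosting the region.
    server = locate_region_server(table, row_key)
    # Step 4: the RegionServer returns the result (None if the row is absent).
    return REGION_SERVERS[server].get((table, row_key))


print(locate_region_server("test1", "abcdef01"))  # rs3:16020
print(get("test1", "12345678"))  # a
```

A real client also caches the region locations it has looked up, so it only goes back to ZooKeeper and hbase:meta when a cached location turns out to be stale.
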
Region name

An HBase region name is made up of three parts:

<table name>,<start key>,<RegionID>

The RegionID is not random: HBase builds it from the region's creation timestamp followed by an MD5 hash, in the form TIMESTAMP.MD5, and the name ends with a trailing dot.

For example:

test1,r6786520,1456410376247.fc9bdcb4f88aec2e64b393fece99cf0e.

test1: table name

r6786520: start key

1456410376247.fc9bdcb4f88aec2e64b393fece99cf0e: RegionID

1456410376247: the region's creation timestamp, stored as a long value in milliseconds since the Unix epoch

fc9bdcb4f88aec2e64b393fece99cf0e: an MD5 hash (a digest, not encryption) that HBase uses as the region's encoded name
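
As a sketch of this naming scheme, the Python below splits a region name into its parts and builds a new one. The parser splits at the first comma, since table names cannot contain commas, and at the last comma, since start keys can. We assume the MD5 part is the hex digest of `<table>,<start key>,<timestamp>`, which is how we understand HBase's HRegionInfo to compute it; we have not checked that the build step reproduces the exact hash in the example. `parse_region_name` and `build_region_name` are hypothetical helpers, not HBase APIs.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_region_name(name: str) -> dict:
    """Split '<table>,<startkey>,<timestamp>.<md5>.' into its parts."""
    table, rest = name.split(",", 1)            # table names cannot contain ','
    start_key, region_id = rest.rsplit(",", 1)  # start keys can, so take the last ','
    timestamp, encoded = region_id.rstrip(".").split(".")
    return {
        "table": table,
        "start_key": start_key,
        "timestamp": int(timestamp),
        "encoded_name": encoded,
    }


def build_region_name(table: str, start_key: str, timestamp_ms: int) -> str:
    """Assumed layout: MD5 hex digest of '<table>,<startkey>,<timestamp>' as the encoded name."""
    prefix = f"{table},{start_key},{timestamp_ms}"
    encoded = hashlib.md5(prefix.encode("utf-8")).hexdigest()
    return f"{prefix}.{encoded}."


parts = parse_region_name("test1,r6786520,1456410376247.fc9bdcb4f88aec2e64b393fece99cf0e.")
print(parts["table"], parts["start_key"])  # test1 r6786520
# Exact integer arithmetic: epoch + the millisecond timestamp.
created = EPOCH + timedelta(milliseconds=parts["timestamp"])
print(created.isoformat())  # 2016-02-25T14:26:16.247000+00:00
print(build_region_name("test1", "r6786520", 1456410376247))
```
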

Number of Regions

In general, if you do not specify the number of regions in the HBase shell when creating a table, a small system starts with only three regions:

1. hbase:meta

2. hbase:namespace

3. the single region of the user table

Because the default maximum region size (hbase.hregion.max.filesize) is 10 GB, in a small environment the data rarely grows fast enough to reach the split threshold.

Specifying the number of regions in advance (pre-splitting) lets you size the table for the expected growth and peak load of your business. Combined with a well-designed rowkey, this keeps the data evenly distributed and gets the best performance out of the whole cluster.
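
One way to plan for "the trend of your own business" is to choose split points from a sample of the row keys you expect, so that each region starts with about the same share of keys. The helper below, `split_points_from_sample`, is a hypothetical illustration, not an HBase API, and it assumes the sample is representative of future data.

```python
def split_points_from_sample(sample_keys, num_regions):
    """Return up to num_regions - 1 split keys that divide a key sample into equal-sized buckets."""
    keys = sorted(set(sample_keys))
    if num_regions < 2 or not keys:
        return []
    step = len(keys) / num_regions
    points = []
    for i in range(1, num_regions):
        key = keys[int(i * step)]
        # Skip repeats, which happen when the sample is smaller than num_regions.
        if not points or key > points[-1]:
            points.append(key)
    return points


sample = [f"user{i:04d}" for i in range(1000)]
print(split_points_from_sample(sample, 4))  # ['user0250', 'user0500', 'user0750']
```

The resulting keys could be supplied as explicit split points when creating the table (the HBase shell's `create` command accepts a SPLITS list). If the sample is skewed, the regions inherit that skew.
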

Open questions:

1. When the number of regions is specified in advance, how are the start keys used as split points chosen? This is unclear to me. In an actual test with the number of regions set to 5, the startkey ~ endkey ranges were as follows:

region1: -∞ ~ 33333333

region2: 33333333 ~ 66666666

region3: 66666666 ~ 99999999

region4: 99999999 ~ cccccccc

region5: cccccccc ~ +∞
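
These boundaries are exactly the four points that divide the 8-digit hexadecimal range 00000000 to ffffffff into five equal parts. That pattern matches HBase's HexStringSplit algorithm (from RegionSplitter), which the shell uses when a table is created with NUMREGIONS => 5 and SPLITALGO => 'HexStringSplit'. We assume that is how the test table was created, since the source does not show the command. The sketch below only reproduces the arithmetic; `hex_string_split` and `region_ranges` are hypothetical helpers, not HBase's implementation.

```python
def hex_string_split(num_regions, first="00000000", last="ffffffff"):
    """Evenly spaced split keys over [first, last], formatted as 8-digit lowercase hex."""
    if num_regions < 2:
        raise ValueError("num_regions must be at least 2")
    low, high = int(first, 16), int(last, 16)
    size = (high - low) // num_regions
    return [format(low + size * i, "08x") for i in range(1, num_regions)]


def region_ranges(splits):
    """Pair consecutive boundaries; the first region has no start key, the last no end key."""
    bounds = ["-inf"] + splits + ["+inf"]
    return list(zip(bounds[:-1], bounds[1:]))


for n, (start, end) in enumerate(region_ranges(hex_string_split(5)), start=1):
    print(f"region{n}: {start} ~ {end}")
```

HexStringSplit assumes row keys begin with hex characters spread uniformly over that range, so it suits keys with a hashed hex prefix; keys with a different layout can land unevenly across these regions.
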

Rowkey

HBase distributes and operates on data according to the rowkey. If the rowkey is poorly designed, the data concentrates in a single region, which causes uneven load and heavier I/O on one server; users immediately notice the slowdown as latency rises. Therefore, it is generally not recommended to build rowkeys from timestamps or other sequential values mixed with letters. It is better to use a hash, for example one generated with MD5, so that the data is distributed evenly.
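
To illustrate the hotspot problem, the Python below reuses the five-region boundaries from the test above and compares raw timestamp rowkeys with keys prefixed by the first 8 hex characters of their MD5 digest. The key layout (`salted_key`) and the helper `region_for` are assumptions made for this sketch. Python string comparison stands in for HBase's byte-wise ordering, which is equivalent for ASCII keys.

```python
import hashlib
from bisect import bisect_right

# Boundaries of the five regions from the test above.
SPLITS = ["33333333", "66666666", "99999999", "cccccccc"]
START_KEYS = [""] + SPLITS  # region i holds keys in [START_KEYS[i], START_KEYS[i + 1])


def region_for(row_key: str) -> int:
    """0-based index of the region whose key range contains row_key."""
    return bisect_right(START_KEYS, row_key) - 1


def salted_key(raw_key: str) -> str:
    """Hypothetical layout: first 8 hex chars of MD5(raw_key), followed by the raw key."""
    return hashlib.md5(raw_key.encode("utf-8")).hexdigest()[:8] + raw_key


timestamps = [str(1456410376247 + i) for i in range(1000)]
plain = {region_for(k) for k in timestamps}
salted = {region_for(salted_key(k)) for k in timestamps}
print(sorted(plain))   # [0]: every timestamp key starts with "1", so all land in region1
print(sorted(salted))  # the MD5 prefixes spread the same keys over several regions
```

Because the prefix is derived from the key itself, a point read can recompute it. The cost is that keys adjacent in time are no longer adjacent on disk, so a time-range scan has to query every region.
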

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
