I. Segmentation and distribution of large tables
Tables in HBase are made up of rows and columns. A table can hold billions of rows and millions of columns, and can reach terabytes or even petabytes in size. Such a table is split into smaller units of data that are distributed across multiple servers. These smaller units are called regions, and a server that hosts regions is called a RegionServer. A table consists of several regions, as shown in Figure 1.
Figure 1 Several smaller regions form a single table
RegionServers and HDFS DataNodes are typically deployed on the same physical hardware, as shown in Figure 2. A RegionServer is essentially an HDFS client that stores and accesses its data on HDFS. The master process assigns regions to RegionServers, and each RegionServer typically hosts multiple regions.
Figure 2 RegionServers and DataNodes are typically deployed on the same host
Because the underlying data is stored on HDFS, all clients can access it under a single namespace. Every RegionServer can access the same files in the file system, so any RegionServer can host any region, as shown in Figure 3. Because DataNodes and RegionServers are deployed side by side, a RegionServer can in theory use its local DataNode as the primary DataNode for read and write operations.
Figure 3 Any RegionServer can host any region
The maximum size of a single region is determined by the configuration parameter hbase.hregion.max.filesize in the hbase-site.xml file. When a region grows larger than this value, it is split into two regions.
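The split rule can be illustrated with a toy model. The sketch below is not HBase code: the Region class, the split_if_needed function, the 100-byte threshold, and the choice of the middle row as split point are all assumptions made for illustration. It only shows the idea that a region exceeding the configured maximum size is divided into two regions covering adjacent key ranges.

```python
from dataclasses import dataclass, field

# Assumed threshold for illustration only; in a real cluster the limit is
# the value of hbase.hregion.max.filesize in hbase-site.xml.
MAX_REGION_BYTES = 100


@dataclass
class Region:
    """Toy region: a contiguous, sorted run of (row_key, size_in_bytes)."""
    rows: list = field(default_factory=list)

    def size(self):
        return sum(nbytes for _, nbytes in self.rows)


def split_if_needed(region, max_bytes=MAX_REGION_BYTES):
    """Return [region] if it fits, otherwise two halves cut at the middle row."""
    if region.size() <= max_bytes or len(region.rows) < 2:
        return [region]
    mid = len(region.rows) // 2
    return [Region(region.rows[:mid]), Region(region.rows[mid:])]


# Five rows of 30 bytes each: 150 bytes, which exceeds the assumed limit.
region = Region([(f"row{i:05d}", 30) for i in range(5)])
daughters = split_if_needed(region)
print([[key for key, _ in d.rows] for d in daughters])
```

Each resulting region still holds a contiguous range of row keys, which is what allows a region to be located later by comparing a row key against region start keys.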
II. How to find a region
When a region is assigned to a RegionServer, how does the client application know where it is?
HBase has two special tables, -ROOT- and .META., that are used to find where the regions of the various tables are located. -ROOT- and .META. are themselves split into regions: -ROOT- never splits into more than one region, while .META., like any other table, can split into as many regions as needed.
When a client application wants to access a particular row, it first goes to the -ROOT- table and asks where it can find the region responsible for that row. -ROOT- points it to the region of the .META. table that holds the answer. The .META. table consists of entries that the client application uses to determine which RegionServer hosts the region in question. This lookup process resembles a three-level distributed B+ tree (see Figure 4): the -ROOT- table is the root node of the B+ tree, the .META. regions are the children of the root node, and the regions of the user tables are the leaves, that is, the children of the .META. regions.
Figure 4 B+ tree view of -ROOT-, .META., and the user tables
In Figure 4, the -ROOT- table contains only one region, hosted on RegionServer RS1. The .META. table contains 3 regions, hosted on RS1, RS2, and RS3. The user tables T1 and T2 contain 3 and 4 regions respectively, distributed across RS1, RS2, and RS3.
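One level of this tree can be sketched as a search over sorted region start keys. The example below is only an illustration: the start keys in META_T1 and the locate helper are hypothetical, and real catalog rows carry more information than a region name and a server. It shows why sorted key ranges make the lookup cheap: a binary search over the start keys finds the single region whose range contains the row.

```python
import bisect

# Hypothetical catalog entries for user table T1: (start_row, region, server).
# Each region covers the rows from its start key up to the next start key.
META_T1 = [
    ("", "T1R1", "RS1"),
    ("00003", "T1R2", "RS2"),
    ("00006", "T1R3", "RS3"),
]


def locate(entries, row):
    """Return the entry whose key range contains `row`."""
    start_keys = [start for start, _, _ in entries]
    # bisect_right gives the position of the first start key greater than
    # `row`; the entry just before that position owns the row.
    index = bisect.bisect_right(start_keys, row) - 1
    return entries[index]


print(locate(META_T1, "00007"))
```

The same search applies at each level of the tree: once in -ROOT- to pick a .META. region, and once in that .META. region to pick the user-table region.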
Figure 5 Tables in HBase are distributed across the RegionServers
As shown in Figure 5, RegionServer 1 (RS1) hosts region R1 of user table T1 and region M2 of the .META. table; RegionServer 2 (RS2) hosts regions R2 and R3 of the user table and region M1 of the .META. table; RegionServer 3 (RS3) hosts only -ROOT-.
III. How to find the -ROOT- table
A system called ZooKeeper provides the entry point into the HBase system. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. It is a highly available, reliable distributed configuration service.
The client interacts with the HBase system in several steps, with ZooKeeper as the entry point. The entire interaction is shown in Figure 6.
Figure 6 Interaction between the client and the HBase system
As you can see from Figure 6, the interaction steps are:
Step 1: The client asks ZooKeeper: where is -ROOT-?
Step 2: ZooKeeper replies to the client: -ROOT- is on RegionServer RS1.
Step 3: The client asks the -ROOT- table on RS1: which .META. region can locate row 00007 of table T1?
Step 4: The -ROOT- table on RS1 replies to the client: .META. region M2 on RegionServer RS3 has that information.
Step 5: The client asks .META. region M2 on RS3: in which region can row 00007 of table T1 be found, and which RegionServer serves it?
Step 6: .META. region M2 on RS3 replies to the client: the data is in region T1R3 on RegionServer RS3.
Step 7: The client sends a request to region T1R3 on RS3, asking to read row 00007.
Step 8: Region T1R3 on RS3 returns the data to the client.
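The eight steps can be traced with a small in-memory simulation. This is a sketch under stated assumptions, not the HBase client API: the ZOOKEEPER, ROOT, META, and USER_REGIONS structures, the "table,row" key format, the region boundaries, the placement of M1, and the stored value are all made up for illustration. Only the server and region names that appear in the steps (RS1, RS3, M2, T1R3) come from the text above.

```python
import bisect

# Steps 1-2: ZooKeeper only knows which server hosts -ROOT-.
ZOOKEEPER = {"-ROOT-": "RS1"}

# -ROOT- maps "table,row" start keys to (.META. region, server).
ROOT = [("", ("M1", "RS2")), ("T1,00004", ("M2", "RS3"))]

# Each .META. region maps "table,row" start keys to (user region, server).
META = {
    "M1": [("", ("T1R1", "RS1"))],
    "M2": [("T1,00004", ("T1R2", "RS2")), ("T1,00006", ("T1R3", "RS3"))],
}

# Hypothetical contents of the one user region this example reads from.
USER_REGIONS = {"T1R3": {"00007": "value-7"}}


def find(entries, key):
    """Pick the entry with the largest start key that is <= key."""
    starts = [start for start, _ in entries]
    return entries[bisect.bisect_right(starts, key) - 1][1]


def read_row(table, row):
    """Follow ZooKeeper -> -ROOT- -> .META. -> user region; return (value, trace)."""
    key = f"{table},{row}"
    trace = []
    root_server = ZOOKEEPER["-ROOT-"]                      # steps 1-2
    trace.append(f"ZooKeeper: -ROOT- is on {root_server}")
    meta_region, meta_server = find(ROOT, key)             # steps 3-4
    trace.append(f"-ROOT-: ask .META. region {meta_region} on {meta_server}")
    region, server = find(META[meta_region], key)          # steps 5-6
    trace.append(f".META.: row is in region {region} on {server}")
    value = USER_REGIONS[region][row]                      # steps 7-8
    return value, trace


value, trace = read_row("T1", "00007")
for line in trace:
    print(line)
print(value)
```

Note that the data itself is touched only in the last hop; ZooKeeper, -ROOT-, and .META. answer nothing but "where to ask next".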
Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
HBase Learning Summary (4): How HBase Works