HBase Learning Summary (4): How HBase Works
I. Splitting and Allocating Large Tables
HBase tables are composed of rows and columns. A table may contain billions of rows and millions of columns, and its size may reach the terabyte or even petabyte level. Such tables are split into smaller data units that are allocated to multiple servers. These smaller data units are called regions, and a server that hosts regions is called a RegionServer. A table consists of multiple smaller regions, as shown in Figure 1.
Figure 1 Multiple smaller regions form a table
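As a rough illustration of how a table can be cut into regions, the sketch below treats each region as a contiguous range of row keys and finds the region for a row with a binary search over the region start keys. The `Region` class, the helper names, and the split points `00003` and `00006` are assumptions made for this example, not HBase APIs or values from the figures.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start_key: str  # inclusive; "" marks the start of the table
    end_key: str    # exclusive; "" marks the end of the table


def build_regions(table, split_keys):
    """Cut a table's key space into len(split_keys) + 1 contiguous regions."""
    bounds = [""] + sorted(split_keys) + [""]
    return [
        Region(f"{table}R{i + 1}", bounds[i], bounds[i + 1])
        for i in range(len(bounds) - 1)
    ]


def find_region(regions, row_key):
    """Return the region whose [start_key, end_key) range contains row_key."""
    starts = [r.start_key for r in regions]
    return regions[bisect_right(starts, row_key) - 1]


regions = build_regions("T1", ["00003", "00006"])  # hypothetical split points
print(find_region(regions, "00007").name)  # T1R3
```

Because the regions are sorted and do not overlap, a row key always maps to exactly one region, which is what lets different regions live on different servers.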
RegionServers and HDFS DataNodes are typically collocated on the same physical hardware, as shown in Figure 2. A RegionServer is essentially an HDFS client that stores and accesses data. The master process assigns regions to RegionServers, and each RegionServer generally hosts multiple regions.
Figure 2 In the typical case, a RegionServer and a DataNode are collocated on the same host
Because the underlying data is stored on HDFS, all clients can access it in a single namespace. Every RegionServer can access the same files in the file system, so any RegionServer can host any region, as shown in Figure 3. With a DataNode and a RegionServer collocated, the RegionServer can in theory use the local DataNode as its primary DataNode for read and write operations.
Figure 3 RegionServer hosts
The maximum size of a single region is determined by the configuration parameter hbase.hregion.max.filesize in the hbase-site.xml file. When a region grows larger than this value, it is split into two regions.
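The split rule can be sketched in a few lines. This is a simplified model, not HBase's actual split code: the threshold value is only an example, the split key is supplied by the caller, and the data is assumed to divide evenly between the two daughter regions.

```python
from dataclasses import dataclass

# Example threshold in bytes; stands in for hbase.hregion.max.filesize (assumed value).
MAX_FILESIZE = 10 * 1024**3


@dataclass
class Region:
    start_key: str
    end_key: str
    size_bytes: int


def maybe_split(region, mid_key, max_filesize=MAX_FILESIZE):
    """Return [region] unchanged, or two daughter regions if it outgrew the limit."""
    if region.size_bytes <= max_filesize:
        return [region]
    half = region.size_bytes // 2  # simplification: assume an even split
    return [
        Region(region.start_key, mid_key, half),
        Region(mid_key, region.end_key, region.size_bytes - half),
    ]


big = Region("00000", "00010", 12 * 1024**3)
for daughter in maybe_split(big, "00005"):
    print(daughter.start_key, daughter.end_key)
```

The two daughters share the split key as a boundary, so together they still cover exactly the key range of the parent region.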
II. How to Find a Region
Once regions have been assigned to RegionServers, how does a client application know where a given region is located?
HBase has two special tables, -ROOT- and .META., which are used to find the location of the regions of every table. -ROOT- and .META. are themselves stored as regions. -ROOT- is never split into more than one region, whereas .META. can be split into as many regions as needed, like any other table.
When a client application needs to access a row, it first consults the -ROOT- table to find the region responsible for that row. -ROOT- points to the region of the .META. table that holds the answer. The .META. table consists of region address entries, and the client application uses such an entry to determine which RegionServer hosts the region it is looking for. This lookup resembles a three-level distributed B+ tree (Figure 4): the -ROOT- table is the root node of the B+ tree, the .META. regions are the children of the root node, and the regions of the user tables are the leaves under the .META. regions.
Figure 4 B+ tree view of -ROOT-, .META., and the user tables
In Figure 4, the -ROOT- table contains only one region, hosted on RegionServer RS1. The .META. table contains three regions, hosted on RS1, RS2, and RS3. User tables T1 and T2 contain three and four regions respectively, distributed across RS1, RS2, and RS3.
Figure 5 HBase tables are distributed on each RegionServer.
As shown in Figure 5, RegionServer 1 (RS1) hosts region R1 and region M2 of the .META. table; RegionServer 2 (RS2) hosts regions R2 and R3 and region M1 of the .META. table; RegionServer 3 (RS3) hosts only -ROOT-.
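The three-level lookup described in this section can be sketched as two sorted-key searches, one in -ROOT- and one in the chosen .META. region. The layout below follows the region counts of Figure 4 (three .META. regions, three T1 regions, four T2 regions), but the start keys and the region-to-server assignments are made up for the example, and the real catalog tables store more than this.

```python
from bisect import bisect_right


def floor_lookup(entries, key):
    """entries is a list of (start_key, value) sorted by start_key.
    Return the value of the last entry whose start_key <= key."""
    starts = [start for start, _ in entries]
    return entries[bisect_right(starts, key) - 1][1]


# Each .META. region maps (table, region start row) -> (user region, hosting server).
META = {
    "M1": [(("T1", ""), ("T1R1", "RS1")), (("T1", "00003"), ("T1R2", "RS2"))],
    "M2": [(("T1", "00006"), ("T1R3", "RS3")), (("T2", ""), ("T2R1", "RS1"))],
    "M3": [
        (("T2", "00005"), ("T2R2", "RS2")),
        (("T2", "00010"), ("T2R3", "RS3")),
        (("T2", "00015"), ("T2R4", "RS1")),
    ],
}

# -ROOT- is a single region mapping the first key of each .META. region to that region.
ROOT = [(("", ""), "M1"), (("T1", "00006"), "M2"), (("T2", "00005"), "M3")]


def locate(table, row):
    """Walk -ROOT- -> .META. -> user region, like descending a B+ tree."""
    meta_region = floor_lookup(ROOT, (table, row))
    return floor_lookup(META[meta_region], (table, row))


print(locate("T1", "00007"))  # ('T1R3', 'RS3')
print(locate("T2", "00012"))  # ('T2R3', 'RS3')
```

Keeping -ROOT- to a single region is what bounds the lookup at three levels: one read of -ROOT-, one read of a .META. region, then the user region itself.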
III. How to Find the -ROOT- Table
A system called ZooKeeper provides the entry point to the HBase system. ZooKeeper is a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. It is a highly available and reliable distributed configuration service.
The interaction between the client and the HBase system involves several steps. ZooKeeper is the entry point. The entire interaction process is shown in Figure 6.
Figure 6 Interaction between the client and HBase
As shown in Figure 6, the interaction steps are as follows:
Step 1: The client asks ZooKeeper: where is -ROOT-?
Step 2: ZooKeeper replies to the client: -ROOT- is on RegionServer RS1.
Step 3: The client asks the -ROOT- table on RS1: which .META. region can locate row 00007 of table T1?
Step 4: The -ROOT- table on RS1 replies to the client: .META. region M2 on RegionServer RS3.
Step 5: The client asks .META. region M2 on RS3: which region holds row 00007 of table T1, and which RegionServer serves it?
Step 6: .META. region M2 on RS3 replies to the client: the data is in region T1R3 on RegionServer RS3.
Step 7: The client sends a request to region T1R3 on RS3 to read row 00007.
Step 8: Region T1R3 on RS3 returns the data to the client.
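The eight steps can be replayed with a small in-memory model. This is only a sketch: the dictionaries stand in for ZooKeeper and the RegionServers, the key name `root-region-server`, the start keys, the location of M1, and the stored value are all invented for the example, and no real HBase or ZooKeeper client API is used.

```python
from bisect import bisect_right


def floor_lookup(entries, key):
    """Return the value of the last (start_key, value) entry with start_key <= key."""
    starts = [start for start, _ in entries]
    return entries[bisect_right(starts, key) - 1][1]


# Stand-in for ZooKeeper: it only knows which server hosts -ROOT- (hypothetical key name).
ZOOKEEPER = {"root-region-server": "RS1"}

# Stand-in for the RegionServers and the regions they host (hypothetical layout).
SERVERS = {
    "RS1": {"-ROOT-": [(("", ""), ("M1", "RS2")), (("T1", "00006"), ("M2", "RS3"))]},
    "RS2": {"M1": [(("T1", ""), ("T1R1", "RS1")), (("T1", "00003"), ("T1R2", "RS2"))]},
    "RS3": {
        "M2": [(("T1", "00006"), ("T1R3", "RS3"))],
        "T1R3": {"00007": "value-of-row-00007"},
    },
}


def read_row(table, row):
    """Follow steps 1-8 and return the row value plus a trace of the replies."""
    trace = []
    root_server = ZOOKEEPER["root-region-server"]                 # steps 1-2
    trace.append(f"ZooKeeper: -ROOT- is on {root_server}")
    root = SERVERS[root_server]["-ROOT-"]
    meta_region, meta_server = floor_lookup(root, (table, row))   # steps 3-4
    trace.append(f"-ROOT-: ask .META. region {meta_region} on {meta_server}")
    meta = SERVERS[meta_server][meta_region]
    region, region_server = floor_lookup(meta, (table, row))      # steps 5-6
    trace.append(f".META.: row is in {region} on {region_server}")
    value = SERVERS[region_server][region][row]                   # steps 7-8
    trace.append(f"{region}: returned row {row}")
    return value, trace


value, trace = read_row("T1", "00007")
print("\n".join(trace))
print(value)
```

Each pair of steps is one request and one reply, so a read costs four round trips in this model: ZooKeeper, -ROOT-, .META., and finally the region that holds the row.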