HBase's service architecture follows a master-slave structure made up of HRegions, HRegionServers (the server cluster), and the HMaster (the master server). Each HRegionServer hosts multiple HRegions, and the HMaster manages all the HRegionServers. All servers are managed and coordinated through ZooKeeper. The HMaster itself does not store any HBase data. A logical HBase table may be divided into multiple HRegions that are stored across the HRegionServer cluster, and HBase keeps the mapping from data to the HRegionServers that hold it.
When a table's data grows beyond its configured threshold, HBase automatically splits the table into regions, each containing a subset of the rows. To the user, a table is a single dataset whose rows are distinguished by primary key. Physically, a table is divided into HRegions, each identified by table name plus start and end primary key. An HRegion holds one contiguous range of a table's data, from its start primary key to its end primary key, so a complete table is stored across multiple HRegions.
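The lookup implied above (table name plus start/end key identifies the region holding a row) can be sketched in a few lines of Python. This is only an illustration, not HBase code: the Region class, the find_region helper, the "users" table and its split points are all hypothetical, and the sketch assumes the start key is inclusive and the end key exclusive.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    table: str
    start_key: str  # inclusive (assumption); "" means "from the first row"
    end_key: str    # exclusive (assumption); "" means "to the last row"


def find_region(regions, row_key):
    """Return the region whose key range contains row_key.

    `regions` must all belong to one table and be sorted by start_key.
    """
    starts = [r.start_key for r in regions]
    # Pick the rightmost region whose start key is <= row_key.
    idx = bisect.bisect_right(starts, row_key) - 1
    return regions[idx]


# Hypothetical table "users" stored as three contiguous key ranges.
users = [
    Region("users", "", "g"),
    Region("users", "g", "p"),
    Region("users", "p", ""),
]

print(find_region(users, "harry"))
```

Because each region covers one contiguous key range, a binary search over the sorted start keys is enough to find the region for any row.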
All data ultimately resides in HDFS, and users access it through HRegions. Usually only one HRegionServer runs on a machine, and each HRegion is maintained by exactly one HRegionServer. An HRegionServer consists of two parts, the HLog and the HRegions:
HLog: stores HBase's log data, in the form of a write-ahead log.
HRegion: an HRegionServer holds many HRegions, and they store the actual data. Each HRegion contains several stores, and each store holds the data of one column family. Every store has a MemStore, which resides in memory. Data is written to the MemStore first, and when the MemStore reaches its threshold the data is flushed to a StoreFile. Each store has multiple StoreFiles, and the StoreFile is the smallest storage unit.
HBase does not update or delete data in place; every change is written as an append. Deletions and updates are actually applied when data is merged. A merge is triggered when the number of StoreFiles in a store exceeds a threshold, and it combines multiple StoreFiles into one.
When a user updates data, HBase submits the data to the corresponding HRegionServer, where it is first written to the log. The commit() call does not return to the client until the log has been written. If an HRegionServer fails, the HRegions it maintained are assigned to new HRegionServers, and its log is divided up by HRegion. When a new HRegionServer loads an HRegion, it recovers the data from that log.
When an HRegion grows larger than its configured threshold, the HRegionServer calls the HRegion's closeAndSplit(), which splits the HRegion in two, and reports to the master server, which decides which HRegionServer will store the new HRegions. This process is very fast, because the two new HRegions initially hold only references to the original StoreFiles. The old HRegion is out of service during the split. Once the new HRegions have finished splitting and dropped those references, the old HRegion is deleted. In addition, two HRegions can be merged into a new HRegion by calling HRegion.closeAndMerge() (some HBase versions require both to stop serving while this operation runs).
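The write path described above (log first, then MemStore, flush to StoreFiles, merge when there are too many files, replay the log after a failure) can be modelled with a small toy class. This is a sketch under stated assumptions, not HBase's implementation: the Store class, its thresholds, and the TOMBSTONE marker are all hypothetical, files are plain dictionaries, and recovery replays the whole log rather than only the edits that were not yet flushed.

```python
TOMBSTONE = object()  # hypothetical marker appended in place of a deleted value


class Store:
    """Toy model of one store (one column family) inside a region."""

    def __init__(self, flush_threshold=2, max_storefiles=2):
        self.log = []         # write-ahead log: every edit, in arrival order
        self.memstore = {}    # in-memory buffer of recent edits
        self.storefiles = []  # flushed, never-modified files, oldest first
        self.flush_threshold = flush_threshold
        self.max_storefiles = max_storefiles

    def put(self, key, value):
        self.log.append((key, value))  # 1. append to the log first
        self.memstore[key] = value     # 2. then buffer the edit in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # A delete is just another append; the row is dropped at merge time.
        self.put(key, TOMBSTONE)

    def flush(self):
        if self.memstore:
            self.storefiles.append(dict(self.memstore))
            self.memstore = {}
        if len(self.storefiles) > self.max_storefiles:
            self.compact()

    def compact(self):
        merged = {}
        for storefile in self.storefiles:  # oldest first, newer values win
            merged.update(storefile)
        # Deletions take effect here: tombstoned keys are left out.
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.storefiles = [merged]

    def get(self, key):
        # Newest data first: memstore, then store files from newest to oldest.
        for layer in [self.memstore] + self.storefiles[::-1]:
            if key in layer:
                value = layer[key]
                return None if value is TOMBSTONE else value
        return None

    @classmethod
    def recover(cls, log, **kwargs):
        """Rebuild a store on a new server by replaying a saved log."""
        store = cls(**kwargs)
        for key, value in log:
            store.put(key, value)
        return store


store = Store()
store.put("row1", "a")
store.put("row2", "b")
store.put("row1", "a2")  # an update is an append, not an in-place change
store.delete("row2")
print(store.get("row1"), store.get("row2"))
```

Keeping flushed files immutable is what makes the append-only design work: readers consult the newest layer first, and old versions are only discarded when files are merged.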
Each HRegionServer communicates with the HMaster, and the HMaster tells each HRegionServer which HRegions to maintain. When a new HRegionServer joins, the HMaster tells it to wait for HRegions to be assigned. When an HRegionServer crashes, the HMaster reassigns its HRegions to other HRegionServers. An HBase cluster can start multiple HMaster services, in which case ZooKeeper ensures that only one HMaster is running. The HMaster's main duties are therefore assigning HRegions to HRegionServers and reassigning them when a server fails.
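The assignment behaviour described above can be sketched as a toy master that tracks which server hosts which region. Everything here is hypothetical: the Master class, the server and region names, and in particular the least-loaded placement policy, which the text does not specify. The sketch also assumes at least one server is still alive when another one fails.

```python
class Master:
    """Toy master that records which region server hosts which regions."""

    def __init__(self):
        self.assignments = {}  # server name -> set of region names

    def add_server(self, server):
        # A newly joined server starts empty and waits to be given regions.
        self.assignments.setdefault(server, set())

    def assign(self, region):
        # Hypothetical policy: least-loaded server, ties broken by name.
        server = min(sorted(self.assignments),
                     key=lambda s: len(self.assignments[s]))
        self.assignments[server].add(region)
        return server

    def server_failed(self, server):
        # Hand every region of the failed server to the remaining servers.
        orphaned = self.assignments.pop(server)
        return {region: self.assign(region) for region in sorted(orphaned)}


master = Master()
master.add_server("rs1")
master.add_server("rs2")
for name in ["region-a", "region-b", "region-c"]:
    master.assign(name)
print(master.server_failed("rs1"))
```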
Root table and meta table
Root table (root data table): stores the HRegion information of the meta table. This table cannot be split, so it always has exactly one HRegion.
Meta table (metadata table): stores the mapping between HRegions and HRegionServers.
An HRegion is normally identified by table name plus primary-key range. Because an HRegion stores contiguous data, the primary key is usually enough to determine the HRegion. However, HRegions can be merged and split, and if a crash happens during one of these operations, several HRegions with the same table name and primary key may exist. Table name plus primary key then cannot single out one HRegion. The most reliable way to identify an HRegion is table name + primary key + unique ID (regionid).
This identifying data is metadata, and the metadata itself is stored in HRegions: the meta table holds the mapping between HRegions and HRegionServers. The metadata keeps growing, so in order to locate the meta table we record it in the root table, which holds the location of every part of the meta table and is never split.
When HBase starts, the master server first scans the root table (it has only one HRegion, so that HRegion's name is hard-coded) and assigns it to an HRegionServer. Once the root table is assigned, the master reads from it the names and locations of the meta table's HRegions and assigns them to different HRegionServers. It then reads the meta table to find the information for the remaining HRegions and assigns those to different HRegionServers as well. Each row of the root table and of the meta table contains a column family.
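To illustrate why the unique ID matters, here is a minimal sketch of the identifier scheme. The region_name function, the comma separator, and the sample IDs are assumptions made for this example, not a definitive description of HBase's naming format.

```python
def region_name(table, start_key, region_id):
    """Combine table name, start key and a unique ID into one identifier."""
    return f"{table},{start_key},{region_id}"


# Two regions left behind by an interrupted split: same table, same start key.
old = region_name("users", "g", 1001)
new = region_name("users", "g", 1002)

# Table name + key alone would collide; the region ID keeps them apart.
print(old)
print(new)
print(old == new)
```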
Once a client has obtained the root table, it does not need to access the master server again. The root table contains the locations of all parts of the meta table, and the meta table lists every user-space region together with the HRegionServer that hosts it, so the client can cache everything it has learned from the root and meta tables. The master server's own load is light: it handles HRegionServer timeouts, scans the root and meta tables only when HBase starts, and returns the HRegionServer location of the root table.
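The client-side lookup described above (root table, then meta table, then the server hosting the row, with caching at each step) can be sketched as follows. This is a simplified in-memory model, not the HBase client API: the Cluster and Client classes, the table contents, and the counters are hypothetical, and get_root() stands in for asking the master where the root table lives and then reading it.

```python
import bisect


def locate(index, row_key):
    """index is a list of (start_key, value) pairs sorted by start_key."""
    starts = [start for start, _ in index]
    return index[bisect.bisect_right(starts, row_key) - 1][1]


class Cluster:
    """Hypothetical stand-in for the master and the servers holding meta."""

    def __init__(self):
        self.master_calls = 0
        self.meta_reads = 0
        # Root table: key range -> part of the meta table covering it.
        self.root = [("", "meta-1"), ("m", "meta-2")]
        # Meta table: key range -> (user region, hosting server).
        self.meta = {
            "meta-1": [("", ("region-1", "rs1")), ("g", ("region-2", "rs2"))],
            "meta-2": [("m", ("region-3", "rs3"))],
        }

    def get_root(self):
        self.master_calls += 1
        return self.root

    def read_meta(self, meta_region):
        self.meta_reads += 1
        return self.meta[meta_region]


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.root_cache = None
        self.meta_cache = {}

    def find_server(self, row_key):
        if self.root_cache is None:
            # The only time the client needs the master.
            self.root_cache = self.cluster.get_root()
        meta_region = locate(self.root_cache, row_key)
        if meta_region not in self.meta_cache:
            self.meta_cache[meta_region] = self.cluster.read_meta(meta_region)
        return locate(self.meta_cache[meta_region], row_key)


cluster = Cluster()
client = Client(cluster)
for key in ["alice", "harry", "zoe", "bob"]:
    print(key, client.find_server(key))
print(cluster.master_calls, cluster.meta_reads)
```

After the first few lookups, every request is answered from the client's cache, which is why the master sees so little traffic.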
ZooKeeper stores the locations of the root table and the meta table. Every machine registers an instance in ZooKeeper, and ZooKeeper monitors the state of these machines. When a machine fails, ZooKeeper is the first to notice and notifies the HMaster, which then handles the failure. ZooKeeper is also responsible for HMaster recovery and ensures that only one HMaster is providing service at any time.
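The "only one HMaster at a time" guarantee can be illustrated with a toy coordinator. This is not the ZooKeeper API: the Coordinator class and its register and session_lost methods are hypothetical, and the first-come-first-served promotion order is an assumption made for the sketch.

```python
class Coordinator:
    """Toy stand-in for ZooKeeper: tracks one active master plus standbys."""

    def __init__(self):
        self.active = None
        self.standby = []

    def register(self, master):
        """Register a master; return True if it became the active one."""
        if self.active is None:
            self.active = master
        else:
            self.standby.append(master)
        return self.active == master

    def session_lost(self, master):
        """Handle a failed master and return whoever is active afterwards."""
        if master == self.active:
            # Promote the longest-waiting standby, if there is one.
            self.active = self.standby.pop(0) if self.standby else None
        elif master in self.standby:
            self.standby.remove(master)
        return self.active


zk = Coordinator()
print(zk.register("master-1"))
print(zk.register("master-2"))
print(zk.session_lost("master-1"))
```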
The architecture of HBase