HBase Learning Summary (3): HBase Data Model and Working Mechanism
I. HBase Data Model
The logical entities in the HBase data model include:
(1) Table: HBase uses tables to organize data. The table name is a string made up of characters that are valid in a file system path.
(2) Row: Within a table, data is stored by row. Each row is uniquely identified by its row key. The row key has no data type and is always treated as a byte array (byte[]).
(3) Column family: Data within a row is grouped by column family. Column families also affect how HBase physically stores data, so they must be defined in advance and cannot be easily modified. Every row in a table has the same column families, although a row does not need to store data in each of them. The column family name is a string made up of characters that are valid in a file system path.
(4) Column qualifier: Data within a column family is addressed by column qualifier, or simply column. Column qualifiers do not need to be defined in advance and do not need to be consistent between rows. Like the row key, the column qualifier has no data type and is always treated as a byte array (byte[]).
(5) Cell: The row key, column family, and column qualifier together identify a cell. The data stored in a cell is called its value. The value has no data type and is always treated as a byte array (byte[]).
(6) Version: Cell values are versioned. Each version is identified by a timestamp, which is a long value. If no version is specified, the current timestamp is used for the operation. The number of versions HBase retains for a cell is configured per column family. The default number is three (in HBase 0.96 and later, the default is one).
Each value in HBase is accessed by its coordinates. The complete coordinates of a value are the row key, column family, column qualifier, and version. Because the coordinates as a whole act as the key for a value, HBase can be viewed as a key-value database.
For example, in the table from HBase Learning Summary (2): HBase Introduction and Basic Operations (http://blog.csdn.net/zhouzhaoxiong1227/article/details/46682291):
(1) The table name is "mytable".
(2) The row keys are "first", "second", and "third".
(3) The column family is "cf".
(4) The column qualifiers are "info", "name", and "nation".
(5) The cell values are "hello hbase", "zhou", and "China".
(6) The timestamps are "1435548279711", "1435548751549", and "1435548760826".
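The coordinate idea above can be sketched as a nested map in plain Python. This is only an illustration of the logical model, not how HBase stores data. ToyTable and its methods are hypothetical names, and the pairing of each row key with a qualifier, value, and timestamp (in the order listed above) is an assumption.

```python
class ToyTable:
    """Toy model: row key -> column family -> qualifier -> {timestamp: value}."""

    def __init__(self, name, families, max_versions=3):
        self.name = name
        self.families = set(families)     # column families are fixed up front
        self.max_versions = max_versions  # versions retained per cell
        self.rows = {}

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = (self.rows.setdefault(row, {})
                             .setdefault(family, {})
                             .setdefault(qualifier, {}))
        versions[ts] = value
        # Keep only the newest max_versions timestamps for this cell.
        for old_ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old_ts]

    def get(self, row, family, qualifier, ts=None):
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None
        if ts is None:
            ts = max(versions)            # no version given: use the newest
        return versions.get(ts)


# Row keys, qualifiers, and values are bytes, mirroring HBase's untyped byte arrays.
table = ToyTable("mytable", ["cf"])
table.put(b"first", "cf", b"info", b"hello hbase", 1435548279711)
table.put(b"second", "cf", b"name", b"zhou", 1435548751549)
table.put(b"third", "cf", b"nation", b"China", 1435548760826)
print(table.get(b"first", "cf", b"info"))  # b'hello hbase'
```

Qualifiers are created on first use and differ between rows, while writing to an undeclared column family fails, which mirrors the difference between the two described above.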
II. HBase Working Mechanism
1. HBase write path
In HBase, adding a new row and modifying an existing row follow the same internal process. By default, each write goes to two places: the write-ahead log (WAL, also called HLog) and MemStore. HBase records the write in both places to ensure data durability, and the write is considered complete only after the change has been written and confirmed in both. The write process is shown in Figure 1.
Figure 1 HBase simultaneously writes data to WAL and MemStore
MemStore is the write buffer in memory. Data in HBase accumulates here before being permanently written to disk. When MemStore fills up, its data is flushed to disk to generate an HFile. HFile is the underlying storage format used by HBase. HFiles belong to a column family: a column family can have multiple HFiles, but one HFile cannot store data from multiple column families. Each column family has its own MemStore on each node of the cluster (more precisely, one per region of that column family). Figure 2 shows how MemStore generates an HFile.
Figure 2 HFile generated by MemStore
If the server crashes before MemStore has been flushed, the data in memory that has not been written to disk is lost. To guard against this, HBase writes to the WAL before a write operation completes. Each server in the HBase cluster maintains a WAL to record changes. The WAL is a file on the underlying file system. A write is not considered successful until its new WAL record has been written. This guarantee makes HBase as durable as the file system that backs it. In most cases, HBase uses the Hadoop Distributed File System (HDFS) as the underlying file system.
If an HBase server goes down, data that had not yet been flushed from MemStore to an HFile is recovered by replaying the WAL. You do not need to do this manually; recovery is part of HBase's internal mechanism. Each HBase server has one WAL, which is shared by all tables on that server (and their column families).
Note that skipping the WAL on writes increases the risk of data loss if a RegionServer fails. If the WAL is disabled, HBase may not be able to recover data when a fault occurs, and all written data that has not been flushed to disk will be lost.
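The write path and WAL replay described above can be sketched with a toy model. ToyRegionServer, flush_threshold, and use_wal are hypothetical names invented for this illustration. The sketch also simplifies heavily: the flush threshold counts entries rather than bytes, "HFiles" are kept in memory instead of on HDFS, and the WAL is simply truncated after a flush.

```python
import json
import os
import tempfile


class ToyRegionServer:
    """Toy model of the write path: WAL first, then MemStore, then flush."""

    def __init__(self, wal_path, flush_threshold=3):
        self.wal_path = wal_path
        self.flush_threshold = flush_threshold  # assumption: entries, not bytes
        self.memstore = {}
        self.hfiles = []  # each flush appends one immutable, sorted tuple

    def put(self, key, value, use_wal=True):
        if use_wal:
            # 1. Append the change to the WAL and force it to disk.
            with open(self.wal_path, "a", encoding="utf-8") as wal:
                wal.write(json.dumps([key, value]) + "\n")
                wal.flush()
                os.fsync(wal.fileno())
        # 2. Apply the change to the in-memory write buffer.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write MemStore out as a sorted, immutable "HFile" and start over.
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        # Flushed entries no longer need the WAL (simplified to truncation).
        open(self.wal_path, "w", encoding="utf-8").close()

    def recover(self):
        # Replay the WAL to rebuild the MemStore contents lost in a crash.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                key, value = json.loads(line)
                self.memstore[key] = value


wal_path = os.path.join(tempfile.mkdtemp(), "toy.wal")
server = ToyRegionServer(wal_path)
server.put("first", "hello hbase")
server.put("second", "zhou", use_wal=False)  # WAL skipped for this write

# Simulate a crash: in-memory state is gone, only the WAL file survives.
restarted = ToyRegionServer(wal_path)
restarted.recover()
print(restarted.memstore)  # {'first': 'hello hbase'}
```

The write that skipped the WAL is missing after the restart, which is the data-loss risk described above.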
2. HBase read path
If you want fast access to data, the general principle is to keep the data ordered and to keep as much of it in memory as possible. HBase achieves both goals, and in most cases read operations complete within milliseconds. A read in HBase must reconcile the data in HFiles on disk with the data in MemStore in memory. HBase uses an LRU (least recently used) cache for read operations. This cache is called BlockCache, and it sits in the JVM heap alongside MemStore. BlockCache is designed to keep frequently accessed data from HFiles in memory so that disk reads can be avoided. Block caching is configured per column family, although the cache itself is shared within a RegionServer.
Understanding BlockCache is an important part of optimizing HBase performance. A Block in BlockCache is the unit of data that HBase reads from disk in a single pass. An HFile is physically stored as a sequence of Blocks plus an index over those Blocks. This means that to read a Block from HBase, you first look up the Block in the index and then read it from disk. The Block is the smallest unit of data that is indexed and the smallest unit of data that is read from disk. The Block size is set per column family, and the default value is 64 KB. You may increase or decrease it depending on the application scenario. With a smaller Block size, the index becomes larger and consumes more memory. With a larger Block size, the index has fewer entries and therefore uses less memory.
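The trade-off between Block size and index size can be illustrated with a small sketch. It assumes a simplified index that stores only the first key of each Block, and it uses a hypothetical keys_per_block parameter as a stand-in for the Block size in bytes. The real HFile index format is more involved.

```python
import bisect


def build_blocks(sorted_keys, keys_per_block):
    """Split sorted keys into blocks and index each block by its first key."""
    blocks = [sorted_keys[i:i + keys_per_block]
              for i in range(0, len(sorted_keys), keys_per_block)]
    index = [block[0] for block in blocks]
    return blocks, index


def find_block(index, key):
    """Return the position of the only block that could hold key, or None."""
    pos = bisect.bisect_right(index, key) - 1
    return pos if pos >= 0 else None


keys = [f"row{i:04d}" for i in range(1000)]

small_blocks, small_index = build_blocks(keys, keys_per_block=10)
large_blocks, large_index = build_blocks(keys, keys_per_block=100)

print(len(small_index), len(large_index))  # 100 10
print(find_block(small_index, "row0042"))  # 4
print(find_block(large_index, "row0042"))  # 0
```

Smaller Blocks give a longer index but each lookup reads less data, while larger Blocks shrink the index at the cost of reading more data per lookup.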
To read a row, HBase first checks MemStore for modifications that have not yet been flushed. It then checks BlockCache to see whether the Block containing the row has been accessed recently, and finally accesses the corresponding HFile on disk. The entire read process is shown in Figure 3.
Figure 3 HBase reading process
Note: Each HFile stores a snapshot of MemStore as it was at the time of a flush. The data for one complete row may therefore be spread across multiple HFiles. To read the complete row, HBase may need to read every HFile that contains information for that row.
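The read order described above (MemStore, then BlockCache, then HFiles on disk) can be sketched as follows. LruBlockCache and read_row are hypothetical names for this illustration, and the sketch assumes that each HFile consists of a single Block and that versions and deletes can be ignored.

```python
from collections import OrderedDict


class LruBlockCache:
    """Minimal LRU cache: the least recently used block is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)    # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, block):
        self.blocks[block_id] = block
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used


def read_row(row, memstore, cache, hfiles, disk_reads):
    """Assemble a row from MemStore and all HFiles; newer data wins."""
    result = dict(memstore.get(row, {}))     # 1. unflushed data first
    for file_id in reversed(range(len(hfiles))):  # newest HFile first
        block = cache.get(file_id)           # 2. try BlockCache
        if block is None:
            disk_reads.append(file_id)       # 3. fall back to "disk"
            block = hfiles[file_id]
            cache.put(file_id, block)
        for qualifier, value in block.get(row, {}).items():
            result.setdefault(qualifier, value)   # keep the newer value
    return result


# hfiles are ordered oldest to newest; each maps row -> {qualifier: value}.
hfiles = [{"r1": {"name": "zhou"}}, {"r1": {"nation": "China"}}]
memstore = {"r1": {"name": "zhou2"}}
cache = LruBlockCache(capacity=2)
disk_reads = []

print(read_row("r1", memstore, cache, hfiles, disk_reads))
print(disk_reads)  # [1, 0]: both HFiles were read from disk once
read_row("r1", memstore, cache, hfiles, disk_reads)
print(disk_reads)  # still [1, 0]: the second read was served from the cache
```

The row is assembled from two HFiles plus MemStore, which is why the number of HFiles per column family matters for read performance.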
3. HBase compaction
The DELETE command does not delete content immediately. It only marks the record for deletion: a "tombstone" record for that content is written as a deletion marker. The tombstone ensures that the deleted content is no longer returned by get and scan commands. Because HFiles are immutable, tombstones are not processed until a major compaction runs, at which point the space occupied by the deleted records is released.
Compaction (merging) comes in two types: major compaction and minor compaction. Both reorganize the data stored in HFiles. A minor compaction combines several small HFiles into one larger HFile. Because reading a complete row may involve many files, limiting the number of HFiles is very important for read performance. During a compaction, HBase reads the contents of several existing HFiles and writes the records to a new file. It then makes the new file active and deletes all the old files that went into it. HBase decides which files to compact based on their number and size. Minor compaction is designed to have only a slight impact on HBase performance, so there is an upper limit on the number of HFiles involved. This limit is configurable. Figure 4 shows a minor compaction.
Figure 4 Minor compaction
A major compaction processes all HFiles of a column family in a given region. When it completes, all HFiles of that column family have been merged into a single file. You can manually trigger a major compaction of an entire table (or a specific region) from the Shell. This operation is quite resource-intensive and should not be run frequently. Minor compaction, on the other hand, is lightweight and can happen often. Major compaction is the only opportunity HBase has to clear deleted records. A minor compaction cannot guarantee that a deleted record and its tombstone are in the HFiles being merged, whereas a major compaction reads all HFiles and therefore sees both kinds of records at the same time.
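The difference between the two compaction types can be sketched with a toy model. The compact function is a hypothetical name, and each cell is assumed to be a (key, timestamp, value) tuple in which a value of None stands for a tombstone. Version limits and file selection are not modelled.

```python
def compact(hfiles, major):
    """Merge several HFiles (lists of cells) into one new sorted HFile.

    A minor compaction keeps tombstones, because the cells they delete
    may live in HFiles that are not part of this merge. A major
    compaction sees every HFile, so it can drop both the tombstones
    and the cells they cover.
    """
    cells = sorted((cell for hfile in hfiles for cell in hfile),
                   key=lambda cell: (cell[0], -cell[1]))  # newest first per key
    if not major:
        return cells

    newest_delete = {}
    for key, ts, value in cells:
        if value is None:
            newest_delete[key] = max(ts, newest_delete.get(key, ts))

    return [(key, ts, value) for key, ts, value in cells
            if value is not None
            and ts > newest_delete.get(key, float("-inf"))]


hfile_a = [("row1", 1, "hello hbase"), ("row2", 1, "zhou")]
hfile_b = [("row1", 2, None)]          # tombstone for row1
hfile_c = [("row3", 3, "China")]

# Minor compaction of two files: the tombstone must be kept,
# because the cell it deletes is in hfile_a.
merged = compact([hfile_b, hfile_c], major=False)
print(merged)  # [('row1', 2, None), ('row3', 3, 'China')]

# Major compaction of all remaining files: row1 disappears entirely.
print(compact([hfile_a, merged], major=True))
# [('row2', 1, 'zhou'), ('row3', 3, 'China')]
```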
Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.