A. HBase Data Model
The logical entities in the HBase data model include:
(1) Table: HBase uses tables to organize data. The table name is a string made up of characters that are valid in a file system path.
(2) Row: Within a table, data is stored by row. Each row is uniquely identified by its row key (rowkey). The row key has no data type and is always treated as a byte array (byte[]).
(3) Column family: Data within a row is grouped by column family. Column families also affect how HBase physically stores data, so they must be defined up front and are not easily changed. Every row in a table has the same column families, although a row does not need to store data in each of them. The column family name is a string made up of characters that are valid in a file system path.
(4) Column qualifier: Data within a column family is addressed by column qualifier, or column. Column qualifiers do not have to be defined in advance and do not have to be the same from row to row. Like the row key, a column qualifier has no data type and is always treated as a byte array (byte[]).
(5) Cell: A row key, column family, and column qualifier together identify a single cell. The data stored in a cell is called the cell value. The value also has no data type and is always treated as a byte array (byte[]).
(6) Version: Cell values are versioned. Each version is identified by a timestamp, which is a long. When no version is specified, the current timestamp is used for the operation. The number of cell versions HBase retains is configured per column family; the default was 3 in older releases and has been 1 since HBase 0.96.
Every data value in HBase is accessed by its coordinates. The full coordinates of a value are the row key, column family, column qualifier, and version. Because the coordinates are treated as a single unit, HBase can be thought of as a key-value database.
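To make the coordinate idea concrete, here is a minimal sketch of a versioned key-value map. It is a toy in-memory model, not the HBase client API: the class name `ToyTable`, its methods, and the `max_versions` parameter are all invented for this illustration.

```python
import time


class ToyTable:
    """Toy model of HBase coordinates; not the real client API."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, family, qualifier) -> {timestamp: value}
        self.cells = {}

    def put(self, row, family, qualifier, value, ts=None):
        # Without an explicit version, the current time in ms is used.
        if ts is None:
            ts = int(time.time() * 1000)
        versions = self.cells.setdefault((row, family, qualifier), {})
        versions[ts] = value
        # Keep only the newest max_versions timestamps.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, family, qualifier, ts=None):
        versions = self.cells.get((row, family, qualifier), {})
        if not versions:
            return None
        if ts is None:
            ts = max(versions)  # newest version by default
        return versions.get(ts)


table = ToyTable(max_versions=3)
for ts, value in enumerate([b"v1", b"v2", b"v3", b"v4"], start=1):
    table.put(b"row1", b"cf", b"name", value, ts=ts)

latest = table.get(b"row1", b"cf", b"name")        # newest version
oldest = table.get(b"row1", b"cf", b"name", ts=1)  # trimmed, so None
```

Every part of the key and the value is a byte string here, mirroring the point that HBase attaches no data type to row keys, qualifiers, or values.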
For example, in "HBase Learning Summary (2): HBase introduction and basic operations" (http://blog.csdn.net/zhouzhaoxiong1227/article/details/46682291):
(1) The table name is "MyTable".
(2) The row keys are "first", "second", and "third".
(3) The column family is "CF".
(4) The column qualifiers are "info", "name", and "nation".
(5) The cell values are "Hello HBase", "Zhou", and "China".
(6) The timestamps are "1435548279711", "1435548751549", and "1435548760826".
B. HBase Working Mechanism
1. HBase write path
In HBase, the internal process is the same whether you add a new row or modify an existing one. By default, each write goes to two places: the write-ahead log (WAL, also known as the HLog) and the MemStore. HBase records every write in both places to guarantee durability, and a write is considered complete only after the change has been written to and acknowledged by both. The write path is shown in Figure 1.
Figure 1 HBase writes to both the WAL and the MemStore
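The two-step write described above can be sketched as follows. This is a toy model under simplifying assumptions: the class `ToyRegionServer` is hypothetical, the log is a Python list rather than a file on HDFS, and acknowledgement is reduced to a return value.

```python
class ToyRegionServer:
    """Toy write path: append to the WAL first, then update the MemStore."""

    def __init__(self):
        self.wal = []        # stands in for the log file on the file system
        self.memstore = {}   # in-memory write buffer

    def put(self, key, value):
        self.wal.append((key, value))   # 1. record the change in the log
        self.memstore[key] = value      # 2. apply it to the write buffer
        return True                     # acknowledged only after both steps


server = ToyRegionServer()
key = ("row1", "cf", "name")
added = server.put(key, b"first value")      # adding a new cell
modified = server.put(key, b"second value")  # modifying it takes the same path
```

Both calls go through the same code: the log ends up with two records, while the buffer holds only the latest value for the key.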
The MemStore is an in-memory write buffer where HBase accumulates data before writing it permanently to disk. When the MemStore fills up, its contents are flushed to disk, creating a new HFile. HFile is the underlying storage format used by HBase. Each HFile belongs to one column family: a column family can have many HFiles, but a single HFile never stores data for more than one column family. On each node of the cluster, each column family has its own MemStore (more precisely, one per column family per region). How the MemStore produces HFiles is shown in Figure 2.
Figure 2 The MemStore generates an HFile
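A minimal sketch of the flush behaviour, assuming a threshold counted in entries (real HBase flushes based on memory size). The class `ToyStore` and the `flush_threshold` parameter are invented for this example.

```python
class ToyStore:
    """Toy store for one column family: one MemStore, many HFiles."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # entries, not bytes (assumption)
        self.memstore = {}
        self.hfiles = []  # each HFile is an immutable, sorted tuple of cells

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memstore:
            # Write the buffer out in key order, then start a fresh buffer.
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}


store = ToyStore(flush_threshold=3)
for i in range(7):
    store.put(f"row{i}", f"value{i}".encode())
```

After seven writes the store holds two flushed files of three cells each and one cell still waiting in the buffer, which shows how one column family ends up with several HFiles.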
If the server crashes before the MemStore has been flushed, the in-memory data that has not yet been written to disk is lost. HBase's answer is to write to the WAL before the write is considered complete. Every server in an HBase cluster maintains a WAL that records changes as they happen. The WAL is a file on the underlying file system, and a write is not considered successful until the new WAL record has been written. This guarantee makes HBase as durable as the file system backing it. In most cases, HBase uses the Hadoop Distributed File System (HDFS) as that file system.
If an HBase server goes down, data that had not yet been flushed from the MemStore to an HFile can be recovered by replaying the WAL. You do not need to do this manually; HBase's internal recovery process handles it. Each HBase server has one WAL, which is shared by all tables (and their column families) on that server.
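Recovery by replay can be illustrated with a small sketch. It assumes each log record carries an increasing sequence number and that the number of the last flushed record is known; the function `replay` and the parameter `flushed_seq` are hypothetical names chosen for this example.

```python
def replay(wal, flushed_seq):
    """Rebuild a MemStore from WAL entries newer than the last flush."""
    memstore = {}
    for seq, key, value in wal:
        if seq > flushed_seq:
            memstore[key] = value  # later records overwrite earlier ones
    return memstore


# WAL entries as (sequence number, key, value); entries 1-2 were already
# flushed to an HFile before the simulated crash, entries 3-4 were not.
wal = [
    (1, "row1", b"a"),
    (2, "row2", b"b"),
    (3, "row1", b"c"),
    (4, "row3", b"d"),
]
recovered = replay(wal, flushed_seq=2)
```

Only the writes that never reached an HFile are rebuilt in memory; the flushed ones are already safe on disk.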
Note that skipping the WAL increases the risk of data loss if a RegionServer fails. With the WAL disabled, HBase may be unable to recover data after a failure, and any writes that have not been flushed to disk will be lost.
2. HBase read path
The general principle for fast data access is to keep data ordered and to keep as much of it in memory as possible. HBase achieves both goals, and in most cases a read completes in milliseconds. A read must reconcile the data in the HFiles on disk with the data in the in-memory MemStore. HBase uses an LRU (least recently used) cache for reads. This cache is called the BlockCache and lives in the JVM heap alongside the MemStore. The BlockCache is designed to hold frequently accessed data read into memory from HFiles, avoiding disk reads. Block caching is configured per column family, while the cache itself is shared within a RegionServer.
Understanding the BlockCache is an important part of tuning HBase performance. A block in the BlockCache is the unit of data that HBase reads from disk in a single pass. Physically, an HFile is stored as a sequence of blocks plus an index over those blocks. Reading a block therefore means first looking it up in the index and then reading it from disk. A block is the smallest unit of data that is indexed and the smallest unit of data read from disk. The block size is set per column family and defaults to 64 KB. Depending on your use case, you may want to raise or lower this value. Smaller blocks make the index larger and consume more memory; larger blocks mean fewer index entries and a smaller index, which saves memory.
To read a row, HBase first checks the MemStore for pending modifications, then checks the BlockCache to see whether the block containing the row has been accessed recently, and finally accesses the corresponding HFiles on disk. The whole read path is shown in Figure 3.
Figure 3 HBase read path
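The lookup order and the role of the block index can be sketched as below. This is a simplified model with a single file and a cache keyed only by block number; `ToyHFile`, `ToyBlockCache`, and `read` are hypothetical names, and block size is counted in entries rather than bytes.

```python
from bisect import bisect_right
from collections import OrderedDict


class ToyHFile:
    """Sorted (key, value) pairs split into blocks, plus a block index."""

    def __init__(self, items, block_size=2):
        items = sorted(items)
        self.blocks = [items[i:i + block_size]
                       for i in range(0, len(items), block_size)]
        self.index = [block[0][0] for block in self.blocks]  # first key per block
        self.disk_reads = 0

    def read_block(self, block_no):
        self.disk_reads += 1  # stands in for a disk read
        return self.blocks[block_no]


class ToyBlockCache:
    """LRU cache of blocks keyed by block number."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, hfile, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)       # mark as recently used
            return self.blocks[block_no]
        block = hfile.read_block(block_no)
        self.blocks[block_no] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)         # evict least recently used
        return block


def read(key, memstore, cache, hfile):
    if key in memstore:                             # 1. unflushed writes
        return memstore[key]
    block_no = bisect_right(hfile.index, key) - 1   # 2. locate block via index
    if block_no < 0:
        return None
    return dict(cache.get(hfile, block_no)).get(key)  # 3. cache, then disk


hfile = ToyHFile([("a", 1), ("b", 2), ("c", 3), ("d", 4)])
cache = ToyBlockCache()
memstore = {"d": 40}

first = read("a", memstore, cache, hfile)    # read from disk, then cached
second = read("b", memstore, cache, hfile)   # same block, served from cache
newest = read("d", memstore, cache, hfile)   # MemStore wins over the HFile
```

With two entries per block, the index has two entries for four cells; doubling the block size would halve the index, which is the memory trade-off described above.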
Note that each HFile holds a snapshot of the MemStore as of the moment it was flushed, so the data for a complete row may be spread across multiple HFiles. To read the complete row, HBase may need to read every HFile that contains data for it.
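A small sketch of assembling one row from several flushed files. It assumes the files are listed oldest first and that newer sources simply overwrite older ones (real HBase orders cells by timestamp). The function `read_row` and all sample values are made up for illustration.

```python
def read_row(row, memstore, hfiles):
    """Assemble one row from HFiles (oldest first) and then the MemStore."""
    result = {}
    for source in list(hfiles) + [memstore]:
        for (r, column), value in source.items():
            if r == row:
                result[column] = value  # newer sources overwrite older ones
    return result


# Two flushes produced two HFiles; a later write is still in the MemStore.
hfile_1 = {("row1", "cf:name"): b"old-name"}
hfile_2 = {("row1", "cf:nation"): b"nation", ("row2", "cf:name"): b"other"}
memstore = {("row1", "cf:name"): b"new-name"}

row = read_row("row1", memstore, [hfile_1, hfile_2])
```

No single source contains the whole row here: one column comes from the second file and the other from the buffer, which hides the stale copy in the first file.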
3. HBase compaction
A delete command does not remove the content immediately; it only marks the record as deleted. That is, a "tombstone" record is written for the content as a delete marker. Tombstones flag deleted content so that it is no longer returned by Get and Scan commands. Because HFiles are immutable, the tombstones are not processed, and the space occupied by the deleted records is not freed, until a major compaction runs.
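The masking effect of a tombstone can be sketched as follows. This is a toy model in which files are plain dictionaries and the marker is a sentinel object; the names `TOMBSTONE`, `delete`, and `get` are chosen for this example and are not the HBase API.

```python
TOMBSTONE = object()  # marker value standing in for a delete record


def delete(memstore, key):
    memstore[key] = TOMBSTONE  # nothing is removed from existing HFiles


def get(key, memstore, hfiles):
    # Check the newest source first: MemStore, then HFiles newest to oldest.
    for source in [memstore] + list(reversed(hfiles)):
        if key in source:
            value = source[key]
            return None if value is TOMBSTONE else value
    return None


hfiles = [{"row1": b"Hello HBase"}]  # file written by an earlier flush
memstore = {}

before = get("row1", memstore, hfiles)
delete(memstore, "row1")
after = get("row1", memstore, hfiles)
```

After the delete, the read returns nothing even though the original value is still physically present in the older file.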
Compactions come in two kinds: major compactions and minor compactions. Both rewrite the data stored in HFiles. A minor compaction combines several small HFiles into one larger HFile. Because reading a complete row may touch many files, limiting the number of HFiles is important for read performance. During a compaction, HBase reads the contents of the existing HFiles and writes the records to a new file. It then activates the new file and deletes all the old files that were merged into it. HBase decides which files to compact based on their number and size. Minor compactions are designed to have only a slight impact on HBase performance, so there is an upper limit on the number of HFiles involved. All of these settings are configurable. A minor compaction is shown in Figure 4.
Figure 4 Minor compaction
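A minimal sketch of a minor compaction, assuming a very simple selection rule (take the smallest files up to a limit). HBase's actual file selection is more involved; the function `minor_compact` and the `max_files` parameter are invented for this illustration.

```python
import heapq


def minor_compact(hfiles, max_files=3):
    """Merge up to max_files of the smallest HFiles into one new file.

    Each HFile is a list of (key, seq, value) sorted by key, then seq.
    """
    if len(hfiles) < 2:
        return hfiles
    by_size = sorted(hfiles, key=len)
    chosen, rest = by_size[:max_files], by_size[max_files:]
    merged = list(heapq.merge(*chosen))  # inputs are sorted, so is the output
    return rest + [merged]               # old inputs are replaced by the new file


hfiles = [
    [("a", 1, b"1"), ("c", 1, b"3")],
    [("b", 2, b"2")],
    [("a", 3, b"9")],
    [(k, 0, b"x") for k in "defghij"],  # a larger file, left alone here
]
compacted = minor_compact(hfiles, max_files=3)
```

Four files become two. Both versions of key "a" survive in the merged file, because this kind of compaction only combines files and does not discard anything.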
A major compaction processes all the HFiles of one column family in a given region. When it completes, all of that column family's HFiles have been merged into a single file. You can trigger a major compaction of an entire table (or a specific region) manually from the shell. This operation is very resource-intensive and should not be run often. Minor compactions, by contrast, are lightweight and can run frequently. A major compaction is the only chance HBase has to clean up deleted records. Because there is no guarantee that a deleted record and its tombstone are in the same HFile, only a major compaction, which reads every file, is certain to see both.
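Why every file must be read can be seen in a short sketch. It simplifies heavily: only the newest record per key is kept (ignoring multiple versions), a value of `None` marks a delete, and the function `major_compact` is a hypothetical name.

```python
TOMBSTONE = None  # in this sketch a value of None marks a delete record


def major_compact(hfiles):
    """Merge every HFile of a store into one, dropping deleted cells."""
    newest = {}
    for hfile in hfiles:
        for key, seq, value in hfile:
            if key not in newest or seq > newest[key][0]:
                newest[key] = (seq, value)
    # Emit cells in key order, skipping keys whose newest record is a delete.
    return [(key, seq, value)
            for key, (seq, value) in sorted(newest.items())
            if value is not TOMBSTONE]


hfiles = [
    [("a", 1, b"1"), ("b", 1, b"2")],
    [("a", 2, TOMBSTONE)],  # delete marker lives in another file
    [("c", 3, b"3")],
]
compacted = major_compact(hfiles)
```

The value for "a" and its delete marker sit in different files; because the compaction sees both, it can drop them together and reclaim the space.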
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
HBase Learning Summary (3): Data model and working mechanism of HBase