All data files in the http://www.aliyun.com/zixun/aggregation/13713.html ">hbase are stored on the Hadoop HDFs file system, mainly including the two file types presented above: 1. hfile, hbase in keyvalue data storage format, hfile is the binary format of Hadoop file, in fact, StoreFile is to hfile do lightweight packaging, that is storefile the bottom is hfile 2. The storage format of Wal (Write projectile Log) in Hlog File,hbase, which is physically a sequence File of Hadoop hfile The following figure is the storage format for hfile:
First, the hfile file is indefinite, with only two of the length fixed: trailer and FileInfo. As shown in the figure, trailer has pointers to the starting point of other blocks of data. File info records Some meta information about the files, such as: Avg_key_len, Avg_value_len, Last_key, COMPARATOR, Max_seq_id_key, etc. The data index and the Meta index block record the starting point for each data block and meta block. Data block is the basic unit of HBase I/O, in order to improve the efficiency, Hregionserver has the block cache mechanism based on LRU. The size of each data block can be specified by parameters when creating a table, and large blocks are advantageous to sequential scan, and the small chunks facilitate random queries. Each data block in addition to the beginning of the magic is a keyvalue of stitching, magic content is a number of random numbers, the purpose is to prevent data corruption. The internal constructs for each keyvalue pair are described in detail later. Each keyvalue in the hfile is a simple byte array. But this byte array contains many items and has a fixed structure. Let's look at the concrete structure inside:
The start is a two fixed-length number that represents the length of the key and the length of the value. followed by the key, which begins with a fixed-length number, representing the length of the Rowkey, followed by the Rowkey, and then the fixed-length number, representing the accessibility length, then the accessibility, then the qualifier, followed by the two fixed-length values, indicating time Stamp and Key Type (Put/delete). The value part has no such complex structure, it is pure binary data. Hlogfile
The diagram above illustrates the structure of the Hlog file, in fact, Hlog file is a common Hadoop Sequence file,sequence files Key is a Hlogkey object, Hlogkey records the attribution information written to the data, In addition to the table and region names, the sequence number and Timestamp,timestamp are "write Time", the starting value for sequence number is 0, or the last time the file was deposited in the filesystem sequence Number。 The value of the Hlog sequece file is the KeyValue object of HBase, the hfile in the corresponding keyvalue, as described above.
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.