A good data structure makes data retrieval efficient while also keeping data insertion efficient.
Common Data Structures
B+ Tree
The root and branch nodes are simple: each records the minimum key of each of its child nodes and holds a pointer to that child.
Each key in a leaf node points to the actual data block. Every leaf node also has forward and backward pointers, so a range query can jump directly from leaf to leaf without backtracking to the branch or root nodes.
Characteristics:
1. A node with n subtrees contains n keys. These keys hold no data themselves; they serve only as an index. All data is stored in the leaf nodes.
2. The leaf nodes contain all of the keys, together with pointers to the records for those keys, and the leaf nodes themselves are linked sequentially in ascending key order.
3. All non-leaf nodes can be regarded as the index portion; each node contains only the maximum (or minimum) key among its subtrees.
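As a concrete illustration of points 1-3, here is a minimal Python sketch of the leaf level of a B+ tree. All names here are illustrative, not any real library's API: leaves hold sorted keys plus a forward pointer, and a one-level index of each leaf's minimum key stands in for the branch nodes, so a range query descends once and then walks the leaf chain.

```python
from bisect import bisect_right

class Leaf:
    """A B+ tree leaf: sorted keys plus a forward pointer to the next leaf."""
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.next = None  # forward pointer used by range scans

def build_leaf_chain(sorted_keys, fanout=4):
    """Split sorted keys into linked leaves; return (leaves, index of min keys)."""
    leaves = [Leaf(sorted_keys[i:i + fanout])
              for i in range(0, len(sorted_keys), fanout)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    index = [leaf.keys[0] for leaf in leaves]  # branch level: min key per leaf
    return leaves, index

def range_query(leaves, index, lo, hi):
    """Descend once via the index, then walk the leaf chain, never backtracking."""
    pos = max(bisect_right(index, lo) - 1, 0)
    leaf, out = leaves[pos], []
    while leaf is not None and leaf.keys[0] <= hi:
        out.extend(k for k in leaf.keys if lo <= k <= hi)
        leaf = leaf.next
    return out
```

The forward pointers are what make the range query a single descent plus a linear walk; without them, each leaf boundary would force a climb back up through the branch nodes.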
Disadvantages:
When the amount of data is large, it spans many disk pages, and the data stored in two logically adjacent pages may well be far apart on disk, which slows down sequential (range) queries.
The biggest performance problem with B+ trees is that they generate a lot of random I/O. As new data is inserted, leaf nodes gradually split, so leaf nodes that are logically contiguous are often physically discontinuous, sometimes far apart; a range query then produces many random reads.
The same holds for large numbers of random writes. For example, inserting keys that span widely, such as 7 -> 1000 -> 3 -> 2000 ..., stores the newly inserted data far apart on disk and generates a large amount of random write I/O. As this shows, slow disk seeks seriously hurt performance.
LSM Tree
To better illustrate the principle of the LSM tree, consider a somewhat extreme example:
Suppose 1000 records with random keys must be written. For the disk, writing the 1000 records in arrival order is certainly fastest, but then reads become a tragedy: the keys on disk are completely unordered, so every read requires a full scan.
Conversely, to make reads as fast as possible, the data must be kept ordered on disk. This is the principle of the B+ tree, but then writes suffer, because they generate a lot of random I/O and disk seek speed cannot keep up.
The LSM tree is essentially a balance between reads and writes: compared with the B+ tree, it sacrifices some read performance to greatly improve write performance.
Its principle is to split one big tree into n small trees. Data is first written into memory (memory has no seek-time problem, so random write performance improves greatly), where an ordered small tree is built up. As a small tree grows large, it is flushed to disk. On reads, since it is unknown which small tree holds the data, all the small trees must be searched; within each small tree, however, the data is ordered.
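A toy sketch of that principle, with all names and sizes made up for illustration: writes go into an in-memory table, which is flushed as an immutable sorted run once it reaches a size limit; reads check the memtable first and then every run, newest first, using binary search inside each ordered run.

```python
from bisect import bisect_left

class TinyLSM:
    """Toy LSM tree: writes land in an in-memory memtable; when it grows
    past `limit`, it is flushed as an immutable sorted run ("small tree")."""
    def __init__(self, limit=4):
        self.limit = limit
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value) pairs

    def put(self, key, value):
        self.memtable[key] = value          # pure in-memory write: no disk seek
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self):
        # one sequential write of a sorted run to "disk"
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # the target run is unknown, so search every run, newest first;
        # each run is ordered, so binary search works inside it
        for run in reversed(self.runs):
            keys = [k for k, _ in run]
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None
```

Real LSM implementations additionally compact small runs into larger ones and use Bloom filters to skip runs that cannot contain the key; this sketch shows only the write-to-memory, flush-sorted, read-all-runs core.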
HBase Data Storage Format
HBase introduces the concept of the LSM tree (Log-Structured Merge Tree).
HFile Format
An HFile is divided into six sections:
Data Block Segment
--Saves the table's data; this section can be compressed. Each Data Block consists of a block header and a number of KeyValue pairs, with keys stored in strictly sorted order. The block size defaults to 64 KB (it can be specified when the column family is created, or via HColumnDescriptor.setBlocksize(size)), and the blocks can be stored compressed.
When querying, a whole block is loaded from disk into memory, and the KeyValue pairs within that block are traversed sequentially to find the data.
Meta Block Segment (optional)
--Saves user-defined key-value pairs and can be compressed. For example, a Bloom filter is stored as a Meta Block; the block keeps only the value, while the key is stored in the Meta Block Index. Each Meta Block consists of a size and a value, and can be used to quickly determine whether a key exists in this HFile.
File Info Section
--HFile meta-information; this section is not compressed. Users can add their own metadata here.
Data Block Index Segment
--The index of the Data Blocks. The key of each index entry is the key of the first record in the block it indexes. The format is: header information, followed by one entry per block of (offset of the block in the file + block length + first key of the block), repeated.
Meta Block Index Segment (optional)
--The index of the Meta Blocks. Its format is the same as that of the Data Block Index, though some fields have a different meaning.
Trailer
--This section is fixed-length and saves the offset of every other section. When an HFile is read, the Trailer is read first to obtain the starting position of each section (each section's magic number is used as a sanity check), and then the Data Block Index is loaded into memory. This way, looking up a key does not require scanning the whole HFile: the block containing the key is located in memory, the entire block is read into memory with a single disk I/O, and the key is then found within it. The Data Block Index is evicted under an LRU mechanism.
The Trailer fields are described as follows:
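The lookup path just described can be sketched as follows. Here the Data Block Index is simplified to an in-memory list of each block's first key, and a block is just a list of (key, value) pairs; the function names are illustrative, not HBase APIs.

```python
from bisect import bisect_right

def find_block(index_first_keys, key):
    """Pick the one block that could contain `key` (first keys are sorted)."""
    pos = bisect_right(index_first_keys, key) - 1
    return max(pos, 0)

def lookup(blocks, index_first_keys, key):
    """One 'disk I/O': load the chosen block, then scan it sequentially."""
    block = blocks[find_block(index_first_keys, key)]
    for k, v in block:  # sequential traversal of KeyValue pairs in the block
        if k == key:
            return v
    return None
```

The index search happens entirely in memory; only the single chosen block needs to be read from disk, which is exactly why loading the Data Block Index up front pays off.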
1. FileInfo offset: the offset of the FileInfo section within the HFile. Long (8 bytes).
2. DataIndex offset: the offset of the Data Block Index within the HFile. Long (8 bytes).
3. DataIndex count: the number of Data Block Index entries. Int (4 bytes).
4. MetaIndex offset: the offset of the Meta Block Index within the HFile. Long (8 bytes).
5. MetaIndex count: the number of Meta Block Index entries. Int (4 bytes).
6. TotalUncompressedBytes: the total uncompressed size of the Data Block section. Long (8 bytes).
7. Entry count: the number of KeyValue cells in the Data Blocks. Int (4 bytes).
8. Compression codec: the compression algorithm code, an enum (LZO = 0, GZ = 1, NONE = 2). Int (4 bytes).
9. Version: version information; the current version value is 1. Int (4 bytes).
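As a sanity check on the sizes above, here is a hypothetical packing of these nine fields with Python's struct module (big-endian; `q` is an 8-byte long, `i` a 4-byte int). The real HFile trailer also carries a magic number, which this sketch omits.

```python
import struct

# Nine trailer fields in the order listed above: 4 longs and 5 ints.
TRAILER = struct.Struct(">qqiqiqiii")

def pack_trailer(fileinfo_off, dataindex_off, dataindex_cnt,
                 metaindex_off, metaindex_cnt, uncompressed_bytes,
                 entry_cnt, codec, version=1):
    """Serialize the fixed-length trailer fields; codec: LZO=0, GZ=1, NONE=2."""
    return TRAILER.pack(fileinfo_off, dataindex_off, dataindex_cnt,
                        metaindex_off, metaindex_cnt, uncompressed_bytes,
                        entry_cnt, codec, version)
```

Packing with this layout yields 4 x 8 + 5 x 4 = 52 bytes of fields, which is what makes the trailer fixed-length and therefore trivially locatable at the end of the file.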
HFile Data Blocks and Meta Blocks are typically stored compressed, which significantly reduces network I/O and disk I/O, at the cost, of course, of CPU time for compression and decompression. HFile supports two compression codecs: GZIP and LZO.
StoreFile Format
Each Store consists of one MemStore and zero or more StoreFiles.
A StoreFile is saved on HDFS in HFile format.
KeyValue Object Format
The KeyValue format:
keyLength | valueLength | key | value
Both keyLength and valueLength are integers indicating the length of the key and the value.
Both key and value are byte arrays; the key has a fixed structure, while the value is raw data. The format of the key is as follows.
The key format:
rowLength | row (i.e. the rowkey) | columnFamilyLength | columnFamily | columnQualifier | timestamp | keyType
There are four KeyType values: Put, Delete, DeleteColumn, and DeleteFamily.
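A rough Python sketch of serializing one KeyValue in the layout above. The field widths used here (a 2-byte row length, a 1-byte family length, an 8-byte timestamp, a 1-byte key type, and Put encoded as 4) follow HBase's KeyValue conventions as far as I know, but treat this as an illustration rather than a wire-compatible encoder.

```python
import struct

KEY_TYPE_PUT = 4  # assumed code for the Put key type

def encode_keyvalue(row, family, qualifier, timestamp, value,
                    key_type=KEY_TYPE_PUT):
    """Pack keyLength | valueLength | key | value, where key is
    rowLength | row | cfLength | cf | qualifier | timestamp | keyType."""
    key = (struct.pack(">H", len(row)) + row          # 2-byte row length
           + struct.pack("B", len(family)) + family   # 1-byte family length
           + qualifier                                # qualifier, no length prefix
           + struct.pack(">qB", timestamp, key_type)) # 8-byte ts + 1-byte type
    return struct.pack(">ii", len(key), len(value)) + key + value
```

Note that the qualifier carries no length prefix: its length is implied by keyLength minus the sizes of the other fixed and prefixed fields, which is why the key structure must be fixed.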
Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.