LSM tree: origin, design ideas, and application to HBase indexing

Source: Internet
Author: User

Transferred from: http://www.cnblogs.com/yanghuahui/p/3483754.html

Before discussing the LSM tree, it helps to review three basic storage engines in order to understand where the LSM tree comes from:

    • The hash storage engine is a persistent implementation of a hash table. It supports insert, delete, update, and random read operations but not sequential scans; the corresponding storage system is the key-value store. For key-value inserts and lookups, a hash table costs O(1) on average, which is faster than the O(log n) of a balanced tree operation. If you never need to traverse the data in order, a hash table is the right choice.
    • The B-tree storage engine is a persistent implementation of a B-tree (for the origin, data structure, and application scenarios of the B-tree, see the previous post). It supports insert, delete, read, and update on single records as well as sequential scans (via the pointers between B+ tree leaf nodes); the corresponding storage system is the relational database (such as MySQL).
    • The LSM (Log-Structured Merge) tree storage engine, like the B-tree storage engine, supports insert, delete, read, update, and sequential scan operations, and it avoids random disk writes by writing to disk in batches. There is a trade-off: compared with the B+ tree, the LSM tree sacrifices some read performance in order to greatly improve write performance.
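The difference between the hash engine and the tree-style engines can be sketched in a few lines of Python. This is only an in-memory illustration, not a persistent engine: a `dict` stands in for the hash table, a sorted key list stands in for the ordered tree, and `range_scan` is a hypothetical helper name.

```python
import bisect

# Hash-style engine: O(1) average point lookups, but keys have no order.
hash_store = {"k3": "c", "k1": "a", "k2": "b"}

# Tree-style engine, modeled here as a sorted key list (assumption:
# a real engine would keep this order inside B+ tree leaf nodes).
sorted_keys = sorted(hash_store)

def range_scan(start, end):
    """Return (key, value) pairs with start <= key < end, in key order."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return [(k, hash_store[k]) for k in sorted_keys[lo:hi]]

point = hash_store["k2"]          # point lookup works on both designs
scan = range_scan("k1", "k3")     # ordered scan needs the sorted structure
print(point, scan)
```

The point lookup needs nothing but the hash table, while the scan only works because the keys are kept in order, which is what the B-tree and LSM engines provide.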

From the analysis above, the origin of the LSM tree should be clear. Its design idea is simple: keep changes to the data in memory, and once they reach a specified size limit, write them to disk in bulk. Reads are more cumbersome, because they must merge the historical data on disk with the most recent changes in memory. Write performance therefore improves greatly, while a read first checks whether memory has a hit and otherwise has to access several disk files. In the extreme case, HBase, which is based on the LSM tree, writes an order of magnitude faster than MySQL and reads an order of magnitude slower.

The LSM tree splits one big tree into N small trees. Writes first go to a small tree in memory; as the small trees grow, they are flushed from memory to disk, and the trees on disk are periodically merged into one big tree to optimize read performance.
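The write path, the read path, and the periodic merge described above can be sketched as a toy class. This is a minimal sketch under stated assumptions, not how any real engine is implemented: the "disk" is just a list of sorted Python lists, and `TinyLSM`, `flush_threshold`, and `compact` are hypothetical names chosen for illustration.

```python
import bisect

class TinyLSM:
    """Toy LSM tree: one in-memory table plus immutable sorted runs."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed size limit
        self.memtable = {}  # most recent writes, held in memory
        self.runs = []      # flushed runs, oldest first; sorted (key, value) lists

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted, immutable run."""
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check memory first, then the runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one; newer values overwrite older ones."""
        merged = {}
        for run in self.runs:  # oldest first, so later updates win
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

db = TinyLSM(flush_threshold=2)
db.put("a", 1)
db.put("b", 2)    # threshold reached: first run is flushed
db.put("a", 10)
db.put("c", 3)    # threshold reached: second run is flushed
db.put("d", 4)    # stays in memory
print(db.get("a"), len(db.runs))
db.compact()
print(db.runs)
```

Before `compact()` a read of `"a"` may have to look at two runs; afterwards there is a single run, which is the read-side benefit of merging the small trees.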

These are roughly the main ideas behind HBase's storage design, and they map onto HBase as follows:

    • Because the small tree is first written to memory, data written to memory must also be temporarily persisted to disk to prevent loss. This corresponds to HBase's MemStore and HLog.
    • Once the tree in the MemStore reaches a certain size, it is flushed to disk by the HRegion (typically onto a Hadoop DataNode), so the MemStore contents become a StoreFile, a disk file on the DataNode. The HRegionServer periodically merges the data on the DataNode and fully reclaims the invalid space. At this point many small trees are merged into one big tree, which improves read performance.
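The pairing of an in-memory store with a log on disk can be sketched as follows. This is a simplified illustration of the write-ahead idea only; it does not reproduce HBase's actual HLog format or API, and `LoggedMemStore`, the JSON-lines log layout, and the temporary file path are all assumptions made for the example.

```python
import json
import os
import tempfile

class LoggedMemStore:
    """In-memory store whose writes are first appended to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.mem = {}
        self._replay()  # rebuild memory from the log after a restart
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.mem[key] = value

    def put(self, key, value):
        # Persist to the log before updating memory, so an acknowledged
        # write survives a crash of the process.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.mem[key] = value

    def close(self):
        self.log.close()

path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = LoggedMemStore(path)
store.put("row1", "v1")
store.put("row2", "v2")
store.close()  # simulate a crash: the in-memory dict is discarded

recovered = LoggedMemStore(path)  # replaying the log restores the data
print(recovered.mem)
recovered.close()
```

Appending to the log is a sequential write, so it keeps the cost of durability low while the in-memory tree absorbs the random updates.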

For the simplest two-level LSM tree, the data in memory and the data on disk are combined by a merge operation, as shown in the figure below.

Figure from the LSM paper

In theory, an LSM tree can merge part of the in-memory tree with the first-level tree on disk. Updating the on-disk tree in place, however, may break the contiguity of its physical blocks. In practice an LSM tree usually has multiple levels, and when the small trees on disk are merged into a big tree, the data can be rewritten in sorted order so that the blocks are contiguous, which optimizes read performance.

In HBase's implementation, once the whole in-memory store reaches a certain threshold it is flushed to disk as a single file, which is stored as a small B+ tree. HBase is generally deployed on HDFS, and HDFS does not support in-place updates to files, so it makes sense for HBase to flush the whole memory store rather than merge it into a small tree already on disk. The small trees flushed to disk are also periodically merged into one big tree. Overall, HBase follows the LSM tree idea.
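Because flushed files are never updated in place, a merge reads several sorted files and writes one new sorted file. The sketch below shows that idea with plain Python lists standing in for files. It is not HBase's compaction code: the `compact` function, the use of `None` as a deletion marker, and the "newest run wins" rule are assumptions for illustration.

```python
import heapq

TOMBSTONE = None  # assumed marker for a deleted key

def compact(runs):
    """Merge sorted runs (oldest first) into one new sorted run.

    For duplicate keys the newest run wins; deleted keys are dropped.
    The input runs are left untouched, as immutable files would be.
    """
    # Tag entries with -age so that, for equal keys, the newest sorts first.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    merged = []
    last_key = object()  # sentinel that equals no real key
    for key, _, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue  # an older version of a key already handled
        last_key = key
        if value is not TOMBSTONE:
            merged.append((key, value))
    return merged

old_run = [("a", 1), ("b", 2), ("c", 3)]
new_run = [("b", 20), ("c", TOMBSTONE), ("d", 4)]
result = compact([old_run, new_run])
print(result)
```

The merge streams through every input once in key order, so the output can be written sequentially, and the space held by overwritten or deleted entries is reclaimed only at this point.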
