LSM tree origin, design ideas, and indexes applied to HBase

Last Update:2016-01-10 Source: Internet

Author: User

Transferred from: http://www.cnblogs.com/yanghuahui/p/3483754.html

Before discussing the LSM tree, it helps to review three basic storage engines, because they explain where the LSM tree comes from:

The hash storage engine is a persistent implementation of a hash table. It supports insert, delete, update, and random read operations but not sequential scans; the corresponding storage system is a key-value store. For key-value inserts and lookups, a hash table takes O(1) time on average, which is clearly faster than the O(log n) of a tree, so if you never need to traverse the data in order, a hash table is the right choice.
The B-tree storage engine is a persistent implementation of a B-tree (an earlier post covers the origin, data structure, and use cases of the B-tree). It supports inserting, deleting, reading, and updating single records, and also sequential scans (via the pointers between B+ tree leaf nodes). The corresponding storage system is the relational database, for example MySQL.
The LSM (Log-Structured Merge) tree storage engine, like the B-tree storage engine, supports insert, delete, read, update, and sequential scan operations. It avoids the random disk write problem by writing to disk in batches. There is a trade-off, of course: compared with a B+ tree, the LSM tree sacrifices some read performance in exchange for much better write performance.
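The difference between a hash engine and an ordered engine can be sketched in a few lines of Python. This is only an in-memory illustration: the dictionary stands in for the hash engine, the sorted key list stands in for a B-tree or LSM tree, and the names `hash_store`, `sorted_keys`, and `range_scan` are hypothetical.

```python
import bisect

# Hash-style engine: average O(1) point lookups, but no useful key order.
hash_store = {"row3": "c", "row1": "a", "row2": "b"}

# Ordered engine (stand-in for a B-tree or LSM tree): keys are kept sorted,
# so a range of keys can be located with two binary searches.
sorted_keys = sorted(hash_store)


def range_scan(start, stop):
    """Return (key, value) pairs with start <= key < stop, in key order."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return [(key, hash_store[key]) for key in sorted_keys[lo:hi]]


point = hash_store["row2"]          # point lookup, works in both engines
scan = range_scan("row1", "row3")   # sequential scan, needs ordered keys
```

Without the sorted key list, answering `range_scan` would require examining every key in the hash table.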

With that background, the origin of the LSM tree should be clear. Its design idea is simple: changes to the data are held in memory and, once they reach a specified size limit, written to disk in bulk. Reads are more involved, because they must merge the historical data on disk with the most recent changes in memory. Write performance therefore improves greatly, while a read first checks whether memory holds the key and otherwise may have to access several disk files. In extreme cases, the write performance of HBase, which is built on the LSM tree, is an order of magnitude higher than that of MySQL, and its read performance an order of magnitude lower.
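The write and read paths described above might look roughly like the following sketch. It is a toy model, not how HBase or any real engine is implemented: the class `TinyLSM`, its `memtable_limit` parameter, and the use of in-memory lists in place of disk files are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM store: a mutable in-memory table plus immutable sorted segments."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}        # most recent changes, held in memory
        self.segments = []        # stand-ins for disk files, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the whole memtable out as one sorted, immutable segment."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # 1. Check memory first: it holds the newest version of any key.
        if key in self.memtable:
            return self.memtable[key]
        # 2. Otherwise search the segments from newest to oldest.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)     # limit reached: both entries are flushed as one segment
db.put("a", 10)    # newer version of "a" stays in memory for now
```

Here `db.get("a")` is answered from memory, while `db.get("b")` has to fall through to a segment, which is the extra read cost the text refers to.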

The LSM tree splits one big tree into N small trees. Data is first written to a small tree in memory; as the small trees grow, they are flushed from memory to disk, and the trees on disk are periodically merged into one big tree to optimize read performance.
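Merging several small sorted trees into one larger one is essentially a k-way merge. The sketch below assumes each small tree is represented as a key-sorted list of `(key, value)` pairs in which a key appears at most once, and that the lists are ordered oldest first; the function name `compact` is hypothetical.

```python
import heapq


def compact(segments):
    """Merge key-sorted segments (oldest first) into one; the newest value wins."""
    # Tag every entry with the negated age of its segment so that, for equal
    # keys, the entry from the newest segment sorts first.
    tagged = (
        [(key, -age, value) for key, value in segment]
        for age, segment in enumerate(segments)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        # Keep only the first (newest) entry seen for each key.
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged


old_segment = [("a", 1), ("b", 2)]
new_segment = [("b", 20), ("c", 3)]
big_tree = compact([old_segment, new_segment])
```

After the merge a read only has to consult one sorted structure instead of two, and the stale value of `b` no longer takes up space.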

These are roughly the main ideas behind HBase's storage design. They map onto HBase as follows:

Because the small tree is first written to memory, the writes must also be persisted to disk to guard against losing the in-memory data; this corresponds to HBase's MemStore and HLog.
Once the tree in the MemStore reaches a certain size, the HRegion flushes it to disk (typically on a Hadoop DataNode), so that the MemStore contents become a StoreFile on the DataNode. The HRegionServer periodically merges the data on the DataNodes and fully reclaims invalidated space; many small trees are merged into one big tree at this point, which improves read performance.
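The pairing of an in-memory store with a log can be illustrated with a small write-ahead-log sketch. This is not HBase's actual API or file format: the class `LoggedMemstore`, the JSON-lines log, and the file name `hlog.jsonl` are assumptions chosen only to show the ordering of "log first, then memory" and the replay on restart.

```python
import json
import os
import tempfile


class LoggedMemstore:
    """Toy in-memory store whose writes are first appended to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memstore = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state from the log, e.g. after a crash."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                key, value = json.loads(line)
                self.memstore[key] = value

    def put(self, key, value):
        # Append to the log and force it to disk before touching memory,
        # so an acknowledged write survives a loss of the in-memory data.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memstore[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "hlog.jsonl")
store = LoggedMemstore(log_path)
store.put("row1", "a")
store.put("row1", "b")

# Simulate a restart: a fresh instance recovers its state from the log alone.
recovered = LoggedMemstore(log_path)
```

Appending to the log is a sequential write, so this durability step does not reintroduce the random disk writes that the LSM design avoids.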

In the simplest two-level LSM tree, a merge operation combines the data in memory with the data on disk, as illustrated in the following figure:

Picture from LSM paper

In theory, an LSM tree can merge part of the in-memory tree with the first-level tree on disk. Updating the on-disk tree in place, however, may break the contiguity of its physical blocks. In practice an LSM tree usually has multiple levels, and when the small trees on disk are merged into a big tree, the data can be rewritten in sorted order so that the blocks are contiguous, which optimizes read performance.

In its implementation, HBase flushes the entire in-memory store to disk once it reaches a certain threshold, producing one file that is organized much like a small B+ tree. HBase is generally deployed on HDFS, and HDFS does not support in-place updates to files, so it makes sense for HBase to flush memory as a whole rather than merge it into small trees already on disk. The small trees flushed to disk are also periodically merged into a big tree. Overall, HBase follows the ideas of the LSM tree.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
