LSM-tree: an efficient index data structure

Source: Internet
Author: User

Reprint: http://bofang.iteye.com/blog/1676698


The paper "The Log-Structured Merge-Tree (LSM-Tree)" (http://www.google.com.my/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&ved=0cdoqfjad&url=http%3a%2f%2fciteseerx.ist.psu.edu%2fviewdoc%2fdownload%3fdoi%3d10.1.1.44.2782%26rep%3drep1%26type%3dpdf&ei=6olpujuzfsayiafikihidg&usg=AFQJCNGGON9IFTLSHCV2HBL0RVQDELFXOW&SIG2=8WYSS63QLQRVWF5M3LK7BG) describes the goals and algorithmic details of this data structure.

The main goal of the LSM-tree is to build an index quickly. The B-tree is the common indexing technique; however, under heavy concurrent insertion a B-tree requires a large amount of random disk I/O, and that random I/O seriously slows down index maintenance. This matters most when the index is large (for example, a composite index on two columns) and the workload is insert-heavy, so that insert speed dominates performance while reads are comparatively rare. The LSM-tree achieves high write performance by writing to disk sequentially: this greatly reduces the number of disk seeks, and a single disk I/O can write multiple index blocks.

The main idea of the LSM-tree is to split the index into trees at different levels. Take a two-level tree as an example: the index consists of two trees, one in memory and one on disk. The in-memory tree does not have to be a B-tree; it can be another kind of tree, such as an AVL tree. Because the in-memory tree never resides on disk, its nodes are not tied to the disk page size, so there is no need to sacrifice CPU efficiency to minimize tree height. The tree on disk is a B-tree.

Data is first inserted into the in-memory tree. When the amount of data in the in-memory tree exceeds a threshold, a merge is triggered. The merge walks the leaf nodes of the in-memory tree from left to right and merges them with the leaf nodes of the on-disk tree. Whenever the merged data fills one disk storage page, that page is persisted to disk and the parent node's pointer to the leaf node is updated.
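The insert-then-merge flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name TwoLevelLSM, the constants PAGE_SIZE and THRESHOLD, and the use of a dict in place of an AVL tree and a list of pages in place of B-tree leaves are all hypothetical simplifications.

```python
PAGE_SIZE = 4   # entries per simulated disk page (assumed value)
THRESHOLD = 3   # entries the in-memory tree may hold before a merge (assumed value)


class TwoLevelLSM:
    def __init__(self):
        self.c0 = {}         # in-memory component; a dict stands in for an AVL tree
        self.c1_pages = []   # on-disk component; each page is a sorted list of (key, value)

    def insert(self, key, value):
        # Inserts only touch memory; the disk is written during merges.
        self.c0[key] = value
        if len(self.c0) > THRESHOLD:
            self.merge()

    def merge(self):
        mem = sorted(self.c0.items())                          # leaves of C0, left to right
        disk = [kv for page in self.c1_pages for kv in page]   # leaves of C1, left to right
        merged, i, j = [], 0, 0
        while i < len(mem) and j < len(disk):
            if mem[i][0] < disk[j][0]:
                merged.append(mem[i])
                i += 1
            elif mem[i][0] > disk[j][0]:
                merged.append(disk[j])
                j += 1
            else:
                # Same key in both trees: the in-memory entry is newer and wins.
                merged.append(mem[i])
                i += 1
                j += 1
        merged.extend(mem[i:])
        merged.extend(disk[j:])
        # Emit the merged run one full page at a time, as a sequential write would.
        self.c1_pages = [merged[k:k + PAGE_SIZE] for k in range(0, len(merged), PAGE_SIZE)]
        self.c0.clear()
```

A real implementation would stream the merge page by page rather than materialize the whole on-disk tree in memory; the sketch keeps everything in lists for readability.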

When leaf nodes that already exist on disk are merged, the old copies are not overwritten or deleted in place. Instead, their data is combined with the in-memory data and the result is written sequentially to a new location on disk. This wastes some space, but the LSM-tree provides mechanisms to reclaim it.
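To make the space trade-off concrete, here is a small sketch of an append-only page store. The source does not describe the reclamation mechanism, so the class AppendOnlyDisk and its methods are hypothetical; the sketch only shows that merged pages are appended sequentially while the superseded pages remain until something reclaims them.

```python
class AppendOnlyDisk:
    def __init__(self):
        self.pages = []   # every page ever written, in write order
        self.live = []    # indexes of the pages that form the current on-disk tree

    def write_merged(self, merged_pages):
        """Append merged pages sequentially and return the now-obsolete page indexes."""
        start = len(self.pages)
        self.pages.extend(merged_pages)    # old pages are left untouched
        obsolete = self.live
        self.live = list(range(start, start + len(merged_pages)))
        return obsolete

    def wasted_pages(self):
        # Pages still occupying space but no longer part of the tree.
        return len(self.pages) - len(self.live)
```

For example, writing one page and then a two-page merge result leaves three pages on disk, of which one is waiting to be reclaimed.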

The non-leaf nodes of the on-disk tree are also cached in memory.

A lookup searches the in-memory tree first and falls back to the on-disk tree only if no result is found there.
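The lookup order can be sketched as follows. As an assumption for illustration, the in-memory tree is a dict and the on-disk tree is a sorted list of (key, value) pairs searched with binary search; the function name lookup is hypothetical.

```python
from bisect import bisect_left


def lookup(key, c0, c1):
    """c0: dict standing in for the in-memory tree.
    c1: list of (key, value) pairs sorted by key, standing in for the on-disk tree."""
    # 1. Memory first: it holds the most recent data.
    if key in c0:
        return c0[key]
    # 2. Fall back to the on-disk tree. The 1-tuple (key,) sorts before any
    #    (key, value) pair with the same key, so bisect_left lands on that pair.
    i = bisect_left(c1, (key,))
    if i < len(c1) and c1[i][0] == key:
        return c1[i][1]
    return None
```

Checking memory first also gives the right answer when a key exists in both trees, since the in-memory copy is the newer one.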

One obvious problem is that if the amount of data is very large, the on-disk tree grows correspondingly large and merging becomes slower. One solution is to build trees at multiple levels, where each lower-level tree holds a larger dataset than the tree at the level above it. If the in-memory tree is C0, the on-disk trees are, in order, C1, C2, C3, ..., Ck-1, Ck. Merges take place between adjacent levels: (C0, C1), (C1, C2), ..., (Ck-1, Ck).
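A rough sketch of the cascading merges, under assumed parameters: the class MultiLevelLSM, the base_capacity and growth values, and the rule "merge C_i into C_{i+1} when C_i exceeds its capacity" are illustrative choices, not details given in the source. Dicts stand in for the trees, so ordering inside a component is not modeled.

```python
class MultiLevelLSM:
    def __init__(self, levels=2, base_capacity=2, growth=2):
        # capacities[i] is the most entries C_i may hold; each level is `growth` times larger.
        self.capacities = [base_capacity * growth ** i for i in range(levels)]
        # components[0] is C0 (memory); the rest are C1..Ck. The last level is unbounded.
        self.components = [dict() for _ in range(levels + 1)]

    def insert(self, key, value):
        self.components[0][key] = value
        for i, cap in enumerate(self.capacities):
            if len(self.components[i]) <= cap:
                break
            # Merge the pair (C_i, C_{i+1}); entries from C_i are newer and overwrite.
            self.components[i + 1].update(self.components[i])
            self.components[i].clear()

    def get(self, key):
        # Search from the newest component to the oldest.
        for component in self.components:
            if key in component:
                return component[key]
        return None
```

Because the small upper levels absorb most merges, the largest tree is rewritten far less often than it would be in a two-level design.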

Why are LSM-tree inserts fast?

1. An insert first goes only to memory, and the in-memory tree is kept small, so the insert itself is quick.

2. Merge operations write one or more disk pages sequentially, which is much faster than random writes.
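The second point can be illustrated with a back-of-the-envelope cost model. The figures below (10 ms per seek, 0.1 ms to transfer one page) are assumed round numbers for a spinning disk, not measurements from the source, and the function names are hypothetical.

```python
SEEK_US = 10_000      # assumed cost of one disk seek, in microseconds
TRANSFER_US = 100     # assumed cost of transferring one page, in microseconds


def random_write_us(pages):
    # Each page lives somewhere else on disk, so every write pays a seek.
    return pages * (SEEK_US + TRANSFER_US)


def sequential_write_us(pages):
    # One seek to the start of the run, then the pages are written back to back.
    return SEEK_US + pages * TRANSFER_US
```

Under these assumptions, writing 100 pages costs about 1,010 ms when done randomly and about 20 ms when done sequentially, a difference of roughly 50 times.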
