B Tree, B-Tree, B+ Tree, B* Tree

Source: Internet
Author: User

Disclaimer: This article only introduces the definitions of these trees and compares them. It does not cover their insertion, deletion, splitting, merging, or other operations. Those will be described in a later article.

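B Tree

That is, a binary search tree:

   1. Every non-leaf node has at most two children (Left and Right);
   2. Every node stores one key;
   3. The left pointer of a non-leaf node points to the subtree whose keys are smaller than its own key, and the right pointer points to the subtree whose keys are larger than its own key;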

For example:

A B tree search starts at the root node. If the query key equals the key of the node, it is a hit.
Otherwise, if the query key is smaller than the node's key, go to the left child; if it is larger, go to the right child.
If the pointer to that left or right child is empty, report that the key cannot be found.
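The search steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Node` class, the `search` function, and the sample keys are assumptions made for this example, not part of the original article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root: Optional[Node], target: int) -> Optional[Node]:
    """Return the node holding target, or None if the key is absent."""
    node = root
    while node is not None:
        if target == node.key:
            return node  # hit
        # Smaller keys live in the left subtree, larger keys in the right one.
        node = node.left if target < node.key else node.right
    return None  # ran into an empty child pointer: key not found


# A small hand-built tree (hypothetical sample data).
root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
```

With this sample tree, `search(root, 6)` returns the node holding 6, and `search(root, 7)` returns `None`.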

If the left and right subtrees of every non-leaf node in a B tree hold roughly the same number of nodes (the tree is balanced), the search performance of the B tree approaches that of binary search.
It also has an advantage over binary search on contiguous memory: changing the structure of a B tree (inserting and deleting nodes) does not require moving large blocks of data in memory, and often costs only constant overhead.

However, after many insertions and deletions, a B tree may end up with different structures:

The structure on the right is also a B tree, but its search performance has become linear. The same set of keys can produce different tree structures, so when using a B tree you must also make sure it keeps the structure shown on the left and avoids the structure shown on the right. This is the so-called "balance" problem.

The B trees used in practice add a balancing algorithm to the original B tree, which gives the "balanced binary tree". Keeping the nodes of the B tree evenly distributed is the key to a balanced binary tree; the balancing algorithm is a strategy for inserting and deleting nodes in the B tree.
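To make the balance problem concrete, the following sketch builds two binary search trees from the same seven keys using plain, unbalanced insertion, and compares their heights. The helper names (`insert`, `height`, `build`) and the key sequences are assumptions chosen for illustration.

```python
from typing import Iterable, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain binary-search-tree insertion with no rebalancing."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # key already present


def height(root: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys: Iterable[int]) -> Optional[Node]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root


balanced = build([4, 2, 6, 1, 3, 5, 7])    # middle keys first
degenerate = build([1, 2, 3, 4, 5, 6, 7])  # sorted order gives a chain
print(height(balanced), height(degenerate))  # prints: 3 7
```

The key set is identical, but sorted insertion produces a chain in which a search may have to visit every node.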

B-Tree

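A B-tree is a multiway search tree (not a binary one):

1) Every node x has the following attributes:
   - node x holds n keys
   - the keys themselves are stored in non-decreasing order
   - a boolean value indicates whether the node is a leaf
2) Every internal node x also contains n+1 pointers to its children
3) Every leaf node has the same depth
4) The number of keys in each node has an upper and a lower bound, expressed through the minimum degree t >= 2 of the B-tree:
   - every node except the root must hold at least t-1 keys, so every internal node other than the root has at least t children
   - every node holds at most 2t-1 keys, so an internal node has at most 2t children. The simplest case is t = 2, where every internal node has 2, 3, or 4 children; this is a 2-3-4 tree.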
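The rules listed above can be expressed as a small checker. This is a sketch under stated assumptions: the `BTreeNode` class and the `check` function are hypothetical names, the leaf flag is derived from an empty child list rather than stored as a boolean, and only the listed rules are verified (not the ordering of keys between a parent and its children).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int]
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        # Assumption: a node without children is a leaf.
        return not self.children


def check(node: BTreeNode, t: int, is_root: bool = True) -> int:
    """Check the listed rules for minimum degree t.

    Returns the distance from node down to its leaves and raises
    AssertionError if a rule is violated.
    """
    n = len(node.keys)
    assert node.keys == sorted(node.keys), "keys must be non-decreasing"
    assert n <= 2 * t - 1, "too many keys"
    if not is_root:
        assert n >= t - 1, "too few keys"
    if node.leaf:
        return 0
    assert len(node.children) == n + 1, "internal node needs n+1 children"
    depths = {check(child, t, is_root=False) for child in node.children}
    assert len(depths) == 1, "all leaves must have the same depth"
    return depths.pop() + 1


# Hypothetical sample tree with t = 2 (a 2-3-4 tree).
root = BTreeNode([10, 20], [BTreeNode([1, 5]), BTreeNode([12]), BTreeNode([25, 30, 40])])
print(check(root, t=2))  # prints: 1
```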

For example (m = 3):

A B-tree search starts at the root node and runs a binary search over the (ordered) key sequence inside the node. On a hit the search ends; otherwise it descends into the child whose range covers the query key. This repeats until the corresponding child pointer is empty or the node is already a leaf.
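The search procedure can be sketched as follows, using Python's standard `bisect_left` for the binary search inside each node. The `BTreeNode` class, the `btree_search` function, and the sample tree are assumptions made for this illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    keys: List[int]
    children: List["BTreeNode"] = field(default_factory=list)


def btree_search(node: BTreeNode, target: int) -> Optional[Tuple[BTreeNode, int]]:
    """Return (node, index) of the key, or None if it is not in the tree."""
    while True:
        # Binary search over the ordered keys inside this node.
        i = bisect_left(node.keys, target)
        if i < len(node.keys) and node.keys[i] == target:
            return node, i  # hit, possibly in a non-leaf node
        if not node.children:
            return None  # reached a leaf without a hit
        # Child i covers the keys between keys[i-1] and keys[i].
        node = node.children[i]


# Hypothetical sample tree.
root = BTreeNode([10, 20], [BTreeNode([1, 5]), BTreeNode([12]), BTreeNode([25, 30, 40])])
```

Here `btree_search(root, 20)` ends at the root itself, `btree_search(root, 12)` ends in the middle leaf, and `btree_search(root, 13)` returns `None`.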

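Characteristics of the B-tree:

       1. The key set is distributed across the whole tree;
       2. Every key appears in exactly one node;
       3. A search may end at a non-leaf node;
       4. Its search performance is equivalent to one binary search over the full key set;
       5. The number of levels is controlled automatically;

B+ Tree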

B+-tree: a variant of the B-tree that was created to meet the needs of file systems.

The similarities and differences between an order-m B+ tree and an order-m B-tree are:

    1. A node with n subtrees contains n keys;
    2. The leaf nodes contain the information for all keys, together with pointers to the records that hold those keys, and the leaf nodes themselves are linked in ascending key order. (The leaf nodes of a B-tree do not contain all the information that may be looked up.)
    3. All non-terminal nodes can be regarded as the index part; they contain only the largest (or smallest) key of the root node of each of their subtrees. (The non-terminal nodes of a B-tree also contain valid information that may be looked up.)

In the example, the non-terminal nodes contain only pointers to the smallest key in the root node of each child subtree.

Why is the B+-tree better suited than the B-tree to operating-system file indexes and database indexes in practice?
1. The B+-tree has lower disk read and write costs

The internal nodes of a B+-tree carry no pointers to the specific information of a key, so they are smaller than the internal nodes of a B-tree. If all the keys of one internal node are stored in the same disk block, that block can hold more keys, so more of the keys being searched are brought into memory by a single read. The number of IO reads and writes is reduced accordingly.

For example, suppose a disk block holds 16 bytes, a key takes 2 bytes, and a pointer to a key's specific information takes 2 bytes. An internal node of an order-9 B-tree (a node holds at most 8 keys) needs 2 disk blocks, while an internal node of a B+ tree needs only 1 disk block. When an internal node has to be read into memory, the B-tree costs one more block lookup than the B+ tree (on disk, that is the time of one disk rotation).
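The arithmetic behind this example can be spelled out as a short calculation. The constant and function names are assumptions for this sketch, and, like the example itself, it counts only keys and record pointers and ignores child pointers.

```python
import math

BLOCK_BYTES = 16        # capacity of one disk block, from the example
KEY_BYTES = 2           # size of one key
RECORD_PTR_BYTES = 2    # size of one pointer to a key's specific information
MAX_KEYS = 8            # an order-9 node holds at most 8 keys


def blocks_needed(bytes_per_key_entry: int) -> int:
    """Disk blocks needed to hold MAX_KEYS entries of the given size."""
    return math.ceil(MAX_KEYS * bytes_per_key_entry / BLOCK_BYTES)


# B-tree internal node: every key carries a record pointer (8 * 4 = 32 bytes).
btree_blocks = blocks_needed(KEY_BYTES + RECORD_PTR_BYTES)
# B+ tree internal node: keys only (8 * 2 = 16 bytes).
bplus_blocks = blocks_needed(KEY_BYTES)
print(btree_blocks, bplus_blocks)  # prints: 2 1
```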
2. The B+-tree has more stable query efficiency

A non-terminal node does not point to the file contents themselves; it is only an index to the keys in the leaf nodes. Every key lookup must therefore follow a path from the root node down to a leaf node. The path length is the same for all key queries, so every piece of data is queried with the same efficiency.
3. The main reason databases use the B+ tree for indexing

The B-tree improves disk IO performance but does not solve the inefficiency of traversing the elements. The B+ tree came into being to solve exactly this problem. A B+ tree can be traversed in full just by walking its leaf nodes. Range queries are very frequent in databases, and the B-tree does not support such operations (or supports them only inefficiently).
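The leaf-level traversal can be sketched as a walk along the linked leaves. In this illustration the `Leaf` class and `range_scan` function are hypothetical names, and the starting leaf is assumed to have been located already by a normal root-to-leaf search.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the next leaf in ascending key order


def range_scan(start: Optional[Leaf], lo: int, hi: int) -> Iterator[int]:
    """Yield every key k with lo <= k <= hi, walking the leaf chain from start."""
    leaf = start
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return  # keys are ordered, so nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next


# Three linked leaves (hypothetical sample data).
c = Leaf([9, 12])
b = Leaf([5, 7], c)
a = Leaf([1, 3], b)
print(list(range_scan(a, 3, 9)))  # prints: [3, 5, 7, 9]
```

No internal node is visited during the scan, which is what makes range queries cheap once the first leaf has been found.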

B*-tree

The B*-tree is a variant of the B+-tree. On top of the B+ tree (whose leaf nodes contain all the key information and the pointers to the records holding those keys), the B* tree adds sibling pointers to the nodes that are neither the root nor leaves. The B* tree also requires a non-leaf node to hold at least (2/3)*M keys, so the minimum block utilization is 2/3 (instead of the 1/2 of the B+ tree).

B+ tree splitting: when a node is full, a new node is allocated, half of the data in the original node is copied to the new node, and finally a pointer to the new node is added to the parent node. A B+ tree split affects only the original node and the parent node, not the sibling nodes, so no pointers to siblings are needed.
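A minimal sketch of this half-and-half split, working on a node's sorted key list only. The function name `split_full_node` is hypothetical, and returning the smallest key of the new node as the entry for the parent is an assumption made for this example.

```python
from typing import List, Tuple


def split_full_node(keys: List[int]) -> Tuple[List[int], List[int], int]:
    """Split a full node's sorted keys into two halves.

    Returns (original_node_keys, new_node_keys, separator). The separator is
    what would be added to the parent alongside the pointer to the new node;
    using the new node's smallest key is an assumption of this sketch.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


left, right, separator = split_full_node([1, 3, 5, 7])
print(left, right, separator)  # prints: [1, 3] [5, 7] 5
```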

B* tree splitting: when a node is full and its next sibling is not full, part of the data is moved to the sibling, the key is then inserted into the original node, and finally the sibling's key in the parent node is updated (because the sibling's key range has changed). If the sibling is also full, a new node is added between the original node and the sibling, 1/3 of the data from each is copied to the new node, and finally a pointer to the new node is added to the parent node.
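The two cases can be sketched on plain key lists. This is a simplification under stated assumptions: `bstar_overflow` and `split_evenly` are hypothetical helpers, the keys of both nodes and the new key are pooled and divided evenly rather than moved one by one, and updating the parent is left out.

```python
from typing import List


def split_evenly(keys: List[int], parts: int) -> List[List[int]]:
    """Divide sorted keys into `parts` consecutive chunks of near-equal size."""
    q, r = divmod(len(keys), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = q + (1 if i < r else 0)
        chunks.append(keys[start:start + size])
        start += size
    return chunks


def bstar_overflow(node: List[int], sibling: List[int], key: int, capacity: int) -> List[List[int]]:
    """Insert key into a full node, B*-style.

    If the sibling has room, keys are redistributed across the two nodes and
    no new node is allocated. If the sibling is full too, the two nodes are
    split into three.
    """
    assert len(node) == capacity, "this sketch only handles a full node"
    pooled = sorted(node + sibling + [key])
    parts = 2 if len(sibling) < capacity else 3
    return split_evenly(pooled, parts)


# Sibling has room: still two nodes afterwards.
print(bstar_overflow([1, 2, 3, 4, 5, 6], [10, 11], 7, capacity=6))
# prints: [[1, 2, 3, 4, 5], [6, 7, 10, 11]]

# Sibling is full: two nodes become three, each at least 2/3 full.
print(bstar_overflow([1, 2, 3, 4, 5, 6], [10, 11, 12, 13, 14, 15], 7, capacity=6))
# prints: [[1, 2, 3, 4, 5], [6, 7, 10, 11], [12, 13, 14, 15]]
```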

Therefore, the B* tree allocates new nodes less often than the B+ tree, and its space utilization is higher.
