Data structures: B-Tree, B+ Tree, B* Tree

Source: Internet
Author: User

1. B-Tree

A B-tree is a self-balancing tree that keeps data in sorted order. It supports search, sequential access, insertion, and deletion, all in logarithmic time.

The B-tree generalizes the binary search tree: a node can have more than 2 child nodes. Unlike a self-balancing binary search tree, the B-tree is optimized for systems that read and write large blocks of data. It reduces the number of intermediate steps needed to locate a record, which speeds up access. The B-tree is well suited to data held on external storage and is commonly used to implement databases and file systems.

1.1 Properties of the B-Tree

Let M be the order of the tree. A B-tree is either an empty tree or a tree that satisfies the following conditions:

1. Every non-leaf node has at most M children, and M > 2;

2. The root node, if it is not a leaf, has between 2 and M children;

3. Every non-leaf node other than the root has between ceil(M/2) and M children;

4. Every node other than the root stores at least ceil(M/2)-1 and at most M-1 keywords (the root stores at least 1 keyword);

5. Number of keywords in a non-leaf node = number of child pointers - 1;

6. Keywords of a non-leaf node: K[1], K[2], ..., K[n], with n <= M-1 and K[i] < K[i+1];

7. Pointers of a non-leaf node: P[1], P[2], ..., P[n+1], where P[1] points to the subtree whose keywords are less than K[1], P[n+1] points to the subtree whose keywords are greater than K[n], and every other P[i] points to the subtree whose keywords lie in the open interval (K[i-1], K[i]);

8. All leaf nodes are on the same level.

For example, with M = 3, every non-leaf node other than the root has 2 or 3 children and stores 1 or 2 keywords.
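The limits in the list above can be computed for any order. The following is a minimal sketch; the function name `btree_bounds` and the dictionary keys are illustrative choices, not part of any library.

```python
import math


def btree_bounds(m: int) -> dict:
    """Return (min, max) child and keyword limits for a B-tree of order m."""
    if m <= 2:
        raise ValueError("the order m must be greater than 2")
    min_children = math.ceil(m / 2)
    return {
        # a root that is not a leaf has between 2 and m children
        "root_children": (2, m),
        # other non-leaf nodes have between ceil(m/2) and m children
        "internal_children": (min_children, m),
        # a non-root node stores one keyword fewer than it has children
        "non_root_keys": (min_children - 1, m - 1),
    }


print(btree_bounds(3))
print(btree_bounds(5))
```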

A B-tree search starts at the root node and runs a binary search over the node's (ordered) keyword sequence. If the keyword is found, the search ends with a hit; otherwise it descends into the child node whose range contains the keyword. This repeats until the corresponding child pointer is empty or the node is already a leaf.
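The search procedure can be sketched as follows. This is an in-memory illustration only: the `BTreeNode` class and the `btree_search` function are hypothetical names, keywords are assumed to be integers, and a leaf is represented by an empty `children` list.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)               # sorted keywords
    children: List["BTreeNode"] = field(default_factory=list)   # empty for a leaf


def btree_search(node: Optional[BTreeNode], key: int) -> bool:
    """Return True if key is stored anywhere in the B-tree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)        # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True                        # hit, possibly at a non-leaf node
        if not node.children:                  # reached a leaf without a hit
            return False
        node = node.children[i]                # descend into the matching range
    return False


# A small order-3 tree: one root keyword, two leaves on the same level.
root = BTreeNode(keys=[20], children=[BTreeNode(keys=[5, 10]), BTreeNode(keys=[30])])
print(btree_search(root, 20), btree_search(root, 10), btree_search(root, 15))
```

Note that the keyword 20 is found at the root itself, without visiting any leaf.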

1.2 B-Tree application scenarios
    • Keep key values in order and traverse sequentially

    • Use a hierarchical index to minimize disk reads

    • Use partially full blocks to speed up insertions and deletions

    • Maintain index balance with elegant traversal algorithms

      In addition, the B-tree minimizes wasted space by ensuring that the internal nodes are at least half full. A B-tree can handle any number of insertions and deletions.

2. B+ Tree

A B+ tree is an n-ary tree in which each node usually has more than one child. It consists of a root node, internal nodes, and leaf nodes. The root node may be a leaf, or it may be a node with two or more children.

B+ trees are typically used in databases and operating-system file systems. File systems such as NTFS, ReiserFS, NSS, XFS, JFS, ReFS, and BFS all use B+ trees for metadata indexing. A B+ tree keeps data stable and ordered, and its insertions and modifications have stable logarithmic time complexity. Elements are inserted into a B+ tree from the bottom up.

2.1 Properties of the B+ Tree

The B+ tree is a variant of the B-tree and is also a multi-way search tree. Its definition is basically the same as that of the B-tree, except that:

1. A non-leaf node has as many subtree pointers as keywords;

2. The subtree pointer P[i] of a non-leaf node points to the subtree whose keywords lie in the half-open interval [K[i], K[i+1]) (the B-tree uses an open interval);

3. A chain pointer is added to every leaf node, linking the leaves together;

4. All keywords appear in the leaf nodes.

 

2.2 Characteristics of the B+ Tree

1. All keywords appear in the linked list of leaf nodes (a dense index), and the keywords in the list are in order;

2. A search cannot hit at a non-leaf node;

3. The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the (keyword) data;

4. The structure is better suited to file indexing systems.

Search in a B+ tree is basically the same as in a B-tree, except that a B+ tree search only hits at a leaf node (a B-tree search can hit at a non-leaf node). Its performance is equivalent to a binary search over the complete set of keywords.
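A minimal sketch of B+ tree lookup and an ordered range scan over the leaf chain follows. It assumes the convention described in section 2.1: a non-leaf node holds as many keywords as pointers, and child i covers [K[i], K[i+1]). The names `BPlusNode`, `bplus_search`, and `range_scan` are hypothetical, and keywords are assumed to be integers.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BPlusNode:
    keys: List[int] = field(default_factory=list)
    children: List["BPlusNode"] = field(default_factory=list)  # empty in a leaf
    next_leaf: Optional["BPlusNode"] = None                     # leaf chain pointer


def bplus_search(root: BPlusNode, key: int) -> bool:
    """Descend to a leaf; a hit can only be reported there."""
    node = root
    while node.children:
        i = bisect_right(node.keys, key) - 1   # last child whose K[i] <= key
        if i < 0:
            return False                       # key is below every keyword
        node = node.children[i]
    i = bisect_left(node.keys, key)
    return i < len(node.keys) and node.keys[i] == key


def range_scan(root: BPlusNode, lo: int, hi: int) -> List[int]:
    """Collect every keyword in [lo, hi] by walking the leaf chain."""
    node = root
    while node.children:
        node = node.children[max(bisect_right(node.keys, lo) - 1, 0)]
    found = []
    while node is not None:
        for k in node.keys:
            if k > hi:
                return found
            if k >= lo:
                found.append(k)
        node = node.next_leaf
    return found


# Three chained leaves under one root (order 3).
c = BPlusNode(keys=[30, 40])
b = BPlusNode(keys=[20, 25], next_leaf=c)
a = BPlusNode(keys=[5, 10], next_leaf=b)
root = BPlusNode(keys=[5, 20, 30], children=[a, b, c])
print(bplus_search(root, 20), bplus_search(root, 7))
print(range_scan(root, 8, 30))
```

Even though 20 appears in the root, the search still walks down to the leaf that stores it. The range scan shows why the leaf chain is useful: once the first leaf is found, no further descent from the root is needed.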

3. B* Tree

The B* tree is a variant of the B+ tree that adds pointers to sibling nodes in the non-root, non-leaf nodes of the B+ tree.

 

The B* tree requires each non-leaf node to hold at least (2/3)*M keywords, so the minimum block utilization is 2/3 (instead of 1/2 for the B+ tree).

Splitting in a B+ tree:

When a node is full, a new node is allocated and 1/2 of the original node's data is copied to it; a pointer to the new node is then added to the parent node. A B+ tree split therefore affects only the original node and the parent node, not the sibling nodes, so no sibling pointers are needed.
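The split described above can be sketched for a leaf node as follows. This is an illustration under stated assumptions: the `Node` class and `split_leaf` function are hypothetical names, the parent stores the smallest keyword of each child (the [K[i], K[i+1]) convention from section 2.1), and a parent that overflows in turn is not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty in a leaf
    next_leaf: Optional["Node"] = None                     # leaf chain pointer


def split_leaf(parent: Node, idx: int) -> Node:
    """Split the full leaf parent.children[idx] and return the new leaf."""
    leaf = parent.children[idx]
    mid = len(leaf.keys) // 2
    # Allocate a new node and move the upper half of the keywords into it.
    new_leaf = Node(keys=leaf.keys[mid:], next_leaf=leaf.next_leaf)
    leaf.keys = leaf.keys[:mid]
    leaf.next_leaf = new_leaf
    # Only the parent is updated: one more pointer and one more keyword.
    parent.children.insert(idx + 1, new_leaf)
    parent.keys.insert(idx + 1, new_leaf.keys[0])
    return new_leaf


right = Node(keys=[30, 40])
left = Node(keys=[5, 10, 20, 25], next_leaf=right)   # treated as full here
parent = Node(keys=[5, 30], children=[left, right])
split_leaf(parent, 0)
print(parent.keys)
print([child.keys for child in parent.children])
```

The sibling leaf `right` is never read or modified, which matches the observation that a B+ tree split touches only the original node and its parent.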

Splitting in a B* tree:

When a node is full and its next sibling node is not full, part of its data is moved to the sibling, the keyword is inserted into the original node, and finally the sibling's keyword in the parent node is updated (because the sibling's keyword range has changed). If the sibling is also full, a new node is added between the original node and the sibling, each of the two copies 1/3 of its data to the new node, and finally a pointer to the new node is added to the parent node.

Therefore, a B* tree allocates new nodes less often than a B+ tree, and its space utilization is higher.
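The B* overflow handling can be sketched on a simplified model in which the children of one parent are an ordered list of sorted keyword lists. Everything here is an assumption made for illustration: the name `bstar_insert`, the `CAPACITY` of 6 keywords, moving exactly one keyword to the sibling, and the fallback used when there is no next sibling (a case the description above does not cover).

```python
from bisect import insort
from typing import List

CAPACITY = 6  # hypothetical maximum number of keywords per node


def bstar_insert(nodes: List[List[int]], idx: int, key: int) -> None:
    """Insert key into nodes[idx], the idx-th child of a single parent."""
    node = nodes[idx]
    if len(node) < CAPACITY:                 # room available: plain insert
        insort(node, key)
        return
    if idx + 1 == len(nodes):
        # No next sibling: fall back to a B+ style 1/2 split (assumption).
        merged = sorted(node + [key])
        mid = len(merged) // 2
        nodes[idx:idx + 1] = [merged[:mid], merged[mid:]]
        return
    sibling = nodes[idx + 1]
    if len(sibling) < CAPACITY:
        # Sibling has room: insert, then shift the largest keyword across.
        insort(node, key)
        sibling.insert(0, node.pop())
        return
    # Both nodes are full: spread their keywords over three nodes.
    merged = sorted(node + sibling + [key])
    n = len(merged)
    cut1 = (n + 2) // 3
    cut2 = cut1 + (n - cut1 + 1) // 2
    nodes[idx:idx + 2] = [merged[:cut1], merged[cut1:cut2], merged[cut2:]]


# Case 1: the next sibling has room, so no new node is allocated.
siblings = [[1, 2, 3, 4, 5, 6], [10, 11, 12, 13]]
bstar_insert(siblings, 0, 7)
print(siblings)

# Case 2: both nodes are full, so two nodes become three.
siblings = [[1, 2, 3, 4, 5, 6], [10, 11, 12, 13, 14, 15]]
bstar_insert(siblings, 0, 7)
print(siblings)
```

In the second case each resulting node holds at least 4 of 6 possible keywords, which is the 2/3 minimum utilization. In a real tree the parent's keywords for the affected children would also be updated after each case.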

4. Summary
    • B-Tree:
        
      A multi-way search tree. Each non-root node stores between ceil(M/2)-1 and M-1 keywords, and non-leaf nodes store pointers to the child nodes covering each keyword range. Every keyword appears exactly once in the whole tree, and a search can hit at a non-leaf node;

    • B+ Tree:

      Built on the B-tree, with linked-list pointers added to the leaf nodes. All keywords appear in the leaf nodes, the non-leaf nodes serve as an index over the leaves, and a search always hits at a leaf node;

    • B* Tree:

      Built on the B+ tree, with sibling pointers added to the non-leaf nodes and the minimum node utilization raised from 1/2 to 2/3.
