B-tree Summary

Source: Internet
Author: User

Much of the content in this article comes from online sources and has been modified. Because there are many sources, I will not list them individually.

Direct access devices such as disks perform read and write operations in units of "pages". To reduce the number of disk reads and writes and improve search efficiency, an index structure should be organized around this page-based access.

In 1972, R. Bayer and E. M. McCreight proposed a balanced multiway search tree called the B-tree. It is suitable for organizing dynamic search tables on direct access devices such as disks.

 

1. Definitions and features

A B-tree is a balanced multiway search tree used in file systems, mainly as an index for files.

B-Tree Structure Features:

A B-tree of order m (m ≥ 3) is either an empty tree or an m-way tree that satisfies the following properties:

(1) There is exactly one root node. Its number of keys ranges from 1 to m-1, and its number of branches ranges from 2 to m.

(2) For every non-leaf node other than the root, the number of branches is in the range [ceil(m/2), m], so the number of keys is in the range [ceil(m/2)-1, m-1], where ceil(m/2) is the smallest integer not less than m/2;

(3) Non-leaf nodes are produced by splitting leaf nodes, so the number of keys in a leaf node also lies in [ceil(m/2)-1, m-1];

(4) Every non-terminal node contains the information (n, P0, K1, P1, K2, P2, ..., Kn, Pn),

where Ki (i = 1, 2, ..., n) is a key and Pi is a pointer to the root of a subtree. All keys in the subtree pointed to by P(i-1) are smaller than Ki, and all keys in the subtree pointed to by Pi are greater than Ki. n is the number of keys in the node, so the node has n+1 subtrees, and ceil(m/2)-1 <= n <= m-1;

(5) All leaf nodes are on the same level, and their pointer fields are empty. A B-tree therefore has the following property:

By the definition of a B-tree, the first level contains the root node, which has at least two branches, so the second level has at least two nodes. For i ≥ 3, level i has at least 2 * ceil(m/2)^(i-2) nodes (ceil(m/2) is the smallest integer not less than m/2). Every non-root node holds at least ceil(m/2)-1 keys, so if a B-tree of order m with h levels holds N keys, then N ≥ 2 * ceil(m/2)^(h-1) - 1 (h ≥ 1). Therefore h ≤ 1 + log_{ceil(m/2)}((N+1)/2). For a successful search, h is also the number of disk accesses (h ≥ 1), which guarantees the efficiency of the search algorithm.
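As a rough numeric check of the bound N ≥ 2 * ceil(m/2)^(h-1) - 1, the following C sketch computes the minimum number of keys for a given order and number of levels, and from that the largest possible number of levels for N keys. The function names bt_min_keys and bt_max_levels are hypothetical and are not part of the original article; overflow for very large inputs is not handled.

```c
#include <assert.h>

/* ceil(m/2) using integer arithmetic */
static int half_up(int m) { return (m + 1) / 2; }

/* Minimum number of keys in a B-tree of order m with h >= 1 levels:
   2 * ceil(m/2)^(h-1) - 1 */
long bt_min_keys(int m, int h)
{
    assert(m >= 3 && h >= 1);
    long p = 1;
    for (int i = 1; i < h; i++)
        p *= half_up(m);
    return 2 * p - 1;
}

/* Largest number of levels h with bt_min_keys(m, h) <= n_keys, i.e. the
   worst-case number of node reads for a successful search. */
int bt_max_levels(int m, long n_keys)
{
    assert(m >= 3 && n_keys >= 1);
    int h = 1;
    while (bt_min_keys(m, h + 1) <= n_keys)
        h++;
    return h;
}
```

For example, with order 100 and one million keys, bt_max_levels returns 4, so a successful search reads at most four nodes.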

 

The following conclusions can be drawn from the definition above. For a B-tree of order m:

1) each node in the tree has at most m children (m >= 2);

2) apart from the root and the leaf nodes, every node has at least ceil(m/2) children (ceil(x) rounds x up to the nearest integer);

3) if the root node is not a leaf node, it has at least two children (special case: a root without children, that is, the root is itself a leaf and the whole tree consists of a single node);

4) all leaf nodes are on the same level. Under the definition above, leaf nodes are treated as nodes that carry no key information; in practice, the bottom-level nodes have no children and their child pointers are empty, but the nodes themselves exist and hold elements.

5) each non-terminal node contains n keys and has the form (n, P0, K1, P1, K2, P2, ..., Kn, Pn), where:

A) Ki (i = 1...n) are the keys, sorted in ascending order: K(i-1) < Ki.

B) Pi is a pointer to the root of a subtree; all keys in the subtree pointed to by P(i-1) are less than Ki and greater than K(i-1).

C) The number of keys n must satisfy ceil(m/2)-1 <= n <= m-1.

6) the number of keys in each node has a lower and an upper bound. These bounds can be expressed in terms of a fixed integer t >= 2 called the minimum degree of the B-tree (the minimum number of children of an internal node, that is, the number of pointers P mentioned above).

A) Each non-root node must have at least t-1 keys, and each non-root internal node has at least t children. If the tree is not empty, the root node contains at least one key.

B) Each node can contain at most 2t-1 keys, so an internal node can have at most 2t children. A node is full when it holds exactly 2t-1 keys.

C) To sum up, the number of keys in the root node is in the range [1, 2t-1], and the number of keys in a non-root node is in the range [t-1, 2t-1].
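The key-count and ordering rules in items 5) and 6) can be checked mechanically. The sketch below is only an illustration: bt_node_valid is a hypothetical helper, and it takes the keys as a plain 0-based array rather than a real node.

```c
#include <assert.h>
#include <stdbool.h>

/* Check one node of a B-tree with minimum degree t.
   keys[0..n-1] are the node's keys (0-based in this sketch). */
bool bt_node_valid(const int *keys, int n, int t, bool is_root)
{
    assert(t >= 2);
    int min_keys = is_root ? 1 : t - 1;   /* root needs only one key */
    int max_keys = 2 * t - 1;
    if (n < min_keys || n > max_keys)
        return false;
    for (int i = 1; i < n; i++)
        if (keys[i - 1] >= keys[i])       /* keys must be strictly ascending */
            return false;
    return true;
}
```

With t = 3, a node holding a single key is accepted only if it is the root, because a non-root node needs at least t-1 = 2 keys.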

[Figure: a typical B-tree of order 3]

Each node in a B-tree can hold a large number of keys and branches, depending on the actual situation (it cannot, of course, exceed the size of a disk block, which depends on the disk drive and is typically around 1 KB to 4 KB). This reduces the depth of the tree, so finding an element requires reading only a few nodes from external storage into memory before the data is reached.
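To make the link between block size and branching concrete, the following sketch estimates the largest order m whose node still fits in one disk block. The node layout (a fixed header, m-1 keys, m child pointers) and the name bt_order_for_page are assumptions made for illustration only; real on-disk formats differ.

```c
#include <assert.h>
#include <stddef.h>

/* Largest order m such that header + (m-1) keys + m pointers fit in a page.
   Returns 0 if not even one pointer fits. */
int bt_order_for_page(size_t page_size, size_t header_size,
                      size_t key_size, size_t ptr_size)
{
    assert(key_size > 0 && ptr_size > 0);
    if (page_size < header_size + ptr_size)
        return 0;
    /* header + (m-1)*key + m*ptr <= page
       =>  m <= (page - header + key) / (key + ptr) */
    return (int)((page_size - header_size + key_size) / (key_size + ptr_size));
}
```

Under these assumptions, a 4096-byte block with an 8-byte header, 4-byte keys and 8-byte pointers allows an order of 341, which is why a B-tree over millions of keys is only a few levels deep.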

 

The definitions and properties above are all based on material found online. The following is a brief summary of the ranges of the number of branches and keys:

For a B-tree of order m:

1) for the root node, the number of subtrees (children, or branches) is in the range [2, m], and the number of keys is in the range [1, m-1].

2) for internal nodes, the number of branches is in the range [ceil(m/2), m], and the number of keys is in the range [ceil(m/2)-1, m-1].

3) for a B-tree with minimum degree t >= 2, the number of keys in the root node is in the range [1, 2t-1], the number of keys in a non-root node is in the range [t-1, 2t-1], and the number of branches is in the range [t, 2t].

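P.S. On the minimum degree: I personally think that for a B-tree of order m, t = ceil(m/2). (The two descriptions match exactly only when m is even, since the maximum number of branches is m in one and 2t in the other.)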
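The two ways of stating the bounds can be compared side by side. In the sketch below, BTBounds and both function names are hypothetical and only restate the ranges summarized above.

```c
#include <assert.h>

typedef struct {
    int min_children, max_children;  /* non-root internal nodes */
    int min_keys, max_keys;          /* non-root nodes */
} BTBounds;

/* "Order m" definition: children in [ceil(m/2), m]. */
BTBounds bt_bounds_from_order(int m)
{
    assert(m >= 3);
    int c = (m + 1) / 2;             /* ceil(m/2) */
    BTBounds b = { c, m, c - 1, m - 1 };
    return b;
}

/* "Minimum degree t" definition: children in [t, 2t]. */
BTBounds bt_bounds_from_min_degree(int t)
{
    assert(t >= 2);
    BTBounds b = { t, 2 * t, t - 1, 2 * t - 1 };
    return b;
}
```

Taking t = ceil(m/2), the lower bounds always agree. For m = 4 and t = 2 the upper bounds agree as well, while for m = 5 and t = 3 the order-based maximum is 5 children and the degree-based maximum is 6.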

 

B-tree node definition

#define MAXM 10                /* maximum order of the B-tree */
typedef int KeyType;           /* KeyType is the key type */
typedef struct BTNode          /* B-tree node type definition */
{
    int keynum;                /* number of keys currently held by the node */
    KeyType key[MAXM];         /* key[1..keynum] stores the keys; key[0] is unused */
    struct BTNode *parent;     /* pointer to the parent node */
    struct BTNode *ptr[MAXM];  /* child pointers, ptr[0..keynum] */
} BTTree;
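A node of this type has to be created with all of its child pointers empty before it can be used. The following is a minimal sketch; bt_new_node is a hypothetical helper that does not appear in the original article, and the struct is repeated only so that the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXM 10
typedef int KeyType;
typedef struct BTNode {
    int keynum;                 /* number of keys in the node */
    KeyType key[MAXM];          /* key[1..keynum]; key[0] unused */
    struct BTNode *parent;
    struct BTNode *ptr[MAXM];   /* ptr[0..keynum] */
} BTTree;

/* Allocate an empty node with no keys and no children.
   Returns NULL if memory allocation fails. */
BTTree *bt_new_node(BTTree *parent)
{
    BTTree *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->keynum = 0;
    node->parent = parent;
    for (int i = 0; i < MAXM; i++) {
        node->key[i] = 0;
        node->ptr[i] = NULL;    /* empty pointer field, as in a leaf */
    }
    return node;
}
```

The caller owns the returned node and releases it with free().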

 

 

2. B-tree complexity and height

The height formula given online differs from the result derived below; in either case the height is not the h = log2(n) of several other trees. Here t is the degree (the number of elements contained in each node), that is, the so-called order, and n is the total number of elements, or keys.

The formula for the height of a B-tree can be derived by adding up the minimum number of keys on each level. Take a B-tree with minimum degree t and h levels. The first level holds only the root, which has at least 1 key. The second level has at least 2 nodes, the third level at least 2t nodes, the fourth level at least 2t^2 nodes, and in general level k (k >= 2) has at least 2t^(k-2) nodes. Each non-root node holds at least t-1 keys. Adding up these minimums gives:

n >= 1 + (t-1) * (2 + 2t + 2t^2 + ... + 2t^(h-2))
   = 1 + 2(t-1) * (t^(h-1) - 1)/(t-1)
   = 2t^(h-1) - 1

Final result: h <= log_t((n+1)/2) + 1
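As a worked check of the bound h <= log_t((n+1)/2) + 1, take illustrative values that are not from the original article: t = 50 and n = 1,000,000 keys.

```latex
\begin{align*}
h &\le \log_{t}\frac{n+1}{2} + 1
   = \log_{50}(500000.5) + 1 \approx 3.35 + 1 = 4.35
   \quad\Rightarrow\quad h \le 4, \\
\text{check: }\; & 2\cdot 50^{3} - 1 = 249999 \;\le\; 10^{6} \;<\; 2\cdot 50^{4} - 1 = 12499999 .
\end{align*}
```

The second line confirms the first: four levels are possible with as few as 249,999 keys, but five levels would need at least 12,499,999 keys.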

 

 

The formula provided online differs from this one. I personally feel that its derivation is not rigorous and that there seems to be a mistake in the middle. I could not derive the same formula, and I do not know why.

 

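The time-complexity analysis is borrowed from the Internet and is not derived in detail here. Let M be the maximum number of subtrees of a non-leaf node and N the total number of keys. The tree has O(log_{M/2} N) levels, and a binary search inside one node costs O(log2 M) comparisons, so the total number of comparisons is O(log2 N). The search performance of a B-tree is therefore always equivalent to binary search (independent of the value of M), and it does not have the balance problem of an ordinary binary search tree.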
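The per-node step of that analysis is an ordinary binary search over the sorted keys of one node. The sketch below assumes the 1-based key[1..n] layout of the node definition; bt_node_search and its return convention are hypothetical.

```c
#include <assert.h>

/* Binary search among the sorted keys key[1..n] of one node (key[0] unused).
   Returns i >= 1 if key[i] == k. Otherwise returns -(c + 1), where c in
   [0, n] is the index of the child pointer to follow next. */
int bt_node_search(const int *key, int n, int k)
{
    assert(key != 0 && n >= 0);
    int lo = 1, hi = n;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (key[mid] == k)
            return mid;
        if (key[mid] < k)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    /* exactly hi keys are smaller than k, so the search continues in ptr[hi] */
    return -(hi + 1);
}
```

A negative result encodes the child index, so a caller can distinguish "found at position i" from "descend into child c" with a single return value.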

 

3. Basic operations

Only insertion and deletion are covered here. Search is much simpler and is not explained.
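Because both insertion and deletion start with a search, a minimal sketch of it may still be useful. It assumes the node layout defined earlier in the article (repeated here so the block compiles on its own); bt_search and its parameters are hypothetical, and a linear scan inside each node is used for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAXM 10
typedef int KeyType;
typedef struct BTNode {
    int keynum;                 /* number of keys in the node */
    KeyType key[MAXM];          /* key[1..keynum]; key[0] unused */
    struct BTNode *parent;
    struct BTNode *ptr[MAXM];   /* ptr[0..keynum] */
} BTTree;

/* Search for k starting at root.
   Found: returns the node holding k and stores its index in *pos.
   Not found: returns NULL, sets *pos to 0 and, if last is not NULL,
   stores the lowest node visited (where k would be inserted) in *last. */
BTTree *bt_search(BTTree *root, KeyType k, int *pos, BTTree **last)
{
    assert(pos != NULL);
    BTTree *node = root, *prev = NULL;
    while (node != NULL) {
        int i = 1;
        while (i <= node->keynum && node->key[i] < k)
            i++;                        /* first key that is >= k */
        if (i <= node->keynum && node->key[i] == k) {
            *pos = i;
            return node;
        }
        prev = node;
        node = node->ptr[i - 1];        /* subtree between key[i-1] and key[i] */
    }
    if (last != NULL)
        *last = prev;
    *pos = 0;
    return NULL;
}
```

Each loop iteration corresponds to one node read, so the number of iterations is bounded by the height of the tree.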

1) B-tree insertion (the key point is checking whether n <= m-1 still holds)

A. Use the B-tree search algorithm to find the insertion position of the key. If the key is found, it already exists in the tree and the operation returns directly. Otherwise, the search fails at a non-terminal node on the lowest level, and that node is where the key is inserted.

B. Check whether the node has room, that is, whether its number of keys n still satisfies n <= m-1 after the new key is added. If so, the node has a free position and the key k is inserted directly at the appropriate position in the node. If not, the node has no free position and must be split into two nodes.

The split works as follows. A new node is created. The keys of the original node together with k are sorted in ascending order and divided around the middle key into two parts. The keys in the left part stay in the old node, the keys in the right part go into the new node, and the middle key, together with a pointer to the new node, is inserted into the parent node. If the parent node then holds more than m-1 keys, it is split in the same way and its middle key moves up. This process may propagate all the way to the root node, in which case a new root is created and the tree grows by one level.
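The split step can be sketched as follows. This is an illustration under stated assumptions, not the article's own code: the arrays are given one spare slot (MAXM + 1) so that a node can briefly hold m keys, bt_split is a hypothetical name, and inserting the middle key and the new node into the parent is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXM 10
typedef int KeyType;
typedef struct BTNode {
    int keynum;
    KeyType key[MAXM + 1];          /* spare slot: briefly holds m keys */
    struct BTNode *parent;
    struct BTNode *ptr[MAXM + 1];
} BTTree;

/* Split an overfull node holding exactly m keys in key[1..m].
   With s = ceil(m/2): key[1..s-1] stay in node, key[s] is returned through
   *mid_key, and key[s+1..m] move to a new right node, which is returned.
   Returns NULL (node unchanged) if allocation fails. */
BTTree *bt_split(BTTree *node, int m, KeyType *mid_key)
{
    assert(node != NULL && mid_key != NULL);
    assert(m >= 3 && m <= MAXM && node->keynum == m);
    BTTree *right = malloc(sizeof *right);
    if (right == NULL)
        return NULL;
    for (int j = 0; j <= MAXM; j++) {
        right->key[j] = 0;
        right->ptr[j] = NULL;
    }
    int s = (m + 1) / 2;            /* position of the middle key */
    *mid_key = node->key[s];
    right->parent = node->parent;
    right->keynum = m - s;
    right->ptr[0] = node->ptr[s];
    for (int j = 1; j <= m - s; j++) {
        right->key[j] = node->key[s + j];
        right->ptr[j] = node->ptr[s + j];
    }
    for (int j = 0; j <= m - s; j++)    /* moved children get a new parent */
        if (right->ptr[j] != NULL)
            right->ptr[j]->parent = right;
    for (int j = s; j <= m; j++)
        node->ptr[j] = NULL;
    node->keynum = s - 1;
    return right;
}
```

For m = 3 and keys 10, 20, 30, the old node keeps 10, the new node receives 30, and 20 is handed back for insertion into the parent.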

 

 

 

 

 

 

2) B-tree deletion (the key point is comparing the number of keys n in the node being deleted from, in its sibling nodes and in its parent node against ceil(m/2)-1: n > ceil(m/2)-1, n = ceil(m/2)-1, or n < ceil(m/2)-1)

Deleting a key K from a B-tree can also be done in two steps.

A. Use the B-tree search algorithm to find the node that contains the key. The handling then depends on whether that node is a leaf node.

B. If the node is not a leaf node and the key to delete is the i-th key, key[i], of the node, find the smallest key Y in the subtree pointed to by son[i], replace key[i] with Y, and then delete Y from the leaf node. Deleting a key from a non-leaf node thus becomes deleting a key from a leaf node.
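Step B can be sketched as follows, assuming the node layout defined earlier in the article, where the child array is called ptr (the son[i] of the text). The names bt_min_node and bt_replace_with_successor are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAXM 10
typedef int KeyType;
typedef struct BTNode {
    int keynum;                 /* number of keys in the node */
    KeyType key[MAXM];          /* key[1..keynum]; key[0] unused */
    struct BTNode *parent;
    struct BTNode *ptr[MAXM];   /* ptr[0..keynum] */
} BTTree;

/* Leftmost bottom-level node of the subtree rooted at sub;
   its key[1] is the smallest key in that subtree. */
BTTree *bt_min_node(BTTree *sub)
{
    assert(sub != NULL);
    while (sub->ptr[0] != NULL)
        sub = sub->ptr[0];
    return sub;
}

/* Overwrite key[i] of a non-leaf node with the smallest key Y of the subtree
   ptr[i]. Returns the leaf that held Y; the caller must still delete that
   leaf's key[1]. */
BTTree *bt_replace_with_successor(BTTree *node, int i)
{
    assert(node != NULL && i >= 1 && i <= node->keynum);
    assert(node->ptr[i] != NULL);       /* node must not be a leaf */
    BTTree *leaf = bt_min_node(node->ptr[i]);
    node->key[i] = leaf->key[1];
    return leaf;
}
```

After the call the key Y is temporarily present twice, in the internal node and in the leaf, until the leaf deletion described next removes the second copy.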

 

A key is deleted from a leaf node of a B-tree as follows.

First, the key is removed from the leaf node. Then one of three cases applies:

A. If the node originally holds n >= ceil(m/2) keys, it still satisfies the B-tree definition after the key is removed. This is the simplest case: just remove the key from the node.

B. If the node originally holds n = ceil(m/2)-1 keys, it no longer satisfies the B-tree definition after the key is removed and must be adjusted.

The adjustment works as follows. If the left or right sibling has a "surplus" key, that is, the adjacent right (left) sibling holds more than ceil(m/2)-1 keys, move the smallest (largest) key of the right (left) sibling up into the parent node, and move the key in the parent node that is just smaller (larger) than the moved-up key down into the node from which the key was deleted.

C. If neither sibling has a "surplus" key, that is, the adjacent right (left) sibling holds exactly ceil(m/2)-1 keys, the situation is more complicated. The node from which the key was deleted must be merged with its left (or right) sibling and the separating key in the parent node: the remaining keys and pointers of the node, together with the key Ki in the parent node, are merged into the sibling pointed to by Ai (the parent's pointer to the sibling of the node from which the key was deleted). If the parent node is then left with fewer than ceil(m/2)-1 keys, the parent is processed in the same way. This may propagate up to the root node, in which case the whole tree loses one level.
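Case B, borrowing through the parent, can be sketched for the right-sibling direction as follows (the left-sibling case is symmetric). The node layout is the one defined earlier in the article; bt_borrow_from_right and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAXM 10
typedef int KeyType;
typedef struct BTNode {
    int keynum;                 /* number of keys in the node */
    KeyType key[MAXM];          /* key[1..keynum]; key[0] unused */
    struct BTNode *parent;
    struct BTNode *ptr[MAXM];   /* ptr[0..keynum] */
} BTTree;

/* The underfull node is parent->ptr[i-1], its right sibling is parent->ptr[i]
   and parent->key[i] separates them. Returns 1 if a key was borrowed, or 0 if
   the sibling has no surplus key (then a merge is needed instead). */
int bt_borrow_from_right(BTTree *parent, int i, int m)
{
    assert(parent != NULL && m >= 3 && m <= MAXM);
    assert(i >= 1 && i <= parent->keynum);
    BTTree *node = parent->ptr[i - 1];
    BTTree *right = parent->ptr[i];
    assert(node != NULL && right != NULL && node->keynum < MAXM - 1);
    int min_keys = (m + 1) / 2 - 1;         /* ceil(m/2) - 1 */
    if (right->keynum <= min_keys)
        return 0;

    /* the separator in the parent moves down to the end of node */
    node->keynum++;
    node->key[node->keynum] = parent->key[i];
    node->ptr[node->keynum] = right->ptr[0];
    if (right->ptr[0] != NULL)
        right->ptr[0]->parent = node;

    /* the smallest key of the right sibling moves up into the parent */
    parent->key[i] = right->key[1];

    /* close the gap in the right sibling */
    right->ptr[0] = right->ptr[1];
    for (int j = 1; j < right->keynum; j++) {
        right->key[j] = right->key[j + 1];
        right->ptr[j] = right->ptr[j + 1];
    }
    right->ptr[right->keynum] = NULL;
    right->keynum--;
    return 1;
}
```

The number of keys in the parent does not change, which is why this case never propagates upward.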

In short, if the key to delete is Ki in a non-terminal node, replace Ki with Y, the smallest key in the subtree pointed to by Ai, and then delete Y from the node that held it. Deleting any key can therefore be reduced to deleting a key on the lowest level.

 

Description:

A. If the node containing the deleted key holds at least ceil(m/2) keys, simply remove Ki and the corresponding pointer Ai from the node; the rest of the tree is unchanged.

B. If the node containing the deleted key holds exactly ceil(m/2)-1 keys and an adjacent sibling holds more than ceil(m/2)-1 keys, a key is moved through the parent node as described above.

C. If the node containing the deleted key and its adjacent siblings all hold exactly ceil(m/2)-1 keys, a merge is needed. Assume the node has a right sibling whose address is given by the parent's pointer Ai. After the key is removed, the remaining keys and pointers of the node, plus the key Ki in the parent node, are merged into the sibling pointed to by Ai (if there is no right sibling, merge into the left sibling). If this leaves the parent node with fewer than ceil(m/2)-1 keys, the same processing is applied to the parent, and so on.
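The merge in case C can be sketched as follows. Note one deliberate difference from the text: this sketch merges into the left node and frees the right one, which yields the same set of keys as merging into the sibling pointed to by Ai. The name bt_merge_with_right is hypothetical, and the sibling is assumed to have been allocated with malloc or calloc.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXM 10
typedef int KeyType;
typedef struct BTNode {
    int keynum;                 /* number of keys in the node */
    KeyType key[MAXM];          /* key[1..keynum]; key[0] unused */
    struct BTNode *parent;
    struct BTNode *ptr[MAXM];   /* ptr[0..keynum] */
} BTTree;

/* Merge parent->ptr[i-1], the separator parent->key[i] and the right sibling
   parent->ptr[i] into the left node, then remove key[i] and ptr[i] from the
   parent and free the right sibling. */
void bt_merge_with_right(BTTree *parent, int i, int m)
{
    assert(parent != NULL && m >= 3 && m <= MAXM);
    assert(i >= 1 && i <= parent->keynum);
    BTTree *left = parent->ptr[i - 1];
    BTTree *right = parent->ptr[i];
    assert(left != NULL && right != NULL);
    assert(left->keynum + 1 + right->keynum <= m - 1);  /* result must fit */

    int n = left->keynum;
    left->key[n + 1] = parent->key[i];      /* separator moves down */
    left->ptr[n + 1] = right->ptr[0];
    for (int j = 1; j <= right->keynum; j++) {
        left->key[n + 1 + j] = right->key[j];
        left->ptr[n + 1 + j] = right->ptr[j];
    }
    left->keynum = n + 1 + right->keynum;
    for (int j = n + 1; j <= left->keynum; j++)
        if (left->ptr[j] != NULL)
            left->ptr[j]->parent = left;    /* moved children get a new parent */

    for (int j = i; j < parent->keynum; j++) {
        parent->key[j] = parent->key[j + 1];
        parent->ptr[j] = parent->ptr[j + 1];
    }
    parent->ptr[parent->keynum] = NULL;
    parent->keynum--;                       /* parent may now be underfull */
    free(right);
}
```

Because the parent loses a key, the caller has to check the parent afterwards and repeat the borrow-or-merge decision one level up, which is how a deletion can shrink the tree by a level.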

 
