From Binary Search Trees to Balanced Binary Trees to Red-Black Trees, Part 3

Last Update:2015-05-29 Source: Internet

Author: User


This post explains the B-tree and its insertion and deletion operations, with flowcharts to make the procedures easy to follow. The series title (from binary search trees to balanced binary trees to red-black trees) does not mention B-trees, but all of these are dynamic search trees, so I have placed this post in the same series.

A B-tree is a balanced search tree designed for disks and other direct-access secondary storage devices. It generalizes the binary search tree in a natural way. A B-tree differs from a red-black tree in that a node is not limited to at most 2 children: it can have anywhere from a few to several thousand. Because each node branches so widely, a B-tree is much shorter than a red-black tree holding the same keys.

There are two common definitions of the B-tree; they describe essentially the same structure.

Definition 1 (by minimum degree t):

A B-tree T has the following properties:

1. Each node x has the following attributes:

a. x.n, the number of keys currently stored in node x.

b. The x.n keys themselves, stored in non-decreasing order.

c. x.leaf, a Boolean value that is TRUE if x is a leaf and FALSE otherwise.

2. Each internal node x also contains x.n+1 pointers to its children. (The x.n keys divide the key range into x.n+1 intervals, so number of children = number of keys + 1.) Leaf nodes have no children, so their child pointers are unused.

3. The x.n keys of a node separate the ranges of keys stored in its subtrees: every key in a subtree lies between the two keys on either side of the pointer to that subtree.

4. All leaves have the same depth, which is the height h of the tree.

5. The number of keys in a node is bounded below and above in terms of a fixed integer t >= 2, called the minimum degree of the B-tree:

a. Every node other than the root must have at least t-1 keys. The root of a non-empty tree must have at least one key.

b. Every node contains at most 2t-1 keys, so an internal node has at most 2t children. A node with exactly 2t-1 keys is said to be full.

struct BNode {
    int num;                         /* number of keys currently stored */
    int key[maximum];                /* keys stored in the node; maximum is the key capacity, 2t-1 */
    struct BNode *parent;            /* pointer to the parent node */
    struct BNode *ptr[maximum + 1];  /* child pointers: one more than the number of keys */
};
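Property 3 of the definition is what makes lookup work: inside each node the keys say which child to descend into. The following is a minimal sketch of that search, not the author's implementation. It reuses the field names of the struct above; the constants `T` and `MAX_KEYS` and the function name `btree_search` are assumptions introduced for illustration, and leaves are assumed to keep all child pointers NULL.

```c
#include <assert.h>
#include <stddef.h>

#define T 2                    /* minimum degree, assumed for this sketch */
#define MAX_KEYS (2 * T - 1)   /* a full node holds 2t-1 keys */

struct BNode {
    int num;                          /* number of keys in use */
    int key[MAX_KEYS];                /* keys in non-decreasing order */
    struct BNode *parent;             /* parent node, NULL for the root */
    struct BNode *ptr[MAX_KEYS + 1];  /* children; all NULL in a leaf */
};

/* Return the node that contains k, or NULL if k is not in the tree.
   On success *pos receives the index of k inside that node. */
struct BNode *btree_search(struct BNode *x, int k, int *pos)
{
    assert(pos != NULL);
    while (x != NULL) {
        int i = 0;
        while (i < x->num && k > x->key[i])   /* find the first key >= k */
            i++;
        if (i < x->num && k == x->key[i]) {
            *pos = i;
            return x;
        }
        x = x->ptr[i];   /* descend into the i-th subtree; NULL below a leaf */
    }
    return NULL;
}
```

For a tree whose root holds the key 3 with leaves {1, 2} and {4, 5}, searching for 5 visits the root, descends into the second child, and reports index 1 in that leaf.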

For example, the B-tree with t=2 is the simplest: each internal node has 2, 3, or 4 children, which makes it a 2-3-4 tree. Consider all valid B-trees with t=2 that store the keys {1,2,3,4,5}.

The root holds at least 1 key and at most 3. With these five keys the root cannot actually hold 3: it would then have 4 children, each with at least 1 key, for a total of at least 7 keys, which is more than 5. Every other node holds at least 1 key and at most 3.

A B-tree containing n >= 1 keys, with height h and minimum degree t >= 2, satisfies n >= 2t^(h-1) - 1.

(Note: the height h is counted from 1 here, so a B-tree consisting of only the root has h = 1. This differs slightly from the convention in Introduction to Algorithms.)

It follows that a B-tree with n keys and minimum degree t has height at most log_t((n+1)/2) + 1. Since the height grows only logarithmically in n, with the large base t, the B-tree is a very efficient data structure.
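The bound can be derived by counting the fewest keys a tree of height h can hold, with h counted from 1 as in the note above. The root holds at least 1 key, so level 2 has at least 2 nodes, level 3 at least 2t nodes, and in general level i has at least 2t^(i-2) nodes, each with at least t-1 keys. A sketch of the calculation:

```latex
\begin{align*}
n &\ge 1 + (t-1)\sum_{i=2}^{h} 2\,t^{\,i-2} \\
  &= 1 + 2(t-1)\cdot\frac{t^{\,h-1}-1}{t-1} \\
  &= 2\,t^{\,h-1} - 1, \\
\text{hence}\quad h &\le \log_t\frac{n+1}{2} + 1 .
\end{align*}
```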

Definition 2 (by order m):

A B-tree of order m satisfies the following conditions:

1. Each node has at most m subtrees (and therefore at most m-1 keys).

2. Every branch node other than the root has at least ⌈m/2⌉ subtrees (and therefore at least ⌈m/2⌉-1 keys).

3. The root has at least 2 subtrees, unless it is itself a leaf.

4. All leaf nodes are on the same level, and leaf nodes carry no key information.

5. A non-leaf node with j children contains exactly j-1 keys.

The two definitions differ only in viewpoint: the degree-based definition counts the keys in a node, while the order-based definition counts its subtrees. A B-tree with minimum degree t is a B-tree of order m = 2t.
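The correspondence between the two definitions can be checked by computing the key-count bounds each one gives for a non-root node. The sketch below is only an illustration; the type `KeyBounds` and the two function names are hypothetical and do not come from the original post.

```c
#include <assert.h>

struct KeyBounds {
    int min_keys;   /* fewest keys a non-root node may hold */
    int max_keys;   /* most keys any node may hold */
};

/* Bounds under the minimum-degree definition: t-1 to 2t-1 keys. */
struct KeyBounds bounds_by_degree(int t)
{
    assert(t >= 2);
    struct KeyBounds b = { t - 1, 2 * t - 1 };
    return b;
}

/* Bounds under the order definition: ceil(m/2)-1 to m-1 keys. */
struct KeyBounds bounds_by_order(int m)
{
    assert(m >= 3);
    struct KeyBounds b = { (m + 1) / 2 - 1, m - 1 };   /* (m+1)/2 is ceil(m/2) */
    return b;
}
```

Calling `bounds_by_degree(t)` and `bounds_by_order(2 * t)` gives the same pair for every t >= 2. An odd order such as m = 5 gives 2 to 4 keys, which no minimum degree t reproduces, so the order-based definition is slightly more general.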

The rest of this post focuses on the insertion and deletion operations of the B-tree.

B-tree key insertion: First check whether the key to be inserted is already present. If it is not, the search ends at the leaf where the key belongs. The main question is then whether that node has room for the key (the keys after the insertion point are shifted so the node stays in ascending order). If the node is full, it is split, and its middle key moves up into the parent. If the parent is full as well, it must be split in turn. In the worst case a single insertion causes splits all the way up to the root; a new root is then created and the height of the tree grows by one, h = h+1.

[Figure: flowchart of B-tree key insertion]
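The split step at the heart of insertion can be sketched in C as follows. This is one common formulation (splitting a full child under a parent that still has room), offered as an assumption-laden illustration and not as the procedure from the original flowchart. The constants `T` and `MAX_KEYS` and the function name `split_child` are hypothetical, and nodes are assumed to keep unused child pointers NULL.

```c
#include <assert.h>
#include <stdlib.h>

#define T 2                    /* minimum degree, assumed for this sketch */
#define MAX_KEYS (2 * T - 1)

struct BNode {
    int num;
    int key[MAX_KEYS];
    struct BNode *parent;
    struct BNode *ptr[MAX_KEYS + 1];
};

/* Split the full child parent->ptr[i]; the parent must not be full.
   Returns the new right sibling, or NULL if allocation fails. */
struct BNode *split_child(struct BNode *parent, int i)
{
    struct BNode *left = parent->ptr[i];
    assert(parent->num < MAX_KEYS && left != NULL && left->num == MAX_KEYS);

    struct BNode *right = calloc(1, sizeof *right);
    if (right == NULL)
        return NULL;
    right->parent = parent;

    /* The upper t-1 keys move to the new right node. */
    right->num = T - 1;
    for (int j = 0; j < T - 1; j++)
        right->key[j] = left->key[j + T];

    /* The upper t children move as well (all NULL when left is a leaf). */
    for (int j = 0; j < T; j++) {
        right->ptr[j] = left->ptr[j + T];
        left->ptr[j + T] = NULL;
        if (right->ptr[j] != NULL)
            right->ptr[j]->parent = right;
    }
    left->num = T - 1;   /* left keeps the lower t-1 keys */

    /* Open a slot in the parent for the middle key and the new child. */
    for (int j = parent->num; j > i; j--) {
        parent->key[j] = parent->key[j - 1];
        parent->ptr[j + 1] = parent->ptr[j];
    }
    parent->key[i] = left->key[T - 1];   /* middle key moves up */
    parent->ptr[i + 1] = right;
    parent->num++;
    return right;
}
```

Growing the tree at the root fits the same routine: allocate an empty node, make the old root its child 0, and call `split_child(new_root, 0)`. With the keys 1, 2, 3 in a full root and t = 2, the result is a root holding 2 with children holding 1 and 3.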

B-tree key deletion: First locate the key. Then consider whether the node still satisfies the B-tree size requirement once the key is removed, and how the removal affects the key's left and right subtrees. The flowchart splits into cases according to whether the node holding the key is a leaf. Subtrees are merged where necessary. In the most extreme case, deleting one key triggers merges from the bottom of the tree to the top, and the height of the B-tree shrinks by one, h = h-1.

[Figure: flowchart of B-tree key deletion]

Some additional notes on inserting and deleting keys in a B-tree:

1. A key is always inserted into a leaf, so there are no subtrees to consider; the only concern is that the node may exceed its capacity and have to be split. A deletion may happen in a non-leaf node, so its subtrees must be taken into account. A node that loses a key may also fall below the lower bound on the number of keys, in which case nodes may have to be merged.

2. Splitting a node gives its parent one more subtree, so the parent needs one more key. The middle key of the split node moves up into the parent, and the remaining keys divide into two halves that each satisfy the node size requirements.

3. When a deletion leaves a node below the lower bound, do not merge right away. First try to borrow a key indirectly from an adjacent sibling that has keys to spare (that is, a sibling that still meets the lower bound after lending one). "Indirectly" means that the node actually takes a key from the parent, and the parent replaces it with a key moved up from the sibling. Merging is considered only when the adjacent siblings hold exactly the minimum number of keys and have none to lend.

4. Merging nodes gives the parent one subtree fewer, so the parent needs one key fewer: one of the parent's keys moves down into the merged node. The merged node then holds (t-1) + 1 + (t-2) = 2t-2 keys, just under the maximum of 2t-1 keys per node. If the parent still meets the lower bound after giving up that key, the operation is finished. Otherwise the parent is treated as a node that has just lost a key, and the procedure returns to note 3.

5. On insertion, a split is unavoidable once a node exceeds the upper bound. On deletion, a merge is not the first choice when a node falls below the lower bound: borrow keys where possible so that the structure of the B-tree is preserved.
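Notes 3 to 5 can be sketched as a repair routine that runs after a deletion has left a child one key short: borrow through the parent if a neighbour can spare a key, otherwise merge. This is a minimal sketch under stated assumptions, not the author's code. All names (`T`, `MIN_KEYS`, `fix_underflow` and its helpers) are hypothetical, nodes are assumed to be heap-allocated, and unused child pointers are assumed to be NULL.

```c
#include <assert.h>
#include <stdlib.h>

#define T 2
#define MAX_KEYS (2 * T - 1)
#define MIN_KEYS (T - 1)

struct BNode {
    int num;
    int key[MAX_KEYS];
    struct BNode *parent;
    struct BNode *ptr[MAX_KEYS + 1];
};

/* Move one key from the left sibling, through the parent, into child i. */
static void borrow_from_left(struct BNode *p, int i)
{
    struct BNode *c = p->ptr[i], *s = p->ptr[i - 1];
    for (int j = c->num; j > 0; j--)        /* make room at the front of c */
        c->key[j] = c->key[j - 1];
    for (int j = c->num + 1; j > 0; j--)
        c->ptr[j] = c->ptr[j - 1];
    c->key[0] = p->key[i - 1];              /* parent key moves down */
    c->ptr[0] = s->ptr[s->num];             /* sibling's last child moves over */
    if (c->ptr[0] != NULL)
        c->ptr[0]->parent = c;
    p->key[i - 1] = s->key[s->num - 1];     /* sibling's last key moves up */
    s->ptr[s->num] = NULL;
    s->num--;
    c->num++;
}

/* Move one key from the right sibling, through the parent, into child i. */
static void borrow_from_right(struct BNode *p, int i)
{
    struct BNode *c = p->ptr[i], *s = p->ptr[i + 1];
    c->key[c->num] = p->key[i];             /* parent key moves down */
    c->ptr[c->num + 1] = s->ptr[0];         /* sibling's first child moves over */
    if (s->ptr[0] != NULL)
        s->ptr[0]->parent = c;
    p->key[i] = s->key[0];                  /* sibling's first key moves up */
    for (int j = 0; j < s->num - 1; j++)
        s->key[j] = s->key[j + 1];
    for (int j = 0; j < s->num; j++)
        s->ptr[j] = s->ptr[j + 1];
    s->ptr[s->num] = NULL;
    s->num--;
    c->num++;
}

/* Merge child i, parent key i, and child i+1 into child i. */
static void merge_children(struct BNode *p, int i)
{
    struct BNode *l = p->ptr[i], *r = p->ptr[i + 1];
    assert(l->num + 1 + r->num <= MAX_KEYS);
    l->key[l->num] = p->key[i];             /* parent key moves down */
    for (int j = 0; j < r->num; j++)
        l->key[l->num + 1 + j] = r->key[j];
    for (int j = 0; j <= r->num; j++) {
        l->ptr[l->num + 1 + j] = r->ptr[j];
        if (r->ptr[j] != NULL)
            r->ptr[j]->parent = l;
    }
    l->num += 1 + r->num;
    for (int j = i; j < p->num - 1; j++) {  /* close the gap in the parent */
        p->key[j] = p->key[j + 1];
        p->ptr[j + 1] = p->ptr[j + 2];
    }
    p->ptr[p->num] = NULL;
    p->num--;
    free(r);
}

/* Repair child i of p after a deletion left it below MIN_KEYS.
   Returns 1 if a merge happened, so the caller must re-check p itself. */
int fix_underflow(struct BNode *p, int i)
{
    assert(p->ptr[i] != NULL && p->ptr[i]->num < MIN_KEYS);
    if (i > 0 && p->ptr[i - 1]->num > MIN_KEYS) {
        borrow_from_left(p, i);
        return 0;
    }
    if (i < p->num && p->ptr[i + 1]->num > MIN_KEYS) {
        borrow_from_right(p, i);
        return 0;
    }
    merge_children(p, i > 0 ? i - 1 : i);
    return 1;
}
```

When `fix_underflow` returns 1 the parent has lost a key, so the caller repeats the check one level up. If the parent is the root and is left with no keys, its single remaining child becomes the new root, which is the h = h-1 case described above.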

B+ Tree

Although lookups in a B-tree are efficient, the B-tree does not address the inefficiency of traversing all elements. The B+ tree was introduced to solve this problem: traversing its leaf nodes is enough to traverse the whole tree, which suits the range queries that databases rely on. In addition, a B+ tree stores all key information in its leaves, so every lookup follows a path of the same length and query performance is more stable.

The B+ tree is a variant of the B-tree that arose from the needs of file systems. An order-m B+ tree differs from an order-m B-tree in the following ways:

1. A node with n subtrees contains n keys.

2. The leaf nodes together contain all of the keys, along with pointers to the corresponding records, and the leaves are linked to one another in ascending key order.

3. Every non-leaf node can be regarded as part of the index: it contains only the largest (or smallest) key of each of its subtrees.
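The linked leaves in item 2 are what make range queries cheap: once the first leaf is found, the scan just follows the links. The sketch below assumes a simplified leaf layout with a `next` pointer; the struct `Leaf`, the constant `LEAF_CAP`, and the function `range_scan` are hypothetical names chosen for illustration, and record pointers are left out.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_CAP 4   /* keys per leaf, chosen arbitrarily for this sketch */

struct Leaf {
    int num;              /* number of keys in use */
    int key[LEAF_CAP];    /* keys in ascending order */
    struct Leaf *next;    /* next leaf in ascending key order, NULL at the end */
};

/* Copy the keys in [lo, hi] into out, at most cap of them, starting the
   scan at `first` (the leaf where lo would be found). Returns the count. */
size_t range_scan(const struct Leaf *first, int lo, int hi,
                  int *out, size_t cap)
{
    assert(out != NULL);
    size_t n = 0;
    for (const struct Leaf *lf = first; lf != NULL; lf = lf->next) {
        for (int i = 0; i < lf->num; i++) {
            if (lf->key[i] < lo)
                continue;                 /* still before the range */
            if (lf->key[i] > hi || n == cap)
                return n;                 /* past the range, or buffer full */
            out[n++] = lf->key[i];
        }
    }
    return n;
}
```

In a full B+ tree the starting leaf would be located by descending the index nodes from the root; the scan itself never revisits them.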

References: Introduction to Algorithms

http://blog.csdn.net/v_JULY_v/article/details/6530142/

When reprinting, please credit the source: http://blog.csdn.net/u010498696/article/details/46236119
