Deep Understanding of B-Trees in C

Last Update: 2018-12-08 Source: Internet

Author: User


A B-tree is a balanced search tree designed for disks and other direct-access secondary storage devices. To define it, we need the notion of degree. In a directed graph, the number of arrows pointing into a node is its in-degree, and the number pointing out of it is its out-degree. In a tree, every node except the root has an in-degree of exactly 1; otherwise the structure would be a general graph rather than a tree. Therefore, when we speak of the degree of a tree, we mean the out-degree of its nodes, that is, the number of children a node has. With the concept of degree, we can define a B-tree simply as follows (assuming the minimum degree of the tree is M):
1. Every node other than the root has at least M-1 keys, and every node has at most 2M-1 keys;
2. Every internal node other than the root has at least M children, and every internal node has at most 2M children;
3. The root has at least two children, unless it is the only node in the tree (that is, the root is a leaf);
4. All leaves are at the same depth.
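The four rules above can be checked mechanically. The following is a minimal sketch, not code from the article: it assumes a hypothetical node type `struct node` shaped like the article's `struct btnode` (a key count, a key array, a child array, and a leaf flag), and the helpers `checkNode` and `isValidBTree` are likewise hypothetical. It checks the key-count bounds, the key order inside each node, and that all leaves share one depth; the child-count rules follow automatically, because an internal node with k keys has k + 1 children. It does not check key order across levels.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout for this sketch; MIN_DEGREE plays the role of M. */
#define MIN_DEGREE 2
#define MAX_KEYS (2 * MIN_DEGREE - 1)

struct node {
    int nkeys;                        /* number of keys in use */
    int keys[MAX_KEYS];               /* keys in ascending order */
    struct node *child[MAX_KEYS + 1]; /* nkeys + 1 children if internal */
    bool leaf;                        /* true if the node has no children */
};

/* Recursively check the subtree rooted at n, which sits at `depth`.
 * *leafDepth holds the depth of the first leaf seen (-1 before any leaf). */
static bool checkNode(const struct node *n, bool isRoot, int depth, int *leafDepth)
{
    /* Rule 1: key-count bounds (a non-empty root may hold a single key). */
    int minKeys = isRoot ? 1 : MIN_DEGREE - 1;
    if (n->nkeys < minKeys || n->nkeys > MAX_KEYS)
        return false;

    /* Keys inside one node must be strictly ascending. */
    for (int i = 1; i < n->nkeys; i++)
        if (n->keys[i - 1] >= n->keys[i])
            return false;

    if (n->leaf) {
        /* Rule 4: every leaf must be at the same depth. */
        if (*leafDepth == -1)
            *leafDepth = depth;
        return *leafDepth == depth;
    }

    /* Rules 2 and 3: k keys imply k + 1 children, so the key bounds above
     * already give M..2M children (at least 2 at the root). Each child
     * pointer must be present and its subtree valid. */
    for (int i = 0; i <= n->nkeys; i++)
        if (n->child[i] == NULL ||
            !checkNode(n->child[i], false, depth + 1, leafDepth))
            return false;
    return true;
}

/* Return true if the tree rooted at root satisfies the B-tree rules. */
bool isValidBTree(const struct node *root)
{
    int leafDepth = -1;
    if (root->leaf && root->nkeys == 0)
        return true; /* an empty tree (a single empty root leaf) is valid */
    return checkNode(root, true, 0, &leafDepth);
}
```

For example, a root [18] with leaves [10, 12] and [31] passes, while a tree whose leaves sit at two different depths, or whose non-root node has zero keys, is rejected.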

Let's look at the structure of a node.

Each node stores its keys and the pointers to its children. A node with n keys has n + 1 child pointers, so there is always one more pointer than there are keys.

From the definition of a B-tree, we can see some of its characteristics:
1. The tree is height-balanced: all leaves are at the same depth;
2. Keys are unique and stored in ascending order within each node, and the keys in a parent node act as the boundaries between the key ranges of its children;
3. A B-tree places records with nearby key values on the same disk page, exploiting locality of reference;
4. A B-tree guarantees that every non-root node is at least about half full (M-1 of at most 2M-1 keys), which keeps space utilization high.
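The boundary property in point 2 is what makes searching work: at each node we skip the keys smaller than the target, then either find the key at that position or descend into the child just before the first larger key. The sketch below is an illustration under assumptions rather than the article's code. The article's btrees.h declares a `struct searchResult`, but the code shown here contains no search function; `btreeSearch` and the standalone `struct node` type are hypothetical.

```c
#include <stddef.h>

/* Hypothetical node layout for this sketch, same shape as struct btnode. */
#define M 2

struct node {
    int keyNum;             /* number of keys in use */
    int k[2 * M - 1];       /* keys in ascending order */
    struct node *p[2 * M];  /* p[i] covers keys between k[i-1] and k[i] */
    int isleaf;             /* nonzero if the node is a leaf */
};

/* Where a key was found: the node and the key's index inside it. */
struct searchResult {
    struct node *ptr;
    int pos;
};

/* Search for key starting at node x. On success, res.ptr is the node that
 * holds the key and res.pos its index; on failure, res.ptr is NULL.
 * In a disk-based tree, each descent to a child would cost one disk read. */
struct searchResult btreeSearch(struct node *x, int key)
{
    struct searchResult res = { NULL, -1 };
    while (x != NULL) {
        int i = 0;
        /* Skip every key smaller than the target. */
        while (i < x->keyNum && key > x->k[i])
            i++;
        if (i < x->keyNum && key == x->k[i]) {
            res.ptr = x;
            res.pos = i;
            return res;
        }
        if (x->isleaf)
            break;      /* reached a leaf without finding the key */
        x = x->p[i];    /* descend into the child whose range holds key */
    }
    return res;
}
```

With a root [18] over leaves [10, 12] and [31], searching for 12 descends into the left leaf and reports position 1, while searching for 20 descends right and fails at the leaf.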

How large should a B-tree node be? To minimize disk operations, the node size is usually set to the size of one disk page. Because each page holds many keys, the tree usually has no more than three levels, which means finding a key takes at most three disk reads.
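To see why three levels are usually enough, we can count keys. With minimum degree M, a tree with L levels holds at least 2M^(L-1) - 1 keys (one key in the root, M-1 in every other node, matching the height bound in Introduction to Algorithms) and at most (2M)^L - 1 keys when every node is full. The helpers `minKeys` and `maxKeys` below are hypothetical, and the page capacity in the example is an assumed value, not a figure from the article.

```c
/* Hypothetical helpers: key-count bounds for a B-tree with minimum
 * degree m and exactly `levels` levels (levels >= 1). */

/* Sparsest tree: root with 1 key, every other node with m-1 keys.
 * Closed form: 2 * m^(levels-1) - 1. */
unsigned long long minKeys(unsigned long long m, int levels)
{
    unsigned long long power = 1;
    for (int i = 1; i < levels; i++)
        power *= m;
    return 2 * power - 1;
}

/* Fullest tree: every node has 2m-1 keys and 2m children.
 * Summing (2m-1) * (2m)^i over all levels gives (2m)^levels - 1. */
unsigned long long maxKeys(unsigned long long m, int levels)
{
    unsigned long long power = 1;
    for (int i = 0; i < levels; i++)
        power *= 2 * m;
    return power - 1;
}
```

For example, if one disk page held up to 999 keys (M = 500, an assumption), a three-level tree would hold between 499,999 and 999,999,999 keys. The article's demo value M = 2 gives only 7 to 63 keys for three levels.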
For the implementation, I follow Introduction to Algorithms and assume that:
1. The root node of the B-tree is always kept in main memory, so it never has to be read from disk; however, whenever the root changes, it must be written back to disk;
2. Any node passed as a parameter must already have been read from disk.

Besides keys and child pointers, each node should also carry information about the record each key refers to, such as its offset in the file; otherwise there would be no way to find the record. For simplicity, this implementation does not store that extra data in the node. The tree structure is defined in the header file btrees.h, shown below:

/* btrees.h */
#ifndef BTREES_H
#define BTREES_H

#include <stdbool.h>

#define M 2
/* Minimum degree of the B-tree, M >= 2.
 * Every non-root node has at least M-1 keys; every internal non-root node has at least M children.
 * Every node has at most 2M-1 keys, so an internal node has at most 2M children.
 */
struct btnode {                 /* B-tree node */
    int keyNum;                 /* number of keys currently in the node */
    int k[2 * M - 1];           /* keys, in ascending order */
    struct btnode *p[2 * M];    /* pointers to the subtrees */
    bool isleaf;                /* nonzero if the node is a leaf */
};

struct searchResult {
    struct btnode *ptr;         /* node that contains the key */
    int pos;                    /* index of the key within that node */
};

#endif

Below is the code that creates an empty tree, in the file btree.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>             /* memset */
#include "btrees.h"

/* Allocate and initialize a node (the argument is overwritten; callers may pass NULL) */
struct btnode *allocateNode(struct btnode *ptr) {
    int i, max;
    ptr = (struct btnode *) malloc(sizeof(struct btnode));
    if (!ptr) {
        fprintf(stderr, "allocation error!\n");
        exit(1);
    }
    max = 2 * M;
    for (i = 0; i < max; i++)
        ptr->p[i] = NULL;                           /* initialize child pointers */
    memset(ptr->k, 0, (max - 1) * sizeof(int));     /* initialize keys */
    ptr->keyNum = 0;
    ptr->isleaf = 0;
    return ptr;
}

/* Create an empty B-tree consisting of a single root node */
struct btnode *btreeCreate(struct btnode *root) {
    root = allocateNode(root);
    root->keyNum = 0;
    root->isleaf = 1;
    return root;
}
Insertion in a B-tree always takes place at a leaf. Because the number of keys a node can hold is limited, every non-root node of a B-tree with minimum degree M holds between M-1 and 2M-1 keys. For example, in a B-tree with minimum degree 2 (also known as a 2-3-4 tree), each node holds 1 to 3 keys.

First, locate the leaf where the key belongs. If that leaf has not yet reached the maximum number of keys (for example, when inserting 32), the key is simply inserted into it. If the leaf is already full, it must be split into two nodes, and its middle key moves up into the parent. In the worst case the parent is also full and must be split in turn, and such splits can propagate all the way up to the root. This bottom-up algorithm is not easy to implement.
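The easy case, inserting into a leaf that still has room, is just an insertion into a sorted array. This is a minimal sketch on a bare array rather than on the article's node type; `leafInsert` and its parameters are hypothetical names. It uses the same shift-right loop as the leaf branch of `btreeInsertNoneFull` in the code later in this article.

```c
#include <assert.h>

/* Hypothetical helper: insert key into the sorted array keys[0..*n-1],
 * which must have room for one more element (the leaf is not full). */
void leafInsert(int keys[], int *n, int capacity, int key)
{
    assert(*n < capacity);  /* caller guarantees the leaf is not full */
    int i = *n;
    /* Shift every key larger than the new one a slot to the right... */
    while (i > 0 && key < keys[i - 1]) {
        keys[i] = keys[i - 1];
        i--;
    }
    /* ...then drop the new key into the gap and update the count. */
    keys[i] = key;
    (*n)++;
}
```

For example, starting from the leaf [31] with room for 3 keys (M = 2), inserting 48 and then 45 yields [31, 45, 48], the same leaf that appears in step 5 of the walkthrough below.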
Introduction to Algorithms takes a different approach: split first. While searching down the tree for the insertion point, any full node we encounter is split before we descend into it. This guarantees that whenever a node is split, its parent has room for the promoted key, and the leaf that finally receives the new key is never full. Here is an example:

We build a B-tree by inserting keys one at a time, in the order {18, 31, 12, 10, 15, 48, 45, 47, 50, 52, 23, 30, 20}. Here is how it proceeds:
1. Create an empty B-tree;
2. Insert 18. The root is not full, so 18 goes straight into it: [18];
3. Inserting 31 and 12 is just as simple: [12, 18, 31];
4. Insert 10. The root is now full and must be split. Because the root has no parent, it is handled separately: we first create an empty node as the new root and then split the old root beneath it. The median 18 moves up, leaving the children [12] and [31], and 10 then goes into [12]. The tree is now [18] with children [10, 12] and [31];
5. Insert 15, 48, and 45. The target leaves are not full, so the keys are inserted directly, giving [18] with children [10, 12, 15] and [31, 45, 48];
6. Insert 47. The leaf where 47 belongs, [31, 45, 48], is full, so it is split before the insertion: 45 moves up and the root becomes [18, 45], with children [10, 12, 15], [31], and [48]. 47 is then inserted into [48], giving [47, 48].
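Steps 4 and 6 both split a full node around its median. The sketch below isolates that key movement on plain arrays; `splitKeys` is a hypothetical helper, and it mirrors only the key copying done by `btreeSplitChild` in the code that follows (child pointers and the parent update are left out).

```c
/* Minimum degree, as in btrees.h; a full node holds 2M-1 keys. */
#define M 2

/* Hypothetical helper: split the keys of a full node around its median.
 * The lower M-1 keys go to `left`, the upper M-1 keys go to `right`,
 * and the median is returned so the caller can insert it into the parent. */
int splitKeys(const int full[2 * M - 1], int left[M - 1], int right[M - 1])
{
    for (int i = 0; i < M - 1; i++) {
        left[i] = full[i];        /* keys smaller than the median */
        right[i] = full[i + M];   /* keys larger than the median */
    }
    return full[M - 1];           /* the median moves up one level */
}
```

Splitting [12, 18, 31] returns 18 and leaves [12] and [31], which is step 4; splitting [31, 45, 48] promotes 45, which is step 6.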

The remaining insertions work the same way, so I won't go through them in detail. The following source code is added to btree.c. At the end, I wrote a main function and a function that prints the tree in breadth-first order, so you can compare the output with the steps above yourself. The implementation follows Introduction to Algorithms and the blog post below:

http://hi.baidu.com/kurt023/blog/item/4c368d8b51c59ed3fc1f10cc.html

That blog post contains an implementation, but its B-tree definition uses the same number of pointers as keys, so I wrote my own.

// Split the full node child, which is parent->p[pos], into two nodes
// and move its median key up into parent
void btreeSplitChild(struct btnode *parent, int pos, struct btnode *child) {
    struct btnode *child2;
    int i;
    // Allocate space for the new right-hand node
    child2 = allocateNode(NULL);
    // The new node is a leaf exactly when the node being split is one
    child2->isleaf = child->isleaf;
    // The new node receives M-1 keys
    child2->keyNum = M - 1;
    // Copy the upper M-1 keys
    for (i = 0; i < M - 1; i++)
        child2->k[i] = child->k[i + M];
    // If it is not a leaf, also copy the upper M child pointers
    if (!child->isleaf)
        for (i = 0; i < M; i++)
            child2->p[i] = child->p[i + M];
    child->keyNum = M - 1;
    // Insert the median key into the parent as a separator:
    // the keys and pointers after the insertion point move one position right
    for (i = parent->keyNum; i > pos; i--) {
        parent->k[i] = parent->k[i - 1];
        parent->p[i + 1] = parent->p[i];
    }
    parent->k[pos] = child->k[M - 1];
    parent->keyNum++;
    parent->p[pos + 1] = child2;
}
/* Insert a key into a node that is known not to be full.
 * Note: the key must not already exist in the B-tree.
 */
void btreeInsertNoneFull(struct btnode *ptr, int data) {
    int i;
    struct btnode *child; // child node the key will descend into
    i = ptr->keyNum;
    if (ptr->isleaf) {
        // Leaf: shift larger keys one position right to make room
        while (i > 0 && data < ptr->k[i - 1]) {
            ptr->k[i] = ptr->k[i - 1];
            i--;
        }
        // Insert the key into the gap
        ptr->k[i] = data;
        ptr->keyNum++;
    }
    else { // Internal node: find the child whose range contains the key
        while (i > 0 && data < ptr->k[i - 1])
            i--;
        child = ptr->p[i];
        if (child->keyNum == 2 * M - 1) {
            // Split a full child before descending into it
            btreeSplitChild(ptr, i, child);
            // The median is now ptr->k[i]; choose the correct half
            if (data > ptr->k[i])
                i++;
        }
        child = ptr->p[i];
        btreeInsertNoneFull(child, data); // recurse into the subtree
    }
}
/* Insert a key into the tree and return the (possibly new) root */
struct btnode *btreeInsert(struct btnode *root, int data) {
    struct btnode *newRoot;
    /* If the root is full, split it under a new root; the tree grows by one level */
    if (root->keyNum == 2 * M - 1) {
        newRoot = allocateNode(NULL);
        newRoot->isleaf = 0;
        newRoot->keyNum = 0;
        newRoot->p[0] = root;
        btreeSplitChild(newRoot, 0, root);
        btreeInsertNoneFull(newRoot, data);
        return newRoot;
    }
    else { // The root is not full: insert directly
        btreeInsertNoneFull(root, data);
        return root;
    }
}
// Print the tree in breadth-first (level) order
void btreeDisplay(struct btnode *root) {
    int i, j, queueNum = 0;
    struct btnode *queue[20]; // fixed capacity, enough for the small demo tree
    struct btnode *current;
    // Enqueue the root
    queue[queueNum] = root;
    queueNum++;
    while (queueNum > 0) {
        // Dequeue the front node
        current = queue[0];
        queueNum--;
        // Move the remaining entries forward by one position
        for (i = 0; i < queueNum; i++)
            queue[i] = queue[i + 1];
        // Print the node's keys
        j = current->keyNum;
        printf("[");
        for (i = 0; i < j; i++)
            printf(" %d", current->k[i]);
        printf("]");
        // Enqueue the children of an internal node
        if (!current->isleaf) {
            for (i = 0; i <= current->keyNum; i++) {
                queue[queueNum] = current->p[i];
                queueNum++;
            }
        }
    }
    printf("\n");
}
int main(void)
{
    struct btnode *root;
    int a[13] = {18, 31, 12, 10, 15, 48, 45, 47, 50, 52, 23, 30, 20};
    int i;
    root = btreeCreate(NULL);
    for (i = 0; i < 13; i++) {
        root = btreeInsert(root, a[i]);
        btreeDisplay(root);
    }
    return 0;
}

Running result:

The same set of keys can produce different B-trees under different algorithms. For example, when a node holding the four keys [1, 2, 3, 4] splits, either 2 or 3 can be moved up. Even with the same algorithm, a different insertion order can produce a different tree.
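In a bottom-up scheme, an overfull node can briefly hold an even number of keys, and then there is no single median. The sketch below makes that choice explicit through a flag. `splitEven` is a hypothetical helper that illustrates the point above; it is not part of the article's implementation, which always splits a node of 2M-1 keys at its unique median.

```c
/* Hypothetical helper: split a sorted list of n keys (n even, n >= 2).
 * If promoteLeft is nonzero, the lower median keys[n/2 - 1] moves up;
 * otherwise the upper median keys[n/2] does. The remaining keys are
 * copied to left/right, with their counts returned in *nl and *nr. */
int splitEven(const int keys[], int n, int promoteLeft,
              int left[], int *nl, int right[], int *nr)
{
    int mid = promoteLeft ? n / 2 - 1 : n / 2; /* index of the promoted key */
    *nl = 0;
    *nr = 0;
    for (int i = 0; i < n; i++) {
        if (i < mid)
            left[(*nl)++] = keys[i];   /* keys below the promoted one */
        else if (i > mid)
            right[(*nr)++] = keys[i];  /* keys above the promoted one */
    }
    return keys[mid];
}
```

With [1, 2, 3, 4], promoting the lower median gives [1] | 2 | [3, 4], and promoting the upper median gives [1, 2] | 3 | [4]. Both are legal nodes when a node may hold up to three keys, which is why two correct implementations can build different trees from the same input.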
The attached source code was compiled on Linux.

This article is an English version of an article originally published in Chinese on aliyun.com.
