A Deep Understanding of B-Trees in C

Source: Internet
Author: User

A B-tree is a balanced search tree designed for disks and other direct-access storage devices, as shown in the following figure. The number of edges pointing into a node is called its in-degree, and the number of edges leaving it is called its out-degree. Every node of a tree other than the root has an in-degree of 1 (otherwise the structure would be a graph), so when we speak of the degree of a tree node we mean its out-degree, that is, the number of children of the node. With the concept of degree we can define a B-tree simply (assuming a minimum degree of M):
1. Every node other than the root has at least M-1 keys, and every node has at most 2M-1 keys;
2. Every node other than the root and the leaves has at least M children and at most 2M children;
3. The root has at least 2 children; the only exception is a tree that consists of the root alone, in which case the root has no children;
4. All leaf nodes are on the same level.
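
The key-count and child-count rules above can be restated as a few range checks. This is only a sketch: the helper names (`min_keys`, `max_keys`, `children_for_keys`, `key_count_ok`) are hypothetical and do not appear in the article's source, and the root rule assumes a non-empty tree.

```c
#include <assert.h>

/* Hypothetical helpers, not part of the article's source: they only
 * restate the key-count and child-count rules for minimum degree m. */

/* Smallest number of keys a node may hold. The root of a non-empty
 * tree is allowed to hold a single key (assumption: tree not empty). */
int min_keys(int m, int is_root) {
    assert(m >= 2);
    return is_root ? 1 : m - 1;
}

/* Largest number of keys any node may hold. */
int max_keys(int m) {
    assert(m >= 2);
    return 2 * m - 1;
}

/* An internal node with n keys always has n + 1 children. */
int children_for_keys(int n) {
    return n + 1;
}

/* Returns 1 if a node holding n keys respects the key-count rule. */
int key_count_ok(int m, int is_root, int n) {
    return n >= min_keys(m, is_root) && n <= max_keys(m);
}
```

For M = 2, a non-root node may hold 1 to 3 keys, and a full internal node therefore has 4 children, which matches the 2M upper bound in rule 2.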

Let's look at the structure of its node, as shown in the following illustration:


Each node holds keys and pointers to child nodes, and it is easy to see that a node has one more pointer than it has keys.

From the definition of the B-tree we can see some of its characteristics:
1. The tree is height-balanced: all leaf nodes are on the same level;
2. Keys are not duplicated and are sorted in ascending order; the keys of a parent node are the boundaries of its child nodes;
3. A B-tree places records with nearby key values on the same disk page, thereby exploiting the principle of locality of access;
4. A B-tree guarantees that a certain proportion of each node is full, which improves space utilization.

How do you determine the size of a B-tree node? To minimize disk operations, the node size is usually set to the size of one disk page. In general the tree is then no more than 3 levels high, which means that finding a key takes only about 3 disk operations.
In the implementation I follow "Introduction to Algorithms" and first make these assumptions:
1. The root of the B-tree is always in main memory, so it never needs a disk read, but a disk write is required whenever the root changes;
2. Any node passed as a parameter has already been read from disk once.
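
To make the page-size argument concrete, here is a rough sketch of the arithmetic. The function names and the node layout (a fixed header, 2t-1 keys, 2t child pointers) are assumptions for illustration only; real on-disk layouts vary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest minimum degree t such that a node with
 * 2t-1 keys, 2t child pointers and a fixed header fits in one page.
 * Derived from (2t-1)*key + 2t*ptr + header <= page. */
int degree_for_page(size_t page, size_t key, size_t ptr, size_t header) {
    assert(header <= page && key + ptr > 0);
    return (int)((page - header + key) / (2 * (key + ptr)));
}

/* Hypothetical helper: worst-case height (root at height 0) of a B-tree
 * with minimum degree t holding n keys. A tree of height h holds at
 * least 2*t^h - 1 keys, so h is the largest value with t^h <= (n+1)/2. */
int worst_case_height(long long n, long long t) {
    long long limit, power = 1;   /* power tracks t^h */
    int h = 0;
    assert(n >= 1 && t >= 2);
    limit = n / 2 + n % 2;        /* (n+1)/2 without overflow */
    while (power <= limit / t) {  /* same as power * t <= limit */
        power *= t;
        h++;
    }
    return h;
}
```

With illustrative figures of 4096-byte pages, 4-byte keys, 8-byte pointers and an 8-byte header, `degree_for_page` gives 170, and `worst_case_height(1000000, 170)` is 2, that is, three levels counting the root.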

The implementation is actually simplified. Besides keys and pointers, each node should also carry information about the record that each key corresponds to, such as a file offset, so that the record can be found. This additional data is not stored in the node in this implementation. The following is the definition of the tree structure; the file name is btrees.h and the contents are as follows:

The code is as follows:

/* btrees.h */
#define M 2
/* Minimum degree of the B-tree, M >= 2.
 * Every non-root node must hold at least M-1 keys. Every non-root internal node has at least M children.
 * Every node can hold at most 2M-1 keys, so an internal node can have at most 2M children.
 */
typedef int BOOL;
struct btnode {                /* B-tree node */
    int keynum;                /* number of keys in the node */
    int k[2*M-1];              /* keys */
    struct btnode *p[2*M];     /* pointers to subtrees */
    BOOL isleaf;
};
struct searchresult {
    struct btnode *ptr;        /* pointer to the node that holds the data */
    int pos;                   /* position of the data within the node */
};
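
The `searchresult` structure is declared in the header but is not used by the code in this article. A minimal sketch of a search routine that could fill it in is given here; `btreesearch` is a hypothetical name, the struct definitions are repeated only to keep the sketch self-contained, and a NULL `ptr` is assumed to mean "not found".

```c
#include <assert.h>
#include <stddef.h>

#define M 2

struct btnode {
    int keynum;
    int k[2*M-1];
    struct btnode *p[2*M];
    int isleaf;
};

struct searchresult {
    struct btnode *ptr;   /* node holding the key, or NULL if absent */
    int pos;              /* index of the key within that node */
};

/* Hypothetical search routine: walk down from the given node,
 * using the sorted keys of each node as boundaries between children. */
struct searchresult btreesearch(struct btnode *node, int key) {
    struct searchresult r = { NULL, -1 };
    while (node != NULL) {
        int i = 0;
        /* find the first key that is not smaller than the target */
        while (i < node->keynum && key > node->k[i])
            i++;
        if (i < node->keynum && key == node->k[i]) {
            r.ptr = node;
            r.pos = i;
            return r;
        }
        if (node->isleaf)
            return r;          /* reached a leaf without finding the key */
        node = node->p[i];     /* descend into the i-th subtree */
    }
    return r;
}
```

In a disk-based tree each descent into `node->p[i]` would correspond to one disk read, which is why the height of the tree determines the cost of a lookup.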

The code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memset */
#include "btrees.h"
/* Allocate space for a node */
struct btnode *allocatenode(struct btnode *ptr) {
    int i, max;
    ptr = (struct btnode *)malloc(sizeof(struct btnode));
    if (!ptr) {
        printf("Allocated error!\n");
        exit(1);
    }
    max = 2*M;
    for (i = 0; i < max; i++)
        ptr->p[i] = NULL;   /* initialize the pointers */
    memset(ptr->k, 0, (max-1) * sizeof(int));   /* initialize the keys */
    return ptr;
}
/* Create an empty B-tree consisting of just a root node */
struct btnode *btreecreate(struct btnode *root) {
    root = allocatenode(root);
    root->keynum = 0;
    root->isleaf = 1;
    return root;
}

Insertion into a B-tree always happens at a leaf node. Because the number of keys in a B-tree node is limited, a node of a B-tree with minimum degree M holds between M-1 and 2M-1 keys. For example, the figure below shows a B-tree with a minimum degree of 2 (also known as a 2-3-4 tree), in which each node holds 1 to 3 keys.

First locate the position to insert at. If the number of keys in the leaf has not reached the upper limit, for example when inserting 32, it is simple: just insert the key. If the number of keys in the leaf has reached the upper limit, the leaf must be split into 2 nodes and its middle key moved up into the parent node. In the extreme case the parent node is also full and must be split in turn, and the splitting may propagate all the way up to the root. This algorithm is not easy to implement.
"Introduction to Algorithms" takes a different approach: split first. While searching for the insertion position, any full node encountered along the way is split before descending into it. This guarantees that when the data is finally inserted into a leaf, the parent of that leaf is not full. Let's take a look at the following example:

We build a B-tree by inserting keys one at a time. The keys are {18, 31, 12, 10, 15, 48, 45, 47, 50, 52, 23, 30, 20}. Let's look at the process step by step:
1. Create an empty B-tree;
2. Insert 18; the root is not full at this point, as follows:


3. Inserting 31 and 12 works the same way and is just as simple, as follows:


4. Insert 10. This time the root node is full and must be split. The root is a special case because it has no parent node, so it is handled separately: first create an empty node to serve as the new root, then split, as follows:


5. Insert 15, 48, and 45 in turn. The target nodes are not full, so the keys are inserted directly, as follows:


6. Insert 47. This time the leaf node is full, so we must split it first and then insert, as follows:

The remaining insertions follow the same pattern and are not repeated here. The source code below is added to btree.c. At the end I wrote a main function and a function that displays the tree breadth-first, so the results can be compared with the steps above. The implementation draws on "Introduction to Algorithms" and on this blog:

http://hi.baidu.com/kurt023/blog/item/4c368d8b51c59ed3fc1f10cc.html

That blog already has an implementation, but its B-tree definition uses the same number of pointers as keys, so I rewrote it myself.

The code is as follows:

// Purpose: split a child node that holds the maximum number of keys
void btreesplitchild(struct btnode *parent, int pos, struct btnode *child) {
    struct btnode *child2 = NULL;
    int i;
    // allocate space for the newly split-off node
    child2 = allocatenode(child2);
    // the new node is on the same level as the node being split
    child2->isleaf = child->isleaf;
    // set the number of keys
    child2->keynum = M-1;
    // copy the upper M-1 keys
    for (i = 0; i < M-1; i++)
        child2->k[i] = child->k[i+M];
    // if it is not a leaf node, copy the pointers as well
    if (!child->isleaf)
        for (i = 0; i < M; i++)
            child2->p[i] = child->p[i+M];
    child->keynum = M-1;
    // insert the middle key into the parent node as an index:
    // keys and pointers after the insertion point move back one position
    for (i = parent->keynum; i > pos; i--) {
        parent->k[i] = parent->k[i-1];
        parent->p[i+1] = parent->p[i];
    }
    parent->k[pos] = child->k[M-1];
    parent->keynum++;
    parent->p[pos+1] = child2;
}
/* Purpose: insert a key into a node that is not full
 * Note: the caller must guarantee that the key does not already exist in the B-tree
 */
void btreeinsertnonefull(struct btnode *ptr, int data) {
    int i;
    struct btnode *child;   // the child node to descend into
    i = ptr->keynum;
    // if it is a leaf node, insert the data directly
    if (ptr->isleaf) {
        while ((i > 0) && (data < ptr->k[i-1])) {
            ptr->k[i] = ptr->k[i-1];
            i--;
        }
        // insert the data
        ptr->k[i] = data;
        ptr->keynum++;
    }
    else {   // not a leaf node: find the child the data belongs in and insert there
        while ((i > 0) && (data < ptr->k[i-1]))
            i--;
        child = ptr->p[i];
        if (child->keynum == 2*M-1) {
            btreesplitchild(ptr, i, child);
            if (data > ptr->k[i])
                i++;
        }
        child = ptr->p[i];
        btreeinsertnonefull(child, data);   // recurse into the subtree
    }
}
/* Insert a key */
struct btnode *btreeinsert(struct btnode *root, int data) {
    struct btnode *new = NULL;
    /* Check whether the root node is full; if so, split it and create a new root node */
    if (root->keynum == 2*M-1) {
        new = allocatenode(new);
        new->isleaf = 0;
        new->keynum = 0;
        new->p[0] = root;
        btreesplitchild(new, 0, root);
        btreeinsertnonefull(new, data);
        return new;
    }
    else {   // the root has not reached the maximum number of keys, insert directly
        btreeinsertnonefull(root, data);
        return root;
    }
}
// Purpose: display the tree breadth-first
void btreedisplay(struct btnode *root) {
    int i, queuenum = 0;
    int j;
    struct btnode *queue[20];
    struct btnode *current;
    // enqueue the root
    queue[queuenum] = root;
    queuenum++;
    while (queuenum > 0) {
        // dequeue
        current = queue[0];
        queuenum--;
        // shift the remaining elements forward by one position
        for (i = 0; i < queuenum; i++)
            queue[i] = queue[i+1];
        // show the node
        j = current->keynum;
        printf("[");
        for (i = 0; i < j; i++) {
            printf("%d ", current->k[i]);
        }
        printf("] ");
        // enqueue the child nodes
        if (current != NULL && current->isleaf != 1) {
            for (i = 0; i <= (current->keynum); i++) {
                queue[queuenum] = current->p[i];
                queuenum++;
            }
        }
    }
    printf("\n");
}
int main()
{
    struct btnode *root = NULL;
    int a[13] = {18, 31, 12, 10, 15, 48, 45, 47, 50, 52, 23, 30, 20};
    int i;
    root = btreecreate(root);
    for (i = 0; i < 13; i++) {
        root = btreeinsert(root, a[i]);
        btreedisplay(root);
    }
    return 0;
}

Run result:

For the same batch of keys, the B-trees generated by different algorithms may differ. For example, when a 4-key node [1,2,3,4] splits, either 2 or 3 can be moved up. Even with the same algorithm, a different insertion order may produce a different tree. The attachment is the source code, compiled under Linux.
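
The remark about the 4-key node can be illustrated with a small sketch. `split_keys` is a hypothetical helper that works on plain arrays rather than on the article's `btnode`; a 4-key node only arises in variants that split after overflow, whereas the code above splits full 3-key nodes before inserting.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split n sorted keys around index mid.
 * keys[mid] is promoted to the parent; the keys before it go to the
 * left node and the keys after it go to the right node.
 * Returns the promoted key and stores the sizes of both halves. */
int split_keys(const int *keys, int n, int mid,
               int *left, int *nleft, int *right, int *nright) {
    assert(n > 0 && mid >= 0 && mid < n);
    *nleft = mid;
    *nright = n - mid - 1;
    memcpy(left, keys, (size_t)*nleft * sizeof(int));
    memcpy(right, keys + mid + 1, (size_t)*nright * sizeof(int));
    return keys[mid];
}
```

Splitting [1,2,3,4] with `mid` = 1 promotes 2 and leaves [1] and [3,4]; with `mid` = 2 it promotes 3 and leaves [1,2] and [4]. Both results are valid, which is why two correct implementations can build different trees from the same keys.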
