Introduction to B-Tree
Properties of a B-Tree:
A B-tree T is a rooted tree with the following properties:
1. Each node x has the following properties:
A. x.n stores the number of keys currently held in node x;
B. The x.n keys themselves, x.key[1], x.key[2], ..., x.key[x.n], are stored in non-decreasing order, so that x.key[1] ≤ x.key[2] ≤ ... ≤ x.key[x.n];
C. x.isleaf is a boolean value that is true if x is a leaf node and false if x is an internal node.
2. Each internal node x also contains x.n+1 pointers x.c[1], x.c[2], ..., x.c[x.n+1] to its children. Leaf nodes have no children.
3. The keys x.key[i] separate the ranges of keys stored in the subtrees: every key in the subtree rooted at x.c[i] is at most x.key[i], and every key in the subtree rooted at x.c[i+1] is at least x.key[i].
4. All leaf nodes have the same depth, which is the height h of the tree.
5. The number of keys a node may contain has a lower and an upper bound. Both are expressed in terms of a fixed integer t ≥ 2 called the minimum degree of the B-tree:
A. Every node other than the root must have at least t-1 keys, so every internal node other than the root has at least t children. If the tree is not empty, the root has at least one key;
B. Every node may contain at most 2t-1 keys, so an internal node has at most 2t children. A node that holds exactly 2t-1 keys is called full.
When t = 2, the result is a 2-3-4 tree: every node other than the root holds between t-1 = 1 and 2t-1 = 3 keys, so every internal node has between t = 2 and 2t = 4 children.
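The node layout and the key-count bounds above can be sketched in Python. This is only an illustration under stated assumptions: the names BTreeNode and check_node are hypothetical, nodes live in memory rather than on disk, and the leaf flag is derived from the absence of children instead of being stored as a separate field.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """In-memory stand-in for one B-tree node (hypothetical name)."""
    keys: List[int] = field(default_factory=list)               # x.key[1..x.n]
    children: List["BTreeNode"] = field(default_factory=list)   # x.c[1..x.n+1]

    @property
    def n(self) -> int:
        # Number of keys currently stored in the node.
        return len(self.keys)

    @property
    def is_leaf(self) -> bool:
        # Assumption of this sketch: a node without children is a leaf.
        return not self.children


def check_node(x: BTreeNode, t: int, is_root: bool = False) -> bool:
    """Check the per-node properties 1, 2 and 5 for a node of a nonempty tree."""
    min_keys = 1 if is_root else t - 1
    if not (min_keys <= x.n <= 2 * t - 1):
        return False                      # violates the bounds of property 5
    if any(a > b for a, b in zip(x.keys, x.keys[1:])):
        return False                      # keys are not in non-decreasing order
    return x.is_leaf or len(x.children) == x.n + 1
```

With t = 2, a leaf holding the keys [1, 2, 3] passes the check and is full, while a node holding four keys fails it.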
Height of a B-Tree:
Theorem: If n ≥ 1, then for any B-tree T with n keys, height h, and minimum degree t ≥ 2, we have h ≤ log_t((n+1)/2). The height is therefore O(log_t n).
As with an ordinary BST, the cost of most B-tree operations (measured in the number of disk reads and writes) is proportional to the height of the tree. The advantage of a B-tree over a red-black tree is that it is much shallower: examining a node costs one disk access, so a smaller height means far fewer disk accesses.
Proof: The root of a nonempty B-tree contains at least one key, and every other node contains at least t-1 keys. So a B-tree of height h has at least 2 nodes at depth 1, at least 2t nodes at depth 2, and in general at least 2t^(h-1) nodes at depth h. The number of keys therefore satisfies n ≥ 1 + (t-1)(2 + 2t + ... + 2t^(h-1)) = 2t^h - 1, which gives t^h ≤ (n+1)/2 and hence h ≤ log_t((n+1)/2).
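The height bound can be checked numerically. The sketch below assumes the minimum key count 2t^h - 1 for a B-tree of minimum degree t and height h, which is equivalent to the theorem's h ≤ log_t((n+1)/2). The function names are hypothetical, and integer arithmetic is used instead of floating-point logarithms to avoid rounding problems.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can hold."""
    return 2 * t**h - 1


def max_height(t: int, n: int) -> int:
    """Largest height h allowed for n >= 1 keys, i.e. floor(log_t((n+1)/2))."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h
```

For example, max_height(2, 15) is 3, whereas max_height(1001, 10**9) is only 2: with a minimum degree of 1001, a billion keys fit in a tree whose leaves are two levels below the root.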
Basic operations on B-Tree:
The following conventions are assumed throughout:
The root of the B-tree is always kept in main memory, so it never needs to be read from disk. It must be written to disk whenever it is modified;
Any node passed as a parameter must already have been read from disk before the call.
All operations are one-pass algorithms that proceed downward from the root of the tree and never back up.
Searching a B-tree
Searching is similar to searching a BST, except that at each internal node x we make an (x.n + 1)-way branching decision instead of a two-way one.
The procedure accesses O(h) = O(log_t n) disk pages. The CPU time used is O(th) = O(t log_t n).
In the following code, x is a pointer to the root of the subtree to search and key is the value being searched for. If key is in the tree, the procedure returns the ordered pair (y, i) consisting of a node y and an index i such that y.key[i] == key; otherwise it returns NULL.
B-tree-search (Node *x, int key) {
    i = 1;
    while (i <= x.n && key > x.key[i])     // find the smallest index i with key <= x.key[i]; if there is none, i = x.n + 1
        i++;
    if (i <= x.n && key == x.key[i])       // found: return the node and the index
        return (x, i);
    else if (x.isleaf == True)             // not found and x is a leaf: the search fails
        return NULL;
    else {
        Disk-Read (x.c[i]);                // not found and x is internal: read the child from disk (one disk access)
        return B-tree-search (x.c[i], key);
    }
}
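The search procedure translates fairly directly into runnable Python. The following is a minimal sketch, not a definitive implementation: the Node class and btree_search are hypothetical names, indices are 0-based rather than 1-based, and nodes are plain in-memory objects, so the Disk-Read step is omitted.

```python
class Node:
    """In-memory B-tree node (hypothetical helper for this sketch)."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

    @property
    def leaf(self):
        # Assumption of this sketch: a node without children is a leaf.
        return not self.children


def btree_search(x, key):
    """Return (node, index) with node.keys[index] == key, or None."""
    i = 0
    # Find the smallest index i with key <= x.keys[i].
    while i < len(x.keys) and key > x.keys[i]:
        i += 1
    if i < len(x.keys) and key == x.keys[i]:
        return x, i
    if x.leaf:
        return None                       # reached a leaf without a match
    # A disk-based tree would read x.children[i] from disk here.
    return btree_search(x.children[i], key)
```

The scan inside a node is linear, which matches the O(t) per-node CPU cost stated above. Because the keys of a node are sorted, a binary search (for example with the standard bisect module) would also work.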
Create an empty B-tree
To build a B-tree, first create an empty root node with B-tree-create, and then call B-tree-insert to add keys. B-tree-create relies on the helper procedure Allocate-node, which allocates a disk page for a new node in O(1) time.
B-tree-create requires O(1) disk operations and O(1) CPU time.
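B-tree-create (Node *newtree) {
    newtree = allocate-node ();    // allocate a disk page for the root
    newtree.isleaf = True;         // an empty root is a leaf
    newtree.n = 0;                 // it holds no keys yet
    Disk-Write (newtree);
}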
Insert a keyword into the B-tree
Insertion into a B-tree works the same way as insertion into a 2-3-4 tree (a 2-3-4 tree is a B-tree with t = 2). As we travel down from the root, every full node encountered along the way is split, so whenever a node has to be split, its parent is guaranteed not to be full. In addition, a new key is always inserted into an existing leaf node.
Splitting nodes in a B-tree
The procedure B-tree-split-child takes the parameters x and i, where x is the nonfull parent of the node being split and i is the index of that node within x; that is, the node being split is x.c[i]. The procedure splits x.c[i] into two nodes and moves the median key of x.c[i] up into x. If the node being split is the root, the height of the tree grows by 1. Splitting the root is the only operation that increases the height of a B-tree.
B-tree-split-child (Node *x, int i) {
    y = x.c[i];                            // y is the full node to be split
    z = allocate-node ();                  // z is the newly created node
    z.isleaf = y.isleaf;                   // z is a leaf exactly when y is
    z.n = t-1;                             // z receives t-1 keys
    for j = (1 to t-1)                     // z takes the upper t-1 keys of y
        z.key[j] = y.key[j+t];
    if (y.isleaf == False)                 // if y is internal, z also takes the upper t children of y
        for j = (1 to t)
            z.c[j] = y.c[j+t];
    y.n = t-1;                             // y keeps its lower t-1 keys
    for j = (x.n+1 downto i+1)             // shift the children of x right by one to make room for z
        x.c[j+1] = x.c[j];
    x.c[i+1] = z;                          // link z into the parent
    for j = (x.n downto i)                 // shift the keys of x right by one to make room for the median
        x.key[j+1] = x.key[j];
    x.key[i] = y.key[t];                   // move the median key of y up into x
    x.n++;                                 // update the key count of x
    Disk-Write (y);                        // write the modified nodes to disk
    Disk-Write (z);
    Disk-Write (x);
}
Figure: splitting a full node in a B-tree with t = 4. The node y = x.c[i] is split into two nodes, y and z, and the median key S of y moves up into its parent x.
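To make the split concrete, here is a small Python sketch of the same step. The names Node and split_child are hypothetical, indices are 0-based, and the nodes are in-memory objects, so the allocation and Disk-Write calls of the pseudocode have no counterpart here. List slicing and list.insert replace the explicit shifting loops.

```python
class Node:
    """In-memory B-tree node (hypothetical helper for this sketch)."""

    def __init__(self, leaf=True, keys=None):
        self.leaf = leaf
        self.keys = keys or []
        self.children = []


def split_child(x, i, t):
    """Split the full child x.children[i] of the nonfull node x."""
    y = x.children[i]
    z = Node(leaf=y.leaf)
    median = y.keys[t - 1]          # the t-th key of y moves up into x
    z.keys = y.keys[t:]             # z takes the upper t-1 keys
    y.keys = y.keys[:t - 1]         # y keeps the lower t-1 keys
    if not y.leaf:
        z.children = y.children[t:]     # z takes the upper t children
        y.children = y.children[:t]
    x.children.insert(i + 1, z)     # z becomes the right neighbour of y
    x.keys.insert(i, median)


# Usage: a t = 4 split in which the median key S moves up into the parent.
x = Node(leaf=False, keys=["N", "W"])
x.children = [Node(keys=list("DEF")), Node(keys=list("PQRSTUV")), Node(keys=list("XYZ"))]
split_child(x, 1, 4)
print(x.keys)                       # ['N', 'S', 'W']
```

After the call, the split child holds P, Q, R and its new right sibling holds T, U, V.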
Inserting a key into a B-tree in a single pass down the tree
The procedure B-tree-insert calls B-tree-split-child to guarantee that the recursion never descends into a full node.
B-tree-insert (Node *root, int key) {
    r = root;
    if (r.n == 2t-1) {                     // the root is full: make an empty node s its parent and split it
        s = allocate-node ();
        root = s;
        s.isleaf = False;
        s.n = 0;
        s.c[1] = r;
        B-tree-split-child (s, 1);
        B-tree-insert-nonfull (s, key);    // after the split, insert into the nonfull node s
    } else
        B-tree-insert-nonfull (r, key);    // the root is not full: insert directly
}
The auxiliary procedure B-tree-insert-nonfull inserts key into node x, which is assumed to be nonfull when the procedure is called. The way B-tree-insert and the recursive calls of B-tree-insert-nonfull operate guarantees that this assumption holds.
B-tree-insert-nonfull (Node *x, int key) {
    i = x.n;
    if (x.isleaf == True) {                // x is a leaf: insert key directly
        while (i >= 1 && key < x.key[i]) { // shift larger keys right to open a slot for key
            x.key[i+1] = x.key[i];
            i--;
        }
        x.key[i+1] = key;                  // insert key
        x.n++;                             // update the key count of x
        Disk-Write (x);                    // write x to disk
    } else {                               // x is internal: find the child that should receive key
        while (i >= 1 && key < x.key[i])
            i--;
        i++;
        Disk-Read (x.c[i]);                // read that child from disk
        if (x.c[i].n == 2t-1) {            // the child is full: split it first
            B-tree-split-child (x, i);
            if (key > x.key[i])            // decide which of the two halves receives key
                i++;
        }
        B-tree-insert-nonfull (x.c[i], key);   // recurse into the nonfull child
    }
}
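The three insertion procedures can be combined into one runnable Python sketch. It rests on several assumptions: the class and function names (Node, BTree, inorder) are hypothetical, indices are 0-based, and the tree lives entirely in memory, so the Disk-Read and Disk-Write calls are dropped. The inorder helper is not part of the original procedures and is included only to inspect the result.

```python
class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []
        self.children = []


class BTree:
    def __init__(self, t=2):
        self.t = t
        self.root = Node(leaf=True)           # empty tree: the root is a leaf

    def insert(self, key):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:     # full root: grow the tree by one level
            s = Node(leaf=False)
            s.children.append(r)
            self.root = s
            self._split_child(s, 0)
            self._insert_nonfull(s, key)
        else:
            self._insert_nonfull(r, key)

    def _split_child(self, x, i):
        t = self.t
        y = x.children[i]
        z = Node(leaf=y.leaf)
        median = y.keys[t - 1]
        z.keys, y.keys = y.keys[t:], y.keys[:t - 1]
        if not y.leaf:
            z.children, y.children = y.children[t:], y.children[:t]
        x.children.insert(i + 1, z)
        x.keys.insert(i, median)

    def _insert_nonfull(self, x, key):
        i = len(x.keys) - 1
        while i >= 0 and key < x.keys[i]:     # locate the slot or child for key
            i -= 1
        i += 1
        if x.leaf:
            x.keys.insert(i, key)
            return
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)           # never descend into a full node
            if key > x.keys[i]:
                i += 1
        self._insert_nonfull(x.children[i], key)


def inorder(x):
    """Return all keys of the subtree rooted at x in sorted order."""
    if x.leaf:
        return list(x.keys)
    out = []
    for i, k in enumerate(x.keys):
        out += inorder(x.children[i]) + [k]
    return out + inorder(x.children[-1])


tree = BTree(t=2)
for k in [3, 1, 4, 2]:
    tree.insert(k)
print(inorder(tree.root))                     # [1, 2, 3, 4]
```

Because every full node is split on the way down, the insertion never has to walk back up the tree, in line with the one-pass convention stated earlier.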
Delete a keyword from the B-tree
[CLRS] [CH 18] B-Tree