Originally from: http://blog.csdn.net/cjfeii/article/details/10858721
1. B+ Tree Index Overview
In the previous article, we discussed several important points about indexes:
A) An index is a data structure stored on disk that speeds up record lookups and scans.
B) A sorted index tree (ISAM) accelerates record lookup by storing pointers to pages.
C) Maintaining a sorted index tree is expensive. ISAM works around this by creating overflow pages, but too many overflow pages degrade query performance from logarithmic time to a linear traversal.
Below we introduce a more robust and more popular data structure, the B+ tree, as an extension of ISAM.
In general, the B+ tree is an efficient disk-based data structure that mainly stores (key, value) pairs. It supports efficient key lookup and efficient range iteration.
The B+ tree provides these features:
A) Fast record lookup
B) Fast record traversal
C) The sorted tree structure is maintained without overflow pages
The key idea behind the B+ tree is to use an ordered, balanced tree in place of the sorted tree of ISAM.
2. Definition of the B+ Tree
A B+ tree is a tree that uses disk pages as its nodes. The nodes of a B+ tree are divided into leaf nodes and interior nodes.
Since each node is exactly one page on disk, the terms node and page are used interchangeably when talking about B+ trees.
2.1 Leaf Node
A leaf node holds data entries (equivalent to records) in the form (key, value). All leaf nodes are also linked together into a page list. A leaf node of the B+ tree looks like this:
Here is the abstract data structure definition of a leaf node:
struct LeafNode {
    vector<Key> keys;
    vector<Value> values;
    PagePointer next_page;
};
For any leaf node p, the following condition holds:
p.keys.size() == p.values.size()
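To make the leaf node definition concrete, here is a minimal in-memory sketch. The `int` keys, `std::string` values, the raw pointer standing in for the page pointer, and the helper names `leaf_is_consistent` and `leaf_lookup` are assumptions chosen for illustration; a real implementation would work on disk pages. The lookup also assumes the keys are kept sorted, as required in section 2.3.1.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Assumptions for illustration only: int keys, string values, and a raw
// pointer standing in for the on-disk PagePointer.
using Key = int;
using Value = std::string;

struct LeafNode {
    std::vector<Key> keys;      // kept in ascending order
    std::vector<Value> values;  // values[i] belongs to keys[i]
    LeafNode* next_page = nullptr;
};

// The condition from the text: exactly one value per key.
bool leaf_is_consistent(const LeafNode& p) {
    return p.keys.size() == p.values.size();
}

// Binary search inside one leaf. Returns a pointer to the stored value,
// or nullptr if the key is not present in this leaf.
const Value* leaf_lookup(const LeafNode& p, Key key) {
    auto it = std::lower_bound(p.keys.begin(), p.keys.end(), key);
    if (it == p.keys.end() || *it != key) return nullptr;
    return &p.values[it - p.keys.begin()];
}
```

Because the keys inside a leaf are sorted, a lookup within one page can use binary search instead of a linear scan.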
2.2 Interior Node
Interior nodes are organized into a tree that starts at the root node (the root node is itself an interior node). They speed up the search for a leaf node by storing a series of keys.
An interior node holds a series of keys and page pointers, structured as follows:
Here is the abstract data structure definition of an interior node:
struct InteriorNode {
    vector<Key> keys;
    vector<PagePointer> pointers;
};
For any interior node p, the following condition holds:
p.keys.size() + 1 == p.pointers.size()
One more definition: neighboring pointers.
For a key k_i, we define before(k_i) as the page pointer immediately before k_i, and after(k_i) as the page pointer immediately after k_i. In other words:
p.before(k_i) = p.pointers[i]
p.after(k_i) = p.pointers[i+1]
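The neighboring-pointer definition can be sketched as two small helpers. Here `PagePointer` is assumed to be a plain integer page id, and `before`/`after` take the index i of the key rather than the key itself; both choices are only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Key = int;
using PagePointer = int;  // hypothetical page id, for illustration only

struct InteriorNode {
    std::vector<Key> keys;
    std::vector<PagePointer> pointers;  // always keys.size() + 1 entries
};

// The condition from the text: one more pointer than keys.
bool interior_is_consistent(const InteriorNode& p) {
    return p.keys.size() + 1 == p.pointers.size();
}

// before(k_i): the pointer immediately to the left of keys[i].
PagePointer before(const InteriorNode& p, std::size_t i) {
    return p.pointers[i];
}

// after(k_i): the pointer immediately to the right of keys[i].
PagePointer after(const InteriorNode& p, std::size_t i) {
    return p.pointers[i + 1];
}
```

Note that after(k_i) and before(k_(i+1)) are the same pointer, which is why n keys need exactly n + 1 pointers.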
2.3 Properties and Constraints of the B+ Tree
2.3.1 The keys within a node are ordered
If p is a node in the B+ tree, then p.keys must always be kept in sorted order.
2.3.2 The nodes themselves are ordered by key
The B+ tree is an ordered tree, which shows up in two ways:
A) Leaf nodes are ordered:
∀p ∈ LeafNode, ∀k ∈ p.keys, ∀k′ ∈ p.next_page.keys: k < k′
The leaf nodes form an ordered list, which allows efficient traversal of (key, value) pairs across leaf nodes.
B) Interior nodes are ordered:
For every key k in an interior node, its neighboring pointers before(k) and after(k) satisfy the following conditions:
k > max(keys(before(k)))
k ≤ min(keys(after(k)))
In other words, k lies between the keys under before(k) and the keys under after(k).
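The ordered leaf list is what makes range iteration cheap. The sketch below is one possible way to scan a key range; the function name `range_scan`, the `int`/`std::string` types, and the raw `next_page` pointer are assumptions for illustration. The scan assumes it is started at the leaf that a search for the lower bound would return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Key = int;
using Value = std::string;

struct LeafNode {
    std::vector<Key> keys;
    std::vector<Value> values;
    LeafNode* next_page = nullptr;
};

// Collects every (key, value) with lo <= key <= hi by walking the leaf list.
// `start` is assumed to be the leaf where a search for `lo` would end.
std::vector<std::pair<Key, Value>> range_scan(const LeafNode* start, Key lo, Key hi) {
    std::vector<std::pair<Key, Value>> out;
    for (const LeafNode* p = start; p != nullptr; p = p->next_page) {
        for (std::size_t i = 0; i < p->keys.size(); ++i) {
            // Leaves are ordered, so nothing after this key can match.
            if (p->keys[i] > hi) return out;
            if (p->keys[i] >= lo) out.emplace_back(p->keys[i], p->values[i]);
        }
    }
    return out;
}
```

The scan never goes back up to the interior nodes: once the first leaf is found, following `next_page` is enough.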
2.3.3 The B+ tree is a balanced tree
The B+ tree is balanced: every path from the root node to a leaf node has the same length.
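The balance property can be checked mechanically. The following sketch uses a stripped-down, hypothetical `Node` type (only a leaf flag and child pointers) and a made-up helper `uniform_depth`; it is meant as a debugging aid, not as part of the tree operations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical in-memory node with only what the balance check needs.
struct Node {
    bool is_leaf = true;
    std::vector<Node*> pointers;  // children; used by interior nodes only
};

// Returns the common root-to-leaf path length below p,
// or -1 if two leaves under p sit at different depths.
int uniform_depth(const Node* p) {
    if (p->is_leaf) return 0;
    int depth = -2;  // -2 means "no child visited yet"
    for (const Node* child : p->pointers) {
        int d = uniform_depth(child);
        if (d < 0 || (depth != -2 && d != depth)) return -1;
        depth = d;
    }
    return depth + 1;
}
```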
2.3.4 B+ tree nodes are sufficiently filled
The B+ tree allows its nodes to be partially filled. A fill factor parameter sets the minimum occupancy of every non-root node.
If a non-root node is filled below that minimum, we say the node underflows. Only the root node of a B+ tree is allowed to underflow.
Here is an example of an invalid B+ tree. Suppose we have defined the following parameters:
Capacity of each node: 4 keys
Fill factor: 50%
Consider a tree that is balanced and sorted, with the following structure:
It still does not satisfy the 50% fill factor defined above.
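With a capacity of 4 keys and a fill factor of 50%, every non-root node needs at least 2 keys. A small sketch of that check follows; the helper names `min_keys` and `underflows` are made up here, and rounding the minimum up is an assumption, since the text does not say how fractional minimums are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Minimum number of keys a non-root node must hold.
// Rounding up is an assumption made for this sketch.
std::size_t min_keys(std::size_t capacity, double fill_factor) {
    return static_cast<std::size_t>(std::ceil(capacity * fill_factor));
}

// A node underflows if it is not the root and holds fewer keys than the minimum.
bool underflows(std::size_t key_count, bool is_root,
                std::size_t capacity, double fill_factor) {
    return !is_root && key_count < min_keys(capacity, fill_factor);
}
```

For the parameters above, a non-root node with a single key underflows, while a root node with a single key is still valid.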
3. B+ Tree Search and Insert
The main operations of the B+ tree are:
/**
 * Finds the leaf node that _should_ contain the entry with the specified key.
 */
LeafNode search(Node root, Key key)

/**
 * Inserts a key/value pair into the tree.
 * Returns the root of the new tree, which _may_ be different
 * from the old root node.
 */
InteriorNode insert_into_tree(InteriorNode root, Key newkey, Value val)
The insert algorithm must ensure that the tree still satisfies all the B+ tree properties and constraints after the operation completes.
3.1 Searching
The B+ tree search algorithm is a straightforward tree lookup:
LeafNode search(Node p, Key key) {
    if (p is LeafNode)
        return p;
    else {
        if (key < p.keys[0])
            return search(before(p.keys[0]), key);
        else if (key >= p.keys[-1])
            return search(after(p.keys[-1]), key);
        else {
            let i be such that p.keys[i] <= key < p.keys[i+1]
            return search(after(p.keys[i]), key);
        }
    }
}
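The search pseudocode can be turned into a compact runnable sketch. The single `Node` struct with an `is_leaf` flag and raw child pointers is an assumption made to keep the example in memory; the three branches of the pseudocode collapse into one `std::upper_bound` call, which returns the index of the first key strictly greater than the search key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Key = int;

// Hypothetical in-memory node: one struct serves as leaf and interior node.
struct Node {
    bool is_leaf = true;
    std::vector<Key> keys;
    std::vector<Node*> pointers;  // interior only: keys.size() + 1 children
};

// Descends from p to the leaf that should contain `key`.
Node* search(Node* p, Key key) {
    while (!p->is_leaf) {
        // i = number of keys <= key, so:
        //   key <  keys[0]                -> pointers[0]   = before(keys[0])
        //   keys[j] <= key < keys[j+1]    -> pointers[j+1] = after(keys[j])
        //   key >= last key               -> last pointer  = after(last key)
        std::size_t i =
            std::upper_bound(p->keys.begin(), p->keys.end(), key) - p->keys.begin();
        p = p->pointers[i];
    }
    return p;
}
```

A key equal to a separator goes to the right, which matches the condition k ≤ min(keys(after(k))) from section 2.3.2.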
3.2 Inserting
Inserting into a B+ tree is tricky. It is not as simple as inserting into an AVL tree, because the B+ tree also has to deal with node overflow and underflow.
The insert algorithm starts as follows:
1) Find the target leaf node for the insert
2) Try to perform the insert in the target leaf node
InteriorNode insert_into_tree(InteriorNode root, Key newkey, Value val) {
    LeafNode leaf = search(root, newkey);
    return insert_into_node(leaf, newkey, val);
}
Here, insert_into_node does the following:
/**
 * Tries to insert the (newkey, val) pair into the node.
 *
 * If 'target' is an interior node, then 'val' must be a page pointer.
 */
InteriorNode insert_into_node(Node target, newkey, val) {
    if (... Case 1 ...) {
        /* handle Case 1 */
    } else if (... Case 2 ...) {
        /* handle Case 2 */
    } else if (... Case 3 ...) {
        /* handle Case 3 */
    }
}
The three cases are:
A) The target leaf node has enough space to hold the key
B) The target leaf node is full, but its parent node has enough space to hold a key
C) Both the target leaf node and its parent node are full
Case 1:
This is the simplest case: the entry (newkey, value) is inserted directly into the target leaf node.
A) The root node does not change
B) Disk I/O is not discussed here. The whole operation takes place within a single page, and a buffer manager (cache manager) lets us treat every node as if it were stored in memory.
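Case 1 can be sketched as a sorted insert into two parallel vectors. The function name `insert_into_leaf`, the explicit `capacity` parameter, and the `int`/`std::string` types are assumptions for illustration; the sketch also assumes that keys are distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Key = int;
using Value = std::string;

struct LeafNode {
    std::vector<Key> keys;
    std::vector<Value> values;
};

// Handles Case 1 only. Returns false and leaves the leaf untouched when the
// leaf is already full, so the caller can fall through to Case 2 or Case 3.
bool insert_into_leaf(LeafNode& leaf, Key key, const Value& val, std::size_t capacity) {
    if (leaf.keys.size() >= capacity) return false;
    // Find the position that keeps leaf.keys sorted.
    auto pos = std::lower_bound(leaf.keys.begin(), leaf.keys.end(), key) - leaf.keys.begin();
    leaf.keys.insert(leaf.keys.begin() + pos, key);
    leaf.values.insert(leaf.values.begin() + pos, val);
    return true;
}
```

Inserting at the same position in both vectors preserves the rule that values[i] belongs to keys[i].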
Case 2:
In this case, the target node is full, but its parent node has enough space to hold one more key.
A) Create a sibling of the target node, called new_target, and insert it right after the target node.
B) Distribute all the entries in the target node, together with the entry being added, between the target node and new_target. Since the target node was full before the distribution, neither of the two nodes will underflow afterwards.
C) Insert the pointer to new_target, (k, p) = (leaf2.keys[0], address[leaf2]), where leaf2 is new_target, into the parent of the target node. Because the parent node has enough space to hold one more key, it does not overflow.
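The leaf split of Case 2 might look like the sketch below. The function name `split_leaf` is made up, the caller is assumed to have allocated `new_target`, and giving the larger half to the old node is just one possible way to distribute the entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Key = int;
using Value = std::string;

struct LeafNode {
    std::vector<Key> keys;
    std::vector<Value> values;
    LeafNode* next_page = nullptr;
};

// Splits a full `target` while inserting (key, val). Fills the caller-provided
// `new_target` and returns the key k that the parent must receive together
// with a pointer to new_target.
Key split_leaf(LeafNode& target, LeafNode& new_target, Key key, const Value& val) {
    // Step B: gather the old entries plus the new one, in key order.
    auto pos = std::lower_bound(target.keys.begin(), target.keys.end(), key) - target.keys.begin();
    target.keys.insert(target.keys.begin() + pos, key);
    target.values.insert(target.values.begin() + pos, val);

    // Keep the first half in target and move the rest to new_target.
    std::size_t half = (target.keys.size() + 1) / 2;
    new_target.keys.assign(target.keys.begin() + half, target.keys.end());
    new_target.values.assign(target.values.begin() + half, target.values.end());
    target.keys.resize(half);
    target.values.resize(half);

    // Step A: link new_target right after target in the leaf list.
    new_target.next_page = target.next_page;
    target.next_page = &new_target;

    // Step C: k is the first key of new_target.
    return new_target.keys.front();
}
```

With a capacity of 4 keys, the split distributes 5 entries as 3 and 2, so both nodes stay at or above a 50% fill factor.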
Case 3:
This is the case where both target and parent[target] are full. We need to recursively insert a new key into the ancestors of target. It can even happen that the root node does not have enough space for the new key, in which case we have to split the root and create a new node as the root of the B+ tree.
The details are as follows:
A) Create a new node new_target and insert it right after target.
B) Distribute the entries of target between target and new_target.
Now we need to insert the pointer to new_target, (k, p) = (leaf2.keys[0], address[leaf2]), where leaf2 is new_target, into parent[target], but parent[target] is full. So we split it as well:
A) Let target_parent = parent[target]
B) Let all_keys = sorted(target_parent.keys ∪ {k})
C) Allocate a new node: new_interior
D) Let i = floor(all_keys.size() / 2) and
middle_key = all_keys[i]
E) Store all_keys[0..i-1] in target_parent and all_keys[i+1..n] in new_interior, where n is the last index of all_keys. Each side also keeps the pointers neighboring its keys.
F) If target_parent is the root, create a new node as grandparent; otherwise grandparent = parent[target_parent].
G) Recurse:
insert_into_node(grandparent, middle_key, address[new_interior])
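Steps B) to E) of the interior split can be sketched as follows. The function name `split_interior`, the integer page ids, and the caller-allocated `new_interior` are assumptions for illustration; the recursion into the grandparent and the creation of a new root are left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Key = int;
using PagePointer = int;  // hypothetical page id, for illustration only

struct InteriorNode {
    std::vector<Key> keys;
    std::vector<PagePointer> pointers;  // keys.size() + 1 entries
};

// Inserts (k, p) into a full target_parent by splitting it. Fills new_interior
// and returns middle_key, which must then be inserted into the grandparent
// together with the address of new_interior.
Key split_interior(InteriorNode& target_parent, InteriorNode& new_interior,
                   Key k, PagePointer p) {
    // Step B: all_keys = sorted(target_parent.keys union {k}); p becomes after(k).
    std::vector<Key> all_keys = target_parent.keys;
    std::vector<PagePointer> all_ptrs = target_parent.pointers;
    auto pos = std::upper_bound(all_keys.begin(), all_keys.end(), k) - all_keys.begin();
    all_keys.insert(all_keys.begin() + pos, k);
    all_ptrs.insert(all_ptrs.begin() + pos + 1, p);

    // Step D: the middle key moves up and is stored in neither half.
    std::size_t i = all_keys.size() / 2;
    Key middle_key = all_keys[i];

    // Step E: keys left of the middle stay, keys right of it move.
    target_parent.keys.assign(all_keys.begin(), all_keys.begin() + i);
    target_parent.pointers.assign(all_ptrs.begin(), all_ptrs.begin() + i + 1);
    new_interior.keys.assign(all_keys.begin() + i + 1, all_keys.end());
    new_interior.pointers.assign(all_ptrs.begin() + i + 1, all_ptrs.end());
    return middle_key;
}
```

Unlike a leaf split, where the separator key is copied into the parent and also stays in the new leaf, the middle key of an interior split is moved up and removed from both halves.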
4. Other B+ Tree Topics
A) The B+ tree also supports efficient deletion. The deletion algorithm is the inverse of the insert algorithm: nodes are merged to avoid underflow, and when a merge occurs, the corresponding (key, page pointer) is deleted recursively from the parent node.
B) If all data entries are stored in a sequential file that is sorted by key, the file can be loaded into a B+ tree very efficiently (bulk loading).
C) A B+ tree can be used as a disk-based sorting algorithm.
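As a rough illustration of point B), the sketch below packs already-sorted entries into a linked list of leaves. The function name `build_leaf_level` and the `per_leaf` parameter are made up for this example; it builds the leaf level only, does not construct the interior levels, and the last leaf may end up below the fill factor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Key = int;
using Value = std::string;

struct LeafNode {
    std::vector<Key> keys;
    std::vector<Value> values;
    LeafNode* next_page = nullptr;
};

// Packs entries that are already sorted by key into leaves holding at most
// `per_leaf` entries each (per_leaf must be > 0) and links them into a list.
std::vector<std::unique_ptr<LeafNode>> build_leaf_level(
        const std::vector<std::pair<Key, Value>>& sorted_entries,
        std::size_t per_leaf) {
    std::vector<std::unique_ptr<LeafNode>> leaves;
    for (std::size_t i = 0; i < sorted_entries.size(); ++i) {
        if (i % per_leaf == 0) {
            // Start a new leaf and link the previous leaf to it.
            leaves.push_back(std::unique_ptr<LeafNode>(new LeafNode()));
            if (leaves.size() > 1)
                leaves[leaves.size() - 2]->next_page = leaves.back().get();
        }
        leaves.back()->keys.push_back(sorted_entries[i].first);
        leaves.back()->values.push_back(sorted_entries[i].second);
    }
    return leaves;
}
```

Because the input is sorted, each leaf is written once and never split, which is where the efficiency of loading from a sorted file comes from.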
[Repost] Database Systems: B+ Tree Index