Principles of MySql index algorithms (easy to understand, only B-tree)
When I first started learning, Baidu went to check it. However, I found that many of them were too complicated to understand. I would like to summarize them in the light of each article (I suggest reading the text, but it doesn't matter if I don't understand it, then look at the execution steps of the graph and then combine the text to make everything clearer)
B-tree, B is balance, which is generally usedDatabase index. Using the B-tree structure can significantly reduce the intermediate process experienced when locating records, thus accelerating access. B + tree is a variant of B-tree. The famous MySQL uses B + tree to implement its index structure.
Why does the database use this structure?
In general, the index itself is also very large and cannot be all stored in the memory. Therefore, the index is often stored on the disk as an index file. In this way, disk I/O consumption is required during index search. Compared with memory access, I/O access consumes several orders of magnitude, therefore, evaluating a data structure as the index's most important indicator is the progressive complexity of the number of disk I/O operations during the search process. In other words, the structure of the index should minimize the number of disk I/O accesses during the search process.
To achieve this purpose, the disk needs to be read on demand, requiring that the length of each pre-read is generally an integer multiple of the page. In addition, the database system sets the size of a node to equal to one page, so that each node can be fully loaded only once I/O. Each time you create a node, you can directly apply for a page space to ensure that a node is physically stored on a page. In addition, the computer storage allocation is page-aligned, A node only needs one I/O operation. Setting the m value in B-tree to a very large value will reduce the height of the tree and facilitate a full loading.
M-way search tree (focus on the step chart)
First, we will introduce the m-way query tree. as its name implies, each node of a tree has a degree smaller than or equal to m.
Therefore, its nature is as follows:
The number of key values for each node is less than m. The degree of each node is less than or equal to m. The key values of the subtree are arranged in order to be completely less than or greater than or between parent nodes.
B-tree
B-tree is also called a balanced multi-path search tree. The features of a m-Level B-tree are as follows:
(Ceil (x) is a function that obtains the upper limit)
1) each node in the tree has up to m children;
2) Besides the root node and the leaf node, each other node has at least ceil (m/2) Children;
3) if the root node is not a leaf node, there are at least two children (in special cases: the root node without children, that is, the root node is a leaf node, the entire tree has only one root node );
4) All the leaf nodes appear on the same layer. The leaf node does not contain any keyword information (it can be seen as an external node or a node that fails to query. In fact, these nodes do not exist, all pointers pointing to these nodes are null );
5) each non-terminal node contains n keywords (n, P0, K1, P1, K2, P2, ......, Kn, Pn ). Where:
A) Ki (I = 1... n) is the keyword, And the keyword is sorted in order K (I-1) <Ki.
B) Pi is the point pointing to the sub-tree root, and the pointer P (I-1) points to all the nodes of the sub-tree are less than Ki, but greater than K (I-1 ).
C) The number of keywords n must meet the following requirements: ceil (m/2)-1 <= n <= S-1.
Each node in B-tree can contain a large amount of keyword information and branches based on the actual situation (of course, it cannot exceed the size of the disk block, depending on the disk drive, generally, the block size is 1 kb ~ 4 K or so); this reduces the depth of the tree, which means to find an element, as long as few nodes read from the external disk into the memory, quickly access the data to be searched.
The following describes an instance of Level 5 B-tree (as shown in): (focus on)
It meets the preceding conditions: Except for root nodes and leaf nodes, each other node must have at least ceil (5/2) = 3 children (at least 2 keywords ); of course, up to five children (up to four keywords ). The keywords are uppercase letters in ascending order.
Insert operation: When an element is inserted, check whether it exists in B-tree. If it does not exist, end at the leaf node and insert the new element into the leaf node. Note: if the leaf node has enough space, you need to move the leaf node to the right to find the New Keyword element. If the space is full, there is not enough space to add new elements, split the node, split half of the keyword elements into the new adjacent right node, and move the intermediate keyword elements to the parent node (of course, if the space of the parent node is full, the operation also needs to be split.) When the key elements in the node are moved to the right, the related pointer needs to be shifted to the right. If a new element is inserted at the root node and the space is full, the split operation is performed. In this way, the intermediate keyword element in the original root node is moved up to the new root node, which leads to an additional layer of tree height.
Let's use an example to explain it step by step. Insert the following letters to an empty 5-level B-tree: C N G A H E K Q M F W L T Z D P R X Y S, 5-order means that a node has a maximum of five children and four keywords. Other nodes except the root node have at least two keywords. First, the node space is sufficient, four letters are inserted into the same node, for example:
When we try to insert H, the node finds that there is not enough space to split it into two nodes and move the intermediate element G to the New Root-knot point. In the implementation process, let's leave A and C in the current node, while H and N in the New Right neighbor node. For example:
When we insert E, K, and Q, no split operation is required.
Inserting M requires a split. Note that M is just an intermediate keyword element, so that it is moved up to the parent node.
Insert F, W, L, and T without any split operation
When Z is inserted, the rightmost leaf node space is full, and split operations are required. The middle element T is moved to the parent node. Note that the tree remains balanced by moving the intermediate element up, the split result node has two keyword elements.
When D is inserted, the leftmost leaf node is split. D is also an intermediate element. Move it to the parent node, and then the letter P, R, X, Y inserts one after another without any split operations.
Finally, when inserting S, the node Containing N, P, Q, and R needs to be split, and the intermediate element Q needs to be moved to the parent node. However, this is the case, the space in the parent node is full, so we need to split it and move the middle element M in the parent node to the New Root-knot point, note that the third pointer in the parent node contains the D and G nodes after modification. In this way, the specific insert operation is completed. The following describes the delete operation. The delete operation has more considerations than the insert operation.
Delete)Operation:First, find the elements to be deleted in B-tree. If the element exists in B-tree, delete the element in its node. If the element is deleted, first, determine whether the element has left and right child nodes. If yes, move a similar element from the child node to the parent node, and then move it. If no, after direct deletion, it will be moved ..
Delete the element. After moving the corresponding element, if the number of elements in a node is less than ceil (m/2)-1, you need to check whether a neighboring sibling node is full (the number of elements in the node is greater than ceil (m/2)-1). If it is full, you need to borrow an element from the parent node to meet the conditions; if the neighboring brothers are just getting rid of poverty, the number of nodes is smaller than ceil (m/2)-1, then the node is "merged" with a neighboring sibling node to form a node to meet the conditions. Let's take a look at the following examples.
The preceding example shows a 5-level B-tree constructed by the insert operation. The values H, T, R, and E are deleted in sequence.
Delete element H first, of course, first find H, H in a leaf node, and the number of elements of the leaf node 3 is greater than the minimum number of elements ceil (m/2)-1 = 2, the operation is very simple. We only need to move K to the original H location and move the L to K Location (that is, the element behind the deletion element in the node moves forward)
Next, delete T because T is not found in the leaf node, but in the middle node. We find its successor W (The next element in ascending letter order ), move W to T and delete W from the child node that contains W. After W is deleted, the number of elements in the child node is greater than 2, merge operations are not required.
Next, delete R and R in the leaf node, but the number of elements in the node is 2. As a result, only one element is deleted, which is smaller than the minimum number of elements ceil (5/2)-1 = 2, if a neighboring sibling node is full (the number of elements is greater than ceil (5/2)-1 = 2), you can borrow an element from the parent node, then, move the largest adjacent sibling node to the parent node and add the last or the first element to the instance, the right adjacent sibling node is relatively plump (three elements are greater than 2). Therefore, first borrow an element W from the parent node to move it to the leaf node, replacing the original position S and moving forward S; then, X moves to the parent node in the adjacent right brother node, and finally deletes X in the adjacent right brother node, and the subsequent elements move forward.
The last step is to delete E, which will cause many problems after deletion, because the number of nodes where E is located is just as high as the standard, and the minimum number of elements is enough (ceil (5/2)-1 = 2 ), the same is true for adjacent sibling nodes. Deleting an element does not meet the conditions. Therefore, the node must be merged with an adjacent sibling node; first, move the elements in the parent node (the element is between two node elements to be merged) to the child node, and then merge the two nodes into one node. Therefore, in this instance, we first move the Element D in the parent node to the node that has deleted E but only F, and then move the node that contains D and F and the node that contains, the adjacent sibling nodes of C are merged into one node.
You may think that the delete operation is over, but it is not. In this case, you will immediately find that the parent node contains only one element G, which is not up to standard, this is unacceptable. If the adjacent brothers of the problematic node are plump, you can borrow an element from the parent node. Assume that the right sibling node (containing Q, X) has more than one element (with elements on the Right of Q), and then we move M down to a child node with few elements, move Q to the M position. Then, the left subtree of Q will become the right subtree of M, that is, the right subtree Containing N, and the P node will be attached to the right pointer of M. Therefore, in this instance, we cannot borrow an element. We can only merge it with the sibling node into a node, and the unique element M in the root node moves down to the subnode, the height of the tree is reduced by one layer.
To further discuss the deletion details. Let's take another example:
Here is a different 5-level B-tree, so let's try to delete C
Therefore, the D element in the right subnode of the deleted element C is moved to the position of C. However, after the element is moved up, there is only one node.
Because the node contains E, the adjacent sibling node is just getting rid of poverty (the minimum number of elements is 2), and it is impossible to borrow elements from the parent node. Therefore, only merging operations can be performed, so, the left sibling node of B and the node containing E are merged into one node.
In this case, only one element F node exists. In this case, the adjacent sibling node is full (the number of elements is 3> the minimum number of elements 2 ), in this way, you can borrow elements from the parent node and move J from the parent node to the node. If J has elements in the node, the node moves forward, then, the first element (or the last element) in the adjacent sibling node is moved to the parent node, and the elements (or the previous element) are moved forward (or backward). Note that the element contains K, L nodes were previously attached to the left of M and now become attached to the right of J. In this way, each node meets the B-tree structure.