MySQL Index algorithm principle analysis (easy to understand, only speak B-tree)

Source: Internet
Author: User
Tags mysql index

At the beginning of the study, Baidu to check, but found a lot of said too complex and difficult to understand, combined with each article summed up (suggest probably read the text, do not understand it does not matter, then look at the diagram of the implementation of the steps and then in the combined text, so that all clear a lot)

b-tree,b is balance, which is commonly used for database indexing . using the B-TREE structure can significantly reduce the intermediate process that is experienced when locating records, thus speeding up the access speed. While B+tree is a variant of B-tree, the famous MySQL generally uses b+tree to implement its index structure.

Why does the database use this structure?

  In general, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk. In this way, the index lookup process will generate disk I/O consumption, relative to memory access, I/O access to the consumption of a few orders of magnitude, so the evaluation of a data structure as an index is the most important indicator of the number of disk I/O operations in the process of incremental complexity. In other words, the structural organization of the index minimizes the number of disk I/O accesses during the lookup process.

  To achieve this, the disk reads on demand and requires that the length of the read-ahead is usually an integer multiple of the page. and the database system sets the size of a node equal to one page, so that each node can be fully loaded with only one I/O. each time you create a new node, request a page space directly, so that a node is physically stored in a page, and the computer storage allocation is page-aligned, the implementation of a node only one time I/O. And put the M value in the B-tree is very large, it will let the height of the tree down, in favor of a full load



M-way Find Tree ( focus on step map )

First introduce the M-way search tree, as the name implies is the degree of each node of a tree is less than or equal to M.

Therefore, its nature is as follows:

    1. The number of key values per node is less than M
    2. The degree of each node is less than or equal to M
    3. Key values are sorted in order
    4. The key value of the subtree to be exactly less than or greater than or between the parent nodes



B-tree

B-tree also called balanced multi-path lookup tree. the characteristics of an m - order b-tree (M - fork Tree ) are as follows:

(where ceil (x) is a function that takes the upper limit)

1) each node in the tree has a maximum of m children;

2) root nodes and leaf nodes, each node has at least ceil (M/2) children;

3) joghen node is not a leaf node, then at least 2 Children (special case: No Child root node, that is, the root node is a leaf node, the whole tree has only one root);

4) all leaf nodes appear on the same layer, the leaf nodes do not contain any keyword information ( can be seen as external nodes or query failed nodes, in fact, these nodes do not exist, pointers to these nodes are null);

5)   n (n p0 k1 p1 k2 p2 kn PN)

a) Ki (I=1...N) is the keyword, and the keywords are sorted by order K (i-1) < Ki.

b) Pi is a contact point pointing to Subtree, and the key of the pointer P (i-1) to all nodes of the subtree is less than Ki, but both are greater than K (i-1) .

c) The number of keywords n must meet: ceil (M/2)-1 <= n <= m-1.

B-tree each node in the site can contain a large number of keyword information and branches depending on the actual situation ( Of course, it cannot exceed the size of the disk block, depending on disk drives , the size of the general block is around 1k~4k ) so the depth of the tree is reduced, which means finding an element as long as a few nodes are read into memory from the external memory disk and quickly access the data to be found.


The following is illustrated with a 5-step B-tree example (see below): ( focus on )

It satisfies the above conditions: root node and leaf node, each other node has at least ceil (5/2) =3 Children (at least 2 keywords); Of course 5 Children (maximum of 4 keywords). The keywords in uppercase letters, in ascending alphabetical order.

insert Operation : when inserting an element, first in B-tree , if it does not exist, it ends at the leaf node, and then inserts the new element in the leaf node. Note: If the leaf node space is sufficient, it is necessary to move to the right of the element in the leaf node greater than the newly inserted keyword, if the space is full so that there is not enough space to add a new element, the node is "split" and half of the number of key elements are split into the new adjacent right node. The middle key element moves up to the parent node (of course, if the parent node space is full, it also requires a "split" operation), and when the key element in the node moves to the right, the relevant pointer also needs to move to the right. If a new element is inserted at the root node and the space is full, the split operation is performed so that the intermediate key element in the original root node moves up to the new root node, thus causing the tree's height to increase by one level.

Let's step through an example. Insert the following character letter into the empty 5 order B-tree :C N G A H E K Q M F W L T Z D P R X Y S,5 The c7> sequence means that a node has a maximum of 5 Children and 4 keywords , and root nodes have at least 2 keywords outside the node, first of all, The node space is sufficient and4 letters are inserted into the same node, such as:

when we try to insert H , the node finds that there is not enough space to split it into 2 nodes, moving the middle element . G move up to the new root node, in the implementation process, we A and the C remain in the current node, and H and the N Place the new right neighbor node in it. such as:

when we insert E,k,q no split operation is required

Insert M need a split, notice M happens to be an intermediate keyword element that moves up to the parent node

Insert f,w,l,t no split operation required

Insert Z , the Right leaf node space is full and requires a split operation, the middle element T move up to the parent node, notice that by moving the middle element up, the tree eventually remains balanced, and the node of the split result exists . 2 a keyword element.

Insert D causes the left-most leaf node to be split, D exactly the middle element, moving up to the parent node, and then the letter P,r,x,y successive insertions do not require any split operations.

Finally, when inserting S when it contains N,p,q,r the nodes need to be split and the intermediate elements Q move up to the parent node, but the situation is that the space in the parent node is full, so divide the middle element in the parent node M move up to the newly formed root node, note that the third pointer that was previously in the parent node is modified to include D and the G node. The completion of such a specific insert operation, the following describes the delete operation, the deletion operation is relative to the insert operation to consider the situation more points.

delete (delete) action: first find b -tree The element to be deleted in the b -tree

Delete element, after moving the corresponding element, if the number of elements in a node is less than ceil (M/2)-1 , you need to see if the node of one of its neighbors is plump (the number of elements in the node is more than ceil (M/2)-1 ), if plump, borrows an element from the parent node to satisfy the condition, and if the neighboring brothers are just out of poverty, then the number of nodes is less than ceil (M/2)-1 , the node is connected to a neighboring sibling node . " Merging " into a node to satisfy the condition. Let's go through the following examples to learn more about it.

a tree constructed with the above insertion operation 5-Step B -tree As an example, delete h,t,r,ein turn.

First DELETE element H , of course first find H , H in a leaf node, and the number of elements in the leaf node . 3 greater than the minimum number of elements ceil (M/2) -1=2 , the operation is simple, we just need to move K to the original H the position, move L to K position (that is, the element following the delete element in the node moves forward)

Next, delete T, because T not in the leaf knot, but in the middle of the knot, we find his successor W ( the next element in ascending alphabetical order ) and Will W Move up to T the location, and then the original containing W in the children's knot . W Delete it and delete it exactly W , the number of elements in the child's node is greater than 2 , there is no need for a merge operation.

Next Delete R ,Rin the leaf knot.,but the number of elements in the node is2, deleting causes only1element, which is less than the minimum number of elementsceil (5/2) -1=2,if one of the neighboring sibling nodes is more plump (the number of elements is greater thanceil (5/2) -1=2), you can borrow an element from the parent node and then move up the last or the first element in the most contiguous sibling node to the parent, in which case the right neighbor sibling node is more plump (3elements greater than2), so first borrow an element from the parent nodeWMove down to the leaf node to replace the originalSthe location,Smove forward;Xmove up to the parent node in the adjacent right sibling node, and finally delete the adjacent right sibling node .X, and the back element moves forward.

last step delete e after deletion can cause a lot of problems, because e The number of nodes is exactly the same, just meet the minimum number of elements ( ceil (5/2) -1=2 ) d e f d f a,c The adjacent sibling nodes are merged into one node.

Maybe you think this is the end of the delete operation, but in fact, in this particular case, you will immediately find that the parent node contains only one elementG, it is not acceptable to reach the standard. If the neighbor of the problem node is more plump, you can borrow an element from the parent node. Suppose at this point the right sibling node (containingq,x) has more than one element (QThere are elements on the right, and then we'llMMove down to a sub-node with few elements,QMove up toMthe position, at this time,Qthe Zuozi will becomeMthe right sub-tree, which containsN,Pnodes are attached toMon the right pointer. So in this case, we have no way to borrow an element that can only be combined with a sibling node into a single node, and the only element in the root nodeMdown to the child node so that the height of the tree is reduced by one layer.

In order to further discuss the situation of deletion in detail. Give another example:

here is a different 5-Step B-tree , then let's try to delete C

the element is then deleted C in the right child node of the D element moves to the C position, but occurs when the element is moved up, only one element of the node is in the case.

and because it contains E nodes, the adjacent sibling nodes are just out of poverty (the minimum number of elements is 2 ), it is not possible to borrow elements from the parent node, so only the merge operation can be done, so this will contain A, b the left sibling node and contains E nodes are merged into one node.

and then there's only one element . F node, at which point the adjacent sibling knot is plump (the number of elements is 3> minimum number of elements 2 ) so that you can borrow the elements from the parent node, J Move down to the node, corresponding to the node in the J after the element is moved forward, then the first element (or the last element) in the adjacent sibling node is moved up to the parent, and the subsequent element (or the preceding element) moves forward (or back); K , L the node was previously attached to M To the left, now becomes attached to the J to the right. Thus each node satisfies the structural properties of B-tree .



MySQL Index algorithm principle analysis (easy to understand, only speak B-tree)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.