Analysis of MySQL indexing principle

Source: Internet
Author: User

First, the principle of the index

The so-called index, which is the rapid positioning and lookup, then the structure of the index to minimize the number of disk I/O during the lookup process (b + tree, its non-leaf nodes occupy a smaller space, can have more non-leaf nodes in the re-memory, reduce the large number of IO)

1. Index principle

2, the principle of local and disk pre-reading

Principle of locality:

    • When a data is used, the data around it is usually used immediately.
    • The data that is required during the program run is usually relatively centralized.

Due to the high efficiency of disk sequential reads (no seek time required and minimal rotational time), pre-reading can improve I/O efficiency for programs with locality.

The length of the read-ahead is generally the integer multiple of the page. Page is the logical block of Computer Management memory, hardware and operating system tend to divide main memory and disk storage area into contiguous size equal blocks, each storage block is called a page (in many operating systems, the page size is usually 4k), main memory and disk in the page to exchange data. When the program to read the data is not in main memory, will trigger a page fault, the system will send a read signal to the disk, the disk will find the starting position of the data and sequentially read one or several pages back into memory, and then return unexpectedly, the program continues to run.

  The database system skillfully utilizes the principle of disk pre-reading, setting the size of a node equal to one page, so that each node can be fully loaded with only one I/O.

Second, the data structure of the index

In large-scale data storage, to implement index query in such a real context, the number of elements stored by the tree node is limited (if the number of elements is very large, the lookup will degenerate into a linear lookup within the node), resulting in a two-fork lookup tree structure due to the depth of the tree and disk I/O read too frequently, This leads to inefficient queries (why this happens, as explained in the external memory-disk), so how to reduce the depth of the tree (not to reduce the amount of data queried, of course), is a basic idea: using a multi-fork Tree Structure (because the number of tree node elements is limited, the number of subtrees of the node is also limited).

That is, because disk operations are time-consuming and resource-intensive, it is inefficient to find multiple lookups too frequently. So how to improve efficiency, that is, how to avoid the disk too often multiple lookups? The number of disk lookups is often determined by the height of the tree, so as long as we reduce the tree's structure by a better tree structure to minimize the height of the trees, is it possible to effectively reduce the number of disk lookups and accesses? What kind of tree is this effective tree structure?

In this way, we propose a new search tree structure--The multi-path lookup tree. Based on the inspiration of the balanced binary tree, it is natural to think of a balanced multi-path search tree structure, which is the first topic to be elaborated in this article B~tree, the B-tree structure (we will see that the B-tree operations can keep the B-tree low in height, In order to effectively avoid the disk too frequent search and access operations, so as to effectively improve the search efficiency.

1. External memory-disk

Computer storage devices are generally divided into two types: internal memory (main memory) and external memory (external memory). Memory accesses are fast, but they are small, expensive, and cannot be stored for long periods of time (the data disappears without power).

External memory-disk is a direct-access storage device (DASD). It is characterized by a small change in access time. Any character group can be accessed directly, with large capacity and faster speed than other external memory devices.

When the disk drive is performing read/write functions. The disc is mounted on a spindle and rotates at high speed around the spindle, and when the track is passed under the Read/write head (also called the head), the data can be read/written.

The data on the disk must be uniquely marked with a three-dimensional address: The cylinder number, the plate number, the block number (the disk block on the track).

The following 3 steps are required to read/write a specified data on a disk:

(1) The first move arm moves the head to the desired cylinder according to the cylinder number, which is called positioning or locating.

(2) in the 6-disk group shown in 11.3, all the heads are positioned on the 10 tracks of 10 disks (the heads are bidirectional). The track on the specified disc is determined according to the disk face number.

(3) After the disc is determined, the disc begins to rotate and the track segment of the specified block number is moved to the head.

After the above three steps, the specified data storage location is found. You are now ready to start reading/writing operations.

Access to a specific information, consisting of 3 parts of time:

Seek time Ts: The time required to complete the above steps (1). This part of the time is the most expensive, the maximum can reach about 0.1s.

Wait (latency time) Tl: The time required to complete the above steps (3). As the disc revolves around the spindle speed is very fast, generally 7200 rpm (computer hard disk performance indicators, the average domestic hard disk speed is generally 5400rpm (notebook), 7200rpm of several). Therefore generally rotates around 0.0083s.

Transmission time (transmission times) Tt: The time that data is transmitted through the system bus to memory, typically transmitted in one byte (byte) approximately 0.02us=2*10^ ( -8) s

Disk read data is a block (block) for the basic unit. all data in the same block can be read all at once. The cost of disk IO is mainly spent on the lookup time TS. Therefore, we should try to keep the relevant information in the same disk block, the same track. Or at least on the same cylinder or adjacent cylinder, in order to read/write information to minimize the number of head-to- back movement, avoid too much time to find Ts.

Therefore, in large-scale data storage, a large amount of data is stored in the external memory disk, while in the external disk read/write blocks (block) of the data, the first need to locate a block in the disk, how to effectively find the data on the disk, need a reasonable and efficient external memory data structure, is the b-tree structure to be highlighted below, and the associated variant structure:b+-tree structure and b*-tree structure.

1, B-Tree

B-Tree, that is, tree. Because the original English name of B-Tree is b-tree, and many people in China like to b-tree translation B-tree, in fact, this is a very bad literal translation, it is easy to make people misunderstand. As one might think of a B-tree as a tree, and a tree of trees. In fact,B-tree refers to the B-tree .

B-Tree is a multi-fork designed for disk or other storage devices (below you will see that the B-tree has multiple branches, that is, multiple forks) to balance the lookup tree with respect to the binary.

The biggest difference between a B-tree and a red-black tree is that the nodes of a B-tree can have many children, from several to thousands of. Why do you say B-trees are similar to red-black trees? Because, like the red and black trees, a B-tree with n nodes is also O (LGN), but may be much smaller than the height of a red-black tree, and its branching factor is larger. Therefore, the B-tree can implement various dynamic collection operations such as INSERT, delete, etc. within O (logn) time.

As shown, that is a B-tree, a key word for the English consonant B-tree, now to find the letter R from the tree species (including n[x] key words x,x have n[x]+1] children (that is, an inner node x if it contains n[x] keywords, then x will contain n[x]+1 children). All leaf nodes are at the same depth, and shaded nodes are the nodes to check when the letter R is found:


  Believe that, from what you can easily see, an inner node x if it contains n[x] keyword, then x will contain n[x]+1 children. If there are 3 children in the inner node with 2 keyword D h, 4 children are included in the inner node with 3 keywords Q T x.

B-Tree is also called balanced multi-path search tree.

    1. Each node in the tree contains a maximum of M children (m>=2);
    2. Root nodes and leaf nodes, each of the other nodes has at least [Ceil (M/2)] Children (where ceil (x) is an upper-bound function);
    3. Joghen nodes are not leaf nodes, there are at least 2 children (special case: No Child root node, that is, the root node is a leaf node, the whole tree has only one root);
    4. All leaf nodes appear on the same layer, and leaf nodes do not contain any keyword information.
    5. Each non-terminal node contains n keyword information: (N,P0,K1,P1,K2,P2,......,KN,PN). which
      A) Ki (I=1...N) is the keyword, and the keyword is sorted in ascending order of K (i-1) < Ki.
      b) Pi is a contact point pointing to Subtree, and the key of the pointer P (i-1) to all nodes of the subtree is less than Ki, but both are greater than K (i-1).
      c) The number of keywords n must satisfy: [Ceil (M/2) -1]<= n <= m-1.

Each node in the B-tree can contain a large number of keyword information and branches according to the actual situation (of course, it cannot exceed the size of the disk block, depending on disk drives, the size of the general block is around 1k~4k); So the depth of the tree is reduced, This means finding an element as long as a few nodes are read into memory from the external memory disk and quickly accessing the data to be found.


  the specific process for file lookup :

For the sake of simplicity, here is a small amount of data to construct a 3-fork tree form, the actual application of the B-tree node in a lot of keywords. In the above figure, such as the root node, where 17 represents the file name of a disk, the Red Square indicates where the 17 file is stored on the hard disk, and P1 represents a pointer to the 17 left subtree.

Its structure can be simply defined as:

typedefstruct {    /*Number of files*/    intFile_num; /*file name (key)*/    Char*File_name[max_file_num]; /*pointer to child node*/Btnode* btptr[max_file_num+1]; /*where the file is stored on the hard disk*/file_hard_addr offset[max_file_num];} Btnode;
View Code

If each disk block can hold exactly one node of the B-tree (with exactly 2 file names). Then a Btnode node represents a disk block, and the subtree pointer is the address that holds the other disk block.

Below, let's simulate the process of finding file 29:

    1. Locate the root disk Block 1 of the file directory based on the root node pointer, and import the information into memory. "Disk IO operation 1 times"
    2. In memory, there are two filenames, 17, 35, and three data that store the page addresses of other disks. According to the algorithm we found: 17<29<35, so we find the pointer p2.
    3. Based on the P2 pointer, we navigate to disk Block 3 and import the information into memory. "Disk IO operation 2 times"
    4. In memory, there are two filenames 26,30 and three data that store the page addresses of other disks. According to the algorithm we found: 26<29<30, so we find the pointer p2.
    5. Based on the P2 pointer, we navigate to disk Block 8 and import the information into memory. "Disk IO operation 3 times"
    6. There are two filenames in memory, 28, 29. According to the algorithm we find the file name 29 and locate the disk address of the memory.

Analyzing the above procedure, it is found that 3 disk IO operations and 3 memory lookup operations are required. As for the file name lookup in memory, because it is an ordered table structure, you can use binary lookup to improve efficiency. The IO operation is a determinant of the efficiency of the whole B-tree lookup.

Of course, if we use the balanced binary tree disk storage structure to find, disk 4 times, up to 5 times, and the more files, B-tree than the balance of binary tree disk IO operations will be less, more efficient.

Height of the tree:

As we can see from the above example, the number of Io reads for secondary storage depends on the height of the B-tree. And what is the height of the B-tree determined by what? If a non-leaf node of B-Tree contains n keywords, then this non-leaf node contains n+1 child nodes, and all leaf nodes are on layer I, we can conclude that:

    1. Since the root has at least two children, there are at least two nodes on the 2nd floor.
    2. Root and leaves, the other nodes have at least ┌m/2┐ children,
    3. So there are at least 2*┌m/2┐ nodes on the 3rd floor.
    4. At least one ┌m/2┐^2 node on the 4th floor.
    5. There is at least a ┌m/2┐^ (L-2) node in layer I, so there are: n+1≥2*┌m/2┐i-2;
    6. Consider the number of nodes in the first layer of n+1, then ┌m/2┐^ (L-2) ≤n+1, that is, the minimum node of the L layer is exactly the n+1, namely: I≤log┌m/2┐ ((n+1)/2) +2;
So when the B-tree contains n keywords, the maximum height of the B-tree is L-1 (because the layer of the leaf node is not counted when the B-tree height is calculated), namely: L-1 = Log┌m/2┐ ((n+1)/2) +1

Each node in the tree contains a maximum of M children, i.e. m satisfies: ceil (M/2) <=m<=m. And the lower the number of children in each node in the tree, the greater the height of the tree, so

2. B + Tree

    B+-tree: It is a b-tree deformation tree which is produced by the file system.

The similarities and differences between a M-order B + tree and a M-order tree are:

All of the leaf nodes contain information about all the keywords, and pointers to the records that contain them, and the leaf nodes themselves are linked by the size of the keywords from a large order of origin. (The leaf node of the B-tree does not include all the information it needs to find)

      all non-terminal nodes can be considered as the index part , and the nodes contain only the largest (or smallest) keywords in their sub-root nodes. (The non-final node of the B-tree also contains valid information that needs to be found)


3, B-tree and A + + tree difference

The number of keywords is different; b + Tree branch node has m keyword, its leaf node also has m, its key is only played an index role, but B-tree, although there are M nodes, but it only has a m-1 keyword.

The location of the storage is different; the data in the B + tree is stored on the leaf nodes, that is, the data of all the leaf nodes is combined to be the complete data, but the B-tree data is stored in each node, and is not only stored on the leaf nodes.

Branching nodes are constructed differently; the branch node of the B + tree stores only the keyword information and the son's pointer (the pointer here refers to the offset of the disk block), which means that the internal node contains only the index information.

The query is different; the B-Tree ends when it finds a specific value, and the + + tree needs to be indexed to find the data in the leaf node, which means that the S + Tree's search process takes a path from the root node to the leaf node.

Third, why does MySQL use B + Tree as an index instead of using a tree?


    • The key sets in the B-tree are distributed throughout the tree, the leaf nodes contain no keyword information, and B + Tree keyword sets are distributed in the leaf nodes, and the non-leaf nodes are just the index of the keywords in the leaf node.
    • Any of the keywords in the B-tree appear only in one node, while the keywords in the + + tree must appear in the leaf nodes, or they may recur in non-leaf nodes;

On performance:

    • Unlike the B-tree, which is only suitable for random retrieval, the + + tree supports both random and sequential retrieval .
    • B + trees have lower disk read and write costs. B + Tree's internal node does not point to the keyword specific information pointers , the internal node is smaller than the B-tree, the disk block can accommodate more than the number of nodes in the node, a one-time read into the memory can be found in the keyword is more, relative, Io read and write times are reduced. The number of Io reads and writes is the most important factor affecting index retrieval efficiency. (for example, suppose that a disk block in a disc holds 16bytes, while a keyword of 2bytes, a keyword specific information pointer 2bytes.) An internal node of a 9-order B-tree (a node with a maximum of 8 keywords) requires 2 disks fast. and the B + tree internal nodes only need 1 disks fast. When an internal node needs to be read into memory, the B-tree is more than a second-order block lookup time (on disk is the time of disc rotation))
    • The query efficiency of B + trees is more stable. B-Tree search may end in non-leaf nodes, the closer the root node of the record lookup time is shorter, as long as the keyword can be found to determine the existence of the record, its performance is equivalent to the keyword in the complete set of binary search. In the B + tree, the sequential retrieval is more obvious, random search, any keyword search must go from the root node to the leaf node path, all the keywords are the same length of the search paths, resulting in the query efficiency of each keyword is equivalent. (because a non-endpoint is not a node that ultimately points to the contents of a file, it's just the index of the keyword in the leaf node.) So any keyword search must take a path from the root node to the leaf node. All keyword queries have the same path length, resulting in a query efficiency equivalent for each data.
    • (The main reason for database indexes with B + trees is that) b-trees do not solve the inefficiency of element traversal while improving disk IO performance. The leaf nodes of the B + tree are joined together using the pointer sequence, so that the entire tree can be traversed as long as the leaf nodes are traversed . and the scope-based queries in the database are very frequent , and the B-tree does not support such operations (or inefficient).


  (1) b + Tree space utilization is higher, can reduce I/o times ,

    In general, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk. In this case, disk I/O consumption is generated during the index lookup process. And because the internal nodes of the B + tree are only used as indexes, and not like B-trees, each node needs to store the hard disk pointer.
That is, each non-leaf node in the B + tree does not have pointers to specific information about a keyword, so each node can hold more keywords, which means that the more keywords you need to read into memory at one time, the more I/O operations are reduced.
e.g. assume that a disk block in the disc holds 16bytes, while a keyword of 2bytes, a keyword specific information pointer 2bytes. An internal node of a 9-order B-tree (a node with a maximum of 8 keywords) requires 2 disks fast. and the B + tree internal nodes only need 1 disks fast. When an internal node needs to be read into memory, the B-tree is one more block-lookup time (the disk is the time of disc rotation) than a B + tree.

   (2) When adding or deleting files (nodes), the efficiency is higher,
Because the leaf node of B + Tree contains all the keywords and is stored in an orderly linked list structure, this can improve the efficiency of deletion.
(3) B + Tree query efficiency is more stable,
Because every time a B + tree is queried, it needs to traverse a path from the root node to the leaf node. The query path for all keywords is the same length, resulting in a comparable efficiency for each query.

B + Tree also has one of the biggest benefits, easy to sweep the library, the second tree must use the method of sequential traversal in order to sweep the library, and the second-B tree directly from the leaf node to sweep over the end, B + Tree support Range-query is very convenient, and B. Tree does not support. This is the main reason why the database chooses B + trees

  One of the most important differences between B-and B-plus trees is that B + trees only have leaf nodes to store data, the rest of the nodes are indexed, and B-trees have data domains for each index node. This determines that the B + tree is more suitable for storing external data, known as disk data.

From the point of view of MySQL (INOODB), B + trees are used as indexes, generally the index is very large, especially the relational database such a large number of indexes can reach billions of levels, so in order to reduce memory consumption, the index will be stored on disk. So how does MySQL measure query efficiency? Disk IO times, B-tree (b-tree) is specific to the number of nodes per layer is very large, the number of layers is very small, in order to reduce the number of disk IO, when querying data, the best thing is to quickly find the target index, and then read the data, using B + tree can do a good job of this purpose, But the B-tree each node has the data field (pointer), which undoubtedly increased the size of the node, and plainly increase the number of disk IO (disk IO Once read the size of the data is fixed, a single data is larger, each read out less, more io, an IO time ah!) ), while the B + tree does not store data in addition to the leaf nodes, the nodes are small and the number of disk IO is less. This is one of the advantages.

Another advantage is that the B + tree all the data domains in the leaf node, generally will be an optimization, that is, all the leaf nodes with a pointer string up. This allows you to traverse the leaf nodes to get all the data so that you can access them in a range.


This article main reference:

Analysis of MySQL indexing principle

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.