MySQL index data structure and principle

Source: Internet
Author: User
Tags mysql index

Original: The essence of HTTP://WWW.UML.ORG.CN/SJJM/201107145.ASP1 Index

The official MySQL definition of an index is: index is the data structure that helps MySQL to get data efficiently. by extracting the skeleton of a sentence, you can get the essence of the index: The index is the data structure.

We know that database queries are one of the most important functions of a database, such as the following SQL statement:

1 SELECT * from my_table WHERE col2 = '

We all want to query the data as fast as possible, so the designers of the database system are optimized from the point of view of the query algorithm. The most basic query algorithm is, of course, sequential lookup (linear search), traversing "my_table" and then row by line matching "col2" value is "77", the complexity of the O (n) algorithm is obviously bad when the volume of data is large, Fortunately, the development of computer science provides a number of better search algorithms, such as binary search, binary tree search, binary search, and so on.

If you look at it a little bit, you will find that each lookup algorithm can only be applied to a particular data structure, such as a binary lookup requires an orderly retrieval of data, while a binary tree lookup can only be applied to a binary lookup tree, but the data itself cannot be fully organized to meet a variety of data structures (for example, It is theoretically impossible to organize both columns sequentially, so in addition to the data, the database system maintains a data structure that satisfies a particular lookup algorithm that references (points to) data in some way, so that an advanced find algorithm can be implemented on those data structures. This data structure is the index . You can get a data record of "col2" as "77" from the table "My_table".

2 data structure of the index

See an example:

Figure 1 shows a possible way to index. On the left is the data table, a total of two columns seven records, the leftmost is the physical address of the data record (note that logically adjacent records on disk is not necessarily physically adjacent). To speed up the search for Col2, you can maintain a two-fork lookup tree on the right, each containing the index key value and a pointer to the physical address of the corresponding data record, so that the binary lookup can be used to obtain the corresponding data in the complexity of O (log2n). Although this is a true index, the actual database system is almost not implemented using a two-fork search tree or its evolutionary breed red-black tree (Red-black tree), for reasons that are described below.

B-tree and B+tree

At present, most of the database system and file system use B-tree or its variant B+tree as the index structure, in the next section of this article will combine the memory principle and the computer access principle to discuss why B-tree and B+tree are so widely used in index, this section first describes them from the data structure angle.

B-tree

To describe the B-tree, first define a data record as a two-tuple [key, Data],key is the key value of the record, for different data records, the key is not the same; data records for data other than key. Then B-tree is a data structure that meets the following criteria:

    1. D is a positive integer greater than 1, called the degree of B-tree.
    2. h is a positive integer, called the height of the b-tree.
    3. Each non-leaf node consists of a n-1 key and n pointers, where d<=n<=2d.
    4. Each leaf node contains at least one key and two pointers, with a maximum of 2d-1 keys and 2d pointers, with null pointers for leaf nodes.
    5. All leaf nodes have the same depth, which equals the tree height H.
    6. The key and the pointer are spaced each other, and the nodes are pointers.
    7. Keys in one node are not descending from left to right.
    8. All nodes make up the tree structure.
    9. Each pointer is either null or points to a different node.
    10. If a pointer is on the leftmost node and is not NULL, then all keys that point to the node are less than V (key1), where V (key1) is the value of node's first key.
    11. If a pointer is on the rightmost node and is not NULL, all keys that point to the node are greater than V (Keym), where V (keym) is the value of the last key of node.
    12. If a pointer is keyi and keyi+1 and NOT NULL on the left and right sides of node nodes, then all keys to the node are less than V (keyi+1) and greater than V (keyi).

Figure 2 is a d=2 b-tree.

The node that the interval pointer points to is recursively searched until a node is found or a null pointer is found, which finds a success, and the latter finds a failure. The pseudo-code for the lookup algorithm on B-tree is as follows:

1 btree_search (node, key)2 3 {4 5     if(node = =NULL)return NULL;6 7     foreach(Node.key)8 9     {Ten  One         if(Node.key[i] = = key)returnNode.data[i]; A  -         if(Node.key[i] > key)returnBtree_search (point[i]->node); -  the     } -  -     returnBtree_search (point[i+1]->node); -  + } -  +data = Btree_search (root, My_key);

About B-tree has a series of interesting properties, such as a degree of D B-tree, set its index N key, the upper limit of its tree height h is logd ((n+1)/2), retrieve a key, its search node number of the progressive complexity of O (LOGDN). as can be seen from this point, B-tree is a very efficient index data structure.

In addition, since inserting deletes a new data record destroys the nature of the b-tree, it is necessary to do a split, merge, transfer, etc. to preserve the b-tree nature of the tree when inserting and deleting it, and this article does not intend to discuss the contents of B-tree completely. Because there is already a lot of information detailing the mathematical nature of b-tree and the insertion and deletion algorithm, interested friends can find the corresponding information in the reference column at the end of this article to read.

B+tree

B-tree has many variants, the most common of which is b+tree, such as MySQL, which generally uses b+tree to implement its index structure.

Compared with B-tree, B+tree has the following differences:

      1. The pointer to each node is capped at 2d instead of 2d+1.
      2. The inner node does not store data, only the key, and the leaf node does not store the pointer.

Figure 3 is a simple b+tree schematic.

Not all nodes have the same domain, so the B+tree mid-node and inner nodes are generally of different sizes. This is different from B-tree, although the number of keys and pointers stored in B-tree may be inconsistent, but the domain and the upper bound of each node are consistent, so in the implementation b-tree often apply for the same size of space for each node.

In general, B+tree is more suitable than b-tree to achieve the external storage index structure, the specific reason and the principle of external memory and the principle of computer access, will be discussed below.

B+tree with sequential access pointers

Generally, the b+tree structure used in database system or file system is optimized on the basis of classical b+tree, and the sequential access pointers are added.

As shown in 4, a pointer to an adjacent leaf node is added to each leaf node of B+tree, and a b+tree with sequential access pointers is formed. The purpose of this optimization is to improve the performance of the interval access, example 4, if you want to query key from 18 to 49 of all data records, when the 18 is found, simply follow the node and the pointer in order to traverse to all data nodes, and greatly mentions the efficiency of the interval query.

This section provides a brief introduction to B-tree and B+tree, and the next section, combined with the memory access principle, explains why the current b+tree is the preferred data structure for the database system implementation index.

3 Principles of indexing

Why use B-tree (B+tree)

As mentioned above, data structures such as red and black trees can also be used to implement indexes, but file systems and database systems generally use B-/+tree as the index structure, and this section will discuss B-/+tree as the theoretical basis of index based on the knowledge of computer composition principles.

In general, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk. In this way, the index lookup process will generate disk I/O consumption, relative to memory access, I/O access to the consumption of a few orders of magnitude, so the evaluation of a data structure as an index is the most important indicator of the number of disk I/O operations in the process of incremental complexity. In other words, the structural organization of the index minimizes the number of disk I/O accesses during the lookup process. The following first describes the memory and disk access principles, and then combined with these principles to analyze the efficiency of B-/+tree as an index.

Main Memory Access principle

At present, the main memory used in computer is random read-write memory (RAM), the structure and access principle of modern ram is more complex,

In this paper, we discard the specific differences and abstract a very simple access model to illustrate how RAM works.

Figure 5

From an abstract point of view, main memory is a series of storage units composed of a matrix, each storage unit stores a fixed size of data. Each storage unit has a unique address, modern main memory of the site rules are more complex, here it simplifies to a two-dimensional address: through a row address and a column address can be uniquely positioned to a storage unit. Figure 5 shows a main memory model of 4 x 4.

The main memory access process is as follows:

When the system needs to read main memory, the address signal is put to address bus upload to main memory, main memory read the address signal, parse the signal and locate the designated storage unit, and then put this storage unit data on the bus for other parts to read.

The process of writing main memory is similar, the system will write the unit address and data on the address bus and bus, respectively, main memory reads two bus contents, does the corresponding write operation.

As can be seen here, the main memory access time is only linear with the number of accesses, because there is no mechanical operation, two times access to the data "distance" will not have any impact on time, for example, first take A0 and then take A1 and take A0 and then take D3 time consumption is the same.

Disk Access principle

As mentioned earlier, indexes are typically stored as files on disk, and index retrieval requires disk I/O operations. Unlike main memory, disk I/O is mechanically active, so the time spent on disk I/O is huge.

Figure 6 is the overall structure of the disk.

Figure 6

A disk consists of a circular disc of the same size and the same axis, and the disk can be rotated (each disk must be rotated synchronously). There is a head bracket on one side of the disk, and the head bracket secures a set of heads that each head is responsible for accessing the contents of a disk. The head cannot be rotated, but it can be moved in the direction of the disk radius (actually oblique-tangential motion), and each head must also be coaxial at the same moment, that is, looking down from the top, all the heads are overlapping at any time (although there is already a multi-head independent technique, which is not limited by this).

Figure 7 is the disk structure.

Figure 7

The platters are divided into a series of concentric rings centered on the center of the disc, each concentric ring is called a track, and all tracks with the same radius form a cylindrical surface. Tracks are divided into small segments along the radius lines, each of which is called a sector, and each sector is the smallest storage unit of the disk. For the sake of simplicity, let's assume that the disk has only one platter and one head.

When data needs to be read from disk, the system will pass the logical address of the data to the disk, and the control circuit of the disk will translate the logical address into the physical address according to the addressing logic, that is, determine which track and which sector the data is to be read. In order to read the data in this sector, the head needs to be placed above this sector, in order to achieve this, the head needs to move to the corresponding track, this process is called seek, the time spent is called seek time, and then the disk rotation of the target sector is rotated to the head, the process of time is called rotation time.

Principle of locality and disk pre-reading

Due to the characteristics of the storage media, the disk itself is much slower than main memory, coupled with mechanical movement, disk access speed is often one of the hundreds of of main memory, so in order to improve efficiency, to minimize disk I/O. To do this, the disk is often not read strictly on-demand, but is read-ahead every time, even if only one byte is required, and the disk starts from this location, sequentially reading a certain length of data into memory. The rationale for this is the well-known local principle in computer science:

When a data is used, the data around it is usually used immediately.

The data that is required during the program run is usually relatively centralized.

Due to the high efficiency of disk sequential reads (no seek time required and minimal rotational time), pre-reading can improve I/O efficiency for programs with locality.

The length of the read-ahead is generally the integer multiple of the page. Page is the logical block of Computer Management memory, hardware and operating system tend to divide main memory and disk storage area into contiguous size equal blocks, each storage block is called a page (in many operating systems, the page size is usually 4k), main memory and disk in the page to exchange data. When the program to read the data is not in main memory, will trigger a page fault, the system will send a read signal to the disk, the disk will find the starting position of the data and sequentially read one or several pages back into memory, and then return unexpectedly, the program continues to run.

Performance analysis of B-/+tree indexes

Here you can finally analyze the performance of the B-/+tree index.

As mentioned above, the index structure is generally evaluated using disk I/O times. First, from the B-tree analysis, according to the definition of b-tree, it is necessary to retrieve up to H nodes at a time. The designer of the database system skillfully exploits the principle of disk pre-reading, setting the size of a node equal to one page, so that each node can be fully loaded with only one I/O. To achieve this, the following techniques are required to implement B-tree in practice:

Each time you create a new node, request a page space directly, so that a node is physically stored in a page, and the computer storage allocation is page-aligned, the implementation of a node only one time I/O.

B-tree requires a maximum of h-1 I/O (root node resident memory) in a single retrieval, and a progressive complexity of O (h) =o (LOGDN). in general practice, the out-of-size D is a very large number, usually more than 100, so H is very small (usually not more than 3).

In summary, using B-tree as index structure efficiency is very high.

And the red-black tree structure, H is obviously much deeper. Because the logically close node (parent-child) may be far away physically, it is not possible to take advantage of locality, so the I/O asymptotic complexity of the red-black tree is also O (h), and the efficiency is significantly worse than B-tree.

As mentioned above, B+tree is more suitable for external memory index, because it is related to the internal node out d. From the above analysis, it can be seen that the better the performance of the larger index, and the upper limit depends on the size of the key and data within the node:

DMax = Floor (pagesize/(keysize + datasize + pointsize)) (Pagesize–dmax >= pointsize)

Or

DMax = Floor (pagesize/(keysize + datasize + pointsize))-1 (Pagesize–dmax < pointsize)

Floor indicates rounding down. Because the B+tree node removes the data domain, it can have a greater degree of granularity and better performance.

This chapter discusses data structures and algorithms related to indexes from a theoretical point of view, and the next chapter will discuss how B+tree is specifically implemented as an index in MySQL, while combining the MyISAM and INNDB storage engines to introduce two different types of index implementations for nonclustered and clustered indexes.

4 MySQL Index implementation

In MySQL, the index is the concept of storage engine level, different storage engine implementation of the index is different, this article mainly discusses the MyISAM and InnoDB two storage engine index implementation way.

MyISAM Index Implementation

The MyISAM engine uses B+tree as the index structure, and the data domain of the leaf node holds the address of the record. is a schematic diagram of the MyISAM index:

Figure 8

There are three columns in the table, assuming that we have Col1 as the primary key, then Figure 8 is the main index (Primary key) of a MyISAM table. You can see that the index file of MyISAM only stores the address of the data record. In MyISAM, the primary index and secondary index (secondary key) are structurally indistinguishable, except that the primary index requires that key be unique, and the secondary index key can be duplicated. If we establish a secondary index on Col2, the structure of this index is as follows:

Figure 9

It is also the address of a b+tree,data field that holds data records. Therefore,the algorithm of index retrieval in MyISAM is to search the index according to the B+tree search algorithm first, if the specified key exists, the value of its data field is fetched, then the data record is read with the value of the data field.

The index of MyISAM is also called "non-aggregation", and the reason for this is to differentiate it from InnoDB's clustered index.

InnoDB Index Implementation

Although InnoDB also uses B+tree as the index structure, the implementation is very different from MyISAM.

The first major difference is that the InnoDB data file itself is the index file. as you know above, the MyISAM index file and the data file are detached, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+tree, and the tree's leaf node data field holds the complete data record. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.

Figure 10

Figure 10 is the InnoDB primary index (also the data file), and you can see that the leaf node contains the complete data record. This index is called a clustered index. Because the InnoDB data file itself is clustered by the primary key, the INNODB requires that the table must have a primary key (MyISAM can not), and if it is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key, and if no such column exists, Then MySQL automatically generates an implicit field for the InnoDB table as the primary key, which is 6 bytes long and has a length of type.

The second difference from the MyISAM index is that the secondary index of the InnoDB data field stores the value of the corresponding record primary key instead of the address. in other words, all secondary indexes of InnoDB refer to the primary key as the data domain. For example, Figure 11 is a secondary index defined on the COL3:

Figure 11

Here is the ASCII code of the English character as the comparison criterion. Clustered index This implementation makes search by primary key efficient, but secondary index search needs to retrieve two times index: first retrieves the secondary index to obtain the primary key, and then retrieves the record with the primary key to the primary index.

Understanding how index implementations of different storage engines can be useful for proper use and optimization of indexes, such as knowing the InnoDB index implementation, makes it easy to understand why it is not recommended to use too-long fields as primary keys, because all secondary indexes refer to the primary index, and the long primary index makes the secondary index too large. For example, using non-monotonic fields as primary keys is not a good idea in InnoDB because the InnoDB data file itself is a b+tree, and a non-monotonic primary key causes the data file to be frequently split in order to maintain b+tree characteristics when inserting new records, which is inefficient, Using the self-increment field as the primary key is a good choice.

MySQL index data structure and principle

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.