The data structures and algorithms behind MySQL indexes

Last Update: 2015-11-25 Source: Internet

Author: User



Summary

This article takes the MySQL database as its subject and discusses several topics related to database indexing. Note that MySQL supports many storage engines and each engine supports indexes differently, so MySQL offers several index types, such as B-tree indexes, hash indexes, and full-text indexes. To avoid confusion, this article focuses only on B-tree indexes, since they are the indexes you deal with most when using MySQL; hash indexes and full-text indexes are not discussed.

Data structure and Algorithm basics

The nature of the index

The official MySQL definition of an index is: an index is a data structure that helps MySQL retrieve data efficiently. Reducing the sentence to its skeleton gives the essence of an index: an index is a data structure.

Querying is one of the most important functions of a database, and we all want queries to run as fast as possible, so database system designers optimize from the standpoint of search algorithms. The most basic search algorithm is sequential search (linear search), whose O(n) complexity is clearly poor when the data volume is large. Fortunately, computer science offers many better search algorithms, such as binary search and binary search tree search. On closer inspection, each search algorithm applies only to a particular data structure: binary search requires the data being searched to be sorted, and binary tree search works only on a binary search tree. However, the data itself cannot be organized to satisfy every such structure at once (for example, it is theoretically impossible to keep the rows sorted by two columns at the same time). So, in addition to the data, a database system maintains data structures that satisfy particular search algorithms and that reference (point to) the data in some way, so that advanced search algorithms can run on them. Such a data structure is an index.
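To make the idea concrete, here is a minimal Python sketch (not MySQL code) contrasting an O(n) linear scan with a separate sorted structure that points back at the rows. The rows, the use of list positions as stand-ins for physical addresses, and the function names are all hypothetical. The rows are never reordered; each index is sorted on its own column, which is how two columns can each get a binary-searchable structure.

```python
import bisect

def linear_search(rows, col, value):
    """O(n): compare the requested column of every row."""
    return [pos for pos, row in enumerate(rows) if row[col] == value]

def build_index(rows, col):
    """Separate structure: (key, row position) pairs sorted by key.

    The rows themselves are never reordered, so each column can get
    its own sorted index that points back at the rows.
    """
    return sorted((row[col], pos) for pos, row in enumerate(rows))

def index_search(index, value):
    """O(log n) binary search for the first match, then walk equal keys."""
    i = bisect.bisect_left(index, (value,))
    positions = []
    while i < len(index) and index[i][0] == value:
        positions.append(index[i][1])
        i += 1
    return positions

# Hypothetical table; list positions stand in for physical addresses.
rows = [(3, "c"), (1, "z"), (2, "a"), (1, "b")]
by_col0 = build_index(rows, 0)
by_col1 = build_index(rows, 1)
print(linear_search(rows, 0, 1))    # [1, 3]
print(index_search(by_col0, 1))     # [1, 3]
print(index_search(by_col1, "a"))   # [2]
```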

Consider an example:

Figure 1

Figure 1 shows one possible way of indexing. On the left is a data table with two columns and seven records; the leftmost column is the physical address of each data record (note that records that are logically adjacent are not necessarily physically adjacent on disk). To speed up searches on Col2, you can maintain the binary search tree shown on the right, where each node contains an index key value and a pointer to the physical address of the corresponding data record. Binary search can then find the matching data with O(log2 n) complexity.
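The Figure 1 idea can be sketched as a binary search tree whose nodes carry a Col2 key and the address of the matching record. The addresses and key values below are made up (they are not read off Figure 1). Plain insertion does not balance the tree in general; inserting these keys in the order shown happens to produce a balanced tree of height 3, so any lookup visits at most 3 nodes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int                       # indexed Col2 value
    address: str                   # physical address of the data record
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root, key, address):
    """Standard (unbalanced) binary search tree insertion."""
    if root is None:
        return Node(key, address)
    if key < root.key:
        root.left = insert(root.left, key, address)
    else:
        root.right = insert(root.right, key, address)
    return root

def search(root, key):
    """Return (address or None, number of nodes visited)."""
    visited, node = 0, root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.address, visited
        node = node.left if key < node.key else node.right
    return None, visited

# Hypothetical records: address -> Col2 value.
records = {"0x00": 40, "0x07": 20, "0x0E": 60, "0x15": 10,
           "0x1C": 30, "0x23": 50, "0x2A": 70}
root = None
for address, col2 in records.items():
    root = insert(root, col2, address)

print(search(root, 50))   # ('0x23', 3)
print(search(root, 35))   # (None, 3)
```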

Although this is a genuine index, real database systems almost never implement indexes with a binary search tree or its evolved form, the red-black tree, for reasons described below.

B-tree and B+tree

Most database systems and file systems today use the B-tree or its variant, the B+tree, as their index structure. The next section of this article draws on the principles of memory and disk access to discuss why B-trees and B+trees are so widely used for indexes; this section first describes them from a data structure point of view.

B-tree

To describe a B-tree, first define a data record as a two-tuple [key, data], where key is the record's key value (different data records have different keys) and data is the rest of the record. A B-tree is then a data structure that satisfies the following conditions:

1. d is a positive integer greater than 1, called the degree of the B-tree.

2. h is a positive integer, called the height of the B-tree.

3. Each non-leaf node consists of n-1 keys and n pointers, where d <= n <= 2d.

4. Each leaf node contains at least one key and two pointers, and at most 2d-1 keys and 2d pointers; the pointers of a leaf node are all null.

5. All leaf nodes have the same depth, equal to the tree height h.

6. Keys and pointers alternate within a node, and both ends of a node are pointers.

7. The keys in a node are in non-decreasing order from left to right.

8. All nodes together form a tree structure.

9. Each pointer is either null or points to a different node.

10. If the leftmost pointer of a node is not null, all keys in the node it points to are less than v(key1), where v(key1) is the value of the node's first key.

11. If the rightmost pointer of a node is not null, all keys in the node it points to are greater than v(keym), where v(keym) is the value of the node's last key.

12. If a pointer lies between adjacent keys key(i) and key(i+1) of a node and is not null, all keys in the node it points to are less than v(key(i+1)) and greater than v(key(i)).
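As a sanity check on the definition above, the following Python sketch tests a tree against the listed conditions. The node layout (a key list plus a child list, where an empty child list stands for a leaf's null pointers) is an assumption for illustration, and the checks follow the conditions literally, including applying the n >= d lower bound to the root.

```python
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    keys: list
    children: list = field(default_factory=list)   # empty list = leaf (null pointers)

def is_valid_btree(root, d):
    """Check a tree against the B-tree conditions listed above (degree d)."""
    leaf_depths = set()

    def check(node, lo, hi, depth):
        keys = node.keys
        # Condition 7 (keys are unique, so strictly increasing).
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False
        # Conditions 10-12: keys must fall strictly between the parent's bounding keys.
        if any((lo is not None and k <= lo) or (hi is not None and k >= hi) for k in keys):
            return False
        if not node.children:                           # leaf
            leaf_depths.add(depth)
            return 1 <= len(keys) <= 2 * d - 1         # condition 4
        n = len(node.children)
        if n != len(keys) + 1 or not d <= n <= 2 * d:  # conditions 3 and 6
            return False
        bounds = [lo] + keys + [hi]
        return all(check(child, bounds[i], bounds[i + 1], depth + 1)
                   for i, child in enumerate(node.children))

    return check(root, None, None, 1) and len(leaf_depths) == 1   # condition 5

# Hypothetical trees with d = 2.
good = BTreeNode([35], [BTreeNode([10, 17]), BTreeNode([50, 65, 80])])
misplaced = BTreeNode([35], [BTreeNode([10, 40]), BTreeNode([50])])
uneven = BTreeNode([35], [BTreeNode([10]),
                          BTreeNode([50], [BTreeNode([40]), BTreeNode([60])])])
print(is_valid_btree(good, 2), is_valid_btree(misplaced, 2), is_valid_btree(uneven, 2))
# True False False
```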

Figure 2 shows a B-tree with d=2.

Figure 2

Because of these properties, the algorithm for retrieving data by key in a B-tree is very intuitive: first perform a binary search on the root node; if the key is found, return the data of the corresponding record; otherwise, recursively search the node pointed to by the pointer for the interval that would contain the key, until the key is found or a null pointer is reached. The former means the lookup succeeded; the latter means it failed.
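The lookup just described can be sketched in Python as follows. This is an illustrative version, not the article's own pseudo-code; it assumes each node stores a sorted key list, a parallel data list, and a child list (empty for a leaf). It uses `bisect` for the binary search inside a node and loops instead of recursing, which is equivalent.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    keys: list                                      # sorted keys
    data: list                                      # data[i] belongs to keys[i]
    children: list = field(default_factory=list)    # len(keys) + 1 pointers; empty for a leaf

def btree_search(node, key):
    """Return the data stored with key, or None if the key is absent."""
    while node is not None:
        i = bisect.bisect_left(node.keys, key)          # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node.data[i]                         # found: lookup succeeds
        # children[i] covers the interval between keys[i-1] and keys[i].
        node = node.children[i] if node.children else None   # null pointer: lookup fails
    return None

# Hypothetical tree.
root = BTreeNode([35], ["row35"], [
    BTreeNode([10, 17], ["row10", "row17"]),
    BTreeNode([50, 65], ["row50", "row65"]),
])
print(btree_search(root, 65))   # row65
print(btree_search(root, 20))   # None
```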

In addition, because inserting or deleting a data record can break the B-tree properties, insertions and deletions require operations such as splitting, merging, and transferring keys to preserve them. This article does not intend to cover B-trees completely, because plenty of material already details the mathematical properties of B-trees and their insertion and deletion algorithms; interested readers can find it in the references listed at the end of this article.

B+tree

B-tree has many variants, the most common of which is the B+tree; MySQL, for example, generally uses B+tree to implement its index structure.

Compared with a B-tree, a B+tree differs in the following ways:

1. The upper limit on the number of pointers per node is 2d instead of 2d+1.

2. Internal nodes do not store data, only keys, and leaf nodes do not store pointers.

Figure 3 is a simple B+tree schematic.

Figure 3

Because not all nodes have the same fields, leaf nodes and internal nodes of a B+tree generally have different sizes. This differs from a B-tree: although different B-tree nodes may store different numbers of keys and pointers, every node has the same fields and the same upper bound, so implementations often allocate the same amount of space for every B-tree node.

In general, B+tree is better suited than B-tree for implementing an index structure on external storage. The specific reasons, which involve the principles of external storage and computer memory access, are discussed below.

B+tree with sequential access pointers

The B+tree structure used in database systems and file systems is generally an optimization of the classic B+tree that adds sequential access pointers.

Figure 4

As shown in Figure 4, each leaf node of the B+tree gets a pointer to the adjacent leaf node, forming a B+tree with sequential access pointers. The purpose of this optimization is to improve the performance of range access. In Figure 4, for example, to query all data records with keys from 18 to 49, once 18 is found you only need to follow the leaf nodes and their pointers in order to reach all the matching data, which greatly improves the efficiency of range queries.
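A range query over such leaves can be sketched as: descend once to the leaf that would contain the lower bound, then follow the `next` pointers until a key exceeds the upper bound. The two-level tree, its leaf contents, and the convention that a key equal to a separator lives in the right child are assumptions for illustration; only the 18 to 49 range echoes the example in the text.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Leaf:
    keys: list
    data: list
    next: Optional["Leaf"] = None      # sequential access pointer to the right neighbour

@dataclass
class Inner:
    keys: list                         # separator keys only; inner nodes hold no data
    children: list = field(default_factory=list)

def find_leaf(node, key):
    while isinstance(node, Inner):
        # Assumed convention: a key equal to a separator lives in the right child.
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def range_query(root, lo, hi):
    """Data for all keys in [lo, hi]: one descent, then walk the leaf chain."""
    out, leaf = [], find_leaf(root, lo)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.data):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out

def make_leaf(keys):
    return Leaf(keys, [f"d{k}" for k in keys])

# Hypothetical two-level tree.
leaves = [make_leaf([5, 10, 15]), make_leaf([18, 20, 28]),
          make_leaf([30, 49]), make_leaf([50, 60])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([18, 30, 50], leaves)
print(range_query(root, 18, 49))   # ['d18', 'd20', 'd28', 'd30', 'd49']
```

Without the `next` pointers, each leaf after the first would need its own descent from the root.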

This section has briefly introduced B-tree and B+tree. The next section draws on the principles of memory access to explain why B+tree is currently the preferred data structure for implementing indexes in database systems.

Why use B-tree (B+tree)

As mentioned above, data structures such as red-black trees can also be used to implement indexes, but file systems and database systems generally use B-/+tree as the index structure. This section draws on the principles of computer organization to discuss the theoretical basis for using B-/+tree as an index.

In general, an index is too large to be kept entirely in memory, so it is often stored on disk as an index file. Index lookups therefore incur disk I/O, which costs several orders of magnitude more than memory access. The most important metric for evaluating a data structure as an index is therefore the asymptotic complexity of the number of disk I/O operations during a lookup. In other words, the index should be organized to minimize the number of disk I/O accesses during a lookup. The following first introduces the principles of memory and disk access, and then uses them to analyze the efficiency of B-/+tree as an index.

Main Memory Access principle

Main memory in today's computers is random access memory (RAM). The structure and access principles of modern RAM are fairly complex; this article sets the details aside and uses a very simple abstract access model to illustrate how RAM works.

Figure 5

Abstractly, main memory is a matrix of storage units, each storing a fixed amount of data. Each storage unit has a unique address. Addressing in modern main memory is fairly complex; here it is simplified to a two-dimensional address, so that a row address and a column address uniquely identify a storage unit. Figure 5 shows a 4 x 4 main memory model.

The main memory access process is as follows:

When the system needs to read main memory, it puts the address signal on the address bus and sends it to main memory. Main memory reads the address signal, decodes it, locates the specified storage unit, and then puts that unit's data on the data bus for other components to read.

Writing main memory is similar: the system puts the unit address and the data on the address bus and the data bus respectively, and main memory reads both buses and performs the write.

As can be seen, main memory access time depends linearly only on the number of accesses. Because there is no mechanical movement, the "distance" between the data in two accesses has no effect on time: for example, reading A0 and then A1 takes the same time as reading A0 and then D3.
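The abstract RAM model can be expressed in a few lines of Python. The sketch assumes Figure 5 labels rows A to D and columns 0 to 3, and gives every access the same abstract cost; it only illustrates the claim that total time depends on how many accesses are made, not on how far apart the addresses are.

```python
ROWS, COLS = 4, 4       # the 4 x 4 model of Figure 5
ACCESS_COST = 1         # assumed: one abstract time unit per access

def decode(address):
    """Split an address such as "D3" into (row, column); rows assumed labelled A-D."""
    row = ord(address[0]) - ord("A")
    col = int(address[1:])
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError(f"no such storage unit: {address}")
    return row, col

def access_time(addresses):
    """Row and column decoders select a unit directly, so only the count matters."""
    for address in addresses:
        decode(address)
    return ACCESS_COST * len(addresses)

print(access_time(["A0", "A1"]) == access_time(["A0", "D3"]))   # True
```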

Disk Access principle

As mentioned earlier, indexes are typically stored as files on disk, and index retrieval requires disk I/O operations. Unlike main memory, disk I/O involves mechanical movement, so it is very time-consuming.

Figure 6 shows the overall structure of a disk.

Figure 6

A disk consists of a number of platters of the same size mounted on the same spindle, and the platters rotate (all platters must rotate synchronously). On one side of the disk is a head actuator arm that holds a set of heads, each responsible for accessing the contents of one platter. The heads cannot rotate, but they can move along the radius of the platters (in practice the motion is oblique-tangential). At any moment all heads must also be coaxial, that is, viewed from directly above, all heads always overlap (although techniques with independently moving heads, which are not subject to this limitation, already exist).

Figure 7 is the disk structure.

Figure 7

Each platter is divided into a series of concentric rings around its center; each ring is called a track, and all tracks with the same radius form a cylinder. Each track is divided into small segments along radial lines; each segment is called a sector, and a sector is the smallest storage unit of the disk. For simplicity, assume below that the disk has only one platter and one head.

When data needs to be read from disk, the system passes the logical address of the data to the disk, and the disk's control circuitry translates the logical address into a physical address according to its addressing logic, that is, it determines which track and which sector hold the data. To read the data in that sector, the head must be positioned over it. First the head moves to the corresponding track; this is called a seek, and the time it takes is the seek time. Then the platter rotates the target sector under the head; the time this takes is the rotation time.
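The two mechanical delays can be put into numbers with a short calculation. The parameters (a 9 ms average seek and a 7200 RPM spindle) are assumed values chosen for illustration, not measurements. The only derived fact is that, on average, the target sector is half a revolution away from the head, so the average rotation time is half the time of one revolution.

```python
def avg_rotational_latency_ms(rpm):
    """On average the target sector is half a revolution away from the head."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def random_access_ms(seek_ms, rpm):
    """Seek time plus average rotation time (data transfer time ignored)."""
    return seek_ms + avg_rotational_latency_ms(rpm)

# Assumed, illustrative parameters.
print(f"{random_access_ms(9.0, 7200):.2f} ms")   # 13.17 ms
```

Even under these rough assumptions a single random disk access costs milliseconds, while a main memory access is usually measured in nanoseconds.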

Principle of locality and disk pre-reading

Because of the characteristics of the storage medium, a disk is inherently much slower than main memory, and with mechanical movement added, disk access is often hundreds of times slower than main memory access. To improve efficiency, disk I/O must therefore be minimized. To this end, disks usually do not read strictly on demand but read ahead: even if only one byte is needed, the disk reads a certain length of data sequentially into memory, starting from that position. The rationale is the well-known principle of locality in computer science:

When a piece of data is used, the data near it is usually used soon afterward.

The data a program needs while running is usually concentrated in a small region.

Because sequential disk reads are efficient (they need no seek time and very little rotation time), read-ahead can improve I/O efficiency for programs that exhibit locality.

The read-ahead length is generally an integer multiple of the page size. A page is a logical block of memory managed by the computer: hardware and operating systems often divide main memory and disk storage into contiguous blocks of equal size, each called a page (in many operating systems the page size is 4 KB), and main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault is triggered: the system sends a read signal to the disk, the disk finds the starting position of the data and sequentially reads one or more pages into memory, then the fault handler returns and the program continues running.
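The effect of page-granular reading and read-ahead can be simulated by counting page faults. This toy sketch assumes 4 KB pages, unlimited memory (pages are never evicted), and a hypothetical `readahead` parameter meaning "load this many consecutive pages per fault"; it does not model any particular operating system.

```python
PAGE_SIZE = 4096   # 4 KB pages

def count_page_faults(byte_offsets, page_size=PAGE_SIZE, readahead=1):
    """Count page faults when every miss loads `readahead` consecutive pages.

    Toy model: memory never evicts a page once loaded.
    """
    resident, faults = set(), 0
    for offset in byte_offsets:
        page = offset // page_size
        if page not in resident:
            faults += 1
            resident.update(range(page, page + readahead))
    return faults

sequential = range(64 * 1024)                          # 64 KB read byte by byte
scattered = [i * 8 * PAGE_SIZE for i in range(16)]     # 16 bytes, 8 pages apart
print(count_page_faults(sequential))                   # 16
print(count_page_faults(sequential, readahead=4))      # 4
print(count_page_faults(scattered, readahead=4))       # 16
```

Sequential access benefits from read-ahead (65,536 byte reads cost only 4 faults), while scattered access gains nothing from it, which is exactly the locality argument above.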

Performance analysis of B-/+tree indexes

We can now finally analyze the performance of B-/+tree indexes.

As mentioned above, index structures are generally evaluated by the number of disk I/Os. Start with the B-tree: by its definition, a single lookup visits at most h nodes. Database system designers cleverly exploit disk read-ahead by making the size of a node equal to one page, so that each node can be fully loaded with a single I/O. To achieve this, a practical B-tree implementation uses the following technique:

Each time a new node is created, allocate a whole page of space for it directly. This guarantees that a node is physically stored within one page, and since storage allocation is page-aligned, reading a node takes only one I/O.

A single B-tree lookup therefore needs at most h-1 I/Os (the root node stays resident in memory), and the asymptotic complexity is O(h) = O(log_d N). In practice the out-degree d is a very large number, usually over 100, so h is very small (usually no more than 3).

In summary, a B-tree is a very efficient index structure.

For a red-black tree, h is clearly much larger. Moreover, because logically close nodes (parent and child) may be physically far apart, locality cannot be exploited; so although the red-black tree's I/O asymptotic complexity is also O(h), its efficiency is clearly much worse than a B-tree's.
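The claim that h stays small can be checked with a simplified capacity model: assume every node is full, with out-degree d and therefore d - 1 keys, so a tree of height h holds d^h - 1 keys. Under that assumption, the sketch compares a B-tree with d = 100 against a perfectly balanced binary tree (a lower bound on a red-black tree's height) for a hypothetical 500,000 keys. If every node costs one I/O and the root stays in memory, that is 2 I/Os versus 18.

```python
def tree_height(n_keys, d):
    """Smallest h such that a tree of out-degree d and height h can hold n_keys keys.

    Simplified model: every node is full, with d children and d - 1 keys,
    so height h holds d**h - 1 keys.
    """
    h = 1
    while d ** h - 1 < n_keys:
        h += 1
    return h

n = 500_000
for label, d in [("B-tree, d = 100", 100), ("balanced binary tree", 2)]:
    h = tree_height(n, d)
    print(f"{label}: height {h}, I/Os with root cached {h - 1}")
# B-tree, d = 100: height 3, I/Os with root cached 2
# balanced binary tree: height 19, I/Os with root cached 18
```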

As mentioned above, B+tree is better suited for external-storage indexes, and the reason has to do with the out-degree d of its internal nodes. The analysis above shows that the larger d is, the better the index performs, and the upper limit of the out-degree depends on the sizes of the key and data in a node:

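$$q = \operatorname{floor}\left(\frac{\mathit{pagesize}}{\mathit{keysize} + \mathit{datasize} + \mathit{pointsize}}\right)$$

$$d_{\max} = \begin{cases} q, & \mathit{pagesize} - q \cdot (\mathit{keysize} + \mathit{datasize} + \mathit{pointsize}) \ge \mathit{pointsize} \\ q - 1, & \mathit{pagesize} - q \cdot (\mathit{keysize} + \mathit{datasize} + \mathit{pointsize}) < \mathit{pointsize} \end{cases}$$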

Here floor means rounding down. Because B+tree internal nodes drop the data field, they can have a larger out-degree and therefore better performance.
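The effect of removing the data field can be made concrete by evaluating the d_max formula. This sketch reads the formula's condition as "the space left after floor(pagesize / entry size) full entries must still hold one more pointer". The sizes are assumptions for illustration: a 16 KB page (the InnoDB default page size), an 8-byte key, a 6-byte pointer, and 100 bytes of data per entry.

```python
def d_max(page_size, key_size, data_size, pointer_size):
    """Largest out-degree whose node still fits in one page."""
    entry = key_size + data_size + pointer_size
    d = page_size // entry                      # floor(pagesize / entry size)
    # The space left over must still hold one more pointer; otherwise drop a slot.
    if page_size - d * entry < pointer_size:
        d -= 1
    return d

# Assumed sizes in bytes: 16 KB page, 8-byte key, 6-byte pointer, 100-byte data.
print(d_max(16 * 1024, 8, 100, 6))   # 143  (node that also stores data, B-tree style)
print(d_max(16 * 1024, 8, 0, 6))     # 1169 (B+tree internal node: no data field)
```

Under these assumed sizes, dropping the data field raises the out-degree about eightfold, and the second call also exercises the "minus one" branch of the formula.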

This chapter has discussed the data structures and algorithms related to indexes from a theoretical point of view. The next chapter discusses how B+tree is actually implemented as an index in MySQL, using the MyISAM and InnoDB storage engines to introduce two kinds of index implementation: non-clustered and clustered indexes.

MySQL Index implementation

In MySQL, indexes are a storage-engine-level concept, and different storage engines implement indexes differently. This article mainly discusses how the MyISAM and InnoDB storage engines implement indexes.

MyISAM Index Implementation

The MyISAM engine uses B+tree as its index structure, and the data field of each leaf node stores the address of a data record. Figure 8 is a schematic diagram of a MyISAM index:

Figure 8

The table has three columns. Assuming Col1 is the primary key, Figure 8 shows the primary index (primary key) of a MyISAM table. You can see that a MyISAM index file stores only the addresses of data records. In MyISAM, the primary index and secondary indexes (secondary keys) are structurally identical, except that the primary index requires unique keys while secondary index keys may be duplicated. If we create a secondary index on Col2, its structure is as follows:

Figure 9

It is also a B+tree whose data field stores the addresses of data records. Therefore, index retrieval in MyISAM first searches the index with the B+tree search algorithm; if the specified key exists, it takes the value of the key's data field and uses it as an address to read the corresponding data record.
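MyISAM's two-step access (search the index, then read the record at the stored address) can be sketched with sorted (key, address) lists standing in for the B+tree leaf level and a dict standing in for the data file. The class name, the three-column records, and the addresses are hypothetical; Col1 plays the primary key and Col2 the secondary key, as in Figures 8 and 9.

```python
import bisect

class MyISAMTable:
    """Hypothetical model: a data file addressed by offset plus two separate indexes."""

    def __init__(self, records):
        self.data_file = dict(records)               # address -> (col1, col2, col3)
        # Sorted (key, address) lists stand in for the B+tree leaf level.
        self.primary = sorted((r[0], a) for a, r in self.data_file.items())
        self.secondary = sorted((r[1], a) for a, r in self.data_file.items())
        if len({k for k, _ in self.primary}) != len(self.primary):
            raise ValueError("primary key values must be unique")

    @staticmethod
    def _addresses(index, key):
        i = bisect.bisect_left(index, (key,))
        found = []
        while i < len(index) and index[i][0] == key:
            found.append(index[i][1])
            i += 1
        return found

    def find(self, index, key):
        # Step 1: search the index for addresses; step 2: read each record by address.
        return [self.data_file[a] for a in self._addresses(index, key)]

table = MyISAMTable({0x07: (15, 34, "Bob"), 0x56: (18, 77, "Alice"),
                     0x6A: (20, 5, "Jim"), 0xF3: (30, 34, "Eric")})
print(table.find(table.primary, 18))     # [(18, 77, 'Alice')]
print(table.find(table.secondary, 34))   # [(15, 34, 'Bob'), (30, 34, 'Eric')]
```

Both indexes share the same shape and differ only in the uniqueness check, mirroring the statement that MyISAM primary and secondary indexes are structurally indistinguishable.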

MyISAM indexes are also called "non-clustered" indexes, to distinguish them from InnoDB's clustered indexes.

InnoDB Index Implementation

Although InnoDB also uses B+tree as its index structure, its implementation is very different from MyISAM's.

The first major difference is that in InnoDB the data file itself is the index file. As noted above, MyISAM keeps its index file and data file separate, and the index file stores only the addresses of data records. In InnoDB, the table's data file is itself an index structure organized as a B+tree, and the data field of each leaf node holds the complete data record. The key of this index is the table's primary key, so the InnoDB table data file itself is the primary index.

Figure 10

Figure 10 shows an InnoDB primary index (which is also the data file); you can see that the leaf nodes contain the complete data records. This kind of index is called a clustered index. Because the InnoDB data file itself is clustered by primary key, InnoDB requires every table to have a primary key (MyISAM does not). If none is explicitly specified, MySQL automatically chooses a column that uniquely identifies the data records as the primary key; if no such column exists, MySQL generates an implicit field for the InnoDB table to serve as the primary key. This field is 6 bytes long and of a long integer type.

The second difference from MyISAM is that the data field of an InnoDB secondary index stores the value of the corresponding record's primary key instead of its address. In other words, every InnoDB secondary index uses the primary key as its data field. For example, Figure 11 shows a secondary index defined on Col3:

Figure 11

Here the ASCII codes of the English characters serve as the comparison criterion. This clustered-index implementation makes searching by primary key very efficient, but a search through a secondary index requires two index lookups: first search the secondary index to get the primary key, then use the primary key to look up the record in the primary index.
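By contrast with the MyISAM layout, an InnoDB secondary lookup goes through two indexes. In the sketch below (a hypothetical class and data, with sorted lists standing in for B+trees), the clustered index stores complete rows ordered by primary key, the secondary index on Col3 stores (Col3, primary key) pairs, and a counter records how many index searches a query needs. Python compares ASCII strings by character code, which matches the comparison rule just mentioned.

```python
import bisect

class InnoDBTable:
    """Hypothetical model of a clustered primary index plus one secondary index."""

    def __init__(self, rows, pk_col, sec_col):
        rows = sorted(rows, key=lambda r: r[pk_col])
        self.pk_keys = [r[pk_col] for r in rows]     # clustered index keys
        self.rows = rows                              # leaves hold complete rows
        # Secondary index leaves hold the primary key, not an address.
        self.secondary = sorted((r[sec_col], r[pk_col]) for r in rows)
        self.index_searches = 0

    def by_primary_key(self, pk):
        self.index_searches += 1
        i = bisect.bisect_left(self.pk_keys, pk)
        if i < len(self.pk_keys) and self.pk_keys[i] == pk:
            return self.rows[i]
        return None

    def by_secondary_key(self, value):
        self.index_searches += 1                      # first search: secondary index
        i = bisect.bisect_left(self.secondary, (value,))
        pks = []
        while i < len(self.secondary) and self.secondary[i][0] == value:
            pks.append(self.secondary[i][1])
            i += 1
        # Second search: each primary key goes back through the clustered index.
        return [self.by_primary_key(pk) for pk in pks]

rows = [(15, 34, "Bob"), (18, 77, "Alice"), (20, 5, "Jim"), (30, 34, "Eric")]
table = InnoDBTable(rows, pk_col=0, sec_col=2)
print(table.by_secondary_key("Bob"), table.index_searches)   # [(15, 34, 'Bob')] 2
```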

Understanding how different storage engines implement indexes is very helpful for using and optimizing indexes correctly. For example, once you know how InnoDB implements indexes, it is easy to see why overly long fields are not recommended as primary keys: every secondary index references the primary index, so a long primary key makes every secondary index too large. Similarly, using a non-monotonic field as the primary key is a bad idea in InnoDB: because the InnoDB data file itself is a B+tree, a non-monotonic primary key forces the data file to split frequently to maintain B+tree properties when new records are inserted, which is very inefficient. An auto-increment field is therefore a good choice for the primary key.
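Why insertion order matters can be illustrated with a toy model of the leaf level. The rules below are simplifying assumptions, not InnoDB's actual split algorithm: a page holds at most `capacity` sorted keys, an insert beyond the end of the last full page simply opens a new page, and an insert into any other full page splits it in half. The hypothetical `insert_all` function counts how many records the splits had to move.

```python
import bisect
import random

def insert_all(keys, capacity):
    """Insert keys into leaf pages; return (page count, records moved by splits).

    Toy rules (assumptions, not InnoDB's real split logic): a page holds at most
    `capacity` sorted keys; inserting past the end of the last full page opens a
    new page; inserting into any other full page splits it in half.
    """
    pages, moved = [[]], 0
    for key in keys:
        firsts = [p[0] for p in pages[1:]]
        j = bisect.bisect_right(firsts, key)          # page whose key range covers `key`
        page = pages[j]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif j == len(pages) - 1 and key > page[-1]:
            pages.append([key])                       # append: nothing moves
        else:
            right = page[len(page) // 2:]             # split: move the upper half
            del page[len(page) // 2:]
            moved += len(right)
            pages.insert(j + 1, right)
            bisect.insort(page if key < right[0] else right, key)
    return len(pages), moved

ascending = list(range(1000))                         # like an auto-increment key
shuffled = ascending[:]
random.Random(42).shuffle(shuffled)                   # like a non-monotonic key
print(insert_all(ascending, 10))                      # (100, 0)
print(insert_all(shuffled, 10))                       # more than 100 pages, nonzero moves
```

With ascending keys, 1,000 inserts fill exactly 100 pages and move nothing; with the same keys in shuffled order, splits move records and leave pages partly empty.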

The next chapter will discuss these index-related optimization strategies in detail.

Index usage policy and optimization

MySQL optimization is mainly divided into structural optimization (schema optimization) and query optimization. The high-performance indexing strategies discussed in this chapter belong to structural optimization. This chapter is based entirely on the theory above; in fact, once you understand the mechanism behind indexes, choosing high-performance strategies becomes a matter of pure reasoning, and you can understand the logic behind these strategies.

Sample Database

To discuss indexing strategies, we need an example database with a reasonably large amount of data. This article uses one of the sample databases provided in the MySQL official documentation: employees. This database is moderately complex and contains a large amount of data. The following is its E-R diagram (taken from the official MySQL manual):

Figure 12

The page for this database in the MySQL official documentation is http://dev.mysql.com/doc/employee/en/employee.html. It describes the database in detail and explains how to download and import it; if you are interested in importing this database into your own MySQL, refer to that page.

The leftmost prefix principle and related optimizations

The first requirement for using indexes efficiently is knowing which queries will use an index. This is related to the "leftmost prefix principle" of B+tree, which the following sections illustrate with examples.

First, the concept of a composite index. So far we have assumed that an index refers to a single column; in fact, an index in MySQL can refer to multiple columns in a particular order, and such an index is called a composite index. In general, a composite index is an ordered tuple <a1, a2, ..., an>, where each element is a column of the table. A rigorous definition of an index would require relational algebra, but I do not want to dwell on relational algebra here because it would be tedious, so no strict definition is given. In addition, a single-column index can be seen as the special case of a composite index with one element.
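The following sketch models a composite index on <emp_no, title, from_date> as a sorted list of Python tuples, which compare element by element, first column first. The rows are made up for illustration and the function names are hypothetical. Entries that share a leftmost prefix are contiguous, so one binary search finds where they start; a condition on title alone matches entries scattered across the list, so it falls back to a full scan.

```python
import bisect

# Made-up entries shaped like employees.titles; tuples sort by emp_no, then title, then from_date.
index = sorted([
    (10003, "Senior Engineer", "1995-12-03"),
    (10001, "Senior Engineer", "1986-06-26"),
    (10002, "Staff", "1996-08-03"),
    (10004, "Engineer", "1986-12-01"),
    (10004, "Senior Engineer", "1995-12-01"),
])

def prefix_lookup(index, prefix):
    """Entries whose leading columns equal `prefix`, e.g. (emp_no,) or (emp_no, title)."""
    start = bisect.bisect_left(index, prefix)         # binary search to the first match
    end = start
    while end < len(index) and index[end][:len(prefix)] == prefix:
        end += 1                                      # matches are contiguous
    return index[start:end]

def title_only_lookup(index, title):
    """No leftmost prefix: matching entries are scattered, so scan everything."""
    return [entry for entry in index if entry[1] == title]

print(prefix_lookup(index, (10004,)))
print(len(title_only_lookup(index, "Senior Engineer")))   # 3
```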

Take the employees.titles table as an example and look at which indexes it has:

The results show that the titles table has a primary index <emp_no, title, from_date> and a secondary index <emp_no>. To avoid the complications that multiple indexes introduce (the MySQL query optimizer behaves in more complex ways when several indexes are available), we drop the secondary index here:

ALTER TABLE employees.titles DROP INDEX emp_no;

This way we can focus on the behavior of the PRIMARY index.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
