The data structure and algorithm principle behind the MySQL index, mysql Index
Summary
This article takes MySQL database as the research object and discusses some topics related to database indexes. In particular, MySQL supports many storage engines, and various storage engines also support different indexes. Therefore, MySQL database supports multiple index types, such as BTree indexes and hash indexes, full-text index. To avoid confusion, this article will only focus on the BTree index, because this is an index that is commonly used in MySQL. This article will not discuss hash indexes and full-text indexes for the moment.
Data Structure and algorithm Basics
The nature of Indexes
MySQL officially defines indexes:Index is a data structure that helps MySQL efficiently obtain data.Extract the sentence trunk to get the essence of the index: Index is the data structure.
We know that database query is one of the most important functions of the database. We all want to query data as quickly as possible, so the database system designers will optimize the query algorithm. Of course, the most basic query algorithm is linear search. This algorithm with the complexity of O (n) is obviously bad when the data volume is large, fortunately, the development of computer science provides many better search algorithms, such as binary search and binary tree search. If you perform a slight analysis, you will find that each search algorithm can only be applied to a specific data structure. For example, binary search requires that the retrieved data be ordered, while binary search can only be applied to binary search trees, however, the organizational structure of the data itself cannot fully satisfy various data structures (for example, theoretically, it is impossible to organize both columns in order at the same time). Therefore,In addition to data, the database system also maintains data structures that meet specific search algorithms. These data structures reference (point to) data in a certain way, in this way, you can implement advanced search algorithms on these data structures. This data structure is an index..
Let's look at an example:
Figure 1
Figure 1 shows a possible indexing method. On the left is a data table with a total of seven records in two columns, and on the left is the physical address of the data records (note that logically adjacent records are not physically adjacent on the disk ). To speed up Col2 search, you can maintain a binary search tree shown on the right. Each node contains an index key value and a pointer to the physical address of the corresponding data record, in this way, you can use binary search to obtain the corresponding data in the complexity of O (log2n.
Although this is a real index, the actual database system does not use a binary tree or its evolved red-black tree. The reason is described below.
B-Tree and B + Tree
Currently, most database systems and file systems use B-Tree or its variant B + Tree as the index structure, in the next section of this article, we will discuss why B-Tree and B + Tree are widely used in Indexing Based on the memory principle and computer access principle, this section describes them from the perspective of data structure.
B-Tree
To describe B-Tree, first define a data record as a binary group [key, data]. The key is the key value of the record. Keys are different for different data records; data records data except the key. B-Tree is a data structure that meets the following conditions:
1. d is a positive integer greater than 1, called the degree of B-Tree.
2. h is a positive integer called the height of B-Tree.
3. Each non-leaf node consists of N-1 keys and n pointers, where d <= n <= 2d.
4. Each leaf node contains at least one key and two pointers, and at most two D-1 keys and two 2d pointers. The pointers of leaf nodes are null.
5. All leaf nodes have the same depth, which is equal to the height of the tree H.
6. Keys and pointers are separated from each other. The two ends of the node are pointers.
7. The keys in a node are arranged from left to right in non-descending order.
8. All nodes form a tree structure.
9. Each pointer is either null or pointing to another node.
10. if a pointer is left at the far left of the node and is not null, all keys pointing to the node are smaller than v (key1), where v (key1) is the value of the first key of the node.
11. if a pointer is at the far right of a node and is not null, all keys pointing to the node are greater than v (keym), where v (keym) is the value of the last key of the node.
12. if the adjacent keys of a pointer between the left and right nodes are keyi and keyi + 1 and are not null, all keys pointing to the node are smaller than v (keyi + 1) and greater than v (keyi ).
Figure 2 is a B-Tree with d = 2.
Figure 2
Due to the characteristics of B-Tree, the algorithm for retrieving data by key in B-Tree is very intuitive: first, perform a binary search from the root node. If yes, the data of the corresponding node is returned, otherwise, recursive search is performed on the node pointed to by the pointer in the corresponding interval until the node is found or the null pointer is found. The former is found successfully, and the latter fails to be searched. The pseudo code for searching algorithms on B-Tree is as follows:
In addition, insertion and deletion of new data records will damage the nature of B-Tree. Therefore, when inserting and deleting data records, A split, merge, or transfer operation needs to be performed on the Tree to maintain the B-Tree nature. This article does not intend to fully discuss the B-Tree content, because many documents have already elaborated on the mathematical nature of B-Tree and the insertion and deletion algorithm, interested friends can find relevant materials in the reference column at the end of this Article for reading.
B + Tree
B-Tree has many variants, the most common of which is B + Tree. For example, MySQL generally uses B + Tree to implement its index structure.
Compared with B-Tree, B + Tree has the following differences:
1. The maximum pointer value for each node is 2d instead of 2d + 1.
2. Internal nodes do not store data, but only store keys. Leaf nodes do not store pointers.
Figure 3 shows a simple B + Tree diagram.
Figure 3
Because not all nodes have the same region, the size of the inner node and the inner node in the B + Tree is generally different. This is different from B-Tree. Although the number of keys and pointers stored by different nodes in B-Tree may be different, the domain and upper limit of each node are the same, therefore, in implementation, B-Tree usually applies for the same size of space for each node.
Generally, B + Tree is more suitable for implementing the external storage index structure than B-Tree. The specific reason is related to the principle of external storage and the principle of computer access, which will be discussed below.
B + Tree with sequential access pointers
Generally, the B + Tree structure used in the database system or file system is optimized on the basis of the classic B + Tree, and sequential access pointers are added.
Figure 4
4. Add a pointer to the adjacent leaf node in each leaf node of B + Tree to Form B + Tree with sequential access pointers. The purpose of this optimization is to improve the performance of the access interval. In example 4, if you want to query all data records whose key is from 18 to 49, after 18 is found, you only need to traverse the nodes and pointers in order to access all data nodes at a time, which greatly improves the interval query efficiency.
This section briefly introduces B-Tree and B + Tree, the next section describes why B + Tree is the preferred data structure for database system index implementation based on the principle of memory access.
Why use B-Tree (B + Tree)
As mentioned above, data structures such as red and black trees can also be used for indexing. However, B-/+ Tree is widely used as the index structure in file systems and database systems, this section discusses B-/+ Tree as the theoretical basis of Indexing Based on the knowledge of computer composition principles.
In general, the index itself is also very large and cannot be all stored in the memory. Therefore, the index is often stored on the disk as an index file. In this way, disk I/O consumption is generated during index search. Compared with memory access, I/O access consumes several orders of magnitude, therefore, the most important indicator for evaluating a data structure as an index is the progressive complexity of the number of disk I/O operations during the search process. In other words, the structure of the index should minimize the number of disk I/O accesses during the search process. Next we will first introduce the memory and disk access principles, and then combine these principles to analyze the efficiency of B-/+ Tree as the index.
Principle of primary storage access
Currently, the main memory used by computers is generally random read/write memory (RAM). The structure and access principle of modern RAM are complicated. Here, the specific differences are given, abstract A very simple access model to illustrate how RAM works.
Figure 5
From the abstract point of view, the primary storage is a matrix composed of a series of storage units, each of which stores a fixed size of data. Each storage unit has a unique address, and the addressing rules of modern primary storage are complicated. Here, it is simplified into a two-dimensional address: a row address and a column address can be used to uniquely locate a storage unit. Figure 5 shows a 4x4 primary storage model.
The access process of the primary storage is as follows:
When the system needs to read the primary storage, the address signal is put into the address bus and uploaded to the primary storage. After reading the address signal, the primary storage parses the signal and locates the specified storage unit, the stored unit data is then placed on the Data Bus for reading by other components.
The process of writing the primary storage is similar. The system will write the Unit Address and data respectively on the address bus and data bus. The primary storage will read the content of the two bus and write the data accordingly.
It can be seen that the time of primary storage access is only linearly related to the number of accesses, because there is no mechanical operation, the "distance" of the data accessed twice will not affect the time, for example, it takes the same time To get A0, then A1, and A0, and then D3.
Disk access principle
As mentioned above, indexes are generally stored on disks as files, and index retrieval requires disk I/O operations. Unlike primary storage, disk I/O consumes a lot of time due to mechanical movement.
Figure 6 shows the overall structure of the disk.
Figure 6
A disk is composed of circular disks of the same size and coaxial. the disk can be rotated (each disk must be rotated synchronously ). There is a head bracket on one side of the disk, and the head bracket is fixed with a set of heads, each of which is responsible for accessing the content of a disk. The head cannot be rotated, but it can be moved along the radius of the disk (in fact, it is a diagonal movement). Each head must be coaxial at the same time, that is, from top to bottom, all heads are overlapped at any time (however, multiple heads are already available independently, but not limited to this ).
Figure 7 shows the disk structure.
Figure 7
The disc is divided into a series of concentric rings. the center of the circle is the center of the disc. Each concentric ring is called a track, and all tracks with the same radius form a cylindrical surface. Tracks are divided into small segments along the radius line. Each segment is called a sector, and each sector is the smallest storage unit of the disk. For simplicity, we assume that the disk has only one disk and one head.
When data needs to be read from the disk, the system will pass the data Logical Address to the disk. the disk control circuit translates the logical address into a physical address according to the addressing logic, determine the track and sector of the data to be read. To read data from this sector, you need to place the head on the top of this sector. To achieve this, the head needs to be moved to the corresponding track. This process is called seeking, and the time consumed is called seeking time, after the disk rotation, the target sector is rotated to the head. the time consumed in this process is called the rotation time.
Local principle and disk pre-read
Because of the characteristics of the storage medium, the access to the disk itself is much slower than the primary storage, coupled with the cost of mechanical movement, the access speed of the disk is often one of the primary storage, so in order to improve efficiency, minimize disk I/O. To achieve this goal, the disk is usually not read strictly on demand, but preread every time. Even if only one byte is required, the disk starts from this location, read data of a certain length in sequence and put it into the memory. This theory is based on the well-known local principle in Computer Science:
When a data is used, the data nearby it is usually used immediately.
The data required during the program running is usually concentrated.
Because sequential disk reading is highly efficient (with little rotation time required without seeking time), preread can improve I/O efficiency for local programs.
The preread length is generally an integer multiple of the page. Pages are logical blocks for computer memory management. Hardware and operating systems often divide primary and disk storage areas into contiguous blocks of the same size, each storage block is called a page (in many operating systems, the page size is usually 4 k). The primary storage and disk exchange data in pages. When the data to be read by the program is not in the primary storage, a page missing exception is triggered, and the system sends a disk reading signal to the disk, the disk finds the starting position of the data and reads one or more pages consecutively into the memory. If an exception is returned, the program continues to run.
B-/+ Tree index Performance Analysis
At last, we can analyze the performance of B-/+ Tree indexes.
As mentioned above, the index structure is evaluated by the number of disk I/O operations. From B-Tree analysis, according to the definition of B-Tree, we can see that a maximum of h nodes can be accessed at a time. The database system designer cleverly utilizes the disk pre-read principle to set the size of a node to equal to a page, so that each node can be fully loaded only once I/O. To achieve this goal, you also need to use the following techniques to implement B-Tree:
Each time you create a node, you can directly apply for a page space to ensure that a node is physically stored on a page. In addition, the computer storage allocation is page-aligned, A node only needs one I/O operation.
A B-Tree retrieval requires a maximum of H-1 I/O (root node resident memory), and the progressive complexity is O (h) = O (logdN ).Generally, in practice, the output degree d is a very large number, usually greater than 100, so h is very small (usually no more than 3 ).
In conclusion, the efficiency of using B-Tree as the index structure is very high.
The structure of the red and black trees is much deeper than h. Because logically close nodes (Parent and Child) may be far physically unable to use locality, the I/O complexity of the red and black trees is O (h ), the efficiency is much lower than that of B-Tree.
As mentioned above, B + Tree is more suitable for external storage indexes because it is related to the degree d of inner nodes. From the above analysis, we can see that the larger the value of d, the better the index performance, and the upper limit of the degree of exit depends on the size of the key and data in the node:
Dmax =Floor(Pagesize/(keysize + datasize + pointsize ))(Pagesize-dmax> = pointsize)
Or
Dmax =Floor(Pagesize/(keysize + datasize + pointsize)-1(Pagesize-dmax <pointsize)
Floor indicates downgrading. Because the nodes in the B + Tree remove the data field, they can have a higher degree of output and better performance.
This chapter discusses the index-related data structures and algorithms theoretically. The next chapter will discuss how B + Tree is implemented as an index in MySQL, at the same time, we will introduce two different index implementation modes, non-clustered index and clustered index, in combination with MyISAM and InnDB storage engine.
MySQL index implementation
In MySQL, indexes belong to the concept of storage engine level. Different storage engines implement indexes differently. This article mainly discusses the index implementation methods of MyISAM and InnoDB Storage engines.
MyISAM index implementation
The MyISAM engine uses B + Tree as the index structure. The data domain of the leaf node stores the data record address. Is the principle of MyISAM index:
Figure 8
There are three columns in the table. If we use Col1 as the Primary key, Figure 8 shows the Primary index (Primary key) of the MyISAM table. It can be seen that the index file of MyISAM only stores the address of the data record.In MyISAM, the primary index and Secondary index (Secondary key) have no difference in structure, but the primary index requires that the key is unique, and the Secondary index key can be repeated.If we create a secondary index on Col2, the index structure is shown in:
Figure 9
It is also a B + Tree that stores data records in the data field. Therefore,The index search algorithm in MyISAM first searches for indexes based on the B + Tree search algorithm. If the specified Key exists, the value of its data domain is taken out, and the address is the value of the data domain, read the corresponding data records.
The index method of MyISAM is also called "non-clustered". The reason for this is to distinguish it from the clustered index of InnoDB.
InnoDB Index implementation
Although InnoDB uses B + Tree as the index structure, the implementation method is different from that of MyISAM.
The first major difference is that the InnoDB data file itself is an index file.As mentioned above, the MyISAM index file is separated from the data file, and the index file only stores the data record address. In InnoDB, the table data file itself is an index structure organized by B + Tree. The leaf node data field of this Tree stores the complete data records. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
Figure 10
Figure 10 shows the InnoDB primary index (also a data file). The leaf node contains the complete data record. This index is called a clustered index. Because the data files of InnoDB must be clustered by the primary key, InnoDB requires that the table have a primary key (MyISAM may not). If it is not explicitly specified, mySQL automatically selects a column that uniquely identifies a data record as the primary key. If this column does not exist, MySQL automatically generates an implicit field for the InnoDB table as the primary key, this field is 6 bytes in length and its type is long integer.
The second difference from the MyISAM index is that InnoDB's secondary index data domain stores the value of the primary key of the corresponding record rather than the address.In other words, all secondary indexes of InnoDB reference the primary key as the data domain. For example, Figure 11 shows a secondary index defined on Col3:
Figure 11
Here we use the ASCII code of English characters as a comparison criterion.Clustered index makes the search by primary key very efficient, but secondary index search requires two indexes: first, retrieve the secondary index to obtain the primary key, then, use the primary key to search for the record in the primary index.
Understanding the index implementation methods of different storage engines is very helpful for correct use and optimization of indexes. For example, after knowing the index Implementation of InnoDB, it is easy to understand why it is not recommended to use too long segments as the primary key, because all secondary indexes reference the primary index, too long primary index will make the secondary index too large. For example, it is not a good idea to use non-monotonous fields as the primary key in InnoDB, because the InnoDB data file itself is a B + Tree, non-monotonous primary keys will cause frequent split and adjustment of data files to maintain the features of B + Tree during the insertion of new records, which is very inefficient, using an auto-increment field as the primary key is a good choice.
The next chapter will discuss the index-Related Optimization Strategies in detail.
Index Usage Policy and Optimization
MySQL optimization mainly includes schema optimization and Query optimization ). The high-performance index Policy discussed in this chapter mainly belongs to the scope of structure optimization. The content of this chapter is based entirely on the theoretical basis above. In fact, once you understand the mechanisms behind the index, selecting a high-performance policy becomes pure reasoning, and you can understand the logic behind these policies.
Sample Database
To discuss the index policy, a database with a small amount of data is required as an example. This article uses one of the sample databases provided in the MySQL official document: employees. This database has moderate link complexity and a large amount of data. Is the E-R diagram of this database (reference from the MySQL official manual ):
Figure 12
In the official MySQL documentation, the page for accessing this database is http://dev.mysql.com/doc/employee/en/employee.html. This section describes the database in detail and provides and import methods. If you are interested in importing the database to your own MySQL, refer to the content in this article.
Principle and optimization of leftmost prefixes
The primary condition for using indexes efficiently is to know what kind of queries will use indexes. This problem is related to the "leftmost prefix principle" in B + Tree. The following example illustrates the leftmost prefix principle.
Here we will talk about the concept of Federated indexes. In the above article, we assume that the index only references a single column. In fact, the index in MySQL can reference multiple columns in a certain order. This index is called a joint index. Generally, A joint index is an ordered tuples <a1, a2 ,..., An>, each element is a column of a data table. In fact, relational algebra is required to strictly define indexes. But I don't want to discuss too many topics about relational algebra, because it will be boring, strictly defined here. In addition, a single column index can be seen as a special case where the number of union index elements is 1.
Take the employees. titles table as an example. The following describes the indexes on the table:
From the results, the primary index of the titles table can be <emp_no, title, from_date>, and a secondary index <emp_no>. To avoid the complexity of Multiple indexes (MySQL SQL optimizer performs more complex operations when multiple indexes exist), we will drop the secondary index here:
ALTER TABLE employees.titles DROP INDEX emp_no;
In this way, you can focus on the index PRIMARY behavior.