B+tree principle and the index analysis of MySQL

Last Update:2017-06-14 Source: Internet

Author: User

Tags data structures mysql index

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I. The nature of the index

The official MySQL definition of an index is: index is the data structure that helps MySQL to get data efficiently. By extracting the skeleton of a sentence, you can get the essence of the index: The index is the data structure.

We know that database query is one of the most important functions of database. We all want to query the data as fast as possible, so the designers of the database system are optimized from the point of view of the query algorithm. The most basic query algorithm, of course, is sequential lookup (linear search), the complexity of the O (n) algorithm is obviously bad when the volume of data is large, fortunately, the development of computer science provides a lot of better search algorithms, such as binary search (binary search), two-tree search (binary search), and so on.

If you look at it a little bit, you will find that each lookup algorithm can only be applied to a particular data structure, such as a binary lookup requires an orderly retrieval of data, while a binary tree lookup can only be applied to a binary lookup tree, but the data itself cannot be fully organized to meet a variety of data structures (for example, It is theoretically impossible to organize both columns sequentially, so in addition to the data, the database system maintains a data structure that satisfies a particular lookup algorithm that references (points to) data in some way, so that an advanced find algorithm can be implemented on those data structures. This data structure is the index.

Second, B-tree (balanced multi-path search tree)

B-tree is a balanced lookup tree designed for storage devices other than disks. So before you talk about B-tree, you should know about the disk.

When the system reads data from the disk into memory, it is based on the disk block (block), and the data in the same disk block is read out at once, rather than needing anything.

The InnoDB storage engine has the concept of pages (page), which is the smallest unit of its disk Management . The default size of each page in the InnoDB storage engine is 16KB, the page size can be set to 4K, 8K, 16K by parameter innodb_page_size, and the page size can be viewed in MySQL by the following command:

MySQL>like'innodb_page_size';

While the system disk block storage space is often not so large, so innodb each request disk space will be a number of address contiguous disk block to reach the size of the page 16KB. InnoDB when the disk data is read into the disk will be the basic unit of the page, when querying data if each piece of data in a page can help to locate the location of data records, this will reduce disk I/O times, improve query efficiency.

The data of the B-tree structure allows the system to efficiently find the disk block where the data resides. In order to describe B-tree, first define a record as a two-tuple [key, data], key for the record, the primary key value in the corresponding table, data is a row of records in addition to the primary key. For different records, the key value differs from the other.

The b-tree of an M-order has the following characteristics:
1. Each node has a maximum of M children.
2. In addition to the root and leaf nodes, each of the other nodes has at least ceil (M/2) children.
3. If the root node is not a leaf node, there are at least 2 children.
4. All leaf nodes are on the same layer and do not contain other keyword information.
5. Each non-terminal node contains n keyword information (p0,p1,... Pn, K1,... kn)
6. Number of keywords N satisfied: ceil (M/2)-1 <= n <= m-1
7. Ki (i=1,... n) is the keyword, and the keyword is sorted in ascending order.
8. Pi (I=1,... N) is a pointer to a sub-root node. All node keywords for the subtree pointed to by P (i-1) are smaller than Ki, but are greater than K (i-1).

Each node in the B-tree can contain a large number of keyword information and branches, as shown in a 3-step b-tree:

Each node occupies disk space on a disk block, a node has two ascending sorted keywords and three pointers to the subtree nodes, and the pointer stores the address of the disk block where the child nodes reside. The two keywords are divided into three range fields that correspond to the range fields of the data of the subtree pointed to by three pointers. In the root node example, the data range for the subtree pointed to by the keyword 17 and the 35,P1 pointer is less than the 17,P2 pointer to the subtree that the 17~35,P3 pointer points to is greater than 35.

Simulate the process of finding keyword 29:

Disk Block 1 is found based on the root node and is read into memory. "Disk I/O operation 1th Time"
Compare the keyword 29 in the interval (17,35), locate the disk block 1 of the pointer P2.
Locate disk block 3 According to the P2 pointer and read into memory. "Disk I/O operation 2nd time"
Compare the keyword 29 in the interval (26,30), locate the disk block 3 of the pointer P2.
Locate disk Block 8 According to the P2 pointer and read into memory. "Disk I/O operation 3rd Time"
Find the Keyword 29 in the list of keywords in disk block 8.

Analyzing the above procedure, it is found that 3 disk I/O operations and 3 memory lookup operations are required. Because the in-memory keyword is an ordered table structure, the dichotomy method can be used to find efficiencies. The 3 disk I/O operations are the determining factor that affects the overall B-tree lookup efficiency . B-tree reduces the number of nodes relative to Avltree, making the data for each disk I/O to memory play a role, thus improving query efficiency.

Third, B+tree

B+tree is an optimization based on b-tree, making it more suitable for implementing the external storage index structure, InnoDB storage engine is to implement its index structure with B+tree.

From the B-tree structure diagram in the previous section, you can see that each node contains not only the key value of the data, but also the database value. and each page storage space is limited, if the data is large, it will cause each node (that is, a page) can store the number of keys is very small, when the amount of data stored is very large, it will also cause b-tree depth, increase the number of disk I/O at the time of query, thus affecting query efficiency. in B+tree, all data record nodes are stored in the same layer of leaf nodes according to the order of key values, instead of only the key value information is stored on the leaf node , which can greatly increase the number of key values stored by each node and reduce the height of b+tree.

There are several different b+tree relative to B-tree:

Non-leaf nodes store only key value information.
All leaf nodes have a chain pointer between them.
The data records are stored in the leaf node.

The B-tree in the previous section is optimized, since B+tree's non-leaf nodes store only key-value information, assuming that each disk block can store 4 key values and pointer information, it becomes b+tree after its structure as shown:

Usually there are two head pointers on the B+tree, one pointing to the root node and the other to the smallest leaf node, and all leaf nodes (i.e. data nodes) are a chain ring structure. Therefore, you can perform two kinds of lookup operations on B+tree: One is the range lookup and the paging lookup for the primary key, and the other is a random lookup starting from the root node.

Clustered index and non-clustered index

B+tree is commonly used for indexing in MySQL, but it differs in implementation based on clustered indexes and nonclustered indexes.

1. Clustered index

The so-called clustered index refers to the primary index file and data file as the same file, clustered index is mainly used in the InnoDB storage engine. The data on the leaf node of the b+tree in the way of the index implementation is itself, the key is the key, and if it is a generic index, data points to the corresponding primary index, as shown in:

A pointer to an adjacent leaf node is added to each leaf node of the B+tree, and a b+tree with sequential access pointers is formed. The purpose of this optimization is to improve the performance of the interval access , for example, if you want to query key from 18 to 49 of all data records, when the 18 is found, simply follow the node and pointer in order to traverse to all the data nodes, and greatly mentions the efficiency of the interval query.

2, non-clustered index

Non- The clustered index refers to the data on the leaf node of the b+tree, not the data itself, but the address where it is stored. There is no difference between the primary and secondary indexes, except that the key in the main index must be unique. Mainly used in the MyISAM storage engine , such as:

The non-clustered index has more IO operations than the clustered index to read the data once, so the lookup performance is poor.

V. MySQL INDEX implementation

In MySQL, indexes are at the storage engine level, and different storage engines implement the indexes differently, and the following is a discussion of how the MyISAM and InnoDB two storage engines are implemented.

1, MyISAM Index implementation

The MyISAM engine uses B+tree as the index structure, and the data domain of the leaf node holds the address of the record. is a schematic diagram of the MyISAM index:

There are three columns in the table, assuming that we have Col1 as the primary key, it is a MyISAM table's main index (Primary key) schematic. You can see that the index file of MyISAM only stores the address of the data record . In MyISAM, the Primary index and secondary index (secondary key) are structurally indistinguishable, except that the primary index requires that key be unique, and the secondary index key can be duplicated . If we establish a secondary index on Col2, the structure of this index is as follows:

Also a b+tree, thedata field holds the address of the record. Therefore, the algorithm of index retrieval in MyISAM is to search the index according to the B+tree search algorithm first, if the specified key exists, the value of its data field is fetched, then the data record is read with the value of the data field.

The index of MyISAM is also called "non-aggregation", and the reason for this is to differentiate it from InnoDB's clustered index.

2, INNODB Index implementation

Although InnoDB also uses B+tree as the index structure, the implementation is very different from MyISAM.

The first major difference is that the InnoDB data file itself is the index file . As you know above, theMyISAM index file and the data file are detached, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+tree, and the tree's leaf node data field holds the complete data record. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.

Is the InnoDB primary index (also the data file), you can see that the leaf node contains the complete data record . This index is called a clustered index . Because the InnoDB data file itself is clustered by the primary key, the INNODB requires that the table must have a primary key (MyISAM can not), and if it is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key, and if no such column exists, Then MySQL automatically generates an implicit field for the InnoDB table as the primary key, which is 6 bytes long and has a length of type.

The second difference from the MyISAM index is that the secondary index of the InnoDB data field stores the value of the corresponding record primary key instead of the address. In other words,all secondary indexes of InnoDB refer to the primary key as the data domain . For example, to define a secondary index on the Col3:

Here is the ASCII code of the English character as the comparison criterion. Clustered index This implementation makes search by primary key efficient, but secondary index search needs to retrieve two times index: first retrieves the secondary index to obtain the primary key, and then retrieves the record with the primary key to the primary index.

Understanding how index implementations of different storage engines can be useful for proper use and optimization of indexes, such as knowing the InnoDB index implementation, makes it easy to understand why it is not recommended to use too-long fields as primary keys, because all secondary indexes refer to the primary index, and the long primary index makes the secondary index too large . For example, using non-monotonic fields as primary keys is not a good idea in InnoDB because the InnoDB data file itself is a b+tree, and a non-monotonic primary key causes the data file to be frequently split in order to maintain b+tree characteristics when inserting a new record, which is inefficient and Using the self-increment field as the primary key is a good choice .

InnoDB The difference between index and MyISAM index :

The first is the difference between the primary index, the InnoDB data file itself is the index file. The index and data of MyISAM are separate.

The second is the difference between secondary indexes: the secondary index of the InnoDB data field stores the value of the corresponding record primary key instead of the address. The secondary and primary indexes of MyISAM are not much different.

B+tree principle and the index analysis of MySQL

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More