Analyze Mysql InnoDB indexes and mysqlinnodb Indexes

Source: Internet
Author: User

Analyze Mysql InnoDB indexes and mysqlinnodb Indexes

Abstract:

This article introduces the InnoDB Index-related knowledge of Mysql, from various trees to the indexing principle to the details of storage.

InnoDB is the default storage engine of Mysql (MyISAM before Mysql5.5.5 ). For the purpose of efficient learning, this article mainly introduces InnoDB and compares a small number of MyISAM instances.

This article was summarized during my learning process. The content mainly comes from books and blogs (which will be provided by references). I have added some of my own understandings during the process, the description is not accurate.

1. Various TREE STRUCTURES

I didn't plan to start with the Binary Search Tree, because there are too many related articles on the Internet, but considering the clear illustration, it is helpful to understand the problem and to ensure the integrity of the article, finally, this part is added.

Let's take a look at several tree structures:

1. Search for a binary tree: each node has two subnodes. Increasing the data size will inevitably lead to a rapid increase in height. Obviously, this is not suitable as the infrastructure for storing large amounts of data.

2 B: a B-level m-tree is a balanced m-path search tree. The most important property is that the number of keywords contained by each non-root node j satisfies the following requirements: Running m/2 running-1 <= j <= m-1; the number of subnodes of a node is 1 more than the number of keywords, so that the keyword becomes the delimiter of the subnode. The key words are usually drawn to the middle of the subnode in the graph, which is very vivid and easy to distinguish from the B + tree. Because the data exists in both leaf nodes and non-leaf nodes, it is impossible to simply traverse the keywords in Tree B in order, so the method of central traversal must be used.

3 B + tree: an m-Level B tree is a balanced m-path search tree. The most important property is that the number of keywords contained by each non-root node j satisfies the following requirements: Running m/2 running-1 <= j <= m; the number of Subtrees can be as many as the number of keywords. Non-leaf nodes store the smallest keywords in the subtree. At the same time, data nodes only exist in leaf nodes, and horizontal pointers are added between leaf nodes, which makes it easy to traverse all data sequentially.

4 B * tree: an m-Level B tree is a balanced m-path search tree. The two most important properties are: 1. The number of keywords contained in each non-root node j satisfies the following requirements: Limit m2/3 limit-1 <= j <= m; 2. Horizontal pointers are added between non-leaf nodes.






The B/B +/B * trees have similar operations, such as searching, inserting, and deleting nodes. Here, we only focus on the insertion of nodes, and only analyze the insertion operations when the current node is full, because this action is slightly complicated and can fully reflect the differences between several trees. In contrast, the retrieval node is easier to implement, and the deletion of a node only needs to complete the process opposite to the insertion. (deleting a node in an actual application is not a complete Inverse Operation of insertion, it is often used only to delete data and keep the space for subsequent use ).

First, let's look at the split of Tree B. The red value is the newly inserted node. Every time a node is full, it needs to be split (split is a recursive process, refer to the insert 7 below, resulting in two layers of split ), because the non-leaf node of Tree B also saves the key value, the value after the node split is full will be distributed in three places: 1 original node, 2 parent node of the original node, 3. Create a new sibling node of the original node (refer to the insertion process in step 5 and Step 7 ). Splitting may increase the height of the tree (refer to the Insert Process of 3, 7), or may not affect the height of the tree (refer to the Insert Process of 5, 6 ).


Split of B + tree: When a node is full, allocate a new node, copy 1/2 of the data from the original node to the new node, and add a pointer to the new node in the parent node; the split of the B + tree only affects the original node and the parent node, but does not affect the sibling node. Therefore, it does not need to point to the sibling node.


B * tree split: When a node is full, if its next sibling node is not full, move part of the data to the sibling node, and then insert a keyword into the original node, finally, modify the keywords of the sibling node in the parent node (because the keyword range of the sibling node has changed ). If the brothers are full, add a new node between the original node and the sibling node, copy 1/3 of the data each to the new node, and add a pointer to the new node at the parent node. We can see that the split of the B * tree is very clever, because the B * tree should be 2/3 full after the split. If the B + tree method is used, simply dividing a full node into two parts will make each node only 1/2 full, which does not meet the requirements of the B * tree. Therefore, the B * tree adopts the policy of inserting a sibling node after the current node is full (this is why B * tree needs to add a sibling linked list to a non-leaf node ), until the sibling nodes are full and then pulled together to join the members, and each of them contributes 1/3 to set up a new node. The result is that the three nodes are just 2/3 full, it is a pleasure to meet the requirements of the B * tree.


The B + tree is suitable for serving as the basic structure of the database, because of the memory of the Computer-two-layer storage structure of the mechanical hard disk. The memory can achieve fast Random Access (random access gives any address and requires that the data stored in this address be returned), but the capacity is small. The Random Access to the hard disk must go through mechanical operations (one head is moved and two disks are rotated). The access efficiency is several orders of magnitude lower than the memory, but the hard disk capacity is large. The typical database capacity is much larger than the available memory size, which determines that it is possible to retrieve a piece of data in the B + tree through several disk I/O operations. As shown in: Generally, the downward reading of a node may be a disk I/O operation, but non-leaf nodes usually load the memory in the initial stage to speed up access. To speed up horizontal traversal between nodes, the blue CPU computing/memory read in the figure may be optimized to a binary search tree (the page directory mechanism in InnoDB) in the real database ).


The B + tree in the real database should be very flat. You can verify how flat the B + tree is in InnoDB by inserting enough data into the table in sequence. We CREATE a test table with only simple fields through the CREATE statement, as shown in, and then add data to fill the table. Through the statistical data (for the source, see References 1), we can analyze several intuitive conclusions, which display the scale of the B + tree in the database at a macro level.

1. Each leaf node stores 468 rows of data, and each non-leaf node stores about 1200 key values. This is a balanced 1200-way search tree!

2 For a 22.1G capacity table, only the B + tree with a height of 3 can be stored. This capacity can meet the needs of many applications. If the height is increased to 4, the storage capacity of the B + tree will immediately increase to 25.9T!

3 For a 22.1G capacity table, the height of the B + tree is 3. If you want to load all non-leaf nodes to the memory, you only need less than 18.8M of memory. (How can we draw this conclusion? For a tree with a height of 2, 1203 leaf nodes only need 18.8M space, while 22.1G is 3 from the height of the good table and 1204 non-leaf nodes. At the same time, we assume that the size of the leaf node is greater than that of the non-leaf node, because the leaf node stores row data, rather than the leaf node only has keys and a small amount of data .), With only such a small amount of memory, you can ensure that only one disk IO operation is required to retrieve the required data, and the efficiency is very high.



2. Mysql storage engine and Index

It can be said that the database must have an index. If there is no index, the search process becomes sequential search. the time complexity of O (n) is almost intolerable. We can easily imagine how to use the B + tree to index A Table composed of only single keywords, as long as the keywords are stored in the node of the tree. When a database record contains multiple fields, a B + tree can only store the primary key. If the retrieved field is a non-primary key field, the primary key index becomes a sequential query. In this case, the second index should be created on the second column to be retrieved. This index is organized by an independent B + tree. There are two common methods to solve the problem of multiple B + trees accessing the same set of table data. One is the Cluster index ), A kind of index is called a non-clustered index ). Although both are called indexes, they are not an independent index type, but a data storage method. For clustered index storage, row data and primary key B + trees are stored together. The secondary key B + tree only stores the secondary key and primary key, primary Key and non-primary key B + trees are almost two types of trees. For non-clustered index storage, the primary key B + tree stores the pointer to the real data row on the leaf node, rather than the primary key.

InnoDB uses clustering indexes to organize primary keys to a B + tree, and row data is stored on the leaf node, if you use a condition such as "where id = 14" to find the primary key, you can find the corresponding leaf node based on the B + tree search algorithm, and then obtain row data. If the Name column is searched by condition, two steps are required: the first step is to retrieve the Name in the secondary index B + tree and obtain the corresponding primary key at its leaf node. The second step is to use the primary key to perform the B + tree search operation on the primary index B + tree, and then obtain the entire row of data at the leaf node.

MyISM uses non-clustered indexes. The two B + trees of non-clustered indexes do not look very different. The structure of nodes is completely consistent, but the storage content is different, the primary key index B + Tree node stores the primary key, and the secondary key index B + tree stores the secondary key. Table data is stored in an independent place. The leaf nodes of the two B + trees use an address to point to the real table data. For table data, there is no difference between the two keys. Because the index tree is independent, secondary key retrieval does not require access to the primary key index tree.

To better illustrate the differences between the two indexes, we assume that a table stores four rows of data. Id is the primary index, and Name is the secondary index. The figure clearly shows the differences between clustered indexes and non-clustered indexes.




We focus on clustering indexes. It seems that the efficiency of clustering indexes is significantly lower than that of non-clustering indexes, because each secondary index search requires two B + tree searches, isn't that an alternative? What are the advantages of clustered index?

1. Because the row data and the leaf node are stored together, the primary key and row data are loaded into the memory together, And the row data can be returned immediately after the leaf node is found, if data is organized by the primary key Id, the data is obtained faster.

2. The secondary index uses the primary key as the "Pointer" instead of the address value as the pointer. This reduces the maintenance of secondary indexes when row movement or data page splitting occurs, using the primary key value as a pointer will occupy more space for the secondary index. The advantage is that InnoDB does not need to update the "Pointer" in the secondary index when moving data ". That is to say, the location of the row (which is located through a 16 K Page in implementation and will be involved later) the changes will occur as the data in the database changes (the previous B + Tree node split and Page split ), using clustered indexes ensures that no matter how the node of the primary key B + tree changes, the secondary index tree is not affected.

3 Page Structure

If the previous content is biased towards the principle of interpretation, then the specific implementation will be involved later.

To understand the implementation of InnoDB, we have to mention the Page structure. Page is the most basic component of InnoDB Storage and the minimum unit for InnoDB disk management, all Database-related content is stored in this Page structure. Page is divided into several types. Common Page types include the data Page (B-tree Node) Undo Page (Undo Log Page) System Page (System Page) Transaction data Page (Transaction System Page). The size of a single Page is 16 KB (the compilation macro UNIV_PAGE_SIZE is controlled). Each Page is uniquely identified using a 32-bit int value, this corresponds to the maximum InnoDB Storage Capacity of 64 Tb (16Kib * 2 ^ 32 = 64Tib ). Shows the basic structure of a Page:


Each Page has a common header and end, but the content in the center changes according to the Page type. The Page header contains some data that we care about. The details of the Page header are shown as follows:


We focus on the fields related to the Data Organization Structure: two pointers are saved in the Page header, which respectively refer to the forward and last pages, the header also contains the Page type information and the number used to uniquely identify the Page. Based on the two pointers, we can easily imagine that the Page link is a two-way linked list structure.

Let's look at the main content of the Page. We mainly focus on row data and index storage. They are all located in the User Records section of the Page, and User Records occupies most of the Page space, user Records is composed of one Record. Each Record represents a node on the index tree (non-leaf node and leaf node ). Inside a Page, the beginning and end of a single-chain table are represented by two records of fixed content. "Infimum" in string form indicates the beginning and "Supremum" indicates the end. These two Records are used to indicate that the Records at the beginning and end are stored in the System Records segment. The System Records and User Records are two parallel segments. InnoDB has four different Record types: 1 primary key index tree non-leaf node 2 primary key index tree leaf node 3 secondary key index tree non-leaf node 4 secondary key index tree leaf node. There are some differences in the Record formats of these four nodes, but they all store the Next pointer pointing to the Next Record. We will introduce these four nodes in detail in the future. Now we only need to regard Record as a single-chain table node that stores data and contains the Next pointer.


User Record exists in the form of a single-chain table in the Page. The initial data is arranged in the order of insertion. However, with the insertion of new data and the deletion of old data, the physical order of data will become chaotic, but they still keep the logic order.


By combining the organization form of the User Record with several pages, we can see a slightly complete form.


Now let's take a look at how to locate a Record:

1. Start to traverse the B + tree of an index through the root node, and reach a Page through non-leaf nodes at each layer. The Page stores all the leaf nodes.

2. traverse a single-chain table from the "Infimum" node in the Page (this traversal is often optimized). If the key is found, the result is returned successfully. If the record reaches "supremum", it indicates that there is no proper key in the current Page. In this case, you need to use the Next Page pointer of the Page to jump to the Next Page and continue searching one by one from "Infimum.

Let's take a closer look at what data is stored in different types of Record. According to the different B + Tree nodes, User Record can be divided into four formats, which are distinguished by color.

1 Primary Index Tree non-leaf node (green)

1. The smallest value (Min Cluster Key on Child) in the primary Key stored by the subnode is required by the B + tree to locate the specific record location in a Page.

The Number (Child Page Number) of the Page with the smallest value of 2. It is used to locate the Record.

2 primary index tree leaf node (yellow)

1 primary Key (Cluster Key Fields), which is required by the B + tree and is part of the data row

2. All columns except the primary Key (Non-Key Fields) are a set of all columns except the primary Key in the Data row.

The two parts here add up to a complete data row.

3 secondary index tree non-leaf node non-(blue)

1. Min Secondary-Key on Child, which is required by the B + tree, the function is to locate the location of a specific record in a Page.

2. Why do non-leaf nodes store primary keys? Secondary indexes can be unique, but the B + tree requires that the key value be unique, therefore, the value of the secondary key and the value of the primary key are combined as the real key value in the B + tree to ensure uniqueness. However, the secondary index B + tree has four more non-leaf nodes than the leaf nodes. (That is, the blue node is 4 bytes more than the red node)

The Number (Child Page Number) of the Page with the smallest value 3. It is used to locate the Record.

4 secondary index tree leaf node (red)

1 Secondary index Key value (Secondary Key Fields), which is required by the B + tree.

The Cluster Key Fields value is used to perform a B + tree search in the primary index tree to find the entire record.


The following is the most important part of this article. Combined with the structure of the B + tree and the four types of Record Content described above, we can finally draw a panorama. Because the secondary index B + tree has a similar structure with the primary key index, only the structure of the primary key index tree is shown here, it only contains "primary key non-leaf node" and "primary key leaf node", that is, the green and yellow nodes.


Restore the tree to the simpler tree below, which is part of the B + tree. Note that there is no one-to-one relationship between Page and B + Tree nodes. Page is only used as a Record storage container. It is used to facilitate batch disk space management, the Page numbered 47 in is split into two independent nodes in the tree structure.


So far this article is over. This article only summarizes the data structures and implementations related to InnoDB indexes, and does not involve the practical experience of Mysql. This is mainly based on the following reasons:

1 principle is the cornerstone. We can use InnoDB indexes efficiently only by fully understanding how InnoDB indexes work.

2. I personally like this expression.

3. About InnoDB optimization, I have a more comprehensive introduction in high-performance Mysql. Anyone interested in optimizing Mysql can obtain related knowledge by themselves, my own accumulation has not reached the point where I can share this content.

In addition, if you are more interested in InnoDB implementation, you can refer to Jeremy Cole's blog (refer to the source of the three articles in the Document). This old man has been in Mysql, Yahoo, Twitter, google is engaged in database-related work. His article is very good!

References:

[1] Jeremy Cole The physical structure of InnoDB index pages

[2] Jeremy Cole B + Tree index structures in InnoDB

[3] Jeremy Cole The physical structure of records in InnoDB

[4] Jiang chengyao MySQL technology insider-InnoDB Storage Engine Version 2

[5] Schwartz, B/Zaitsev, P/Tkach high-performance Mysql Third Edition

[6] B-tree wiki


Is full-text index supported for the innodb type of the latest mysql version?

Mysql5.6 supports full-text indexing, but remember, full-text indexing does not support Chinese! Do not use full-text indexing if it is a mix of Chinese and English or purely Chinese.

Note: mysql has an open-source project that supports full-text Chinese indexing, but it is not an official mysql project. If you are interested, you can search for it.

What optimizations have been made for Mysql Force Index?

You can refer to the following article. During the test, I did not expect the same SQL statement to increase the query speed by almost double after the force index. Select count (*) from http_log_3 force index (time) where time = 000000 ?????? // 1 row in set (11 min 19.35 sec) select count (*) from http_log_3 where time = 000000 ?????? // 1 row in set (20 min 5.86 sec). But in fact, the explain analysis shows that these two SQL statements use time indexes, exactly the same! In this special case, The Use of force index (time) does not affect the selection of the index key (the optimizer also uses the time index by default), but the type and rows. mysql 5.1.34 and innodb engine. Use the innodb_file_per_table option. Create a data table using Table Partitioning (10 partitions by day). A total of 50 million data is displayed in the table, that is, 5 million data is collected for each partition. Test output:

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.