MySQL InnoDB Indexing Principles

This article comes from NetEase cloud community.

InnoDB is the most common storage engine for MySQL, and understanding InnoDB indexes is of great benefit in daily work: indexes exist to speed up the retrieval of database row records. Below is a summary of my recent learning, together with analysis of some problems I encountered or that others raised. If there are errors, please point them out and I will correct them promptly.

1. InnoDB Table Structure

This section does not have much to do with indexes directly, but it is included as background knowledge to make the index material easier to understand.

1.1 InnoDB Logical Storage Structure

All data in a MySQL table is stored in a space called a tablespace, which can be divided into segments, extents, pages, and rows. The logical structure is as follows:

    • Segment

A tablespace is composed of different segments; common segments are the data segment, the index segment, and the rollback segment. In InnoDB, data is stored according to a B+ tree, so the data itself is the index: the data segment holds the B+ tree leaf nodes, and the index segment holds the B+ tree non-leaf nodes. The rollback segment stores undo logs, which are used to roll back data after a transaction fails and to obtain earlier row versions before a transaction commits. Before InnoDB 1.1, InnoDB supported only one rollback segment and at most 1023 concurrent modifying transactions. In InnoDB 1.2, the number of rollback segments was increased to 128, which means 128 * 1023 modifying transactions can run concurrently.

    • Extent

An extent is a contiguous space of pages; each extent has a fixed size of 1MB. To ensure the contiguity of pages within an extent, InnoDB requests four extents from disk at a time; with the default (uncompressed) page size, an extent holds 64 consecutive pages. However, a newly created empty table defaults to only 96KB: to use disk space efficiently, the table first stores data in 32 page-sized fragment pages, and only after the fragment pages are used up does the table grow in 1MB increments.

    • Page

The page is the smallest management unit of the InnoDB storage engine, and each page defaults to 16KB. Starting with InnoDB 1.2.x, innodb_page_size can be used to change the page size, but it can only be set before the InnoDB instance is initialized and cannot be changed afterwards (unless you export with mysqldump and build a new instance). Common page types include: data pages, undo pages, system pages, transaction system pages, insert buffer bitmap pages, insert buffer free list pages, uncompressed binary large object (BLOB) pages, and compressed BLOB pages.

    • Row

Rows correspond to the row records in the table. The maximum number of rows stored per page is also fixed, at 16KB/2 - 200, i.e. 7992 rows (16KB is the page size; I do not understand why this limit is described as a kernel definition).
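
As a quick sanity check of the sizes above, the page size and the initial size of a fresh table can be inspected directly. A minimal sketch, assuming MySQL 5.7 with innodb_file_per_table enabled (the view is named INNODB_SYS_TABLESPACES in 5.7 and INNODB_TABLESPACES in 8.0; the table name is made up):

-- Page size: 16384 (16KB) by default
SHOW VARIABLES LIKE 'innodb_page_size';

-- A fresh, empty table starts at 96KB and grows in fragment pages,
-- then in 1MB extents, as described above
CREATE TABLE t_size_check (id INT PRIMARY KEY) ENGINE=InnoDB;
SELECT NAME, FILE_SIZE
FROM information_schema.INNODB_SYS_TABLESPACES
WHERE NAME LIKE '%t_size_check%';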

1.2 InnoDB Row Record Formats

InnoDB provides four formats for storing row records: Redundant, Compact, Dynamic, and Compressed. The Redundant format is retained only for compatibility.

Redundant row format (the format used prior to version 5.0)

    • Field length offset list: stores the field offsets, in reverse of the column order; if the column length is less than 255 bytes it is represented with one byte, if greater than 255 bytes with two bytes;
    • Record header: fixed at 6 bytes;
    • Hidden columns: the transaction ID and roll pointer, occupying 6 and 7 bytes respectively; if the table has no primary key, an additional 6-byte row ID column is added.

Compact row format (the default row format as of version 5.6)

    • Variable-length field length list: records the length of each variable-length column, in reverse of the column order; if the column length is less than 255 bytes one byte is used, if greater than 255 bytes two bytes (this is also why MySQL limits a varchar to a maximum of 65535 bytes);
    • NULL flag bits: indicate whether the row has any NULL columns, 1 if a column is NULL and 0 otherwise; the flag occupies ceil(N/8) bytes (this is a point where "MySQL Technology Insider: InnoDB Storage Engine" and the official documentation disagree);
    • Record header: fixed at 5 bytes;
    • Column data: the column values of the row; NULL columns take no storage space;
    • Hidden columns: the transaction ID and roll pointer, occupying 6 and 7 bytes respectively; if the table has no primary key, an additional 6-byte row ID column is added.
Dynamic row format (the default as of version 5.7) and Compressed row format

The Dynamic and Compressed formats differ from Compact in the handling of row overflow: the overflowing column keeps only a 20-byte pointer in the row, pointing to the actual storage location of the data, rather than storing a 768-byte prefix. In addition, the Compressed format compresses BLOB, TEXT, varchar, and similar data with the zlib algorithm, so strings are stored very efficiently.
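
The row format is chosen per table and can be verified afterwards. A small sketch with made-up table names (ROW_FORMAT=COMPRESSED additionally requires innodb_file_per_table and, before 8.0, the Barracuda file format):

CREATE TABLE t_compact    (c1 VARCHAR(100)) ENGINE=InnoDB ROW_FORMAT=COMPACT;
CREATE TABLE t_dynamic    (c1 VARCHAR(100)) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
CREATE TABLE t_compressed (c1 VARCHAR(100)) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;

-- Check which row format each table actually uses
SELECT TABLE_NAME, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('t_compact', 't_dynamic', 't_compressed');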

1.3 InnoDB Data Page Structure

"MySQL Technology insider-innodb Storage Engine" is described in this book, but should not be too accurate, the book has the following description, not detailed here, if interested please read this god book.

2. B-Tree and B+ Tree

2.1 B-Tree definition:

A tree can be called an m-order B-tree if it meets the following conditions:

    • Each node has at most m subtrees;
    • The root node has at least two subtrees (when it has any subtrees);
    • Every branch node other than the root has at least ceil(m/2) subtrees;
    • All leaf nodes are on the same level;
    • A branch node with k subtrees holds k-1 keys, arranged in ascending order;
    • The number of keys n satisfies ceil(m/2) - 1 <= n <= m - 1.

B-Tree insertion (figure omitted)

B-Tree deletion (figure omitted)

2.2 B+ Tree definition:

A tree can be called an m-order B+ tree if it meets the following conditions:

    • There is only one root node, and its number of branches ranges over [2, m];
    • For each branch node, the number of branches ranges over [ceil(m/2), m];
    • The number of keys in a branch node equals its number of branches minus one, so the number of keys ranges over [ceil(m/2) - 1, m - 1], and the keys are arranged in ascending order;
    • All leaf nodes are on the same level.
Insert:

Insertion into a B+ tree must keep the records in the leaf node sorted after the insert. There are three cases of B+ tree insertion, and each case may lead to a different insertion algorithm:

Insertion example (the leaf nodes are not shown joined into a doubly linked list):

1. Insert key value 28: the current leaf page and index page are not full, so it is inserted directly.

2. Insert key value 70: the leaf page is full but the index page is not, so the leaf node is split around the middle value, 60.

3. Insert record 95: both the leaf page and the index page are full, so two splits are required.

4. A B+ tree always stays balanced. However, keeping the balance may require many page splits for newly inserted key values, and since B+ trees are mainly used on disk, a page split means disk data movement, so splits should be minimized where possible. For this reason the B+ tree provides a rotation operation. Rotation occurs when a leaf page is full but its left or right sibling is not. In that case the B+ tree is not eager to split the page; instead it moves records to a sibling of the current page, usually checking the left sibling first. In the scenario of the first figure, when key value 70 is inserted, the B+ tree does not rush to split the leaf node but performs a rotation instead (as in the figure, the rotation involves the key values 50 and 55).

Delete:

The B+ tree uses a fill factor to control the deletion of tree nodes; 50% is the minimum value the fill factor can be set to. The deletion operation must also keep the records in the leaf nodes sorted after the delete. As with insertion, three cases must be considered; unlike insertion, deletion is measured by the change in the fill factor.

Deletion example (the leaf nodes are not shown joined into a doubly linked list):

1. Delete the record with key value 70: delete it directly (based on the figure after insertion step 3).

2. Next delete the record with key value 25, which is also a value in the index page. When deleting 25 from the leaf page, the index page must also be updated: 25 is replaced with 28, the right neighbor of 25 in the leaf page.

3. Delete the record with key value 60: after the record is removed from the leaf page, the fill factor falls below 50%, so a merge operation is needed; likewise, after deleting the related record from the index page, the index pages also need a merge operation.

How B-trees differ from B+ trees:

Taking an m-order tree as an example:

    • The number of keys differs: in a B+ tree, a branch node with m subtrees holds m keys (and the leaf nodes hold the keys as well); in a B-tree, a node with m subtrees holds only m-1 keys;
    • The storage location differs: in a B+ tree, the keys in non-leaf nodes serve only as an index, and the actual records are stored in the leaf nodes; in a B-tree, non-leaf nodes also store actual records;
    • The branch node structure differs: a B+ tree branch node stores only keys and child pointers, which means internal nodes contain only index information;
    • Queries differ (in stability): a B-tree search ends as soon as the specific value is found, while a B+ tree must follow the index down to a leaf node where the data resides; that is, every B+ tree search takes a path from the root node to a leaf node.

3. Clustered Index and Secondary Index

3.1 Clustered index

Every InnoDB table has a special index, called the clustered index, which stores the row records; in general, the clustered index is generated from the primary key. To get high performance from queries, inserts, and other database operations, it is necessary to understand the InnoDB clustered index.

The clustered index is created according to the following rules:

    • When a primary key is defined, InnoDB uses the primary key as the clustered index;
    • If there is no primary key, InnoDB selects a non-nullable unique index as the clustered index;
    • Failing that, InnoDB implicitly creates a hidden auto-increment column as the clustered index.

Note: the unique index is selected in the order in which the unique indexes were defined, not by the order of the columns in the table. The selected unique-index field serves as the primary key, and the implicitly created auto-increment column can likewise be viewed as the primary key. A sketch of the three rules follows.
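
A small sketch of the three rules (all names are made up for illustration):

-- Rule 1: an explicit primary key becomes the clustered index
CREATE TABLE t_pk (id INT NOT NULL PRIMARY KEY, v INT) ENGINE=InnoDB;

-- Rule 2: no primary key, so the first-defined non-nullable unique index
-- (uk_a here, regardless of column order in the table) is used
CREATE TABLE t_uk (
    a INT NOT NULL,
    b INT NOT NULL,
    UNIQUE KEY uk_a (a),
    UNIQUE KEY uk_b (b)
) ENGINE=InnoDB;

-- Rule 3: neither exists, so InnoDB creates a hidden 6-byte row ID
-- column and clusters on it
CREATE TABLE t_hidden (v INT) ENGINE=InnoDB;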

The clustered index as a whole is a B+ tree: the non-leaf nodes store key values, and the leaf nodes store the row data and are called data pages. This means the table data itself is part of the clustered index. The data pages are linked through a doubly linked list. As mentioned above, a B+ tree is a balanced search tree, so data in the clustered index is stored in order; however, it is ordered logically, not necessarily physically. Because the data pages are connected by the doubly linked list, physical order is not required, and keeping the physical storage strictly in order would make the cost of maintaining the clustered index very high.

3.2 Secondary Index

Any index other than the clustered index can be called a secondary index. A secondary index differs from the clustered index in that its leaf nodes hold the primary key values of the rows. A table can have multiple secondary indexes, but only one clustered index. Finding a row record through a secondary index takes two steps: first, the secondary index determines the corresponding primary key; second, that primary key value is used to find the row record in the clustered index. In other words, two B+ tree searches are needed. By contrast, when a query only needs columns contained in the secondary index, traversing the secondary index alone is enough to answer it; this is called a covering index, and no lookup back to the table (i.e. to the clustered index) is required.

To create a secondary index, you can create a single-column index, i.e. an index on one field, or you can create an index on multiple fields, called a composite (federated) index. After a composite index is created, each B+ tree node stores not one key value but several. For example:

    • The B+ tree of a composite index is the same kind of B+ tree as that of a single-column secondary index, with the key values in the leaf nodes sorted, so all data can be read logically in order. For an index on (a, b), the data is stored in order such as (2,1), (2,4), (3,1), (3,2). The benefit is that the data is stored sorted: when you order by an indexed field, the sorting work is reduced and the query is faster;
    • A query such as select * from table where a=? and b=? can use the index (a, b) to speed it up, but there is a principle: the where conditions must be consistent with the secondary index and follow the index leftmost-prefix principle; select * from table where b=? cannot use the (a, b) index to speed up the query;
    • Secondary indexes also support covering indexes. One benefit of index covering is that a secondary index does not contain the full row record, so it is much smaller than the clustered index, and answering a query from the secondary index alone can save a lot of I/O. A sketch follows.
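
A minimal sketch of the two-step lookup versus a covering index, using a made-up table:

CREATE TABLE user_msg (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- clustered index
    uid     INT NOT NULL,
    content VARCHAR(200) NOT NULL,
    KEY idx_uid (uid)              -- secondary index; its leaves hold (uid, id)
) ENGINE=InnoDB;

-- Two B+ tree searches: idx_uid determines the matching ids,
-- then the clustered index is searched for the full rows
SELECT * FROM user_msg WHERE uid = 42;

-- Covering index: uid and id are both contained in idx_uid,
-- so no lookup back to the clustered index is needed
SELECT id FROM user_msg WHERE uid = 42;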

4. SQL Execution Order

Each of the following steps generates a virtual table that serves as input to the next step. These virtual tables are transparent to the user; only the final virtual table is returned, and no step is skipped along the way.

The following is the logical order of execution:

    • (1) from: the left and right tables are combined with a Cartesian product (A*B), forming virtual table VT1;
    • (2) on: VT1 is filtered by the on condition, and only qualifying records are inserted into virtual table VT2;
    • (3) join: if an outer join is specified, the unmatched rows are added back to VT2, producing VT3; if there are more than two tables, steps (1)-(3) are repeated;
    • (4) where: VT3 is filtered to form VT4; the where conditions are evaluated from left to right;
    • (5) group by: VT4 is grouped to get VT5;
    • (6) cube | rollup: the cube / rollup operation is applied to VT5 to get VT6;
    • (7) having: VT6 is filtered to get VT7;
    • (8) select: the select list is evaluated to get VT8 (I think VT7 and VT8 should be the same);
    • (9) distinct: VT8 is deduplicated to get VT9;
    • (10) order by: VT9 is sorted to get VT10;
    • (11) limit: records are truncated, and VT11 is returned to the user.

Note: the on condition filters the Cartesian product; where applies to the result after on filtering (if on is present); having applies to filtering after grouping.
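
The difference between on filtering (step 2) and where filtering (step 4) matters most for outer joins. A hedged illustration with made-up tables t1 and t2:

-- on: filters the Cartesian product; for a LEFT JOIN the unmatched
-- left-table rows are added back in step (3), so they survive
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.t1_id AND t2.status = 1;

-- where: filters after the join result (VT3) is built; rows whose t2
-- columns are NULL fail "t2.status = 1" and are removed, effectively
-- turning the outer join into an inner join
SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.t1_id WHERE t2.status = 1;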

5. SQL Optimization Recommendations

Indexing serves goals roughly like these: reducing the amount of data the server scans, avoiding sorting and temporary tables, and turning random I/O into sequential I/O.

Query types that can use a B+ tree index

    • Full value match: match all columns in the index, i.e. the condition fields agree with the composite index fields in both number and order;
    • Match the leftmost prefix: use only the first several fields of the composite index;
    • Match a column prefix: e.g. like 'xx%' can use the index;
    • Match a range of values: range queries, for example >, like, etc.;
    • Exactly match one column and a range on another: exact lookup plus range lookup;
    • Index-only query: a covering index, selecting only the primary key and indexed fields.

Conditions that come after a range condition cannot use the index; this is explained in a later section. Each of these match types is sketched below.
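
A sketch of these match types, assuming a made-up table t with a composite index idx_ab (a, b):

SELECT * FROM t WHERE a = 1 AND b = 2;    -- full value match
SELECT * FROM t WHERE a = 1;              -- leftmost prefix
SELECT * FROM t WHERE a LIKE 'xx%';       -- column prefix (treating a as a string column)
SELECT * FROM t WHERE a > 100;            -- range of values
SELECT * FROM t WHERE a = 1 AND b > 10;   -- exact match plus range on the next column
SELECT a, b FROM t WHERE a = 1;           -- index-only (covering) query
SELECT * FROM t WHERE b = 2;              -- cannot use idx_ab: violates the leftmost prefix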

Column selectivity (distinctness)

Selectivity (distinctness) is the number of distinct values in a column divided by the total number of values. In general, an indexed field should have high selectivity, and when building a composite index the high-selectivity column should be placed first, so that the first condition filters out most of the data and performance improves. There are two ways to calculate selectivity:

    • By definition: manually compute the number of distinct values in the column divided by the total number of values;
    • Through MySQL's cardinality statistic: the command show index from <table_name> displays the cardinality. Note that this value is not exact: MySQL samples 8 data pages of the B+ tree and estimates cardinality as (sum of the records in the sampled pages / 8) * the number of data pages. Updating cardinality in real time on every insert/update would put too high a load on MySQL, so for large tables the statistic is recalculated only when roughly 1/16 of the table's data has changed. Both approaches are sketched below.
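
Both approaches in SQL, using the person table that is created in section 6 below (its name column serves as the example):

-- By definition: distinct values / total values (closer to 1 = more selective)
SELECT COUNT(DISTINCT name) / COUNT(*) AS selectivity FROM person;

-- Via the sampled statistic: see the Cardinality column of the output
SHOW INDEX FROM person;

-- Force the statistics to be re-collected if they look stale
ANALYZE TABLE person;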

However, choosing high-selectivity columns for an index is not a cure-all; in some cases it is not appropriate, as described in the next section.

MySQL query process

If you want MySQL to run with high performance, the best thing to do is to understand how MySQL optimizes and executes queries. Once you understand this, much query optimization work is simply a matter of following certain principles so that the optimizer runs the way you expect. (quoted from High Performance MySQL)

When you send a request to a MySQL instance, MySQL processes the query as follows:

    • The client first sends the query to the server;
    • The server checks the query cache; if the cache is hit, the result stored in the cache is returned immediately, otherwise processing moves to the next stage;
    • The server performs SQL parsing and preprocessing, and the optimizer then generates the corresponding execution plan;
    • MySQL calls the storage engine API to execute the query according to the execution plan generated by the optimizer;
    • The result is returned to the client.
Notes & Suggestions
    • An integer primary key is recommended, to avoid index page splits;
    • Queries that use a covering index perform better because they avoid going back to the table;
    • Build indexes with columns in an appropriate order; in some scenarios putting the most selective column first is not best (composite indexes are used in most scenarios);
    • Use in reasonably; it can turn some range queries into multiple equality queries;
    • in is equivalent to multiple equality matches, but note that with respect to order by it behaves like a range query, so for example select * from t1 where c1 in (x,x) order by c2 cannot use an index for the sort (see the sketch after this list);
    • Decompose large data query tasks into batched queries;
    • Transform complex queries into simple queries;
    • Use inner join reasonably, for example when paging.
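
A sketch of the in versus order by point, assuming a made-up table t1 with a composite index idx_c1_c2 (c1, c2):

-- Equality on c1: rows read from idx_c1_c2 already come out ordered
-- by c2, so no extra sort is needed
SELECT * FROM t1 WHERE c1 = 10 ORDER BY c2;

-- in (...) behaves like a range over c1: rows come out ordered by
-- (c1, c2), not by c2 alone, so MySQL must sort the result itself
SELECT * FROM t1 WHERE c1 IN (10, 20) ORDER BY c2;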

6. Analysis of some issues

This part collects some questions from my own study, as well as problems I encountered at work or that colleagues raised. I did some research, summarized the results, and added some of my own understanding. If there are errors, please point them out and I will correct them.

Index page splits

On index page splits, my personal understanding is this: when MySQL inserts a record, it updates the corresponding index file. As described above, inserting an index entry may split an index page, which moves data on disk. When the primary key is a random string, an insertion is never simply an append at the end of the B+ tree; every insertion position is random, so data pages may be moved on every insert, and strings also take a lot of storage space. Maintaining the index this way is inefficient, raises MySQL's load, and causes a lot of disk fragmentation. Fragmentation also costs query performance, because non-contiguous storage leads to more disk I/O. This is one reason why an auto-increment integer primary key is recommended. The InnoDB index page defaults to 16KB, and when a new record is inserted MySQL reserves 1/16 of each page for future growth of index records, to avoid excessive disk data movement.
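
The contrast can be sketched as two table designs (names are made up; UUID() stands in for any random string key):

-- Monotonic key: every insert lands on the rightmost leaf page,
-- so pages fill sequentially and splits stay rare
CREATE TABLE t_seq (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(100)
) ENGINE=InnoDB;

-- Random string key: inserts land on arbitrary leaf pages, causing
-- the page splits, page movement, and fragmentation described above
CREATE TABLE t_rand (
    id CHAR(36) NOT NULL PRIMARY KEY,
    payload VARCHAR(100)
) ENGINE=InnoDB;

INSERT INTO t_rand VALUES (UUID(), 'x');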

Disadvantages of an auto-increment primary key

In high-concurrency scenarios, inserting by auto-increment primary key in InnoDB may cause significant contention. The upper bound of the primary key becomes a "hot spot": because all inserts happen there, concurrent inserts may cause gap lock contention (gap locks will be introduced in detail separately). Another cause may be the auto_increment lock mechanism: when innodb_autoinc_lock_mode is 0 or 1 and the number of rows to be inserted is not known in advance (for example insert into t1 (xx) select xx from t2), executing the statement takes a table-level lock that is released only when the statement finishes, and only then can other inserts proceed. With innodb_autoinc_lock_mode=2 there is no such table lock; it is only guaranteed that the records inserted by all concurrently executing statements receive unique, increasing auto-increment values, but the values generated for the rows of a single multi-row statement may not be consecutive.
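
innodb_autoinc_lock_mode is a startup option and cannot be changed at runtime; a short sketch of inspecting it and of the statement type discussed above (t1 and t2 are made up):

-- 0 = traditional, 1 = consecutive (the default in 5.7), 2 = interleaved
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';

-- A bulk insert whose row count is unknown in advance; under mode 0/1
-- it holds the table-level AUTO-INC lock until the statement finishes
INSERT INTO t1 (col) SELECT col FROM t2;

-- To change the mode, set it in my.cnf and restart:
--   [mysqld]
--   innodb_autoinc_lock_mode = 2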

When the optimizer skips the index and chooses a full table scan

Take, for example, an order table with a composite index (order_id, goods_id). This problem is illustrated from two angles:

    • The queried fields are in the index

select order_id from order where order_id > 1000: if you look at its execution plan, you will find Using index in the Extra column, which indicates a covering index.

    • The queried fields are not all in the index

select * from order where order_id > 1000: this statement queries all fields of the table, some of which are not in the composite index, so a query through the composite index takes two steps: first use the composite index to determine the qualifying primary key IDs, then use those primary key IDs to look up the full records in the clustered index. The second step, querying the clustered index by primary key IDs obtained this way, is unordered and becomes discrete read operations on disk. When there are many records to read (typically around 20% of the entire table), the optimizer chooses to use the clustered index directly, that is, a full table scan, because sequential reads are faster than discrete reads. This is also why indexes are generally not built on low-selectivity fields: using such a field alone matches a lot of data, so there is a high chance of a full table scan.
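
Both cases can be checked with EXPLAIN (the order table is the hypothetical one above; the name must be quoted because order is a reserved word):

-- Covering index: expect Extra to show "Using index"
EXPLAIN SELECT order_id FROM `order` WHERE order_id > 1000;

-- All columns requested: with a wide enough range the optimizer may
-- prefer a clustered-index (full table) scan over many discrete reads
EXPLAIN SELECT * FROM `order` WHERE order_id > 1000;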

Conditions after a range condition do not use the index

According to MySQL's query model, when the where clause contains a range condition, the rows matched by the range are returned to the server layer (the query execution engine), and the remaining conditions are processed on the server side. This is why conditions after a range condition do not use the index: the remaining filtering is not completed inside the storage engine. However, MySQL 5.6 added a feature called index condition pushdown (ICP) that allows conditions after a range condition to continue using the index, with several prerequisites:

    • The first query condition needs a boundary, for example select * from xx where c1=x and c2>x and c3<x, so that c3 can use the index;
    • The InnoDB and MyISAM storage engines are supported;
    • The fields in the where conditions need to be in the index;
    • ICP on partitioned tables is supported starting with 5.7;
    • When a covering index is used, ICP does not take effect.

SET @@optimizer_switch = 'index_condition_pushdown=on';   -- enable ICP
SET @@optimizer_switch = 'index_condition_pushdown=off';  -- disable ICP
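
Whether ICP actually applies shows up in the execution plan: when it does, the Extra column reads "Using index condition". A sketch against a made-up table matching the prerequisites above:

-- Assume: KEY idx_c1_c2_c3 (c1, c2, c3) on table xx
-- With ICP on, the c3 filter is pushed down into the storage engine
EXPLAIN SELECT * FROM xx WHERE c1 = 1 AND c2 > 10 AND c3 < 100;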

A range query with an aggregate function, not following the index leftmost-prefix principle

For example, create a table:

create table `person` (
    `id` int not null auto_increment primary key,
    `uid` int not null,
    `name` varchar(60) not null,
    `time` date not null,
    key `idx_uid_date` (`uid`, `time`)
) engine=innodb default charset=utf8mb4;

When executing select count(*) from person where time > '2018-03-11' and time < '2018-03-16', the time column can still use the idx_uid_date index. Look at the following execution plan:

In the plan, Extra shows Using index, which indicates a covering index. In the general sense MySQL cannot use a loose index scan here, but for an aggregate function the secondary index can be scanned as a covering index, so the MySQL optimizer chooses to take advantage of the index.
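
The execution plan figure is not reproduced here, but it can be regenerated; per the discussion above, the Extra column shows Using index:

EXPLAIN SELECT COUNT(*) FROM person
WHERE time > '2018-03-11' AND time < '2018-03-16';
-- Expect: key = idx_uid_date, Extra = Using index (a covering index scan)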

Large paging offset values are a performance problem

In MySQL, when the offset value in a paged query is very large, performance becomes very poor. For example, limit 100000, 20 needs to fetch 100,020 rows and then discard the first 100,000 to return 20. This process produces a lot of random I/O, which is the reason for the poor performance. To solve the problem, the starting point is to reduce useless data reads and reduce random I/O. The workaround is to use a covering index: scan the index for the IDs, then fetch the row records from the clustered index. I know of two ways to do this:

For example, the paged query limit 1000000,5 on table t1:

    • Use inner join

select * from t1 inner join (select id from t1 where xxx order by xx limit 1000000,5) as t2 using(id): the subquery walks the covering index to obtain the IDs, and the 5 rows are then fetched directly by those IDs.

    • Use a range condition to limit the data fetched

select * from t1 where id > 1000000 order by id limit 0, 5: the condition id > 1000000 skips the first million records during the index scan and then takes 5 rows, so the effective offset is 0. This method usually cannot be used for general paging, but it works well for fetching data in batches.
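
Applied to the person table from the earlier example, the two techniques look like this (the offsets are illustrative):

-- 1) inner join: the subquery walks the covering index to find 5 ids,
--    and only those 5 rows are fetched from the clustered index
SELECT p.*
FROM person AS p
INNER JOIN (
    SELECT id FROM person ORDER BY id LIMIT 1000000, 5
) AS tmp USING (id);

-- 2) range condition: skip the first million rows inside the index scan
--    (best suited to batch scanning rather than arbitrary page jumps)
SELECT * FROM person WHERE id > 1000000 ORDER BY id LIMIT 5;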

Index Merge
SELECT * FROM tbl_name WHERE key1 = 10 OR key2 = 20;
SELECT * FROM tbl_name WHERE (key1 = 10 OR key2 = 20) AND non_key = 30;
SELECT * FROM t1, t2 WHERE (t1.key1 IN (1,2) OR t1.key2 LIKE 'value%') AND t2.key1 = t1.some_col;
SELECT * FROM t1, t2 WHERE t1.key1 = 1 AND (t2.key1 = t1.some_col OR t2.key2 = t1.some_col2);

For SQL like the above, before MySQL 5.0 a full table scan was performed unless a corresponding composite index existed. MySQL 5.1 introduced the index merge optimization strategy, which can, to some extent, use multiple single-column indexes on one table to locate the specified rows. The principle is to perform set operations on the scan results of the individual indexes: intersection, union, and combinations of the two. But index merge is not always an appropriate choice, because it can consume a lot of CPU and memory resources, and a query that relies on index merge usually indicates that the table's indexes need to be optimized.
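
In the execution plan an index merge appears as access type index_merge, with the algorithm in the Extra column (Using union(...), Using intersect(...), or Using sort_union(...)); it can also be disabled through optimizer_switch if it is consuming too many resources:

-- Expect: type = index_merge, Extra = Using union(idx_key1,idx_key2)
-- (assuming single-column indexes idx_key1 and idx_key2 exist)
EXPLAIN SELECT * FROM tbl_name WHERE key1 = 10 OR key2 = 20;

-- Turn the optimization off for the session; usually a sign that a
-- proper composite index should be built instead
SET @@optimizer_switch = 'index_merge=off';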

7. References
    • "MySQL Technology insider-innodb Storage Engine": This book for InnoDB's explanation is more comprehensive and meticulous, but a little bit old and a little bit wrong place, this book is based on the MySQL version 5.6, the inside will be mixed with some 5.7 of knowledge.
    • MySQL Technology Insider: SQL Programming: Worth a look.
    • "High-performance MySQL third edition": This book is a MySQL book, there are a lot of MySQL optimization suggestions and some cases
    • Official documents : This is a more authoritative and up-to-date document, with a long, informative and plain English language, which is relatively short of Chinese in terms of comprehension and reading speed.

This article is from the NetEase Cloud community, published with the authorization of the author, Fan Pengcheng.
