MySQL index underlying implementation principles

Last Update: 2018-07-11 Source: Internet

Author: User

Tags: mysql, index

The data structures and algorithms behind MySQL indexes

First, the definition

Definition: an index is a data structure that helps MySQL retrieve data efficiently.
Essence: an index is a data structure.

Second, B-tree

An M-order B-tree satisfies the following conditions:
1. Each node has at most M subtrees.
2. The root node has at least 2 subtrees, unless it is the only node in the tree (in that extreme case the tree is a single root node that is at once the root and a leaf, like a single-celled organism).
3. Every non-root, non-leaf node has at least ceil(M/2) subtrees (ceil means rounding up; for example, in a 5-order B-tree each such node has at least 3 subtrees, i.e. at least 3 branches).
4. A non-leaf node contains [n, A0, K1, A1, K2, A2, ..., Kn, An], where n is the number of keys stored in the node, K1..Kn are the keys with Ki < K(i+1), and each Ai is a pointer to the root node of a subtree.
5. Every path from the root to a leaf has the same length (all leaf nodes are on the same level).

B-tree characteristics:

1. The key set is distributed across the whole tree.
2. Each key appears in exactly one node.
3. Every node stores both keys and data.
4. A search may end at a non-leaf node.
5. Keys within a node are arranged in non-descending order from left to right.
6. All leaf nodes have the same depth, equal to the tree height h.
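
As a minimal sketch, assuming a simple in-memory node layout (the Node class and is_valid_btree function are hypothetical names used only for illustration), the conditions and characteristics above can be checked mechanically in Python. The sketch also checks the usual search-order rule that keys in subtree Ai fall between Ki and K(i+1), and it assumes that non-root leaves hold at least ceil(M/2)-1 keys, as in the common textbook definition; neither rule is spelled out in the lists above.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list                                    # K1 < K2 < ... < Kn
    children: list = field(default_factory=list)  # A0..An; empty for a leaf


def is_valid_btree(root: Node, m: int) -> bool:
    """Check the M-order B-tree conditions, plus the usual search-order rule."""
    leaf_depths = set()

    def visit(node, depth, is_root, lo, hi) -> bool:
        keys = node.keys
        # Keys inside a node are strictly increasing (each key appears only once).
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False
        # Keys in subtree Ai must lie strictly between the parent's Ki and K(i+1).
        if keys and ((lo is not None and keys[0] <= lo) or (hi is not None and keys[-1] >= hi)):
            return False
        if not node.children:  # leaf
            leaf_depths.add(depth)
            return len(keys) <= m - 1 and (is_root or len(keys) >= ceil(m / 2) - 1)
        fanout = len(node.children)
        if fanout != len(keys) + 1 or fanout > m:      # n keys -> n + 1 subtrees, at most M
            return False
        if fanout < (2 if is_root else ceil(m / 2)):   # root >= 2, other inner nodes >= ceil(M/2)
            return False
        bounds = [lo] + keys + [hi]
        return all(visit(child, depth + 1, False, bounds[i], bounds[i + 1])
                   for i, child in enumerate(node.children))

    # Condition 5: every leaf sits at the same depth.
    return visit(root, 0, True, None, None) and len(leaf_depths) <= 1
```

For example, a 3-order tree whose root holds [3, 6] above the leaves [1, 2], [4, 5] and [7, 8] passes the check, while the same root with only two leaves fails, because 2 keys require 3 subtrees.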

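A lookup on a B-tree starts at the root: it searches the keys of the current node, returns the data if the key is found, and otherwise descends into the subtree whose key range contains the target, until the key is found or a leaf is reached without a match.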
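
A minimal Python sketch of this lookup, assuming each node is an in-memory object rather than a disk page (BTreeNode and btree_search are hypothetical names). Each loop iteration corresponds to reading one node, i.e. one disk I/O in the on-disk case, and the search can stop at a non-leaf node because B-tree inner nodes also carry data.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BTreeNode:
    keys: list                                    # sorted keys K1..Kn
    values: list                                  # data stored next to each key
    children: list = field(default_factory=list)  # A0..An; empty for a leaf


def btree_search(node: Optional[BTreeNode], key) -> Optional[Any]:
    """Return the data stored with key, or None if the key is not in the tree."""
    while node is not None:                       # each iteration = read one node (one I/O on disk)
        i = bisect_left(node.keys, key)           # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]                 # hit: may end at a non-leaf node
        if not node.children:                     # leaf reached without a match
            return None
        node = node.children[i]                   # descend into subtree Ai
    return None
```

For example, with a root holding keys 10 and 20 above leaves [5], [15] and [25, 30], searching for 20 stops at the root, while searching for 15 reads two nodes.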

Third, B+tree

The differences between a B+tree and a B-tree are:
1. Non-leaf nodes of a B+tree store only keys, not data.
2. All keys (together with their data) are stored in the leaf nodes.
3. Each leaf node contains a pointer to the adjacent leaf node; these sequential access pointers improve the performance of range lookups.
4. Non-leaf nodes can be regarded as the index part: a non-leaf node contains only the largest (or smallest) key of each of its child subtrees.
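
A minimal sketch of why the leaf pointers help, assuming separator keys in inner nodes are the smallest key of the right-hand subtree (one of the two conventions mentioned in item 4) and using the hypothetical class names Leaf and Inner: a range query descends the tree once to find the first leaf, then follows the next-leaf pointers instead of returning to the root for every key.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: list                               # all keys live in the leaves ...
    values: list                             # ... together with their data
    next_leaf: Optional["Leaf"] = None       # sequential access pointer


@dataclass
class Inner:
    keys: list                               # separator keys only, no data
    children: list = field(default_factory=list)


def find_leaf(node: Union[Inner, Leaf], key) -> Leaf:
    # Assumed convention: keys[j] is the smallest key stored under children[j + 1].
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root: Union[Inner, Leaf], lo, hi) -> List:
    """Values for lo <= key <= hi: one root-to-leaf descent, then follow leaf links."""
    result = []
    leaf: Optional[Leaf] = find_leaf(root, lo)
    i = bisect_left(leaf.keys, lo)
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return result
            result.append(leaf.values[i])
            i += 1
        leaf, i = leaf.next_leaf, 0          # move to the adjacent leaf, no new descent
    return result
```

For a query with lo = 4 and hi = 9 over the leaf chain [1, 3] -> [5, 7] -> [9, 11], the scan descends once to the first leaf and then walks right along the chain until it meets a key greater than hi.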

Fourth, performance analysis of B-tree/B+tree indexes

Basis: the number of disk I/O operations is used to evaluate an index structure.

Main memory and disk exchange data in units of pages. Setting the size of one node equal to one page means that each node can be loaded completely with a single I/O.

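By the definition of a B-tree, a single lookup visits at most h nodes, where h is the height of the tree.
Asymptotic complexity: $O(h) = O(\log_d N)$, where d is the out-degree of a node and N is the number of keys.
$d_{\max} = \left\lfloor \frac{\mathit{pagesize}}{\mathit{keysize} + \mathit{datasize} + \mathit{pointsize}} \right\rfloor$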

In practice d is a very large number, usually more than 100, so h is very small, usually no more than 3 (with d = 100, three levels can hold about 100^3 = 1,000,000 records).
A single lookup in a B-tree needs at most h - 1 I/Os, because the root node stays resident in memory.
Because B+tree inner nodes do not contain the data field, their out-degree d is larger, so h is smaller, fewer I/Os are needed and lookups are more efficient. This is why the B+tree is better suited to indexes stored in external memory (on disk).
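
The arithmetic can be sketched in Python under stated assumptions: a 16 KB page (the InnoDB default), an 8-byte key, a 6-byte child pointer and 100 bytes of row data per entry, ignoring page headers and the fact that B+tree leaves have a different capacity from inner nodes. The key, pointer and row sizes are illustrative assumptions, not values from the text, and max_fanout and height are hypothetical helper names.

```python
def max_fanout(page_size: int, key_size: int, pointer_size: int, data_size: int = 0) -> int:
    """d_max = floor(pagesize / (keysize + datasize + pointsize))."""
    return page_size // (key_size + data_size + pointer_size)


def height(n_keys: int, d: int) -> int:
    """Smallest h with d**h >= n_keys, i.e. roughly ceil(log_d N)."""
    h, capacity = 1, d
    while capacity < n_keys:
        h += 1
        capacity *= d
    return h


PAGE = 16 * 1024                 # InnoDB default page size
KEY, PTR, ROW = 8, 6, 100        # assumed sizes in bytes (illustrative only)

d_btree = max_fanout(PAGE, KEY, PTR, data_size=ROW)   # inner nodes carry data
d_bplus = max_fanout(PAGE, KEY, PTR)                  # inner nodes carry keys only
for name, d in (("B-tree", d_btree), ("B+tree", d_bplus)):
    h = height(10_000_000, d)
    print(f"{name}: d = {d}, h = {h}, I/Os per lookup = {h - 1} (root cached)")
```

Under these assumptions a B-tree node fits 143 entries while a B+tree inner node fits 1170, so 10 million keys need 4 levels in the first case and 3 in the second; with d = 100, three levels cover 100^3 = 1,000,000 keys, matching the estimate above.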

Fifth, MySQL index implementation
1. The MyISAM engine uses a B+tree as its index structure; the data field of each leaf node holds the address of the record.
In MyISAM, the primary index and secondary indexes do not differ structurally, except that the primary index requires keys to be unique while a secondary index may contain duplicate keys.
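
A toy Python model of the MyISAM layout, assuming sorted lists stand in for B+tree leaves and a list position stands in for a record address (MyISAMTable and its methods are hypothetical): both indexes store (key, address) pairs and resolve a lookup through the same code path; only the primary index rejects duplicate keys.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple


class MyISAMTable:
    """Toy model: a data file plus indexes whose leaves store record addresses."""

    def __init__(self) -> None:
        self.data_file: List[Dict] = []      # rows in insertion order; list position = address
        self.primary: List[Tuple] = []       # sorted (id, address); id must be unique
        self.secondary: List[Tuple] = []     # sorted (name, address); duplicates allowed

    def insert(self, row: Dict) -> None:
        i = bisect_left(self.primary, (row["id"],))
        if i < len(self.primary) and self.primary[i][0] == row["id"]:
            raise KeyError(f"duplicate primary key {row['id']}")  # only difference from secondary
        address = len(self.data_file)
        self.data_file.append(row)           # rows are not kept in key order
        insort(self.primary, (row["id"], address))
        insort(self.secondary, (row["name"], address))

    @staticmethod
    def _addresses(index: List[Tuple], key) -> List[int]:
        # Same leaf scan for both indexes: collect the addresses stored under key.
        i = bisect_left(index, (key,))
        out = []
        while i < len(index) and index[i][0] == key:
            out.append(index[i][1])
            i += 1
        return out

    def by_id(self, key) -> Optional[Dict]:
        addrs = self._addresses(self.primary, key)
        return self.data_file[addrs[0]] if addrs else None

    def by_name(self, name) -> List[Dict]:
        return [self.data_file[a] for a in self._addresses(self.secondary, name)]
```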

2. In InnoDB the data file itself is an index file: its leaf nodes contain the complete data records. This kind of index is called a clustered index.
Because the InnoDB data file is clustered by primary key, InnoDB requires every table to have a primary key (MyISAM does not). If none is specified explicitly, MySQL automatically chooses a column that uniquely identifies each record as the primary key; if no such column exists, MySQL generates an implicit field for the InnoDB table to serve as the primary key.
The data field of an InnoDB secondary index stores the primary key value of the corresponding record instead of its address.
A search through a secondary index therefore traverses two indexes: it first searches the secondary index to obtain the primary key, then uses that primary key to retrieve the record from the primary (clustered) index.
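
A toy Python model of the InnoDB layout, assuming sorted Python lists stand in for B+tree leaves (InnoDBTable and its methods are hypothetical names): the clustered index holds the complete rows in primary-key order, the secondary index holds (name, primary key) pairs, and a lookup by name counts its index searches to show the two-step access.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple


class InnoDBTable:
    """Toy model: the clustered index holds whole rows; secondary leaves hold primary keys."""

    def __init__(self) -> None:
        self.pks: List[int] = []            # clustered index keys, kept sorted
        self.rows: List[Dict] = []          # full rows, in the same order as self.pks
        self.secondary: List[Tuple] = []    # sorted (name, primary key), not an address
        self.searches = 0                   # index searches performed so far

    def insert(self, row: Dict) -> None:
        i = bisect_left(self.pks, row["id"])
        if i < len(self.pks) and self.pks[i] == row["id"]:
            raise KeyError(f"duplicate primary key {row['id']}")
        self.pks.insert(i, row["id"])       # the table data is stored in primary-key order
        self.rows.insert(i, row)
        insort(self.secondary, (row["name"], row["id"]))

    def _by_pk(self, pk) -> Optional[Dict]:
        self.searches += 1                  # search 2: clustered (primary) index
        i = bisect_left(self.pks, pk)
        return self.rows[i] if i < len(self.pks) and self.pks[i] == pk else None

    def by_name(self, name) -> List[Dict]:
        self.searches += 1                  # search 1: secondary index -> primary keys
        i = bisect_left(self.secondary, (name,))
        pks = []
        while i < len(self.secondary) and self.secondary[i][0] == name:
            pks.append(self.secondary[i][1])
            i += 1
        return [self._by_pk(pk) for pk in pks]
```

If a query needed only name and id, the first search alone would answer it, because both columns are already in the secondary index; that is the idea behind a covering index.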

3. Page splitting

If the primary key increases monotonically, each new record is inserted after the previous one in the current page, and when the page is full, insertion continues in a new page.

If writes arrive in random key order, InnoDB has to split pages frequently to make room for new rows. A page split moves a large amount of data, and a single insert modifies at least three pages instead of one.

If pages split frequently, they become sparse and irregularly filled, so the data ends up fragmented.
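
A toy simulation of the effect, assuming leaf pages of fixed capacity, a split that moves the upper half of a full page into a new page, and a fresh page (rather than a split) when a key is appended past the right edge of the last page. load_pages is a hypothetical helper, and InnoDB's real split heuristics are more involved than this.

```python
import random
from bisect import bisect_right, insort
from typing import Iterable, List, Tuple


def load_pages(keys: Iterable[int], capacity: int) -> Tuple[int, float]:
    """Insert keys into fixed-size leaf pages; return (number of splits, average fill)."""
    assert capacity >= 2
    pages: List[List[int]] = [[]]
    splits = 0
    for k in keys:
        firsts = [p[0] for p in pages if p]          # smallest key of each non-empty page
        idx = max(bisect_right(firsts, k) - 1, 0)    # page whose key range covers k
        page = pages[idx]
        if len(page) < capacity:
            insort(page, k)                          # room left: in-place insert
        elif idx == len(pages) - 1 and k > page[-1]:
            pages.append([k])                        # appended past the right edge: new page
        else:
            splits += 1                              # full page: move its upper half out
            mid = capacity // 2
            left, right = page[:mid], page[mid:]
            insort(left if k < right[0] else right, k)
            pages[idx:idx + 1] = [left, right]
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, fill


if __name__ == "__main__":
    keys = list(range(10_000))
    print("sequential:", load_pages(keys, 100))
    random.Random(1).shuffle(keys)
    print("random:    ", load_pages(keys, 100))
```

In this model, sequential keys fill pages completely without a single split, while shuffled keys trigger many splits and leave pages only partly full.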

Sixth, summary

Understanding how different storage engines implement indexes is useful for using and optimizing indexes properly.

1. Why is it not recommended to use a long field as the primary key?

2. Why is an auto-increment field a good choice for the primary key?

3. Why is it not recommended to build an index on certain fields?

4. Why should a highly selective column be chosen for an index? Selectivity is calculated as count(distinct col)/count(*).

5. Use covering indexes whenever possible.
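
The selectivity formula from item 4 can be sketched in Python (selectivity is a hypothetical helper). Like COUNT(DISTINCT col) in SQL, it ignores NULLs (modelled as None) in the numerator, while the denominator counts every row as COUNT(*) does; returning 0.0 for an empty column is an assumption, since the SQL expression would divide by zero.

```python
from typing import Hashable, Iterable, Optional


def selectivity(column: Iterable[Optional[Hashable]]) -> float:
    """count(DISTINCT col) / count(*): 1.0 for a unique column, near 0 for few distinct values."""
    values = list(column)
    if not values:
        return 0.0                                   # assumption: an empty table scores 0
    distinct = {v for v in values if v is not None}  # COUNT(DISTINCT col) ignores NULLs
    return len(distinct) / len(values)               # COUNT(*) counts every row
```

A unique column scores 1.0, while a column with only two values spread over many rows scores close to 0, which makes it a poor candidate for an index.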

Seventh, optimizing LIMIT paging queries

SELECT * FROM table WHERE condition LIMIT offset, rows;

The statement above is executed as follows:
1. Read offset + rows records from the table.
2. Discard the first offset records and return the following rows records as the final result.

Optimization using a covering index:
SELECT a.id, a.sid, a.parent_s_id FROM cashpool_account_relationship a JOIN (SELECT id FROM cashpool_account_relationship LIMIT 1000000, 10) b ON a.id = b.id;

SELECT id, sid, parent_s_id FROM cashpool_account_relationship WHERE id >= (SELECT id FROM cashpool_account_relationship LIMIT 1000000, 1) LIMIT 10;
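
A toy Python comparison of the plain LIMIT and the id-seek strategy, assuming the table is a list of wide rows sorted by id (a stand-in for the clustered index) and counting how many full rows each approach reads. make_table, naive_page and seek_page are hypothetical names, and the pad column is invented filler that makes rows wide. The model ignores the cost of walking the skipped id entries, which a real server still pays, although with a covering index that work touches only narrow index entries rather than whole rows.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def make_table(ids: List[int]) -> Tuple[List[int], List[Dict]]:
    """Toy clustered index: sorted ids plus wide rows in the same order."""
    ids = sorted(ids)
    return ids, [{"id": i, "sid": i % 97, "parent_s_id": i % 13, "pad": "x" * 200} for i in ids]


def naive_page(pks: List[int], rows: List[Dict], offset: int, count: int) -> Tuple[List[Dict], int]:
    """SELECT * ... LIMIT offset, count: offset + count full rows are read, offset discarded."""
    return rows[offset:offset + count], min(offset + count, len(rows))


def seek_page(pks: List[int], rows: List[Dict], offset: int, count: int) -> Tuple[List[Dict], int]:
    """WHERE id >= (SELECT id ... LIMIT offset, 1) LIMIT count."""
    if offset >= len(pks):
        return [], 0
    start_id = pks[offset]              # subquery: skips offset narrow id entries only
    pos = bisect_left(pks, start_id)    # seek to the first id >= start_id (a tree descent in a real engine)
    page = rows[pos:pos + count]        # only count full rows are read
    return page, len(page)
```

Both functions return the same page; the difference lies in how many wide rows are touched before the page is produced.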

Eighth, Q&A

1. Does InnoDB support hash indexes? --Ma Xin
InnoDB does support hash indexes, but only adaptive ones: the InnoDB storage engine automatically builds hash indexes for a table according to how the table is used, and you cannot manually decide whether a hash index is built for an individual table.
2. Since the leaf nodes of the InnoDB primary key index contain the complete data records, is the primary key index file larger than the data file? --Xu Xianhou
1) In the InnoDB engine, the leaf nodes of the primary key index contain the record data, so the primary key index file is the data file.
2) The data_length value reported in the tables table (information_schema.TABLES) is the size of the primary key index, and index_length is the total size of all secondary indexes on the table.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
