MySQL index bottom-level implementation

Last Update:2017-02-17 Source: Internet

Author: User

Tags mysql index

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The nature of the index

The official MySQL definition for an index is: The index is the data structure that helps MySQL to get data efficiently. It can be understood that the index is a data structure.

We know that database query is one of the most important functions of the database, we all want to query the data as fast as possible, so the database system designers from the perspective of the query algorithm optimization. The most basic query algorithm, of course, is the sequential lookup, of course, this time complexity of O (n) algorithm is obviously bad when the data volume is large, so there are two points to find, binary tree search and so on. But binary search requires the data to be retrieved in order, while binary tree lookup can only be applied to binary search tree, but the structure of the data itself can not completely satisfy the various data structures. Therefore, in addition to the data, the database system also maintains the data structure of the specific lookup algorithm, which references the data in some way, so that the advanced search algorithm can be implemented on these data structures. This data structure is the index.

B-tree and B+tree

At present, most database systems and file systems use B-tree and b+tree as index structures.

Indexpurpose of Indexing: Improve query efficiencyprinciple: By constantly narrowing down the range of data you want to filter out the results you want, and turning random events into sequential events, that is, we always lock the data by the same search method. data structure: B + TreeDiagram B + Tree and lookup process: For example, is a B + tree, the definition of B + tree can be seen in the B + tree, here is only a few points, the light blue block we call a disk block, you can see each disk block contains several data items (dark blue) and pointers (shown in yellow), such as disk Block 1 contains data items 17 and 35, including pointer P1, P2, P3,P1 represents a disk block less than 17, P2 represents a disk block between 17 and 35, and P3 represents a disk block greater than 35. Real data exists at leaf nodes 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data, only data items that guide the direction of the search, such as 17 and 35, do not exist in the data table. The discovery process of B + trees, if you want to find the data item 29, then the disk Block 1 is loaded into memory by disk, at this time Io, in memory with a binary lookup to determine 29 between 17 and 35, locking disk Block 1 P2 pointer, memory time because of very short (compared to the disk IO) can be negligible, Disk Block 1 through disk address of the P2 pointer to the disk block 3 is loaded into memory, the second io,29 between 26 and 30, locking disk block 3 of the P2 pointer, loading disk blocks 8 through the pointer to memory, a third Io, while in-memory binary find found 29, the end of the query, a total of three IO. The real situation is, the 3-tier B + tree can represent millions of data, if millions of data to find only three Io, the performance will be huge, if there is no index, each data item will occur once IO, then a total of millions of Io, it is obviously very expensive. B + Tree Naturefrom the above analysis, we know that the number of IO depends on the height of B + H, assuming that the current data table data is N, the number of data items per disk block is M, then there is H=㏒ (m+1) n, when the amount of data n is certain, m larger, h smaller, and m = size of the disk block/data item size, The size of the disk block is also a data page size, is fixed, if the data items occupy less space, the more data items, the lower the height of the tree. This is why each data item, the index field, is as small as possible, such as an int accounting for 4 bytes, which is less than half the bigint8 byte. This is why the B + tree requires the real data to be placed on the leaf node instead of the inner node, and once placed in the inner node, the data items of the disk block will be greatly reduced, resulting in a higher tree. When the data item equals 1 o'clock, it will degenerate into a linear table.

MySQL index bottom-level implementation

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More