MySQL optimization rules for index establishment

Source: Internet
Author: User

Indexes are usually implemented with a B+ tree data structure, such as the following example.

The figure shows a B+ tree. The full definition can be found in any B+ tree reference; only a few key points are covered here. Each light blue block is what we call a disk block. Each disk block contains several data items (shown in dark blue) and pointers (shown in yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3. P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99.

Non-leaf nodes do not store real data. They hold only data items that guide the direction of the search; for example, 17 and 35 do not actually exist in the data table.

The lookup process of a B+ tree

Suppose we are looking for data item 29. Disk block 1 is first loaded from disk into memory, which costs one IO. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1. The time spent in memory is very short compared with a disk IO and can be ignored. Disk block 3 is then loaded into memory using the disk address stored in pointer P2 of disk block 1, which is the second IO. 29 lies between 26 and 30, which selects pointer P2 of disk block 3. Disk block 8 is loaded into memory through that pointer, which is the third IO. A binary search in memory then finds 29 and the query ends, for a total of three IOs.
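The lookup described above can be sketched in a few lines of Python. This is only an illustrative model, not how MySQL stores pages. The keys 17, 35, 26, 30 and the leaf values come from the text; the separator keys 8, 12, 65, 87 and the two empty leaves under disk block 3 are assumptions made here to complete the tree. Every node visited is counted as one IO.

```python
from bisect import bisect_right


class Node:
    """One disk block: sorted keys plus child pointers (None for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children


def search(root, target):
    """Return (found, io_count) for a lookup that starts at the root block."""
    node, io_count = root, 0
    while True:
        io_count += 1  # loading this block into memory costs one IO
        if node.children is None:
            return target in node.keys, io_count
        # Binary search in memory picks the pointer to follow (P1, P2, P3, ...).
        node = node.children[bisect_right(node.keys, target)]


# Assumed layout: separators 8, 12, 65, 87 and the empty leaves are hypothetical.
block2 = Node([8, 12], [Node([3, 5]), Node([9, 10]), Node([13, 15])])
block3 = Node([26, 30], [Node([]), Node([28, 29]), Node([])])
block4 = Node([65, 87], [Node([36, 60]), Node([75, 79]), Node([90, 99])])
root = Node([17, 35], [block2, block3, block4])

print(search(root, 29))  # (True, 3): found after three block loads
print(search(root, 17))  # (False, 3): 17 only guides the search, it is not real data
```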

In practice, a three-level B+ tree can represent millions of rows. If finding one row among millions takes only three IOs, the performance gain is huge. Without an index, every data item could require its own IO, for a total of millions of IOs, which is obviously very expensive.
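A little arithmetic makes this claim concrete. The sketch below assumes a simplified model in which every block holds m data items and therefore m + 1 pointers, and it computes the smallest number of levels whose fan-out covers a given row count. The function name and the sample values of m are chosen here for illustration only.

```python
def tree_height(total_rows, items_per_block):
    """Smallest h such that (items_per_block + 1) ** h >= total_rows."""
    fan_out = items_per_block + 1  # m data items leave room for m + 1 pointers
    height, capacity = 0, 1
    while capacity < total_rows:
        capacity *= fan_out
        height += 1
    return height


for m in (1, 9, 99):
    print(m, tree_height(1_000_000, m))
# 1 20
# 9 6
# 99 3
```

Under this model, a block that holds 99 items reaches a million rows in three levels, while a block that holds a single item needs twenty.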

Properties of the B+ tree

1. From the analysis above, the number of IOs depends on the height h of the B+ tree. If the table holds N rows and each disk block holds m data items, then h = log_(m+1) N, the logarithm of N to base m+1. For a fixed amount of data N, the larger m is, the smaller h is. In turn, m = size of the disk block / size of a data item. The size of a disk block is the size of a data page, which is fixed, so the less space each data item occupies, the more data items fit in a block and the lower the tree.

This is why every data item, that is, every index field, should be as small as possible. For example, int takes 4 bytes, half of the 8 bytes taken by bigint. This is also why the B+ tree keeps the real data in the leaf nodes rather than the inner nodes: if real data were placed in the inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. When each block holds only one data item, the tree degenerates into a linear list.
2. When the data item of the B+ tree is a composite structure such as (name, age, sex), the B+ tree builds the search tree by comparing the fields from left to right. For example, when the data (Zhang San, 20, F) is looked up, the B+ tree first compares name to decide the direction of the next step. If the names are equal, it compares age and then sex in turn, and finally reaches the data being looked up. But when data without a name arrives, such as (20, F), the B+ tree does not know which node to check next, because name is the first comparison factor when the search tree is built. The search must start with name to know where to go next.

For example, when the data (Zhang San, F) is looked up, the B+ tree can use name to pick the direction of the search, but the next field, age, is missing. So it can only find all the data whose name equals Zhang San and then filter those rows for sex equal to F. This is a very important property: the leftmost matching property of the index.
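The leftmost matching property can be illustrated with a sorted list of tuples standing in for the leaf level of a composite index. This is a rough sketch, not MySQL's implementation; the sample rows and the helper name find_by_prefix are made up for the example. A lookup that supplies the leading fields can binary-search to the start of a contiguous run, while a lookup without name has to scan everything.

```python
from bisect import bisect_left

# Hypothetical (name, age, sex) entries, kept sorted left to right like index keys.
rows = sorted([
    ("Li Si", 20, "F"),
    ("Li Si", 25, "M"),
    ("Wang Wu", 20, "F"),
    ("Zhang San", 20, "F"),
    ("Zhang San", 20, "M"),
    ("Zhang San", 30, "F"),
])


def find_by_prefix(index, prefix):
    """Binary-search to the first entry with this prefix, then walk forward."""
    n = len(prefix)
    pos = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    matches = []
    while pos < len(index) and index[pos][:n] == prefix:
        matches.append(index[pos])
        pos += 1
    return matches


# (Zhang San, 20, F): every field is present, so the search is fully guided.
print(find_by_prefix(rows, ("Zhang San", 20, "F")))

# (Zhang San, F): only name guides the search; sex is filtered afterwards.
print([r for r in find_by_prefix(rows, ("Zhang San",)) if r[2] == "F"])

# (20, F): no name, so the matching rows are scattered and need a full scan.
print([r for r in rows if r[1:] == (20, "F")])
```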

Several principles of index building

1. The leftmost prefix matching principle. This is a very important principle. MySQL keeps matching to the right until it meets a range condition (>, <, between, like) and then stops. For example, for a = 1 and b = 2 and c > 3 and d = 4, an index built in (a, b, c, d) order cannot use d. If the index is built as (a, b, d, c), all four columns can use it, and a, b, and d may appear in any order.
2. = and in conditions can appear in any order. For example, with a = 1 and b = 2 and c = 3, an (a, b, c) index can be used whatever order the conditions are written in. The MySQL query optimizer rewrites the query into a form the index can recognize.
3. Try to choose columns with high selectivity for the index. The formula for selectivity is count(distinct col)/count(*), which is the proportion of non-duplicate values in the field. The larger the ratio, the fewer records we scan. The selectivity of a unique key is 1, while status and gender fields may have a selectivity close to 0 on large data sets. One might ask whether there is an empirical value for this ratio. It depends on the scenario and is hard to pin down. For fields used in joins we generally require more than 0.1, that is, an average of 10 records scanned per lookup.
4. An index column must not take part in a calculation; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the B+ tree stores the field values of the data table, so at lookup time the function would have to be applied to every element before comparing, which is obviously too expensive. The statement should be written as create_time = unix_timestamp('2014-05-29');
5. Extend an existing index where possible instead of creating a new one. For example, if the table already has an index on a and an index on (a, b) is now needed, you only need to modify the original index.
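Principles 1 and 2 can be modeled with a small function. This is a simplified reading of the rule as stated above, not the actual logic of the MySQL optimizer; the function name usable_columns and the dictionary representation of the conditions are assumptions made for the example. Because the conditions are stored in a dictionary, the order in which they are written has no effect, which mirrors principle 2.

```python
RANGE_OPS = {">", "<", ">=", "<=", "between", "like"}


def usable_columns(index_columns, predicates):
    """Return the leading index columns that a query can use.

    predicates maps a column name to the operator applied to it.
    Matching stops at the first column without a condition, and it
    stops right after the first column with a range condition.
    """
    used = []
    for column in index_columns:
        op = predicates.get(column)
        if op is None:
            break
        used.append(column)
        if op in RANGE_OPS:
            break
    return used


# a = 1 and b = 2 and c > 3 and d = 4
query = {"a": "=", "b": "=", "c": ">", "d": "="}

print(usable_columns(["a", "b", "c", "d"], query))  # ['a', 'b', 'c']
print(usable_columns(["a", "b", "d", "c"], query))  # ['a', 'b', 'd', 'c']
```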
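The selectivity formula in principle 3 is easy to try out on sample data. The sketch below computes the same ratio as count(distinct col)/count(*) over a Python list; the two sample columns are invented for illustration.

```python
def selectivity(values):
    """Ratio of distinct values to total values, like count(distinct col)/count(*)."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return len(set(values)) / len(values)


user_ids = range(1000)        # hypothetical unique key: every value differs
genders = ["F", "M"] * 500    # hypothetical gender column: two values in 1000 rows

print(selectivity(user_ids))  # 1.0
print(selectivity(genders))   # 0.002
```

With the 0.1 threshold mentioned above, the first column would be a reasonable index candidate and the second would not.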
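The cost difference behind principle 4 can be shown with a sorted list standing in for an index on create_time. The two helper functions below are Python stand-ins named after the MySQL functions, and they assume UTC, whereas MySQL uses the session time zone. The sample timestamps are made up. Wrapping the column in a function forces one conversion per entry, while converting the constant once allows a binary search.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def from_unixtime(ts):
    """Stand-in for MySQL FROM_UNIXTIME, assuming UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def unix_timestamp(text):
    """Stand-in for MySQL UNIX_TIMESTAMP on a date string, assuming UTC."""
    day = datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


target = unix_timestamp("2014-05-29")
# Hypothetical sorted create_time values, as an index would store them.
index = sorted(target + offset for offset in (-86400, -3600, 0, 3600, 86400))

# Function on the column: every entry must be converted before comparing.
slow = [ts for ts in index if from_unixtime(ts) == "2014-05-29 00:00:00"]

# Function on the constant: convert once, then binary-search the sorted values.
pos = bisect_left(index, target)
fast = [index[pos]] if pos < len(index) and index[pos] == target else []

print(slow == fast)  # True: same result, but the second form never touches most entries
```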
