MySQL optimization: index creation rules and index optimization

Source: Internet
Author: User


The data structure most commonly used for indexes is the B+ tree. Its structure is as follows:

The figure shows a B+ tree. For the formal definition, see B+ tree; here we cover only the key points. Each light blue block is called a disk block, and each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3: P1 points to the disk block holding values smaller than 17, P2 to the disk block holding values between 17 and 35, and P3 to the disk block holding values larger than 35. Real data exists only in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, and 99. Non-leaf nodes do not store real data; they store only data items that guide the search direction. For example, data items 17 and 35 do not actually exist in the data table.

B+ tree search process

If you want to find data item 29, disk block 1 is first loaded from disk into memory, which costs one I/O. A binary search in memory determines that 29 lies between 17 and 35, so the P2 pointer of disk block 1 is selected. The time spent in memory is negligible because it is very short compared with disk I/O. The disk address held in the P2 pointer of disk block 1 is then used to load disk block 3 from disk into memory, which is the second I/O. Since 29 lies between 26 and 30, the P2 pointer of disk block 3 is selected, and that pointer is used to load disk block 8 into memory, which is the third I/O. A binary search in memory then locates 29 and the query ends, after three I/O operations in total. In practice, a three-level B+ tree can represent millions of rows. If a lookup among millions of rows needs only three I/O operations, the performance gain is huge. Without an index, each data item could require its own I/O, for a total of millions of I/O operations, which is obviously very expensive.
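The lookup described above can be sketched in a few lines of Python. This is only an illustration, not how MySQL implements its indexes: the `Block` class and `search` function are hypothetical names, each visited block is counted as one I/O, and the separator keys 8, 12, 65, and 87 are assumptions, since the text names only 17, 35, 26, and 30.

```python
from bisect import bisect_left, bisect_right


class Block:
    """A disk block: sorted keys, plus child pointers for non-leaf blocks."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None means this block is a leaf


def search(root, target):
    """Return (found, io_count); every block visited counts as one I/O."""
    io_count = 0
    node = root
    while node is not None:
        io_count += 1  # loading this block from disk into memory
        if node.children is None:
            # Leaf block: binary search among the real data items.
            i = bisect_left(node.keys, target)
            return i < len(node.keys) and node.keys[i] == target, io_count
        # Non-leaf block: binary search picks the pointer to follow.
        node = node.children[bisect_right(node.keys, target)]
    return False, io_count  # followed a pointer that leads to no block


# Leaf contents come from the text; separators 8, 12, 65, 87 are assumed.
block2 = Block([8, 12], [Block([3, 5]), Block([9, 10]), Block([13, 15])])
block3 = Block([26, 30], [None, Block([28, 29]), None])
block4 = Block([65, 87], [Block([36, 60]), Block([75, 79]), Block([90, 99])])
root = Block([17, 35], [block2, block3, block4])

print(search(root, 29))  # found after three block loads
print(search(root, 17))  # 17 only guides the search; it is not real data
```

Searching for 29 visits disk block 1, disk block 3, and the leaf holding 28 and 29, matching the three I/O operations counted in the text.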

B+ tree properties

1. From the analysis above, we know that the number of I/O operations depends on the height h of the B+ tree. Assume the table holds N rows and each disk block holds m data items; then h = log_{m+1}(N). For a given N, the larger m is, the smaller h is. m = size of a disk block / size of a data item, and the size of a disk block, that is, the size of a data page, is fixed. So the less space each data item occupies, the more data items fit in a block and the lower the tree. This is why each data item, that is, the index field, should be as small as possible: for example, int occupies 4 bytes, half of bigint's 8 bytes. This is also why the B+ tree keeps the real data in the leaf nodes rather than the inner nodes: if real data were placed in the inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. When each block holds only one data item, the tree degenerates into a linear list.
2. When the data items in the B+ tree are composite, such as (name, age, sex), the B+ tree builds the search tree by comparing fields from left to right. For example, when looking up (Zhang San, 20, F), the B+ tree first compares name to decide the search direction; if the names are equal, it compares age and then sex, and finally reaches the data. But when the lookup is (20, F), with no name, the B+ tree does not know which node to visit next, because name is the first comparison key used when the search tree was built; you must search by name first to know where to go next. When the lookup is (Zhang San, F), the B+ tree can use name to choose the search direction, but the next field, age, is missing, so it can only find all rows whose name is Zhang San and then filter for those whose sex is F. This is a very important property: the leftmost-match property of indexes.
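Property 1 can be checked numerically with a small Python sketch. The function names are hypothetical, the 16384-byte page size is an assumption for the example, and pointer overhead is ignored, so the result is a rough estimate of h = log_{m+1}(N) rounded up to whole levels rather than a real storage calculation.

```python
def items_per_block(block_size, item_size):
    """m = size of a disk block / size of a data item (pointers ignored)."""
    return block_size // item_size


def estimated_height(n_rows, m):
    """Smallest h with (m + 1) ** h >= n_rows, i.e. log_{m+1}(N) rounded up."""
    height = 0
    capacity = 1
    while capacity < n_rows:
        capacity *= m + 1  # each extra level multiplies the reach by m + 1
        height += 1
    return height


PAGE_SIZE = 16384  # assumed page size in bytes, for illustration only
ROWS = 10_000_000  # assumed table size

m_int = items_per_block(PAGE_SIZE, 4)     # 4-byte int key
m_bigint = items_per_block(PAGE_SIZE, 8)  # 8-byte bigint key

print(m_int, estimated_height(ROWS, m_int))
print(m_bigint, estimated_height(ROWS, m_bigint))
print(estimated_height(1_000_000, 100))  # about a hundred items per block
print(estimated_height(1_000_000, 2))    # two items per block, as in the figure
```

Under these assumptions the smaller key fits twice as many items per block and saves one level for ten million rows, which is the effect the property describes.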

Index Creation Principles

1. The leftmost-prefix matching principle is very important. MySQL keeps matching columns to the right until it meets a range condition (>, <, between, like), then stops. For example, for a = 1 and b = 2 and c > 3 and d = 4, with an index created in the order (a, b, c, d), d cannot use the index; with an index (a, b, d, c), all four columns can use it, and the order of a, b, and d can be adjusted freely.
2. = and in conditions can appear in any order. For example, for a = 1 and b = 2 and c = 3, an (a, b, c) index can be created in any column order; the MySQL query optimizer rewrites the conditions into a form the index can recognize.
3. Choose columns with high selectivity for indexes. Selectivity is computed as count(distinct col)/count(*), the proportion of values that are not repeated. The larger the proportion, the fewer records are scanned. The selectivity of a unique key is 1, while some status or gender fields may have a selectivity close to 0 on large data sets. Is there a rule of thumb for this ratio? It depends on the use case, so a fixed value is hard to give. For fields used in joins, we generally require a value above 0.1, that is, an average of 10 records scanned per lookup.
4. Index columns must not take part in calculations; keep the column "clean". For example, with from_unixtime(create_time) = '2017-05-29' the index cannot be used. The reason is simple: the B+ tree stores the field values from the data table, so to evaluate this condition the function would have to be applied to every element before comparing, which is obviously too expensive. The condition should therefore be written as create_time = unix_timestamp('2017-05-29');
5. Extend existing indexes where possible instead of creating new ones. For example, if the table already has an index on a and you now want an index on (a, b), you only need to modify the original index.
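Principles 1 and 3 can be illustrated with a rough Python model. This is a simplification for intuition only, not MySQL's optimizer: `usable_index_columns` and `selectivity` are hypothetical helpers, and conditions are reduced to one operator per column.

```python
RANGE_OPS = {">", "<", ">=", "<=", "between", "like"}


def usable_index_columns(index_columns, conditions):
    """Walk the index left to right and collect the columns it can match.

    Matching stops at the first column that has no condition, and stops
    right after the first column that has a range condition.
    """
    used = []
    for column in index_columns:
        op = conditions.get(column)
        if op is None:
            break  # gap in the leftmost prefix
        used.append(column)
        if op in RANGE_OPS:
            break  # columns to the right of a range cannot be matched
    return used


def selectivity(values):
    """count(distinct col) / count(*) for an in-memory list of values."""
    return len(set(values)) / len(values)


# a = 1 and b = 2 and c > 3 and d = 4
query = {"a": "=", "b": "=", "c": ">", "d": "="}
print(usable_index_columns(["a", "b", "c", "d"], query))  # d is left out
print(usable_index_columns(["a", "b", "d", "c"], query))  # all four match

# Composite (name, age, sex) index from the B+ tree discussion
print(usable_index_columns(["name", "age", "sex"], {"age": "=", "sex": "="}))
print(usable_index_columns(["name", "age", "sex"], {"name": "=", "sex": "="}))

print(selectivity([1, 2, 3, 4]))       # unique key
print(selectivity(["M", "F"] * 500))   # gender-like column
```

In this model the (a, b, c, d) index matches only a, b, and c, while (a, b, d, c) matches all four, and the gender-like column scores far below the 0.1 guideline mentioned for join fields.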

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
