Tracking and optimization of a MySQL exception

Source: Internet
Author: User
Problem

User ticket: two statements are identical except for the row count in the final limit clause. Strangely, the limit 10 statement was reported to run about 10 times slower than the limit 100 statement.

With the user's table details hidden, the statements and results are as follows:
SELECT f1, SUM(`f2`) `cnt` FROM t WHERE f1 IS NOT NULL AND f3 = '2017-05-12' GROUP BY f1 ORDER BY `cnt` DESC LIMIT 10;
Execution time: 3 min 3.65 sec

SELECT f1, SUM(`f2`) `cnt` FROM t WHERE f1 IS NOT NULL AND f3 = '2017-05-12' GROUP BY f1 ORDER BY `cnt` DESC LIMIT 2014;
Execution time: 1.24 sec

The performance gap is huge!

Analysis
MySQL Tips: The most common way to trace how a statement is executed is to use explain to view its execution plan.
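
As a minimal sketch of this tip, prefixing the slow statement with EXPLAIN prints the plan instead of running the query. The table t and columns f1, f2, f3 are the anonymized names used in this article. The \G terminator is specific to the mysql command-line client and prints the result vertically.

```sql
-- Show the execution plan rather than executing the statement.
-- \G is understood by the mysql command-line client only.
EXPLAIN
SELECT f1, SUM(f2) AS cnt
FROM t
WHERE f1 IS NOT NULL AND f3 = '2017-05-12'
GROUP BY f1
ORDER BY cnt DESC
LIMIT 10\G
```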

Narrowing the limit value down further gives an even more striking result: on this data, the execution plans for limit 67 and limit 68 are completely different.

Two execution plans:
LIMIT 67
Id: 1
Select_type: SIMPLE
Table:
Type: range
Possible_keys: A, B, C
Key: B
Key_len: 387
Ref: NULL
Rows: 2555192
Extra: Using where; Using temporary; Using filesort
1 row in set (0.00 sec)

LIMIT 68
Id: 1
Select_type: SIMPLE
Table:
Type: ref
Possible_keys: A, B, C
Key:
Key_len: 3
Ref: const
Rows: 67586
Extra: Using where; Using temporary; Using filesort
1 row in set (0.00 sec)

We can see that the execution plans of the two statements are different: the indexes used are different.

MySQL Tips: In the explain results, the key indicates the final index used, and rows indicates the number of rows to be scanned using this index, which is an estimate.

Index A is defined as (f3, f4, f1, f2, f5), and index B is defined as (f1, f2, f3).
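
For orientation, the two indexes could be declared roughly as below. This is only a sketch: the article hides the real table definition, so every column type here is an assumption. The types for f3 and f1 were picked to be consistent with the key_len values in the plans above (3 bytes for a NOT NULL DATE, 387 bytes for a nullable VARCHAR(128) in utf8). Index C is omitted because its definition is not given.

```sql
-- Hypothetical definition; only the index column lists come from the article.
CREATE TABLE t (
  f1 VARCHAR(128) CHARACTER SET utf8 NULL,  -- assumed type
  f2 INT NOT NULL,                          -- assumed type
  f3 DATE NOT NULL,                         -- assumed type
  f4 INT NOT NULL,                          -- assumed type
  f5 INT NOT NULL,                          -- assumed type
  KEY A (f3, f4, f1, f2, f5),
  KEY B (f1, f2, f3)
) ENGINE=InnoDB;
```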

A first confirmation

Although rows is only an estimate, it guides index selection. Since the limit 68 plan reaches rows: 67586, the same candidate should also have been available to the optimizer when it planned the first statement. So why did it not choose that index?
First, let us confirm this conclusion.

MySQL Tips: MySQL provides the force index syntax to make the optimizer use a specific index.

EXPLAIN SELECT f1, SUM(f2) cnt FROM t FORCE INDEX (A) WHERE f1 IS NOT NULL AND f3 = '2017-05-12' GROUP BY f1 ORDER BY cnt DESC LIMIT 67\G

Id: 1
Select_type: SIMPLE
Table:
Type: ref
Possible_keys:
Key:
Key_len: 3
Ref: const
Rows: 67586
Extra: Using where; Using temporary; Using filesort
1 row in set (0.00 sec)

By the way, because force index is specified, the optimizer does not consider other indexes, so only A is displayed in possible_keys. What matters here is rows: 67586. It shows that for the limit 67 statement, using index A would also have reduced the number of rows scanned.

MySQL Tips: the MySQL optimizer calculates the query cost for each candidate index in possible_keys and selects the query plan with the lowest cost.

At this point, we can reasonably guess that this is a bug in the MySQL implementation: the appropriate index is not selected, so a clearly wrong execution plan is used.

MySQL Tips: the MySQL optimizer relies on table statistics when choosing a plan, and those statistics are estimates. The resulting execution plan may therefore be suboptimal.

Note, however, that the tip above describes an inherent and acceptable limitation, whereas this case is different: the optimizer actually had the data (the rows value) it needed to make the right choice, yet it still chose wrongly.

Cause analysis

The MySQL optimizer estimates the query cost to decide which index to use. That estimate is essentially determined by the estimated number of rows to be scanned.

MySQL Tips: in the current mainstream versions 5.1 and 5.5, MySQL can only use a composite index through its leftmost prefix.

Therefore, index A can only use field f3 (the where clause constrains f3 by equality but places no condition on f4, the next column), and index B can only use field f1. rows is the estimated number of rows to be scanned after the index has been used to locate the upper and lower bounds of the range.

The statement above contains group by and order by, so the execution plan includes Using temporary; Using filesort.
During planning, the query cost of using index A is calculated first.

The query costs of the other entries in possible_keys are then calculated in turn. Because the statement requires sorting, once a tentative result has been obtained the optimizer also checks whether a cheaper ordering method exists (test_if_cheaper_ordering).
As before, this cost is derived from the estimated number of rows to be scanned.

The implementation of this logic contains a bug: when evaluating the selectivity of the index under consideration, it does not take into account that only a prefix of the index can be used.

That is, suppose the table contains 500,000 rows and has index B (f1, f2, f3). The selectivity of the index should be calculated from the prefix that can actually be used. For example, if f1 has 1,000 distinct values, the average number of records per key value is 500. If (f1, f2) has 10,000 distinct values, the average number of records per combined key is 50. If every (f1, f2, f3) combination is distinct, the average number of records per combined key is 1.

MySQL Tips: the fewer records there are per key, the more efficient queries through that index are. This corresponds to a larger Cardinality value in the output of show index from tbl.
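
The per-prefix averages discussed above can be inspected on a real table with a sketch like the following. It assumes the anonymized table t and index B (f1, f2, f3) from this article. The second query computes exact values by scanning the whole table, so it is suitable for investigation only, not for routine use on a large production table.

```sql
-- Cardinality per index column, as estimated from table statistics.
SHOW INDEX FROM t;

-- Exact average number of rows per key for each prefix of index B.
-- COUNT(DISTINCT ...) ignores combinations containing NULL, so the
-- result is slightly inflated if f1 holds many NULLs.
SELECT
  COUNT(*) / COUNT(DISTINCT f1)         AS rows_per_f1,
  COUNT(*) / COUNT(DISTINCT f1, f2)     AS rows_per_f1_f2,
  COUNT(*) / COUNT(DISTINCT f1, f2, f3) AS rows_per_f1_f2_f3
FROM t;
```

If rows_per_f1 is far larger than rows_per_f1_f2_f3, then using the full-key average where only the f1 prefix is usable will understate the cost of index B in the way this article describes.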

In this case, index B can only use its f1 prefix, but the code uses the full key (f1, f2, f3) to calculate the average number of rows per key. This makes the estimated cost of using index B far too low, which leads to the wrong choice.

Back to the problem itself

1. Why is the right plan chosen when the limit value is large?
This is because the cost calculation for index B takes into account the number of rows the query must return, limit_rows. The larger the limit value, the higher the estimated cost of B. Once that cost exceeds the cost of A, the optimizer selects A.

2. This table has 50 million rows. Why does such a small change in the limit value make such a big difference?
This is related to the statement itself. Because it contains group by, each additional row allowed by the limit requires scanning about N more rows, where N is "total number of rows in the table" / "number of distinct f2 values in the table".
In other words, this statement amplifies the bug.

Solution

With the analysis clear, the solution is relatively simple: modify the code logic so that test_if_cheaper_ordering calculates with the selectivity of the usable prefix, which here is field f1 alone.
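
The fix described above is made in the server code. Where patching the server is not an option, the force index hint used earlier in this article can be applied to the original statement as a query-level workaround. This is only a sketch under that assumption, not something the article reports having deployed.

```sql
-- Workaround sketch: pin the plan to index A so that the LIMIT value
-- cannot cause the optimizer to switch to index B.
SELECT f1, SUM(f2) AS cnt
FROM t FORCE INDEX (A)
WHERE f1 IS NOT NULL AND f3 = '2017-05-12'
GROUP BY f1
ORDER BY cnt DESC
LIMIT 10;
```

A hint like this hard-codes the index choice, so it is worth revisiting if the data distribution or the index definitions change.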
