MySQL sorting principle and MySQL5.6 case analysis "turn"

Source: Internet
Author: User
Tags index sort

This article is from:http://www.cnblogs.com/cchust/p/5304594.html, which for oneself feel is the key to add the mark, easy to understand.

Objective
sorting is a basic function in a database, and MySQL is no exception. The order BY statement allows the user to sort the specified result set, in fact, not only the order BY statement, the Group by statement, the distinct statement implicitly uses the sort. In this paper, we will briefly explain how SQL uses the index to avoid sorting costs, then introduce the internal principles of MySQL implementation sequencing, and introduce the parameters related to sorting, and finally give a few "strange" sort of example, to talk about the consistency of the problem, and explain the nature of the cause of the phenomenon.

1. Sorting optimization and Index usage
in order to optimize the sorting performance of SQL statements, it is best to avoid sorting, and the proper use of indexes is a good way. Because the index itself is also ordered, if the appropriate index is established on the fields that need to be sorted, you can skip the sorting process and improve the SQL query speed. Below I have some typical SQL to illustrate which SQL can take advantage of indexes to reduce sorting and which SQL cannot. Suppose T1 table exists index Key1 (key_part1,key_part2), Key2 (Key2)

A. SQL that can use indexes to avoid sorting

1234 SELECTFROMt1 ORDERBYkey_part1,key_part2;SELECTFROMt1 WHEREkey_part1 = constant ORDERBYkey_part2;SELECTFROMt1 WHEREkey_part1 > constant ORDER BYkey_part1 ASC;SELECTFROMt1 WHEREkey_part1 = constant1 ANDkey_part2 > constant2 ORDERBYkey_part2;

B. Inability to use indexes to avoid sorting SQL

1234567891011 //排序字段在多个索引中,无法使用索引排序SELECTFROMt1 ORDERBYkey_part1,key_part2, key2;//排序键顺序与索引中列顺序不一致,无法使用索引排序SELECTFROMt1 ORDERBYkey_part2, key_part1;//升降序不一致,无法使用索引排序SELECTFROMt1 ORDERBYkey_part1 DESC, key_part2 ASC; //key_part1是范围查询,key_part2无法使用索引排序SELECTFROMt1 WHEREkey_part1> constant ORDERBYkey_part2;

2. Algorithm for sorting implementation
for SQL that cannot use the index to avoid sorting, the database has to implement its own sort function to meet the user's needs, when the SQL execution plan will appear "Using Filesort", it is important to note that Filesort does not mean that is the file sort, It is also possible that the memory is sorted, which is determined primarily by the Sort_buffer_size parameter and the result set size . MySQL internal implementation of the main 3 ways, general sorting, optimization and priority queue sorting , mainly related to 3 sorting algorithms: Fast sorting, merge sorting and heap sorting .

Suppose the table structure and SQL statements are as follows:

CREATE TABLE t1 (id int, col1 varchar, col2 varchar), col3 varchar (+), PRIMARY key (ID), key (Col1,col2)); SELECT col1,col2,col3 from T1 WHERE col1>100 ORDER by col2;

A. General sorting, two-way sequencing
(1). Get the record that satisfies the Where condition from table T1
(2). For each record, remove the record's primary key + sort key (id,col2) into the sort buffer
(3). If the sort buffer can hold all (id,col2) pairs that meet the criteria, it is sorted, otherwise the sort buffer is full, sorted and written to a temporary file . (The sorting algorithm uses a fast sorting algorithm)
(4). If a temporary file is generated in the sort order, the merge sort algorithm is used to ensure that the records in the temporary file are orderly .
(5). Loop through the above process until all the records that meet the criteria are all involved in sorting
(6). Scan the ordered (id,col2) team, which is the sort buffer, and use the primary key ID to fetch the other columns that the select needs to return (COL1,COL2,COL3)
(7). Returns the obtained result set to the user.
from the above process , whether to use file sorting is mainly to see if the sort buffer can accommodate the result set (Id,col2) that needs to be sorted, and the size of this buffer is controlled by the sort_buffer_size parameter . It also requires two IOfor one order, one for the sort field (id,col2) to sort buffer, and the second to fetch the other required return columns (COL1,COL2,COL3) by the primary key ID fetched above , because The returned result set is sorted by col2, so the ID is unordered , and a large amount of random Io is generated by a disorderly ID fetch (COL1,COL2,COL3). For the second IO fetch MySQL itself will be optimized, that is, before fetching the primary key ID sorted, and put into the buffer , this buffer size is controlled by the parameter read_rnd_buffer_size , and then ordered to fetch Records, Convert random io to sequential io.
B. Optimized sorting, single-way sequencing, Max_length_for_sort_data
The general Sort method requires an additional two Io, in addition to the sort itself. The optimal sorting method reduces the second Io, relative to the general sort. The main difference is that all the fields appearing in SQL are taken out in the sort buffer instead of just the fields needed for sorting (Id,col2). Because the sort buffer contains all the fields required by the query , it can be returned directly after sorting, eliminating the need to fetch data two times . The cost of this approach is that the same size of the sort buffer, the number of (COL1,COL2,COL3) can be stored less than (id,col2), if the sort buffer is not large enough, may result in the need to write temporary files, resulting in additional IO . Of course, MySQL provides parameter max_length_for_sort_data, which can only be used when all the fields appearing in the sort SQL are less than max_length_for_sort_data. Otherwise, you can only use regular sorting.
C. Priority queue sequencing
in order to get the final sort result, we all need to sort all the records that meet the criteria before returning. Is there still room for optimization relative to the optimization of the sorting method? version 5.6 is optimized for the order by limit M,n statement at the spatial level, adding a new sort-priority queue, which is implemented using heap sequencing . The heap sorting algorithm feature can solve the problem of the sort of limit m,n , although still need all the fields to participate in the sort, but only need to m+n the sort buffer space of tuples , for m,n very small scene, basic There is no problem with the sort buffer not being enough to merge and sort temporary files . For ascending, the large top heap, the elements of the final heap constitute the smallest n elements, for descending, the small top heap, the final heap elements constitute the largest n elements.

3. Sorting inconsistency issues

Case 1: ORDER BY No_index limit n inconsistency in MySQL5.5 and 5.6

after MySQL migrated from 5.5 to 5.6, it was found that there were duplicate values in the pagination (the sort field is not indexed, or is directly a full table scan), and MARIADB is already an optimized scheme, and 5.6 is consistent.

Problem Source: https://bbs.aliyun.com/read/248026.html, Solution: http://mysql.taobao.org/monthly/2015/06/04/

Test tables and data:

CREATE TABLE t1 (ID int primary KEY, C1 int, c2 varchar), insert into T1 values ("a"), insert into T1 values (2,2, ' B ') INSERT into T1 values (3,2, ' C '); Insert to T1 values (4,2, ' d '); INSERT into T1 values (5,3, ' e '); INSERT into T1 values (6,4, ' F INSERT into T1 values (7,5, ' G ');

Assuming 3 records per page, the first page of limit 0,3 and the second page of the limit 3,3 query results are as follows:

We can see that this record with ID 4 actually appears in two queries at the same time, which is obviously not as expected, and does not have this problem in version 5.5.

The purpose of sorting with priority Queues is to sort and use limit nIf you cannot use index ordering, and you only need to keep n records in the process of sorting, This does not solve the cost of sorting all records, but only a small amount of memory in sort buffer is required to complete the sort, as explained above.

The second page of data duplication is MySQL5.6 because priority queue sorting is used, and heap ordering is a sort of unstable sort method, which That is, the same value (the value in the example 2) may be sorted out of the data and read out the order of the data is inconsistent, can not guarantee the consistency of the data position before and after the sorting, so the phenomenon of page duplication .

To avoid this problem, there are several ways:

①: Index Sort field

By using the order of indexes, the index can be read and paged directly according to the order of the index, thus avoiding the problem encountered.

②: Use a multicolumn index to sort the same as a single column, using its primary key:

SELECT * from T1 ORDER by c1,id ASC Limit 0,3;select * from T1 ORDER by c1,id ASC limit 3, 3;

Case 2: single and two-way sorting returns different results

Two similar query statements, except that the return columns are different, the others are the same, but the results of the sorting are inconsistent .

Test tables and data:

CREATE TABLE t2 (ID int primary key, status int, C1 varchar (255), C2 varchar (255), C3 varchar (255), key (C1)); INSERT INTO T2 VA Lues (7,1, ' a ', repeat (' a ', 255), repeat (' a ', 255)); INSERT into T2 values (6,2, ' B ', repeat (' a ', 255), repeat (' a ', 255)); INSERT into T2 values (5,2, ' C ', repeat (' a ', 255), repeat (' a ', 255)), insert into T2 values (4,2, ' a ', repeat (' a ', 255), repeat ( ' A ', 255); INSERT into T2 values (3,3, ' B ', repeat (' a ', 255), repeat (' a ', 255)), insert into T2 values (2,4, ' C ', repeat (' a ', 255), repeat (' a ', 255)); INSERT into T2 values (1,5, ' a ', repeat (' a ', 255), repeat (' a ', 255));

Execute the SQL statement separately:

Select ID,STATUS,C1,C2 from T2 Force Index (c1) where c1>= ' B ' order by Status;select Id,status from T2 Force Index (C1) W Here c1>= ' B ' order by status;

The results of the implementation are as follows:

See if the execution plan is the same for both.

to illustrate the problem, because the test data is not much, make sure you can go to the C1 column index and add the hint of force index. The C1 statement takes an ID from the column index and then goes to the table to fetch the returned columns. Based on the size of the C1 column value, the relative position of the record in the C1 index is as follows:

(C1,id) <===> (b,6), (b,3), (c,5), (c,2),

The corresponding status values are 2,3,2,4 respectively. Take the data from the table and sort by status, then the relative position becomes (6,2,b), (5,2,c), (3,3,b), (2,4,C), this is the result returned by the second statement query, then why is the first query statement (6,2,B), (5,2,C) The order of the swap?

Here is a description:
1. The sum of the field type sizes taken out by the Query statement is less than Max_length_for_sort_data.
2. The sorted fields do not contain text and blob types.

The previously mentioned optimization sort can be understood: since the first query returns a column that has more bytes than Max_length_for_sort_data, the ordering takes a regular sort , in which case the second IO ,MySQL itself optimization will sort the ID, the random io into sequential io, so the return is first 5, followed by 6, and the second query is the optimization of the sorting, there is no second fetch data process, maintain a sorted record relative position, directly in the sort Out of the buffer. For the first statement, if you want to use an optimized sort, we'll make the max_length_for_sort_data setting bigger, like 2048.

4. Reference documentation

Http://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html

http://mysql.taobao.org/monthly/2015/06/04/

Http://ifxoxo.com/mysql_order_by.html

MySQL sorting principle and MySQL5.6 case analysis "turn"

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.