"MySQL" Sorting principle and Case analysis

Source: Internet
Author: User

Objective

Sorting is a basic function in a database, and MySQL is no exception. The order BY statement allows the user to sort the specified result set, in fact, not only the order BY statement, the Group by statement, the distinct statement implicitly uses the sort. In this paper, we will briefly explain how SQL uses the index to avoid sorting costs, then introduce the internal principles of MySQL implementation sequencing, and introduce the parameters related to sorting, and finally give a few "strange" sort of example, to talk about the consistency of the problem, and explain the nature of the cause of the phenomenon.

Sorting optimization and Index usage

In order to optimize the sorting performance of SQL statements, it is best to avoid sorting, and the proper use of indexes is a good way. Because the index itself is also ordered, if the appropriate index is established on the fields that need to be sorted, you can skip the sorting process and improve the SQL query speed. Below I have some typical SQL to illustrate which SQL can take advantage of indexes to reduce sorting and which SQL cannot. Suppose T1 table exists index Key1 (keypart1,keypart2), Key2 (Key2)

A. SQL that can use indexes to avoid sorting

SELECT * FROM t1 ORDER BY key_part1,key_part2;SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part2;SELECT * FROM t1 WHERE key_part1 > constant ORDER BY key_part1 ASC;SELECT * FROM t1 WHERE key_part1 = constant1 AND key_part2 > constant2 ORDER BY key_part2;

B. Inability to use indexes to avoid sorting SQL

//排序字段在多个索引中,无法使用索引排序SELECT * FROM t1 ORDER BY key_part1,key_part2, key2; //排序键顺序与索引中列顺序不一致,无法使用索引排序SELECT * FROM t1 ORDER BY key_part2, key_part1; //升降序不一致,无法使用索引排序SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC; //key_part1是范围查询,key_part2无法使用索引排序SELECT * FROM t1 WHERE key_part1> constant ORDER BY key_part2;
Algorithm of sorting implementation

For SQL that does not use the index to avoid sorting, the database has to implement its own sort function to meet user needs, when the SQL execution plan will appear "Using Filesort", it is important to note that Filesort does not mean that is the file sort, in fact, there may be memory sort, This is determined primarily by the sortbuffersize parameter and the result set sizes. MySQL internal implementation of the main 3 ways, general sorting, optimization and priority queue sorting, mainly related to 3 sorting algorithms: Fast sorting, merge sorting and heap sorting. Suppose the table structure and SQL statements are as follows:

CREATE TABLE t1(id int, col1 varchar(64), col2 varchar(64), col3 varchar(64), PRIMARY KEY(id),key(col1,col2));SELECT col1,col2,col3 FROM t1 WHERE col1>100 ORDER BY col2;

A. General sorting

    • Get a record that satisfies a where condition from table T1
    • For each record, remove the record's primary key + sort key (id,col2) into the sort buffer
    • Sort buffer is sorted if it can hold all (id,col2) pairs that meet the criteria, otherwise the sort buffer is filled, sorted and cured into a temporary file. (The sorting algorithm uses a fast sorting algorithm)
    • If a temporary file is generated in the order, the merge sort algorithm is used to ensure that the records in the temporary file are orderly.
    • Loop through the above process until all the records that meet the criteria are all involved in sorting
    • Scan the ordered (id,col2) pair, and use the ID to go to the column (COL1,COL2,COL3) to be returned by the Select
    • Returns the obtained result set to the user.

From the above process, whether to use file sorting is mainly to see if the sort buffer can tolerate the order (ID,COL2) pair, the size of this buffer is controlled by the sortbuffersize parameter. In addition, a sort requires two Io, one for the Id,col2, and the second for the COL1,COL2,COL3, because the returned result set is sorted by col2, so the ID is unordered, and the ID of the order is scrambled (COL1,COL2,COL3) Generates a lot of random IO. for the second time MySQL itself an optimization, that is, before fishing the ID sorted first, and put into the buffer, this buffer size by the parameters of the readRndbuffer_size Control, and then orderly to remove records, random io into sequential io.

B. Optimizing sorting

The general sort method requires an additional two Io, in addition to the sort itself. the optimized sorting method reduces the second Io, relative to the general sort. The main difference is that putting the sort buffer is not (id,col2), but (COL1,COL2,COL3). Because the sort buffer contains all the fields required by the query, it can be returned directly after the sort is completed, eliminating the need for two fishing data. the cost of this approach is that the same size of the sort buffer, the number of (COL1,COL2,COL3) can be stored less than (id,col2), if the sort buffer is not large enough, may result in the need to write temporary files, resulting in additional IO. Of course MySQL provides the parameter maxlengthforsortdata, only if the sorting tuple is less than maxlengthforsortdata, Can be used to optimize the sorting method, otherwise you can only use the general sort.

C. Priority queue sequencing

In order to get the final sort result, we all need to sort all the records that meet the criteria to return. Is there still room for optimization relative to the optimization of the sorting method? Version 5.6 is optimized for the order by limit M,n statement at the spatial level, adding a new sort-priority queue, which is implemented using heap sequencing. The heap sorting algorithm feature can solve the problem of the sort of limit m,n, although still need all elements to participate in the sort, but only need to m+n a tuple of the sort buffer space, for m,n very small scene, basically not because of the sort Buffer is not sufficient to cause the temporary file to be merged and sorted. for ascending, the large top heap, the elements of the final heap constitute the smallest n elements, for descending, the small top heap, the final heap elements constitute the largest n elements.

Sort inconsistency issues

A. Case 1

After MySQL migrated from 5.5 to 5.6, it found that there were duplicate values for the paging.

Test tables and data:

create table t1(id int primary key, c1 int, c2 varchar(128));insert into t1 values(1,1,‘a‘);insert into t1 values(2,2,‘b‘);insert into t1 values(3,2,‘c‘);insert into t1 values(4,2,‘d‘);insert into t1 values(5,3,‘e‘);insert into t1 values(6,4,‘f‘);insert into t1 values(7,5,‘g‘);

Assuming 3 records per page, the first page of limit 0,3 and the second page of the limit 3,3 query results are as follows:

[email protected] 07:58:32>select * from t1 order by c1 asc limit 0,3;+----+------+------+| id | c1   | c2   |+----+------+------+|  1 |    1 | a    ||  3 |    2 | c    ||  4 |    2 | d    |+----+------+------+[email protected] 07:59:11>select * from t1 order by c1 asc limit 3,3;+----+------+------+| id | c1   | c2   |+----+------+------+|  4 |    2 | d    ||  5 |    3 | e    ||  6 |    4 | f    |+----+------+------+

We can see that this record with ID 4 actually appears in two queries at the same time, which is obviously not as expected, and does not have this problem in version 5.5. The reason for this is that 5.6 of the statements for limit m,n take precedence queues, while priority queues are implemented in heaps, such as the above example, the order by C1 ASC Limit 0,3 requires a large top heap of size 3, and a large top heap of size 6 is required for the limit 3,3. Because there are 3 records for the C1 2, and the heap ordering is non-stable (for the same key value, there is no guarantee that the sorting is consistent with the position before the sort), so the page repetition occurs. to avoid this problem, we can add unique values to the sort, such as the primary key ID, so that because the IDs are unique, the key values that participate in the sort are not the same. write the SQL as follows:

select * from t1 order by c1,id asc limit 0,3;select * from t1 order by c1,id asc limit 3,3;

B. Case 2

Two similar query statements, except that the return columns are different, the others are the same, but the results of the sorting are inconsistent. Test tables and data

create table t2(id int primary key, status int, c1 varchar(255),c2 varchar(255),c3 varchar(255),key(c1));insert into t2 values(7,1,‘a‘,repeat(‘a‘,255),repeat(‘a‘,255));insert into t2 values(6,2,‘b‘,repeat(‘a‘,255),repeat(‘a‘,255));insert into t2 values(5,2,‘c‘,repeat(‘a‘,255),repeat(‘a‘,255));insert into t2 values(4,2,‘a‘,repeat(‘a‘,255),repeat(‘a‘,255));insert into t2 values(3,3,‘b‘,repeat(‘a‘,255),repeat(‘a‘,255));insert into t2 values(2,4,‘c‘,repeat(‘a‘,255),repeat(‘a‘,255));insert into t2 values(1,5,‘a‘,repeat(‘a‘,255),repeat(‘a‘,255));

Execute the SQL statement separately:

select id,status,c1,c2 from t2 force index(c1) where c1>=‘b‘ order by status;select id,status from t2 force index(c1) where c1>=‘b‘ order by status;

Execution results

[email protected] 08:01:24>select id,status,c1,c2 from T2 Force Index (c1) where c1>= ' B ' ORDER by status;+----+ --------+------+----------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- ---------------------------------------------------+| ID | Status | C1 |                                                                                                                                                                                                                                                  C2 |+----+--------+------+--------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------  -----------------------------------------------------------+|      5 | 2 | C | AaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  ||      6 | 2 | B | Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  aaaaaaaaaaaaa | |      3 | 3 | B | Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  aaaaaaaaaaaaa | |      2 | 4 | C | Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa AaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |+----+--------+------+--------------- --------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- --------------------+4 rows in Set (0.00 sec) [email protected] 08:01:31>select id,status from T2 Force Index (C1) W Here c1>= ' B ' ORDER by status;+----+--------+| ID |  Status |+----+--------+|      6 |  2 | |      5 |  2 | |      3 |  3 | |      2 | 4 |+----+--------+4 rows in Set (0.00 sec)

See if the execution plan is the same for both

[email protected] 08:08:10>explain select ID,STATUS,C1,C2 from T2 Force Index (c1) where c1>= ' B ' ORDER by status     ; +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------- -----------+| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |+----+-------------+-------+-------+---------------+------+---------+------+------+  ---------------------------------------+| 1 | Simple | T2 | Range | C1 | C1 | 768 |    NULL | 4 | Using index condition; Using filesort |+----+-------------+-------+-------+---------------+------+---------+------+------+------------- --------------------------+[email protected] 08:08:17>explain select id,status from T2 Force Index (C1) where C1                            >= ' B ' order by status; +----+-------------+-------+-------+---------------+------+---------+------+------+---------------------------- -----------+| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |+----+-------------+-------+-------+---------------+------+---------+------+------+  ---------------------------------------+| 1 | Simple | T2 | Range | C1 | C1 | 768 |    NULL | 4 | Using index condition; Using filesort |+----+-------------+-------+-------+---------------+------+---------+------+------+------------- --------------------------+

To illustrate the problem, I added the hint of force index to the statement to make sure that I was able to walk up the C1 column index. The statement uses the C1 column index to skim the ID, and then goes to the table to skim the returned column. Based on the size of the C1 column value, the relative position of the record in the C1 index is as follows:

(c1,id) = = = (b,6), (b,3), (5,c), (c,2), corresponding status values are 2 3 2 4, respectively. To skim the data from the table and sort by status, the relative position becomes (6,2,b), (5,2,c), (3,3,c), (2,4,c), and this is the result returned by the second statement query, so why is the first query statement (6,2,B), (5,2,C) The order of exchange? Here's what I mentioned before. A. General ordering and B. Optimization sort the red part of the bid, you can understand why. since the number of bytes returned by the first query exceeds maxlengthforsortdata, the order is sorted by a regular sort, in which case MySQL will sort rowid, random io into sequential io, So the return is 5 in front, 6 in the rear, while the second query uses an optimized sort, without a second skimming of the data, maintaining the relative position of the sorted records. for the first statement, if you want to use an optimized sort, we'll make the maxlengthforsortdata setting larger, like 2048.

[email protected] 08:11:24>select id,status,c1,c2 from T2 Force Index (c1) where c1>= ' B ' ORDER by status;+----+ --------+------+----------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- ---------------------------------------------------+| ID | Status | C1 |                                                                                                                                                                                                                                                  C2 |+----+--------+------+--------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------  -----------------------------------------------------------+|      6 | 2 | B | AaaaaaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  ||      5 | 2 | C | Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  aaaaaaaaaaaaa | |      3 | 3 | B | Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  aaaaaaaaaaaaa | |      2 | 4 | C | Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa AaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |+----+--------+------+--------------- --------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------- --------------------+
Reference
    1. MySQL sorting principle and case analysis
    2. ORDER by optimization

"MySQL" Sorting principle and Case analysis

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.