Objective
Sorting is a basic function of the database, and MySQL is no exception. The user is able to sort the specified result set by the order BY statement, not just the order by statement, the Group by statement, and the distinct statement implicitly uses the sort. This article will start with a brief description of how SQL uses the index to avoid sorting costs, then introduce the internal principles of MySQL implementation sequencing, and introduce the parameters related to sorting, and finally give a few "strange" sort of example to talk about the problem of ordering consistency, and explain the nature of the cause of the phenomenon.
1. Sorting optimization and indexing use
In order to optimize the sorting performance of SQL statements, it is best to avoid sorting, and it is a good way to use indexes reasonably. Because the index itself is orderly, if the appropriate index is established on the fields that need to be sorted, you can skip the ordering process and increase the SQL query speed. Below I use some typical SQL to explain which SQL can take advantage of the index to reduce the ordering, which SQL is not. Suppose T1 table exists index Key1 (key_part1,key_part2), Key2 (Key2)
A. sql that can be used to avoid sorting by index
SELECT * from T1 ORDER by Key_part1,key_part2;
SELECT * from t1 WHERE Key_part1 = constant ORDER by Key_part2;
SELECT * from t1 WHERE Key_part1 > Constant order by Key_part1 ASC;
SELECT * from t1 WHERE key_part1 = constant1 and Key_part2 > Constant2 order by Key_part2;
B. SQL that cannot be used to avoid sorting by index
Sort fields in multiple indexes, cannot be sorted using the index
SELECT * from T1-Key_part1,key_part2, Key2;
The sort key order is inconsistent with the column order in the index and cannot be sorted using the index
SELECT * from T1 ordered by Key_part2, Key_part1;
The ascending and descending sequence is inconsistent and cannot be sorted using the index sort
SELECT * from T1 ordered by Key_part1 DESC, Key_part2 ASC;
Key_part1 is a range query, KEY_PART2 cannot use the index to sort the
SELECT * from t1 WHERE key_part1> the constant order by Key_part2;
2. Algorithm for sorting implementation
For SQL that cannot take advantage of the index to avoid sorting, the database has to implement its own sorting function to meet the needs of the user, in the SQL execution plan will appear "Using Filesort", where the need to note that Filesort does not mean that is the sort of file, in fact, it may be the memory sort, This is mainly determined by the Sort_buffer_size parameter and the result set size. MySQL internal implementation of the ranking there are 3 main ways, the general ranking, optimization and priority queue ranking, mainly involved in 3 kinds of sorting algorithms: fast sorting, merge sort and heap sorting. Suppose the table structure and SQL statements are as follows:
CREATE TABLE t1 (id int, col1 varchar (), col2 varchar (), col3 varchar (), PRIMARY key (ID), key (Col1,col2));
SELECT col1,col2,col3 from T1 WHERE col1>100 order by col2;
A. General sort
(1). Get a record that satisfies a where condition from the table T1
(2). For each record, remove the primary key + sort key (id,col2) of the record into the sort buffer
(3). If the sort buffer can hold all (id,col2) pairs that satisfy the condition, sort it, or the sort buffer is full, and then sort and solidify into a temporary file. (The sorting algorithm uses a fast sort algorithm)
(4). If a temporary file is generated in order, it is necessary to use the merge sort algorithm to ensure that the records in the temporary files are orderly.
(5). Loop through the process until all the records that meet the criteria participate in the sort
(6). Scan-Ordered (id,col2) pairs and use IDs to get the columns to be returned by select (COL1,COL2,COL3)
(7). Returns the obtained result set to the user.
From the above process, whether the use of file sorting is primarily to see whether the sort buffer can accommodate the sort of (id,col2) pairs, the size of this buffer is controlled by the Sort_buffer_size parameter. In addition, a sort requires two Io, one is fishing (id,col2), the second is fishing (col1,col2,col3), because the result set is returned by col2 order, so ID is disorderly, through the sequence of ID to get (col1,col2,col3) Will generate a lot of random IO. For the second time MySQL itself is an optimization, that is, before the first sorting the ID, and into the buffer, this buffer size by the parameters of the Read_rnd_buffer_size control, and then orderly to catch records, the random io into sequential io.
B. Optimizing sorting
The general sort method requires an additional two Io, in addition to the sort itself. The optimized sort method decreases the second IO compared to the regular sort. The main difference is that putting the sort buffer is not (id,col2) but (col1,col2,col3). Because the sort buffer contains all the fields that the query needs, it can be returned directly after the sorting is complete without two salvage data. The cost of this approach is that the same size of sort buffer can hold less (COL1,COL2,COL3) than (id,col2), and if the sort buffer is not large enough, it may result in the need to write temporary files, resulting in additional IO. Of course, MySQL provides parameter max_length_for_sort_data, only when the sorting tuple is less than max_length_for_sort_data, you can use the optimized sort method, otherwise you can only use the normal sort.
C. Priority queue sorting
In order to get the final sort result, we need to sort all the records that meet the criteria to return. So, compared to the optimization of the sorting method, whether there is room for optimization? The 5.6 version is optimized at the spatial level for the order by limit M,n statement, adding a new sort-priority queue, which is implemented by heap sorting. Heap sorting algorithm features just can solve limit m,n such sort of problem, although still need all elements to participate in sorting, but only need to m+n a tuple of sort buffer space, for m,n very small scene, basically not because sort Insufficient buffer to cause the need for temporary files to merge sort the problem. For ascending, with a large top heap, the elements in the final heap are composed of the smallest n elements, with the small top heap in descending order, and the elements of the final heap constituting the largest n element.
3. Sorting inconsistency problem
Case 1
After MySQL migrated from 5.5 to 5.6, a duplicate value was found for paging.
Test tables and data:
CREATE TABLE t1 (ID int primary KEY, C1 int, C2 varchar (128));
INSERT into T1 values (1,1, ' a ');
INSERT into T1 values (2,2, ' B ');
INSERT into T1 values (3,2, ' C ');
INSERT into T1 values (4,2, ' d ');
INSERT into T1 values (5,3, ' e ');
INSERT into T1 values (6,4, ' f ');
INSERT into T1 values (7,5, ' G ');
Assuming 3 records per page, the first page limit 0,3 and the second page limit 3,3 query results are as follows:
We can see that the record with ID 4 actually appears in two queries at the same time, which is obviously not the way it was expected, and it's not in version 5.5. The reason for this phenomenon is that 5.6 of the statements for limit M,n adopted a priority queue, and the priority queue with a heap implementation, such as the above example order by C1 ASC limit 0,3 need to use a large heap size 3, limit 3,3 need to take the size of 6 of the large heap. Because the C1 is 2 records have 3, and the heap sort is unstable (for the same key value, cannot guarantee the sort and the position before the order), so the phenomenon that causes paging duplication. To avoid this problem, we can add unique values to the sort, such as primary key IDs, so that because IDs are unique, the key values that participate in sorting are not the same. Write the SQL as follows:
SELECT * FROM t1 the c1,id ASC limit 0,3;
SELECT * from T1 ORDER by c1,id ASC limit 3, 3;
Case 2
Two similar query statements, except the return columns, are the same, but the results of the order are inconsistent.
Test tables and data:
CREATE TABLE t2 (ID int primary key, status int, C1 varchar (255), C2 varchar (255), C3 varchar (255), key (C1));
INSERT into T2 values (7,1, ' a ', repeat (' a ', 255), repeat (' a ', 255));
INSERT into T2 values (6,2, ' B ', repeat (' a ', 255), repeat (' a ', 255));
INSERT into T2 values (5,2, ' C ', repeat (' a ', 255), repeat (' a ', 255));
INSERT into T2 values (4,2, ' a ', repeat (' a ', 255), repeat (' a ', 255));
INSERT into T2 values (3,3, ' B ', repeat (' a ', 255), repeat (' a ', 255));
INSERT into T2 values (2,4, ' C ', repeat (' a ', 255), repeat (' a ', 255));
INSERT into T2 values (1,5, ' a ', repeat (' a ', 255), repeat (' a ', 255));
Execute SQL statements separately:
Select ID,STATUS,C1,C2 from T2 Force Index (c1) where c1>= ' B ' order by status;
Select Id,status from T2 Force Index (c1) where c1>= ' B ' order by status;
The results of the implementation are as follows:
See if the execution plan is the same
To illustrate the problem, I added the hint of force index to the statement to make sure that I was able to walk up C1 column index. Statement to grab the ID from the C1 column index and then skim the table for the returned column. Depending on the size of the C1 column value, the relative position of the record in the C1 index is as follows:
(c1,id) = = = = (b,6), (b,3), (5,c), (c,2), corresponding status value is 2 3 2 4 respectively. When you get data from a table and sort by status, the relative position becomes (6,2,b), (5,2,c), (3,3,c), (2,4,C), which is the result of the second statement query, so why is the first query (6,2,B), (5,2,c), the reverse order? Here's what I mentioned earlier about a. General sort and B. Optimizing the order of the winning red part, you can understand why. Because the first query returns a column that has more bytes than Max_length_for_sort_data, which causes the sort to be sorted in a regular order, in which case MySQL sorts the rowid and turns random io into sequential io, so it returns 5 in front, 6 in the back While the second query is optimized, there is no process to skim the data for the second time, and the relative position of the sorted record is maintained. For the first statement, if you want to use the optimization of the order, we will max_length_for_sort_data settings can be adjusted, such as 2048.
Here is my experience on MySQL custom sort (field,instr,locate), I hope to help you
First, here are three functions (order by Field,order by Instr,order by locate)
Original table:
ID user pass
AAA triple AAA
BBB
CCC CCC
ddd
eee Eee
FFF FFF
The following are the results of my execution:
SELECT * from ' User ' Order by field (2,3,5,4,ID) ASC
ID User pass
AAA, AAA
CCC CCC DDD ddd
eee Eee
FFF FFF
BBB BBB
According to the result analysis: Order by field (2,3,5,4,1,6) results are displayed in sequence: 1 3 4 5 6 2
SELECT * from ' User ' Order by field (2,3,5,4,ID) desc
ID user pass
bbb triple triple
AAA
CCC CCC
ddd
eee Eee
FFF FFF
According to the result analysis: Order by field (2,3,5,4,1,6) results are displayed in sequence: 2 1 3 4 5 6
SELECT * from ' user ' ORDER by INSTR (' 2,3,5,4 ', id) ASC
ID User pass
AAA AAA
FFF FFF
BBB BBB
CCC CCC
Eee Eee
ddd ddd
Based on the result analysis: the ORDER by INSTR (2,3,5,4,1,6) results are displayed in the sequence: 1 6 2 3 5 4
SELECT * from ' user ' ORDER by INSTR (' 2,3,5,4 ', id) DESC
ID User pass
ddd ddd
eee Eee
CCC CCC BBB triple AAA AAA
FFF FFF
Based on the result analysis: the ORDER by INSTR (2,3,5,4,1,6) results are displayed in the sequence: 4 5 3 2 1 6
SELECT * from ' user ' ORDER by locate (ID, ' 2,3,5,4 ') ASC
ID User Pass
AAA AAA
FFF FFF
BBB
CCC CCC
Eee Eee
ddd ddd
Based on the result analysis: the order by locate (2,3,5,4,1,6) results are displayed in the sequence: 1 6 2 3 5 4
SELECT * from ' user ' ORDER by locate (ID, ' 2,3,5,4 ') DESC
ID User pass
ddd ddd
eee Eee
CCC CCC BBB triple AAA AAA
FFF FFF
Based on the result analysis: the order by locate (2,3,5,4,1,6) results are displayed in the sequence: 4 5 3 2 1 6
If the ID order in the database I want to look up is first (2,3,5,4) and then in the other ID order, you first have to sort him out in descending order (4 5 3 2), then in SELECT * from ' user ', the INSTR (' 4,5,3,2 ', id) DESC Limit 0,10 or use the SELECT * from ' user ' ORDER by locate (ID, ' 4,5,3,2 ') DESC get the results you want.
ID User pass
BBB BBB
CCC CCC
Eee Eee
ddd AAA AAA
FFF FFF