MySQL sorting principle and case analysis, mysql sorting Case Analysis

Last Update:2016-03-22 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Preface
Sorting is a basic function in the database, and MySQL is no exception. The Order by statement can be used to sort the specified result set. It is not only an Order by statement, but also a Group by statement and a Distinct statement. This article will first briefly introduce how SQL uses indexes to avoid sorting costs, then introduce the Internal principles of MySQL for sorting, and introduce the parameters related to sorting, finally, we will give several "strange" sorting examples to talk about the sorting consistency problem and explain the underlying causes of the phenomenon.

1. Sorting optimization and Indexing
To optimize the sorting performance of SQL statements, it is best to avoid sorting. Using indexes properly is a good method. Because the index itself is also ordered, if an appropriate index is created on the fields to be sorted, you can skip the sorting process and improve the SQL query speed. Below I will use some typical SQL statements to explain which sqls can use indexes to reduce sorting and which sqls cannot. Suppose that the table t1 has an index key1 (key_part1, key_part2), key2 (key2)

A. You can use indexes to avoid sorting SQL statements.

SELECT * FROM t1 ORDER BY key_part1,key_part2;SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part2;SELECT * FROM t1 WHERE key_part1 > constant ORDER BY key_part1 ASC;SELECT * FROM t1 WHERE key_part1 = constant1 AND key_part2 > constant2 ORDER BY key_part2;

B. SQL statements that do not use indexes to avoid sorting

// Sorting fields in Multiple indexes cannot be sorted using the index SELECT * FROM t1 order by key_part1, key_part2, key2; // The sorting key ORDER is inconsistent with the column ORDER in the index, you cannot use index sorting SELECT * FROM t1 order by key_part2, key_part1; // The ascending or descending ORDER is inconsistent. You cannot use index sorting SELECT * FROM t1 order by key_part1 DESC, key_part2 ASC; // key_part1 is a range query. key_part2 cannot use the index to sort SELECT * FROM t1 WHERE key_part1> constant order by key_part2;

2. Sorting Algorithm
For SQL statements that cannot use indexes to avoid sorting, the database has to implement the sorting function on its own to meet user requirements. In this case, "Using filesort" appears in the SQL Execution Plan ", it should be noted that filesort does not mean File Sorting. In fact, it may also be memory sorting, which is mainly determined by the sort_buffer_size parameter and the result set size. There are three main sorting methods in MySQL: Regular sorting, optimized sorting, and priority queue sorting. They mainly involve three sorting algorithms: quick sorting, Merge Sorting, and heap sorting. Assume that the table structure and SQL statement are as follows:

CREATE TABLE t1(id int, col1 varchar(64), col2 varchar(64), col3 varchar(64), PRIMARY KEY(id),key(col1,col2));SELECT col1,col2,col3 FROM t1 WHERE col1>100 ORDER BY col2;

A. Regular sorting
(1). Obtain records that meet the WHERE condition from Table t1
(2) For each record, extract the record's primary key + sort key (id, col2) and put it into the sort buffer.
(3). If the sort buffer can store all the (id, col2) pairs that meet the conditions, sort them; otherwise, after the sort buffer is full, sort it and solidify it into a temporary file. (The sorting algorithm uses the quick sorting algorithm)
(4). If temporary files are generated in sorting, use the Merge Sorting Algorithm to ensure that the records in the temporary files are orderly.
(5). Execute the above process cyclically until all records meeting the conditions are sorted.
(6) scan the (id, col2) pairs in the sorted order, and use the id to find the columns (col1, col2, col3) to be returned by the SELECT statement)
(7). Return the obtained result set to the user.
From the above process, whether to sort files depends on whether the sort buffer can accommodate the (id, col2) pairs to be sorted. The buffer size is controlled by the sort_buffer_size parameter. In addition, two IO operations are required for one sorting, one for retrieval (id, col2) and the other for retrieval (col1, col2, col3). Since the returned result set is sorted by col2, therefore, IDs are out of order. A large number of random IO is generated when the IDS (col1, col2, col3) are obtained through unordered IDs. For the second MySQL optimization, the id is first sorted and placed into the buffer before fishing. The cache size is controlled by the read_rnd_buffer_size parameter, and then the records are retrieved in sequence, converts random I/O to sequential I/O.
B. Optimize sorting
In addition to sorting, the conventional sorting method requires two additional IO operations. Compared with regular sorting, the optimized sorting method reduces the second IO. The main difference is that the buffer in sort is not (id, col2), but (col1, col2, col3 ). Because the sort buffer contains all the fields required for the query, it can be returned directly after sorting, without secondary data fishing. The cost of this method is that the number of (col1, col2, col3) values that can be stored in sort buffer of the same size must be smaller than (id, col2). If sort buffer is not large enough, this may cause the need to write temporary files, resulting in additional IO. Of course, MySQL provides the max_length_for_sort_data parameter. Only when the sorting tuples are smaller than max_length_for_sort_data can we optimize the sorting method. Otherwise, we can only use the regular sorting method.
C. Priority queue sorting
In order to get the final sorting result, we need to sort all records that meet the conditions to return the results. Is there room for optimization relative to the optimized sorting method? Version 5.6 optimizes the Order by limit M and N statements at the spatial level and adds a new sorting method-priority queue, which is implemented by heap sorting. Heap Sorting Algorithm features can solve the limit M, N sorting problem. Although all elements still need to be involved in sorting, only the sort buffer space of M + N tuples is required, for scenarios where M and N are small, the sort buffer is not enough, which causes the need to merge and sort temporary files. For ascending order, a large top heap is used, and the final elements in the heap form the smallest N elements. For descending order, a small top heap is used, and the final elements in the heap form the largest N elements.
3. Inconsistent sorting

Case 1
After Mysql is migrated from 5.5 to 5.6, duplicate values are found on the page.
Test Table and data:

create table t1(id int primary key, c1 int, c2 varchar(128));insert into t1 values(1,1,'a');insert into t1 values(2,2,'b');insert into t1 values(3,2,'c');insert into t1 values(4,2,'d');insert into t1 values(5,3,'e');insert into t1 values(6,4,'f');insert into t1 values(7,5,'g');

Suppose there are three records on each page. The results of limit on the first page and limit on the second page are as follows:

We can see that the record with id 4 appears in two queries at the same time, which obviously does not meet the expectation and does not have this problem in version 5.5. The reason for this phenomenon is that 5.6 uses the priority queue for limit M, N statements, and the priority queue uses the heap implementation, for example, in order by c1 asc limit, a large top heap with a size of 3 is required; in limit, a large top heap with a size of 6 is required. Because there are three records whose c1 is 2, the heap sorting is unstable (for the same key value, it cannot be ensured that the position after sorting is consistent with that before sorting ), this leads to repeated pages. To avoid this problem, we can add a unique value in the sorting, such as the primary key id. Because the id is unique, make sure that the key values involved in the sorting are different. Write the SQL statement as follows:

select * from t1 order by c1,id asc limit 0,3;select * from t1 order by c1,id asc limit 3,3;

Case 2

Two similar query statements, except for different returned columns, are the same, but the sorting results are inconsistent.
Test Table and data:

create table t2(id int primary key, status int, c1 varchar(255),c2 varchar(255),c3 varchar(255),key(c1));insert into t2 values(7,1,'a',repeat('a',255),repeat('a',255));insert into t2 values(6,2,'b',repeat('a',255),repeat('a',255));insert into t2 values(5,2,'c',repeat('a',255),repeat('a',255));insert into t2 values(4,2,'a',repeat('a',255),repeat('a',255));insert into t2 values(3,3,'b',repeat('a',255),repeat('a',255));insert into t2 values(2,4,'c',repeat('a',255),repeat('a',255));insert into t2 values(1,5,'a',repeat('a',255),repeat('a',255));

Execute SQL statements separately:

select id,status,c1,c2 from t2 force index(c1) where c1>='b' order by status;select id,status from t2 force index(c1) where c1>='b' order by status;

The execution result is as follows:

Check whether the execution plans are the same.

To illustrate the problem, I added the force index hint in the statement to ensure that the column c1 index can be taken. The statement uses the c1 column index to retrieve the id, and then removes the returned column from the table. Based on the size of the c1 column value, the relative position of the record in the c1 index is as follows:

(C1, id) = (B, 6), (B, 3), (5, c), (c, 2 ), the corresponding status values are 2, 3, 2, 4, respectively. Fetch data from the table and sort the data by status. The relative position is (6, 2, B), (5, 2, c), (3, 3, c), (2, 4, c ), this is the result returned by the second query. Why are the first query statements (6, 2, B) and (5, 2, c) in a different order? Here, we can understand the reason by looking at the section a. Regular sorting and B. Optimizing the ranking marked red. Because the number of bytes returned by the first query exceeds max_length_for_sort_data, the sorting adopts regular sorting. In this case, MYSQL sorts rowid and converts random IO to sequential IO, therefore, the return value is 5 before and 6 after, while the second query uses optimized sorting, without the second data retrieval process, keeping the relative position of the record after sorting. If you want to optimize the sorting of the first statement, set max_length_for_sort_data to a greater value, for example, 2048.

4. References
Http://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html
Http://mysql.taobao.org/monthly/2015/06/04/
Http://ifxoxo.com/mysql_order_by.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

MySQL sorting principle and case analysis, mysql sorting Case Analysis

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

MySQL sorting principle and case analysis, mysql sorting Case Analysis

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support