MySQL Order By Principle and Optimization
1 Introduction
For business (MySQL) DBAs or business developers, order by is a common business function that sorts the results based on specified fields to meet front-end display requirements. However, sorting operations also often show slow query rankings. This article will gradually learn about order by from several aspects, such as principle and actual case optimization, and restrictions on order.
Principle 2
Before learning about the order by sorting principle, I strongly wrote two articles about the sort algorithm 《Implementation of Merge Sorting《Classic Sorting Algorithm. MySQL supports two sorting algorithms: Regular sorting and optimization. in MySQL 5.6, order by limit M and N are specially optimized. This is listed as the third sorting algorithm.
2.1 regular sorting
A. Obtain records that meet the WHERE condition from Table t1.
B. For each record, extract the record's primary key + sort key (id, col2) and put it into the sort buffer.
C. If the sort buffer can store all (id, col2) pairs that meet the conditions, sort them. Otherwise, after the sort buffer is full, sort them and solidify them into temporary files. (The sorting algorithm uses the quick sorting algorithm)
D. If temporary files are generated in sorting, use the Merge Sorting Algorithm to ensure that the records in the temporary files are ordered.
E. Execute the above process cyclically until all matching records are sorted.
F scans the sorted (id, col2) pairs and uses the id to find the columns (col1, col2, col3) to be returned by the SELECT statement)
G returns the obtained result set to the user.
From the above process, whether to sort files depends on whether the sort buffer can accommodate the (id, col2) pairs to be sorted. The buffer size is controlled by the sort_buffer_size parameter. In addition, two IO operations are required for one sorting, one for retrieval (id, col2) and the other for retrieval (col1, col2, col3). Since the returned result set is sorted by col2, therefore, IDs are out of order. A large number of random IO is generated when the IDS (col1, col2, col3) are obtained through unordered IDs. For the second MySQL optimization, the id is first sorted and placed into the buffer before fishing. The cache size is controlled by the read_rnd_buffer_size parameter, and then the records are retrieved in sequence, converts random I/O to sequential I/O.
2.2 optimize sorting
In addition to sorting, the conventional sorting method requires two additional IO operations. Compared with regular sorting, the optimized sorting method reduces the second IO. The main difference is that the buffer in sort is not (id, col2), but (col1, col2, col3 ). Because the sort buffer contains all the fields required for the query, it can be returned directly after sorting, without secondary data fishing. The cost of this method is that the number of (col1, col2, col3) values that can be stored in sort buffer of the same size must be smaller than (id, col2). If sort buffer is not large enough, this may cause the need to write temporary files, resulting in additional IO. Of course, MySQL provides the max_length_for_sort_data parameter. Only when the sorting tuples are smaller than max_length_for_sort_data can we optimize the sorting method. Otherwise, we can only use the regular sorting method.
2.3 priority queue sorting
In order to get the final sorting result, we need to sort all records that meet the conditions to return the results. Is there room for optimization relative to the optimized sorting method? Version 5.6 optimizes the Order by limit M and N statements at the spatial level and adds a new sorting method: priority queue, which is implemented by heap sorting. Heap Sorting Algorithm features can solve the limit M, N sorting problem. Although all elements still need to be involved in sorting, only the sort buffer space of M + N tuples is required, for scenarios where M and N are small, the sort buffer is not enough, which causes the need to merge and sort temporary files. For ascending order, a large top heap is used, and the final elements in the heap form the smallest N elements. For descending order, a small top heap is used, and the final elements in the heap form the largest N elements.
3. Optimization
Through the above principle analysis, we know that the essence of sorting is to convert the result set into an ordered result set through certain algorithms (cpu-consuming operations, memory, and temporary file IO. How to optimize it? The answer is to use the order of indexes in two ways (MySQL's B + tree index is sorted Ascending by default from small to large) to reduce sorting. The best way is to directly not sort indexes.
- Create table t1 (
- Id int not null primary key,
- Key_part1 int (10) not null,
- Key_part2 varchar (10) not null default '',
- Key_part3
- Key idx_kp1_kp2 (key_part1, key_part2, key_part4 ),
- Key idx_kp3 (id)
- ) Engine = innodb default charset = utf8
The following types of queries can use the index idx_kp1_kp2
- SELECT * FROM t1 order by key_part1, key_part2 ,...;
- SELECT * FROM t1 WHERE key_part1 = constant order by key_part2;
- SELECT * FROM t1 order by key_part1 DESC, key_part2 DESC;
- SELECT * FROM t1 WHERE key_part1 = 1 order by key_part1 DESC, key_part2 DESC;
- SELECT * FROM t1 WHERE key_part1> constant order by key_part1 ASC;
- SELECT * FROM t1 WHERE key_part1 <constant order by key_part1 DESC;
- SELECT * FROM t1 WHERE key_part1 = constant1 AND key_part2> constant2 order by key_part2
TIPS: you should look at the official examples in a dialectical manner and do your best.
I think this is the focus of this Article. for developers, it is easier to remember that index sorting cannot be used.
- 1 The most common condition is that the index (key2) used to search results is different from the index (key1) for sorting. where a = x and B = y order by id;
- SELECT * FROM t1 WHERE key2 = constant order by key1;
- 2. Sorting fields in different indexes cannot be sorted by indexes.
- SELECT * FROM t1 order by key1, key2;
- 3. the sorting field order is inconsistent with the column order in the index, and index sorting cannot be used. For example, the index is key idx_kp1_kp2 (key_part1, key_part2)
- SELECT * FROM t1 order by key_part2, key_part1;
- 4. The ascending/descending order in order by is inconsistent with the default descending order in the index, so the index sorting cannot be used.
- SELECT * FROM t1 order by key_part1 DESC, key_part2 ASC;
- 5. ey_part1 is a range query, and key_part2 cannot be sorted by index.
- SELECT * FROM t1 WHERE key_part1> constant order by key_part2;
- 5 The rder by and group by columns are inconsistent.
- SELECT * FROM t1 WHERE key_part1 = constant order by key_part2 group by key_part4;
- 6. indexes are stored out of order, such as hash indexes.
- 7 The order by field is indexed only with the prefix key idx_col (col (10 ))
- Select * from t1 order by col;
- 8. For join queries, the sorting fields are not all from the first table. Use explain to check that the type value of the first table in the execution plan is not const.
How can we optimize sorting operations that cannot be avoided? Obviously, the using index sorting method is preferred. If the index sorting method cannot be used, MySQL should try its best to use the second single-path algorithm for sorting. This can reduce a large number of random IO operations and greatly improve the sorting efficiency.
1. Set the max_length_for_sort_data parameter.
In MySQL, it is determined whether to use the old-fashioned or simplified Sorting Algorithm by using the max_length_for_sort_data parameter. When the maximum length of all returned fields is smaller than this parameter value, MySQL selects the improved sorting algorithm. Otherwise, it selects the old algorithm. Therefore, if there is sufficient memory for MySQL to store non-sorted fields that need to be returned, you can increase the value of this parameter so that MySQL can choose to use the sort algorithm of the latest version.
2 remove unnecessary returned Fields
When the memory is not abundant, you cannot force MySQL to use the sort algorithm by adding the preceding parameters. Otherwise, MySQL may have to divide the data into multiple segments, then sort the data, which may be worth the candle. In this case, you need to remove unnecessary return fields to adapt the length of the returned results to the constraints of the max_length_for_sort_data parameter.
At the same time, we must standardize MySQL development specifications to avoid large fields as much as possible. When a select query column contains a large field blob or text, MySQL selects regular sorting.
"The optimizer selects which filesort algorithm to use. It normally uses the modified algorithm blocks t when BLOB or TEXT columns are involved, in which case it uses the original algorithm ."
3. Increase the value of sort_buffer_size.
If this value is too small, and you return too many results at a time, you may sort the results multiple times, and then sort the results at last, this will take longer. Increasing sort_buffer_size is not to allow MySQL to select the sort algorithm, but to minimize the number of data segments to be sorted in the sorting process, because segmentation will cause MySQL to have to use temporary tables for exchange sorting. But the value is not larger, the better:
1 sort_buffer_size is a connection-level parameter. When this buffer is required for each connection for the first time, the configured memory is allocated at a time.
2 sort_buffer_size is not as large as possible. Because it is a connection-level parameter, excessive settings + high concurrency may exhaust system memory resources.
3 It is said that when sort_buffer_size exceeds 2 MB, mmap () instead of malloc () will be used for memory allocation, resulting in reduced efficiency.
This article permanently updates the link address: