Improvements to the Quicksort Algorithm

Last Update: 2018-12-07 Source: Internet

Author: User


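I no longer remember where I came across this article, but I found it useful, so let's look at it. It is said that sort in the STL combines quicksort with insertion sort and still runs in O(n log n) time in the worst case. (Common implementations such as libstdc++'s std::sort actually use introsort: quicksort with a heapsort fallback, finished by an insertion sort pass. The heapsort fallback is what guarantees the worst-case bound.)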
Reference:
Quicksort is an important sorting algorithm based on partitioning. It has attracted wide attention from researchers since its invention, and many improvements to the basic algorithm have been proposed over the years. I have collected and consulted some of the relevant material and introduce these improvements below.

I. The basic quicksort algorithm
Quicksort was invented by C. A. R. Hoare and published in 1961. Its idea is as follows:
First, select a pivot (the "central axis value") from the sequence A to be sorted and partition A into two parts: every element of the left part B is less than or equal to the pivot, and every element of the right part C is greater than or equal to the pivot (divide). Then quicksort is called recursively to sort the two parts separately (conquer). Finally, the results for the two parts are combined to obtain the sorted sequence (combine).
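The divide/conquer/combine steps above can be sketched in Python as follows. This is a minimal illustration, not Hoare's original code: it assumes the first element is the pivot and uses a simple single-scan (Lomuto-style) partition rather than Hoare's two-pointer scheme. The names quicksort and partition are our own.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[lo]; return the pivot's final index."""
    pivot = a[lo]
    i = lo  # a[lo+1..i] holds the elements smaller than the pivot
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]  # place the pivot between part B and part C
    return i


def quicksort(a, lo=0, hi=None):
    """Sort list a in place with first-element-pivot quicksort."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = partition(a, lo, hi)  # divide
    quicksort(a, lo, p - 1)   # conquer the left part B
    quicksort(a, p + 1, hi)   # conquer the right part C
    # combine: nothing to do, both parts are already sorted in place
```

Because the sort is in place, the "combine" step costs nothing; all the work happens in partition.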
When sorting n numbers, quicksort makes O(n log n) comparisons on average and O(n^2) comparisons in the worst case. In practice, however, it is often noticeably faster than other O(n log n) algorithms, mainly because its inner loop can be implemented efficiently on most architectures, and because the algorithm can be tuned to the actual data to minimize the probability of hitting the worst case.
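To make the worst case concrete, the following sketch counts key comparisons for the same first-element-pivot quicksort (the name count_comparisons is hypothetical). An explicit stack replaces recursion so that sorted input does not exceed Python's recursion limit. On already sorted input every partition is maximally unbalanced, so the count is (n-1) + (n-2) + ... + 1 = n(n-1)/2.

```python
def count_comparisons(values):
    """Sort a copy of values with first-element-pivot quicksort and
    return the number of key comparisons made."""
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # pending (lo, hi) ranges, replacing recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        i = lo
        for j in range(lo + 1, hi + 1):
            comparisons += 1  # one comparison of a[j] with the pivot
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[lo], a[i] = a[i], a[lo]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return comparisons
```

For example, count_comparisons(range(1000)) returns 499500, while a shuffled list of the same size needs only a small fraction of that.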

II. Improvements to the quicksort algorithm
1. Median-of-three pivot selection
Put simply, unlike the basic quicksort, this method does not take the first element of the array as the pivot; instead, it uses the median of the leftmost, middle, and rightmost elements of the array. Compared with the original quicksort, this improvement has two main advantages [1]:
(1) First, it reduces the probability of the worst case occurring.
(2) Second, partitioning normally needs a sentinel at the end of the array to keep the scans from running past the array bounds. If, during partitioning, the middle element (that is, the pivot) is swapped with the second-to-last element of the array, the elements at the two ends serve as sentinels, so the comparisons against a separate sentinel value can be omitted.
There are variants of this improvement, as well as further refinements of it. In one refinement, the three elements are not only compared to choose a better pivot but also sorted and written back into the array. This guarantees that, after partitioning an array of length n, the largest sub-partition has length at most n-2 rather than n-1. This technique is reported to reduce running time by about 5% [9].
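A sketch of the refined median-of-three scheme described above, in the style popularized by Sedgewick. We assume it matches what [1] and [9] describe, but their details may differ. The three samples are sorted in place, the pivot is parked at position hi-1, and the two scans rely on a[lo] being no larger than the pivot and a[hi-1] being equal to it as sentinels, so neither scan needs a bounds check. Ranges of three or fewer elements are sorted directly; CUTOFF = 3 and the function names are illustrative choices.

```python
CUTOFF = 3  # illustrative: ranges of at most 3 elements are sorted directly


def _small_sort(a, lo, hi):
    # straight insertion sort for tiny ranges
    for k in range(lo + 1, hi + 1):
        x = a[k]
        j = k - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def median3_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        _small_sort(a, lo, hi)
        return
    mid = (lo + hi) // 2
    # sort the three samples and write them back: a[lo] <= a[mid] <= a[hi]
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    # park the pivot (the median) at the second-to-last position
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    pivot = a[hi - 1]
    i, j = lo, hi - 1
    while True:
        i += 1
        while a[i] < pivot:  # cannot pass hi-1, since a[hi-1] == pivot
            i += 1
        j -= 1
        while a[j] > pivot:  # cannot pass lo, since a[lo] <= pivot
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi - 1] = a[hi - 1], a[i]  # move the pivot to its final place
    # i lies in lo+1..hi-1, so each side has at most n - 2 elements
    median3_quicksort(a, lo, i - 1)
    median3_quicksort(a, i + 1, hi)
```

Scans that stop on elements equal to the pivot also keep partitions balanced when there are many duplicates.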
The median-of-three method can be generalized. When choosing the pivot, you can sample five or more elements spread from the left end through the middle to the right end. In general this gives the median-of-(2t+1) method, of which median-of-three is the case t = 1. Reference [9] analyzes the median-of-(2t+1) method in detail. That paper is long and difficult to read, so I only read its beginning, which also analyzes the median-of-three method in detail and gives a theoretical estimate of its average complexity, showing it to be lower than that of the basic quicksort described above [9].
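Median-of-(2t+1) sampling can be sketched as a helper that returns the index of the median of 2t+1 evenly spaced samples. The even spacing and the name median_of_2t_plus_1 are assumptions for illustration; with t = 1 it samples exactly the leftmost, middle, and rightmost elements. A quicksort would swap the returned element into its pivot position before partitioning.

```python
def median_of_2t_plus_1(a, lo, hi, t=1):
    """Return the index of the median of 2t+1 evenly spaced samples of a[lo..hi]."""
    k = 2 * t + 1
    n = hi - lo + 1
    if t < 1 or n < k:
        raise ValueError("need t >= 1 and at least 2t+1 elements")
    # evenly spaced, distinct sample positions from lo to hi inclusive
    idx = [lo + i * (n - 1) // (k - 1) for i in range(k)]
    idx.sort(key=lambda p: a[p])  # sorting 2t+1 samples is cheap for small t
    return idx[t]  # the middle sample by value
```

For a = [9, 1, 8, 2, 7, 3, 6, 4, 5] and t = 1, the samples are 9, 7, and 5, so the helper returns index 4 (value 7).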

2. Adjust the algorithm based on the partition size
This improvement addresses a weakness of quicksort: it is not very efficient on small datasets. One might think this weakness can be ignored, since most sorting only needs to scale well to large inputs. However, because quicksort works by partitioning, every large dataset is eventually divided into many small ones. The improvement is that, once a partition is small, quicksort is no longer called recursively; instead, another sorting algorithm that handles small datasets well is used [7]. Introsort is a quicksort-based algorithm that switches to heapsort when the recursion reaches a certain depth. This avoids the quadratic behavior that repeated poor pivot choices can cause and guarantees O(n log n) complexity in the worst case [8].
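A compact introsort sketch. It uses a depth limit of 2*floor(log2 n), the value libstdc++ uses; the heapsort fallback relies on Python's heapq applied to a copy of the subarray for brevity rather than an in-place heapsort. All function names are hypothetical.

```python
import heapq
import math


def introsort(a):
    """Sort list a in place: quicksort, switching to heapsort past a depth limit."""
    if len(a) > 1:
        depth_limit = 2 * int(math.log2(len(a)))  # 2 * floor(log2 n)
        _introsort(a, 0, len(a) - 1, depth_limit)


def _heapsort_range(a, lo, hi):
    # heapsort a[lo..hi]; for brevity this uses heapq on a copy, not in place
    heap = a[lo:hi + 1]
    heapq.heapify(heap)
    a[lo:hi + 1] = [heapq.heappop(heap) for _ in range(len(heap))]


def _introsort(a, lo, hi, depth):
    if lo >= hi:
        return
    if depth == 0:
        _heapsort_range(a, lo, hi)  # recursion too deep: fall back to heapsort
        return
    mid = (lo + hi) // 2
    m = sorted((lo, mid, hi), key=lambda k: a[k])[1]  # median-of-three index
    a[lo], a[m] = a[m], a[lo]
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    _introsort(a, lo, i - 1, depth - 1)
    _introsort(a, i + 1, hi, depth - 1)
```

An array of identical elements is a bad case for this simple partition (every split is n-1 vs 0), and it is exactly where the depth limit kicks in and heapsort finishes the job.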
Another optimization is to stop quicksort once a partition becomes smaller than some threshold. The output of the quicksort phase is then an "almost" sorted sequence: some elements are not yet in their final positions, but there are not many of them, and none is far from where it belongs. Insertion sort is then used to finish sorting this almost-sorted sequence, because insertion sort runs in close to linear time on such input. This improvement has been found to be more effective than using quicksort all the way down.
Another improvement is, when sorting the sub-partitions recursively, to always process the smaller partition first. This makes more effective use of the stack, since the recursion depth stays O(log n), and speeds up the algorithm as a whole [7].
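Processing the smaller partition first is usually implemented by recursing into the smaller side and looping on the larger one. Because each recursive call receives at most half of the current range, the recursion depth stays O(log n) even on inputs where the partitions are badly unbalanced. The optional stats dictionary that records the maximum depth is a hypothetical instrumentation hook added only for illustration.

```python
def _partition(a, lo, hi):
    # first-element pivot, single-scan partition; returns the pivot's final index
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    return i


def quicksort_small_first(a, lo=0, hi=None, depth=1, stats=None):
    """Recurse into the smaller partition; continue with the larger one in a loop."""
    if hi is None:
        hi = len(a) - 1
    if stats is not None:  # hypothetical instrumentation: track recursion depth
        stats["max_depth"] = max(stats.get("max_depth", 0), depth)
    while lo < hi:
        p = _partition(a, lo, hi)
        if p - lo < hi - p:
            quicksort_small_first(a, lo, p - 1, depth + 1, stats)  # smaller: left
            lo = p + 1  # the larger right part is handled by the next iteration
        else:
            quicksort_small_first(a, p + 1, hi, depth + 1, stats)  # smaller: right
            hi = p - 1  # the larger left part is handled by the next iteration
```

On already sorted input this still takes quadratic time with a first-element pivot, but the stack never grows beyond a couple of frames.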

3. Considering different partitioning schemes
In quicksort, most of the time is actually spent partitioning, so a good partition implementation is very important. In particular, when all the elements to be partitioned are equal, the basic quicksort falls into its worst case: it repeatedly swaps identical elements and returns the worst possible pivot position. For any dataset containing many identical elements this is a serious problem, because many of the bottom-level partitions consist entirely of identical elements.
One remedy is to split each partition into three blocks instead of two: the elements less than the pivot, the elements equal to the pivot, and the elements greater than the pivot. Another simple improvement is to check, once partitioning is complete, whether the leftmost and rightmost element values are equal, and if so, to use a different sorting algorithm instead of a recursive call.
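The three-block partition is commonly known as three-way partitioning (Dijkstra's "Dutch national flag" problem). Below is a minimal sketch with the first element as the pivot; the function name is our own. After one pass, the entire block equal to the pivot is already in its final position, so an array of identical elements is sorted in a single linear pass.

```python
def quicksort_3way(a, lo=0, hi=None):
    """Quicksort with three-way partitioning: < pivot, == pivot, > pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    lt, i, gt = lo, lo + 1, hi
    # invariant: a[lo..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..hi] > pivot
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1  # a[i] is now an unexamined element, so i stays put
        else:
            i += 1
    # a[lt..gt] equals the pivot and is final; recurse only on the outer blocks
    quicksort_3way(a, lo, lt - 1)
    quicksort_3way(a, gt + 1, hi)
```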

4. Parallel quicksort [4] [6]
Because quicksort is built on partitioning, it is easy to run in parallel on multiple processors.
In most cases, creating a thread takes far longer than comparing and swapping two elements, so a parallel quicksort cannot create a new thread for every partition. Implementations usually set a threshold: if a partition contains more elements than the threshold, a new thread is created to sort it; otherwise, it is sorted by an ordinary recursive call [4] [6].
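A threshold-based parallel quicksort can be sketched with Python's threading module. This only illustrates the control structure: because of CPython's global interpreter lock, threads will not speed up this CPU-bound Python code, and a real implementation would use native threads, processes, or a thread pool. The threshold value and function names are hypothetical. Each thread works on a disjoint range of the same list, so no locking is needed.

```python
import threading

THRESHOLD = 10_000  # hypothetical; large enough to amortize thread creation


def _partition(a, lo, hi):
    # middle element as pivot, single-scan partition; returns the pivot's index
    mid = (lo + hi) // 2
    a[lo], a[mid] = a[mid], a[lo]
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    return i


def parallel_quicksort(a, lo=0, hi=None, threshold=THRESHOLD):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = _partition(a, lo, hi)
    worker = None
    if p - lo > threshold:
        # large left partition: sort it in a new thread
        worker = threading.Thread(
            target=parallel_quicksort, args=(a, lo, p - 1, threshold))
        worker.start()
    else:
        parallel_quicksort(a, lo, p - 1, threshold)  # small: plain recursion
    parallel_quicksort(a, p + 1, hi, threshold)  # right part in this thread
    if worker is not None:
        worker.join()  # wait until the left part is sorted too
```

Note that each call still partitions its whole range sequentially before any thread is spawned, which is the limitation discussed next.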

The parallel quicksort algorithm itself has also been improved. Its main problem is that each partitioning step must be completed before the resulting subsequences can be processed in parallel, which limits the degree of parallelism of the whole algorithm. The solution is to perform the partitioning itself in parallel: the improved parallel quicksort uses 2n pointers to partition in parallel, thereby increasing the algorithm's degree of parallelism.

III. Summary
In general, improvements to quicksort focus on three aspects [1]:
1. Selecting a better pivot
2. Adjusting the algorithm based on the size of the generated sub-partitions
3. Using different partitioning methods
This article mainly covers the first two aspects; the third is only briefly introduced because I have not found enough material on it. The article also introduces parallel quicksort, which approaches the improvement of quicksort from another angle.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
