Big Talk Data Structures, Chapter 9 Sorting, Section 9: Quick Sort (II)

Last Update:2018-12-08 Source: Internet

Author: User

9.9.4 Quick Sort Optimization
The quick sort just described still leaves a lot of room for improvement. Let's look at some optimizations.

1. Optimizing pivot selection
If the chosen pivotkey happens to lie in the middle of the whole sequence, we can split the sequence into a set of smaller numbers and a set of larger numbers. But note the "if ... is a middle value" in that sentence. What if the chosen pivotkey is not a middle value? Take the array {9, 1, 5, 8, 3, 7, 4, 6, 2} that the earlier sections used for bubble sort and simple selection sort. By line 4 of the code, "pivotkey = L->r[low];", we would choose 9 as the first pivotkey. After one pass of the Partition call, all that has happened is that 9 and 2 have swapped positions and 9 is returned as pivot; the sequence as a whole has not changed in any substantial way. See Figure 9-9-8.

In other words, line 4 of the code, "pivotkey = L->r[low];", becomes a potential performance bottleneck. How fast the sort runs depends on where the key L.r[1] falls within the whole sequence: if L.r[1] is too small or too large, performance suffers (in the first example, 50 is a middle value; in the second, 9 is large relative to the whole sequence). In practice, the sequence to be sorted is quite likely to be nearly ordered already. Fixed selection, that is, always taking the first key as the pivot (or in fact the key at any fixed position), then becomes extremely unreasonable.
As an improvement, some have suggested picking a random number rnd between low and high and swapping the key L.r[rnd] with L.r[low], so that the pivot is less likely to be an extreme value. This is called the random pivot selection method. To some extent it removes the bottleneck when quick sort is applied to nearly ordered sequences. However, random selection is something of a gamble: what if luck is bad and the randomly chosen key is still very small or very large?
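The random selection step can be sketched as follows. This is a minimal illustration, not the book's code: the SqList layout (keys in r[1..length], r[0] spare), the MAXSIZE value, and the helper name RandomPivotToLow are assumptions made for the example. The caller is expected to seed the generator with srand once.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXSIZE 100            /* assumed capacity, for illustration only */

typedef struct {
    int r[MAXSIZE + 1];        /* keys live in r[1..length]; r[0] is spare */
    int length;
} SqList;

/* Exchange L->r[i] and L->r[j]. */
void swap(SqList *L, int i, int j)
{
    int tmp = L->r[i];
    L->r[i] = L->r[j];
    L->r[j] = tmp;
}

/* Hypothetical helper: pick a random index rnd in [low, high] and move its
   key to L->r[low], so a later "pivotkey = L->r[low];" picks it up.
   Returns the index that was chosen. */
int RandomPivotToLow(SqList *L, int low, int high)
{
    assert(low >= 1 && low <= high && high <= L->length);
    int rnd = low + rand() % (high - low + 1);
    swap(L, rnd, low);
    return rnd;
}
```

Because the chosen key is swapped into position low, the rest of Partition does not need to change.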
A further improvement is the median-of-three method: take three keys, sort them, and use the middle one as the pivot. Usually the three are the leftmost, rightmost, and middle keys, though they can also be picked at random. This way the pivot is at least guaranteed not to be the smallest or the largest key. In terms of probability, it is very unlikely that all three sampled keys are among the smallest or the largest, so the chance that the middle one is a reasonably central value increases greatly. Because the whole sequence is unordered, picking three keys at random is effectively the same as taking the left, middle, and right keys. Moreover, a random number generator has its own time cost, so random sampling is not considered here.
Let's look at the implementation that takes the left, middle, and right keys. We add the following code between lines 3 and 4 of the Partition function.

3   int pivotkey;

    int m = low + (high - low) / 2; /* index of the middle element of the range */
    if (L->r[low] > L->r[high])
        swap(L, low, high); /* swap left and right so that the left one is smaller */
    if (L->r[m] > L->r[high])
        swap(L, high, m); /* swap middle and right so that the right one is the largest */
    if (L->r[m] > L->r[low])
        swap(L, m, low); /* swap middle and left so that the left one holds the median */

    /* L->r[low] is now the median of the left, middle, and right keys */
4   pivotkey = L->r[low]; /* use the first record of the subtable as the pivot record */

Apply this to the array {9, 1, 5, 8, 3, 7, 4, 6, 2}: the left key is 9, the middle key is 3, and the right key is 2, so after the comparisons L.r[low] = 3, which is certainly a more reasonable pivot than 9 or 2.
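The three comparisons can be checked on that array with a small self-contained sketch. The SqList layout (keys in r[1..length]), MAXSIZE, and the wrapper name MedianOfThreeToLow are assumptions for illustration; the three if statements follow the listing above.

```c
#include <assert.h>

#define MAXSIZE 100            /* assumed capacity, for illustration only */

typedef struct {
    int r[MAXSIZE + 1];        /* keys live in r[1..length]; r[0] is spare */
    int length;
} SqList;

/* Exchange L->r[i] and L->r[j]. */
void swap(SqList *L, int i, int j)
{
    int tmp = L->r[i];
    L->r[i] = L->r[j];
    L->r[j] = tmp;
}

/* Hypothetical wrapper: move the median of r[low], r[m], r[high]
   into r[low], where m is the middle index of the range. */
void MedianOfThreeToLow(SqList *L, int low, int high)
{
    assert(low >= 1 && low <= high && high <= L->length);
    int m = low + (high - low) / 2;
    if (L->r[low] > L->r[high])
        swap(L, low, high);    /* now r[low] <= r[high] */
    if (L->r[m] > L->r[high])
        swap(L, high, m);      /* now r[high] is the largest of the three */
    if (L->r[m] > L->r[low])
        swap(L, m, low);       /* now r[low] is the median of the three */
}
```

For {9, 1, 5, 8, 3, 7, 4, 6, 2} with low = 1 and high = 9, the first test swaps 9 and 2, the second does nothing, and the third swaps 3 into position 1, leaving 2 in the middle and 9 on the right.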
For a small array, median-of-three has a high probability of choosing a fairly good pivotkey. For a very large sequence, however, it is still not enough to ensure a good pivotkey, so there is a further method called median-of-nine: sample the array three times, taking three keys each time, take the median of each group of three, and then take the median of those three medians as the pivot. This makes it much more likely that the chosen pivotkey is close to the middle value. If you are interested, try implementing it yourself.
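One possible way to write median-of-nine is sketched below. The text does not say where the nine samples come from, so the evenly spaced positions used here are an assumption, as are the SqList layout and the helper names MedianIndex3 and MedianOfNineIndex. The function only returns an index; the caller would swap that key into r[low] before partitioning.

```c
#include <assert.h>

#define MAXSIZE 100            /* assumed capacity, for illustration only */

typedef struct {
    int r[MAXSIZE + 1];        /* keys live in r[1..length]; r[0] is spare */
    int length;
} SqList;

/* Hypothetical helper: return whichever of the indices a, b, c
   holds the median of the three keys. */
int MedianIndex3(const SqList *L, int a, int b, int c)
{
    int x = L->r[a], y = L->r[b], z = L->r[c];
    if ((x <= y && y <= z) || (z <= y && y <= x))
        return b;
    if ((y <= x && x <= z) || (z <= x && x <= y))
        return a;
    return c;
}

/* Hypothetical helper: take three groups of three evenly spaced samples
   (left, middle, right of the range), find the median of each group,
   and return the index of the median of those three medians. */
int MedianOfNineIndex(const SqList *L, int low, int high)
{
    assert(low >= 1 && low <= high && high <= L->length);
    int step = (high - low) / 8;            /* 0 for short ranges */
    int mid = low + (high - low) / 2;
    int m1 = MedianIndex3(L, low, low + step, low + 2 * step);
    int m2 = MedianIndex3(L, mid - step, mid, mid + step);
    int m3 = MedianIndex3(L, high - 2 * step, high - step, high);
    return MedianIndex3(L, m1, m2, m3);
}
```

When the range has fewer than nine keys, step is 0 and the sketch degenerates to the median of the left, middle, and right keys.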

2. Optimizing away unnecessary swaps
Figures 9-9-1 through 9-9-6 show the key 50 moving through positions 1 → 9 → 3 → 6 → 5. Its final destination is position 5, so the intermediate swaps are unnecessary. We can therefore optimize the code of the Partition function.

/* Quick sort optimization algorithm */
int Partition1(SqList *L, int low, int high)
{
    int pivotkey;
    /* the median-of-three code is omitted here */
    pivotkey = L->r[low]; /* use the first record of the subtable as the pivot record */

    L->r[0] = pivotkey; /* back up the pivot key in L->r[0] */
    while (low < high) /* scan alternately from both ends toward the middle */
    {
        while (low < high && L->r[high] >= pivotkey)
            high--;
        L->r[low] = L->r[high]; /* replace instead of swap */
        while (low < high && L->r[low] <= pivotkey)
            low++;
        L->r[high] = L->r[low]; /* replace instead of swap */
    }

    L->r[low] = L->r[0]; /* put the pivot value back into L->r[low] */
    return low; /* return the pivot position */
}

Note the changed lines in the code. We back up pivotkey in L.r[0], and wherever swap was used before, we now do only a replacement (a single assignment). Finally, when low and high meet, that is, when the pivot position has been found, the value in L.r[0] is assigned back to L.r[low]. Because this saves many data exchange operations, performance improves somewhat. 9-9.
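To see the replace-instead-of-swap idea run end to end, here is a self-contained sketch of Partition1 without the median-of-three step. The SqList layout and MAXSIZE are assumptions, and the sample array in the usage note is illustrative data whose first key is 50.

```c
#include <assert.h>

#define MAXSIZE 100            /* assumed capacity, for illustration only */

typedef struct {
    int r[MAXSIZE + 1];        /* keys live in r[1..length]; r[0] is the backup slot */
    int length;
} SqList;

/* Partition r[low..high] around the key at r[low] using assignments
   only, and return the pivot's final position. */
int Partition1(SqList *L, int low, int high)
{
    assert(low >= 1 && low <= high && high <= L->length);
    int pivotkey = L->r[low];
    L->r[0] = pivotkey;                     /* back up the pivot key */
    while (low < high) {
        while (low < high && L->r[high] >= pivotkey)
            high--;
        L->r[low] = L->r[high];             /* smaller key moves left */
        while (low < high && L->r[low] <= pivotkey)
            low++;
        L->r[high] = L->r[low];             /* larger key moves right */
    }
    L->r[low] = L->r[0];                    /* pivot lands in its final slot */
    return low;
}
```

With the keys {50, 10, 90, 30, 70, 40, 80, 60, 20} in r[1..9], Partition1(&L, 1, 9) returns 5 and leaves {20, 10, 40, 30, 50, 70, 80, 60, 90}: the key 50 is written only once, directly into position 5.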

3. Optimizing the sorting scheme for small arrays
A mathematician who supervises doctoral students may be able to crack world-class problems and train excellent PhDs in mathematics, yet if you ask him to teach "1 + 1 = 2" arithmetic to primary school pupils, he may not do it as well as a teacher who has taught primary school math for years. In other words, using a great talent for a small job does not always work well. We have just discussed how to handle very large arrays. In the opposite case, when the array is very small, quick sort is actually not as good as straight insertion sort (straight insertion has the best performance among the simple sorts). The reason is that quick sort uses recursion. When sorting a large amount of data, this overhead is negligible compared with the overall advantage of the algorithm, but if the array has only a few records to sort, it becomes a case of using a cannon to shoot a mosquito. So we need to improve the QSort function.

#define MAX_LENGTH_INSERT_SORT 7 /* array length threshold */
/* Quick sort the subsequence L->r[low..high] of sequence table L */
void QSort(SqList *L, int low, int high)
{
    int pivot;
    if ((high - low) > MAX_LENGTH_INSERT_SORT) /* use quick sort when high-low is greater than the constant */
    {
        pivot = Partition(L, low, high); /* split L->r[low..high] in two and get the pivot position */
        QSort(L, low, pivot - 1); /* recursively sort the low subtable */
        QSort(L, pivot + 1, high); /* recursively sort the high subtable */
    }
    else /* use straight insertion sort when high-low is less than or equal to the constant */
        InsertSort(L);
}

We have added a check: when high-low is not greater than a constant (some sources say 7 is appropriate, others think 50 is more reasonable; in practice it can be adjusted as needed), straight insertion sort is used instead, so that the strengths of both sorting methods are used to best effect.
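The cutoff can be sketched as a complete program fragment. One deliberate difference from the listing above: this sketch calls a range-limited insertion sort, InsertSortRange(L, low, high), so that only the small subtable is touched. That helper, the names PartitionByFirst and QSortHybrid, and the SqList layout are assumptions made for illustration.

```c
#include <assert.h>

#define MAXSIZE 100                 /* assumed capacity, for illustration only */
#define MAX_LENGTH_INSERT_SORT 7    /* threshold value used in the text */

typedef struct {
    int r[MAXSIZE + 1];             /* keys live in r[1..length]; r[0] is spare */
    int length;
} SqList;

/* Hypothetical helper: straight insertion sort restricted to r[low..high]. */
void InsertSortRange(SqList *L, int low, int high)
{
    for (int i = low + 1; i <= high; i++) {
        int key = L->r[i];
        int j = i - 1;
        while (j >= low && L->r[j] > key) {
            L->r[j + 1] = L->r[j];  /* shift larger keys one slot right */
            j--;
        }
        L->r[j + 1] = key;
    }
}

/* Partition r[low..high] around r[low]; returns the pivot's final index. */
int PartitionByFirst(SqList *L, int low, int high)
{
    int pivotkey = L->r[low];
    while (low < high) {
        while (low < high && L->r[high] >= pivotkey)
            high--;
        L->r[low] = L->r[high];
        while (low < high && L->r[low] <= pivotkey)
            low++;
        L->r[high] = L->r[low];
    }
    L->r[low] = pivotkey;
    return low;
}

/* Quick sort for large ranges, insertion sort once a range is small. */
void QSortHybrid(SqList *L, int low, int high)
{
    assert(low >= 1 && high <= L->length);
    if ((high - low) > MAX_LENGTH_INSERT_SORT) {
        int pivot = PartitionByFirst(L, low, high);
        QSortHybrid(L, low, pivot - 1);
        QSortHybrid(L, pivot + 1, high);
    } else {
        InsertSortRange(L, low, high);
    }
}
```

A call such as QSortHybrid(&L, 1, L.length) sorts the whole table; ranges of eight keys or fewer never reach the partition step.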

4. Optimizing the recursion
As you know, recursion has a certain cost. The QSort function makes two recursive calls at its end. If the sequence to be sorted is partitioned in an extremely unbalanced way, the recursion depth approaches n rather than the log₂n of the balanced case, and this is not just a matter of speed. Stack space is limited, and each recursive call consumes some of it; the more parameters the function has, the more space each call consumes. So if recursion can be reduced, performance improves considerably.
So we apply tail-recursion optimization to QSort. Let's look at the code.

/* Quick sort the subsequence L->r[low..high] of sequence table L */
void QSort1(SqList *L, int low, int high)
{
    int pivot;
    if ((high - low) > MAX_LENGTH_INSERT_SORT)
    {
        while (low < high)
        {
            pivot = Partition1(L, low, high); /* split L->r[low..high] in two and get the pivot position */
            QSort1(L, low, pivot - 1); /* recursively sort the low subtable */
            low = pivot + 1; /* tail recursion */
        }
    }
    else
        InsertSort(L);
}

Once the if is changed to a while, the variable low is no longer needed after the first recursive call, so pivot+1 can be assigned to it. On the next pass through the loop, Partition(L, low, high) then has the same effect as "QSort(L, pivot+1, high);". The result is the same, but using iteration instead of recursion reduces the stack depth and so improves overall performance.
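The effect on stack depth can be observed with a small instrumented sketch. The counters g_depth and g_max_depth are not part of the algorithm and exist only to measure nesting; they, the function names, and the SqList layout are assumptions for illustration, and the insertion sort cutoff is left out to keep the loop in focus.

```c
#include <assert.h>

#define MAXSIZE 100            /* assumed capacity, for illustration only */

typedef struct {
    int r[MAXSIZE + 1];        /* keys live in r[1..length]; r[0] is spare */
    int length;
} SqList;

int g_depth = 0;               /* current nesting of QSortLoop calls */
int g_max_depth = 0;           /* deepest nesting observed so far */

/* Partition r[low..high] around r[low]; returns the pivot's final index. */
int PartitionByFirst(SqList *L, int low, int high)
{
    int pivotkey = L->r[low];
    while (low < high) {
        while (low < high && L->r[high] >= pivotkey)
            high--;
        L->r[low] = L->r[high];
        while (low < high && L->r[low] <= pivotkey)
            low++;
        L->r[high] = L->r[low];
    }
    L->r[low] = pivotkey;
    return low;
}

/* Quick sort with the second recursive call replaced by a loop. */
void QSortLoop(SqList *L, int low, int high)
{
    assert(low >= 1 && high <= L->length);
    if (++g_depth > g_max_depth)
        g_max_depth = g_depth;
    while (low < high) {
        int pivot = PartitionByFirst(L, low, high);
        QSortLoop(L, low, pivot - 1);   /* only the low subtable recurses */
        low = pivot + 1;                /* the high subtable is handled by looping */
    }
    g_depth--;
}
```

On an already sorted table with a first-key pivot, the low subtable is always empty, so the nesting never exceeds 2 however long the table is. The loop removes only the second recursive call, so a split where the low subtable is the large one still recurses deeply.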
In practice, languages such as C++, Java, PHP, C#, VB, and JavaScript all provide implementations of quick sort. The implementations differ slightly, but they are all essentially embodiments of the quick sort ideas described here.

The sorting algorithms we have learned are named in different ways. Some are named for how they work, such as simple selection sort, straight insertion sort, and merge sort. Some are named by analogy with the real world, such as bubble sort and heap sort. Some are named after a person, such as Shell sort. The algorithm we have just discussed, however, is named "quick", which means that as soon as someone finds a better sorting method, the name will no longer be deserved. Yet at least so far, after many rounds of optimization, Tony Hoare's quick sort remains the king of sorting algorithms in overall performance. We should study and master it well.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
