The previous article covered the basics of quicksort (see http://www.cnblogs.com/curo0119/p/8588565.html). This article takes a closer look at how quicksort can be optimized.
1. The basic idea of quicksort:
Quicksort uses divide and conquer. One partitioning pass splits the sequence to be sorted into two parts such that every key in one part is smaller than every key in the other part. The two parts are then sorted in the same way, one after the other, until the entire sequence is in order.
2. The three steps of quicksort:
(1) Select the pivot: pick one element of the sequence to be sorted, in some way, as the pivot.
(2) Partition: split the sequence into two sub-sequences around the final position of the pivot. After this step, the elements to the left of the pivot are no larger than the pivot, and the elements to its right are no smaller than the pivot.
(3) Recursively quicksort the two sub-sequences until a sequence is empty or has only one element.
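The three steps can be sketched as follows. This is only a minimal illustration, assuming a std::vector<int> and the first element as the pivot; the names PartitionBasic and QuickSortBasic are made up for this sketch and are not taken from the previous article.

```cpp
#include <cassert>
#include <vector>

// Step (2): partition arr[low..high] around the pivot arr[low] and
// return the pivot's final index.
int PartitionBasic(std::vector<int>& arr, int low, int high)
{
    assert(low <= high);
    int key = arr[low];                  // Step (1): fixed-position pivot
    while (low < high)
    {
        while (low < high && arr[high] >= key) high--;
        arr[low] = arr[high];            // move a smaller element to the left
        while (low < high && arr[low] <= key) low++;
        arr[high] = arr[low];            // move a larger element to the right
    }
    arr[low] = key;                      // the pivot lands in its final position
    return low;
}

// Step (3): recurse until a sub-sequence is empty or has one element.
void QuickSortBasic(std::vector<int>& arr, int low, int high)
{
    if (low >= high) return;
    int pivotPos = PartitionBasic(arr, low, high);
    QuickSortBasic(arr, low, pivotPos - 1);
    QuickSortBasic(arr, pivotPos + 1, high);
}
```

It would be called as QuickSortBasic(v, 0, (int)v.size() - 1).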
3. How to select the pivot
A divide-and-conquer algorithm is most efficient when every split produces two sub-sequences of equal length. In other words, the choice of pivot matters: it determines the lengths of the two sub-sequences and therefore has a decisive effect on the efficiency of the whole algorithm.
Ideally, the chosen pivot divides the sequence to be sorted into two sub-sequences of equal length.
We introduce three methods of selecting the pivot.
Method (1): Fixed position
Idea: take the first or the last element of the sequence as the pivot. This is the approach used in the previous article, but it has always been a poor choice.
If the input sequence is random, the running time is acceptable. If the array is already in order, however, the partition is a very bad one: each partition reduces the sequence to be sorted by only one element, so quicksort degenerates into something like bubble sort with a time complexity of Θ(n^2). This is the worst case.
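To make the degradation concrete, here is a small sketch that sorts with the first element as the pivot and reports how deep the recursion goes. The function name SortAndMeasureDepth is an assumption of this sketch, not something from the article.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Quicksort with the first element as the pivot. Returns the maximum
// recursion depth reached, so the effect of a bad split is visible.
int SortAndMeasureDepth(std::vector<int>& arr, int low, int high)
{
    assert(high < (int)arr.size());
    if (low >= high) return 0;           // empty or single element: no split
    int key = arr[low];                  // fixed-position pivot
    int i = low, j = high;
    while (i < j)
    {
        while (i < j && arr[j] >= key) j--;
        arr[i] = arr[j];
        while (i < j && arr[i] <= key) i++;
        arr[j] = arr[i];
    }
    arr[i] = key;
    int leftDepth = SortAndMeasureDepth(arr, low, i - 1);
    int rightDepth = SortAndMeasureDepth(arr, i + 1, high);
    return 1 + std::max(leftDepth, rightDepth);
}
```

For an already sorted array of 100 distinct elements the left part is always empty, so the depth is 99: one level per element instead of roughly log2(100).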
Method (2): Randomly selected pivot
Reason for introduction: when the sequence to be sorted is partially ordered, a fixed pivot makes quicksort inefficient. A randomly selected pivot is introduced to alleviate this.
Idea: take an arbitrary element of the sequence to be sorted as the pivot.
/* Randomly select the pivot position between low and high (both inclusive). */
int SelectPivotRandom(int arr[], int low, int high)
{
    // Seed the generator once at program start, e.g. srand((unsigned)time(NULL)) in main.
    // Reseeding on every call would repeat the same value within one second.
    int pivotPos = rand() % (high - low + 1) + low; // +1 so that high can be chosen and low == high is safe
    // Swap the pivot element with the element at low, so the ordinary
    // quicksort partition function can be called unchanged.
    swap(arr[pivotPos], arr[low]);
    return arr[low];
}
Since the position of the pivot is random, the resulting partitions will not always be bad ones. When all the numbers in the array are equal, however, the worst case still occurs and the time complexity is O(n^2). In fact, the probability that randomized quicksort hits the theoretical worst case is only 1/(2^n). So randomized quicksort achieves the expected time complexity of O(n log n) for the vast majority of inputs.
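As an alternative to rand(), the same idea can be written with the C++11 <random> facilities. This is only a sketch: the function name SelectPivotRandomEngine and the choice to pass the engine in as a parameter are assumptions made here, not part of the original code.

```cpp
#include <cassert>
#include <random>
#include <utility>

// Hypothetical variant of the random pivot selection. The caller owns the
// engine, so it is seeded once and reused across calls.
int SelectPivotRandomEngine(int arr[], int low, int high, std::mt19937& gen)
{
    assert(low <= high);
    std::uniform_int_distribution<int> dist(low, high); // both ends inclusive
    int pivotPos = dist(gen);
    std::swap(arr[pivotPos], arr[low]);  // the pivot now sits at low
    return arr[low];
}
```

Because std::uniform_int_distribution takes an inclusive range, there is no modulo arithmetic to get wrong at the boundaries.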
Method (3): Median-of-three
Reason for introduction: although a randomly selected pivot reduces the probability of a bad partition, the worst case is still O(n^2). Median-of-three pivot selection is introduced to alleviate this.
The best partition divides the sequence to be sorted into sub-sequences of equal length, so the best pivot is the median of the sequence, that is, the (N/2)-th smallest number. However, the median is hard to compute and computing it would noticeably slow down the sort. An estimate of the median can be obtained by picking three elements at random and using their median as the pivot. In practice the randomness does not help much, so the usual approach is to use the median of the elements at the left end, the right end, and the center. Median-of-three partitioning clearly eliminates the bad case of pre-sorted input and reduces the number of comparisons in quicksort by approximately 14%.
Example: the sequence to be sorted is: 8 1 4 9 6 3 5 2 7 0
The left element is 8, the right element is 0, and the middle element is 6.
Sorting these three numbers and taking the middle one as the pivot gives a pivot of 6.
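The example can be checked with a few lines of code. The helper names MedianOfThreeValue and MedianOfEnds are invented for this sketch, which only computes the pivot value and does not rearrange the array.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of three values: the one that is neither the smallest nor the largest.
int MedianOfThreeValue(int a, int b, int c)
{
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Hypothetical helper: median of the left, middle, and right elements
// of arr[low..high].
int MedianOfEnds(const std::vector<int>& arr, int low, int high)
{
    assert(low <= high);
    int mid = low + (high - low) / 2;    // index 4 for the ten-element example
    return MedianOfThreeValue(arr[low], arr[mid], arr[high]);
}
```

For the sequence 8 1 4 9 6 3 5 2 7 0, MedianOfEnds(v, 0, 9) looks at 8, 6, and 0 and yields 6, matching the example.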
Note: the pivot can also be chosen from five or more elements instead of the three at the left, middle, and right. In general this is called median-of-(2t+1) partitioning; with three elements it is median-of-three.
Specific idea: take the elements at the low, mid, and high positions of the sequence to be sorted, use the median of the three as the pivot, and store the pivot at index 0 (the low position).
/* Take the elements at the low, mid, and high positions of the sequence to be sorted and select their median as the pivot. */
int SelectPivotMedianOfThree(int arr[], int low, int high)
{
    int mid = low + ((high - low) >> 1); // index of the middle element of the array
    // Use median-of-three to select the pivot
    if (arr[mid] > arr[high]) // goal: arr[mid] <= arr[high]
    {
        swap(arr[mid], arr[high]);
    }
    if (arr[low] > arr[high]) // goal: arr[low] <= arr[high]
    {
        swap(arr[low], arr[high]);
    }
    if (arr[mid] > arr[low]) // goal: arr[low] >= arr[mid]
    {
        swap(arr[mid], arr[low]);
    }
    // Now arr[mid] <= arr[low] <= arr[high].
    // The low position holds the median of the three, so the partition step can
    // use the element at low as the pivot without changing the partition function.
    return arr[low];
}
Test data:
Test data analysis: the advantage of median-of-three pivot selection is still clear, but it still cannot cope with arrays that contain many duplicates.
Other optimizations:
Optimization 1: when partitioning has reduced the sequence to be sorted to a certain size (only a few elements), switch to insertion sort
Reason: for very small or partially ordered arrays, quicksort performs worse than insertion sort. Once the sequence to be sorted has been split down to a certain size, continuing to partition is less efficient than insertion sort, so insertion sort can be used instead of quicksort.
Cutoff range: a cutoff of n = 10 for the length of the sequence to be sorted, although any cutoff between 5 and 20 is likely to produce similar results. This practice also avoids some harmful degenerate cases.
if (high - low + 1 < 10) // cutoff of 10, as stated above
{
    InsertSort(arr, low, high);
    return;
}
// otherwise, run quicksort as usual
For a random array, median-of-three pivot selection plus insertion sort improves efficiency a little, but for an already sorted array it makes no difference. The reasoning is that when the sequence is already ordered and each partition shrinks it by only one element, insertion sort never comes into play, so no reduction in time is observed. In addition, median-of-three plus insertion sort still cannot handle arrays with many duplicates.
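The insertion sort helper called in the cutoff check is not shown in the article. A minimal sketch of what it could look like, assuming it sorts the inclusive range arr[low..high] in place:

```cpp
#include <cassert>

// Assumed helper: sorts arr[low..high] (both ends inclusive) in place
// and leaves everything outside that range untouched.
void InsertSort(int arr[], int low, int high)
{
    assert(low >= 0);
    for (int i = low + 1; i <= high; i++)
    {
        int value = arr[i];              // element to insert into arr[low..i-1]
        int j = i - 1;
        while (j >= low && arr[j] > value)
        {
            arr[j + 1] = arr[j];         // shift larger elements one slot right
            j--;
        }
        arr[j + 1] = value;
    }
}
```

On a short or nearly sorted range the inner loop does very little work, which is why it is a good fit below the cutoff.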
Optimization 2: at the end of a partition, gather the elements equal to the key together, so that the next partitions no longer have to process them
Example:
Sequence to sort: 1 4 6 7 6 6 7 6 8 6
Pivot chosen by median-of-three: the element at index 4, which is 6
After the selection swap, the sequence to be partitioned: 6 4 6 7 1 6 7 6 8 6
Pivot key: 6
Result of this partition when elements equal to the key are not handled: 1 4 6 6 7 6 7 6 8 6
The next two sub-sequences are: 1 4 6 and 7 6 7 6 8 6
Result of this partition when elements equal to the key are gathered: 1 4 6 6 6 6 6 7 8 7
The next two sub-sequences are: 1 4 and 7 8 7
Comparing the two, gathering the elements equal to the key after a partition reduces the number of iterations, and efficiency improves considerably.
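The same effect can also be obtained in a single pass with the well-known three-way (Dutch national flag) partition. Note that this is a different technique from the two-step procedure this article uses; it is sketched here only to illustrate what gathering the equal elements achieves. The name PartitionThreeWay is an assumption of this sketch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Three-way partition of arr[low..high] around key. Afterwards:
//   arr[low..lt-1] < key, arr[lt..gt] == key, arr[gt+1..high] > key.
// Returns {lt, gt}, the inclusive range of elements equal to key.
std::pair<int, int> PartitionThreeWay(std::vector<int>& arr, int low, int high, int key)
{
    assert(low <= high);
    int lt = low, i = low, gt = high;
    while (i <= gt)
    {
        if (arr[i] < key)
            std::swap(arr[lt++], arr[i++]);  // grow the "smaller" region
        else if (arr[i] > key)
            std::swap(arr[i], arr[gt--]);    // grow the "larger" region
        else
            i++;                             // equal: leave it in the middle
    }
    return {lt, gt};
}
```

For the sequence 1 4 6 7 6 6 7 6 8 6 with key 6, the equal range is indices 2 to 6, so only the two elements before it and the three elements after it still need sorting, the same sub-sequence sizes as in the example.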
Process: the handling consists of two steps.
Step one: during partitioning, move the elements equal to the key to the two ends of the array.
Step two: after partitioning, move the elements equal to the key next to the pivot.
Example:
Sequence to sort: 1 4 6 7 6 6 7 6 8 6
Pivot chosen by median-of-three: the element at index 4, which is 6
After the selection swap, the sequence to be partitioned: 6 4 6 7 1 6 7 6 8 6
Pivot key: 6
Step one: during partitioning, move the elements equal to the key to the two ends of the array.
Result: 6 4 1 6 (pivot) 7 8 7 6 6 6
At this point, all elements equal to 6, apart from the pivot itself, sit at the two ends.
Step two: after partitioning, move the elements equal to the key next to the pivot.
Result: 1 4 6 6 (pivot) 6 6 6 7 8 7
At this point, all elements equal to 6 are gathered around the pivot.
After that, quicksort is applied to the two sub-sequences 1 4 and 7 8 7.
void QSort(int arr[], int low, int high)
{
    int first = low;
    int last = high;
    int left = low;   // next free slot at the left end for an element equal to key
    int right = high; // next free slot at the right end for an element equal to key
    int leftLen = 0;  // number of equal elements parked at the left end
    int rightLen = 0; // number of equal elements parked at the right end
    if (high - low + 1 < 10) // cutoff of 10 from Optimization 1
    {
        InsertSort(arr, low, high);
        return;
    }
    // Partition
    int key = SelectPivotMedianOfThree(arr, low, high); // select the pivot with median-of-three
    while (low < high)
    {
        while (high > low && arr[high] >= key)
        {
            if (arr[high] == key) // handle equal elements
            {
                swap(arr[right], arr[high]);
                right--;
                rightLen++;
            }
            high--;
        }
        arr[low] = arr[high];
        while (high > low && arr[low] <= key)
        {
            if (arr[low] == key)
            {
                swap(arr[left], arr[low]);
                left++;
                leftLen++;
            }
            low++;
        }
        arr[high] = arr[low];
    }
    arr[low] = key;
    // One partition pass ends here.
    // Move the elements equal to the pivot key next to the pivot's final position.
    int i = low - 1;
    int j = first;
    while (j < left && arr[i] != key)
    {
        swap(arr[i], arr[j]);
        i--;
        j++;
    }
    i = low + 1;
    j = last;
    while (j > right && arr[i] != key)
    {
        swap(arr[i], arr[j]);
        i++;
        j--;
    }
    QSort(arr, first, low - 1 - leftLen);
    QSort(arr, low + 1 + rightLen, last);
}
Test data analysis: the combination of median-of-three pivot selection, insertion sort, and gathering equal elements works surprisingly well.
Reason: if an array contains equal elements, gathering them saves a lot of redundant partitioning. This is particularly evident for arrays with many duplicates.
In fact, insertion sort contributes little here.
Optimization 3: optimize the recursion (use a loop in place of recursion to reduce stack usage)
The quicksort function ends with two recursive calls, and tail-recursion optimization can be applied to the second one.
Advantages: if the sequence to be sorted is split in an extremely unbalanced way, the recursion depth approaches n. Stack size is very limited, and every recursive call consumes some stack space; the more parameters the function has, the more space each call costs. After the optimization the stack depth can be reduced, in the best case from the original O(n) to O(log n), which improves performance.
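void QSort(int arr[], int low, int high)
{
    int pivotPos = -1;
    if (high - low + 1 < 10) // cutoff of 10 from Optimization 1
    {
        InsertSort(arr, low, high);
        return;
    }
    while (low < high)
    {
        pivotPos = Partition(arr, low, high);
        QSort(arr, low, pivotPos - 1); // recurse into the left part
        low = pivotPos + 1;            // loop over the right part instead of recursing
    }
}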
Note: after the first recursive call, the old value of low is no longer needed, so the second recursive call can be replaced by a loop.
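A common refinement, which is not part of the original code, is to recurse only into the smaller of the two parts and loop over the larger one. That keeps the recursion depth within roughly log2(n) levels even when the splits are unbalanced. The sketch below assumes a simple first-element partition, since the article's Partition function is not shown; PartitionSketch and QSortSmallerFirst are made-up names, and the depth is returned only so the effect can be observed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical partition helper: first element as pivot, returns its final index.
int PartitionSketch(std::vector<int>& arr, int low, int high)
{
    assert(low <= high);
    int key = arr[low];
    while (low < high)
    {
        while (low < high && arr[high] >= key) high--;
        arr[low] = arr[high];
        while (low < high && arr[low] <= key) low++;
        arr[high] = arr[low];
    }
    arr[low] = key;
    return low;
}

// Recurse into the smaller part, loop over the larger part.
// Returns the deepest level of nested recursive calls that was reached.
int QSortSmallerFirst(std::vector<int>& arr, int low, int high)
{
    int depth = 0;
    while (low < high)
    {
        int pivotPos = PartitionSketch(arr, low, high);
        if (pivotPos - low < high - pivotPos)
        {
            depth = std::max(depth, 1 + QSortSmallerFirst(arr, low, pivotPos - 1));
            low = pivotPos + 1;          // continue with the larger right part
        }
        else
        {
            depth = std::max(depth, 1 + QSortSmallerFirst(arr, pivotPos + 1, high));
            high = pivotPos - 1;         // continue with the larger left part
        }
    }
    return depth;
}
```

With this variant an already sorted array of 100 elements, the worst case for a first-element pivot, needs only a single level of nested recursion, because the part handed to the recursive call is always empty.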
Summary: the most efficient quicksort combination is median-of-three + insertion sort + gathering equal elements; in these tests it was more efficient than the sort function in the STL.
Optimization 4: process the sub-sequences in parallel or with multiple threads (omitted)
This article is reproduced from: http://blog.csdn.net/insistgogo/article/details/7785038 ("Three Kinds of Quicksort and Quicksort Optimizations")