Algorithm 5: Quick Sort
Quicksort is probably the most widely used sorting algorithm. It is popular because it is easy to implement, works well for many kinds of input data, and is substantially faster than other sorting methods in typical applications. Its notable features are that it sorts in place (using only a small auxiliary stack) and that the time it requires to sort an array of length N is proportional to N lg N on average. None of the sorting algorithms we have studied combines these two advantages. In addition, quicksort has a shorter inner loop than most sorting algorithms, which makes it faster both in theory and in practice.
1. Basic Ideas
Quicksort is a divide-and-conquer sorting algorithm. It partitions an array into two subarrays and sorts the two parts independently. Quicksort and merge sort are complementary: merge sort splits the array into two subarrays, sorts them separately, and then merges the ordered subarrays to order the whole array; quicksort arranges the array so that once the two subarrays are ordered, the whole array is automatically ordered. In merge sort, the recursive calls happen before the whole array is processed; in quicksort, they happen after. In merge sort, the array is split exactly in half; in quicksort, the position of the split depends on the contents of the array.
2. Implementation
/**
 * Quicksort
 * @author huazhou
 */
public class Quick extends Model {
    public void sort(Comparable[] a) {
        StdRandom.shuffle(a);           // eliminate dependence on input order
        sort(a, 0, a.length - 1);
    }

    private void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) {
            return;
        }
        int j = partition(a, lo, hi);   // partition
        sort(a, lo, j - 1);             // sort the left half a[lo..j-1]
        sort(a, j + 1, hi);             // sort the right half a[j+1..hi]
    }

    /**
     * Quicksort partitioning
     * Partitions the array into a[lo..j-1], a[j], a[j+1..hi]
     */
    private int partition(Comparable[] a, int lo, int hi) {
        int i = lo, j = hi + 1;         // left and right scan pointers
        Comparable v = a[lo];           // partitioning element
        // scan right and left, check for the end of each scan, and exchange
        while (true) {
            while (less(a[++i], v)) {
                if (i == hi) {
                    break;
                }
            }
            while (less(v, a[--j])) {
                if (j == lo) {
                    break;
                }
            }
            if (i >= j) {
                break;
            }
            exch(a, i, j);
        }
        exch(a, lo, j);  // put v = a[j] into position
        return j;        // with a[lo..j-1] <= a[j] <= a[j+1..hi]
    }
}
Quicksort sorts the subarray a[lo..hi] as follows: it uses the partition() method to put a[j] into its proper position, then sorts the remaining elements with recursive calls.
The key to this method is segmentation. This process makes the array meet the following three conditions:
■ For some j, a[j] is already in its final position;
■ No element in a[lo] through a[j-1] is greater than a[j];
■ No element in a[j+1] through a[hi] is less than a[j].
The sort is completed by recursively applying this partitioning.
Because each partitioning pass puts one element into its final position, it is not hard to prove by induction that the recursion sorts the array correctly: if the left and right subarrays are sorted, then the result array, consisting of the left subarray (in order, with no element greater than the partitioning element), the partitioning element, and the right subarray (in order, with no element less than the partitioning element), must be sorted.
To complete the implementation, we need the partitioning method. The general strategy is to take a[lo] as the partitioning element, that is, the element that will go into its final position. We then scan from the left end of the array until we find an element greater than or equal to it, and scan from the right end until we find an element less than or equal to it. These two elements are clearly out of place, so we exchange them. Continuing in this way ensures that no element to the left of pointer i is greater than the partitioning element, and no element to the right of pointer j is less than it. When the two pointers meet, we just exchange the partitioning element a[lo] with the rightmost element of the left subarray, a[j], and return j. The partitioning method is as follows.
/**
 * Quicksort partitioning
 * Partitions the array into a[lo..j-1], a[j], a[j+1..hi]
 */
private int partition(Comparable[] a, int lo, int hi) {
    int i = lo, j = hi + 1;         // left and right scan pointers
    Comparable v = a[lo];           // partitioning element
    // scan right and left, check for the end of each scan, and exchange
    while (true) {
        while (less(a[++i], v)) {
            if (i == hi) {
                break;
            }
        }
        while (less(v, a[--j])) {
            if (j == lo) {
                break;
            }
        }
        if (i >= j) {
            break;
        }
        exch(a, i, j);
    }
    exch(a, lo, j);  // put v = a[j] into position
    return j;        // with a[lo..j-1] <= a[j] <= a[j+1..hi]
}
This code partitions according to the value v of a[lo]. The main loop exits when the pointers i and j meet. Within the loop, we increment i while a[i] is less than v and decrement j while a[j] is greater than v, then exchange a[i] and a[j] to maintain the invariant that no element to the left of i is greater than v and no element to the right of j is less than v. When the pointers meet, we exchange a[lo] with a[j] and partitioning is complete (leaving the partitioning value in a[j]).
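To see the partitioning step in isolation, here is a minimal standalone sketch using int keys instead of Comparable (my own simplification of the method above, with the same scan logic):

```java
public class PartitionDemo {
    // Partition a[lo..hi] around v = a[lo]; return the final index j of v.
    static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;   // left and right scan pointers
        int v = a[lo];            // partitioning element
        while (true) {
            while (a[++i] < v) if (i == hi) break;   // scan right
            while (v < a[--j]) if (j == lo) break;   // scan left
            if (i >= j) break;                       // pointers crossed
            int t = a[i]; a[i] = a[j]; a[j] = t;     // exchange out-of-place pair
        }
        int t = a[lo]; a[lo] = a[j]; a[j] = t;       // put v into position
        return j;
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2, 7};
        int j = partition(a, 0, a.length - 1);
        // prints "j=3, a[j]=5": everything left of index 3 is <= 5,
        // everything right of it is >= 5
        System.out.println("j=" + j + ", a[j]=" + a[j]);
    }
}
```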
2.1 Partitioning in place
Using an auxiliary array makes partitioning easy to implement, but the overhead of copying the partitioned array back may not be worth it. A novice Java programmer might even create a new empty array inside the recursive partitioning method, which would drastically slow down the sort.
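To make the tradeoff concrete, here is a sketch of what partitioning with an auxiliary array looks like (names and int keys are my own, not from the text): the logic is simpler than the in-place version, but every call pays for an extra allocation and a copy back.

```java
public class AuxPartition {
    // Partition a[lo..hi] around v = a[lo] using an auxiliary array.
    static int partitionWithAux(int[] a, int lo, int hi) {
        int v = a[lo];
        int[] aux = new int[hi - lo + 1];            // extra allocation per call
        int k = 0;
        for (int i = lo + 1; i <= hi; i++)           // elements smaller than v first
            if (a[i] < v) aux[k++] = a[i];
        int j = lo + k;                              // final position of v
        aux[k++] = v;
        for (int i = lo + 1; i <= hi; i++)           // then the remaining elements
            if (a[i] >= v) aux[k++] = a[i];
        System.arraycopy(aux, 0, a, lo, aux.length); // copy back: the extra cost
        return j;
    }
}
```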
2.2 Staying in bounds
If the partitioning element is the smallest or largest element in the array, we must take care that the scan pointers do not run off the ends of the array. The partition() implementation above includes explicit tests to guard against this. The test (j == lo) is actually redundant, because the partitioning element is at a[lo] and cannot be less than itself; with a similar arrangement at the right end of the array, both tests could be removed.
2.3 Preserving randomness
The initial shuffle puts the array in random order. Since the algorithm treats all items in the subarrays uniformly, all of its subarrays are also in random order. This fact is important for predicting the algorithm's running time. An alternative way to preserve randomness is to choose a random element for partitioning inside partition().
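That alternative (picking a random partitioning element inside partition() instead of shuffling the whole array once) can be sketched as follows; this is an int-key sketch with my own names, not code from the text:

```java
import java.util.Random;

public class RandomPivotQuick {
    private static final Random RAND = new Random();

    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private static int partition(int[] a, int lo, int hi) {
        // Swap a uniformly random element into a[lo], then partition as usual.
        int r = lo + RAND.nextInt(hi - lo + 1);
        swap(a, lo, r);
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Either approach gives the same probabilistic guarantee; the per-call random pivot avoids a full pass over the array up front.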
2.4 Terminating the loop
Properly testing whether the scan pointers have crossed is trickier than it might seem. The most common error here is failing to account for the fact that the array may contain other elements with the same value as the partitioning element.
2.5 Elements equal to the partitioning element
It is best to stop the left scan on an element greater than or equal to the partitioning element's value, and the right scan on an element less than or equal to it. Although this may result in exchanging some equal elements unnecessarily, in certain typical applications it prevents the running time from becoming quadratic.
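When the array contains many duplicate keys, a known further refinement in this direction is Dijkstra's three-way partitioning (the basis of "3-way quicksort"), which splits the array into elements less than, equal to, and greater than the pivot, so that equal keys never need to be recursed on. This is a hedged int-key sketch of that well-known variant, not code from this text:

```java
public class Quick3way {
    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, i = lo + 1, gt = hi;
        int v = a[lo];                       // pivot value
        while (i <= gt) {
            if (a[i] < v)      swap(a, lt++, i++);  // smaller: grow left region
            else if (a[i] > v) swap(a, i, gt--);    // larger: grow right region
            else               i++;                 // equal: leave in the middle
        }
        sort(a, lo, lt - 1);   // a[lo..lt-1] < v
        sort(a, gt + 1, hi);   // a[gt+1..hi] > v
        // a[lt..gt] == v is already in final position
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```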
2.6 Terminating the recursion
We must also make sure the recursion always terminates. A typical bug in a quicksort implementation is failing to ensure that the partitioning element is always put into position; this can send the program into an infinite recursive loop when the partitioning element happens to be the smallest or largest element of the subarray.
3. Algorithm Analysis
Proposition: Quicksort uses ~2N ln N compares (and one-sixth that many exchanges) on average to sort an array of length N with distinct keys.
Proof: Let C_N be the average number of compares needed to sort N distinct elements. Clearly C_0 = C_1 = 0, and for N > 1 the recursive program implies the following recurrence:
C_N = N + 1 + (C_0 + C_1 + ... + C_{N-2} + C_{N-1})/N + (C_{N-1} + C_{N-2} + ... + C_0)/N
The first term is the cost of partitioning (always N + 1), the second is the average cost of sorting the left subarray (whose length may be anywhere from 0 to N - 1), and the third is the average cost of sorting the right subarray (the same as for the left). Multiplying both sides by N and collecting terms gives:
N·C_N = N(N + 1) + 2(C_0 + C_1 + ... + C_{N-2} + C_{N-1})
Subtracting the same equation for N - 1 gives:
N·C_N - (N - 1)·C_{N-1} = 2N + 2·C_{N-1}
Rearranging and dividing both sides by N(N + 1) gives:
C_N/(N + 1) = C_{N-1}/N + 2/(N + 1)
By induction we obtain:
C_N ~ 2(N + 1)(1/3 + 1/4 + ... + 1/(N + 1))
The quantity in parentheses is a discrete estimate of the area under the curve 2/x from 3 to N + 1; evaluating the integral gives C_N ~ 2N ln N. Note that 2N ln N ≈ 1.39 N lg N, which means the average number of compares is only about 39% higher than in the best case.
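As a sanity check on the ~2N ln N result, one can instrument the compares in a plain quicksort and measure the count against 2N ln N. This is a small harness of my own (exact counts vary from run to run, but for large N the ratio should be close to 1):

```java
import java.util.Random;

public class CompareCount {
    static long compares = 0;

    static boolean less(int x, int y) { compares++; return x < y; }

    static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        int t = a[lo]; a[lo] = a[j]; a[j] = t;
        return j;
    }

    public static void main(String[] args) {
        int n = 100_000;
        int[] a = new int[n];
        Random rnd = new Random();
        for (int i = 0; i < n; i++) a[i] = rnd.nextInt();  // random keys ~ random order
        sort(a, 0, n - 1);
        double predicted = 2.0 * n * Math.log(n);
        System.out.printf("compares = %d, 2N ln N = %.0f, ratio = %.3f%n",
                          compares, predicted, (double) compares / predicted);
    }
}
```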
4. Summary
Despite its many advantages, the basic quicksort implementation has one potential flaw: it can be extremely inefficient when the partitions are unbalanced. For example, if the first partition is around the smallest element and the second around the second smallest, each call removes only one element, so a large subarray gets partitioned many times. The main reason for randomly shuffling the array before sorting is to avoid this situation: the shuffle makes bad partitions so unlikely that we need not worry about them.
In general, we can be certain that for an array of N distinct elements, the running time of quicksort is within a constant factor of 1.39 N lg N. Merge sort offers the same guarantee, but quicksort is usually faster (even though it makes 39% more compares) because it moves data far less often. These guarantees are probabilistic, but you can depend on them.
5. Algorithm Improvement
If your sorting code will be executed many times or used on large arrays (in particular, if it is released as a library function, where the properties of the arrays to be sorted are unknown), the following suggestions are worth considering. Note that you should run experiments to determine the effectiveness of each improvement and to choose the best parameters for your implementation. Typically they improve performance by 20% to 30%.
5.1 Switch to insertion sort
As with most recursive sorting algorithms, a simple way to improve quicksort's performance is based on the following two observations:
■ For tiny arrays, quicksort is slower than insertion sort;
■ Being recursive, quicksort's sort() is certain to call itself on tiny subarrays.
Therefore, we should switch to insertion sort when sorting tiny subarrays. A simple change to the algorithm accomplishes this: replace the statement in sort()
if (hi <= lo) return;
with the following statement, which sorts small subarrays by insertion sort:
if (hi <= lo + M) { Insertion.sort(a, lo, hi); return; }
The optimal value of the cutoff parameter M is system-dependent, but any value between 5 and 15 works well in most situations.
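Putting the suggestion together, a quicksort with the insertion-sort cutoff might look like the sketch below. M = 10 is one assumed choice from the suggested range, and the insertion sort is inlined here since the Insertion class itself is not shown in this text:

```java
public class QuickWithCutoff {
    private static final int M = 10;   // cutoff to insertion sort (assumed value)

    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo + M) {            // small subarray: finish with insertion sort
            insertionSort(a, lo, hi);
            return;
        }
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int k = i; k > lo && a[k] < a[k - 1]; k--)
                swap(a, k, k - 1);
    }

    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Note that the base case `hi <= lo + M` also covers empty and one-element subarrays, so no separate `hi <= lo` test is needed.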