Selection Algorithm Summary
Contents:
Selecting the maximum or minimum
Code for selecting the maximum or minimum
Selecting the maximum and minimum together
Code for selecting the maximum and minimum
Optimized code for selecting the maximum and minimum
Quick selection algorithm analysis
Quick selection algorithm implementation
Quick selection algorithm code optimization
BFPRT selection algorithm pivot selection
BFPRT selection algorithm performance analysis
BFPRT selection algorithm code implementation
Note: All the code in this article is here
Selection Algorithms
A selection algorithm is used to pick the k-th largest (or k-th smallest) element out of a data set. There are many ways to design one. For example, you can sort the data first and then read off the k-th element.
The average running time of that approach is O(n log n), which is neither slow nor fast. This article introduces algorithms that complete the selection in linear time, O(n).
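As a reference point for the rest of the article, the sort-then-pick approach can be sketched in a few lines. This is only an illustrative baseline: the function name `selectBySorting` and the 1-based rank convention are assumptions made here, not part of the article's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Baseline selection: sort a copy, then read off the k-th smallest element.
// k is 1-based (an assumption of this sketch), so k = 1 returns the minimum.
// The sort dominates the cost, giving O(n log n) running time.
int selectBySorting(std::vector<int> values, int k) {
    assert(k >= 1 && k <= static_cast<int>(values.size()));
    std::sort(values.begin(), values.end());  // values is a by-value copy
    return values[k - 1];
}
```

Because the vector is passed by value, the caller's data is left untouched, at the cost of one extra copy.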
Before getting into the main content, here are a few terms:
Order statistic: the k-th order statistic of a set of n elements is its k-th smallest element. For example, the minimum is the first order statistic and the maximum is the n-th order statistic.
Median: the median is the element in the middle of the sorted sequence. If the number of elements n is odd, the median is the (n + 1)/2-th order statistic. If n is even, there are two medians: the lower median, the n/2-th order statistic, and the upper median, the (n + 2)/2-th order statistic.
Selecting the maximum or minimum
Selecting the maximum or minimum of a sequence is the simplest case of selection: it picks the first or the last order statistic. You only need to traverse the array once and keep the largest or smallest value seen. The running time is Θ(n), so the maximum or minimum is found in linear time.
Code for selecting the maximum or minimum
The code below selects the minimum value in an array. Readers can write the version that selects the maximum themselves.
/**
 * Find the smallest element.
 * @param array     input array
 * @param arraySize number of elements in the array
 * @param minNumber output: the minimum value
 * @return index of the minimum in the array, or -1 on invalid input
 */
int findMin(int array[], int arraySize, int *minNumber)
{
    if (array == NULL || arraySize <= 0 || minNumber == NULL)
        return -1;
    // start from the first element so that minPos is always a valid index
    int minPos = 0;
    int minNumberTemp = array[0];
    for (int i = 0; i < arraySize; ++i) {
        if (array[i] < minNumberTemp) {
            minNumberTemp = array[i];
            minPos = i;
        }
    }
    *minNumber = minNumberTemp;
    return minPos;
}
Running result:
Input array is:
48 18 97 27 13 85 8 38 95 31
Find the min number 8 at pos 7
From the code we can see that the for loop runs n times and performs one comparison per iteration: if (array[i] < minNumberTemp). If the recorded minimum is greater than the current array element, the current element becomes the new recorded minimum. The code is simple enough that no further explanation is needed.
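For readers who want to compare against their own solution, the mirror-image search for the maximum might look like the sketch below. The name `findMax` and its signature are assumptions chosen to parallel `findMin`; the article itself leaves this function as an exercise.

```cpp
#include <cassert>

// Returns the index of the largest element and stores its value in *maxNumber.
// Returns -1 on invalid input. Hypothetical counterpart to findMin.
int findMax(const int array[], int arraySize, int* maxNumber) {
    if (array == nullptr || arraySize <= 0 || maxNumber == nullptr) return -1;
    int maxPos = 0;  // the first element is the initial candidate
    for (int i = 1; i < arraySize; ++i) {
        if (array[i] > array[maxPos]) maxPos = i;  // one comparison per element
    }
    assert(maxPos >= 0 && maxPos < arraySize);
    *maxNumber = array[maxPos];
    return maxPos;
}
```

The returned position is a 0-based array index.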
Selecting the maximum and minimum together
The requirement now changes: we need to select both the maximum and the minimum of a sequence. Previously we only needed the maximum or the minimum; now we want the maximum and the minimum at the same time.
When I first saw this problem, I compared it with selecting only the minimum. Isn't it the same thing? Just add a comparison against the maximum inside the loop? That does work. Let's look at part of the code.
Code for selecting the maximum and minimum
/**
 * Find the smallest and largest elements.
 * @param array     input array
 * @param arraySize number of elements in the array
 * @param minNumber output: the minimum value
 * @param maxNumber output: the maximum value
 * @return positions of the minimum and maximum in the array
 */
MinMaxPair findMinMax(int array[], int arraySize, int *minNumber, int *maxNumber)
{
    /* some code omitted */
    for (int i = 0; i < arraySize; ++i) {
        if (array[i] < minNumberTemp) {
            minNumberTemp = array[i];
            minPos = i;
        }
        if (array[i] > maxNumberTemp) {
            maxNumberTemp = array[i];
            maxPos = i;
        }
    }
    /* some code omitted */
}
Here two comparisons are performed in each loop iteration, so about 2n comparisons are made in total. Although this is still linear time, the constant factor is noticeably larger.
Optimized code for selecting the maximum and minimum
However, we can still optimize this algorithm. Take two elements at a time and first compare them with each other. Then compare the smaller one with the recorded minimum, replacing the minimum if the smaller one is less, and compare the larger one with the recorded maximum, replacing the maximum if the larger one is greater. The loop runs n/2 times with three comparisons per iteration, so about 3n/2 comparisons are made, which is 25% fewer than the code before optimization.
/** Records the positions of the minimum and maximum values in the array. */
class MinMaxPair
{
public:
    MinMaxPair(int _minPos = -1, int _maxPos = -1)
        : minPos(_minPos), maxPos(_maxPos) {}
    int minPos;  // position of the minimum value in the array (-1 if unset)
    int maxPos;  // position of the maximum value in the array (-1 if unset)
    bool operator==(const MinMaxPair &pair) const
    {
        return this->minPos == pair.minPos && this->maxPos == pair.maxPos;
    }
};

/** Find the maximum and minimum values in an array. */
MinMaxPair findMinMax(int array[], int arraySize, int *minNumber, int *maxNumber)
{
    if (array == NULL || arraySize <= 0 || minNumber == NULL || maxNumber == NULL)
        return MinMaxPair();
    /* odd number of elements: the first element is both the initial maximum and minimum */
    int maxNumberTemp = array[0];
    int minNumberTemp = array[0];
    int maxPos = 0;
    int minPos = 0;
    int i = 1;
    if (arraySize % 2 == 0) {
        /* even number of elements: take the first two, the larger as the
           initial maximum and the smaller as the initial minimum */
        i = 2;
        maxNumberTemp = array[0];
        minNumberTemp = array[1];
        maxPos = 0;
        minPos = 1;
        if (array[0] < array[1]) {
            maxNumberTemp = array[1];
            minNumberTemp = array[0];
            maxPos = 1;
            minPos = 0;
        }
    }
    for (; i < arraySize; i += 2) {
        /* take two elements each time */
        int temp1 = array[i];
        int temp2 = array[i + 1];
        int tempPos1 = i;
        int tempPos2 = i + 1;
        /* compare the two elements so that temp1 <= temp2 */
        if (array[i] > array[i + 1]) {
            temp1 = array[i + 1];
            temp2 = array[i];
            tempPos1 = i + 1;
            tempPos2 = i;
        }
        /* compare the smaller one with the recorded minimum */
        if (minNumberTemp > temp1) {
            minNumberTemp = temp1;
            minPos = tempPos1;
        }
        /* compare the larger one with the recorded maximum */
        if (maxNumberTemp < temp2) {
            maxNumberTemp = temp2;
            maxPos = tempPos2;
        }
    }
    // set the output values
    *maxNumber = maxNumberTemp;
    *minNumber = minNumberTemp;
    return MinMaxPair(minPos, maxPos);
}
Running result
Input array is:
69 72 82 53 61 35 43 74 83 76
Find the min number 35 at pos 6
Find the max number 83 at pos 9
In the code above, if the input length is odd, the first element is taken as the initial value of both the maximum and the minimum, and elements are then taken two at a time starting from the second element of the array. If the length is even, the first two elements are taken, the larger as the initial maximum and the smaller as the initial minimum, and elements are then taken two at a time starting from the third element.
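To see the 2n versus 3n/2 difference concretely, the two strategies can be instrumented to count element comparisons. The sketch below is an illustration only: `MinMaxResult`, `naiveMinMax`, and `pairwiseMinMax` are hypothetical names, and the naive scan here starts from the first element, so it makes 2(n - 1) comparisons rather than exactly 2n.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct MinMaxResult {
    int min;
    int max;
    long comparisons;  // number of element comparisons performed
};

// Straightforward scan: two comparisons for every element after the first.
MinMaxResult naiveMinMax(const std::vector<int>& a) {
    assert(!a.empty());
    MinMaxResult r{a[0], a[0], 0};
    for (std::size_t i = 1; i < a.size(); ++i) {
        ++r.comparisons;
        if (a[i] < r.min) r.min = a[i];
        ++r.comparisons;
        if (a[i] > r.max) r.max = a[i];
    }
    return r;
}

// Pairwise scan: three comparisons for every two elements.
MinMaxResult pairwiseMinMax(const std::vector<int>& a) {
    assert(!a.empty());
    MinMaxResult r{a[0], a[0], 0};
    std::size_t i = 1;
    if (a.size() % 2 == 0) {  // even length: seed min and max from the first pair
        ++r.comparisons;
        if (a[0] < a[1]) r.max = a[1]; else r.min = a[1];
        i = 2;
    }
    for (; i + 1 < a.size(); i += 2) {
        int small = a[i];
        int large = a[i + 1];
        ++r.comparisons;
        if (small > large) std::swap(small, large);
        ++r.comparisons;
        if (small < r.min) r.min = small;
        ++r.comparisons;
        if (large > r.max) r.max = large;
    }
    return r;
}
```

For the ten-element array 69 72 82 53 61 35 43 74 83 76, both functions report a minimum of 35 and a maximum of 83; the naive scan uses 18 comparisons and the pairwise scan uses 13.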
Quick selection algorithm
The maximum and minimum discussed so far are the extreme cases. What if we need the k-th order statistic? Is there a fast method? The first idea is to sort and then select, but the average running time of sorting is O(n log n). The quick selection algorithm is not that slow.
Quick selection algorithm analysis
Recall the quicksort algorithm (QUICKSORT) mentioned earlier. The quick selection algorithm borrows the idea of quicksort. Suppose we have an array A[left...right]:
1. First, select a pivot M in the array A.
2. Traverse the array from left to right. Elements larger than M are placed to the right of the pivot, and elements smaller than or equal to M are placed to its left. Let i be the position of the pivot in the array after this step. The pivot M is then the (i - left + 1)-th order statistic of A[left...right].
3. Compare the pivot's rank r = i - left + 1 with the target order statistic k. If k = r, return the pivot M directly. If k < r, update right = i - 1 and repeat from step 1. If k > r, update left = i + 1 and k = k - r, then repeat from step 1.
In this way we find the k-th order statistic. The expected running time of this procedure is O(n).
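The three steps above can be sketched as a short, self-contained function. This is a minimal illustration, not the article's implementation: the name `quickSelect`, the use of `std::vector`, the fixed random seed, and the choice of a random pivot (the element M in step 1) are all assumptions made here.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Returns the k-th smallest element of a (k is 1-based). Reorders a in place.
int quickSelect(std::vector<int>& a, int k) {
    assert(k >= 1 && k <= static_cast<int>(a.size()));
    std::mt19937 rng(12345);  // fixed seed (an assumption) keeps runs repeatable
    int left = 0;
    int right = static_cast<int>(a.size()) - 1;
    while (true) {
        // Step 1: choose a pivot at random and park it at the right end.
        std::uniform_int_distribution<int> pick(left, right);
        std::swap(a[pick(rng)], a[right]);
        const int pivot = a[right];
        // Step 2: move every element <= pivot to the left part.
        int i = left;
        for (int j = left; j < right; ++j) {
            if (a[j] <= pivot) std::swap(a[i++], a[j]);
        }
        std::swap(a[i], a[right]);  // the pivot now sits at its final index i
        // Step 3: compare the pivot's rank inside a[left..right] with k.
        const int rank = i - left + 1;
        if (k == rank) return a[i];
        if (k < rank) {
            right = i - 1;   // the target lies left of the pivot
        } else {
            k -= rank;       // skip the pivot and everything left of it
            left = i + 1;
        }
    }
}
```

Each pass discards the side of the partition that cannot contain the target, which is why only one side is ever revisited.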
Quick selection algorithm implementation
Here we implement the quick selection algorithm in two ways, one iterative and one recursive. Both implement the same idea; only the implementation style differs.
Recursive implementation
void randomizedSelectKernel(int array[], int leftBorder, int rightBorder, int *kthNumber, int k);

/**
 * Find the k-th order statistic (k-th smallest element) in the array.
 * @param array      input array
 * @param arraySize  array size
 * @param kthNumber  output: value of the k-th element
 * @param k          rank of the element to select, counted from 1
 */
void randomizedSelect(int array[], int arraySize, int *kthNumber, int k)
{
    // k is 1-based, so the valid range is 1..arraySize
    if (array == NULL || arraySize <= 0 || kthNumber == NULL || k < 1 || k > arraySize)
        return;
    randomizedSelectKernel(array, 0, arraySize - 1, kthNumber, k);
}

/**
 * Find the k-th element between leftBorder and rightBorder (recursive function).
 * @param array        input array
 * @param leftBorder   left boundary
 * @param rightBorder  right boundary
 * @param kthNumber    output: value of the k-th element
 * @param k            rank of the element within [leftBorder, rightBorder]
 */
void randomizedSelectKernel(int array[], int leftBorder, int rightBorder, int *kthNumber, int k)
{
    if (leftBorder > rightBorder)
        return;
    // partition as in quicksort; the last element is taken as the pivot
    int i = leftBorder - 1;
    int j = leftBorder;
    int x = array[rightBorder];
    for (; j < rightBorder; ++j) {
        // exchange(array, a, b) swaps array[a] and array[b]; it is not shown in this article
        if (array[j] <= x)
            exchange(array, j, ++i);
    }
    ++i;
    exchange(array, i, rightBorder);  // position i is now where the pivot belongs
    if (i == leftBorder + k - 1)
        *kthNumber = array[i];
    else if (i > leftBorder + k - 1)
        randomizedSelectKernel(array, leftBorder, i - 1, kthNumber, k);
    else
        randomizedSelectKernel(array, i + 1, rightBorder, kthNumber, k - (i - leftBorder + 1));
}
Running result
Input array is:
96 47 95 38 53 45 3 92 20 73
2th max number is -------- 20
3 20 45 38 47 53 73 92 96 95
1th max number is -------- 3
3 20 45 38 47 53 73 92 96 95
3th max number is -------- 38
3 20 38 45 47 53 73 92 96 95
6th max number is -------- 53
3 20 38 45 47 53 73 92 96 95
Iterative implementation
/**
 * Find the k-th order statistic (k-th smallest element) in the array.
 * @param array      input array
 * @param arraySize  array size
 * @param kthNumber  output: value of the k-th element
 * @param k          rank of the element to select, counted from 1
 */
void randomizedSelect(int array[], int arraySize, int *kthNumber, int k)
{
    // k is 1-based, so the valid range is 1..arraySize
    if (array == NULL || arraySize <= 0 || kthNumber == NULL || k < 1 || k > arraySize)
        return;
    int left = 0;
    int right = arraySize - 1;
    int kTemp = k;
    while (left <= right) {
        // partition as in quicksort; the last element is taken as the pivot
        int i = left - 1;
        int j = left;
        int x = array[right];
        for (; j < right; ++j) {
            if (array[j] <= x)
                exchange(array, ++i, j);
        }
        ++i;
        exchange(array, i, right);
        /* position i is now the pivot's position */
        if (i == kTemp + left - 1) {  // found the k-th element
            *kthNumber = array[i];
            return;
        } else if (i < kTemp + left - 1) {
            // this branch is garbled in the source; it is written here to
            // mirror the recursive version above
            kTemp -= (i - left + 1);
            left = i + 1;
        } else if (i > kTemp + left - 1) {
            right = i - 1;
        }
    }
}
Running result:
Input array is:
62 66 70 54 74 98 83 52 80 19
2th max number is -------- 52
19 52 54 62 74 98 83 70 80 66
1th max number is -------- 19
19 52 54 62 66 98 83 70 80 74
3th max number is -------- 54
19 52 54 62 66 98 83 70 80 74
6th max number is -------- 70
19 52 54 62 66 70 74 98 80 83
Quick selection algorithm code optimization
Careful readers may have noticed that as we run selection on the same array again and again, the data gradually becomes ordered. The input array at the beginning is 62 66 70 54 74 98 83 52 80 19. After four selections, the array has become 19 52 54 62 66 70 74 98 80 83, which is close to sorted. We know that sorted arrays are fatal to quicksort: without optimization, quicksort hits its worst-case running time, O(n^2), because ordered data leads to extremely unbalanced partitions.
The same holds for the quick selection algorithm, so we should avoid the extremely unbalanced partitions caused by ordered input. We therefore make the following optimization: before each partition, take three elements from the head, middle, and tail of the array, find the median of the three, and swap it with the last element of the array. This makes extremely unbalanced partitions much less likely, but it cannot rule them out completely. The algorithm that does rule them out is the BFPRT selection algorithm, explained later in this article.
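The quadratic behaviour on ordered input can be observed directly by counting comparisons. The sketch below is an assumption-laden illustration: `countComparisonsLastPivot` and `sortedInput` are hypothetical helpers, and the selection loop is a compact restatement of the last-element-pivot scheme rather than the article's exact code.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Counts element-versus-pivot comparisons made while selecting the k-th
// smallest value (k is 1-based) with the last element always used as pivot.
long countComparisonsLastPivot(std::vector<int> a, int k) {
    assert(k >= 1 && k <= static_cast<int>(a.size()));
    long comparisons = 0;
    int left = 0;
    int right = static_cast<int>(a.size()) - 1;
    while (left <= right) {
        const int pivot = a[right];
        int i = left;
        for (int j = left; j < right; ++j) {
            ++comparisons;
            if (a[j] <= pivot) std::swap(a[i++], a[j]);
        }
        std::swap(a[i], a[right]);
        const int rank = i - left + 1;  // rank of the pivot inside a[left..right]
        if (k == rank) break;
        if (k < rank) {
            right = i - 1;
        } else {
            k -= rank;
            left = i + 1;
        }
    }
    return comparisons;
}

// Builds the already sorted input 1, 2, ..., n.
std::vector<int> sortedInput(int n) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 1);
    return a;
}
```

Selecting the minimum (k = 1) from sorted input of length n removes only the pivot on each pass, so the count is n(n - 1)/2: 45 comparisons for n = 10 and 499500 for n = 1000.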
Balanced grouping
We add the following code before each iteration or recursion.
/* a lot of code omitted */
if (leftBorder > rightBorder)
    return;
// partition as in quicksort; choose the pivot by median-of-three to avoid the worst case
int mid = (leftBorder + rightBorder) / 2;
int midPos = rightBorder;
// the mid element is the median of the three
if ((array[leftBorder] > array[mid] && array[mid] > array[rightBorder]) ||
    (array[leftBorder] < array[mid] && array[mid] < array[rightBorder]))
    midPos = mid;
// the left element is the median of the three
else if ((array[mid] > array[leftBorder] && array[leftBorder] > array[rightBorder]) ||
         (array[mid] < array[leftBorder] && array[leftBorder] < array[rightBorder]))
    midPos = leftBorder;
if (midPos != rightBorder)
    exchange(array, midPos, rightBorder);
int i = leftBorder - 1;
int j = leftBorder;
/* a lot of code omitted */
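The same median-of-three step can also be packaged as standalone helpers, which makes it easier to test in isolation. The names `medianOfThreeIndex` and `moveMedianOfThreeToEnd` are hypothetical, and this sketch uses `<=` so that ties between the three sampled values are handled as well, which differs slightly from the strict comparisons in the fragment above.

```cpp
#include <cassert>
#include <utility>

// Returns the index (left, middle, or right) that holds the median of
// array[left], array[mid], and array[right].
int medianOfThreeIndex(const int array[], int left, int right) {
    assert(array != nullptr && left <= right);
    const int mid = left + (right - left) / 2;  // avoids overflow of left + right
    const int a = array[left];
    const int b = array[mid];
    const int c = array[right];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return left;
    return right;
}

// Swaps the median of the three samples into the last position, where a
// last-element-pivot partition expects to find its pivot.
void moveMedianOfThreeToEnd(int array[], int left, int right) {
    const int m = medianOfThreeIndex(array, left, right);
    if (m != right) std::swap(array[m], array[right]);
}
```

Calling `moveMedianOfThreeToEnd(array, leftBorder, rightBorder)` just before the partition loop would have the same effect as the inline fragment.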
BFPRT Selection Algorithm
As mentioned above, if the pivot is chosen badly, the partitions become extremely unbalanced, which greatly reduces the efficiency of the quick selection algorithm.
In 1973, Blum, Floyd, Pratt, Rivest, and Tarjan jointly published a paper titled "Time bounds for selection". It gives an algorithm that selects the k-th largest element in O(n) time, commonly known as the "median of medians" algorithm. The algorithm relies on a carefully designed pivot selection method: the median of the group medians is used as the pivot, which guarantees linear time complexity even in the worst case. This beats selection based on plain quicksort partitioning, whose complexity is O(n log n) on average and O(n^2) in the worst case.
BFPRT selection algorithm pivot selection
In fact, the most subtle part of this algorithm is how it finds the pivot. It finds a pivot that keeps the quicksort-style partition balanced.
1. Check whether the number of elements is greater than five. If so, go to step 2. Otherwise, sort the array; if the number of elements is odd, return the median, and if it is even, return the lower median.
2. Divide the array into groups of five elements each. At most one group has fewer than five elements.
3. Sort each group with insertion sort (insertion sort performs well when there are few elements).
4. Take the median of each group, which is the third of the five elements. For a group with fewer than five elements, take the median if the number of elements is odd and the lower median if it is even.
5. Collect the medians into a new array and go back to step 1 to find the median of this array of medians.
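The pivot-selection steps above can be sketched compactly on a copy of the data. This is an illustration under stated assumptions: the name `medianOfMedians` is hypothetical, `std::sort` stands in for the insertion sort named in the steps, and the function returns the pivot value rather than its index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the median-of-medians pivot value for the given elements.
// Works on a by-value copy, so the caller's data is not reordered.
int medianOfMedians(std::vector<int> values) {
    assert(!values.empty());
    while (values.size() > 5) {  // step 1: more than five elements
        std::vector<int> medians;
        for (std::size_t start = 0; start < values.size(); start += 5) {
            // steps 2 and 3: take the next group of up to five and sort it
            const std::size_t end = std::min(start + 5, values.size());
            std::sort(values.begin() + static_cast<std::ptrdiff_t>(start),
                      values.begin() + static_cast<std::ptrdiff_t>(end));
            // step 4: middle element, or the lower median for an even-sized group
            medians.push_back(values[start + (end - start - 1) / 2]);
        }
        values = std::move(medians);  // step 5: repeat on the array of medians
    }
    std::sort(values.begin(), values.end());
    return values[(values.size() - 1) / 2];  // lower median of at most five elements
}
```

For the input 75 84 30 35 77 60 75 32 64 2, the two groups have medians 75 and 60, and the lower median of those two values, 60, is returned.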
BFPRT selection algorithm performance analysis
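Each group has five elements, so after grouping the array we get $\lceil n/5 \rceil$ groups. Leaving out the group with fewer than five elements and the group that contains the pivot $x$, at least $3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6$ elements are greater than $x$. Similarly, at least $\frac{3n}{10} - 6$ elements are less than $x$. So even in the most unbalanced case the two sides split roughly 7:3, and the larger side has at most $\frac{7n}{10} + 6$ elements. This gives the recurrence below, where 140 is an arbitrarily chosen constant.
$$T(n) \le \begin{cases} O(1), & n < 140 \\ T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n), & n \ge 140 \end{cases}$$
Solving the recurrence shows that the running time is O(n).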
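One way to check the O(n) claim is the substitution method. The sketch below assumes that the O(n) term is bounded by $an$ for some constant $a > 0$ and guesses $T(n) \le cn$; the constants $a$ and $c$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
T(n) &\le c\lceil n/5 \rceil + c\,(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &= 9cn/10 + 7c + an \\
     &= cn + \left(-cn/10 + 7c + an\right)
\end{aligned}
```

The last expression is at most $cn$ whenever $-cn/10 + 7c + an \le 0$, that is, whenever $c \ge 10a \cdot \frac{n}{n - 70}$ for $n > 70$. For $n \ge 140$ we have $\frac{n}{n - 70} \le 2$, so any $c \ge 20a$ works, which is why a threshold such as 140 is convenient.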
BFPRT selection algorithm code implementation
/**
 * Locate the median of medians in the array.
 * @param array        input array
 * @param leftBorder   left boundary of the range
 * @param rightBorder  right boundary of the range
 * @return index of the median
 */
int BFPRT(int array[], int leftBorder, int rightBorder)
{
    if (array == NULL || leftBorder > rightBorder)
        return -1;
    const int arraySize = rightBorder - leftBorder + 1;
    // five or fewer elements: sort the range and return its lower median
    if (arraySize <= 5) {
        insertSort(array + leftBorder, arraySize);
        return leftBorder + (arraySize - 1) / 2;
    }
    // more than five elements: split the range into groups of five
    const int groupSize = 5;
    int *groupStart = array + leftBorder;
    int midCount = 0;
    // i is one past the end of the current full group
    for (int i = leftBorder + groupSize; i <= rightBorder + 1; i += groupSize) {
        insertSort(groupStart, groupSize);
        exchange(array, leftBorder + midCount, i - 3);  // move the group median to the front of the range
        ++midCount;
        groupStart += groupSize;
    }
    // sort the remaining group of fewer than five elements
    if (arraySize % groupSize != 0) {
        insertSort(groupStart, arraySize % groupSize);
        exchange(array, leftBorder + midCount,
                 leftBorder + arraySize - arraySize % groupSize + (arraySize % groupSize - 1) / 2);
        ++midCount;
    }
    // the selected medians now occupy the first midCount positions; find their median
    return BFPRT(array, leftBorder, leftBorder + midCount - 1);
}

/**
 * Select the k-th element.
 * @param array        input array
 * @param leftBorder   left boundary
 * @param rightBorder  right boundary
 * @param k            rank of the element to select
 * @param kthNumber    output: value of the k-th element
 */
void BFPRTselect(int array[], int leftBorder, int rightBorder, int k, int *kthNumber)
{
    if (array == NULL || leftBorder > rightBorder || kthNumber == NULL ||
        k > (rightBorder - leftBorder + 1))
        return;
    /* select the pivot */
    int index = BFPRT(array, leftBorder, rightBorder);
    if (index == -1)
        return;
    /* the rest of this function (the debug output, the partition around
       array[index], and the recursive call) is truncated in the source */
}
Running result:
Before sort the array:
75 84 30 35 77 60 75 32 64 2
Lefy-> 0 right-> 9 index: 0 midNumber: 75
Lefy-> 0 right-> 6 index: 1 midNumber: 32
Lefy-> 0 right-> 1 index: 1 midNumber: 30
2th max number is -------- 30
2 30 32 60 75 64 35 75 84 77
Lefy-> 0 right-> 9 index: 0 midNumber: 32
Lefy-> 0 right-> 1 index: 1 midNumber: 2
1th max number is -------- 2
2 30 32 60 75 64 35 75 84 77
Lefy-> 0 right-> 9 index: 0 midNumber: 32
3th max number is -------- 32
30 2 32 60 75 64 35 75 84 77
Lefy-> 0 right-> 9 index: 0 midNumber: 32
Lefy-> 3 right-> 9 index: 4 midNumber: 84
Lefy-> 3 right-> 8 index: 4 midNumber: 75
Lefy-> 3 right-> 6 index: 5 midNumber: 35
Lefy-> 4 right-> 6 index: 5 midNumber: 75
Lefy-> 4 right-> 5 index: 5 midNumber: 60
Lefy-> 5 right-> 5 index: 5 midNumber: 64
6th max number is -------- 64
2 30 32 35 60 64 75 77 84
After sort the array:
2 30 32 35 60 64 75 77 84