"Algorithm (fourth edition)" Sorting-----high-speed sequencing

Source: Internet
Author: User
Tags benchmark comparable sorts

Reference article: http://ahalei.blog.51cto.com/4767671/1365285


1. Concept

Quicksort, as the name suggests, is fast. It sorts in place (it needs only a small auxiliary stack, not an auxiliary array), and the time it takes to sort an array of length N is proportional to N lg N on average.

Its disadvantage is that it is fragile: the implementation must get a few small details right (described below) to avoid errors.


2. Basic ideas:

Pick a number (usually the first element of the array) and move it to a position where every number to its left is no larger than it and every number to its right is no smaller. This splits the array into two subarrays. Split each subarray the same way, and keep going until the subarrays cannot be split any further. Quicksort is a classic application of divide and conquer (as is merge sort).


3. Differences between quicksort and merge sort:

(1) Merge sort divides the array into two subarrays, sorts them separately, and then merges the sorted subarrays to sort the whole array.

Quicksort rearranges the array so that, once the two subarrays are sorted, the whole array is sorted automatically.

(2) In merge sort, the recursive calls happen before the whole array is processed.

In quicksort, the recursive calls happen after the whole array is processed.
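The contrast in point (2) can be sketched in Java as follows. This is only an illustration on int arrays, not the book's code; the class name RecursionOrderDemo and its helper methods are made up for this example.

```java
import java.util.Arrays;

class RecursionOrderDemo {

    // Merge sort: recurse first, then process the whole range by merging.
    static void mergeSort(int[] a, int[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, aux, lo, mid);       // recursive calls come first
        mergeSort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);       // whole range is processed afterwards
    }

    // Merge the sorted halves a[lo..mid] and a[mid+1..hi].
    static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)              a[k] = aux[j++];
            else if (j > hi)          a[k] = aux[i++];
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else                      a[k] = aux[i++];
        }
    }

    // Quicksort: process the whole range by partitioning, then recurse.
    static void quickSort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);     // whole range is processed first
        quickSort(a, lo, j - 1);          // recursive calls come afterwards
        quickSort(a, j + 1, hi);
    }

    // Partition a[lo..hi] around v = a[lo]; return the final index of v.
    static int partition(int[] a, int lo, int hi) {
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        a[lo] = a[j];
        a[j] = v;
        return j;
    }

    public static void main(String[] args) {
        int[] x = {6, 1, 2, 7, 9, 3, 4, 5, 10, 8};
        int[] y = x.clone();
        mergeSort(x, new int[x.length], 0, x.length - 1);
        quickSort(y, 0, y.length - 1);
        System.out.println(Arrays.toString(x));
        System.out.println(Arrays.toString(y));
    }
}
```

Both methods have the same recursive shape; the only difference is whether the work on the full range (merge or partition) sits after or before the two recursive calls.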


4. Illustrative example

Suppose we are sorting the 10 numbers 6 1 2 7 9 3 4 5 10 8. First, pick a number in the sequence to serve as the pivot, also called the base number (don't be put off by the term; it is just a number used as a reference, and its purpose will become clear shortly). For convenience, take the first number, 6, as the pivot. Next, move every number larger than the pivot to the right of 6 and every number smaller than the pivot to the left of 6, giving an arrangement like this:

3 1 2 5 4 6 9 7 10 8

In the initial state, 6 is in the 1st position of the sequence. Our goal is to move 6 to some position k in the middle of the sequence such that, with position k as the dividing point, every number to its left is less than or equal to 6 and every number to its right is greater than or equal to 6.

Think about it: can you find a way to do that?
Here is a hint. Recall how bubble sort uses swaps to move each number into place step by step. Swaps can achieve the goal here too. But how exactly should the swaps be done, step by step, so that they are convenient and save time? Don't rush to read on; take out a pen and try it on paper.

When I first learned bubble sort in high school, I thought it wasted a lot of time, since it only ever compares two adjacent numbers, which is clearly inefficient. So I came up with another approach, and only later learned that it was quicksort. Please forgive my little bit of narcissism (^o^).

Start probing from both ends of the sequence 6 1 2 7 9 3 4 5 10 8: first search from right to left for a number less than 6, then from left to right for a number greater than 6, and exchange the two.

Use two variables i and j, pointing to the leftmost and rightmost ends of the sequence respectively. We give them the nicer names "Sentinel i" and "Sentinel j".

At the start, Sentinel i points to the leftmost number of the sequence (i = 1), which is 6, and Sentinel j points to the rightmost number (j = 10), which is 8.

Sentinel j sets out first. Because the pivot chosen here is the leftmost number, Sentinel j must move first. This is important (think about why). Sentinel j moves left one step at a time (j--) until it finds a number less than 6, and stops.

Next, Sentinel i moves right one step at a time (i++) until it finds a number greater than 6, and stops. In the end, Sentinel j stops at the number 5 and Sentinel i stops at the number 7.

Now exchange the values of the elements that Sentinel i and Sentinel j point to. The sequence after the exchange is:

6 1 2 5 9 3 4 7 10 8


This completes the first exchange.

Next, Sentinel j continues moving left (again, Sentinel j must move first each time).

It finds 4 (smaller than the pivot 6, so it qualifies) and stops. Sentinel i then continues moving right, finds 9 (larger than the pivot 6, so it qualifies) and stops.

The two exchange again, and the sequence becomes:

6 1 2 5 4 3 9 7 10 8

The second exchange is complete, and the probing continues.

Sentinel j continues moving left, finds 3 (smaller than the pivot 6, so it qualifies) and stops. Sentinel i continues moving right, but this time it runs into Sentinel j: both sentinels are now at 3.

This means the probing is over. Exchange the pivot 6 with 3. The sequence after the exchange is:

3 1 2 5 4 6 9 7 10 8

The first round of probing is now truly finished.

At this point the pivot 6 is the dividing point: every number to the left of 6 is less than or equal to 6, and every number to the right of 6 is greater than or equal to 6. Looking back at the process, Sentinel j's job is to find a number smaller than the pivot, Sentinel i's job is to find a number larger than the pivot, and they keep going until i and j meet.
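The round of probing just described can be sketched in Java roughly as follows. The class name SentinelPartition is made up for this example, and the code uses 0-based array indices, whereas the walkthrough counts positions from 1.

```java
import java.util.Arrays;

class SentinelPartition {

    // One round of probing on a[left..right], using a[left] as the pivot.
    // Returns the index where the pivot ends up.
    static int partition(int[] a, int left, int right) {
        int pivot = a[left];
        int i = left, j = right;
        while (i != j) {
            // Sentinel j goes first: move left until a number less than the pivot is found.
            while (a[j] >= pivot && i < j) j--;
            // Then Sentinel i moves right until a number greater than the pivot is found.
            while (a[i] <= pivot && i < j) i++;
            // Both stopped before meeting: exchange the two numbers.
            if (i < j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        // The sentinels met: put the pivot at the meeting point.
        a[left] = a[i];
        a[i] = pivot;
        return i;
    }

    public static void main(String[] args) {
        int[] a = {6, 1, 2, 7, 9, 3, 4, 5, 10, 8};
        int k = partition(a, 0, a.length - 1);
        System.out.println(k + " " + Arrays.toString(a));
        // prints: 5 [3, 1, 2, 5, 4, 6, 9, 7, 10, 8]
    }
}
```

Index 5 in the output is the 6th position of the sequence, matching the walkthrough.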


OK, that completes the explanation of the first round.

The pivot 6 is now in its final place, which happens to be the 6th position of the sequence. We have split the original sequence into two sequences with 6 as the dividing point: the left sequence is "3 1 2 5 4" and the right sequence is "9 7 10 8".

The next step is to process these two sequences separately, because both are still out of order. That is no problem, since we already know the method: simply apply the same procedure to the sequences to the left and right of 6. Let's deal with the sequence to the left of 6 first.

The left sequence is "3 1 2 5 4". Rearrange it with 3 as the pivot so that every number to the left of 3 is less than or equal to 3 and every number to the right of 3 is greater than or equal to 3.

All right, pick up your pen and try it.
If your simulation is correct, the sequence after the adjustment should be:

2 1 3 5 4

Now 3 is in its final place.

Next, process the sequence "2 1" to the left of 3 and the sequence "5 4" to its right. The sequence "2 1" is adjusted with 2 as the pivot, which gives "1 2" and puts 2 in its final place. The remaining sequence "1" has only one number and needs no processing. That completes the processing of "2 1", and the result is "1 2". The sequence "5 4" is processed the same way. The resulting sequence is:

1 2 3 4 5 6 9 7 10 8

The sequence "9 7 10 8" is processed by the same procedure, until no new subsequence can be split off. In the end you get this sequence:

1 2 3 4 5 6 7 8 9 10

The sort is now complete. The attentive reader may have noticed that each round of quicksort puts that round's pivot into its final place; once every number is in its final place, the sort is over.
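Putting the rounds together, the whole procedure might be written as one recursive Java method, as in the sketch below. It follows the style of the walkthrough (first element as pivot, Sentinel j moving first); the class name SentinelQuickSort is made up for this example.

```java
import java.util.Arrays;

class SentinelQuickSort {

    // Sort a[left..right] by repeatedly putting a pivot into its final place.
    static void quickSort(int[] a, int left, int right) {
        if (left >= right) return;   // zero or one number: nothing to process
        int pivot = a[left];
        int i = left, j = right;
        while (i != j) {
            while (a[j] >= pivot && i < j) j--;   // Sentinel j moves first
            while (a[i] <= pivot && i < j) i++;   // then Sentinel i
            if (i < j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        a[left] = a[i];              // the pivot goes to the meeting point
        a[i] = pivot;
        quickSort(a, left, i - 1);   // process the sequence left of the pivot
        quickSort(a, i + 1, right);  // process the sequence right of the pivot
    }

    public static void main(String[] args) {
        int[] a = {6, 1, 2, 7, 9, 3, 4, 5, 10, 8};
        quickSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
        // prints: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```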

The following diagram describes the whole process of the algorithm.

[Figure: the complete quicksort process on the example sequence]

Quicksort is faster than bubble sort because each exchange can jump across a long distance. Each round picks a pivot and puts the numbers less than or equal to it on its left and the numbers greater than or equal to it on its right. Unlike bubble sort, which can only exchange adjacent numbers, quicksort exchanges numbers that are much farther apart.

So the total number of comparisons and exchanges is smaller, and the speed naturally improves. Of course, in the worst case the exchanges may still be between adjacent numbers.

So the worst-case time complexity of quicksort is the same as bubble sort's, O(N^2), while its average time complexity is O(N log N). Quicksort is in fact based on the idea of divide and conquer, repeatedly splitting the problem in two.
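One way to see the gap between the worst case and the average case is to count comparisons. The sketch below is an illustration only: the class CompareCountDemo, its counter, and the fixed random seed are assumptions made for this example. It always uses the first element as the pivot, so an already sorted array is a worst-case input.

```java
import java.util.Random;

class CompareCountDemo {
    static long compares;

    // Compare two keys and count the comparison.
    static boolean less(int x, int y) {
        compares++;
        return x < y;
    }

    // Quicksort with the first element as the pivot.
    static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        a[lo] = a[j];
        a[j] = v;
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Sort a and return how many comparisons were made.
    static long countCompares(int[] a) {
        compares = 0;
        sort(a, 0, a.length - 1);
        return compares;
    }

    // The array 0, 1, ..., n-1 in ascending order.
    static int[] sortedInput(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    // The same keys in a pseudo-random order (Fisher-Yates shuffle).
    static int[] shuffledInput(int n, long seed) {
        int[] a = sortedInput(n);
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int r = rnd.nextInt(i + 1);
            int t = a[i]; a[i] = a[r]; a[r] = t;
        }
        return a;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("sorted input:   " + countCompares(sortedInput(n)));
        System.out.println("shuffled input: " + countCompares(shuffledInput(n, 42L)));
    }
}
```

On the sorted input every partition peels off just one element, so the count grows roughly as N^2/2; on the shuffled input it stays close to N log N. This is also why shuffling the array before sorting is a common safeguard.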


5. Detailed implementation:

    public class QuickSort {

        public static void quickSort(Comparable[] a) {
            // StdRandom.shuffle(a);  // Shuffle a to eliminate the dependence on the input order.
            //                        // StdRandom is a static library written by the authors of
            //                        // "Algorithms (Fourth Edition)".
            quickSort(a, 0, a.length - 1);
        }

        public static void quickSort(Comparable[] a, int lo, int hi) {
            if (hi <= lo) return;          // End of recursion: nothing left to split.
            int j = partition(a, lo, hi);  // Partition: put a[lo] into its final position, with
                                           // smaller elements on its left and larger ones on its right.
            quickSort(a, lo, j - 1);       // Sort the left part a[lo...j-1].
            quickSort(a, j + 1, hi);       // Sort the right part a[j+1...hi].
        }

        private static int partition(Comparable[] a, int lo, int hi) {
            // Partition the array into a[lo...j-1], a[j] and a[j+1...hi].
            int i = lo, j = hi + 1;        // Left and right scan pointers; j starts at hi + 1
                                           // because the scan below uses --j.
            Comparable v = a[lo];          // The partitioning element.
            while (true) {
                // Scan right until an element >= v is found or i reaches the right end.
                // ++i is used because a[lo] itself holds v.
                while (less(a[++i], v))
                    if (i == hi) break;
                // Scan left until an element <= v is found. The j == lo test is redundant,
                // because a[lo] == v always stops the scan; the i == hi test can be removed
                // only if an element no smaller than v is guaranteed at the right end.
                while (less(v, a[--j]))
                    if (j == lo) break;
                if (i >= j) break;         // i and j have met or crossed: leave the loop.
                exch(a, i, j);             // Here a[i] >= v and a[j] <= v, so exchange them;
                                           // i and j then keep moving toward the middle.
            }
            exch(a, lo, j);                // Put v into a[j], its final position.
            return j;                      // Now a[lo...j-1] <= a[j] <= a[j+1...hi].
        }

        // Helpers used above: compare two elements and exchange two array entries.
        private static boolean less(Comparable v, Comparable w) {
            return v.compareTo(w) < 0;
        }

        private static void exch(Comparable[] a, int i, int j) {
            Comparable t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }


6. Notes:

(1) Handling elements equal to the partitioning element: it is best for the left scan to stop at elements greater than or equal to the partitioning element and for the right scan to stop at elements less than or equal to it. This may cause some unnecessary exchanges of equal elements, but it avoids quadratic running time on some typical inputs, such as arrays with many duplicate keys.

(2) Terminating the recursion: the condition that ends the recursion must be guaranteed to hold eventually, otherwise the recursion never terminates.
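The effect described in note (1) can be checked with a small experiment. The sketch below is an assumption-laden illustration, not the book's code: the class DuplicateKeysDemo and the stopOnEqual flag are invented here to switch between scans that stop on equal keys and scans that skip over them, and the input is an array whose keys are all equal.

```java
class DuplicateKeysDemo {
    static long compares;

    // Decide whether a scan should keep moving, and count the comparison.
    // stopOnEqual = true : keep moving only while x < y (stop on equal keys)
    // stopOnEqual = false: keep moving while x <= y (skip over equal keys)
    static boolean keepScanning(int x, int y, boolean stopOnEqual) {
        compares++;
        return stopOnEqual ? x < y : x <= y;
    }

    static void sort(int[] a, int lo, int hi, boolean stopOnEqual) {
        if (hi <= lo) return;
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (keepScanning(a[++i], v, stopOnEqual)) if (i == hi) break;
            while (keepScanning(v, a[--j], stopOnEqual)) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        a[lo] = a[j];
        a[j] = v;
        sort(a, lo, j - 1, stopOnEqual);
        sort(a, j + 1, hi, stopOnEqual);
    }

    // Sort a and return how many comparisons were made.
    static long countCompares(int[] a, boolean stopOnEqual) {
        compares = 0;
        sort(a, 0, a.length - 1, stopOnEqual);
        return compares;
    }

    public static void main(String[] args) {
        int n = 512;
        int[] allEqual = new int[n];          // every key is 0
        System.out.println("stop on equal keys: " + countCompares(allEqual.clone(), true));
        System.out.println("skip equal keys:    " + countCompares(allEqual.clone(), false));
    }
}
```

When the scans stop on equal keys, the two pointers meet near the middle and the subarrays stay balanced. When they skip equal keys, one pointer runs to the far end, each partition removes only one element, and the number of comparisons becomes quadratic.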

7. Algorithm improvements

7.1 Switch to insertion sort

As with most recursive sorting algorithms, an easy way to improve quicksort's performance rests on two observations:

(1) For small subarrays, quicksort is slower than insertion sort

(2) Because it is recursive, quicksort's sort() method calls itself on small subarrays too

So small subarrays should be sorted with insertion sort instead. This needs only a one-line change to the algorithm. Replace this statement in sort()

    if (hi <= lo) return;

with:

    if (hi <= lo + M) { Insertion.sort(a, lo, hi); return; }

so that small subarrays are handed over to insertion sort.

The best value of the cutoff M depends on the system, but any value between 5 and 15 works well in most cases.
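A self-contained sketch of this change is shown below. It works on int arrays for brevity; the class name QuickWithCutoff, the choice M = 10, and the local insertionSort helper (standing in for Insertion.sort) are assumptions made for this example.

```java
import java.util.Arrays;

class QuickWithCutoff {
    static final int M = 10;   // cutoff to insertion sort; a value in the suggested 5 to 15 range

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int lo, int hi) {
        // Small subarray: finish it with insertion sort instead of recursing further.
        if (hi <= lo + M) {
            insertionSort(a, lo, hi);
            return;
        }
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Insertion sort restricted to a[lo..hi].
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            for (int j = i; j > lo && a[j] < a[j - 1]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }

    // Partition a[lo..hi] around v = a[lo]; return the final index of v.
    static int partition(int[] a, int lo, int hi) {
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        a[lo] = a[j];
        a[j] = v;
        return j;
    }

    public static void main(String[] args) {
        int[] a = {6, 1, 2, 7, 9, 3, 4, 5, 10, 8, 15, 11, 14, 12, 13, 0};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Whether a particular M actually helps depends on the machine and the data, so it is worth timing a few values rather than trusting a single setting.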


7.2 Median-of-three partitioning

A second way to improve quicksort's performance is to partition on the median of a small sample of elements taken from the subarray. This gives better partitions, at the cost of computing the median.

It turns out that a sample size of 3, partitioning on the middle one of the three sampled elements, works well. We can also place the sampled elements at the ends of the array as sentinels, which removes the array bounds tests from partition().
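A minimal sketch of median-of-three partitioning might look like this. It samples the first, middle, and last elements of the subarray, which is one common choice and an assumption here, and it keeps the bounds tests in partition() rather than using the sampled elements as sentinels. The class name MedianOfThreeQuick and the helper medianIndex are made up for this example.

```java
import java.util.Arrays;

class MedianOfThreeQuick {

    static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        // Move the median of a[lo], a[mid], a[hi] to a[lo], then partition as usual.
        swap(a, lo, medianIndex(a, lo, mid, hi));
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Return whichever of the indices i, j, k holds the median of the three values.
    static int medianIndex(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return a[i] < a[k] ? k : i;
        } else {
            if (a[i] < a[k]) return i;
            return a[j] < a[k] ? k : j;
        }
    }

    // Partition a[lo..hi] around v = a[lo]; return the final index of v.
    static int partition(int[] a, int lo, int hi) {
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};   // sorted input: bad for a first-element pivot
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```

On an already sorted array the median of the three samples is the middle element, so the partitions are balanced instead of degenerate.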



7.3 Entropy-optimal sorting

In practice, arrays with many duplicate elements are common; for example, we may need to sort a large personnel file by birthday or by gender. For such inputs the quicksort described so far leaves a lot of room for improvement.

A simple idea is to partition the array into three parts, holding the elements less than, equal to, and greater than the partitioning element. Doing this partitioning became a popular programming exercise known as the Dutch national flag problem, because it is like sorting an array with three possible key values, which correspond to the three colors of the Dutch flag.

Quicksort with 3-way partitioning works as follows:

It makes one pass through the array from left to right, maintaining a pointer lt such that the elements in a[lo...lt-1] are less than v, a pointer gt such that the elements in a[gt+1...hi] are greater than v, and a pointer i such that the elements in a[lt...i-1] are equal to v and the elements in a[i...gt] have not yet been examined.

(1) a[i] < v: exchange a[lt] with a[i], then increment both lt and i

(2) a[i] > v: exchange a[gt] with a[i], then decrement gt

(3) a[i] = v: increment i

Why lt++: lt marks the position of the first element equal to v, so a[lt] = v. An exchange with a[lt] happens only when a[i] < v; afterwards a[lt] holds a value less than v, which belongs to the left part, so lt moves forward.

Why i++: i is incremented in two cases. In the first, a[i] < v: the exchange brings the element equal to v from a[lt] into a[i], so a[i] is settled and i moves on to compare the next element with v. In the second, a[i] equals v, and i is simply incremented.

Why gt-- (and why i stays put): when a[i] > v, exchanging a[i] with a[gt] puts that larger value at position gt, so everything from gt to the right is now greater than v and gt moves back by one. The element that came from a[gt] now sits at a[i] and has not yet been compared with v, so i does not advance; the next compareTo(v) decides whether it is smaller or larger than v.


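The detailed implementation is as follows:

    public class Quick3way {

        // Entry point: sort the whole array.
        public static void sort(Comparable[] a) {
            sort(a, 0, a.length - 1);
        }

        private static void sort(Comparable[] a, int lo, int hi) {
            if (hi <= lo) return;
            int lt = lo, i = lo + 1, gt = hi;
            Comparable v = a[lo];
            while (i <= gt) {
                int cmp = a[i].compareTo(v);
                if (cmp < 0) {
                    exch(a, lt++, i++);
                } else if (cmp > 0) {
                    exch(a, i, gt--);
                } else {
                    i++;
                }
            }
            // Now a[lo...lt-1] < v = a[lt...gt] < a[gt+1...hi] holds.
            sort(a, lo, lt - 1);
            sort(a, gt + 1, hi);
        }

        // Exchange a[i] and a[j].
        private static void exch(Comparable[] a, int i, int j) {
            Comparable t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }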
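To watch a single 3-way partitioning pass in isolation, the sketch below runs it on an array with only three key values, as in the Dutch national flag problem. The class name DutchFlagDemo and the method partition3, which returns the final lt and gt, are made up for this example.

```java
import java.util.Arrays;

class DutchFlagDemo {

    // One 3-way partitioning pass over a[lo..hi] around v = a[lo].
    // Returns {lt, gt} with a[lo..lt-1] < v, a[lt..gt] == v, a[gt+1..hi] > v.
    static int[] partition3(int[] a, int lo, int hi) {
        int lt = lo, i = lo + 1, gt = hi;
        int v = a[lo];
        while (i <= gt) {
            if (a[i] < v)      swap(a, lt++, i++);  // smaller: move into the left part
            else if (a[i] > v) swap(a, i, gt--);    // larger: move into the right part, re-examine a[i]
            else               i++;                 // equal: leave it in the middle part
        }
        return new int[] {lt, gt};
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        // Three key values standing in for the three colors of the flag.
        int[] a = {1, 0, 2, 1, 1, 0, 2, 0, 1};
        int[] bounds = partition3(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a) + " lt=" + bounds[0] + " gt=" + bounds[1]);
        // prints: [0, 0, 0, 1, 1, 1, 1, 2, 2] lt=3 gt=6
    }
}
```

With only three distinct keys, one pass already sorts the array, and the recursive calls on a[lo...lt-1] and a[gt+1...hi] have nothing left to rearrange.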

[Figure: the exchange process during 3-way partitioning]

As this shows, when the array contains a large number of duplicate elements, quicksort with 3-way partitioning performs better.




"Algorithm (fourth edition)" Sorting-----high-speed sequencing
