Algorithms (4th Edition), sorting: quicksort

**1. Concept**

Quicksort: as the name suggests, it is fast. It sorts in place (it needs only a small auxiliary stack, not an auxiliary array), and the time it takes to sort an array of length N is proportional to N lg N on average.

Its disadvantage is that it is fragile: you must pay attention to a few small details in the implementation (described below) to avoid errors and poor performance.

**2. Basic Ideas:**

Pick an element as the pivot (usually just the first element of the array) and move it to a position where everything to its left is no larger than it and everything to its right is no smaller. This splits the array into two subarrays, which are then split into smaller subarrays in the same way until they cannot be split any further. It is a classic application of divide and conquer (as is merge sort).

**3. Differences between quicksort and merge sort:**

(1) Merge sort splits the array into two subarrays, sorts them separately, and then merges the sorted subarrays to order the whole array;

Quicksort rearranges the array so that once both subarrays are sorted, the whole array is automatically sorted.

(2) In merge sort, the recursive calls happen before the whole array is processed.

In quicksort, the recursive calls happen after the whole array is processed.
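Point (2) is easiest to see side by side. The following is a minimal sketch on int arrays; the class name RecursionOrderDemo and its helper methods are illustrative and not taken from the book. In mergeSort the two recursive calls come first and the merge does the work afterwards; in quickSort the partition does the work first and the recursive calls come last.

```java
import java.util.Arrays;

class RecursionOrderDemo {

    // Merge sort: recurse on both halves first, then process the whole range (merge).
    static void mergeSort(int[] a, int[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, aux, lo, mid);
        mergeSort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    // Merges the sorted ranges a[lo..mid] and a[mid+1..hi].
    static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) a[k] = aux[j++];
            else if (j > hi) a[k] = aux[i++];
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else a[k] = aux[i++];
        }
    }

    // Quicksort: process the whole range first (partition), then recurse on both parts.
    static void quickSort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        quickSort(a, lo, j - 1);
        quickSort(a, j + 1, hi);
    }

    // Moves a[lo] to its final position and returns that position.
    static int partition(int[] a, int lo, int hi) {
        int v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {6, 1, 2, 7, 9, 3, 4, 5, 10, 8};
        int[] b = a.clone();
        mergeSort(a, new int[a.length], 0, a.length - 1);
        quickSort(b, 0, b.length - 1);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.toString(b));
    }
}
```

Both calls print the same sorted sequence; only the order of "work" and "recursion" differs.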

**4. Examples**

Suppose we are sorting the 10 numbers "6 1 2 7 9 3 4 5 10 8". First, pick a number in the sequence as the pivot (don't be put off by the term; it is just a reference number, and you will soon see what it is for). For convenience, let the first number, 6, be the pivot. Next, we need to put every number greater than the pivot to the right of 6 and every number smaller than the pivot to the left of 6, giving an arrangement like this:

3 1 2 5 4 6 9 7 10 8

In the initial state, 6 is in the 1st position of the sequence. Our goal is to move 6 to some position in the middle, call it k, and use position k as the dividing point: the numbers to its left are less than or equal to 6 and the numbers to its right are greater than or equal to 6. Think about it: can you find a way to do this?

Here is a hint. Recall how bubble sort puts each number in place step by step through exchanges. Here, too, the goal can be reached by exchanging. But how exactly should we exchange, step by step, so that it is both quick and convenient? Don't rush to read on; take out a pen and sketch it on paper. When I first learned bubble sort in high school, I felt it wasted time: it only ever compares two adjacent numbers, which seemed unreasonable. So I came up with a method of my own, and only later found out that it was "quicksort". Please allow me a little narcissism (^o^).

The method is actually simple: start "probing" from both ends of the initial sequence "6 1 2 7 9 3 4 5 10 8". First find a number smaller than 6 scanning from right to left, then find a number greater than 6 scanning from left to right, and exchange the two. We can use two variables i and j that point to the leftmost and rightmost positions of the sequence. Let's give them nice names: "sentinel i" and "sentinel j". At the start, sentinel i points to the leftmost position (i = 1), the number 6, and sentinel j points to the rightmost position (j = 10), the number 8.

Sentinel j sets off first. Because the pivot chosen here is the leftmost number, it is important that sentinel j moves first (think about why). Sentinel j moves left one step at a time (j--) until it finds a number smaller than 6. Then sentinel i moves right one step at a time (i++) until it finds a number greater than 6. In the end, sentinel j stops at the number 5 and sentinel i stops at the number 7.

Now exchange the values that sentinels i and j point to. The sequence after the exchange is:

6 1 2 5 9 3 4 7 10 8

At this point the first exchange is done. Next, sentinel j continues moving left (a reminder: sentinel j must go first every time). It finds 4 (smaller than the pivot 6, as required) and stops. Sentinel i continues moving right, finds 9 (larger than the pivot 6, as required), and stops. The two values are exchanged, and the sequence becomes:

6 1 2 5 4 3 9 7 10 8

The second exchange is done, and the probing continues. Sentinel j keeps moving left, finds 3 (smaller than the pivot 6, as required), and stops. Sentinel i keeps moving right, and now it meets sentinel j: both i and j are at 3. This means the probing is over. We exchange the pivot 6 with 3, and the sequence becomes:

3 1 2 5 4 6 9 7 10 8

The first round of probing is now truly finished. With the pivot 6 as the dividing point, every number to the left of 6 is less than or equal to 6, and every number to its right is greater than or equal to 6. To recap: sentinel j's mission is to find a number smaller than the pivot, sentinel i's mission is to find a number larger than the pivot, and this continues until i and j meet.

The pivot 6 is now in its final place, which happens to be the 6th position of the sequence. We have split the original sequence at 6 into two sequences: "3 1 2 5 4" on the left and "9 7 10 8" on the right. Both are still unordered, so we need to process them separately. That is no problem: we already know the method and simply repeat it on each side of 6. Let's process the left one first.

The left sequence is "3 1 2 5 4". Rearrange it with 3 as the pivot so that the numbers to the left of 3 are less than or equal to 3 and the numbers to the right of 3 are greater than or equal to 3. Go ahead and try. If your simulation is correct, the sequence after the adjustment should be:

2 1 3 5 4

OK, now 3 is in place. Next we process the sequence "2 1" to the left of 3 and the sequence "5 4" to its right. For "2 1", take 2 as the pivot; after processing, the sequence is "1 2", and 2 is in place. The remaining sequence "1" has only one number and needs no processing. That completes "2 1", giving "1 2". The sequence "5 4" is processed the same way. The result is:

1 2 3 4 5 6 9 7 10 8

The sequence "9 7 10 8" is handled with the same process until no new subsequence can be split. The final result is:

1 2 3 4 5 6 7 8 9 10

The sort is complete. Attentive readers may have noticed that each round of quicksort really just puts that round's pivot in its final place; once every number is in place, the sort is done.

[Figure: the processing of the whole algorithm on this sequence]
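The sentinel procedure from this walkthrough can be sketched in Java as follows. This is an illustration under a few assumptions: the class name SentinelPartitionDemo and its methods are hypothetical, indices are 0-based (so the walkthrough's i = 1 and j = 10 become 0 and 9), and the pivot is always the leftmost element, which is why sentinel j moves first.

```java
import java.util.Arrays;

class SentinelPartitionDemo {

    // One round of probing on a[lo..hi] with pivot a[lo].
    // Returns the final position of the pivot.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo, j = hi;
        while (i < j) {
            while (i < j && a[j] >= pivot) j--;  // sentinel j looks for a number smaller than the pivot
            while (i < j && a[i] <= pivot) i++;  // sentinel i looks for a number larger than the pivot
            if (i < j) swap(a, i, j);            // both stopped before meeting: exchange
        }
        swap(a, lo, i);                          // i == j: put the pivot in its final place
        return i;
    }

    // Repeats the round on the sequences left and right of the pivot.
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int k = partition(a, lo, hi);
        sort(a, lo, k - 1);
        sort(a, k + 1, hi);
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {6, 1, 2, 7, 9, 3, 4, 5, 10, 8};
        int k = partition(a, 0, a.length - 1);
        System.out.println(k + " " + Arrays.toString(a));  // 5 [3, 1, 2, 5, 4, 6, 9, 7, 10, 8]
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));            // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

After the first round the pivot 6 sits at index 5, the 6th position, matching the hand simulation.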

Quicksort is faster because, compared with bubble sort, each exchange is a jump. Each round sets a pivot and places all numbers smaller than or equal to it on its left and all numbers greater than or equal to it on its right. Unlike bubble sort, which only ever exchanges adjacent numbers, the exchanges cover much larger distances, so the total number of comparisons and exchanges drops and the speed naturally goes up. Of course, in the worst case quicksort may still end up exchanging adjacent numbers. Therefore the worst-case time complexity of quicksort is O(N^2), the same as bubble sort, while its average time complexity is O(N log N). In fact, quicksort is based on the idea of repeatedly dividing the problem in two.
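The two bounds can be made plausible by counting compares. This is a rough sketch, assuming that partitioning a subarray of length N costs about N compares and taking C_0 = 0 and C_1 = 0 as base cases. In the worst case every round peels off only the pivot; in the best case every round splits the array exactly in half:

$$
C_N^{\text{worst}} = N + C_{N-1} = N + (N-1) + \dots + 1 = \frac{N(N+1)}{2} \sim \frac{N^2}{2}
$$

$$
C_N^{\text{best}} = 2\,C_{N/2} + N \quad\Longrightarrow\quad C_N^{\text{best}} \sim N \lg N
$$

The average case lies close to the best case, which is why the average running time is also proportional to N log N.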

**5. The specific implementation is as follows:**

    public class QuickSort {

        public static void quickSort(Comparable[] a) {
            // StdRandom.shuffle(a); // shuffle a to eliminate dependence on the input order;
            //                       // StdRandom is a helper class written by the authors of Algorithms, 4th Edition
            quickSort(a, 0, a.length - 1);
        }

        public static void quickSort(Comparable[] a, int lo, int hi) {
            if (hi <= lo) return;          // base case that ends the recursion
            int j = partition(a, lo, hi);  // move a[lo] to its final position j: smaller on the left, larger on the right
            quickSort(a, lo, j - 1);       // sort the left part a[lo..j-1]
            quickSort(a, j + 1, hi);       // sort the right part a[j+1..hi]
        }

        private static int partition(Comparable[] a, int lo, int hi) {
            // split the array into a[lo..j-1], a[j] and a[j+1..hi]
            int i = lo, j = hi + 1;        // left and right scan pointers; j = hi + 1 because of the --j below
            Comparable v = a[lo];          // partitioning element
            while (true) {
                // scan right until an element >= v is found or i reaches hi;
                // ++i because a[lo] is v itself and needs no check
                while (less(a[++i], v)) if (i == hi) break;
                // scan left until an element <= v is found; the j == lo test is
                // redundant, because a[lo] == v is never less than itself
                while (less(v, a[--j])) if (j == lo) break;
                if (i >= j) break;         // the pointers have crossed: the scan is complete
                exch(a, i, j);             // here a[i] >= v and a[j] <= v, so exchange them
            }
            exch(a, lo, j);                // put v into its final position
            return j;
        }

        private static boolean less(Comparable v, Comparable w) {
            return v.compareTo(w) < 0;
        }

        private static void exch(Comparable[] a, int i, int j) {
            Comparable t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

**6. Note:**

(1) **Handling elements equal to the partitioning element:** It is better for the left scan to stop at any element >= the partitioning element and for the right scan to stop at any element <= the partitioning element. This may exchange some equal elements unnecessarily, but it keeps the running time from becoming quadratic when the array contains many duplicates.

(2) **Terminating the recursion:** Make sure the recursion always has a terminating condition (here, if (hi <= lo) return;); otherwise it will recurse endlessly.

**7. Algorithm Improvement**

7.1 Switching to insertion sort

As with most recursive sorting algorithms, a simple way to improve quicksort's performance is based on the following two observations:

(1) For small arrays, quicksort is slower than insertion sort

(2) Because it is recursive, quicksort's sort() method also calls itself on small subarrays.

Therefore, switch to insertion sort for small subarrays. This takes a one-line change to the algorithm: replace the statement in sort()

if (hi <= lo) return;

with: if (hi <= lo + M) { Insertion.sort(a, lo, hi); return; }

This way, small subarrays are handed over to insertion sort.

The optimal value of the cutoff M is system-dependent, but any value between 5 and 15 works well in most cases.
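A self-contained sketch of this change on int arrays might look like the following. The class name QuickWithCutoff is hypothetical, insertionSort stands in for Insertion.sort(a, lo, hi), and M = 10 is simply an assumed value inside the 5 to 15 range, not a tuned one.

```java
import java.util.Arrays;

class QuickWithCutoff {
    // Cutoff to insertion sort (assumed value; tune it for your system).
    static final int M = 10;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int lo, int hi) {
        if (hi <= lo + M) {            // small subarray: hand it over to insertion sort
            insertionSort(a, lo, hi);
            return;
        }
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Sorts a[lo..hi] by insertion.
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && a[j] < a[j - 1]; j--)
                swap(a, j, j - 1);
    }

    // Moves a[lo] to its final position and returns that position.
    static int partition(int[] a, int lo, int hi) {
        int v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {6, 1, 2, 7, 9, 3, 4, 5, 10, 8, 15, 11, 14, 12, 13, 0};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Because the cutoff test also covers hi <= lo, it replaces the original base case rather than adding a second one.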

7.2 Median-of-three partitioning

The second way to improve quicksort's performance is to partition on the median of a small sample of elements taken from the subarray. This gives better partitions, at the cost of computing the median. It turns out that a sample of size 3, partitioning on the middle-sized element, works best. We can also place the sample elements at the ends of the array as sentinels, which removes the array-bounds tests in partition().
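As an illustration, here is one possible sketch that samples three elements (first, middle, last) of each subarray. The class MedianOfThreeDemo and the helper medianOf3 are hypothetical names. For simplicity the sketch moves the median to a[lo] and keeps the bounds tests in partition; it does not implement the sentinel trick.

```java
import java.util.Arrays;

class MedianOfThreeDemo {

    // Returns whichever of the indices lo, mid, hi holds the median of the three values.
    static int medianOf3(int[] a, int lo, int mid, int hi) {
        if (a[lo] < a[mid]) {
            if (a[mid] < a[hi]) return mid;
            return (a[lo] < a[hi]) ? hi : lo;
        } else {
            if (a[hi] < a[mid]) return mid;
            return (a[hi] < a[lo]) ? hi : lo;
        }
    }

    static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int m = medianOf3(a, lo, lo + (hi - lo) / 2, hi);
        swap(a, lo, m);                 // use the sample median as the partitioning element
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Moves a[lo] to its final position and returns that position.
    static int partition(int[] a, int lo, int hi) {
        int v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};  // already sorted: bad for a first-element pivot
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```

On already-sorted input, a first-element pivot produces the most lopsided partitions, while the sample median splits each subarray near the middle.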

7.3 Entropy-optimal sorting

In practice, arrays with large numbers of duplicate keys come up often. For example, we may need to sort a large personnel file by birthday or by gender. In such cases our quicksort still has a lot of room for improvement.

A simple idea is to partition the array into three parts, holding the elements smaller than, equal to, and greater than the partitioning element. This is the classic programming exercise known as the Dutch National Flag problem, so named because it is like sorting an array with three possible key values, which correspond to the three colors of the Dutch flag.

The idea of quicksort with three-way partitioning is as follows:

It makes one pass through the array from left to right, maintaining a pointer lt such that the elements of a[lo..lt-1] are smaller than v, a pointer gt such that the elements of a[gt+1..hi] are greater than v, and a pointer i such that the elements of a[lt..i-1] are equal to v, while the elements of a[i..gt] have not been examined yet.

(1) a[i] < v: exchange a[lt] and a[i], then increment both lt and i

(2) a[i] > v: exchange a[gt] and a[i], then decrement gt

(3) a[i] = v: increment i

**Why lt++:** lt marks the start of the run of elements equal to v (a[lt] = v). The exchange happens only when a[i] < v; afterwards one more smaller element sits in front of the run, so the run starts one position later and lt is incremented.

**Why i++:** i is incremented in two cases. In the first, a[i] < v: the exchange brings the old a[lt], which equals v, into position i, so that position needs no further examination and i moves on to compare the next a[i] with v. In the second, a[i] = v, and i simply moves on.

**Why gt-- (and no i++):** a[gt] and a[i] are exchanged because a[i] > v, so a[i] belongs after v and is simply sent to the back, where every element must be greater than v. The element brought in from a[gt], however, has not been examined yet, so it is placed at position i and compared with v in the next iteration (a[i].compareTo(v)). The element now at a[gt] is known to be greater than v and never needs to be touched again, so gt is decremented.

The implementation is as follows:

    public class Quick3Way {

        private static void sort(Comparable[] a, int lo, int hi) {
            if (hi <= lo) return;
            int lt = lo, i = lo + 1, gt = hi;
            Comparable v = a[lo];
            while (i <= gt) {
                int cmp = a[i].compareTo(v);
                if (cmp < 0) {
                    exch(a, lt++, i++);
                } else if (cmp > 0) {
                    exch(a, i, gt--);
                } else {
                    i++;
                }
            }
            // now a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi] holds
            sort(a, lo, lt - 1);
            sort(a, gt + 1, hi);
        }

        // exchange a[i] and a[j]
        private static void exch(Comparable[] a, int i, int j) {
            Comparable t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

[Figure: trace of the exchanges during three-way partitioning]

As this shows, when the array contains a large number of duplicate keys, quicksort with three-way partitioning performs better.
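One way to check this claim is to count key comparisons. The sketch below is an illustration only: the class DuplicateKeysDemo, the int-array versions of both sorts, and the test data (10,000 elements drawn from just 3 distinct keys, fixed random seed) are all assumptions made for the demonstration.

```java
import java.util.Random;

class DuplicateKeysDemo {
    static long compares;  // number of key comparisons made so far

    // Standard (two-way) quicksort on a[lo..hi].
    static void quick(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int v = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        quick(a, lo, j - 1);
        quick(a, j + 1, hi);
    }

    // Quicksort with three-way partitioning on a[lo..hi].
    static void quick3way(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, i = lo + 1, gt = hi;
        int v = a[lo];
        while (i <= gt) {
            compares++;                      // one three-way comparison of a[i] with v
            if (a[i] < v) swap(a, lt++, i++);
            else if (a[i] > v) swap(a, i, gt--);
            else i++;
        }
        quick3way(a, lo, lt - 1);
        quick3way(a, gt + 1, hi);
    }

    static boolean less(int x, int y) {
        compares++;
        return x < y;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    static boolean isSorted(int[] a) {
        for (int k = 1; k < a.length; k++)
            if (a[k] < a[k - 1]) return false;
        return true;
    }

    // Sorts a copy of input with the chosen algorithm and returns the compare count.
    static long count(int[] input, boolean threeWay) {
        int[] a = input.clone();
        compares = 0;
        if (threeWay) quick3way(a, 0, a.length - 1);
        else quick(a, 0, a.length - 1);
        if (!isSorted(a)) throw new IllegalStateException("array not sorted");
        return compares;
    }

    // Builds an array of n elements drawn from the given number of distinct keys.
    static int[] fewKeys(int n, int distinct, long seed) {
        Random rnd = new Random(seed);
        int[] a = new int[n];
        for (int k = 0; k < n; k++) a[k] = rnd.nextInt(distinct);
        return a;
    }

    public static void main(String[] args) {
        int[] input = fewKeys(10_000, 3, 42L);
        System.out.println("two-way compares:   " + count(input, false));
        System.out.println("three-way compares: " + count(input, true));
    }
}
```

With only three distinct keys, the three-way version finishes after a few passes over the array, whereas the two-way version keeps recursing into subarrays of equal keys, so its compare count should come out several times larger.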