**Learning Algorithms from Scratch: Ten Sorting Algorithms (Part 2)**

Author: matrix67    Date: 2007-04-06

This article is divided into four sections by the gorgeous split lines. Among the O(n log n) sorting algorithms, we first introduce merge sort in detail and prove its time complexity. We then briefly introduce heap sort, and give the basic idea behind quicksort together with a proof of its complexity. Finally, we prove that O(n log n) is theoretically optimal for comparison sorting. People who have studied OI have generally learned these basics already, and most OIers need not read this; I spent the time writing it to keep the series complete.

First, consider a simple question: how do we merge two sorted queues into a single sorted queue (and output it) in linear time?

Queue A: 1 3 5 7 9

B queue: 1 2 7 8 9

As the example above shows, queues A and B are already sorted. When the data comes in sorted order, many magical things become possible. For example, the first number we output must be one of the two numbers currently at the heads of the queues. Here both heads are 1, so we can take either one (say, the one in queue A) and output it:

A queue: 3 5 7 9

B queue: 1 2 7 8 9

Output: 1

Note that we have taken a number out and deleted it from its queue. The deletion is implemented simply by advancing that queue's head pointer; anything else would cost too much.

Now the head of queue A is 3, while the head of queue B is still 1. We compare the two head values, 3 and 1, and output the smaller one:

A queue: 3 5 7 9

B queue: 2 7 8 9

Output: 1 1

The following steps:

A queue: 3 5 7 9     A queue: 5 7 9      A queue: 7 9        A queue: 9
B queue: 7 8 9   =>  B queue: 7 8 9  =>  B queue: 7 8 9  =>  B queue: 7 8 9  ...
Output: 1 1 2        Output: 1 1 2 3     Output: 1 1 2 3 5   Output: 1 1 2 3 5 7

I hope you can see how this works: it is obviously correct, and the complexity is obviously linear.
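The merge described above can be sketched in a few lines (a minimal Python sketch; the function name `merge` and the list-based queues are my own choices, not from the original):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list in linear time."""
    out = []
    i = j = 0  # head "pointers" into a and b; nothing is physically deleted
    while i < len(a) and j < len(b):
        # Output the smaller of the two heads and advance that pointer.
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    # One queue is exhausted; append the remainder of the other.
    out.extend(a[i:]); out.extend(b[j:])
    return out

print(merge([1, 3, 5, 7, 9], [1, 2, 7, 8, 9]))  # → [1, 1, 2, 3, 5, 7, 7, 8, 9, 9]
```

Each element is touched exactly once, which is where the linear time comes from.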

Merge sort uses the merge operation above. Given a sequence, it sorts the numbers from small to large in O(n log n) time. Merge sort applies the idea of divide and conquer: we first divide the given sequence into left and right halves, sort the two halves separately, and finally use the merge algorithm to combine the two (now sorted) halves into one sorted sequence. Someone will ask, "What sort do we use to sort the left and right halves separately?" The answer is: merge sort. That is, we recursively split each half into two again and repeat the operation above. You need not worry about how this actually unfolds; our code simply calls itself recursively until the piece it is given cannot be divided further (it contains only one number).
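The whole algorithm fits in one short function (again a sketch with names of my own choosing; the merge step is inlined so the block is self-contained):

```python
def merge_sort(a):
    """Sort a list by divide and conquer: split, recurse, merge."""
    if len(a) <= 1:               # a single number cannot be divided further
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # recursively sort the left half
    right = merge_sort(a[mid:])   # recursively sort the right half
    # Merge the two sorted halves in linear time.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

print(merge_sort([3, 1, 4, 1, 5, 9, 2, 7]))  # → [1, 1, 2, 3, 4, 5, 7, 9]
```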

When first seeing this algorithm, some people mistakenly think its time complexity is quite high. The figure below shows the actual operation of merge sort from a non-recursive point of view; with it we can prove that the time complexity of merge sort is O(n log n).

[3] [1] [4] [1] [5] [9] [2] [7]
  \ /     \ /     \ /     \ /
 [1 3]   [1 4]   [5 9]   [2 7]
     \   /           \   /
   [1 1 3 4]     [2 5 7 9]
          \         /
       [1 1 2 3 4 5 7 9]

Each pair of merging lines in the figure represents the linear-time merge operation described above. The figure uses 4 rows to illustrate merge sort; with n numbers, O(log n) rows are needed. The merges in each row cost O(n) in total, so the total complexity over the log n rows is O(n log n). This is equivalent to analyzing the complexity of merge sort with a recursion tree. Suppose the complexity of merge sort is T(n); then T(n) consists of two copies of T(n/2) plus a term linear in n, that is, T(n) = 2T(n/2) + O(n). Expanding this formula repeatedly also yields the conclusion T(n) = O(n log n); you can try it yourself. In general, whenever you can combine the separately computed results on two halves of the data in linear time, so that T(n) = 2T(n/2) + O(n) = O(n log n), you have constructed an O(n log n) divide-and-conquer algorithm. This conclusion will be used often later; we will see many similar examples in the computational geometry section.
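Expanding the recurrence level by level, with c the constant hidden in the O(n) term, gives the same bound:

```latex
T(n) = 2T(n/2) + cn
     = 4T(n/4) + 2cn
     = 8T(n/8) + 3cn
     = \cdots
     = 2^k \, T(n/2^k) + kcn .
```

Taking k = log2(n), so that n/2^k = 1, yields T(n) = n·T(1) + cn·log2(n) = O(n log n).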

If this is the first time you have seen such a strange algorithm, you may well find it interesting. Divide and conquer is an application of recursion, and this is our first recursive operation; the quicksort discussed below also uses recursive thinking. The complexity analysis of recursive programs usually proceeds as above, and the master theorem can simplify this kind of analysis. The master theorem is too far from the content of this article and we will not need it later, so we do not introduce it here; you can look it up yourself. Once you have the right term, finding learning material becomes very easy; what I fear most is not knowing what something is called and being unable to find any material for ages.

Merge sort has an interesting by-product: with its help, the number of inversion pairs in a given sequence can be counted in O(n log n) time. You could use any balanced binary tree for this task, but merge sort is more convenient. Inversions are usually discussed within a permutation, so we assume all the numbers are distinct. Suppose we want to count the inversions in 1, 6, 3, 2, 5, 4. We first split the sequence into left and right halves. An inversion pair can arise in only three ways: both numbers in the left half, both in the right half, or one on each side. After processing the two halves recursively, we can count all inversions of the third kind during the linear-time merge. In other words, in linear time we can count, for each number taken from queue B, how many numbers in queue A are larger than it.

A queue: 1 3 6      A queue: 3 6        A queue: 3 6      A queue: 6         A queue: 6
B queue: 2 4 5  =>  B queue: 2 4 5  =>  B queue: 4 5  =>  B queue: 4 5  =>   B queue: 5   ...
Output:             Output: 1           Output: 1 2       Output: 1 2 3      Output: 1 2 3 4

Every time we take a number out of queue B, we know that it forms an inversion with every number still remaining in queue A, since all of those are larger than it; their count is simply the number of elements left in queue A. For example, when we take 2 out of queue B, the numbers 3 and 6 remaining in queue A are both larger than 2. During the merge we keep track of how many numbers remain in queue A, and each time we take a number from queue B we add that remaining count to the final answer. In this way we count every case where "the larger number is in the first half and the smaller number is in the second half"; the other cases were already counted recursively before this step.
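This counting merge can be sketched as follows (a minimal Python sketch; the name `sort_count` is mine):

```python
def sort_count(a):
    """Return (sorted list, number of inversion pairs) via merge sort."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_l = sort_count(a[:mid])
    right, inv_r = sort_count(a[mid:])
    out, i, j = [], 0, 0
    inv = inv_l + inv_r              # inversions inside each half
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            # Taking right[j] first: it is smaller than everything
            # still remaining in the left half.
            inv += len(left) - i
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out, inv

print(sort_count([1, 6, 3, 2, 5, 4])[1])  # → 6
```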

=============================== Gorgeous split line ===============================

Heap sort uses the heap data structure (what is a heap?). Inserting into a heap takes constant time on average, and deleting the root takes O(log n) time. Heap sort therefore builds a heap in roughly linear time (by inserting all the elements one by one), and then extracts the minimum n times in O(n log n) total time. Once you know the heap operations, heap sort is immediate. Heaps were described in detail in an earlier post, so we will not repeat that here.
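A sketch using Python's standard `heapq` module in place of a hand-rolled heap (the helper name `heap_sort` is mine):

```python
import heapq

def heap_sort(a):
    """Heap sort: build a min-heap, then pop the minimum n times."""
    heap = list(a)
    heapq.heapify(heap)   # building the heap is linear time
    # Each pop restores the heap in O(log n); n pops give O(n log n) total.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```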

=============================== Gorgeous split line ===============================

Quicksort also applies the recursive idea: we want to split the given sequence into two parts and sort them separately. A good approach is to pick one number as the "keyword" and partition the rest into two parts, placing every number smaller than the keyword to its left and every number greater than it to its right, then recursively sorting the two sides. By comparing every number in the range with the keyword in turn, the partition can be done in linear time. There are many implementation techniques for the partition. The most common one defines two pointers: one scans from the front for a number larger than the keyword, the other scans from the back for a number smaller than the keyword; then the elements under the two pointers are swapped, the pointers move on, and the process repeats. That is only a rough description, and a real implementation still has many details. Quicksort is one of our most commonly used pieces of code, and the Internet is full of quicksort implementations in every language and style; take a look at a few, since I have described the algorithm but not how to implement it. Before NOIP many people simply memorize a quicksort routine and march onto the battlefield with it; I did much the same back then.
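The two-pointer partition described above can be sketched as follows (a "fill-the-hole" variant common in OI code; the names and details are my own, and the keyword here is simply the first number of the range):

```python
def quick_sort(a, lo=0, hi=None):
    """In-place quicksort using a two-pointer partition."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[lo]                        # keyword: first number of the range
    i, j = lo, hi
    while i < j:
        while i < j and a[j] >= pivot:   # scan from the back for a smaller one
            j -= 1
        a[i] = a[j]                      # fill the hole at i; hole moves to j
        while i < j and a[i] <= pivot:   # scan from the front for a larger one
            i += 1
        a[j] = a[i]                      # fill the hole at j; hole moves to i
    a[i] = pivot                         # the keyword lands in its final place
    quick_sort(a, lo, i - 1)             # recursively sort the left side
    quick_sort(a, i + 1, hi)             # and the right side
    return a

print(quick_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```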

Unlike merge sort, the time complexity of quicksort is hard to pin down. The worst case of merge sort is O(n log n), while the worst case of quicksort is O(n^2): if the keyword selected each time happens to be the maximum (or minimum) of the current range, each partition shrinks the range by only one number, no different from the quadratic sorts such as insertion sort and selection sort. This is not far-fetched: if you always pick the first number of the range as the keyword and the input happens to be already sorted, your quicksort is finished. The best case, clearly, is when every keyword happens to be the median, splitting the range into two equal halves; the complexity is then exactly the merge sort recurrence discussed above. Based on this, quicksort has some common optimizations. For example, we often take a random number from the range as the keyword (instead of always taking a fixed position), to avoid as much as possible the inefficiency caused by special data. A better way is to take three numbers at random and use their median as the keyword; since generating three random positions itself costs time, we can instead take the first, last and middle numbers of the range. In addition, when the recursion reaches a range containing only a few or a dozen numbers, continuing to recurse wastes time; it is better to finish that range with insertion sort and return. This also avoids possible bugs in the recursion when the range becomes too small.
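A sketch combining the two optimizations just mentioned, a median-of-three keyword and an insertion-sort cutoff (the cutoff value 10 and the out-of-place partition are my own simplifications for clarity, not the canonical in-place form):

```python
def quick_sort_opt(a):
    """Quicksort sketch: median-of-three keyword + insertion-sort cutoff."""
    if len(a) <= 10:                 # tiny range: insertion sort is cheaper
        out = list(a)
        for k in range(1, len(out)):
            v, m = out[k], k
            while m > 0 and out[m - 1] > v:
                out[m] = out[m - 1]; m -= 1
            out[m] = v
        return out
    # Median of the first, middle and last numbers as the keyword.
    pivot = sorted((a[0], a[len(a) // 2], a[-1]))[1]
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return quick_sort_opt(left) + mid + quick_sort_opt(right)

print(quick_sort_opt([5, 3, 8, 1, 9, 2, 7, 4, 6, 0, 11, 10]))
```

Partitioning into fresh lists keeps the sketch short; production code would partition in place as described earlier.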

We now show that the average complexity of quicksort is O(n log n). Different books explain this differently; here I follow Introduction to Algorithms. The argument is clever and interesting, though it takes a few mental turns to grasp.

Look again at the quicksort procedure. In the partition method we described, the program performs one swap only after several comparisons with the keyword, so the number of comparisons dominates the number of swaps. We will prove that the average number of comparisons between elements in one quicksort run is O(n log n), which gives the average complexity of the algorithm. The key to the proof is computing the probability that a given pair of elements is ever compared during the whole algorithm.

Let's take an example. Suppose we are given the ten numbers 1 through 10, the first keyword chosen is 7, and it splits the rest into {1, 2, 3, 4, 5, 6} and {8, 9, 10}. Recursing on the left side, we choose 3 as the keyword, splitting that part into {1, 2} and {4, 5, 6}. Observe that the number 7 was compared with every other number once; that is exactly what the partition does. Similarly, the numbers 1 through 6 were each compared with 3 (except 3 itself). However, 3 and 9 will never be compared, and neither will 2 and 6, because the first keyword chosen between them separated them into different parts. In other words, two numbers a(i) and a(j) (with a(i) the smaller) are ever compared if and only if the first keyword x satisfying a(i) <= x <= a(j) is exactly a(i) or a(j). Write z(i) for the i-th smallest number in sorted order. For i < j, the probability that the first keyword chosen among z(i), z(i+1), ..., z(j) is either z(i) or z(j) is 2/(j-i+1): as long as no keyword has yet been chosen among them, z(i) through z(j) all still lie in the same range waiting to be partitioned, and no matter how large that range is or how deep the recursion has gone, the keyword is always chosen uniformly at random. We conclude that the probability that z(i) and z(j) are compared during one quicksort run is 2/(j-i+1).

Consider the four numbers 2, 3, 5, 7. During a run, any two adjacent numbers are certain to be compared; 2 and 5, or 3 and 7, are compared with probability 2/3; 2 and 7 are compared with probability 2/4. That is, if we quicksort these four numbers 12 times, then the pairs (2,3), (3,5) and (5,7) are compared 12 * 3 = 36 times in total, the pairs (2,5) and (3,7) are compared 8 * 2 = 16 times in total, and the pair (2,7) is compared 6 times. The expected total number of comparisons over the 12 runs is therefore 36 + 16 + 6 = 58, so the average number of comparisons in one run is 58/12 = 29/6. In fact, this equals the sum of the six probabilities: 1 + 1 + 1 + 2/3 + 2/3 + 2/4 = 29/6. This is just the linearity of expectation at work.

Likewise, with n numbers the expected number of comparisons in one quicksort run can be written as the double sum below. Substituting k = j - i, we finally obtain an expected O(n log n) comparisons.
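Written out, with X the total number of comparisons and using the probability 2/(j-i+1) derived above:

```latex
E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{2}{j-i+1}
     = \sum_{i=1}^{n-1}\sum_{k=1}^{n-i}\frac{2}{k+1}
     < \sum_{i=1}^{n-1}\sum_{k=1}^{n}\frac{2}{k}
     = O(n\log n).
```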

Here we used one fact: 1 + 1/2 + 1/3 + ... + 1/n grows at the same rate as log n, that is, Σ(1/n) = Θ(log n). Its proof is at the end of this article.

Among the three O(n log n) sorting algorithms, quicksort has the least ideal theoretical complexity: the other two algorithms introduced today achieve O(n log n) even in the worst case. In practice, however, quicksort is the most efficient (otherwise it would not be called quicksort), because its code is simpler and shorter than that of the other algorithms of the same complexity.

Quicksort also has an interesting by-product: quickly selecting the k-th smallest of the given numbers. A naive method is to sort the numbers with any of the O(n log n) algorithms above and return the k-th element of the sorted array. The quickselect algorithm performs this in O(n) average time (its worst case is O(n^2), just like quicksort). After one partition we know how many numbers are smaller than the keyword, and hence the keyword's rank among all the numbers; suppose the keyword is the m-th smallest. If k = m, we have found the answer: the k-th smallest element is the keyword itself. Otherwise we recurse into one side only: if k < m, we look for the k-th smallest among the elements on the left; if k > m, we look for the (k-m)-th smallest among the elements on the right. Because we no longer care about the order of all the numbers and recurse into only one side, the complexity drops dramatically. The average complexity is linear; we will not prove it here.
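A minimal sketch of quickselect with a random keyword (the function name is mine, and for clarity it partitions into new lists rather than in place):

```python
import random

def quick_select(a, k):
    """Return the k-th smallest element of a (k = 1 is the minimum)."""
    pivot = random.choice(a)         # random keyword
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    if k <= len(left):               # answer lies among the smaller numbers
        return quick_select(left, k)
    if k <= len(left) + len(mid):    # the keyword itself is the answer
        return pivot
    right = [x for x in a if x > pivot]
    return quick_select(right, k - len(left) - len(mid))

print(quick_select([3, 1, 4, 1, 5, 9, 2, 6], 4))  # → 3
```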

Another algorithm can find the k-th smallest element in O(n) worst-case time. It is the least practical algorithm I have ever seen; that O(n) has only theoretical value.

=============================== Gorgeous split line ===============================

We proved earlier that sorting by swapping adjacent elements can achieve at best O(n^2), so people tried swapping elements farther apart. Once people found the O(n log n) sorting algorithms, they began to ask: is there a lower bound on the complexity? What we discuss next is more fundamental. As before, we assume all the numbers are distinct.

Comparisons are our only way to learn about the numbers. Try it yourself: it is absolutely impossible to determine the order of four numbers after only four comparisons. Each comparison teaches us one more order relation, so four comparisons give us four relations. Four relations have only 2^4 = 16 possible outcome combinations, while four numbers have 4! = 24 possible orderings. That is, the possible results of four comparisons are not enough to distinguish the 24 possible orderings. In general, for n numbers there are n! possible answers, and k comparisons can distinguish at most 2^k of them, so sorting is possible only if 2^k >= n!. Taking logarithms of both sides: sorting n numbers requires at least log2(n!) comparisons. Note that we did not say log2(n!) comparisons always suffice. Although 2^5 = 32 exceeds 4!, that alone does not show five comparisons are enough; further study is needed to confirm that the order of four numbers really can be determined in five comparisons. The first exception occurs at n = 12: although 2^29 > 12!, it has been proved that sorting 12 numbers requires at least 30 comparisons. We can prove that log(n!) grows at the same rate as n log n, that is, log(n!) = Θ(n log n). This is the minimum number of comparisons required, and it gives a lower bound on the complexity of comparison sorting. The proof of log(n!) = Θ(n log n) is also attached at the end of this article.
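The bound can be tabulated directly; the helper below (my own illustration) computes the smallest k with 2^k >= n!, i.e. the information-theoretic lower bound on comparisons:

```python
from math import factorial

def min_comparisons_lower_bound(n):
    """Smallest k with 2**k >= n!, i.e. ceil(log2(n!))."""
    k, need = 0, factorial(n)
    while 2 ** k < need:
        k += 1
    return k

# 2^4 = 16 < 24 = 4!, so four comparisons can never sort four numbers;
# for n = 12 the bound is 29, yet 30 comparisons are actually required.
print(min_comparisons_lower_bound(4), min_comparisons_lower_bound(12))  # → 5 29
```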

The third problem in that earlier post was proved optimal by almost the same method, and the same technique answers questions like "at least how many weighings are needed to find the ball of different weight". Behind all this is a whole theory called information theory, proposed by Shannon. He used logarithms to measure the amount of information and entropy to measure the randomness of the possible outcomes, and could compute how each piece of information you obtain narrows down the final answer. When information is measured in base 2, the unit is the bit (bits = binary digits); essentially all information in a computer is base-2 information, which is why Shannon is often called the father of digital communication. Information theory is closely related to thermodynamics; its entropy, for instance, comes directly from the thermodynamic definition of entropy. This is already serious science; if you are interested, look into information theory and coding theory. I am very interested in this too, though I do not know much and would love to learn more; interested comrades are welcome to join the discussion. Physics is amazing: physics can solve many purely mathematical problems. I could give some examples if I had time. Why on earth did I choose liberal arts?

The three sorts to be described later achieve linear time complexity, precisely because they do not determine the order by pairwise comparisons.

Appendix 1: proof of Σ(1/n) = Θ(log n)

First we prove Σ(1/n) = O(log n). In the sum 1 + 1/2 + 1/3 + 1/4 + 1/5 + ..., enlarge 1/3 to 1/2, so that the two 1/2's combine into a 1; then enlarge 1/5, 1/6 and 1/7 all to 1/4, so that the four 1/4's combine into a 1. In general, enlarge every term from 1/2^k up to (but not including) 1/2^(k+1) to 1/2^k; these 2^k fractions then combine into a single 1. How many 1's does 1 + 1/2 + ... + 1/n produce this way? We only need to count the powers of 2 not exceeding n. Clearly, after the enlargement the sum is about log n, so O(log n) is an upper bound for Σ(1/n).

Then we prove Σ(1/n) = Ω(log n). In the sum 1 + 1/2 + 1/3 + 1/4 + 1/5 + ..., shrink 1/3 to 1/4, so that the two 1/4's combine into a 1/2; then shrink 1/5, 1/6 and 1/7 all to 1/8, so that the four 1/8's combine into a 1/2. In general, shrink every term between 1/2^(k-1) and 1/2^k down to 1/2^k; together with 1/2^k itself, these 2^(k-1) fractions combine into a single 1/2. How many 1/2's does 1 + 1/2 + ... + 1/n produce this way? Again we count the powers of 2 not exceeding n. Clearly, after the shrinking the sum is about (1/2) log n, so Ω(log n) is a lower bound for Σ(1/n).
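The two groupings above sandwich the harmonic sum; for n = 2^k they give:

```latex
1 + \frac{k}{2} \;\le\; \sum_{m=1}^{2^k}\frac{1}{m} \;\le\; k + 1,
\qquad\text{hence}\qquad \sum_{m=1}^{n}\frac{1}{m} = \Theta(\log n).
```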

Appendix 2: proof of log(n!) = Θ(n log n)

First we prove log(n!) = O(n log n). Clearly n! < n^n; taking logarithms of both sides gives log(n!) < log(n^n) = n log n. Therefore O(n log n) is an upper bound for log(n!).

Then we prove log(n!) = Ω(n log n). In n! = n(n-1)(n-2)(n-3)...1, shrink each factor in the first half down to n/2 and discard the other half entirely; clearly n! > (n/2)^(n/2). Then log(n!) > (n/2) log(n/2), and the latter is Ω(n log n). Therefore Ω(n log n) is a lower bound for log(n!).
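As a numerical sanity check (my own addition; `math.lgamma(n + 1)` in Python's standard library computes ln(n!) without overflow):

```python
from math import lgamma, log

def ratio(n):
    """ln(n!) / (n ln n); by Stirling's formula this tends to 1 as n grows."""
    return lgamma(n + 1) / (n * log(n))

# The ratio climbs toward 1, consistent with log(n!) = Theta(n log n).
print(ratio(10), ratio(1000), ratio(10**6))
```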

That's all for today. Corrections are welcome.

Matrix67 original

Please indicate the source when reposting