David MacKay: Explaining the nature of, and the difference between, 'quicksort' and 'heapsort' with information theory

Source: Internet
Author: User

This article is David MacKay's use of information theory to explain the performance difference between quicksort and heapsort, a difference that comes from an essential difference in how the two algorithms make comparisons.

Information theory is very powerful; it is not just a tool for analyzing theoretically optimal decisions.

Analyzing the efficiency of algorithms from the perspective of information theory is very interesting, and it gives us a new way of thinking about sorting algorithms.

Using concepts from information theory, it is easy to understand why quicksort is so fast and where its flaws lie.

Due to the limits of my ability, my understanding of the article may still be somewhat off.

And because of the difficulty of translation, there are many places below where I left the original English sentence untranslated.

So I recommend that you read the original: Heapsort, Quicksort, and Entropy.

If you read the translation below and have a better way to translate any part of it, you are welcome to leave a comment; it would be greatly appreciated!

---------------------------------------------------------- translation follows ----------------------------------------------------------

Heapsort, quicksort, and entropy

There are many articles on the web comparing and analyzing heapsort and quicksort.

Most of them take the view that "both algorithms have an average asymptotic cost of N log N, but in practical tests a well-implemented quicksort usually beats heapsort."

Some go further and give a quantitative explanation: "heapsort uses, on average, about twice as many comparisons as quicksort, but heapsort avoids the low-probability catastrophic degradation that quicksort suffers."

However, very few people ask the question: "Why does heapsort use twice as many comparisons as quicksort?" People have put a lot of effort into creating sorting algorithms that combine the strengths of both, such as 'introspective sort' (introsort), which runs a recursive quicksort but switches to heapsort when the recursion depth becomes too deep.

Paul Hsieh has also compared quicksort and heapsort in depth. "I suspect that heapsort's actual efficiency is far better than its reputation suggests, and there are practical results to prove it," he said. In his tests, the best compiler (the same one for heapsort and for quicksort) produced a heapsort that was about 20% faster than quicksort in total CPU execution time.

In fact, to sort 3,000 objects, heapsort used about 61,000 comparisons on average, while quicksort used only 22,000. Why, then, did heapsort take less CPU time despite making more comparisons? On this point, see Paul Hsieh's article: Sorting revisited.

The question I want to discuss is: why does heapsort make more comparisons than quicksort? Paul Hsieh said: "What struck me is that I could not see why heapsort should be slower than quicksort, and I have never heard or read a convincing explanation."

Based on expected information content, we can give a simple explanation. To make it easier to follow, let's start with some preliminary background.

First, look at the classic weighing problem:

You are given 12 balls that look identical; 11 of them weigh the same, and one may be heavier or lighter than the rest. Using only a balance (each weighing puts some number of balls on one pan and the same number on the other), devise a strategy that finds the odd ball while using the balance as few times as possible.

Many people find a solution by trial and error (for this puzzle, 3 weighings are enough to identify the odd ball). But trial and error is cumbersome and laborious, and there is a better way.

To reduce the number of weighings, we need to maximize the average amount of information each weighing yields.

Shannon showed us a good way to measure the expected amount of information gained from an outcome, called the entropy of the outcome. (For more on entropy, or a deeper discussion of weighing puzzles, see the book I wrote: Information Theory, Inference, and Learning Algorithms.)

We can quickly solve the weighing problem by choosing, at each step, the weighing whose outcome has the largest entropy.
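To make this concrete, here is a minimal Python sketch (my own illustration, not from MacKay's article) that computes the entropy of the first weighing's outcome for different numbers of balls per pan; it shows why putting 4 balls on each pan is the information-maximizing first move.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete outcome distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# 12 balls, one odd ball that may be heavy or light: 24 equally likely hypotheses.
# Weighing k balls against k balls has three outcomes: left heavy, right heavy, balance.
HYPOTHESES = 24
for k in range(1, 7):
    p_left = 2 * k / HYPOTHESES        # odd ball on the left and heavy, or on the right and light
    p_right = 2 * k / HYPOTHESES       # the mirror-image cases
    p_balance = 1 - p_left - p_right   # odd ball is among the 12 - 2k balls left off the scale
    print(f"weigh {k} vs {k}: outcome entropy = {entropy([p_left, p_right, p_balance]):.3f} bits")

# 4 vs 4 gives a uniform outcome distribution (log2(3) ~ 1.585 bits), the most
# informative first weighing, which is why the 3-weighing strategy starts there.
```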

Sort with bits

If a sorting algorithm is comparison-based, and each comparison has two equally probable outcomes, then the average amount of information gained from a comparison is 1 bit.

The amount of information required to sort N objects is exactly log2(N!) bits (assuming we have no prior information about their order).

Using Stirling's approximation, this total information is roughly T = N log2 N - N log2 e.

No sorting algorithm can use fewer than T comparisons on average, and an algorithm's average number of comparisons approaches T only if its comparisons have roughly equiprobable outcomes.

So what causes heapsort to be slower than quicksort in general?

Evidently, it is because the outcomes of heapsort's comparisons are not equiprobable. We will explain this below.

Incidentally, standard randomized quicksort has the same flaw. It is irritating that many students of algorithms will say "randomized quicksort uses O(N log N) comparisons on average" and throw away the constant factor. What a sad symbol 'O' is! And it is foolish to defend it by saying "we only care about asymptotic performance for very large N", as if we did not need to care about the difference between a 4 N log N algorithm and an N log N algorithm!

However, if we work out the constant factor of an N log N algorithm, or look at the nature of the algorithm itself, everything becomes much more interesting. Here I will simply state the result: the average cost of randomized quicksort is 2 N ln N comparisons.

This is larger than the ideal cost T ≈ N log2 N by a constant factor of 2 ln 2 ≈ 1.39. So if we care about the number of comparisons, we need to find a better algorithm than quicksort.
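As a quick numerical check (my own illustration, not part of the original), the following Python snippet compares the exact information content log2(N!), the Stirling approximation above, and randomized quicksort's expected 2 N ln N comparisons; the ratio of quicksort's cost to the bound slowly approaches 2 ln 2 ≈ 1.39 from above as N grows.

```python
from math import e, lgamma, log, log2

def log2_factorial(n):
    """log2(n!) computed via the log-gamma function."""
    return lgamma(n + 1) / log(2)

for n in (100, 10_000, 1_000_000):
    exact = log2_factorial(n)               # bits needed to sort n items
    stirling = n * log2(n) - n * log2(e)    # T = N log2 N - N log2 e
    quicksort = 2 * n * log(n)              # expected comparisons of randomized quicksort
    print(f"N={n:>9}: log2(N!)={exact:>12.0f}  Stirling={stirling:>12.0f}  "
          f"2N ln N={quicksort:>12.0f}  ratio={quicksort / exact:.3f}")
```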

You can see that quicksort has unbalanced probabilities by imagining the last few comparisons of numbers with a pivot. If the preceding comparisons have divided unevenly, with many more falling on one side than the other, then it seems a good prediction that the next comparison with the pivot has a probability of roughly 0.7 of going the first way and 0.3 of going the other.

Back to heapsort

Heapsort's comparisons are inefficient because it may move a number from the bottom of the heap up to the top and then let it trickle back down, exchanging places with larger numbers as it goes.

Why does heapsort do this? Has no one thought of the more elegant alternative of promoting the larger of the two sub-heap leaders to the top of the heap?

How about this approach:

Modified heapsort (surely someone has already thought of this):

1. Put the items into a valid max-heap.

2. Remove the top of the heap, creating a vacancy 'V'.

3. Compare the two sub-heap leaders directly below V, and promote the bigger one into the vacancy. Recursively repeat step 3, redefining V to be the new vacancy, until we reach the bottom of the heap. (This is just like the sift operation of heapsort, except that we have effectively promoted an element known to be smaller than all the others to the top of the heap; this smallest element can automatically trickle down without needing to be compared with anything.)

4. Go to step 2.

Disadvantage of this approach: it does not have the pretty in-place property of heapsort. But we could obtain that property again by introducing an extra swap at the end, swapping the 'smallest' element with another element at the bottom of the heap (the one that would have been removed in heapsort) and running another sift operation from that element upwards.

Let's call this algorithm Fast Heapsort. It is not an in-place algorithm, but, just like heapsort, it extracts the sorted items one at a time from the top of the heap.
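Here is a minimal Python sketch of the idea (my own illustration, not MacKay's code): it builds a max-heap, then repeatedly removes the top and lets the vacancy trickle down by promoting the larger sub-heap leader, marking emptied slots with a sentinel instead of keeping the sort in place.

```python
import random

NEG_INF = float("-inf")  # sentinel marking a slot that has been emptied at the bottom

def _sift_down(heap, i, n):
    """Restore the max-heap property below index i (children of i are at 2i+1 and 2i+2)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def fast_heapsort(items):
    """Sketch of the modified heapsort: remove the top, then promote the larger
    sub-heap leader into the vacancy all the way down, never sifting a bottom
    element down from the top. Not in place: emptied slots hold a sentinel."""
    heap = list(items)
    n = len(heap)
    for i in range(n // 2 - 1, -1, -1):   # step 1: build a valid max-heap
        _sift_down(heap, i, n)
    out = []
    for _ in range(n):
        out.append(heap[0])               # step 2: remove the top, creating vacancy V
        v = 0
        while True:                       # step 3: promote the bigger leader below V
            left, right = 2 * v + 1, 2 * v + 2
            if left >= n:                 # the vacancy has reached the bottom
                break
            child = left
            if right < n and heap[right] > heap[left]:
                child = right
            heap[v] = heap[child]
            v = child
        heap[v] = NEG_INF                 # leave the slot empty, go back to step 2
    return out                            # items extracted in decreasing order

if __name__ == "__main__":
    data = [random.randrange(1000) for _ in range(200)]
    assert fast_heapsort(data) == sorted(data, reverse=True)
```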

Performance of Fast Heapsort

I evaluated the performance of Fast Heapsort on random permutations. Performance was measured solely by the number of binary comparisons required. [Fast Heapsort does require extra bookkeeping, so a CPU-time comparison would come out differently.]

[Figure: Performance of Fast Heapsort. Horizontal axis: number of items to be sorted, N. Vertical axis: number of binary comparisons. The theoretical curves show the asymptotic results for randomized quicksort (2 N ln N) and the information-theoretic limit, log2 N! ≈ (N log N - N)/log 2.]

I haven't proved that Fast Heapsort comes close to maximizing the entropy at every step, but it seems reasonable to imagine that it might indeed do so asymptotically. After all, heapsort's starting heap is rather like an organization in which the top dog has been made president, and the top dogs in divisions A and B have been made vice-presidents; a similar organization persists all the way down to the lowest level. The president originated in one of those divisions, and got his job by sequentially deposing his bosses.

Now if the boss leaves and needs to be replaced by the best person in the organization, we will clearly need to compare the vice-presidents; the question is, do we expect this to be a close contest? A priori, we have little cause to bet on either vice-president. There are just two asymmetries in the situation: first, the retiring president probably originated in one of the two divisions; and second, the total numbers of people in those two divisions may be unequal. VP A might be the best of slightly more people than VP B, and the best of a big village is more likely to beat the best of a small village. Moreover, the standard way of building a heap can produce rather lop-sided binary trees. In one such organization, for example, division A would contain (8+4+2+1) = 15 people, and division B just (4+2+1) = 7.

To make heapsort faster, I suggest exploring the following two improvements:

1. Make the heap more balanced. Break the elegant heap rule that node i's children are at (2i) and (2i+1) and that the heap is filled from left to right. Would this make any difference? Perhaps not; perhaps it's like a Huffman algorithm with some free choices.

2. Apply information-theoretic thinking to the initial heap construction. Does the opening heapify routine make comparisons whose entropy is significantly less than one bit?

Back to quicksort

We can also give quicksort an information-theoretic treatment. Quicksort is wasteful because it insists on making comparisons whose two possible outcomes are not equally probable. About half of the time, quicksort is using a bad pivot (one that lies outside the interquartile range), and once the pivot is bad, every comparison with it yields on average less than 1 bit of information.

A simple way to reduce quicksort's inefficiency from bad pivot choices is the 'median-of-three' method: randomly pick three elements and use the middle one as the pivot. This greatly reduces the probability of choosing a bad pivot, but we can do better. Let's go back and analyze the information that quicksort generates, starting from the very first comparison.

When we randomly select a pivot and compare it with another random element, the entropy of that first outcome is clearly 1 bit. That's a good start. But when we compare the next element with the same pivot, the comparison is already a poor one. If the first element turned out to be larger than the pivot, then, subjectively, the second element is more likely to be larger than the pivot too. In fact, the probability that the second element is larger than the pivot is about 2/3 (this is a nice little Bayesian exercise; for N=3 and N=4 the value 2/3 is exact, and perhaps it is exact for all N). The entropy of (1/3, 2/3) is 0.918 bits, so the second comparison is not too bad: only about 8% of its information is wasted. Information theory says the second comparison should instead be between two of the other elements, but let's continue with quicksort. If the first two elements were both larger than the pivot, the probability that the next element is larger than the pivot is 3/4 (entropy: 0.811 bits). Such lopsided situations arise more than half the time, and the comparisons grow steadily less informative as the iteration proceeds.

Table 1 shows the possible states after five elements have been compared with the pivot, the probability of each state, the probability that the next element falls to the left of the pivot, and the entropy of that next comparison.

State (left, right) | Probability of this state | Probability that next element goes left | Entropy (bits)
(0, 5) | 1/6 | 1/7 | 0.59167
(1, 4) | 1/6 | 2/7 | 0.86312
(2, 3) | 1/6 | 3/7 | 0.98523
(3, 2) | 1/6 | 4/7 | 0.98523
(4, 1) | 1/6 | 5/7 | 0.86312
(5, 0) | 1/6 | 6/7 | 0.59167

Table 1
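The entries in Table 1 follow from the rule of succession and the binary entropy function; a short Python check (my own, not part of the original) reproduces them:

```python
from math import log2

def h2(p):
    """Binary entropy, in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

k = 5  # five elements already compared with the pivot
for left in range(k + 1):
    right = k - left
    p_next_left = (left + 1) / (k + 2)   # rule of succession: (m1 + 1) / (m1 + m2 + 2)
    print(f"state ({left},{right}): P(state) = 1/{k + 1}, "
          f"P(next goes left) = {left + 1}/{k + 2}, entropy = {h2(p_next_left):.5f} bits")
```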

Table 2 shows the expected and minimum entropy of the comparison made at each iteration of quicksort.

Iteration of quicksort | Expected entropy at this step (bits) | Minimum entropy at this step (bits)
0 | 1 | 1
1 | 0.918296 | 0.918296
2 | 0.874185 | 0.811278
3 | 0.846439 | 0.721928
4 | 0.827327 | 0.650022
5 | 0.81334 | 0.591673
6 | 0.80265 | 0.543564
7 | 0.794209 | 0.503258
8 | 0.78737 | 0.468996
9 | 0.781715 | 0.439497
10 | 0.77696 | 0.413817

Table 2
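Table 2 can likewise be reproduced by averaging the binary entropy over the k+1 equally likely states at iteration k; here is a short Python check (my own illustration):

```python
from math import log2

def h2(p):
    """Binary entropy, in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

for k in range(11):  # k elements already compared with the pivot at iteration k
    entropies = [h2((left + 1) / (k + 2)) for left in range(k + 1)]
    expected = sum(entropies) / (k + 1)   # the k+1 states are equally likely
    minimum = min(entropies)              # the lopsided states (0, k) and (k, 0)
    print(f"iteration {k:2d}: expected = {expected:.6f}, minimum = {minimum:.6f}")
```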

When running quicksort, we could make some rational decisions at reasonable computational cost. For example, we could choose whether to keep using the current pivot, or to switch to one of the elements already compared with it (which costs a little more). The 'median-of-three' method can be viewed in this light: as in standard quicksort, we first compare two elements with the pivot. If we reach the state (1,1), which happens with probability 1/3, we keep the pivot. If we reach (0,2) or (2,0), we judge the pivot to be a poor one and discard it, choosing instead the middle value of the three elements seen so far as the new pivot. Selecting the pivot this way costs a couple of extra comparisons, but that cost is negligible compared with the benefit it brings.

We can put the 'median-of-three' method on an objective, information-theoretic footing, and we can generalize it.

Suppose that we have compared M-1 of the N numbers with a randomly chosen pivot and have reached the state (m1, m2). We have two options. First, we can keep using this pivot; the entropy of the next comparison with it is H_2(p), where p = (m1+1)/(m1+m2+2). Second, we can invoke a median-finding algorithm to find the median of the M numbers seen so far and make it the new pivot. The median can be found with an expected cost proportional to M (say cM); for example, a quicksort-style selection costs about 4M.

We can compare the expected costs and returns of these two options.

If we keep the current pivot for the remaining (N-M) comparisons, the expected information gained is roughly R = (N-M) H_2(p) bits. (This is not exact; to be precise, one would integrate over how p evolves as further comparisons are made.)

If instead we find a new pivot and use it for the subsequent comparisons (finding it is the expensive part; indeed, this suggests a different algorithm in which we first use a sorting or selection procedure to find several pivots at once), the cost is roughly (N-M) + 4(M-1) comparisons, and the expected information gained is roughly R' = (N-M) H_2(p') bits, where p' is the corresponding probability for the new pivot.

If we approximate R' ≈ N-M, then finding the new pivot has the better return-to-cost ratio if

(N-M) / ((N-M) + 4(M-1)) > H_2(p),
i.e. 1 / (1 + 4(M-1)/(N-M)) > H_2(p).

This condition is cheap to evaluate, and when N-M is large we have an especially good reason to abandon a poor pivot.
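As an illustration (the function name and the cost factor of 4 are my own choices, following the example cost above), a small Python helper can evaluate this switching criterion for a given state:

```python
from math import log2

def h2(p):
    """Binary entropy, in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def should_switch_pivot(m1, m2, n, median_cost_factor=4):
    """Return True if abandoning the current pivot looks worthwhile under the rough
    criterion 1 / (1 + c(M-1)/(N-M)) > H_2(p), with p = (m1+1)/(m1+m2+2)."""
    m = m1 + m2 + 1                       # the pivot plus the elements already compared with it
    remaining = n - m
    if remaining <= 0:
        return False
    p = (m1 + 1) / (m1 + m2 + 2)          # rule-of-succession estimate for the next comparison
    benefit_ratio = 1 / (1 + median_cost_factor * (m - 1) / remaining)
    return benefit_ratio > h2(p)

# A lopsided (0, 6) split on a large input argues for switching; a balanced (3, 3) does not.
print(should_switch_pivot(0, 6, n=100_000))   # True  - the pivot looks poor and N-M is huge
print(should_switch_pivot(3, 3, n=100_000))   # False - the pivot looks fine (entropy ~ 1 bit)
```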

Modifying quicksort further, we can plan ahead: it is almost certainly a good idea to perfectly sort M randomly selected points and find their median, where M is roughly sqrt(N/log(N)), and to use this fairly accurate median as the pivot for the first iteration.

Summary: 'median-of-3' is a good idea, but even better (for all N above some modest size) is 'median-of-sqrt(N/log(N))'. If you retain the sorted subset from iteration to iteration, you end up with something rather like a self-balancing binary search tree.
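Here is a rough Python sketch of the idea (my own, not MacKay's implementation; for simplicity it uses the sample-median pivot at every recursion level rather than only on the first iteration, and it does not retain the sorted sample between iterations):

```python
import random
from math import log, sqrt

def sample_median_pivot(items):
    """Pick a pivot as the median of roughly sqrt(N / log N) randomly sampled items."""
    n = len(items)
    m = max(3, int(sqrt(n / max(log(n), 1.0))))
    sample = sorted(random.sample(items, min(m, n)))
    return sample[len(sample) // 2]

def quicksort(items):
    """Plain recursive quicksort using the sample-median pivot."""
    if len(items) <= 1:
        return list(items)
    pivot = sample_median_pivot(items)
    left = [x for x in items if x < pivot]
    mid = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quicksort(left) + mid + quicksort(right)

if __name__ == "__main__":
    data = [random.randrange(10_000) for _ in range(1_000)]
    assert quicksort(data) == sorted(data)
```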

References:

A nice explanation of randomized median-finding in O(n) time.

Deterministic median-finding.

David MacKay December 2005
http://www.aims.ac.za/~mackay/sorting/sorting.html

Translated from: http://blog.csdn.net/cyh_24/article/details/8094318

