MIT Introduction to Algorithms, Lecture 4: Quicksort


The posts in this column (Algorithms) are my personal notes on the NetEase Open Course edition of MIT's Introduction to Algorithms. All of the content comes from the lectures of Charles E. Leiserson and Erik Demaine in the MIT open course Introduction to Algorithms. (http://v.163.com/special/opencourse/algorithms.html)

Lecture 4: Quicksort

This lecture has two parts. The first introduces the quicksort algorithm and analyzes its efficiency in the best case, the worst case, and the case where best and worst alternate. The second introduces randomized quicksort and analyzes its complexity; this is also the most exciting part of the video.

Part 1: Quicksort

Quicksort, proposed by Tony Hoare in 1962, is an in-place sort based on the idea of divide and conquer (Divide-and-conquer), but its efficiency depends on the order of the input data. In-place means the algorithm's extra space complexity is O(1): it rearranges the elements within the original array, just as insertion sort does. Merge sort is different: it requires additional space for the merge operation.

The figure illustrates the divide-and-conquer idea behind quicksort.

As you can see, the most critical part of quicksort is the first step, partition, which is the heart of how the algorithm handles the problem. You can therefore think of quicksort as recursively partitioning the array, just as merge sort recursively merges arrays. There are several concrete partition algorithms; their pseudocode differs slightly, but the principle is the same, and most importantly, they all run in linear time, O(n). The figure shows the recursive pseudocode for quicksort.
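The pseudocode figure is not reproduced here, so as a sketch (not the lecture's exact code), a minimal Python version using the Lomuto-style partition from CLRS, with the last element as pivot, might look like this:

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    x = a[hi]                         # pivot value
    i = lo - 1                        # right edge of the "<= pivot" region
    for j in range(lo, hi):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]   # grow the "<= pivot" region
    a[i + 1], a[hi] = a[hi], a[i + 1] # place the pivot between the two regions
    return i + 1

def quicksort(a, lo, hi):
    """Sort a[lo..hi] in place."""
    if lo < hi:
        q = partition(a, lo, hi)      # pivot lands at its final position q
        quicksort(a, lo, q - 1)       # recurse on the left part
        quicksort(a, q + 1, hi)       # recurse on the right part
```

Note that all the work happens in partition; the recursion itself only splits the index range, which is why the partition step dominates the analysis.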

Of course, the above is just the core recursive code. One way to make quicksort run faster is to adjust the code here to use a special algorithm suited to small inputs. For example, when only 5 elements remain, and you already know code that sorts 5 elements efficiently, it is better to use that particular algorithm instead of recursing further. There are other tricks as well: since the recursion here is a tail call, you can apply tail-recursion optimization, and so on.
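As an illustration of the small-subarray cutoff mentioned above (the threshold of 8 and the helper names are my own choices, not from the lecture), the recursion can bottom out into insertion sort:

```python
CUTOFF = 8  # illustrative threshold; real implementations tune this empirically

def insertion_sort(a, lo, hi):
    """Sort the small slice a[lo..hi] in place; fast for few elements."""
    for j in range(lo + 1, hi + 1):
        key, i = a[j], j - 1
        while i >= lo and a[i] > key:
            a[i + 1] = a[i]           # shift larger elements right
            i -= 1
        a[i + 1] = key

def partition(a, lo, hi):
    """Lomuto-style partition around a[hi]; return the pivot's final index."""
    x, i = a[hi], lo - 1
    for j in range(lo, hi):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1

def quicksort_cutoff(a, lo, hi):
    """Quicksort that switches to insertion sort on small subarrays."""
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)     # cheap base case for small inputs
    else:
        q = partition(a, lo, hi)
        quicksort_cutoff(a, lo, q - 1)
        quicksort_cutoff(a, q + 1, hi)
```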

Next we analyze the algorithm's efficiency, assuming in the analysis that all elements are distinct. The code above does not perform well when duplicate elements are present, but Hoare's original partition method is more efficient when the input contains duplicates. It is worth looking at that method when you have time; it uses a more complex invariant to achieve the partition, but it is essentially the same idea.

1. Worst-case analysis

What is the worst case for quicksort? It occurs when the input sequence is already sorted, in increasing or decreasing order, because then the partition always pivots around the largest or smallest element. The result is that one side of the partition always has no elements, so the recursion cannot balance the work and improve the running time.

In this case the recurrence for quicksort is as shown in the figure, and the running time is Θ(n^2).
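Since the figure is omitted here, the worst-case recurrence can be reconstructed from the standard analysis:

```latex
T(n) = T(0) + T(n-1) + \Theta(n)   % one side empty, one side has n-1 elements
     = T(n-1) + \Theta(n)          % T(0) = \Theta(1) is absorbed
     = \Theta\Bigl(\sum_{k=1}^{n} k\Bigr)
     = \Theta(n^2)                 % arithmetic series
```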

2. Best-case analysis

We certainly have no way of ensuring that the input satisfies the best case, but intuitively, if we are lucky enough, partition splits the array into two equal subarrays every time. The recurrence in that case is as shown in the figure.
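The best-case recurrence (reconstructed here, since the figure is omitted) is the same as merge sort's:

```latex
T(n) = 2\,T(n/2) + \Theta(n)
\quad\Longrightarrow\quad
T(n) = \Theta(n \lg n)   % case 2 of the master theorem
```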

This is obviously the result we want. But what happens when partition does not split evenly? Say it splits the array into two subarrays in a 1:9 ratio?

The master method does not apply to this recurrence. Drawing the recursion tree, however, analyzes it effectively, as shown in the figure.
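A sketch of the recursion-tree argument for the 1:9 split (reconstructed from the standard analysis, since the figure is omitted):

```latex
T(n) = T\!\left(\tfrac{n}{10}\right) + T\!\left(\tfrac{9n}{10}\right) + c\,n
% Each full level of the recursion tree sums to at most c\,n.
% The shallowest leaf is at depth \log_{10} n, the deepest at \log_{10/9} n, so
c\,n \log_{10} n \;\le\; T(n) \;\le\; c\,n \log_{10/9} n + O(n),
% and since both logarithms differ from \lg n only by constant factors,
T(n) = \Theta(n \lg n).
```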

In summary, the running time in this case is also Θ(n lg n): any split with a constant ratio still gives the optimal order of growth.

3. Analysis of alternating worst and best cases

Above we analyzed the best-case and worst-case efficiency. A more common situation is that both occur: what is the efficiency when worst and best cases are mixed? Suppose the worst and best cases alternate; the analysis is then as shown in the figure.
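Writing L(n) for a lucky (balanced) level and U(n) for an unlucky one, the alternating analysis shown in the figure goes roughly like this:

```latex
L(n) = 2\,U(n/2) + \Theta(n)   % lucky split; both children are unlucky
U(n) = L(n-1) + \Theta(n)      % unlucky split; the nonempty child is lucky
% Substituting U into L:
L(n) = 2\bigl(L(n/2 - 1) + \Theta(n/2)\bigr) + \Theta(n)
     = 2\,L(n/2 - 1) + \Theta(n)
     = \Theta(n \lg n)
```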

As we can see, even in this situation we are still lucky: the resulting efficiency is the same as in the best case. So how do we ensure we are always lucky? This is the problem that randomized quicksort solves.

Part 2: Randomized quicksort

We already know that an input that is already sorted is bad for quicksort. So how do we avoid such a situation? One method randomly permutes the elements of the input; another randomly selects the pivot. This is the idea of randomized quicksort, whose advantage is that its running time does not depend on the order of the input sequence. This means we no longer need to make any assumptions about the distribution of the input, and no particular input can make the algorithm extremely inefficient. The unlucky case can still occur, but only because of the random-number generator, not because of the input sequence.

The approach here is to select the pivot at random. The figure shows the recurrence for the expected running time of randomized quicksort.
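A sketch of randomized quicksort with a random pivot (my own Python rendering, not the lecture's code): swap a uniformly random element to the pivot position, then partition as before.

```python
import random

def rand_partition(a, lo, hi):
    """Partition a[lo..hi] around a uniformly random pivot; return its index."""
    k = random.randint(lo, hi)        # random pivot position (inclusive bounds)
    a[k], a[hi] = a[hi], a[k]         # move the random pivot to the end
    x, i = a[hi], lo - 1
    for j in range(lo, hi):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1

def rand_quicksort(a, lo, hi):
    """Randomized quicksort of a[lo..hi] in place."""
    if lo < hi:
        q = rand_partition(a, lo, hi)
        rand_quicksort(a, lo, q - 1)
        rand_quicksort(a, q + 1, hi)
```

Because the pivot choice is random, an already-sorted input is no longer a bad case: every input has the same Θ(n lg n) expected running time.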

Here X_k is an indicator random variable introduced to simplify the recurrence above: it is 1 when partition produces a k : n-1-k split and 0 otherwise.

Therefore, computing the running time of the randomized algorithm reduces to taking the expectation of the recurrence over these indicator random variables, as shown in the figure.
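A reconstruction of the expectation computation (with a random pivot, E[X_k] = 1/n for each k, and X_k is independent of the subproblem running times):

```latex
E[T(n)] = E\!\left[\sum_{k=0}^{n-1} X_k \bigl(T(k) + T(n-1-k) + \Theta(n)\bigr)\right]
        = \sum_{k=0}^{n-1} \tfrac{1}{n} \bigl(E[T(k)] + E[T(n-1-k)]\bigr) + \Theta(n)
        % by linearity of expectation and independence of X_k
        = \tfrac{2}{n} \sum_{k=2}^{n-1} E[T(k)] + \Theta(n)
        % each E[T(k)] appears twice; the k = 0, 1 terms are absorbed into \Theta(n)
```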

Having obtained the recurrence shown, the question is how to solve it. The method used here is the substitution method (Substitution).
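A sketch of the substitution step: guess E[T(n)] ≤ a n lg n for some constant a and verify it by induction, using a standard bound on the summation:

```latex
% Key fact: \sum_{k=2}^{n-1} k \lg k \;\le\; \tfrac{1}{2} n^2 \lg n - \tfrac{1}{8} n^2
% Substituting the inductive guess E[T(k)] \le a\,k \lg k:
E[T(n)] \le \tfrac{2a}{n}\!\left(\tfrac{1}{2} n^2 \lg n - \tfrac{1}{8} n^2\right) + \Theta(n)
        = a\,n \lg n - \Bigl(\tfrac{a\,n}{4} - \Theta(n)\Bigr)
        \le a\,n \lg n \quad \text{once $a$ is chosen large enough.}
```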

In this way, we find that the expected running time of randomized quicksort is Θ(n lg n).

In practice, quicksort is a very good algorithm. Although it cannot guarantee the worst-case n lg n running time that merge sort provides, in practice randomized quicksort is usually about three times faster than merge sort. Admittedly, it does take some coding skill to reach that speed: you need to optimize the base cases and apply a few other tricks. Still, many good sorting implementations are based on quicksort.

Another reason quicksort is fast is that it performs well with caches and virtual memory.

More learning materials on Introduction to Algorithms will continue to be updated, so stay tuned to this blog and my Sina Weibo.
