MIT Introduction to Algorithms -- Lecture 5: Linear-Time Sorting

Source: Internet
Author: User
Tags: rounds, sorts, square root

The posts in this column (Algorithms) are one student's notes on the MIT Introduction to Algorithms course as published on NetEase Open Course. All of the content comes from the lectures of Charles E. Leiserson and Erik Demaine in the MIT OpenCourseWare course Introduction to Algorithms. (http://v.163.com/special/opencourse/algorithms.html)

Lecture 5: Linear-Time Sorting

This lecture analyzes how fast comparison-based sorting can be, and then introduces several non-comparison sorting algorithms that run in linear time.

First, how fast can sorting be?

We have already met several sorting algorithms, so a natural question arises: how fast can sorting be? Θ(n^2)? That answer is correct only if the algorithm is allowed to exchange only adjacent elements. Θ(n lg n)? That answer is usually correct. Θ(n)? That answer is also correct, but only in certain situations. So the honest answer is: it depends on which operations are allowed in the computational model you are using. The key point here is which operations are allowed during the sorting process.

The algorithms we have seen so far (insertion sort, merge sort, quicksort, heapsort) are all comparison sorts. In the comparison model, the only way to determine the relative order of elements is to compare them; we may not multiply integers or perform other exotic operations. We have also seen that for comparison-based sorting, the worst-case time complexity is Θ(n lg n). So the question becomes: can we do better than Θ(n lg n)? Decision trees can help us answer it.

As shown in the figure, here is the decision tree for sorting a sequence of three elements. An internal node labeled i:j means that the i-th element of the sequence is compared with the j-th; the left subtree shows the comparisons that follow if A[i] <= A[j], and the right subtree shows the comparisons that follow if A[i] > A[j]. Each leaf node represents the sorted order determined by the path from the root down to that leaf.

This is the decision-tree model of comparison sorting. Although it is impractical to draw the full decision tree of a sorting algorithm (the size of the tree is exponential in n), we can assume that every comparison sort has a corresponding decision tree, so the decision tree can be used to analyze the algorithm's complexity. The running time of the algorithm on a particular input is the length of the path from the root to a leaf; the worst-case running time is the height of the decision tree.

We can use decision-tree analysis to prove that any decision tree that sorts n elements has height Ω(n lg n).
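The proof figure from the lecture is not reproduced here; the standard counting argument (a reconstruction following the textbook, not the missing figure verbatim) goes as follows. A binary tree of height h has at most 2^h leaves, while a decision tree that sorts n elements must have at least n! leaves, one for every permutation of the input. Hence

\[
2^h \ge n! \quad\Longrightarrow\quad h \ge \lg(n!) \ge \lg\!\left(\left(\frac{n}{e}\right)^{n}\right) = n\lg n - n\lg e = \Omega(n \lg n),
\]

where the middle step uses Stirling's approximation n! ≥ (n/e)^n.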

The proof above establishes the lower bound for decision trees, which is also the lower bound for any comparison-based sorting algorithm. In particular, it shows that merge sort and heapsort are asymptotically optimal, and that randomized quicksort is optimal in expectation.

Second, sorting in linear time

Now we break out of the comparison model and try to sort in linear time. Note that, barring parallelism or other special powers, you cannot sort faster than linear time, because you must at least look at every element; whatever you do, you have to examine all the data, or you cannot sort correctly. So linear time is the best result we can hope for. How, then, do we sort in linear time? We need some more powerful ideas. Below are two sorting algorithms that beat n lg n.

1. Counting sort

Counting sort does something with the elements other than comparing them. The key assumption is that the input is a sequence of integers within a particular known range; we exploit this to sort in linear time. Below is a pseudocode-style description of counting sort.
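The lecture's pseudocode figure is not reproduced here; the following Python sketch follows the standard CLRS counting-sort pseudocode, with the array names A (input), B (output), and C (counts) as used in the lecture:

    def counting_sort(A, k):
        # Stable counting sort of integers in the range 0..k; Theta(n + k) time.
        n = len(A)
        B = [0] * n                  # output array
        C = [0] * (k + 1)            # loop 1: initialize all counts to zero
        for x in A:                  # loop 2: C[x] = number of elements equal to x
            C[x] += 1
        for i in range(1, k + 1):    # loop 3: prefix sums, C[i] = # elements <= i
            C[i] += C[i - 1]
        for x in reversed(A):        # loop 4: back to front, which preserves stability
            C[x] -= 1
            B[C[x]] = x              # after the decrement, C[x] is x's 0-based slot
        return B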

The pseudocode can look confusing at first, so let's walk through the four for-loops on the following example. The first for-loop is simple: it initializes C to all zeros. The second for-loop iterates through the input sequence A and, for each value, counts how many elements of A are equal to it, filling the auxiliary array C with the result shown in the figure.

The third for-loop replaces each entry of C with the sum of all the counts up to that position (a running prefix sum), so that C[i] becomes the number of elements less than or equal to i, giving the result shown in the figure.

The fourth for-loop traverses the data in A from back to front and, using the counts computed in C, puts each element into its correct position in B. For example, if the element taken from A is 4, then C[4] gives the correct position of that 4 in B, so it goes into that vacant slot of B and C[4] is decremented. As shown in the figure, this distributes the sequence A into B in sorted order.
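As a quick check of the sketch above (the input array here is hypothetical, since the lecture's example figure is not reproduced):

    A = [4, 1, 3, 4, 3]
    print(counting_sort(A, k=4))   # -> [1, 3, 3, 4, 4]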

So how efficient is counting sort? From the pseudocode it is easy to see that the complexity is Θ(n + k). If k = O(n), that is excellent: we get a linear-time sorting algorithm. So it seems we need to assume not only that the keys are integers, but also that the range of those integers is small. On the other hand, as long as k is smaller than n lg n we should be satisfied. One could therefore write a hybrid algorithm: if k is larger than n lg n, use merge sort; if it is smaller, use counting sort.
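A minimal sketch of that hybrid idea, reusing counting_sort from above (the function name and exact cutoff test are mine, not from the lecture; Python's built-in sort stands in for merge sort):

    import math

    def hybrid_sort(A, k):
        # Counting sort wins when k is small relative to n lg n;
        # otherwise fall back to an n lg n comparison sort.
        n = len(A)
        if n > 1 and k > n * math.log2(n):
            return sorted(A)           # comparison-based fallback, Theta(n lg n)
        return counting_sort(A, k)     # Theta(n + k)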

A scenario where counting sort shines: if the keys you are handling are 1 byte long, then k is only 2^8 = 256, so you need an auxiliary array of length just 256, and the running time is Θ(n + 256) = Θ(n), linear no matter how large n is. So if you know the numbers are small, this algorithm is great. But if the numbers are bigger, even if you know they are still integers, say 32-bit words, it is not that simple: the auxiliary array would need 2^32 entries, roughly 16 GB of memory.

In addition, counting sort is a stable sort: equal elements appear in the output in the same order as they appeared in the input.
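A small demonstration of stability, using the same counting pattern on records with integer keys (the helper name and the sample records are mine, for illustration only):

    def counting_sort_by_key(items, key, k):
        # Stable counting sort of arbitrary records by an integer key in 0..k.
        C = [0] * (k + 1)
        for it in items:
            C[key(it)] += 1
        for i in range(1, k + 1):
            C[i] += C[i - 1]
        B = [None] * len(items)
        for it in reversed(items):
            C[key(it)] -= 1
            B[C[key(it)]] = it
        return B

    pairs = [(3, 'a'), (1, 'b'), (3, 'c'), (1, 'd')]
    print(counting_sort_by_key(pairs, key=lambda p: p[0], k=3))
    # -> [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]: equal keys keep input order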

2. Radix sort

Next we look at a more interesting algorithm. It uses counting sort as a subroutine on small digit values and, by combining several rounds of it, handles much larger numbers.

Radix sort is probably the oldest implemented sorting algorithm: around 1890, Herman Hollerith used it in his card-sorting machines. Hollerith, who also lectured at MIT, invented an early version of the punch card. Radix sort first sorts on the least significant digit, using a stable sorting algorithm, and then moves on to the higher digits in turn. The algorithm made Hollerith a great deal of money: his company merged with several others in 1911, and in 1924 the combined company was renamed IBM, which is probably why you have heard of Hollerith. Here is an example of radix sort.
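The lecture's example figure is not reproduced here; a trace in the style of the textbook example, sorting seven 3-digit numbers least significant digit first (each arrow is one stable sorting pass, on the digit named below the resulting column), looks like this:

    329        720        720        329
    457        355        329        355
    657        436        436        436
    839   ->   457   ->   839   ->   457
    436        657        355        657
    720        329        457        720
    355        839        657        839
               (1s)       (10s)      (100s)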

For the correctness of this algorithm, argue by induction on the digit position: assume the numbers are already sorted on their low-order t-1 digits, and that we now sort on digit t. For two numbers that differ in digit t, the pass on digit t orders them correctly. For two numbers that are equal in digit t, stability keeps them in their previous order, which by the inductive hypothesis is already correct on the low t-1 digits. So after the pass the numbers are sorted on their low t digits, and by induction the whole sequence ends up sorted.

Next we analyze the complexity of radix sort, assuming counting sort is used as the auxiliary stable sorting algorithm. But we do not want to be forced to sort on every individual digit; that would lose too much flexibility, because a number can be represented in any base. For instance, in a binary representation we can group several bits together and treat them as one digit. So we assume each element of the sequence is a binary integer of b bits.

The algorithm splits each integer into b/r digits, each r bits long; in other words, the numbers are viewed in base 2^r. Then b/r is the number of rounds the algorithm must run, and 2^r is the maximum value a digit can take, playing the role of k in the counting sort. So what is the total running time? Each round costs Θ(n + 2^r), so the total is Θ((b/r)(n + 2^r)).
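A minimal Python sketch of this scheme, reusing the stable counting pattern from above (the parameter defaults are my own illustrative choices):

    def radix_sort(A, b=32, r=8):
        # LSD radix sort of non-negative b-bit integers, r bits per digit:
        # b/r rounds of stable counting sort, Theta((b/r) * (n + 2^r)) total.
        mask = (1 << r) - 1
        for shift in range(0, b, r):
            C = [0] * (1 << r)
            for x in A:                    # count occurrences of each digit value
                C[(x >> shift) & mask] += 1
            for i in range(1, 1 << r):     # prefix sums
                C[i] += C[i - 1]
            B = [0] * len(A)
            for x in reversed(A):          # stable placement, back to front
                d = (x >> shift) & mask
                C[d] -= 1
                B[C[d]] = x
            A = B
        return A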

To choose r, we can minimize T(r) = (b/r)(n + 2^r): take the derivative of this function with respect to r, set it to zero, and solve for the stationary point, which always yields the minimum. There is also a more intuitive, less precise approach that still gives the right result, and the upper bound it gives is exact. Consider how the expression behaves as r grows: two terms contain r, and a larger r means fewer rounds b/r, but because of the 2^r term, r cannot be too large; once r >> lg n, the 2^r factor grows exponentially past n. We want n, rather than 2^r, to dominate the per-round cost, so we choose r as large as possible subject to that, namely r = lg n. This gives a total running time of Θ(bn / lg n).

In practice, radix sort runs fast on large inputs, and the code is easy to write and maintain. For example, with 32-bit numbers split into 8-bit digits, we need only four rounds of linear-time counting sort, with only 256 words of auxiliary space. An n lg n algorithm, by contrast, needs about 11 rounds over the data (lg n ≈ 11 when n is around 2000), as shown in the figure.

Unfortunately, counting sort is not cache-friendly, so in practice radix sort is often not as fast as this analysis suggests unless the numbers are small; well-tuned comparison sorts such as quicksort can do better.

Finally, if you sort arbitrary integers, each fitting in one machine word, the best sorting algorithm currently known runs in expected time n times the square root of lg lg n, that is, O(n √(lg lg n)); it is randomized and very complicated. There are many such sophisticated algorithms, but they do tell us something: you can break the dependence on b, as long as you know each number fits within the maximum word length.

More notes on Introduction to Algorithms will continue to be posted, so stay tuned to this blog and my Sina Weibo.
