Data Structure Learning (C++): Sorting [6], Internal Sorting Summary

Source: Internet
Author: User

Radix sort, which this article discusses later, feels a little out of place next to the earlier sorting algorithms. So far we have covered four kinds of sorting methods, each with a basic version and an improved version. For internal sorting, speed is our main concern, which is why quicksort is so popular. Because of quicksort's weakness (its worst case degrades to O(n^2)), we may sometimes use heap sort, Shell sort, or merge sort instead.

That is probably the most direct line of thinking when choosing a sorting method (our options are not very wide anyway, so choosing is mostly a matter of flipping through the few cards we hold). Most of us, thinking like gamblers, will simply reach for quicksort: the worst case? Surely I'm not that unlucky. And if we do worry, we just add median-of-three pivot selection (or pick the pivot at random). Sometimes, however, speed is not everything; we also need to consider the stability of sorting.
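As a minimal sketch of the "three in the middle" remedy, here is a quicksort that picks as pivot the median of the first, middle, and last elements. The function names `quick_sort` and `median_of_three` are my own, and the Lomuto-style partition is just one possible choice. It still has an O(n^2) worst case (for example, many equal keys); median-of-three only makes the bad cases less likely on already-sorted input.

```cpp
#include <utility>
#include <vector>

// Returns the index of the median of v[lo], v[mid], v[hi] (median-of-three).
static int median_of_three(const std::vector<int>& v, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    int a = v[lo], b = v[mid], c = v[hi];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
    return hi;
}

// Sorts v[lo..hi] in place (inclusive bounds).
void quick_sort(std::vector<int>& v, int lo, int hi) {
    while (lo < hi) {
        std::swap(v[median_of_three(v, lo, hi)], v[hi]);  // move pivot to the end
        int pivot = v[hi];
        int i = lo;  // Lomuto partition: v[lo..i-1] < pivot
        for (int j = lo; j < hi; ++j)
            if (v[j] < pivot) std::swap(v[i++], v[j]);
        std::swap(v[i], v[hi]);  // pivot lands in its final position
        // Recurse on the smaller part and loop on the larger one to bound stack depth.
        if (i - lo < hi - i) { quick_sort(v, lo, i - 1); lo = i + 1; }
        else { quick_sort(v, i + 1, hi); hi = i - 1; }
    }
}

void quick_sort(std::vector<int>& v) {
    if (!v.empty()) quick_sort(v, 0, static_cast<int>(v.size()) - 1);
}
```

Replacing `median_of_three` with a random index in [lo, hi] gives the "randomly select" variant.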

Stability was not discussed in the earlier installments, mainly because I felt it would distract readers from the logic of each algorithm, and the earlier tests could not show it anyway. Now let's take it up. A sort is stable if records with equal keys keep their original relative order after sorting; otherwise it is unstable. In practice, stability decides whether the results of successive sorts on a multi-key sequence accumulate, that is, whether several sorting passes finally produce the expected order. For example, suppose we first sort students by student ID (in real applications the initial sequence is usually already in this order, so no sort is needed) and then sort them by score. Among students with the same score, we want the smaller student IDs to come first (the expected order). If the sort by score is unstable, the ID order will be disrupted and we will not get the expected sequence. Note the words "expected order": if the ranking does not require students with equal scores to appear in ID order, it doesn't matter whether the sort is stable.
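The student-ranking example can be sketched with the standard library's `std::stable_sort`, which guarantees that equal elements keep their relative order. The `Student` struct and `rank_by_score` are hypothetical names for illustration; scores are ranked highest first.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record type for illustration.
struct Student {
    int id;
    int score;
};

// Ranks students by descending score. Because std::stable_sort keeps equal
// elements in their original relative order, students with the same score
// stay in the student-ID order produced by the first sort.
std::vector<Student> rank_by_score(std::vector<Student> students) {
    // First sort: by student ID (often the input already has this order).
    std::stable_sort(students.begin(), students.end(),
                     [](const Student& a, const Student& b) { return a.id < b.id; });
    // Second sort: by score, highest first; ties keep ID order.
    std::stable_sort(students.begin(), students.end(),
                     [](const Student& a, const Student& b) { return a.score > b.score; });
    return students;
}
```

For input `{3, 90}, {1, 85}, {2, 90}, {4, 85}` this yields IDs 2, 3, 1, 4. With plain `std::sort` in the second step, the order of 2 and 3 (and of 1 and 4) would not be guaranteed.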

Which sorting algorithms are stable? First, let's look at what causes instability. Note that each of the four families covered earlier has a stable algorithm (I'll simply state this as a conclusion here; don't ask me to derive it ^_^), so the sorting logic itself should not be the source of instability. Every sort has to move records (or modify pointers). The ways of moving include shifting (straight insertion sort), swapping adjacent records (bubble sort), and relinking or redistributing (table insertion sort and merging). On careful inspection, we find that exchanging records at non-adjacent positions is what causes instability. Consequently, every sorting algorithm that does this is unstable. Even an originally stable algorithm becomes unstable if it adopts such an exchange: for example, straight selection sort is stable on a linked list but unstable on an array. However, if shifting is used instead of the swap, the array version becomes stable as well (though I doubt anyone wants to turn a single swap into a pile of shifts).
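A small sketch makes the selection-sort claim concrete. Both functions below find the minimum the same way; they differ only in how the minimum is moved into place. `Rec`, `selection_sort_swap`, and `selection_sort_shift` are hypothetical names; `tag` just records the input order so we can see what happens to equal keys.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record: a key plus a tag that marks the original position.
struct Rec {
    int key;
    char tag;
};

// Straight selection sort with a swap: v[i] may jump to a distant position,
// past other records with the same key, so the sort is unstable.
void selection_sort_swap(std::vector<Rec>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        std::size_t min = i;
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[j].key < v[min].key) min = j;  // strict < picks the first minimum
        std::swap(v[i], v[min]);
    }
}

// Same selection logic, but the minimum is inserted at position i by shifting
// v[i..min-1] one step right, so equal keys never jump over each other.
void selection_sort_shift(std::vector<Rec>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        std::size_t min = i;
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[j].key < v[min].key) min = j;
        Rec m = v[min];
        for (std::size_t k = min; k > i; --k) v[k] = v[k - 1];  // shift right
        v[i] = m;
    }
}
```

On `{2,'a'}, {2,'b'}, {1,'c'}` the swap version produces tags c, b, a (the two 2s are reversed), while the shift version produces c, a, b. The price of the shift version is up to O(n) moves per step instead of one swap.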

In addition, an originally stable algorithm also becomes unstable if its key comparison is changed, for example from "greater than" to "greater than or equal to", because records then get moved when they should stay put. However, such a low-level mistake is beyond the scope of our discussion: it deliberately turns a stable sort into an unstable one without any performance gain, and whoever does that must be out of their mind.
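To illustrate the comparison-condition pitfall, here is a sketch of straight insertion sort where a hypothetical `or_equal` flag switches the loop condition from `>` to `>=`. The flag exists only to demonstrate the mistake; a real implementation would just use `>`.

```cpp
#include <cstddef>
#include <vector>

struct Rec { int key; char tag; };  // hypothetical record; tag marks input order

// Straight insertion sort. With strict '>' a record never moves past an equal
// key, so the sort is stable; or_equal = true mimics the '>=' slip.
void insertion_sort(std::vector<Rec>& v, bool or_equal = false) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        Rec x = v[i];
        std::size_t j = i;
        while (j > 0 && (v[j - 1].key > x.key || (or_equal && v[j - 1].key == x.key))) {
            v[j] = v[j - 1];  // shift larger (or, wrongly, equal) keys right
            --j;
        }
        v[j] = x;
    }
}
```

With `{1,'a'}, {1,'b'}` the correct version leaves a before b; the `>=` version moves b in front of a and also does extra, useless shifts.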

Once you understand the nature of stability, radix sort is easy to see through.

Radix sort

I was surprised when I first heard that a sorting method could break the O(n log n) lower bound (that bound applies to comparison-based sorting). After reading about it, though, I realized we use it in daily life without noticing. We have all played the "Twelve Months" card game: when we draw a 6, we place it at position 6 (row 1, position 6). If everything goes well, we end up with 12 piles, numbered 1, 2, ..., 12, and the cards are sorted. Now consider sorting integers in the range 0 to 999, assuming no number repeats. The most direct approach is to create an array a[] of size 1000: put 1 in a[1], put 400 in a[400], and so on. After all the numbers are placed, scan from a[0] to a[999] and the sort is complete.
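The a[1000] idea can be sketched directly. The name `slot_sort` is hypothetical, and, as in the text, it assumes distinct values in [0, max_value]; a `std::vector<bool>` stands in for the array a[], since each slot only needs to record whether its number occurred.

```cpp
#include <vector>

// Sorts distinct integers in [0, max_value] by marking slot a[n] for each value n
// and then scanning the slots in order. Assumes no duplicates and no
// out-of-range values (max_value = 999 matches the example in the text).
std::vector<int> slot_sort(const std::vector<int>& input, int max_value = 999) {
    std::vector<bool> occupied(max_value + 1, false);  // plays the role of a[]
    for (int x : input) occupied[x] = true;             // distribute: O(n)
    std::vector<int> out;
    out.reserve(input.size());
    for (int k = 0; k <= max_value; ++k)                // scan: O(r), r = max_value + 1
        if (occupied[k]) out.push_back(k);
    return out;
}
```

No comparisons between keys are made at all, which is why the O(n log n) bound for comparison sorts does not apply.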

Clearly, this uses the idea of hashing, and a hash table's best-case lookup cost is O(1). Overall, distributing distinct numbers in 0 to 999 as above takes O(n) time. When numbers repeat, collisions are handled here by chaining (the "link address" method): all equal numbers are linked into a list hung at the corresponding slot. This process merely rearranges linked lists, so it is stable (I couldn't make it unstable if I tried).
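Here is a minimal sketch of distribution with chaining for repeated keys. `Rec` and `chained_slot_sort` are hypothetical names; each slot holds a `std::list` of records, appended at the tail so that equal keys keep their input order.

```cpp
#include <list>
#include <vector>

struct Rec { int key; char tag; };  // hypothetical record; tag marks input order

// Distribution with chaining: each slot holds a linked list of all records with
// that key. Appending at the tail of each chain and collecting the slots in
// order never changes the relative order of equal keys, so this is stable.
// Assumes every key lies in [0, max_key].
std::vector<Rec> chained_slot_sort(const std::vector<Rec>& input, int max_key) {
    std::vector<std::list<Rec>> slots(max_key + 1);
    for (const Rec& r : input) slots[r.key].push_back(r);  // distribute
    std::vector<Rec> out;
    out.reserve(input.size());
    for (const auto& chain : slots)                          // collect in slot order
        out.insert(out.end(), chain.begin(), chain.end());
    return out;
}
```

For `{3,'a'}, {1,'b'}, {3,'c'}, {1,'d'}` the result is b, d, a, c: within each key, records come out in the order they went in.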

Radix sort can be built on this "distribute, then collect" procedure. Asking whether radix sort is stable is actually a little absurd, because radix sort only works if every pass is stable: its result is the accumulation of several single-key sorts, and if any one pass is unstable, the whole result is wrong.

For a single key, if we can afford one slot per possible key value, one distribution followed by one collection suffices, in O(n + r) time (the final collection pass is O(r), where r is the number of slots). If that would take too much extra storage, we can instead split the key into several subkeys and shrink the index (shrink, not grow ^_^). Naturally, this requires several rounds of distribution and collection. Because the most significant subkey determines the final order, its distribution and collection must be done last. That is why radix sort is usually done LSD (least significant digit) first.
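The LSD procedure can be sketched for non-negative integers split into decimal digits, using ten buckets per pass. `lsd_radix_sort` is a hypothetical name, and `std::vector` buckets stand in for the linked lists; with d digits the cost is d passes of O(n + r), here r = 10.

```cpp
#include <vector>

// LSD radix sort for non-negative integers, base 10, built from repeated
// "distribute, then collect" passes. Each pass is stable, so once the pass on
// the most significant digit (done last) finishes, the sequence is ordered.
void lsd_radix_sort(std::vector<unsigned>& v) {
    unsigned max_value = 0;
    for (unsigned x : v) if (x > max_value) max_value = x;
    for (unsigned long long exp = 1; max_value / exp > 0; exp *= 10) {
        std::vector<std::vector<unsigned>> buckets(10);
        for (unsigned x : v) buckets[(x / exp) % 10].push_back(x);  // distribute by digit
        v.clear();
        for (const auto& b : buckets)                               // collect buckets 0..9
            v.insert(v.end(), b.begin(), b.end());
    }
}
```

For `170, 45, 75, 90, 802, 24, 2, 66` this runs three passes (ones, tens, hundreds) and yields `2, 24, 45, 66, 75, 90, 170, 802`.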

Also, when you see radix sort applied to integers, don't immediately think of splitting them into hundreds, tens, and ones digits. The "radix" can be chosen to suit the data; for example, an integer can be treated as a number in base 1000. No sample routine is given here, because the preconditions for radix sort are too restrictive.
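For illustration only (not the author's routine), here is a sketch that treats 32-bit unsigned values as base-1000 numbers. Since 1000^3 < 2^32 <= 1000^4, four passes always suffice. Each pass is a stable counting sort on one base-1000 "digit"; counting slot sizes and scattering is a swapped-in alternative to the linked-list distribution above that achieves the same stable distribute/collect. `radix1000_sort` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort treating each 32-bit unsigned value as a number in base 1000
// ("digits" 0..999). A 32-bit value is below 1000^4, so 4 passes suffice.
// Each pass is a stable counting sort on one base-1000 digit.
void radix1000_sort(std::vector<std::uint32_t>& v) {
    const std::uint32_t R = 1000;
    std::vector<std::uint32_t> tmp(v.size());
    std::uint64_t div = 1;
    for (int pass = 0; pass < 4; ++pass, div *= R) {
        std::vector<std::size_t> count(R + 1, 0);
        for (auto x : v) ++count[(x / div) % R + 1];                     // digit histogram
        for (std::uint32_t d = 0; d < R; ++d) count[d + 1] += count[d];  // start offset per digit
        for (auto x : v) tmp[count[(x / div) % R]++] = x;                // stable scatter
        v.swap(tmp);                                                     // v now holds this pass's order
    }
}
```

Compared with base 10 (up to 10 passes for 32-bit values), base 1000 needs only 4 passes, at the cost of a 1000-entry count table per pass.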

External sorting

This may sound mysterious, but once you know that merging combines ordered segments (runs) into one ordered sequence, you can see that the task is doable; the rest is a matter of improving efficiency.
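The basic idea can be simulated in memory: cut the input into runs that "fit in memory", sort each run, then merge all runs at once. This is only a sketch under stated assumptions: `external_sort_sim` and `memory_size` are hypothetical names, the runs live in vectors rather than on disk, and a `std::priority_queue` min-heap does the k-way merge. It assumes `memory_size > 0`.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// In-memory simulation of external merge sort; memory_size is a hypothetical
// limit on how many records fit in memory at once (must be > 0).
// Phase 1: cut the input into runs of at most memory_size records, sort each.
// Phase 2: k-way merge of all runs using a min-heap.
std::vector<int> external_sort_sim(const std::vector<int>& data, std::size_t memory_size) {
    std::vector<std::vector<int>> runs;
    for (std::size_t i = 0; i < data.size(); i += memory_size) {
        std::size_t end = std::min(data.size(), i + memory_size);
        std::vector<int> run(data.begin() + i, data.begin() + end);
        std::sort(run.begin(), run.end());  // internal sort of one run
        runs.push_back(std::move(run));
    }
    // Heap entries: (value, run index, position in run); smallest value on top.
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
    std::vector<int> out;
    out.reserve(data.size());
    while (!heap.empty()) {
        auto [value, r, pos] = heap.top();  // smallest head among all runs
        heap.pop();
        out.push_back(value);
        if (pos + 1 < runs[r].size()) heap.emplace(runs[r][pos + 1], r, pos + 1);
    }
    return out;
}
```

In a real external sort each run would be a file, and only one buffer per run would be held in memory during the merge. Structured bindings require C++17.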

When disks come up, we always assume that in-memory processing time is far smaller than disk read time. However, current technology keeps shortening disk read times, and my own impression is that sorting 40 MB of integers is not faster than reading 40 MB from the hard disk (sorting 10 million random integers takes 18 s on my machine, though perhaps my algorithm is not well written). Either way, the key to speeding things up is to reduce unnecessary traffic to slow devices, just as a cache sits in front of memory and memory sits in front of the disk. Every way we can think of to speed up external sorting comes down to reducing the data traffic between memory and external storage. The techniques used here include increasing the number of merge ways, increasing the length of the initial runs, and optimizing the merge tree.
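Why the first two techniques help can be seen from the standard pass count for balanced k-way merging (a textbook estimate, not a figure from the text above). With n records and initial runs of length L there are m runs, and each merge pass reads and writes every record once:

```latex
m = \left\lceil \frac{n}{L} \right\rceil, \qquad
\text{passes} = \left\lceil \log_k m \right\rceil, \qquad
\text{I/O volume} \propto n \cdot \left\lceil \log_k m \right\rceil
```

For example, with m = 100 runs, 2-way merging needs \lceil \log_2 100 \rceil = 7 passes, while 10-way merging needs \lceil \log_{10} 100 \rceil = 2. Longer initial runs shrink m, and more merge ways raise k; both cut the number of passes over the data.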

However, for anything beyond a thousand or so records, we never manage the data ourselves; we rely on a database. In other words, unless you work on database internals, you will probably never need to write an external sort; and if you do write a database, external sorting will be the least of your worries.

Compared with the common internal sorts, external sorting is perhaps not a must-master technique. The point of learning it is to gain ideas for solving problems when memory is insufficient and for improving the performance of external storage.

If you have never seen a simulated external-sorting routine, there is no shame in that.
