Introduction to Algorithms, Part 2: Radix sort and bucket sort

Source: Internet
Author: User

When I studied Yan Weimin's data structures textbook, I came across radix sort, but I was never sure that sorting digit by digit would really leave all the values in the correct order at the end. I just sorted the keys as required and paid no attention to that final part.

Radix sort is a distribution sort. Its basic idea is that sorting requires no key comparisons; instead it proceeds through "distribution" and "collection" passes, and its running time can be linear, O(n). Radix sort sorts the keys one digit at a time. Intuitively, one might want to sort by the most significant digit first, but radix sort does the opposite: it starts with the least significant digit and uses a stable sort for each pass. For n keys of b bits each, if each pass handles r bits, then b/r passes are needed; each counting-sort pass takes O(n + 2^r) time, so the total running time is O((b/r)(n + 2^r)).
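To make the O((b/r)(n + 2^r)) bound concrete, here is a minimal C++ sketch, assuming 32-bit unsigned keys (b = 32). The function name radix_sort_bits and the limit r <= 16 are my own choices for illustration, not something from the book. Each pass is a counting sort on r bits, starting from the least significant bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit unsigned keys, r bits per pass (hypothetical helper).
// Returns the number of passes performed, which is ceil(32 / r).
int radix_sort_bits(std::vector<std::uint32_t>& a, int r)
{
    assert(r >= 1 && r <= 16);                 // assumption: keep the 2^r counter table small
    const int b = 32;                          // key width in bits
    const std::uint32_t mask = (1u << r) - 1;  // selects the low r bits
    std::vector<std::uint32_t> out(a.size());
    int passes = 0;

    for (int shift = 0; shift < b; shift += r, ++passes) {
        std::vector<std::size_t> count(std::size_t(1) << r, 0);

        // Count how many keys have each digit value: O(n).
        for (std::uint32_t x : a)
            ++count[(x >> shift) & mask];

        // Prefix sums: count[d] becomes the number of keys with digit <= d: O(2^r).
        for (std::size_t d = 1; d < count.size(); ++d)
            count[d] += count[d - 1];

        // Place keys from right to left so that keys with equal digits
        // keep their relative order (stability): O(n).
        for (std::size_t i = a.size(); i-- > 0; ) {
            std::uint32_t d = (a[i] >> shift) & mask;
            out[--count[d]] = a[i];
        }
        a.swap(out);  // the collected result becomes the input of the next pass
    }
    return passes;
}
```

With r = 8 the function makes 4 passes with 256 counters each; a larger r means fewer passes but a larger counter table, which is exactly the trade-off in the formula.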

For an animated demonstration of radix sort, see http://blogimg.chinaunix.net/blog/upfile/070912120349.swf and http://www.cnblogs.com/xiaosuo/archive/2010/03/12/1654353.html.

Pseudocode in the book:

The setting here: the keys are int values with at most five decimal digits, each digit in the range 0-9. I did not feel like bringing in some other stable sort, so I simply went with counting sort for each pass.

The implementation code is as follows:

//============================================================================
// Name        : radix.cpp
// Author      : xia
// Copyright   : NUAA
// Description : Implementation of radix sort
//============================================================================
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
#include <iomanip>
#include <fstream>
using namespace std;

const int MAX = 200; // number of keys to sort
const int NUM = 3;   // number of decimal digits per key (could also be computed)

void radixsort(vector<int> &v, vector<int> &b)
{
    int i, j;               // loop variables
    int rate = 1, temp = 0; // rate selects the current digit; temp holds that digit's value
    for (i = 1; i <= NUM; i++, rate *= 10)
    {   // sort by the i-th digit
        vector<int> c(10); // re-created on every pass, so all counts start at 0
        for (j = 0; j < MAX; j++)
        {
            temp = (v[j] / rate) % 10; // the digit at the current position
            c[temp]++;                 // c[d] counts the keys whose digit equals d
        }
        for (j = 1; j < 10; j++)
            c[j] += c[j - 1]; // c[d] now counts the keys whose digit is <= d
        for (j = MAX - 1; j >= 0; j--)
        {   // walk backwards so that the pass is stable
            temp = (v[j] / rate) % 10;
            c[temp]--;
            b[c[temp]] = v[j];
        }
        // The above is the distribution step, essentially a counting sort.
        // b now holds the keys sorted by the current digit.
        for (j = 0; j < MAX; j++) // collection step
            v[j] = b[j];
    }
}

int main(int argc, char **argv)
{
    vector<int> v;
    int i;
    srand((unsigned)time(NULL));
    for (i = 0; i < MAX; i++)
        v.push_back(rand() % 1000); // generate keys with at most 3 digits
    vector<int> result(v); // holds the sorted result
    radixsort(v, result);
    for (i = 0; i < MAX; i++)
    {
        cout << setw(5) << result[i];
        if ((i + 1) % 15 == 0)
            cout << endl;
    }
    cout << endl;
    return 0;
}

The running result is as follows:

How does the speed of radix sort compare with quicksort?

In the common case where b = O(lg n), radix sort runs in Θ(n) time, which looks better than quicksort's average-case Θ(n lg n). However, the constant factors hidden in the Θ-notation differ. For the n keys to be processed, radix sort may make fewer passes than quicksort, but each pass of radix sort may take significantly longer. Which sort is better therefore depends on the characteristics of the implementation and of the underlying machine (for example, quicksort often uses hardware caches more effectively than radix sort), and it also depends on the input data. In addition, the version of radix sort that uses counting sort as its intermediate stable sort does not sort in place, whereas many of the Θ(n lg n) comparison sorts do. So when memory is at a premium, an in-place algorithm such as quicksort may be preferable.

Take my machine as an example. With the VC6 compiler, the two timings were 1457 and 870, in favor of radix sort, but the difference in speed is not dramatic.
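A comparison like this can be repeated with a small timing helper. The following is only a sketch: time_sort_ms is a name I made up, and any sorter that takes a std::vector<int>& can be passed in. If std::sort is used as a stand-in for quicksort, keep in mind that it is typically an introsort rather than a plain quicksort, so the numbers are only indicative.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Times one call of `sorter` on a private copy of `data` and returns the
// elapsed wall-clock time in milliseconds (hypothetical helper).
template <typename Sorter>
double time_sort_ms(Sorter sorter, std::vector<int> data)
{
    auto start = std::chrono::steady_clock::now();
    sorter(data);
    auto stop = std::chrono::steady_clock::now();

    // Guard against timing a sorter that does not actually sort.
    assert(std::is_sorted(data.begin(), data.end()));

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, time_sort_ms([](std::vector<int>& v) { std::sort(v.begin(), v.end()); }, keys) times the library sort on a copy of keys; a radix sort can be wrapped in a lambda in the same way. Because data is passed by value, every sorter sees the same unsorted input.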

In addition, you can refer to a C-language version: required /? P = 2378.

That is where things stand. I had never written a radix sort before, and I still feel it is not as good as using counting sort directly. Perhaps it has its uses in real-world production and everyday life, such as sorting playing cards.

Bucket sort

Personally, I feel that bucket sort is just a two-dimensional extension of counting sort. The basic idea is to divide the interval [0, 1) into n subintervals of equal size, called buckets, and then distribute the n inputs into the buckets. Because the input is assumed to be uniformly distributed, it is unlikely that many elements fall into any one bucket. To produce the output, sort each bucket first, and then output the elements of each bucket in order. Its pseudocode is as follows:

Reference: http://www.bioisland.com/Algorithm/ShowArticle.asp?ArticleID=161. Each bucket can be stored as a linked list, or the buckets together as a two-dimensional array. As I see it, the difference from counting sort is this: counting sort is laid out in one dimension, with every possible value acting as its own bucket, whereas bucket sort splits the range into equal subintervals (usually 10 of them), so each bucket must itself be a one-dimensional container. Bucket sort can therefore be regarded as simply an extension of counting sort (my personal understanding).
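The idea can be sketched in C++ as follows. This is a minimal sketch rather than the book's pseudocode: it assumes every input lies in [0, 1), keeps each bucket in a std::vector instead of a linked list, and uses insertion sort inside each bucket. The name bucket_sort is my own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bucket sort for values in [0, 1), using n buckets for n inputs.
void bucket_sort(std::vector<double>& a)
{
    const std::size_t n = a.size();
    if (n == 0) return;
    std::vector<std::vector<double>> buckets(n);

    // Distribute: value x goes to bucket floor(n * x).
    for (double x : a) {
        assert(x >= 0.0 && x < 1.0);  // assumption stated above
        std::size_t idx = static_cast<std::size_t>(n * x);
        idx = std::min(idx, n - 1);   // guard against rounding up to n
        buckets[idx].push_back(x);
    }

    // Sort each bucket with insertion sort, then concatenate the buckets in order.
    std::size_t k = 0;
    for (std::vector<double>& bucket : buckets) {
        for (std::size_t i = 1; i < bucket.size(); ++i) {
            double key = bucket[i];
            std::size_t j = i;
            while (j > 0 && bucket[j - 1] > key) {
                bucket[j] = bucket[j - 1];
                --j;
            }
            bucket[j] = key;
        }
        for (double x : bucket)
            a[k++] = x;
    }
}
```

The vector of vectors plays the role of the "two-dimensional array" mentioned above: the first index selects the bucket, the second the element inside it.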

For an analysis of bucket sort, see http://anwj336.blog.163.com/blog/static/8941520920109535025216/. Exercise 8.4-2 asks about the worst-case running time of bucket sort: when all the elements land in a single bucket, it degenerates into insertion sort, which is O(n^2). To keep the linear expected time while guaranteeing a worst case of O(n lg n), replace the sort inside each bucket with an algorithm whose worst case is O(n lg n), such as merge sort. Of course, as the number of buckets keeps growing, once it equals the maximum element value, bucket sort turns into counting sort. (Speaking of which, if there is a big gap between the largest and the second-largest element, counting sort wastes a great deal of space. Perhaps this is the real point of bucket sort: it saves space.)
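Both remarks can be tried out with an integer variant. The sketch below rests on my own assumptions: keys are non-negative ints no larger than max_value, the bucket count is a parameter, and std::stable_sort (typically implemented as a merge sort) replaces insertion sort inside each bucket. The name bucket_sort_ints is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bucket sort for integers in [0, max_value], using `bucket_count`
// equal-width buckets (hypothetical helper).
void bucket_sort_ints(std::vector<int>& a, int max_value, std::size_t bucket_count)
{
    assert(max_value >= 0 && bucket_count >= 1);
    std::vector<std::vector<int>> buckets(bucket_count);
    const std::size_t range = static_cast<std::size_t>(max_value) + 1;

    for (int x : a) {
        assert(x >= 0 && x <= max_value);
        // Map x proportionally onto [0, bucket_count); the index never decreases as x grows.
        std::size_t idx = static_cast<std::size_t>(x) * bucket_count / range;
        buckets[idx].push_back(x);
    }

    std::size_t k = 0;
    for (std::vector<int>& bucket : buckets) {
        // A merge-sort-style sort keeps even a single overloaded bucket at O(n lg n).
        std::stable_sort(bucket.begin(), bucket.end());
        for (int x : bucket)
            a[k++] = x;
    }
}
```

With bucket_count equal to 1, everything lands in one bucket and the running time is that of the inner sort. With bucket_count equal to max_value + 1, each bucket holds copies of a single value, which is essentially counting sort and shows the space cost when the keys are sparse.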

For Problem 8-3, see http://blog.csdn.net/zhanglei8893/article/details/6285689; I will skip it here.

This rookie keeps going ~~~
