Bucket sort, radix sort, and counting sort

Last Update:2015-03-29 Source: Internet

Author: User

One: Counting sort

(1) When the input consists of n integers in the range 0 to k, counting sort runs in Θ(n + k) time. Counting sort is not a comparison sort, and when k = O(n) it is faster than any comparison sort, which needs Ω(n log n) comparisons. Because the length of the counting array C depends on the range of the data to be sorted (the difference between the maximum and minimum values plus 1), counting sort needs a lot of time and memory for arrays with a large data range. For example, counting sort is an excellent choice for sorting numbers between 0 and 100, but it is not suitable for sorting names alphabetically. However, counting sort can be used as the per-digit step of radix sort, which makes it possible to sort arrays with a large data range efficiently.
(2) The steps of the algorithm are as follows:
1. Find the largest and smallest elements in the array to be sorted.
2. Count the occurrences of each value i in the array and store the count in item i of array C.
3. Accumulate the counts (starting from the first item of C, add the previous item to each item).
4. Fill the target array in reverse: place each element i at position C(i) of the new array, and subtract 1 from C(i) each time an element is placed.
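The four steps above can be sketched as follows. This is a minimal illustration rather than the article's own program: the function name counting_sort_range is hypothetical, it uses std::vector instead of raw arrays, and it assumes the difference between the maximum and minimum values is small enough for the counting array to fit in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counting sort over the integer range [lo, hi] found in the input.
// Returns a new sorted vector; the input is left unchanged.
std::vector<int> counting_sort_range(const std::vector<int>& a) {
    if (a.empty()) return {};
    // Step 1: find the smallest and largest elements.
    auto mm = std::minmax_element(a.begin(), a.end());
    const int lo = *mm.first;
    const int hi = *mm.second;
    // Step 2: c[v - lo] counts how often value v occurs.
    std::vector<int> c(static_cast<std::size_t>(hi - lo) + 1, 0);
    for (int v : a) ++c[v - lo];
    // Step 3: prefix sums; c[i] becomes the number of elements <= lo + i.
    for (std::size_t i = 1; i < c.size(); ++i) c[i] += c[i - 1];
    // Step 4: fill the output from the back so equal elements keep their order.
    std::vector<int> out(a.size());
    for (std::size_t i = a.size(); i-- > 0;) {
        out[--c[a[i] - lo]] = a[i];
    }
    assert(std::is_sorted(out.begin(), out.end()));  // sanity check
    return out;
}
```

Subtracting lo from every value is what lets the counting array cover only the range that actually occurs, including negative values.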

(3) Code

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
using namespace std;

const int num_range = 100;  // values lie in [0, num_range)
const int rate = 10;        // base (decimal)

// Output the array
void print_arr(const int *arr, const int &n) {
    int i;
    for (i = 0; i < n; i++) {
        if (!i) {
            cout << arr[i];
        } else {
            cout << " " << arr[i];
        }
    }
    printf("\n");
}

// Calculate the number of digits of the largest value
int counting_digits(int *arr, const int &n) {
    int digits = 0;
    int mymax = arr[0];
    int i;
    for (i = 1; i != n; ++i) {
        if (mymax < arr[i]) mymax = arr[i];
    }
    while (mymax) {
        ++digits;
        mymax /= rate;
    }
    return digits;
}

// Counting sort
void counting_sort(int *ini_arr, int *sorted_arr, const int &n) {
    // counting_digits() is not needed here
    // count_arr works much like a bitmap, but stores counts
    int *count_arr = (int *)malloc(sizeof(int) * num_range);
    int i;
    memset(count_arr, 0, sizeof(int) * num_range);
    for (i = 0; i != n; ++i) {       // count occurrences
        count_arr[ini_arr[i]]++;     // the element value is the subscript of count_arr
    }
    for (i = 1; i != num_range; ++i) {  // compute the positions after sorting
        count_arr[i] += count_arr[i - 1];
    }
    for (i = n - 1; i >= 0; --i) {
        int index = count_arr[ini_arr[i]] - 1;  // subscript after sorting
        sorted_arr[index] = ini_arr[i];
        count_arr[ini_arr[i]]--;                // handles duplicate data
    }
    free(count_arr);
}

int main(int argc, char *argv[]) {
    int n;
    if (argc < 2) {
        n = 10;
    } else {
        n = atoi(argv[1]);
    }
    int i;
    int *arr = (int *)malloc(sizeof(int) * n);
    int *sorted_arr = (int *)malloc(sizeof(int) * n);
    srand((unsigned)time(0));
    for (i = 0; i < n; i++) {
        arr[i] = rand() % num_range;
    }
    printf("ini_array: ");
    print_arr(arr, n);
    counting_sort(arr, sorted_arr, n);
    printf("sorted_array: ");
    print_arr(sorted_arr, n);
    free(arr);
    free(sorted_arr);
    return 0;
}

Two: Radix sort

(1) Basic idea
Radix sort addresses sorting on multiple keywords, but a single keyword can also be sorted this way by splitting it into parts. Radix sort distributes the data into buckets one keyword at a time. For example, take the following sequence:
278, 109, 063, 930, 589, 184, 505, 269, 008, 083
We split each number into three keywords by its ones, tens, and hundreds digits. For 278: k1 (ones digit) = 8, k2 (tens digit) = 7, k3 (hundreds digit) = 2.
Starting with the lowest digit (the first keyword), distribute all the data into buckets by k1 (each digit is 0-9, so 10 buckets are needed), then output the contents of the buckets in order to obtain the following sequence:
930, 063, 083, 184, 505, 278, 008, 109, 589, 269
Then distribute the sequence above into buckets by k2. The output sequence is:
505, 008, 109, 930, 063, 269, 278, 083, 184, 589
Finally, distribute into buckets by k3. The output sequence is:
008, 063, 083, 109, 184, 269, 278, 505, 589, 930
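The three passes above can be reproduced with a short sketch. The names radix_pass and radix_sort_lsd are hypothetical, the buckets are plain std::vector objects, and the code assumes non-negative integers. Note that the leading zeros in the listing (063, 008, 083) must be dropped in C++ source, because a literal starting with 0 is read as octal.

```cpp
#include <cassert>
#include <vector>

// One stable bucket pass on the decimal digit selected by `divide`
// (1 = ones, 10 = tens, 100 = hundreds).
std::vector<int> radix_pass(const std::vector<int>& a, int divide) {
    assert(divide > 0);
    std::vector<std::vector<int>> buckets(10);  // one bucket per digit 0-9
    for (int v : a) buckets[(v / divide) % 10].push_back(v);
    std::vector<int> out;
    out.reserve(a.size());
    // Output the buckets in order; items inside a bucket keep their order.
    for (const auto& b : buckets) out.insert(out.end(), b.begin(), b.end());
    return out;
}

// LSD radix sort for non-negative integers with at most `digits` decimal digits.
std::vector<int> radix_sort_lsd(std::vector<int> a, int digits) {
    int divide = 1;
    for (int d = 0; d < digits; ++d, divide *= 10) a = radix_pass(a, divide);
    return a;
}
```

Calling radix_pass on the example sequence with divide set to 1, then 10, then 100 yields the three intermediate sequences listed above. The method works only because each pass is stable: ties on the current digit keep the order established by the earlier passes.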
(2) Performance analysis
Radix sort is somewhat slower than bucket sort. Each bucket allocation on one keyword takes O(n) time, and collecting the new sequence after the allocation takes another O(n). If the data to be sorted can be split into d keywords, the time complexity of radix sort is O(d * 2n) = O(d * n). Since d is usually much smaller than n, this is essentially linear. The space complexity of radix sort is O(n + m), where m is the number of buckets. Generally n >> m, so the extra space needed is about n.
However, compared with bucket sort, radix sort needs only a few buckets per pass. Radix sort also needs almost no comparison operations, whereas in bucket sort with relatively few buckets, the multiple items inside a bucket must be sorted using comparisons. Therefore, in practice, radix sort is more widely applicable.

(3) Code

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
using namespace std;

const int num_range = 100;  // values lie in [0, num_range)
const int rate = 10;        // base (decimal)

// Output the array
void print_arr(const int *arr, const int &n) {
    int i;
    for (i = 0; i < n; i++) {
        if (!i) {
            cout << arr[i];
        } else {
            cout << " " << arr[i];
        }
    }
    printf("\n");
}

// Calculate the number of digits of the largest value
int counting_digits(int *arr, const int &n) {
    int digits = 0;
    int mymax = arr[0];
    int i;
    for (i = 1; i != n; ++i) {
        if (mymax < arr[i]) mymax = arr[i];
    }
    while (mymax) {
        ++digits;
        mymax /= rate;
    }
    return digits;
}

// Radix sort: one counting sort per digit, lowest digit first
void radix_sort(int *ini_arr, const int &n) {
    int digits = counting_digits(ini_arr, n);
    int *cnt = (int *)malloc(sizeof(int) * rate);  // counts for digits 0-9 (base 10)
    int *sorted_arr = (int *)malloc(sizeof(int) * n);
    int i, j;
    int divide = 1;
    for (i = 0; i != digits; ++i) {
        memset(cnt, 0, sizeof(int) * rate);
        for (j = 0; j != n; ++j) {  // count occurrences
            sorted_arr[j] = ini_arr[j];
            int index = (sorted_arr[j] / divide) % rate;  // current digit as subscript
            cnt[index]++;
        }
        for (j = 1; j != rate; ++j) {
            cnt[j] += cnt[j - 1];  // compute the new positions
        }
        // Very similar to counting sort: traverse in reverse order
        for (j = n - 1; j >= 0; --j) {
            int index = (sorted_arr[j] / divide) % rate;  // current digit as subscript
            ini_arr[cnt[index] - 1] = sorted_arr[j];
            cnt[index]--;  // handles duplicate data
        }
        divide *= rate;
    }
    free(sorted_arr);
    free(cnt);
}

int main(int argc, char *argv[]) {
    int n;
    if (argc < 2) {
        n = 10;
    } else {
        n = atoi(argv[1]);
    }
    int i;
    int *arr = (int *)malloc(sizeof(int) * n);
    srand((unsigned)time(0));
    for (i = 0; i < n; i++) {
        arr[i] = rand() % num_range;
    }
    printf("ini_array: ");
    print_arr(arr, n);
    radix_sort(arr, n);
    printf("sorted_array: ");
    print_arr(arr, n);
    free(arr);
    return 0;
}

Three: Bucket sort

(1) The basic idea of bucket sort
Suppose there is a sequence of n keywords K[1..n] to be sorted. First divide the key range into M sub-intervals (buckets). Then, using some mapping function, map each keyword k of the sequence to the i-th bucket (that is, subscript i of the bucket array B), so that k becomes an element of B[i] (each bucket B[i] is a sequence of about N/M elements). Next, sort the elements within each bucket B[i] with a comparison sort (quicksort can be used). Finally, output the contents of B[0] ... B[M-1] in order to obtain a sorted sequence.
Suppose the sequence to be sorted is K = {49, 38, 35, 97, 76, 73, 27, 49}. These values all lie between 1 and 100, so we set up 10 buckets and choose the mapping function f(k) = k/10 (integer division). The first keyword, 49, goes into bucket 4 (49/10 = 4). All keywords are placed into buckets in turn, and each non-empty bucket is then sorted with quicksort.
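The example above can be sketched in a few lines. The function name bucket_sort_example is hypothetical, and the sketch assumes every key lies in [0, 100), so that f(k) = k/10 always selects one of the 10 buckets (a key of exactly 100 would need an eleventh bucket). std::sort stands in for the quicksort mentioned in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Bucket sort for keys in [0, 100) using 10 buckets and f(k) = k / 10.
std::vector<int> bucket_sort_example(const std::vector<int>& keys) {
    std::vector<std::vector<int>> buckets(10);
    for (int k : keys) {
        assert(k >= 0 && k < 100);     // assumption: keys fit the 10 buckets
        buckets[k / 10].push_back(k);  // mapping function f(k) = k / 10
    }
    std::vector<int> out;
    out.reserve(keys.size());
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());  // any comparison sort works here
        out.insert(out.end(), b.begin(), b.end());
    }
    return out;
}
```

For K = {49, 38, 35, 97, 76, 73, 27, 49}, bucket 2 receives 27, bucket 3 receives 38 and 35, bucket 4 receives both copies of 49, bucket 7 receives 76 and 73, and bucket 9 receives 97. Only buckets 3 and 7 need any reordering.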
(2) Bucket sort cost analysis
Bucket sort uses a mapping function to eliminate almost all comparison work. In effect, computing f(k) plays the same role as the partition step of quicksort: it splits a large amount of data into roughly ordered blocks (buckets). Afterwards, only the small amount of data inside each bucket needs to be sorted with a comparison sort.
The time complexity of bucket sorting N keywords has two parts:
1) Computing the bucket mapping function for each keyword in a loop, which takes O(N).
2) Sorting the data in each bucket with an advanced comparison sort, which takes ∑ O(Ni * log Ni), where Ni is the amount of data in the i-th bucket.
Clearly, part 2) determines how well bucket sort performs. Reducing the amount of data in each bucket is the only way to improve efficiency (because the best average time complexity of comparison-based sorting is O(N * log N)). Therefore, we should try to achieve the following two points:
1) The mapping function f(k) distributes the N items evenly across the M buckets, so that each bucket holds about N/M items.
2) Increase the number of buckets as much as possible. In the extreme case each bucket receives only one item, which avoids comparison sorting inside buckets entirely. Of course this is not easy: with a large amount of data, such an f(k) produces a huge number of buckets and wastes a great deal of space. This is a trade-off between time cost and space cost.
For N items and M buckets, with about N/M items per bucket on average, the average time complexity of bucket sort is:
O(N) + O(M * (N/M) * log(N/M)) = O(N + N * (log N - log M)) = O(N + N * log N - N * log M)
When N = M, in the limit each bucket holds only one item, and bucket sort reaches its best efficiency, O(N).
Summary: the average time complexity of bucket sort is O(N + C), where C = N * (log N - log M). For the same N, the larger the number of buckets M, the higher the efficiency, and the best time complexity reaches O(N). The space complexity of bucket sort is O(N + M), so if the input is very large and the number of buckets is also very large, the space cost is undoubtedly high. In addition, bucket sort is stable, provided the sort used inside each bucket is stable.
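To get a feel for the formula, the estimate N + N * (log N - log M) can be evaluated directly. The helper below is only an illustration: the name bucket_sort_cost is hypothetical, it uses base-2 logarithms, it ignores constant factors, and it assumes the keys are spread evenly over the buckets.

```cpp
#include <cassert>
#include <cmath>

// Estimated operation count N + N * (log2 N - log2 M) for N keys
// distributed evenly over M buckets (constant factors ignored).
double bucket_sort_cost(double n, double m) {
    assert(n >= 1 && m >= 1 && m <= n);
    return n + n * (std::log2(n) - std::log2(m));
}
```

With N = 1024, one bucket gives 1024 + 1024 * 10 = 11264 (a plain comparison sort), 32 buckets give 1024 + 1024 * 5 = 6144, and 1024 buckets give 1024, which is the linear limit reached when N = M.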
A personal observation: among search algorithms, the best time complexity of comparison-based lookup is O(log N), as in binary search, balanced binary trees, and red-black trees. A hash table, however, achieves constant-level O(C) lookup (O(1) when there are no collisions). It is worth reflecting on: don't the ideas behind the hash table and bucket sort work in a wonderfully similar way?
Reference from: http://blog.csdn.net/quietwave/article/details/8008572 http://www.cnblogs.com/hxsyl/p/3214379.html

(3) Code

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

const int num_range = 100;                // values lie in [0, num_range)
const int rate = 10;                      // width of each bucket
const int bucket_num = num_range / rate;  // number of buckets

struct node {
    int data;
    struct node *next;
};

// Output the array
void print_arr(const int *arr, const int &n) {
    int i;
    for (i = 0; i < n; i++) {
        if (!i) {
            cout << arr[i];
        } else {
            cout << " " << arr[i];
        }
    }
    printf("\n");
}

// Free every list node, then the array of bucket heads
void destroy(node *arr) {
    for (int i = 0; i != bucket_num; ++i) {
        node *cur_node = arr[i].next;
        while (cur_node) {
            node *t_node = cur_node;
            cur_node = cur_node->next;
            free(t_node);
        }
    }
    free(arr);
}

// Bucket sort
void bucket_sort(int *ini_arr, const int &n) {
    int i, j;
    node *t_node, *cur_node;
    // arr[i] is a dummy head node for bucket i; one head per bucket
    node *arr = (node *)malloc(sizeof(node) * bucket_num);
    for (i = 0; i != bucket_num; ++i) {
        arr[i].data = -1;
        arr[i].next = NULL;
    }
    // Distribute the elements into buckets, keeping each bucket's
    // list sorted (insertion sort)
    for (i = 0; i != n; ++i) {
        int tmp = ini_arr[i] / rate;  // bucket subscript
        t_node = (node *)malloc(sizeof(node));
        t_node->data = ini_arr[i];
        cur_node = &arr[tmp];  // start from the dummy head
        while (cur_node->next && cur_node->next->data <= ini_arr[i]) {
            cur_node = cur_node->next;
        }
        // Insert after cur_node (this also covers an empty bucket and the tail)
        t_node->next = cur_node->next;
        cur_node->next = t_node;
    }
    // Collect the sorted result, bucket by bucket
    j = 0;
    for (i = 0; i != bucket_num; ++i) {
        cur_node = arr[i].next;  // starting from arr[i].next
        while (cur_node) {       // until the tail
            ini_arr[j++] = cur_node->data;
            cur_node = cur_node->next;
        }
    }
    destroy(arr);
}

int main(int argc, char *argv[]) {
    int n;
    if (argc < 2) {
        n = 10;
    } else {
        n = atoi(argv[1]);
    }
    int i;
    int *arr = (int *)malloc(sizeof(int) * n);
    srand((unsigned)time(0));
    for (i = 0; i < n; i++) {
        arr[i] = rand() % num_range;
    }
    printf("ini_array: ");
    print_arr(arr, n);
    bucket_sort(arr, n);
    printf("sorted_array: ");
    print_arr(arr, n);
    free(arr);
    return 0;
}

Four: Illustrated analysis

(1) Radix sort

Five: Sharing a good essay

Preface

Today I realized how much there is to learn. I learn slowly and have almost lost confidence. I am reposting an article to motivate myself; the important thing is to master a good way of studying.

You can also refer to one of my previous articles, http://blog.csdn.net/whuslei/article/details/6661435 .

The original article is at: http://blog.csdn.net/heiyeshuwu/article/details/6920880.

First, there are so many technologies I want to learn. What should I do?

Prioritize, set goals, and proceed step by step. Technology is boundless, and the more you know, the more unknowns you discover. What we can do is:

1. Focus: distinguish between necessary skills and skills of interest, and concentrate on the necessary ones. If you cannot tell them apart, ask yourself: without learning this technology, would I be unable to do my current work? (Necessary skill or skill of interest: would not learning it cost me my job?)

2. Set a goal: pick one direction for a period of time (at least six months) and go deep. (Pick a direction and stay with it for at least six months; otherwise, as in learning martial arts, you give up halfway. Remember this!)

3. Keep moving forward: once you have mastered one technology, begin studying another.

4. Find a circle: find a circle that suits you, with people to discuss problems with: friends, blog friends, QQ friends, roommates, and so on. (Follow technical blogs, especially those of experts; ask experts questions; discuss problems with people at your own level. The road of technology is lonely, so you must have your own circle.)

5. Every trade has its master, and everything is hard at the beginning... You need to find someone to guide you...

6. Solve problems independently: I believe every programmer's most exciting moment is finally finding and fixing a bug after hunting for three days. So you must keep time for independent thinking, independent research, and independent problem solving, and keep that excitement alive.

Second, I want to learn technology, but the current project workload is too heavy, I have no time, and I keep putting it off. What should I do?

Living conditions and the nature of the work mean that we cannot have as much free time as in school or at a foreign company, so to pursue technical progress we have to find the time ourselves. That is the truth behind "survive by day, develop by night". I generally set aside part of my evenings and weekends for study. (Or, as the old saying goes: time is like water in a sponge... Balance the competition for survival against work tasks; no matter how busy you are, always spare some time each week, sum up what you have learned, and persist.)

Third, how do I proceed after choosing a technical direction?

Learning a technology generally has three stages: getting started > competence > mastery. In general, reaching a proficient level takes at least one year of accumulation.

It can be divided into the following steps:

1. Know what it is: understand the basic principles and application scenarios of the technology, and do some simple practice. At this stage, the official documentation is the best textbook.

2. Stand on the shoulders of giants: find mature open-source libraries and applications in that technology area and study them, reading the source code closely.

3. Progress through questions and answers: learn in the relevant forums, ask or answer questions, and try to solve the problems posted there independently. www.stackoverflow.com is the best choice.

4. Practice it yourself: work with the technology until you can independently build a medium-scale application.

5. Know why: gain an in-depth understanding of the technology's operating mechanism, peripheral technologies, performance optimization, and other deeper knowledge.

6. Use resources: the company has many experts; for almost any technology you want to learn, you can find a corresponding expert. Communicating more with these people is more effective.

Take PHP for example:
Become familiar with the syntax and API -> read the source of Zend, CI, and WordPress -> develop a project independently in PHP -> gain an in-depth understanding of the PHP engine's implementation details, extension writing, performance optimization, etc.

Fourth, there are too many outside distractions and I cannot settle down to study. What should I do?

When I first joined Baidu, a senior colleague gave me a piece of advice: turn off QQ. (Note: I think the browser should sometimes be turned off too! ^_^) Given my experience at school, I understood the importance of this advice and gladly accepted it. In fact, a calm mind is a prerequisite for doing good work. Turning off QQ is only the surface; the essence is that we have to create a good learning environment for ourselves. Doing technical work requires a quiet environment and a methodology. Methodologies are easy to find (the opening of the Analects puts it best), but the environment is something we must create ourselves. So my advice is: forget QQ, entertain yourself in moderation, and immerse yourself in work and study. This is not easy, but it helps improve efficiency and free up time for learning.

Some complex technologies involve many directions, and each direction can be broken down in the same way as above. Technical learning carries over by analogy: once you thoroughly understand one technology, you learn others faster.

In short, the advice above is: know how to choose, learn to let go, endure loneliness, and resist temptation.

Doing technical work is like scholarship. On how to learn, Wang Guowei has a passage I have loved since high school:

Those who have accomplished great undertakings or great learning, in ancient and modern times, have all passed through three realms. "Last night the west wind withered the green trees. Alone I climbed the tall tower and gazed down the road to the horizon." This is the first realm. "My sash grows ever looser, yet I have no regrets; for her I have grown wan and haggard." This is the second realm. "I searched for her in the crowd a thousand times; suddenly I turned my head, and there she was, where the lantern light was dim." This is the third realm.

Today I am a little depressed. I had just become familiar with the basic Webx workflow when I was suddenly told to switch to SpringMVC. The ideas are similar, but the usage differs, especially since Spring 2 and later use annotations. That is where the difficulty lies! With only a smattering of knowledge, I really do not feel confident. Don't people always say, "know what it is, and also know why it is so"? So a question came to mind: how should we go about learning new knowledge?

The first reaction: Google or Baidu? Yes, that should be the first choice for many people. But the results are mixed. Most online resources only get you started: small examples, some code with a brief explanation, and, in the better-written articles, a little explanation of the principles. When you first encounter something new, you really do need examples to understand and practice with. But is that enough?

Not necessarily! For some tools, such as SVN, you can refer to online articles to learn how to use them and what happens when you do. But for a mature framework like Spring, can you understand it from a couple of articles on the Internet? Hard! Especially understanding its ideas. At this point, I think the sensible approach is to look for a good book on the subject. Mature technologies generally have many classic books, which really need to be read patiently, more than once. If you do not understand the first time, that is fine. After some practice and a few small examples, read again and you will gain something unexpected.

After reading, it is best to write down the ideas behind the technology, drawing on your own examples or on articles from the Internet. On the one hand this tests whether you really understand; on the other it may help others. Over time, it is also very helpful to yourself.

I wrote this article for two reasons: first, to clarify my thinking (I feel very calm when I write, and I like this feeling); second, to share my personal understanding with you.

[Note] The title of this article is not very good; it does not really tell you how to learn, it only expresses some personal understanding of learning. If anything is wrong, I will accept your views with an open mind.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
