1. Bucket Sort Introduction
Bucket sort is a distribution-based sorting algorithm (an extension of counting sort) that works by splitting the data into a finite number of buckets and then sorting each bucket separately, either with a different sorting algorithm or by applying bucket sort recursively. When the values to be sorted are evenly distributed, the time complexity of bucket sort is Θ(n). Unlike quicksort, bucket sort is not a comparison sort, so it is not bound by the O(n log n) lower limit on comparison-based sorting.
Bucket sort proceeds in the following four steps (a minimal sketch follows the list):
- 1. Set up a fixed number of empty buckets.
- 2. Put each piece of data into its corresponding bucket.
- 3. Sort the data in each non-empty bucket.
- 4. Concatenate the data from the non-empty buckets, in order, to get the result.
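To make these four steps concrete, here is a minimal C++ sketch (illustrative only; it is not the linked-list implementation shown in section 3, and the names `bucketSort`, `bucketCount`, etc. are my own) that uses std::vector buckets and std::sort inside each bucket:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Distribute values into bucketCount buckets by range, sort each bucket,
// then concatenate the buckets back into the input vector.
void bucketSort(std::vector<int>& data, int bucketCount) {
    if (data.empty() || bucketCount <= 0) return;

    // Step 1: set up a fixed number of empty buckets.
    std::vector<std::vector<int>> buckets(bucketCount);

    int minVal = *std::min_element(data.begin(), data.end());
    int maxVal = *std::max_element(data.begin(), data.end());
    double range = (maxVal - minVal + 1) / (double)bucketCount;

    // Step 2: put each value into its corresponding bucket.
    for (int v : data) {
        buckets[(int)((v - minVal) / range)].push_back(v);
    }

    // Steps 3 and 4: sort each non-empty bucket and concatenate the results.
    data.clear();
    for (auto& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end());
        data.insert(data.end(), bucket.begin(), bucket.end());
    }
}

int main() {
    std::vector<int> data = {29, 25, 3, 49, 9, 37, 21, 43};
    bucketSort(data, 4);
    for (int v : data) std::printf("%d ", v);  // prints: 3 9 21 25 29 37 43 49
    std::printf("\n");
    return 0;
}
```

Here std::sort simply stands in for the per-bucket sort of step 3; any comparison sort, or a recursive bucket sort, would serve the same purpose.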
Bucket sort is mainly suited to data drawn from a small range of integers that is independently and uniformly distributed; even when the amount of data is very large, it runs in linear expected time.
2. Bucket Sort Algorithm Demo
For example, given the data [7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101], how do we sort it in ascending order?
The procedure works as follows:
- 1. Set up 5 empty buckets. Find the maximum value 110 and the minimum value 7; the range covered by each bucket is (110 - 7 + 1) / 5 = 20.8.
- 2. Traverse the raw data, storing each value as a linked-list node, and put it into its corresponding bucket. The number 7 gets bucket index 0, computed as floor((7 - 7) / 20.8); the number 36 gets bucket index 1, computed as floor((36 - 7) / 20.8). (See the sketch after this list.)
- 3. When inserting a value into a bucket that already holds data, compare it with the values already there and insert it so the bucket stays ordered from left to right, smallest to largest. For example, when 63 is inserted into the bucket with index 2, which already holds the four values 56, 59, 60, 65, the number 63 is placed to the left of 65.
- 4. Merge the non-empty buckets, concatenating buckets 0, 1, 2, 3, 4 in left-to-right order.
- 5. Read off the bucket-sorted result.
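To make the index formula of step 2 concrete, the following illustrative sketch (the variable names are mine, not from the original program) computes the bucket index floor((x - min) / 20.8) for every value in the example data:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    int data[] = {7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101};
    int n = sizeof(data) / sizeof(data[0]);
    int minVal = 7, maxVal = 110, bucketCount = 5;
    double range = (maxVal - minVal + 1) / (double)bucketCount;  // 20.8

    // Print which of the 5 buckets each value maps to: floor((x - min) / range).
    for (int i = 0; i < n; i++) {
        int idx = (int)std::floor((data[i] - minVal) / range);
        std::printf("%d -> bucket %d\n", data[i], idx);
    }
    return 0;
}
```

For instance, 63 maps to floor((63 - 7) / 20.8) = 2, which is why it lands in the same bucket as 56, 59, 60, and 65 in step 3.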
3. Bucket Sort C++ Code Implementation
```cpp
// 8-4.BucketSort.cpp : defines the entry point for the console application.
#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>

void BucketSort(double* a, int n)
{
    // Linked-list node holding one key.
    typedef struct Node {
        double key;
        struct Node* next;
    } Node;
    // Bucket head: one auxiliary array element per bucket.
    typedef struct {
        Node* next;
    } Head;

    int i, j;
    Head head[10] = { NULL };
    Node* p;
    Node* q;
    Node* node;

    for (i = 1; i <= n; i++) {
        node = (Node*)malloc(sizeof(Node));
        node->key = a[i];
        node->next = NULL;
        // Keys lie in [0, 1), so (int)(a[i] * 10) selects one of the 10 buckets.
        int idx = (int)(a[i] * 10);
        p = q = head[idx].next;
        if (p == NULL) {                 // empty bucket: node becomes the first element
            head[idx].next = node;
            continue;
        }
        if (node->key < p->key) {        // smaller than the first element: insert at the head
            node->next = p;
            head[idx].next = node;
            continue;
        }
        while (p) {                      // find the first element larger than the new key
            if (node->key < p->key)
                break;
            q = p;
            p = p->next;
        }
        if (p == NULL) {                 // larger than everything: append at the tail
            q->next = node;
        } else {                         // insert between q and p
            node->next = p;
            q->next = node;
        }
    }

    // Concatenate the buckets back into a[1..n] and release the nodes.
    j = 1;
    for (i = 0; i < 10; i++) {
        p = head[i].next;
        while (p) {
            a[j++] = p->key;
            q = p;
            p = p->next;
            free(q);
        }
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    int i;
    // a[0] is not used by the sort.
    double a[13] = { 0, 0.13, 0.25, 0.18, 0.29, 0.81, 0.52, 0.52, 0.83, 0.52, 0.69, 0.13, 0.16 };
    BucketSort(a, 12);
    for (i = 1; i <= 12; i++)
        printf("%-6.2f", a[i]);
    printf("\n");
    return 0;
}
```
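With the 12 input values above, the program should print them in ascending order: 0.13 0.13 0.16 0.18 0.25 0.29 0.52 0.52 0.52 0.69 0.81 0.83.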
4. Bucket Sort Cost Analysis
Bucket sort uses a mapping function to eliminate almost all of the comparison work. Computing the mapping f(k) for each key plays the same role as the partition step in quicksort: it divides a large amount of data into blocks (buckets) that are already roughly ordered relative to one another. Afterwards, only the small amount of data inside each bucket needs to be sorted by comparison.
The time complexity of bucket-sorting n keys is made up of two parts:
(1) Looping over the keys to compute each one's bucket mapping function, which takes O(n) time.
(2) Sorting the data inside each bucket with a comparison-based sorting algorithm, which takes Σ O(ni · log ni) time, where ni is the amount of data in the i-th bucket.
Clearly, part (2) is what determines whether a bucket sort performs well or poorly. Minimizing the amount of data in each bucket is the key to improving efficiency (because the best average time complexity achievable by comparison-based sorting is only O(n·log n)). Therefore, we should try to do the following two things:
(1) Choose a mapping function f(k) that distributes the n items evenly across the m buckets, so that each bucket holds roughly n/m items.
(2) Increase the number of buckets as much as possible. In the extreme case, each bucket receives only one item, which completely avoids any comparison-based sorting inside the buckets. Of course, this is not easy to do: when the amount of data is large, such an f(k) requires an enormous collection of buckets and wastes a great deal of space. It is a tradeoff between time cost and space cost.
For n items of data and m buckets, with n/m items per bucket on average, the average time complexity of bucket sort is:
O(n) + O(m · (n/m) · log(n/m)) = O(n + n·(log n - log m)) = O(n + n·log n - n·log m)
When n = m, i.e., in the limiting case where every bucket holds exactly one item, bucket sort achieves its best efficiency of O(n).
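As a rough illustration (using base-2 logarithms, which the formula above leaves unspecified): for n = 1,000,000 items and m = 1,000 buckets, log n ≈ 20 and log m ≈ 10, so the cost is about O(10^6 + 10^6·(20 - 10)) ≈ 1.1 × 10^7 operations, compared with roughly n·log n ≈ 2 × 10^7 for a pure comparison sort.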
To summarize: the average time complexity of bucket sort is linear, O(n + c), where c = n·(log n - log m). The closer the number of buckets m is to n, the higher the efficiency, with the time complexity approaching O(n) in the best case. Of course, the space complexity of bucket sort is O(n + m); if the input is very large and the number of buckets is also very large, the space cost is undoubtedly expensive. In addition, bucket sort is stable.
A personal observation: among search algorithms, the best time complexity of comparison-based lookup is likewise O(log n), as with binary search, balanced binary trees, and red-black trees. A hash table, by contrast, offers O(c) lookup efficiency (reaching O(1) when there are no collisions). It is worth pondering whether the idea behind hash tables and the idea behind bucket sort achieve the same effect by different means.