In-depth analysis of the bucket sort algorithm and its JavaScript implementation on Node.js
1. Introduction to bucket sorting
Bucket sort is a counting-based (distribution) sorting algorithm: it distributes the data into a limited number of buckets and then sorts each bucket separately (either with another sorting algorithm or by applying bucket sort again recursively). When the values to be sorted are evenly distributed, the time complexity of bucket sort is O(n). Unlike quicksort, bucket sort is not a comparison sort, so it is not bound by the O(nlogn) lower limit on time complexity.
Bucket sort proceeds in the following four steps:
(1) Set up a fixed number of empty buckets.
(2) Place each data item in its corresponding bucket.
(3) Sort the data in each non-empty bucket.
(4) Concatenate the data from the non-empty buckets, in order, to obtain the result.
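The four steps above can be sketched as follows. This is a minimal illustration using plain arrays rather than linked lists, not the program presented later in this article; the function name `bucketSort` and the default of 10 buckets are assumptions made for the example, and each bucket is sorted with the built-in `Array.prototype.sort`.

```javascript
'use strict';

// Minimal bucket sort sketch for an array of finite numbers.
// bucketSort and its default bucket count are illustrative assumptions.
function bucketSort(arr, bucketCount = 10) {
  if (arr.length === 0) return [];

  // Find the minimum and maximum with a loop (safe for large arrays).
  let min = arr[0];
  let max = arr[0];
  for (const v of arr) {
    if (v < min) min = v;
    if (v > max) max = v;
  }

  // Step 1: set up a fixed number of empty buckets.
  const buckets = Array.from({ length: bucketCount }, () => []);
  const width = (max - min + 1) / bucketCount; // value range covered by one bucket

  // Step 2: place each item in its bucket.
  for (const v of arr) {
    const idx = Math.floor((v - min) / width);
    // Clamp to guard against floating-point rounding at the upper edge.
    buckets[Math.min(idx, bucketCount - 1)].push(v);
  }

  // Steps 3 and 4: sort each non-empty bucket, then concatenate in order.
  const result = [];
  for (const bucket of buckets) {
    if (bucket.length === 0) continue;
    bucket.sort((a, b) => a - b);
    for (const v of bucket) result.push(v);
  }
  return result;
}

console.log(bucketSort([7, 36, 65, 56, 33, 60, 110, 42], 5));
// prints [ 7, 33, 36, 42, 56, 60, 65, 110 ]
```

Sorting each bucket after distribution keeps the sketch short; inserting in sorted order while distributing, as the demonstration in the next section does, is an equally valid choice.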
Bucket sort is mainly suited to integer data in a small range that is independently and evenly distributed. Under those conditions it can handle large amounts of data in the expected linear time.
2. Bucket Sorting Algorithm demonstration
For example, given the data set [7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101], how do we sort it in ascending order?
Procedure description:
(1) Set the number of buckets to 5 and create 5 empty buckets. Find the maximum value 110 and the minimum value 7. The range covered by each bucket is 20.8 = (110 - 7 + 1) / 5.
(2) Traverse the original data and place each value in its bucket, which is stored as a linked list. For the number 7, the bucket index is 0, calculated as floor((7 - 7) / 20.8). For the number 36, the bucket index is 1, calculated as floor((36 - 7) / 20.8).
(3) When a value is inserted into a bucket that already holds data, compare the new value with the existing values from left to right and insert it so that the bucket stays in ascending order. For example, when 63 is inserted into the bucket with index 2, the bucket already contains four numbers: 56, 59, 60, and 65. The number 63 is therefore inserted to the left of 65.
(4) Merge the non-empty buckets in left-to-right order, that is, buckets 0, 1, 2, 3, 4.
(5) Obtain the bucket-sorted result.
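To make the procedure concrete, the following sketch reproduces the bucket assignment for the sample data. It uses plain arrays with an ordered insert in place of linked lists, which is a simplification assumed for this example; the variable names are illustrative.

```javascript
'use strict';

const data = [7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101];
const count = 5;                       // number of buckets
const min = Math.min(...data);         // 7
const max = Math.max(...data);         // 110
const width = (max - min + 1) / count; // 20.8

const buckets = Array.from({ length: count }, () => []);
for (const v of data) {
  const bucket = buckets[Math.floor((v - min) / width)];
  // Scan left to right for the first larger value and insert before it,
  // so the bucket stays in ascending order.
  let pos = 0;
  while (pos < bucket.length && bucket[pos] <= v) pos++;
  bucket.splice(pos, 0, v);
}

buckets.forEach((bucket, i) => console.log(i + ': ' + bucket.join(', ')));
// 0: 7, 22
// 1: 33, 36, 42, 42
// 2: 56, 59, 60, 63, 65, 67
// 3: 77, 83, 84
// 4: 94, 101, 110

const sorted = [].concat(...buckets); // merge buckets 0 to 4 from left to right
console.log(sorted);
```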
3. Node.js program implementation
A mature algorithm like bucket sort is not difficult to implement yourself. Following the ideas above, I wrote a simple implementation. I found the most troublesome part to be working with the linked list in JavaScript.
The actual code is as follows:
    'use strict';

    /////////////////////////////////////////////
    // Bucket sort
    /////////////////////////////////////////////
    var L = require('linklist'); // linked list

    /**
     * Bucket sort for an ordinary array (synchronous).
     *
     * @param arr   Array  array of integers
     * @param count Number number of buckets
     *
     * @example
     * sort(arr, 5)
     */
    exports.sort = function (arr, count) {
        if (arr.length == 0) return [];
        count = count || (count > 1 ? count : 10); // default to 10 buckets

        // Determine the maximum and minimum values
        var min = arr[0], max = arr[0];
        for (var i = 1; i < arr.length; i++) {
            min = min < arr[i] ? min : arr[i];
            max = max > arr[i] ? max : arr[i];
        }
        var delta = (max - min + 1) / count; // range covered by each bucket
        // console.log(min + "," + max + "," + delta);

        // Initialize the buckets
        var buckets = [];

        // Store the data in the buckets
        for (var i = 0; i < arr.length; i++) {
            var idx = Math.floor((arr[i] - min) / delta); // bucket index

            if (buckets[idx]) { // non-empty bucket
                var bucket = buckets[idx];
                var insert = false; // insertion flag
                L.reTraversal(bucket, function (item, done) {
                    if (arr[i] <= item.v) { // less than or equal: insert on the left
                        L.append(item, _val(arr[i]));
                        insert = true;
                        done(); // exit the traversal
                    }
                });
                if (!insert) { // greater than every existing value
                    L.append(bucket, _val(arr[i]));
                }
            } else { // empty bucket
                var bucket = L.init();
                L.append(bucket, _val(arr[i]));
                buckets[idx] = bucket; // linked-list implementation
            }
        }

        // Merge the buckets from left to right
        var result = [];
        for (var i = 0, j = 0; i < count; i++) {
            if (!buckets[i]) continue; // skip buckets that never received data
            L.reTraversal(buckets[i], function (item) {
                // console.log(i + ":" + item.v);
                result[j++] = item.v;
            });
        }
        return result;
    }

    // Linked-list storage object
    function _val(v) {
        return { v: v }
    }
Run the program:
    var algo = require('./index.js');
    var data = [7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101];
    console.log(data);
    console.log(algo.bucketsort.sort(data, 5));  // 5 buckets
    console.log(algo.bucketsort.sort(data, 10)); // 10 buckets
Output:
    [ 7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101 ]
    [ 7, 22, 33, 36, 42, 42, 56, 59, 60, 63, 65, 67, 77, 83, 84, 94, 101, 110 ]
    [ 7, 22, 33, 36, 42, 42, 56, 59, 60, 63, 65, 67, 77, 83, 84, 94, 101, 110 ]
Note:
(1) Sorting within a bucket can be done during insertion, as in the program above. Alternatively, values can be inserted unsorted and each bucket sorted afterwards, during the merge step, for example by calling quicksort.
(2) Linked list: the underlying Node API contains a linked list implementation. I did not use it directly, but called it through the linklist package: https://github.com/nodejs/node-v0.x-archive/blob/master/lib/_linklist.js
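For readers who prefer not to depend on a package, ordered insertion into a bucket can also be sketched with a small hand-written singly linked list. This is an illustrative stand-in, not the linklist package's API; the names `insertSorted` and `toArray` and the node shape `{ v, next }` are assumptions made for this example.

```javascript
'use strict';

// Insert v into an ascending singly linked list and return the list head.
// A node is a plain object { v: value, next: node or null }; null is the empty list.
function insertSorted(head, v) {
  const node = { v: v, next: null };
  if (head === null || v < head.v) {
    node.next = head; // new smallest value becomes the head
    return node;
  }
  let cur = head;
  // Advance while the next value is not larger than v.
  while (cur.next !== null && cur.next.v <= v) cur = cur.next;
  node.next = cur.next;
  cur.next = node;
  return head;
}

// Collect the list values into an array, head first.
function toArray(head) {
  const out = [];
  for (let cur = head; cur !== null; cur = cur.next) out.push(cur.v);
  return out;
}

// Bucket 2 of the demonstration: 65, 56, 60, 59 arrive first, then 63.
let bucket = null;
for (const v of [65, 56, 60, 59, 63]) bucket = insertSorted(bucket, v);
console.log(toArray(bucket)); // prints [ 56, 59, 60, 63, 65 ]
```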
4. Case study: bucket sorting statistics for college entrance examination scores
One of the best-known application scenarios for bucket sort is ranking college entrance examination scores. Suppose 9 million candidates sit the national college entrance examination in one year, and the scores are standardized: the lowest score is 200, the highest is 900, and there are no decimals. How should we sort these 9 million numbers?
Algorithm analysis:
(1) With a comparison-based sort such as quicksort, the average time complexity is O(nlogn) = O(9000000 * log9000000), which is about 144114616, or roughly 144 million comparisons, when log is taken as the natural logarithm (about 208 million with a base-2 logarithm).
(2) With a counting-based sort such as bucket sort, the average complexity can be kept linear. If we create 700 buckets covering the scores from 200 to 900, then O(N) = O(9000000), which is equivalent to scanning the 9 million records once.
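The linear-time idea in (2) can be sketched by giving every possible score its own counter. This sketch assumes integer scores in the closed range [200, 900], which gives 701 possible values, so it uses 701 counters rather than 700 buckets; the function name `sortScores` is hypothetical.

```javascript
'use strict';

// Sort integer scores by counting, with one counter per possible score.
// Assumes every score is an integer in the closed range [min, max].
function sortScores(scores, min = 200, max = 900) {
  const counts = new Array(max - min + 1).fill(0); // 701 counters for [200, 900]

  // One pass over the data: count how often each score occurs.
  for (const s of scores) {
    if (!Number.isInteger(s) || s < min || s > max) {
      throw new RangeError('score out of range: ' + s);
    }
    counts[s - min]++;
  }

  // One pass over the counters: emit each score as many times as it occurred.
  const result = [];
  for (let i = 0; i < counts.length; i++) {
    for (let c = 0; c < counts[i]; c++) result.push(i + min);
  }
  return result;
}

console.log(sortScores([650, 200, 900, 433, 650, 512]));
// prints [ 200, 433, 512, 650, 650, 900 ]
```

No comparisons between scores are needed: the work is one scan of the N records plus one scan of the counters.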
We ran a program to compare a quick sort and a bucket sort.
    // Generate 1000 * 1000 records in the closed range [200, 900]
    var data = algo.data.randomData(1000 * 1000, 200, 900);
    var s1 = new Date().getTime();
    algo.quicksort.sort(data); // quicksort
    var s2 = new Date().getTime();
    algo.bucketsort.sort(data, 700); // distribute into 700 buckets
    var s3 = new Date().getTime();
    console.log("quicksort time: %sms", s2 - s1);
    console.log("bucket time: %sms", s3 - s2);
Output:
    quicksort time: 14768ms
    bucket time: 1089ms
So for the college entrance examination scores, bucket sort is the better fit. Choosing the right algorithm for the scenario can improve a program's performance beyond what hardware alone provides.
5. Bucket sorting Cost Analysis
BUT ....
Bucket sort uses a mapping function to eliminate almost all comparisons. In effect, computing the mapping function f(k) in bucket sort plays the same role as partitioning in quicksort: it splits a large amount of data into roughly ordered blocks (the buckets). After that, only the small amount of data inside each bucket needs to be sorted.
The time complexity of bucket sorting N keys is divided into two parts:
(1) Computing the bucket mapping function for each key in a loop. The time complexity is O(N).
(2) Sorting the data in each bucket with an advanced comparison-based sorting algorithm. The time complexity is Σ O(Ni * logNi), where Ni is the amount of data in the i-th bucket.
Clearly, part (2) is the deciding factor in bucket sort performance. Minimizing the amount of data in each bucket is the only way to improve efficiency (because the best average time complexity of comparison-based sorting can only reach O(N * logN)). Therefore, we need to do the following:
(1) Choose a mapping function f(k) that distributes the N data items evenly across the M buckets, so that each bucket holds [N/M] items.
(2) Increase the number of buckets as much as possible. In the extreme case, each bucket receives only one data item, which avoids comparison sorting inside the buckets entirely. Of course, this is not easy to achieve. When the data volume is huge, such an f(k) function produces a very large number of buckets and wastes a great deal of space. This is a trade-off between time cost and space cost.
For N data items to be sorted and M buckets, with [N/M] items per bucket on average, the average time complexity of bucket sort is:
O(N)+O(M*(N/M)*log(N/M))=O(N+N*(logN-logM))=O(N+N*logN-N*logM)
When N = M, that is, in the limiting case where each bucket holds only one data item, the optimal efficiency of bucket sort reaches O(N).
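The effect of M on the formula above can be explored numerically. The sketch below simply evaluates N + N * log(N/M) for a few bucket counts, assuming evenly filled buckets and a base-2 logarithm; `estimatedCost` is a hypothetical helper, and the values are an idealized operation count, not measured running time.

```javascript
'use strict';

// Idealized cost from the formula N + N * log(N / M),
// assuming the N items are spread evenly over M buckets (M <= N).
function estimatedCost(n, m) {
  return n + n * Math.log2(n / m);
}

const n = 9000000;
for (const m of [1, 700, 9000, n]) {
  console.log('M = ' + m + ': ' + Math.round(estimatedCost(n, m)));
}
// With M = N the logarithmic term vanishes and the cost is exactly N.
```

The estimate shrinks steadily as M grows, which matches the conclusion that more buckets buy time at the expense of space.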
6. Summary
The average time complexity of bucket sort is linear, O(N + C), where C = N * (logN - logM). For the same N, the larger the number of buckets M, the higher the efficiency, and the best time complexity is O(N). Of course, the space complexity of bucket sort is O(N + M), so if the input is very large and the number of buckets is also very large, the space cost is undoubtedly high. In addition, bucket sort is stable, provided the sort used inside each bucket is stable.
In fact, I personally have one more thought. Among search algorithms, the best time complexity of a comparison-based search is O(logN), as with binary search, balanced binary trees, and red-black trees. A hash table, however, offers constant-level O(C) lookup efficiency (with no collisions, lookup reaches O(1)). It is worth thinking about: don't the ideas behind hash tables and bucket sort fit together remarkably well?