An In-Depth Look at the Bucket Sort Algorithm and Its JavaScript Implementation on Node.js

Last Update:2017-01-18 Source: Internet

Author: User

1. Bucket Sort Introduction
Bucket sort is a counting-based (non-comparison) sorting algorithm. It works by splitting the data into a finite number of buckets and then sorting each bucket separately, either with another sorting algorithm or by applying bucket sort recursively. When the values to be sorted are evenly distributed, the time complexity of bucket sort is Θ(n). Unlike quicksort, bucket sort is not a comparison sort, so it is not bound by the Ω(n log n) lower bound that applies to comparison sorts.
Bucket sort proceeds in the following 4 steps:
(1) Set up a fixed number of empty buckets.
(2) Put each data item into its corresponding bucket.
(3) Sort the data in each non-empty bucket.
(4) Concatenate the data from the non-empty buckets to obtain the result.
Bucket sort is mainly suited to integer data in a small range that is independently and uniformly distributed. Under those conditions it can handle a large volume of data in linear expected time.
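
The four steps can be sketched with plain arrays. This is only a minimal illustration, not the linked-list program given later in the article: the function name bucketSortSketch, the default of 10 buckets and the use of Array.prototype.sort inside each bucket are assumptions made for the example.

```javascript
'use strict';

// Minimal array-based sketch of the four steps (hypothetical helper).
function bucketSortSketch(arr, bucketCount) {
  if (arr.length === 0) return [];
  var count = bucketCount > 1 ? bucketCount : 10; // assumed default

  // Find the value range so every item maps to a valid bucket index.
  var min = arr[0], max = arr[0];
  for (var i = 1; i < arr.length; i++) {
    if (arr[i] < min) min = arr[i];
    if (arr[i] > max) max = arr[i];
  }
  var width = (max - min + 1) / count;

  // (1) Set up a fixed number of empty buckets.
  var buckets = [];
  for (var b = 0; b < count; b++) buckets.push([]);

  // (2) Put each item into its bucket (index clamped defensively).
  for (var j = 0; j < arr.length; j++) {
    var idx = Math.min(count - 1, Math.floor((arr[j] - min) / width));
    buckets[idx].push(arr[j]);
  }

  // (3) Sort each non-empty bucket, then (4) concatenate the buckets in order.
  var result = [];
  for (var k = 0; k < count; k++) {
    if (buckets[k].length === 0) continue;
    buckets[k].sort(function (x, y) { return x - y; });
    result = result.concat(buckets[k]);
  }
  return result;
}

console.log(bucketSortSketch([29, 25, 3, 49, 9, 37, 21, 43], 5));
// prints [ 3, 9, 21, 25, 29, 37, 43, 49 ]
```

Sorting inside each bucket with the built-in sort keeps the sketch short; any other per-bucket strategy, including ordered insertion, fits the same four steps.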

2. Bucket Sort Algorithm Demo
Suppose we have the data set [7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101]. How do we sort it in ascending order?

Procedure:
(1) Set the number of buckets to 5. Find the maximum value, 110, and the minimum value, 7. The range covered by each bucket is (110-7+1)/5 = 20.8.
(2) Traverse the original data and put each value into its bucket, storing each bucket as a linked list. The number 7 goes to bucket index 0, computed as floor((7-7)/20.8); the number 36 goes to bucket index 1, computed as floor((36-7)/20.8).
(3) When a value is inserted into a bucket that already holds data, compare it with the existing values so that the bucket stays ordered from left to right, smallest to largest. For example, when 63 is inserted into the bucket with index 2, which already holds the 4 numbers 56, 59, 60 and 65, it is placed to the left of 65.
(4) Merge the non-empty buckets in order from left to right: 0, 1, 2, 3, 4.
(5) The merged sequence is the result of the bucket sort.
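
The walkthrough can be reproduced with a short script. This is a sketch under assumptions: it uses plain arrays rather than linked lists, and the helper name distributeOrdered is made up for the illustration. It prints the contents of the 5 buckets after ordered insertion, followed by the merged result.

```javascript
'use strict';

// Hypothetical helper: place each value in its bucket, keeping every bucket
// ordered from smallest to largest as values arrive.
function distributeOrdered(arr, count) {
  var min = arr[0], max = arr[0];
  for (var i = 1; i < arr.length; i++) {
    if (arr[i] < min) min = arr[i];
    if (arr[i] > max) max = arr[i];
  }
  var width = (max - min + 1) / count; // 20.8 for the demo data

  var buckets = [];
  for (var b = 0; b < count; b++) buckets.push([]);

  for (var j = 0; j < arr.length; j++) {
    var bucket = buckets[Math.floor((arr[j] - min) / width)];
    // Skip past every element that is <= the new value, then insert there.
    var pos = 0;
    while (pos < bucket.length && bucket[pos] <= arr[j]) pos++;
    bucket.splice(pos, 0, arr[j]);
  }
  return buckets;
}

var demo = [7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101];
var demoBuckets = distributeOrdered(demo, 5);
demoBuckets.forEach(function (bucket, idx) {
  console.log('bucket ' + idx + ': ' + bucket.join(', '));
});
// bucket 0: 7, 22
// bucket 1: 33, 36, 42, 42
// bucket 2: 56, 59, 60, 63, 65, 67
// bucket 3: 77, 83, 84
// bucket 4: 94, 101, 110

// Merge buckets 0..4 from left to right.
var demoSorted = [].concat.apply([], demoBuckets);
console.log(demoSorted.join(', '));
```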

3. Node.js Program Implementation
A mature algorithm like bucket sort is not hard to implement yourself. Following the ideas above, I wrote a simple implementation. The most troublesome part, I found, was manipulating linked lists in JavaScript.
The code is as follows:

'use strict';
/////////////////////////////////////////////////
// Bucket sort
/////////////////////////////////////////////////
var _this = this,
    L = require('linklist'); // linked list

/**
 * Bucket sort for a plain array, synchronous
 *
 * @param arr   Array  array of integers
 * @param count number of buckets
 *
 * @example:
 * sort([1,4,1,5,3,2,3,3,2,5,2,8,9,2,1],5)
 * sort([1,4,1,5,3,2,3,3,2,5,2,8,9,2,1],5,0,5)
 */
exports.sort = function (arr, count) {
  if (arr.length === 0) return [];
  count = count || (count > 1 ? count : 10);

  // Determine the maximum and minimum values
  var min = arr[0], max = arr[0];
  for (var i = 1; i < arr.length; i++) {
    min = min < arr[i] ? min : arr[i];
    max = max > arr[i] ? max : arr[i];
  }
  var delta = (max - min + 1) / count;
  // console.log(min + "," + max + "," + delta);

  // Initialize the buckets
  var buckets = [];

  // Store the data into the buckets
  for (var i = 0; i < arr.length; i++) {
    var idx = Math.floor((arr[i] - min) / delta); // bucket index

    if (buckets[idx]) { // non-empty bucket
      var bucket = buckets[idx];
      var insert = false; // insertion flag
      // Note: the capitalization of this linklist method name is garbled in
      // the source listing; reTraversal is assumed here.
      L.reTraversal(bucket, function (item, done) {
        if (arr[i] <= item.v) { // smaller or equal: insert on the left
          L.append(item, _val(arr[i]));
          insert = true;
          done(); // exit the traversal
        }
      });
      if (!insert) { // larger: insert on the right
        L.append(bucket, _val(arr[i]));
      }
    } else { // empty bucket
      var bucket = L.init();
      L.append(bucket, _val(arr[i]));
      buckets[idx] = bucket; // linked list implementation
    }
  }

  var result = [];
  for (var i = 0, j = 0; i < count; i++) {
    if (!buckets[i]) continue; // guard: skip buckets that received no data
    L.reTraversal(buckets[i], function (item) {
      // console.log(i + ":" + item.v);
      result[j++] = item.v;
    });
  }
  return result;
};

// Object stored in the linked list
function _val(v) {
  return {v: v};
}

To run the program:

var algo = require('./index.js');
var data = [7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101];
console.log(data);
console.log(algo.bucketsort.sort(data, 5));  // 5 buckets
console.log(algo.bucketsort.sort(data, 10)); // 10 buckets

Output:

[7, 36, 65, 56, 33, 60, 110, 42, 42, 94, 59, 22, 83, 84, 63, 77, 67, 101]
[7, 22, 33, 36, 42, 42, 56, 59, 60, 63, 65, 67, 77, 83, 84, 94, 101, 110]
[7, 22, 33, 36, 42, 42, 56, 59, 60, 63, 65, 67, 77, 83, 84, 94, 101, 110]

A few notes:

(1) Bucket sort can keep each bucket ordered during insertion, as the program above does. Alternatively, values can be inserted unordered and each bucket sorted during the merge step, for example by calling quicksort.
(2) Linked list: Node's underlying API contains a linked list implementation. I did not use it directly but called it through the linklist package: https://github.com/nodejs/node-v0.x-archive/blob/master/lib/_linklist.js

4. Case: Ranking College Entrance Examination Scores with Bucket Sort
One of the best-known application scenarios for bucket sort is ranking college entrance examination (gaokao) scores. Suppose 9 million candidates sit the national exam in one year, and scores are standard scores with a minimum of 200, a maximum of 900 and no decimals. How should these 9 million numbers be sorted?
Algorithm analysis:
(1) With a comparison-based sort such as quicksort, the average time complexity is O(N*logN) = O(9000000*log9000000), which comes to about 144,114,616, roughly 144 million comparisons (the figure uses the natural logarithm).
(2) With a counting-based sort such as bucket sort, the average time complexity can be kept linear. With 700 buckets covering the scores from 200 to 900, roughly one bucket per score, the cost is O(N) = O(9000000), which is equivalent to a single scan over the 9 million records.
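
The two estimates can be checked with a few lines of arithmetic. This is a rough sketch that ignores constant factors; the function names are made up for the illustration, and the natural logarithm is used because that is what reproduces the 144,114,616 figure.

```javascript
'use strict';

// Rough operation-count estimates (hypothetical helpers, constants ignored).
function comparisonSortEstimate(n) {
  return n * Math.log(n); // N * ln(N)
}

function bucketScanEstimate(n) {
  return n; // one pass over the data
}

var candidates = 9000000;
console.log(Math.round(comparisonSortEstimate(candidates))); // prints 144114616
console.log(bucketScanEstimate(candidates));                 // prints 9000000

// The ratio between the two estimates is simply ln(N).
console.log((comparisonSortEstimate(candidates) / bucketScanEstimate(candidates)).toFixed(1)); // prints 16.0
```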
We ran a program to compare quicksort with bucket sort.

// Generate 1,000,000 values in the closed interval [200, 900]
var data = algo.data.randomData(1000*1000, 200, 900);
var s1 = new Date().getTime();
algo.quicksort.sort(data); // quicksort
var s2 = new Date().getTime();
algo.bucketsort.sort(data, 700); // distribute into 700 buckets
var s3 = new Date().getTime();

console.log("Quicksort time:%sms", s2 - s1);
console.log("Bucket time:%sms", s3 - s2);

Output:

Quicksort time:14768ms
Bucket time:1089ms

So, for the gaokao scoring case, bucket sort is the better fit. Using the right algorithm in the right scenario can bring performance gains beyond what hardware alone delivers.

5. Bucket Sort Cost Analysis
But...
Bucket sort uses a mapping function to eliminate almost all comparison work. In effect, computing the value f(k) for each key plays the same role as partitioning in quicksort: it splits a large amount of data into basically ordered blocks (the buckets). After that, only the small amount of data inside each bucket needs an advanced comparison sort.
The time complexity of bucket-sorting N keys has two parts:
(1) Computing the bucket mapping function for every key in a loop, which takes O(N).
(2) Sorting the data within each bucket using an advanced comparison sort, which takes ∑ O(Ni*logNi), where Ni is the amount of data in the i-th bucket.
Clearly, part (2) determines the performance of bucket sort. Keeping the amount of data in each bucket as small as possible is the only way to improve efficiency (because the best average time complexity a comparison-based sort can reach is O(N*logN)). We should therefore aim for the following two points:
(1) The mapping function f(k) should distribute the N data items evenly across the M buckets, so that each bucket holds about [N/M] items.
(2) Increase the number of buckets as much as possible. In the extreme case each bucket receives only one item, which avoids comparison sorting within buckets entirely. This is not easy to achieve, of course: with a large amount of data, f(k) produces a very large set of buckets and wastes a lot of space. It is a trade-off between time cost and space cost.
For N items and M buckets, with an average of [N/M] items per bucket, the average time complexity of bucket sort is:

O(N) + O(M*(N/M)*log(N/M)) = O(N + N*(logN - logM)) = O(N + N*logN - N*logM)
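
To get a feel for how the bucket count M affects this expression, it can be evaluated numerically. This is a sketch only: the function name bucketSortCost is made up, the natural logarithm is an arbitrary choice of base, and the values are relative estimates rather than measured running times.

```javascript
'use strict';

// Evaluate N + N*(logN - logM), assuming the data is spread evenly
// over the M buckets (hypothetical helper).
function bucketSortCost(n, m) {
  return n + n * (Math.log(n) - Math.log(m));
}

var total = 1000000;
[1, 10, 1000, 100000, total].forEach(function (m) {
  console.log('M = ' + m + ' -> cost ~ ' + Math.round(bucketSortCost(total, m)));
});
// With M equal to N the logarithmic term vanishes and the cost is exactly N.
```

With M = 1 the expression reduces to N + N*logN, the cost of one comparison sort over everything plus the scan; each increase in M lowers the estimate.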

When N = M, that is, in the limiting case where each bucket holds only one item, bucket sort reaches its best efficiency, O(N).
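
For integer keys in a known range, this limit can be illustrated by giving every possible key its own bucket, which is effectively counting sort. The sketch below swaps in that technique for illustration and is not the article's linked-list program; the function name sortScores and the range check are assumptions.

```javascript
'use strict';

// One bucket per possible integer score: no comparisons inside buckets.
// Hypothetical helper; expects integers within [low, high].
function sortScores(scores, low, high) {
  var counts = [];
  for (var s = low; s <= high; s++) counts.push(0);

  // Distribution pass: O(N).
  for (var i = 0; i < scores.length; i++) {
    var score = scores[i];
    if (score < low || score > high || Math.floor(score) !== score) {
      throw new RangeError('score out of range: ' + score);
    }
    counts[score - low]++;
  }

  // Collection pass: O(N + M), where M = high - low + 1 buckets.
  var result = [];
  for (var k = 0; k < counts.length; k++) {
    for (var c = 0; c < counts[k]; c++) result.push(low + k);
  }
  return result;
}

console.log(sortScores([900, 200, 555, 200, 731], 200, 900));
// prints [ 200, 200, 555, 731, 900 ]
```

The counts array makes the space cost visible: it holds one slot per possible score regardless of how many scores actually occur.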

6. Summary
The average time complexity of bucket sort is linear, O(N+C), where C = N*(logN-logM). For the same N, the larger the number of buckets M, the higher the efficiency, and the best time complexity reaches O(N). The space complexity of bucket sort is O(N+M), so if the input is very large and the number of buckets is also very large, the space cost is undoubtedly high. In addition, bucket sort is stable, provided the ordering applied within each bucket is stable.
On a personal note: among search algorithms, the best time complexity of comparison-based search is O(logN), as with binary search, balanced binary trees and red-black trees. A hash table, however, offers O(C) lookup efficiency (O(1) when there are no collisions). It is worth pondering: aren't the ideas behind hash tables and bucket sort remarkably alike?

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
