Bucket sort and radix sort

Source: Internet
Author: User

Basic idea of bucket sort

Suppose we have a key sequence K[1..n] of length n to sort. First, divide the value range of the keys into m subintervals (buckets). Then use a mapping function to map each key k in the sequence to bucket i (that is, index i of the bucket array B), so that k becomes an element of B[i]; with an even spread, each bucket B[i] holds about n/m keys. The mapping must preserve order: a smaller key never lands in a later bucket than a larger one. Next, sort the elements inside each bucket B[i] with a comparison sort (quicksort, for example). Finally, output the contents of B[0] ... B[m-1] in order to obtain a sorted sequence.

Suppose the sequence to be sorted is K = {49, 38, 35, 97, 76, 73, 27, 49}. All of the values are below 100, so we can set up 10 buckets and choose the mapping function f(k) = k/10, using integer division. The first key, 49, then goes into bucket 4 (49/10 = 4). Each key is placed into its bucket in turn, and every non-empty bucket is then sorted with quicksort.
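The example above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function name bucket_sort and its parameters bucket_count and width are made up for this sketch, the keys are assumed to be non-negative integers below bucket_count * width, and Python's built-in sorted() stands in for the quicksort mentioned in the text.

```python
def bucket_sort(keys, bucket_count=10, width=10):
    """Sort non-negative integers smaller than bucket_count * width."""
    buckets = [[] for _ in range(bucket_count)]
    for k in keys:
        # Mapping function f(k) = k / 10 with integer division.
        buckets[k // width].append(k)
    result = []
    for bucket in buckets:
        # sorted() is used here in place of quicksort (an assumption of this sketch).
        result.extend(sorted(bucket))
    return result


print(bucket_sort([49, 38, 35, 97, 76, 73, 27, 49]))
# [27, 35, 38, 49, 49, 73, 76, 97]
```

Because f(k) never sends a smaller key to a later bucket, concatenating the sorted buckets in index order yields the fully sorted sequence.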

Cost analysis of bucket sort

Bucket sort uses the mapping function to avoid almost all comparison work. Computing f(k) plays the same role as the partition step in quicksort: it splits a large data set into roughly ordered blocks (the buckets). After that, only the small amount of data inside each bucket still has to be sorted by comparison.

 

The time complexity of bucket sorting n keys has two parts:

(1) Computing the bucket mapping function for every key in a loop. This takes O(n).

(2) Sorting the data in each bucket with an efficient comparison sort. This takes Σ O(ni * log ni), where ni is the number of keys in bucket i.

 

Clearly, part (2) determines the performance of bucket sort. Reducing the number of keys in each bucket is the only way to improve efficiency, because the best average time complexity of a comparison sort is O(n * log n). We therefore want to do two things:

(1) Choose a mapping function f(k) that spreads the n keys evenly over the m buckets, so that each bucket holds about n/m keys.

(2) Use as many buckets as possible. In the extreme case each bucket receives only one key, which avoids comparison sorting inside the buckets altogether. This is not easy to achieve, of course: when the data volume is huge, f(k) produces a very large number of buckets and wastes a lot of space. This is a trade-off between time cost and space cost.
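The extreme case in point (2) can be illustrated with a short Python sketch. It assumes non-negative integer keys with a known upper bound; the function name one_value_per_bucket_sort and the parameter max_key are hypothetical names chosen for this illustration. Each possible key value gets its own bucket, so no comparison sort is needed, but the bucket array grows with the value range rather than with the amount of data.

```python
def one_value_per_bucket_sort(keys, max_key):
    """Bucket sort with one bucket per possible key value (0..max_key)."""
    # One bucket for every possible value: space grows with max_key.
    buckets = [[] for _ in range(max_key + 1)]
    for k in keys:
        buckets[k].append(k)  # mapping function f(k) = k
    # Every bucket holds only equal keys, so no in-bucket sorting is needed.
    return [k for bucket in buckets for k in bucket]


data = [49, 38, 35, 97, 76, 73, 27, 49]
print(one_value_per_bucket_sort(data, max_key=99))
# [27, 35, 38, 49, 49, 73, 76, 97]
```

Here 100 buckets are allocated to sort 8 keys, which is the space cost the text warns about.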

 

For n keys and m buckets, with about n/m keys in each bucket, the average time complexity of bucket sort is:

O(n) + O(m * (n/m) * log(n/m)) = O(n + n * (log n - log m)) = O(n + n * log n - n * log m)

When n = m, the limiting case in which each bucket holds only one key, bucket sort reaches its best efficiency of O(n).
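To get a feel for the estimate above, the expression n + n * (log n - log m) can be evaluated for a fixed n and a growing number of buckets. The sketch below is only an illustration: the function name estimated_cost is hypothetical, base-2 logarithms are an assumption (the text does not fix a base), and the result is an operation-count estimate under the even-distribution assumption, not a measured running time.

```python
import math


def estimated_cost(n, m):
    """Evaluate n + n * (log n - log m) for 1 <= m <= n (base-2 logs assumed)."""
    return n + n * (math.log2(n) - math.log2(m))


n = 1024
for m in (1, 16, 256, 1024):
    # As m approaches n, the logarithmic term shrinks toward zero.
    print(f"m = {m:4d}: estimated cost = {estimated_cost(n, m):.0f}")
```

With m = 1 the estimate is the usual n + n * log n of a comparison sort, and with m = n it drops to n, matching the limiting case described in the text.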

 

Summary: the average time complexity of bucket sort is close to linear: O(n + c), where c = n * (log n - log m). For the same n, the larger the number of buckets m, the higher the efficiency, and the best time complexity is O(n). The space complexity of bucket sort is O(n + m), so if the input is huge and the number of buckets is large, the space cost is high. In addition, bucket sort is stable, provided that keys are appended to their buckets in input order and the sort used inside each bucket is itself stable.

A personal observation: among search algorithms, the best time complexity of a comparison-based search is O(log n), as in binary search, balanced binary trees, and red-black trees. A hash table, however, offers constant-level O(c) lookup (O(1) when there are no collisions). It is worth thinking about: is there not a close kinship between the idea of a hash table and bucket sort?

Radix sort

Radix sort is a way of sorting on multiple keys, but the method can still be used when each record has only a single key.

For example, in the strings "ABCD", "aesc", "dwsc", and "rews", each character can be treated as a key. Likewise, in the integers 425, 321, 235, and 432, the digit in each position can be treated as a key.

 

The principle of radix sort is to distribute the data to be sorted into buckets once for each key position, one position at a time. For example, take the following sequence to be sorted:

278, 109, 063, 930, 589, 184, 505, 269, 008, 083

We treat the units, tens, and hundreds digits as three keys. For 278: K1 (units digit) = 8, K2 (tens digit) = 7, K3 (hundreds digit) = 2.
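Splitting a number into its digit keys can be sketched as follows. The helper name digit_keys and the parameter digits are hypothetical, and the sketch assumes non-negative decimal integers with at most the given number of digits.

```python
def digit_keys(value, digits=3):
    """Return [K1, K2, ..., Kd]: the decimal digits, least significant first."""
    return [(value // 10 ** i) % 10 for i in range(digits)]


print(digit_keys(278))  # [8, 7, 2]
print(digit_keys(63))   # [3, 6, 0], since 063 has a leading zero
```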

Then, starting from the lowest digit (the last key), distribute all the data into buckets by K1. Because every digit is between 0 and 9, 10 buckets are needed. Outputting the buckets in order, and keeping the arrival order within each bucket, gives the following sequence:

930, 063, 083, 184, 505, 278, 008, 109, 589, 269

Next, distribute the sequence above into buckets by K2. The output sequence is:

505, 008, 109, 930, 063, 269, 278, 083, 184, 589

Finally, distribute by K3. The output sequence is:

008, 063, 083, 109, 184, 269, 278, 505, 589, 930
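The three passes above can be reproduced with a short Python sketch of least-significant-digit radix sort. The function name radix_sort_passes and its parameters are hypothetical; the sketch assumes non-negative integers with at most digits decimal digits, and it writes 063, 008, and 083 as 63, 8, and 83 because Python does not allow integer literals with leading zeros.

```python
def radix_sort_passes(values, digits=3, base=10):
    """Return the sequence obtained after each distribution pass, lowest digit first."""
    passes = []
    current = list(values)
    for i in range(digits):
        buckets = [[] for _ in range(base)]
        for v in current:
            # Distribute by the i-th digit; appending keeps the arrival order.
            buckets[(v // base ** i) % base].append(v)
        # Collect the buckets in order to form the next sequence.
        current = [v for bucket in buckets for v in bucket]
        passes.append(current)
    return passes


data = [278, 109, 63, 930, 589, 184, 505, 269, 8, 83]
for seq in radix_sort_passes(data):
    print(", ".join(f"{v:03d}" for v in seq))
# 930, 063, 083, 184, 505, 278, 008, 109, 589, 269
# 505, 008, 109, 930, 063, 269, 278, 083, 184, 589
# 008, 063, 083, 109, 184, 269, 278, 505, 589, 930
```

Each pass must keep the order in which elements arrived in a bucket; that stability is what lets the ordering established by earlier, lower digits survive the later passes.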

 

Performance Analysis

The performance of radix sort is slightly worse than that of bucket sort. Each distribution pass over one key takes O(n), and collecting the new sequence after the pass takes another O(n). If the data can be split into d keys, the time complexity of radix sort is O(d * 2n), that is, O(d * n). Since d is usually much smaller than n, this is essentially linear. The space complexity of radix sort is O(n + m), where m is the number of buckets. In general n > m, so roughly n units of extra space are needed.

 

Compared with bucket sort, however, radix sort needs only a small number of buckets per pass, and it requires no comparison operations at all. Bucket sort, when the number of buckets is relatively small, must still sort the multiple keys inside each bucket by comparison. In practice, therefore, radix sort has the wider range of application.
