Radix sort introduction and Java implementation

Last Update:2018-01-07 Source: Internet

Author: User


Basic Ideas

Radix sort (RadixSort) is developed on the basis of bucket sort; both are forms of distribution sort. The basic idea of distribution sort is that the sorting process does not compare keys. Instead, it sorts through "allocation" and "collection" steps. The time complexity of these algorithms can be linear: O(n).

Radix sort is a stable sorting algorithm, but it has some limitations:

1. The keys must be decomposable into parts (for example, digits or characters).
2. The number of key parts per record should be small.
3. Numeric keys are best unsigned; signed keys make the mapping to buckets more complex. One workaround is to sort the negative and the non-negative numbers separately.
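
The workaround in limitation 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the input does not contain Integer.MIN_VALUE (its magnitude does not fit in an int), the names SignedRadixSketch, sortNonNegative and sortSigned are illustrative, and the inner pass is a counting-based radix pass, a common variant rather than the list-of-buckets form used in this article's own implementation.

```java
import java.util.Arrays;

// Illustrative sketch; class and method names are assumptions, not from the article.
final class SignedRadixSketch {

    // Base-10 LSD radix sort for non-negative ints, one stable counting pass per digit.
    static void sortNonNegative(int[] a) {
        int max = 0;
        for (int v : a) {
            max = Math.max(max, v);
        }
        int[] buf = new int[a.length];
        for (long exp = 1; max / exp > 0; exp *= 10) {
            int[] count = new int[11];
            for (int v : a) {
                count[(int) (v / exp % 10) + 1]++;      // count[d + 1] = how many have digit d
            }
            for (int d = 0; d < 10; d++) {
                count[d + 1] += count[d];               // count[d] = first slot for digit d
            }
            for (int v : a) {
                buf[count[(int) (v / exp % 10)]++] = v; // forward scan keeps the pass stable
            }
            System.arraycopy(buf, 0, a, 0, a.length);
        }
    }

    // Splits by sign, sorts the magnitudes separately, then joins the two parts.
    // Assumes no element equals Integer.MIN_VALUE.
    static int[] sortSigned(int[] input) {
        int negCount = 0;
        for (int v : input) {
            if (v < 0) {
                negCount++;
            }
        }
        int[] neg = new int[negCount];                  // magnitudes of the negative values
        int[] pos = new int[input.length - negCount];
        int ni = 0;
        int pi = 0;
        for (int v : input) {
            if (v < 0) {
                neg[ni++] = -v;
            } else {
                pos[pi++] = v;
            }
        }
        sortNonNegative(neg);
        sortNonNegative(pos);
        int[] out = new int[input.length];
        int k = 0;
        for (int i = neg.length - 1; i >= 0; i--) {
            out[k++] = -neg[i];                         // largest magnitude is the smallest value
        }
        for (int v : pos) {
            out[k++] = v;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortSigned(new int[]{-5, 3, 0, -120, 42, -1, 7})));
    }
}
```

The negative part is emitted in reverse because a larger magnitude means a smaller value.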

First, let's look at bucket sort.

Bucket sort is also called bin sort (BinSort). The basic idea is to set up several buckets, scan the records to be sorted R[0], R[1], ..., R[n-1] in order, and put every record whose key falls in a given range into the corresponding bucket k (allocation). The non-empty buckets are then joined end to end in order (collection).

For example, to sort a shuffled deck of 52 cards by points, A < 2 < ... < J < Q < K, set up 13 buckets. During sorting, each card is placed in the bucket for its point value. The buckets are then joined end to end in order, which yields a deck in ascending order of points.

In bucket sort, the number of buckets depends on the value range of the key. Bucket sort therefore requires the key to take values from a finite set; otherwise an unbounded number of buckets could be needed.

In general, the number of records with the same key that end up in each bucket cannot be predicted, so it is advisable to implement each bucket as a linked list.

To keep the sort stable, both placing records into buckets and joining the buckets during collection must follow the first-in-first-out principle.
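
The allocation and collection steps described above can be sketched in Java as follows. This is a minimal sketch, assuming integer keys in the range 0 to m-1; the class name BucketSortSketch and the parameter m are illustrative and do not come from the original article. Each bucket is a linked-list queue, so records with equal keys leave a bucket in the order they entered it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Illustrative sketch; the class name and the parameter m are assumptions.
final class BucketSortSketch {

    // Sorts keys that all lie in [0, m) using m first-in-first-out buckets.
    static int[] sort(int[] keys, int m) {
        List<Queue<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < m; b++) {
            buckets.add(new LinkedList<>());
        }
        // Allocation: scan R[0..n-1] in order and append each key to its bucket.
        for (int key : keys) {
            buckets.get(key).add(key);
        }
        // Collection: empty the buckets in ascending order, first in, first out.
        int[] sorted = new int[keys.length];
        int next = 0;
        for (Queue<Integer> bucket : buckets) {
            while (!bucket.isEmpty()) {
                sorted[next++] = bucket.remove();
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] points = {5, 1, 13, 1, 7, 0, 12};
        System.out.println(Arrays.toString(sort(points, 14))); // prints [0, 1, 1, 5, 7, 12, 13]
    }
}
```

Allocation touches each of the n keys once and collection visits all m buckets plus the n stored keys, which is where the O(m + n) total comes from.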

For bucket sort, allocation takes O(n) time. Collection takes O(m) when the buckets are linked lists that can be joined end to end, and O(m + n) otherwise. The total bucket sort time is therefore O(m + n). If the number of buckets m is of order O(n), bucket sort runs in linear time, that is, O(n).

Most common sorting algorithms have O(n^2) time complexity, and some have O(n log n). Bucket sort can reach O(n). It has two drawbacks. First, its space complexity is relatively high: besides the array that holds the records to be sorted, it needs a second array for the buckets. For example, if the values to be sorted range from 0 to m-1, m buckets are required, so the bucket array needs at least m slots. Second, the elements to be sorted must lie within a known range.

Radix sort is an improvement on bucket sort. The improvement is not in performance: it makes bucket sort applicable to a larger set of element values.

We again use playing cards as the example. A card has two keys: its suit and its face value.

Given two cards of different suits, the card with the lower suit is the smaller one regardless of face value. Only within the same suit does the face value decide the order. This is sorting on multiple keys.

To obtain the sorted result, we discuss two methods.

Method 1: first sort by suit, dividing the cards into four groups: clubs, diamonds, hearts, and spades. Then sort each group by face value, and finally join the four groups.

Method 2: first set up 13 numbered piles by face value (2, 3, ..., A) and place each card on the pile for its face value, which gives 13 piles. Then set up four piles by suit (clubs, diamonds, hearts, spades). Take the cards from pile 2 and place each on the pile for its suit, then do the same with pile 3, and so on through pile A. Each suit pile is then ordered by face value, and joining the four suit piles in order gives the sorted deck.

Sorting on multiple keys proceeds either from the most significant key to the least significant key, or from the least significant key to the most significant key. These are the two methods:

Most Significant Digit first (MSD) method:

1) Sort and group by k1 first, dividing the sequence into several subsequences. Within each subsequence, all records have the same key k1.

2) Then divide each group into subgroups by k2, and continue with the following keys in the same way until the subgroups have been sorted by the least significant key kd.

3) Join the groups to obtain an ordered sequence. Method 1 in the playing-card example above is the MSD method.
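
The three MSD steps can be sketched recursively for decimal numbers. This is a minimal sketch, assuming non-negative values and assuming the caller passes exp as the weight of the highest digit of the largest value (100 for three-digit data); the class name MsdRadixSketch and the parameter exp are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; the class name and the exp parameter are assumptions.
final class MsdRadixSketch {

    // Sorts by decimal digits, starting at the digit with weight exp (for example 100).
    static List<Integer> sort(List<Integer> values, int exp) {
        if (values.size() <= 1 || exp == 0) {
            return values; // nothing left to split, or all digits already used
        }
        // Step 1: group by the current (most significant remaining) digit.
        List<List<Integer>> groups = new ArrayList<>();
        for (int d = 0; d < 10; d++) {
            groups.add(new ArrayList<>());
        }
        for (int v : values) {
            groups.get(v / exp % 10).add(v);
        }
        // Step 2: sort each group by the next digit.
        // Step 3: join the groups in ascending digit order.
        List<Integer> result = new ArrayList<>();
        for (List<Integer> group : groups) {
            result.addAll(sort(group, exp / 10));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(135, 242, 192, 93, 345, 11, 24, 19);
        System.out.println(sort(data, 100)); // prints [11, 19, 24, 93, 135, 192, 242, 345]
    }
}
```

Because each group is sorted independently before the groups are joined, MSD does not depend on the passes being stable.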

Least Significant Digit first (LSD) method:

1) Sort by kd first, then by kd-1, and so on, until the sequence has been sorted by k1.

2) Join the subsequences to obtain an ordered sequence. Method 2 in the playing-card example above is the LSD method.
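
The LSD method applied to the playing cards can be sketched as two stable distribution passes. This is a minimal sketch under assumptions that are not in the original: a card is encoded as an int pair {suit, rank}, with suit 0 to 3 standing for clubs, diamonds, hearts, spades and rank 0 to 12 standing for 2 up to A. The names CardLsdSketch and distribute are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; the {suit, rank} encoding and all names are assumptions.
final class CardLsdSketch {

    // One stable pass: deal the cards into `radix` piles by card[keyIndex], then stack the piles.
    static List<int[]> distribute(List<int[]> cards, int keyIndex, int radix) {
        List<List<int[]>> piles = new ArrayList<>();
        for (int p = 0; p < radix; p++) {
            piles.add(new ArrayList<>());
        }
        for (int[] card : cards) {
            piles.get(card[keyIndex]).add(card); // allocation, in arrival order
        }
        List<int[]> collected = new ArrayList<>();
        for (List<int[]> pile : piles) {
            collected.addAll(pile);              // collection, pile by pile
        }
        return collected;
    }

    // LSD: least significant key (rank) first, most significant key (suit) last.
    static List<int[]> sort(List<int[]> cards) {
        List<int[]> byRank = distribute(cards, 1, 13);
        return distribute(byRank, 0, 4);
    }

    public static void main(String[] args) {
        List<int[]> hand = Arrays.asList(
                new int[]{2, 5}, new int[]{0, 12}, new int[]{2, 0},
                new int[]{0, 3}, new int[]{3, 1});
        // prints [[0, 3], [0, 12], [2, 0], [2, 5], [3, 1]]
        System.out.println(Arrays.deepToString(sort(hand).toArray()));
    }
}
```

The second pass only works because it is stable: cards of the same suit keep the rank order produced by the first pass.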

A single key of numeric or character type can be treated as a multi-part key made up of several digits or characters, and can then be sorted by this "distribution-collection" method. The process is called radix sort, and the number of possible values of each digit or character is called the radix. For example, the radix of a card's suit is 4 and the radix of its face value is 13. When arranging cards you can go by suit first or by face value first. For instance, first deal the cards into 4 piles by suit (distribution) and stack the piles in suit order (collection), then deal them into 13 piles by face value (distribution) and stack those in face-value order (collection). After these two rounds of distribution and collection the cards are in order (by face value, and by suit within each face value).

The distribution-collection process must preserve stability.

Radix sort works by distributing the data into buckets once for each key position in turn. For example, take the following sequence to be sorted:

135, 242, 192, 93, 345, 11, 24, 19

We treat the ones, tens, and hundreds digits of each value as three keys, for example:

135 -> k1 (ones digit) = 5, k2 (tens digit) = 3, k3 (hundreds digit) = 1.
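
The decomposition of a value into its digit keys can be sketched with integer division and remainder. This is a minimal sketch, assuming non-negative values and a key index k between 1 and 10; the names DigitKeySketch and digit are illustrative.

```java
// Illustrative sketch; the class and method names are assumptions.
final class DigitKeySketch {

    // Returns key k of a non-negative value, where k = 1 is the ones digit,
    // k = 2 the tens digit, and so on (k must be between 1 and 10 for an int).
    static int digit(int value, int k) {
        int divisor = 1;
        for (int i = 1; i < k; i++) {
            divisor *= 10;
        }
        return value / divisor % 10;
    }

    public static void main(String[] args) {
        // prints 5 3 1
        System.out.println(digit(135, 1) + " " + digit(135, 2) + " " + digit(135, 3));
    }
}
```

A value with fewer digits than k simply yields 0, which is why 93 and 11 land in bucket 0 when the hundreds digit is used.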

Then, starting from the lowest digit (the least significant key), distribute all the data into buckets by k1 (each digit is 0 to 9, so 10 buckets are needed), and output the contents of the buckets in order to obtain the following sequence.

(11), (242, 192), (93), (24), (135, 345), (19) (sorted by the least significant key: distributed by the ones digit, ignoring the tens and hundreds digits)

Then distribute this sequence into buckets by k2. The output sequence is:

(11, 19), (24), (135), (242, 345), (192, 93) (sorted by the second key on top of the previous pass: distributed by the tens digit, ignoring the hundreds and ones digits)

Finally, distribute by k3. The output sequence is:

(011, 019, 024, 093), (135, 192), (242), (345) (sorted by the highest key on top of the previous pass: distributed by the hundreds digit, ignoring the tens and ones digits)

Sorting is completed.
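
The three passes of this worked example can be reproduced with a short trace program. This is a minimal sketch; the names LsdTraceSketch and pass are illustrative, and the digit weight exp (1, 10, 100) selects the key for each pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; the class and method names are assumptions.
final class LsdTraceSketch {

    // One distribution-and-collection pass on the digit with weight exp (1, 10, 100, ...).
    static int[] pass(int[] data, int exp) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int d = 0; d < 10; d++) {
            buckets.add(new ArrayList<>());
        }
        for (int v : data) {
            buckets.get(v / exp % 10).add(v); // distribute by the current digit
        }
        int[] out = new int[data.length];
        int next = 0;
        for (List<Integer> bucket : buckets) {
            for (int v : bucket) {
                out[next++] = v;              // collect in bucket order, first in, first out
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {135, 242, 192, 93, 345, 11, 24, 19};
        for (int exp = 1; exp <= 100; exp *= 10) {
            data = pass(data, exp);
            System.out.println("after digit weight " + exp + ": " + Arrays.toString(data));
        }
        // after digit weight 1: [11, 242, 192, 93, 24, 135, 345, 19]
        // after digit weight 10: [11, 19, 24, 135, 242, 345, 192, 93]
        // after digit weight 100: [11, 19, 24, 93, 135, 192, 242, 345]
    }
}
```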

Java Implementation

    // Requires: import java.util.ArrayList; import java.util.List;
    // Assumes the enclosing class (not shown) has an int[] field named array that
    // holds non-negative values, and a display() method that prints the array.
    public void radixSort() {
        if (array.length == 0) {
            return; // nothing to sort
        }
        int max = array[0];
        for (int i = 0; i < array.length; i++) { // find the maximum value in the array
            if (array[i] > max) {
                max = array[i];
            }
        }
        // Number of keys. The ones, tens, hundreds, ... digits are the keys,
        // so the number of keys is the digit count of the maximum value.
        int keysNum = 0;
        while (max > 0) {
            max /= 10;
            keysNum++;
        }
        List<ArrayList<Integer>> buckets = new ArrayList<ArrayList<Integer>>();
        for (int i = 0; i < 10; i++) { // each digit is 0 to 9, so set up 10 buckets
            buckets.add(new ArrayList<Integer>()); // each bucket is an ArrayList<Integer>
        }
        for (int i = 0; i < keysNum; i++) { // one round per key, least significant first
            int divisor = (int) Math.pow(10, i);
            for (int j = 0; j < array.length; j++) { // scan all elements and distribute them
                // Extract digit i + 1 of the element. For 258 and the tens digit:
                // 258 / 10 = 25, 25 % 10 = 5.
                int key = array[j] / divisor % 10;
                buckets.get(key).add(array[j]); // put the element into the bucket for this digit
            }
            // After distribution, copy the bucket contents back into the array in order.
            int counter = 0; // next write position in the array
            for (int j = 0; j < 10; j++) {
                ArrayList<Integer> bucket = buckets.get(j);
                for (int value : bucket) {
                    array[counter++] = value; // first in, first out keeps the round stable
                }
                bucket.clear(); // empty the bucket for the next round
            }
            System.out.print("Round " + (i + 1) + " of sorting: ");
            display();
        }
    }

The output is as follows:

Algorithm Analysis

At first glance, the efficiency of radix sort seems almost too good to be true. All it does is copy the data items from the array into the buckets and then copy them back. With 10 data items that is 20 copies, and the process is repeated for each digit. Sorting five-digit numbers therefore takes 20 * 5 = 100 copies, and with 100 data items it takes 200 * 5 = 1000 copies. The number of copies is proportional to the number of data items, that is, O(n), which would make this the most efficient sorting algorithm we have seen.

Unfortunately, the more data items there are, the longer the keys must be. If the number of data items grows tenfold, the key needs one more digit (one more round of sorting). The number of copies is proportional to the number of data items times the key length, and the key length can be regarded as the logarithm of N. In most cases, therefore, the efficiency of radix sort falls back to O(N * logN), comparable to quicksort.
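
The copy count in this analysis can be written out as a short derivation. The symbols are notation introduced here for illustration: C(n, d) is the number of copies, n the number of data items, and d the number of decimal digits in the longest key. The bound on d assumes n distinct non-negative keys.

```latex
\begin{aligned}
C(n, d) &= 2nd \\
C(10, 5) &= 2 \cdot 10 \cdot 5 = 100, \qquad C(100, 5) = 2 \cdot 100 \cdot 5 = 1000 \\
n \le 10^{d} &\implies d \ge \lceil \log_{10} n \rceil \\
d = \Theta(\log n) &\implies C(n, d) = \Theta(n \log n)
\end{aligned}
```

When d is a fixed constant (for example, all keys are five-digit numbers), the first line alone gives C(n, d) = O(n).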

Summary

That concludes this article's introduction to radix sort and its Java implementation. I hope it is helpful. If you are interested, you can continue with other related topics on this site. If you find any shortcomings, please leave a message.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
