Principle and concrete implementation of Shell sort

Source: Internet
Author: User

Shell sort is named after computer scientist Donald L. Shell, who published the algorithm in 1959. It is based on insertion sort but adds a new feature that greatly improves insertion sort's performance.

Thanks to this mechanism, Shell sort performs well on medium-sized arrays of up to several thousand data items. It is not quite as fast as quicksort and other algorithms with O(N*logN) time complexity, so it is not the best choice for very large files. However, it is much faster than O(N^2) algorithms such as selection sort and insertion sort, and it is very easy to implement. Its worst-case performance is also not much worse than its average performance.

Insertion sort: too many copies

Because Shell sort is based on insertion sort, recall how insertion sort works. Partway through the sort, the data items to the left of a marker are already sorted, while the items to its right are not. The algorithm removes the item at the marker and stores it in a temporary variable. Then, starting from the first cell to the left of the removed item, it shifts the sorted items one cell to the right at a time until the item held in the temporary variable can be inserted in order.

The trouble with insertion sort is this. Suppose a small data item lies far to the right, where the large items should be. To move this small item to its proper place on the left, every intervening item must be shifted one cell to the right. This takes close to N copies for just one item. Not every item has to move a full N cells, but the average item moves N/2 cells, so N items at N/2 shifts each gives N^2/2 copies in total. The running time of insertion sort is therefore O(N^2).

If a small item could somehow be moved far to the left without shifting every intervening item one by one, the algorithm would be much more efficient.
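The copy count described above can be observed directly. The following is a minimal sketch, not part of the original listing: the class name InsertionCopyCount and the shift counter are assumptions made for illustration. It runs ordinary insertion sort on a reversed array, which is the worst case, where every item must travel all the way to the left.

```java
// Hypothetical demo class; the name and the shift counter are illustrative only.
class InsertionCopyCount {
    // Sorts a in place with ordinary insertion sort and returns the number
    // of one-cell shifts (copies of an item into the cell on its right).
    static long insertionSort(long[] a) {
        long shifts = 0;
        for (int outer = 1; outer < a.length; outer++) {
            long temp = a[outer];              // item removed at the marker
            int inner = outer;
            while (inner > 0 && a[inner - 1] > temp) {
                a[inner] = a[inner - 1];       // shift a larger item one cell right
                inner--;
                shifts++;
            }
            a[inner] = temp;                   // insert the removed item
        }
        return shifts;
    }

    public static void main(String[] args) {
        int n = 1000;
        long[] reversed = new long[n];
        for (int i = 0; i < n; i++) reversed[i] = n - i;   // n, n-1, ..., 1
        // A reversed array needs n*(n-1)/2 shifts, which grows as N^2.
        System.out.println("shifts = " + insertionSort(reversed));
    }
}
```

For a reversed array of N distinct items the count is exactly N*(N-1)/2, so doubling N roughly quadruples the work.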

n-increment sorting

Shell sort moves data items across large spans by widening the interval between the elements that insertion sort compares, and sorting those widely spaced elements. Once the items are sorted at one interval, Shell sort reduces the interval and sorts again, and so on. The interval between items during a pass is called the increment, traditionally denoted by the letter h.

Take an array of 10 data items and an increment of 4. First the items at positions 0, 4, and 8 are sorted.

After sorting the items at positions 0, 4, and 8, the algorithm shifts one cell to the right and sorts the items at positions 1, 5, and 9. This continues until every item has been 4-increment sorted, that is, until all items spaced 4 cells apart are in order.

After the 4-increment pass, the array can be viewed as four subarrays made up of the items at positions (0, 4, 8), (1, 5, 9), (2, 6), and (3, 7), each of which is completely sorted. These subarrays are interleaved, but they are independent of each other.

Note: in this example, once the 4-increment sort is complete, no item is more than two cells from its position in the final sorted sequence. This is what it means for an array to be "almost sorted", and it is the secret of Shell sort. Creating interleaved, internally sorted sets of items minimizes the work needed to finish the sort.

Insertion sort is very efficient on an almost-sorted array. If each item needs to move only one or two cells, the sort takes close to O(N) time. So once the array has been 4-increment sorted, an ordinary insertion sort (a 1-increment sort) finishes the job. Combining a 4-increment sort with a 1-increment sort is much faster than applying ordinary insertion sort alone.
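A single 4-increment pass over 10 items can be sketched as follows. The class name HSortDemo, the helper names, and the sample values are assumptions chosen for illustration; the pass itself is a gapped insertion sort of the kind described above.

```java
import java.util.Arrays;

// Hypothetical demo class; names and sample data are illustrative only.
class HSortDemo {
    // One gapped insertion-sort pass. Afterwards a[i - h] <= a[i] for every i >= h,
    // so each of the h interleaved subarrays is sorted.
    static void hSort(long[] a, int h) {
        for (int outer = h; outer < a.length; outer++) {
            long temp = a[outer];
            int inner = outer;
            while (inner >= h && a[inner - h] > temp) {
                a[inner] = a[inner - h];   // move the larger item h cells right
                inner -= h;
            }
            a[inner] = temp;
        }
    }

    // True if every pair of items h cells apart is in order.
    static boolean isHSorted(long[] a, int h) {
        for (int i = h; i < a.length; i++)
            if (a[i - h] > a[i]) return false;
        return true;
    }

    public static void main(String[] args) {
        long[] a = {7, 10, 1, 9, 2, 5, 8, 6, 4, 3};   // assumed sample data
        hSort(a, 4);
        System.out.println(Arrays.toString(a));        // [2, 3, 1, 6, 4, 5, 8, 9, 7, 10]
        System.out.println("4-sorted: " + isHSorted(a, 4));
    }
}
```

In the result, positions (0, 4, 8) hold 2, 4, 7 and positions (1, 5, 9) hold 3, 5, 10, while the array as a whole is not yet sorted.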
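To make the saving concrete, the sketch below counts element shifts for ordinary insertion sort alone and for a 4-increment pass followed by a 1-increment pass. The class name CombinedPassDemo, the counter, and the 10 sample values are assumptions for illustration; other data will give different counts.

```java
// Hypothetical demo class; names and sample data are illustrative only.
class CombinedPassDemo {
    // Gapped insertion-sort pass with gap h; returns the number of shifts made.
    // With h = 1 this is ordinary insertion sort.
    static int hSortCountingShifts(long[] a, int h) {
        int shifts = 0;
        for (int outer = h; outer < a.length; outer++) {
            long temp = a[outer];
            int inner = outer;
            while (inner >= h && a[inner - h] > temp) {
                a[inner] = a[inner - h];
                inner -= h;
                shifts++;
            }
            a[inner] = temp;
        }
        return shifts;
    }

    public static void main(String[] args) {
        long[] data = {7, 10, 1, 9, 2, 5, 8, 6, 4, 3};   // assumed sample data

        long[] plain = data.clone();
        int plainShifts = hSortCountingShifts(plain, 1);

        long[] shell = data.clone();
        int shellShifts = hSortCountingShifts(shell, 4);
        shellShifts += hSortCountingShifts(shell, 1);

        System.out.println("insertion sort only: " + plainShifts + " shifts");  // 28
        System.out.println("4-sort then 1-sort:  " + shellShifts + " shifts");  // 12
    }
}
```

On this small input the two-pass version needs fewer than half the shifts, because each shift in the 4-increment pass moves an item four cells at once.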

Reducing the interval

The preceding example used an initial interval of 4 to sort an array of 10 data items. Larger arrays should start with a larger interval, which is then reduced repeatedly until it reaches 1.

For example, an array of 1000 items might be sorted with an increment of 364, then 121, then 40, then 13, then 4, and finally 1. The sequence of numbers used as intervals is called the interval sequence. The sequence shown here is attributed to Knuth and is widely used. Read in reverse, it starts from 1 and is generated by the recursive expression

h = 3 * h + 1

where the initial value of h is 1.

There are other ways to generate an interval sequence; they are discussed later. First, we look at Shell sort using Knuth's sequence.

In the sorting algorithm, the generating formula is first used in a short loop to find the initial interval. h starts at 1, and the formula h = 3 * h + 1 then produces the sequence 1, 4, 13, 40, 121, 364, and so on. The process stops when the interval becomes larger than the array. For an array of 1000 items, the seventh number in the sequence, 1093, is too large, so the sixth number, 364, is the largest usable interval and the sort begins with a 364-increment pass. Then, each time the outer loop completes a pass, the interval is reduced using the inverse of the formula above:

h = (h - 1) / 3

This inverse formula generates the descending sequence 364, 121, 40, 13, 4, 1. Starting from 364, each number in turn is used as the increment for one pass. The algorithm ends once the array has been 1-increment sorted.

Why is Shell sort so much faster than insertion sort? When h is large, each pass moves only a few items, but moves them a long way, which is very efficient. As h grows smaller, each pass moves more items, but by then the items are already close to their final sorted positions, which is exactly where insertion sort is efficient. It is the combination of these two effects that makes Shell sort so fast.

Note that a later pass does not undo the work of an earlier one. For example, an array that has been 40-increment sorted is still 40-increment sorted after it has been 13-increment sorted. If this were not so, Shell sort would not work.
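The two formulas can be checked with a short sketch that builds the whole descending sequence for a given array size. The class name KnuthGaps and the method gapsFor are assumptions for illustration. The start-up loop uses the bound h <= n / 3, as in the Java listing in this article; for N = 1000 it stops at 364.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
class KnuthGaps {
    // Returns the descending interval sequence used for an array of n items.
    static List<Integer> gapsFor(int n) {
        int h = 1;
        while (h <= n / 3)        // grow: 1, 4, 13, 40, 121, 364, ...
            h = 3 * h + 1;

        List<Integer> gaps = new ArrayList<>();
        while (h > 0) {           // shrink with the inverse formula
            gaps.add(h);
            h = (h - 1) / 3;      // after h = 1 this gives 0 and the loop ends
        }
        return gaps;
    }

    public static void main(String[] args) {
        System.out.println(gapsFor(1000));   // [364, 121, 40, 13, 4, 1]
    }
}
```

Because (1 - 1) / 3 is 0, the same loop condition h > 0 that drives the sort also terminates it right after the 1-increment pass.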
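This property can be checked experimentally. The sketch below is an illustration only: the class name GapInvariantDemo, the helper names, and the seeded random data are all assumptions. It 40-sorts an array, then 13-sorts it, and confirms the array is still 40-sorted.

```java
import java.util.Random;

// Hypothetical demo class; names, seed, and data size are illustrative only.
class GapInvariantDemo {
    // One gapped insertion-sort pass with gap h.
    static void hSort(long[] a, int h) {
        for (int outer = h; outer < a.length; outer++) {
            long temp = a[outer];
            int inner = outer;
            while (inner >= h && a[inner - h] > temp) {
                a[inner] = a[inner - h];
                inner -= h;
            }
            a[inner] = temp;
        }
    }

    // True if every pair of items h cells apart is in order.
    static boolean isHSorted(long[] a, int h) {
        for (int i = h; i < a.length; i++)
            if (a[i - h] > a[i]) return false;
        return true;
    }

    public static void main(String[] args) {
        // 1000 pseudo-random values in [0, 10000); the seed is arbitrary.
        long[] a = new Random(42).longs(1000, 0, 10_000).toArray();
        hSort(a, 40);
        hSort(a, 13);
        System.out.println("still 40-sorted: " + isHSorted(a, 40));
        System.out.println("13-sorted:       " + isHSorted(a, 13));
    }
}
```

Both checks print true; the later, finer pass only adds order and never removes the order established by the coarser one.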

Java code for Shell sort

class ArraySh {
    private long[] theArray;   // backing array
    private int nElems;        // number of items currently stored

    public ArraySh(int max) {
        theArray = new long[max];
        nElems = 0;
    }

    public void insert(long value) {
        theArray[nElems] = value;
        nElems++;
    }

    public void display() {
        System.out.print("A = ");
        for (int j = 0; j < nElems; j++)
            System.out.print(theArray[j] + " ");
        System.out.println("");
    }

    public void shellSort() {
        int inner, outer;
        long temp;

        int h = 1;
        while (h <= nElems / 3)       // find the initial interval
            h = h * 3 + 1;            // 1, 4, 13, 40, 121, ...

        while (h > 0) {               // one pass per interval, down to 1
            for (outer = h; outer < nElems; outer++) {
                temp = theArray[outer];
                inner = outer;
                // && short-circuits, so theArray[inner - h] is never read
                // with a negative index
                while (inner > h - 1 && theArray[inner - h] >= temp) {
                    theArray[inner] = theArray[inner - h];
                    inner -= h;
                }
                theArray[inner] = temp;
            }
            h = (h - 1) / 3;          // reduce the interval
        }
    }
}

Other interval sequences

Choosing an interval sequence is something of a black art. So far we have generated the sequence with the formula h = h * 3 + 1, but other interval sequences have been used with varying degrees of success. There is only one absolute requirement: the diminishing sequence must end with 1, so that the last pass is an ordinary insertion sort.

In his original paper, Shell suggested an initial interval of N/2, simply halved for each subsequent pass. For an array with N = 100, the descending interval sequence is therefore 50, 25, 12, 6, 3, 1. The advantage of this approach is that no sequence has to be computed before the sort to find the initial interval; you just divide N by 2. However, it turns out not to be the best sequence. Although it still beats insertion sort for most data, it sometimes degrades the running time to O(N^2), which is no better than insertion sort.

A variation on this approach is to divide each interval by 2.2 instead of 2. For an array with N = 100, this gives the sequence 45, 20, 9, 4, 1. This is considerably better than dividing by 2, because it avoids some of the worst cases that lead to O(N^2) running time. Whatever the value of N, some extra code is needed to make sure the last value in the sequence is 1. This variation gives results comparable to Knuth's sequence used in the listing above.
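The halving sequence needs no start-up loop at all, as this short sketch shows. The class name ShellOriginalGaps and the method gapsFor are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
class ShellOriginalGaps {
    // Start at n / 2 and halve (integer division) until the interval reaches 0.
    static List<Integer> gapsFor(int n) {
        List<Integer> gaps = new ArrayList<>();
        for (int h = n / 2; h > 0; h /= 2)
            gaps.add(h);
        return gaps;
    }

    public static void main(String[] args) {
        System.out.println(gapsFor(100));   // [50, 25, 12, 6, 3, 1]
    }
}
```

Integer halving always passes through 1 before reaching 0, so for any N of at least 2 the final pass is an ordinary insertion sort without any extra code.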

Another possible diminishing sequence is generated by

    if (h < 5)
        h = 1;
    else
        h = (5 * h - 1) / 11;

It is generally considered important that the numbers in an interval sequence be relatively prime, that is, that they have no common divisor other than 1. This makes it more likely that each pass mixes together items that were sorted in separate groups on the previous pass. The inefficiency of Shell's original N/2 sequence is due to its failure to follow this rule.

It may be possible to devise an interval sequence that works as well as those described above. Whatever the sequence, it should be quick to compute so that it does not slow down the algorithm.
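Both alternatives can be sketched side by side. The class and method names below are assumptions for illustration, and so are two details the text does not spell out: the guard that forces the divide-by-2.2 sequence to end at 1 (an interval of 2 is followed by 1), and the choice to start the integer formula from h = N.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and the starting values are assumptions.
class AlternativeGaps {
    // Divide each interval by 2.2, truncating, and force the sequence to end at 1.
    static List<Integer> divideBy2Point2(int n) {
        List<Integer> gaps = new ArrayList<>();
        int h = Math.max(1, (int) (n / 2.2));
        while (true) {
            gaps.add(h);
            if (h == 1) break;
            // (int) (2 / 2.2) would be 0 and skip the final pass, so 2 goes to 1.
            h = (h == 2) ? 1 : (int) (h / 2.2);
        }
        return gaps;
    }

    // Integer-only formula from the text, applied repeatedly starting from h = n.
    static List<Integer> fiveElevenths(int n) {
        List<Integer> gaps = new ArrayList<>();
        int h = n;
        do {
            h = (h < 5) ? 1 : (5 * h - 1) / 11;
            gaps.add(h);
        } while (h > 1);
        return gaps;
    }

    public static void main(String[] args) {
        System.out.println(divideBy2Point2(100));   // [45, 20, 9, 4, 1]
        System.out.println(fiveElevenths(100));     // [45, 20, 9, 4, 1]
    }
}
```

For N = 100 the two methods happen to produce the same intervals, since 5/11 is 1/2.2; for other values of N they can differ slightly because of how each one rounds.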

Efficiency of Shell sort

So far, no one has been able to analyze the efficiency of Shell sort theoretically, except in some special cases. Estimates based on experiments put its running time anywhere from O(N^(3/2)) down to O(N^(7/6)).
