Java Data Structures and Algorithms - Advanced Sorting

Source: Internet
Author: User
Tags: sorts

Shell Sort Q: What is Shell sort?

A: Shell sort is named after the computer scientist Donald L. Shell, who discovered the algorithm in 1959.

A: Shell sort is based on insertion sort, but adds a new feature that dramatically improves insertion sort's execution efficiency.

Q: Before going on, what are the drawbacks of insertion sort?

A: Recall the simple sort from the "Insertion Sort" section. When insertion sort is partway through, the data items to the left of the marker position i are sorted and the items to the right of the marker are not. The algorithm takes the item at the marker position and stores it in a temporary variable, then, starting with the first element to the left of the removed item, shifts each sorted item one position to the right until the item held in the temporary variable can be inserted in order.
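As a reminder, here is a minimal sketch of that shifting step (the method name insertOnePass and the array name a are illustrative assumptions, not the original listing):

// one pass of insertion sort: the items to the left of index i are already sorted
static void insertOnePass(int[] a, int i) {
    int temp = a[i];                  // remove the marked item into a temporary variable
    int j = i;
    while (j > 0 && a[j - 1] >= temp) {
        a[j] = a[j - 1];              // shift a sorted item one position to the right
        --j;
    }
    a[j] = temp;                      // drop the removed item into the opening
}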

A: Now suppose a very small item sits near the right end. To move this small item to its correct position near the left end, all the intermediate items must be shifted one position to the right. This step can require nearly N copies per item; not every item must move N positions, but an item moves N/2 positions on average, which amounts to N * N/2 = N²/2 copies in total. That is why insertion sort runs in O(N²) time.

A: If smaller items could somehow be moved toward the left without shifting every intermediate item one at a time, the efficiency of the algorithm would improve greatly.

Q: What is the principle behind Shell sort?

A: Shell sort widens the interval between the elements being insertion-sorted and performs an insertion sort on those widely spaced elements, so data items can move in large jumps. Once the items at that interval are in order, the algorithm reduces the interval and sorts again, repeating the process with ever smaller intervals.

A: The spacing between the items being sorted is called the increment, and it is customarily denoted by the letter h.
In the first step of sorting an array of 10 data items with an increment of 4, the items at positions 0, 4, and 8 are sorted among themselves.

Once the items at positions 0, 4, and 8 are sorted, the algorithm shifts one position to the right and sorts the items at positions 1, 5, and 9. The process continues until every data item has been sorted with an increment of 4.

After this 4-increment pass of the Shell sort is complete, the array can be thought of as four sub-arrays: (0, 4, 8), (1, 5, 9), (2, 6), and (3, 7). Each of these sub-arrays is completely sorted; they are interleaved with one another but otherwise independent.

A: That is how an array of 10 data items is sorted with an increment of 4. For larger arrays the starting interval should also be larger, and the interval is then reduced until it becomes 1. The question, for an array of any size, is how to choose these intervals.

Q: How to choose the interval?

A: For example, an array of 1000 data items might first be sorted with an increment of 364, then 121, then 40, then 13, then 4, and finally Shell-sorted with an increment of 1. The sequence used to generate these intervals (1, 4, 13, 40, 121, 364, ...) is called the interval sequence; this particular sequence was proposed by Knuth.

A: The sequence is generated in reverse, starting from 1, by the recurrence h = 3 * h + 1 with an initial value of 1. Its first terms are 1, 4, 13, 40, 121, 364, 1093, and so on.

A: In the sorting code, the initial interval is computed with a short loop that uses this generating formula. h is initially set to 1, and the formula h = 3 * h + 1 then produces the sequence 1, 4, 13, 40, 121, 364, and so on. The loop stops when the interval grows larger than the size of the array.

A: For an array of 1000 data items, the seventh number of the sequence, 1093, is too large, so the sixth number, 364, is used as the largest increment to start the sorting process. Then, each time an outer pass of the sort completes, the interval is reduced with the inverse of the generating formula: h = (h - 1) / 3. This produces the reverse sequence 364, 121, 40, 13, 4, 1, and each of these numbers, starting from 364, is used as an increment. When the array has been sorted with an increment of 1, the algorithm ends.

A: Example: Shellsort.java
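Since the Shellsort.java listing itself is not reproduced here, the following is a minimal sketch of the idea, assuming the data lives in an int[] array (the names shellSort, a, and nElems are illustrative assumptions):

public static void shellSort(int[] a) {
    int nElems = a.length;
    int h = 1;
    // compute the starting interval from Knuth's sequence: 1, 4, 13, 40, 121, 364, ...
    while (h * 3 + 1 < nElems) {
        h = h * 3 + 1;
    }
    while (h > 0) {
        // h-sort the array: an insertion sort over items that are h positions apart
        for (int outer = h; outer < nElems; outer++) {
            int temp = a[outer];
            int inner = outer;
            while (inner > h - 1 && a[inner - h] >= temp) {
                a[inner] = a[inner - h];   // shift items in jumps of h
                inner -= h;
            }
            a[inner] = temp;
        }
        h = (h - 1) / 3;                   // reduce the interval: ..., 40, 13, 4, 1
    }
}

Each pass is an insertion sort among items that are h apart; when h finally reaches 1, the last pass is an ordinary insertion sort over an array that is already almost in order.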

Q: Are there any other interval sequences?

A: Choosing an interval sequence is something of a black art. Besides the sequence generated by h = 3 * h + 1, other interval sequences can be used. There is only one absolute requirement: the gradually decreasing sequence must end with 1, so that the last pass is an ordinary insertion sort.

A: In Shell's original formulation, the initial interval is N/2, and each pass simply halves the interval, so the interval sequence for an array of size 100 decreases as 50, 25, 12, 6, 3, 1. The advantage is that no preliminary loop is needed to find the initial interval before sorting begins; you only ever divide by 2. Unfortunately this has been shown not to be the best sequence: although it still beats insertion sort on most data, it sometimes degrades the running time to O(N²), no better than insertion sort.

A: The code for the Flamig interval variant is as follows:

if (h < 5)
    h = 1;              // when the interval gets small, drop straight to 1
else
    h = (int) (h / 2.2);

This method divides each interval by 2.2 rather than by 2. For an array with N = 100, it generates the sequence 45, 20, 9, 4, 1. This works considerably better than dividing by 2, because it avoids some worst-case arrangements that push the time complexity toward O(N²).
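For example, a short loop like the following (a hypothetical demonstration, not part of the original listing) prints that sequence for an array size of 100:

int nElems = 100;
int h = nElems;
while (h > 1) {
    h = (h < 5) ? 1 : (int) (h / 2.2);   // yields 45, 20, 9, 4, 1
    System.out.println(h);
}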

A: It is often considered important that the numbers in an interval sequence be mutually coprime, that is, that they have no common divisor other than 1. This constraint makes it more likely that each pass preserves the order established by the previous passes. The relative inefficiency of the N/2 sequence is attributed to its failure to follow this rule.

A: It may be possible to design an interval sequence as good as, or even better than, those described above. Whatever the sequence, it should be quick to compute so that it does not slow down the algorithm.

Q: How efficient is Shell sort?

A: So far no one has been able to analyze Shell sort's efficiency theoretically, except in a few special cases. Various experiment-based estimates place its running time somewhere between O(N^(3/2)) and O(N^(7/6)).

A: Compared with the slower insertion sort and the faster quicksort, Shell sort falls in between, with estimated big-O values in the range just mentioned. Note that N^(x/y) means the y-th root of N raised to the x-th power; for N = 100, N^(3/2) is the square root of 100 cubed, which is 1000. Similarly, (logN)² means the square of the logarithm of N, usually taken base 2.

Partition Q: What is a partitioning algorithm?

A: Partitioning is the fundamental operation underlying quicksort, which is discussed next, so it is explained here as a separate topic.

A: Partitioning data means dividing it into two groups, so that all items whose key is larger than a specified value are in one group, and all items whose key is smaller than that value are in the other group.

A: The partitioning algorithm works like this: when leftPointer encounters a data item smaller than the pivot, it keeps moving right, because that item is already on the correct (left) side of the array; it stops when it encounters an item larger than the pivot. rightPointer behaves the same way in the other direction. Two inner while loops, the first for leftPointer and the second for rightPointer, control this scanning; a pointer stops moving when its loop exits. Here is simplified code for the scan that locates out-of-place items:

while (leftPointer < right && mlArray[++leftPointer] < pivot)
    ;   // nop
while (rightPointer > left && mlArray[--rightPointer] > pivot)
    ;   // nop
swap(leftPointer, rightPointer);

When both inner loops have exited, leftPointer and rightPointer each point to an item that is on the wrong side of the array, so the two items are swapped. After the swap, the pointers keep moving. When the two pointers finally meet, the partitioning process is finished and the outer while loop exits.

Example: Arraypartition.java

A: The running time of the partitioning algorithm is O(N).
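Since the Arraypartition.java listing is not reproduced here, the following is a minimal sketch of a complete partitioning method in the same spirit (the class name PartitionDemo, the field mlArray, and the method names are illustrative assumptions):

class PartitionDemo {
    private final int[] mlArray;

    PartitionDemo(int[] data) {
        mlArray = data;
    }

    // partitions mlArray[left..right] around the value pivot and
    // returns the index where the right-hand group begins
    int partitionIt(int left, int right, int pivot) {
        int leftPointer = left - 1;       // one position before the first item
        int rightPointer = right + 1;     // one position after the last item
        while (true) {
            while (leftPointer < right && mlArray[++leftPointer] < pivot)
                ;   // nop: skip items already smaller than the pivot
            while (rightPointer > left && mlArray[--rightPointer] > pivot)
                ;   // nop: skip items already larger than the pivot
            if (leftPointer >= rightPointer)
                break;                    // the pointers have met
            swap(leftPointer, rightPointer);   // both items are on the wrong side
        }
        return leftPointer;
    }

    private void swap(int i, int j) {
        int temp = mlArray[i];
        mlArray[i] = mlArray[j];
        mlArray[j] = temp;
    }
}

A hypothetical call such as new PartitionDemo(data).partitionIt(0, data.length - 1, 42) rearranges data so that every item smaller than 42 lies before the returned index and every item larger than 42 lies at or after it.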

Quick Sort Q: What is quick sort?

A: Quicksort is without doubt the most popular sorting algorithm, and for good reason: in most cases it is the fastest, running in O(N*logN) time. Quicksort was discovered by C. A. R. Hoare in 1962.

A: With the partitioning algorithm just introduced, quicksort is easy to understand: the algorithm essentially partitions an array into two sub-arrays and then calls itself recursively to quicksort each sub-array.

A: The code for the basic recursive quicksort algorithm is simple. Here is an example:

public void recQuickSort(int left, int right) {
    if (right - left <= 0) {         // if size is 1, it's already sorted
        return;
    } else {                         // size is 2 or larger
        // partition the range
        int partitionIndex = partitioning(left, right);
        // sort the left side
        recQuickSort(left, partitionIndex - 1);
        // sort the right side
        recQuickSort(partitionIndex + 1, right);
    }
}

There are three basic steps:
1) partition the array or sub-array into a left part and a right part;
2) call itself to sort the left part;
3) call itself to sort the right part.
After one partition, every data item in the left sub-array is smaller than every item in the right sub-array.
As long as the left and right sub-arrays are each sorted, the entire array is in order.

A: How are the sub-arrays sorted? By recursion. The method first checks whether the array contains only one data item; if so, the array is already sorted and the method returns immediately. This is the base case of the recursion.
If the array contains two or more data items, the algorithm calls the partitioning() method described above to partition it. The method returns the index of the partition boundary; the pivot marks the dividing line between the two sub-arrays.

After the array is partitioned, recQuickSort() calls itself recursively: once for the left part of the array (sorting the items from the leftmost position up to partitionIndex - 1) and once for the right part (sorting the items from partitionIndex + 1 up to the rightmost position). Note that neither recursive call includes the item at index partitionIndex. Why is it left out? Doesn't the item at partitionIndex need to be sorted?

Q: What pivot should be chosen for the partition?

A: So how does the partitioning() method choose a pivot? Here are the relevant ideas:
1) the pivot should be the key value of some specific data item; that item is called the pivot;
2) any data item can be chosen as the pivot; for simplicity, assume the rightmost item of the sub-array being partitioned is always chosen;
3) after partitioning is complete, if the pivot is placed at the boundary between the left and right sub-arrays, it lands in its final sorted position.

Consider the case where the item with key 36 is the pivot, pictured as if the array were pulled apart around it. An array cannot really be separated like that, so the picture is only imaginary. How, then, is the pivot moved to its correct position?

All the items in the right sub-array could be shifted one position to the right to free up the pivot's location, but that is inefficient and unnecessary. Remember that although all items in the right sub-array are larger than the pivot, they are not yet sorted, so they can be rearranged within the right sub-array without doing any harm. Therefore, to place the pivot into its correct position, it is enough to swap the pivot with the leftmost item of the right sub-array (currently 63).
This swap puts the pivot in its correct place, between the left and right sub-arrays, and 63 jumps to the far right end.

Once the pivot has been swapped into the boundary position, it is in its final place. All subsequent operations take place on its left or on its right, and the pivot itself never moves again.

Example: Quicksort.java
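The Quicksort.java listing is not reproduced here; the following is a minimal sketch of how partitioning() might implement the rightmost-item pivot described above, assuming it lives in the same class as the recQuickSort() code shown earlier (the field a and the helper swap are illustrative assumptions):

// assumed surrounding class state, matching the recQuickSort() code above:
//   private int[] a;                        // the array being sorted
//   private void swap(int i, int j) { ... } // exchanges a[i] and a[j]

private int partitioning(int left, int right) {
    int pivot = a[right];              // the rightmost item is the pivot
    int leftPointer = left - 1;
    int rightPointer = right;          // the pivot itself is left out of the scan
    while (true) {
        while (a[++leftPointer] < pivot)
            ;                          // nop: the pivot at a[right] stops this scan
        while (rightPointer > left && a[--rightPointer] > pivot)
            ;                          // nop
        if (leftPointer >= rightPointer)
            break;                     // the pointers have crossed
        swap(leftPointer, rightPointer);
    }
    swap(leftPointer, right);          // move the pivot between the two sub-arrays
    return leftPointer;                // recQuickSort() excludes this index
}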

Q: Why can performance drop to O(N²)?

A: If the data is in reverse order and you sort it with the program above, you will find that the algorithm runs quite slowly.

A: The problem is the choice of pivot. Ideally, the median value of the data should be chosen as the pivot, so that half the items are larger than the pivot and half are smaller. Two sub-arrays of equal size are optimal for quicksort. If the algorithm instead has to sort two sub-arrays of very unequal size, its efficiency suffers, because the larger sub-array must be partitioned many more times.

A: The worst possible split of an array of N data items is a sub-array containing a single item and a sub-array containing the remaining N-1 items.

A: In that case the benefit of partitioning is lost, and the algorithm's efficiency degrades to O(N²). Besides being slow, there is another potential problem: as the number of partitions grows, so does the number of recursive calls, and each call takes up space on the call stack. If there are too many calls, the stack can overflow and crash the program. So can the way the pivot is chosen be improved?

Q: What is the "three data entry" division?

A: The method should be simple, but it must avoid choosing the largest or smallest item as the pivot. You could examine every item and compute which one is the true median, which would be the ideal pivot, but that takes longer than the sort itself and is therefore impractical.

A: The compromise is to find the median of the first, last, and middle elements of the array and use it as the pivot. This scheme is called "median-of-three" partitioning.

Finding the median of three items is obviously much faster than finding the median of all the items, and it also effectively eliminates the chance of choosing the largest or smallest item as the pivot when the data is already sorted or in reverse order.

A: Of course, there are some contrived data arrangements for which median-of-three partitioning performs poorly, but it is usually a fast and effective way to choose a pivot.
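A minimal sketch of how the median-of-three pivot might be chosen, assuming the same class context as the earlier quicksort sketches (the method name medianOf3 and the field a are illustrative assumptions):

// sorts a[left], a[center], a[right] in place and returns the median value,
// which is then used as the pivot for the partition
private int medianOf3(int left, int right) {
    int center = (left + right) / 2;
    if (a[left] > a[center])  swap(left, center);
    if (a[left] > a[right])   swap(left, right);
    if (a[center] > a[right]) swap(center, right);
    // the median is now at the center; a common trick is to park it
    // just before the right end so the partition can treat it as the pivot
    swap(center, right - 1);
    return a[right - 1];
}

After these swaps a[left] is no larger than the pivot and a[right] is no smaller, so the subsequent partition scan can be limited to the items between them.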
