A Simple Version of the TimSort Sorting Algorithm
Discussion is welcome. Please correct me if you find any errors.
If you reprint this article, please cite the source: http://www.cnblogs.com/nullzx/
1. Principle and implementation of a simple version of the TimSort sorting algorithm
TimSort is the default sorting algorithm in Python and the default for object arrays in Java. At its core, TimSort is a merge sort with a large number of optimizations layered on top. In practice, the data we need to sort is usually not completely random; parts of it are already in order or in reverse order. TimSort takes full advantage of these already sorted stretches when it merges. Below is a simple version of TimSort, which applies the following optimizations:
1.1 Use the existing ordered fragments
First, specify a minimum merge length. Then scan the array for an already ordered fragment (a run); a strictly descending fragment is reversed in place so that it becomes ascending. If the run is shorter than the minimum merge length, extend it with insertion sort until it reaches that length. This avoids merging very short fragments, which is relatively inefficient. Finally, push the run's starting index and its length onto the stack.
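The step above can be sketched in isolation. This is only an illustration on an `int[]`, not the article's generic class: the class name `RunFinderSketch`, the method `findRuns`, and the small `MIN_RUN` value of 4 are all assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunFinderSketch {
    // Illustrative minimum run length (the article's code uses 16).
    static final int MIN_RUN = 4;

    /** Splits a into runs, sorting each run in place; returns {base, length} pairs. */
    static List<int[]> findRuns(int[] a) {
        List<int[]> runs = new ArrayList<>();
        int base = 0;
        while (base < a.length) {
            int end = base + 1; // exclusive end of the natural run
            if (end < a.length && a[end] < a[base]) {
                // Strictly descending run: find its end, then reverse it.
                while (end < a.length && a[end] < a[end - 1]) {
                    end++;
                }
                reverse(a, base, end - 1);
            } else {
                // Non-descending run.
                while (end < a.length && a[end] >= a[end - 1]) {
                    end++;
                }
            }
            // Extend a short run to MIN_RUN elements (or to the end of the array)
            // with plain insertion sort.
            int target = Math.min(a.length, Math.max(end, base + MIN_RUN));
            for (int i = end; i < target; i++) {
                int key = a[i];
                int j = i - 1;
                while (j >= base && a[j] > key) {
                    a[j + 1] = a[j];
                    j--;
                }
                a[j + 1] = key;
            }
            runs.add(new int[] {base, target - base});
            base = target;
        }
        return runs;
    }

    static void reverse(int[] a, int lo, int hi) {
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 6, 1, 3, 2, 9, 8, 7, 4};
        for (int[] run : findRuns(data)) {
            System.out.println("base=" + run[0] + " len=" + run[1]);
        }
        System.out.println(Arrays.toString(data));
    }
}
```

For the sample input, the natural run `5, 6` is extended to four elements, the same happens to `2, 9`, and the final `4` forms a run of length 1 because the array ends there.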
1.2 Avoid merging a long ordered fragment with a short one, because that is relatively inefficient:
(1) If the stack holds at least three runs, let X, Y, and Z denote the lengths of the top three runs, from the top of the stack downward. When X + Y >= Z, merge:
(1.1) If X is longer than Z (this is the test the code uses), pop X, Y, and Z, merge Y and Z, push the merged result, and finally push X back.
(1.2) Otherwise, pop X and Y, merge them, and push the result. Note: the code does not actually pop anything. It updates the stack arrays in place, which has the same effect and is more efficient.
(2) If X + Y >= Z does not hold, or the stack holds only two runs, let X and Y denote the lengths of the top two runs, from the top downward. If X >= Y, merge them and push the merged run.
These checks are repeated after every merge until neither condition holds.
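To see how these rules play out, here is a minimal sketch that tracks run lengths only and performs no real merging. The class `MergePolicySketch` and the methods `needMerge`, `push`, and `simulate` are hypothetical names for this illustration; index 0 of the list is the bottom of the stack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergePolicySketch {
    /** Returns the index of the lower of the two runs to merge, or -1 if none. */
    static int needMerge(List<Integer> lens) {
        int n = lens.size();
        if (n < 2) {
            return -1;
        }
        int x = lens.get(n - 1); // top of the stack
        int y = lens.get(n - 2);
        if (n >= 3) {
            int z = lens.get(n - 3);
            if (x + y >= z) {
                // Rule (1.1): X longer than Z, so merge Y and Z.
                // Rule (1.2): otherwise merge X and Y.
                return x > z ? n - 3 : n - 2;
            }
        }
        // Rule (2): merge the top two runs when X >= Y.
        return x >= y ? n - 2 : -1;
    }

    /** Pushes a run length and applies the rules until no merge is needed. */
    static void push(List<Integer> lens, int len) {
        lens.add(len);
        int i;
        while ((i = needMerge(lens)) >= 0) {
            lens.set(i, lens.get(i) + lens.get(i + 1));
            lens.remove(i + 1); // int argument: removes by index
        }
    }

    /** Feeds the run lengths in order and returns the final stack contents. */
    static List<Integer> simulate(int... runLengths) {
        List<Integer> lens = new ArrayList<>();
        for (int len : runLengths) {
            push(lens, len);
        }
        return lens;
    }

    public static void main(String[] args) {
        System.out.println(simulate(100, 40, 20, 30)); // rule (1.2), then rule (2)
        System.out.println(simulate(100, 20, 10, 45)); // rule (1.1), then rule (2)
        System.out.println(needMerge(Arrays.asList(100, 40, 20)));
    }
}
```

With the run lengths 100, 40, 20, 30, pushing 30 first merges 20 and 30 (rule 1.2) and then 40 and 50 (rule 2), leaving 100 and 90. With 100, 20, 10, 45, pushing 45 merges 20 and 10 first (rule 1.1), because the new run is longer than the run two places below it.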
1.3 Use the so-called galloping (gallop) mode to reduce the amount of data involved in a merge
Suppose the two sorted fragments to be merged are X and Y, with X in front. If the first m elements of X are no greater than the first element of Y, those m elements do not need to take part in the merge, because they are already in their final positions. Similarly, if the last n elements of Y are no smaller than the last element of X, those n elements of Y do not need to take part either. This shortens the range that has to be merged, so less data is copied back and forth between the array being sorted and the auxiliary array, which makes the merge more efficient. The auxiliary array itself could also be made smaller, although the simple version does not do this.
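As a small worked example of the trimming described above, the following sketch computes m and n for two adjacent runs in an `int[]`. The names `GallopTrimSketch`, `leadingInPlace`, and `trailingInPlace` are made up for this illustration, and the scan is a plain linear one, like the simple version.

```java
final class GallopTrimSketch {
    /** Counts the leading elements of run 1 that are <= the first element of run 2. */
    static int leadingInPlace(int[] a, int base1, int len1, int base2) {
        int m = 0;
        while (m < len1 && a[base1 + m] <= a[base2]) {
            m++;
        }
        return m;
    }

    /** Counts the trailing elements of run 2 that are >= the last element of run 1. */
    static int trailingInPlace(int[] a, int base1, int len1, int base2, int len2) {
        int last = a[base1 + len1 - 1];
        int n = 0;
        while (n < len2 && a[base2 + len2 - 1 - n] >= last) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Run X occupies indices 0..4, run Y occupies indices 5..8.
        int[] a = {1, 2, 3, 7, 8, 4, 5, 9, 10};
        int m = leadingInPlace(a, 0, 5, 5);
        int n = trailingInPlace(a, 0, 5, 5, 4);
        System.out.println("m=" + m + " n=" + n);
        System.out.println("merge range: " + m + " .. " + (a.length - 1 - n));
    }
}
```

Here 1, 2, 3 are already in place at the front and 9, 10 at the back, so only the four elements 7, 8, 4, 5 (indices 3 to 6) have to be merged.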
2. Java source code
package datastruct;

import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.Random;

public class SimpleTimSort<T extends Comparable<? super T>> {
    // Minimum merge length
    private static final int MIN_MERGE = 16;
    // Array to be sorted
    private final T[] a;
    // Auxiliary array
    private T[] aux;
    // Two parallel arrays that together form the run stack
    private int[] runsBase = new int[40];
    private int[] runsLen = new int[40];
    // Stack pointer: index of the next free slot
    private int stackTop = 0;

    @SuppressWarnings("unchecked")
    public SimpleTimSort(T[] a) {
        this.a = a;
        // Use the array's component type so that empty arrays also work
        aux = (T[]) Array.newInstance(a.getClass().getComponentType(), a.length);
    }

    // a[from..to] is already ordered; insert the n elements that follow a[to] into it
    private void insertSort(T[] a, int from, int to, int n) {
        int i = to + 1;
        while (n > 0) {
            T tmp = a[i];
            int j;
            for (j = i - 1; j >= from && tmp.compareTo(a[j]) < 0; j--) {
                a[j + 1] = a[j];
            }
            a[++j] = tmp;
            i++;
            n--;
        }
    }

    // Returns the length of the longest ordered fragment starting at a[from]
    private int maxAscendingLen(T[] a, int from) {
        int n = 1;
        int i = from;
        if (i >= a.length) {
            // Out of range
            return 0;
        }
        if (i == a.length - 1) {
            // Only one element left
            return 1;
        }
        // At least two elements
        if (a[i].compareTo(a[i + 1]) <= 0) {
            // Ascending (non-descending) part
            while (i + 1 <= a.length - 1 && a[i].compareTo(a[i + 1]) <= 0) {
                i++;
                n++;
            }
            return n;
        } else {
            // Descending part. It must be strictly descending (no >=),
            // otherwise the reversal below would break stability.
            while (i + 1 <= a.length - 1 && a[i].compareTo(a[i + 1]) > 0) {
                i++;
                n++;
            }
            // Reverse the fragment in place
            int j = from;
            while (j < i) {
                T tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                j++;
                i--;
            }
            return n;
        }
    }

    // Push a run onto the stack
    private void pushRun(int base, int len) {
        runsBase[stackTop] = base;
        runsLen[stackTop] = len;
        stackTop++;
    }

    // Returns the stack index of the lower of the two runs to merge,
    // or -1 if no runs need to be merged
    public int needMerge() {
        if (stackTop > 1) {
            // At least two runs
            int x = stackTop - 2;
            // x > 0 means there are at least three runs
            if (x > 0 && runsLen[x - 1] <= runsLen[x] + runsLen[x + 1]) {
                if (runsLen[x - 1] < runsLen[x + 1]) {
                    // The top run is longer than run[x - 1]:
                    // merge run[x - 1] and run[x] first
                    return x - 1;
                } else {
                    return x;
                }
            } else if (runsLen[x] <= runsLen[x + 1]) {
                return x;
            } else {
                return -1;
            }
        }
        return -1;
    }

    // Returns the position in the first run where the first element (key)
    // of the second run belongs
    private int gallopLeft(T[] a, int base, int len, T key) {
        int i = base;
        while (i <= base + len - 1) {
            if (key.compareTo(a[i]) >= 0) {
                i++;
            } else {
                break;
            }
        }
        return i;
    }

    // Returns the position in the second run where the last element (key)
    // of the first run belongs
    private int gallopRight(T[] a, int base, int len, T key) {
        int i = base + len - 1;
        while (i >= base) {
            if (key.compareTo(a[i]) <= 0) {
                i--;
            } else {
                break;
            }
        }
        return i;
    }

    public void mergeAt(int x) {
        int base1 = runsBase[x];
        int len1 = runsLen[x];
        int base2 = runsBase[x + 1];
        int len2 = runsLen[x + 1];

        // Merge run[x] and run[x + 1]. The base stays the same; only the length changes.
        runsLen[x] = len1 + len2;
        if (stackTop == x + 3) {
            // Move the top run down one slot. This replaces popping it
            // before the merge and pushing it again afterwards.
            runsBase[x + 1] = runsBase[x + 2];
            runsLen[x + 1] = runsLen[x + 2];
        }
        stackTop--;

        // Gallop mode: shrink the range that takes part in the merge
        int from = gallopLeft(a, base1, len1, a[base2]);
        if (from == base1 + len1) {
            // The two runs are already in order
            return;
        }
        int to = gallopRight(a, base2, len2, a[base1 + len1 - 1]);

        // Copy only the range that actually needs merging into the auxiliary array
        System.arraycopy(a, from, aux, from, to - from + 1);
        int i = from;
        int iend = base1 + len1 - 1;
        int j = base2;
        int jend = to;
        int k = from;
        int kend = to;
        while (k <= kend) {
            if (i > iend) {
                a[k] = aux[j++];
            } else if (j > jend) {
                a[k] = aux[i++];
            } else if (aux[i].compareTo(aux[j]) <= 0) {
                // The equals sign keeps the sort stable
                a[k] = aux[i++];
            } else {
                a[k] = aux[j++];
            }
            k++;
        }
    }

    // Merge all runs that remain on the stack
    private void forceMerge() {
        while (stackTop > 1) {
            mergeAt(stackTop - 2);
        }
    }

    // TimSort entry point
    public void timSort() {
        // n is the number of elements not yet split into runs
        int n = a.length;
        if (n < 2) {
            return;
        }
        // Arrays shorter than MIN_MERGE are sorted by insertion sort alone
        if (n < MIN_MERGE) {
            insertSort(a, 0, 0, a.length - 1);
            return;
        }
        int base = 0;
        while (n > 0) {
            int len = maxAscendingLen(a, base);
            if (len < MIN_MERGE) {
                // Number of elements needed to bring the run up to MIN_MERGE
                // (or up to the end of the array)
                int missing = n > MIN_MERGE ? MIN_MERGE - len : n - len;
                insertSort(a, base, base + len - 1, missing);
                len = len + missing;
            }
            pushRun(base, len);
            n = n - len;
            base = base + len;
            int x;
            while ((x = needMerge()) >= 0) {
                mergeAt(x);
            }
        }
        forceMerge();
    }

    public static void main(String[] args) {
        // Randomly generated test cases. The loop keeps running until a
        // mismatch is found, so stop it manually.
        Random rnd = new Random(System.currentTimeMillis());
        boolean flag = true;
        while (flag) {
            // Start with a fully ordered array
            Integer[] arr1 = new Integer[1000];
            for (int i = 0; i < arr1.length; i++) {
                arr1[i] = i;
            }
            // Swap some random pairs
            for (int i = 0; i < (int) (0.1 * arr1.length); i++) {
                int x, y, tmp;
                x = rnd.nextInt(arr1.length);
                y = rnd.nextInt(arr1.length);
                tmp = arr1[x];
                arr1[x] = arr1[y];
                arr1[y] = tmp;
            }
            // Reverse some short ranges
            for (int i = 0; i < (int) (0.05 * arr1.length); i++) {
                int x = rnd.nextInt(arr1.length);
                int y = rnd.nextInt((int) (arr1.length * 0.01) + x);
                if (y >= arr1.length) {
                    continue;
                }
                while (x < y) {
                    int tmp;
                    tmp = arr1[x];
                    arr1[x] = arr1[y];
                    arr1[y] = tmp;
                    x++;
                    y--;
                }
            }
            Integer[] arr2 = arr1.clone();
            Integer[] arr3 = arr1.clone();
            Arrays.sort(arr2);
            SimpleTimSort<Integer> sts = new SimpleTimSort<Integer>(arr1);
            sts.timSort();
            // Compare the SimpleTimSort result with the library sort.
            // If nothing is printed, the result is correct.
            if (!Arrays.deepEquals(arr1, arr2)) {
                for (int i = 0; i < arr1.length; i++) {
                    if (!arr1[i].equals(arr2[i])) {
                        System.out.printf("%d: arr1 %d arr2 %d\n", i, arr1[i], arr2[i]);
                    }
                }
                System.out.println(Arrays.deepToString(arr3));
                flag = false;
            }
        }
    }
}
3. Notes for the TimSort Algorithm
TimSort only ever merges two adjacent fragments, which is what keeps the algorithm stable.
The minimum merge length and the stack length are related. A smaller minimum merge length allows more runs to be pending on the stack at once, so if the minimum merge length is decreased, the stack length should be increased; otherwise the stack may overflow (the stack in the code is backed by arrays of length 40).
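To get a feel for how the two quantities relate, the following rough estimate counts how many runs can sit on the stack at once. It assumes that the merge rules keep every run longer than the run above it and longer than the sum of the two runs above it, and that no run is shorter than the minimum length; the class `StackDepthSketch` and the method `estimatedDepth` are hypothetical names. Treat it as a back-of-the-envelope sketch, not an exact bound for the code above.

```java
final class StackDepthSketch {
    /**
     * Estimates how many runs fit on the stack for an array of the given length,
     * assuming each run is longer than the one above it and longer than the sum
     * of the two above it, with the top run exactly minRun long.
     */
    static int estimatedDepth(long arrayLength, long minRun) {
        long top = minRun;       // shortest possible run, at the top of the stack
        long below = minRun + 1; // must be strictly longer than the run above it
        long total = top;        // elements covered by the runs counted so far
        int depth = 1;
        while (total + below <= arrayLength) {
            total += below;
            depth++;
            long next = top + below + 1; // strictly longer than the two runs above
            top = below;
            below = next;
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(estimatedDepth(1000, 16));
        System.out.println(estimatedDepth(1000, 4));
        System.out.println(estimatedDepth(Integer.MAX_VALUE, 16));
    }
}
```

Under these assumptions, an array of 1000 elements needs about 8 stack slots with a minimum length of 16 and about 10 with a minimum length of 4: the run lengths grow roughly like the Fibonacci sequence, so a smaller starting value fits more runs into the same array.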
4. Full version of the TimSort algorithm
The full version of TimSort adds many more optimizations on top of the simple version above. For example, when an ordered fragment is shorter than the minimum merge length, a binary search can be used to find each insertion position while the fragment is being extended. As another example, once galloping mode has found where the first element of the second run belongs in the first run, the merge can be carried out with a smaller auxiliary space. If you are interested, read the TimSort source code in Java.
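The binary-search idea mentioned above can be sketched as follows. This is a minimal illustration on an `int[]`, not the JDK's actual code; the class `BinaryInsertionSketch` and the method `binaryInsert` are names chosen for this example.

```java
import java.util.Arrays;

final class BinaryInsertionSketch {
    /**
     * a[lo .. start) is already sorted. Inserts the elements a[start .. hi)
     * into it one by one, using binary search to find each insertion point.
     */
    static void binaryInsert(int[] a, int lo, int start, int hi) {
        for (int i = start; i < hi; i++) {
            int key = a[i];
            int left = lo;
            int right = i;
            // Find the first position whose element is greater than key,
            // so that equal elements keep their original order.
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (key < a[mid]) {
                    right = mid;
                } else {
                    left = mid + 1;
                }
            }
            // Shift a[left .. i) one slot to the right and drop key into place.
            System.arraycopy(a, left, a, left + 1, i - left);
            a[left] = key;
        }
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 9, 4, 1, 7}; // the first three elements are already sorted
        binaryInsert(a, 0, 3, a.length);
        System.out.println(Arrays.toString(a));
    }
}
```

Compared with the plain insertion sort in the simple version, the number of element moves is the same, but the number of comparisons per inserted element drops from linear to logarithmic in the run length, which matters when comparisons are expensive.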