A Look at TimSort: How It Optimizes Merge Sort, and Its Java Implementation

Source: Internet
Author: User

Brief introduction
Standard merge sort does O(n log n) work on every input, even when the input is already sorted or sorted in reverse order. TimSort is a merge sort optimized to exploit such existing order: its best case is O(n) (for example, already sorted or strictly reversed input), and its average and worst cases are O(n log n). TimSort is also a stable sort. Like merge sort, it first splits the array into partitions and then merges them, but it adds optimizations for reversed input and for large arrays.
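TimSort is what the JDK uses for object sorts: the Javadoc of java.util.Arrays.sort(Object[]) describes the implementation as a stable, adaptive, iterative merge sort adapted from Tim Peters's list sort for Python. The small sketch below (the Item class is a hypothetical type made up for the demo) shows what stability means in practice: elements that compare equal keep their original relative order.

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    // Hypothetical element type for the demo: a label plus an integer sort key
    static final class Item {
        final String name;
        final int key;

        Item(String name, int key) {
            this.name = name;
            this.key = key;
        }

        @Override
        public String toString() {
            return name + ":" + key;
        }
    }

    // Sorts by key only; Arrays.sort on object arrays is guaranteed stable,
    // so items with equal keys stay in their input order
    static void sortByKey(Item[] items) {
        Arrays.sort(items, Comparator.comparingInt((Item it) -> it.key));
    }

    public static void main(String[] args) {
        Item[] items = {
            new Item("a", 2), new Item("b", 1), new Item("c", 2), new Item("d", 1)
        };
        sortByKey(items);
        System.out.println(Arrays.toString(items)); // [b:1, d:1, a:2, c:2]
    }
}
```

"a" still precedes "c" and "b" still precedes "d", because their keys tie.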

Optimization ideas for merge sort
Merge sort can be optimized in the following ways:

As with quicksort, switch to insertion sort (or selection sort) for small subarrays, which avoids the overhead of recursive calls on tiny inputs.
Before calling merge(), check whether a[mid] <= a[mid+1]. If it is, no merge is needed, because the range is already sorted. The reason is simple: both halves are already sorted, so a[mid] is the largest value of the first half and a[mid+1] is the smallest value of the second half. When a[mid] <= a[mid+1], the range is sorted as a whole.
To save the time spent copying elements into the auxiliary array, swap the roles of the original array and the auxiliary array at each level of the recursion.
The merge loop in merge() has to check whether i or j is out of bounds, i.e. whether one half has been exhausted. Those checks can be dropped: copy the second half of a[] into aux[] in descending order (back to front), then merge from both ends toward the middle. For the halves {1,2,3} and {2,3,5}, the first half is copied as usual and the second is copied back to front, so aux[] ends up as {1,2,3,5,3,2}. The drawback is that this turns merge sort into an unstable sort. The code is as follows:

void merge(int[] a, int lo, int mid, int hi, int[] aux) {
  // Copy the first half as is
  for (int k = lo; k <= mid; k++) {
    aux[k] = a[k];
  }
  // Copy the second half back to front, so aux[lo..hi] rises and then falls
  for (int k = mid + 1; k <= hi; k++) {
    aux[k] = a[hi - k + mid + 1];
  }
  int i = lo, j = hi;   // merge from both ends toward the middle
  for (int k = lo; k <= hi; k++) {
    if (aux[i] <= aux[j]) a[k] = aux[i++];
    else a[k] = aux[j--];
  }
}
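The first three optimizations in the list combine naturally in one top-down merge sort. The sketch below follows the style of Sedgewick's MergeX rather than anything in this article; the class name MergeSortOptimized and the value CUTOFF = 7 are assumptions. It uses insertion sort on small ranges, skips the merge when src[mid] <= src[mid + 1] (it still copies the range into dst, because the result must end up there, but it saves every comparison), and alternates the roles of the two arrays at each recursion level so that nothing is copied into the auxiliary array before a merge. Its merge is the ordinary bounds-checked one, so unlike the two-ended merge above it stays stable.

```java
import java.util.Arrays;

public class MergeSortOptimized {
    // Assumed cutoff: ranges with at most this many elements use insertion sort
    private static final int CUTOFF = 7;

    public static void sort(int[] a) {
        int[] aux = a.clone();          // both arrays start with the same contents
        sort(aux, a, 0, a.length - 1);  // the sorted result ends up in a
    }

    // Sorts the elements of range [lo, hi] into dst, using src as scratch space.
    // On entry src and dst hold the same elements in this range; the two arrays
    // swap roles at every level, so no copy-to-aux step is needed.
    private static void sort(int[] src, int[] dst, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) {    // small range: insertion sort
            insertionSort(dst, lo, hi);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(dst, src, lo, mid);        // left half sorted into src
        sort(dst, src, mid + 1, hi);    // right half sorted into src
        if (src[mid] <= src[mid + 1]) { // already in order: skip the merge
            System.arraycopy(src, lo, dst, lo, hi - lo + 1);
            return;
        }
        merge(src, dst, lo, mid, hi);
    }

    // Standard stable merge of src[lo..mid] and src[mid+1..hi] into dst
    private static void merge(int[] src, int[] dst, int lo, int mid, int hi) {
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) dst[k] = src[j++];
            else if (j > hi) dst[k] = src[i++];
            else if (src[j] < src[i]) dst[k] = src[j++];
            else dst[k] = src[i++];
        }
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > x) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 8, 0, 2, 9, 7, 3, 6, 1};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```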

Steps for Timsort

Partition

Partitioning scans the array once and treats each maximal ascending (non-descending) sequence, or each strictly descending sequence, as one partition, called a run (this assumes an ascending sort). The descending case must be strict so that reversing a run never swaps equal elements, which keeps the sort stable. Each descending run is then reversed in place. For example,
the array 1,2,3,6,4,5,8,6,4 is partitioned into
[1,2,3,6],[4,5,8],[6,4]
Then the descending run is reversed:
[1,2,3,6],[4,5,8],[4,6]
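A sketch of run detection for an int[], modeled on the JDK's countRunAndMakeAscending (the range [lo, hi) is half-open, as in the JDK). The class name and the splitRuns helper are hypothetical, added only to print the runs of the example above. An ascending run may contain equal neighbors, but a descending run stops at the first tie, because reversing 4, 4 would swap two equal elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunPartition {
    // Returns the length of the run starting at lo in a[lo, hi) (requires lo < hi);
    // a strictly descending run is reversed in place so every run ends up ascending
    static int countRunAndMakeAscending(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi) return 1;
        if (a[runHi++] < a[lo]) {                // strictly descending
            while (runHi < hi && a[runHi] < a[runHi - 1]) runHi++;
            reverseRange(a, lo, runHi);
        } else {                                 // non-descending, ties allowed
            while (runHi < hi && a[runHi] >= a[runHi - 1]) runHi++;
        }
        return runHi - lo;
    }

    // Reverses a[lo, hi)
    static void reverseRange(int[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    // Hypothetical demo helper: splits the whole array into consecutive runs
    static List<int[]> splitRuns(int[] a) {
        List<int[]> runs = new ArrayList<>();
        int lo = 0;
        while (lo < a.length) {
            int len = countRunAndMakeAscending(a, lo, a.length);
            runs.add(Arrays.copyOfRange(a, lo, lo + len));
            lo += len;
        }
        return runs;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 6, 4, 5, 8, 6, 4};
        for (int[] run : splitRuns(a)) System.out.print(Arrays.toString(run));
        System.out.println(); // [1, 2, 3, 6][4, 5, 8][4, 6]
    }
}
```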

Merge

Consider an extreme example where the partitions have lengths 10000, 10, 1000, 10, 10. We would rather merge the two 10s into 20 first and then merge 20 with 1000 into 1020. If we merge from left to right instead, every step merges the 10000-element run with a small run, which costs far too much. So we need a strategy that optimizes the order of the merges.
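A quick calculation makes the difference concrete. Assume, as a simplification rather than TimSort's exact cost, that merging two adjacent runs of lengths x and y moves x + y elements. Merging strictly left to right then costs 10010 + 11010 + 11020 + 11030 = 43070 moves, while the order suggested above, which for this input amounts to merging from right to left, costs 20 + 1020 + 1030 + 11030 = 13100. The hypothetical class below just reproduces that arithmetic; both orders merge only adjacent runs, which stability requires.

```java
public class MergeOrderCost {
    // Cost model (an assumption): merging adjacent runs of lengths x and y moves x + y elements

    // ((((r0 + r1) + r2) + r3) + r4): always merge the accumulated result with the next run
    static long leftToRight(int[] runs) {
        long acc = runs[0], total = 0;
        for (int i = 1; i < runs.length; i++) {
            acc += runs[i];
            total += acc;
        }
        return total;
    }

    // (r0 + (r1 + (r2 + (r3 + r4)))): start with the small runs at the right end
    static long rightToLeft(int[] runs) {
        long acc = runs[runs.length - 1], total = 0;
        for (int i = runs.length - 2; i >= 0; i--) {
            acc += runs[i];
            total += acc;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] runs = {10000, 10, 1000, 10, 10};
        System.out.println("left to right: " + leftToRight(runs)); // 43070
        System.out.println("right to left: " + rightToLeft(runs)); // 13100
    }
}
```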

Example

Take ComparableTimSort.sort() in Java as an example: it uses a stack of runs (the run stack) to decide when runs should be merged.

    if (nRemaining < MIN_MERGE) {
      int initRunLen = countRunAndMakeAscending(a, lo, hi);
      binarySort(a, lo, hi, lo + initRunLen);
      return;
    }


An array shorter than MIN_MERGE (32) is not merged at all: after the first run is found, the rest of the array is sorted directly with binary insertion sort.
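Binary insertion sort is ordinary insertion sort that finds each insertion point by binary search, so it needs only O(log n) comparisons per element (the element moves are still linear). A sketch for int[], modeled on the JDK's binarySort(a, lo, hi, start), where a[lo, start) is already sorted (for instance, the run just found) and hi is exclusive; the class name is hypothetical. The search moves right past equal elements, so equal values keep their order and the sort stays stable.

```java
import java.util.Arrays;

public class BinaryInsertionSort {
    // Sorts a[lo, hi), assuming a[lo, start) is already sorted
    static void binarySort(int[] a, int lo, int hi, int start) {
        if (start == lo) start++;
        for (; start < hi; start++) {
            int pivot = a[start];
            int left = lo, right = start;
            // Find the first position in a[lo, start) whose element is > pivot
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (pivot < a[mid]) right = mid;
                else left = mid + 1;   // on ties go right, which keeps the sort stable
            }
            // Shift a[left, start) one slot to the right and drop the pivot in
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 7, 9, 1, 8, 2};
        binarySort(a, 0, a.length, 3); // a[0, 3) = {3, 7, 9} is already a run
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 7, 8, 9]
    }
}
```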

    int minRun = minRunLength(nRemaining);
    do {
      // Find the length of the run starting at lo; a descending run is reversed
      int runLen = countRunAndMakeAscending(a, lo, hi);

      // Every run on the stack should be at least minRun long: if the current
      // run is too short, extend it with the following elements via binary insertion sort
      if (runLen < minRun) {
        int force = nRemaining <= minRun ? nRemaining : minRun;
        binarySort(a, lo, lo + force, lo + runLen);
        runLen = force;
      }

      // Push the run onto the run stack
      ts.pushRun(lo, runLen);
      // Merge runs, starting from the top of the stack, until the invariants hold:
      // 1. runLen[i-3] > runLen[i-2] + runLen[i-1]
      // 2. runLen[i-2] > runLen[i-1]
      ts.mergeCollapse();

      // Advance to the next run
      lo += runLen;
      nRemaining -= runLen;
    } while (nRemaining != 0);

    // Merge all remaining runs to complete the sort
    assert lo == hi;
    ts.mergeForceCollapse();
    assert ts.stackSize == 1;
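minRunLength itself is not shown above. According to the JDK's Javadoc for it, an n below 32 is returned unchanged; if n is an exact power of two the result is 16; otherwise the result is some k with 16 <= k <= 32 such that n / k is close to, but strictly less than, a power of two, which keeps the final merges balanced. The sketch below mirrors the JDK's bit-shifting approach (keep the top five bits of n and add 1 if any discarded low bit was set), with MIN_MERGE = 32 as in the text; the class name is hypothetical.

```java
public class MinRun {
    static final int MIN_MERGE = 32;

    // Returns the minimum run length for an array of length n
    static int minRunLength(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        int r = 0;               // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        for (int n : new int[] {31, 32, 33, 100, 1000, 2048}) {
            System.out.println(n + " -> " + minRunLength(n));
        }
    }
}
```

For n = 1000 this gives minRun = 32, i.e. about 1000 / 32 ≈ 31 runs, just under 32 = 2^5.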


Next, let's look at one of the more important functions it calls, mergeCollapse():

/**
 * Merges adjacent runs until the stack invariants are re-established:
 * - If the top two runs together are at least as long as the run below them,
 *   merge the middle run with the shorter of its two neighbors.
 * - Otherwise, if the top run is at least as long as the run below it,
 *   merge the top two runs.
 */
private void mergeCollapse() {
  while (stackSize > 1) {
    int n = stackSize - 2;
    if (n > 0 && runLen[n - 1] <= runLen[n] + runLen[n + 1]) {
      if (runLen[n - 1] < runLen[n + 1])
        n--;
      mergeAt(n);
    } else if (runLen[n] <= runLen[n + 1]) {
      mergeAt(n);
    } else {
      break; // invariant is established
    }
  }
}
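To watch the invariants at work, the sketch below replays mergeCollapse on run lengths only, with no actual data. RunStackSimulation, its log and cost fields, and the cost model (merging runs of lengths x and y moves x + y elements) are hypothetical; mergeForceCollapse follows the JDK's version, which at each step merges the second-from-top run with the shorter of its neighbors. The run lengths 10000, 10, 1000, 10, 10 come from the extreme example earlier; real TimSort would first extend such short runs to minRun, so the trace is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class RunStackSimulation {
    private final int[] runLen = new int[64];     // lengths of pending runs, bottom first
    private int stackSize = 0;
    final List<String> log = new ArrayList<>();   // merges in the order performed
    long cost = 0;                                // total moves, assuming a merge costs x + y

    void pushRun(int len) {
        runLen[stackSize++] = len;
    }

    // Merges runs i and i + 1 (lengths only)
    private void mergeAt(int i) {
        log.add(runLen[i] + "+" + runLen[i + 1]);
        runLen[i] += runLen[i + 1];
        cost += runLen[i];
        if (i == stackSize - 3) {      // merged below the top: slide the top run down
            runLen[i + 1] = runLen[i + 2];
        }
        stackSize--;
    }

    // Same rules as the mergeCollapse above
    void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (n > 0 && runLen[n - 1] <= runLen[n] + runLen[n + 1]) {
                if (runLen[n - 1] < runLen[n + 1]) n--;
                mergeAt(n);
            } else if (runLen[n] <= runLen[n + 1]) {
                mergeAt(n);
            } else {
                break;
            }
        }
    }

    // Merges everything left on the stack, as at the end of sort()
    void mergeForceCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (n > 0 && runLen[n - 1] < runLen[n + 1]) n--;
            mergeAt(n);
        }
    }

    public static void main(String[] args) {
        RunStackSimulation sim = new RunStackSimulation();
        for (int len : new int[] {10000, 10, 1000, 10, 10}) {
            sim.pushRun(len);
            sim.mergeCollapse();
        }
        sim.mergeForceCollapse();
        System.out.println(sim.log + ", cost = " + sim.cost);
        // [10+1000, 10+10, 1010+20, 10000+1030], cost = 13090
    }
}
```

The 10000-element run takes part in only one merge, at the very end, so the total of 13090 moves stays far below the 43070 of strict left-to-right merging under the same cost model.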
