Merge sort (mergesort)

Source: Internet
Author: User

A first look at divide and conquer
When a problem can be split by size, divide and conquer is often very effective: keep dividing the problem until the subproblems are small enough to solve directly, then repeatedly combine the solutions of the subproblems into solutions of larger ones, until the complete solution is obtained. Merge sort is a simple example of a divide-and-conquer algorithm.
Some people may regard quicksort as a divide-and-conquer algorithm too, but I do not quite agree: quicksort first does part of the work on the large problem (the partition), then solves the subproblems, and has no step that combines subproblem solutions. Strictly speaking, quicksort is better described as a "conquer-then-divide" algorithm.
[Figure: a simple illustration of merge sort (perhaps a little too simple)]


How to decompose?
Merge sort divides the problem into two subproblems, the left half of the interval and the right half, which gives the following recursive function:
#define MID(i) ((i) >> 1)  /// i / 2

void merge(int* low, int* mid, int* high);  /// merge step, worked out in the next section

/*****************************************
    Function: merge sort
    Description: sorts the data in the interval [low, high)
*****************************************/
void mergeSort(int* low, int* high)
{
    int range = high - low;  /// number of elements in the interval
    if (range > 1)  /// a subproblem of size 1 is already solved, so only sizes greater than 1 are processed
    {
        int* mid = MID(range) + low;  /// find the split point
        /// solve the subproblems recursively
        mergeSort(low, mid);
        mergeSort(mid, high);
        merge(low, mid, high);  /// merge the two subproblems
    }
}
Tail-recursion optimization cannot be applied here, because each call's state must be kept in order to run the merge step afterwards. Readers may want to think about why a subproblem of size 2 does not lead to infinite recursion.
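One way to check the size-2 question is to look at the sizes the split produces. The sketch below uses a hypothetical helper, `split_sizes` (not part of the original code), that applies the same `range >> 1` computation as `MID`:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not in the original): returns the sizes of the two
// subproblems produced by splitting `range` elements at range >> 1.
std::pair<int, int> split_sizes(int range)
{
    assert(range > 1);            // the recursion only splits ranges larger than 1
    int left = range >> 1;        // same computation as MID(range)
    return {left, range - left};  // the right half takes the remainder
}
```

For a range of 2 this yields two subproblems of size 1, and for any range above 1 both parts are non-empty and strictly smaller than the parent, so the recursion always reaches the base case.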
How to merge?
When merging two subproblems, we know that the interval of each subproblem is already sorted. So the minimum of the whole interval can be found by comparing the current minimums of the two intervals, and repeatedly selecting the minimum completes the merge (much like selection sort). The whole process is linear, O(n), where n is the total number of elements in the two intervals. However, it needs an auxiliary array to hold the selected minimums (they cannot simply be written over positions still occupied by other elements), so for efficiency we first declare an auxiliary array that is large enough:
#include <new>  /// std::nothrow

#define MID(i) ((i) >> 1)  /// i / 2

int* helper;  /// auxiliary array

/**********************************************
    Function: merge
    Description: merges the sorted intervals [low, mid) and [mid, high).
        left is the traversal pointer of the left interval,
        right is the traversal pointer of the right interval.
        The parameter helper shadows the global declaration.
        All of this is done to reduce the number of lines of code.
    Time complexity: O(high - low)
**********************************************/
void merge(int* low, int* mid, int* high, int* left, int* right, int* helper)
{
    while (true)
    {
        if (*left <= *right)  /// on ties the smaller index goes first, which makes the algorithm stable
        {
            *(helper++) = *(left++);
            if (left >= mid)  /// the left interval is exhausted
            {
                while (right < high) *(helper++) = *(right++);  /// copy the rest of the right interval
                break;  /// leave the outer loop
            }
        }
        else
        {
            *(helper++) = *(right++);
            if (right >= high)  /// the right interval is exhausted
            {
                while (left < mid) *(helper++) = *(left++);  /// copy the rest of the left interval
                break;  /// leave the outer loop
            }
        }
    }
    while (high > low) *(--high) = *(--helper);  /// copy everything back
}

/*****************************************
    Function: merge sort
    Description: sorts the data in the interval [low, high)
    Time complexity: O(n lg n)
*****************************************/
void mergeSortRoutine(int* low, int* high)
{
    int range = high - low;  /// number of elements in the interval
    if (range > 1)  /// a subproblem of size 1 is already solved, so only sizes greater than 1 are processed
    {
        int* mid = MID(range) + low;  /// find the split point
        /// solve the subproblems recursively
        mergeSortRoutine(low, mid);
        mergeSortRoutine(mid, high);
        merge(low, mid, high, low, mid, helper);  /// merge the two subproblems
    }
}

/****************************************
    Function: merge sort "shell"
****************************************/
void mergeSort(int* low, int* high)
{
    /// The auxiliary array holds at most as many elements as the input.
    /// std::nothrow is needed for the nullptr check: plain new throws instead.
    helper = new (std::nothrow) int[high - low];
    if (helper != nullptr)
    {
        mergeSortRoutine(low, high);
        delete[] helper;  /// free the memory
    }
    else
        return;  /// not enough space to run merge sort
}
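For readers who want something to compile and experiment with, here is a minimal self-contained sketch of the same top-down scheme. It is a variant, not the author's code: it swaps the raw pointers and the global `helper` for `std::vector` and index arithmetic, and the names `merge_runs`, `sort_range` and `merge_sort_sketch` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge the sorted runs a[lo, mid) and a[mid, hi) through the buffer `buf`.
static void merge_runs(std::vector<int>& a, std::vector<int>& buf,
                       std::size_t lo, std::size_t mid, std::size_t hi)
{
    std::size_t left = lo, right = mid, out = lo;
    while (left < mid && right < hi)
        buf[out++] = (a[left] <= a[right]) ? a[left++] : a[right++];  // <= keeps ties stable
    while (left < mid) buf[out++] = a[left++];    // drain the rest of the left run
    while (right < hi) buf[out++] = a[right++];   // drain the rest of the right run
    for (std::size_t i = lo; i < hi; ++i) a[i] = buf[i];  // copy the merged run back
}

// Recursive part: sort a[lo, hi).
static void sort_range(std::vector<int>& a, std::vector<int>& buf,
                       std::size_t lo, std::size_t hi)
{
    assert(lo <= hi);
    if (hi - lo > 1)  // ranges of size 0 or 1 are already sorted
    {
        std::size_t mid = lo + ((hi - lo) >> 1);
        sort_range(a, buf, lo, mid);
        sort_range(a, buf, mid, hi);
        merge_runs(a, buf, lo, mid, hi);
    }
}

// The "shell": allocates the auxiliary buffer once, then sorts the whole vector.
void merge_sort_sketch(std::vector<int>& a)
{
    std::vector<int> buf(a.size());
    sort_range(a, buf, 0, a.size());
}
```

Calling `merge_sort_sketch(v)` on a `std::vector<int>` sorts it in ascending order. Letting the vector own the buffer removes the manual `delete[]`, at the cost of hiding the pointer tricks the article goes on to optimize.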

The time complexity of the merge sort above is easy to analyze: there are at most lg n levels of subproblems and each level costs O(n), so the total is O(n lg n). The worst case and the best case are the same.
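The level-by-level argument can also be written as a recurrence. The sketch below assumes that n is a power of two and that merging n elements costs cn for some constant c; both are simplifying assumptions introduced here, not taken from the original.

```latex
\begin{align*}
T(1) &= c_0,\\
T(n) &= 2\,T(n/2) + c\,n\\
     &= 4\,T(n/4) + 2\,c\,n\\
     &= \dots\\
     &= 2^{j}\,T(n/2^{j}) + j\,c\,n\\
     &= n\,T(1) + c\,n \lg n \qquad (j = \lg n)\\
     &= O(n \lg n).
\end{align*}
```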

Optimization
Merge sort can be optimized in quite a few ways. Here is a very good one to start with. In the original merge function, the elements of the left and right intervals are copied to the auxiliary array in sorted order and then copied back one by one. This is correct, but it ignores elements that are already in the right place. For example, once the left interval is exhausted, do the remaining elements of the right interval still need to be copied into the auxiliary array? No, because they are already in their final positions:
/**********************************************
    Function: optimized merge
    Description: merges the sorted intervals [low, mid) and [mid, high).
        left is the traversal pointer of the left interval,
        right is the traversal pointer of the right interval.
        The parameter helper shadows the global declaration.
        All of this is done to reduce the number of lines of code.
    Time complexity: O(high - low)
**********************************************/
void merge(int* low, int* mid, int* high, int* left, int* right, int* helper)
{
    while (true)
    {
        if (*left <= *right)  /// on ties the smaller index goes first, which makes the algorithm stable
        {
            *(helper++) = *(left++);
            if (left >= mid) break;  /// left interval exhausted: leave the loop, the rest of the right interval is already in place
        }
        else
        {
            *(helper++) = *(right++);
            if (right >= high)  /// the right interval is exhausted
            {
                while (left < mid) *(helper++) = *(left++);  /// copy the rest of the left interval
                break;  /// leave the outer loop
            }
        }
    }
    while (right > low) *(--right) = *(--helper);  /// copy back, skipping the remaining elements of the right interval
}
This not only shortens the code but also speeds the program up in many cases. The same idea applies to elements of the left interval that are already in their final positions, which gives:
/**********************************************
    Function: optimized merge
    Description: merges the sorted intervals [low, mid) and [mid, high).
        right is the traversal pointer of the right interval.
        The parameter helper shadows the global declaration.
        All of this is done to reduce the number of lines of code.
    Time complexity: O(high - low)
**********************************************/
void merge(int* low, int* mid, int* high, int* right, int* helper)
{
    /// shrink the left boundary: ignore left elements that are already in place
    while (*low <= *right)
        if (++low >= mid) return;  /// if the whole left interval is in place, so is the right one: return directly
    int* left = low;  /// set up the traversal pointer of the left interval
    *(helper++) = *(right++);  /// do not waste the result of the failed loop test above
    if (right >= high)  /// the right interval is exhausted
        while (left < mid) *(helper++) = *(left++);  /// copy the rest of the left interval
    else
        while (true)
        {
            if (*left <= *right)  /// on ties the smaller index goes first, which makes the algorithm stable
            {
                *(helper++) = *(left++);
                if (left >= mid) break;  /// left interval exhausted: the rest of the right interval is already in place
            }
            else
            {
                *(helper++) = *(right++);
                if (right >= high)  /// the right interval is exhausted
                {
                    while (left < mid) *(helper++) = *(left++);  /// copy the rest of the left interval
                    break;  /// leave the outer loop
                }
            }
        }
    while (right > low) *(--right) = *(--helper);  /// copy back, skipping the remaining elements of the right interval
}
The worst case is the same as for the original merge function, but most cases improve, especially when the array is already sorted: each level then traverses only about n/2 elements, which can be more efficient than quicksort.
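The sorted-input claim can be checked by counting key comparisons. The sketch below reuses the structure of the optimized merge, routes every comparison through a hypothetical counting helper `less_equal`, and runs a plain top-down sort on the input 0, 1, ..., n-1. The names `comparisons`, `counted_merge`, `counted_sort` and `comparisons_on_sorted_input` are introduced only for this experiment.

```cpp
#include <cassert>
#include <vector>

static long comparisons = 0;  // hypothetical counter, not in the original

static bool less_equal(int a, int b) { ++comparisons; return a <= b; }

// Same structure as the optimized merge above, with every key comparison counted.
static void counted_merge(int* low, int* mid, int* high, int* right, int* helper)
{
    while (less_equal(*low, *right))
        if (++low >= mid) return;             // whole left run already in place
    int* left = low;
    *(helper++) = *(right++);
    if (right >= high)
        while (left < mid) *(helper++) = *(left++);
    else
        while (true)
        {
            if (less_equal(*left, *right))
            {
                *(helper++) = *(left++);
                if (left >= mid) break;       // rest of the right run is in place
            }
            else
            {
                *(helper++) = *(right++);
                if (right >= high)
                {
                    while (left < mid) *(helper++) = *(left++);
                    break;
                }
            }
        }
    while (right > low) *(--right) = *(--helper);  // copy back only what was moved
}

static void counted_sort(int* low, int* high, int* helper)
{
    if (high - low > 1)
    {
        int* mid = low + ((high - low) >> 1);
        counted_sort(low, mid, helper);
        counted_sort(mid, high, helper);
        counted_merge(low, mid, high, mid, helper);
    }
}

// Sorts 0, 1, ..., n-1 (already in order) and returns the number of comparisons.
long comparisons_on_sorted_input(int n)
{
    assert(n >= 0);
    std::vector<int> data(n), helper(n);
    for (int i = 0; i < n; ++i) data[i] = i;
    comparisons = 0;
    counted_sort(data.data(), data.data() + n, helper.data());
    return comparisons;
}
```

When n is a power of two, every merge on sorted input only scans its left half, so the count comes out as (n/2) lg n, for example 12 comparisons for n = 8.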
The expected number of elements that no longer have to be shuffled back and forth is the average value of cnt in the following code:
int cnt = 0;  /// counter

void merge(int* low, int* mid, int* high, int* right, int* helper)
{
    while (*low <= *right)
    {
        cnt++;  /// one more element on the left that takes no part in the copying
        if (++low >= mid)
        {
            cnt += high - right;  /// nothing on the right needs copying either
            return;
        }
    }
    int* left = low;
    *(helper++) = *(right++);
    if (right >= high)
        while (left < mid) *(helper++) = *(left++);
    else
        while (true)
        {
            if (*left <= *right)
            {
                *(helper++) = *(left++);
                if (left >= mid)
                {
                    cnt += high - right;  /// number of elements on the right that take no part in the copying
                    break;
                }
            }
            else
            {
                *(helper++) = *(right++);
                if (right >= high)
                {
                    while (left < mid) *(helper++) = *(left++);
                    break;
                }
            }
        }
    while (right > low) *(--right) = *(--helper);  /// copy back, skipping the remaining elements of the right interval
}
Suppose the left and right intervals each contain k/2 elements. Exactly i elements on the left take no part in the copying when the i smallest elements all fall in the left interval and the (i+1)-th smallest falls in the right interval; the case where all k/2 left elements are skipped has to be considered separately. By symmetry, the number of elements on the right that take no part in the copying follows the same distribution. Summing over these cases gives a series for the expectation.
In my tests (distinct elements in random order), this optimization saved O(n) extra operations on average.
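One way to evaluate that expectation is sketched below. It is a reconstruction under stated assumptions, not necessarily the series the author had in mind: all keys are distinct, the input order is uniformly random, and both halves hold n = k/2 elements. The symbols L (number of leading left elements skipped) and R (number of trailing right elements skipped) are introduced here, so that cnt = L + R for a single merge.

```latex
\begin{align*}
\Pr[L \ge i] &= \binom{2n-i}{n-i} \Big/ \binom{2n}{n}, \qquad 1 \le i \le n,\\
\mathbb{E}[L] &= \sum_{i=1}^{n} \Pr[L \ge i]
  = \frac{1}{\binom{2n}{n}} \sum_{i=1}^{n} \binom{2n-i}{n}
  = \frac{\binom{2n}{n+1}}{\binom{2n}{n}} = \frac{n}{n+1},\\
\mathbb{E}[\mathrm{cnt}] &= \mathbb{E}[L] + \mathbb{E}[R] = \frac{2n}{n+1} < 2 .
\end{align*}
```

The first line counts the ways to fill the rest of the left half once the i smallest keys are placed there, and the sum uses the hockey-stick identity. Under these assumptions each merge skips just under two elements on average, and since sorting n elements takes n - 1 merges, the total saving is O(n), which is in line with the measurement reported above.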
Making the leaves "thicker"
Like other recursive sorts, merge sort can also be optimized with insertion sort, because insertion sort has a small constant factor on small inputs. And because insertion sort now handles the leaves, every leaf is wider than 1, so the merge function can be simplified:
#include <new>  /// std::nothrow

#define FACTOR 10  /// leaf width
#define MID(i) ((i) >> 1)  /// i / 2

int* helper;  /// auxiliary array

/**********************************************
    Function: optimized merge
    Description: merges the sorted intervals [low, mid) and [mid, high).
        right is the traversal pointer of the right interval.
        The parameter helper shadows the global declaration.
        All of this is done to reduce the number of lines of code.
    Time complexity: O(high - low)
**********************************************/
void merge(int* low, int* mid, int* high, int* right, int* helper)
{
    /// shrink the left boundary: ignore left elements that are already in place
    while (*low <= *right)
    {
        if (++low >= mid) return;  /// if the whole left interval is in place, so is the right one: return directly
    }
    int* left = low;  /// set up the traversal pointer of the left interval
    *(helper++) = *(right++);  /// do not waste the result of the failed loop test above
    /// Every leaf is wider than 1, so the right interval cannot be exhausted here
    /// and the two statements that handled that case are no longer needed.
    while (true)
    {
        if (*left <= *right)  /// on ties the smaller index goes first, which makes the algorithm stable
        {
            *(helper++) = *(left++);
            if (left >= mid) break;  /// left interval exhausted: the rest of the right interval is already in place
        }
        else
        {
            *(helper++) = *(right++);
            if (right >= high)  /// the right interval is exhausted
            {
                while (left < mid) *(helper++) = *(left++);  /// copy the rest of the left interval
                break;  /// leave the outer loop
            }
        }
    }
    while (right > low) *(--right) = *(--helper);  /// copy back, skipping the remaining elements of the right interval
}

/*************************************
    Function: optimized insertion sort
    Description: sorts the data in the interval [low, high)
    Time complexity: O(n + inversions)
*************************************/
static void improvedInsertionSort(int* low, int* high)
{
    for (int* cur = low; ++cur < high; )  /// starts from the second element, because the first one is already sorted
    {
        int tmp = *cur;  /// temporarily save the value to insert
        int* destPos = cur;  /// position where the value will be placed, initialized to its original position
        /// the first test is pulled out separately
        if (*(destPos - 1) > tmp)
        {
            do
            {
                *destPos = *(destPos - 1);  /// shift the larger element one slot to the right
                --destPos;
            } while (destPos > low && *(destPos - 1) > tmp);  /// never forms a pointer before low
            *destPos = tmp;
        }
    }
}

/*****************************************
    Function: merge sort
    Description: sorts the data in the interval [low, high)
    Time complexity: O(n lg n)
*****************************************/
void mergeSortRoutine(int* low, int* high)
{
    int range = high - low;  /// number of elements in the interval
    if (range > FACTOR)  /// only subproblems wider than a leaf are split further
    {
        int* mid = MID(range) + low;  /// find the split point
        /// solve the subproblems recursively
        mergeSortRoutine(low, mid);
        mergeSortRoutine(mid, high);
        merge(low, mid, high, mid, helper);  /// merge the two subproblems
    }
    else
        improvedInsertionSort(low, high);
}

/****************************************
    Function: merge sort "shell"
****************************************/
void mergeSort(int* low, int* high)
{
    /// The auxiliary array holds at most as many elements as the input.
    /// std::nothrow is needed for the nullptr check: plain new throws instead.
    helper = new (std::nothrow) int[high - low];
    if (helper != nullptr)
    {
        mergeSortRoutine(low, high);
        delete[] helper;  /// free the memory
    }
    else
        return;  /// not enough space to run merge sort
}

Bottom-up merge sort
The decomposition step of the recursive version above obtains subproblems by splitting the parent problem. In fact, the subproblems can be located directly: a subproblem is identified by an interval, and an interval is determined by its left and right endpoints. So we can compute the range of any subproblem we want right away:
#include <new>  /// std::nothrow

#define MID(i) ((i) >> 1)  /// i / 2
#define NEXT_GAP(i) ((i) <<= 1)  /// next step size

int* helper;  /// auxiliary array

/**********************************************
    Function: optimized merge
    Description: merges the sorted intervals [low, mid) and [mid, high).
        right is the traversal pointer of the right interval.
        The parameter helper shadows the global declaration.
        All of this is done to reduce the number of lines of code.
    Time complexity: O(high - low)
**********************************************/
void merge(int* low, int* mid, int* high, int* right, int* helper)
{
    /// shrink the left boundary: ignore left elements that are already in place
    while (*low <= *right)
        if (++low >= mid) return;  /// if the whole left interval is in place, so is the right one: return directly
    int* left = low;  /// set up the traversal pointer of the left interval
    *(helper++) = *(right++);  /// do not waste the result of the failed loop test above
    if (right >= high)  /// the right interval is exhausted
        while (left < mid) *(helper++) = *(left++);  /// copy the rest of the left interval
    else
        while (true)
        {
            if (*left <= *right)  /// on ties the smaller index goes first, which makes the algorithm stable
            {
                *(helper++) = *(left++);
                if (left >= mid) break;  /// left interval exhausted: the rest of the right interval is already in place
            }
            else
            {
                *(helper++) = *(right++);
                if (right >= high)  /// the right interval is exhausted
                {
                    while (left < mid) *(helper++) = *(left++);  /// copy the rest of the left interval
                    break;  /// leave the outer loop
                }
            }
        }
    while (right > low) *(--right) = *(--helper);  /// copy back, skipping the remaining elements of the right interval
}

/************************************************
    Function: bottom-up merge sort
    Description: sorts the data in the interval [low, low + range)
    Time complexity: O(n lg n)
************************************************/
void mergeSortRoutine(int* low, int* high, int range)
{
    for (int gap = 2; MID(gap) < range; NEXT_GAP(gap))
        for (int* right = low + gap, * mid = low + MID(gap); mid < high; right += gap, mid += gap)
            merge(right - gap, mid, right > high ? high : right, mid, helper);
}

/****************************************
    Function: merge sort "shell"
****************************************/
void mergeSort(int* low, int* high)
{
    /// The auxiliary array holds at most as many elements as the input.
    /// std::nothrow is needed for the nullptr check: plain new throws instead.
    helper = new (std::nothrow) int[high - low];
    if (helper != nullptr)
    {
        mergeSortRoutine(low, high, high - low);
        delete[] helper;  /// free the memory
    }
    else
        return;  /// not enough space to run merge sort
}
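To make the interval arithmetic easier to follow, the sketch below lists the merges that a bottom-up pass performs, in order, for a given element count. It mirrors the two loops above but works with integer offsets rather than pointers. `MergeStep` and `bottom_up_schedule` are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: merge the runs [low, mid) and [mid, high), as offsets.
struct MergeStep { int low, mid, high; };

// Lists, in execution order, the merges performed on `range` elements.
std::vector<MergeStep> bottom_up_schedule(int range)
{
    assert(range >= 0);
    std::vector<MergeStep> steps;
    for (int gap = 2; (gap >> 1) < range; gap <<= 1)             // gap = width of a merged run
        for (int low = 0; low + (gap >> 1) < range; low += gap)  // skip runs with no right half
        {
            int high = (low + gap > range) ? range : low + gap;  // clamp the last run
            steps.push_back({low, low + (gap >> 1), high});
        }
    return steps;
}
```

For five elements the schedule is (0,1,2), (2,3,4), (0,2,4), (0,4,5): the lone fifth element has no partner until the last pass, where the right run is shorter than the left one. This is why the bottom-up merge has to keep the check for an exhausted right interval.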
Strangely, though, the recursive version still runs faster than this bottom-up version on my computer, and I do not know why.
Postscript
If anything here is wrong, or if you have any suggestions, please leave a comment below. Thank you.
