"Algorithm" 1, quick sort


1. The basic idea of quick sort:

Quick sort uses the divide-and-conquer idea. One pass of partitioning splits the sequence to be sorted into two parts, such that every key in one part is smaller than every key in the other part. The two parts are then sorted recursively in the same way, so the whole sequence ends up in order.

2. The three steps of quick sort:

(1) Select the pivot: pick an element of the sequence to be sorted, by some rule, to serve as the pivot (also called the "datum" or "benchmark").

(2) Partition: rearrange the sequence around the pivot so that the pivot lands in its final position, every element to its left is no larger than the pivot, and every element to its right is no smaller than the pivot.

(3) Recursively quick sort the two resulting sub-sequences, until a sub-sequence is empty or contains only one element.
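
To make these three steps concrete, here is a minimal sketch of a basic quick sort, assuming the first element is used as the pivot (the "fixed position" strategy described below). The helper names Swap, Partition, and QuickSort are introduced here purely for illustration and are not code from the original article; the later snippets rely on helpers of exactly this kind.

void Swap(int &a, int &b) { int t = a; a = b; b = t; }

int Partition(int arr[], int low, int high)
{
    int key = arr[low];                // step (1): select the pivot
    while (low < high)                 // step (2): partition around the pivot
    {
        while (high > low && arr[high] >= key) high--;
        arr[low] = arr[high];          // fill the hole on the left with a smaller element
        while (high > low && arr[low] <= key) low++;
        arr[high] = arr[low];          // fill the hole on the right with a larger element
    }
    arr[low] = key;                    // the pivot lands in its final position
    return low;
}

void QuickSort(int arr[], int low, int high)
{
    if (low >= high) return;           // step (3): stop on empty or one-element ranges
    int pivotpos = Partition(arr, low, high);
    QuickSort(arr, low, pivotpos - 1);
    QuickSort(arr, pivotpos + 1, high);
}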

3. How to select the pivot

For a divide-and-conquer algorithm, each partition step works best when it splits the sequence into two sub-sequences of equal length; that is when divide and conquer reaches its highest efficiency. In other words, the choice of pivot matters: it determines the lengths of the two sub-sequences and therefore has a decisive effect on the efficiency of the whole algorithm.

Ideally, the chosen pivot would split the sequence to be sorted into two sub-sequences of equal length.

Three pivot-selection methods are introduced below.

Method (1): Fixed position

Idea: take the first or last element of the sequence as the pivot.

Basic Quick Sort

/* Select the first element of the sequence as the pivot */
int SelectPivot(int arr[], int low, int high)
{
    return arr[low];
}

Note: basic quick sort selects the first or last element as the pivot. However, this is a poor choice in practice.

Test data:

Test data analysis: if the input sequence is random, the running time is acceptable. If the array is already sorted, however, this split is the worst possible: each partition only shortens the sequence to be sorted by one element, quick sort degenerates into something resembling bubble sort, and the time complexity becomes Θ(n^2). Moreover, ordered or partially ordered input is quite common in practice. Always taking the first element as the pivot is therefore a bad idea, and to avoid it the following two pivot-selection methods are introduced.

Method (2): Random pivot selection

Reason for introduction: when the sequence to be sorted is partially ordered, a fixed pivot position hurts the efficiency of quick sort. To mitigate this, a randomly selected pivot is introduced.

Idea: take a randomly chosen element of the sequence to be sorted as the pivot.

Randomized algorithm

/* Randomly select the pivot position between low and high (requires <cstdlib> and <ctime>) */
int SelectPivotRandom(int arr[], int low, int high)
{
    // Seed the generator (ideally this is done once at program start, not on every call)
    srand((unsigned) time(NULL));
    int pivotpos = rand() % (high - low) + low;   // random position in [low, high)
    // Swap the chosen element into the low position, so the usual
    // partition function can be used exactly as in basic quick sort
    Swap(arr[pivotpos], arr[low]);
    return arr[low];
}

Test data:

Test data analysis: this is a relatively safe strategy. Since the pivot position is random, the resulting splits will not be consistently bad. When all elements of the array are equal, the worst case still occurs and the time complexity is O(n^2). In fact, the probability that randomized quick sort hits its theoretical worst case is only about 1/(2^n), so for the vast majority of inputs it achieves the expected O(n log n) time complexity. As one author memorably put it, "randomized quick sort can cover a person's need for luck for a lifetime."

Method (3): Median-of-three

Reason for introduction: although a random pivot reduces the probability of a bad split, the worst case is still O(n^2). To mitigate this, the median-of-three pivot is introduced.

Analysis: the best partition splits the sequence to be sorted into two sub-sequences of equal length, so the ideal pivot would be the median of the sequence, i.e. the n/2-th smallest element. Unfortunately the median is hard to compute and doing so would significantly slow down the sort. An estimate of the median can be obtained by picking three elements and using their median as the pivot; in practice randomness does not help much here, so the usual approach is to take the median of the elements at the left, right, and middle positions. Median-of-three clearly eliminates the bad case of already-sorted input, and it reduces the number of comparisons in quick sort by roughly 14%.

Example: The sequence to be sorted is: 8 1 4 9 6 3 5 2 7 0

The leftmost element is 8, the rightmost is 0, and the middle element is 6.

The median of these three numbers is taken as the pivot, so the pivot is 6.

Note: instead of three elements, five or more elements around the left, middle, and right positions can be sampled when choosing the pivot. In general this is called median-of-(2t+1) partitioning; median-of-three is the special case t = 1.

Specific idea: look at the data at the low, mid, and high positions of the sequence to be sorted, take the middle one of the three as the pivot, and move it to the low position.

That is, the median of the three is used as the pivot and is stored at the low position.

/* Function: take the data at the low, mid, and high positions of the sequence
   to be sorted and select the middle one of the three as the pivot */
int SelectPivotMedianOfThree(int arr[], int low, int high)
{
    int mid = low + ((high - low) >> 1);   // index of the middle element

    // Select the pivot using the median-of-three method
    if (arr[mid] > arr[high])   // target: arr[mid] <= arr[high]
    {
        Swap(arr[mid], arr[high]);
    }
    if (arr[low] > arr[high])   // target: arr[low] <= arr[high]
    {
        Swap(arr[low], arr[high]);
    }
    if (arr[mid] > arr[low])    // target: arr[low] >= arr[mid]
    {
        Swap(arr[mid], arr[low]);
    }
    // At this point, arr[mid] <= arr[low] <= arr[high]

    return arr[low];   // the median of the three is now at the low position,
                       // so the partition function can use arr[low] as the pivot unchanged
}

Test data:

Test data analysis: the advantage of median-of-three pivot selection is clear, but it still does not cope well with arrays full of duplicate elements.

Optimization 1: use insertion sort once the sub-sequence to be sorted has been split down to a small size.

Reason: for very small and partially ordered arrays, insertion sort performs better than quick sort. Once a sub-sequence produced by splitting drops below a certain size, continuing to split it costs more than it saves, and insertion sort can be used instead of quick sort at that point.

Cutoff range: a sub-sequence length of n = 10 works well; any cutoff between 5 and 20 produces similar results, and the practice also avoids some harmful degenerate cases. (From Data Structures and Algorithm Analysis by Mark Allen Weiss.)

if (high - low + 1 < 10)
{
    InsertSort(arr, low, high);
    return;
}
// otherwise, continue with quick sort as usual
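
The snippets in this article call an InsertSort helper that is never shown. Below is a minimal sketch, assuming a plain straight insertion sort over arr[low..high]; the name and signature are simply matched to the calls above and are not code from the original article.

void InsertSort(int arr[], int low, int high)
{
    for (int i = low + 1; i <= high; i++)
    {
        int key = arr[i];
        int j = i - 1;
        while (j >= low && arr[j] > key)   // shift larger elements one slot to the right
        {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;                  // drop the saved element into its place
    }
}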

Test data:

Test data analysis: for a random array, median-of-three pivot selection plus the insertion-sort cutoff improves efficiency a little. For an already-sorted array it is of little use: the sub-sequences handed to insertion sort are already ordered, so it has almost nothing to do and no reduction in time is visible. In addition, median-of-three plus the insertion-sort cutoff still cannot handle arrays with many duplicates.

Optimization 2: after a partition finishes, gather the elements equal to the pivot key around the pivot, and exclude them from the subsequent splits instead of partitioning them again.

Example:

Sequence to sort: 1 4 6 7 6 6 7 6 8 6

Median-of-three selects the pivot: the element at subscript 4, whose value is 6.

After the median-of-three swaps, the sequence to be partitioned is: 6 4 6 7 1 6 7 6 8 6

Pivot key: 6

Result of this partition when elements equal to the key are not specially handled: 1 4 6 6 7 6 7 6 8 6

The next two sub-sequences to sort are: 1 4 6 and 7 6 7 6 8 6

Result of this partition when elements equal to the key are gathered around the pivot: 1 4 6 6 6 6 6 7 8 7

The next two sub-sequences to sort are: 1 4 and 7 8 7

Comparing the two, we can see that gathering the elements equal to the key after a partition reduces the number of recursive calls, and efficiency improves considerably.

Procedure: the handling of equal elements takes two steps.

Step 1: during partitioning, move the elements equal to the key to the two ends of the array.

Step 2: after partitioning, move the elements equal to the key next to the pivot.

Example:

Sequence to sort: 1 4 6 7 6 6 7 6 8 6

Median-of-three selects the pivot: the element at subscript 4, whose value is 6.

After the median-of-three swaps, the sequence to be partitioned is: 6 4 6 7 1 6 7 6 8 6

Pivot key: 6

Step 1: during partitioning, move the elements equal to the key to the two ends of the array.

Result: 6 4 1 6 (pivot) 7 8 7 6 6 6

At this point, all elements equal to 6 are placed at both ends.

Step 2: after partitioning, move the elements equal to the key next to the pivot.

Result: 1 4 6 6 (pivot) 6 6 6 7 8 7

At this point, all elements equal to 6 have been moved next to the pivot.

After that, quick sort is applied to the two sub-sequences 1 4 and 7 8 7.

Code:

void QSort(int arr[], int low, int high)
{
    int first = low;
    int last = high;
    int left = low;        // boundary of key-equal elements gathered at the left end
    int right = high;      // boundary of key-equal elements gathered at the right end
    int leftLen = 0;       // number of key-equal elements at the left end
    int rightLen = 0;      // number of key-equal elements at the right end

    if (high - low + 1 < 10)
    {
        InsertSort(arr, low, high);
        return;
    }

    // One partition pass
    int key = SelectPivotMedianOfThree(arr, low, high);   // median-of-three pivot selection

    while (low < high)
    {
        while (high > low && arr[high] >= key)
        {
            if (arr[high] == key)          // handle elements equal to the key
            {
                Swap(arr[right], arr[high]);
                right--;
                rightLen++;
            }
            high--;
        }
        arr[low] = arr[high];
        while (high > low && arr[low] <= key)
        {
            if (arr[low] == key)
            {
                Swap(arr[left], arr[low]);
                left++;
                leftLen++;
            }
            low++;
        }
        arr[high] = arr[low];
    }
    arr[low] = key;

    // One partition pass has finished.
    // Move the elements equal to the pivot key next to the pivot's final position.
    int i = low - 1;
    int j = first;
    while (j < left && arr[i] != key)
    {
        Swap(arr[i], arr[j]);
        i--;
        j++;
    }
    i = low + 1;
    j = last;
    while (j > right && arr[i] != key)
    {
        Swap(arr[i], arr[j]);
        i++;
        j--;
    }
    QSort(arr, first, low - 1 - leftLen);
    QSort(arr, low + 1 + rightLen, last);
}

Test data:

Test data analysis: the combination of median-of-three pivot selection, the insertion-sort cutoff, and gathering equal elements works surprisingly well.

Reason: if the array contains equal elements, a great deal of redundant partitioning is avoided. This is especially visible on arrays with many duplicates.

In this combination, the insertion-sort cutoff actually contributes relatively little.

Optimization 3: optimize the recursive calls

The quick sort function ends with two recursive calls, and the second of them can be eliminated with tail-recursion optimization.

Advantage: if the sequence to be sorted is split very unevenly, the recursion depth approaches n, while stack space is quite limited; each recursive call costs some stack space, and the more parameters the function has, the more space each call consumes. After the optimization (provided the loop keeps the larger sub-sequence and the recursion handles the smaller one), the stack depth can be reduced from the original O(n) to O(log n), which improves performance.

Code:

void QSort(int arr[], int low, int high)
{
    int pivotpos = -1;
    if (high - low + 1 < 10)
    {
        InsertSort(arr, low, high);
        return;
    }
    while (low < high)
    {
        pivotpos = Partition(arr, low, high);
        QSort(arr, low, pivotpos - 1);
        low = pivotpos + 1;   // the tail-recursive call on the right half becomes a loop iteration
    }
}

Note: after the first recursive call, the old value of low is no longer needed, so the second recursive call can be replaced by the loop shown above.

Test data:

Test data analysis: in practice the compiler performs this optimization by itself, so compared with the non-optimized version the measured time is hardly reduced.

Optimization 4: use parallelism or multithreading to process the sub-sequences (only mentioned in passing in the original article).
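
Since the original article gives no code for this optimization, here is one possible sketch, not taken from the original: the two sub-sequences of a partition are sorted concurrently with std::async, falling back to a sequential sort below an assumed size/depth cutoff. ParallelQuickSort, the cutoff values, and the use of std::sort for the sequential base case are illustrative assumptions; Partition is the helper sketched earlier.

#include <algorithm>
#include <future>

void ParallelQuickSort(int arr[], int low, int high, int depth)
{
    if (low >= high) return;
    if (depth <= 0 || high - low + 1 < 10000)
    {
        std::sort(arr + low, arr + high + 1);   // small or deep ranges: sort sequentially
        return;
    }
    int pivotpos = Partition(arr, low, high);
    // Sort the left half on another thread while this thread handles the right half
    std::future<void> leftTask = std::async(std::launch::async,
        ParallelQuickSort, arr, low, pivotpos - 1, depth - 1);
    ParallelQuickSort(arr, pivotpos + 1, high, depth - 1);
    leftTask.get();                             // wait for the left half to finish
}

Called as ParallelQuickSort(arr, 0, n - 1, 2), this would use at most a handful of threads; the benefit only shows up for large arrays, which is why the cutoff matters.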

All test data:

Summary: the most efficient quick sort combination is median-of-three pivot selection + the insertion-sort cutoff + gathering equal elements, which in these tests is even faster than the sort function in the STL.

Note: because the timing measurements are not very stable, the data only reflect the rough picture. If the time changes only slightly rather than by a multiplicative factor, the two results can be regarded as roughly the same.

Original article: http://blog.csdn.net/insistgogo/article/details/7785038

"Algorithm" 1, quick sort
