The common algorithms on data structures mainly involve searching and sorting. Searching depends primarily on how the data is organized (sequential storage, linked-list storage, tree storage, etc.); the mainstream techniques are sequential search, binary search, interpolation search, hash search, and so on, and their operations are comparatively simple and clear. Sorting is the hottest topic among algorithms, and its key concerns are time cost, space cost, and the stability of the sort. Of course, the C++ standard library already encapsulates a large number of container classes as well as find, sort, stable_sort, and other common algorithms; in engineering practice you can simply pass in iterator arguments and use them directly.
1. Basic Knowledge
1.1 Classification of sorts
(1). Stability of a sort:
Suppose two records have equal keys, ki = kj (i ≠ j), and Ri precedes Rj in the input sequence (i.e., i < j). If Ri still precedes Rj after sorting, the sorting method is stable; that is, records with the same key keep their relative order after sorting. If their relative order can change, the sorting method is unstable.
(2). Internal sort and external sort:
In an internal sort, all the records to be sorted stay in memory for the whole sorting process; an external sort is needed when the data set is too large to fit in memory at once, so the sorting process must repeatedly exchange data between memory and external storage. As we will see later, merge sort can handle the external sorting problem.
(3). Classification by algorithmic principle
A. Insertion sorts: direct insertion sort, binary insertion sort, Shell sort
B. Exchange sorts: bubble sort, cocktail sort, quick sort
C. Selection sorts: simple selection sort, heap sort
D. Merge sorts: merge sort
E. Distribution sorts: counting sort, bucket sort, radix sort
1.2 Complexity of the sorting algorithm
1.2.1 Time complexity of an algorithm
In algorithm analysis, the total number of statement executions T(n) is a function of the problem size n; we analyze how T(n) varies with n and determine the order of magnitude of T(n).
The time complexity of an algorithm, i.e., its time measure, is written T(n) = O(f(n)). It means that as the problem size n grows, the growth rate of the algorithm's execution time is the same as that of f(n), where f(n) is some function of the problem size n. This is called the asymptotic time complexity of the algorithm, time complexity for short, and the notation is known as Big O notation.
1.2.2 Deriving the Big O order
(1). Replace all additive constants in the running-time expression with the constant 1;
(2). In the modified expression, keep only the highest-order term;
(3). If the highest-order term exists and its coefficient is not 1, remove that coefficient; what remains is the Big O order.
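For example (an invented running-time expression, purely for illustration): suppose T(n) = 3n^2 + 2n + 5. Rule (1) replaces the constant 5 with 1, rule (2) keeps only the highest-order term 3n^2, and rule (3) removes the coefficient 3, giving a time complexity of O(n^2).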
1.2.3 Common time complexities
To analyze the complexity of an algorithm, the key is to analyze its loop structures. The common time complexities are:
O(1) constant order;
O(log n) logarithmic order;
O(n) linear order;
O(n log n) linearithmic order;
O(n^2) squared order;
O(n^3) cubic order;
O(2^n) exponential order.
The order above is also the order of complexity from small to large.
2. Summary of Classical Sorting Methods
2.1 Bubble Sort
Bubble sort is an exchange sort. Its basic idea: compare the keys of adjacent records pairwise and swap them whenever they are in reverse order, until no records remain out of order.
(1). The most common (naive) version: a double loop in which the element at position i is compared against each following element, swapping whenever a smaller value is found further back. Each round of the outer loop fixes the element at position i, but the swaps do nothing to help order the remaining records, so it is inefficient.
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    for (size_t i = 0; i < sz; ++i)
        for (size_t j = i + 1; j < sz; ++j)
            if (store[i] > store[j])
                std::swap(store[i], store[j]);
}
(2). Authentic bubble sort: in each pass of the inner loop, two adjacent elements are compared and swapped when out of order, like bubbles rising. Each pass guarantees that one extreme value (here, the largest remaining element) reaches its final position, while the rest of the elements also drift toward where they belong.
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    for (size_t i = 0; i < sz; ++i)
        for (size_t j = 0; j + 1 < sz - i; ++j)   // the last i elements are already in place
            if (store[j] > store[j + 1])
                std::swap(store[j], store[j + 1]);
}
(3). Optimization: in the authentic bubble sort above, if a full pass completes with no swap, the sequence is already in order, and we can break out of the loop immediately to finish the sort early.
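A minimal sketch of this optimization (the flag name swapped is my own choice, not from the original code):
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    for (size_t i = 0; i < sz; ++i) {
        bool swapped = false;                  // did this pass swap anything?
        for (size_t j = 0; j + 1 < sz - i; ++j)
            if (store[j] > store[j + 1]) {
                std::swap(store[j], store[j + 1]);
                swapped = true;
            }
        if (!swapped)                          // no swap: already ordered, stop early
            break;
    }
}
On an already-sorted input this version finishes after a single O(n) pass, which is bubble sort's best case.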
The overall time complexity of bubble sort is O(n^2).
Bubble sort is a stable sorting method.
2.2 Cocktail sort (shaker sort)
This sort, also called bidirectional bubble sort, is a slightly improved version of bubble sort. Ordinary bubble sort traverses in only one direction on each pass. Shaker sort makes two passes per round: first traversing from front to back, then from the back toward the front. This alternating two-way comparison not only floats the small values up toward the front but also sinks the large values toward the back, so its efficiency is comparatively higher.
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    if (sz < 2)                          // guard: sz - 1 would underflow on an empty vector
        return;
    size_t left = 0, right = sz - 1;
    size_t i = 0;
    while (left < right) {
        // first pass: left to right, sinking the largest to the back
        for (i = left; i < right; ++i) {
            if (store[i] > store[i + 1])
                std::swap(store[i], store[i + 1]);
        }
        --right;
        // second pass: right to left, floating the smallest to the front
        for (i = right; i > left; --i) {
            if (store[i] < store[i - 1])
                std::swap(store[i], store[i - 1]);
        }
        ++left;
    }
}
This sort is also a stable sort algorithm.
2.3 Simple Selection sort
The process of simple selection sort: in the i-th pass (1 ≤ i ≤ n), select the record with the smallest key from the remaining n-i+1 records via n-i key comparisons, swap it directly with the i-th record, and then fill the remaining positions the same way.
From this process it may seem that the number of comparisons is not reduced, but the defining feature is that very few data moves (swaps) are performed, so the efficiency is relatively high.
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    size_t index = 0;
    for (size_t i = 0; i < sz; ++i) {
        index = i;                       // position of the smallest element seen so far
        for (size_t j = i + 1; j < sz; ++j)
            if (store[j] < store[index])
                index = j;
        if (i != index)
            std::swap(store[i], store[index]);
    }
}
Because the swap can carry an element past other records with equal keys, simple selection sort is not a stable sorting algorithm.
The overall time complexity of simple selection sort is O(n^2).
2.4 Insertion Sort
2.4.1 Direct Insertion Sort
The basic operation of direct insertion sort is to insert one record into an already sorted ordered table, yielding a new ordered table whose record count is larger by 1.
Direct insertion is the representative insertion method. In a classic array implementation you can place a sentinel at the head of the sequence: copy the i-th record into the sentinel slot, which frees its position and makes it easy to shift larger records backward; once the insertion point is found, copy the record held at the sentinel position into the vacated slot.
virtual void do_sort(std::vector<int>& store) override {
    std::list<int> tmp;                  // needs #include <list>
    auto it = tmp.begin();
    for (auto& i : store)
    {
        // advance past all elements <= i, so equal elements
        // keep their original order (stability)
        for (it = tmp.begin(); it != tmp.end() && *it <= i; ++it)
            continue;
        tmp.insert(it, i);
    }
    store.assign(tmp.cbegin(), tmp.cend());
}
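The list-based version above avoids shifting elements by linking nodes. For reference, here is a minimal in-place sketch of the shifting approach described earlier, under the same do_sort signature (a temporary variable stands in for the classic sentinel slot; this is an illustrative sketch, not the uploaded code):
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    for (size_t i = 1; i < sz; ++i) {
        int elem = store[i];                     // plays the role of the sentinel copy
        size_t j = i;
        while (j > 0 && store[j - 1] > elem) {   // shift larger records backward
            store[j] = store[j - 1];
            --j;
        }
        store[j] = elem;                         // drop the saved record into the vacancy
    }
}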
The time complexity of direct insertion sort is O(n^2).
Direct insertion sort is a stable sorting algorithm.
2.4.2 Binary Insertion Sort
Binary insertion sort, also called half-interval insertion sort, is a variation and improvement of the direct insertion sort above; its main optimization is locating the insertion point with binary search. The previous method inserted directly into a list container; the following simulates insertion sort with the traditional array method.
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    for (size_t i = 1; i < sz; ++i) {
        int elem = store[i];
        int left = 0;
        int right = static_cast<int>(i) - 1;
        while (left <= right)            // when left == right we still must decide which side of mid
        {
            int mid = (left + right) / 2;    // rounds toward zero
            if (elem > store[mid])
                left = mid + 1;          // must be mid+1 / mid-1, otherwise two
            else                         // adjacent positions would loop forever
                right = mid - 1;
        }
        for (size_t j = i; j > static_cast<size_t>(left); --j)
            store[j] = store[j - 1];
        store[left] = elem;
    }
}
2.5 Shell Sort
The idea of Shell sort: divide the sequence to be sorted into several subgroups; since each subgroup now has fewer records, a direct insertion sort can be applied within each subsequence, and once the whole sequence is basically in order (note: only basically in order), a final direct insertion sort is run over all the records.
However, the grouping here is not simple contiguous grouping; instead, records separated by some "increment" (gap) form a subsequence, which makes jumping exchanges possible and thus improves sorting efficiency. As the increment decreases, the jump stride shrinks and the whole sequence becomes more and more "basically ordered". You must also take care that the final increment is 1, so that the last fine-grained pass makes the entire sequence ordered.
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    size_t gap = sz >> 1;
    while (gap) {
        // elements spaced gap apart form one group; each group's first
        // element needs no sorting, so start from index gap
        for (size_t i = gap; i < sz; ++i) {
            int elem = store[i];
            size_t j = i;
            while (j >= gap && elem < store[j - gap]) {
                store[j] = store[j - gap];   // shift backward by one gap
                j -= gap;
            }
            store[j] = elem;
        }
        gap >>= 1;
    }
}
There is currently no uniformly best way to choose the increments; one commonly cited choice is Knuth's sequence h = 3h + 1. With good increments, the time complexity of Shell sort is about O(n^(3/2)).
Shell sort is not a stable sorting algorithm, because records move in jumps.
2.6 Heap Sort
Heap sort is an improvement over simple selection sort that exploits the heap data structure.
A heap is a complete binary tree with the following property: the value of every node is greater than or equal to the values of its left and right children, called a max-heap (large top heap); or the value of every node is less than or equal to the values of its children, called a min-heap (small top heap). The root node is therefore the largest (smallest) of all nodes in the heap, and larger (smaller) nodes sit closer to the root.
By the properties of a complete binary tree stored with 1-based indexing, the parent of node i is node i/2 (integer division), and its left and right children are 2i and 2i+1. (The code below uses 0-based indexing, where the children of index i are 2i+1 and 2i+2.)
The following uses the max-heap as the example. The basic idea: build the sequence to be sorted into a max-heap, so the maximum of the entire sequence is the root at the top of the heap. Remove it (in practice, swap it with the last element of the heap array, so the last element now holds the maximum), then rebuild the remaining n-1 elements into a max-heap (adjust them back into heap shape), which yields the second largest of the n elements. Repeating this produces an ordered sequence.
void build_heap(std::vector<int>& store, size_t curr /* parent */, size_t last /* tail, inclusive */) {
    size_t child = 2 * curr + 1;         // left child
    int elem = store[curr];
    while (child <= last) {
        // pick the larger of the two children
        if (child < last && store[child] < store[child + 1])
            ++child;
        if (elem >= store[child])
            break;
        // promote the child and keep sifting down; the other
        // subtree is untouched and remains a valid heap
        store[curr] = store[child];
        curr = child;
        child = 2 * curr + 1;
    }
    store[curr] = elem;
}
virtual void do_sort(std::vector<int>& store) override {
    const size_t sz = store.size();
    if (sz < 2)
        return;
    // build the heap first, from the last parent ((sz-1)-1)/2 up to the root
    for (int i = (static_cast<int>(sz) - 2) / 2; i >= 0; --i)
        build_heap(store, i, sz - 1);
    for (int i = static_cast<int>(sz) - 1; i > 0; --i) {
        std::swap(store[0], store[i]);   // move the current maximum to the end
        build_heap(store, 0, i - 1);     // restore the heap on the remaining prefix
    }
}
The heap-building loop starts at ((sz-1)-1)/2, the last node that has children, because only such nodes need adjusting. The key operation in build_heap (the classic HeapAdjust) is to sift a node down, repeatedly promoting the larger of its two children into the parent position until the saved element finds its place; the sort then repeatedly swaps the root maximum to the end of the array.
A single heap adjustment costs O(log n), so the overall time complexity of heap sort is O(n log n). Because the comparisons and exchanges of records happen in leaps, heap sort is also an unstable sorting method.
2.7 Merge Sort
Merging, in data structures, is defined as the process of combining two or more ordered tables into one new ordered table.
The principle of merge sort: treat the initial sequence of n records as n ordered subsequences of length 1; merge them pairwise to obtain ceil(n/2) ordered subsequences of length 2 (or 1); merge pairwise again, and so on, until a single ordered sequence of length n is obtained. This method is called 2-way merge sort.
Concretely, do_merge_sort splits the range recursively and do_merge_merge merges the two halves; it is a very typical form of recursive call.
// call as do_merge_sort(store, 0, store.size() - 1) on a non-empty vector
void do_merge_sort(std::vector<int>& store,
                   size_t beg, size_t last) {
    if (beg < last) {
        size_t mid = (beg + last) / 2;
        do_merge_sort(store, beg, mid);
        do_merge_sort(store, mid + 1, last);
        do_merge_merge(store, beg, last, mid);
    }
}
void do_merge_merge(std::vector<int>& store,
                    size_t beg, size_t last, size_t mid /* included in first half */)
{
    size_t index_1 = beg, index_2 = mid + 1;
    size_t index_s = 0;
    std::vector<int> tmp_vec(last - beg + 1);
    while (index_1 <= mid || index_2 <= last) {
        if (index_1 > mid) {             // first half exhausted: drain the second
            while (index_2 <= last)
                tmp_vec[index_s++] = store[index_2++];
        }
        else if (index_2 > last) {       // second half exhausted: drain the first
            while (index_1 <= mid)
                tmp_vec[index_s++] = store[index_1++];
        }
        else {
            // <= (not <) so an equal element from the first half
            // goes first, keeping the sort stable
            if (store[index_1] <= store[index_2])
                tmp_vec[index_s++] = store[index_1++];
            else
                tmp_vec[index_s++] = store[index_2++];
        }
    }
    // copy the merged result back into the corresponding section
    for (size_t i = 0; i < tmp_vec.size(); ++i)
        store[beg + i] = tmp_vec[i];
}
A merge must scan and combine all elements, so one level of merging costs O(n); since the recursion forms a hierarchy like a binary tree, traversing the levels takes O(log n), and the overall time complexity is O(n log n).
Merge sort only compares elements pairwise and performs no jumping moves, so merge sort is a stable sorting algorithm.
2.8 Quick Sort
Quick sort, also known as partition-exchange sort, is essentially an upgrade of bubble sort and belongs to the exchange-sort family. Quick sort increases the distance over which records are compared and moved: larger records move directly toward the back and smaller records directly toward the front, reducing the total number of comparisons and exchange moves.
Basic idea of quick sort: each time, pick one pivot record, and with one partitioning pass split the records into two independent parts, where every record in one part is smaller than every record in the other; then sort the two parts recursively until the entire sequence is ordered.
The core of quick sort is data partitioning plus recursive invocation.
// call as do_quick_sort(store, 0, store.size() - 1) on a non-empty vector
void do_quick_sort(std::vector<int>& store, size_t left, size_t right) {
    size_t i = left, j = right;
    int pivot = store[i];
    while (i < j) {
        while (i < j && store[j] >= pivot)   // >= rather than >, or equal keys would loop forever
            --j;
        if (i < j)
            std::swap(store[i], store[j]);   // pivot == store[j] now
        while (i < j && store[i] <= pivot)   // <= rather than <, for the same reason
            ++i;
        if (i < j)
            std::swap(store[i], store[j]);   // pivot == store[i] now
    }
    if (left != i)                           // i == j: the pivot's final position
        do_quick_sort(store, left, i - 1);
    if (right != j)
        do_quick_sort(store, j + 1, right);
}
The while loop above is the core of the entire quick sort, and its comparisons are cleverly designed: on the right it looks for an element smaller than the pivot and swaps it into the [i] position in the first half, then in the front it looks for a value larger than the pivot and swaps it into the [j] position in the second half, so every data movement in the whole process is efficient and useful. When the scan cursors i and j meet, the partitioning pass ends.
In the best case (the pivot value happens to fall in the middle of the range being sorted), the time complexity of quick sort is O(n log n); in the worst case it is O(n^2); on average, quick sort is O(n log n).
Quick sort is an unstable sorting method, because keys are compared and exchanged across long distances.
The choice of the pivot is critical to the performance of the whole sorting algorithm. The basic version selects the leftmost value as the pivot; derived variants differ mainly in how the pivot value is chosen.
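As one example of an improved pivot choice, here is a sketch of the common median-of-three strategy (the helper median_of_three is my own illustrative addition, not part of the original code): it moves the median of the first, middle, and last elements into the left position before partitioning.
// pick the median of store[left], store[mid], store[right] and place it at
// store[left], so the partition above can keep using the leftmost pivot
void median_of_three(std::vector<int>& store, size_t left, size_t right) {
    size_t mid = left + (right - left) / 2;
    if (store[left] > store[right])
        std::swap(store[left], store[right]);  // now store[left] <= store[right]
    if (store[mid] > store[right])
        std::swap(store[mid], store[right]);   // now store[right] is the largest
    if (store[mid] > store[left])
        std::swap(store[mid], store[left]);    // the median ends up at store[left]
}
Calling median_of_three(store, left, right) at the top of do_quick_sort, before reading the pivot, tends to avoid the O(n^2) worst case on already-sorted input.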
2.9 Linear time sorting
2.9.1 Counting Sort
The basic algorithm is as follows:
(1). Find the minimum n1 and maximum n2 of the sequence, and create a counting array of length n2-n1+1;
(2). Iterate over the elements and, based on each element's value, increment the corresponding counting-array entry;
(3). This step is the crucial one: turn the counts into prefix sums, count_arr[k] += count_arr[k-1]; this establishes the mapping between a value and its final position in the output;
(4). Iterate over the elements again, look up each element's final sorted position in count_arr according to its value, and write it into the sorted result.
The disadvantage is that the space requirements are relatively large.
virtual void do_sort(std::vector<int>& store) override {
    if (store.empty())
        return;
    // std::max_element needs <algorithm>; this version counts from 0
    // up to the maximum rather than offsetting by the minimum
    auto it = std::max_element(store.cbegin(), store.cend());
    int max_item = *it;
    std::vector<int> result(store.size());
    std::vector<int> bucket(max_item + 1);
    for (auto& elem : store)
        bucket[elem]++;
    // key step: turn counts into prefix sums, mapping value -> final index
    for (size_t i = 1; i < bucket.size(); ++i)
        bucket[i] += bucket[i - 1];
    // walk backward and decrement the count, so duplicates get distinct
    // slots and equal elements keep their relative order (stable)
    for (auto rit = store.crbegin(); rit != store.crend(); ++rit)
        result[--bucket[*rit]] = *rit;
    store = result;
}
2.9.2 Bucket Sort
Bucket sort is an upgraded version of counting sort: a mapping function distributes the data into buckets; if a bucket ends up holding more than one element, it can be sorted internally with quick sort or the like; finally the buckets' data are merged back together. In pseudocode:
(1). Determine the number of buckets: bucketCount = floor((maxValue - minValue) / bucketSize) + 1;
(2). Map each element to its bucket: buckets[floor((arr[i] - minValue) / bucketSize)].push(arr[i]);
(3). Merge the buckets from small to large: arr.push(buckets[i][j]);
The mapping function f(k) in bucket sort plays the same role as the partition in quick sort: it splits a large amount of data in one step into basically ordered blocks (buckets), after which only the small amount of data inside each bucket needs a further comparison sort. Given enough memory, more buckets are better, ensuring that each bucket holds as few elements as possible, ideally just one.
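A minimal C++ sketch of this scheme (bucket_sort, the bucket_size parameter, and the use of std::sort inside each bucket are my own illustrative choices, not from the original text):
#include <algorithm>
#include <vector>

void bucket_sort(std::vector<int>& arr, int bucket_size = 5) {
    if (arr.size() < 2)
        return;
    auto mm = std::minmax_element(arr.cbegin(), arr.cend());
    int min_value = *mm.first, max_value = *mm.second;
    // (1) number of buckets
    size_t bucket_count = (max_value - min_value) / bucket_size + 1;
    std::vector<std::vector<int>> buckets(bucket_count);
    // (2) map each element to its bucket
    for (int v : arr)
        buckets[(v - min_value) / bucket_size].push_back(v);
    // (3) sort each bucket, then concatenate from small to large
    arr.clear();
    for (auto& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end());
        arr.insert(arr.end(), bucket.cbegin(), bucket.cend());
    }
}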
2.9.3 Radix Sort
Radix sort comes in two flavors: sorting from the most significant digit (MSD) and sorting from the least significant digit (LSD); most implementations use LSD. The main idea is to distribute the records into buckets repeatedly, digit by digit from the lowest to the highest; once the pass on the highest digit finishes, the entire data set is ordered.
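A minimal LSD sketch for non-negative integers, reusing the stable counting-sort placement from 2.9.1 as the per-digit bucket pass (the function name radix_sort_lsd is my own):
#include <algorithm>
#include <vector>

void radix_sort_lsd(std::vector<int>& store) {   // non-negative integers only
    if (store.empty())
        return;
    int max_item = *std::max_element(store.cbegin(), store.cend());
    std::vector<int> result(store.size());
    for (long long exp = 1; max_item / exp > 0; exp *= 10) {  // one pass per decimal digit
        std::vector<int> bucket(10, 0);
        for (int v : store)
            bucket[(v / exp) % 10]++;
        for (size_t i = 1; i < 10; ++i)
            bucket[i] += bucket[i - 1];                       // prefix sums, as in counting sort
        for (auto rit = store.crbegin(); rit != store.crend(); ++rit)
            result[--bucket[(*rit / exp) % 10]] = *rit;       // stable placement
        store = result;
    }
}
Because each digit pass is stable, the order established by lower digits is preserved, which is exactly why the LSD approach works.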
3. Summary of Sorting Algorithms
Others have summarized how to choose a sorting algorithm roughly as follows: first, when the amount of data is small, choose insertion sort or selection sort rather than bubble sort; second, when the amount of data is large and space is at a premium, choose quick sort or heap sort; third, when the amount of data is large but extra space can be spared, bucket sort is an option; finally, when new data is being added to an already sorted record, choose insertion sort.
In terms of stability, merge sort is a good algorithm for applications that care a great deal about sort stability.
The collated code has been uploaded; reviews are equally welcome.