I have always known that insertion sort is more efficient when the input is relatively small, but I never knew exactly how small. Today I wrote several small programs to measure how the running time of several sorting algorithms grows with the input size.
Test Environment
CPU: 3.0 GHz dual-core; memory: 1 GB; OS: CentOS (VM); compiler: g++ 4.9.1
Random integer arrays are constructed in advance, and the total time each sorting algorithm takes to sort them is measured.
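The test programs themselves are not shown, so the following is only a minimal sketch of how such a measurement could be set up. The names make_inputs and total_sort_time_us, the fixed seed, and the value range are assumptions made for illustration, not the code actually used in these tests.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Build `count` arrays of `size` random integers up front, so that
// generating the data is not part of the measured time.
// (Seed and value range are arbitrary choices for this sketch.)
std::vector<std::vector<int> > make_inputs(std::size_t count, std::size_t size,
                                           unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<std::vector<int> > inputs(count, std::vector<int>(size));
    for (auto& arr : inputs)
        for (auto& x : arr) x = dist(gen);
    return inputs;
}

// Sort every array with `sorter` and return the total elapsed microseconds.
// `inputs` is taken by value, so every algorithm under test starts from
// the same unsorted data.
template <typename Sorter>
long long total_sort_time_us(std::vector<std::vector<int> > inputs, Sorter sorter) {
    auto start = std::chrono::steady_clock::now();
    for (auto& arr : inputs) sorter(arr.begin(), arr.end());
    auto stop = std::chrono::steady_clock::now();
    // Sanity check outside the timed region: the sorter really sorted.
    for (const auto& arr : inputs) assert(std::is_sorted(arr.begin(), arr.end()));
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Any algorithm that accepts a pair of std::vector<int>::iterator values can be passed as `sorter`, for example a lambda that forwards to std::sort. For small inputs, sorting many arrays per measurement helps keep the total above the clock's resolution.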
Insertion sort vs bubble sort
As expected, insertion sort outperforms bubble sort at every input size tested.
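For reference, textbook versions of the two algorithms might look like the sketch below. The exact implementations that were timed are not shown, so these are assumptions. One plausible reason for the gap is that insertion sort shifts each element with a single assignment, while bubble sort performs a full swap for every inversion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort: grow a sorted prefix, shifting larger elements right
// to open a slot for the next element.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];  // one assignment per shifted element
            --j;
        }
        a[j] = key;
    }
    assert(std::is_sorted(a.begin(), a.end()));  // debug-only check
}

// Bubble sort: repeatedly swap adjacent out-of-order pairs; after each
// pass the largest remaining element has reached its final position.
void bubble_sort(std::vector<int>& a) {
    for (std::size_t n = a.size(); n > 1; --n) {
        bool swapped = false;
        for (std::size_t i = 1; i < n; ++i) {
            if (a[i - 1] > a[i]) {
                std::swap(a[i - 1], a[i]);  // a swap is three assignments
                swapped = true;
            }
        }
        if (!swapped) break;  // already sorted, stop early
    }
    assert(std::is_sorted(a.begin(), a.end()));  // debug-only check
}
```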
Insertion sort vs quick sort vs merge sort
When the input size is below 100, insertion sort beats both merge sort and quick sort. Below 200, insertion sort still beats merge sort. Below 30 elements, insertion sort is more than 50% more efficient than quick sort, and below 50 elements it is more than 90% more efficient than merge sort.
Quick sort vs std::sort
std::sort is about 25% faster than an ordinary quick sort. I read the STL code briefly. The basic process of std::sort is as follows:
1. Take the median of the begin, end, and mid elements as the pivot and partition the sequence into two parts.
2.1 If a subsequence is longer than 16 elements, repeat step 1 on it.
2.2 If the recursion depth exceeds a limit on the order of lg(N), heap sort the subsequence instead. // Recursion this deep generally means the partitions are extremely unbalanced.
3. Insertion-sort the sequence.
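The steps above can be sketched roughly as follows. This is a simplified illustration for int vectors, not the actual STL source: the names partition_median3, intro_loop and intro_sort are made up here, and the depth budget of 2 * floor(lg N) is an assumption modeled on what libstdc++ does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int>::iterator Iter;

const std::ptrdiff_t kThreshold = 16;  // cutoff quoted in the text

// Step 1: use the median of the first, middle and last elements as pivot
// and partition (Hoare style). Returns `cut` such that every element of
// [first, cut) is <= pivot and every element of [cut, last) is >= pivot.
// Requires at least three elements.
Iter partition_median3(Iter first, Iter last) {
    Iter mid = first + (last - first) / 2;
    int a = *first, b = *mid, c = *(last - 1);
    int pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
    Iter lo = first, hi = last;
    for (;;) {
        while (*lo < pivot) ++lo;
        --hi;
        while (pivot < *hi) --hi;
        if (!(lo < hi)) return lo;
        std::iter_swap(lo, hi);
        ++lo;
    }
}

// Step 3 helper: plain insertion sort on [first, last).
void insertion_sort(Iter first, Iter last) {
    if (first == last) return;
    for (Iter i = first + 1; i != last; ++i) {
        int key = *i;
        Iter j = i;
        while (j != first && key < *(j - 1)) { *j = *(j - 1); --j; }
        *j = key;
    }
}

// Steps 1 and 2: keep partitioning while a subsequence is longer than the
// threshold; switch to heap sort once the depth budget is used up.
void intro_loop(Iter first, Iter last, int depth_limit) {
    while (last - first > kThreshold) {
        if (depth_limit == 0) {                     // step 2.2
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_limit;
        Iter cut = partition_median3(first, last);  // step 1
        assert(first < cut && cut < last);          // both parts are non-empty
        intro_loop(cut, last, depth_limit);         // step 2.1: right part
        last = cut;                                 // step 2.1: left part (loop)
    }
}

void intro_sort(Iter first, Iter last) {
    int depth_limit = 0;  // 2 * floor(lg N), an assumed budget
    for (std::ptrdiff_t n = last - first; n > 1; n >>= 1) depth_limit += 2;
    intro_loop(first, last, depth_limit);
    insertion_sort(first, last);  // step 3: finish the short unsorted runs
}
```

After intro_loop returns, every element is already inside a short block that sits in its final position range, so the single insertion sort pass at the end only moves elements a few places.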
In its implementation of std::sort, the STL uses 16 as the threshold: subsequences shorter than this are handled by insertion sort as an optimization. However, in the insertion sort vs quick sort experiment above, insertion sort was always better than quick sort for inputs smaller than 100, and especially so for inputs smaller than 30. Why does the STL use 16 as the threshold? To investigate, I ran a small experiment: I added the threshold as a parameter to my quick sort and sorted arrays of 100,000 and 2,000 elements, which gave the corresponding curve of sorting efficiency against threshold:
For thresholds in the range 15 to 100, the modified quick sort is better than std::sort, and the advantage peaks at thresholds of around 30 to 50. In that case, why does std::sort use 16 as the threshold?
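A quick sort with a tunable insertion threshold, as used in this experiment, could look something like the sketch below. The actual test code is not shown, so the function names, the middle-element pivot, and the single final insertion pass are all assumptions. Unlike std::sort, this sketch has no guard against deep recursion on adversarial input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Quick sort on a[lo, hi) that stops partitioning once a range holds at
// most `threshold` elements, leaving those short ranges unsorted.
void quick_sort_coarse(std::vector<int>& a, std::ptrdiff_t lo, std::ptrdiff_t hi,
                       std::ptrdiff_t threshold) {
    while (hi - lo > threshold) {
        int pivot = a[lo + (hi - lo) / 2];
        std::ptrdiff_t i = lo, j = hi - 1;
        while (i <= j) {
            while (a[i] < pivot) ++i;
            while (a[j] > pivot) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        // Now a[lo..j] <= pivot <= a[i..hi-1], with j < i.
        quick_sort_coarse(a, lo, j + 1, threshold);  // left part
        lo = i;                                      // right part (loop)
    }
}

// Sort `a`, handing ranges of at most `threshold` elements to insertion sort.
void threshold_quick_sort(std::vector<int>& a, std::ptrdiff_t threshold) {
    assert(threshold >= 1);
    quick_sort_coarse(a, 0, static_cast<std::ptrdiff_t>(a.size()), threshold);
    // One insertion sort pass over everything: each element is now fewer
    // than `threshold` positions away from its final place.
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) { a[j] = a[j - 1]; --j; }
        a[j] = key;
    }
}
```

Timing threshold_quick_sort over a range of threshold values on the same inputs would produce an efficiency-versus-threshold curve of the kind described above.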
The authors of the STL certainly did not pick a number at random. After comparing the code of quick sort and insertion sort, my preliminary guess is that the choice is related to the number of comparisons and the number of copies each algorithm performs.
Looking at the ratio of insertion sort's comparison count to quick sort's, the ratio of their copy counts (with a swap counted as three copy operations), and the ratio of their running times at different input sizes, we can see that as the input size increases, the efficiency advantage of insertion sort over quick sort gradually shrinks, while its relative number of comparisons and moves grows rapidly (quadratically for insertion sort, versus roughly n log n for quick sort). For some user-defined types, comparison and copy operations often cost several times as much as they do for integers. For std::sort, which has to be general-purpose, choosing 16 as the threshold may be a moderate compromise. When we develop our own sorting algorithms, we can choose the insertion threshold flexibly based on the nature of the input data. For simple data types we can choose a relatively large threshold (such as 30 to 50), while for relatively complex data structures we need to choose a relatively small one.
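One way to obtain such comparison and copy counts is to sort a wrapper type that counts its own operations. The sketch below is an assumption about how this could be done, not the code used for the measurements; Counted, Counters and count_operations are hypothetical names. Because Counted declares its own copy operations, std::swap on it performs exactly three copies, which matches the accounting described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Counters {
    long long comparisons;
    long long copies;
};
Counters g_counters = {0, 0};  // global tally, reset before each run

// An int wrapper that counts every copy and every comparison.
struct Counted {
    int value;
    Counted(int v = 0) : value(v) {}
    Counted(const Counted& o) : value(o.value) { ++g_counters.copies; }
    Counted& operator=(const Counted& o) {
        value = o.value;
        ++g_counters.copies;
        return *this;
    }
};

bool operator<(const Counted& a, const Counted& b) {
    ++g_counters.comparisons;
    return a.value < b.value;
}

// Insertion sort written against Counted so that its operations are tallied.
void counted_insertion_sort(std::vector<Counted>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        Counted key = a[i];
        std::size_t j = i;
        while (j > 0 && key < a[j - 1]) { a[j] = a[j - 1]; --j; }
        a[j] = key;
    }
}

// Run `sorter` on a counted copy of `input` and return the tallies.
template <typename Sorter>
Counters count_operations(const std::vector<int>& input, Sorter sorter) {
    std::vector<Counted> data(input.begin(), input.end());
    g_counters.comparisons = 0;  // ignore any copies made during setup
    g_counters.copies = 0;
    sorter(data);
    Counters result = g_counters;
    for (std::size_t i = 1; i < data.size(); ++i)
        assert(data[i - 1].value <= data[i].value);  // result is sorted
    return result;
}
```

Calling count_operations with counted_insertion_sort and with a wrapper around another sort on the same input gives the two tallies whose ratios can then be compared across input sizes.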
Merge sort vs std::stable_sort
std::stable_sort is about 50% more efficient than a recursive merge sort, mainly because:
1. stable_sort first splits the input sequence into groups of seven elements and insertion-sorts each group.
2. When merging the insertion-sorted runs, it avoids some data copies by alternating the merge direction between the sequence and a buffer: mergeto(A, B, step), then mergeto(B, A, step * 2).
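The two points above can be sketched as a bottom-up merge sort. This is a simplified illustration on int vectors, not the STL source: merge_to and chunked_merge_sort are names chosen here, with merge_to standing in for the mergeto(A, B, step) operation mentioned above, and the buffer handling is reduced to a second vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kChunk = 7;  // group size quoted in the text

// Merge adjacent sorted runs of length `step` from `src` into `dst`,
// producing sorted runs of length 2 * step. Both vectors have equal size.
void merge_to(const std::vector<int>& src, std::vector<int>& dst, std::size_t step) {
    const std::size_t n = src.size();
    for (std::size_t lo = 0; lo < n; lo += 2 * step) {
        std::size_t mid = std::min(lo + step, n);
        std::size_t hi = std::min(lo + 2 * step, n);
        std::merge(src.begin() + lo, src.begin() + mid,
                   src.begin() + mid, src.begin() + hi,
                   dst.begin() + lo);
    }
}

void chunked_merge_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    // Point 1: insertion-sort each group of kChunk elements.
    for (std::size_t lo = 0; lo < n; lo += kChunk) {
        std::size_t hi = std::min(lo + kChunk, n);
        for (std::size_t i = lo + 1; i < hi; ++i) {
            int key = a[i];
            std::size_t j = i;
            while (j > lo && a[j - 1] > key) { a[j] = a[j - 1]; --j; }
            a[j] = key;
        }
    }
    // Point 2: merge A -> B with `step`, then B -> A with `step * 2`.
    // Each pair of passes leaves the data back in `a`, so no separate
    // copy-back is needed after a merge.
    std::vector<int> buf(n);
    for (std::size_t step = kChunk; step < n; step *= 4) {
        merge_to(a, buf, step);
        merge_to(buf, a, step * 2);
    }
    assert(std::is_sorted(a.begin(), a.end()));  // debug-only check
}
```

A typical recursive merge sort merges into a temporary buffer and then copies the result back at every level; alternating the direction removes that copy-back, which is the saving the second point refers to.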
std::sort vs std::stable_sort
When the input size is small (<30), sort is about 30% more efficient than stable_sort. This may be because sort takes fuller advantage of the performance benefit of insertion sort on small inputs.
As the input size increases, the gap between sort and stable_sort gradually narrows to about 5%.