Advanced sorting for large data sets: Shell sort (C++)

Last Update:2014-08-13 Source: Internet

Author: User

Tags sorts

Introduction to the Shell sort algorithm

Common sorting algorithms fall into two groups by average time complexity:
O(n^2): bubble sort, selection sort, insertion sort
O(n log n): merge sort, quicksort, heapsort

The O(n^2) algorithms are usually called simple sorts, and the O(n log n) algorithms advanced sorts.
The larger the data set, the wider the efficiency gap between the two groups. At around 10,000 elements both groups finish within milliseconds, but beyond 100,000 elements a simple sort often needs seconds, minutes, or even hours, while an advanced sort still finishes quickly.

Shell sort, the subject of this article, evolved from insertion sort. It also counts as an advanced sort, although its time complexity of O(n^1.5) (with a suitable increment sequence) is slightly worse than that of the other advanced sorts, yet much better than the O(n^2) of the simple sorts. Shell sort has no obvious weak point: it does not need a large auxiliary buffer as merge sort does, its worst case does not differ greatly from its average case as quicksort's does, and the code is short and easy to implement.
In general, for a medium-sized data set you can try Shell sort first and switch to another advanced sort only if it turns out to be too slow.

A timing comparison of the advanced sorts on large inputs (bubble sort was included only for contrast) showed that Shell sort takes several times longer than the O(n log n) sorts. At 10,000 elements the difference is negligible, but above 100,000 elements Shell sort is clearly much slower than the other advanced sorts, and the gap keeps growing with the data size.

In summary: Shell sort works well for medium-sized data sets but is not the best choice for very large ones.

Stability: unstable
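
The instability can be seen on a four-element input. The following is a minimal sketch, not the article's implementation: the Record type and the function names are made up for this example, and it assumes the n/2, n/4, ..., 1 increments. The gapped insertion sort it relies on is explained in the sections that follow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A record whose sort key is `key`; `tag` only records the original order.
struct Record {
    int key;
    char tag;
};

// Shell sort on `key` only, using the increments n/2, n/4, ..., 1.
void shellSortByKey(std::vector<Record>& v) {
    for (std::size_t gap = v.size() / 2; gap > 0; gap /= 2) {
        for (std::size_t i = gap; i < v.size(); ++i) {
            Record temp = v[i];
            std::size_t j = i;
            // Shift larger keys one gap to the right; equal keys stay put.
            while (j >= gap && v[j - gap].key > temp.key) {
                v[j] = v[j - gap];
                j -= gap;
            }
            v[j] = temp;
        }
    }
}

// Sorts the records 3a 3b 1c 4d and returns the tags in their final order.
std::string instabilityDemo() {
    std::vector<Record> v = {{3, 'a'}, {3, 'b'}, {1, 'c'}, {4, 'd'}};
    shellSortByKey(v);
    std::string tags;
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Sanity check: the keys themselves end up in non-decreasing order.
        assert(i == 0 || v[i - 1].key <= v[i].key);
        tags += v[i].tag;
    }
    return tags;
}
```

instabilityDemo() returns "cbad": the pass with increment 2 moves 3a behind 3b, and the final pass with increment 1 never swaps equal keys back. A stable sort would return "cabd".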

Basic concepts

What is an increment?
The increment is also called the step (or gap). As an analogy: given a row of books on a shelf, pick out every x-th book; the value x is the increment.
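
In array terms, an increment x splits the positions into x groups. The sketch below illustrates this; groupsForIncrement is a hypothetical helper named for this example only, and positions are assumed to be zero-based.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits the indices 0..n-1 into `increment` groups; group s holds
// s, s + increment, s + 2 * increment, ...
std::vector<std::vector<std::size_t>> groupsForIncrement(std::size_t n,
                                                         std::size_t increment) {
    assert(increment > 0);
    std::vector<std::vector<std::size_t>> groups(increment);
    for (std::size_t i = 0; i < n; ++i) {
        groups[i % increment].push_back(i);
    }
    return groups;
}
```

For 10 books and an increment of 4, the groups are {0, 4, 8}, {1, 5, 9}, {2, 6} and {3, 7}.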

How Shell sort works
Textbook description:
First take an integer d1 smaller than n as the first increment and divide the records into groups, so that all records whose positions are a multiple of d1 apart fall into the same group. Run a straight insertion sort within each group. Then take a second increment d2 < d1 and repeat the grouping and sorting, and so on until the increment dt = 1 (dt < dt-1 < ... < d2 < d1), at which point all records form a single group that is sorted by one straight insertion sort.
Plain-language description:
Take the bookshelf again. Starting from the first book, put a red sticker on every x-th book. Then, starting from the second book, put a blue sticker on every x-th book, and continue with a new color for each starting position until every book has a sticker. Insertion-sort the books of each color among themselves. Then peel off all the stickers and repeat with a smaller step y (y < x): put a sticker on every y-th book, and insertion-sort each color once all books are labeled. Keep alternating between labeling and sorting with smaller and smaller steps until the step is 1, when every book carries the same color and one final insertion sort over the whole row finishes the job.
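
The sticker procedure can be traced on a small array. The sketch below is an illustration under stated assumptions, not the article's implementation: gapInsertionPass and shellSortTrace are names made up for this example, and each pass walks the array once and handles all groups in an interleaved fashion instead of one group at a time. The result of a pass is the same either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pass: insertion-sorts every group of elements that lie `gap` apart.
void gapInsertionPass(std::vector<int>& a, std::size_t gap) {
    assert(gap > 0);
    for (std::size_t i = gap; i < a.size(); ++i) {
        int temp = a[i];
        std::size_t j = i;
        // Shift larger members of the same group one gap to the right.
        while (j >= gap && a[j - gap] > temp) {
            a[j] = a[j - gap];
            j -= gap;
        }
        a[j] = temp;
    }
}

// Runs one pass per gap and records a copy of the array after each pass.
std::vector<std::vector<int>> shellSortTrace(std::vector<int> a,
                                             const std::vector<std::size_t>& gaps) {
    std::vector<std::vector<int>> snapshots;
    for (std::size_t gap : gaps) {
        gapInsertionPass(a, gap);
        snapshots.push_back(a);
    }
    return snapshots;
}
```

For the input 8 3 6 1 7 2 5 4 and the steps 4, 2, 1, the snapshots are 7 2 5 1 8 3 6 4, then 5 1 6 2 7 3 8 4, then 1 2 3 4 5 6 7 8. Each pass leaves the array closer to sorted, so the final insertion sort has little left to move.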

Implementation

#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

const int N = 100000;  // 100,000 elements
int a[N];

/* Insertion sort of one group.
   a - array to sort, s - first index of the group,
   delta - increment, len - length of the array */
void ShellInsert(int a[], int s, int delta, int len) {
    for (int i = s + delta; i < len; i += delta) {
        int temp = a[i];
        int k = i;
        // Shift larger members of the group one step (delta) to the right.
        while (k - delta >= s && a[k - delta] > temp) {
            a[k] = a[k - delta];
            k -= delta;
        }
        a[k] = temp;
    }
}

/* Shell sort with the Hibbard increments 2^k - 1, ..., 7, 3, 1.
   a - array to sort, len - length of the array */
void ShellSort(int a[], int len) {
    int delta = 1;
    while (2 * delta + 1 < len) delta = 2 * delta + 1;  // largest 2^k - 1 below len
    while (delta > 0) {  // shrink the increment until the pass with increment 1 is done
        for (int i = 0; i < delta; i++) ShellInsert(a, i, delta, len);
        delta = (delta + 1) / 2 - 1;  // 2^k - 1 becomes 2^(k-1) - 1
    }
}

/* Shell sort with Shell's increments len/2, len/4, ..., 1. */
void ShellSort2(int a[], int len) {
    int delta = len / 2;
    while (delta > 0) {
        for (int i = 0; i < delta; i++) ShellInsert(a, i, delta, len);
        delta /= 2;
    }
}

void PrintArray(int a[], int length) {
    cout << "Array contents: ";
    for (int i = 0; i < length; i++) {
        if (i == 0) cout << a[i];
        else cout << "," << a[i];
    }
    cout << endl;
}

/* Fills the global array with pseudo-random values in [0, 99999]. */
void FillRandom() {
    for (int i = 0; i < N; i++) a[i] = rand() % 100000;
}

int main() {
    clock_t begin;
    float tim;
    cout << "Shell sort of 100,000 numbers:" << endl;

    FillRandom();
    begin = clock();
    ShellSort2(a, N);
    tim = float(clock() - begin) / CLOCKS_PER_SEC;
    cout << "Shell increment sequence run time: " << tim << "s" << endl;

    FillRandom();
    begin = clock();
    ShellSort(a, N);
    tim = float(clock() - begin) / CLOCKS_PER_SEC;
    cout << "Hibbard increment sequence run time: " << tim << "s" << endl;
    return 0;
}

Efficiency of Shell sort
The increment sequence is the most important factor in the efficiency of Shell sort. No perfect increment sequence is known so far, and which sequence is best remains an open mathematical problem.

Consider the following two increment sequences:

n/2, n/4, n/8, ..., 1

1, 3, 7, ..., 2^k - 1

The first is Shell's original increment sequence. With it, the worst-case time complexity of Shell sort is O(n^2).

The second is the Hibbard increment sequence. With it, the worst-case time complexity of Shell sort is O(n^(3/2)).
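
Both sequences are easy to generate for a given n. The following sketch lists them largest first, which is the order in which the sort applies them; shellIncrements and hibbardIncrements are hypothetical helper names, and the choice to start the Hibbard sequence at the largest 2^k - 1 below n is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shell's increments for n elements: n/2, n/4, ..., 1 (integer division).
std::vector<std::size_t> shellIncrements(std::size_t n) {
    std::vector<std::size_t> gaps;
    for (std::size_t gap = n / 2; gap > 0; gap /= 2) gaps.push_back(gap);
    // The last pass must be a plain insertion sort, so the sequence ends in 1.
    assert(gaps.empty() || gaps.back() == 1);
    return gaps;
}

// Hibbard's increments below n, largest first: ..., 7, 3, 1.
std::vector<std::size_t> hibbardIncrements(std::size_t n) {
    std::vector<std::size_t> gaps;
    if (n < 2) return gaps;  // nothing to sort
    std::size_t gap = 1;
    while (2 * gap + 1 < n) gap = 2 * gap + 1;  // largest 2^k - 1 below n
    // Integer division by 2 turns 2^k - 1 into 2^(k-1) - 1, and 1 into 0.
    for (; gap > 0; gap /= 2) gaps.push_back(gap);
    assert(gaps.back() == 1);
    return gaps;
}
```

For n = 10, shellIncrements returns 5, 2, 1 and hibbardIncrements returns 7, 3, 1. Neighbouring Hibbard increments never share a common factor, whereas Shell's increments often do (for example 8, 4, 2 when n = 16), which is one commonly cited reason for the difference in worst-case behavior.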

Sorting 100,000 unordered numbers once with Shell's increment sequence and once with the Hibbard increment sequence, and comparing the running times, shows that at this scale the Hibbard sequence is several times faster than Shell's. The Hibbard sequence is not perfect either, but it already performs very well, so practical Shell sort implementations more often use it.
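
Wall-clock times depend on the machine and compiler, so a rough, repeatable alternative is to count element shifts. The sketch below does that for both sequences on the same pseudo-random input. It is an assumption-laden illustration rather than the measurement described above: all names, the fixed seed 12345 and the value range are made up for this example, and shift counts are only a proxy for running time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Shell sorts `a` with the given gaps (largest first, ending in 1) and
// returns how many element shifts were performed.
std::size_t shellSortCountingShifts(std::vector<int>& a,
                                    const std::vector<std::size_t>& gaps) {
    assert(!gaps.empty() && gaps.back() == 1);
    std::size_t shifts = 0;
    for (std::size_t gap : gaps) {
        for (std::size_t i = gap; i < a.size(); ++i) {
            int temp = a[i];
            std::size_t j = i;
            while (j >= gap && a[j - gap] > temp) {
                a[j] = a[j - gap];
                j -= gap;
                ++shifts;
            }
            a[j] = temp;
        }
    }
    return shifts;
}

// Hypothetical helper: n pseudo-random values in [0, 99999], fixed seed.
std::vector<int> randomInput(std::size_t n) {
    std::mt19937 rng(12345);
    std::uniform_int_distribution<int> dist(0, 99999);
    std::vector<int> a(n);
    for (int& x : a) x = dist(rng);
    return a;
}

struct ShiftCounts {
    std::size_t shell;    // shifts with n/2, n/4, ..., 1
    std::size_t hibbard;  // shifts with 2^k - 1, ..., 7, 3, 1
};

// Sorts two copies of the same input, one per increment sequence.
ShiftCounts compareIncrementSequences(std::size_t n) {
    assert(n >= 2);
    std::vector<std::size_t> shellGaps, hibbardGaps;
    for (std::size_t g = n / 2; g > 0; g /= 2) shellGaps.push_back(g);
    std::size_t h = 1;
    while (2 * h + 1 < n) h = 2 * h + 1;  // largest 2^k - 1 below n
    for (; h > 0; h /= 2) hibbardGaps.push_back(h);

    std::vector<int> a = randomInput(n);
    std::vector<int> b = a;
    ShiftCounts counts{shellSortCountingShifts(a, shellGaps),
                       shellSortCountingShifts(b, hibbardGaps)};
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    return counts;
}
```

Calling compareIncrementSequences(100000) and printing the two fields gives a machine-independent figure to set beside the timings. On random input the two counts may be closer than the worst-case bounds suggest, so the result should be read as one data point rather than a proof.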

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
