Comparison-Based Internal Sorting Algorithms (3)

Source: Internet
Author: User

I have already posted two articles on internal sorting. (1) At that time, merge sort had not been written. (2) The non-recursive quicksort, which uses a stack<node *>, leaked memory, and the option handling in the main program was poorly supported. So I rewrote the program today.

During large-scale sorting, I found that a file of 1 million integers is 6.8 MB, while an int occupies 4 bytes on the current platform.

1 million = 1,000,000 = 10^6 ≈ 2^20, so the raw data needs 4 B × 2^20 = 4 MB. The file is 6.8 MB rather than 4 MB because the numbers are stored as text: each value takes several digit characters plus separators (whitespace and line breaks), and the file itself carries some overhead. A file of 100 million integers is 673 MB, and one of 1 billion is 6.6 GB. The size limits of various file systems are described below. The maximum memory that can be requested on the current system is 400 GB.

 

C/C++ source code: sort.cpp

Function: seven common internal sorting algorithms

#include <cassert>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <iostream>
#include <stack>
#include <string>
using namespace std;

void bubble(int a[], int n);
void select(int a[], int n);
void insert(int a[], int n);
void shell(int a[], int n);
void merge(int a[], int n);
void heap(int a[], int n);
void quick(int a[], int n);

// Each helper returns 0 on success and 1 on failure.
int deal_opt(string &in, string &out, int &N, int &times, int argc, char *argv[]);
int deal_in(int *&a, int &n, int N, string file);
int deal_out(int a[], int n, string file);

typedef void (*func)(int a[], int n);
func sort_func[] = {bubble, select, insert, shell, merge, heap, quick};
string sort_name[] = {"bubble", "select", "insert", "shell", "merge", "heap", "quick"};
const int sort_num = sizeof(sort_func) / sizeof(func);
const int num_per_line = 10;

int main(int argc, char *argv[])
{
    string infile("din.txt"), outfile("dout_");
    string help("command [-i infile] [-o outfile] [-n arrnum] [-t sorttimes]");
    int num = 0, *arr = NULL, *arr1 = NULL;
    int N = 1024 * 1024 * 1024;   // upper bound on the number of values read
    int sort_times = 1;

    if (0 == deal_opt(infile, outfile, N, sort_times, argc, argv)) {
        if (0 == deal_in(arr, num, N, infile)) {
            arr1 = new int[num];
            for (int i = sort_num - 1; i >= 0; i--) {
                clock_t s = clock();
                for (int j = 0; j < sort_times; j++) {
                    // Restore the unsorted input before every run.
                    memmove(arr1, arr, sizeof(int) * num);
                    (*sort_func[i])(arr1, num);
                }
                double timeused = (double)(clock() - s) / CLOCKS_PER_SEC;
                cout << sort_name[i] << " timeused is " << timeused << "s" << endl;
                if (1 == deal_out(arr1, num, outfile + sort_name[i])) {
                    cout << "Incorrect write outfile" << endl;
                    cout << help << endl;
                }
            }
            delete [] arr;
            delete [] arr1;
        } else {
            cout << "Incorrect read infile" << endl;
            cout << help << endl;
        }
    } else {
        cout << "Incorrect option" << endl;
        cout << help << endl;
    }
    return 0;
}

// Convert a string of decimal digits to an int.
int string_to_num(char str[]) {
    int len = strlen(str), sum = 0;
    for (int i = 0; i < len; i++) {
        assert(str[i] >= '0' && str[i] <= '9');
        sum = sum * 10 + str[i] - '0';
    }
    return sum;
}

int deal_opt(string &in, string &out, int &N, int &times, int argc, char *argv[]) {
    for (int i = 1; i < argc; i++) {
        if (!strncmp("-i", argv[i], 2) && i < argc - 1) {
            in = argv[i + 1];
            i++;
        } else if (!strncmp("-o", argv[i], 2) && i < argc - 1) {
            out = argv[i + 1];
            i++;
        } else if (!strncmp("-n", argv[i], 2) && i < argc - 1) {
            N = string_to_num(argv[i + 1]);
            i++;
        } else if (!strncmp("-t", argv[i], 2) && i < argc - 1) {
            times = string_to_num(argv[i + 1]);
            i++;
        } else {
            return 1;
        }
    }
    return 0;
}

// Read at most N integers from file into a newly allocated array a.
int deal_in(int *&a, int &n, int N, string file) {
    FILE *fptr = NULL;
    if ((fptr = fopen(file.c_str(), "r")) != NULL) {
        // Allocate only after the file opened, so a failure does not leak.
        a = new int[N];
        int data;
        n = 0;
        // Stop on EOF and also on malformed input (fscanf returns 0).
        while (n < N && fscanf(fptr, "%d", &data) == 1)
            a[n++] = data;
        fclose(fptr);
        return 0;
    } else {
        return 1;
    }
}

int deal_out(int a[], int n, string file) {
    FILE *fptr = NULL;
    if ((fptr = fopen(file.c_str(), "w")) != NULL) {
        for (int i = 0; i < n; i++) {
            fprintf(fptr, "%d\t", a[i]);
            if (i % num_per_line == num_per_line - 1)
                fprintf(fptr, "\n");
        }
        fclose(fptr);
        return 0;
    } else {
        return 1;
    }
}

inline void swap(int &a, int &b) {
    int tmp = a;
    a = b;
    b = tmp;
}

void bubble(int a[], int n) {
    for (int i = 1; i < n; i++)
        for (int j = 1; j <= n - i; j++)
            if (a[j] < a[j - 1])
                swap(a[j], a[j - 1]);
}

void select(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        swap(a[i], a[min]);
    }
}

void insert(int a[], int n) {
    for (int i = 1, j; i < n; i++) {
        int tmp = a[i];
        // Find the insertion point from the front; <= keeps the sort stable.
        for (j = 0; j < i && a[j] <= tmp; j++);
        for (int k = i - 1; k >= j; k--)
            a[k + 1] = a[k];
        a[j] = tmp;
    }
}

void shell(int a[], int n) {
    int h;
    for (h = 1; h < n / 9; h = 3 * h + 1);
    for (; h > 0; h /= 3) {
        // Gapped insertion sort. i advances by 1 so that every chain
        // with stride h is sorted, not only the chain starting at 0.
        for (int i = h; i < n; i++) {
            int tmp = a[i], j = i;
            for (; j >= h && a[j - h] > tmp; j -= h)
                a[j] = a[j - h];
            a[j] = tmp;
        }
    }
}

// Recursively sort a[l..r] and merge the two halves through a temporary buffer.
void merge1(int a[], int l, int r) {
    if (l < r) {
        int mid = (r - l) / 2 + l;
        merge1(a, l, mid);
        merge1(a, mid + 1, r);
        int *b = new int[r - l + 1];
        int i = 0, j = l, k = mid + 1;
        while (j <= mid && k <= r)
            b[i++] = a[j] < a[k] ? a[j++] : a[k++];
        while (j <= mid) b[i++] = a[j++];
        while (k <= r) b[i++] = a[k++];
        memmove(a + l, b, sizeof(int) * i);
        delete [] b;
    }
}

void merge(int a[], int n) {
    merge1(a, 0, n - 1);
}

#define LC(i) (2 * (i) + 1)
#define RC(i) (2 * (i) + 2)
// Sift a[i] down within the max-heap a[0..n-1].
void heapify(int a[], int i, int n) {
    while (i < n / 2) {
        int max = a[i], f = 0;
        // Check the index before reading the child to stay in bounds.
        if (LC(i) < n && max < a[LC(i)]) max = a[LC(i)], f = 1;
        if (RC(i) < n && max < a[RC(i)]) f = 2;
        if (1 == f) {
            swap(a[i], a[LC(i)]);
            i = LC(i);
        } else if (2 == f) {
            swap(a[i], a[RC(i)]);
            i = RC(i);
        } else
            break;
    }
}

void heap(int a[], int n) {
    if (n <= 1) return;
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, i, n);
    swap(a[0], a[n - 1]);
    for (int i = n - 2; i >= 1; i--) {
        heapify(a, 0, i + 1);
        swap(a[0], a[i]);
    }
}

struct node {
    node(int a, int b): l(a), r(b) {}
    int l, r;
};

// Non-recursive quicksort; every node is deleted as soon as it is popped.
void quick(int a[], int n) {
    stack<node *> s;
    s.push(new node(0, n - 1));
    while (!s.empty()) {
        int l = s.top()->l;
        int r = s.top()->r;
        delete s.top();
        s.pop();
        if (l < r) {
            int i = l, j = r, pivot = a[l];
            while (i < j) {
                while (i <= j && a[i] <= pivot)
                    i++;
                while (i <= j && a[j] >= pivot)
                    j--;
                if (i < j) swap(a[i], a[j]);
            }
            swap(a[l], a[j]);
            if (j - 1 - l > 0)
                s.push(new node(l, j - 1));
            if (r - j - 1 > 0)
                s.push(new node(j + 1, r));
        }
    }
}

 

C/C++ source code: data.cpp

Function: generate random data of a given size.

#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
#include <string>
using namespace std;

// Convert a string of decimal digits to a long.
long cal(char s[]) {
    long sum = 0;
    int len = strlen(s);
    for (int i = 0; i < len; i++) {
        assert(s[i] >= '0' && s[i] <= '9');
        sum = sum * 10 + (s[i] - '0');
    }
    return sum;
}

const int N = 1024 * 1024;   // generated values lie in [0, N)
const int num_per_line = 10;

// Returns 0 on success and 1 on failure.
int deal_opt(string &out, long &n, int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        if (!strncmp("-o", argv[i], 2) && i < argc - 1) {
            out = argv[i + 1];
            i++;
        } else if (!strncmp("-n", argv[i], 2) && i < argc - 1) {
            n = cal(argv[i + 1]);
            i++;
        } else
            return 1;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    srand(time(NULL));
    long scale = 10000;
    string outfile("data.txt");
    string help("command [-o outfile] [-n num]");

    if (0 == deal_opt(outfile, scale, argc, argv)) {
        FILE *fptr = NULL;
        if ((fptr = fopen(outfile.c_str(), "w")) != NULL) {
            for (long i = 0; i < scale; i++) {
                fprintf(fptr, "%d\t", rand() % N);
                if (i % num_per_line == num_per_line - 1)
                    fprintf(fptr, "\n");
            }
            fclose(fptr);   // flush and close the output file
        } else {
            cout << "Incorrect write outfile" << endl;
            cout << help << endl;
        }
    } else {
        cout << "Incorrect options" << endl;
        cout << help << endl;
    }
    return 0;
}

 

 

Running result:

Data size: 10 unordered elements, 1 million iterations

Data size: 100 million unordered elements, 1024 iterations

Data size: 100 ordered elements, 1024 iterations

Data size: 10 million unordered elements

Data size: 10 million ordered elements (note the performance degradation of quicksort)

Million unordered data sets

Data size: 1 billion unordered elements (only quick, heap, and merge were run; the other O(n^2) methods were dropped)

Note that at small scales, when the whole data set fits in memory and paging plays no role, all three sorts run in O(n lg n) time. The measurements show quicksort to be the fastest, merge sort close behind, and heap sort about 1.5 times slower than the other two. With 1 billion elements, however, the array takes about 4 GB of memory. Quicksort and heap sort have to sweep across the whole array, which can cause thrashing. Merge sort, by its nature, works on data with strong locality at each step and thrashes far less, so at this scale it performs several times better than the other two.

 

 

Reposted: size limits of various file systems

 
NTFS (Windows): supports partitions of up to 2 TB and files of up to 2 TB.

FAT16 (Windows): supports partitions of up to 2 GB and files of up to 2 GB.

FAT32 (Windows): supports partitions of up to 128 GB and files of up to 4 GB.
 
Ext2

Maximum file size: 1 TB

Maximum number of files: limited only by the file system size

Maximum partition/file system size: 4 TB

Maximum file name length: 255 characters

Default minimum/maximum block size: 1024/4096 bytes

Default inode allocation: 1 for every 4096 bytes

Maximum number of mounts before a forced file system check: 20 (configurable)

// Red Hat 9 uses the ext3 file system by default.
 
Ext3

Maximum file size: 1 TB

Maximum number of files: limited only by the file system size

Maximum partition/file system size: 4 TB

Maximum file name length: 255 characters

Default minimum/maximum block size: 1024/4096 bytes

Default inode allocation: 1 for every 4096 bytes

Maximum number of mounts before a forced file system check: 20 (configurable)
 
ReiserFS

Maximum file size: 1 TB

Maximum number of files: 32 K directories, 4.2 billion files

Maximum partition/file system size: 4 TB

Maximum file name length: 255 characters
 
JFS

Minimum file system size: 16 MB

Maximum file size: limited by the architecture

Maximum number of files: limited by the file system size
 
Default minimum/maximum block size: 1024/4096 bytes
 
Default inode allocation: Dynamic
