I have posted two articles on internal sorting before. I rewrote the code today for two reasons: (1) merge sort had not been written at that time, and (2) the stack <node *> in the non-recursive quicksort leaked memory, and the option handling in the main program was poorly supported.
During large-scale sorting, we found that a file of 1 million integers takes 6.8 MB, while an int occupies 4 B on the current platform.
1 million = 1,000,000 = 10^6 ≈ 2^20, so the raw capacity is 4 B * 2^20 = 4 MB. The file is larger, 6.8 MB, because the numbers are stored as text together with spaces, line breaks, and some information about the file itself. The data file for 100 million integers is 673 MB, and the one for 1 billion is 6.6 GB. The size limits of various file systems are described below. The largest amount of memory that can be requested on the current system is 400 GB.
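The gap between the 4 MB in-memory size and the 6.8 MB file can be estimated directly. The following is a minimal sketch, assuming the values are uniform in [0, 1024*1024) and are written as decimal text with one tab per value and one newline per ten values, as the generator further down does. The helper names `total_digits`, `expected_text_bytes` and `in_memory_bytes` are made up for this illustration.

```cpp
#include <cstddef>

// Number of decimal digits needed to print every integer in [0, limit) once.
long long total_digits(long long limit) {
    long long digits = 0;
    for (long long v = 0; v < limit; ++v) {
        int d = 1;
        for (long long x = v; x >= 10; x /= 10) ++d;
        digits += d;
    }
    return digits;
}

// Expected size in bytes of a text file holding `count` values drawn
// uniformly from [0, limit): the digits themselves, one tab per value,
// and one newline per `per_line` values.
double expected_text_bytes(long long count, long long limit, int per_line) {
    double avg_digits = static_cast<double>(total_digits(limit)) / limit;
    return count * (avg_digits + 1.0 + 1.0 / per_line);
}

// Size in bytes of the same values held in memory as an int array.
std::size_t in_memory_bytes(std::size_t count) {
    return count * sizeof(int);
}
```

Under these assumptions, `expected_text_bytes(1000000, 1024 * 1024, 10)` comes out at roughly 7 million bytes, which is in line with the measured 6.8 MB, while `in_memory_bytes(1000000)` is 4 million bytes when an int is 4 B.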
C/C++ source code: sort.cpp
Function: seven common internal sorting algorithms
#include <iostream>
#include <string>
#include <stack>
#include <cassert>
#include <cstring>
#include <cstdio>
#include <ctime>
using namespace std;

void bubble(int a[], int n);
void select(int a[], int n);
void insert(int a[], int n);
void shell(int a[], int n);
void merge(int a[], int n);
void heap(int a[], int n);
void quick(int a[], int n);

// return 0 on success, 1 on failure
int deal_opt(string &in, string &out, int &n, int &times, int argc, char *argv[]);
int deal_in(int *&a, int &n, int max_n, string file);
int deal_out(int a[], int n, string file);

typedef void (*func)(int a[], int n);
func sort_func[] = {bubble, select, insert, shell, merge, heap, quick};
string sort_name[] = {"bubble", "select", "insert", "shell", "merge", "heap", "quick"};
const int sort_num = sizeof(sort_func) / sizeof(func);
const int num_per_line = 10;

int main(int argc, char *argv[])
{
    string infile("din.txt"), outfile("dout_");
    string help("command [-i infile] [-o outfile] [-n arrnum] [-t sorttimes]");
    int num = 0, *arr = NULL, *arr1 = NULL;
    int max_n = 1024 * 1024 * 1024;   // upper bound on values read; override with -n
    int sort_times = 1;

    if (0 == deal_opt(infile, outfile, max_n, sort_times, argc, argv)) {
        if (0 == deal_in(arr, num, max_n, infile)) {
            arr1 = new int[num];
            for (int i = sort_num - 1; i >= 0; i--) {
                clock_t s = clock();
                for (int j = 0; j < sort_times; j++) {
                    // every run sorts a fresh copy of the input
                    memmove(arr1, arr, sizeof(int) * num);
                    (*sort_func[i])(arr1, num);
                }
                double timeused = (double)(clock() - s) / CLOCKS_PER_SEC;
                cout << sort_name[i] << " timeused is " << timeused << "s" << endl;
                if (1 == deal_out(arr1, num, outfile + sort_name[i])) {
                    cout << "Incorrect write outfile" << endl;
                    cout << help << endl;
                }
            }
            delete[] arr;
            delete[] arr1;
        } else {
            cout << "Incorrect read infile" << endl;
            cout << help << endl;
        }
    } else {
        cout << "Incorrect option" << endl;
        cout << help << endl;
    }
    return 0;
}

int string_to_num(char str[]) {
    int len = strlen(str), sum = 0;
    for (int i = 0; i < len; i++) {
        assert(str[i] >= '0' && str[i] <= '9');
        sum = sum * 10 + str[i] - '0';
    }
    return sum;
}

int deal_opt(string &in, string &out, int &n, int &times, int argc, char *argv[]) {
    for (int i = 1; i < argc; i++) {
        if (!strncmp("-i", argv[i], 2) && i < argc - 1) {
            in = argv[i + 1];
            i++;
        } else if (!strncmp("-o", argv[i], 2) && i < argc - 1) {
            out = argv[i + 1];
            i++;
        } else if (!strncmp("-n", argv[i], 2) && i < argc - 1) {
            n = string_to_num(argv[i + 1]);
            i++;
        } else if (!strncmp("-t", argv[i], 2) && i < argc - 1) {
            times = string_to_num(argv[i + 1]);
            i++;
        } else {
            return 1;
        }
    }
    return 0;
}

int deal_in(int *&a, int &n, int max_n, string file) {
    FILE *fptr = NULL;
    if ((fptr = fopen(file.c_str(), "r")) != NULL) {
        int data;
        a = new int[max_n];
        n = 0;
        // stop at end of file or at the first token that is not a number
        while (n < max_n && fscanf(fptr, "%d", &data) == 1)
            a[n++] = data;
        fclose(fptr);
        return 0;
    } else {
        return 1;
    }
}

int deal_out(int a[], int n, string file) {
    FILE *fptr = NULL;
    if ((fptr = fopen(file.c_str(), "w")) != NULL) {
        for (int i = 0; i < n; i++) {
            fprintf(fptr, "%d\t", a[i]);
            if (i % num_per_line == num_per_line - 1)
                fprintf(fptr, "\n");
        }
        fclose(fptr);
        return 0;
    } else {
        return 1;
    }
}

inline void swap(int &a, int &b) {
    int tmp = a;
    a = b;
    b = tmp;
}

void bubble(int a[], int n) {
    for (int i = 1; i < n; i++)
        for (int j = 1; j <= n - i; j++)
            if (a[j] < a[j - 1])
                swap(a[j], a[j - 1]);
}

void select(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        swap(a[i], a[min]);
    }
}

void insert(int a[], int n) {
    for (int i = 1, j; i < n; i++) {
        int tmp = a[i];
        // find the insertion point, then shift the tail one slot right
        for (j = 0; j < i && a[j] <= tmp; j++);
        for (int k = i - 1; k >= j; k--)
            a[k + 1] = a[k];
        a[j] = tmp;
    }
}

void shell(int a[], int n) {
    int h;
    for (h = 1; h < n / 9; h = 3 * h + 1);
    for (; h > 0; h /= 3) {
        // i advances by 1 so that every h-spaced chain gets sorted;
        // stepping i by h would only sort the chain that starts at index 0
        for (int i = h; i < n; i++) {
            int tmp = a[i], j = i;
            for (; j >= h && a[j - h] > tmp; j -= h)
                a[j] = a[j - h];
            a[j] = tmp;
        }
    }
}

void merge1(int a[], int l, int r) {
    if (l < r) {
        int mid = (r - l) / 2 + l;
        merge1(a, l, mid);
        merge1(a, mid + 1, r);
        int *b = new int[r - l + 1];
        int i = 0, j = l, k = mid + 1;
        while (j <= mid && k <= r) b[i++] = a[j] <= a[k] ? a[j++] : a[k++];
        while (j <= mid) b[i++] = a[j++];
        while (k <= r) b[i++] = a[k++];
        memmove(a + l, b, sizeof(int) * i);
        delete[] b;
    }
}

void merge(int a[], int n) {
    merge1(a, 0, n - 1);
}

void heapify(int a[], int i, int n) {
#define LC(i) (2 * (i) + 1)
#define RC(i) (2 * (i) + 2)
    while (i < n / 2) {
        int max = a[i], f = 0;
        // check the index before reading the child to stay inside the array
        if (LC(i) < n && max < a[LC(i)]) max = a[LC(i)], f = 1;
        if (RC(i) < n && max < a[RC(i)]) f = 2;
        if (1 == f) {
            swap(a[i], a[LC(i)]);
            i = LC(i);
        } else if (2 == f) {
            swap(a[i], a[RC(i)]);
            i = RC(i);
        } else
            break;
    }
}

void heap(int a[], int n) {
    if (n <= 1) return;
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, i, n);
    swap(a[0], a[n - 1]);
    for (int i = n - 2; i >= 1; i--) {
        heapify(a, 0, i + 1);
        swap(a[0], a[i]);
    }
}

struct node {
    node(int a, int b) : l(a), r(b) {}
    int l, r;
};

void quick(int a[], int n) {
    stack<node *> s;
    s.push(new node(0, n - 1));
    while (!s.empty()) {
        int l = s.top()->l;
        int r = s.top()->r;
        delete s.top();   // free the node before popping it, otherwise it leaks
        s.pop();
        if (l < r) {
            int i = l, j = r, pivot = a[l];
            while (i < j) {
                while (i <= j && a[i] <= pivot)
                    i++;
                while (i <= j && a[j] >= pivot)
                    j--;
                if (i < j) swap(a[i], a[j]);
            }
            swap(a[l], a[j]);
            if (j - 1 - l > 0)
                s.push(new node(l, j - 1));
            if (r - j - 1 > 0)
                s.push(new node(j + 1, r));
        }
    }
}
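The stack in quick holds heap-allocated node objects, which is where the earlier leak came from. One way to rule out that kind of leak is to store the ranges by value. The following is a minimal sketch under that assumption. `quick_by_value` is a hypothetical name and not part of sort.cpp, and it keeps the same first-element pivot.

```cpp
#include <algorithm>
#include <stack>
#include <utility>

// Non-recursive quicksort whose stack stores (left, right) pairs by value,
// so there is nothing to delete and nothing to leak.
void quick_by_value(int a[], int n) {
    std::stack<std::pair<int, int> > ranges;
    ranges.push(std::make_pair(0, n - 1));
    while (!ranges.empty()) {
        int l = ranges.top().first;
        int r = ranges.top().second;
        ranges.pop();
        if (l >= r) continue;   // ranges of size 0 or 1 are already sorted

        // Partition around the first element of the range.
        int i = l, j = r, pivot = a[l];
        while (i < j) {
            while (i <= j && a[i] <= pivot) i++;
            while (i <= j && a[j] >= pivot) j--;
            if (i < j) std::swap(a[i], a[j]);
        }
        std::swap(a[l], a[j]);  // the pivot lands at its final position j

        if (j - 1 > l) ranges.push(std::make_pair(l, j - 1));
        if (r > j + 1) ranges.push(std::make_pair(j + 1, r));
    }
}
```

The trade-off is small: a pair of ints is copied on every push, but no allocator call is made per range.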
C/C++ source code: Data.cpp
Function: generate random data of a certain scale.
#include <iostream>
#include <string>
#include <cassert>
#include <cstring>
#include <cstdio>
#include <ctime>
#include <cstdlib>
using namespace std;

// convert a string of decimal digits to a number
long cal(char s[]) {
    long sum = 0;
    int len = strlen(s);
    for (int i = 0; i < len; i++) {
        assert(s[i] >= '0' && s[i] <= '9');
        sum = sum * 10 + (s[i] - '0');
    }
    return sum;
}

const int N = 1024 * 1024;      // generated values lie in [0, N)
const int num_per_line = 10;

// return 0 on success, 1 on failure
int deal_opt(string &out, long &n, int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        if (!strncmp("-o", argv[i], 2) && i < argc - 1) {
            out = argv[i + 1];
            i++;
        } else if (!strncmp("-n", argv[i], 2) && i < argc - 1) {
            n = cal(argv[i + 1]);
            i++;
        } else
            return 1;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    srand(time(NULL));
    long scale = 10000;
    string outfile("data.txt");
    string help("command [-o outfile] [-n num]");

    if (0 == deal_opt(outfile, scale, argc, argv)) {
        FILE *fptr = NULL;
        if ((fptr = fopen(outfile.c_str(), "w")) != NULL) {
            for (long i = 0; i < scale; i++) {
                fprintf(fptr, "%d\t", rand() % N);
                if (i % num_per_line == num_per_line - 1)
                    fprintf(fptr, "\n");
            }
            fclose(fptr);   // flush and close the output file
        } else {
            cout << "Incorrect write outfile" << endl;
            cout << help << endl;
        }
    } else {
        cout << "Incorrect options" << endl;
        cout << help << endl;
    }
    return 0;
}
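One caveat about the generator: `rand() % N` only covers the whole range [0, N) when RAND_MAX is at least N - 1, and the C standard guarantees no more than 32767. A hedged alternative, assuming a C++11 compiler, is the <random> header. `make_random_data` is a hypothetical helper and not part of Data.cpp.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Returns `count` integers drawn uniformly from [0, limit), with limit > 0.
// A fixed seed makes the data set reproducible between runs.
std::vector<int> make_random_data(std::size_t count, int limit, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, limit - 1);
    std::vector<int> data(count);
    for (std::size_t i = 0; i < count; ++i)
        data[i] = dist(gen);
    return data;
}
```

The values could then be written out with the same `fprintf` loop that Data.cpp uses, ten per line.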
Running result:
Data scale: 10 unordered datasets, 1 million iterations
100 million unordered data sets and 1024 iterations
Data scale: 100 ordered data sets, 1024 iterations
10 million unordered data sets
Data scale: 10 million ordered data sets (note the performance degradation of quicksort)
Million unordered data sets
1 billion unordered data sets (only quick, heap, and merge remain; the O(n^2) methods have been eliminated)
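The quicksort slowdown on ordered data comes from quick always taking the first element of the range as the pivot: on sorted input every partition splits off a single element, so the work grows quadratically. A common remedy, sketched here and not part of sort.cpp, is to pick the median of the first, middle, and last elements and move it to the front before partitioning. `median_of_three_to_front` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <utility>

// Moves the median of a[l], a[mid], a[r] to a[l], so that a partition
// routine that reads its pivot from a[l] gets a better pivot on ordered input.
void median_of_three_to_front(int a[], int l, int r) {
    int mid = l + (r - l) / 2;
    if (a[mid] < a[l]) std::swap(a[mid], a[l]);   // now a[l] <= a[mid]
    if (a[r] < a[l])   std::swap(a[r], a[l]);     // now a[l] is the smallest
    if (a[r] < a[mid]) std::swap(a[r], a[mid]);   // now a[mid] is the median
    std::swap(a[l], a[mid]);                      // median goes to the front
}
```

In quick, such a call would go right before the pivot is read from the left end of the range.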
Note that at small scale, when the whole data set fits in memory and paging plays no role, the three O(n lg n) sorts behave as expected: quicksort is the fastest, merge sort takes about the same time, and heap sort is slower than the other two by a factor of about 1.5. With 1 billion records, however, the data needs about 4 GB of memory. Quicksort and heap sort access the whole array, which may cause thrashing. Merge sort, by its nature, works on data with strong locality at each step and thrashes far less, so it performs several times better than the other two.
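The locality argument is easier to see in a bottom-up formulation, where each pass reads two adjacent runs front to back and writes one output run sequentially. The following is a minimal sketch, not the recursive merge1 from sort.cpp. `merge_bottom_up` is a hypothetical name, and it assumes one scratch buffer of n ints can be allocated once instead of one buffer per call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Bottom-up merge sort: runs of width 1, 2, 4, ... are merged pairwise.
// Every pass walks the array and the scratch buffer strictly left to right.
void merge_bottom_up(int a[], std::size_t n) {
    if (n <= 1) return;
    std::vector<int> buf(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            std::size_t mid = std::min(lo + width, n);
            std::size_t hi  = std::min(lo + 2 * width, n);
            std::size_t i = lo, j = mid, k = lo;
            // merge a[lo, mid) and a[mid, hi) into buf[lo, hi)
            while (i < mid && j < hi) buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
            while (i < mid) buf[k++] = a[i++];
            while (j < hi)  buf[k++] = a[j++];
        }
        // copy the merged pass back before doubling the run width
        std::memcpy(a, &buf[0], sizeof(int) * n);
    }
}
```

The indices are `std::size_t` so that `lo + 2 * width` cannot overflow at the 1 billion scale discussed here.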
Reposted: size limits of various file systems
NTFS (Windows): supports up to 2 TB of partitions and 2 TB of files.
FAT16 (Windows): supports a maximum partition of 2 GB and a maximum file of 2 GB.
FAT32 (Windows): supports a maximum partition of 128 GB and a maximum file of 4 GB.
Ext2
Maximum file size: 1 TB
Maximum number of files: limited only by the file system size
Maximum partition/file system size: 4 TB
Maximum file name length: 255 characters
Default minimum/maximum block size: 1024/4096 bytes
Default inode allocation: 1 for every 4096 bytes
Maximum mount count before a forced file system check: 20 (configurable)
// Red Hat 9 uses ext3 as its default file system.
Ext3
Maximum file size: 1 TB
Maximum number of files: limited only by the file system size
Maximum partition/file system size: 4 TB
Maximum file name length: 255 characters
Default minimum/maximum block size: 1024/4096 bytes
Default inode allocation: 1 for every 4096 bytes
Maximum mount count before a forced file system check: 20 (configurable)
ReiserFS
Maximum file size: 1 TB
Maximum number of files: 32 K directories, 4.2 billion files
Maximum partition/file system size: 4 TB
Maximum file name length: 255 characters
JFS
Minimum file system size: 16 MB
Maximum file size: restricted by the architecture
Maximum number of files: limited by the file system size
Default minimum/maximum block size: 1024/4096 bytes
Default inode allocation: dynamic