The first chapter of this book poses a seemingly simple question: a file on disk holds up to 10 million distinct integers; how can it be sorted as quickly as possible using only 1 MB of memory?
Each number can be stored in 4 bytes, so 1 MB holds about 250,000 numbers. Clearly, if 250,000 numbers are sorted at a time and written to a file, repeating this 40 times sorts everything.
So how should the 250,000 numbers for each pass be chosen? There are two strategies:
1. Scan the 10 million numbers 40 times. On pass i (counting from 0), check whether each number falls in [i*250000, (i+1)*250000); if it does, read it into memory. When pass i finishes, memory holds at most 250,000 numbers, all larger than those selected in the earlier passes and smaller than those the later passes will select. So sort the numbers selected on pass i and append them to the file written by the previous passes.
Features: simple and crude, but it needs many passes and can only run serially; the source file is read 40 times. On this machine it took 2 minutes 17 seconds 214 milliseconds.
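A minimal sketch of strategy 1 in C. To keep it short, plain arrays stand in for the source file and the output file; the function name multipass_sort and its parameters are hypothetical and not from the original program.

```c
#include <stdlib.h>
#include <stddef.h>

/* qsort comparator for ints; avoids the overflow risk of "x - y". */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper. Scans `in` once per pass; pass p keeps only the
 * values in [p*chunk, (p+1)*chunk), sorts them, and appends them to `out`.
 * Assumes the values are distinct, so at most `chunk` values match per pass.
 * Returns the number of values written, or (size_t)-1 if allocation fails.
 */
size_t multipass_sort(const int *in, size_t n, int *out, int chunk, int passes)
{
    int *buf = malloc((size_t)chunk * sizeof *buf);
    size_t written = 0;

    if (buf == NULL)
        return (size_t)-1;

    for (int p = 0; p < passes; p++) {
        long lo = (long)p * chunk;      /* inclusive lower bound */
        long hi = lo + chunk;           /* exclusive upper bound */
        size_t m = 0;

        /* One full scan of the input per pass. */
        for (size_t k = 0; k < n; k++)
            if (in[k] >= lo && in[k] < hi && m < (size_t)chunk)
                buf[m++] = in[k];

        qsort(buf, m, sizeof *buf, cmp_int);

        /* Appending keeps the output globally sorted across passes. */
        for (size_t k = 0; k < m; k++)
            out[written++] = buf[k];
    }
    free(buf);
    return written;
}
```

For the problem as stated, chunk would be 250000 and passes would be 40. The cost is visible in the loop structure: the inner scan over all n values runs once per pass.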
2. Scan the 10 million numbers once. Read the i-th group of 250,000 numbers into memory, sort it, and write it to temporary file i. After the single pass there are 40 temporary files, each sorted internally. Merging these files then produces the result.
Features: the source file is read only once and the chunk sorts can run on multiple threads, but the amount of I/O is still high. Run serially, it took 21 seconds 221 milliseconds on this machine.
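The merge step of strategy 2 might look like the sketch below. It is an illustration only: sorted in-memory arrays play the role of the 40 temporary files, and merge_runs is a hypothetical name. With only 40 runs, a linear scan over the run heads is enough; a heap would be the usual choice for many more runs.

```c
#include <stdlib.h>
#include <stddef.h>

/*
 * Hypothetical helper. Merges k sorted runs into `out`.
 * runs[r] points to the r-th sorted run and len[r] is its length.
 * Returns the number of values written, or (size_t)-1 if allocation fails.
 */
size_t merge_runs(const int *const *runs, const size_t *len, size_t k, int *out)
{
    /* pos[r] is the index of the next unread value in run r. */
    size_t *pos = calloc(k ? k : 1, sizeof *pos);
    size_t written = 0;

    if (pos == NULL)
        return (size_t)-1;

    for (;;) {
        size_t best = k;                /* k means "every run is exhausted" */

        /* Pick the run whose next value is smallest. */
        for (size_t r = 0; r < k; r++) {
            if (pos[r] < len[r] &&
                (best == k || runs[r][pos[r]] < runs[best][pos[best]]))
                best = r;
        }
        if (best == k)
            break;

        out[written++] = runs[best][pos[best]];
        pos[best]++;
    }
    free(pos);
    return written;
}
```

In a file-based version, each run would need only a small read buffer in memory, which is how the merge stays within the memory limit.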
So how can the run time be brought under 10 seconds?
Analyze the characteristics of the problem again:
a. No value exceeds a known maximum
b. No value is duplicated
c. Each record consists of a single number only
So we can allocate a bit vector 10 million bits long, in which bit i is 1 if and only if the number i is present.
3. This leads to a bitmap sort for the problem.
Allocate a bit vector of 10 million bits, bit[10000000], with every bit set to 0. Read the file to be sorted sequentially and, for each number i read, set bit[i] to 1. When all the data has been read, scan bit from beginning to end and, whenever bit[i] = 1, write i to the output file. When the scan is complete, the output file is sorted. On this machine it took 9 seconds 49 milliseconds.
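The two phases of the bitmap sort can be sketched in C as follows. This is not the original program: arrays replace the input and output files, and bitmap_sort and max_value are hypothetical names introduced for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

/*
 * Hypothetical helper. Sorts n distinct values, each in [0, max_value),
 * from `in` into `out` using a bit vector of max_value bits.
 * Returns the number of values written, or (size_t)-1 if allocation fails.
 */
size_t bitmap_sort(const uint32_t *in, size_t n, uint32_t max_value, uint32_t *out)
{
    size_t words = (size_t)max_value / 32 + 1;
    uint32_t *bits = calloc(words, sizeof *bits);   /* every bit starts at 0 */
    size_t written = 0;

    if (bits == NULL)
        return (size_t)-1;

    /* Phase 1: set bit v for every value v that is present. */
    for (size_t k = 0; k < n; k++) {
        uint32_t v = in[k];
        if (v < max_value)              /* ignore out-of-range input */
            bits[v >> 5] |= UINT32_C(1) << (v & 31);
    }

    /* Phase 2: scan in index order; the set bits come out already sorted. */
    for (uint32_t v = 0; v < max_value; v++)
        if (bits[v >> 5] & (UINT32_C(1) << (v & 31)))
            out[written++] = v;

    free(bits);
    return written;
}
```

No comparisons between values are made at all; the position of each bit encodes the order. Note that 10,000,000 bits is 1,250,000 bytes, so the full vector is slightly over 1 MB if the limit is taken strictly.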
Note: implementing bit operations in a programming language that has no bit type
Take a 32-bit int as an example. To hold n bits, allocate an int array a[n/32+1] (some bits in the last int are of course wasted).
To set bit i to 1, proceed as follows:
Bit i lives in element i/32 of the int array a, at offset i%32. Setting it to 1 means ORing element i/32 with a number b whose bit i%32 is 1 and whose other bits are 0, which gives the following statement:
a[i/32] |= (1 << (i%32));
For the fastest operation, this can be rewritten as:
a[i >> 5] |= (1 << (i & 0x1F));
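As a sketch, the two forms can be wrapped in small helpers, together with the matching clear and test operations. The helper names and the SHIFT and MASK constants are illustrative assumptions. The sketch uses uint32_t rather than int because shifting 1 into the sign bit of a signed int (1 << 31) is undefined behavior in C.

```c
#include <stdint.h>

#define BITS_PER_WORD 32
#define SHIFT 5         /* log2(32): i >> 5 equals i / 32 for unsigned i */
#define MASK  0x1F      /* 32 - 1:   i & 0x1F equals i % 32 for unsigned i */

/* Division form: element i/32, offset i%32. */
static inline void set_bit_div(uint32_t *a, uint32_t i)
{
    a[i / BITS_PER_WORD] |= UINT32_C(1) << (i % BITS_PER_WORD);
}

/* Shift-and-mask form of the same operation. */
static inline void set_bit(uint32_t *a, uint32_t i)
{
    a[i >> SHIFT] |= UINT32_C(1) << (i & MASK);
}

/* Set bit i back to 0 by ANDing with the inverted single-bit mask. */
static inline void clear_bit(uint32_t *a, uint32_t i)
{
    a[i >> SHIFT] &= ~(UINT32_C(1) << (i & MASK));
}

/* Return 1 if bit i is set, 0 otherwise. */
static inline int test_bit(const uint32_t *a, uint32_t i)
{
    return (int)((a[i >> SHIFT] >> (i & MASK)) & 1u);
}
```

For example, i = 37 maps to element 1 and offset 5 in both forms. For unsigned operands, optimizing compilers typically generate the same shift and mask instructions for the division form, so the rewrite is mostly a matter of being explicit.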
Sorting 10 million numbers: Programming Pearls, Second Edition, Chapter 1