The Problem
A file contains at most n positive integers, each less than n, where n = 10^7. Given roughly 1 MB of available memory, how can the file be sorted with both space and time in mind?
Conventional Thinking
Suppose the integers are stored as 4-byte integers (the usual size of an int). Then 1 MB can hold about 250,000 of them. Since the input file can contain up to 10^7 integers, the sort can be completed in 10^7 / 250,000 = 40 passes over the input file. The first pass reads the integers in the range [0, 249,999] into memory, the second pass reads those in [250,000, 499,999], and so on. On each pass, the integers read are sorted (any sorting algorithm will do) and written out. The obvious drawback is that the file has to be read again and again, which is not what we want. A more reasonable approach is the bitmap sorting algorithm.
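The multi-pass idea above can be sketched roughly as follows. This is only an illustration, not the original author's code: `multipass_sort` and `read_input` are hypothetical names, `read_input` stands in for re-reading the file from the start, and the sketch assumes no integer appears more than once, so that each pass buffers at most `chunk` values.

```python
import math


def multipass_sort(read_input, n, chunk):
    """Yield the integers in [0, n) from the input in ascending order.

    `read_input` is a hypothetical callable that returns a fresh iterator
    over the input each time it is called (one call per pass over the file).
    Assumes no duplicates, so a pass holds at most `chunk` values in memory.
    """
    passes = math.ceil(n / chunk)
    for p in range(passes):
        lo, hi = p * chunk, min((p + 1) * chunk, n)
        # One full scan of the input, keeping only the values in [lo, hi).
        buffer = [x for x in read_input() if lo <= x < hi]
        buffer.sort()
        yield from buffer


# Number of passes for the figures in the text: 10^7 values, 250,000 per pass.
PASSES = math.ceil(10**7 / 250_000)

data = [7, 3, 19, 0, 12, 5]
result = list(multipass_sort(lambda: iter(data), n=20, chunk=5))
```

With `n=20` and `chunk=5` the input is scanned four times. With the figures from the text it would be scanned 40 times, which is the cost the bitmap approach avoids.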
Bitmap Sorting Algorithm
If we want to read the entire file just once (up to 10 million integers), the problem becomes how to represent these numbers in about 1 MB of memory. We can use a bitmap to represent a set. For example, a string of 20 bits can represent any set of non-negative integers less than 20. The following string represents the set {1,2,3,5,8,13}:
0 1 1 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0
The bits at the positions of the values in the set are 1, and all other bits are 0.
So we represent the file with a string of 10 million bits, in which bit i is 1 if and only if the integer i appears in the file. This relies on each integer appearing at most once, since a single bit cannot record a count. Note also that 10^7 bits is 1.25 × 10^6 bytes, slightly more than 1 MB, so the memory figure has to be read as approximate. The problem can then be solved in three steps:
- Initialize every bit of the string to 0.
- Read the integers from the file one by one and, for each, set the bit at the corresponding index to 1.
- Check each bit of the string in order and, if it is 1, output its index.
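The three steps above might look like this in Python. It is a minimal sketch under stated assumptions rather than a definitive implementation: `bitmap_sort` is a hypothetical name, the input is taken as any iterable of integers in [0, n), and a value that appears more than once would be output only once.

```python
def bitmap_sort(numbers, n):
    """Yield the distinct integers from `numbers` (each in [0, n)) in order."""
    # Step 1: a bitmap of n bits, all initialized to 0.
    bitmap = bytearray((n + 7) // 8)

    # Step 2: for each integer read, set the bit at that index to 1.
    for x in numbers:
        bitmap[x >> 3] |= 1 << (x & 7)

    # Step 3: scan the bits in order and output the index of every 1 bit.
    for i in range(n):
        if bitmap[i >> 3] & (1 << (i & 7)):
            yield i


sorted_values = list(bitmap_sort([13, 2, 8, 1, 5, 3], n=20))
print(sorted_values)  # [1, 2, 3, 5, 8, 13]
```

The input is read exactly once and the output is produced by a single linear scan of the bitmap, so no comparison sort is involved.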
The Tradeoff Between Time and Space, and Mutual Benefit
Many problems confront us with a tradeoff between time and space. Mao Zedong also spoke of trading space for time in On Protracted War (the idea is said to have come first from Chiang Kai-shek). The program above, however, reduces the space requirement and the running time at once. Less data to process means less time spent processing it, and because the file is not read repeatedly, the disk access time is avoided as well. Of course, gaining in both time and space is only possible when the original design was far from the best solution.
Occam's Razor
A designer knows that a design has reached perfection not when there is nothing left to add, but when there is nothing left to take away.
Original: http://blog.csdn.net/tengweitw/article/details/45895989
Nineheadedbird
"Algorithm idea" bitmap sorting algorithm