The first chapter of "Programming Pearls" is quite wonderful, and this bitmap sort comes from it. The bitmap sort described in the book is ingenious, saving a great deal of both time and space. It comes from thoroughly analyzing the actual problem and its data and then choosing an efficient solution. This post is a brief record of it, combining some material from the Internet with my own program.
1. Understanding bitmaps
We all know the bitmap graphics format: it stores an image as small square pixels, laid out in a grid of rows and columns. Each small square represents a color, and for a black-and-white image it is even simpler, since one bit per pixel is enough. This resembles how data is stored in the computer, where memory is also a grid of bits. That is the inspiration: bits behave like a queue with a strict order. If our data can be transformed (logically) so that each item corresponds to a bit position, then outputting the items in bit order gives us exactly the sorted data set.
2. The concept of indexing
The description above naturally brings to mind the index. Indexes are so important in a database that the performance of a query on a single large table depends almost entirely on them. The similarity to bitmaps is this: if we treat each row as one unit of data, an index can be seen as a transformation that maps the data to a position in a storage space. If the order of the data matches the order of the index, then reading the storage space sequentially yields an ordered data set. In many cases the index key is simply part of the data, but Oracle also has function-based indexes, which express this transformation-and-mapping relationship fully.
3. A clever way of sorting--bitmap sorting
Bitmaps are inherently suited to sorting, because a sequence of bits is the most basic ordered carrier. The problem is as follows:
Input: a file containing at most n positive integers, each less than n, where n = 10^7 (10 million). No integer appears twice in the file, and no other data is associated with the integers.
Output: the input integers in ascending order.
Constraints: about 1 MB of memory is available, disk space is ample, and the sort should finish in roughly 10 seconds.
The analysis is as follows. Assuming an integer occupies 32 bits, 1 MB of memory holds about 250,000 integers. The first method is a disk-based merge sort. The second is to cut the range 0-9999999 into 40 intervals (10000000/250000) and scan the file 40 times, each time reading the integers that fall in one interval (at most 250,000) and sorting them in memory with quicksort. The third solution, proposed in the book, is to represent the whole data set with a bitmap (or bit vector). Because the data contains no duplicates, all of it can be held in memory at once, which reduces the number of scans to one.
By association: to grasp the point of the problem, look at what the numbers mean. Here they are telephone numbers, and each telephone number has a position in the table of all possible telephone numbers; what makes the problem special is that the number itself encodes that position. If we set up 10 million bits, where bit k records whether telephone number k is present (1 for present, 0 for absent), then walking through all the bits and outputting the positions that are set to 1 gives the telephone numbers in sorted order. The ingenuity is that we use the meaning of the data itself!
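The second method can be sketched in C as follows. This is only an illustration under stated assumptions: the helper name multipass_sort, its parameters, and the use of an in-memory array in place of the input file are all hypothetical, and the library qsort stands in for the quick sort mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for ints; written without subtraction to avoid overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sorts n distinct values from input, each in
 * [0, limit), into out while keeping at most chunk values in the working
 * buffer. Each loop iteration plays the role of one scan of the file.
 * Returns the number of passes made, or -1 if allocation fails.
 */
int multipass_sort(const int *input, size_t n, int limit, int chunk, int *out)
{
    assert(limit > 0 && chunk > 0);
    int *buf = malloc((size_t)chunk * sizeof *buf);
    if (buf == NULL)
        return -1;

    size_t written = 0;
    int passes = 0;
    for (int lo = 0; lo < limit; lo += chunk) {
        int hi = (limit - lo > chunk) ? lo + chunk : limit;
        size_t k = 0;
        /* One full scan: keep only the values inside [lo, hi).
         * Distinct inputs guarantee k never exceeds chunk. */
        for (size_t i = 0; i < n; i++)
            if (input[i] >= lo && input[i] < hi)
                buf[k++] = input[i];
        qsort(buf, k, sizeof *buf, cmp_int);
        for (size_t i = 0; i < k; i++)
            out[written++] = buf[i];
        passes++;
    }
    free(buf);
    return passes;
}
```

With limit = 10000000 and chunk = 250000 the loop runs 40 times, which matches the 40 scans in the analysis; the cost of rereading the whole input on every pass is what the bitmap approach avoids.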
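To make "the bit number is the number itself" concrete, here is a small sketch of how a value can be mapped to a word and a bit inside that word when the bitmap is stored as 32-bit words. The function names are hypothetical and the word size is an assumption; the shift by 5 and the mask 0x1F correspond to dividing by 32 and taking the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the bitmap is an array of 32-bit words. */
enum { BITS_PER_WORD = 32, WORD_SHIFT = 5, WORD_MASK = 0x1F };

/* Which word holds bit i: i / 32, written as a shift. */
size_t word_index(uint32_t i) { return i >> WORD_SHIFT; }

/* Position of bit i inside its word: i % 32, written as a mask. */
unsigned bit_offset(uint32_t i) { return i & WORD_MASK; }

/* Number of words needed for values 0 .. n (one spare word for safety). */
size_t bitmap_words(uint32_t n) { return 1 + n / BITS_PER_WORD; }

/* Mark value i as present. */
void set_bit(uint32_t *bitmap, uint32_t i)
{
    assert(bitmap != NULL);
    bitmap[word_index(i)] |= UINT32_C(1) << bit_offset(i);
}

/* Returns 1 if value i is present, 0 otherwise. */
int test_bit(const uint32_t *bitmap, uint32_t i)
{
    assert(bitmap != NULL);
    return (int)((bitmap[word_index(i)] >> bit_offset(i)) & 1u);
}
```

For example, the number 1234567 lands in word 38580 at bit 7. For n = 10000000 this layout needs 312,501 words, that is 1,250,004 bytes, which is somewhat over a strict 1 MB but in line with the "about 1M" constraint.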
The pseudo code for the algorithm is as follows:
Phase 1: Initialize the set to empty
for i = [0, n)
    bit[i] = 0
Phase 2: Read each value i and set bit[i] = 1
for each i in the input file
    bit[i] = 1
Phase 3: Write the sorted output
for i = [0, n)
    if bit[i] == 1
        write i on the output file
The time complexity of the algorithm is O(n).
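The three phases map onto C quite directly. The following is a minimal sketch, not the book's code: the function name bitmap_sort and its parameters are hypothetical, it works on arrays rather than files, and it stores the bits in bytes rather than 32-bit words.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: sorts n distinct values from input, each in
 * [0, limit), into out using one bit per possible value.
 * Returns the number of values written, or (size_t)-1 if allocation fails.
 */
size_t bitmap_sort(const int *input, size_t n, int limit, int *out)
{
    assert(limit > 0);
    size_t nbytes = ((size_t)limit + CHAR_BIT - 1) / CHAR_BIT;

    /* Phase 1: calloc returns an all-zero bitmap, i.e. the empty set. */
    unsigned char *bits = calloc(nbytes, 1);
    if (bits == NULL)
        return (size_t)-1;

    /* Phase 2: set the bit whose position equals each input value. */
    for (size_t i = 0; i < n; i++) {
        assert(input[i] >= 0 && input[i] < limit);
        bits[input[i] / CHAR_BIT] |= (unsigned char)(1u << (input[i] % CHAR_BIT));
    }

    /* Phase 3: walk the bits in order and emit every position that is set. */
    size_t written = 0;
    for (int v = 0; v < limit; v++)
        if (bits[v / CHAR_BIT] & (1u << (v % CHAR_BIT)))
            out[written++] = v;

    free(bits);
    return written;
}
```

Called on {5, 1, 9, 3} with limit 10, it writes 1, 3, 5, 9 and returns 4. If the input contains a duplicate, the return value is smaller than n, because a bit can only record presence once.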
We have seen how efficient and ingenious bitmap sorting is. The lesson for sorting our own data is that analyzing the characteristics of the data is critical, and the breakthrough for many problems comes from that analysis. It is worth asking whether some transformation exists that maps our data onto this kind of numeric position. Here, it is the absence of duplicates that lets us use bitmap sorting and makes the solution so much more efficient.
4. Bitmap sort code example (C implementation):
This example applies bitmap sort to only 20 generated random numbers in the range 0~9999999. Two points should be noted:
4.1 The basic bitmap sort cannot record how many times a number occurs. If the same number appears several times, the sort ignores the extra copies and keeps only one. With a small modification the program below can handle this case as well: for example, add an array count[10000000] that records how many times each value occurs. (Of course, this exceeds 1 MB of memory, but the original problem states that there are no duplicates, so it can be solved within the 1 MB limit.)
4.2 As a digression: when generating random numbers for testing, I first put the srand call that seeds the generator inside the myrand() function, and then every call produced almost the same random number. I later learned that the seed should be set once, in the main function that calls myrand(), in order to get varying random numbers. (In fact duplicates can still occur; it is just that with only 20 random numbers in the range 0~9999999 the probability of a repeat is low. In addition, so that generated duplicates do not go unnoticed, I also print how many numbers remain after sorting. If that count is smaller than the count before sorting, the input contained duplicates.)
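The count-array idea from 4.1 can be sketched like this. It is an illustration under assumptions: the name counting_sort and its parameters are hypothetical, and the counters take the place of the bits entirely, so the memory use is one counter per possible value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: sorts n values from input, each in [0, limit) and
 * possibly repeated, into out by counting occurrences of every value.
 * Returns the number of values written, or (size_t)-1 if allocation fails.
 */
size_t counting_sort(const int *input, size_t n, int limit, int *out)
{
    assert(limit > 0);
    /* One counter per possible value; with limit = 10000000 this is far
     * more than 1 MB, as noted in the text. */
    unsigned int *count = calloc((size_t)limit, sizeof *count);
    if (count == NULL)
        return (size_t)-1;

    /* Tally how many times each value occurs. */
    for (size_t i = 0; i < n; i++) {
        assert(input[i] >= 0 && input[i] < limit);
        count[input[i]]++;
    }

    /* Emit each value as many times as it was seen, in increasing order. */
    size_t written = 0;
    for (int v = 0; v < limit; v++)
        for (unsigned int c = 0; c < count[v]; c++)
            out[written++] = v;

    free(count);
    return written;
}
```

Called on {3, 1, 3, 0, 1, 3} with limit 4, it writes 0, 1, 1, 3, 3, 3, so no copy is lost.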
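The seeding behavior described in 4.2 follows from how srand and rand work: the same seed always restarts the same sequence. The sketch below uses two hypothetical helpers to show the difference. In the real program the seed was time(NULL), which stays the same for a whole second, so reseeding inside the generating function kept restarting the sequence at the same point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Anti-pattern: reseeds before every draw, so every call with the same
 * seed returns the first number of the same sequence. */
int draw_with_reseed(unsigned seed)
{
    srand(seed);
    return rand();
}

/* Preferred: seed once, then take n successive numbers from one sequence. */
void draw_seeded_once(unsigned seed, int *out, size_t n)
{
    assert(out != NULL || n == 0);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        out[i] = rand();
}
```

draw_with_reseed(42) returns the same value no matter how often it is called, while draw_seeded_once walks forward through the sequence. Successive values from rand can still coincide, which is why the count check mentioned above remains useful.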
/* Bitmap sort: highly efficient in both time and space */
#include <stdio.h>
#include <stdlib.h>   /* rand, srand */
#include <time.h>     /* time */

#define BITSPERWORD 32
#define SHIFT 5        /* i >> SHIFT is i / 32 */
#define MASK 0x1F      /* i & MASK is i % 32 */
#define N 10000000     /* values lie in 0~9999999 */
#define M 20           /* how many random numbers to sort */

unsigned int a[1 + N / BITSPERWORD];

void set(int i) { a[i >> SHIFT] |= (1u << (i & MASK)); }
void clr(int i) { a[i >> SHIFT] &= ~(1u << (i & MASK)); }
int test(int i) { return (a[i >> SHIFT] & (1u << (i & MASK))) != 0; }

/* Produces a random number in 0~9999999. Where RAND_MAX is small
 * (for example 32767), only the low part of the range is reached. */
int myrand(void)
{
    int num;
    num = rand() % 10000000;
    return num;
}

int main(void)
{
    int i;
    int j;
    int arr[M];
    int count = 0;

    for (i = 0; i < N; i++) {
        clr(i);
    }
    /* To sort numbers read from standard input instead:
     * while (scanf("%d", &i) != EOF) {
     *     set(i);
     * }
     */
    /* Seed once here. The seed must not be set inside myrand(),
     * otherwise every call produces almost the same random number. */
    srand((unsigned) time(NULL));
    printf("The count of the array is %d:\n", M);
    for (j = 0; j < M; j++) {   /* simple correctness test */
        /* Note: the inputs must not repeat, otherwise only one copy is kept */
        arr[j] = myrand();
        printf("%d\t", arr[j]);
    }
    for (j = 0; j < M; j++) {
        set(arr[j]);
    }
    printf("\nAfter sorted:\n");
    for (i = 0; i < N; i++) {
        if (test(i)) {
            printf("%d\t", i);
            count++;
        }
    }
    /* Print how many numbers came out of the sort. If the input contained
     * duplicates, this count is smaller than the count before sorting. */
    printf("\nAfter sorted count is %d\n", count);
    return 0;
}