The problem is described as follows:
There are 250 million integers, stored in an array that may be in external storage or in memory (the problem does not say which).
Find how many of these 250 million numbers are not duplicated.
The available memory is limited to 600 MB.
The algorithm should be as efficient as possible.
Use one bit to indicate whether a number has appeared. A 32-bit unsigned integer has 4G (2^32) possible values, so 4G bits are needed; at 8 bits per byte, that is 4G / 8 = 512 MB, which fits within the 600 MB limit:
1. Allocate 512 MB of memory as the flag array (one flag bit per possible number) and clear every bit to 0.
2. Set a counter, count, to 0.
3. Read in a number and check its flag bit. If it is 0, set it to 1 and add 1 to count; if it is already 1, do nothing.
4. Repeat step 3 until all integers have been processed. count then holds the number of distinct values in the input.
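The four steps above can be sketched in Java, the language the later `byte[] flag` snippets suggest. This is a minimal sketch, not the author's code: the class name `DistinctCounter` and the `universe` parameter are hypothetical, and the bits are packed into a `long[]` rather than raw bytes. With `universe = 1L << 32` it allocates 2^26 longs (512 MB); the demo uses a tiny universe so it runs anywhere.

```java
import java.util.Arrays;

// Hypothetical helper: one flag bit per possible value, counts distinct values.
class DistinctCounter {
    private final long[] bits; // bit v is set once value v has been seen

    // universe = number of possible values; 1L << 32 would need 512 MB.
    DistinctCounter(long universe) {
        bits = new long[(int) ((universe + 63) >>> 6)];
    }

    // Steps 1-4: returns how many distinct values appear in data.
    // Assumes every value, read as unsigned 32-bit, is below universe.
    long countDistinct(int[] data) {
        Arrays.fill(bits, 0L);                   // step 1: clear all flags
        long count = 0;                          // step 2: counter starts at 0
        for (int x : data) {                     // steps 3-4: scan every number
            long v = Integer.toUnsignedLong(x);  // treat the int as unsigned
            int word = (int) (v >>> 6);          // which long holds bit v
            long mask = 1L << (v & 63);          // position of bit v in that long
            if ((bits[word] & mask) == 0) {      // flag is 0: first occurrence
                bits[word] |= mask;              // set flag to 1
                count++;
            }                                    // flag is 1: do nothing
        }
        return count;
    }

    public static void main(String[] args) {
        DistinctCounter c = new DistinctCounter(1024); // small universe for the demo
        System.out.println(c.countDistinct(new int[] {3, 7, 3, 1000, 7, 7})); // prints 3
    }
}
```

Packing 64 flags per `long` keeps the same 512 MB footprint as a byte array while needing fewer array accesses.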
Title: Find the non-repeating integers among 250 million positive integers.
Answer:
An int has 4 bytes (32 bits), so it can represent at most 2^32 distinct values, i.e. 4G integers (1G = 2^30, 1K = 2^10).
With the 2-bitmap method, each integer is marked with two bits: 00 means it has not appeared, 01 means it appeared once, and 10 means it appeared more than once.
A 2-bitmap covering all 4G integers therefore needs 4G × 2 bits = 8G bits = 1 GB of memory.
Scan the integers one by one. If an integer's flag is 00 (=0), set it to 01 (=1); if it is 01, set it to 10 (=2); if it is already 10, leave it unchanged. After the scan, the integers whose flag is 01 are exactly the ones that appear only once.
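The update rule is a counter that saturates at 2. A minimal sketch of just that transition, with a hypothetical class name `FlagTransition` chosen for illustration:

```java
// Hypothetical helper: the 2-bit flag transition described in the text.
// 0 = 00 (not seen), 1 = 01 (seen once), 2 = 10 (seen more than once).
class FlagTransition {
    static int next(int flag) {
        switch (flag) {
            case 0:  return 1; // 00 -> 01
            case 1:  return 2; // 01 -> 10
            default: return 2; // 10 stays 10
        }
    }

    public static void main(String[] args) {
        int f = 0; // every flag starts at 00
        for (int k = 1; k <= 3; k++) {
            f = next(f);
            System.out.println("after occurrence " + k + ": " + f);
        }
    }
}
```

Running it prints flags 1, 2, 2: once a value has been seen twice, further occurrences no longer change its flag, which is why two bits per value are enough.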
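The bitmap is stored in a byte array, byte[] flag.
Each byte (row i) has 8 bits and holds four 2-bit flags (columns j = 3, 2, 1, 0, from the high-order bits to the low-order bits); all flags start at 00: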
| i/j | 3  | 2  | 1  | 0  |
|-----|----|----|----|----|
| 0   | 00 | 00 | 00 | 00 |
| 1   | 00 | 00 | 00 | 00 |
To locate a number in this table:
Row (byte index): int i = number / 4;
Column (2-bit slot within the byte): int j = number % 4;
The number's flag is: int numberFlag = (flag[i] & (0x03 << (2 * j))) >> (2 * j);
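A small check of the read formula against the table layout. This sketch is illustrative only: the class `FlagReader` and method `getFlag` are hypothetical names, and the single byte is hand-built so that slots 3, 2, 1, 0 hold 10, 01, 00, 01.

```java
// Hypothetical helper: reads a 2-bit flag with the row/column formulas above.
class FlagReader {
    static int getFlag(byte[] flag, int number) {
        int i = number / 4; // row: byte index
        int j = number % 4; // column: 2-bit slot, bits 2j and 2j+1
        // flag[i] is sign-extended to int, but the mask keeps only slot j's bits
        return (flag[i] & (0x03 << (2 * j))) >> (2 * j);
    }

    public static void main(String[] args) {
        // One byte; slots 3..0 = 10 01 00 01, i.e. flags for numbers 3, 2, 1, 0
        byte[] flag = { (byte) 0b10_01_00_01 };
        for (int n = 0; n < 4; n++) {
            System.out.println(n + " -> " + getFlag(flag, n));
        }
        // prints 0 -> 1, 1 -> 0, 2 -> 1, 3 -> 2 (one per line)
    }
}
```

Slot 3 is the interesting case: the byte is negative in Java, yet masking with 0xC0 before shifting still yields 2.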
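When scanning an integer, you may need to write a new flag value back:
flag[i] = (byte) ((flag[i] & (~(0x03 << (2 * j)) & 0xff)) | (((newFlag % 4) << (2 * j)) & 0xff));
The first part clears the number's two bits and the second part writes the new value into them; the other three flags in the byte remain unchanged.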
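The read formula, the write formula, and the 00 → 01 → 10 rule can be combined into one sketch of the whole scan. This is a hypothetical helper, not the author's code: `TwoBitmap`, `maxValue`, `add`, and `countUnique` are names chosen here. It assumes all inputs are non-negative `int`s no larger than `maxValue` (with `maxValue` below `Integer.MAX_VALUE`); covering the full 2^32 range would need `long` numbers and the 1 GB array computed earlier.

```java
// Hypothetical helper: 2-bitmap that finds numbers appearing exactly once.
class TwoBitmap {
    private final byte[] flag; // four 2-bit flags per byte

    // maxValue: largest number that may appear (assumed non-negative)
    TwoBitmap(int maxValue) {
        flag = new byte[maxValue / 4 + 1];
    }

    int get(int number) {
        int i = number / 4, j = number % 4;
        return (flag[i] & (0x03 << (2 * j))) >> (2 * j);
    }

    void set(int number, int newFlag) {
        int i = number / 4, j = number % 4;
        // clear slot j, then OR in the new 2-bit value; other slots unchanged
        flag[i] = (byte) ((flag[i] & (~(0x03 << (2 * j)) & 0xff))
                | (((newFlag % 4) << (2 * j)) & 0xff));
    }

    // One scan step: 00 -> 01, 01 -> 10, 10 stays 10.
    void add(int number) {
        int f = get(number);
        if (f < 2) set(number, f + 1);
    }

    // Counts the numbers whose flag ends up as 01 (appeared exactly once).
    static long countUnique(int[] data, int maxValue) {
        TwoBitmap bm = new TwoBitmap(maxValue);
        for (int x : data) bm.add(x);
        long unique = 0;
        for (int n = 0; n <= maxValue; n++) {
            if (bm.get(n) == 1) unique++;
        }
        return unique;
    }

    public static void main(String[] args) {
        int[] data = {5, 9, 5, 2, 9, 9, 11};
        System.out.println(countUnique(data, 15)); // 2 and 11 appear once: prints 2
    }
}
```

The final pass over all flags costs time proportional to the value range rather than to the input size; for the 4G range that is still a single sequential sweep over the 1 GB array.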