Given 4 billion distinct unsigned integers in no particular order, and one more unsigned integer, how can you efficiently determine whether that integer is among them?
The direct idea is to store all of these unsigned integers in memory and compare the given number against them one by one.
Let's analyze the cost. An unsigned integer is 4 bytes, and 1 billion bytes is roughly 1 GB, so 4 billion unsigned integers take 4 billion * 4 bytes, about 16 GB.
Most computers do not have that much memory, and even when they do, part of it is needed by the operating system and other programs, so storing the integers directly is impractical.
Moreover, if the number is not present, the search has to make 4 billion comparisons, which is far too slow.
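To make the cost of the direct approach concrete, here is a minimal sketch. It assumes the numbers are held in a std::vector of 32-bit unsigned integers; the names naive_bytes and contains_linear are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Bytes needed to store `count` 32-bit unsigned integers directly.
constexpr std::uint64_t naive_bytes(std::uint64_t count) {
    return count * sizeof(std::uint32_t);
}

// Linear scan: when the target is absent, every element is compared.
bool contains_linear(const std::vector<std::uint32_t>& data, std::uint32_t target) {
    return std::find(data.begin(), data.end(), target) != data.end();
}
```

For 4 billion numbers, naive_bytes returns 16,000,000,000, which matches the estimate of about 16 GB.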
A bitmap solves this problem: it saves space and answers the query quickly.
The concept of bitmaps
A bitmap represents the presence or absence of each value with a single bit of an array element:
0 means the value is absent, 1 means it is present.
Advantages of bitmaps
Bitmaps can quickly determine whether a stored key exists.
With 4 billion integers, each integer occupies only one bit, so the space needed is 1/32 of the direct approach: about 500 MB (512 MB to cover every possible 32-bit value).
Anatomy of a bitmap
Step 1 Choose a data structure that we have learned
A bitmap works on individual bits, using 0 and 1 to indicate whether a number exists.
We store the bits in a vector, whose elements can be of any basic type.
Step 2 Find the corresponding bit
Here the element type is size_t, treated as 32 bits wide (on 64-bit platforms size_t is usually 64 bits, in which case only the low 32 bits of each element are used). In other words, the element at one subscript of the vector records whether each of 32 numbers exists.
vector<size_t> _v;   // stores the status of each number
size_t _size;        // number of flags stored
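As a rough check on the space claim, the sketch below counts how many 32-bit words and bytes a bitmap needs for a given number of values. It assumes one bit per value packed into 32-bit words; words_needed and bitmap_bytes are illustrative names, not part of the class in this article.

```cpp
#include <cstdint>

// Number of 32-bit words needed to hold one bit for each of `count` values.
constexpr std::uint64_t words_needed(std::uint64_t count) {
    return (count + 31) / 32;   // round up to a whole word
}

// Total bytes occupied by those words.
constexpr std::uint64_t bitmap_bytes(std::uint64_t count) {
    return words_needed(count) * sizeof(std::uint32_t);
}

// Covering every possible 32-bit value takes 2^32 bits = 512 MiB.
static_assert(bitmap_bytes(1ULL << 32) == 512ULL * 1024 * 1024, "512 MiB");
```

For exactly 4 billion values the result is 500,000,000 bytes, in line with the figure of about 500 MB.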
Given a number, we divide it by 32: the quotient is the subscript of the element in the vector, and the remainder is the position of the bit within that element.
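A small sketch of this mapping, assuming 32 bits per element. BitPos and locate are hypothetical names used only for this example.

```cpp
#include <cstdint>

// Position of a number inside the bitmap.
struct BitPos {
    std::uint32_t index;  // which element of the vector
    std::uint32_t bit;    // which bit inside that element, 0 = lowest
};

// Quotient and remainder of a division by 32.
inline BitPos locate(std::uint32_t num) {
    return BitPos{num / 32, num % 32};
}
```

For example, 70 = 2 * 32 + 6, so locate(70) gives index 2 and bit 6; 32 gives index 1 and bit 0.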
Step 3 Consider how to set a bit to 1
A bitwise OR with a mask that contains a single 1 sets the corresponding bit to 1 and leaves the other bits unchanged.
The code for setting a bit:
// Set: mark num as present
void Set (size_t num)
{
    // index selects the element in the vector;
    // value is the bit position within that element, counted from low to high.
    // Shifting right by 5 bits is equivalent to dividing by 32, and a shift is cheap.
    size_t index = num >> 5;
    size_t value = num % 32;
    // value ranges over 0..31, so shift by value itself (value-1 would be -1 for value 0)
    size_t mask = (size_t)1 << value;
    // Count the number only if it was not already present, then OR the bit to 1
    if (!(_v[index] & mask))
        _size++;
    _v[index] |= mask;
}
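The following standalone sketch shows the OR step on a plain vector of 32-bit words. It is an illustration under assumptions rather than the article's class: set_bit is a made-up name, the shift amount is the remainder itself so that remainder 0 maps to the lowest bit, and at() is used so an out-of-range number throws instead of writing past the end.

```cpp
#include <cstdint>
#include <vector>

// Sets the bit for `num`; returns true if the bit was previously 0.
bool set_bit(std::vector<std::uint32_t>& words, std::uint32_t num) {
    const std::uint32_t mask = std::uint32_t{1} << (num % 32);
    std::uint32_t& word = words.at(num / 32);   // throws std::out_of_range if too large
    const bool was_clear = (word & mask) == 0;
    word |= mask;                               // OR leaves all other bits untouched
    return was_clear;
}
```

Setting the same number twice leaves the word unchanged the second time, which is why the return value can be used to keep an accurate count.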
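Step 4 How to reset (erase) a number in the bitmap
This again uses a bitwise operation. XOR (equal bits give 0, different bits give 1) flips a bit, so it clears the bit only when the bit is already 1; applied to a number that is not in the bitmap, it would set the bit instead.
To clear the bit unconditionally, AND the element with the inverted mask.
// Reset: mark num as absent
void Reset (size_t num)
{
    // index selects the element in the vector;
    // value is the bit position within that element, counted from low to high.
    // Shifting right by 5 bits is equivalent to dividing by 32, and a shift is cheap.
    size_t index = num >> 5;
    size_t value = num % 32;
    size_t mask = (size_t)1 << value;
    // Count down only if the number was present, then AND with the inverted mask to set the bit to 0
    if (_v[index] & mask)
        _size--;
    _v[index] &= ~mask;
}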
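The difference between flipping a bit and clearing it is easy to see on a single word. In this sketch, toggle_bit and clear_bit are illustrative names: XOR with a mask flips the bit, while AND with the inverted mask forces it to 0 whatever its previous state.

```cpp
#include <cstdint>

// XOR flips the bit: 1 becomes 0, but 0 becomes 1.
inline std::uint32_t toggle_bit(std::uint32_t word, unsigned bit) {
    return word ^ (std::uint32_t{1} << bit);
}

// AND with the inverted mask forces the bit to 0 regardless of its old value.
inline std::uint32_t clear_bit(std::uint32_t word, unsigned bit) {
    return word & ~(std::uint32_t{1} << bit);
}
```

Both functions agree when the bit is set, but erasing a number that was never stored is only safe with clear_bit.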
Step 5 Determine if a number exists in the bitmap
This is still bit arithmetic, this time AND: the result is 1 only when both bits are 1, otherwise 0.
// Find: determine whether num has been stored in the bitmap
bool Find (size_t num)
{
    size_t index = num >> 5;
    size_t value = num % 32;
    // Shift the bit down to the lowest position and test whether it is 1
    return (_v[index] >> value) & 1;
}
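A standalone sketch of the lookup, again on a plain vector of 32-bit words. test_bit is a made-up name, and treating numbers beyond the bitmap's capacity as absent is an assumption of this example, not something the article specifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the bit for `num` is set. Numbers beyond the bitmap's
// capacity are reported as absent instead of reading out of bounds.
bool test_bit(const std::vector<std::uint32_t>& words, std::uint32_t num) {
    const std::size_t index = num / 32;
    if (index >= words.size()) return false;
    return (words[index] >> (num % 32)) & 1u;
}
```

The lookup touches one element and does one shift and one AND, so its cost does not depend on how many numbers are stored.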
Source Code
#include <cstddef>
#include <vector>
using std::vector;
// define Bitmap
class BitMap
{
public:
    // Bitmap constructor: size is the number of values to cover
    BitMap (size_t size = 1024)
        : _size (0)
    {
        // each element is treated as 32 bits, representing 32 numbers
        _v.resize (size / 32 + 1);
    }
    // Set: mark num as present
    void Set (size_t num)
    {
        // index selects the element in the vector;
        // value is the bit position within that element, counted from low to high.
        // Shifting right by 5 bits is equivalent to dividing by 32.
        size_t index = num >> 5;
        size_t value = num % 32;
        size_t mask = (size_t)1 << value;
        // Count the number only if it was not already present, then OR the bit to 1
        if (!(_v[index] & mask))
            _size++;
        _v[index] |= mask;
    }
    // Reset: mark num as absent
    void Reset (size_t num)
    {
        size_t index = num >> 5;
        size_t value = num % 32;
        size_t mask = (size_t)1 << value;
        // Count down only if the number was present, then AND with the inverted mask to set the bit to 0
        if (_v[index] & mask)
            _size--;
        _v[index] &= ~mask;
    }
    // Find: determine whether num has been stored in the bitmap
    bool Find (size_t num)
    {
        size_t index = num >> 5;
        size_t value = num % 32;
        // Shift the bit down to the lowest position and test whether it is 1
        return (_v[index] >> value) & 1;
    }
protected:
    vector<size_t> _v;   // stores the status of each number
    size_t _size;        // number of values currently stored
};
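To tie the pieces back to the opening question, here is a hedged end-to-end sketch. PresenceBitmap is a stripped-down stand-in written so the example compiles on its own; it is not the BitMap class above, and the names insert, contains and load are assumptions made for illustration. It uses std::uint32_t words so that each element is exactly 32 bits on any platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stripped-down stand-in for a bitmap, so this example compiles alone.
class PresenceBitmap {
public:
    // `range` is the number of distinct values to cover: 0 .. range-1.
    explicit PresenceBitmap(std::uint64_t range)
        : words_(static_cast<std::size_t>((range + 31) / 32), 0) {}

    void insert(std::uint32_t num) {
        words_.at(num / 32) |= std::uint32_t{1} << (num % 32);
    }

    bool contains(std::uint32_t num) const {
        const std::size_t index = num / 32;
        return index < words_.size() && ((words_[index] >> (num % 32)) & 1u);
    }

private:
    std::vector<std::uint32_t> words_;
};

// Loads every number once; afterwards each query is a single bit test.
PresenceBitmap load(const std::vector<std::uint32_t>& numbers, std::uint64_t range) {
    PresenceBitmap bitmap(range);
    for (std::uint32_t n : numbers) bitmap.insert(n);
    return bitmap;
}
```

To cover every 32-bit value, range would be 1ULL << 32, which allocates about 512 MB on a 64-bit build; a small range is enough to try the idea out.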
Some applications of bitmaps
Application 1 Sorting with a bitmap
Iterate through the numbers and set the corresponding bit for each one. Then traverse the bits from low to high and output the number of every bit that is set.
The flaw is that negative numbers must first be converted to unsigned integers: shift every value by the minimum before inserting it, and undo the shift on output.
Application 2 Determining whether there are duplicates
Iterate over the numbers and set the bit of each one that is not yet in the bitmap; if a number's bit is already set, a duplicate has been found.
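Both applications can be sketched in a few lines. The example assumes non-negative values smaller than a known range; bitmap_sort and has_duplicate are hypothetical names, and the handling of negative numbers described above is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts distinct values in [0, range) by marking bits and reading them back
// from low to high. Repeated values would collapse into a single bit.
std::vector<std::uint32_t> bitmap_sort(const std::vector<std::uint32_t>& input,
                                       std::uint32_t range) {
    const std::size_t n_words = (static_cast<std::size_t>(range) + 31) / 32;
    std::vector<std::uint32_t> words(n_words, 0);
    for (std::uint32_t n : input)
        words.at(n / 32) |= std::uint32_t{1} << (n % 32);

    std::vector<std::uint32_t> sorted;
    for (std::uint32_t n = 0; n < range; ++n)
        if ((words[n / 32] >> (n % 32)) & 1u) sorted.push_back(n);
    return sorted;
}

// Returns true as soon as a value is seen whose bit is already set.
bool has_duplicate(const std::vector<std::uint32_t>& input, std::uint32_t range) {
    const std::size_t n_words = (static_cast<std::size_t>(range) + 31) / 32;
    std::vector<std::uint32_t> words(n_words, 0);
    for (std::uint32_t n : input) {
        std::uint32_t& word = words.at(n / 32);
        const std::uint32_t mask = std::uint32_t{1} << (n % 32);
        if (word & mask) return true;
        word |= mask;
    }
    return false;
}
```

The sort runs in time proportional to the range rather than to the number of inputs, so it pays off mainly when the values are dense within that range.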