Data Structure: bitmap Method

Source: Internet
Author: User
Tags: bitset
Moved to: http://www.wypblog.com/archives/148
1. The bitmap method, or simply bitmap, stores one state per bit. It suits cases where the volume of data is large but each item has only a few possible states, and it is usually used to test whether a given value is present in a data set. The STL provides a bitset container, which is an implementation of the bitmap method. From the bitset reference: A bitset is a special container class that is designed to store bits (elements with only two possible values: 0 or 1, true or false, ...). The class is very similar to a regular array, but optimizing for space allocation: each element occupies only one bit (which is eight times less than the smallest elemental type in C++: char). Each element (each bit) can be accessed individually: for example, for a given bitset named mybitset, the expression mybitset[3] accesses its fourth bit, just like a regular array accesses its elements.
2. Data Structure
unsigned int bit [N];
This array can record N * sizeof (int) * 8 distinct values, one bit each. Because numbering starts at 0, the largest value it can hold is N * sizeof (int) * 8 - 1. If the values to be stored lie in the range 0-15, N = 1 is enough. For example, for the data [5, 1, 7, 15, 1024, 6, 10], each value sets the bit whose index equals the value (1024 falls outside 0-15, so holding it requires a larger N).
3. Related operations
1. Write. Define an array of 8 K bytes: unsigned char bit [8 *]; this provides 8 K x 8 = 64 K bits, one for each unsigned short value. A value is located in bit by a byte position and a bit position (byte 0 ~ 8191, bit 0 ~ 7).

For example, to write the value 1234: the byte position is 1234 / 8 = 154 and the bit position is 1234 & 0b111 = 2, so 1234 is recorded in the byte at subscript 154 of bit, by setting bit 2 (of 0 ~ 7) of that byte to 1.

Byte position: int nBytePos = 1234 / 8; // 154

Bit position: int nBitPos = 1234 & 7; // 2

// Set bit 2 of byte 154 of the array to 1
unsigned short val = 1 << nBitPos;
bit[nBytePos] = bit[nBytePos] | val; // after writing 1234, bit[154] = 0b00000100

As another example, write 1236:

Byte position: int nBytePos = 1236 / 8; // 154

Bit position: int nBitPos = 1236 & 7; // 4

// Set bit 4 of byte 154 of the array to 1
val = 1 << nBitPos;
bit[nBytePos] = bit[nBytePos] | val; // after also writing 1236, bit[154] = 0b00010100

Function implementation:

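#define SHIFT 5      // 2^5 = 32 bits per int, so i >> SHIFT is the word index
#define MAXLINE 32
#define MASK 0x1F    // i & MASK is the bit index inside the word (0 ~ 31)

// Set bit i of the bitmap.
void setbit(int *bitmap, int i){
    bitmap[i >> SHIFT] |= (1 << (i & MASK));
}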
2. Read. Locate the bit in the same way and test it.
// Return true if bit i of the bitmap is set.
bool getbit(int *bitmap1, int i){
    return bitmap1[i >> SHIFT] & (1 << (i & MASK));
}
4. Disadvantages of bitmap
  1. Poor readability
  2. A bitmap holds far more elements than an ordinary array of the same size, but the values it can hold are limited by the size of the memory: the number of values that can be stored equals the largest storable value plus one. For example, 1 K bytes of memory holds 8 K elements, none of them larger than 8 K - 1, which is very limiting. Conversely, to store the value 65535 the bitmap needs 65536 / 8 = 8 K bytes. The bitmap method is therefore a poor fit for arbitrary unsigned int values, which need 2 ^ 32 / 8 = about 0.5 billion bytes (512 MB) of memory.
  3. Signed data needs a bit for the negative value as well as for the non-negative value of each magnitude, in effect two bits per magnitude. For a given amount of memory, the largest magnitude that can be stored is therefore halved. For example, 8 K bytes of memory holds short data with a value range of -32 K ~ 32 K - 1.
5. Applications of the bitmap method
1. Given 4 billion unsorted unsigned int integers and then one more number, how can you quickly determine whether that number is among the 4 billion?
First, record the 4 billion numbers in a bitmap; then test whether the bit for the given number is set.
2. Use a bitmap to determine whether an integer array contains duplicates.
Traverse the array and, for each element, check whether its bit is already set in the bitmap. If it is, the element is a repeat; if not, set the bit and continue.
3. Sort an integer array with a bitmap
First, traverse the array to find its maximum and minimum values, and size the bitmap to just that range. Note that negative int values must be converted to unsigned values for processing, and the minimum value must be subtracted from each number to obtain its bit index. Finally, scan the bitmap from the lowest bit upward and output each value whose bit is set.
4. Find the integers that are not repeated among 250 million integers, given that memory is too small to hold all 250 million integers.
One approach is a 2-bitmap: allocate 2 bits for each number, where 00 means the number has not appeared, 01 means it has appeared once, 10 means it has appeared multiple times, and 11 is unused. Alternatively, two ordinary bitmaps can be used: the first records whether the integer has appeared, and if it already has, the integer's bit is set in the second. This way only simple 1-bit bitmaps are needed.
6. Implementation

The requirements are described at http://blog.csdn.net/w397090770/article/details/7388319.

#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>

#define SHIFT 5
#define MAXLINE 32
#define MASK 0x1F

using namespace std;

//  w397090770
//  wyphao.2007@163.com
//  2012.11.29

// Set bit i. Words are unsigned so that bit 31 can be set safely.
void setbit(unsigned int *bitmap, int i){
    bitmap[i >> SHIFT] |= (1u << (i & MASK));
}

// Return true if bit i is set.
bool getbit(unsigned int *bitmap1, int i){
    return (bitmap1[i >> SHIFT] & (1u << (i & MASK))) != 0;
}

size_t getFileSize(ifstream &in, size_t &size){
    in.seekg(0, ios::end);
    size = in.tellg();
    in.seekg(0, ios::beg);
    return size;
}

// Read the whole file into a NUL-terminated buffer; the caller must free() it.
char * fillBuf(const char *filename){
    size_t size = 0;
    ifstream in(filename);
    if(in.fail()){
        cerr << "open " << filename << " failed!" << endl;
        exit(1);
    }
    getFileSize(in, size);
    char *buf = (char *)malloc(sizeof(char) * size + 1);
    if(buf == NULL){
        cerr << "malloc buf error!" << endl;
        exit(1);
    }
    in.read(buf, size);
    buf[in.gcount()] = '\0';   // terminate after the bytes actually read
    in.close();
    return buf;
}

// Set the bit of every number in the file (one non-negative int per line,
// each line ending with '\n').
void setBitMask(const char *filename, unsigned int *bit){
    char *buf, *temp;
    temp = buf = fillBuf(filename);
    char p[11];                // up to 10 digits plus '\0'
    int len = 0;
    while(*temp){
        if(*temp == '\n'){
            p[len] = '\0';
            len = 0;
            setbit(bit, atoi(p));
        }else if(len < 10){    // guard against over-long lines
            p[len++] = *temp;
        }
        temp++;
    }
    free(buf);                 // buf came from malloc, so free it, not delete
}

// Collect every number in the file whose bit is already set in the bitmap.
void compareBit(const char *filename, unsigned int *bit, vector<int> &result){
    char *buf, *temp;
    temp = buf = fillBuf(filename);
    char p[11];
    int len = 0;
    while(*temp){
        if(*temp == '\n'){
            p[len] = '\0';
            len = 0;
            if(getbit(bit, atoi(p))){
                result.push_back(atoi(p));
            }
        }else if(len < 10){
            p[len++] = *temp;
        }
        temp++;
    }
    free(buf);
}

int main(){
    vector<int> result;
    unsigned int MAX = 1u << 31;       // bitmap covers 0 .. 2^31 - 1
    unsigned int size = MAX >> 5;      // number of 32-bit words
    unsigned int *bit1;
    bit1 = (unsigned int *)malloc(sizeof(unsigned int) * (size + 1));
    if(bit1 == NULL){
        cerr << "Malloc bit1 error!" << endl;
        exit(1);
    }
    memset(bit1, 0, sizeof(unsigned int) * (size + 1));   // clear every byte, not just size + 1
    setBitMask("file1", bit1);
    compareBit("file2", bit1, result);
    free(bit1);
    cout << result.size();
    sort(result.begin(), result.end());
    vector<int>::iterator it = unique(result.begin(), result.end());
    ofstream of("result");
    ostream_iterator<int> output(of, "\n");
    copy(result.begin(), it, output);
    return 0;
}

(For more information, see http://www.wypblog.com/archives/148 .)
