Analysis of bitmaps for processing massive data and their implementation (C language)

Source: Internet
Author: User
"What is Bit-map?"
A bit-map uses a single bit to mark the value associated with an element, and the element itself serves as the key. Because each element costs only one bit of storage, a great deal of space can be saved.
With that much said about the idea, let's look at a concrete example. Suppose we want to sort 5 elements (4, 7, 2, 5, 3) that all lie within 0-7 (and assume there are no duplicates). The bit-map method can do the sorting. To represent 8 possible values we need only 8 bits (1 byte). First we allocate 1 byte of space and set all of its bits to 0.

Then we iterate through the 5 elements. The first element is 4, so the bit corresponding to 4 is set to 1. This can be done with an operation such as *(p + i/8) |= (0x01 << (i%8)). The operation does involve the question of big-endian versus little-endian ordering; big-endian is assumed here by default. Because positions are zero-based, this sets the fifth bit to 1.


Next the second element, 7, is processed and the eighth bit is set to 1. The third element follows, and so on until every element has been processed and its bit set to 1. At that point the bits at positions 2, 3, 4, 5 and 7 are 1 and all other bits are 0.
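The walk-through above can be sketched in C as follows. This is only an illustration under assumptions, not the article's own listing: the helper names set_bit, test_bit and mark_example are introduced here, and bit i is taken to be the i-th least significant bit of byte i/8, which is what the expression 0x01 << (i % 8) selects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: set bit i in the bitmap that starts at p. */
static void set_bit(unsigned char *p, size_t i)
{
    p[i / 8] |= (unsigned char)(0x01u << (i % 8));
}

/* Hypothetical helper: return 1 if bit i is set, otherwise 0. */
static int test_bit(const unsigned char *p, size_t i)
{
    return (p[i / 8] >> (i % 8)) & 0x01u;
}

/* Mark the five elements of the example in a single byte and return it. */
static unsigned char mark_example(void)
{
    const int elems[] = {4, 7, 2, 5, 3};
    unsigned char bits = 0;                      /* 8 bits, all cleared */

    for (size_t k = 0; k < sizeof elems / sizeof elems[0]; k++) {
        assert(elems[k] >= 0 && elems[k] < 8);   /* must fit in one byte */
        set_bit(&bits, (size_t)elems[k]);
    }
    return bits;
}
```

With this bit layout, mark_example() returns a byte in which bits 2, 3, 4, 5 and 7 are set, that is, the value 0xBC.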


Finally we scan the bit area and output the index of every bit that is set to 1 (2, 3, 4, 5, 7), which gives the elements in sorted order. The following code shows one use of a bitmap: sorting.

#include <stdio.h>
#include <string.h>

// Each byte holds 8 bits
#define BYTESIZE 8

void SetBit(unsigned char *p, int posi)
{
    // Advance to the byte that contains the target bit
    for (int i = 0; i < (posi / BYTESIZE); i++)
    {
        p++;
    }

    *p = *p | (0x01 << (posi % BYTESIZE)); // set the bit to 1
    return;
}

void BitmapSortDemo()
{
    // For simplicity's sake, negative numbers are not considered
    int num[] = {3, 5, 2, 10, 6, 12, 8, 14, 9};

    // bufferLen is determined by the maximum value in the data to be sorted.
    // The maximum value here is 14, so only 2 bytes (16 bits) are required.
    const int bufferLen = 2;
    unsigned char *pBuffer = new unsigned char[bufferLen];

    // All bits must be cleared to 0 first; otherwise the result is unpredictable
    memset(pBuffer, 0, bufferLen);
    for (int i = 0; i < 9; i++)
    {
        // Set the bit corresponding to each value to 1
        SetBit(pBuffer, num[i]);
    }

    // Output the sorted result
    for (int i = 0; i < bufferLen; i++) // process one byte at a time
    {
        for (int j = 0; j < BYTESIZE; j++) // process each bit in the byte
        {
            // Check whether bit j is 1 (admittedly a clumsy test):
            // build the mask (0x01 << j), AND it with the byte,
            // and compare the result with the mask
            if ((pBuffer[i] & (0x01 << j)) == (0x01 << j))
            {
                printf("%d ", i * BYTESIZE + j);
            }
        }
    }
    printf("\n");

    delete[] pBuffer;
}

int main()
{
    BitmapSortDemo();
    return 0;
}

"Scope of application"
Fast lookup, duplicate detection, and deletion of data. It is generally suitable when the range of the data is no more than about 10 times the range of int.
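The three operations named above each map onto a simple bit operation. The sketch below is an illustration under assumptions: the function names (bm_contains, bm_add_unique, bm_remove, dedup) are hypothetical, and the caller is assumed to pass a zero-initialised buffer with at least one bit for every possible value.

```c
#include <assert.h>
#include <stddef.h>

/* Lookup: return 1 if value v is present in the bitmap, otherwise 0. */
static int bm_contains(const unsigned char *bm, size_t v)
{
    return (bm[v / 8] >> (v % 8)) & 1u;
}

/* Duplicate detection: insert v; return 1 if it was new, 0 if already seen. */
static int bm_add_unique(unsigned char *bm, size_t v)
{
    if (bm_contains(bm, v))
        return 0;
    bm[v / 8] |= (unsigned char)(1u << (v % 8));
    return 1;
}

/* Deletion: clear the bit for v. */
static void bm_remove(unsigned char *bm, size_t v)
{
    bm[v / 8] &= (unsigned char)~(1u << (v % 8));
}

/* Remove duplicates from a[0..n) in place, keeping first occurrences.
 * Every value must satisfy 0 <= a[k] < nbits. Returns the new length. */
static size_t dedup(int *a, size_t n, unsigned char *bm, size_t nbits)
{
    size_t out = 0;

    for (size_t k = 0; k < n; k++) {
        assert(a[k] >= 0 && (size_t)a[k] < nbits);
        if (bm_add_unique(bm, (size_t)a[k]))
            a[out++] = a[k];
    }
    return out;
}
```

Each operation touches exactly one byte, so its cost does not depend on how many values are already stored.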
"Fundamentals and Essentials"
Use a bit array to indicate whether each element of some range exists, for example 8-digit phone numbers.
"Extended"
A Bloom filter can be seen as an extension of the bit-map.
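To make the relationship concrete: a Bloom filter keeps the same kind of bit array, but instead of using the key directly as the bit index it sets one bit per hash function, so it may report false positives but never false negatives. The sketch below is a minimal illustration only. The filter size, the choice of two hash functions (FNV-1a and djb2) and all names are assumptions made here, not something the article specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_BITS 1024u   /* assumed filter size, chosen only for illustration */

typedef struct {
    unsigned char bits[BLOOM_BITS / 8];
} bloom_t;

/* 32-bit FNV-1a string hash. */
static uint32_t hash_fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* djb2 string hash. */
static uint32_t hash_djb2(const char *s)
{
    uint32_t h = 5381u;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Set the bit selected by hash value h. */
static void bloom_set(bloom_t *b, uint32_t h)
{
    uint32_t i = h % BLOOM_BITS;
    b->bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Read the bit selected by hash value h. */
static int bloom_get(const bloom_t *b, uint32_t h)
{
    uint32_t i = h % BLOOM_BITS;
    return (b->bits[i / 8] >> (i % 8)) & 1u;
}

/* Add a key: set one bit per hash function. */
static void bloom_add(bloom_t *b, const char *key)
{
    assert(b != NULL && key != NULL);
    bloom_set(b, hash_fnv1a(key));
    bloom_set(b, hash_djb2(key));
}

/* Return 0 if the key is definitely absent, 1 if it is possibly present. */
static int bloom_maybe_contains(const bloom_t *b, const char *key)
{
    return bloom_get(b, hash_fnv1a(key)) && bloom_get(b, hash_djb2(key));
}
```

Unlike the plain bit-map, keys here do not need to be small integers, at the price of an answer that is only "possibly present".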
"problem instance"
1. A file contains a number of phone numbers, each 8 digits long. Count how many different numbers there are.
An 8-digit number is at most 99 999 999, so about 100 million bits are needed. (Think of the numbers 0 to 99 999 999 as each corresponding to one bit: 100 million bits / 8 = 12.5 million bytes, so roughly 12 MB of memory is enough to represent every 8-digit phone number.)
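A possible way to code this count is sketched below. It is an illustration under assumptions: the numbers are taken from an in-memory array rather than parsed from a file (the file format is not given), and the names PHONE_RANGE and count_distinct are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PHONE_RANGE 100000000UL   /* possible values: 0 .. 99 999 999 */

/* Count how many different 8-digit numbers occur in nums[0..n).
 * Returns (size_t)-1 if the bitmap cannot be allocated. */
static size_t count_distinct(const unsigned long *nums, size_t n)
{
    /* One bit per possible number: 100 000 000 / 8 = 12 500 000 bytes. */
    unsigned char *bm = calloc(PHONE_RANGE / 8, 1);
    size_t distinct = 0;

    if (bm == NULL)
        return (size_t)-1;

    for (size_t k = 0; k < n; k++) {
        unsigned long v = nums[k];
        assert(v < PHONE_RANGE);
        if (!((bm[v / 8] >> (v % 8)) & 1u)) {       /* first time v is seen */
            bm[v / 8] |= (unsigned char)(1u << (v % 8));
            distinct++;
        }
    }
    free(bm);
    return distinct;
}
```

The memory used is fixed by the range of possible numbers, not by how many numbers the input contains.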
2. Among 250 million integers, find the number of integers that are not repeated. Memory is not large enough to hold the 250 million integers.
Extend the bit-map to use 2 bits per number: 0 means the number has not appeared, 1 means it has appeared once, and 2 means it has appeared twice or more. While scanning the integers, check the corresponding slot: if it is 0, set it to 1; if it is 1, set it to 2; if it is 2, leave it unchanged. Alternatively, instead of a 2-bit representation, two ordinary bit-maps can be used together to simulate the 2-bit map; the principle is the same.
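The 2-bit scheme can be sketched as follows, packing four 2-bit slots into each byte. This is an illustration under assumptions: the names get2, set2 and count_unique are hypothetical, and the value range is passed in as a parameter so the sketch can be tried on small inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Read the 2-bit state of value v:
 * 0 = not seen, 1 = seen once, 2 = seen twice or more. */
static unsigned get2(const unsigned char *bm, size_t v)
{
    return (bm[v / 4] >> ((v % 4) * 2)) & 3u;
}

/* Overwrite the 2-bit state of value v. */
static void set2(unsigned char *bm, size_t v, unsigned state)
{
    unsigned shift = (unsigned)(v % 4) * 2;
    bm[v / 4] = (unsigned char)((bm[v / 4] & ~(3u << shift)) | (state << shift));
}

/* Count the values in a[0..n) that occur exactly once.
 * Every value must be < range.
 * Returns (size_t)-1 if the bitmap cannot be allocated. */
static size_t count_unique(const unsigned *a, size_t n, size_t range)
{
    unsigned char *bm = calloc(range / 4 + 1, 1);
    size_t unique = 0;

    if (bm == NULL)
        return (size_t)-1;

    for (size_t k = 0; k < n; k++) {
        assert(a[k] < range);
        unsigned s = get2(bm, a[k]);
        if (s < 2)
            set2(bm, a[k], s + 1);      /* 0 -> 1, 1 -> 2, 2 stays 2 */
    }
    for (size_t v = 0; v < range; v++)
        if (get2(bm, v) == 1)
            unique++;

    free(bm);
    return unique;
}
```

If the integers are assumed to be 32-bit, the full range needs 2^32 slots of 2 bits each, which is 1 GiB regardless of how many integers are in the input.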
A C implementation of the bitmap

Bitmap.h

/*
 * A C implementation of a bitmap
 */
#ifndef BITMAP_H_
#define BITMAP_H_

/*
 * Function: initialize the bitmap
 * Parameters:
 *   size:  the size of the bitmap, i.e. the number of bits
 *   start: the starting value
 * Return value: 0 indicates failure, 1 indicates success
 */
int bitmap_init(int size, int start);

/*
 * Function: set the bit corresponding to the value index to 1
 *   index: the value to set
 * Return value: 0 indicates failure, 1 indicates success
 */
int bitmap_set(int index);

/*
 * Function: get the value of bit i of the bitmap
 *   i: the position to read
 * Return value: -1 indicates failure, otherwise the value of the bit
 */
int bitmap_get(int i);

/*
 * Function: return the value that corresponds to bit index
 */
int bitmap_data(int index);

/* Free the memory */
int bitmap_free(void);

#endif

Bitmap.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "Bitmap.h"

unsigned char *g_bitmap = NULL; /* bit storage */
int g_size = 0;                 /* size of the storage in bytes */
int g_base = 0;                 /* value represented by bit 0 */

int bitmap_init(int size, int start)
{
    if (size < 0)
        return 0;
    g_bitmap = (unsigned char *)malloc((size / 8 + 1) * sizeof(unsigned char));
    if (g_bitmap == NULL)
        return 0;
    g_base = start;
    g_size = size / 8 + 1;
    memset(g_bitmap, 0x0, g_size);
    return 1;
}

int bitmap_set(int index)
{
    int quo, remainder;

    /* reject values below the starting value */
    if (g_bitmap == NULL || index < g_base)
        return 0;
    quo = (index - g_base) / 8;
    remainder = (index - g_base) % 8;
    /* reject values beyond the end of the bitmap */
    if (quo >= g_size)
        return 0;
    g_bitmap[quo] |= (unsigned char)(0x1 << remainder);
    return 1;
}

int bitmap_get(int i)
{
    int quo, remainder;
    unsigned char res;

    if (g_bitmap == NULL || i < 0)
        return -1;
    quo = i / 8;
    remainder = i % 8;
    if (quo >= g_size)
        return -1;
    res = g_bitmap[quo] & (unsigned char)(0x1 << remainder);
    return res > 0 ? 1 : 0;
}

int bitmap_data(int index)
{
    return (index + g_base);
}

int bitmap_free(void)
{
    free(g_bitmap);
    g_bitmap = NULL;
    g_size = 0;
    return 1;
}

Test program bitmap_test.c:

#include <stdio.h>
#include "Bitmap.h"

int main(void)
{
    int a[] = {5,8,7,6,3,1,10,78,56,34,23,12,43,54,65,76,87,98,89,100};
    int i;

    if (!bitmap_init(100, 0))
        return 1;
    for (i = 0; i < 20; i++)
        bitmap_set(a[i]);
    /* positions 0..100 cover every value stored above, including 100 */
    for (i = 0; i <= 100; i++)
    {
        if (bitmap_get(i) > 0)
            printf("%d ", bitmap_data(i));
    }
    printf("\n");
    bitmap_free();
    return 0;
}
