A small BitMap Algorithm for big data processing. It processes bitmap data.

Source: Internet
Author: User


A small bitmap algorithm can be used for external sorting of big data (when memory cannot hold all the elements to be sorted), for removing duplicate elements, and for quickly finding randomly deleted elements. The core idea is to use each number as a subscript (index) into an array of bits, where the bit records whether that number is present. Setting the bits takes O(N) time for N numbers, and reading them back in sorted order takes time proportional to the size R of the value range. The extra space is one bit per possible value, about R/8 bytes. An example that supports the entire int range (both positive and negative numbers) is as follows:


#include <stdio.h>
#include <stdlib.h>

/* BitMask[k] selects bit k of a byte, counting from the most significant bit. */
static const unsigned char BitMask[] = {0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1};

/* A bitmap of ByteArraSize bytes covers [-(ByteArraSize*4), ByteArraSize*4 - 1]:
   Number is stored at bit position Number + ByteArraSize*4 (half of all bits). */
int WriteNumberBitToByte(unsigned char *ByteArra, unsigned int ByteArraSize, int Number)
{
    int Half = (int)(ByteArraSize * 4); /* ByteArraSize * 8 / 2 */
    if (Number > Half - 1 || Number < -Half) {
        return 0; /* failed: number outside the bitmap */
    }
    int BaseArraBitPos = Half + Number;
    ByteArra[BaseArraBitPos / 8] |= BitMask[BaseArraBitPos % 8];
    return 1; /* success */
}

int IsNumberBitInByte(const unsigned char *ByteArra, unsigned int ByteArraSize, int Number)
{
    int Half = (int)(ByteArraSize * 4);
    if (Number > Half - 1 || Number < -Half) {
        return 0; /* number outside the bitmap */
    }
    int BaseArraBitPos = Half + Number;
    return (ByteArra[BaseArraBitPos / 8] & BitMask[BaseArraBitPos % 8]) != 0;
}

/* Walk every representable number in ascending order and print those present. */
void PrintOrderedBitMap(const unsigned char *BitMap, unsigned int BitMapCount)
{
    int MinimumNumber = -(int)(BitMapCount * 4);
    int MaximumNumber = (int)(BitMapCount * 4) - 1;
    for (int i = MinimumNumber; i <= MaximumNumber; ++i) {
        if (IsNumberBitInByte(BitMap, BitMapCount, i)) {
            printf("%d,", i);
        }
    }
    printf("\n");
}

int main(void)
{
    int Arra[] = {3, -4, 2, 0, -1, -8, 7, -12, 10};
    size_t Count = sizeof(Arra) / sizeof(Arra[0]);

    /* Largest absolute value in the input. */
    int MaximumValue = 0;
    for (size_t i = 0; i < Count; ++i) {
        int AbsValue = Arra[i] < 0 ? -Arra[i] : Arra[i];
        if (AbsValue > MaximumValue) {
            MaximumValue = AbsValue;
        }
    }
    printf("MaximumValue=%d\n", MaximumValue);

    /* Need BitMapCount*4 - 1 >= MaximumValue, i.e. round (MaximumValue + 1) / 4 up. */
    unsigned int BitMapCount = (unsigned int)(MaximumValue + 4) / 4;

    /* calloc zero-fills, so no number is marked present before writing. */
    unsigned char *BitMap = calloc(BitMapCount, 1);
    if (BitMap == NULL) {
        return 1;
    }
    for (size_t i = 0; i < Count; ++i) {
        WriteNumberBitToByte(BitMap, BitMapCount, Arra[i]);
    }
    PrintOrderedBitMap(BitMap, BitMapCount);
    free(BitMap);
    return 0;
}
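The program above reserves a range that is symmetric around zero. A variation, sketched below under our own assumptions rather than taken from the original, offsets each value by the minimum of the input instead, so the bitmap spans only [min, max], and it returns the sorted, de-duplicated values in an array instead of printing them. The function name `bitmap_sort_unique` is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: maps each value v to bit (v - min), then scans the bits
 * in order. Writes the distinct values in ascending order to `out` (which must
 * hold at least n ints) and returns how many were written.
 * Returns 0 for empty input and (size_t)-1 if the bitmap cannot be allocated. */
size_t bitmap_sort_unique(const int *in, size_t n, int *out)
{
    if (n == 0) {
        return 0;
    }

    /* Find [min, max] so the bitmap covers only the values actually used. */
    int min = in[0], max = in[0];
    for (size_t i = 1; i < n; ++i) {
        if (in[i] < min) min = in[i];
        if (in[i] > max) max = in[i];
    }

    /* Compute the span in 64-bit arithmetic so extreme ints cannot overflow. */
    unsigned long long span = (unsigned long long)((long long)max - (long long)min) + 1;
    unsigned char *bits = calloc((size_t)((span + 7) / 8), 1);
    if (bits == NULL) {
        return (size_t)-1;
    }

    /* Set bit (v - min) for every value; duplicates hit the same bit. */
    for (size_t i = 0; i < n; ++i) {
        unsigned long long pos = (unsigned long long)((long long)in[i] - (long long)min);
        bits[pos / 8] |= (unsigned char)(0x80u >> (pos % 8));
    }

    /* Each set bit, read in order, is one distinct value. */
    size_t count = 0;
    for (unsigned long long pos = 0; pos < span; ++pos) {
        if (bits[pos / 8] & (0x80u >> (pos % 8))) {
            out[count++] = (int)((long long)min + (long long)pos);
        }
    }

    free(bits);
    return count;
}
```

Offsetting by the minimum means the memory used depends only on max - min, not on how far the values are from zero.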


An example of an algorithm that only supports the unsigned int range is as follows:

/* BitMask[k] selects bit k of a byte, counting from the most significant bit. */
static const unsigned char BitMask[] = {0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1};

/* Number is stored at bit Number % 8 of byte Number / 8. Comparing the byte
   position with ByteArraSize avoids overflowing ByteArraSize * 8. */
int WriteNumberBitToByte(unsigned char *ByteArra, unsigned int ByteArraSize, unsigned int Number)
{
    unsigned int BytePos = Number / 8;
    unsigned int BitPos = Number % 8;
    if (BytePos >= ByteArraSize) {
        return 0; /* failed: number outside the bitmap */
    }
    ByteArra[BytePos] |= BitMask[BitPos];
    return 1; /* success */
}

int IsNumberBitInByte(const unsigned char *ByteArra, unsigned int ByteArraSize, unsigned int Number)
{
    unsigned int BytePos = Number / 8;
    unsigned int BitPos = Number % 8;
    if (BytePos >= ByteArraSize) {
        return 0; /* number outside the bitmap */
    }
    return (ByteArra[BytePos] & BitMask[BitPos]) != 0;
}
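The introduction also mentions quickly finding randomly deleted elements. Below is a minimal sketch of that use case. It assumes the surviving values all lie in [0, range), and the function name `bitmap_find_missing` is hypothetical. It marks every surviving value and then reports the bits that stay clear.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: given `count` values from [0, range), writes every value
 * in [0, range) that does not appear in `present` to `missing` (which must hold
 * up to `range` entries) in ascending order and returns how many were found.
 * Values >= range are ignored. Returns (size_t)-1 if allocation fails. */
size_t bitmap_find_missing(const unsigned int *present, size_t count,
                           unsigned int range, unsigned int *missing)
{
    /* range / 8 + 1 bytes always cover range bits without overflowing. */
    unsigned char *bits = calloc((size_t)range / 8 + 1, 1);
    if (bits == NULL) {
        return (size_t)-1;
    }

    /* Mark each value that is still present, one bit per value. */
    for (size_t i = 0; i < count; ++i) {
        unsigned int v = present[i];
        if (v < range) {
            bits[v / 8] |= (unsigned char)(0x80u >> (v % 8));
        }
    }

    /* Any bit that is still clear is a deleted (missing) value. */
    size_t found = 0;
    for (unsigned int v = 0; v < range; ++v) {
        if (!(bits[v / 8] & (0x80u >> (v % 8)))) {
            missing[found++] = v;
        }
    }

    free(bits);
    return found;
}
```

Both passes are linear, and memory stays at one bit per possible value no matter how many values survive.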


The algorithms above use one bit per number, so each number has only two states: present or absent. The scheme can be extended to use one byte per number, which makes it possible to count how many times each element repeats, up to 255 times; the principle is the same.
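A minimal sketch of the byte-per-number extension follows. It assumes non-negative values below a known bound. The names `count_map_add` and `count_map_get` are hypothetical, and the choice to stop at 255 (saturate) rather than wrap around to 0 is ours.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical byte-per-number counter: counts[v] holds how many times the
 * value v has been seen, saturating at UCHAR_MAX (255 with 8-bit bytes).
 * `counts` must have `size` zero-initialized entries.
 * Returns 1 on success, 0 if v is outside the table. */
int count_map_add(unsigned char *counts, size_t size, unsigned int v)
{
    if (v >= size) {
        return 0; /* value outside the table */
    }
    if (counts[v] < UCHAR_MAX) {
        ++counts[v]; /* saturate instead of wrapping back to 0 */
    }
    return 1;
}

/* Returns the recorded count for v, or 0 if v is outside the table. */
unsigned int count_map_get(const unsigned char *counts, size_t size, unsigned int v)
{
    return v < size ? counts[v] : 0;
}
```

This uses eight times the memory of the one-bit version, in exchange for a repeat count per value.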


In addition, representing a number with one bit instead of a 32-bit int reduces memory by a factor of 32, saving 31/32 of the space: an int occupies 4 × 8 = 32 bits, while the bitmap uses 1 bit per number. The larger the data volume, the more obvious the advantage, provided the values fall within a bounded range that the bitmap can cover.
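The sketch below works through this arithmetic. It assumes the values densely fill their range, so the number of stored ints is comparable to the size of the range. When the data are sparse, a plain int array can be the smaller option. For the full 32-bit range, the bitmap needs 2^32 bits = 512 MiB, while storing 2^32 ints takes 16 GiB. The function names are illustrative only.

```c
#include <stdint.h>

/* Bytes needed for a bitmap with one bit per value in a range of `range` values. */
uint64_t bitmap_bytes(uint64_t range)
{
    return (range + 7) / 8; /* round up to whole bytes */
}

/* Bytes needed to store `n` values as 32-bit ints. */
uint64_t int32_array_bytes(uint64_t n)
{
    return n * sizeof(int32_t); /* 4 bytes each */
}
```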

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
