A small BitMap algorithm for big data processing
A small BitMap algorithm is useful for externally sorting big data (when memory cannot hold all the elements to be sorted), for removing duplicate elements, and for quickly finding elements that were randomly deleted. The core idea is to use each number as a subscript (index) into an array of bits, where the indexed bit indicates whether that number exists. Sorting takes O(N) time, and the extra space required is O(N/8) bytes, that is, one bit per value in the covered range. An example that supports signed int values (both positive and negative numbers) is as follows:
#include <stdio.h>
#include <stdlib.h>

/* Bit masks, most significant bit first: bit position 0 maps to 0x80. */
unsigned char BitMask[] = {0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1};

/* A bitmap of ByteArraSize bytes covers [-ByteArraSize*4, ByteArraSize*4 - 1]. */
int WriteNumberBitToByte(unsigned char *ByteArra, unsigned int ByteArraSize, int Number)
{
    if (((int)(ByteArraSize * 4) - 1) < Number || Number < -(int)(ByteArraSize * 4)) {
        return 0; /* failed, number out of bytearra. */
    }
    int BaseArraBitPos = ByteArraSize * 4; /* ByteArraSize * 8 / 2, the bit position of 0 */
    BaseArraBitPos += Number;
    printf("BaseArraBitPos=%d,Number=%d\n", BaseArraBitPos, Number);
    /* The original used the undeclared name Mask here; BitMask is the array meant. */
    ByteArra[BaseArraBitPos / 8] |= BitMask[BaseArraBitPos % 8];
    return 1; /* success */
}

int IsNumberBitInByte(unsigned char *ByteArra, unsigned int ByteArraSize, int Number)
{
    if (((int)(ByteArraSize * 4) - 1) < Number || Number < -(int)(ByteArraSize * 4)) {
        return 0; /* failed, number out of bytearra. */
    }
    int BaseArraBitPos = ByteArraSize * 4; /* ByteArraSize * 8 / 2 */
    BaseArraBitPos += Number;
    if (ByteArra[BaseArraBitPos / 8] & BitMask[BaseArraBitPos % 8]) {
        return 1;
    }
    return 0; /* number not found. */
}

/* Walking the bitmap from the smallest to the largest value prints the numbers in order. */
void PrintOrderedBitMap(unsigned char *BitMap, unsigned int BitMapCount)
{
    int MinmumNumber = -(int)(BitMapCount * 8 / 2);
    int MaximumValue = (int)(BitMapCount * 8 / 2) - 1;
    for (int i = MinmumNumber; i <= MaximumValue; ++i) {
        if (IsNumberBitInByte(BitMap, BitMapCount, i)) {
            printf("%d,", i);
        }
    }
    printf("\n");
}

int main(void)
{
    int Arra[] = {3, -4, 2, 0, -1, -8, 7, -12, 10};
    int ArraCount = (int)(sizeof(Arra) / sizeof(Arra[0]));
    int MaximumValue = Arra[0], MinmumValue = Arra[0];
    for (int i = 0; i < ArraCount; ++i) {
        if (MaximumValue < Arra[i]) {
            MaximumValue = Arra[i];
        }
        if (MinmumValue > Arra[i]) {
            MinmumValue = Arra[i];
        }
    }
    /* Reduce to the largest absolute value; the bitmap is symmetric around 0. */
    MaximumValue = MaximumValue < 0 ? -MaximumValue : MaximumValue;
    MinmumValue = MinmumValue < 0 ? -MinmumValue : MinmumValue;
    MaximumValue = MaximumValue > MinmumValue ? MaximumValue : MinmumValue;
    printf("MaximumValue=%d\n", MaximumValue);
    /* The original (MaximumValue+3)/4 is one byte short when the largest
       absolute value is positive and a multiple of 4, so round up from +1. */
    unsigned int BitMapCount = (MaximumValue + 4) / 4;
    /* calloc zeroes the bitmap; malloc would leave the bits uninitialized. */
    unsigned char *BitMap = (unsigned char *)calloc(BitMapCount, 1);
    if (BitMap == NULL) {
        return 1;
    }
    for (int i = 0; i < ArraCount; ++i) {
        WriteNumberBitToByte(BitMap, BitMapCount, Arra[i]);
    }
    PrintOrderedBitMap(BitMap, BitMapCount);
    free(BitMap);
    return 0;
}
An example of an algorithm that only supports the unsigned int range is as follows:
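/* Bit masks, most significant bit first: bit position 0 maps to 0x80. */
unsigned char BitMask[] = {0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1};

/* A bitmap of ByteArraSize bytes covers [0, ByteArraSize*8 - 1]. */
int WriteNumberBitToByte(unsigned char *ByteArra, unsigned int ByteArraSize, unsigned int Number)
{
    /* Comparing byte positions avoids the unsigned wrap-around of
       ByteArraSize * 8 - 1 when ByteArraSize is 0 or very large. */
    if (Number / 8 >= ByteArraSize) {
        return 0; /* failed, number out of bytearra. */
    }
    unsigned int BytePos = Number / 8;
    unsigned int BitPos = Number % 8;
    ByteArra[BytePos] |= BitMask[BitPos];
    return 1; /* success */
}

int IsNumberBitInByte(unsigned char *ByteArra, unsigned int ByteArraSize, unsigned int Number)
{
    if (Number / 8 >= ByteArraSize) {
        return 0; /* failed, number out of bytearra. */
    }
    unsigned int BytePos = Number / 8;
    unsigned int BitPos = Number % 8;
    if (ByteArra[BytePos] & BitMask[BitPos]) {
        return 1;
    }
    return 0; /* number not found. */
}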
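One of the uses mentioned at the start is finding elements that were deleted from a known range. The following is a minimal sketch of that idea, assuming the values are expected to lie in [0, Range). The helper names SetBit, TestBit and FindMissing are hypothetical and do not come from the code above; the bit layout (most significant bit first) mirrors BitMask.

```c
#include <stdlib.h>

/* Set the bit for Number, most significant bit first within each byte. */
static void SetBit(unsigned char *Map, unsigned int Number)
{
    Map[Number / 8] |= (unsigned char)(0x80u >> (Number % 8));
}

/* Return 1 if the bit for Number is set, otherwise 0. */
static int TestBit(const unsigned char *Map, unsigned int Number)
{
    return (Map[Number / 8] & (0x80u >> (Number % 8))) != 0;
}

/* Hypothetical helper: reports which values of [0, Range) do not occur in
 * Present. Stores at most MissingCap of them in Missing (in ascending order)
 * and returns the total number of missing values, or -1 if allocation fails. */
static int FindMissing(const unsigned int *Present, size_t PresentCount,
                       unsigned int Range,
                       unsigned int *Missing, size_t MissingCap)
{
    if (Range == 0) {
        return 0;
    }
    unsigned char *Map = calloc((size_t)Range / 8 + 1, 1); /* zeroed bitmap */
    if (Map == NULL) {
        return -1;
    }
    for (size_t i = 0; i < PresentCount; ++i) {
        if (Present[i] < Range) {          /* ignore out-of-range values */
            SetBit(Map, Present[i]);
        }
    }
    int Found = 0;
    for (unsigned int v = 0; v < Range; ++v) {
        if (!TestBit(Map, v)) {
            if ((size_t)Found < MissingCap) {
                Missing[Found] = v;
            }
            ++Found;
        }
    }
    free(Map);
    return Found;
}
```

Duplicates in the input set the same bit twice and therefore have no effect, which is also why the same bitmap removes duplicates.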
The algorithms above use one bit per number, so they can only record two states: the number is either present or absent. The same idea extends to one byte per number, which makes it possible to count repeated elements, up to 255 occurrences of each value. The principle is the same.
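The byte-per-number extension can be sketched as a counting sort. This is only an illustration under stated assumptions: the values lie in [0, Range), the function name ByteCountSort is hypothetical, and occurrences beyond the counter's maximum (UCHAR_MAX, which is 255 with 8-bit bytes) are simply not counted.

```c
#include <stdlib.h>
#include <limits.h>

/* Hypothetical helper: sorts values in [0, Range) with one unsigned char
 * counter per value, so duplicates are kept. Writes at most OutCap sorted
 * values to Out and returns how many were written, or -1 if allocation fails.
 * Counters saturate at UCHAR_MAX; values outside the range are skipped. */
static long ByteCountSort(const unsigned int *In, size_t InCount,
                          unsigned int Range,
                          unsigned int *Out, size_t OutCap)
{
    if (Range == 0) {
        return 0;
    }
    unsigned char *Count = calloc(Range, 1); /* one zeroed byte per value */
    if (Count == NULL) {
        return -1;
    }
    for (size_t i = 0; i < InCount; ++i) {
        if (In[i] < Range && Count[In[i]] < UCHAR_MAX) {
            ++Count[In[i]];
        }
    }
    long Written = 0;
    for (unsigned int v = 0; v < Range; ++v) {
        /* Emit each value as many times as it was counted. */
        for (unsigned int c = 0; c < Count[v]; ++c) {
            if ((size_t)Written < OutCap) {
                Out[Written++] = v;
            }
        }
    }
    free(Count);
    return Written;
}
```

The trade-off is memory: this uses eight times as much space as the one-bit version for the same range, in exchange for keeping the multiplicity of each value.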
In addition, representing each number with a single bit instead of a 32-bit int (4*8 bits versus 1 bit) cuts the memory needed to 1/32, a saving of 31/32. The larger the data volume, the more obvious the advantage, provided the scenario suits this method (for example, integer values in a bounded and reasonably dense range).
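The arithmetic can be checked with a small sketch. The helper names IntArrayBytes and BitMapBytes are hypothetical, and the 32-to-1 ratio assumes a 4-byte int and 8-bit bytes, which is common but not guaranteed by the C standard.

```c
#include <stddef.h>
#include <limits.h>

/* Bytes needed to store Count values as an array of int. */
static size_t IntArrayBytes(size_t Count)
{
    return Count * sizeof(int);
}

/* Bytes needed for a bitmap with one bit per value, rounded up to whole bytes. */
static size_t BitMapBytes(size_t Count)
{
    return (Count + CHAR_BIT - 1) / CHAR_BIT;
}
```

Under those assumptions, one million distinct values in the range 0 to 999999 take 4,000,000 bytes as an int array but 125,000 bytes as a bitmap.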