[Compiled] Bit-map / Bloom Filter

Source: Internet
Author: User

Bit-map: a single bit marks whether a given element is present, and the element's value itself serves as the key (the index of the bit). Because each element costs only one bit, the storage space can be greatly reduced.

Suppose we want to sort five elements in the range 0 to 7 (the list starts with 4 and 7 and ends with 3; assume there are no duplicates). We can use the bit-map method to sort them. To represent 8 possible values, only 8 bits (1 byte) are required. First, we allocate 1 byte of space and set all of its bits to 0 (figure omitted).

Then traverse the five elements. The first element is 4, so the bit corresponding to 4 is set to 1. This can be done by using p + (i / 8) to locate the byte and then | (0x01 << (i % 8)) to set the bit inside it. The operation depends on the bit ordering (big-endian versus little-endian); big-endian is assumed here by default. Because counting starts from zero, the value 4 corresponds to the fifth bit, which is set to 1 (figure omitted).
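The byte-and-mask arithmetic above can be sketched as follows. This is a minimal illustration, not the article's own code: the function names are hypothetical, and the two set functions only show that "which bit is the fifth" depends on whether bits are numbered from the least significant or the most significant end of each byte.

```cpp
#include <cassert>
#include <cstddef>

// Set bit i, numbering bits from the least significant end of each byte.
inline void setBitLsbFirst(unsigned char *p, std::size_t i)
{
    assert(p != nullptr);
    p[i / 8] |= static_cast<unsigned char>(0x01u << (i % 8));
}

// Set bit i, numbering bits from the most significant end of each byte.
inline void setBitMsbFirst(unsigned char *p, std::size_t i)
{
    assert(p != nullptr);
    p[i / 8] |= static_cast<unsigned char>(0x80u >> (i % 8));
}

// Test bit i using the same numbering as setBitLsbFirst.
inline bool testBitLsbFirst(const unsigned char *p, std::size_t i)
{
    assert(p != nullptr);
    return (p[i / 8] & (0x01u << (i % 8))) != 0;
}
```

Either numbering works for sorting, as long as setting and testing use the same one.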

Next, process the second element, 7, and set the eighth bit to 1. Then process the third element, and so on, until all elements have been processed and their corresponding bits set to 1. At that point, the bits in memory are in the following state (figure omitted):

Finally, we traverse the bit area and output the index of every bit that is set to 1 (the output ends with 7), which yields the elements in sorted order. The following code demonstrates this use of a bitmap: sorting.

// Each byte has eight bits
#include <cstdio>
#include <cstring>
#define BYTESIZE 8

void SetBit(char *p, int posi)
{
    p += posi / BYTESIZE;                    // advance to the byte that holds this bit
    *p = *p | (0x01 << (posi % BYTESIZE));   // set this bit to 1
}

void BitMapSortDemo()
{
    // For simplicity, we do not consider negative numbers.
    // The original list of values was lost; the nine values below are
    // placeholders (distinct, non-negative, maximum 14).
    int num[] = {3, 5, 2, 10, 6, 12, 8, 14, 9};

    // BufferLen is determined by the maximum value of the data to be sorted.
    // The maximum value to be sorted is 14, so only two bytes (16 bits) are required.
    const int BufferLen = 2;
    char *pBuffer = new char[BufferLen];

    // Set all bits to 0; otherwise, the result is unpredictable.
    memset(pBuffer, 0, BufferLen);
    for (int i = 0; i < 9; i++)
        SetBit(pBuffer, num[i]);   // first, set the corresponding bit to 1

    // Output the sorted result
    for (int i = 0; i < BufferLen; i++)      // process one byte at a time
    {
        for (int j = 0; j < BYTESIZE; j++)   // process each bit in this byte
        {
            // Test whether bit j is 1. The check is deliberately simple:
            // build the mask (0x01 << j) for bit j, AND the byte with the mask,
            // and compare the result with the mask itself.
            if ((pBuffer[i] & (0x01 << j)) == (0x01 << j))
                printf("%d ", i * BYTESIZE + j);
        }
    }

    delete[] pBuffer;   // release the buffer
}

int main()
{
    BitMapSortDemo();
    return 0;
}
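The same sorting idea can also be written with the standard library's std::bitset instead of a hand-managed buffer. The following is only a sketch under stated assumptions: the names bitmapSort and kRange are hypothetical, and every input value is assumed to be smaller than kRange and free of duplicates that need to be preserved.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical exclusive upper bound on the values to sort.
constexpr std::size_t kRange = 16;

// Returns the distinct values of `input` in ascending order.
std::vector<unsigned> bitmapSort(const std::vector<unsigned> &input)
{
    std::bitset<kRange> seen;      // all bits start at 0
    for (unsigned v : input) {
        assert(v < kRange);        // larger values cannot be represented
        seen.set(v);               // mark value v as present
    }

    std::vector<unsigned> sorted;
    for (std::size_t i = 0; i < kRange; ++i)
        if (seen.test(i))          // walking the bits in order yields sorted output
            sorted.push_back(static_cast<unsigned>(i));
    return sorted;
}
```

Returning a vector rather than printing keeps the function easy to check; the running time is proportional to the number of inputs plus the size of the range.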

1) Given a file containing phone numbers, each of which is an 8-digit number, count the number of distinct phone numbers.

An 8-digit number is at most 99,999,999, so about 100 million bits are needed, which is roughly 12.5 MB of memory. (Think of the numbers 0 to 99,999,999, each corresponding to one bit: 100,000,000 bits / 8 = 12,500,000 bytes, so about 12.5 MB of memory is enough to represent every 8-digit phone number.)
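A minimal sketch of this approach is shown below. It assumes the phone numbers have already been parsed from the file into integers; countDistinct and kNumbers are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per possible 8-digit number: 100,000,000 bits = 12,500,000 bytes.
constexpr std::uint32_t kNumbers = 100000000;

// Counts the distinct values in `phones`; each value must be in [0, 99999999].
std::size_t countDistinct(const std::vector<std::uint32_t> &phones)
{
    std::vector<std::uint8_t> bits((kNumbers + 7) / 8, 0);
    std::size_t distinct = 0;
    for (std::uint32_t n : phones) {
        assert(n < kNumbers);
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (n % 8));
        if ((bits[n / 8] & mask) == 0) {   // first time this number is seen
            bits[n / 8] |= mask;
            ++distinct;
        }
    }
    return distinct;
}
```

The bitmap size is fixed by the range of possible numbers, not by how many numbers the file contains.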

2) Find the number of non-repeated integers among 250 million integers, when the memory is insufficient to hold all 250 million integers.

Extend the bit-map and use 2 bits to represent each number: 0 means the number has not appeared, 1 means it has appeared once, and 2 means it has appeared twice or more. While traversing the numbers, look at the value at the corresponding position: if it is 0, set it to 1; if it is 1, set it to 2; if it is 2, leave it unchanged. Alternatively, instead of a 2-bit representation, two ordinary bit-maps can be used together to simulate the 2-bit map; the effect is the same.
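The 2-bit state machine described above could be sketched like this. The class TwoBitMap and the helper uniqueValues are hypothetical names, and the sketch takes the value range as a parameter; covering the full 32-bit range would need 2^32 × 2 bits = 1 GiB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two bits per value: 0 = never seen, 1 = seen once, 2 = seen twice or more.
class TwoBitMap {
public:
    explicit TwoBitMap(std::size_t range) : range_(range), data_((range + 3) / 4, 0) {}

    unsigned get(std::size_t v) const
    {
        assert(v < range_);
        return (data_[v / 4] >> ((v % 4) * 2)) & 0x3u;
    }

    // Advance the state 0 -> 1 -> 2; state 2 stays unchanged.
    void add(std::size_t v)
    {
        const unsigned state = get(v);
        if (state == 2) return;
        const unsigned shift = static_cast<unsigned>((v % 4) * 2);
        data_[v / 4] = static_cast<std::uint8_t>(
            (data_[v / 4] & ~(0x3u << shift)) | ((state + 1) << shift));
    }

private:
    std::size_t range_;
    std::vector<std::uint8_t> data_;   // four 2-bit states per byte
};

// Returns the values in [0, range) that occur exactly once in `input`.
std::vector<std::size_t> uniqueValues(const std::vector<std::size_t> &input, std::size_t range)
{
    TwoBitMap map(range);
    for (std::size_t v : input) map.add(v);

    std::vector<std::size_t> result;
    for (std::size_t v = 0; v < range; ++v)
        if (map.get(v) == 1) result.push_back(v);
    return result;
}
```

Counting the non-repeated integers is then just the size of the returned vector, or a counter in the final loop if the values themselves are not needed.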

----------------------------------------------------------------------------------------------------------------------

Bloom filter: a space-efficient randomized data structure. It uses a bit array to represent a set concisely and can determine whether an element belongs to the set. This efficiency comes at a price: when determining whether an element belongs to a set, the Bloom filter may mistakenly report that an element outside the set belongs to it (a false positive). Bloom filters are therefore not suitable for applications that require zero errors. In applications that can tolerate a low error rate, however, a Bloom filter trades a very small number of errors for large savings in storage space.

Set representation and element query

Next, let's look at how a Bloom filter represents a set with a bit array. In the initial state, the Bloom filter is an array of m bits, and each bit is set to 0.

To represent a set S = {x1, x2, ..., xn} of n elements, the Bloom filter uses k mutually independent hash functions, each of which maps every element of the set into the range {1, ..., m}. For any element x, the position hi(x) produced by the i-th hash function is set to 1 (1 ≤ i ≤ k). Note: if a position is set to 1 multiple times, only the first time takes effect; the later ones change nothing. In the figure (omitted here), k = 3, and two hash functions select the same position (the fifth bit from the left, that is, the second "1").

 

To determine whether y belongs to the set, apply the k hash functions to y. If all positions hi(y) are 1 (1 ≤ i ≤ k), y is considered an element of the set; otherwise, y is definitely not an element of the set. In the figure (omitted here), y1 is not an element of the set (because one of its positions points to a "0" bit); y2 either belongs to the set or is a false positive.
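The insert and query steps above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the class name and its methods are hypothetical, elements are assumed to be strings, and the k hash functions are simulated by combining two std::hash values (double hashing), which only approximates k truly independent hash functions. Positions here run from 0 to m - 1 rather than 1 to m.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class BloomFilter {
public:
    // m = number of bits, k = number of hash functions.
    BloomFilter(std::size_t m, std::size_t k) : bits_(m, false), k_(k)
    {
        assert(m > 0 && k > 0);
    }

    void add(const std::string &x)
    {
        for (std::size_t i = 0; i < k_; ++i)
            bits_[position(x, i)] = true;   // setting an already-set bit changes nothing
    }

    // false: x is definitely not in the set.
    // true:  x is in the set, or this is a false positive.
    bool mightContain(const std::string &x) const
    {
        for (std::size_t i = 0; i < k_; ++i)
            if (!bits_[position(x, i)]) return false;
        return true;
    }

private:
    // Assumption: the i-th position is derived as (h1 + i * h2) mod m.
    std::size_t position(const std::string &x, std::size_t i) const
    {
        const std::size_t h1 = std::hash<std::string>{}(x);
        const std::size_t h2 = std::hash<std::string>{}(x + "#salt") | 1;
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t k_;
};
```

An element that was added always tests true; an element that was never added usually tests false, but may test true when other elements happen to have set all of its k positions.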

A Bloom filter maps the elements of a set into a bit array. If the k mapped bits (k is the number of hash functions) are not all 1, the element is not in the set; if they are all 1, the element is either in the set or a false positive. The Counting Bloom Filter (CBF) extends each bit of the bit array into a counter, which makes it possible to delete elements. The Spectral Bloom Filter (SBF) associates the counters with the number of occurrences of each set element: SBF uses the minimum of an element's counters to represent its occurrence frequency.
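The counter-based variants can be sketched in the same style. The class below is a hypothetical illustration, not the published CBF or SBF design: it uses 8-bit saturating counters, the same double-hashing assumption built on std::hash, and a minimum-counter estimate in the spirit of SBF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

class CountingBloomFilter {
public:
    // m = number of counters, k = number of hash functions.
    CountingBloomFilter(std::size_t m, std::size_t k) : counters_(m, 0), k_(k)
    {
        assert(m > 0 && k > 0);
    }

    void add(const std::string &x)
    {
        for (std::size_t i = 0; i < k_; ++i) {
            std::uint8_t &c = counters_[position(x, i)];
            if (c < 255) ++c;   // saturate instead of overflowing
        }
    }

    // Only call this for elements that were actually added;
    // removing anything else corrupts the counters of other elements.
    void remove(const std::string &x)
    {
        for (std::size_t i = 0; i < k_; ++i) {
            std::uint8_t &c = counters_[position(x, i)];
            if (c > 0) --c;
        }
    }

    // Minimum of the k counters: 0 means definitely absent; otherwise the
    // value is an upper-bound estimate of how often x was added.
    unsigned estimate(const std::string &x) const
    {
        unsigned best = 255;
        for (std::size_t i = 0; i < k_; ++i)
            best = std::min<unsigned>(best, counters_[position(x, i)]);
        return best;
    }

private:
    // Assumption: the i-th position is derived as (h1 + i * h2) mod m.
    std::size_t position(const std::string &x, std::size_t i) const
    {
        const std::size_t h1 = std::hash<std::string>{}(x);
        const std::size_t h2 = std::hash<std::string>{}(x + "#salt") | 1;
        return (h1 + i * h2) % counters_.size();
    }

    std::vector<std::uint8_t> counters_;
    std::size_t k_;
};
```

Deletion works because each counter remembers how many additions touched it. The cost is several bits per position instead of one, and a counter that has saturated can no longer be decremented exactly.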
