How Python implements bitmap data structures

A bitmap is a very common data structure, used for example in Bloom filters and for sorting non-repeating integers. Bitmaps are typically built on arrays: each array element is treated as a run of binary digits, and all elements together form one larger set of bits. Python integers are signed; this article treats each array element as a 32-bit signed integer, which leaves 31 usable bits per element.

Bitmap implementation idea

A bitmap operates on individual bits. For example, if a Python array contains four 32-bit signed integers, the total number of available bits is 4 * 31 = 124. To operate on bit 90, first find the array element that holds it, then find the bit index within that element, and then perform the operation.
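The lookup for bit 90 can be sketched with one integer division and one remainder. This is only an illustration: the helper name locate and the constant BITS_PER_ELEMENT are hypothetical names chosen here, not part of the article's code.

```python
BITS_PER_ELEMENT = 31  # usable bits per element, as assumed in this article


def locate(n):
    """Return (element index, bit index) for bit number n, counting from 0."""
    return divmod(n, BITS_PER_ELEMENT)


elem_index, bit_index = locate(90)
print(elem_index, bit_index)  # 2 28
```

Bit 90 therefore lives in the third element (index 2), at bit 28 of that element.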

Consider a single 32-bit integer. Because it is treated as a signed type, the highest bit is the sign bit and the bitmap cannot use it. The high bits are on the left and the low bits are on the right; the lowest bit is bit 0.
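To make the layout concrete, the following sketch prints a few values as 32-character binary strings with the high bit on the left. The names USABLE_BITS, USABLE_MASK and show are assumptions made for this illustration.

```python
USABLE_BITS = 31
USABLE_MASK = (1 << USABLE_BITS) - 1  # 0x7FFFFFFF: every bit except the sign bit


def show(value):
    """Render value as a 32-character binary string, high bit on the left."""
    return format(value & 0xFFFFFFFF, '032b')


print(show(1 << 0))       # bit 0: the rightmost position
print(show(1 << 30))      # bit 30: the highest bit the bitmap uses
print(show(USABLE_MASK))  # all 31 usable bits set, sign bit left at 0
```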

Initialize bitmap

First, the bitmap has to be initialized. Take the integer 90 as an example: a single integer can use only 31 bits, so dividing 90 by 31 and rounding up gives the number of array elements required. The code is as follows:

    #!/usr/bin/env python
    #coding:utf8

    class Bitmap(object):
        def __init__(self, max):
            self.size = (max + 31 - 1) // 31  # round up

    if __name__ == '__main__':
        bitmap = Bitmap(90)
        print('requires %d elements.' % bitmap.size)

    $ python bitmap.py
    requires 3 elements.
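The expression (max + 31 - 1) // 31 is the usual integer-only way to round a division up. As a quick check, the sketch below compares it with math.ceil for a handful of sizes; elements_for is a hypothetical helper name used only here.

```python
import math

BITS = 31


def elements_for(bit_count):
    """Ceiling of bit_count / BITS using only integer arithmetic."""
    return (bit_count + BITS - 1) // BITS


for n in (1, 31, 32, 90, 124):
    # The integer formula agrees with the floating-point ceiling.
    assert elements_for(n) == math.ceil(n / BITS)
    print(n, elements_for(n))
```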

Calculate index

Once the size of the array is known, the array can be created. To store an integer in this array, you first need to know which array element it belongs to, and then which bit of that element. The index computation is therefore split into two steps:

    1. Compute the index in the array

    2. Compute the bit index in the array element

Compute the index in the array

The index in the array is computed in the same way as the array size. The only change is that the maximum value is replaced by the integer to be stored. There is one difference: the index is rounded down, so a calcElemIndex method is introduced that can round either way. Change the code to read as follows:

    #!/usr/bin/env python
    #coding:utf8

    class Bitmap(object):
        def __init__(self, max):
            self.size = self.calcElemIndex(max, True)
            self.array = [0 for i in range(self.size)]

        def calcElemIndex(self, num, up=False):
            '''up is True for rounding up, otherwise rounding down'''
            if up:
                return (num + 31 - 1) // 31  # round up
            return num // 31

    if __name__ == '__main__':
        bitmap = Bitmap(90)
        print('The array requires %d elements.' % bitmap.size)
        print('47 should be stored in element %d of the array.' % bitmap.calcElemIndex(47))

    $ python bitmap.py
    The array requires 3 elements.
    47 should be stored in element 1 of the array.

So it is important to know the largest integer in advance; otherwise the array may be too small to hold some of the data.
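One edge case deserves a check. With the size rounded up from the maximum, a maximum that is an exact multiple of 31 (93, for example) maps to element index 3, which is one past the end of a 3-element array. The sketch below contrasts the round-up size with a size that always leaves room for the maximum itself; size_round_up and size_inclusive are hypothetical names, and the second formula is a suggestion here rather than something the article uses.

```python
BITS = 31


def size_round_up(max_value):
    """Array size as computed in the article: ceil(max_value / BITS)."""
    return (max_value + BITS - 1) // BITS


def size_inclusive(max_value):
    """Array size that guarantees max_value itself has a slot."""
    return max_value // BITS + 1


for max_value in (90, 93):
    needed_index = max_value // BITS
    print(max_value, size_round_up(max_value), size_inclusive(max_value), needed_index)
```

For 90 both sizes are 3 and the highest index needed is 2. For 93 the round-up size is still 3 while index 3 is needed, so passing max + 1 (or using the inclusive formula) avoids an out-of-range access.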

Compute the bit index in the array element

The bit index within an array element is obtained with the modulo operation: take the integer to be stored modulo 31. Change the code to read as follows:

    #!/usr/bin/env python
    #coding:utf8

    class Bitmap(object):
        def __init__(self, max):
            self.size = self.calcElemIndex(max, True)
            self.array = [0 for i in range(self.size)]

        def calcElemIndex(self, num, up=False):
            '''up is True for rounding up, otherwise rounding down'''
            if up:
                return (num + 31 - 1) // 31  # round up
            return num // 31

        def calcBitIndex(self, num):
            return num % 31

    if __name__ == '__main__':
        bitmap = Bitmap(90)
        print('The array requires %d elements.' % bitmap.size)
        print('47 should be stored in element %d of the array.' % bitmap.calcElemIndex(47))
        print('47 should be stored in bit %d of array element %d.' % (bitmap.calcBitIndex(47), bitmap.calcElemIndex(47)))

    $ python bitmap.py
    The array requires 3 elements.
    47 should be stored in element 1 of the array.
    47 should be stored in bit 16 of array element 1.

Remember that both indexes count from 0.
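The two indexes together identify a number uniquely, which is what later makes it possible to read stored values back out. A small round-trip sketch, using hypothetical helper names to_position and from_position:

```python
BITS = 31


def to_position(num):
    """Map a number to (element index, bit index)."""
    return num // BITS, num % BITS


def from_position(elem_index, bit_index):
    """Map (element index, bit index) back to the number it represents."""
    return elem_index * BITS + bit_index


# Every number in the range survives the round trip unchanged.
assert all(from_position(*to_position(n)) == n for n in range(1000))

print(to_position(47))       # (1, 16)
print(from_position(1, 16))  # 47
```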

Set-to-1 operation

Every bit defaults to 0. Setting a bit to 1 indicates that the corresponding value is stored. Change the code to read as follows:

    #!/usr/bin/env python
    #coding:utf8

    class Bitmap(object):
        def __init__(self, max):
            self.size = self.calcElemIndex(max, True)
            self.array = [0 for i in range(self.size)]

        def calcElemIndex(self, num, up=False):
            '''up is True for rounding up, otherwise rounding down'''
            if up:
                return (num + 31 - 1) // 31  # round up
            return num // 31

        def calcBitIndex(self, num):
            return num % 31

        def set(self, num):
            elemIndex = self.calcElemIndex(num)
            byteIndex = self.calcBitIndex(num)
            elem = self.array[elemIndex]
            # OR with a one-bit mask turns the bit on and leaves the others alone
            self.array[elemIndex] = elem | (1 << byteIndex)

    if __name__ == '__main__':
        bitmap = Bitmap(90)
        bitmap.set(0)
        print(bitmap.array)

    $ python bitmap.py
    [1, 0, 0]

Because counting starts from bit 0, storing the value 0 means setting bit 0 to 1.
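A property worth noting is that setting a bit is idempotent: OR-ing the same mask in twice leaves the element unchanged. The following sketch shows this on a plain list; set_bit is a hypothetical free function, not the article's method.

```python
BITS = 31
array = [0, 0, 0]


def set_bit(array, num):
    """Turn on the bit for num, leaving every other bit untouched."""
    array[num // BITS] |= 1 << (num % BITS)


set_bit(array, 0)
set_bit(array, 34)
set_bit(array, 34)  # setting the same bit again changes nothing
print(array)  # [1, 8, 0]
```

The value 34 lands in element 1 at bit 3, and 1 << 3 is 8.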

Clear-to-0 operation

Setting a bit back to 0 discards the stored value. The code is as follows:

    #!/usr/bin/env python
    #coding:utf8

    class Bitmap(object):
        def __init__(self, max):
            self.size = self.calcElemIndex(max, True)
            self.array = [0 for i in range(self.size)]

        def calcElemIndex(self, num, up=False):
            '''up is True for rounding up, otherwise rounding down'''
            if up:
                return (num + 31 - 1) // 31  # round up
            return num // 31

        def calcBitIndex(self, num):
            return num % 31

        def set(self, num):
            elemIndex = self.calcElemIndex(num)
            byteIndex = self.calcBitIndex(num)
            elem = self.array[elemIndex]
            self.array[elemIndex] = elem | (1 << byteIndex)

        def clean(self, i):
            elemIndex = self.calcElemIndex(i)
            byteIndex = self.calcBitIndex(i)
            elem = self.array[elemIndex]
            # AND with the inverted mask turns the bit off and leaves the others alone
            self.array[elemIndex] = elem & (~(1 << byteIndex))

    if __name__ == '__main__':
        bitmap = Bitmap(90)
        bitmap.set(0)
        bitmap.set(34)
        print(bitmap.array)
        bitmap.clean(0)
        print(bitmap.array)
        bitmap.clean(34)
        print(bitmap.array)

    $ python bitmap.py
    [1, 8, 0]
    [0, 8, 0]
    [0, 0, 0]

Clearing to 0 and setting to 1 are inverse operations.
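One detail of the clear step can look surprising in Python: inverting a mask with ~ yields a negative number, yet AND-ing with it still clears exactly one bit. A minimal sketch, with clear_bit as a hypothetical helper name:

```python
def clear_bit(word, bit):
    """Return word with the given bit turned off."""
    return word & ~(1 << bit)


mask = 1 << 3
print(~mask)                 # -9: Python integers behave as two's complement
print(clear_bit(0b1111, 3))  # 7: only bit 3 was cleared
print(clear_bit(0b0111, 3))  # 7: clearing a bit that is already 0 is a no-op
```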

Test whether a bit is 1

Testing whether a bit is 1 is how previously stored data is read back. The code is as follows:

    #!/usr/bin/env python
    #coding:utf8

    class Bitmap(object):
        def __init__(self, max):
            self.size = self.calcElemIndex(max, True)
            self.array = [0 for i in range(self.size)]

        def calcElemIndex(self, num, up=False):
            '''up is True for rounding up, otherwise rounding down'''
            if up:
                return (num + 31 - 1) // 31  # round up
            return num // 31

        def calcBitIndex(self, num):
            return num % 31

        def set(self, num):
            elemIndex = self.calcElemIndex(num)
            byteIndex = self.calcBitIndex(num)
            elem = self.array[elemIndex]
            self.array[elemIndex] = elem | (1 << byteIndex)

        def clean(self, i):
            elemIndex = self.calcElemIndex(i)
            byteIndex = self.calcBitIndex(i)
            elem = self.array[elemIndex]
            self.array[elemIndex] = elem & (~(1 << byteIndex))

        def test(self, i):
            elemIndex = self.calcElemIndex(i)
            byteIndex = self.calcBitIndex(i)
            # A non-zero result means the bit is set
            if self.array[elemIndex] & (1 << byteIndex):
                return True
            return False

    if __name__ == '__main__':
        bitmap = Bitmap(90)
        bitmap.set(0)
        print(bitmap.array)
        print(bitmap.test(0))
        bitmap.set(1)
        print(bitmap.test(1))
        print(bitmap.test(2))
        bitmap.clean(1)
        print(bitmap.test(1))

    $ python bitmap.py
    [1, 0, 0]
    True
    True
    False
    False
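An equivalent way to test a bit is to shift the element right so the wanted bit lands in position 0 and then mask with 1. The sketch below uses that form on a plain list; is_set is a hypothetical name, and this is an alternative formulation rather than the article's method.

```python
BITS = 31


def is_set(array, num):
    """Return True if num is stored, using shift-then-mask."""
    return (array[num // BITS] >> (num % BITS)) & 1 == 1


array = [1, 8, 0]  # holds the values 0 and 34
print(is_set(array, 0), is_set(array, 34), is_set(array, 2))
```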

Next, the bitmap is used to sort an array that contains no duplicates. Given an unordered array of non-negative integers whose largest element is 879, the task is to put it in ascending order. The code is as follows:

    #!/usr/bin/env python
    #coding:utf8

    class Bitmap(object):
        def __init__(self, max):
            self.size = self.calcElemIndex(max, True)
            self.array = [0 for i in range(self.size)]

        def calcElemIndex(self, num, up=False):
            '''up is True for rounding up, otherwise rounding down'''
            if up:
                return (num + 31 - 1) // 31  # round up
            return num // 31

        def calcBitIndex(self, num):
            return num % 31

        def set(self, num):
            elemIndex = self.calcElemIndex(num)
            byteIndex = self.calcBitIndex(num)
            elem = self.array[elemIndex]
            self.array[elemIndex] = elem | (1 << byteIndex)

        def clean(self, i):
            elemIndex = self.calcElemIndex(i)
            byteIndex = self.calcBitIndex(i)
            elem = self.array[elemIndex]
            self.array[elemIndex] = elem & (~(1 << byteIndex))

        def test(self, i):
            elemIndex = self.calcElemIndex(i)
            byteIndex = self.calcBitIndex(i)
            if self.array[elemIndex] & (1 << byteIndex):
                return True
            return False

    if __name__ == '__main__':
        MAX = 879
        suffle_array = [45, 2, 78, 35, 67, 90, 879, 0, 340, 123, 46]
        result = []
        bitmap = Bitmap(MAX)
        # Mark every value, then scan the bits in ascending order
        for num in suffle_array:
            bitmap.set(num)
        for i in range(MAX + 1):
            if bitmap.test(i):
                result.append(i)
        print('original array: %s' % suffle_array)
        print('sorted array is: %s' % result)

    $ python bitmap.py
    original array: [45, 2, 78, 35, 67, 90, 879, 0, 340, 123, 46]
    sorted array is: [0, 2, 35, 45, 46, 67, 78, 90, 123, 340, 879]
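The requirement that the input has no duplicates matters, because a bit can only record that a value is present, not how many times. The compact sketch below illustrates this; bitmap_sort is a hypothetical standalone function, and it sizes the array as max // 31 + 1 so that the maximum always fits.

```python
BITS = 31


def bitmap_sort(values):
    """Sort distinct non-negative integers by marking bits, then scanning."""
    if not values:
        return []
    words = [0] * (max(values) // BITS + 1)
    for v in values:
        words[v // BITS] |= 1 << (v % BITS)
    # Scan elements and bits in ascending order and rebuild each stored number.
    return [i * BITS + b
            for i, word in enumerate(words)
            for b in range(BITS)
            if (word >> b) & 1]


data = [45, 2, 78, 35, 67, 90, 879, 0, 340, 123, 46]
print(bitmap_sort(data))
print(bitmap_sort([3, 1, 3]))  # duplicates collapse: [1, 3]
```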

Conclusion

Once the bitmap is implemented, using it for sorting is very simple. Other languages can implement a bitmap as well. In statically typed languages such as C or Go, unsigned integers can be declared directly, so all 32 bits are available; note that the only change needed is to replace 31 with 32 in the code above.
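To show how small the 31-to-32 change is, here is a sketch that makes the number of usable bits a parameter. It stays in Python for consistency with the rest of the article; WordBitmap and its bits argument are hypothetical names, and the class is an illustration rather than a port of any C or Go code.

```python
class WordBitmap(object):
    """Bitmap with a configurable number of usable bits per element."""

    def __init__(self, capacity, bits=32):
        self.bits = bits
        self.array = [0] * ((capacity + bits - 1) // bits)  # round up

    def set(self, num):
        self.array[num // self.bits] |= 1 << (num % self.bits)

    def test(self, num):
        return bool((self.array[num // self.bits] >> (num % self.bits)) & 1)


bm = WordBitmap(90, bits=32)
bm.set(47)
print(len(bm.array), bm.array)  # 3 [0, 32768, 0]
```

With 32 bits per element, 47 falls in element 1 at bit 15 instead of bit 16.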
