I spent less than a week reading the book Redis Design and Implementation. Parts of the overall design are really ingenious: the components are tightly connected, yet the logic stays simple. One thing I learned is a particularly clever way Bitcount counts 1-bits, which I record here.
I found a very detailed explanation by a foreign author, reproduced below.
OK, let's go through the code line by line:
Line 1:
i = i - ((i >> 1) & 0x55555555);
First of all, the significance of the constant 0x55555555 is that, written using binary literal notation (Java/GCC style),

0x55555555 = 0b01010101010101010101010101010101

That is, all its odd-numbered bits (counting the lowest bit as bit 1) are 1, and all its even-numbered bits are 0.
The expression ((i >> 1) & 0x55555555) thus shifts the bits of i right by one, and then sets all the even-numbered bits to zero. (Equivalently, we could've first set all the odd-numbered bits of i to zero with i & 0xAAAAAAAA and then shifted the result right by one bit.) For convenience, let's call this intermediate value j.
What happens if we subtract this j from the original i? Well, let's see what would happen if i had only two bits:

 i          j          i - j
--------------------------------
 0 = 0b00   0 = 0b00   0 = 0b00
 1 = 0b01   0 = 0b00   1 = 0b01
 2 = 0b10   1 = 0b01   1 = 0b01
 3 = 0b11   1 = 0b01   2 = 0b10
Hey! We've managed to count the bits of our two-bit number!
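The two-bit table can be checked directly in code. (A minimal sketch; the helper name twobit_count is mine, for illustration only.)

```c
/* For any 2-bit value i (0..3), i - (i >> 1) equals the number
 * of set bits in i, exactly as in the table above. For 2-bit
 * inputs the & 0x55555555 mask is a no-op, so it is omitted. */
unsigned twobit_count(unsigned i) {
    return i - (i >> 1);
}
```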
OK, but what if i has more than two bits set? In fact, it's pretty easy to check that the lowest two bits of i - j would still be given by the table above, and so would the third and fourth bits, and the fifth and sixth bits, and so on. In particular:
- Despite the >> 1, the lowest two bits of i - j are not affected by the third or higher bits of i, since those bits will be masked out of j by the & 0x55555555; and
- Since the lowest two bits of j can never have a greater numerical value than those of i, the subtraction will never borrow from the third bit of i: thus, the lowest two bits of i also cannot affect the third or higher bits of i - j.
In fact, by repeating the same argument, we can see that the calculation on this line, in effect, applies the table above to all of the two-bit blocks in i in parallel. That is, after this line executes, the lowest two bits of the new value of i contain the number of bits set among the lowest two bits of the original value of i, and so do the next two bits, and so on.
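As a sanity check, we can wrap this first line in a helper and inspect individual two-bit fields of the result. (A sketch; the function names pairwise_counts and field2 are mine, not from the original code.)

```c
#include <stdint.h>

/* Step 1 of the SWAR popcount: each 2-bit field of the result
 * holds the number of set bits in the corresponding 2-bit field
 * of the input. */
uint32_t pairwise_counts(uint32_t i) {
    return i - ((i >> 1) & 0x55555555u);
}

/* Extract the 2-bit field at position p (p = 0..15) of x. */
uint32_t field2(uint32_t x, int p) {
    return (x >> (2 * p)) & 0x3u;
}
```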
Line 2:
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
Compared to the first line, this one's quite simple. First, note that

0x33333333 = 0b00110011001100110011001100110011

Thus, i & 0x33333333 takes the two-bit counts calculated above and throws away every second one of them, while (i >> 2) & 0x33333333 does the same after shifting i right by two bits. Then we add the results together.
Thus, in effect, this line takes the bitcounts of the lowest two and the second-lowest two bits of the original input, computed on the previous line, and adds them together to give the bitcount of the lowest four bits of the input. And, again, it does this in parallel for all the 8 four-bit blocks (= hex digits) of the input.
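Combining the first two lines, each hex digit of the intermediate value now counts the set bits of the corresponding hex digit of the input. (A sketch; the helper name nibble_counts is mine.)

```c
#include <stdint.h>

/* Steps 1 and 2 of the SWAR popcount: after step 2, each 4-bit
 * field (hex digit) of the result holds the popcount of the
 * corresponding 4 bits of the input. */
uint32_t nibble_counts(uint32_t i) {
    i = i - ((i >> 1) & 0x55555555u);                 /* 2-bit counts */
    i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u); /* 4-bit counts */
    return i;
}
```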
Line 3:
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
OK, what's going on here?
Well, first of all, (i + (i >> 4)) & 0x0F0F0F0F does exactly the same as the previous line, except it adds the adjacent four-bit bitcounts together to give the bitcount of each eight-bit block (i.e. byte) of the input. (Here, unlike in the previous line, we can get away with moving the & outside the addition, since we know that an eight-bit bitcount can never exceed 8, and therefore fits inside four bits without overflowing.)
Now we have a 32-bit number consisting of four 8-bit bytes, each byte holding the number of 1-bits in the corresponding byte of the original input. (Let's call these bytes A, B, C and D.) So what happens when we multiply this value (let's call it k) by 0x01010101?
Well, since 0x01010101 = (1 << 24) + (1 << 16) + (1 << 8) + 1
, we have:
k * 0x01010101 = (k << 24) + (k << 16) + (k << 8) + k
Thus, the highest byte of the result ends up being the sum of:
- its original value, due to the k term, plus
- the value of the next lower byte, due to the k << 8 term, plus
- the value of the second-lowest byte, due to the k << 16 term, plus
- the value of the fourth and lowest byte, due to the k << 24 term.
(In general, there could also be carries from lower bytes, but since we know the value of each byte is at most 8, we know the addition will never overflow and create a carry.)
That is, the highest byte of k * 0x01010101 ends up being the sum of the bitcounts of all four bytes of the input, i.e. the total bitcount of the 32-bit input number. The final >> 24 then simply shifts this value down from the highest byte to the lowest.
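Putting the three lines together gives the complete routine. (The function name swar_popcount is mine; the body is exactly the three lines discussed above.)

```c
#include <stdint.h>

/* Full 32-bit SWAR popcount: 2-bit counts, then 4-bit counts,
 * then byte counts summed into the top byte by the multiply. */
int swar_popcount(uint32_t i) {
    i = i - ((i >> 1) & 0x55555555u);
    i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);
    return (int)((((i + (i >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24);
}
```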
Ps. This code could easily be extended to 64-bit integers, simply by changing the 0x01010101 to 0x0101010101010101 and the >> 24 to >> 56. Indeed, the same method would even work for 128-bit integers; that would require adding one extra shift/add/mask step, however, since a 128-bit bitcount no longer quite fits into an 8-bit byte.
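The 64-bit extension described above looks like this (a sketch under those substitutions; the function name swar_popcount64 is mine):

```c
#include <stdint.h>

/* 64-bit variant: the same masks widened to 64 bits, the
 * multiplier 0x0101010101010101, and the final shift >> 56.
 * A 64-bit count is at most 64, which still fits in one byte,
 * so the multiply never overflows into a neighbouring byte. */
int swar_popcount64(uint64_t i) {
    i = i - ((i >> 1) & 0x5555555555555555ull);
    i = (i & 0x3333333333333333ull) + ((i >> 2) & 0x3333333333333333ull);
    return (int)((((i + (i >> 4)) & 0x0F0F0F0F0F0F0F0Full)
                  * 0x0101010101010101ull) >> 56);
}
```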
The key point is to treat the 32-bit word as a whole: first compute the number of 1-bits in each 2-bit group, then in each 4-bit group, then in each 8-bit group. The total number of 1-bits is then the sum of the four 8-bit counts, which a single multiplication can compute, e.g.:

0xBBBBBBBB * 0x01010101 = 0xBBBBBBBB * ((1 << 24) + (1 << 16) + (1 << 8) + 1)

The total count of 1-bits in all 32 bits ends up in the high 8 bits, so a final >> 24 extracts it. Of course, in practice we can count more than 32 bits at a time: processing 4 * 32 bits per iteration saves roughly a factor of 128 compared with bit-by-bit traversal, and is also about 4 * 4 times faster than the lookup-table method.
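Counting a whole buffer word by word can be sketched as below. This is my own illustration, not the actual Redis source, which additionally unrolls several words per loop iteration as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits over an array of 32-bit words by applying the
 * SWAR popcount to each word and summing the results. */
long array_popcount(const uint32_t *words, size_t n) {
    long total = 0;
    for (size_t k = 0; k < n; k++) {
        uint32_t i = words[k];
        i = i - ((i >> 1) & 0x55555555u);
        i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);
        total += (long)((((i + (i >> 4)) & 0x0F0F0F0Fu)
                         * 0x01010101u) >> 24);
    }
    return total;
}
```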
Redis bitcount variable-precision SWAR algorithm