Reversing the bits of a byte

Source: Internet
Author: User

http://www.hookcn.org/2011/01/reverse-bits.html

 

The question, which originated at a certain company, is very simple:

Given a byte (8 bits), reverse the order of its bits.

That is, if the eight bits of the input byte are "abcdefgh", the result should be "hgfedcba". As an interview or written-test question, it naturally implies one more requirement: the solution should be as efficient as possible.
There is also an extended version of this problem, perhaps the more common one on the Internet: the input is not a byte but a 32-bit integer (DWORD) whose bits are to be reversed.

To be honest, questions like this can feel like trickery. Is this really how a programmer's ability should be assessed? We tag this post "tricks and hacks" precisely to make that point: such techniques are not widely used; on the contrary, most of the time they are not needed at all. Their most obvious purpose is to optimize code efficiency, so using them in the most critical, bottleneck sections of code is understandable, but they should not be used indiscriminately in everyday code. Grandpa Donald Knuth (author of the legendary TAOCP) has a famous saying: "Premature optimization is the root of all evil". Still, studying these techniques does little harm. At the very least you can broaden your mind, exercise your brain, and learn some optimization methods. If you are lucky, you might even ace a written test or interview question.

An ordinary solution is as follows (the input here is a UINT; for the byte version, change the type accordingly):

typedef unsigned int UINT;

UINT reverse_bits(UINT input) {
    const UINT BITS_OF_BYTE = 8;  // The number of bits per byte
    UINT result = 0;              // The result is accumulated here
    // Process one bit per loop iteration
    for (UINT i = 0; i < sizeof(input) * BITS_OF_BYTE; i++) {
        // Shift the result left and append the lowest bit of the input
        result = (result << 1) | (input & 1);
        input >>= 1;  // Shift right to drop the bit just consumed
    }
    return result;
}
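As noted above, the byte version only changes the type. A minimal sketch, assuming the fixed-width type uint8_t from <stdint.h> and a hypothetical function name, reverse_byte_loop, neither of which appears in the original post:

```c
#include <stdint.h>

/* Hypothetical name: byte-sized variant of the loop-based solution. */
uint8_t reverse_byte_loop(uint8_t input) {
    uint8_t result = 0;
    for (int i = 0; i < 8; i++) {
        /* Shift the result left and append the lowest bit of the input. */
        result = (uint8_t)((result << 1) | (input & 1));
        input >>= 1;  /* Drop the bit just consumed. */
    }
    return result;
}
```

For example, 0xB1 (binary 10110001) should come back as 0x8D (binary 10001101).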

However, this solution is clearly inefficient. Processing an N-bit integer takes N loop iterations. Each iteration contains four instructions in the loop body plus two more to update the loop variable and to perform the conditional jump, for a total of 6N. (Assignments can be ignored, because these variables fit in registers and an optimizing compiler can keep them there.) For a task as simple as reversing a byte, that comes to 48 instructions, which is rather long-winded.

Is there a solution with fewer instructions? Of course! But such solutions are not as straightforward and easy to understand as the ordinary one.

One of the many answers I have read on the internet is as follows:

// Swap adjacent bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// Swap adjacent 2-bit pairs within each group of 4 bits
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// Swap the high 4 bits and low 4 bits within each group of 8 bits
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// Swap adjacent bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// Swap the high and low 16-bit halves
v = (v >> 16) | (v << 16);
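The snippet above leaves the type of v implicit. A minimal sketch that wraps it in a function, assuming v is an unsigned 32-bit value (uint32_t from <stdint.h>); the name reverse32 is hypothetical:

```c
#include <stdint.h>

/* Hypothetical wrapper: the five swap steps applied to a 32-bit value. */
uint32_t reverse32(uint32_t v) {
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);   /* adjacent bits */
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);   /* 2-bit pairs */
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);   /* nibbles */
    v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);   /* bytes */
    v = (v >> 16) | (v << 16);                                 /* 16-bit halves */
    return v;
}
```

The unsigned type matters: with a signed v, the right shifts could drag copies of the sign bit into the result.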

The code above handles 32-bit integers. If the input is a byte, only three similar lines are needed, as shown below:

// Swap adjacent bits
v = ((v >> 1) & 0x55) | ((v & 0x55) << 1);  // abcdefgh -> badcfehg
// Swap adjacent 2-bit pairs within each group of 4 bits
v = ((v >> 2) & 0x33) | ((v & 0x33) << 2);  // badcfehg -> dcbahgfe
// Swap the high 4 bits and the low 4 bits
v = (v >> 4) | (v << 4);                    // dcbahgfe -> hgfedcba
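The last line relies on v being exactly 8 bits wide, so that the bits shifted out at the top are discarded. A minimal sketch making that explicit, assuming uint8_t from <stdint.h>; the name reverse8 is hypothetical:

```c
#include <stdint.h>

/* Hypothetical wrapper: the three swap steps applied to one byte. */
uint8_t reverse8(uint8_t v) {
    v = (uint8_t)(((v >> 1) & 0x55) | ((v & 0x55) << 1));  /* abcdefgh -> badcfehg */
    v = (uint8_t)(((v >> 2) & 0x33) | ((v & 0x33) << 2));  /* badcfehg -> dcbahgfe */
    /* v << 4 is computed as an int; the cast drops everything above bit 7. */
    v = (uint8_t)((v >> 4) | (v << 4));                    /* dcbahgfe -> hgfedcba */
    return v;
}
```

If v were held in a wider variable, the final step would need an extra "& 0xFF" to give the same result.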

Of course, longer inputs are no problem either. The same pattern extends to 64 bits, 128 bits, and so on.
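As an illustration of that extension, here is a sketch of a 64-bit version. It is not from the original post: the masks are simply the 32-bit ones repeated, one more step swaps the two 32-bit halves, and the name reverse64 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical 64-bit extension of the group-swap pattern. */
uint64_t reverse64(uint64_t v) {
    v = ((v >> 1)  & 0x5555555555555555ULL) | ((v & 0x5555555555555555ULL) << 1);
    v = ((v >> 2)  & 0x3333333333333333ULL) | ((v & 0x3333333333333333ULL) << 2);
    v = ((v >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
    v = ((v >> 8)  & 0x00FF00FF00FF00FFULL) | ((v & 0x00FF00FF00FF00FFULL) << 8);
    v = ((v >> 16) & 0x0000FFFF0000FFFFULL) | ((v & 0x0000FFFF0000FFFFULL) << 16);
    v = (v >> 32) | (v << 32);  /* swap the two 32-bit halves */
    return v;
}
```

Doubling the width adds one line, which is where the logarithmic growth in the instruction count comes from.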

The magic of this code lies in the following observation: when one operation swaps two bits (for example, a and h), the other bits are unaffected, so it is natural to perform the swaps at many positions "in parallel". The central idea of the solution above is therefore to divide the bits into groups and swap all pairs of adjacent groups at once. Then, by varying the group size, every bit eventually reaches its destination. This solution grows the group size from small to large; it works equally well from large to small. If you are interested, you can try it yourself.
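For readers who want to try the large-to-small order, one possible sketch follows. It reuses the same five steps in the opposite order, assuming a 32-bit unsigned input; the name reverse32_large_first is hypothetical.

```c
#include <stdint.h>

/* Hypothetical variant: same swaps, applied from the largest group down. */
uint32_t reverse32_large_first(uint32_t v) {
    v = (v >> 16) | (v << 16);                                 /* 16-bit halves */
    v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);   /* bytes */
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);   /* nibbles */
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);   /* 2-bit pairs */
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);   /* adjacent bits */
    return v;
}
```

The order does not matter because each step flips a different bit of every position's 5-bit index, and those flips are independent of one another.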

The instruction count of this group-swap solution is 5 * log2(N) - 2, which grows logarithmically in N rather than linearly as in the ordinary solution. At N = 32 the instruction counts compare as 192 to 23 (6 * 32 versus 5 * 5 - 2), a substantial improvement. However, programmers who pursue perfection are still not satisfied. For N = 8, that is, reversing a byte, this solution uses 13 instructions. Can it be done with even fewer?

Then look at the following godlike solution (it requires 64-bit arithmetic):

unsigned char b;  // The byte to be reversed
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;

Although I have read this solution over and over, I am still amazed by the ingenuity in it. It uses only three operations! Let me try to explain how it works. First, multiplication makes five copies of the original byte, laid end to end in a 64-bit integer. Then the & operation picks out specific bits. After these two operations, the eight bits of the original byte sit at their correct positions within five "10-bit groups" ("correct" meaning the position each bit should occupy after the reversal). Finally, "% 1023" superimposes the five 10-bit groups, and out comes the final result! The detailed computation below should make this easier to follow:

For ease of reading, the bits of the original byte are written as the letters a to h, and every "0" in the working is replaced by the character ".". I hope this makes things clearer.

  ................1.......1.......1.......1.......1.    // 0x0202020202 * abcdefgh
  --------------------------------------------------
  ................h.......h.......h.......h.......h.    // there is a 0 at the tail; don't forget it
  ...............g.......g.......g.......g.......g..
  ..............f.......f.......f.......f.......f...
  .............e.......e.......e.......e.......e....
  ............d.......d.......d.......d.......d.....
  ...........c.......c.......c.......c.......c......
  ..........b.......b.......b.......b.......b.......
  .........a.......a.......a.......a.......a........
  --------------------------------------------------
  .........abcdefghabcdefghabcdefghabcdefghabcdefgh.
& .........1....1...1....1...1....1...1........1....    // 0x010884422010
  --------------------------------------------------
  .........a....f...b....g...c....h...d........e....    (*)
% ........................................1111111111    // 1023
  --------------------------------------------------
  ..........................................hgfedcba

(*) The structure is hard to see in this row. Split into groups of 10 bits, it reads:

  .........a  ....f...b.  ...g...c..  ..h...d...  .....e....

You can see that within each group, every bit of the original byte sits in its correct position (the top two bits of each group are zero).

The computation diagram above is adapted, with changes, from Log4think. Thanks to Simon for the meticulous work!
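The three stages can also be followed in code. The sketch below is illustrative only: reverse_byte_mul is the one-line solution wrapped in a function, and reverse_byte_groups replaces "% 1023" with an explicit sum of the 10-bit groups. Both function names and the variable names are hypothetical.

```c
#include <stdint.h>

/* The one-line solution, wrapped in a function. */
uint8_t reverse_byte_mul(uint8_t b) {
    return (uint8_t)(((b * 0x0202020202ULL) & 0x010884422010ULL) % 1023);
}

/* The same computation with the last stage spelled out. */
uint8_t reverse_byte_groups(uint8_t b) {
    uint64_t copies = b * 0x0202020202ULL;         /* five copies, shifted left by 1 */
    uint64_t picked = copies & 0x010884422010ULL;  /* keep one or two bits per group */
    uint64_t sum = 0;
    while (picked != 0) {                          /* add up the 10-bit groups */
        sum += picked & 0x3FF;
        picked >>= 10;
    }
    return (uint8_t)sum;  /* no two kept bits share an offset, so sum <= 255 */
}
```

Because the sum of the groups never reaches 1023, it equals the remainder exactly and not just modulo 1023.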

If you like to get to the bottom of things, you will say that the computation can just about be followed, but several questions remain unexplained:

  1. Why do we need 5 copies instead of 6 or 4?
  2. Why is there a 0 at the tail?
  3. Why does a group consist of 10 bits instead of some other number?
  4. Why does computing % 1023 superimpose the 10-bit groups?

Okay. The answers are as follows:

Why do we need 5 copies instead of 6 or 4?
The direct answer is simple: 4 copies are not enough, so what else can we do? And 6 copies are more than necessary.
The proof of this claim comes further below.

Why is there a 0 at the tail?
My guess is that the author of this solution first tried 0x0101010101 as the multiplier, only to find that this wastes the lowest byte copy (all eight bits stay where they were, and none of them is in its correct position). So the multiplier was shifted left by one bit, which lets the lowest copy contribute at least one bit, e, in the correct position. In fact, the lowest copy can contribute at most one bit, which is easy to verify.
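That check can be done by hand or with a few lines of code. The sketch below is a hypothetical helper, not part of the original solution. It numbers the byte's bits from 0 (h) to 7 (a) and assumes the group is wide enough that the shifted copy stays inside the lowest group.

```c
/* Hypothetical helper: returns a mask of the byte bits that the lowest copy,
   placed `shift` bits up from bit 0, puts at their reversed position. */
unsigned lowest_copy_hits(unsigned shift) {
    unsigned hits = 0;
    for (unsigned i = 0; i < 8; i++) {
        /* Bit i lands at offset i + shift; after reversal it belongs at 7 - i. */
        if (i + shift == 7 - i) {
            hits |= 1u << i;
        }
    }
    return hits;
}
```

With shift 0 nothing matches; with shift 1 only bit 3 (the letter e) does. Since 2 * i = 7 - shift has at most one solution, no shift yields more than one bit.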

Why does a group consist of 10 bits instead of some other number?
First of all, groups of fewer than eight bits certainly do not work. Why not exactly eight? Grouping by 8 bits obviously fails: every group is identical, so each one lets you select only the same bits. What about 9-bit groups? You will find that five copies are not enough to place every bit correctly. So the 10-bit group is the smallest group that works.
Is a group larger than 10 bits any better? Whether the copies are shifted left by one bit or by several before multiplying, the lowest group can select only one bit, while each remaining group can select at most two bits [1]. Selecting eight bits therefore takes at least five groups (strictly speaking, the fifth and highest group may be incomplete, so at least four groups plus one bit are required).
Since the 10-bit grouping is the smallest and needs only five copies, it is already optimal.

[1] A sketch of the proof of this claim. We have a descending sequence (such as 8 7 6 5 4 3 2 1 ...) and an ascending sequence (such as 1 2 3 4 5 6 7 8 ...), each as long as one group, and the two are matched position by position (8-1, 7-2, ...). If the first coincidence (i ≡ j mod 8) occurs at the pair (i, j), then later pairs (one index decreasing, the other increasing) must also be congruent modulo 8 in order to coincide, namely i-4 in the descending sequence with j+4 in the ascending one, i-8 with j+8, and so on. Note that each group can contribute only its low eight bits to the superposition. Clearly, whatever i and j are, at most two bits can be selected: (i, j) and (i-4, j+4).

In fact, because at least four groups plus one bit are required, a group can be at most 15 bits wide under the 64-bit limit (4 * 15 + 1 = 61 fits, while 4 * 16 + 1 = 65 does not). It is also easy to verify that the 10-bit and 14-bit groupings are the only two feasible ones.
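Part of this argument can be checked mechanically. The following sketch is a hypothetical helper that fixes the layout used in this article (copies laid end to end, starting at bit 1) and reports which bits of the byte land at their reversed offset for a given group size. It does not explore other shifts, so it only illustrates the 8-, 9- and 10-bit cases discussed above.

```c
/* Hypothetical helper: mask of byte bits (bit 0 = h, bit 7 = a) that can be
   selected from `copies` copies starting at bit 1, for the given group size. */
unsigned selectable_bits(unsigned group_bits, unsigned copies) {
    unsigned found = 0;
    for (unsigned k = 0; k < copies; k++) {
        for (unsigned i = 0; i < 8; i++) {
            unsigned pos = 1 + 8 * k + i;     /* bit position in the product */
            if (pos % group_bits == 7 - i) {  /* offset in group == reversed position */
                found |= 1u << i;
            }
        }
    }
    return found;
}
```

Under these assumptions, selectable_bits(10, 5) covers all eight bits, while group sizes 8 and 9 with five copies leave some bits unreachable.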

Why does computing % 1023 superimpose the 10-bit groups?
This rests on the following principle: taking % (2^n - 1) amounts to writing the number in base 2^n and summing its digits (strictly speaking, the result is only congruent to that sum). The digits of a number written in base 2^n are exactly its n-bit groups, so the result of % (2^n - 1) is the sum of the n-bit groups. In particular, % 1023 sums the digits in base 1024 (2^10), which is the superposition of the 10-bit groups.
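The principle is easy to try out. In the sketch below, group_sum is a hypothetical helper that adds up the n-bit groups of a 64-bit value, assuming 1 <= n < 64:

```c
#include <stdint.h>

/* Hypothetical helper: sum of the n-bit groups of x,
   i.e. the digit sum of x written in base 2^n. Assumes 1 <= n < 64. */
uint64_t group_sum(uint64_t x, unsigned n) {
    uint64_t mask = (1ULL << n) - 1;
    uint64_t sum = 0;
    while (x != 0) {
        sum += x & mask;  /* lowest n-bit group */
        x >>= n;          /* move on to the next group */
    }
    return sum;
}
```

For any x, group_sum(x, 10) % 1023 equals x % 1023. In the byte-reversal trick the group sum is below 1023, so the remainder is the sum itself.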
In fact, this principle is not limited to bases of the form 2^n, and a stronger statement holds. For any base x: "for any integer N, the sum of the digits of N written in base x is congruent to N modulo x - 1". As a formula:

Consider the integer polynomial p(x) = a*x^n + b*x^(n-1) + ... + z. Then
p(x) ≡ a + b + ... + z (mod x - 1)

The proof is also straightforward: let y = x - 1 and substitute into the formula above. If you are interested in the details, you can go here; thanks to Simon for helping to write up the formulas.
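The substitution can be spelled out as follows. This is a sketch of the standard argument, written with indexed coefficients a_i in place of the letters a, b, ..., z used above; the indexing and the auxiliary polynomials q_i are notation introduced for this sketch only.

```latex
\begin{aligned}
p(x) &= \sum_{i=0}^{n} a_i x^{i}, \qquad x = y + 1,\\
x^{i} &= (y+1)^{i} = 1 + y\,q_i(y) \quad \text{for some integer polynomial } q_i,\\
p(x) &= \sum_{i=0}^{n} a_i + y \sum_{i=0}^{n} a_i\,q_i(y)
      \;\equiv\; \sum_{i=0}^{n} a_i \pmod{x-1}.
\end{aligned}
```

The middle line is the binomial expansion of (y + 1)^i: every term except the constant 1 contains a factor y.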

In particular, if x = 10 and a, b, ..., z are integers in the range [0, 9], then p(x) is simply a decimal number written out digit by digit, so the following quick calculation tricks follow easily:

A. N mod 9 ≡ (digit sum of N) mod 9 ≡ (digit sum of the digit sum of N) mod 9 ... and so on
B. It follows immediately from A that "N is divisible by 9" is equivalent to "the digit sum of N is divisible by 9"
C. In particular, because 10 = 3^2 + 1, the two tricks above also hold for 3. For example, the digit sum of a multiple of 3 is also a multiple of 3.
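Tricks A to C are easy to check on a few numbers. A minimal sketch, with digit_sum as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: sum of the decimal digits of n. */
unsigned digit_sum(uint64_t n) {
    unsigned sum = 0;
    while (n != 0) {
        sum += (unsigned)(n % 10);  /* lowest decimal digit */
        n /= 10;
    }
    return sum;
}
```

For instance, the digits of 2011 sum to 4, and 2011 leaves remainder 4 when divided by 9 and remainder 1 (the same as 4) when divided by 3.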

I believe everyone learned these in elementary school. Do they feel more familiar now? :)

That completes the answers. So once again, we strongly recommend the godlike "byte reversal" solution seen above, which needs only three operations. If it has any disadvantages, they are that it uses division (the remainder operation) and requires a 64-bit environment.

What if division is not allowed? What if only 32 bits are available?
Of course, there are other ingenious solutions that meet these conditions. In fact, several of the algorithms in this article come from the page below (in English), which collects many more bit-manipulation tricks. If you are interested, you can visit it yourself.

http://graphics.stanford.edu/~seander/bithacks.html
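As an example of a variant with no division and no 64-bit arithmetic, that page lists a seven-operation byte reversal. The sketch below reproduces it from memory, so treat the constants as something to verify against the page; the wrapper name reverse_byte_32bit is hypothetical, and uint32_t is assumed to behave as a plain 32-bit unsigned integer.

```c
#include <stdint.h>

/* Hypothetical wrapper around the 7-operation, 32-bit-only variant. */
uint8_t reverse_byte_32bit(uint8_t b) {
    uint32_t x = b;
    /* Two multiply-and-mask steps spread the eight bits out; the final
       multiply gathers them, reversed, into bits 16..23. */
    uint32_t spread = ((x * 0x0802u) & 0x22110u) | ((x * 0x8020u) & 0x88440u);
    return (uint8_t)((spread * 0x10101u) >> 16);
}
```

The last multiplication may overflow 32 bits, but only bits 16 to 23 of the product are kept, and those are unaffected by the wraparound.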

 

 

N = (N & 0x55555555) << 1 | (N & 0xaaaaaaaa) >> 1;
N = (N & 0x33333333) << 2 | (N & 0xcccccccc) >> 2;
N = (N & 0x0f0f0f0f) << 4 | (N & 0xf0f0f0f0) >> 4;
N = (N & 0x00ff00ff) << 8 | (N & 0xff00ff00) >> 8;
N = (N & 0x0000ffff) << 16 | (N & 0xffff0000) >> 16;
