In-depth discussion of using bitmasks instead of branches (1): generating a mask with a signed shift

Source: Internet
Author: User
Tags: bitmask

A few years ago I wrote an article, "Optimizing branch code: keeping jump instructions from stalling the pipeline" (http://blog.csdn.net/zyl910/article/details/1330614 ). At the time it was only a rough set of notes. I have gained new experience over the past few years, so I decided to discuss the topic in depth and answer the comments on that article.

Housisong (http://blog.csdn.net/housisong) mentioned using a signed shift to generate a mask --
(Assume n is a 32-bit signed number): (n >> 31). When n >= 0 the result is 0x00000000; when n < 0 the result is the mask 0xFFFFFFFF. The mask is then used to merge the branches.
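
The signed-shift idea can be sketched in C as follows. This is only an illustration: the function names are made up here, and the code assumes that >> on a negative signed value is an arithmetic shift, which the C standard leaves implementation-defined but which mainstream compilers provide.

```c
#include <stdint.h>

/* Returns -1 (0xFFFFFFFF) when n < 0 and 0 when n >= 0.
 * Assumption: >> on a negative signed value is an arithmetic shift. */
static int32_t sign_mask32(int32_t n)
{
    return n >> 31;
}

/* Branch-free merge: returns a when n < 0, otherwise b. */
static int32_t select_if_negative(int32_t n, int32_t a, int32_t b)
{
    int32_t mask = sign_mask32(n);
    return (a & mask) | (b & ~mask);
}
```

For example, select_if_negative(-5, 10, 20) picks 10 and select_if_negative(5, 10, 20) picks 20, with no conditional jump in the source.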

This is a good idea: it avoids accessing the status register.
But this solution also has limitations --
1. Some programming languages (such as VB6) do not have a signed shift operator.
2. It can only test against 0. In many cases, however, we need a mask from a comparison with a specific integer.

The lack of a signed shift operator is not really a problem. Most mainstream programming languages now support signed shifts; for example, Microsoft's VB.NET supports them.
The simplest way to compute the mask for a comparison with a specific integer is the method used in the previous article: generate the mask from the fact that a C comparison evaluates to 0 or 1. But we now know that this method accesses the status register, which hurts efficiency. Is there a way that does not rely on the status register?
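
For reference, the comparison-based method mentioned above can be sketched like this. The helper name mask_ge_cmp is hypothetical; the technique is simply negating the 0-or-1 result of a C comparison, as the test code later in this article does with -(n >= 0).

```c
#include <stdint.h>

/* All ones (-1) when n >= x, all zeros otherwise.
 * The comparison yields 1 or 0; negation turns 1 into -1. */
static int16_t mask_ge_cmp(int16_t n, int16_t x)
{
    return (int16_t)-(n >= x);
}
```

This version has no overflow problem for any pair of 16-bit values, but it depends on the comparison result, which is the dependency the rest of the article tries to remove.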

One idea is to use subtraction to turn "compare with a specific integer" into "compare with 0". However, the subtraction may overflow, producing a wrong result or throwing an exception (when integer overflow checking is enabled). Handling the overflow adds complexity and costs efficiency.
This problem bothered me for a long time. Later it occurred to me that in some cases the overflow does not need to be handled at all.

Because the most commonly used signed type in image processing is the 16-bit integer, I use a 16-bit signed integer (signed short) here, so the mask is generated by shifting right by 15 bits.

I. Computing the mask

1.1 Comparison with 0

Let's warm up by reviewing the mask algorithm for comparison with 0 --
mask = n >> 15  // all 1s when "< 0", all 0s when ">= 0"

Adding a bitwise NOT gives the opposite mask --
mask = ~(n >> 15)  // all 1s when ">= 0", all 0s when "< 0"

If n is negated first, we get this mask --
mask = (-n) >> 15  // all 1s when "> 0", all 0s when "<= 0"
Note: this can overflow. Integers are generally represented in two's complement, so a 16-bit signed number has the range [-32768, 32767] and cannot represent positive 32768. If the integer overflow exception is ignored, negating -32768 yields -32768 again. After the signed shift it becomes all 1s, which contradicts the meaning of "> 0".

Adding a bitwise NOT again gives this mask --
mask = ~((-n) >> 15)  // all 1s when "<= 0", all 0s when "> 0"
Note: overflow occurs when n is -32768.
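
The four masks above can be written out as a small C sketch. The helper names are illustrative only. The code assumes an arithmetic right shift and that converting an out-of-range int to int16_t wraps around (both implementation-defined in C, both true on mainstream compilers), which is what reproduces the -32768 corner case.

```c
#include <stdint.h>

/* 16-bit arithmetic shift right by 15: -1 for negative v, 0 otherwise. */
static int16_t sar15(int16_t v) { return (int16_t)(v >> 15); }

/* 16-bit negation; -32768 wraps back to -32768 (assumed wrapping). */
static int16_t neg16(int16_t n) { return (int16_t)(-n); }

static int16_t mask_lt0(int16_t n) { return sar15(n); }                   /* n <  0 */
static int16_t mask_ge0(int16_t n) { return (int16_t)~sar15(n); }         /* n >= 0 */
static int16_t mask_gt0(int16_t n) { return sar15(neg16(n)); }            /* n >  0 */
static int16_t mask_le0(int16_t n) { return (int16_t)~sar15(neg16(n)); }  /* n <= 0 */
```

Under those assumptions mask_gt0(-32768) comes out as all 1s even though -32768 is not greater than 0, which is exactly the overflow case noted above.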

1.2 Comparison with x

Although the formulas above compare with 0, this is hard to see because the 0 is not written out. So let's write the 0 explicitly --
mask = (n - 0) >> 15  // all 1s when "< 0", all 0s when ">= 0"
mask = ~((n - 0) >> 15)  // all 1s when ">= 0", all 0s when "< 0"
mask = (0 - n) >> 15  // all 1s when "> 0", all 0s when "<= 0"
mask = ~((0 - n) >> 15)  // all 1s when "<= 0", all 0s when "> 0"

In the formulas above, we can replace 0 with any integer x --
mask = (n - x) >> 15  // all 1s when "< x", all 0s when ">= x"
mask = ~((n - x) >> 15)  // all 1s when ">= x", all 0s when "< x"
mask = (x - n) >> 15  // all 1s when "> x", all 0s when "<= x"
mask = ~((x - n) >> 15)  // all 1s when "<= x", all 0s when "> x"

Because x is an arbitrary integer, the subtraction may overflow.
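
A short C sketch of two of these formulas, including a case where the subtraction overflows. The function names are hypothetical. Because C promotes short operands to int, the difference is explicitly truncated back to 16 bits to model 16-bit arithmetic; that truncation is assumed to wrap, as it does on mainstream compilers.

```c
#include <stdint.h>

/* (n - x) >> 15 in 16-bit arithmetic: all ones when n < x,
 * provided n - x fits in [-32768, 32767]. */
static int16_t mask_lt(int16_t n, int16_t x)
{
    int16_t diff = (int16_t)(n - x);  /* overflow is lost in this truncation */
    return (int16_t)(diff >> 15);
}

/* (x - n) >> 15 in 16-bit arithmetic: all ones when n > x,
 * provided x - n fits in [-32768, 32767]. */
static int16_t mask_gt(int16_t n, int16_t x)
{
    int16_t diff = (int16_t)(x - n);
    return (int16_t)(diff >> 15);
}
```

With small operands the masks behave as listed, for example mask_lt(3, 5) is all 1s. With n = -32768 and x = 1 the true difference -32769 does not fit in 16 bits, so under the wrapping assumption mask_lt(-32768, 1) returns 0 even though -32768 < 1.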

II. Saturation processing

When writing an image processing program, a computed RGB value often falls outside the range [0, 255]. It then has to be saturated, that is, an out-of-range value is clamped to the nearest boundary, with code like this:
if (r < 0) r = 0;
if (r > 255) r = 255;

Now we will use bitwise operations to optimize such code.

2.1 Handling "< 0"

We can use an AND operation to force the value to 0. For this we need the mask that is all 1s when ">= 0" and all 0s when "< 0". The statements are --
mask = ~(n >> 15)  // all 1s when ">= 0", all 0s when "< 0"
m = n & mask

Combined into one expression --
m = n & ~(n >> 15)

Because this formula uses no subtraction, it cannot overflow for any value of n.
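
As a minimal C sketch of this step (the name clamp_low0 is made up for illustration, and an arithmetic right shift is assumed):

```c
#include <stdint.h>

/* Branch-free version of: if (n < 0) n = 0;
 * n >> 15 is -1 for negative n, so ~(n >> 15) clears every bit. */
static int16_t clamp_low0(int16_t n)
{
    return (int16_t)(n & ~(n >> 15));
}
```

Negative inputs, including -32768, become 0, while non-negative inputs such as 300 pass through unchanged.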

2.2 Handling "> 255"

We can use an OR operation to turn an out-of-range number into all 1s, then AND it with 0xFF so that the out-of-range value becomes 255. This relies on 255 (0xFF) being exactly the low 8 bits all set to 1.

How do we decide that a value is out of range? There are three policies --
A. "> 255". The standard form.
B. ">= 256". Equivalent because integers are discrete.
C. ">= 255". Saturating 255 still gives 255.

To avoid the status register, we cannot use comparison statements and can only use the bitmask algorithms above. The three policies are listed in order to find the most efficient one.
Looking back at "1.2 Comparison with x", the formulas for testing "> x" and ">= x" are --
mask = ~((n - x) >> 15)  // all 1s when ">= x", all 0s when "< x"
mask = (x - n) >> 15  // all 1s when "> x", all 0s when "<= x"

Comparing them, "> x" needs the fewest operations, so we choose policy A. The statements are --
mask = (255 - n) >> 15
m = (n | mask) & 0xFF

Combined into one expression --
m = (n | ((255 - n) >> 15)) & 0xFF

Note that this formula is valid only when n is greater than or equal to 0.
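
A minimal C sketch of this step, under the same assumptions as before (arithmetic right shift; the name clamp_high255 is illustrative only):

```c
#include <stdint.h>

/* Branch-free version of: if (n > 255) n = 255;
 * Only meaningful for n >= 0, as noted in the text. */
static int16_t clamp_high255(int16_t n)
{
    /* 255 - n is negative exactly when n > 255, so the mask is all ones then. */
    int16_t mask = (int16_t)((int16_t)(255 - n) >> 15);
    return (int16_t)((n | mask) & 0xFF);
}
```

For n in [0, 32767] the difference 255 - n always fits in 16 bits, so no overflow occurs there. A negative input is not handled: clamp_high255(-1) yields 255, because the final AND keeps only the low 8 bits.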

2.3 Saturation processing

Now consider the actual saturation processing, that is, "< 0" is corrected to 0 and "> 255" is corrected to 255.

First, collect the results above --
m = n & ~(n >> 15)  // handles "< 0". Valid for any n.
m = (n | ((255 - n) >> 15)) & 0xFF  // handles "> 255". Valid only when n is greater than or equal to 0.

Because the "> 255" step is invalid when n is less than 0, while the "< 0" step is valid for any n, we can apply "> 255" first and then "< 0", which masks out the erroneous intermediate value. The statement is --
m = ((n | ((255 - n) >> 15)) & ~(n >> 15)) & 0xFF

Analysis --
When n < 0: although the value of "(n | ((255 - n) >> 15))" is invalid, "~(n >> 15)" is 0, so after "& ~(n >> 15)" the result is 0.
When n >= 0 and n <= 255: "((255 - n) >> 15)" is 0, so "| ((255 - n) >> 15)" keeps the original value. "~(n >> 15)" is all 1s, so "& ~(n >> 15)" also keeps the original value.
When n > 255: "((255 - n) >> 15)" is all 1s and "~(n >> 15)" is also all 1s, so after "& 0xFF" the result is 255.

Because we usually store the result in a byte variable with an explicit cast, the "& 0xFF" is unnecessary --
m = (byte)((n | ((255 - n) >> 15)) & ~(n >> 15))
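
Put together as a C function, the combined expression might look like the sketch below. The name saturate_u8 is hypothetical, uint8_t stands in for the byte type, and the code assumes an arithmetic right shift and wrapping truncation to 16 bits.

```c
#include <stdint.h>

/* Branch-free saturation of a 16-bit signed value to [0, 255]. */
static uint8_t saturate_u8(int16_t n)
{
    /* All ones when n > 255 (may be wrong for n < 0, masked out below). */
    int16_t high = (int16_t)((int16_t)(255 - n) >> 15);
    /* All ones when n >= 0, all zeros when n < 0. */
    int16_t low = (int16_t)~(n >> 15);
    /* The cast to uint8_t plays the role of "& 0xFF". */
    return (uint8_t)((n | high) & low);
}
```

Tracing a few values: -32768 and -1 give 0, 100 stays 100, and 300 and 32767 give 255, matching the three cases in the analysis above.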

III. Practical application

In practice, the code above is rather long and hard to maintain. It can be wrapped in macros and generalized along the way --
#define limitsw_fast(n, bits) ( ( (n) | ((signed short)((1<<(bits)) - 1 - (n)) >> 15) ) & ~((signed short)(n) >> 15) )
#define limitsw_safe(n, bits) ( limitsw_fast(n, bits) & ((1<<(bits)) - 1) )

bits is the number of bits; for a byte it is 8 --
#define limitsw_byte(n) ( (byte)(limitsw_fast(n, 8)) )
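
To illustrate the difference between the "fast" and "safe" forms, here is a hedged sketch that restates the macros under different, made-up names (SAT_FAST, SAT_SAFE) and applies them to a 5-bit range, such as one channel of a 15-bit RGB pixel. The 5-bit use case is an assumption for illustration, not something taken from the article.

```c
#include <stdint.h>

/* Illustrative restatement of the saturation macros; names are hypothetical.
 * Note that n is evaluated more than once, so avoid side effects in it. */
#define SAT_FAST(n, bits) \
    (((n) | ((int16_t)((1 << (bits)) - 1 - (n)) >> 15)) & ~((int16_t)(n) >> 15))
#define SAT_SAFE(n, bits) (SAT_FAST(n, bits) & ((1 << (bits)) - 1))

/* Saturate to [0, 31]; the result stays an int, so the final mask is needed. */
static int sat5(int16_t n) { return SAT_SAFE(n, 5); }

/* Saturate to [0, 255]; the cast to uint8_t truncates, so no mask is needed. */
static uint8_t sat8(int16_t n) { return (uint8_t)SAT_FAST(n, 8); }
```

Without the final mask, SAT_FAST(40, 5) evaluates to -1 (all 1s) rather than 31, which is why the "safe" form exists for cases where the result is not immediately truncated to the target width.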

IV. Test code

The test code is as follows --

#include <stdio.h>

typedef unsigned char byte;  // 8-bit unsigned type used for the results

// Saturation using a bitmask. The mask is generated by negating a comparison result.
#define limitsu_fast(n, bits) ( (n) & -((n) >= 0) | -((n) >= (1<<(bits))) )
#define limitsu_safe(n, bits) ( limitsu_fast(n, bits) & ((1<<(bits)) - 1) )
#define limitsu_byte(n) ( (byte)(limitsu_fast(n, 8)) )

// Saturation using a bitmask. The mask is generated by a signed right shift.
//#define limitsw_fast(n, bits) ( ( (n) & ~((signed short)(n) >> 15) ) | ((signed short)((1<<(bits)) - 1 - (n)) >> 15) )
#define limitsw_fast(n, bits) ( ( (n) | ((signed short)((1<<(bits)) - 1 - (n)) >> 15) ) & ~((signed short)(n) >> 15) )
#define limitsw_safe(n, bits) ( limitsw_fast(n, bits) & ((1<<(bits)) - 1) )
#define limitsw_byte(n) ( (byte)(limitsw_fast(n, 8)) )

signed short buf[0x10000];  // keep the values in an array so the compiler cannot optimize too aggressively

int main(int argc, char *argv[])
{
    int i;           // loop variable (32-bit)
    signed short n;  // current value
    signed short m;  // temporary variable
    byte by0;        // saturation using if branches
    byte by1;        // saturation using a bitmask; mask generated by negation
    byte by2;        // saturation using a bitmask; mask generated by signed right shift

    //printf("Hello world!\n");
    printf("= noifcheck =\n");

    // Initialize buf
    for (i = 0; i < 0x10000; ++i)
    {
        buf[i] = (signed short)(i - 0x8000);
        //printf("%d\n", buf[i]);
    }

    // Check the handling of "< 0"
    printf("[test: less0]\n");
    for (i = 0; i < 0x8100; ++i)  // [-32768, 255]
    //for (i = 0x7ffe; i <= 0x8002; ++i)  // [-2, 2]
    {
        // Load the value
        n = buf[i];

        // Saturation using if branches
        m = n;
        if (m < 0) m = 0;
        by0 = (byte)m;

        // Saturation using a bitmask; mask generated by negation
        by1 = (byte)(n & -(n >= 0));
        if (by1 != by0) printf("[Error] 1.1 neg: [%d] %d != %d\n", n, by0, by1);  // verify

        // Saturation using a bitmask; mask generated by signed right shift
        by2 = (byte)(n & ~((signed short)n >> 15));
        if (by2 != by0) printf("[Error] 1.2 sar: [%d] %d != %d\n", n, by0, by2);  // verify
    }

    // Check the handling of "> 255"
    printf("[test: great255]\n");
    for (i = 0x8000; i < 0x10000; ++i)  // [0, 32767]
    //for (i = 0x80fe; i <= 0x8102; ++i)  // [254, 258]
    {
        // Load the value
        n = buf[i];

        // Saturation using if branches
        m = n;
        if (m > 255) m = 255;
        by0 = (byte)m;

        // Saturation using a bitmask; mask generated by negation
        by1 = (byte)(n | -(n >= 256));
        if (by1 != by0) printf("[Error] 2.1 neg: [%d] %d != %d\n", n, by0, by1);  // verify

        // Saturation using a bitmask; mask generated by signed right shift
        by2 = (byte)(n | ((signed short)(255 - n) >> 15));
        if (by2 != by0) printf("[Error] 2.2 sar: [%d] %d != %d\n", n, by0, by2);  // verify
    }

    // Check the full saturation processing
    printf("[test: saturation]\n");
    for (i = 0; i < 0x10000; ++i)  // [-32768, 32767]
    //for (i = 0x7ffe; i <= 0x8102; ++i)  // [-2, 258]
    {
        // Load the value
        n = buf[i];

        // Saturation using if branches
        m = n;
        if (m < 0) m = 0;
        else if (m > 255) m = 255;
        by0 = (byte)m;

        // Saturation using a bitmask; mask generated by negation
        by1 = limitsu_byte(n);
        if (by1 != by0) printf("[Error] 3.1 neg: [%d] %d != %d\n", n, by0, by1);  // verify

        // Saturation using a bitmask; mask generated by signed right shift
        by2 = limitsw_byte(n);
        if (by2 != by0) printf("[Error] 3.2 sar: [%d] %d != %d\n", n, by0, by2);  // verify
    }

    return 0;
}

 

 

Test result --

All passed!

 

Download source code --
http://files.cnblogs.com/zyl910/noifCheck.rar

 
