Optimizing branch code -- avoiding jump instruction stalls

File: noifop.txt
Name: Optimizing branch code -- avoiding jump instruction stalls
Author: zyl910
Blog: http://blog.csdn.net/zyl910/
Version: v2.00
Updated: 2006-10-11

(Note: remember to change the file extension after downloading.)

I. Origin: saturation processing

When writing image-processing programs, computed RGB values often fall outside the range [0,255]. Saturation processing is then needed to clamp each out-of-range value to the nearest boundary, with code like this:
if (r < 0) r = 0;
if (r > 255) r = 255;
if (g < 0) g = 0;
if (g > 255) g = 255;
if (b < 0) b = 0;
if (b > 255) b = 255;

However, such code executes inefficiently. Each if statement is compiled into a conditional jump instruction, and jumps seriously hurt the pipeline efficiency of modern superpipelined CPUs.
CPU manufacturers have proposed two solutions. One is to add branch prediction hardware to minimize the impact of jump instructions on the pipeline. The other is to design SIMD instruction sets such as MMX and SSE, which provide saturating arithmetic instructions natively and also process several values in parallel.
Branch prediction works well for the jumps generated by loop statements, because the jump is taken on every iteration and the prediction fails only when the loop finally exits. It works poorly for saturation processing: the RGB values computed in each iteration differ (they belong to different pixels), so the probability of a misprediction is high. Branch prediction therefore contributes little to saturation processing.
SIMD instructions are an excellent solution and are strongly recommended when conditions permit. However, high-level languages cannot express SIMD instructions directly, so the SIMD code has to be written by hand, which makes the code harder to understand. Furthermore, image-processing programs sometimes have to run on virtual machine platforms such as Java and .NET, where the SIMD instruction set cannot be used directly.
We therefore need an algorithm that performs saturation processing with the general-purpose instruction set (so that it can be expressed in a high-level language) and avoids the jumps produced by if statements.

First, consider saturation for values less than zero.

Recall what the bitwise AND operation does: when an integer is ANDed with a mask, an all-ones mask preserves the original value and an all-zeros mask yields zero.
Next, consider how to generate the required mask. In C, a comparison evaluates to 0 or 1. How can these be turned into a mask of all zeros or all ones? The answer is simple: negate the result, or subtract one from it. Negation gives the simpler expression, so it is used here:
n &= -(n >= 0);

First, compare n with 0. When n >= 0, the comparison result is 1. When n < 0, the comparison result is 0.
Then negate the result. When n >= 0, this gives -1 (all ones). When n < 0, this gives 0 (all zeros).
Finally, AND the original number with the mask obtained above.
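
The three steps above can be sketched as a small C function. The name clamp_low_zero is hypothetical (it does not appear in the original article), and the sketch assumes a two's-complement int, so that -1 has every bit set.

```c
/* Hypothetical helper: returns 0 for negative n and n unchanged otherwise.
   Assumes two's-complement int, so -1 is a mask with every bit set. */
static int clamp_low_zero(int n)
{
    int is_nonneg = (n >= 0);  /* step 1: comparison yields 1 or 0 */
    int mask = -is_nonneg;     /* step 2: 1 -> -1 (all ones), 0 -> 0 */
    return n & mask;           /* step 3: keep n, or clear it to 0 */
}
```

For example, clamp_low_zero(-5) yields 0, while clamp_low_zero(200) yields 200.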

Building on this, it is easy to derive the code that handles values greater than 255:
n = (n | -(n >= 256)) & 0xff;

First, compare n with 256. When n < 256, the comparison result is 0. When n >= 256, the comparison result is 1.
Then negate the result. When n >= 256, this gives -1 (all ones). When n < 256, this gives 0.
Then OR the original number with the mask obtained above.
Finally, AND the result with 0xff, which turns the all-ones value into 0xff (decimal 255). When n < 256, the value is assumed to be already within [0,255], so it remains unchanged.
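
As a rough sketch, the upper-bound steps could be written as the following C function. The name clamp_high_255 is hypothetical, negative inputs are deliberately not handled here, and a two's-complement int is assumed.

```c
/* Hypothetical helper: returns 255 for n >= 256; values in [0,255] pass
   through unchanged. Negative inputs are not handled by this step alone.
   Assumes two's-complement int. */
static int clamp_high_255(int n)
{
    int mask = -(n >= 256);    /* -1 (all ones) if too large, else 0 */
    return (n | mask) & 0xff;  /* all ones & 0xff gives 255 */
}
```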

Can this be optimized further? A number cannot be both less than 0 and greater than 255, so the two lines can be combined into one. The result is usually stored in a byte variable through a type cast, which makes the "& 0xff" unnecessary. Finally, writing such long code every time is tedious, so it is wrapped in macros:
#define limitsu_fast(n, bits) (((n) & -((n) >= 0)) | -((n) >= (1 << (bits))))
#define limitsu_safe(n, bits) ((limitsu_fast(n, bits)) & ((1 << (bits)) - 1))

bits is the number of bits, for example 8 for a byte. If that is too many parameters, define a further macro or an inline function:
#define limitsu_byte(n) ((byte)(limitsu_fast(n, 8)))
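
A minimal usage sketch follows. It assumes that byte is an unsigned 8-bit type (declared here with a typedef) and uses a hypothetical function, add_brightness, that is not part of the original article. Because the macros evaluate their argument more than once, the value is first computed into a local variable.

```c
typedef unsigned char byte;  /* assumption: byte is an unsigned 8-bit type */

#define limitsu_fast(n, bits) (((n) & -((n) >= 0)) | -((n) >= (1 << (bits))))
#define limitsu_byte(n) ((byte)(limitsu_fast(n, 8)))

/* Hypothetical example: add a brightness offset to every byte in a buffer,
   saturating each result to [0,255] without an if statement. */
static void add_brightness(byte *pixels, int count, int delta)
{
    for (int i = 0; i < count; i++) {
        int v = pixels[i] + delta;    /* may leave the range [0,255] */
        pixels[i] = limitsu_byte(v);  /* the cast to byte maps -1 to 255 */
    }
}
```

With the buffer {0, 100, 200, 255} and delta = 100, the result is {100, 200, 255, 255}.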

Consider the computational cost:
The if method requires two comparisons and two jumps.
This method requires two comparisons (plus two conversions of the comparison results to numeric values), two negations, one AND operation, and one OR operation.

So this method removes two jumps at the cost of six additional operations. Fortunately, these are simple instructions that can each complete within one clock cycle, so on a modern superpipelined CPU the six operations cost less than the jumps of the if version.

The above applies when comparisons yield 0 or 1, as in C. What if they yield 0 or -1, as in BASIC? That case is even simpler, because 0 and -1 are exactly the masks needed, and BASIC also supports the And, Or, and Xor operators. The code becomes:
by = (n And (n >= 0) Or (n >= 256)) And &HFF

Test: omitted. See v100.rar in the old directory.

 

II. Generalization

This technique is not limited to saturation processing; it can be applied in other situations as well.

2.1 Condition mask

#define ifmasknum(n, c) ((n) & -(c))

Parameters:
n: the value to be masked
c: the condition, 0 or 1

Return value: n if c is 1, or 0 if c is 0.
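
One possible use of the condition mask, sketched under the assumption of a two's-complement int, is a conditional sum without a branch in the loop body. The function sum_above is hypothetical and only illustrates the macro.

```c
#define ifmasknum(n, c) ((n) & -(c))

/* Hypothetical example: sum only the elements greater than a threshold. */
static int sum_above(const int *values, int count, int threshold)
{
    int sum = 0;
    for (int i = 0; i < count; i++) {
        /* adds values[i] when the condition is 1, adds 0 when it is 0 */
        sum += ifmasknum(values[i], values[i] > threshold);
    }
    return sum;
}
```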

 

2.2 Min and max

#define fastmin(a, b) ((a) + (((b) - (a)) & -((b) < (a))))
#define fastmax(a, b) ((a) + (((b) - (a)) & -((b) > (a))))

Explanation:
Comparing b with a is actually carried out as a subtraction, and "b - a" already appears earlier in the expression, so the compiler can optimize the two into one.
Note that "b - a" may overflow, so this method is only usable in languages such as C that do not throw an exception on integer overflow.
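
The following sketch applies the two macros to find the smallest and largest element of an array. The function min_max is hypothetical. It assumes count >= 1 and that b - a never overflows int for the values involved, since signed overflow is undefined behavior in C.

```c
#define fastmin(a, b) ((a) + (((b) - (a)) & -((b) < (a))))
#define fastmax(a, b) ((a) + (((b) - (a)) & -((b) > (a))))

/* Hypothetical example: scan an array once and report its minimum and
   maximum. Assumes count >= 1 and that differences between elements
   fit in an int. */
static void min_max(const int *values, int count, int *lo, int *hi)
{
    int mn = values[0];
    int mx = values[0];
    for (int i = 1; i < count; i++) {
        mn = fastmin(mn, values[i]);  /* mn + (diff if values[i] < mn) */
        mx = fastmax(mx, values[i]);  /* mx + (diff if values[i] > mx) */
    }
    *lo = mn;
    *hi = mx;
}
```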

 

2.3 Case conversion

#define charucase(c) ((c) ^ (0x20 & -(((c) >= 'a') & ((c) <= 'z'))))
#define charlcase(c) ((c) ^ (0x20 & -(((c) >= 'A') & ((c) <= 'Z'))))

Explanation:
The ASCII code of 'A' is 0x41.
The ASCII code of 'a' is 0x61.
They differ by 0x20, that is, bit 5 (D5).
So when the condition is met, that bit can simply be flipped (note that "^" is the exclusive-or operator).
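
As an illustration, the macros can be used to convert an ASCII string in place. The helper str_upper is hypothetical, and the sketch assumes an ASCII character set. Because the macros evaluate their argument several times, the argument has no side effects.

```c
#define charucase(c) ((c) ^ (0x20 & -(((c) >= 'a') & ((c) <= 'z'))))
#define charlcase(c) ((c) ^ (0x20 & -(((c) >= 'A') & ((c) <= 'Z'))))

/* Hypothetical helper: upper-case an ASCII string in place.
   Characters outside 'a'..'z' are left unchanged. */
static void str_upper(char *s)
{
    for (; *s != '\0'; s++) {
        *s = (char)charucase(*s);  /* flips bit 5 only for 'a'..'z' */
    }
}
```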

 

2.5 Converting to a hexadecimal character

#define tohexchar(i) ('0' + (i) + (('A' - ('0' + 10)) & -((i) >= 10)))

Explanation:
The subexpression "('A' - ('0' + 10))" is folded into a constant at compile time.
The inverse conversion, however, can only be done with a lookup table.
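
A short sketch of how the macro might be used to format one byte as two hexadecimal digits. The helper byte_to_hex is hypothetical, and the sketch assumes an ASCII character set, where 'A' - ('0' + 10) equals 7.

```c
#define tohexchar(i) ('0' + (i) + (('A' - ('0' + 10)) & -((i) >= 10)))

/* Hypothetical helper: write the two upper-case hex digits of a byte,
   followed by a terminating null character. */
static void byte_to_hex(unsigned char value, char out[3])
{
    out[0] = (char)tohexchar(value >> 4);    /* high nibble, 0..15 */
    out[1] = (char)tohexchar(value & 0x0f);  /* low nibble, 0..15 */
    out[2] = '\0';
}
```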
 
