Reposted from: bytes
On desktop PCs, MMX instructions can be used to accelerate alpha blending, but MMX instructions are not widely available on ARM CPUs.
I used a small trick to optimize alpha blending. Alpha blending itself is very simple: split the source color C1 and the destination color C2 into their components R1, G1, B1 and R2, G2, B2, compute each channel with the formula (clr1 * alpha + clr2 * (32 - alpha)) / 32, and finally recombine the results into one color value. Done this way, however, the computation is very slow, so a trick is needed. A 16-bit color is normally stored in 565 format, that is, 5, 6, and 5 bits for the R, G, and B components (RRRRRGGGGGGBBBBB). We can use a 32-bit variable and move the green component into the upper 16 bits, giving the layout 00000GGGGGG00000RRRRR000000BBBBB. This leaves spare bits above each color component to absorb carries, so the color value no longer has to be split into separate channels. The two transformed color values are then blended with the formula above, the result is converted back to 565 format, and one alpha blend is complete. The C source code is as follows:
    #include <windows.h>  // WORD (16-bit unsigned) and DWORD (32-bit unsigned)

    __inline void makealpha(WORD *wpsrc, WORD *wpdes, WORD walpha)
    {
        register DWORD d1;              // Intermediate value; declared register for speed.
        register WORD wa = *wpsrc;      // Source color
        register WORD wb = *wpdes;      // Destination color
        register DWORD alpha = walpha;  // Alpha value. 16-bit color uses 32 levels,
                                        // so alpha ranges from 0 to 32.
        // (c1 - c2) * alpha / 32 + c2 is a rearrangement of
        // (c1 * alpha + c2 * (32 - alpha)) / 32 that saves one multiplication.
        // The formula below is deliberately written as a single expression; the
        // compiler handles it well, and it is faster than the step-by-step form:
        //   c1 = (((DWORD)wa << 16) | wa) & 0x7e0f81f;
        //   (16 bits spread into 32 bits; 0x7e0f81f is binary
        //    00000111111000001111100000011111)
        //   c2 = (((DWORD)wb << 16) | wb) & 0x7e0f81f;
        //   d1 = ((((c1 - c2) * alpha) >> 5) + c2) & 0x7e0f81f;
        // Dividing by 32 is a right shift by 5 bits; shifts are much faster than
        // multiplication and division. For example, a * 320 can be written as
        // a * 256 + a * 64 => (a << 8) + (a << 6).
        d1 = ((((((((DWORD)wa << 16) | wa) & 0x7e0f81f)
                 - ((((DWORD)wb << 16) | wb) & 0x7e0f81f)) * alpha) >> 5)
              + ((((DWORD)wb << 16) | wb) & 0x7e0f81f)) & 0x7e0f81f;
        wa = (WORD)((d1 & 0xffff0000) >> 16);  // g...r...b => ...g..
        wb = (WORD)(d1 & 0xffff);              // g...r...b => r...b
        *wpdes = wa | wb;                      // rgb
    }
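To make the bit layout easier to follow, here is a minimal portable sketch of the same idea next to a plain per-channel blend. It is not the author's code: it swaps WORD/DWORD for the fixed-width types from <stdint.h>, and the names expand565, pack565, blend565_packed, and blend565_naive are hypothetical, chosen only for this illustration. It assumes alpha is in the range 0 to 32 and relies on unsigned 32-bit wraparound when the destination is brighter than the source.

```c
#include <stdint.h>

/* Spread an RGB565 pixel into 00000GGGGGG00000RRRRR000000BBBBB. */
static uint32_t expand565(uint16_t c)
{
    uint32_t v = c;
    return ((v << 16) | v) & 0x07E0F81Fu;
}

/* Fold the spread form back into RGB565 (green returns to bits 5..10). */
static uint16_t pack565(uint32_t v)
{
    v &= 0x07E0F81Fu;
    return (uint16_t)((v >> 16) | (v & 0xFFFFu));
}

/* Packed blend: one multiplication covers all three channels.
 * (c1 - c2) may wrap around modulo 2^32; the stray high bit this
 * leaves behind is removed by the mask inside pack565. */
static uint16_t blend565_packed(uint16_t src, uint16_t dst, uint32_t alpha)
{
    uint32_t c1 = expand565(src);
    uint32_t c2 = expand565(dst);
    return pack565((((c1 - c2) * alpha) >> 5) + c2);
}

/* Reference blend: split into channels, blend each one, recombine. */
static uint16_t blend565_naive(uint16_t src, uint16_t dst, uint32_t alpha)
{
    uint32_t r1 = (src >> 11) & 0x1Fu, g1 = (src >> 5) & 0x3Fu, b1 = src & 0x1Fu;
    uint32_t r2 = (dst >> 11) & 0x1Fu, g2 = (dst >> 5) & 0x3Fu, b2 = dst & 0x1Fu;
    uint32_t r = (r1 * alpha + r2 * (32u - alpha)) >> 5;
    uint32_t g = (g1 * alpha + g2 * (32u - alpha)) >> 5;
    uint32_t b = (b1 * alpha + b2 * (32u - alpha)) >> 5;
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

For example, blend565_packed(0xFFFF, 0x0000, 16) blends white over black at half strength and gives 0x7BEF, the same value blend565_naive returns. The packed version does one multiplication instead of six.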
When multiplication is written directly in C, the compiler can only optimize it partially: the generated assembly consists of a series of shift-and-add operations, and array element addressing is sometimes handled in a similar way. So beyond what the compiler does, there is still room for manual optimization. This method achieves efficiency close to that of hand-written assembly and does not depend on the CPU, so it is easy to port.
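The shift-and-add form of a constant multiplication can be sketched as follows, using the a * 320 example from the code comments. The function names are hypothetical, and the 320-pixel-wide frame buffer in the second function is an assumption made only to show how array addressing might use it.

```c
#include <stdint.h>

/* a * 320 == a * 256 + a * 64, written with two shifts and one add. */
static uint32_t mul320_shift(uint32_t a)
{
    return (a << 8) + (a << 6);
}

/* Index of pixel (x, y) in a buffer assumed to be 320 pixels wide. */
static uint32_t pixel_index_320(uint32_t x, uint32_t y)
{
    return mul320_shift(y) + x;
}
```

Many compilers perform this kind of rewrite on their own when the multiplier is a constant, so it is worth checking the generated assembly before doing it by hand.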
Original article: http://blog.pdafans.com/?72643/viewspace-1056.html