Fast Alpha Blending Algorithm for 16-bit Colors on ARM CPUs

Source: Internet
Author: User
Transferred from: bytes
MMX instructions can be used to accelerate alpha blending on desktop machines, but MMX is not widely available on ARM CPUs.
I used a small trick to optimize alpha blending. Alpha blending itself is simple: split the source color C1 and the destination color C2 into their components R1, G1, B1 and R2, G2, B2, compute each channel with the formula (CLR1 * alpha + CLR2 * (32 - alpha)) / 32, and combine the results into one color value. Done this way, however, the computation is slow. The trick is as follows. A 16-bit color is usually in 565 format, where the digits give the number of bits in the R, G, and B components. Using a 32-bit variable, move the green component up by 16 bits, which gives the bit layout 00000GGGGGG00000RRRRR000000BBBBB. Each component now has spare bits above it to absorb carries, so the color value no longer needs to be split apart. The two transformed color values are then combined with the formula above, and the result is converted back to 565 format, which completes one alpha blend. The C source code is as follows:
    #include <stdint.h>

    /* WORD and DWORD are the Windows 16-bit and 32-bit unsigned types;
       they are defined here so the function also compiles elsewhere. */
    typedef uint16_t WORD;
    typedef uint32_t DWORD;

    static __inline void MakeAlpha(WORD *wpSrc, WORD *wpDes, WORD wAlpha)
    {
        register DWORD d1;             // intermediate value; declared register for speed
        register WORD wa = *wpSrc;     // source color
        register WORD wb = *wpDes;     // destination color
        register DWORD alpha = wAlpha; // alpha value; a 16-bit color has 32 levels per
                                       // channel, so alpha ranges over 0-32
        // (c1 - c2) * alpha / 32 + c2 is a rearrangement of
        // (c1 * alpha + c2 * (32 - alpha)) / 32 that saves one multiplication.
        // The expression below is deliberately written as a single statement;
        // the compiler handles it well, and it is faster than writing:
        //   c1 = (((wa << 16) | wa) & 0x7e0f81f);
        //   c2 = (((wb << 16) | wb) & 0x7e0f81f);
        //   d1 = (((c1 - c2) * alpha) >> 5) + c2;
        // 0x7e0f81f spreads 16 bits over 32; in binary it is
        // 00000111111000001111100000011111.
        // Dividing by 32 is a right shift by 5, and shifts are much faster than
        // multiplication and division. For example, a * 320 can be written as
        // a * 256 + a * 64 => (a << 8) + (a << 6).
        d1 = ((((((((DWORD)wa << 16) | wa) & 0x7e0f81f) - ((((DWORD)wb << 16) | wb) & 0x7e0f81f)) * alpha) >> 5) + ((((DWORD)wb << 16) | wb) & 0x7e0f81f)) & 0x7e0f81f;
        wa = (WORD)((d1 & 0xffff0000) >> 16); // g...r...b => ..g..
        wb = (WORD)(d1 & 0xffff);             // g...r...b => r...b
        *wpDes = wa | wb;                     // rgb
    }
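
As a rough check of the packed arithmetic, the sketch below compares it against the straightforward per-channel blend described at the start of the article. The names spread565, blend_packed, blend_naive, and count_mismatches are illustrative and do not come from the original code, and fixed-width <stdint.h> types are assumed in place of WORD and DWORD.

```c
#include <stdint.h>

/* Spread an RGB565 pixel over 32 bits: green moves to bits 21-26,
   red stays at bits 11-15, blue at bits 0-4. (Hypothetical helper.) */
static uint32_t spread565(uint16_t c)
{
    return (((uint32_t)c << 16) | c) & 0x07E0F81Fu;
}

/* Packed blend: one multiply covers all three channels. alpha is 0..32. */
static uint16_t blend_packed(uint16_t src, uint16_t dst, uint32_t alpha)
{
    uint32_t c1 = spread565(src);
    uint32_t c2 = spread565(dst);
    /* (c1 - c2) may wrap around as an unsigned value; the final mask
       removes the effect of that wraparound. */
    uint32_t d = ((((c1 - c2) * alpha) >> 5) + c2) & 0x07E0F81Fu;
    return (uint16_t)((d >> 16) | (d & 0xFFFFu));
}

/* Reference blend: split into channels, blend each one, recombine. */
static uint16_t blend_naive(uint16_t src, uint16_t dst, uint32_t alpha)
{
    uint32_t r1 = (src >> 11) & 31u, g1 = (src >> 5) & 63u, b1 = src & 31u;
    uint32_t r2 = (dst >> 11) & 31u, g2 = (dst >> 5) & 63u, b2 = dst & 31u;
    uint32_t r = (r1 * alpha + r2 * (32u - alpha)) / 32u;
    uint32_t g = (g1 * alpha + g2 * (32u - alpha)) / 32u;
    uint32_t b = (b1 * alpha + b2 * (32u - alpha)) / 32u;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Count disagreements over a sample of colors and every alpha value. */
static unsigned long count_mismatches(void)
{
    unsigned long bad = 0;
    uint32_t s, d, a;
    for (s = 0; s <= 0xFFFFu; s += 0x0101u)
        for (d = 0; d <= 0xFFFFu; d += 0x0333u)
            for (a = 0; a <= 32u; a++)
                if (blend_packed((uint16_t)s, (uint16_t)d, a) !=
                    blend_naive((uint16_t)s, (uint16_t)d, a))
                    bad++;
    return bad;
}
```

Calling count_mismatches() from a test program gives a quick sanity check. The sample strides are arbitrary, so treat a zero count as evidence rather than proof that the two versions agree.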

When multiplication is written in C, the compiler can only partially optimize it: the generated assembly consists of sequences of shifts and additions, and the addressing of array elements is sometimes handled in a similar way, so there is still room for optimization beyond what the compiler does. This method achieves efficiency close to that of hand-written assembly and does not depend on the CPU, so it is easy to port.
Original article: http://blog.pdafans.com/?72643/viewspace-1056.html
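
The shift-and-add decomposition of a constant multiplication can be sketched as follows. The helper names and the 320-pixel row width are assumptions chosen for illustration; many compilers apply this rewrite on their own, so it is worth measuring before doing it by hand.

```c
#include <stdint.h>

/* Multiply by 320 using shifts only: 320 = 256 + 64. (Illustrative helper.) */
static uint32_t mul320_shift(uint32_t a)
{
    return (a << 8) + (a << 6);
}

/* Index of pixel (x, y) in a buffer assumed to be 320 pixels wide. */
static uint32_t pixel_index(uint32_t x, uint32_t y)
{
    return mul320_shift(y) + x;
}
```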
