Practical Verification of Improving AlphaBlend Efficiency

Source: Internet
Author: User

Address: http://blog.csdn.net/alien75/article/details/6061907


In the previous article, <Thoughts on Improving AlphaBlend Efficiency>, I described several ways to improve efficiency, but none of them had been verified. After searching for more material and testing things hands-on, I found that there is still a lot of room for improvement. The following is a summary:

I. Basic Knowledge
1. What are bpp16, bpp24, and bpp32?
2. What are rgb555, rgb565, and rgb888?
3. What is color depth?
Color depth is the number of levels that each of R, G, and B can express. For example, each channel of a bpp24 or bpp32 pixel is 8 bits, so its color depth is 256 levels. bpp16 in Windows is actually rgb555, so its color depth is 32. Generating rgb565 on Windows requires other tools: the Paint tool produces bpp24 bitmaps, and editing one in ACDSee can only convert it to bpp15. A third-party tool can generate an rgb565 image instead; its file properties then report a depth of 32, which I verified.
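As a rough illustration of the formats named above, the following C sketch packs 8-bit channels into rgb565 and rgb555 pixels and computes how many levels a channel of a given width can express. The function names (pack_rgb565, pack_rgb555, levels) are made up for this example and do not come from the original article.

```c
#include <stdint.h>

/* Pack 8-bit channels into a 16-bit rgb565 pixel (5 bits R, 6 bits G, 5 bits B). */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Pack 8-bit channels into a 16-bit rgb555 pixel (5 bits each, top bit unused). */
static uint16_t pack_rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

/* Number of distinct levels a channel with the given bit width can express. */
static unsigned levels(unsigned bits)
{
    return 1u << bits;
}
```

With these definitions a 5-bit channel has 32 levels and an 8-bit channel has 256, which matches the color depths quoted above. Note that the green channel of rgb565 has 6 bits, so it alone has 64 levels.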

II. Analysis
The color depth of bpp16 is 32, while that of bpp24 or bpp32 is 256. Judging from the AlphaBlend formula in MSDN, whose scaling constant is 255, bpp16 blending should not be supported. Actual testing shows, however, that constant-alpha blending works with bpp16, while per-pixel blending does not, because bpp16 has no alpha channel. We therefore need to implement per-pixel blending for bpp16 ourselves; the same algorithm should also work for constant-alpha blending.
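To make the difference between the two modes concrete, here is a minimal C sketch of constant-alpha and per-pixel blending on 0xAARRGGBB pixels, using 255 as the scaling constant. It is an illustration only, not the system AlphaBlend implementation: the function names are hypothetical, the rounding is my own choice, and the source alpha is treated as non-premultiplied for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: (src*alpha + dst*(255-alpha)) / 255, rounded. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    unsigned v = (unsigned)src * alpha + (unsigned)dst * (255u - alpha);
    return (uint8_t)((v + 127u) / 255u);
}

/* Blend two 0xAARRGGBB pixels with the given alpha; the result is opaque. */
static uint32_t blend_argb(uint32_t src, uint32_t dst, uint8_t alpha)
{
    uint8_t r = blend_channel((uint8_t)(src >> 16), (uint8_t)(dst >> 16), alpha);
    uint8_t g = blend_channel((uint8_t)(src >> 8), (uint8_t)(dst >> 8), alpha);
    uint8_t b = blend_channel((uint8_t)src, (uint8_t)dst, alpha);
    return 0xFF000000u | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Constant-alpha blend: one alpha value applies to the whole image. */
static void blend_constant(const uint32_t *src, uint32_t *dst, size_t n, uint8_t alpha)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = blend_argb(src[i], dst[i], alpha);
}

/* Per-pixel blend: alpha is read from the top byte of each source pixel. */
static void blend_per_pixel(const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = blend_argb(src[i], dst[i], (uint8_t)(src[i] >> 24));
}
```

The per-pixel variant needs an alpha byte in every source pixel, which is exactly what a 16-bit format lacks.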
In <Thoughts on Improving AlphaBlend Efficiency> I mentioned that PNG and BMP each have advantages and disadvantages. With rgb565, however, constant-alpha blending can be done with a self-implemented algorithm. The files are smaller (rgb565/bpp16 is only half the size of rgb888/bpp32), and rgb565/bpp16 skips the PNG decoding step, so in theory efficiency improves further. Per-pixel blending, of course, requires maintaining a separate file with the alpha channel data; the total size then shrinks by only 1/4, but that is still better than nothing.
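The size figures can be checked with a little arithmetic. The helper below is a hypothetical one written for this note; it ignores row padding, which does not matter for an 800-pixel-wide image, and assumes the separate alpha data is stored as one byte per pixel.

```c
#include <stddef.h>

/* Raw pixel data size in bytes, ignoring any row padding. */
static size_t image_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return width * height * bits_per_pixel / 8;
}
```

For an 800*480 image, bpp32 takes 1,536,000 bytes and bpp16 takes 768,000 bytes. Adding an 8-bit alpha plane of 384,000 bytes to the bpp16 image gives 1,152,000 bytes, which is 3/4 of the bpp32 size.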
When there are many image resources, consider packing them into a single compressed archive, then loading them into memory and decompressing them when needed. This should improve efficiency; I will verify it when I have time.
The test platform is an 800*480 bpp16 emulator built with VS2005 SP1 (release mode, optimized for speed) and DeviceEmulatorBSP. Two full-screen bpp32 or bpp16 bitmaps stored on the SD card are blended with either the system AlphaBlend function or a custom one, and the result is then blitted to the device DC. Only the blend itself is timed, averaged over five runs. The absolute times will differ on a real device, but the relative ordering should be similar. Results also depend on whether the image resources are on the SD card or in NAND flash.
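The measurement described above, timing only the blend and averaging over five runs, could be sketched as follows. The article does not say which timer was used, so clock() from the C standard library stands in here, and average_ms and count_call are hypothetical names.

```c
#include <time.h>

/* Run fn(ctx) `runs` times and return the mean time per run in milliseconds. */
static double average_ms(void (*fn)(void *), void *ctx, int runs)
{
    clock_t total = 0;
    for (int i = 0; i < runs; ++i) {
        clock_t start = clock();
        fn(ctx);                      /* only this call is timed */
        total += clock() - start;
    }
    return 1000.0 * (double)total / CLOCKS_PER_SEC / runs;
}

/* Placeholder workload: counts how often it was called. */
static void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

In a real test, the placeholder workload would be replaced by a function that performs one full-screen blend, and loading the bitmaps and blitting to the DC would stay outside the timed call.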

III. Test Results
1. Load bpp16 with the system DIB method (SHLoadDIBitmap) and run the system AlphaBlend with constant alpha
367 ms (device 248 ms)
2. Load bpp32 (without an alpha channel) in DIB mode and run the system AlphaBlend with constant alpha
376 ms (device 258 ms)
3. Load bpp32 (with an alpha channel) in DIB mode and run the system AlphaBlend with per-pixel alpha
438 ms (device 295 ms)
4. Load bpp32 (with an alpha channel) in DIB mode and run a custom per-pixel blend
34 ms (device 61 ms)
5. Load bpp16 (rgb565) in DIB mode and run a custom blend
(1) Split the source and destination pixels into R, G, and B, blend each channel, then merge them (see reference 2)
24 ms (device 36 ms)
(2) Preprocess the source and destination pixels and blend them directly without splitting R, G, and B (see reference 1)
16 ms (device 25 ms)
(3) Process two source and destination pixels at a time (see reference 2)
18 ms (device 30 ms)
(4) Use assembly to optimize the single-pixel function makealpha from (2)
32 ms (device 35 ms)
(5) Use assembly to optimize (3)
Todo
(6) Optimize with SIMD instructions (available from the ARMv6 architecture onward)
Todo
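The difference between approaches (1) and (2) in test 5 can be sketched in C. This is not the code from references 1 or 2; it is a common formulation of the same idea, with the assumption that alpha is given on a 0..32 scale so that the division becomes a 5-bit shift. The function names are hypothetical.

```c
#include <stdint.h>

/* Approach (1): split R, G, B, blend each channel, then merge. alpha is 0..32. */
static uint16_t blend565_split(uint16_t src, uint16_t dst, unsigned alpha)
{
    unsigned sr = src >> 11, sg = (src >> 5) & 0x3Fu, sb = src & 0x1Fu;
    unsigned dr = dst >> 11, dg = (dst >> 5) & 0x3Fu, db = dst & 0x1Fu;
    unsigned r = (sr * alpha + dr * (32u - alpha)) >> 5;
    unsigned g = (sg * alpha + dg * (32u - alpha)) >> 5;
    unsigned b = (sb * alpha + db * (32u - alpha)) >> 5;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/*
 * Approach (2): spread the pixel over 32 bits as 00000gggggg00000rrrrr000000bbbbb
 * so every channel has at least 5 spare bits above it, then blend all three
 * channels with a single multiply-add. alpha is 0..32.
 */
static uint16_t blend565_packed(uint16_t src, uint16_t dst, unsigned alpha)
{
    uint32_t s = (src | ((uint32_t)src << 16)) & 0x07E0F81Fu;
    uint32_t d = (dst | ((uint32_t)dst << 16)) & 0x07E0F81Fu;
    uint32_t v = ((s * alpha + d * (32u - alpha)) >> 5) & 0x07E0F81Fu;
    return (uint16_t)(v | (v >> 16));   /* fold green back into the low 16 bits */
}
```

Both functions produce the same result for every channel; the packed version simply replaces three multiply-adds with one, which is in line with the shorter time measured for (2).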

IV. Summary
1. The algorithm matters most.
2. Local assembly optimization does little to improve efficiency.
3. I found a long-standing misunderstanding of mine about using CreateDIBSection.
When creating an rgb565 DIB, you need to specify a color table, which actually holds the R, G, and B masks (R is 0xF800, G is 0x07E0, B is 0x001F) used to extract the channels from each pixel. This information is carried in bmiColors, a member of BITMAPINFO (the second parameter of CreateDIBSection). If it is not specified, the system displays a color cast. However, the bmiColors member of BITMAPINFO holds only one DWORD and cannot carry all three masks. You must therefore allocate a buffer dynamically (length: 40 + 16 = 56 bytes, enough for the 40-byte header and the three DWORD masks) or define your own BITMAPINFO-like structure for rgb565 to display properly. If the stock BITMAPINFO structure is used as a static variable and bmiColors is cast to a DWORD pointer, writing the masks goes out of bounds. The result is that the display looks normal in debug mode but shows a color cast in release mode (see reference 3).
4. The system's SHLoadDIBitmap performs about the same as a self-implemented DIB loader when loading bitmaps.
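The layout issue in point 3 can be illustrated with a portable C sketch. HeaderMirror below is a stand-in I wrote to mirror the 40-byte BITMAPINFOHEADER layout so the example compiles without windows.h; Rgb565Info, init_rgb565_info, and BI_BITFIELDS_VALUE are likewise hypothetical names. On Windows, the real BITMAPINFOHEADER and the BI_BITFIELDS constant would be used instead.

```c
#include <stdint.h>
#include <string.h>

#define BI_BITFIELDS_VALUE 3u   /* numeric value of the Windows BI_BITFIELDS constant */

/* Portable mirror of the 40-byte BITMAPINFOHEADER layout (illustration only). */
typedef struct {
    uint32_t biSize;
    int32_t  biWidth;
    int32_t  biHeight;
    uint16_t biPlanes;
    uint16_t biBitCount;
    uint32_t biCompression;
    uint32_t biSizeImage;
    int32_t  biXPelsPerMeter;
    int32_t  biYPelsPerMeter;
    uint32_t biClrUsed;
    uint32_t biClrImportant;
} HeaderMirror;

/* Header followed by room for all three masks, so no write goes out of bounds. */
typedef struct {
    HeaderMirror header;
    uint32_t     masks[3];      /* R, G, B in that order */
} Rgb565Info;

/* Fill in the header and masks for a 16-bit rgb565 DIB. */
static void init_rgb565_info(Rgb565Info *info, int32_t width, int32_t height)
{
    memset(info, 0, sizeof *info);
    info->header.biSize = (uint32_t)sizeof(HeaderMirror);
    info->header.biWidth = width;
    info->header.biHeight = height;
    info->header.biPlanes = 1;
    info->header.biBitCount = 16;
    info->header.biCompression = BI_BITFIELDS_VALUE;
    info->masks[0] = 0xF800u;   /* red   */
    info->masks[1] = 0x07E0u;   /* green */
    info->masks[2] = 0x001Fu;   /* blue  */
}
```

With the real Windows types, a structure like this would be cast to BITMAPINFO * and passed as the second parameter of CreateDIBSection. Because the three masks are part of the structure, nothing is written past its end, unlike with a plain BITMAPINFO whose bmiColors has room for only one entry.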

V. Problems
The tests used the ideal case and did not account for pitch, stride, scaling, and so on. Real applications must handle these.
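As one example of what stride handling involves, the sketch below computes the byte length of a DIB scan line (rows are padded to a multiple of 4 bytes) and blends an rgb565 rectangle row by row so that padding is skipped. The function names are hypothetical, alpha is assumed to be on a 0..32 scale, and both buffers are assumed to be 2-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per DIB scan line: each row is padded to a multiple of 4 bytes. */
static size_t dib_stride(size_t width, size_t bits_per_pixel)
{
    return ((width * bits_per_pixel + 31) / 32) * 4;
}

/* Blend a width x height rgb565 region row by row, honoring each buffer's stride. */
static void blend_rect565(const uint8_t *src, size_t src_stride,
                          uint8_t *dst, size_t dst_stride,
                          size_t width, size_t height, unsigned alpha)
{
    for (size_t y = 0; y < height; ++y) {
        const uint16_t *s = (const uint16_t *)(const void *)(src + y * src_stride);
        uint16_t *d = (uint16_t *)(void *)(dst + y * dst_stride);
        for (size_t x = 0; x < width; ++x) {
            /* Spread each pixel over 32 bits and blend all channels at once. */
            uint32_t sp = (s[x] | ((uint32_t)s[x] << 16)) & 0x07E0F81Fu;
            uint32_t dp = (d[x] | ((uint32_t)d[x] << 16)) & 0x07E0F81Fu;
            uint32_t v = ((sp * alpha + dp * (32u - alpha)) >> 5) & 0x07E0F81Fu;
            d[x] = (uint16_t)(v | (v >> 16));
        }
    }
}
```

For the 800*480 bpp16 test images the stride is exactly 1600 bytes, so there is no padding and a flat loop over all pixels happens to work. An odd width such as 801 gives a stride of 1604 bytes, and a flat loop would then blend the padding and drift out of alignment with the rows.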

VI. References
Reference 1: http://linux.chinaunix.net/bbs/thread-1117753-1-1.html
Reference 2: http://blog.csdn.net/linzhengqun/archive/2009/06/15/4269259.aspx
Reference 3: http://social.msdn.microsoft.com/Forums/en-US/vssmartdevicesnative/thread/57bf9025-e536-499e-a343-77c7dd93189e
