A fast algorithm for converting true color to high color

Source: Internet
Author: User

Why convert color depth in real time?

In general, the bitmaps in a 2D game are converted to the required color depth once, when they are loaded, whatever color depth they are stored in on disk. So we usually pay little attention to how long color depth conversion takes.

But now things are different. One of Cloud Wind's most important future projects is a super 2D engine that will support features such as voxel objects and real-time lighting. For lighting, 32 levels of brightness are far from enough, so the trend in 2D game development should be toward true color, at least for internal operations. In some cases we may still need 15/16-bit high color output, so we need a faster method of converting in real time.

Here are a few approaches to explore. A graphics card may support either 15-bit or 16-bit color, but all the examples here use 16-bit color:

First take a look at the C version:

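/* truecolor is assumed to hold one pixel as 0x00RRGGBB */
red=(truecolor>>8)&0xf800;     /* top 5 bits of red   -> bits 15..11 */
green=(truecolor>>5)&0x7e0;    /* top 6 bits of green -> bits 10..5  */
blue=(truecolor>>3)&0x1f;      /* top 5 bits of blue  -> bits 4..0   */
hicolor=red|green|blue;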
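
For reference, the C version can be wrapped into a small self-contained routine. This is only a sketch: it assumes each pixel sits in a 32-bit word laid out as 0x00RRGGBB, and the names truecolor_to_hicolor and convert_buffer are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one true color pixel (assumed layout 0x00RRGGBB)
   to 16-bit high color (RGB565). */
static uint16_t truecolor_to_hicolor(uint32_t truecolor)
{
    uint32_t red   = (truecolor >> 8) & 0xf800; /* R7..R3 -> bits 15..11 */
    uint32_t green = (truecolor >> 5) & 0x07e0; /* G7..G2 -> bits 10..5  */
    uint32_t blue  = (truecolor >> 3) & 0x001f; /* B7..B3 -> bits 4..0   */
    return (uint16_t)(red | green | blue);
}

/* Convert count pixels from src to dst, one at a time. */
static void convert_buffer(const uint32_t *src, uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = truecolor_to_hicolor(src[i]);
}
```

Any faster variant can be checked against this routine pixel by pixel.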
This is certainly quite slow, so we turn to assembly, which can optimize it considerably:

lodsd        ;RRRRRRRR GGGGGGGG BBBBBBBB  (low 24 bits of eax; x below = don't care)
shr eax,3    ;xxxRRRRR RRRGGGGG GGGBBBBB
shl al,2     ;xxxRRRRR RRRGGGGG GBBBBB00
shl ax,3     ;xxxRRRRR GGGGGGBB BBB00000
dec esi      ;lodsd advanced esi by 4, but a 24-bit pixel is only 3 bytes
shr eax,5    ;xxxxxxxx RRRRRGGG GGGBBBBB
stosw        ;store the low 16 bits
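
To follow the bit movements, the register sequence can be emulated in portable C. This is only a sketch: the function name asm_style_convert is hypothetical, and its 32-bit argument stands in for EAX right after lodsd. Tracing it shows one detail that the comments gloss over: shl al,2 shifts out green bits 2 and 1 but keeps green bit 0, so the lowest bit of the 6-bit green field comes from green bit 0 rather than bit 2. The result can therefore differ from the C version by one green step.

```c
#include <stdint.h>

/* Emulate the shift sequence of the assembly listing on a 32-bit value
   that plays the role of EAX after lodsd. */
static uint16_t asm_style_convert(uint32_t eax)
{
    eax >>= 3;                                           /* shr eax,3 */
    eax = (eax & 0xffffff00u) | ((eax << 2) & 0x00ffu);  /* shl al,2  */
    eax = (eax & 0xffff0000u) | ((eax << 3) & 0xffffu);  /* shl ax,3  */
    eax >>= 5;                                           /* shr eax,5 */
    return (uint16_t)eax;                                /* stosw writes AX only */
}
```

For example, a pixel whose only set bit is green bit 0 (0x00000100) yields 0x0020 here, while the C version yields 0.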
Isn't that much leaner? Unfortunately, although it looks concise, it reuses the same registers so heavily that it hurts the pipeline badly; the code drives pipeline efficiency almost to its minimum. Many optimizations are possible: we can process two pixels per loop iteration, using EAX and EBX respectively and interleaving the two instruction streams, or we can replace the second half of the code above with a table lookup. I believe either would improve speed. But I would also like to propose another option, using MMX instructions:

mm7=f800f800f800f800
mm6=fc00fc00fc00fc00
------------------------------
punpcklbw mm0,[red+edx]
; mm0=RRRRRRRR xxxxxxxx RRRRRRRR xxxxxxxx RRRRRRRR xxxxxxxx RRRRRRRR xxxxxxxx
punpcklbw mm1,[green+edx]
; mm1=GGGGGGGG xxxxxxxx GGGGGGGG xxxxxxxx GGGGGGGG xxxxxxxx GGGGGGGG xxxxxxxx
punpcklbw mm2,[blue+edx]
; mm2=BBBBBBBB xxxxxxxx BBBBBBBB xxxxxxxx BBBBBBBB xxxxxxxx BBBBBBBB xxxxxxxx
; (x = leftover low byte of the destination register; the masks and shifts below discard it)
pand mm0,mm7
; mm0=RRRRR000 00000000 RRRRR000 00000000 RRRRR000 00000000 RRRRR000 00000000
pand mm1,mm6
; mm1=GGGGGG00 00000000 GGGGGG00 00000000 GGGGGG00 00000000 GGGGGG00 00000000
psrlw mm2,11
; mm2=00000000 000BBBBB 00000000 000BBBBB 00000000 000BBBBB 00000000 000BBBBB
psrlw mm1,5
; mm1=00000GGG GGG00000 00000GGG GGG00000 00000GGG GGG00000 00000GGG GGG00000
por mm0,mm2
por mm0,mm1
; mm0=RRRRRGGG GGGBBBBB RRRRRGGG GGGBBBBB RRRRRGGG GGGBBBBB RRRRRGGG GGGBBBBB
movq [dis+edx*2],mm0
add edx,4
We use MMX for its parallel operations. Converting packed RGB888 directly to RGB565 in parallel seems impossible, but it becomes possible once the R, G, and B components are stored separately. Four values of each component can be read at once, processed in parallel, and then merged, so that four pixels are handled per loop iteration. For the sake of cache efficiency, the three R, G, and B memory blocks should not be placed too far apart. My suggestion is to split each row of the bitmap into three parts: red, green, and blue.
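
The same four-pixels-per-iteration idea can be sketched in portable C, with a 64-bit integer standing in for one MMX register. This is an illustration, not MMX code: the names spread4 and planar888_to_565 are hypothetical, the three planes are assumed to be separate byte arrays of equal length, and blue is masked before shifting because a plain 64-bit shift, unlike psrlw, would otherwise carry bits across the 16-bit lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Place 4 bytes in the high byte of each 16-bit lane of a 64-bit word,
   leaving the low bytes zero (roughly the role punpcklbw plays above). */
static uint64_t spread4(const uint8_t *p)
{
    return ((uint64_t)p[0] << 8)  | ((uint64_t)p[1] << 24) |
           ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 56);
}

/* Convert n pixels from three separate planes to RGB565. */
static void planar888_to_565(const uint8_t *red, const uint8_t *green,
                             const uint8_t *blue, uint16_t *dst, size_t n)
{
    const uint64_t mask5 = 0xf800f800f800f800ULL; /* same value as mm7 */
    const uint64_t mask6 = 0xfc00fc00fc00fc00ULL; /* same value as mm6 */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint64_t r = spread4(red + i) & mask5;           /* RRRRR000 00000000 */
        uint64_t g = (spread4(green + i) & mask6) >> 5;  /* 00000GGG GGG00000 */
        uint64_t b = (spread4(blue + i) & mask5) >> 11;  /* 00000000 000BBBBB */
        uint64_t out = r | g | b;
        for (int k = 0; k < 4; k++)                      /* lane k = pixel i+k */
            dst[i + k] = (uint16_t)(out >> (16 * k));
    }
    for (; i < n; i++)                                   /* leftover pixels */
        dst[i] = (uint16_t)(((red[i] & 0xf8) << 8) |
                            ((green[i] & 0xfc) << 3) |
                            (blue[i] >> 3));
}
```

Whether this beats the scalar loop on a given compiler and CPU has to be measured; the sketch only shows how the planar layout lets one mask and one shift act on four pixels at once.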

The method above can be optimized further. This article is meant to spark ideas and help readers find a better way.
