I finished the exercise, but the result is not good. The CPU has no instruction such as pmulluw (an unsigned counterpart to pmullw).
The output is clearly different from MATLAB's, especially in the background. I have not found the problem, but it is most likely related to the signedness of the registers.
Gray formula: Gray = (R * 76 + G * 150 + B * 30) >> 8
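As a ground truth for checking the MMX output, the formula can be written as a plain C function. This is a minimal sketch: the name `gray_ref` is hypothetical and not part of the original code. Because 76 + 150 + 30 = 256, the weights approximate the usual 0.299 / 0.587 / 0.114 luma weights scaled by 256, and a pure white pixel maps back to 255.

```c
#include <stdint.h>

/* Reference value for Gray = (R*76 + G*150 + B*30) >> 8.
 * The weights sum to 256, so the result always stays in 0..255.
 * gray_ref is an illustrative name, not from the original code. */
uint8_t gray_ref(uint8_t r, uint8_t g, uint8_t b)
{
    /* Accumulate the full sum in 32 bits, then shift once. */
    uint32_t sum = (uint32_t)r * 76u + (uint32_t)g * 150u + (uint32_t)b * 30u;
    return (uint8_t)(sum >> 8);
}
```

For example, `gray_ref(255, 255, 255)` is 255 and `gray_ref(0, 255, 0)` is 149.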
The signed 16-bit range is [-32768, 32767] and the unsigned range is [0, 65535]. The former cannot hold 255 * 150 = 38250, so the product overflows when it is treated as signed.
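A small check of the 16-bit arithmetic, assuming two's-complement words as on x86. The low 16 bits of 255 * 150 are 0x956A whether the word is read as signed or unsigned, so pmullw returns the same bits either way; only an instruction that reads the word as signed (such as pmulhw or psraw) would see -27286. Since both operands are at most 255, every product is below 65536 and its high 16 bits, which is what pmulhuw returns, are always zero. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Low 16 bits of a 16x16 product (what pmullw keeps per lane). */
uint16_t mul_lo16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * b);
}

/* High 16 bits of an unsigned 16x16 product (what pmulhuw keeps per lane). */
uint16_t mul_hi16_unsigned(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Interpret a 16-bit word as two's-complement signed, without relying
 * on an implementation-defined narrowing conversion. */
int32_t as_signed16(uint16_t w)
{
    return (w >= 0x8000u) ? (int32_t)w - 0x10000 : (int32_t)w;
}
```

Here `mul_lo16(255, 150)` is 38250 (0x956A), `as_signed16` of that word is -27286, and `mul_hi16_unsigned(255, 255)` is 0.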
The available workarounds are to widen the bit width to simulate unsigned multiplication or to reduce the precision of the calculation, and neither is easy to do without giving something up.
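If "reducing the precision" means shrinking the coefficients so that each product fits a signed 16-bit word (an assumption about what is meant here), this particular formula loses nothing. 76, 150 and 30 are all even, so (R * 38 + G * 75 + B * 15) >> 7 equals the original expression exactly; the largest single product is 255 * 75 = 19125, and even the full sum is at most 128 * 255 = 32640, both below 32767. The sketch below checks the equality over every 8-bit input; the function names are hypothetical.

```c
#include <stdint.h>

/* Original formula with 8 fractional bits. */
uint32_t gray8(uint32_t r, uint32_t g, uint32_t b)
{
    return (r * 76u + g * 150u + b * 30u) >> 8;
}

/* Halved coefficients (38 + 75 + 15 = 128) with a 7-bit shift.
 * Each product is at most 255 * 75 = 19125, so it fits in int16_t. */
uint32_t gray7(uint32_t r, uint32_t g, uint32_t b)
{
    return (r * 38u + g * 75u + b * 15u) >> 7;
}

/* Returns 1 if gray7 and gray8 agree for all 2^24 RGB combinations. */
int halved_coefficients_match(void)
{
    for (uint32_t r = 0; r < 256; ++r)
        for (uint32_t g = 0; g < 256; ++g)
            for (uint32_t b = 0; b < 256; ++b)
                if (gray7(r, g, b) != gray8(r, g, b))
                    return 0;
    return 1;
}
```

The equality holds because 2 * S >> 8 is the same as S >> 7 for any non-negative S.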
I am only a beginner, and I need more practice with these instructions.
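void asmargb2gray(BitmapData* data)
{
    UINT height = data->Height;
    UINT width = data->Width;
    pix* p = (pix*)data->Scan0;
    UINT n = height * width;        // number of 32-bit pixels
    pix cp[] = { 0x014c961e };      // per-byte coefficients: A = 0x01, R = 0x4C (76), G = 0x96 (150), B = 0x1E (30)
    __asm {
        push esi
        mov ecx, n                  // pixel counter
        pxor mm7, mm7               // mm7 = 0, used to zero-extend bytes to words
        mov esi, [p]
        movd mm2, [cp]
    LP:
        movq mm1, mm2
        movd mm0, [esi]             // mm0: 00 00 00 00 Alpha R G B
        mov al, [esi + 3]           // al: Alpha
        punpcklbw mm0, mm7          // mm0: 00 Alpha 00 R 00 G 00 B
        punpcklbw mm1, mm7          // mm1: 00 01 00 4C 00 96 00 1E
        movq mm3, mm0
        movq mm5, mm1
        // simulate pmulluw
        movq mm4, mm5
        pmullw mm5, mm3             // a * b, low 16 bits
        pmulhuw mm4, mm3            // a * b, high 16 bits, unsigned
        movq mm6, mm5
        punpcklwd mm5, mm4          // mm5: 32-bit products G*150 | B*30
        punpckhwd mm6, mm4          // mm6: 32-bit products A*1 | R*76
        movq mm0, mm5
        movq mm1, mm6
        psrlw mm0, 8                // each product >> 8 (the high words are zero)
        psrlw mm1, 8
        packssdw mm0, mm0           // words: B' G' B' G' (signed saturation)
        packssdw mm1, mm1           // words: R' A' R' A'
        movd edx, mm0               // edx = G' << 16 | B'
        movd ebx, mm1               // ebx = A' << 16 | R'
        shr edx, 16                 // edx = G'; the low word B' is shifted out
        add dl, bl                  // dl = G' + R'
        shr ebx, 16                 // ebx = A' = Alpha >> 8 = 0
        add dl, bl
        mov [esi], dl               // write the gray value to B, G and R
        mov [esi + 1], dl
        mov [esi + 2], dl
        mov [esi + 3], al           // restore alpha
        add esi, 4
        dec ecx
        jnz LP
        pop esi
        emms
    }
}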
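To compare the loop with the formula, the register arithmetic can be replayed in scalar C. This is a hand-derived model, not the author's code: it assumes the pixel bytes sit in memory as B, G, R, A (as the mm0 comment indicates), models the words left in edx and ebx after packssdw and movd, and then follows the shr/add sequence. The function names are hypothetical.

```c
#include <stdint.h>

/* One channel times its coefficient, shifted right by 8
 * (pmullw then psrlw 8; the pmulhuw high word is zero for 8-bit inputs). */
static uint32_t term(uint32_t channel, uint32_t coeff)
{
    return (channel * coeff) >> 8;
}

/* Scalar model of the loop body as written.
 * edx: low word = B term, high word = G term.
 * ebx: low word = R term, high word = Alpha * 1 >> 8 (always 0). */
uint8_t asm_loop_model(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    uint32_t edx = (term(g, 150) << 16) | term(b, 30);
    uint32_t ebx = (term(a, 1) << 16) | term(r, 76);
    uint8_t dl;

    edx >>= 16;                           /* shr edx, 16: only the G term remains */
    dl = (uint8_t)edx;
    dl = (uint8_t)(dl + (uint8_t)ebx);    /* add dl, bl: adds the R term */
    ebx >>= 16;                           /* shr ebx, 16: alpha term, 0 */
    dl = (uint8_t)(dl + (uint8_t)ebx);    /* add dl, bl: adds 0 */
    return dl;
}

/* All three shifted terms added together, which the loop appears to intend. */
uint8_t per_term_sum(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(term(r, 76) + term(g, 150) + term(b, 30));
}
```

In this model a white pixel comes out as 224, while adding all three shifted terms gives 253 and the unshifted formula gives 255, so bright areas such as a white background would be where the difference shows most.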
Result:
The output is clearly different from the MATLAB result.