Using the MMX Instruction Set in C++

Source: Internet
Author: User
An earlier article, "Several Tips on Inline Assembly", briefly introduced how to use inline assembly in C++. This article continues from there and introduces how to use the MMX instructions.

I. General principles of inline assembly:

1. The general-purpose registers (EAX, EBX, ECX, and EDX) may be used freely;
2. Other registers must be saved on the stack before use and restored afterwards.
This generally looks like the following:

 
    __asm {
        push ebp
        push esp
        ..........      // use EBP and ESP
        pop esp
        pop ebp
    }

II. The inline assembly keyword __asm can also prefix individual statements:
For example:

    __asm mov eax, anyval1
    __asm mov ebx, anyval2

III. A function's return value can be placed directly in EAX; the resulting compiler warning can be ignored.
For example:

    int anyfun(...... /* anyparm */)
    {
        int irtn;               // function return value
        ......                  // function statements
        __asm mov eax, irtn     // replaces "return irtn;" (the compiler issues a warning, which can be ignored)
    }

IV. Inline assembly is case-insensitive, and its syntax is the same as that of ordinary assembly.

For example:

    __asm {
        mov eax, ebx
        MOV EAX, EBX    // same as the previous line
    }

Note: C++ variable names remain case-sensitive.

V. Prefer the __asm or _asm keyword for inline assembly over the standard C++ asm keyword (this is Microsoft's recommendation).
The above merely supplements the earlier article on inline assembly; I plan to write a series of articles on the subject. Now back to the main topic: how to call the MMX instructions.

1. Introduction to the MMX instruction set:

[Data transfer instructions]
MOVQ // move a 64-bit integer (quadword)
MOVD // move a 32-bit integer (doubleword)

[Pack and unpack (conversion) instructions]
PACKSSWB // pack words into bytes with signed saturation.
PACKSSDW // pack doublewords into words with signed saturation.
PACKUSWB // pack words into bytes with unsigned saturation.
PUNPCKHBW // unpack high-order bytes.
PUNPCKHWD // unpack high-order words.
PUNPCKHDQ // unpack high-order doublewords.
PUNPCKLBW // unpack low-order bytes.
PUNPCKLWD // unpack low-order words.
PUNPCKLDQ // unpack low-order doublewords.
Note: I have never used this group of instructions and do not know what they are for. Any advice would be appreciated. Thank you!
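
As a rough answer to the question in the note: the unpack instructions interleave elements taken from two registers, and the pack instructions narrow elements with saturation. The sketch below models PUNPCKLBW and PACKSSWB in plain scalar C++, following the behavior described in Intel's instruction reference. It does not execute any MMX instruction, and the names `Bytes8`, `Words4`, `punpcklbw_model`, `saturate_to_i8`, and `packsswb_model` are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper types: one 64-bit MMX register viewed as 8 bytes or 4 words.
// Index 0 is the least significant element.
using Bytes8 = std::array<std::uint8_t, 8>;
using Words4 = std::array<std::int16_t, 4>;

// Model of "punpcklbw dst, src": interleave the four low-order bytes of dst and src.
// Result order: dst0, src0, dst1, src1, dst2, src2, dst3, src3.
Bytes8 punpcklbw_model(const Bytes8& dst, const Bytes8& src) {
    Bytes8 out{};
    for (int i = 0; i < 4; ++i) {
        out[2 * i]     = dst[i];
        out[2 * i + 1] = src[i];
    }
    return out;
}

// Clamp a signed word to the signed byte range (signed saturation).
std::int8_t saturate_to_i8(std::int16_t v) {
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return static_cast<std::int8_t>(v);
}

// Model of "packsswb dst, src": the four words of dst become the low four bytes,
// the four words of src become the high four bytes, each with signed saturation.
std::array<std::int8_t, 8> packsswb_model(const Words4& dst, const Words4& src) {
    std::array<std::int8_t, 8> out{};
    for (int i = 0; i < 4; ++i) {
        out[i]     = saturate_to_i8(dst[i]);
        out[i + 4] = saturate_to_i8(src[i]);
    }
    return out;
}
```

A common use of the unpack form is widening: unpacking against a register that holds all zeros turns each unsigned byte into a word with the same value, which leaves headroom for arithmetic before packing the results back down.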

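[Arithmetic instructions]
PADDB // add packed bytes (wraparound).
PADDW // add packed words (wraparound).
PADDD // add packed doublewords (wraparound).
PADDSB // add packed signed bytes with signed saturation.
PADDSW // add packed signed words with signed saturation.
PADDUSB // add packed unsigned bytes with unsigned saturation.
PADDUSW // add packed unsigned words with unsigned saturation.
PSUBB // subtract packed bytes (wraparound).
PSUBW // subtract packed words (wraparound).
PSUBD // subtract packed doublewords (wraparound).
PSUBSB // subtract packed signed bytes with signed saturation.
PSUBSW // subtract packed signed words with signed saturation.
PSUBUSB // subtract packed unsigned bytes with unsigned saturation.
PSUBUSW // subtract packed unsigned words with unsigned saturation.
PMULHW // multiply packed signed words and store the high 16 bits of each result.
PMULLW // multiply packed signed words and store the low 16 bits of each result.
PMADDWD // multiply packed signed words and add adjacent doubleword results.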
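
The arithmetic instructions differ mainly in what happens on overflow: the plain forms wrap around, while the "S" and "US" forms saturate (clamp) at the signed or unsigned limit. The following scalar sketch models a single element ("lane") of PADDB, PADDUSB, and PADDSW to show the difference. It is an illustration only, not MMX code, and the function names are hypothetical.

```cpp
#include <cstdint>

// Model of one PADDB lane: 8-bit addition that wraps around on overflow.
std::uint8_t paddb_lane(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);   // keeps only the low 8 bits
}

// Model of one PADDUSB lane: unsigned addition clamped to 255.
std::uint8_t paddusb_lane(std::uint8_t a, std::uint8_t b) {
    int sum = a + b;
    return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
}

// Model of one PADDSW lane: signed 16-bit addition clamped to [-32768, 32767].
std::int16_t paddsw_lane(std::int16_t a, std::int16_t b) {
    int sum = a + b;
    if (sum > 32767)  sum = 32767;
    if (sum < -32768) sum = -32768;
    return static_cast<std::int16_t>(sum);
}
```

For example, adding the bytes 200 and 100 gives 44 with the wraparound model and 255 with the unsigned-saturating one. Saturation is usually what is wanted for pixel or audio data, where wrapping would produce a visible or audible glitch.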

[Comparison instructions]
PCMPEQB // compare packed bytes for equality.
PCMPEQW // compare packed words for equality.
PCMPEQD // compare packed doublewords for equality.
PCMPGTB // compare packed signed byte integers for greater than.
PCMPGTW // compare packed signed word integers for greater than.
PCMPGTD // compare packed signed doubleword integers for greater than.
These instructions compare packed data element by element: each result element is set to all ones where the comparison holds and to all zeros where it does not.
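
The result of a packed comparison is a mask, not a flag. As a minimal sketch of that idea, the scalar model below reproduces what PCMPGTW is documented to compute for four signed words. The type and function names are hypothetical, and no MMX instruction is executed.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper types: four 16-bit lanes, index 0 is the lowest lane.
using Words4 = std::array<std::int16_t, 4>;
using Mask4  = std::array<std::uint16_t, 4>;

// Model of "pcmpgtw dst, src": signed compare per lane,
// 0xFFFF where dst > src, 0x0000 otherwise.
Mask4 pcmpgtw_model(const Words4& dst, const Words4& src) {
    Mask4 mask{};
    for (int i = 0; i < 4; ++i) {
        mask[i] = (dst[i] > src[i]) ? 0xFFFF : 0x0000;
    }
    return mask;
}
```

Because every lane is either all ones or all zeros, the mask can be fed straight into the bitwise logical instructions to keep or discard whole elements without branching.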

[Bitwise logical instructions]
PAND // bitwise logical AND.
PANDN // bitwise logical AND NOT.
POR // bitwise logical OR.
PXOR // bitwise logical exclusive OR.
These instructions work much like the ordinary AND, OR, and XOR instructions: they operate bit by bit, here on all 64 bits of their operands.
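
One detail worth spelling out is PANDN, which inverts the destination operand (not the source) before the AND. Combined with PAND and POR it gives a branch-free select. The sketch below models this on plain 64-bit integers; `pandn_model` and `select_model` are hypothetical names, and this is an illustration of the bit logic rather than MMX code.

```cpp
#include <cstdint>

// Model of "pandn dst, src": the destination is inverted, then ANDed with the source.
std::uint64_t pandn_model(std::uint64_t dst, std::uint64_t src) {
    return ~dst & src;
}

// Branch-free select: bits of a where mask is 1, bits of b where mask is 0.
// In MMX terms: PAND (mask, a), PANDN (mask, b), then POR of the two results.
std::uint64_t select_model(std::uint64_t mask, std::uint64_t a, std::uint64_t b) {
    std::uint64_t from_a = mask & a;              // pand
    std::uint64_t from_b = pandn_model(mask, b);  // pandn
    return from_a | from_b;                       // por
}
```

With a mask whose 16-bit lanes are each all ones or all zeros, `select_model` picks whole words from `a` or `b` lane by lane.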

[Shift instructions]
PSLLW // shift packed words left logical.
PSLLD // shift packed doublewords left logical.
PSLLQ // shift packed quadword left logical.
PSRLW // shift packed words right logical.
PSRLD // shift packed doublewords right logical.
PSRLQ // shift packed quadword right logical.
PSRAW // shift packed words right arithmetic.
PSRAD // shift packed doublewords right arithmetic.
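
The difference between the logical and arithmetic right shifts is what gets shifted in from the left: zeros for the logical form, copies of the sign bit for the arithmetic form. The scalar sketch below models one word lane of PSRLW and PSRAW, including the documented behavior for shift counts above 15. The function names are hypothetical and no MMX instruction is executed.

```cpp
#include <cstdint>

// Model of one PSRLW lane: logical right shift, zeros shifted in from the left.
// Counts above 15 clear the lane.
std::uint16_t psrlw_lane(std::uint16_t v, unsigned count) {
    return count > 15 ? 0 : static_cast<std::uint16_t>(v >> count);
}

// Model of one PSRAW lane: arithmetic right shift, the sign bit is replicated.
// Counts above 15 fill the lane with the sign bit.
std::int16_t psraw_lane(std::int16_t v, unsigned count) {
    if (count > 15) count = 15;
    std::uint16_t bits = static_cast<std::uint16_t>(v);
    std::uint16_t shifted = static_cast<std::uint16_t>(bits >> count);
    if (v < 0 && count > 0) {
        // Set the vacated high bits to 1 (sign fill).
        shifted = static_cast<std::uint16_t>(shifted | (0xFFFFu << (16 - count)));
    }
    return static_cast<std::int16_t>(shifted);
}
```

An arithmetic shift right by n behaves like division by 2^n that rounds toward negative infinity, so `psraw_lane(-7, 1)` yields -4, whereas the C++ expression `-7 / 2` yields -3. That difference matters if shifts are used as a substitute for division.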

[State management instruction]
EMMS // empty the MMX state.
In Visual C++, the MMX state must be cleared with EMMS once the MMX instructions have finished.
For example:

    __asm {
        .....       // MMX statements
        emms        // clear the MMX state
    }

Those are all of the MMX instructions; feel free to try them out. They work on the single-instruction, multiple-data (SIMD) principle.

2. Precautions when using the MMX instruction set

Because the MMX registers share the same physical registers as the FPU inside the CPU, take care to switch state correctly when code uses both. The details will be discussed later; for now, just remember that the two instruction sets cannot simply be mixed.
Before using MMX instructions, check that the CPU supports the MMX instruction set, to avoid an exception. See the following example:

 
    mov eax, 1              ; request for feature flags
    cpuid                   ; 0Fh, 0A2h CPUID instruction
    test edx, 00800000h     ; is IA MMX technology bit (bit 23 of EDX)
                            ; in feature flags set?
    jnz mmx_policy_found

This code comes from Intel's reference manual, so you can use it with confidence.
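
The constant in the `test` instruction is simply bit 23 written out in hexadecimal. As a small sketch of the same check in C++, the helper below tests that bit in an EDX value that has already been obtained from CPUID with EAX = 1. The names `kMmxFeatureBit` and `edx_reports_mmx` are hypothetical, and how the EDX value is obtained is left to the caller.

```cpp
#include <cstdint>

// Bit 23 of EDX (returned by CPUID with EAX = 1) is the MMX feature flag.
constexpr std::uint32_t kMmxFeatureBit = 1u << 23;   // == 0x00800000

// Hypothetical helper: takes the EDX value already read from CPUID leaf 1
// and reports whether the MMX bit is set.
bool edx_reports_mmx(std::uint32_t edx) {
    return (edx & kMmxFeatureBit) != 0;
}
```

With Visual C++, one way to obtain the value without inline assembly is the `__cpuid` intrinsic from `<intrin.h>`, which fills a four-element int array with EAX, EBX, ECX, and EDX; the fourth element would then be passed to `edx_reports_mmx`.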

3. An example of using the MMX instructions:

    __int8  i8_a[2][16];    // byte operands: two groups of 16 bytes each
    __int16 i16_a[8];       // word operands
    __int32 i32_a[4];
    __int64 i64_a[2];

    i64_a[0] = 0; i64_a[1] = 0;
    i32_a[0] = 1000; i32_a[1] = 1000; i32_a[2] = 3; i32_a[3] = 4;
    i16_a[0] = 10; i16_a[1] = 20; i16_a[2] = 30; i16_a[3] = 40;
    i16_a[4] = 50; i16_a[5] = 60; i16_a[6] = 70; i16_a[7] = 80;
    i8_a[0][0] = 1;  i8_a[0][1] = 1;  i8_a[0][2] = 1;  i8_a[0][3] = 1;
    i8_a[0][4] = 1;  i8_a[0][5] = 1;  i8_a[0][6] = 1;  i8_a[0][7] = 1;
    i8_a[0][8] = 1;  i8_a[0][9] = 1;  i8_a[0][10] = 1; i8_a[0][11] = 1;
    i8_a[0][12] = 1; i8_a[0][13] = 1; i8_a[0][14] = 1; i8_a[0][15] = 1;
    i8_a[1][0] = 2;  i8_a[1][1] = 2;  i8_a[1][2] = 2;  i8_a[1][3] = 2;
    i8_a[1][4] = 2;  i8_a[1][5] = 2;  i8_a[1][6] = 2;  i8_a[1][7] = 2;
    i8_a[1][8] = 2;  i8_a[1][9] = 2;  i8_a[1][10] = 2; i8_a[1][11] = 2;
    i8_a[1][12] = 2; i8_a[1][13] = 2; i8_a[1][14] = 2; i8_a[1][15] = 2;

    __asm {
        movq mm1, [i64_a]
        movq mm2, [i64_a]

        movq mm2, [i32_a + 8]
        psubd mm2, [i32_a]
        movq [i32_a], mm2

        movq mm1, [i16_a]
        paddsw mm1, [i16_a + 8]
        movq [i16_a], mm1

        movq mm1, [i8_a]
        movq mm2, [i8_a + 8]
        paddb mm1, [i8_a + 16]
        paddb mm2, [i8_a + 24]
        movq [i8_a], mm1
        movq [i8_a + 8], mm2

        emms    // finally, clear the MMX state so the registers are handed back to the system correctly
    }

You can set breakpoints and use the Watch window to observe how the registers and variables change. Only a few of the instructions are used here; the part to note is the operations on the i16_a, i8_a, and i32_a arrays, to which I applied arbitrary arithmetic. Notice that adding the two groups of byte-array data took only two addition instructions, where the general-purpose instruction set would need many more. This is the appeal of single instruction, multiple data. You can also see that working with 64-bit integers becomes much simpler. However, the MMX instruction set does not appear to provide a division operation, so division has to be implemented with an algorithm. Finally, the MMX registers are a set of eight 64-bit registers named MM0 through MM7.
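
When stepping through the example in the debugger, it helps to know what values to expect. The sketch below recomputes the same three operations (PSUBD, PADDSW, PADDB) element by element in plain C++, using the initial values from the example. It is a scalar model for checking results, not a replacement for the MMX code, and `ExampleResult` and `run_scalar_model` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical container for the parts of the arrays that the example overwrites.
struct ExampleResult {
    std::array<std::int32_t, 2> i32;   // i32_a[0..1] after the block
    std::array<std::int16_t, 4> i16;   // i16_a[0..3] after the block
    std::array<std::int8_t, 16> i8;    // i8_a[0][0..15] after the block
};

ExampleResult run_scalar_model() {
    // Same initial values as in the example.
    const std::int32_t i32_a[4] = {1000, 1000, 3, 4};
    const std::int16_t i16_a[8] = {10, 20, 30, 40, 50, 60, 70, 80};

    ExampleResult r{};
    // psubd: (i32_a[2], i32_a[3]) minus (i32_a[0], i32_a[1])
    for (int i = 0; i < 2; ++i) {
        r.i32[i] = i32_a[i + 2] - i32_a[i];
    }
    // paddsw: words 0..3 plus words 4..7 (these values do not reach saturation)
    for (int i = 0; i < 4; ++i) {
        r.i16[i] = static_cast<std::int16_t>(i16_a[i] + i16_a[i + 4]);
    }
    // paddb, twice: 16 bytes of row 0 (all 1) plus 16 bytes of row 1 (all 2)
    for (int i = 0; i < 16; ++i) {
        r.i8[i] = static_cast<std::int8_t>(1 + 2);
    }
    return r;
}
```

Under this model the first two elements of i32_a become -997 and -996, the first four elements of i16_a become 60, 80, 100, and 120, and every byte in the first row of i8_a becomes 3.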
