4. Data shift instructions
psllw mm, mm/m64
psllw mm, imm8
Logically shifts each word of the destination register left by the number of bits given in the source register or memory operand (or by the imm8 immediate count). The bits shifted out are lost.
Bits shifted out of a lower word are not carried into the next higher word.
Example:
When mm0 = 0xFFFF FFFF, running psllw mm0, 1 gives
mm0 = 0xFFFE FFFE
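The lane-by-lane behavior can be sketched in portable C. This is only an emulation on a plain 64-bit integer, not the real instruction, and the name psllw_emulated is made up for illustration. It also reflects that a count above 15 clears every word.

```c
#include <stdint.h>

/* Hypothetical helper: emulates psllw on a 64-bit value holding four words. */
uint64_t psllw_emulated(uint64_t mm, unsigned count)
{
    uint64_t result = 0;

    if (count > 15)                 /* a count above 15 clears every word */
        return 0;

    for (int lane = 0; lane < 4; lane++) {
        uint16_t word = (uint16_t)(mm >> (16 * lane));
        /* Truncating to 16 bits discards the bits shifted out of this word,
           so nothing leaks into the next higher word. */
        uint16_t shifted = (uint16_t)(word << count);
        result |= (uint64_t)shifted << (16 * lane);
    }
    return result;
}
```

With the input from the example, psllw_emulated(0xFFFFFFFF, 1) gives 0xFFFEFFFE, whereas an ordinary 64-bit shift of the same value would give 0x1FFFFFFFE because bits cross the word boundaries.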
psrlw mm, mm/m64
psrlw mm, imm8
Logically shifts each word of the destination register right by the number of bits given in the source register or memory operand (or by the imm8 immediate count). The bits shifted out are lost, and the vacated high bits are filled with zeros.
Bits shifted out of a higher word are not carried into the next lower word.
Example:
When mm0 = 0xFFFF FFFF FFFF FFFF, running psrlw mm0, 1 gives
mm0 = 0x7FFF 7FFF 7FFF 7FFF
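A matching sketch for the right shift, again as a plain C emulation rather than the instruction itself. The name psrlw_emulated is an assumption used only here.

```c
#include <stdint.h>

/* Hypothetical helper: emulates psrlw on a 64-bit value holding four words. */
uint64_t psrlw_emulated(uint64_t mm, unsigned count)
{
    uint64_t result = 0;

    if (count > 15)                 /* a count above 15 clears every word */
        return 0;

    for (int lane = 0; lane < 4; lane++) {
        uint16_t word = (uint16_t)(mm >> (16 * lane));
        /* Unsigned shift: zeros enter from the top of each word, and the
           bits shifted out at the bottom are dropped. */
        uint16_t shifted = (uint16_t)(word >> count);
        result |= (uint64_t)shifted << (16 * lane);
    }
    return result;
}
```

Calling psrlw_emulated(0xFFFFFFFFFFFFFFFF, 1) gives 0x7FFF7FFF7FFF7FFF. The top bit of every word becomes zero instead of receiving the low bit of the word above it.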
pslld mm, mm/m64
pslld mm, imm8
Logically shifts each doubleword of the destination register left by the number of bits given in the source register or memory operand (or by the imm8 immediate count). The bits shifted out are lost.
Bits shifted out of the low doubleword are not carried into the high doubleword.
Example:
When mm0 = 0xFFFFFFFF FFFFFFFF, running pslld mm0, 1 gives
mm0 = 0xFFFFFFFE FFFFFFFE
psrld mm, mm/m64
psrld mm, imm8
Logically shifts each doubleword of the destination register right by the number of bits given in the source register or memory operand (or by the imm8 immediate count). The bits shifted out are lost, and the vacated high bits are filled with zeros.
Bits shifted out of the high doubleword are not carried into the low doubleword.
Example:
When mm0 = 0xFFFFFFFF FFFFFFFF, running psrld mm0, 1 gives
mm0 = 0x7FFFFFFF 7FFFFFFF
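The two doubleword shifts can be sketched the same way in C, with two 32-bit lanes in place of four 16-bit ones. The function names are hypothetical, and the code only models the result on a 64-bit integer. A count above 31 clears both doublewords.

```c
#include <stdint.h>

/* Hypothetical helper: emulates pslld on a 64-bit value holding two doublewords. */
uint64_t pslld_emulated(uint64_t mm, unsigned count)
{
    if (count > 31)                 /* a count above 31 clears both lanes */
        return 0;

    uint32_t low  = (uint32_t)mm << count;          /* overflow bits are dropped */
    uint32_t high = (uint32_t)(mm >> 32) << count;
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical helper: emulates psrld on a 64-bit value holding two doublewords. */
uint64_t psrld_emulated(uint64_t mm, unsigned count)
{
    if (count > 31)
        return 0;

    uint32_t low  = (uint32_t)mm >> count;          /* zeros enter from the top */
    uint32_t high = (uint32_t)(mm >> 32) >> count;
    return ((uint64_t)high << 32) | low;
}
```

For an all-ones input and a count of 1, pslld_emulated gives 0xFFFFFFFEFFFFFFFE and psrld_emulated gives 0x7FFFFFFF7FFFFFFF.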
5. Multiplication instructions
pmullw mm, mm/m64
Multiplies the four signed 16-bit words of the destination by the corresponding words of the source in parallel. The low 16 bits of each 32-bit product are stored in the corresponding word of the destination register.
Example:
When mm0 = 0x0000 0002 0000 ACFE
and mm1 = 0x0000 0009 0000 CEF3, executing pmullw mm0, mm1
gives mm0 = 0x0000 0012 0000 991A
2 * 9 = 18 = 0x0000 0012; the low 16 bits of the product are 0x0012.
0xACFE = -21250 and 0xCEF3 = -12557; -21250 * -12557 = 266836250 = 0x0FE7 991A; the low 16 bits of the product are 0x991A.
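The arithmetic above can be checked with a small C emulation. This is a sketch under the assumption that each word is treated as a signed two's-complement number; pmullw_emulated and to_signed16 are illustrative names, not part of any library.

```c
#include <stdint.h>

/* Interpret a raw 16-bit pattern as a signed two's-complement value. */
static int32_t to_signed16(uint16_t raw)
{
    return raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;
}

/* Hypothetical helper: emulates pmullw on 64-bit values holding four words. */
uint64_t pmullw_emulated(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;

    for (int lane = 0; lane < 4; lane++) {
        int32_t a = to_signed16((uint16_t)(dst >> (16 * lane)));
        int32_t b = to_signed16((uint16_t)(src >> (16 * lane)));
        int32_t product = a * b;                 /* always fits in 32 bits */
        uint16_t low = (uint16_t)((uint32_t)product & 0xFFFFu);
        result |= (uint64_t)low << (16 * lane);  /* keep only the low half */
    }
    return result;
}
```

For the lowest word pair, pmullw_emulated(0xACFE, 0xCEF3) gives 0x991A, the low half of 0x0FE7991A.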
pmulhw mm, mm/m64
Multiplies the four signed 16-bit words of the destination by the corresponding words of the source in parallel. The high 16 bits of each 32-bit product are stored in the corresponding word of the destination register.
Example:
When mm0 = 0x0000 0002 0000 ACFE
and mm1 = 0x0000 0009 0000 CEF3, executing pmulhw mm0, mm1
gives mm0 = 0x0000 0000 0000 0FE7
2 * 9 = 18 = 0x0000 0012; the high 16 bits of the product are 0x0000.
0xACFE = -21250 and 0xCEF3 = -12557; -21250 * -12557 = 266836250 = 0x0FE7 991A; the high 16 bits of the product are 0x0FE7.
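The high-half variant can be sketched the same way in C. As before, this is an emulation with made-up names (pmulhw_emulated, word_to_signed), assuming signed two's-complement words.

```c
#include <stdint.h>

/* Interpret a raw 16-bit pattern as a signed two's-complement value. */
static int32_t word_to_signed(uint16_t raw)
{
    return raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;
}

/* Hypothetical helper: emulates pmulhw on 64-bit values holding four words. */
uint64_t pmulhw_emulated(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;

    for (int lane = 0; lane < 4; lane++) {
        int32_t a = word_to_signed((uint16_t)(dst >> (16 * lane)));
        int32_t b = word_to_signed((uint16_t)(src >> (16 * lane)));
        int32_t product = a * b;
        /* Converting to uint32_t first keeps the two's-complement bit
           pattern, so the upper half is correct for negative products. */
        uint16_t high = (uint16_t)((uint32_t)product >> 16);
        result |= (uint64_t)high << (16 * lane);
    }
    return result;
}
```

For the lowest word pair, pmulhw_emulated(0xACFE, 0xCEF3) gives 0x0FE7. Used together with pmullw on the same inputs, the two halves form the full 32-bit product.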
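pmaddwd mm, mm/m64
Multiplies the signed words of the destination by the corresponding signed words of the source, then adds each pair of adjacent 32-bit products (a word-wise dot product), producing two doubleword results.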
                              High 32 bits      | Low 32 bits
Destination register:         A0 | A1           | A2 | A3
Source register:              B0 | B1           | B2 | B3
Destination register result:  A0 * B0 + A1 * B1 | A2 * B2 + A3 * B3
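The layout above can be sketched as a C emulation. The names pmaddwd_emulated and signed_word are hypothetical, and the code follows the notation above in which A0 and B0 are the highest words.

```c
#include <stdint.h>

/* Read word number `index` (0 = lowest) as a signed two's-complement value. */
static int64_t signed_word(uint64_t mm, int index)
{
    uint16_t raw = (uint16_t)(mm >> (16 * index));
    return raw >= 0x8000u ? (int64_t)raw - 0x10000 : (int64_t)raw;
}

/* Hypothetical helper: emulates pmaddwd on 64-bit values holding four words. */
uint64_t pmaddwd_emulated(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;

    for (int half = 0; half < 2; half++) {
        int lo = 2 * half;        /* lower word of this doubleword */
        int hi = 2 * half + 1;    /* upper word of this doubleword */
        /* Sum the two adjacent products in 64 bits, then keep 32 bits. */
        int64_t sum = signed_word(dst, lo) * signed_word(src, lo)
                    + signed_word(dst, hi) * signed_word(src, hi);
        result |= (uint64_t)(uint32_t)sum << (32 * half);
    }
    return result;
}
```

With A = (1, 2, 3, 4) and B = (5, 6, 7, 8) packed from high word to low word, pmaddwd_emulated(0x0001000200030004, 0x0005000600070008) gives 0x0000001100000035, that is 1*5 + 2*6 = 17 in the high doubleword and 3*7 + 4*8 = 53 in the low doubleword.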
Summary:
1. The shift instructions covered here shift 16-bit words or 32-bit doublewords in parallel.
2. They are divided into logical left shifts and logical right shifts.
3. There are only three multiplication instructions. The data unit of parallel multiplication is the signed 16-bit word.