When you program SIMD instruction sets (MMX, SSE, AVX, and so on) through intrinsic functions, you work with SIMD data types of several widths, and each width can hold several packed formats. I therefore designed a set of SIMD variable naming rules to make such code easier to read.
I. Introduction to SIMD data types
The SIMD data types are:
__m64: 64-bit packed integer (MMX).
__m128: 128-bit packed single precision (SSE).
__m128d: 128-bit packed double precision (SSE2).
__m128i: 128-bit packed integer (SSE2).
__m256: 256-bit packed single precision (AVX).
__m256d: 256-bit packed double precision (AVX).
__m256i: 256-bit packed integer (AVX).
Note: packed integers can be 8-bit, 16-bit, 32-bit, or 64-bit, each either signed or unsigned.
These data types map to registers as follows:
64-bit MM registers (mm0 to mm7): __m64.
128-bit SSE registers (xmm0 to xmm15): __m128, __m128d, and __m128i.
256-bit AVX registers (ymm0 to ymm15): __m256, __m256d, and __m256i.
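The widths listed above can be checked directly with sizeof. The following is only a minimal sketch, assuming GCC or Clang on x86-64 (some compilers do not provide __m64 in 64-bit builds); the helper name simd_lanes is hypothetical and not part of any intrinsics header.

```c
#include <stddef.h>
#include <immintrin.h>  /* declares __m64, the __m128 family, and the __m256 family */

/* Hypothetical helper: how many packed elements of elem_bytes each
   fit into a SIMD value of reg_bytes. */
size_t simd_lanes(size_t reg_bytes, size_t elem_bytes) {
    return reg_bytes / elem_bytes;
}

/* Widths in bytes of one type from each register family. */
size_t width_mm(void)  { return sizeof(__m64); }    /* MM register  */
size_t width_xmm(void) { return sizeof(__m128i); }  /* SSE register */
size_t width_ymm(void) { return sizeof(__m256i); }  /* AVX register */
```

For instance, simd_lanes(width_xmm(), sizeof(int)) gives the number of 32-bit integers that one __m128i holds.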
II. SIMD variable naming rules
Following Hungarian notation, a type prefix is added before the variable name.
The type prefix is three lower-case letters: the first letter gives the register width, and the last two letters give the packed data type.
Register width (first letter):
m: 64-bit MM register. Corresponds to __m64.
x: 128-bit SSE register. Corresponds to __m128, __m128d, and __m128i.
y: 256-bit AVX register. Corresponds to __m256, __m256d, and __m256i.
Packed data type (last two letters):
mb: 8-bit data. Used when only the element length is known and the specific packed format is not. (b: byte)
mw: 16-bit data. (w: word)
md: 32-bit data. (d: doubleword)
mq: 64-bit data. (q: quadword)
mo: 128-bit data. (o: octaword)
mh: 256-bit data. (h: hexword)
ub: 8-bit unsigned integer.
uw: 16-bit unsigned integer.
ud: 32-bit unsigned integer.
uq: 64-bit unsigned integer.
ib: 8-bit signed integer.
iw: 16-bit signed integer.
id: 32-bit signed integer.
iq: 64-bit signed integer.
fh: 16-bit floating point number, that is, half precision. (h: half)
fs: 32-bit floating point number, that is, single precision. (s: single)
fd: 64-bit floating point number, that is, double precision. (d: double)
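The rule can also be written down as a small lookup. This is only an illustrative sketch: the function simd_prefix and its parameters are hypothetical, and the check that an element must fit in the register is my own assumption, not something the rules state.

```c
/* Hypothetical helper: builds the 3-letter prefix, e.g. (128, 'i', 32) -> "xid".
   kind: 'm' = data of unknown format, 'u' = unsigned integer,
         'i' = signed integer, 'f' = floating point.
   Returns 0 on success, -1 if the rules do not cover the combination. */
int simd_prefix(int reg_bits, char kind, int elem_bits, char out[4]) {
    char reg, elem;

    switch (reg_bits) {              /* first letter: register width */
    case 64:  reg = 'm'; break;
    case 128: reg = 'x'; break;
    case 256: reg = 'y'; break;
    default:  return -1;
    }

    if (kind == 'f') {               /* floating point: h, s, d */
        switch (elem_bits) {
        case 16: elem = 'h'; break;
        case 32: elem = 's'; break;
        case 64: elem = 'd'; break;
        default: return -1;
        }
    } else if (kind == 'm' || kind == 'u' || kind == 'i') {
        switch (elem_bits) {         /* b, w, d, q; o and h exist only for 'm' */
        case 8:   elem = 'b'; break;
        case 16:  elem = 'w'; break;
        case 32:  elem = 'd'; break;
        case 64:  elem = 'q'; break;
        case 128: if (kind != 'm') return -1; elem = 'o'; break;
        case 256: if (kind != 'm') return -1; elem = 'h'; break;
        default:  return -1;
        }
    } else {
        return -1;
    }

    /* Assumption: an element wider than the register makes no sense. */
    if (elem_bits > reg_bits) return -1;

    out[0] = reg; out[1] = kind; out[2] = elem; out[3] = '\0';
    return 0;
}
```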
For example:
mub: 64-bit packed unsigned bytes (a 64-bit MMX register holding eight 8-bit unsigned integers).
xfs: 128-bit packed single precision (a 128-bit SSE register holding four single-precision floating point numbers).
xid: 128-bit packed signed doublewords (a 128-bit SSE register holding four 32-bit signed integers).
yfd: 256-bit packed double precision (a 256-bit AVX register holding four double-precision floating point numbers).
yfh: 256-bit packed half precision (a 256-bit AVX register holding 16 half-precision floating point numbers).
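To show how the prefixes read in practice, here is a short sketch that uses only SSE and SSE2 intrinsics. The function names add4_floats, add2_doubles, and add4_ints are hypothetical; only the variable prefixes (xfs, xfd, xid) follow the rules above. Unaligned loads and stores are used so that the callers do not have to align their arrays.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE header */

/* xfs: 128-bit register, packed single precision (four floats). */
void add4_floats(const float *a, const float *b, float *out) {
    __m128 xfsa = _mm_loadu_ps(a);
    __m128 xfsb = _mm_loadu_ps(b);
    __m128 xfssum = _mm_add_ps(xfsa, xfsb);
    _mm_storeu_ps(out, xfssum);
}

/* xfd: 128-bit register, packed double precision (two doubles). */
void add2_doubles(const double *a, const double *b, double *out) {
    __m128d xfda = _mm_loadu_pd(a);
    __m128d xfdb = _mm_loadu_pd(b);
    __m128d xfdsum = _mm_add_pd(xfda, xfdb);
    _mm_storeu_pd(out, xfdsum);
}

/* xid: 128-bit register, packed signed 32-bit integers (four ints). */
void add4_ints(const int *a, const int *b, int *out) {
    __m128i xida = _mm_loadu_si128((const __m128i *)a);
    __m128i xidb = _mm_loadu_si128((const __m128i *)b);
    __m128i xidsum = _mm_add_epi32(xida, xidb);
    _mm_storeu_si128((__m128i *)out, xidsum);
}
```

All three intrinsic types are 128 bits wide, so without the prefix a name such as sum says nothing about how the bits are interpreted; xfssum, xfdsum, and xidsum make the packed format visible at every use.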
III. Sample code
For example, an SSE routine that sums an array of integers:
#include <stddef.h>     // NULL
#include <tmmintrin.h>  // SSSE3 (also pulls in the SSE2 header)

int sum3_intrinsics(int *a, int size) {
    if (NULL == a) return 0;
    if (size < 0) return 0;
    int s = 0;                                    // return value
    __m128i xidsum = _mm_setzero_si128();         // accumulator. [SSE2] initialize to 0
    __m128i xidload;                              // loaded data
    int cntblock = size / 4;                      // number of blocks; one SSE register handles four DWORDs at a time
    int cntrem = size & 3;                        // number of remaining elements
    __m128i *p = (__m128i *)a;
    for (int i = 0; i < cntblock; ++i) {
        xidload = _mm_load_si128(p);              // [SSE2] aligned load; a must be 16-byte aligned
        xidsum = _mm_add_epi32(xidsum, xidload);  // [SSE2] packed signed 32-bit addition
        ++p;
    }
    // process the remaining elements
    int *q = (int *)p;
    for (int i = 0; i < cntrem; ++i) s += q[i];
    // combine the four partial sums
    xidsum = _mm_hadd_epi32(xidsum, xidsum);      // [SSSE3] signed 32-bit horizontal addition
    xidsum = _mm_hadd_epi32(xidsum, xidsum);
    s += _mm_cvtsi128_si32(xidsum);               // [SSE2] extract the low 32 bits
    return s;
}
Code from:
http://topic.csdn.net/u/20120102/01/fc8d7aa4-bffc-4d9a-a34a-5056c6d27b54.html
(reply #9)
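The routine above relies on SSSE3 for _mm_hadd_epi32 and on 16-byte-aligned input for _mm_load_si128. As a sketch of one possible variant, not the original author's code, the same sum can be written with SSE2 only by using an unaligned load and two shuffles for the horizontal step. The name sum_sse2 is hypothetical; the variable prefixes follow the naming rules.

```c
#include <stddef.h>     /* NULL */
#include <emmintrin.h>  /* SSE2 */

int sum_sse2(const int *a, int size) {
    if (NULL == a) return 0;
    if (size < 0) return 0;
    __m128i xidsum = _mm_setzero_si128();  /* four partial sums */
    int cntblock = size / 4;               /* full blocks of four ints */
    int cntrem = size & 3;                 /* leftover elements */
    const int *p = a;
    for (int i = 0; i < cntblock; ++i) {
        /* unaligned load, so a needs no special alignment */
        __m128i xidload = _mm_loadu_si128((const __m128i *)p);
        xidsum = _mm_add_epi32(xidsum, xidload);
        p += 4;
    }
    /* horizontal sum: swap the 64-bit halves and add,
       then swap neighbouring 32-bit lanes and add */
    __m128i xidtmp = _mm_shuffle_epi32(xidsum, _MM_SHUFFLE(1, 0, 3, 2));
    xidsum = _mm_add_epi32(xidsum, xidtmp);
    xidtmp = _mm_shuffle_epi32(xidsum, _MM_SHUFFLE(2, 3, 0, 1));
    xidsum = _mm_add_epi32(xidsum, xidtmp);
    int s = _mm_cvtsi128_si32(xidsum);     /* lane 0 now holds the total */
    for (int i = 0; i < cntrem; ++i) s += p[i];
    return s;
}
```

Whether the unaligned load costs anything measurable depends on the processor, so this trade-off should be measured rather than assumed.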