Many open-source C/C++ compilers today target the x86 processor environment, and the assembler bundled with these C/C++ (and Objective-C/C++) compilers uses AT&T syntax. Unlike on other processors (such as ARM and Blackfin), AT&T assembly for the x86 instruction set differs significantly from the assembly format that Intel itself defines. The GCC assembler also supports Intel syntax; see my previous blog post to learn how to use it. In the latest LLVM 2.0, however, the Intel syntax feature has been dropped, so it is still worth understanding AT&T assembly syntax.
First, AT&T syntax specifies the width of a memory access with a suffix on the instruction mnemonic, whereas Intel syntax places a size qualifier in front of the memory operand. For example:
(Intel) mov dword ptr [edx], eax    (AT&T) movl %eax, (%edx)
Most instructions use the following AT&T suffixes:
b    byte (8 bits), corresponding to Intel's byte ptr
w    word (16 bits), corresponding to Intel's word ptr
l    doubleword (long, 32 bits), corresponding to Intel's dword ptr
q    quadword (64 bits), corresponding to Intel's qword ptr
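The integer suffix table above can be read as a simple lookup from suffix character to operand width. The following is only an illustrative sketch; the function name att_suffix_bits is hypothetical and not part of any assembler.

```c
/* Operand width in bits implied by an AT&T integer suffix,
 * or 0 for a character that is not one of b, w, l, q.
 * att_suffix_bits is a hypothetical helper for illustration. */
int att_suffix_bits(char suffix)
{
    switch (suffix) {
    case 'b': return 8;   /* Intel: byte ptr  */
    case 'w': return 16;  /* Intel: word ptr  */
    case 'l': return 32;  /* Intel: dword ptr */
    case 'q': return 64;  /* Intel: qword ptr */
    default:  return 0;   /* not an integer size suffix */
    }
}
```

For instance, the l in movl selects a 32-bit access, so att_suffix_bits('l') yields 32.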
For x87 floating-point instructions, however, the following suffixes are used:
s    single-precision floating point (short, 32 bits), corresponding to Intel's dword ptr. If an x87 memory-access instruction is written without a suffix, s is the default.
l    double-precision floating point (long, 64 bits), corresponding to Intel's qword ptr
t    extended double-precision floating point (ten-byte, 80 bits; the value occupies 96 bits of storage when padded to four-byte alignment), corresponding to Intel's tbyte ptr
Second, the operand order in AT&T syntax is the reverse of Intel's: in AT&T the source operand comes first and the destination operand comes last:
(Intel) mov rax, rbx                (AT&T) mov %rbx, %rax
(Intel) mov eax, cs:var             (AT&T) movl %cs:var, %eax
(Intel) pinsrw xmm8, [rdi], 1       (AT&T) pinsrw $1, (%rdi), %xmm8
(Intel) pshuflw xmm8, xmm9, 0a1h    (AT&T) pshuflw $0xa1, %xmm9, %xmm8
Third, AT&T and Intel syntax also write memory addressing differently:
(Intel) sub eax, [ebx + ecx*4h - 20h]    (AT&T) subl -0x20(%ebx,%ecx,0x4), %eax
As the example shows, the scale factor and the offset are written without a leading $ symbol.
In summary: if the Intel addressing mode is [<base register> + <index register> * <scale> + <offset>], the AT&T form is <offset>(<base register>, <index register>, <scale>).
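Both notations describe the same address computation. As a rough sketch, the arithmetic can be written in C as follows; the function name effective_address is hypothetical, and 32-bit wraparound is assumed to match the 32-bit registers used in the example.

```c
#include <stdint.h>

/* Address named by an x86 memory operand:
 *   Intel: [base + index*scale + offset]
 *   AT&T:  offset(base, index, scale)
 * effective_address is a hypothetical helper for illustration.
 * Unsigned arithmetic wraps modulo 2^32, as assumed for 32-bit
 * addressing. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t offset)
{
    return base + index * scale + (uint32_t)offset;
}
```

With sample register values (not from the original) of 0x1000 in the base register and 3 in the index register, a scale of 4 and an offset of -0x20 give 0x1000 + 3*4 - 0x20 = 0xFEC.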
Finally, some instruction mnemonics differ between AT&T and Intel:
Intel writes the sign-extending mov instruction as movsx, while AT&T writes it as movs. Likewise, Intel's zero-extending mov instruction is movzx, which AT&T writes as movz. For example:
(Intel) movzx rax, byte ptr [rsi + 3]    (AT&T) movzb 3(%rsi), %rax
(Intel) movsx rax, word ptr [rsi + 4]    (AT&T) movsw 4(%rsi), %rax
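The difference between the two extensions is easiest to see on a value whose top bit is set. The C sketch below models only the arithmetic of the two instructions; the function names zero_extend_byte and sign_extend_word are hypothetical.

```c
#include <stdint.h>

/* Models movz with a byte source: the upper 56 bits are
 * filled with zeros. Hypothetical helper for illustration. */
uint64_t zero_extend_byte(uint8_t b)
{
    return (uint64_t)b;
}

/* Models movs with a word source: the upper 48 bits are
 * copies of bit 15 of the source. Written with explicit
 * arithmetic so it does not rely on implementation-defined
 * narrowing conversions. Hypothetical helper for illustration. */
int64_t sign_extend_word(uint16_t w)
{
    if (w & 0x8000u)
        return (int64_t)w - 0x10000;  /* negative in two's complement */
    return (int64_t)w;
}
```

Here zero_extend_byte(0x80) gives 128, whereas sign_extend_word(0x8000) gives -32768, which is 0xFFFFFFFFFFFF8000 when viewed as a 64-bit pattern.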
In the Intel instruction set, by contrast, the movs mnemonics name the legacy string-operation instructions.
In AT&T syntax, the rep prefix can likewise be used to repeat a memory copy operation. For example, in the Apple LLVM assembler it can be written as follows:
// The first parameter is in %rdi, the second parameter is in %rsi, and the third parameter is in %rdx
_mymemcpy:
    mov %rdx, %rcx    // move the byte count into %rcx, the counter used by rep
    rep movsb         // copy %rcx bytes from (%rsi) to (%rdi)
    ret
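For readers who find the rep prefix opaque, here is a rough C equivalent of what the routine above does. It is a sketch under assumptions: the name my_memcpy_sketch is hypothetical, the parameter order (destination, source, byte count) is inferred from the registers used, and the copy is assumed to run forward (direction flag clear). Like the assembly routine, it sets no return value.

```c
#include <stddef.h>

/* C model of the rep movsb routine. dst plays the role of %rdi,
 * src of %rsi, and count of %rcx. my_memcpy_sketch is a
 * hypothetical name; a forward copy is assumed. */
void my_memcpy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t count = n;       /* mov %rdx, %rcx */

    while (count != 0) {    /* rep: repeat until the counter is zero */
        *d++ = *s++;        /* movsb: copy one byte, advance both pointers */
        count--;
    }
}
```

As with memcpy, the source and destination regions are expected not to overlap.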