Reposted from Chen Lijun's book "Deep Analysis of Linux Kernel Source Code"
Http://www.kerneltravel.net/kernel-book/chapter II v1.20the hardware base for Linux/2.6.1.htm
2.6.1 Comparison between AT&T and Intel assembly languages
Linux is a member of the UNIX family. Although Linux itself has a short history, much of what surrounds it originates from UNIX, and its 386 assembly language is no exception. UNIX was first developed for the PDP-11 and was later ported to the VAX and the 68000 series of processors, whose assemblers all used the AT&T instruction format. When UNIX was ported to the i386, it naturally kept the AT&T assembly format rather than adopting Intel's. The two formats differ in syntax, but the underlying hardware is the same, so if you already know Intel syntax well you can readily carry that knowledge over to AT&T. The comparison below is meant to help you make that transition quickly.
1. Prefixes
In Intel syntax, neither registers nor immediate values take a prefix. In AT&T syntax, a register is preceded by "%" and an immediate value by "$". In Intel syntax, hexadecimal and binary immediate values carry the suffixes "h" and "b", while in AT&T syntax hexadecimal immediate values are preceded by "0x". Table 2.2 gives several examples.
Table 2.2 Differences between Intel and AT&T prefixes
Intel syntax | AT&T syntax
mov eax, 8 | movl $8, %eax
mov ebx, 0ffffh | movl $0xffff, %ebx
int 80h | int $0x80
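The prefix and suffix rules for immediate values are mechanical enough to sketch in code. The C helper below is only an illustration: the name intel_imm_to_att and its error convention are assumptions made for this example, not part of any assembler. It parses the Intel radix suffix ("h" or "b") and prints the value with the AT&T "$" prefix. Binary input is emitted in hexadecimal, because the text above only describes the "0x" form for AT&T.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rewrite an Intel-style immediate such as "0ffffh",
 * "80h", "101b" or "8" as an AT&T-style immediate ("$0xffff", "$0x80", ...).
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int intel_imm_to_att(const char *intel, char *out, size_t out_size)
{
    char digits[32];
    char *end;
    char last;
    size_t len;
    int base = 10;
    int written;
    unsigned long value;

    assert(intel != NULL && out != NULL);

    len = strlen(intel);
    if (len == 0 || len >= sizeof digits)
        return -1;

    /* Intel numeric literals start with a decimal digit (hence "0ffffh"). */
    if (!isdigit((unsigned char)intel[0]))
        return -1;

    /* The radix suffix is the last character: h = hexadecimal, b = binary. */
    last = (char)tolower((unsigned char)intel[len - 1]);
    if (last == 'h') {
        base = 16;
        len--;
    } else if (last == 'b') {
        base = 2;
        len--;
    }
    if (len == 0)
        return -1;

    /* Copy the digits without the suffix and parse them in that radix. */
    memcpy(digits, intel, len);
    digits[len] = '\0';
    value = strtoul(digits, &end, base);
    if (*end != '\0')
        return -1;

    /* AT&T: "$" marks an immediate, "0x" marks hexadecimal. */
    written = snprintf(out, out_size,
                       base == 10 ? "$%lu" : "$0x%lx", value);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Called with the Intel operands from Table 2.2, the helper yields the AT&T immediates shown in the right-hand column, for instance "0ffffh" becomes "$0xffff" and "8" becomes "$8".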
2. Operand order
Intel and AT&T order their operands in exactly opposite ways. In Intel syntax, the first operand is the destination and the second operand is the source. In AT&T syntax, the first operand is the source and the second is the destination, so an AT&T instruction reads left to right from source to destination, which matches common reading habits.
For example, in Intel syntax: mov eax, [ecx]
In AT&T syntax: movl (%ecx), %eax
3. Memory operands
The example above also shows that memory operands are written differently. In Intel syntax, the base register is enclosed in "[]", while in AT&T syntax it is enclosed in "()".
For example, in Intel syntax: mov eax, [ebx + 5]
In AT&T syntax: movl 5(%ebx), %eax
4. Indirect addressing
Compared with Intel syntax, AT&T's notation for indirect addressing is harder to read. The Intel format is segreg:[base + index*scale + disp], while the AT&T format is %segreg:disp(base, index, scale). Here index, scale, disp and segreg are all optional and may be omitted. If index is given without scale, scale defaults to 1. Whether segreg is needed depends on the instruction and on whether the program runs in real mode or protected mode: in real mode it depends on the instruction, while in protected mode segreg is usually unnecessary. In AT&T syntax, an immediate value used as scale or disp must not be prefixed with "$". Table 2.3 gives the syntax and several examples.
Table 2.3 Memory operand syntax and examples
Intel syntax | AT&T syntax
instr foo, segreg:[base + index*scale + disp] | instr %segreg:disp(base, index, scale), foo
mov eax, [ebx + 20h] | movl 0x20(%ebx), %eax
add eax, [ebx + ecx*2h] | addl (%ebx, %ecx, 0x2), %eax
lea eax, [ebx + ecx] | leal (%ebx, %ecx), %eax
sub eax, [ebx + ecx*4h - 20h] | subl -0x20(%ebx, %ecx, 0x4), %eax
As the table shows, AT&T syntax is the harder of the two to read: the meaning of [base + index*scale + disp] is clear at a glance, while that of disp(base, index, scale) is not.
This addressing mode is often used to access a field of a particular element in an array of data structures: base is the starting address of the array, scale is the size of each array element, and index is the subscript. If the array element is itself a structure, disp is the offset of the field within that structure.
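The mapping from array access to base, index, scale and disp can be made concrete with a small C sketch. Everything named here (struct pair, effective_address, read_value) is hypothetical and exists only for illustration. The structure is chosen so that its size is 8 bytes on typical ABIs, because the hardware scale factor can only be 1, 2, 4 or 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type; two 32-bit fields give an 8-byte element
 * on typical ABIs, which is a scale the hardware can encode. */
struct pair {
    int32_t key;    /* offset 0 */
    int32_t value;  /* offset 4 */
};

/* The effective address exactly as the formula states it:
 * base + index*scale + disp. A negative disp wraps correctly
 * because the arithmetic is done modulo the pointer width. */
uintptr_t effective_address(uintptr_t base, size_t index,
                            size_t scale, ptrdiff_t disp)
{
    return base + index * scale + (uintptr_t)disp;
}

/* Read array[i].value by spelling out the four components by hand:
 * base = start of the array, index = i, scale = element size,
 * disp = offset of the field inside the element. */
int32_t read_value(const struct pair *array, size_t i)
{
    uintptr_t addr;

    assert(array != NULL);
    addr = effective_address((uintptr_t)array, i,
                             sizeof(struct pair),
                             (ptrdiff_t)offsetof(struct pair, value));
    return *(const int32_t *)addr;
}
```

Assuming ebx holds the array address and ecx holds i, the same access would be written 4(%ebx, %ecx, 8) in AT&T syntax and [ebx + ecx*8 + 4] in Intel syntax. In ordinary C one would simply write array[i].value and let the compiler choose the addressing mode.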
5. Opcode suffixes
In the examples above, you may have noticed that AT&T opcodes carry a suffix, which indicates the size of the operand: "l" stands for a long integer (32 bits), "w" for a word (16 bits), and "b" for a byte (8 bits). In Intel syntax, the size is instead given by writing byte ptr, word ptr or dword ptr before the memory operand, where "dword" corresponds to "long". Table 2.4 gives several examples.
Table 2.4 Opcode suffixes
Intel syntax | AT&T syntax
mov al, bl | movb %bl, %al
mov ax, bx | movw %bx, %ax
mov eax, ebx | movl %ebx, %eax
mov eax, dword ptr [ebx] | movl (%ebx), %eax
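The suffix rule amounts to a lookup from operand size to a single letter. The C sketch below shows that lookup under stated assumptions: the helper names att_suffix and att_mnemonic are invented for this example, and only the three sizes named in the text are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* AT&T size suffix for an operand of the given width in bytes.
 * Only the three sizes named in the text are covered;
 * '\0' means "not covered here". */
char att_suffix(size_t operand_bytes)
{
    switch (operand_bytes) {
    case 1:  return 'b';   /* byte, Intel "byte ptr"  */
    case 2:  return 'w';   /* word, Intel "word ptr"  */
    case 4:  return 'l';   /* long, Intel "dword ptr" */
    default: return '\0';
    }
}

/* Hypothetical helper: append the suffix to a bare mnemonic such as "mov".
 * Returns 0 on success, -1 for an unsupported size or a too-small buffer. */
int att_mnemonic(const char *mnemonic, size_t operand_bytes,
                 char *out, size_t out_size)
{
    char suffix = att_suffix(operand_bytes);
    int written;

    assert(mnemonic != NULL && out != NULL);
    if (suffix == '\0')
        return -1;

    written = snprintf(out, out_size, "%s%c", mnemonic, suffix);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

For a 4-byte operand, att_mnemonic("mov", 4, ...) produces "movl", matching the last two rows of Table 2.4.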