Comparison between AT&T assembly and Intel assembly by linuxkernel (newbie)
Since everyone here is interested in assembly, I thought I would join in. Let's get down to business.
Differences between Intel and AT&T syntax
Intel and AT&T assembly syntax look very different on the surface, which confuses people who have just learned Intel syntax the first time they see AT&T syntax, or vice versa. So let's start with the basics.
Prefix
Intel syntax has no register prefix or immediate prefix. In AT&T syntax, registers are prefixed with "%" and immediate values are prefixed with "$". In Intel syntax, hexadecimal and binary constants are suffixed with "h" and "b", respectively. If the first hexadecimal digit is a letter, the value must also be prefixed with "0".
For example,
Intel Syntax
    mov eax, 1
    mov ebx, 0ffh
    int 80h
AT&T Syntax
    movl $1, %eax
    movl $0xff, %ebx
    int $0x80
As you can see, AT&T syntax looks more cluttered. The Intel form [base + index * scale + disp] also seems easier to read than the AT&T form disp(base, index, scale).
Operand usage
The operand order in Intel syntax is the opposite of the order in AT&T syntax. In Intel syntax, the first operand is the destination and the second operand is the source. In AT&T syntax, the first operand is the source and the second operand is the destination. The advantage of AT&T syntax here is obvious: we read from left to right and write from left to right, so this order is more natural.
For example,
Intel Syntax
    instr dest, source
    mov eax, [ecx]
AT&T Syntax
    instr source, dest
    movl (%ecx), %eax
Memory operations
As seen above, memory operands are written differently too. In Intel syntax the base register is enclosed in "[" and "]"; in AT&T syntax it is enclosed in "(" and ")".
For example,
Intel Syntax
    mov eax, [ebx]
    mov eax, [ebx + 3]
AT&T Syntax
    movl (%ebx), %eax
    movl 3(%ebx), %eax
The AT&T form for instructions with complex memory operands is much more obscure than the Intel form. In Intel syntax the form is segreg:[base + index * scale + disp]. In AT&T syntax the form is %segreg:disp(base, index, scale).
Index, scale, disp, and segreg are all optional and can simply be left out. If index is specified but scale is not, scale defaults to 1. Segreg depends on the instruction and on whether the program runs in real mode or pmode (protected mode): in real mode it depends on the instruction, whereas in pmode it is unnecessary. Immediate values used for scale or disp in AT&T syntax must not carry the "$" prefix.
For example
Intel Syntax
    instr foo, segreg:[base + index * scale + disp]
    mov eax, [ebx + 20h]
    add eax, [ebx + ecx * 2h]
    lea eax, [ebx + ecx]
    sub eax, [ebx + ecx * 4h - 20h]
AT&T Syntax
    instr %segreg:disp(base, index, scale), foo
    movl 0x20(%ebx), %eax
    addl (%ebx, %ecx, 0x2), %eax
    leal (%ebx, %ecx), %eax
    subl -0x20(%ebx, %ecx, 0x4), %eax
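Both notations describe the same address arithmetic. The following is a minimal C sketch of that computation, ignoring the segment register; the function name effective_address is made up for illustration and is not part of any library.

```c
#include <stdint.h>

/* Address named by an operand written as
 *   Intel: [base + index * scale + disp]
 *   AT&T:  disp(base, index, scale)
 * scale is expected to be 1, 2, 4 or 8. The segment register is ignored.
 * Unsigned 32-bit arithmetic wraps around, as address arithmetic does. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

For instance, with %ebx = 0x1000 and %ecx = 3, the operand -0x20(%ebx, %ecx, 0x4) corresponds to effective_address(0x1000, 3, 4, -0x20), which is 0xfec.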
Suffix
As you may have noticed, AT&T mnemonics carry a suffix that gives the operand size: "l" for long, "w" for word, and "b" for byte. Intel syntax has similar directives for memory operands: byte ptr, word ptr, and dword ptr. "dword" of course corresponds to "long". This resembles type casting in C, but it does not seem to be necessary, since the size of the registers used already implies the data type.
Example:
Intel Syntax
    mov al, bl
    mov ax, bx
    mov eax, ebx
    mov eax, dword ptr [ebx]
AT&T Syntax
    movb %bl, %al
    movw %bx, %ax
    movl %ebx, %eax
    movl (%ebx), %eax
Note: from now on, all examples use the AT&T syntax.
System Call
This section describes how to use Linux system calls from assembly language. System calls are the functions in section 2 of the manual pages, located in /usr/man/man2. They are also listed in /usr/include/sys/syscall.h. A useful list of them is at http://www.linuxassembly.org/syscall.html. These functions are executed through the Linux interrupt service: int $0x80.
System calls with less than six parameters
For all system calls, the system call number goes in %eax. For system calls with fewer than six parameters, the parameters go, in order, in %ebx, %ecx, %edx, %esi, and %edi. The return value of the system call is stored in %eax.
The system call number can be found in /usr/include/sys/syscall.h. The macros are named SYS_<syscall name>, such as SYS_exit and SYS_close.
Example (hello world program):
According to the write(2) man page, write is declared as ssize_t write(int fd, const void *buf, size_t count);
So fd goes in %ebx, buf in %ecx, count in %edx, and SYS_write in %eax, followed by an int $0x80 instruction to execute the system call. The return value of the system call is stored in %eax.
$ cat write.s
.include "defines.h"
.data
hello:
    .string "hello world\n"
.text
.globl main
main:
    movl $SYS_write, %eax
    movl $STDOUT, %ebx
    movl $hello, %ecx
    movl $12, %edx
    int $0x80
    ret
$
The same process applies to system calls with fewer than five parameters: just leave the unused registers unchanged. System calls such as open or fcntl, which take an optional extra parameter, will know which registers to use.
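The same "number first, then the parameters in order" convention is visible from C through the syscall() wrapper in glibc. The following is a minimal sketch, assuming Linux with glibc; write_hello is a made-up name. The wrapper chooses the actual registers and trap instruction for the target, so this shows the calling convention in spirit rather than the exact int $0x80 sequence.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> */
#endif
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall(), STDOUT_FILENO */

/* Same call as write.s: system call number first, then fd, buf, count.
 * Returns the number of bytes written, or -1 on error. */
long write_hello(void)
{
    static const char hello[] = "hello world\n";
    return syscall(SYS_write, STDOUT_FILENO, hello, sizeof hello - 1);
}
```

Calling write_hello() from main prints the greeting and returns the byte count, which is the value the assembly version finds in %eax after int $0x80.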
System calls with more than 5 parameters
System calls with more than five parameters still expect the system call number in %eax, but the parameters are laid out in memory and a pointer to the first parameter is stored in %ebx.
If you use the stack, the parameters must be pushed in reverse order, from the last parameter to the first. Then copy the stack pointer to %ebx. Otherwise, copy the parameters to an allocated area of memory and store the address of the first parameter in %ebx.
Example (mmap is used as the example system call). Using mmap() in C:
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define STDOUT 1

int main(void) {
    char file[] = "mmap.s";
    char *mappedptr;
    int fd, filelen;

    fd = open(file, O_RDONLY);    /* open() returns the descriptor lseek() and mmap() need */
    filelen = lseek(fd, 0, SEEK_END);
    mappedptr = mmap(NULL, filelen, PROT_READ, MAP_SHARED, fd, 0);
    write(STDOUT, mappedptr, filelen);
    munmap(mappedptr, filelen);
    close(fd);
    return 0;
}
Arrangement of the mmap() parameters in memory:
%esp        %esp+4      %esp+8      %esp+12     %esp+16     %esp+20
00000000    filelen     00000001    00000001    fd          00000000
Equivalent assembly:
$ cat mmap.s
.include "defines.h"
.data
file:
    .string "mmap.s"
fd:
    .long 0
filelen:
    .long 0
mappedptr:
    .long 0
.text
.globl main
main:
    push %ebp
    movl %esp, %ebp
    subl $24, %esp
    // open($file, $O_RDONLY);
    movl $fd, %ebx              // save fd
    movl %eax, (%ebx)
    // lseek($fd, 0, $SEEK_END);
    movl $filelen, %ebx         // save file length
    movl %eax, (%ebx)
    xorl %edx, %edx
    // mmap(NULL, $filelen, PROT_READ, MAP_SHARED, $fd, 0);
    movl %edx, (%esp)
    movl %eax, 4(%esp)          // file length still in %eax
    movl $PROT_READ, 8(%esp)
    movl $MAP_SHARED, 12(%esp)
    movl $fd, %ebx              // load file descriptor
    movl (%ebx), %eax
    movl %eax, 16(%esp)
    movl %edx, 20(%esp)
    movl $SYS_mmap, %eax
    movl %esp, %ebx
    int $0x80
    movl $mappedptr, %ebx       // save ptr
    movl %eax, (%ebx)
    // write($stdout, $mappedptr, $filelen);
    // munmap($mappedptr, $filelen);
    // close($fd);
    movl %ebp, %esp
    popl %ebp
    ret
$
Note: The source listed above differs from the example source code at the end of this article. The listing above does not show the other system calls, because they are not the focus of this section. It also only opens mmap.s, whereas the example source reads the command-line parameters. The mmap example also uses lseek to obtain the file size.
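To make the six-slot parameter block concrete, here is a minimal C sketch that gathers the same six values into a struct before handing them to the library's mmap(). The names struct mmap_args and map_file are made up for illustration, and the offsets in the comments only line up with the %esp offsets on 32-bit x86 with a 32-bit off_t. The sketch goes through the libc wrapper rather than int $0x80, so it shows the layout and not the raw system call.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical struct mirroring the block that mmap.s builds on the stack. */
struct mmap_args {
    void  *addr;     /* (%esp)   : NULL       */
    size_t len;      /* 4(%esp)  : filelen    */
    int    prot;     /* 8(%esp)  : PROT_READ  */
    int    flags;    /* 12(%esp) : MAP_SHARED */
    int    fd;       /* 16(%esp) : fd         */
    off_t  offset;   /* 20(%esp) : 0          */
};

/* Map a whole file read-only. Returns MAP_FAILED on any error;
 * on success stores the mapped length in *len_out. */
void *map_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    off_t len = lseek(fd, 0, SEEK_END);   /* file size, as in the C example */
    if (len <= 0) {
        close(fd);
        return MAP_FAILED;
    }

    struct mmap_args a = { NULL, (size_t)len, PROT_READ, MAP_SHARED, fd, 0 };
    void *p = mmap(a.addr, a.len, a.prot, a.flags, a.fd, a.offset);
    close(fd);                            /* the mapping outlives the descriptor */
    if (p != MAP_FAILED)
        *len_out = a.len;
    return p;
}
```

A caller would write the mapped bytes out with write() and release them with munmap(), as the C example above does.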
Socket System Call
Socket system calls use a single system call number, SYS_socketcall, which goes in %eax. The individual socket functions are identified by sub-function numbers defined in /usr/include/linux/net.h, and the sub-function number is stored in %ebx. A pointer to the system call parameters is stored in %ecx. Socket system calls are also executed with int $0x80.
$ cat socket.s
.include "defines.h"
.globl _start
_start:
    pushl %ebp
    movl %esp, %ebp
    sub $12, %esp
    // socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    movl $AF_INET, (%esp)
    movl $SOCK_STREAM, 4(%esp)
    movl $IPPROTO_TCP, 8(%esp)
    movl $SYS_socketcall, %eax
    movl $SYS_socketcall_socket, %ebx
    movl %esp, %ecx
    int $0x80
    movl $SYS_exit, %eax
    xorl %ebx, %ebx
    int $0x80
    movl %ebp, %esp
    popl %ebp
    ret
$
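The same parameter block can be sketched in C. This assumes Linux with glibc; open_tcp_socket and SOCKETCALL_SOCKET are made-up names, with the value 1 taken to match SYS_SOCKET in linux/net.h. Where the platform headers do not define SYS_socketcall (x86-64, for example), the sketch falls back to the ordinary socket() call with the same three values.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> */
#endif
#include <netinet/in.h>    /* IPPROTO_TCP */
#include <sys/socket.h>    /* AF_INET, SOCK_STREAM, socket() */
#include <sys/syscall.h>   /* SYS_socketcall, where it exists */
#include <unistd.h>        /* syscall() */

/* Assumed sub-function number for socket(). */
#define SOCKETCALL_SOCKET 1

/* Create a TCP socket; returns a file descriptor, or -1 on error. */
int open_tcp_socket(void)
{
    /* The three parameters laid out in memory, as socket.s lays them
     * out at (%esp), 4(%esp) and 8(%esp). */
    unsigned long args[3] = { AF_INET, SOCK_STREAM, IPPROTO_TCP };
#ifdef SYS_socketcall
    /* number, sub-function, pointer to the parameter block */
    return (int)syscall(SYS_socketcall, SOCKETCALL_SOCKET, args);
#else
    return socket((int)args[0], (int)args[1], (int)args[2]);
#endif
}
```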
Command Line Parameters
In Linux, the command-line parameters are placed on the stack. First comes argc, followed by an array of pointers (**argv) to the command-line strings, terminated by a NULL pointer. Next comes an array of pointers to the environment variables (**envp). These are easy to obtain in asm, as demonstrated in the example code (args.s).
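The layout just described can be walked with plain pointer arithmetic. The following is a minimal C sketch, assuming the argv array is immediately followed by its NULL terminator and then by the environment array, as stated above; env_from_argv and count_entries are made-up helper names.

```c
#include <stddef.h>

/* Given argc and the start of the argv array, return the start of the
 * environment array, which begins right after argv's NULL terminator. */
char **env_from_argv(int argc, char **argv)
{
    return argv + argc + 1;
}

/* Count the entries of a NULL-terminated pointer array (argv or envp). */
size_t count_entries(char **vec)
{
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}
```

In assembly the same walk is a matter of reading argc from the top of the stack and stepping over argc + 1 pointers to reach the first environment pointer.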
GCC inline assembly
This section on GCC inline assembly covers only x86. Operand constraints differ on other processors. The listing for this section is at the end of this article.
Basic inline assembly in gcc is very straightforward. In its basic form it looks like this:
    __asm__("movl %esp, %eax");    // look familiar?
Or
    __asm__("movl $1, %eax\n\t"    // SYS_exit
            "xor %ebx, %ebx\n\t"
            "int $0x80");
You can use inline assembly more effectively by specifying the data used as input and output for the asm, as well as which registers will be modified. None of the input/output/modify fields is compulsory. The format is as follows:
    __asm__("<asm routine>" : output : input : modify);
The output and input fields must each consist of an operand constraint string followed by a C expression enclosed in parentheses. Output operand constraints must be preceded by "=", which marks the operand as an output. There may be multiple outputs, multiple inputs, and multiple modified registers. Entries are separated by commas (","), and there should be no more than 10 entries in total. The operand constraint string may contain either the full register name or an abbreviation. (Current GCC releases read a constraint string as a series of single-letter codes, so the abbreviations are the dependable form; full register names still work in the modify list.)
Abbrev Table
Abbrev    Register
a         %eax/%ax/%al
b         %ebx/%bx/%bl
c         %ecx/%cx/%cl
d         %edx/%dx/%dl
S         %esi/%si
D         %edi/%di
m         memory
For example:
    __asm__("test %%eax, %%eax" : /* no output */ : "a"(foo));
Or
    __asm__("test %%eax, %%eax" : /* no output */ : "eax"(foo));
You can also use the keyword __volatile__ after __asm__: "You can prevent an `asm' instruction from being deleted, moved significantly, or combined, by writing the keyword `volatile' after the `asm'." (Quoted from the "Assembler Instructions with C Expression Operands" section of the gcc info files.)
$ cat inline1.c
#include <stdio.h>

int main(void) {
    int foo = 10, bar = 15;

    __asm__ __volatile__("addl %%ebx, %%eax"
        : "=a"(foo)             // output
        : "a"(foo), "b"(bar)    // input
        : "cc"                  // modify
    );
    printf("foo + bar = %d\n", foo);
    return 0;
}
$
You may have noticed that registers are now prefixed with "%%" rather than "%". This is necessary when the output/input/modify fields are used, because register aliases based on those extra fields can also be used. I will discuss these shortly.
Instead of writing "eax" and forcing the use of a particular register such as "eax", "ax", or "al", you can simply specify "a". The same goes for the other general-purpose registers (listed in the Abbrev table). This seems useless when the code itself names specific registers, so gcc provides register aliases. There is a maximum of 10 aliases (%0-%9), which is also why only 10 inputs/outputs are allowed.
$ cat inline2.c
int main(void) {
    long eax;
    short bx;
    char cl;

    __asm__("nop; nop; nop");   // to separate the inline asm from the rest of
                                // the code
    __asm__ __volatile__(
        "test %0, %0\n\t"
        "test %1, %1\n\t"
        "test %2, %2"
        : /* no outputs */
        : "a"((long)eax), "b"((short)bx), "c"((char)cl)
    );
    __asm__("nop; nop; nop");
    return 0;
}
$ gcc -o inline2 inline2.c
$ gdb ./inline2
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnulibc1"...
(no debugging symbols found)...
(gdb) disassemble main
Dump of assembler code for function main:
... start: inline asm ...
0x8048427: nop
0x8048428: nop
0x8048429: nop
0x804842a: mov 0xfffffffc(%ebp),%eax
0x804842d: mov 0xfffffffa(%ebp),%bx
0x8048431: mov 0xfffffff9(%ebp),%cl
0x8048434: test %eax,%eax
0x8048436: test %bx,%bx
0x8048439: test %cl,%cl
0x804843b: nop
0x804843c: nop
0x804843d: nop
... end: inline asm ...
End of assembler dump.
$
As you can see, the code generated from the inline asm loads the values of the variables into the registers assigned to them in the input field and then carries out the actual code. The compiler detects the operand size from the size of each variable, so registers of the matching size are substituted for the aliases %0, %1, and %2. (Specifying the operand size in the mnemonic while using register aliases may cause errors when compiling.)
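The size-dependent substitution can be tried in a small function. The following is a minimal sketch, assuming GCC or a compatible compiler; is_zero_short is a made-up name, and a plain C fallback is included so the function also builds on targets other than x86.

```c
/* Returns 1 if v is zero, 0 otherwise. */
int is_zero_short(short v)
{
    unsigned char z;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %1 is a short, so the compiler substitutes a 16-bit register name;
     * %0 is a char, so it substitutes an 8-bit one. The mnemonics carry
     * no size suffix. */
    __asm__("test %1, %1\n\t"
            "setz %0"
            : "=q"(z)      /* output: any byte-addressable register */
            : "r"(v)       /* input: any general-purpose register */
            : "cc");       /* modify: test changes the flags */
#else
    z = (v == 0);          /* portable fallback for other targets */
#endif
    return z;
}
```

Compiling this with gcc -S and reading the generated assembly shows which 16-bit and 8-bit register names were chosen for %1 and %0.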
Aliases can also be used in operand constraints. This does not let you specify more than 10 entries in the input/output fields. The only use I can think of is when you give the operand constraint "q", which lets the compiler choose among the a, b, c, and d registers. When that register is modified, we do not know which one was chosen and so cannot name it in the modify field. In that case you can simply specify "<number>", that is, the operand's number.
Example:
$ cat inline3.c
#include <stdio.h>

int main(void) {
    long eax = 1, ebx = 2;

    __asm__ __volatile__("add %0, %2"
        : "=b"(ebx)
        : "a"((long)eax), "q"(ebx)
        : "2"
    );
    printf("ebx = %lx\n", ebx);
    return 0;
}
$
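A related, documented use of operand numbers inside constraints is the matching constraint: an input whose constraint is a digit is placed in the same location as the operand with that number. The following is a minimal sketch, assuming GCC or a compatible compiler; add_longs is a made-up name, and the plain C branch is only a fallback for targets other than x86.

```c
/* Returns a + b, letting the compiler pick both registers. */
long add_longs(long a, long b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("add %2, %0"
            : "=r"(a)            /* operand 0: the result */
            : "0"(a), "r"(b)     /* operand 1 shares operand 0's register */
            : "cc");             /* modify: add changes the flags */
#else
    a += b;                      /* portable fallback for other targets */
#endif
    return a;
}
```

Because operand 1 is tied to operand 0, the register that holds a on entry is the one the add instruction updates, whichever register the compiler happens to choose.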