Example 18.1. The simplest assembly program
#PURPOSE: Exits and returns a
#         status code back to the Linux kernel
#
#INPUT:   none
#
#OUTPUT:  Returns a status code. This can be viewed
#         by typing
#
#         echo $?
#
#         after running the program
#
#VARIABLES:
#         %eax holds the system call number
#         %ebx holds the return status
#
 .section .data

 .section .text
 .globl _start
_start:
 movl $1, %eax   # this is the Linux kernel command
                 # number (system call) for exiting
                 # a program

 movl $4, %ebx   # this is the status number we will
                 # return to the operating system.
                 # Change this around and it will
                 # return different things to
                 # echo $?

 int $0x80       # this wakes up the kernel to run
                 # the exit command
Save this program as the file hello.s (assembly source files usually use .s as the filename suffix), then use the assembler as to translate the mnemonics in the assembly program into machine instructions, producing the object file hello.o:
$ as hello.s -o hello.o
Then use the linker (also called the link editor) ld to link the object file hello.o into the executable file hello:
$ ld hello.o -o hello
Section 2, "The main function and startup routines," covers how multiple object files are linked into one executable file, which is the main job of linking. Although this example has only one object file, it still has to be linked to become an executable, because the linker modifies some of the information in the object file; this is explained in detail in section 5.2, "Executable files". Now run the program. The only thing it does is exit, with an exit status of 4. In the shell, the special variable $? holds the exit status of the previous command:
$ ./hello
$ echo $?
4
The # character in the program starts a single-line comment, similar to a // comment in C. The non-comment code is explained line by line below.
.section .data
A name that begins with . in an assembly program is not an instruction mnemonic and is not translated into a machine instruction. It is a special instruction to the assembler itself, called an assembler directive or pseudo-operation; the word "pseudo" is there because it is not a real instruction. The .section directive divides the program into sections. When the operating system loads and runs the program, each section is loaded at a different address with different read, write, and execute permissions. The .data section holds the program's data and is readable and writable; the global variables of a C program belong to the .data section. This program defines no data, so its .data section is empty.
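To see a .data section that is not empty, here is a minimal sketch that is not part of the original example. The label name status and the value 7 are made up for illustration. It assumes the same 32-bit int $0x80 convention as Example 18.1; on a 64-bit host you would probably assemble and link it with as --32 and ld -m elf_i386.

```asm
# Sketch: a variant of Example 18.1 whose exit status comes from .data.
# "status" is a hypothetical label chosen for this illustration.
 .section .data
status:
 .long 7              # one 32-bit value stored in the .data section

 .section .text
 .globl _start
_start:
 movl $1, %eax        # system call number 1: _exit
 movl status, %ebx    # load the 32-bit value stored at address "status"
 int $0x80            # enter the kernel; the process exits here
```

After running this program, echo $? should print 7. Note that status is written without a $ prefix, so the instruction reads the memory at that address instead of using the address itself as a number.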
.section .text
The .text section holds code and is read-only and executable. The instructions that follow all belong to the .text section.
.globl _start
_start is a symbol. A symbol in an assembly program stands for an address and can be used in instructions; after the assembler has processed the program, every symbol is replaced by the address it stands for. In C, accessing a variable by name really means reading or writing the memory cell at some address, and calling a function by name really means jumping to the address of that function's first instruction. Variable names and function names are therefore symbols too, and in essence they stand for memory addresses.
The .globl directive tells the assembler that the symbol _start will be used by the linker, so the assembler marks it as a global symbol in the object file's symbol table (explained in detail in section 5.1, "Object files"). _start is special in the same way that the main function is special in a C program: it is the entry point of the whole program. When linking, the linker looks up the address that _start stands for in the object file and sets it as the entry address of the whole program, so every assembly program must provide a _start symbol and declare it with .globl. A symbol that is not declared with .globl is not used by the linker.
_start:
Here _start works like a statement label in C. While processing the program, the assembler computes the address of every data object and every instruction. When it sees a label like this one, it takes the address of the instruction that follows as the address the symbol _start stands for. The _start symbol is also special because the address it stands for is the entry address of the whole program, so the next instruction, movl $1, %eax, becomes the first instruction the program executes.
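The idea that a label is just a name for the address of the next instruction can be sketched with a second label. This is an illustration only: the label finish and the skipped instruction are made up, and the program otherwise follows Example 18.1 (32-bit int $0x80 convention assumed).

```asm
# Sketch: labels stand for instruction addresses.
# "finish" is a hypothetical local label; it is not declared with .globl
# because only the assembler needs it, not the linker.
 .section .text
 .globl _start
_start:
 jmp finish           # jump to the address that "finish" stands for
 movl $9, %ebx        # skipped: control never reaches this instruction
finish:
 movl $1, %eax        # system call number 1: _exit
 movl $4, %ebx        # exit status 4
 int $0x80
```

Running it and typing echo $? should print 4, showing that execution started at _start and continued at finish.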
movl $1, %eax
This is a data transfer instruction: the CPU generates the number 1 internally and transfers it to the eax register. The l suffix on mov stands for long and indicates a 32-bit transfer. A number generated inside the CPU like this is called an immediate. In assembly, an immediate is prefixed with $ and a register name is prefixed with %, which distinguishes both from symbol names.
movl $4, %ebx
Like the previous instruction, this one generates an immediate, 4, and transfers it to the ebx register.
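The role of the size suffix can be sketched with a small variation. The b suffix and the 8-bit register name %bl go beyond what the text above covers; they are standard AT&T syntax, but this program is an illustration only and assumes the same 32-bit int $0x80 convention as Example 18.1.

```asm
# Sketch: operand-size suffixes on mov.
 .section .text
 .globl _start
_start:
 movl $0, %ebx        # l suffix: 32-bit transfer, clears all of ebx
 movb $4, %bl         # b suffix: 8-bit transfer into the low byte of ebx
 movl $1, %eax        # system call number 1: _exit
 int $0x80            # exit status is the value now in ebx
```

Because ebx was cleared first, writing 4 into its low byte leaves ebx equal to 4, so echo $? should again print 4.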
int $0x80
The two preceding instructions set things up for this one. When it executes, the following happens:
- The int instruction is called a software interrupt instruction and can be used to raise an exception deliberately. As the previous chapter explained, exceptions are handled much like interrupts: the CPU switches from user mode to privileged mode and then jumps into kernel code to run the exception handler.
- The immediate 0x80 in the int instruction is a parameter, and the exception handler uses it to decide how to handle the exception. In the Linux kernel, the exception raised by int $0x80 is called a system call. The kernel provides many system services for user programs, but these services cannot be called like library functions (such as printf), because the CPU is in user mode while running a user program and cannot call kernel functions directly. The program therefore has to switch CPU mode through a system call and enter the kernel through the exception handler. The user program can only pass a few parameters in registers; after that, execution follows the code path the kernel designers laid out, and the user program cannot call whichever kernel function it likes. This ensures that system services are called safely. When the call is finished, the CPU switches back to user mode and continues with the instruction after int $0x80, so to the user program it looks just like a function call and return.
The values of the eax and ebx registers are the two parameters passed to the system call. The value of eax is the system call number, where 1 means the _exit system call, and the value of ebx is the argument passed to _exit, that is, the exit status. The _exit system call terminates the current process and does not return to let it continue executing. Other system calls discussed later are also triggered by the int $0x80 instruction, with the system call number in eax. Different system calls take different numbers of arguments; some, for example, take the values of the three registers ebx, ecx, and edx as arguments. Most system calls return to the user program, which then continues executing; the _exit system call is a special case.
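As a sketch of a system call that takes three arguments and does return, the program below calls write before exiting. It is an illustration only: the label msg and the message text are made up, and the system call number 4 for write applies to the 32-bit int $0x80 interface, so on a 64-bit host it is safest to build with as --32 and ld -m elf_i386.

```asm
# Sketch: a three-argument system call (write) followed by _exit.
# "msg" is a hypothetical label chosen for this illustration.
 .section .data
msg:
 .ascii "hello\n"     # 6 bytes, no terminating zero needed

 .section .text
 .globl _start
_start:
 movl $4, %eax        # system call number 4: write
 movl $1, %ebx        # 1st argument: file descriptor 1 (standard output)
 movl $msg, %ecx      # 2nd argument: address of the buffer ($ = the address itself)
 movl $6, %edx        # 3rd argument: number of bytes to write
 int $0x80            # returns here; eax now holds the return value

 movl $1, %eax        # system call number 1: _exit
 movl $0, %ebx        # exit status 0
 int $0x80            # does not return
```

Running the program should print hello, and echo $? afterwards should print 0. Execution continues past the first int $0x80 because write returns, whereas nothing after the second one would ever run.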
The two syntaxes of x86 assembly: Intel syntax and AT&T syntax
x86 assembly has always had two different syntaxes. Intel's official documentation uses Intel syntax, and Windows also uses Intel syntax, while assemblers on UNIX platforms have always used AT&T syntax, so this book uses AT&T syntax. The instruction mov %edx,%eax, written in Intel syntax, is mov eax,edx: register names have no % prefix, and the positions of the source and destination operands are swapped. This book does not discuss the differences between the two syntaxes in detail; readers can refer to [ASSEMBLYHOWTO].
There are many books that introduce x86 assembly. Books for UNIX platforms use AT&T syntax, for example [Groudup]; other books generally use Intel syntax, for example [x86assembly].
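For a side-by-side feel of the two syntaxes, the sketch below rewrites part of Example 18.1 in Intel syntax. It relies on the GNU assembler's .intel_syntax noprefix directive, which this book does not otherwise use, so treat it as an illustration under that assumption and not as the book's own style.

```asm
# Sketch: the same exit program with AT&T and Intel syntax mixed.
 .section .text
 .globl _start
_start:
 movl $1, %eax            # AT&T: source first, $ on immediates, % on registers

 .intel_syntax noprefix   # switch the GNU assembler to Intel syntax
 mov ebx, 4               # Intel: destination first, no $ or % prefixes
 int 0x80                 # same software interrupt, written without $
```

Both halves assemble into the same kind of machine instructions; only the notation differs. Running the result and typing echo $? should print 4.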
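Exercises
1. If the int $0x80 instruction is removed from the example in this section, the program still assembles and links, but a segmentation fault occurs when it runs. Can you explain why?
Without int $0x80 the program never switches to kernel mode: the arguments are loaded into the registers, but the system call is never made. The process is therefore not terminated, and the CPU goes on executing whatever bytes follow the last instruction.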