I. A simple assembly program
Take the following simple assembly program as an example:
        .section .data

        .section .text
        .globl _start
    _start:
        movl $1, %eax
        movl $4, %ebx
        int $0x80
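(Note that the directive is spelled .globl, without an "a", although GNU as also accepts .global. The last character of movl is the lowercase letter l, not the digit 1.)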
Save the program as demo.s, then use the assembler as to translate the mnemonics in the program into machine instructions (each assembly instruction corresponds to one machine instruction), producing the object file demo.o: as demo.s -o demo.o. Then use the linker ld to link demo.o into the executable demo: ld demo.o -o demo. Even though there is only one object file, it still has to be linked before it becomes an executable, because the linker modifies some information in the object file. The program does only one thing: it exits with exit status 4. In the shell, echo $? prints the exit status of the previous command.
Explanation: names that begin with "." in an assembly program are not instruction mnemonics and are not translated into machine instructions. They are special instructions to the assembler, called assembler directives or pseudo-operations.
.section .data
.section .text
The .section directive divides the code into several sections. When the operating system loads the program, each section is loaded at a different address and given different read, write and execute permissions.
The .data section holds the program's data and is readable and writable; the initialized global variables of a C program belong to .data. The program above defines no data, so .data is empty.
The .text section holds the code. It is read-only and executable, and the instructions that follow belong to .text.
.globl _start
_start is a symbol. In an assembly program a symbol stands for an address and can be used in instructions; after the assembler has processed the program, every symbol is replaced by the address it represents. In C, accessing a variable by its name actually reads or writes the memory cell at some address, and calling a function by its name actually jumps to the address of the function's first instruction. So variable names and function names are symbols too: essentially, they represent memory addresses.
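To make the last point concrete in C, here is a small sketch (the names counter, answer, store_via_address and call_via_address are made up for illustration). A variable name used with & and a function name used as a value both yield plain addresses, and going through those addresses behaves exactly like using the names.

```c
/* "counter" stands for the address of an int-sized memory cell. */
int counter = 0;

/* "answer" stands for the address of the function's first instruction. */
static int answer(void) { return 42; }

/* Writing through the address has the same effect as "counter = value". */
int store_via_address(int value) {
    int *p = &counter;          /* the address the symbol counter represents */
    *p = value;
    return counter;
}

/* Calling through the address is the same as calling answer() by name. */
int call_via_address(void) {
    int (*fp)(void) = answer;   /* a function name used as a value is its address */
    return fp();
}
```

After store_via_address(7), counter reads 7, and call_via_address() returns 42, just as answer() does.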
The .globl directive tells the assembler that the symbol _start will be used by the linker, so the assembler gives it a special mark in the object file's symbol table. _start plays the role of main in a C program: it is the entry point of the whole program. The linker looks up the address that _start represents in the object file and sets it as the entry address of the whole program, so every assembly program must provide a _start symbol and declare it with .globl. A symbol that is not declared with .globl will not be used by the linker.
_start:
Here _start works like a statement label in C. While processing the program, the assembler computes the address of every data object and every instruction; when it sees a label like this one, it takes the address of the following instruction as the address represented by the symbol _start. _start is special because it is the entry address of the whole program, so the next instruction, movl $1, %eax, is the first instruction the program executes.
movl $1, %eax
This is a data transfer instruction: the CPU produces the number 1 internally (the value is encoded in the instruction itself) and stores it in the eax register. The l at the end of mov stands for long and marks a 32-bit transfer. A number produced inside the CPU like this is called an immediate. In assembly, an immediate is prefixed with $ and a register name with %, which distinguishes both from symbol names.
movl $4, %ebx
Like the previous instruction, this one produces the immediate 4 and stores it in the ebx register.
int $0x80
The first two instructions prepare for this one. When this instruction executes:
1. int is called a software interrupt instruction; it deliberately raises an exception. The exception is handled like an interrupt: the CPU switches from user mode to privileged mode and then jumps into kernel code to run the exception handler.
2. The immediate 0x80 in the int instruction is a parameter that the exception handler uses to decide what to do. In the Linux kernel, int $0x80 is used for system calls. The kernel provides many system services to user programs, but they cannot be called the way library functions such as printf are called: while running a user program the CPU is in user mode and cannot call kernel functions directly, so the program must switch the CPU mode through a system call and enter the kernel via the exception handler. The user program can only pass a few parameters in registers and then follows the code path the kernel designed; it cannot call whatever kernel function it likes, and that is what makes system services safe to call. When the call finishes, the CPU switches back to user mode and continues with the instruction after int, so from the user program's point of view it looks like a function call and return.
3. The values in eax and ebx are the two parameters passed to the system call. eax holds the system call number, and 1 means the _exit system call (in the 32-bit x86 Linux numbering); ebx holds the argument to _exit, namely the exit status. The _exit system call terminates the current process and never returns to continue execution. Different system calls take different numbers of parameters; some need the values of ebx, ecx and edx as parameters. Most system calls return and the user program continues to execute; _exit is special in this respect.
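For comparison, here is a C sketch of the same request to the kernel, assuming a Linux/POSIX system. _exit() from <unistd.h> is the C library's wrapper for ending a process; the library picks the system call and its number for the platform, so the "eax = 1" convention above applies only to 32-bit x86 Linux. A process cannot inspect its own exit status, so the hypothetical helper exit_status_seen_by_parent runs _exit in a forked child and reads the status back with waitpid(), which is what the shell does before echo $? prints it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: end a child process with _exit(status) and return the
   exit status its parent observes (what a shell later shows with echo $?).
   Returns -1 if fork() or waitpid() fails or the child did not exit normally. */
int exit_status_seen_by_parent(int status) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(status);           /* child: ask the kernel to end this process; never returns */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0 || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus); /* only the low 8 bits of status survive */
}
```

Calling exit_status_seen_by_parent(4) should return 4. Because only the low 8 bits of the status reach the parent, a status of 260 also comes back as 4.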
Two syntaxes for x86 assembly: Intel syntax and AT&T syntax. x86 assembly has long had two different syntaxes. Intel's official documentation uses Intel syntax, and so does Windows, whereas assemblers on UNIX platforms have always used AT&T syntax, which is why this book uses AT&T syntax. The instruction movl %edx,%eax is written mov eax,edx in Intel syntax: register names carry no % prefix, and the source and destination operands swap places. This book does not discuss the differences between the two syntaxes in detail; readers can refer to [ASSEMBLYHOWTO]. There are many books on x86 assembly: those for UNIX platforms use AT&T syntax, such as [Groudup], while others generally use Intel syntax, such as [x86assembly].
II. x86 registers
The general-purpose registers of x86 are eax, ebx, ecx, edx, edi and esi. Most instructions can use any of them, but some instructions reserve particular registers for particular purposes. For example, the division instruction idivl takes its dividend from edx:eax, with edx holding the high 32 bits: for a non-negative dividend that fits in eax, edx must be 0 (for a signed dividend, cltd sign-extends eax into edx). The divisor can be any register or a memory operand. After the division, the quotient is stored in eax (overwriting the dividend) and the remainder in edx.
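The register conventions of idivl can be modeled in C. This is a software sketch only (struct regs and the function idivl are hypothetical names, not a real API): the 64-bit dividend is assembled from edx (high half) and eax (low half), and the quotient and remainder are written back to eax and edx. It assumes a nonzero divisor and a quotient that fits in 32 bits; on real hardware, violating either raises a divide error.

```c
#include <stdint.h>

/* Hypothetical model of the two registers idivl uses implicitly. */
struct regs { uint32_t eax, edx; };

/* Software model of "idivl divisor": divides the signed 64-bit value edx:eax
   by divisor, storing the quotient in eax and the remainder in edx.
   Assumes divisor != 0, a quotient that fits in 32 bits, and two's-complement
   conversions as on GCC and Clang. */
void idivl(struct regs *r, int32_t divisor) {
    int64_t dividend = (int64_t)(((uint64_t)r->edx << 32) | r->eax);
    int64_t quotient = dividend / divisor;   /* C truncates toward zero, as idiv does */
    int64_t remainder = dividend % divisor;  /* same sign as the dividend, as with idiv */
    r->eax = (uint32_t)quotient;             /* the quotient overwrites the dividend's low half */
    r->edx = (uint32_t)remainder;
}
```

With eax = 17 and edx = 0, dividing by 5 leaves 3 in eax and 2 in edx. The sketch also shows why edx matters: if eax holds -17 but edx is left at 0 instead of 0xFFFFFFFF, the dividend is read as the large positive number 4294967279.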
The special-purpose registers of x86 are ebp, esp, eip and eflags. eip is the program counter. eflags holds the flag bits produced by computations, including four flags for carry, overflow, zero and negative (sign), which x86 documentation calls CF, OF, ZF and SF. ebp and esp are used to maintain the stack frames of function calls.
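How these four flags arise from a 32-bit addition can be sketched in C. This is a software model with hypothetical names (struct flags, addl_flags), not a way to read the real eflags register: CF records an unsigned carry out of bit 31, OF a signed overflow, ZF a zero result, and SF the result's sign bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four flags named in the text. */
struct flags { bool cf, of, zf, sf; };

/* Flags a 32-bit addition such as addl would produce for a + b. */
struct flags addl_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint64_t wide = (uint64_t)a + b;   /* keep the 33rd bit to detect a carry */
    uint32_t r = (uint32_t)wide;
    struct flags f;
    f.cf = (wide >> 32) != 0;          /* unsigned result did not fit in 32 bits */
    /* signed overflow: operands share a sign and the result's sign differs */
    f.of = ((~(a ^ b) & (a ^ r)) >> 31) != 0;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;             /* sign bit of the result */
    *result = r;
    return f;
}
```

For example, 0xFFFFFFFF + 1 gives 0 with CF and ZF set but OF clear, while 0x7FFFFFFF + 1 gives 0x80000000 with OF and SF set but CF clear: the same addition can overflow as unsigned without overflowing as signed, and vice versa.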
III. A second assembly program
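An assembly program that finds the maximum of a set of numbers:

        .section .data
    data_items:
        .long 3,10,9,5,9,0

        .section .text
        .globl _start
    _start:
        movl $0, %edi
        movl data_items(,%edi,4), %eax
        movl %eax, %ebx
    start_loop:
        cmpl $0, %eax
        je loop_exit
        incl %edi
        movl data_items(,%edi,4), %eax
        cmpl %ebx, %eax
        jle start_loop
        movl %eax, %ebx
        jmp start_loop
    loop_exit:
        movl $1, %eax
        int $0x80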
Assemble and link it with as and ld in the same way as the first program, run it, and check the exit status with echo $?.
This program finds the largest number in a set of numbers and uses it as the program's exit status. The numbers are given in the .data section:
    data_items:
        .long 3,10,9,5,9,0
.long declares a group of numbers, 32 bits each, equivalent to a C array. The array starts at the label data_items: the assembler uses the address of the array's first element as the address represented by the symbol data_items, so data_items is like an array name in C. data_items has no .globl declaration because it is used only inside this program; the linker does not need to know that the name exists. Besides .long, commonly used declarations include:
- .byte, which also declares a group of numbers, 8 bits each.
- .ascii, for example .ascii "Hello World", which declares 11 numbers whose values are the ASCII codes of the corresponding characters. Unlike a C string literal, no '\0' is appended to the end of the string.
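As a rough analogy (not literally what a C compiler emits), the data directives correspond to C array definitions like the following. The .long list assumes the data shown above; the .byte values are arbitrary illustrations.

```c
#include <stdint.h>

/* .long 3,10,9,5,9,0  -> six 32-bit numbers, like an array of int32_t */
int32_t data_items[] = { 3, 10, 9, 5, 9, 0 };

/* .byte 1,2,3  -> three 8-bit numbers (values chosen only for illustration) */
uint8_t some_bytes[] = { 1, 2, 3 };

/* .ascii "Hello World"  -> 11 bytes with no '\0'. In C, a char array sized
   exactly to the string length is initialized without the terminating '\0'. */
char hello[11] = "Hello World";
```

Here sizeof data_items is 24 (six 4-byte numbers) and sizeof hello is 11, with hello[10] being 'd' rather than a terminator.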
The last number in the data_items array is 0. A loop compares the numbers one by one and stops when it reaches 0. In this loop:
- edi holds the current position in the array; after each number is compared, edi is incremented by 1 to point to the next number.
- ebx holds the largest number found so far; if a larger number is found, ebx is updated.
- eax holds the number currently being compared; each time edi is updated, the next number is loaded into eax.
_start: movl $0, %edi
Initialize edi to 0 so that it points to element 0 of the array.
movl data_items(,%edi,4), %eax
This instruction loads element 0 of the array into eax. data_items is the start address of the array, the value of edi is the array index, and 4 means that each element is 4 bytes, so the address of element number edi is data_items + edi*4. The instruction above reads the data at that address.
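The address computation can be spelled out in C. This sketch assumes the data_items values shown earlier and uses a made-up helper name, load_item: it does the byte arithmetic data_items + edi*4 by hand and then reads a 32-bit number from that address, which is exactly what data_items[edi] does.

```c
#include <stdint.h>

/* Assumes the data list from the .data section above. */
int32_t data_items[] = { 3, 10, 9, 5, 9, 0 };

/* C version of data_items(,%edi,4): the byte address data_items + edi*4. */
int32_t load_item(uint32_t edi) {
    const unsigned char *base = (const unsigned char *)data_items; /* address of element 0 */
    const int32_t *p = (const int32_t *)(base + edi * 4);          /* 4 bytes per element */
    return *p;                                                     /* same as data_items[edi] */
}
```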
movl %eax, %ebx
ebx also starts out holding element 0 of the array.
Next comes the loop. Its beginning is marked by the label start_loop, and the code after its end by the label loop_exit.
start_loop: cmpl $0, %eax
je loop_exit
Check whether eax is 0. If it is, the end of the array has been reached and the loop must exit. The cmpl instruction subtracts its two operands but does not save the result; it only updates the flag bits in eflags according to the result. If the two operands are equal, the result is 0 and the ZF bit in eflags is set to 1. je is a conditional jump instruction: it checks the ZF bit in eflags and jumps if ZF is 1; if ZF is 0 it does not jump, and execution continues with the next instruction. (Conditional jumps are used together with comparison instructions.) The e in je stands for equal.
incl %edi
movl data_items(,%edi,4), %eax
Increment edi by 1 and load the next number in the array into eax.
cmpl %ebx, %eax
jle start_loop
Compare the current array element in eax with the largest value found so far, in ebx. If the former is less than or equal to the latter (as signed numbers), the maximum does not change, so jump back to the start of the loop to compare the next number; otherwise continue with the next instruction. jle is also a conditional jump instruction; le stands for less than or equal.
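The interplay of cmpl, je and jle can be modeled in C. This is a hypothetical software sketch: cmpl(src, dst) takes its operands in AT&T order, computes dst - src with 32-bit wraparound and keeps only the ZF, SF and OF flags; je_taken and jle_taken then test the conditions the real jumps use (je: ZF = 1; jle: ZF = 1 or SF != OF, which is a signed comparison).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the flags that matter here. */
struct flags { bool zf, sf, of; };

/* "cmpl src, dst" (AT&T order): compute dst - src, discard it, keep the flags. */
struct flags cmpl(int32_t src, int32_t dst) {
    uint32_t a = (uint32_t)dst, b = (uint32_t)src;
    uint32_t r = a - b;                          /* 32-bit wraparound, like the CPU */
    struct flags f;
    f.zf = (r == 0);                             /* operands were equal */
    f.sf = (r >> 31) != 0;                       /* sign bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;     /* signed overflow in the subtraction */
    return f;
}

bool je_taken(struct flags f)  { return f.zf; }                 /* jump if equal */
bool jle_taken(struct flags f) { return f.zf || f.sf != f.of; } /* jump if <= (signed) */
```

So cmpl %ebx, %eax followed by jle corresponds to jle_taken(cmpl(ebx, eax)), which is true exactly when eax <= ebx as signed numbers. Testing SF != OF rather than SF alone keeps the answer right when the subtraction overflows, for example when eax is INT32_MIN and ebx is 1.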
movl %eax, %ebx
jmp start_loop
Update the maximum in ebx, then jump back to the start of the loop to compare the next number. jmp is an unconditional jump: it jumps without checking any condition. The instructions after the loop_exit label use the _exit system call to exit the program; since ebx already holds the maximum, that value becomes the exit status.
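Putting the pieces together, the whole program corresponds roughly to the C function below. It is a hand translation for illustration (find_max is a made-up name), with variables named after the registers and each line commented with the instruction it mirrors; like the assembly, it relies on the array ending with 0.

```c
#include <stdint.h>

/* Hand translation of the assembly loop; the array must end with 0. */
int32_t find_max(const int32_t *data_items) {
    uint32_t edi = 0;                /* movl $0, %edi */
    int32_t eax = data_items[edi];   /* movl data_items(,%edi,4), %eax */
    int32_t ebx = eax;               /* movl %eax, %ebx */
    for (;;) {                       /* start_loop: */
        if (eax == 0)                /* cmpl $0, %eax */
            break;                   /* je loop_exit */
        edi++;                       /* incl %edi */
        eax = data_items[edi];       /* movl data_items(,%edi,4), %eax */
        if (eax <= ebx)              /* cmpl %ebx, %eax */
            continue;                /* jle start_loop */
        ebx = eax;                   /* movl %eax, %ebx; jmp start_loop */
    }
    return ebx;                      /* loop_exit: ebx is the _exit status */
}
```

For the data above (3, 10, 9, 5, 9, 0) it returns 10, which the assembly program passes to _exit as its exit status. Keep in mind that only the low 8 bits of an exit status reach the shell, so echo $? can only show values from 0 to 255.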
IV. Addressing modes
When an instruction accesses memory, the memory address can be expressed in several ways. The general form of a memory operand is:
address_or_offset(%base_or_offset,%index,multiplier)
The address it denotes is computed as:
final address = address_or_offset + base_or_offset + multiplier * index
Here address_or_offset and multiplier must be constants (multiplier must be 1, 2, 4 or 8), and base_or_offset and index must be registers. Some addressing modes omit some of these four parts; an omitted part contributes nothing to the sum.
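The formula translates directly into C. In this sketch, effective_address is a hypothetical helper that takes the four parts as plain numbers, with 0 standing for an omitted displacement, base or index, and rejects multipliers other than 1, 2, 4 and 8. The addresses in the usage note are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* address_or_offset(%base_or_offset,%index,multiplier), computed in unsigned
   32-bit arithmetic. Pass 0 for an omitted displacement, base or index.
   Returns false (leaving *out unchanged) if multiplier is not 1, 2, 4 or 8. */
bool effective_address(uint32_t address_or_offset, uint32_t base_or_offset,
                       uint32_t index, uint32_t multiplier, uint32_t *out) {
    if (multiplier != 1 && multiplier != 2 && multiplier != 4 && multiplier != 8)
        return false;
    *out = address_or_offset + base_or_offset + multiplier * index;
    return true;
}
```

With data_items at the made-up address 0x1000 and edi = 3, data_items(,%edi,4) gives effective_address(0x1000, 0, 3, 4, &a), so a = 0x100C; with eax = 0x2000, 4(%eax) gives 0x2004 and (%eax) gives 0x2000.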
- Direct addressing: only address_or_offset is used. For example, movl ADDRESS, %eax loads the 32-bit number at address ADDRESS into eax.
- Indexed addressing: movl data_items(,%edi,4), %eax uses this mode; it is convenient for accessing arrays.
- Indirect addressing: only base_or_offset is used. For example, movl (%eax), %ebx treats the value in eax as an address and loads the 32-bit number at that address into ebx.
- Base-pointer addressing: only address_or_offset and base_or_offset are used, as in movl 4(%eax), %ebx. This is convenient for accessing struct members: if eax holds the base address of a struct and one of its members lies at offset 4 bytes, this instruction reads that member.
- Immediate addressing: one operand of the instruction is an immediate, such as $1 in movl $1, %eax.
- Register addressing: one operand of the instruction is a register, such as %eax in movl $1, %eax. In assembly, registers are written by name, while in the machine instruction a few bits encode the register number. These bits can be regarded as the register's address, but that address is not in the same address space as memory addresses.
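Seen from C, these addressing modes match familiar kinds of expressions. The sketch below uses hypothetical globals and a hypothetical struct pair; each comment shows an instruction of the kind a 32-bit compiler might use for that access, but actual compiler output can differ (register choice in particular). Register addressing appears in all of them, since every loaded value ends up in a register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data, for illustration only. */
int32_t global_value = 5;
int32_t table[] = { 3, 10, 9 };
struct pair { int32_t first; int32_t second; };
_Static_assert(offsetof(struct pair, second) == 4, "second is 4 bytes into the struct");

/* Comments show an instruction of the kind a 32-bit compiler might emit. */
int32_t direct(void)                       { return global_value; } /* movl global_value, %eax  */
int32_t indexed(int32_t i)                 { return table[i]; }     /* movl table(,%edi,4), %eax */
int32_t indirect(const int32_t *p)         { return *p; }           /* movl (%eax), %ebx         */
int32_t base_pointer(const struct pair *s) { return s->second; }    /* movl 4(%eax), %ebx        */
int32_t immediate(void)                    { return 1; }            /* movl $1, %eax             */
```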
For a Hello World program in assembly, see my other article: http://www.cnblogs.com/orlion/p/5316519.html
x86 Assembly Program Basics (AT&T Syntax)