The simplest assembly program

Source: Internet
Author: User

Example 18.1. The simplest assembly program
#PURPOSE: Exits and returns a
#         status code back to the Linux kernel
#
#INPUT:   none
#
#OUTPUT:  Returns a status code. This can be viewed
#         by typing
#
#         echo $?
#
#         after running the program
#
#VARIABLES:
#         %eax holds the system call number
#         %ebx holds the return status
#
 .section .data

 .section .text
 .globl _start
_start:
 movl $1, %eax   # this is the Linux kernel command
                 # number (system call) for exiting
                 # a program

 movl $4, %ebx   # this is the status number we will
                 # return to the operating system.
                 # Change this around and it will
                 # return different things to
                 # echo $?

 int $0x80       # this wakes up the kernel to run
                 # the exit command

Save this program as the file hello.s (assembly source files usually use .s as the filename suffix), then use the assembler as to translate the mnemonics in the program into machine instructions and produce the object file hello.o:

$ as hello.s -o hello.o

Then use the linker (also called the link editor) ld to link the object file hello.o into the executable file hello:

$ ld hello.o -o hello

In section 2, "The main function and startup routines", we will discuss how multiple object files are linked into a single executable file, which is the main job of the linker. Although this example has only one object file, it still has to be linked to become executable, because the linker modifies some of the information in the object file, as explained in detail in section 5.2, "Executable files". Now run the program. The only thing it does is exit, with an exit status of 4. In the shell, the special variable $? gives the exit status of the previous command:

$ ./hello
$ echo $?
4
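The same check can be scripted. The sketch below is a minimal illustration in Python, not part of the original example: it starts a child process that exits with a given status and reads that status back, which is the value echo $? reports in the shell. The helper name exit_status_of is hypothetical, and a Python child process stands in for ./hello so that the sketch runs without an assembler.

```python
import subprocess
import sys


def exit_status_of(status):
    """Run a child process that exits with `status` and return what the parent sees."""
    # The child is a stand-in for ./hello: it does nothing except exit.
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({status})"]
    )
    return child.returncode


if __name__ == "__main__":
    print(exit_status_of(4))
```

If hello has already been built, replacing the child command with ["./hello"] should report the same value that echo $? prints.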

The # character in the program begins a single-line comment, similar to a // comment in C. The non-comment code is explained line by line below.

.section .data

A name that begins with . in an assembly program is not an instruction mnemonic and is not translated into a machine instruction. It is a special instruction to the assembler itself, called an assembler directive or pseudo-operation; the word "pseudo" is used because it is not a real instruction. .section indicates that the code is divided into several sections (segments). When the program is loaded and executed by the operating system, each section is loaded at a different address with different read, write, and execute permissions. The .data section holds the program's data and is readable and writable; the global variables of a C program belong to the .data section. No data is defined in this program, so its .data section is empty.

.section .text

The .text section holds code and is read-only and executable. The instructions that follow belong to this .text section.

.globl _start

_start is a symbol. A symbol in an assembly program represents an address and can be used in instructions. After the assembler has processed the program, every symbol is replaced by the address it represents. In C, we access a variable through its name, which really means reading or writing the memory cell at some address; we call a function through its name, which really means jumping to the address of the function's first instruction. Variable names and function names are therefore symbols too, and each essentially represents a memory address.

The .globl directive tells the assembler that the symbol _start will be used by the linker, so it must be specially marked in the object file's symbol table (described in section 5.1, "Object files"). _start is special in the same way that the main function of a C program is special: it is the entry point of the whole program. When linking, the linker looks up the address of the _start symbol in the object file and sets it as the entry address of the whole program, so every assembly program must provide a _start symbol and declare it with .globl. If a symbol is not declared with .globl, the linker will not use it.

_start:

_start here plays the same role as a statement label in C. While processing the program, the assembler computes the address of every data object and every instruction. When it sees a label like this, it takes the address of the instruction that follows as the address represented by the symbol _start. The _start symbol is special because the address it represents is the entry address of the whole program, so the next instruction, movl $1, %eax, becomes the first instruction the program executes.
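How a label turns into an address can be sketched with a toy model. The Python below is an illustration only and not how as is implemented: it walks the program once, records the current address whenever it meets a label, and advances the address by the size of each instruction. The byte sizes are those of the 32-bit encodings of the three instructions in this example; the function name assign_addresses and the extra label do_exit are hypothetical.

```python
# Sizes in bytes of the three instructions used in this example.
# A real assembler derives these from the full encoding rules.
SIZES = {
    "movl $1, %eax": 5,
    "movl $4, %ebx": 5,
    "int $0x80": 2,
}


def assign_addresses(lines, base=0):
    """Return (symbols, layout): label -> address, and [(address, instruction)]."""
    symbols = {}
    layout = []
    address = base
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            # A label takes no space: it names the address of what follows.
            symbols[line[:-1]] = address
        else:
            layout.append((address, line))
            address += SIZES[line]
    return symbols, layout


PROGRAM = [
    "_start:",
    "movl $1, %eax",
    "movl $4, %ebx",
    "do_exit:",  # hypothetical extra label, added only for illustration
    "int $0x80",
]

if __name__ == "__main__":
    symbols, layout = assign_addresses(PROGRAM)
    print(symbols)
    for address, instruction in layout:
        print(f"{address:#06x}  {instruction}")
```

In this model _start gets the address of the first movl, and do_exit gets the address of int $0x80, which is 10 bytes further on because two 5-byte instructions precede it.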

movl $1, %eax

This is a data transfer instruction. It makes the CPU produce the number 1 internally and then transfer it to the eax register. The l after mov stands for long and marks this as a 32-bit transfer. A number produced inside the CPU in this way is called an immediate. In assembly programs, an immediate is prefixed with $ and a register name is prefixed with %, so that both can be told apart from symbol names.

movl $4, %ebx

Like the previous instruction, this one produces the immediate 4 and transfers it to the ebx register.
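An immediate needs no memory access because its value is stored inside the instruction itself. The Python sketch below shows this for the two movl instructions above, using the standard x86 encoding for moving a 32-bit immediate into a register: one opcode byte, 0xB8 plus the register number, followed by the value in little-endian order. The function name encode_movl_immediate is hypothetical, and the sketch covers only this one instruction form.

```python
import struct

# Register numbers used in x86 instruction encodings.
REGISTER_NUMBERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def encode_movl_immediate(value, register):
    """Encode `movl $value, %register` as opcode 0xB8+r plus a 32-bit immediate."""
    opcode = 0xB8 + REGISTER_NUMBERS[register]
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    return bytes([opcode]) + struct.pack("<I", value)


if __name__ == "__main__":
    print(encode_movl_immediate(1, "eax").hex())  # movl $1, %eax
    print(encode_movl_immediate(4, "ebx").hex())  # movl $4, %ebx
```

The bytes after the opcode are the immediate itself, so changing $4 to another status changes only those four bytes of the instruction.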

int $0x80

The previous two instructions prepare for this one. When it is executed, the following happens:

  1. int is called the software interrupt instruction. A program can use it to raise an exception deliberately. As the previous chapter explained, exceptions are handled much like interrupts: the CPU switches from user mode to privileged mode and then jumps to kernel code to run the exception handler.

  2. The immediate 0x80 in the int instruction is a parameter that the exception handler uses to decide how to handle the exception. An exception raised this way is called a system call. The kernel provides many system services to user programs, but they cannot be called like library functions (such as printf), because the CPU is in user mode while running a user program and cannot call kernel functions directly. The CPU mode therefore has to be switched through a system call, entering the kernel via the exception handler. The user program can only pass a few parameters in registers. From then on, execution follows the path the kernel has laid out, and the user program cannot call whichever kernel function it likes. This ensures that system services are called safely. When the call finishes, the CPU switches back to user mode and continues with the instruction after int, so to the user program it looks just like a function call and return.

  3. The values of the eax and ebx registers are the two parameters passed to the system call. eax holds the system call number, and 1 means the _exit system call. ebx holds the argument passed to _exit, namely the exit status. The _exit system call terminates the current process and does not return to it. Other system calls discussed later are also made with int $0x80, with the system call number in eax. Different system calls take different numbers of parameters; some need the values of the three registers ebx, ecx, and edx. Most system calls return and let the user program continue, so _exit is a special case.
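The three steps above can be modelled in a few lines. The Python below is a toy model for illustration, not kernel code: registers are a dictionary, int_0x80 plays the role of the exception handler that selects a service from the number in eax, and _exit is modelled by an exception because it never returns to the caller. All names (ProcessExited, SYSCALL_TABLE, int_0x80, run_example) are hypothetical, and only system call number 1 from the text is modelled.

```python
class ProcessExited(Exception):
    """Raised by the model when the simulated process terminates."""

    def __init__(self, status):
        super().__init__(f"exited with status {status}")
        self.status = status


def sys_exit(registers):
    # _exit does not return to the caller: the process ends here.
    raise ProcessExited(registers["ebx"])


# Hypothetical dispatch table: system call number -> handler.
SYSCALL_TABLE = {1: sys_exit}


def int_0x80(registers):
    """Model of the handler: choose the service named by eax."""
    handler = SYSCALL_TABLE.get(registers["eax"])
    if handler is None:
        # The model simply rejects numbers it does not know.
        raise ValueError(f"unknown system call number {registers['eax']}")
    return handler(registers)


def run_example():
    registers = {"eax": 0, "ebx": 0}
    registers["eax"] = 1      # movl $1, %eax
    registers["ebx"] = 4      # movl $4, %ebx
    try:
        int_0x80(registers)   # int $0x80
    except ProcessExited as exited:
        return exited.status


if __name__ == "__main__":
    print(run_example())
```

The user program chooses only the numbers placed in the registers; which code runs for each number is fixed by the table, which mirrors the point that a user program cannot call arbitrary kernel functions.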

Two syntaxes for x86 assembly: Intel syntax and AT&T syntax

x86 assembly has two different syntaxes. Intel's official documentation uses Intel syntax, and so does the Windows platform, while assemblers on UNIX platforms have traditionally used AT&T syntax, so this book uses AT&T syntax. Written in Intel syntax, the instruction mov %edx,%eax becomes mov eax,edx: register names have no % prefix, and the source and destination operands are swapped. This book does not discuss the differences between the two syntaxes in detail; readers can refer to [ASSEMBLYHOWTO].
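The two differences just mentioned, prefixes and operand order, are mechanical enough to sketch in code. The Python below is a minimal illustration that handles only register and immediate operands, not memory operands or the many other differences between the syntaxes. The function name att_to_intel and the small suffix table are hypothetical.

```python
# Size suffixes that AT&T syntax appends to the mov mnemonic (b, w, l).
SUFFIXED = {"movb": "mov", "movw": "mov", "movl": "mov"}


def att_to_intel(instruction):
    """Convert a simple AT&T instruction (registers, immediates) to Intel syntax."""
    mnemonic, _, operand_text = instruction.strip().partition(" ")
    mnemonic = SUFFIXED.get(mnemonic, mnemonic)
    # Drop the % and $ prefixes that Intel syntax does not use.
    operands = [
        op.strip().lstrip("%$") for op in operand_text.split(",") if op.strip()
    ]
    # AT&T lists source then destination; Intel lists destination then source.
    operands.reverse()
    return f"{mnemonic} {', '.join(operands)}".strip()


if __name__ == "__main__":
    for line in ("mov %edx,%eax", "movl $4, %ebx", "int $0x80"):
        print(f"{line:16} -> {att_to_intel(line)}")
```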

Among books that introduce x86 assembly, those written for UNIX platforms use AT&T syntax, such as [GroundUp], while other books generally use Intel syntax, such as [x86assembly].

Exercises

1. If the int $0x80 instruction is removed from the example in this section, the program still assembles and links, but running it produces a segmentation fault. Can you explain why?

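Without int $0x80 the CPU never switches to kernel mode. The arguments are loaded into eax and ebx, but the _exit system call is never made, so the process is not terminated. The CPU carries on executing whatever bytes follow the last instruction in memory, which eventually triggers a segmentation fault.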
