Linux Platform x86 Assembly (IV): Starting with "Hello world!"

Source: Internet
Author: User

"Copyright notice: Please respect the original work. When reposting, keep the source: blog.csdn.net/shallnet. This article is for learning and exchange only; do not use it for commercial purposes."
Assembly language programs consist of well-defined segments, and each segment has its own purpose. The three most commonly used segments are the data segment, the BSS segment, and the text segment. The text segment is where the instruction code is declared within the executable program. Every assembly program must have a text segment. The data and BSS segments are optional but are often used. The data segment declares variables that have initial values. The BSS segment declares data elements that are initialized to zero, and these are commonly used as buffers in assembly programs. The layout of an assembly language program is described next.
The GNU assembler uses the .section directive to declare a segment. The .section directive takes a single parameter: the type of segment being declared. By convention, the BSS segment is placed before the text segment. The data segment could be moved after the text segment, but it is easier to read and understand when it comes first. As in high-level languages, when an assembly language program is linked into an executable, the linker must know where the program starts. This plays the same role as the main function in C. The GNU tools use the default label _start to mark the entry point of the application. If the linker cannot find this label, GNU ld prints a warning that it cannot find the entry symbol _start and falls back to a default address. If you write a set of utilities to be used by external assembly language or C programs, declare each function's label with the .globl directive, which marks a program label as accessible to external programs. The _start label is declared with .globl for the same reason, because the linker must be able to see it. The basic template for an assembly language program is therefore:
    .section .data
        < initialized data goes here >
    .section .bss
        < uninitialized data goes here >
    .section .text
    .globl _start
    _start:
        < instruction code goes here >
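To show the template above filled in, here is a minimal sketch. It assumes a 32-bit x86 Linux build made with `as --32` and `ld -m elf_i386`. The labels `value` and `buffer` are hypothetical placeholders, and the program does nothing except exit with status 0 through the sys_exit system call.

```asm
# template.s: the basic template with a minimal body (32-bit x86 sketch)
    .section .data
value:
    .int 0                  # hypothetical initialized variable (not used below)

    .section .bss
    .lcomm buffer, 64       # hypothetical 64-byte uninitialized buffer (not used below)

    .section .text
    .globl _start           # make the entry label visible to the linker
_start:
    movl $1, %eax           # system call number 1: sys_exit
    movl $0, %ebx           # exit status 0
    int  $0x80              # trap into the kernel; this call does not return
```

Build it with `as --32 -o template.o template.s` and `ld -m elf_i386 -o template template.o`. Running `./template; echo $?` should then print 0.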
With this template you can start creating assembly language programs. Just as when learning a high-level language, we'll start with the simplest program. Most of the work in writing an assembly language program happens in the .text section, where the instruction code that implements the application is written. Assembly language lets programmers use mnemonics, which are English-like words that each stand for an instruction, and the assembler easily converts these mnemonics into raw instruction code. The programmer therefore does not need to know what each byte of instruction code means, and only needs to use the more memorable mnemonics (such as push, mov, sub, and call). For example, the following instruction code:
    55
    89 E5
    83 EC 08
    C7 45 FC 01 00 00 00
    83 EC 0C
    6A 00
    E8 D1 FE FF FF
can be written as the following assembly code:
    push %ebp
    mov  %esp,%ebp
    sub  $0x8,%esp
    movl $0x1,-4(%ebp)
    sub  $0xc,%esp
    push $0x0
    call 8048348
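The correspondence between mnemonics and bytes also works in reverse. The following sketch (32-bit x86, built with `as --32` and `ld -m elf_i386`) writes a tiny exit sequence directly as raw bytes with the .byte directive. The mnemonic each byte group encodes is shown in the comments. The exit status 7 is an arbitrary choice for illustration.

```asm
# rawbytes.s: an exit sequence written as raw instruction bytes
    .section .text
    .globl _start
_start:
    .byte 0xB8, 0x01, 0x00, 0x00, 0x00   # movl $1, %eax  (B8+reg, imm32 little-endian): sys_exit
    .byte 0xBB, 0x07, 0x00, 0x00, 0x00   # movl $7, %ebx  (EBX is register 3, so BB): exit status 7
    .byte 0xCD, 0x80                     # int  $0x80     (CD, imm8)
```

Running `objdump -d rawbytes.o` on the assembled object disassembles these bytes back into the three mnemonics, and running the linked program exits with status 7.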
    • Data segment
As in high-level languages, an assembly language program needs to manage variables of various types, and the data and BSS segments provide the means to define them. The data segment is the most common place to define variables. Defining a variable in the data segment takes two parts: a label and a directive. The label is similar to a variable name in a C program. It is simply a reference point that the program uses to access the memory location. The directive specifies how many bytes are reserved for the data element referenced by the label, much like a data type in a high-level language. Assembly language provides the following data directives: .ascii, .asciz, .byte, .double, .float, .int, .long, .octa, .quad, .short, and .single. For example, a variable is defined as follows:
    .section .data
    msg:
        .ascii "This is a test message"
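To show several of the data directives listed above side by side, here is a sketch for 32-bit x86 (built with `as --32` and `ld -m elf_i386`). All label names are hypothetical. The program reads the second element of `values` and returns it as the exit status. Note that a label used without a `$` refers to the contents of memory at that address.

```asm
# datadefs.s: several data directives, then one value read back
    .section .data
msg:
    .ascii "This is a test message"   # characters only, no terminating zero byte
cstr:
    .asciz "C-style string"           # .asciz appends a zero byte
flag:
    .byte 1                           # 1 byte
count:
    .short 300                        # 2-byte integer
values:
    .int 10, 15, 20                   # three 4-byte integers
ratio:
    .float 1.5                        # 4-byte single-precision float

    .section .text
    .globl _start
_start:
    movl values+4, %ebx     # no $: load the 4 bytes stored at values+4 (the value 15)
    movl $1, %eax           # sys_exit
    int  $0x80              # exit status is 15
```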
The data segment is mainly used to define variable data, but you can also use the .equ directive to define static data symbols, similar to constants in high-level languages. For example:
    .equ var, 3
To reference a static data symbol, put a $ in front of its name. For example, to move the value of var into the EAX register:
    movl $var, %eax
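The .equ example can be placed in a complete program. This sketch assumes 32-bit x86 built with `as --32` and `ld -m elf_i386`. `SYS_EXIT` is a hypothetical name added for readability. Because .equ reserves no storage, `$var` is simply the number 3, and the program uses it as its exit status.

```asm
# equ.s: a constant defined with .equ and used as an immediate operand
    .equ var, 3             # var is a symbol with the value 3; no memory is reserved
    .equ SYS_EXIT, 1        # hypothetical name for the sys_exit call number

    .section .text
    .globl _start
_start:
    movl $var, %eax         # with $: the value 3 itself is moved into EAX
    movl %eax, %ebx         # use it as the exit status
    movl $SYS_EXIT, %eax    # sys_exit
    int  $0x80              # exit status is 3
```

Writing `movl var, %eax` without the `$` would instead try to read memory at address 3, which crashes the program with a segmentation fault.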
    • BSS segment
Defining a data element in the BSS segment differs somewhat from defining one in the data segment, because you do not specify a data type. The GNU assembler uses two directives to declare buffers. The .comm directive declares a common memory area for uninitialized data. The .lcomm directive declares a local common memory area for uninitialized data, which cannot be accessed from outside the local assembly code. Both use this format:
    .comm symbol, length
Here symbol is the label assigned to the memory area, and length is the number of bytes it contains. One benefit of declaring data in the BSS segment is that the data is not stored in the executable file, whereas data defined in the data segment must be.
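The following sketch shows a BSS buffer in use. It assumes 32-bit x86 built with `as --32` and `ld -m elf_i386`, and the name `buffer` is hypothetical. The program fills the first three bytes of the buffer at run time, prints them with sys_write, and then exits with the value of a byte it never wrote, which is still zero.

```asm
# bss.s: a buffer in the BSS segment, filled at run time and printed
    .section .bss
    .lcomm buffer, 16       # 16 bytes, zero-filled when the program starts

    .section .text
    .globl _start
_start:
    movb $0x48, buffer      # 'H'
    movb $0x69, buffer+1    # 'i'
    movb $0x0A, buffer+2    # newline

    movl $4, %eax           # sys_write
    movl $1, %ebx           # fd 1: standard output
    movl $buffer, %ecx      # address of the buffer
    movl $3, %edx           # write 3 bytes
    int  $0x80

    movzbl buffer+15, %ebx  # a byte that was never written: still 0
    movl $1, %eax           # sys_exit
    int  $0x80              # exit status is 0
```

The program prints `Hi`, and `echo $?` afterwards shows 0.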
Now let's look at the Hello World program in assembly language:
    # hello.s: a sample program that prints the "Hello world!" message
    .section .data                # data section declaration
    msg:
        .ascii "Hello world!\n"   # the string to print
        len = . - msg             # length of the string
    .section .text                # text (code) section declaration
    #.global main
    #main:
    .global _start                # declare the entry point
    _start:                       # print "Hello world!" on the screen
        movl $len, %edx           # third argument: string length
        movl $msg, %ecx           # second argument: the "Hello world!" string
        movl $1, %ebx             # first argument: output file descriptor (1 = stdout)
        movl $4, %eax             # system call number (sys_write)
        int  $0x80                # invoke the kernel
        # the code below exits the program
        movl $0, %ebx             # first argument: exit status
        movl $1, %eax             # system call number (sys_exit)
        int  $0x80                # invoke the kernel
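The program is assembled, linked, and run as follows. On 64-bit x86 Linux, building it as the 32-bit program this series targets requires as --32 and ld -m elf_i386.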
    $ as -o hello.o hello.s
    $ ld -o hello hello.o
    $ ./hello
    Hello world!
    $
System calls on Linux are made through a software interrupt (int $0x80). When the int $0x80 instruction executes, the system call number must be in register EAX. The arguments passed to the system call are placed, in order, in EBX, ECX, EDX, ESI, and EDI. When the system call completes, its return value is in EAX. System call number 4 corresponds to sys_write, which an application sees as the following function:

    ssize_t write(int fd, const void *buf, size_t count);

The arguments fd, buf, and count go in EBX, ECX, and EDX, and the sys_write call number (4) goes in EAX. After the int $0x80 instruction executes, the return value (the number of bytes written, or a negative error code) can be read from EAX.
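To see the return value convention in action, this variant of the Hello World program passes sys_write's return value in EAX straight to sys_exit as the exit status. It is a sketch that assumes 32-bit x86 built with `as --32` and `ld -m elf_i386`. If all 13 bytes of "Hello world!\n" are written, the exit status is 13.

```asm
# retval.s: sys_write's return value (bytes written) reused as the exit status
    .section .data
msg:
    .ascii "Hello world!\n"
    len = . - msg           # 13 bytes

    .section .text
    .globl _start
_start:
    movl $4, %eax           # sys_write
    movl $1, %ebx           # fd: standard output
    movl $msg, %ecx         # buf
    movl $len, %edx         # count
    int  $0x80              # on return, EAX holds the number of bytes written

    movl %eax, %ebx         # exit status = sys_write's return value
    movl $1, %eax           # sys_exit
    int  $0x80
```

If the write fails (for example, if standard output is closed), EAX holds a negative error code instead, and the exit status reflects only its low byte.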
Note that assembling and linking with gcc causes a problem: gcc looks for a main label rather than _start. To assemble and link the program directly with gcc, rename _start to main. This is what the commented-out .global main and main: lines in the program are for.
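When main is the entry label, the C runtime calls it like an ordinary function. An alternative to keeping the sys_exit call is therefore to return from main. The sketch below does that. It assumes a 32-bit build such as `gcc -m32 -no-pie -o hello_main hello_main.s`, which on a 64-bit system needs 32-bit multilib support installed. The file name `hello_main.s` is hypothetical. EBX is saved and restored because the i386 C calling convention treats it as callee-saved.

```asm
# hello_main.s: Hello World with main as the entry label, built with gcc
    .section .data
msg:
    .ascii "Hello world!\n"
    len = . - msg

    .section .text
    .globl main
main:
    pushl %ebx              # EBX is callee-saved in the i386 C calling convention
    movl $len, %edx         # third argument: string length
    movl $msg, %ecx         # second argument: the string
    movl $1, %ebx           # first argument: file descriptor 1 (stdout)
    movl $4, %eax           # system call number: sys_write
    int  $0x80
    popl %ebx               # restore the caller's EBX
    movl $0, %eax           # main's return value goes in EAX
    ret                     # return to the C runtime, which then exits
```

Returning from main lets the C runtime perform its normal shutdown. A bare sys_exit call skips that shutdown, which matters once the program also uses buffered C library output.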
