Assembly Language Programming Reading Notes (3): Program Examples

Source: Internet
Author: User

These notes cover three topics: the assembly language program template and the concepts it involves, how to debug an assembly program, and how to call C library functions from assembly.

1. Composition of an assembly program

An assembly program is made up of sections. The instructions the program executes go in the text section. Variables that are given an initial value go in the data section, and variables that have no initial value, or are initialized to 0, go in the bss section. The text section is mandatory; the data and bss sections are optional.
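For readers coming from C, the same split shows up in where a compiler puts global variables. The sketch below is only an illustration in C and is not one of the book's examples. The names initialized_value, zeroed_buffer and check_initial_state are made up here, and the section placement described in the comments is what GCC typically does on Linux ELF targets, not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>

/* Has a non-zero initial value: typically stored in the data section. */
int initialized_value = 42;

/* No initializer: zero-filled at program start and typically placed in bss,
   so it takes no space in the executable file itself. */
char zeroed_buffer[12];

/* Checks the state both variables have before the program writes to them. */
void check_initial_state(void)
{
    assert(initialized_value == 42);
    for (size_t i = 0; i < sizeof zeroed_buffer; i++) {
        assert(zeroed_buffer[i] == 0);
    }
}
```

Calling check_initial_state() at the start of a program passes silently, which matches the rule above: data carries its initial value, bss starts out as zeros.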

2. Defining sections

A section is defined with the .section directive. For example:

.section .text defines the text section,

.section .data defines the data section,

.section .bss defines the bss section.

The sections may appear in any order, but to make the program easier for others to read and take over, it is recommended to define them from top to bottom in the order data, bss, text.

3. Define the program start point

The text section must define a starting point for program execution. ld uses _start as the default entry point. gcc links in the standard library startup code by default; that code provides _start, which runs first and then jumps to main. So by default gcc requires the source to define main and does not allow it to define _start. With the -nostdlib option, however, the standard library code is not linked in, and using _start as the entry point is no problem. In addition, both ld and gcc accept the -e option to specify the entry point, in which case any label can serve as the entry point.

This is illustrated by the following example, cpuid2.s, taken from the 64-bit system part of Assembly Language Programming Reading Notes (2): Related Tools. The details of the program may not be clear yet, but they are explained later in this article. For now it is enough to know that the program prints the CPU vendor ID string. The entry point of the source program is _start:

cpuid2.s
# cpuid2.s file
.section .data
output:
    .asciz "CPUID is '%s'\n"
.section .bss
    .lcomm buffer, 12
.section .text
.globl _start
_start:
    nop
    movl $0, %eax
    cpuid
    movl $buffer, %edi
    movl %ebx, (%edi)
    movl %edx, 4(%edi)
    movl %ecx, 8(%edi)
    pushl $buffer
    pushl $output
    call printf
    addl $8, %esp
    pushl $0
    call exit

The following three cases, with the entry point named _start, main, and xxxx (an arbitrary label), show how an executable is built from cpuid2.s with as and ld, and with gcc.

1). Entry point _start

[Screenshot: building the executable with as and ld, entry point _start]

[Screenshot: building the executable with gcc, entry point _start]

2). Entry point main

Change the label _start in the source to main. ld can then be given -e main, which makes main the starting point of the program. gcc offers two ways. The first is the default build, without -nostdlib, in which gcc links in the library code and the resulting executable is larger. The second is to use -nostdlib and specify the entry point with -e main.

[Screenshot: building the executable with as and ld, entry point main]

[Screenshot: building the executable with gcc the first way, entry point main]

[Screenshot: building the executable with gcc the second way, entry point main]

The first way produces an executable of size 4799, the second one of size 2133.

3). Entry point xxxx (an arbitrary label)

Change _start in cpuid2.s to xxxx, so that the entry point is xxxx. In this case the program must be linked or compiled with -e xxxx.

[Screenshot: building the executable with as and ld, entry point xxxx]

[Screenshot: building the executable with gcc, entry point xxxx]

As can be seen, the xxxx case also covers the _start and main cases above.

So whatever the entry point label is, and whether the build runs on a 32-bit or a 64-bit system, an assembly program can be assembled and linked, or compiled, with the following commands.

# Suppose there are n assembly source files to assemble or compile, input_file1.s, input_file2.s, ..., input_filen.s, with n greater than or equal to 1.
# {input_file.s} stands for the list of input files input_file1.s, ..., input_filen.s; likewise {input_file.o} stands for the corresponding list of .o files.
# The output executable is output_file.
# /libc_path stands for the absolute path of the directory containing libc.so, and /ld-linux_path for the absolute path of the directory containing ld-linux.so.2.
# The entry point label is entry_point.
# The parts in [] are needed only when C library functions are called; if no C library function is called, leave them out.
# The as and ld commands that build the executable are then:
as --32 -o input_file1.o input_file1.s
as --32 -o input_file2.o input_file2.s
...
as --32 -o input_filen.o input_filen.s

ld -m elf_i386 -e entry_point [-dynamic-linker /ld-linux_path/ld-linux.so.2] -o output_file [-L/libc_path -lc] {input_file.o}

# The gcc command that builds the executable is:

gcc -m32 -e entry_point -nostdlib -o output_file [-L/libc_path -lc] {input_file.s}

# Note: if the source program does not call any C library function but is linked or compiled with the parts in [], the error "/usr/lib/libc.so.1: bad ELF interpreter: No such file or directory" is reported.

4. External Program Label Declaration

If code in one assembly file calls a label or function defined in another assembly file, that label or function must be declared with .globl (presumably short for global, meaning it can be referenced across files).

For example, take the assembly files test3.s and test4.s, where test3.s calls the function fun4 defined in test4.s. If test4.s lacks the line .globl fun4, the build reports an error; with that line there is no problem.

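test3.s
.section .text
.globl _start
_start:
    call fun4
    movl $1, %eax
    movl $0, %ebx    # exit status; the value was lost in the source, 0 is assumed
    int $0x80

test4.s
.section .text
#.globl fun4
fun4:
    movl $1, %eax    # operand lost in the source; 1 is a placeholder
    movl $2, %ebx    # operand lost in the source; 2 is a placeholder
    addl %ebx, %eax
    ret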


With .globl fun4 commented out by the # character, assembling into test3.o and test4.o and then linking reports that fun4 is undefined in function _start. [Screenshot: linker error about the undefined fun4]

Removing the # so that .globl fun4 takes effect makes the build go through without any problem. [Screenshot: successful build]

In summary, any label or function that is meant to be called from other files must be declared with .globl.

5. Assembly program template

From the discussion above, the assembly program template follows directly:

.section .data
    < initialized data here >
.section .bss
    < uninitialized data here >
.section .text
.globl entry_point
entry_point:
    < code instructions here >

Here entry_point is the starting point of the program.

6. Assembly program example

The example in the book reads the CPU's vendor ID with the CPUID instruction. A few points are briefly described first, since they are needed to understand the program.

1). About the CPUID instruction

The input parameter is passed in register eax; after CPUID executes, the output is returned in ebx, ecx, and edx. With eax=0, ebx receives the first 4 bytes of the vendor ID string, edx the middle 4 bytes, and ecx the last 4 bytes. Within each register the characters are stored little-endian, that is, the lowest byte holds the earliest character, so the vendor ID is [ebx][edx][ecx].

This can be checked with test_cpuid.s, a test program for CPUID:

test_cpuid.s
# test_cpuid.s program
.section .text
.globl _start
_start:
    nop
    movl $0, %eax
    cpuid
    movl $1, %eax
    movl $0, %ebx    # exit status; the value was lost in the source, 0 is assumed
    int $0x80

Build the executable with the debug option -gstabs, debug it with kdbg, and set a breakpoint after the CPUID instruction. [Screenshot: kdbg stopped after the CPUID instruction]

Each byte of the registers ebx, edx, and ecx is an ASCII code. Translating these bytes into characters in the order ebx, edx, ecx, from low byte to high byte within each register, gives the string. [Screenshot: register bytes translated to characters]

That is, the vendor ID is "GenuineIntel".
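The byte order is easier to see when the unpacking is written out. The following is a small C sketch, not code from the book: the function names unpack_register and vendor_from_registers are made up for this illustration, and the three register values in the usage note are the ones that spell "GenuineIntel".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies the four bytes of reg into dst, lowest byte first. */
static void unpack_register(uint32_t reg, char *dst)
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (char)((reg >> (8 * i)) & 0xFF);
    }
}

/* Builds the 12-character vendor string in the order ebx, edx, ecx.
   out must have room for 13 bytes: 12 characters plus the terminating 0. */
void vendor_from_registers(uint32_t ebx, uint32_t edx, uint32_t ecx,
                           char out[13])
{
    assert(out != NULL);
    unpack_register(ebx, out);       /* characters 0..3  */
    unpack_register(edx, out + 4);   /* characters 4..7  */
    unpack_register(ecx, out + 8);   /* characters 8..11 */
    out[12] = '\0';
}
```

With ebx = 0x756e6547, edx = 0x49656e69 and ecx = 0x6c65746e, the lowest byte of ebx is 0x47 ('G'), so the result reads "GenuineIntel".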

2). About Linux system calls

The software interrupt 0x80 (int $0x80) invokes a Linux kernel function. Which kernel function is called is determined by the eax register. The parameters passed to it have different meanings depending on the function called, and are usually passed in ebx, ecx, and edx. The book does not discuss this further until chapter 12; for now a basic understanding of the two system calls used here is enough.

The first is system call number 1, the exit function sys_exit(ret). eax=1 is the call number and ebx=ret passes the first argument, which is the return value handed back to the parent process. That is, sys_exit(ret) is equivalent to the following assembly code:

# assembly code for the system call sys_exit(ret)
movl $1, %eax
movl $ret, %ebx
int $0x80

The second is system call number 4, the function sys_write(int fd, const void *buf, size_t count). Its three parameters are passed in ebx, ecx, and edx respectively: the file descriptor, the starting address of the buffer to write, and the length of the buffer in bytes. On Linux, file descriptor 1 represents standard output (stdout), which by default is the terminal. Printing the string str of length bytes to the terminal is therefore sys_write(1, str, length), equivalent to the following assembly code:

# assembly code for the system call sys_write(1, str, length)
movl $4, %eax
movl $1, %ebx
movl $str, %ecx
movl $length, %edx
int $0x80
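The same kernel entry can also be reached from C without writing the interrupt by hand. The sketch below assumes Linux with glibc and uses the syscall() wrapper from unistd.h; the helper name raw_write is made up here. SYS_write expands to the call number of the build target, which is 4 for 32-bit x86 as in the text above and a different number on other architectures.

```c
#define _GNU_SOURCE          /* makes unistd.h declare syscall() in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invokes the write system call directly, bypassing the libc write() wrapper.
   The arguments mirror sys_write(fd, buf, count).
   Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    return syscall(SYS_write, fd, buf, count);
}
```

For example, raw_write(1, "hello\n", 6) prints hello on the terminal, just as the assembly sequence does with fd 1, str, and length.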

3). Complete code example cpuid.s

The code cpuid.s below reads the vendor ID of the CPU and prints it to the screen:

cpuid.s
# cpuid.s program, prints the CPU vendor ID
.section .data
output:
    .ascii "CPU ID is 'xxxxxxxxxxxx'\n"    # string defined in the data section; xxxxxxxxxxxx is replaced by the result
.section .text
.globl _start
_start:
    movl $0, %eax                # get the CPU vendor ID into ebx, edx, ecx
    cpuid
    movl $output, %edi
    movl %ebx, 11(%edi)
    movl %edx, 15(%edi)
    movl %ecx, 19(%edi)          # overwrite xxxxxxxxxxxx, which starts at byte offset 11 of the string
    movl $4, %eax
    movl $1, %ebx
    movl $output, %ecx
    movl $25, %edx
    int $0x80                    # print the string; its length including the newline is 25 bytes
    movl $1, %eax
    movl $0, %ebx
    int $0x80                    # exit the program with return value 0

Build the executable and run it. [Screenshot: build commands and program output]

With the knowledge from 1) and 2) above plus the comments in the code, this program is not hard to understand, provided you know a little assembly language (surely movl is familiar?), so it is not explained further.

7. Debugging an assembly program

Debugging with kdbg spares you from having to remember gdb commands. The kdbg interface is very clear, and single-stepping, setting breakpoints, and inspecting variables, registers, and memory all work without problems. [Screenshot: kdbg debugging session]

One point needs special attention: the first nop instruction after _start. Without it, memory cannot be inspected and an out-of-range message appears; with the nop added, everything works normally. The book says that without this nop a gdb bug makes it impossible to set a breakpoint at _start, whereas in my kdbg session the breakpoint could be set but memory could not be inspected. It is therefore recommended to add this nop instruction when debugging.

8. Calling C library functions

The cpuid.s example above uses software interrupts to invoke Linux kernel functions for printing and for exiting the program. Another way to print and exit is to call C standard library functions: printf for printing and exit for exiting.

If cpuid = "GenuineIntel", then printf("CPU ID is '%s'\n", cpuid) prints the same output as cpuid.s. The following uses printf as the example to describe how an assembly program calls C library functions.

A subroutine is called with the call instruction, so the call itself is call printf. But how are the two arguments of printf("CPU ID is '%s'\n", cpuid) passed? They are passed on the stack. In general, C pushes arguments in right-to-left order: the starting address of the string cpuid is pushed first, then the starting address of "CPU ID is '%s'\n". The instruction that pushes onto the stack is pushl. printf("CPU ID is '%s'\n", cpuid) can therefore be implemented as follows.

# Assume output is the starting address of "CPU ID is '%s'\n" and buffer is the starting address of the CPUID string that was read back.
# The following code then implements the C library call printf(output, buffer):
pushl $buffer
pushl $output
call printf

Note that the C stack has its high addresses at the bottom, and the stack pointer register esp decreases by 4 after each pushl. If buffer and output are no longer needed after call printf, esp can be set back to the address it held before the pushes, so that the stack space occupied by the two arguments can be reused. After two pushl instructions esp has decreased by 8, so 8 must be added to esp to restore the original stack position: addl $8, %esp.

From this the general rule is easy to see: for a C library function func(param1, param2, ..., paramn), push the arguments onto the stack from right to left, then call func. For further study, the ABI documentation is available for download.
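The stack pointer arithmetic can be traced with a toy model. The C sketch below is only an illustration of the bookkeeping described above, not real stack manipulation. The type toy_stack, its functions, and the addresses 0x1000 and 0x2000 in the usage note are all made up, and the return address that call pushes is left out to keep the model small.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A toy model of a 32-bit stack that grows toward lower addresses. */
struct toy_stack {
    uint32_t words[STACK_WORDS];
    uint32_t esp;   /* byte offset of the top of the stack */
};

/* An empty stack has esp at the highest address. */
void toy_init(struct toy_stack *s)
{
    s->esp = STACK_WORDS * 4;
}

/* Models pushl: esp drops by 4, then the value is stored at the new top. */
void toy_pushl(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->words[s->esp / 4] = value;
}

/* Reads the word n slots above the top of the stack (n = 0 is the top). */
uint32_t toy_peek(const struct toy_stack *s, uint32_t n)
{
    assert(s->esp + 4 * n < STACK_WORDS * 4);
    return s->words[(s->esp + 4 * n) / 4];
}
```

Pushing a pretend buffer address 0x2000 and then a pretend output address 0x1000 lowers esp by 8 and leaves output on top with buffer right above it, which is the left-to-right order the callee reads. Adding 8 to esp afterwards puts it back where it started, which is what addl $8, %esp does.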

exit(ret) is called in just the same way:

# assembly code for calling exit(ret)
pushl $ret
call exit

The 32-bit ABI (Application Binary Interface) and the 64-bit ABI are not the same, so 64-bit code on an x64 system cannot call functions this way. To run such a 32-bit program on a 64-bit system, it must be assembled and linked as 32-bit, otherwise errors occur. See the 64-bit system part of Assembly Language Programming Reading Notes (2): Related Tools.

9. Rewriting cpuid.s to call C library functions: the cpuid2.s example

This is in fact the example from "3. Define the program start point". Once cpuid.s and the way C library functions are called are understood, this example is easy to follow. Two further points need explanation.

.asciz: printf prints a string that ends with a 0 byte, so the string must be defined with a terminating 0; hence .asciz is used instead of .ascii.

.lcomm: reserves a block of local memory. Here .lcomm buffer, 12 reserves a 12-byte block of local memory whose starting address is the label buffer.

The code is clear: it first reads the CPUID result into the ebx, edx, and ecx registers, then uses the edi register as an index to assemble the string in buffer, then pushes buffer and output and calls printf, which is equivalent to calling printf("CPU ID is '%s'\n", buffer), and finally calls exit(0) to return.
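One detail is worth keeping in mind: a 12-byte buffer filled with 12 vendor characters has no room for a terminating 0, so a plain %s depends on a 0 byte happening to follow buffer in memory. The C sketch below shows one way to avoid that dependency, using a precision in the format. It is an illustration, not the book's code, and the function name format_vendor_message is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the message for a 12-byte vendor buffer that has no terminating 0.
   The precision in %.12s makes the formatter read at most 12 bytes.
   Returns the number of characters the complete message needs. */
int format_vendor_message(const char vendor[12], char *dst, size_t dst_size)
{
    assert(vendor != NULL && dst != NULL);
    return snprintf(dst, dst_size, "CPU ID is '%.12s'\n", vendor);
}
```

For the 12 bytes of GenuineIntel the function returns 25, the same length that cpuid.s passes to sys_write.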

10. Concluding remarks

After reading this article you should know how to write an assembly program, including how to use system calls and C library calls, and how to assemble and link it so that it runs on a 32-bit or a 64-bit system.
