In-depth analysis of assembler programs and system calls in Linux

Source: Internet
Author: User

1) Analysis of the assembly source program:
Write a program in AT&T-syntax assembly language that sleeps for 10 seconds when it runs.

The source code is as follows:

#include "sys/syscall.h"

.data
sleeptime:
.long 10, 0

.text

.global _start
.type _start, @function

_start:
movl $SYS_nanosleep, %eax
movl $sleeptime, %ebx
int $0x80

movl $SYS_exit, %eax
movl $0, %ebx
int $0x80

 

1) The assembly source includes a C header file

###############################################################
#include "sys/syscall.h"
###############################################################

The program includes the sys/syscall.h header file, which defines the SYS_* macros that hold the system call numbers.
Many POSIX functions, such as open, close, and read, are thin wrappers around system calls.
In other words, open as called from a program is not itself a system call but a POSIX library function, and that function in turn issues the system call provided by the kernel.
For example:
#include <syscall.h>
n = syscall(SYS_read, fd, buffer, length);
In the preceding example, the system call is selected by the macro SYS_read; the read function wraps this same system call.
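To make the relationship between a wrapper and its system call concrete, here is a minimal C sketch. The function names raw_getpid and raw_write are hypothetical helpers introduced only for this illustration; they go through the generic syscall() entry point with the SYS_* macros instead of the dedicated wrappers getpid() and write(). It assumes Linux with glibc or a compatible C library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for the process ID through the
 * generic syscall() entry point instead of the getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Hypothetical helper: write len bytes from buf to fd with SYS_write.
 * Returns the number of bytes written, or -1 with errno set. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return syscall(SYS_write, fd, buf, len);
}
```

Both helpers should return the same values as getpid() and write(), since the wrappers end up issuing the same system calls.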

2) The syscall mechanism used in Linux

The system call mechanism in Linux is part of the application binary interface (ABI). It differs from an application programming interface (API): an API is used through functions that must be linked into the program,
while the ABI requires no linking of extra code.
That is to say, when you invoke the ABI directly (a system call through int $0x80), you do not need to link any library, but when you call an API function such as read, you need to link a library such as libc.

3) Code section and data section
###############################################################
.data
sleeptime:
.long 10, 0
###############################################################
On 32-bit x86, Linux runs in protected mode and uses a flat memory model. ELF is currently the most commonly used binary format.
An ELF executable is generally divided into the following parts: .text, .data, and .bss, where .text is the read-only code area, .data is the readable and writable data area,
and .bss is the readable and writable area for uninitialized data.
In ELF, code and data areas are collectively called sections. You can use other standard sections or add custom sections as needed, but an ELF executable should have at least a .text section.
Here the .data section is used to define sleeptime: .long 10, 0. Each .long value is 32 bits wide. The two values are the seconds and nanoseconds fields of the time structure that nanosleep reads, so the program asks to sleep for 10 seconds and 0 nanoseconds.
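The correspondence between the two .long values and a two-field time structure can be checked with a small C sketch. The struct name timespec32 and the function matches_long_pair are hypothetical names used only for this illustration; the sketch assumes that the 32-bit interface reads two consecutive 32-bit integers, as the listing implies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: the time value as two consecutive 32-bit integers,
 * matching the ".long 10, 0" definition in the data section. */
struct timespec32 {
    int32_t tv_sec;   /* whole seconds,      first  .long value */
    int32_t tv_nsec;  /* extra nanoseconds,  second .long value */
};

/* Returns 1 if the struct occupies exactly the same bytes as the two
 * 32-bit words emitted by ".long sec, nsec", otherwise 0. */
int matches_long_pair(int32_t sec, int32_t nsec)
{
    struct timespec32 ts = { sec, nsec };
    int32_t words[2] = { sec, nsec };

    assert(sizeof ts == sizeof words);   /* 8 bytes, no padding */
    return memcmp(&ts, words, sizeof ts) == 0;
}
```

Calling matches_long_pair(10, 0) compares the struct against the same two words that the assembler places at the sleeptime label.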

 

4) Entry point of the code section
###############################################################
.text
.global _start
.type _start, @function
###############################################################
The code section is defined here with .text, and _start is declared as the global entry symbol. The ld linker selects _start as the entry symbol by default. If _start is not declared global in the program, linking produces the following warning, but the program still runs, because ld falls back to the address shown in the warning.
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000008048074
Note that if _start is declared as the global entry point and /usr/lib/crt1.o is also linked in, the link fails: crt1.o already provides a _start entry point, so the symbol is defined more than once.
It is often said that the entry point of a C program is the main function, but that is not quite accurate. The real entry point is _start: by default crt1.o is linked together with the program, so execution begins at _start (the startup routine), which then calls main.

 

5) System calls
###############################################################
_start:
movl $SYS_nanosleep, %eax
movl $sleeptime, %ebx
int $0x80

movl $SYS_exit, %eax
movl $0, %ebx
int $0x80
###############################################################
The SYS_nanosleep and SYS_exit system calls are invoked here. The system call number is placed in %eax, the first argument in %ebx, and int $0x80 transfers control to the kernel. This is similar to the wrapped call used for C's read function:
n = syscall(SYS_read, fd, buffer, length);
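For comparison, the sleep half of the assembly program can be sketched in C with the same syscall() pattern. The function name raw_nanosleep is a hypothetical helper, and the sketch assumes that struct timespec from <time.h> has the layout the kernel expects for SYS_nanosleep, which holds on 64-bit Linux. The final SYS_exit step is left out so that the helper can return to its caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep through a raw SYS_nanosleep call.
 * As in the assembly, the system call number comes first, followed by
 * a pointer to the time value. Assumes struct timespec matches the
 * kernel layout for SYS_nanosleep (true on 64-bit Linux).
 * Returns 0 on success, -1 with errno set on error. */
long raw_nanosleep(time_t sec, long nsec)
{
    struct timespec ts = { .tv_sec = sec, .tv_nsec = nsec };

    assert(nsec >= 0 && nsec < 1000000000L);
    /* The second argument (remaining time) is not needed here. */
    return syscall(SYS_nanosleep, &ts, NULL);
}
```

A call such as raw_nanosleep(10, 0) would correspond to the sleeptime value used in the listing.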

 

2) Compiling and linking

The program above is built with the following command (the source file needs the .S suffix, with a capital S, so that gcc runs the C preprocessor on it and expands the #include line):
gcc -o basic -nostdlib basic.S

1) Why is -nostdlib needed?
Because this program has no main function and supplies its own _start. By default gcc links the standard startup files and libraries, and on some targets GCC also generates a call to __main at the beginning of main. With -nostdlib, the startup files are not linked, __main is not called, and -lgcc is not added at link time.

Add -v to print each step of the build:
gcc -o basic -nostdlib basic.S -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.2-100' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-targets=all --enable-cld --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.3.2 (Debian 4.3.2-1.1)
COLLECT_GCC_OPTIONS='-o' 'basic' '-nostdlib' '-v' '-mtune=generic'
 /usr/lib/gcc/i486-linux-gnu/4.3.2/cc1 -E -lang-asm -quiet -v basic.S -mtune=generic -fno-directives-only -o /tmp/ccsjdq1s.s
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i486-linux-gnu/4.3.2/.../i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i486-linux-gnu/4.3.2/include
 /usr/lib/gcc/i486-linux-gnu/4.3.2/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-o' 'basic' '-nostdlib' '-v' '-mtune=generic'
 as -V -Qy -o /tmp/cc6h5zxo.o /tmp/ccsjdq1s.s
GNU assembler version 2.18.0 (i486-linux-gnu) using BFD version (GNU Binutils for Debian) 2.18.0.20080103
COMPILER_PATH=/usr/lib/gcc/i486-linux-gnu/4.3.2/:/usr/lib/gcc/i486-linux-gnu/4.3.2/:/usr/lib/gcc/i486-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/4.3.2/:/usr/lib/gcc/i486-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/4.3.2/:/usr/lib/gcc/i486-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/i486-linux-gnu/4.3.2/:/usr/lib/gcc/i486-linux-gnu/4.3.2/:/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib/:/lib/../lib/:/usr/lib/../lib/:/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-o' 'basic' '-nostdlib' '-v' '-mtune=generic'
 /usr/lib/gcc/i486-linux-gnu/4.3.2/collect2 --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 -o basic -L/usr/lib/gcc/i486-linux-gnu/4.3.2 -L/usr/lib/gcc/i486-linux-gnu/4.3.2 -L/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.3.2/../../.. /tmp/cc6h5zxo.o

2) Compile and link steps with -nostdlib:

Step 1 (preprocess with cc1):
/usr/lib/gcc/i486-linux-gnu/4.3.2/cc1 -E -lang-asm -quiet -v basic.S -mtune=generic -fno-directives-only -o /tmp/ccsjdq1s.s
Step 2 (assemble with as):
as -V -Qy -o /tmp/cc6h5zxo.o /tmp/ccsjdq1s.s
Step 3 (link with collect2):
/usr/lib/gcc/i486-linux-gnu/4.3.2/collect2 --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 -o basic -L/usr/lib/gcc/i486-linux-gnu/4.3.2 -L/usr/lib/gcc/i486-linux-gnu/4.3.2 -L/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.3.2/../../.. /tmp/cc6h5zxo.o

3) Compile and link steps without -nostdlib:

Step 1 (preprocess with cc1):
/usr/lib/gcc/i486-linux-gnu/4.3.2/cc1 -E -lang-asm -quiet -v basic.S -mtune=generic -fno-directives-only -o /tmp/ccsjdq1s.s
Step 2 (assemble with as):
as -V -Qy -o /tmp/cc6h5zxo.o /tmp/ccsjdq1s.s
Step 3 (link with collect2):
/usr/lib/gcc/i486-linux-gnu/4.3.2/collect2 --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 -o basic /usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib/crt1.o /usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib/crti.o /usr/lib/gcc/i486-linux-gnu/4.3.2/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.3.2 -L/usr/lib/gcc/i486-linux-gnu/4.3.2 -L/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.3.2/../../.. /tmp/cccfpbfe.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i486-linux-gnu/4.3.2/crtend.o /usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib/crtn.o

Note: for this program, linking without -nostdlib fails with an error.

Comparing the two builds shows that only step 3, the collect2 invocation, differs: without -nostdlib it also links crt1.o, crti.o, crtbegin.o, crtend.o, and crtn.o, together with the libraries -lgcc, -lgcc_s, and -lc.

4) What is collect2?
collect2 is a wrapper around the ld linker; in the end it still calls ld to do the actual linking. Its purpose is to collect specially named symbols from the object files so that their code can run before the code of the main function.
These special symbols mark global constructors, that is, functions that must be executed before main. collect2 generates a temporary .c file, gathers the addresses of these symbols into an array in that file, compiles it, and links it into the final output file together with the other object files.
Because -nostdlib is used here, __main is not called, the object files that main would need are not linked, and no special symbols are collected.
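The idea of code that runs before main can be sketched in C with the constructor attribute, a GCC and Clang extension. The names init_calls, early_init, ran_before_main, and require_early_init are hypothetical and exist only for this illustration. On ELF targets such as Linux the toolchain usually registers such functions through dedicated sections rather than through a file generated by collect2, so this sketch shows the effect, not the exact mechanism described above.

```c
#include <assert.h>

/* Counts how many times the initialisation hook has run. */
int init_calls = 0;

/* GCC/Clang extension: the toolchain arranges for the startup code to
 * call this function before main() begins. */
__attribute__((constructor))
void early_init(void)
{
    init_calls++;
}

/* Reports whether the hook has already run exactly once; intended to
 * be called from main(). */
int ran_before_main(void)
{
    return init_calls == 1;
}

/* Aborts the program if the hook did not run before the caller. */
void require_early_init(void)
{
    assert(ran_before_main());
}
```

Calling require_early_init() as the first statement of main should succeed in a normal build; in a -nostdlib build with a hand-written _start, nothing calls such hooks unless the program does so itself.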

 

5) Why does the build fail without -nostdlib?

Building without -nostdlib produces the following errors:

gcc -o basic basic.S
/tmp/ccanei0c.o: In function '_start':
(.text+0x0): multiple definition of '_start'
/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib/crt1.o:(.text+0x0): first defined here
/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib/crt1.o: In function '_start':
(.text+0x18): undefined reference to 'main'
collect2: ld returned 1 exit status

There are two errors.
First error:
(.text+0x0): multiple definition of '_start'
It tells us that _start is defined in more than one place. Our program defines _start only once, so where is the other definition?
The answer is in /usr/lib/crt1.o.
View the crt1.o symbol table:
nm /usr/lib/crt1.o
00000000 R _IO_stdin_used
00000000 D __data_start
         U __libc_csu_fini
         U __libc_csu_init
         U __libc_start_main
00000000 R _fp_hw
00000000 T _start
00000000 W data_start
         U main
We can see that crt1.o defines _start (type T, in the text section) and refers to main as an undefined symbol (type U): _start runs first and main is called afterwards.

Why is there no such error when -nostdlib is added? Because with -nostdlib crt1.o is not linked, so its symbols never enter the link and no conflict arises.

What if the assembly program uses a different name for its entry point?

Then the first error is no longer reported.
For example:
gcc -o basic basic.S
/usr/lib/gcc/i486-linux-gnu/4.3.2/../../../../lib/crt1.o: In function '_start':
(.text+0x18): undefined reference to 'main'
collect2: ld returned 1 exit status

What does the error (.text+0x18): undefined reference to 'main' mean?
Because -nostdlib is not given, crt1.o is still linked. Its _start still refers to main, and the assembly program has no main function, so the linker reports an error.
This is the cause of the second error.

 

3) Use the strace command to analyze this program

The command is as follows:
strace -t ./basic
07:27:56 execve("./basic", ["./basic"], [/* 17 vars */]) = 0
07:27:56 nanosleep({5, 0}, NULL) = 0
07:28:01 _exit(0) = ?

Three system calls are involved here:
1) execve("./basic", ["./basic"], [/* 17 vars */]) = 0
To run a program, the parent process forks a child process, and the child calls an exec function to start the new program.
There are six exec functions in total. Of these, only execve is a kernel-level system call; the others (execl, execle, execlp, execv, execvp) are library functions that call execve.
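The fork-then-exec sequence described above can be sketched in C as follows. The function name run_and_wait is a hypothetical helper introduced for this illustration, and the usage example assumes that /bin/sh exists on the system.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, replace it with the program at
 * path through execve(), and return the child's exit status.
 * Returns -1 if fork or waitpid fails or the child ends abnormally. */
int run_and_wait(const char *path, char *const argv[], char *const envp[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execve(path, argv, envp);  /* returns only on error */
        _exit(127);                /* exec failed in the child */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing "/bin/sh" with the arguments "sh", "-c", "exit 7" should make the helper return 7. Running such a program under strace would show the execve call in the child, much like the first line of the trace above.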

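2) nanosleep({5, 0}, NULL) = 0
This corresponds to the following system call in our assembly program:
movl $SYS_nanosleep, %eax
movl $sleeptime, %ebx
int $0x80
It puts the program to sleep for the requested time. Note that this trace shows a sleep of 5 seconds ({5, 0}, from 07:27:56 to 07:28:01), so it was apparently recorded with sleeptime set to 5 rather than the 10 used in the listing.

3) _exit(0) = ?
This corresponds to the following system call in our assembly program:
movl $SYS_exit, %eax
movl $0, %ebx
int $0x80
It exits the program with status 0.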
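The effect of the raw exit call can be observed from C by issuing SYS_exit in a child process and reading the status in the parent. The function name child_exit_status is a hypothetical helper used only for this sketch. Unlike the library function exit(), the raw call does not run atexit handlers or flush stdio buffers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates with a raw
 * SYS_exit call, then return the exit status seen by the parent.
 * Returns -1 if fork or waitpid fails or the child ends abnormally. */
int child_exit_status(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        syscall(SYS_exit, code);   /* same call as the assembly makes */
        _exit(127);                /* not reached; defensive fallback */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling child_exit_status(0) mirrors the movl $0, %ebx argument in the listing; other small values are passed through to the parent in the same way.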

 

Each of these three system calls switches the program from user mode to kernel mode. The strace command traces the system calls as they enter the kernel and prints their arguments and results.
