[Relationship between Assembly and C Language] 2. The main Function and the Startup Routine
Why is the entry point of an assembly program _start, while the entry point of a C program is the main function? Let's work through the answer.
In "x86 Assembler Basics (AT&T syntax)", we assembled and linked a program with these steps:
$ as hello.s -o hello.o
$ ld hello.o -o hello
When we compile a C program with gcc main.c -o main, gcc actually performs three steps: compilation, assembly, and linking. They can also be run one at a time:
$ gcc -S main.c         # generate assembly code (main.s)
$ gcc -c main.s         # generate the object file (main.o)
$ gcc main.o -o main    # generate the executable file
In "x86 Assembler Basics (AT&T syntax)", we used ld to link the object file hello.o produced by our first assembly program. Could we link it with gcc instead? Let's try:
Two errors are reported:
1. _start is defined more than once: one definition is in our assembly code, and the other is in /usr/lib/crt1.o.
2. crt1.o uses the symbol main, but our assembly code does not define main.
The last line of the output shows that these errors come from ld. So when we link with gcc, gcc actually calls ld to link the object file crt1.o together with the hello.o we wrote.
If the object file was compiled from a C program, linking it with gcc is exactly right. The entry point of the whole program is then the _start provided by crt1.o, which first performs some initialization (hereinafter called the startup routine) and then calls the main function provided by the C code. So _start is the real entry point, and main is called by _start. In the previous article, [Relationship between Assembly and C Language] 1. Function Calls, the command gcc main.o -o main actually called ld to do the linking, roughly equivalent to the following command:
$ ld /usr/lib/crt1.o /usr/lib/crti.o main.o -o main -lc -dynamic-linker /lib/ld-linux.so.2
Besides crt1.o there is also crti.o; these two object files are linked together with our main.o to produce the executable main. -lc means that the libc library must be linked in. gcc passes it by default, so it need not be written, but for ld it is not a default. -dynamic-linker /lib/ld-linux.so.2 specifies /lib/ld-linux.so.2 as the dynamic linker.
We can use readelf to examine crt1.o and crti.o. Here we only care about the symbol table, which can be shown with readelf -s or with the nm command.
$ nm /usr/lib/crt1.o
00000000 R _IO_stdin_used
00000000 D __data_start
         U __libc_csu_fini
         U __libc_csu_init
         U __libc_start_main
00000000 R _fp_hw
00000000 T _start
00000000 W data_start
         U main
$ nm /usr/lib/crti.o
         U _GLOBAL_OFFSET_TABLE_
         w __gmon_start__
00000000 T _fini
00000000 T _init
The line U main means that crt1.o uses the symbol main but does not define it (U stands for Undefined), so some other object file must provide the definition and be linked with crt1.o. Concretely, crt1.o needs the address that the symbol main stands for. For example, it contains an instruction push $main that pushes this address, but the address is not known yet, so in crt1.o the operand is temporarily written as $0x0. Only when crt1.o and main.o are linked into an executable does the address become known, say 0x80483c4; the linker then changes that instruction in the executable main to push $0x80483c4. Here the linker performs symbol resolution. The linker also performs relocation. Both jobs are done by editing object files: just as vi and other editors edit source files, the linker edits object files, which is why it is also called a link editor. The line T _start means that crt1.o defines the symbol _start, and that the symbol is code (T stands for Text). We pick a few symbols from the output above and use a diagram to illustrate how they relate:
The ld command we wrote above is simplified: gcc actually uses several more object files during linking, which is why the diagram includes an extra box showing that the executable main is built from more than just main.o, crt1.o and crti.o. The gcc -v option shows the compilation process in more detail.
The executable main produced by linking contains the symbols defined in each of the object files, and disassembling it (for example with objdump -d main) shows where these symbols are defined:
The symbol main, undefined in crt1.o, is defined in main.o, so linking them together works. crt1.o also has an undefined symbol __libc_start_main that no other object file defines, so it remains undefined in the executable main. This symbol is defined in libc, but libc is not linked into main the way the other object files are; instead it is linked dynamically:
1. When the operating system loads and runs main, it first checks whether the program has undefined symbols that require dynamic linking.
2. If dynamic linking is needed, it checks which shared libraries the program depends on (we specified libc with -lc) and which dynamic linker to use (we specified /lib/ld-linux.so.2 with -dynamic-linker).
3. The dynamic linker looks up the definitions of these symbols in the shared libraries and completes the linking.
With that background, let's look at the disassembly of _start:
_start first pushes a series of arguments onto the stack and then calls the libc function __libc_start_main to do the initialization. The last argument pushed, push $0x80483c4, is the address of main: __libc_start_main calls main once initialization is complete. Because __libc_start_main is linked dynamically, its instructions cannot be found in the disassembly of the executable main. However, we did find this:
At first I thought this was libc's code linked into main, but it is not: these three instructions live in the .plt section rather than the .text section, and the .plt section exists to assist dynamic linking.
The prototype of main is int main(int argc, char *argv[]), which means the startup routine passes two arguments to main.
Because main is called by the startup routine, when main returns, its return value goes back to the startup routine. If we express the startup routine as equivalent C code (in reality the startup routine is usually written directly in assembly), main is called like this:
exit(main(argc, argv));
That is, as soon as the startup routine gets main's return value, it calls exit with that value as the argument. exit is also a libc function: it first does some cleanup for the process and then calls the _exit system call to terminate it, so main's return value is ultimately passed to _exit and becomes the exit status of the process. We can also call exit directly from main to terminate the process without returning to the startup routine.
Note that the exit status is only 8 bits wide and the shell interprets it as an unsigned number. If main ends with exit(-1); or return -1;, then echo $? prints 255.
To call the _exit function, include the header file unistd.h (exit itself is declared in stdlib.h).