Experiment: The ELF File Format and How Programs Are Compiled and Linked
I. How executable files are created
An overview of the path from source code to executable: the source code (.c, .cpp, .h) is processed by the C preprocessor (cpp) into a .i file; the compiler (cc1, or cc1plus for C++) compiles that into a .s file; the assembler (as) assembles the .s file into a .o file; and the linker (ld) links the .o files into an executable. gcc is a driver that wraps cpp, cc1 (cc1plus), as and ld, and invokes the appropriate back-end program according to the options it is given.
Take the Hello World program as an example:
gcc -E -o hello.cpp hello.c -m32                   # preprocess: generate the preprocessed file
# hello.cpp: preprocessing pulls in the #include files and performs macro replacement.
# The .cpp name is unusual (.i is conventional), so the next step uses -x cpp-output to say the input is preprocessed C.
gcc -x cpp-output -S -o hello.s hello.cpp -m32     # compile into assembly code, hello.s
gcc -x assembler -c hello.s -o hello.o -m32        # assemble into object code, giving the binary file hello.o
gcc -o hello hello.o -m32                          # link into the executable file hello
./hello                                            # run hello
II. The composition of executable files
(1) There are three main types of ELF files:
① Relocatable file: holds code and data suitable for linking with other object files to create an executable or a shared object file. Example: .o files.
② Executable file: a file that can be run. It specifies how exec(BA_OS) creates the program's process image. This brings back the distinction between a program and a process: is this executable file a process or a program? It contains code (.text) and data (.data) segments but no stack segment, so it is clearly just a program. Only when the operating system loads it into memory does it become a process. Example: a.out.
③ Shared object file: holds code and data suitable for linking in two contexts. First, the link editor can process it together with other relocatable and shared object files to create another object file. Second, the dynamic linker combines it with an executable file and other shared objects to create a process image.
(2) The ELF header: use the command readelf -h hello to view the header of the hello file:
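The fields that readelf -h prints come straight from the fixed-size ELF header at the start of the file. As a rough sketch of what that header contains, the C code below decodes it using the Elf32_Ehdr and Elf64_Ehdr types from glibc's <elf.h>. The struct elf_summary type and the two function names are hypothetical helpers introduced here, and the code assumes the file's byte order matches the host's.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the fields readelf -h reports, for 32- and 64-bit ELF. */
struct elf_summary {
    int      is64;   /* 1 for ELFCLASS64, 0 for ELFCLASS32 */
    uint16_t type;   /* e_type: ET_REL, ET_EXEC, ET_DYN, ... */
    uint64_t entry;  /* e_entry: virtual address where execution starts */
    uint64_t phoff;  /* file offset of the program header table */
    uint16_t phnum;  /* number of program headers */
    uint64_t shoff;  /* file offset of the section header table */
    uint16_t shnum;  /* number of section headers */
};

/* Map e_type to the three file kinds described above. */
const char *elf_type_name(uint16_t type)
{
    switch (type) {
    case ET_REL:  return "relocatable";
    case ET_EXEC: return "executable";
    case ET_DYN:  return "shared object";
    default:      return "other";
    }
}

/* Decode the ELF header at the start of buf (e.g. the first bytes of a file).
 * Returns 0 on success, -1 if buf does not hold a complete ELF header. */
int parse_elf_header(const unsigned char *buf, size_t len, struct elf_summary *out)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                       /* missing "\177ELF" magic */
    if (buf[EI_CLASS] == ELFCLASS64 && len >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);       /* copy to avoid unaligned reads */
        out->is64 = 1;
        out->type = h.e_type;   out->entry = h.e_entry;
        out->phoff = h.e_phoff; out->phnum = h.e_phnum;
        out->shoff = h.e_shoff; out->shnum = h.e_shnum;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS32 && len >= sizeof(Elf32_Ehdr)) {
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);
        out->is64 = 0;
        out->type = h.e_type;   out->entry = h.e_entry;
        out->phoff = h.e_phoff; out->phnum = h.e_phnum;
        out->shoff = h.e_shoff; out->shnum = h.e_shnum;
        return 0;
    }
    return -1;
}
```

To try it on the hello binary, read its first 64 bytes (sizeof(Elf64_Ehdr)) with fread and pass that buffer in. For the -m32 build, is64 comes back 0, and type should be ET_EXEC (or ET_DYN if gcc produced a position-independent executable).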
The ELF header holds the metadata: a roadmap describing how the file is organized. The program header table, for example, tells the system how to create the process's memory image. The section header table contains information describing the file's sections: each section has an entry in the table, and each entry gives the section's name, size and so on. The rest of the ELF file consists of the sections themselves, such as code and data; these are loaded into the process's virtual address space when the program becomes a process image. The entry point address in the ELF header shows that the code really starts at 0x400430, which is the program's true entry point. A statically linked executable carries all the code it needs in its own code segment; a dynamically linked one does not, and at run time it locates library functions in shared libraries already loaded into memory.
III. Loading executable programs
(1) The execution environment of an executable: the shell. The shell is the console in which the user types commands to load and run executables. In essence, the shell provides a user interface that parses the string a user types into the command or executable program that is actually run. Two questions arise:
① What program or instruction actually does the executing? The answer is the execve system call (the exec* library functions are all wrappers around execve).
② How are arguments passed to the system call, i.e., in what format? Answer: besides the file name, the shell passes execve two kinds of arguments: the program's own arguments (argc and argv of main) and the shell's environment variables (the envp string array). The prototype of execve is:
int execve(const char *filename, char *const argv[], char *const envp[]);
Of course, some programs' main functions do not handle the environment-variable argument, such as the common form int main(int argc, char *argv[]).
Sometimes, however, main does accept it, in which case the environment variables passed in are available for the program to parse:
int main(int argc, char *argv[], char *envp[])
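Whichever form main is declared with, the arguments reach it through the new process's user stack. The sketch below imitates, inside an ordinary buffer, the basic layout the Linux kernel builds at the top of that stack: argc, then the argv pointers and a NULL, then the envp pointers and a NULL, with the strings themselves stored above. It is a simplified model, not kernel code: the real layout also holds the auxiliary vector plus extra alignment and padding, and build_initial_stack is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Number of entries in a NULL-terminated array such as argv or envp. */
static size_t count_strings(char *const v[])
{
    size_t n = 0;
    while (v[n] != NULL)
        n++;
    return n;
}

/* Simplified model of the initial user stack (no auxiliary vector):
 *   sp -> argc
 *         argv[0] ... argv[argc-1], NULL
 *         envp[0] ... envp[envc-1], NULL
 *         argument and environment strings (highest addresses)
 * Strings go to the top of buf; the pointer block sits just below them,
 * aligned to sizeof(uintptr_t). Returns sp, or NULL if buf is too small. */
uintptr_t *build_initial_stack(char *buf, size_t size,
                               char *const argv[], char *const envp[])
{
    size_t argc = count_strings(argv), envc = count_strings(envp);
    size_t nslots = 1 + argc + 1 + envc + 1;    /* argc, argv, NULL, envp, NULL */
    size_t strbytes = 0;
    for (size_t i = 0; i < argc; i++) strbytes += strlen(argv[i]) + 1;
    for (size_t i = 0; i < envc; i++) strbytes += strlen(envp[i]) + 1;
    if (strbytes > size)
        return NULL;

    uintptr_t lo = (uintptr_t)buf;
    uintptr_t strstart = lo + size - strbytes;                        /* strings begin here */
    uintptr_t base = strstart & ~(uintptr_t)(sizeof(uintptr_t) - 1);  /* align down */
    if (base < lo || base - lo < nslots * sizeof(uintptr_t))
        return NULL;
    uintptr_t *sp = (uintptr_t *)(base - nslots * sizeof(uintptr_t));

    char *s = (char *)strstart;
    sp[0] = argc;
    for (size_t i = 0; i < argc; i++) {         /* copy argv strings, record pointers */
        size_t n = strlen(argv[i]) + 1;
        memcpy(s, argv[i], n);
        sp[1 + i] = (uintptr_t)s;
        s += n;
    }
    sp[1 + argc] = 0;                           /* argv terminator */
    for (size_t j = 0; j < envc; j++) {         /* same for envp */
        size_t n = strlen(envp[j]) + 1;
        memcpy(s, envp[j], n);
        sp[2 + argc + j] = (uintptr_t)s;
        s += n;
    }
    sp[2 + argc + envc] = 0;                    /* envp terminator */
    return sp;
}
```

In this layout argc is sp[0], argv starts at &sp[1] and envp at &sp[argc + 2]; on x86 Linux that is roughly where the startup code finds the three values it hands to the three-argument form of main.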
(2) How does execve pass the arguments from kernel mode to the user-mode stack of the new process (assuming the process runs in user mode)?
In the end, the strings are still copied from the calling process's user-space data onto a user-mode stack. So where does the kernel come in? The process that executes execve is the current shell, so the arguments are first passed into the kernel on the current shell process's kernel stack. What the kernel receives are pointers, so the kernel (sys_execve) follows them, copies the values they point to into the new program's memory, and places them on the process's user stack. The initialized process address space is shown in the second figure. This also explains what sys_execve does when it loads and initializes the process, and what results. The shell always forks a child shell to execute a command, so once the new program has taken over, the shell image that launched it is gone, and so are the arguments it had saved; this is why the kernel must copy them.
(3) Two kinds of dynamic linking
① Load-time dynamic linking: this approach assumes it is known before compilation exactly which dynamic-library functions will be called. At compile time only the necessary linking information is kept in the object file, not the library function code. When the program is executed, this information is used to load the library code into memory and link it into the calling program's execution space (all of the library's functions are loaded). The main purpose is to make code sharing easy. (The dynamic loader does this in the loading phase, chiefly to share code and the memory holding it.)
② Run-time dynamic linking: here it is not known before compilation which dynamic-library functions will be called. The program decides entirely while running which function to call, loads it into memory at that point (only what is called is brought in), determines its memory address and obtains the entry address of the dynamic-library function so it can call it; the loaded library can also be shared by other programs.
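Run-time dynamic linking is what the POSIX dlopen/dlsym interface provides. As a minimal sketch, assuming a glibc-based Linux system where the math library is installed as libm.so.6, the function below opens the library only when it is called, looks up cos by name, and calls it through the returned address. call_cos_at_runtime is a hypothetical name.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Run-time dynamic linking: library and function are named only by strings
 * and resolved while the program runs.
 * Stores cos(x) in *result and returns 0, or returns -1 on failure. */
int call_cos_at_runtime(double x, double *result)
{
    /* Map the shared library into this process now, not at load time. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Look up the entry address of "cos" in the loaded library; POSIX
     * requires that converting dlsym's void* result to a function pointer works. */
    double (*cos_fn)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cos_fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = cos_fn(x);   /* call through the address found at run time */
    dlclose(handle);
    return 0;
}
```

On glibc versions older than 2.34, dlopen lives in libdl, so link with -ldl. Load-time dynamic linking, by contrast, needs no such code: linking with -lm records the dependency, and the dynamic linker loads libm when the program starts.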
(Here too the dynamic library has only one copy in memory; the linking happens in the running phase.)
IV. Using GDB to trace sys_execve, the kernel handler for the execve system call
(1) Add a command that invokes the execve system call:
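The command's source survives only as a screenshot in the original, so it is not reproduced here. A command like this typically follows the shell's fork-then-execve pattern described in section III; the sketch below shows that pattern in user space under that assumption, with run_program as a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an "exec" command: fork a child and let the
 * child replace its image with execve, as a shell does.
 * Returns the child's exit status, or -1 on error. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {
        /* Child: on success execve never returns; the new program starts at
         * its ELF entry point with argv/envp on its fresh user stack. */
        execve(path, argv, envp);
        _exit(127);                     /* reached only if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)   /* parent waits, like a shell */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing {"/bin/sh", "-c", "exit 3", NULL} as argv with path "/bin/sh" returns 3, and a nonexistent path returns 127 because execve fails in the child.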
(2) Find the command that boots the kernel in the Makefile:
(3) After booting the kernel, find the newly added exec command and run it; the newly loaded program prints "Hello World":
(4) With the kernel frozen, attach gdb for tracing and set breakpoints:
(5) Step into do_execve:
(6) Continue until load_elf_binary:
(7) After running list, you can see that elf_interpreter is empty for a statically linked program:
(8) Continue executing and trace to start_thread:
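The distinction observed in step (7) comes from the program header table: load_elf_binary sets the interpreter only when it finds a PT_INTERP program header, which names the dynamic linker. Below is a user-space sketch of the same check for 64-bit ELF files using the types from <elf.h>; elf64_has_interp is a hypothetical name, and a 32-bit version would use Elf32_Ehdr and Elf32_Phdr instead.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF image in buf has a PT_INTERP program header
 * (it names a dynamic linker), 0 if it has none (statically linked),
 * or -1 if buf is not a well-formed 64-bit ELF image. */
int elf64_has_interp(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(buf, ELFMAG, SELFMAG) != 0 ||
        buf[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, buf, sizeof eh);

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        uint64_t off = eh.e_phoff + (uint64_t)i * eh.e_phentsize;
        /* bounds-check before reading each program header */
        if (eh.e_phentsize < sizeof ph || off > len || len - off < sizeof ph)
            return -1;
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type == PT_INTERP)
            return 1;
    }
    return 0;
}
```

For a hello built with -static the result should be 0, and for the default dynamically linked build it should be 1; readelf -l shows the same information as an INTERP entry.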
V. Summary: the new executable starts executing at new_ip. start_thread in effect sets the next instruction executed on returning to user mode from int 0x80 to the entry point of the newly loaded executable; that is, it modifies the EIP value saved on the kernel stack so that it becomes the new program's starting point. When the process reaches the execve system call, it traps into kernel mode, and execve overwrites the current process with the loaded executable; when the system call returns, it returns to the starting point of the new executable (its entry point, from which execution eventually reaches main). That is why the new program can run smoothly once execve returns. As for where execve returns for statically and dynamically linked executables: if the executable is statically linked, elf_entry points to the entry address given in the executable's header (an address of the form 0x8048***, where the startup code that calls main begins); if it depends on dynamic-link libraries, elf_entry points to the starting point of the dynamic linker.
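The summary boils down to two decisions at the end of load_elf_binary: which entry address to use, and writing it, together with the new stack pointer, into the user registers saved on the kernel stack. The sketch below is a simplified user-space model of those two steps, not kernel code; struct user_regs_model, choose_entry and start_thread_model are hypothetical names, and the dynamic case assumes the interpreter's entry is its load base plus its own e_entry.

```c
#include <stdint.h>

/* Hypothetical model of the user registers saved on the kernel stack. */
struct user_regs_model {
    uintptr_t ip;   /* saved user-mode instruction pointer (EIP/RIP) */
    uintptr_t sp;   /* saved user-mode stack pointer (ESP/RSP) */
};

/* Static executable: start at the program's own e_entry.
 * Dynamic executable: start in the dynamic linker (load base + its e_entry). */
uintptr_t choose_entry(uintptr_t prog_entry, int has_interp,
                       uintptr_t interp_base, uintptr_t interp_entry)
{
    return has_interp ? interp_base + interp_entry : prog_entry;
}

/* Like start_thread(regs, new_ip, new_sp): overwrite the saved registers so
 * that returning from the system call lands in the new program instead of
 * returning to the code that called execve. */
void start_thread_model(struct user_regs_model *regs,
                        uintptr_t new_ip, uintptr_t new_sp)
{
    regs->ip = new_ip;
    regs->sp = new_sp;
}
```

In the static case ip ends up equal to the program's e_entry; in the dynamic case it points into the dynamic linker, which loads the shared libraries and then jumps to the program's own entry point.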
2017-2018-1 20179215 "Linux Kernel Fundamentals and Analysis" Week 8 assignment