Program compilation and linking process
Let's start with HelloWorld...
#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Hello World!\n");
    return 0;
}
Building Hello.exe from the source file Hello.cpp goes through the following steps: preprocessing, compilation, assembly, and linking.
The following commands generate the executable file directly from the source file:
Linux:
gcc Hello.cpp -lstdc++ -o Hello.out   // -lstdc++ is required; otherwise the error undefined reference to '__gxx_personality_v0' is reported
g++ Hello.cpp -o Hello.out
Note: gcc treats a file with the .c suffix as C code, whereas g++ treats it as C++ code. Both gcc and g++ are drivers; the compilers they finally invoke are cc1 (for C code) and cc1plus (for C++ code).
In addition, gcc does not link the C++ standard library automatically at the link stage, so -lstdc++ has to be passed explicitly.
Windows:
cl Hello.cpp /link -out:Hello.exe
Preprocessing: mainly performs text substitution on the code. (The substitution is a recursive, layer-by-layer expansion.)
(1) Delete all #define directives and expand all macro definitions.
(2) Process all conditional compilation directives, such as #if, #ifdef, #elif, #else, and #endif.
(3) Process #include directives by inserting the included file at the position of the directive. This process is recursive.
(4) Delete all comments, both // and /* */.
(5) Add line-number and file-name markers so that line information is available for debugging and for compiler errors and warnings.
(6) Keep all #pragma directives, because the compiler needs them.
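As a small illustration of steps (1), (2), and (5), here is a minimal sketch. The macros SQUARE, AREA, and USE_FAST_PATH and all function names are hypothetical and are not part of Hello.cpp; they only make the effect of the preprocessor visible from inside a program.

```cpp
#include <cassert>
#include <string>

// Hypothetical macros, used only for this illustration.
#define SQUARE(x) ((x) * (x))
#define AREA(w)   SQUARE(w)      // expands in two layers: AREA -> SQUARE -> ((w) * (w))
#define USE_FAST_PATH 1

// Step (1): after preprocessing, the return statement reads "return ((side) * (side));".
int area_of_square(int side) {
    assert(side >= 0);           // a side length is never negative
    return AREA(side);
}

// Step (2): only one branch of the conditional survives preprocessing.
std::string selected_path() {
#if USE_FAST_PATH
    return "fast";
#else
    return "slow";
#endif
}

// Step (5): __LINE__ is replaced by the current source line number, the same
// information that the line markers pass on to the compiler.
int first_marker_line()  { return __LINE__; }
int second_marker_line() { return __LINE__; }
```

Running the file through g++ -E shows the expanded text directly: the macro names disappear, and only the "fast" branch remains.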
Linux:
cpp Hello.cpp > Hello.i
gcc -E Hello.cpp -o Hello.i
g++ -E Hello.cpp -o Hello.i
Explanation of the line-number and file-name markers:
# 32 "/usr/include/bits/types.h" 2 3 4   // the lines that follow start at line 32 of types.h
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
typedef unsigned long int __u_long;
The numbers 2 3 4 at the end of the # line above are flags with the following meanings:
1: a new file is being opened
2: returning to the previous file
3: the following code comes from a system header
4: the following code is implicitly wrapped in extern "C"
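The marker format described above can be decoded mechanically. The following is a minimal sketch, assuming the marker has the shape # line "file" flags; the LineMarker struct and the parse_line_marker helper are hypothetical names introduced for this example, and file names containing escaped quotes are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Decoded form of a GCC preprocessor line marker such as
//   # 32 "/usr/include/bits/types.h" 2 3 4
struct LineMarker {
    int line = 0;
    std::string file;
    bool enters_file = false;      // flag 1
    bool returns_to_file = false;  // flag 2
    bool system_header = false;    // flag 3
    bool extern_c = false;         // flag 4
};

// Hypothetical helper: parses a single marker line.
LineMarker parse_line_marker(const std::string& text) {
    LineMarker m;
    std::istringstream in(text);
    char hash = 0;
    in >> hash >> m.line;
    assert(hash == '#');

    // The file name sits between the first and the last double quote.
    std::string rest;
    std::getline(in, rest);
    const std::size_t open = rest.find('"');
    const std::size_t close = rest.rfind('"');
    assert(open != std::string::npos && close > open);
    m.file = rest.substr(open + 1, close - open - 1);

    // Everything after the closing quote is a list of numeric flags.
    std::istringstream flags(rest.substr(close + 1));
    int flag = 0;
    while (flags >> flag) {
        if (flag == 1) m.enters_file = true;
        if (flag == 2) m.returns_to_file = true;
        if (flag == 3) m.system_header = true;
        if (flag == 4) m.extern_c = true;
    }
    return m;
}
```

For the marker shown earlier, the parsed result has line 32, the types.h path as the file name, and flags 2, 3, and 4 set.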
To suppress the line-number and file-name markers:
cpp -P Hello.cpp > Hello.i
gcc -E -P Hello.cpp -o Hello.i
g++ -E -P Hello.cpp -o Hello.i
Windows:
cl /E Hello.cpp > Hello.i
Explanation of the line-number and file-name markers:
#line 283 "C:\\Program Files\\Microsoft Visual Studio\\VC98\\include\\stdio.h"   // the lines that follow start at line 283 of stdio.h
void __cdecl clearerr(FILE *);
int __cdecl fclose(FILE *);
int __cdecl _fcloseall(void);
To suppress the line-number and file-name markers:
cl /EP Hello.cpp > Hello.i
Compilation: runs the preprocessed file through lexical analysis (Lex), syntax analysis (Yacc), semantic analysis, and optimization, and then generates assembly code. This is the core of the program build process.
Linux:
/usr/lib/gcc/i586-suse-linux/4.1.2/cc1 Hello.cpp
The generated "Hello.s" file is shown below (because Hello.cpp uses no C++ features, it can also be compiled with the C compiler cc1):
    .file   "Hello.cpp"
    .section    .rodata
.LC0:
    .string "Hello World!"
    .text
.globl main
    .type   main, @function
main:
    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ecx
    subl    $4, %esp
    movl    $.LC0, (%esp)
    call    puts
    movl    $0, %eax
    addl    $4, %esp
    popl    %ecx
    popl    %ebp
    leal    -4(%ecx), %esp
    ret
    .size   main, .-main
    .ident  "GCC: (GNU) 4.1.2 20070115 (prerelease) (SUSE Linux)"
    .section    .note.GNU-stack,"",@progbits
For .cpp files that use C++ features, compile with cc1plus, or use the gcc command (which picks cc1 or cc1plus according to the file extension):
/usr/lib/gcc/i586-suse-linux/4.1.2/cc1plus Hello.cpp
gcc -S Hello.cpp -o Hello.s
g++ -S Hello.cpp -o Hello.s
Windows:
cl /FA /FaHello.asm Hello.cpp
The Hello.asm file generated by VC6 is as follows:
    TITLE   Hello.cpp
    .386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS    ENDS
_TLS    SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS    ENDS
FLAT    GROUP _DATA, CONST, _BSS
    ASSUME  CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC  _main
EXTRN   _printf:NEAR
_DATA   SEGMENT
$SG579  DB  'Hello World!', 0aH, 00H
_DATA   ENDS
_TEXT   SEGMENT
_main   PROC NEAR
; File Hello.cpp
; Line 7
    push    ebp
    mov     ebp, esp
; Line 8
    push    OFFSET FLAT:$SG579
    call    _printf
    add     esp, 4
; Line 9
    xor     eax, eax
; Line 10
    pop     ebp
    ret     0
_main   ENDP
_TEXT   ENDS
END
Assembly: translates assembly code into machine instructions.
Linux:
as Hello.s -o Hello.o
gcc -c Hello.cpp -o Hello.o
g++ -c Hello.cpp -o Hello.o
Windows:
cl /c /FoHello.obj Hello.cpp
At this point the generated object file is already structurally similar to the final executable file.
Linking: strictly speaking, the linking described here is static linking. Multiple object files and libraries are combined into the final executable file (a splicing process).
Executable file formats:
Linux ELF files: bin, a, so
Windows PE files: exe, lib, dll
Note: both PE and ELF are variants of the COFF format.
Linux:
ld -static /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/gcc/i586-suse-linux/4.1.2/crtbeginT.o -L/usr/lib/gcc/i586-suse-linux/4.1.2/ -L/usr/lib -L/lib Hello.o --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc/i586-suse-linux/4.1.2/crtend.o /usr/lib/crtn.o -o Hello.out
Notes:
-static: forces all -l options to use static linking;
-L: adds a search path for the external static and dynamic libraries being linked;
-l: names a library to link (the libraries finally linked here are libgcc.a, libgcc_eh.a, and libc.a);
--start-group ... --end-group: the enclosed items can only be file names or -l options. To make sure the symbols in these items can be resolved, the linker searches all of the items repeatedly.
This has a performance cost, so it is best used only when two or more of the items refer to each other circularly.
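The difference between a single left-to-right scan and the repeated search inside a group can be sketched with a toy model. This is a simplification for illustration, not how ld is actually implemented; Member, Archive, scan_archive, and unresolved_after_link are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One object file inside an archive: the symbols it defines and the ones it needs.
struct Member { std::set<std::string> defines, needs; };
using Archive = std::vector<Member>;

// Scans one archive once and pulls in every member that defines a symbol that is
// still undefined. Returns true if at least one member was pulled in.
bool scan_archive(const Archive& archive, std::set<std::string>& defined,
                  std::set<std::string>& undefined) {
    bool pulled_any = false;
    for (const Member& m : archive) {
        bool wanted = false;
        for (const std::string& s : m.defines) wanted = wanted || undefined.count(s) > 0;
        if (!wanted) continue;
        for (const std::string& s : m.defines) { defined.insert(s); undefined.erase(s); }
        for (const std::string& s : m.needs) if (!defined.count(s)) undefined.insert(s);
        pulled_any = true;
    }
    return pulled_any;
}

// Returns the symbols that are still undefined at the end. With `group` set, the
// archive list is rescanned until a full round pulls in nothing new.
std::set<std::string> unresolved_after_link(const std::vector<Archive>& archives,
                                            std::set<std::string> undefined, bool group) {
    std::set<std::string> defined;
    bool progress = true;
    while (progress) {
        progress = false;
        for (const Archive& a : archives)
            progress = scan_archive(a, defined, undefined) || progress;
        if (!group) break;  // plain mode: each archive is visited exactly once
    }
    return undefined;
}
```

With a first archive that defines a2 and a second one whose member defines b but needs a2, a program that needs b is left with a2 unresolved after a single scan, while the grouped scan resolves everything on its second round.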
Windows:
link /subsystem:console /out:Hello.exe Hello.obj
Static library: essentially an archive containing a set of intermediate object files, much like a zip file. The external symbol addresses in each intermediate file have not yet been fixed up by the linker.
View static library content
Linux:
ar -t libc.a
Windows:
lib /list libcmt.lib
Extract the contents of a static library
Linux: [extract all .o files in libc.a to the current directory]
ar -x /usr/lib/libc.a
Windows: [extract atof.obj from libcmt.lib to the current directory]
lib libcmt.lib /extract:build\intel\mt_obj\atof.obj
Generate static library
Linux:
ar -rf test.a main.o fun.o
Windows:
lib /out:test.lib main.obj fun.obj
Symbols: the interface of linking
Every function and variable has its own unique name, so that different variables and functions are not confused during linking.
In linking, functions and variables are collectively called symbols. The function or variable name is the symbol name, and the function or variable address is the symbol value.
Every object file has a symbol table containing the following kinds of symbols:
(1) Global symbols defined in this object file, which other object files can reference
For example, global variables and global functions
(2) Global symbols referenced in this object file but not defined in it (external symbols)
For example, extern variables, printf, and other library functions
(3) Section names. These symbols are generated by the compiler, and their values are the start addresses of the sections
For example, .text and .data in the object file
(4) Local symbols, visible only inside the compilation unit
For example, static variables
The linking process is concerned with the first and second kinds.
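The four kinds of symbols can be seen in one small translation unit. The sketch below is only an illustration; all names in it (g_counter, increment, format_counter, s_calls, note_call, increment_and_count_calls) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// (1) Global symbols defined here; other object files may reference them.
int g_counter = 0;
int increment() { return ++g_counter; }

// (2) External symbol: snprintf is referenced here but defined in the C library.
int format_counter(char* buffer, std::size_t size) {
    assert(buffer != nullptr);
    return std::snprintf(buffer, size, "%d", g_counter);
}

// (3) Section names such as .text and .data are emitted by the compiler itself;
//     nothing in the source declares them.

// (4) Local symbols: internal linkage, not visible to other object files.
static int s_calls = 0;
static void note_call() { ++s_calls; }

int increment_and_count_calls() {
    note_call();
    increment();
    return s_calls;
}
```

Compiling this with g++ -c and running nm on the resulting object file would typically list the first kind as defined global symbols, snprintf as undefined, and the static names as local symbols, although an optimizing build may inline note_call and drop it from the table.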
View symbols
Linux:
nm Hello.o
readelf -s Hello.o
objdump -t Hello.obj
You can install MinGW on windows to obtain these tools.
Windows:
dumpbin /symbols Hello.obj
Symbol decoration (Name Decoration)
Symbol decoration is in effect the process of renaming a variable or function. The factors that affect the resulting name include:
(1) Different languages have different decoration rules
For example, the function foo is decorated as _foo in C and as _foo_ in Fortran
(2) Features introduced by object-oriented languages (such as C++)
For example, classes, inheritance, virtual mechanisms, overloading, namespaces, and so on
----------------------------- MSVC compiler -----------------------------
The MSVC compiler uses the __cdecl calling convention by default (it can be changed under "C/C++" -- "Advanced" -- "Calling Convention"), while the Windows APIs use the __stdcall calling convention.
MSVC has two sets of decoration rules, one for C and one for C++:
Decoration rules for C function names (code wrapped in extern "C"):
1. The __stdcall calling convention adds an underscore prefix to the exported function name, followed by an "@" sign and the number of bytes of its parameters, in the format _functionname@number.
2. The __cdecl calling convention only adds an underscore prefix to the exported function name, in the format _functionname.
3. The __fastcall calling convention adds an "@" sign before the exported function name, followed by an "@" sign and the number of bytes of its parameters, in the format @functionname@number.
None of them changes the case of the exported function name. This differs from the PASCAL calling convention, whose exported function names are undecorated and all uppercase.
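The three C decoration rules above are simple enough to write down as a function. This is a minimal sketch of the rules as described for 32-bit x86; the CallConv enum and the decorate_c_name helper are hypothetical names, not part of any MSVC API.

```cpp
#include <cassert>
#include <string>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Applies the C decoration rules listed above.
// `arg_bytes` is the total size of the parameters in bytes.
std::string decorate_c_name(const std::string& name, CallConv conv, unsigned arg_bytes) {
    assert(!name.empty());
    switch (conv) {
    case CallConv::Cdecl:    return "_" + name;                                   // _functionname
    case CallConv::Stdcall:  return "_" + name + "@" + std::to_string(arg_bytes); // _functionname@number
    case CallConv::Fastcall: return "@" + name + "@" + std::to_string(arg_bytes); // @functionname@number
    }
    return name;  // not reached for valid enum values
}
```

For a function add taking two 4-byte int parameters, this yields _add, _add@8, and @add@8 for the three conventions.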
Decoration rules for C++ function names:
1. __stdcall calling convention:
(1) The decorated name starts with "?", followed by the function name;
(2) The function name is followed by "@@YG", which marks the start of the parameter table, and then by the parameter table;
(3) The parameter table is represented by code:
X -- void,
D -- char,
E -- unsigned char,
F -- short,
H -- int,
I -- unsigned int,
J -- long,
K -- unsigned long,
M -- float,
N -- double,
_N -- bool,
....
PA -- pointer; the code that follows gives the pointed-to type. If pointers of the same type appear consecutively, each repeat is replaced by "0"; one "0" stands for one repetition;
(4) The first item in the parameter table is the function's return type, followed by the parameter types in order; a pointer marker precedes the type it points to;
(5) "@Z" after the parameter table marks the end of the whole name. If the function takes no parameters, the name ends with "Z" instead.
The format is "?functionname@@YG*****@Z" or "?functionname@@YG*XZ", for example
int Test1(char *, unsigned long) ----- "?Test1@@YGHPADK@Z"
void Test2() ----- "?Test2@@YGXXZ"
2. __cdecl calling convention:
The rules are the same as for __stdcall above, except that the marker that starts the parameter table changes from "@@YG" to "@@YA".
3. __fastcall calling convention:
The rules are the same as for __stdcall above, except that the marker that starts the parameter table changes from "@@YG" to "@@YI".
Note: if a map file is generated, the decorated names of all functions and variables can be looked up in it.
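The C++ rules above can be sketched for the simplest case: a free function in the global namespace with built-in parameter types. The sketch uses the uppercase markers that MSVC emits ("@@YA", "@@YG", "@@YI", "@Z"), takes type codes that are already encoded by the caller, and does not implement the "0" repetition rule, classes, or namespaces. The function name decorate_cpp_function is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Type codes are passed in already encoded, following the table above:
// "X" void, "D" char, "H" int, "K" unsigned long, "PAD" char*, and so on.
// conv_code: 'A' for __cdecl, 'G' for __stdcall, 'I' for __fastcall.
std::string decorate_cpp_function(const std::string& name, char conv_code,
                                  const std::string& return_code,
                                  const std::vector<std::string>& param_codes) {
    assert(conv_code == 'A' || conv_code == 'G' || conv_code == 'I');
    std::string out = "?" + name + "@@Y";
    out += conv_code;
    out += return_code;                 // the return type comes first
    if (param_codes.empty()) {
        out += "XZ";                    // no parameters: "X" (void), then "Z"
    } else {
        for (const std::string& code : param_codes) out += code;
        out += "@Z";                    // end of the parameter table
    }
    return out;
}
```

Calling it with the name Test1, the __stdcall code 'G', return code "H", and parameter codes "PAD" and "K" reproduces the first example name given above.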
-------------------------------------------------------------------------
Function signature (Function Signature)
A function signature identifies a function uniquely. It includes the function name, the parameter types and count, the enclosing class and namespace, the calling convention, and other information.
The symbol decoration and function signature rules of Visual C++ are not published, but Microsoft provides the UnDecorateSymbolName API, which converts a decorated name back into a function prototype.
Use extern "C" to force the C++ compiler to apply the C decoration rules.
extern "C" int g_nTest1;
extern "C" int fun();

#ifdef __cplusplus
extern "C" {
#endif
    int g_nTest2 = 0;
    int add(int a, int b);
#ifdef __cplusplus
}
#endif
Weak symbols and strong symbols
For C/C++, the compiler by default treats functions and initialized global variables as strong symbols, and uninitialized global variables as weak symbols.
GCC can turn any strong symbol into a weak symbol with "__attribute__((weak))".
extern int __attribute__((weak)) ext;   // make the variable ext a weak symbol
int __attribute__((weak)) fun1();       // make the function fun1 a weak symbol
int fun2() __attribute__((weak));       // make the function fun2 a weak symbol
int weak1;
int strong = 1;
int __attribute__((weak)) weak2 = 2;    // force the variable weak2 to be a weak symbol
int main() { return 0; }
In the code above, weak1 and weak2 are weak symbols, while strong and main are strong symbols.
Given the notions of strong and weak symbols, the linker handles a global symbol that is defined more than once according to the following rules:
(1) A strong symbol must not be defined more than once; otherwise the linker reports a duplicate symbol definition error.
(2) If a symbol is strong in one object file and weak in all the others, the strong symbol is chosen.
(3) If a symbol is weak in every object file, the one that occupies the most space is chosen.
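The three selection rules can be modelled in a few lines. This is a toy model of the decision only, not linker code; the Definition struct and the choose_definition function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One definition of a global symbol as seen in some object file.
struct Definition {
    std::string object_file;
    bool strong;
    std::size_t size;  // bytes occupied by the symbol
};

// Applies rules (1) to (3) to all definitions of a single symbol name and
// returns the chosen one. Throws if two strong definitions are present.
const Definition& choose_definition(const std::vector<Definition>& defs) {
    assert(!defs.empty());
    const Definition* strong = nullptr;
    const Definition* largest_weak = nullptr;
    for (const Definition& d : defs) {
        if (d.strong) {
            if (strong) throw std::runtime_error("multiple definition");  // rule (1)
            strong = &d;
        } else if (!largest_weak || d.size > largest_weak->size) {
            largest_weak = &d;                                            // candidate for rule (3)
        }
    }
    return strong ? *strong : *largest_weak;  // rule (2) if a strong one exists, else rule (3)
}
```

A strong definition wins over any number of weak ones regardless of size; size only matters when every definition is weak.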
Weak references and strong references
A reference to a symbol defined in another object file must be resolved when the object files are finally linked into an executable. If no definition of the symbol is found, the linker reports an undefined-symbol error. This is called a strong reference;
Its counterpart is the weak reference. When handling a weak reference, the linker does not report an error even if the symbol is undefined; the reference defaults to 0 or some special value.
GCC can declare a reference to an external function as a weak reference with "__attribute__((weakref))".
#include <stddef.h>

// Note: newer GCC releases only accept weakref on a static declaration that names its target;
// a plain __attribute__((weak)) declaration has the same effect here.
__attribute__((weakref)) void fun();

int main()
{
    if (NULL != fun)
    {
        fun();
    }
}
Weak symbols and weak references are very useful for libraries. A weak symbol defined in a library can be overridden by a user-defined strong symbol, so the program can use its own version of a library function;
Or the program can declare weak references to some extension modules. When the extension module is linked with the program, the module can be used normally;
If some modules are removed, the program still links correctly and merely lacks the corresponding features, which makes it easier to trim and combine program features.
#include <stdio.h>
#include <math.h>

// declare the math library function abs as a weak symbol
int __attribute__((weak)) abs(int);

// re-implement an abs function
int abs(int a)
{
    return 0;
}

int main(int argc, char* argv[])
{
    int s = abs((int)-5);
    printf("s = %d\n", s);  // s = 0
    return 0;
}
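The optional-module use of weak references described above can be sketched as follows. The sketch assumes GCC or Clang on an ELF target such as Linux, and it uses a weak declaration rather than weakref. The hook optional_plugin_init is a hypothetical name and is deliberately left undefined.

```cpp
#include <cassert>

// Hypothetical extension hook. No definition is provided in this file, so the
// weak declaration turns every use of it into a weak reference.
extern "C" void optional_plugin_init() __attribute__((weak));

// Calls the hook only when some linked module defines it; reports whether it ran.
bool run_optional_plugin() {
    if (optional_plugin_init != nullptr) {  // an undefined weak symbol resolves to address 0
        optional_plugin_init();
        return true;
    }
    return false;
}
```

When no object file defines optional_plugin_init, the program still links and run_optional_plugin returns false; linking in a module that defines the function makes the same call run the hook.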
For the linker, the whole linking process consists of combining multiple input object files into one executable binary file.
Modern linkers basically all use a two-pass linking method:
(1) Space and address allocation
Scan all input object files, obtain the length, attributes, and position of each section, and collect all symbol definitions and symbol references from the symbol tables of the input object files into a single global symbol table.
In this pass the linker obtains the section lengths of all input object files and merges them, computing the length and position of each merged section in the output file and establishing the mapping.
(2) Symbol resolution and relocation
Using all the information collected in the first pass, read the section data and relocation information (the relocation tables) from the input files, perform symbol resolution and relocation, and adjust the addresses in the code (those of external symbols).
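The two passes can be made concrete with a toy model in which every object file has a single code section made of 32-bit words and addresses count words. This is a sketch under those assumptions, not a real linker; ObjectFile, Relocation, and link_objects are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct Relocation { std::size_t offset; std::string symbol; };  // word to patch, symbol to use
struct ObjectFile {
    std::vector<std::uint32_t> code;             // section contents
    std::map<std::string, std::size_t> defined;  // symbol -> offset inside this section
    std::vector<Relocation> relocations;         // relocation table
};

std::vector<std::uint32_t> link_objects(const std::vector<ObjectFile>& objects, std::uint32_t base) {
    // Pass 1: space and address allocation, building the global symbol table.
    std::map<std::string, std::uint32_t> globals;
    std::uint32_t next = base;
    for (const ObjectFile& obj : objects) {
        for (const auto& entry : obj.defined) {
            const std::uint32_t address = next + static_cast<std::uint32_t>(entry.second);
            if (!globals.emplace(entry.first, address).second)
                throw std::runtime_error("multiple definition of " + entry.first);
        }
        next += static_cast<std::uint32_t>(obj.code.size());
    }

    // Pass 2: copy section data, then resolve symbols and patch each relocation.
    std::vector<std::uint32_t> image;
    for (const ObjectFile& obj : objects) {
        const std::size_t start = image.size();
        image.insert(image.end(), obj.code.begin(), obj.code.end());
        for (const Relocation& r : obj.relocations) {
            assert(r.offset < obj.code.size());
            const auto found = globals.find(r.symbol);
            if (found == globals.end())
                throw std::runtime_error("undefined reference to " + r.symbol);
            image[start + r.offset] = found->second;
        }
    }
    return image;
}
```

The first loop has to finish before any patching starts, because a relocation in an early object file may refer to a symbol whose address is only known once all later sections have been placed.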
Reference
Programmer's Self-Cultivation: Linking, Loading and Libraries