The Program Compilation and Linking Process

Source: Internet
Author: User


Let's start with HelloWorld...

#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Hello World!\n");
    return 0;
}

The following steps turn the source file Hello.cpp into the executable Hello.exe:

Run the following command to generate an executable file from the source file:

Linux:

gcc Hello.cpp -lstdc++ -o Hello.out    // -lstdc++ is required (and goes after the source file, since libraries are searched in command-line order); otherwise linking C++ code can fail with "undefined reference to `__gxx_personality_v0'"
g++ Hello.cpp -o Hello.out

Note: gcc treats a file with the .c suffix as C code, while g++ treats it as C++ code. Both gcc and g++ are only drivers; the compilers they ultimately invoke are cc1 (for C code) and cc1plus (for C++ code).
In addition, gcc does not link the C++ standard library automatically during the link stage; the -lstdc++ option must be added to link it.

Windows:

cl Hello.cpp /link -out:Hello.exe

 

Preprocessing: this step mainly performs textual replacement on the source code (the replacement is a recursive, layer-by-layer expansion).

(1) Delete all #define directives and expand all macro definitions.

(2) Process all conditional preprocessing directives, such as #if, #ifdef, #elif, #else and #endif.

(3) Process #include directives by inserting the included file at the position of the directive. This process is recursive, since an included file may include other files.

(4) Delete all comments (// and /* */).

(5) Add line markers (line numbers and file names), so that the compiler can generate line information for debugging and report line numbers in errors and warnings.

(6) Keep all #pragma directives, because the compiler needs them.
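The six steps above can be observed on a small translation unit. The sketch below is hypothetical (the macro and function names are made up for illustration); running it through g++ -E should show the macros expanded and removed, only the active #if branch left, comments stripped, line markers added and the #pragma line kept for the compiler.

```cpp
// Illustrative sketch: all macro and function names here are hypothetical.
#include <cstdio>   // (3) #include: the header's text is pasted in here, recursively

// (1) Object-like and function-like macros: expanded textually, then the #defines disappear
#define PI_TIMES_100 314
#define SQUARE(x) ((x) * (x))   // the parentheses make SQUARE(1 + 2) expand to 9, not 5

// (2) Conditional compilation: only the active branch survives preprocessing
#define USE_FAST_PATH 1
#if USE_FAST_PATH
int scaled_area(int r) { return PI_TIMES_100 * SQUARE(r); }
#else
int scaled_area(int r) { return 0; }   /* (4) comments like this one are stripped */
#endif

// (6) #pragma directives are passed through unchanged for the compiler
#pragma GCC diagnostic ignored "-Wunused-parameter"

// (5) __FILE__ and __LINE__ come from the file/line information the preprocessor tracks
int format_location(char* buf, std::size_t n) {
    return std::snprintf(buf, n, "%s:%d", __FILE__, __LINE__);
}
```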

Linux:

cpp Hello.cpp > Hello.i
gcc -E Hello.cpp -o Hello.i
g++ -E Hello.cpp -o Hello.i

Explanation of the line-number and file-name markers:

# 32 "/usr/include/bits/types.h" 2 3 4    // the lines that follow are line 32 of types.h
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
typedef unsigned long int __u_long;

The numbers 2 3 4 at the end of the marker are flags with the following meanings:

1 - the start of a new file
2 - returning to this file (after an included file ends)
3 - the following text comes from a system header file
4 - the following text should be treated as wrapped in an implicit extern "C" block
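To make the marker format concrete, here is a minimal sketch of a parser for a GCC-style line marker. The struct and function names are hypothetical, and it assumes the simple form shown above (no escaped quotes inside the file name); MSVC's #line form is intentionally rejected.

```cpp
// Illustrative sketch: LineMarker and parse_line_marker are hypothetical names.
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One parsed GCC line marker:  # <line> "<file>" [flags...]
struct LineMarker {
    int line = 0;
    std::string file;
    std::vector<int> flags;   // 1 new file, 2 return to file, 3 system header, 4 implicit extern "C"
};

// Parses text such as  # 32 "/usr/include/bits/types.h" 2 3 4
// Returns false if the text is not a GCC line marker.
bool parse_line_marker(const std::string& text, LineMarker& out) {
    std::size_t pos = 0;
    auto skip_spaces = [&] { while (pos < text.size() && text[pos] == ' ') ++pos; };
    auto at_digit = [&] {
        return pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]));
    };

    if (text.empty() || text[0] != '#') return false;
    pos = 1;
    skip_spaces();
    if (!at_digit()) return false;          // "#line ..." (MSVC) or other directives
    out.line = 0;
    while (at_digit()) out.line = out.line * 10 + (text[pos++] - '0');   // decimal line number
    skip_spaces();
    if (pos >= text.size() || text[pos] != '"') return false;
    std::size_t close = text.find('"', pos + 1);
    if (close == std::string::npos) return false;
    out.file = text.substr(pos + 1, close - pos - 1);
    pos = close + 1;
    out.flags.clear();
    for (skip_spaces(); at_digit(); skip_spaces()) out.flags.push_back(text[pos++] - '0');
    return true;
}
```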

To suppress the line-number and file-name markers:

cpp -P Hello.cpp > Hello.i
gcc -E -P Hello.cpp -o Hello.i
g++ -E -P Hello.cpp -o Hello.i

Windows:

cl /E Hello.cpp > Hello.i

Explanation of the line-number and file-name markers:

#line 283 "C:\\Program Files\\Microsoft Visual Studio\\VC98\\include\\stdio.h"    // the lines that follow are line 283 of stdio.h
void __cdecl clearerr(FILE *);
int __cdecl fclose(FILE *);
int __cdecl _fcloseall(void);

To suppress the line-number and file-name markers:

cl /EP Hello.cpp > Hello.i

 

Compilation: the preprocessed file goes through lexical analysis (Lex), syntax analysis (Yacc), semantic analysis and optimization, after which assembly code is generated. This step is the core of building a program.
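As an illustration of the first of these stages, the sketch below performs a very small lexical analysis, splitting source text into identifiers, numbers, string literals and single-character punctuation. It is a toy with hypothetical names (no comments, character literals or multi-character operators), not how cc1 is actually implemented.

```cpp
// Illustrative sketch: TokKind, Token and tokenize are hypothetical names.
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokKind { Identifier, Number, String, Punct };

struct Token {
    TokKind kind;
    std::string text;
};

// Minimal lexical analysis: turns a character stream into tokens.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    auto is_ident = [](char ch) { return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_'; };
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        std::size_t start = i;
        if (std::isspace(c)) { ++i; continue; }                   // whitespace separates tokens
        if (std::isalpha(c) || c == '_') {
            while (i < src.size() && is_ident(src[i])) ++i;
            out.push_back({TokKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({TokKind::Number, src.substr(start, i - start)});
        } else if (c == '"') {
            ++i;                                                  // opening quote
            while (i < src.size() && src[i] != '"') {
                if (src[i] == '\\' && i + 1 < src.size()) ++i;    // skip the escaped character
                ++i;
            }
            if (i < src.size()) ++i;                              // closing quote
            out.push_back({TokKind::String, src.substr(start, i - start)});
        } else {
            ++i;                                                  // anything else: one-char punctuation
            out.push_back({TokKind::Punct, src.substr(start, 1)});
        }
    }
    return out;
}
```

Fed the body of the Hello World program, printf("Hello World!\n"); comes out as five tokens: the identifier printf, "(", the string literal, ")" and ";". Syntax analysis would then work on this token stream rather than on raw characters.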

Linux:

/usr/lib/gcc/i586-suse-linux/4.1.2/cc1 Hello.cpp

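The generated Hello.s file is shown below (because Hello.cpp uses no C++ features, it can also be compiled with the C compiler cc1). Note that GCC has replaced the printf call with puts: the string has no format directives and ends in a newline, so the two calls are equivalent.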

    .file    "Hello.cpp"
    .section    .rodata
.LC0:
    .string    "Hello World!"
    .text
.globl main
    .type    main, @function
main:
    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl    -4(%ecx)
    pushl    %ebp
    movl    %esp, %ebp
    pushl    %ecx
    subl    $4, %esp
    movl    $.LC0, (%esp)
    call    puts
    movl    $0, %eax
    addl    $4, %esp
    popl    %ecx
    popl    %ebp
    leal    -4(%ecx), %esp
    ret
    .size    main, .-main
    .ident    "GCC: (GNU) 4.1.2 20070115 (prerelease) (SUSE Linux)"
    .section    .note.GNU-stack,"",@progbits

For .cpp files that use C++ features, compile with cc1plus, or use the gcc/g++ driver (gcc uses the file extension to choose between cc1 and cc1plus):

/usr/lib/gcc/i586-suse-linux/4.1.2/cc1plus Hello.cpp
gcc -S Hello.cpp -o Hello.s
g++ -S Hello.cpp -o Hello.s

Windows:

cl /FA Hello.cpp    // also writes the assembly listing to Hello.asm

The Hello.asm file generated by VC6 is as follows:

    TITLE    Hello.cpp
    .386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT    SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT    ENDS
_DATA    SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA    ENDS
CONST    SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST    ENDS
_BSS    SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS    ENDS
_TLS    SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS    ENDS
FLAT    GROUP _DATA, CONST, _BSS
    ASSUME    CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC    _main
EXTRN    _printf:NEAR
_DATA    SEGMENT
$SG579    DB    'Hello World!', 0aH, 00H
_DATA    ENDS
_TEXT    SEGMENT
_main    PROC NEAR
; File Hello.cpp
; Line 7
    push    ebp
    mov    ebp, esp
; Line 8
    push    OFFSET FLAT:$SG579
    call    _printf
    add    esp, 4
; Line 9
    xor    eax, eax
; Line 10
    pop    ebp
    ret    0
_main    ENDP
_TEXT    ENDS
END

 

Assembly: assembly code -> machine instructions.

Linux:

as Hello.s -o Hello.o
gcc -c Hello.cpp -o Hello.o
g++ -c Hello.cpp -o Hello.o

Windows:

cl /c Hello.cpp    // produces Hello.obj

At this point the generated object file is already structurally similar to the final executable.

 

Linking: strictly speaking, the linking discussed here is static linking: multiple object files and libraries -> the final executable (a process of stitching them together).

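Executable file formats:

Linux: ELF - executables, .a static libraries (archives of ELF object files) and .so shared libraries

Windows: PE - .exe executables, .lib static libraries (archives of COFF object files) and .dll dynamic libraries

Note: PE is a variant of the COFF format, while ELF was designed on Unix as a replacement for COFF; both are section-based object formats.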

Linux:

ld -static /usr/lib/crt1.o /usr/lib/crti.o /usr/lib/gcc/i586-suse-linux/4.1.2/crtbeginT.o -L/usr/lib/gcc/i586-suse-linux/4.1.2/ -L/usr/lib -L/lib Hello.o --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc/i586-suse-linux/4.1.2/crtend.o /usr/lib/crtn.o -o Hello.out

Notes: -static: use static linking for all -l options; -L: add a directory to the search path for static and shared libraries;

-l: specifies a library by name (the libraries linked here are libgcc.a, libgcc_eh.a and libc.a);

--start-group ... --end-group: the enclosed items may only be file names or -l options. The linker searches the enclosed archives repeatedly until no new undefined references are created, so that symbols referenced between them are all resolved.

This repeated search has a performance cost, so it is best used only when two or more of the archives reference each other circularly.
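The effect of --start-group can be modeled in a few lines. This is a simplified sketch under stated assumptions, not ld's real algorithm: each archive member is reduced to the symbols it defines and the symbols it needs (all names hypothetical), a member is pulled in only when it defines a currently undefined symbol, and without a group each archive is visited once in command-line order.

```cpp
// Simplified model with hypothetical names; real linkers use the archive symbol index.
#include <set>
#include <string>
#include <vector>

struct Member {                       // one object file inside an archive
    std::string name;
    std::set<std::string> defines;    // global symbols it defines
    std::set<std::string> needs;      // undefined symbols it references
};
using Archive = std::vector<Member>;

// Scans one archive, repeatedly, loading any member that defines a currently
// undefined symbol. Returns true if at least one member was loaded.
bool scan_archive(const Archive& ar, std::set<std::string>& undefined,
                  std::set<std::string>& defined, std::set<const Member*>& loaded) {
    bool loaded_any = false;
    for (bool progress = true; progress;) {
        progress = false;
        for (const Member& m : ar) {
            if (loaded.count(&m)) continue;
            bool useful = false;
            for (const std::string& s : m.defines) useful = useful || undefined.count(s) > 0;
            if (!useful) continue;
            loaded.insert(&m);
            for (const std::string& s : m.defines) { defined.insert(s); undefined.erase(s); }
            for (const std::string& s : m.needs) if (!defined.count(s)) undefined.insert(s);
            progress = loaded_any = true;
        }
    }
    return loaded_any;
}

// Links against `archives` in command-line order and returns what is still undefined.
// as_group == false: each archive is visited once (plain -lA -lB).
// as_group == true:  all archives are rescanned until nothing new is loaded.
std::set<std::string> resolve(std::set<std::string> undefined,
                              const std::vector<Archive>& archives, bool as_group) {
    std::set<std::string> defined;
    std::set<const Member*> loaded;
    bool again = true;
    while (again) {
        again = false;
        for (const Archive& ar : archives)
            if (scan_archive(ar, undefined, defined, loaded) && as_group) again = true;
    }
    return undefined;
}
```

For example, if libA's a.o needs b_fn from libB, and libB's b.o in turn needs a_helper from libA, a single left-to-right pass leaves a_helper undefined because libA is not revisited after libB asks for it; the grouped scan resolves everything.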

Windows:

link /subsystem:console /out:Hello.exe Hello.obj

A static library is essentially an archive bundling a set of intermediate object files, much like a zip file (although ar archives are not compressed). The external symbol addresses inside each object file have not yet been fixed up by the linker.
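Because a static library is just an archive, listing its members the way ar -t does only requires walking a sequence of fixed-size headers. The sketch below assumes the common System V/GNU ar layout: an 8-byte "!<arch>\n" magic, then a 60-byte header per member whose size field is decimal text, with member data padded to an even length. It skips the "/" symbol index and the "//" long-name table and does not resolve long names; the function name is hypothetical.

```cpp
// Illustrative sketch: list_ar_members is a hypothetical name.
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Lists the member names of an in-memory System V/GNU "ar" archive, roughly
// like `ar -t`. Assumptions: GNU-style names terminated by '/', the "/" symbol
// index and "//" long-name table are skipped, long names ("/123") are not resolved.
std::vector<std::string> list_ar_members(const std::string& data) {
    std::vector<std::string> names;
    const std::string magic = "!<arch>\n";
    if (data.compare(0, magic.size(), magic) != 0) return names;   // not an ar archive
    std::size_t pos = magic.size();
    while (pos + 60 <= data.size()) {                      // every member starts with a 60-byte header
        if (data.compare(pos + 58, 2, "`\n") != 0) break;  // ar_fmag (bytes 58-59) must be "`\n"
        std::string name = data.substr(pos, 16);           // ar_name (bytes 0-15)
        std::size_t size = std::strtoul(data.substr(pos + 48, 10).c_str(), nullptr, 10);  // ar_size (bytes 48-57)
        std::size_t last = name.find_last_not_of(' ');     // strip the space padding
        name = (last == std::string::npos) ? std::string() : name.substr(0, last + 1);
        if (name != "/" && name != "//") {
            if (!name.empty() && name.back() == '/') name.pop_back();  // GNU name terminator
            names.push_back(name);
        }
        pos += 60 + size + (size % 2);                     // member data is padded to an even length
    }
    return names;
}
```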

View static library content

Linux:

ar -t libc.a

Windows:

lib /list libcmt.lib

Extracting the contents of a static library

Linux: [extract all the .o files in libc.a into the current directory]

ar -x /usr/lib/libc.a

Windows: [extract atof.obj from libcmt.lib into the current directory]

lib libcmt.lib /extract:build\intel\mt_obj\atof.obj

Generate static library

Linux:

ar -rf test.a main.o fun.o

Windows:

lib /out:test.lib main.obj fun.obj

 

Symbols: the interface of linking

Each function or variable has its own unique name, so that different variables and functions are not confused with one another during linking.

In linking, functions and variables are collectively called symbols: a function or variable name is the symbol name, and the address of the function or variable is the symbol value.

Each object file has a symbol table containing the following kinds of symbols:

(1) Global symbols defined in this object file, which other object files can reference

For example, global variables and global functions

(2) Global symbols referenced in this object file but not defined in it (external symbols)

For example, extern variables and library functions such as printf

(3) Section names, which are generated by the compiler; their value is the starting address of the section

For example, the .text and .data sections of the object file

(4) Local symbols, visible only inside the compilation unit

For example, static variables

The linker is concerned mainly with the first two categories.
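A small, hypothetical source file shows where each category comes from. Compiled to an object file, nm typically lists g_counter and report as defined globals (D and T; C++ names appear mangled unless nm -C is used), snprintf as undefined (U), and s_calls and bump as local symbols (lowercase letters). The exact letters, and whether bump survives optimization, depend on the compiler and flags.

```cpp
// Illustrative sketch: all names are hypothetical; comments give each symbol's category.
#include <cstdio>

int g_counter = 42;             // (1) global symbol defined here; other object files may reference it
static int s_calls = 0;         // (4) local symbol: static, invisible to other object files

static int bump() { return ++s_calls; }   // (4) local function symbol

// (1) global function defined here (its symbol name is mangled in C++)
int report(char* buf, std::size_t n) {
    bump();
    // (2) snprintf is an external symbol: referenced here, defined in the C library
    return std::snprintf(buf, n, "counter=%d calls=%d", g_counter, s_calls);
}
// (3) section symbols such as .text, .data and .bss are added by the toolchain, not written in source
```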

Viewing symbols

Linux:

nm Hello.o
readelf -s Hello.o
objdump -t Hello.obj

On Windows, these tools can be obtained by installing MinGW.

Windows:

dumpbin /symbols Hello.obj 

Symbol Decoration (Name Decoration)

Symbol decoration is essentially the process of renaming a variable or function. The factors that affect the decorated name include:

(1) Different languages use different decoration rules

For example, the function foo is decorated as _foo in C and as _foo_ in Fortran

(2) Features introduced by object-oriented languages such as C++

Such as classes, inheritance, virtual functions, overloading, namespaces, etc.

----------------------------- MSVC compiler -----------------------------

The MSVC compiler uses the __cdecl calling convention by default (configurable under "C/C++" -> "Advanced" -> "Calling Convention"), while the Windows APIs use the __stdcall calling convention.

MSVC has separate decoration rules for C and C++:

Decoration rules for C function names (including code wrapped in extern "C"):
1. The __stdcall calling convention adds an underscore prefix to the exported function name and appends an "@" sign followed by the number of bytes of parameters, in the format _functionname@number.

2. The __cdecl calling convention only adds an underscore prefix to the exported function name, in the format _functionname.

3. The __fastcall calling convention adds an "@" sign before the exported function name and appends another "@" sign followed by the number of bytes of parameters, in the format @functionname@number.

None of them changes the case of the exported function name. This differs from the Pascal calling convention, whose exported names are not decorated but are converted entirely to upper case.
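The three C rules are mechanical enough to express as a function. This is a sketch assuming 32-bit x86 MSVC (x64 MSVC does not apply these decorations), with the parameter byte count supplied by the caller (each argument occupies at least 4 bytes on x86); the enum and function names are hypothetical.

```cpp
// Illustrative sketch for 32-bit x86 MSVC: CallConv and decorate_c_name are hypothetical names.
#include <string>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Builds the decorated name of an extern "C" function following rules 1-3 above.
std::string decorate_c_name(const std::string& name, CallConv cc, unsigned param_bytes) {
    switch (cc) {
        case CallConv::Cdecl:    return "_" + name;                                    // _functionname
        case CallConv::Stdcall:  return "_" + name + "@" + std::to_string(param_bytes); // _functionname@number
        case CallConv::Fastcall: return "@" + name + "@" + std::to_string(param_bytes); // @functionname@number
    }
    return name;   // not reached for valid enum values
}
```

For example, int __stdcall add(int a, int b) has 8 bytes of parameters, so it is exported as _add@8, whereas the __cdecl version is simply _add.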

Decoration rules for C++ functions:
1. __stdcall calling convention:
(1) The name starts with "?", followed by the function name;
(2) After the function name, "@@YG" marks the start of the parameter list, which follows it;
(3) Types in the parameter list are represented by codes:
X -- void,
D -- char,
E -- unsigned char,
F -- short,
H -- int,
I -- unsigned int,
J -- long,
K -- unsigned long,
M -- float,
N -- double,
_N -- bool,
....
PA -- a pointer; the code after it gives the pointed-to type. If pointers of the same type appear consecutively, the repeats are replaced by "0", each "0" standing for one repetition;
(4) The first item of the parameter list is the return type of the function, followed by the types of the parameters; a pointer marker comes before the type it points to;
(5) After the parameter list, "@Z" marks the end of the whole name; if the function has no parameters, the name ends with "Z" instead.
The format is "?functionname@@YG*****@Z" or "?functionname@@YG*XZ", for example:
int Test1(char *var1, unsigned long) ----- "?Test1@@YGHPADK@Z"
void Test2() ----- "?Test2@@YGXXZ"

2. __cdecl calling convention:
The rules are the same as for __stdcall above, except that the marker starting the parameter list changes from "@@YG" to "@@YA".

3. __fastcall calling convention:
The rules are the same as for __stdcall above, except that the marker starting the parameter list changes from "@@YG" to "@@YI".

Note: If a map file is generated, it shows the decorated name of every function and variable.
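The C++ rules above can likewise be sketched for the simple case they describe: a free function taking built-in types or plain pointers. This is a hypothetical helper, not Microsoft's algorithm. It omits back-references for repeated types, const qualifiers, classes, namespaces and 64-bit pointers, so UnDecorateSymbolName (or the undname tool) remains the authority for real names.

```cpp
// Illustrative sketch (32-bit MSVC, simple types only): Ty, Param and
// decorate_cpp_function are hypothetical names.
#include <string>
#include <vector>

// Type codes from the table above.
enum class Ty { Void, Char, UChar, Short, Int, UInt, Long, ULong, Float, Double, Bool };

struct Param {
    Ty type;
    bool pointer;   // true = plain "T*" (encoded as "PA" + code of T), false = T itself
};

std::string type_code(Ty t) {
    switch (t) {
        case Ty::Void:   return "X";
        case Ty::Char:   return "D";
        case Ty::UChar:  return "E";
        case Ty::Short:  return "F";
        case Ty::Int:    return "H";
        case Ty::UInt:   return "I";
        case Ty::Long:   return "J";
        case Ty::ULong:  return "K";
        case Ty::Float:  return "M";
        case Ty::Double: return "N";
        case Ty::Bool:   return "_N";
    }
    return "?";
}

std::string param_code(const Param& p) {
    return (p.pointer ? "PA" : "") + type_code(p.type);
}

// "?name@@" + convention + return type + parameters + terminator.
// convention: "YA" (__cdecl), "YG" (__stdcall), "YI" (__fastcall).
std::string decorate_cpp_function(const std::string& name, const std::string& convention,
                                  const Param& ret, const std::vector<Param>& params) {
    std::string s = "?" + name + "@@" + convention + param_code(ret);
    if (params.empty()) return s + "XZ";          // no parameters: "X" (void), then "Z"
    for (const Param& p : params) s += param_code(p);
    return s + "@Z";                              // "@" closes the parameter list, "Z" ends the name
}
```

Called with an int return type and the parameters char* and unsigned long under "YG", it yields ?Test1@@YGHPADK@Z, matching the Test1 example above; a void function with no parameters yields ?Test2@@YGXXZ.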

-------------------------------------------------------------------------

Function Signature

A function signature identifies a function; it includes the function name, the number and types of its parameters, its class and namespace, its calling convention, and other information.

The symbol decoration and function signature rules of Visual C++ are not made public, but Microsoft provides the UnDecorateSymbolName function, which converts a decorated name back into a function prototype.

 

Use extern "C" to force the C++ compiler to apply the C rules for symbol decoration:

extern "C" int g_nTest1;
extern "C" int fun();

#ifdef __cplusplus
extern "C" {
#endif
    int g_nTest2 = 0;
    int add(int a, int b);
#ifdef __cplusplus
}
#endif

 

Weak symbols and strong symbols

For C/C++, the compiler treats functions and initialized global variables as strong symbols by default, and uninitialized global variables as weak symbols. (The latter is the traditional C "common symbol" behavior; C++, and C compiled with -fno-common, which is GCC's default since GCC 10, treat an uninitialized global as an ordinary strong definition.)

GCC can use __attribute__((weak)) to turn any strong symbol into a weak symbol.

extern int __attribute__((weak)) ext;    // make the variable ext a weak symbol
int __attribute__((weak)) fun1();        // make the function fun1 a weak symbol
int fun2() __attribute__((weak));        // make the function fun2 a weak symbol
int weak1;
int strong = 1;
int __attribute__((weak)) weak2 = 2;     // force the variable weak2 to be a weak symbol

int main()
{
    return 0;
}

Above, weak1 (under the common-symbol behavior) and weak2 are weak symbols, while strong and main are strong symbols.

Based on the notions of strong and weak symbols, the linker handles a global symbol that is defined more than once according to the following rules:

(1) A strong symbol may not be defined more than once; otherwise the linker reports a multiple-definition error.

(2) If a symbol is strong in one object file and weak in the others, the strong symbol is chosen.

(3) If a symbol is weak in all object files, the one that occupies the most space is chosen.
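The three rules can be modeled as a small resolution function. This is a simplified sketch with hypothetical names: it considers only global definitions, treats each one as either strong or weak, and uses the size only to choose among weak definitions, as rule (3) describes.

```cpp
// Simplified model of rules (1)-(3); SymbolDef and pick_definitions are hypothetical names.
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One definition of a global symbol found in some object file.
struct SymbolDef {
    std::string name;
    std::string object;   // which object file defines it
    bool strong;          // false = weak
    unsigned size;        // size in bytes, consulted only when every definition is weak
};

// Returns the chosen definition for each symbol name, or throws
// std::runtime_error on a duplicate strong definition.
std::map<std::string, SymbolDef> pick_definitions(const std::vector<SymbolDef>& defs) {
    std::map<std::string, SymbolDef> chosen;
    for (const SymbolDef& d : defs) {
        auto it = chosen.find(d.name);
        if (it == chosen.end()) { chosen.emplace(d.name, d); continue; }
        SymbolDef& cur = it->second;
        if (cur.strong && d.strong)                           // rule (1): two strong definitions
            throw std::runtime_error("multiple definition of `" + d.name + "'");
        if (d.strong) cur = d;                                // rule (2): strong beats weak
        else if (!cur.strong && d.size > cur.size) cur = d;   // rule (3): largest weak one wins
    }
    return chosen;
}
```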

 

Weak references and strong references

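A reference to a symbol in another object file must be resolved correctly when the object files are finally linked into an executable. If the linker cannot find the definition of the symbol, it reports an undefined-symbol error; this is called a strong reference.

The counterpart is a weak reference: when handling a weak reference, the linker does not report an error even if the symbol is undefined, and the symbol's address defaults to 0 (or some special value).

GCC can declare a reference to an external function as a weak reference. The weakref attribute is one way, but current GCC requires it on a static declaration that names its target, e.g. static void fun() __attribute__((weakref("real_fun"))) with real_fun as a placeholder; the simpler way is to declare the function with __attribute__((weak)) and not define it:

#include <stdio.h>

__attribute__((weak)) void fun();    // weak reference: fun may remain undefined

int main()
{
    if (NULL != fun)                  // fun's address is 0 if no definition was linked in
    {
        fun();
    }
    return 0;
}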

 

Weak symbols and weak references are very useful for libraries. A weak symbol defined in a library can be overridden by a user-defined strong symbol, so that the program can use its own version of a library function;

alternatively, a program can declare weak references to some extension modules: when an extension module is linked with the program, its functions can be used normally;

and if some modules are removed, the program still links, only without the corresponding features. This makes program features easy to trim and combine.
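A minimal sketch of both library patterns, assuming GCC or Clang on an ELF platform such as Linux (the function names are hypothetical): library_greeting() is a weak default that a strong definition in another object file would replace, and optional_module_init() is a weak reference that stays null unless some object file defines it.

```cpp
// Illustrative sketch (GCC/Clang, ELF targets): all function names are hypothetical.
#include <string>

// Default implementation, defined weak: a strong library_greeting() in any other
// object file linked into the program replaces it without a duplicate-symbol error.
__attribute__((weak)) std::string library_greeting() { return "default greeting"; }

// Optional extension module: declared weak and deliberately NOT defined here.
// If no linked object file defines it, its address is null at run time.
__attribute__((weak)) void optional_module_init();

std::string library_start() {
    if (optional_module_init) optional_module_init();   // call the module only if it was linked in
    return library_greeting();
}
```

Linked on its own, library_start() returns "default greeting" and skips the module; adding an object file that defines a strong library_greeting() or an optional_module_init() changes that without touching the library code.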

#include <stdio.h>
#include <stdlib.h>

// declare the C library function abs as a weak symbol
int __attribute__((weak)) abs(int);

// re-implement abs (build with -fno-builtin so GCC calls this function instead of expanding abs itself)
int abs(int a)
{
    return 0;
}

int main(int argc, char* argv[])
{
    int s = abs((int)-5);
    printf("s = %d\n", s);    // s = 0
    return 0;
}

 

From the linker's point of view, the entire linking process combines multiple input object files into one executable binary file.

Modern linkers essentially all use the two-pass linking method:

(1) Space and address allocation

Scan all input object files, obtain the length, attributes and position of each of their sections, and collect all symbol definitions and symbol references from their symbol tables into one global symbol table.

In this step the linker obtains the section lengths of all input object files, merges them, calculates the length and position of each merged section in the output file, and establishes the mapping between input and output sections.

(2) Symbol resolution and relocation

Using the information collected in the first step, read the section data and relocation information (the relocation table) from the input files, resolve and relocate the symbols, and adjust the addresses in the code (the references to external symbols).
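The two passes can be illustrated with a toy linker. The object format here is made up for illustration (one .text section per object, symbol definitions as section offsets, and absolute 32-bit little-endian relocations); real linkers handle many sections, relocation types and file formats, but the structure is the same: pass 1 lays out sections and builds the global symbol table, pass 2 copies the data and patches every relocation site.

```cpp
// Toy two-pass linker over a hypothetical object format (not ELF or COFF).
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct Reloc {
    std::size_t offset;    // where in this object's .text the 4-byte address goes
    std::string symbol;    // which symbol's final address to write there
};
struct Object {
    std::vector<std::uint8_t> text;                 // contents of the .text section
    std::map<std::string, std::size_t> defines;     // symbol -> offset in .text
    std::vector<Reloc> relocs;
};

// Links objects into one image loaded at `base`; fills `symtab` with final addresses.
std::vector<std::uint8_t> toy_link(const std::vector<Object>& objs, std::uint32_t base,
                                   std::map<std::string, std::uint32_t>& symtab) {
    // Pass 1: space and address allocation. Place the .text sections back to back
    // and collect every definition into one global symbol table.
    std::uint32_t addr = base;
    for (const Object& o : objs) {
        for (const auto& def : o.defines) {
            std::uint32_t value = addr + static_cast<std::uint32_t>(def.second);
            if (!symtab.emplace(def.first, value).second)
                throw std::runtime_error("duplicate symbol: " + def.first);
        }
        addr += static_cast<std::uint32_t>(o.text.size());
    }
    // Pass 2: symbol resolution and relocation. Copy the section contents and
    // patch each relocation site with the symbol's final address (little-endian).
    std::vector<std::uint8_t> image;
    for (const Object& o : objs) {
        std::size_t sec = image.size();       // where this object's .text starts in the image
        image.insert(image.end(), o.text.begin(), o.text.end());
        for (const Reloc& r : o.relocs) {
            auto it = symtab.find(r.symbol);
            if (it == symtab.end())
                throw std::runtime_error("undefined reference to " + r.symbol);
            for (std::size_t b = 0; b < 4; ++b)
                image.at(sec + r.offset + b) = static_cast<std::uint8_t>(it->second >> (8 * b));
        }
    }
    return image;
}
```

For example, with a 16-byte main.o whose relocation at offset 4 refers to fun, and an 8-byte fun.o, a base of 0x08048000 places fun at 0x08048010 and writes the bytes 10 80 04 08 at offset 4 of the image.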

 

Reference

Programmer's Self-Cultivation: Linking, Loading and Libraries
