C Compiler, Linker, and Loader in Detail

Source: Internet
Author: User

Excerpted from http://blog.csdn.net/zzxian/article/details/16820035

I. Overview

The C compile-and-link process: to turn a C program we write (source code) into a program that can run on hardware (executable code), it must be compiled and linked. Compiling translates source code in text form into an object file in machine language. Linking combines the object files, the operating system's startup code, and the library files the program uses into the final loadable, executable code.

The overall process is as follows:

    1. Preprocessor: converts .c files to .i files; the GCC command is gcc -E, and the corresponding standalone preprocessor is cpp;
    2. Compiler: converts .c (or .i) files to .s assembly files; the GCC command is gcc -S, corresponding to the compile command cc -S;
    3. Assembler: converts .s files to .o files; the GCC command is gcc -c, and the corresponding assembler command is as;
    4. Linker: combines .o files into an executable program; the GCC command is gcc, and the corresponding link command is ld;
    5. Loader: loads the executable program into memory and runs it; this is done by the loader together with ld-linux.so.

II. The compilation process

The compilation process can be divided into two phases: compiling and assembling.

2.1 Compiling

Compiling means that the compiler reads the source program (a character stream), performs lexical and syntactic analysis, and translates the high-level language statements into functionally equivalent assembly code.

Compiling a source file consists of two main phases:

The first phase is preprocessing, which comes before compilation proper. The preprocessor modifies the contents of the source file according to the preprocessing directives placed in it.

It mainly handles the following:

    1. Macro definition directives, such as #define A B. For this directive, the preprocessor replaces every A in the program with B, but an A inside a string constant is not replaced. There is also #undef, which cancels a macro definition so that later occurrences of the name are no longer replaced.
    2. Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let the programmer control which code the compiler processes by defining different macros. The preprocessor filters out the unneeded code according to these conditions.
    3. Header file inclusion directives, such as #include "FileName" or #include <FileName>. The directive inserts the entire contents of the header file into the preprocessor's output, where the compiler processes it.
    4. Special symbols that the preprocessor recognizes. For example, __LINE__ in the source program is replaced by the current line number (a decimal number), and __FILE__ by the name of the C source file being compiled. The preprocessor replaces these symbols wherever they appear with the appropriate values.

The main purpose of a header file is to make certain definitions available to several different C source files, which raises the question of where the preprocessor looks for header files. The header file search order is:

    1. All header file searches start with the directories given by -I;
    2. then the paths specified by the environment variables C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, and OBJC_INCLUDE_PATH;
    3. finally the default directories (/usr/include, /usr/local/include, /usr/lib/gcc-lib/i386-linux/2.95.2/include ...).

The second phase is compilation and optimization. The compiler performs lexical and syntactic analysis, and after confirming that all statements conform to the grammar, it translates them into an equivalent intermediate representation or into assembly code.

2.2 Assembly

Assembly is the process in which the assembler (as) translates assembly language code into target machine instructions. The resulting machine code is stored in an object file, which is the machine-language equivalent of the source program. An object file is made up of segments (sections). Typically an object file contains at least two segments:

    • Code segment: contains mainly the program's instructions. It is generally readable and executable, but not writable.
    • Data segment: stores the global variables and static data used by the program. It is generally readable and writable.

2.3 Object files (Executable and Linkable Format, ELF)
    1. Relocatable files: produced by compilers and assemblers; they can be combined with other relocatable object files to create an executable or a shared object file;
    2. Shared object files: a special kind of relocatable object file that can be loaded into memory and linked dynamically, either at load time or at run time;
    3. Executable files: produced by the linker; the loader can load them directly into memory and run them as a process.

2.4 Static libraries and dynamic libraries

A static library packages related object modules into a single file. It is created with the ar command.

The advantages of a static library are:

    • Programmers do not need to name every object module to be linked explicitly, which would be a time-consuming and error-prone process;
    • At link time, the linker copies only the object modules the program actually references from the static library, which reduces the size of the executable both on disk and in memory.

A dynamic library is a special object module that, at run time, can be loaded at any memory address and linked with any program.

The advantages of a dynamic library are:

    • The dynamic library can be updated without relinking the programs that use it; for large systems, relinking is time-consuming;
    • It can be shared by multiple running programs, with only one copy in memory, which saves memory.

III. The linking process

The linker combines related object files into a single loadable, executable object file. The core work of the linker is symbol resolution and relocation.

3.1 When linking happens:
    1. At compile time, when the source code is translated into machine code (handled by the static linker);
    2. At load time, when the program is loaded into memory (handled by the loader);
    3. At run time, initiated by the application itself (handled by the dynamic linker).
3.2 Benefits of linking (software reuse):
    1. It makes separate compilation possible;
    2. Dynamic binding: it separates the definition, implementation, and use of code.
3.3 Static library search path (used by the static linker)
    1. gcc first searches the directories given with -L;
    2. then the search paths specified by the environment variable LIBRARY_PATH;
    3. finally the default directories /lib, /usr/lib, and /usr/local/lib, which were built into GCC when it was compiled.
3.4 Dynamic library search path (used by the dynamic linker)
    1. the dynamic library search path recorded in the object code when it was linked (for example with -Wl,-rpath);
    2. the dynamic library search path specified by the environment variable LD_LIBRARY_PATH;
    3. the dynamic library search paths listed in the configuration file /etc/ld.so.conf (through the cache that ldconfig builds from it);
    4. the default dynamic library search paths /lib, /usr/lib, and /usr/local/lib.
3.5 Static linking (compile time)

The linker copies the code of each function from where it is defined (an object file or a static library) into the final executable. When the program runs, this code is loaded into the virtual address space of the process. A static library is essentially a collection of object files, each containing the code for one function or a group of related functions.

To create an executable file, the linker must perform two main tasks:

    1. Symbol resolution: associate each symbol reference in the object files with a symbol definition;
    2. Relocation: assign a memory address to each symbol definition, then modify all references to the symbol so that they point to that address.

Symbol tables, symbol resolution, and relocation are left for further study.

3.6 Dynamic linking (load time, run time)

With dynamic linking, functions are defined in a dynamic link library, that is, a shared object file. During the link step of the build, the dynamic library supplies only its symbol table and a small amount of other information, enough to confirm that every symbol reference has a definition so that the build succeeds. At run time, the dynamic linker (ld-linux.so) loads the shared libraries and then performs relocation based on the recorded symbol definitions of the shared objects. When the executable runs, the entire contents of each dynamic library are mapped into the virtual address space of the process, and the dynamic linker locates the corresponding function code using the information recorded in the executable.

IV. The loading process

The loader copies the executable file from external storage into memory and runs it. The run-time memory image of a Linux process consists of read-only code (.text, .rodata), read/write data (.data, .bss), a heap that grows upward, a region for shared-library mappings, and a user stack that grows downward from the top of user space.

The loading process is as follows:

The loader first creates this memory image, then, guided by the segment header table, copies the code and data segments of the executable into it. It then jumps to the program's entry point (the address of the symbol _start) and runs the startup code, which calls the C library's initialization routines, then main, and finally exits with main's return value.

V. Common tools for working with object files

Unix systems provide a range of tools for understanding and manipulating object files, and the GNU binutils package offers many more. These tools include:

    • ar: creates static libraries and inserts, deletes, lists, and extracts their members;
    • strings: lists all printable strings in an object file;
    • strip: removes symbol table information from an object file;
    • nm: lists the symbols defined in an object file's symbol table;
    • size: lists the names and sizes of the sections in an object file;
    • readelf: displays the complete structure of an object file, including all information encoded in the ELF header;
    • objdump: displays all information about an object file; its most useful feature is disassembling the binary instructions in the .text section;
    • ldd: lists the shared libraries an executable needs at run time.
