C compiler, linker, and loader

Source: Internet
Author: User
Document directory
  • 2.1 Compile
  • 2.2 Assembly
  • 2.3 Object files (Executable and Linkable Format)
  • 2.4 Static and dynamic libraries
  • 3.1 Link timing
  • 3.2 Functions of linking (software reuse)
  • 3.3 Static library search path (handled by the static linker)
  • 3.4 Dynamic library search path (handled by the dynamic linker)
  • 3.5 Static linking (at compile time)
  • 3.6 Dynamic linking (at load time and run time)
I. Overview

In C, the source code we write must be compiled and linked into executable code that can run on the hardware. Compiling translates source code in text form into an object file in machine-language form. Linking combines the object files, the system's startup code, and the library files that are used into the final executable.

The process is illustrated as follows:

  1. Preprocessor: converts a .c file into a .i file. The GCC command is gcc -E, which corresponds to the preprocessing command cpp;
  2. Compiler: converts a .c/.h file into a .s file. The GCC command is gcc -S, which corresponds to the compiling command cc -S;
  3. Assembler: converts a .s file into a .o file. The GCC command is gcc -c, which corresponds to the as command;
  4. Linker: converts .o files into an executable program. The GCC command is gcc, which corresponds to the ld command;
  5. Loader: loads the executable program into memory and runs it; on Linux this involves the loader and the dynamic linker ld-linux.so.
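
As a rough illustration of the first three stages, the sketch below is a small source file whose header comment lists one GCC invocation per stage. The file names stages.c and main.o, the macro SCALE, and the function scale_value are hypothetical names chosen for this example.

```c
/* stages.c: a hypothetical file used only to illustrate the stages.
 *
 *   gcc -E stages.c -o stages.i   preprocess: expands SCALE
 *   gcc -S stages.i -o stages.s   compile: emits assembly code
 *   gcc -c stages.s -o stages.o   assemble: emits a relocatable object file
 *
 * Linking into an executable additionally needs an object file that
 * defines main (main.o is assumed here):
 *   gcc main.o stages.o -o prog
 */

#define SCALE 3   /* replaced by the literal 3 during preprocessing */

/* Multiply a value by the compile-time constant SCALE. */
int scale_value(int value)
{
    return value * SCALE;
}
```

Opening stages.i after the first command is a simple way to confirm that SCALE no longer appears in the preprocessed text.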

 

II. Compilation process

The compilation process can be divided into two phases: compilation and assembly.

2.1 Compile

Compiling means that the compiler reads the source program (a character stream), performs lexical and syntax analysis on it, and translates the high-level language statements into functionally equivalent assembly code.

The source file compilation process has two main stages:

The first stage is preprocessing, which is carried out before compilation proper. In this stage, the content of the source file is modified according to the preprocessing directives placed in the file.

It mainly deals with the following aspects:

  1. Macro definition directives. For example, for the directive #define A B, the preprocessor replaces every occurrence of A in the program with B, except where A appears inside a string constant. In addition, #undef cancels the definition of a macro, so that later occurrences of the name are no longer replaced.
  2. Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, and #endif. These directives let the programmer decide which code the compiler processes by defining different macros. The preprocessor filters out the unneeded code according to these conditions.
  3. Header file inclusion directives, for example #include "FILENAME" or #include <FILENAME>. This directive adds all the definitions in the header file to the preprocessor's output so that the compiler can process them.
  4. Special symbols. The preprocessor recognizes some special symbols. For example, __LINE__ in the source program is replaced by the current line number (in decimal), and __FILE__ is replaced by the name of the C source file being compiled. The preprocessor substitutes the appropriate values wherever these symbols appear in the source program.
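
The directives listed above can be tried out with a short sketch like the following. The macro names BUFFER_SIZE, DEBUG_LEVEL, and MODE_NAME and all function names are hypothetical and exist only for this illustration.

```c
#define BUFFER_SIZE 64          /* macro definition: BUFFER_SIZE -> 64 */
#define DEBUG_LEVEL 2

/* Conditional compilation: only one MODE_NAME definition survives. */
#if DEBUG_LEVEL >= 2
#define MODE_NAME "verbose"
#elif DEBUG_LEVEL == 1
#define MODE_NAME "normal"
#else
#define MODE_NAME "quiet"
#endif

/* The macro is expanded here, so this returns 64. */
int buffer_size(void) { return BUFFER_SIZE; }

/* A macro name inside a string constant is not replaced. */
const char *macro_in_string(void) { return "BUFFER_SIZE"; }

const char *mode_name(void) { return MODE_NAME; }

#undef BUFFER_SIZE              /* from here on BUFFER_SIZE is undefined */

/* Returns 1 if BUFFER_SIZE is still defined at this point, else 0. */
int buffer_size_still_defined(void)
{
#ifdef BUFFER_SIZE
    return 1;
#else
    return 0;
#endif
}

/* __LINE__ and __FILE__ are filled in by the preprocessor. */
int current_line(void) { return __LINE__; }
const char *current_file(void) { return __FILE__; }
```

Running gcc -E on this file shows the result of each substitution directly, since the output contains no directives or macro names outside string constants.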

The purpose of header files is to make some definitions available to multiple C source programs. This raises the question of where header files are located, that is, the search path. The header file search rules are as follows:

  1. All header files are first searched for in the directories given with -I.
  2. Next, the paths specified by the environment variables C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, and OBJC_INCLUDE_PATH are searched.
  3. Finally, the default directories are searched (/usr/include, /usr/local/include, /usr/lib/gcc-lib/i386-linux/2.95.2/include ......)

 

The second stage is compilation and optimization. The compiler performs lexical analysis and syntax analysis and, after confirming that all statements comply with the syntax rules, translates them into an equivalent intermediate representation or assembly code.

2.2 Assembly

Assembly refers to the process in which the assembler (as) translates assembly language code into instructions of the target machine. What is stored in the object file is machine language code equivalent to the source program. An object file consists of segments. Generally, an object file contains at least two segments:

  • Code segment: mainly contains the program's instructions. This segment is generally readable and executable, but not writable.
  • Data segment: mainly stores the global variables and static data used in the program. The data segment is generally readable and writable; on modern systems it is usually not executable.
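
The following sketch marks where a typical toolchain places a few definitions. The section names in the comments (.text, .data, .bss, .rodata) are the conventional ELF ones and are an addition to the two segments described above; exact placement is up to the compiler. All identifiers are hypothetical.

```c
int request_count = 5;           /* initialized global: data segment (.data) */
static int error_count;          /* zero-initialized static: usually .bss */
const char banner[] = "ready";   /* read-only data: usually .rodata */

/* The machine instructions of these functions go to the code segment
 * (.text). */
int record_request(void)
{
    request_count += 1;          /* the data segment is writable */
    return request_count;
}

int record_error(void)
{
    error_count += 1;
    return error_count;
}
```

After compiling with gcc -c, the size tool mentioned later in this article reports how many bytes of the object file fall into each group.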

 

2.3 Object files (Executable and Linkable Format)
  1. Relocatable file: generated by the compiler and assembler. It can be combined with other relocatable object files to create an executable file or a shared object;
  2. Shared object: a special type of relocatable object file. It can be combined with other object files by the static linker, or it can be loaded into memory and linked dynamically at load time or at run time (dynamic shared library);
  3. Executable file: a file generated by the linker that the loader can load directly into memory and run as a process.
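
The three kinds of file can be told apart by the e_type field of the ELF header. The sketch below classifies a file from its first 18 bytes; the function name elf_file_kind is hypothetical, and the constants are written out by hand here rather than taken from a system header.

```c
#include <stddef.h>

/* Values of the e_type field as defined by the ELF specification. */
enum { ELF_TYPE_REL = 1, ELF_TYPE_EXEC = 2, ELF_TYPE_DYN = 3 };

/* Classify an ELF file from the first bytes of its header.
 * hdr points to len bytes read from the start of the file. */
const char *elf_file_kind(const unsigned char *hdr, size_t len)
{
    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F') {
        return "not an ELF file";
    }

    /* Byte 5 gives the byte order: 1 = little-endian, 2 = big-endian.
     * e_type is the 16-bit field stored at offset 16. */
    unsigned type = (hdr[5] == 2)
        ? (((unsigned)hdr[16] << 8) | hdr[17])
        : (((unsigned)hdr[17] << 8) | hdr[16]);

    switch (type) {
    case ELF_TYPE_REL:  return "relocatable file";
    case ELF_TYPE_EXEC: return "executable file";
    case ELF_TYPE_DYN:  return "shared object";
    default:            return "other";
    }
}
```

A caller would read the first bytes of a file with fread and pass them in. Note that position-independent executables are also recorded with the shared-object type, so this check alone cannot separate them from libraries.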

 

2.4 Static and dynamic libraries

A static library is a single file formed by packaging related object modules together. It is created with the ar command.

Static libraries have the following advantages:

  • Programmers do not need to explicitly name every object module to be linked, which is a time-consuming and error-prone process;
  • At link time, the linker copies only the object modules that the program references from the static library, which reduces the size of the executable on disk and in memory.

A dynamic library is a special object module. It can be loaded at any memory address and linked with any program at run time.

Dynamic libraries have the following advantages:

  • Updating a dynamic library does not require relinking. For large systems, relinking is a very time-consuming process;
  • It can be shared by multiple programs at run time. Only one copy needs to be kept in memory, which saves memory.
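
As a small illustration, the same module can be packaged either way. The file names vecmath.c, libvecmath.a, libvecmath.so, and main.c and the functions below are hypothetical; the commands in the comment are the usual GCC and ar invocations.

```c
/* vecmath.c: a hypothetical module that could be packaged either way.
 *
 * Static library:
 *   gcc -c vecmath.c -o vecmath.o
 *   ar rcs libvecmath.a vecmath.o
 *
 * Dynamic (shared) library:
 *   gcc -shared -fPIC vecmath.c -o libvecmath.so
 *
 * A program is then linked against the library with:
 *   gcc main.c -L. -lvecmath -o prog
 */
#include <stddef.h>

/* Add two vectors element by element: out[i] = a[i] + b[i]. */
void vec_add(const int *a, const int *b, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Return the dot product of two vectors of length n. */
int vec_dot(const int *a, const int *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With the static form, the code of vec_add and vec_dot is copied into prog; with the shared form, prog only records that it needs libvecmath.so.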

 

III. Link process

The linker combines the relevant object files into a single loadable, executable file. The core work of the linker is symbol resolution and relocation.

3.1 Link timing:
  1. At compile time, when the source code is translated into machine code (handled by the static linker);
  2. At load time, when the program is loaded into memory (handled by the loader);
  3. At run time, performed by the application program (handled by the dynamic linker).
3.2 Functions of linking (software reuse):
  1. Makes separate compilation possible;
  2. Dynamic binding: separates definition, implementation, and use.
3.3 Static library search path (handled by the static linker)
  1. GCC first searches the directories given with -L;
  2. Next, it searches the paths specified by the environment variable LIBRARY_PATH;
  3. Finally, it searches the default directories /lib and /usr/local/lib, which were built into the program when GCC itself was compiled.
3.4 Dynamic library search path (handled by the dynamic linker)
  1. The dynamic library search path specified when the object code is linked (recorded in the executable, for example with -Wl,-rpath; the -L option affects only the link-time search);
  2. The dynamic library search path specified by the environment variable LD_LIBRARY_PATH;
  3. The dynamic library search path specified in the configuration file /etc/ld.so.conf;
  4. The default dynamic library search paths /lib, /usr/lib, and /usr/local/lib.
3.5 Static linking (at compile time)

The linker copies a function's code from where it resides (an object file or a static library) into the final executable. When the program runs, this code is loaded into the virtual address space of the process. A static library is in fact a collection of object files, each of which contains the code of one or more related functions in the library.

To create an executable file, the linker must complete the following main tasks:

  1. Symbol resolution: associates each symbol reference in the object files with its definition;
  2. Relocation: assigns a memory address to each symbol definition, and then modifies all references to the symbol so that they point to that address.
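
The two tasks above can be related to source code with a sketch like this one. The identifiers total_hits, local_step, and bump_hits are hypothetical.

```c
/* Symbols the linker sees for this file:
 *   - total_hits: defined here, global, so other files may reference it
 *   - local_step: defined here, static, so invisible to other files
 *   - bump_hits:  defined here, global function
 * A declaration such as "extern int total_hits;" in another file creates
 * an undefined symbol there, which symbol resolution matches to the
 * definition below.
 */
int total_hits = 0;
static int local_step = 2;

int bump_hits(void)
{
    /* In the object file these two references carry relocation entries;
     * the linker patches them once the final addresses are known. */
    total_hits += local_step;
    return total_hits;
}
```

Compiling with gcc -c and running nm on the result lists these symbols, and readelf -r shows the relocation entries the linker will process.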

The symbol table, symbol resolution, and relocation are left for more detailed analysis in follow-up study.

3.6 Dynamic linking (at load time and run time)

With dynamic linking, the function's code is defined in a dynamic link library or shared object. At the compile-and-link stage, the dynamic library provides only its symbol table and a small amount of other information, which is enough to ensure that every symbol reference has a definition so that the build succeeds. At run time, the dynamic linker (ld-linux.so) loads the shared libraries according to the shared-object symbol information recorded in the executable and then completes the relocation. When the executable runs, the contents of the dynamic library are mapped into the virtual address space of the process, and the dynamic linker finds the required function code based on the information recorded in the executable.
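
An application can also ask the dynamic linker for a symbol explicitly while it is running. The sketch below uses the POSIX dlopen and dlsym interface to look up strlen by name; the function name runtime_strlen is hypothetical, and the example assumes a dynamically linked program (on older glibc versions it must be linked with -ldl).

```c
#include <dlfcn.h>
#include <stddef.h>

/* Look up strlen at run time through the dynamic linker and call it.
 * Returns (size_t)-1 if the lookup fails. */
size_t runtime_strlen(const char *text)
{
    /* NULL requests a handle for the running program together with the
     * shared libraries already loaded for it, including the C library. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL) {
        return (size_t)-1;
    }

    size_t (*length_fn)(const char *) = NULL;
    /* POSIX-style idiom for storing a dlsym result in a function pointer. */
    *(void **)(&length_fn) = dlsym(handle, "strlen");

    size_t result = (length_fn != NULL) ? length_fn(text) : (size_t)-1;
    dlclose(handle);
    return result;
}
```

Passing the file name of a shared library to dlopen instead of NULL loads that library first, which is how plug-in mechanisms are commonly built.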


IV. Loading process

The loader loads an executable file from external storage into memory and runs it. The run-time memory image of a Linux process is as follows:

[Figure: Linux run-time memory image]

 

The loading process is as follows:

The loader first creates the memory image shown above and then, guided by the segment header table, copies the contents of the executable file into the data and code segments in memory. The loader then jumps to the program's entry point (the address of the symbol _start) and executes the startup code. The call sequence of the startup code is shown in the following figure:

[Figure: call sequence of the startup code]
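
That code runs before main can be observed with a small sketch. It relies on the constructor attribute, a GCC and Clang extension that is not part of standard C, and the names early_init and startup_ran_before_main are hypothetical.

```c
/* Set to 1 by early_init, which the startup code calls before main. */
static int initialized_before_main = 0;

/* GCC/Clang extension: run this function during program startup. */
__attribute__((constructor))
static void early_init(void)
{
    initialized_before_main = 1;
}

/* Call this from main: it returns 1 because early_init has already run
 * by the time main is entered. */
int startup_ran_before_main(void)
{
    return initialized_before_main;
}
```

This only demonstrates that initialization happens between _start and main; the exact functions called by the startup code depend on the C library in use.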

 

V. Common tools for processing object files

UNIX provides a series of tools to help you understand and process object files. The GNU binutils package is particularly helpful. These tools include:

  • ar: creates static libraries, and inserts, deletes, lists, and extracts members;
  • strings: lists all printable strings in an object file;
  • strip: deletes the symbol table information from an object file;
  • nm: lists the symbols defined in the symbol table of an object file;
  • size: lists the names and sizes of the segments in an object file;
  • readelf: displays the complete structure of an object file, including all the information encoded in the ELF header;
  • objdump: displays all the information in an object file. Its most useful function is disassembling the binary instructions in the .text section;
  • ldd: lists the shared libraries that an executable needs at run time.
