Compiling, Linking, and Loading

Source: Internet
Author: User

This article summarizes a group sharing session on the path from source code to a running program: how source code is translated into an executable, and how that executable is then run. In Java or Python, you only need java clsname or python a.py to run a program, but that works because of the underlying virtual machine. This article focuses on linking, loading, and execution at the operating-system level rather than on how virtual-machine languages execute. It gives only a brief introduction to linking and loading. For details, I recommend "Computer Systems: A Programmer's Perspective" and "The Programmer's Self-Cultivation". The second book is more detailed than the first but somewhat verbose; if you only want an overview, read chapter 7 of the first book.

Let's start with two sample programs, which are used as the running example throughout:

    /* foo.c */
    #include <stdio.h>

    int a = 10;        /* initialized global variable */
    int b;             /* uninitialized global variable */
    void bar(int c);   /* defined in bar.c */

    int main(void) {
        bar(a);
        printf("...");
        return 0;
    }

    /* bar.c */
    void bar(int c) {
        // ...
    }

A C program usually consists of multiple modules, each corresponding to one C file. Each file is compiled into a relocatable object file, and the linker then combines all the modules into an executable program. You can run the compile step with the following command:

    > gcc -c foo.c

After compiling, foo.o appears in the current directory; this is the corresponding relocatable object file. Turning source code into an object file actually involves several steps:

Preprocessing (cpp): Performs macro substitution and file inclusion, and removes blank lines, comments, and so on, in preparation for lexical analysis.

Compilation (cc): Compiles the preprocessed code into assembly code. Adding the assembly layer isolates the compiler from the differences between underlying hardware implementations and improves portability.

Assembly (as): Converts the assembly code into machine code, that is, a sequence of 0s and 1s.

We know that a program consists of code and data, which the object file must organize in some way so that the linker and the loader can find the information they need. Under Linux, the object file format is ELF (Executable and Linkable Format), which describes relocatable object files, executable object files, and shared object files. Let's look at the main contents of a relocatable object file. The object file organizes its data in sections and has a section header table that describes all of them. The main sections include:

.data: Initialized global variables and static local variables. The global variable a in foo.c lives in the .data section.

.bss: Uninitialized global variables and static local variables. This section is filled with zeros when it is loaded into memory, so uninitialized global and static local variables default to 0. The global variable b in foo.c lives in the .bss section.

.text: Compiled machine code. The binary code of every compiled function lives in the .text section, for example the main function.

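.strtab and .rodata: Strings used by the object file. In standard ELF, the names referenced by the symbol table are stored in .strtab, while string constants from the source code normally go in .rodata.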

.symtab: Symbol table. Symbols are the global variables and functions in the object file. The symbol table describes all the symbols in the object file and is what the linker works from. Symbols are divided into:

Imported symbols: Symbols that the current module references but that are defined in other modules. For example, foo.c uses the function bar defined in bar.c, so the symbol table of foo.o contains the imported symbol bar.

Exported symbols: Symbols defined by the current module that other modules can reference. These are the non-static global variables and non-static functions defined in the module.

An object file actually contains many more sections; these are just the main ones.

After the C files are compiled into relocatable object files, they must be linked to build the executable file. Linking resolves the references between modules and the calls into libraries, and then relocates them to produce the executable. The most important part of linking is symbol resolution: finding the definition of each imported symbol in some module, and then replacing the symbol with an address.

When linking, symbols can be divided into strong and weak symbols:

Strong symbol: An initialized global variable or a non-static function. For example, the global variable a and the function main in foo.c, and the function bar in bar.c.

Weak symbol: An uninitialized global variable. For example, the global variable b in foo.c.

When linking, if strong symbols with the same name are encountered (for example, int a = 1 is defined in both foo.c and bar.c), the linker reports a duplicate symbol error. If weak symbols with the same name are encountered, the behavior of the linker depends on the implementation, which is not discussed further here.

The linker combines multiple relocatable object files into one executable object file. It collects the sections of the same type from each module and forms the corresponding section of the executable; for example, it collects the .data sections of foo.o and bar.o and merges them into the .data section of the executable. The linker must also perform relocation: once the sections are merged, the addresses of each module's sections change, so relocation means patching the addresses that the modules refer to.

Once linking is complete, the executable object file exists on disk. To run the program, the executable must be loaded into memory. A process is the container in which a program runs; each running program has its own memory address space, and the data and code sections of the executable must be loaded into that address space. Let's look at the address space of a process:

Each process has its own private virtual address space. On a 32-bit machine the address space is 4 GB, and the top 1 GB is mapped to kernel space to provide kernel services. The user stack is the function call stack: it implements function calls and provides space for local variables. The shared library region holds the code and data of shared libraries such as the C standard library. The heap is used for dynamic memory allocation. The remaining data and code areas come from the executable file and must be loaded from disk.

The loader relies on virtual memory to load the executable object file. Using mmap file mappings, it maps the .data and .bss sections of the executable to the data area of the process address space, and maps the .text section to the code area. The stack and the heap are anonymous mappings, that is, mappings with no file behind them. Each occupied region of the address space is a VMA (virtual memory area). VMAs are organized in both a linked list and a red-black tree: the list makes sequential traversal easy, and the red-black tree makes it fast to find the VMA that contains a given address. When we access memory through an address p, the OS checks that the access is legal. First, p must fall inside some VMA. Second, each VMA carries read, write, and execute permissions, and the access must be allowed by them. If either condition is not met, the process gets a "Segmentation fault" error.

Once the mapping is done, the program can run. Execution starts at the _start function, which is provided by glibc; it initializes the program and prepares it to run, and then calls main, whose entry address is located through the symbol table. Note that mmap does not actually load the file into memory: the corresponding content is read from disk when it is first accessed. This is handled by the virtual memory mechanism and is transparent to the process.

Each element of the user stack is the stack frame of one function call. The current stack frame is identified by the CPU registers %esp and %ebp: %esp is the stack pointer and points to the top of the stack, %ebp is the frame pointer, and the area between %esp and %ebp is the stack frame of the current function. The stack frame holds the function's arguments and local variables.

The run-time heap is the area for dynamic memory allocation. The program break (brk) marks the top of the heap, and dynamic memory can be allocated and freed by moving it with the sbrk call. malloc and free in the C standard library are built on top of sbrk. We could of course implement a memory allocator directly on sbrk, but then we would also have to design our own allocation algorithm, so using the standard library for memory allocation is usually recommended.

That is everything from the group sharing session. It was given completely off the cuff, without full preparation, saying whatever came to mind, so there are bound to be flaws. Please forgive me ~ ~
