Linking in the C Language: A Tutorial

Source: Internet
Author: User

Linking
Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.
Linking can be performed at compile time, when source code is translated into machine code; at load time, when the program is loaded into memory and executed by the loader; and even at run time, by application programs. On modern systems, linking is performed automatically by programs called linkers.
There are two kinds of linkers: static linkers and dynamic linkers.
Static linker
A static linker takes as input a set of relocatable object files and command-line arguments, and generates as output a fully linked executable object file that can be loaded and run.

A static linker performs two main tasks:
1> Symbol resolution: Object files define and reference symbols. The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
2> Relocation: Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all references to those symbols so that they point to that memory location.

Object files:
Object files come in three forms:
1> Relocatable object file:
Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
2> Executable object file:
Contains binary code and data in a form that can be copied directly into memory and executed.
3> Shared object file:
A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.
Compilers and assemblers generate relocatable object files (including shared object files); linkers generate executable object files.

Relocatable object files:
The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (for example, relocatable, executable, or shared object file), the machine type, the file offset of the section header table, and the size and number of entries in the section header table. The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file. The structure of a relocatable object file in ELF format is shown in the following illustration; its main sections are:

.text: The machine code of the compiled program.
.rodata: Read-only data, such as the format strings in printf statements.
.data: Initialized global C variables.
.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a placeholder.
.symtab: A symbol table with information about the functions and global variables that are defined and referenced in the program.
.rel.text: A list of locations in the .text section that must be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable needs to be modified. On the other hand, instructions that call local functions do not need to be modified.
.rel.data: Relocation information for any global variables that are referenced or defined by the module.
.debug: A debugging symbol table, present only when the program is compiled with the -g option.
.line: A mapping between line numbers in the original C source program and the machine code instructions in the .text section.
.strtab: A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers.
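To make the 16-byte identification sequence at the start of the ELF header concrete, the following C sketch decodes it from a buffer. It assumes the standard layout of these bytes: bytes 0 to 3 hold the magic number 0x7f 'E' 'L' 'F', byte 4 the file class (1 for 32-bit, 2 for 64-bit) and byte 5 the data encoding (1 for little-endian, 2 for big-endian). The function name elf_ident_info is hypothetical; on Linux, <elf.h> provides the official names for these offsets and values (EI_CLASS, ELFCLASS64, EI_DATA, ELFDATA2LSB, and so on).

```c
/* Decode the 16-byte ELF identification (e_ident) at the start of a file.
 * Hypothetical helper; the byte layout follows the ELF specification:
 *   bytes 0-3: magic number 0x7f 'E' 'L' 'F'
 *   byte 4:    class, 1 = 32-bit objects, 2 = 64-bit objects
 *   byte 5:    data encoding, 1 = little-endian, 2 = big-endian
 * Returns 0 on success and -1 if the bytes are not a recognizable ELF header. */
int elf_ident_info(const unsigned char ident[16], int *word_bits, int *little_endian)
{
    if (ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F')
        return -1;                  /* wrong magic number: not ELF */

    switch (ident[4]) {             /* word size of the generating system */
    case 1: *word_bits = 32; break;
    case 2: *word_bits = 64; break;
    default: return -1;
    }

    switch (ident[5]) {             /* byte order of the generating system */
    case 1: *little_endian = 1; break;
    case 2: *little_endian = 0; break;
    default: return -1;
    }
    return 0;
}
```

For an object file produced on x86-64 Linux, the first six bytes are 7f 45 4c 46 02 01, so this sketch would report a 64-bit, little-endian file.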

Symbols and symbol tables
Each relocatable object module m has a symbol table containing information about the symbols defined and referenced by m. In the context of a linker, there are three different kinds of symbols:
1> Global symbols that are defined by module m and can be referenced by other modules. Global linker symbols correspond to non-static C functions and to global variables defined without the C static attribute.
2> Global symbols that are referenced by module m but defined by some other module. These are called external symbols, and they correspond to C functions and variables defined in other modules.
3> Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables defined with the static attribute. These symbols are visible anywhere within module m but cannot be referenced by other modules. The sections of the object file and the name of the source file corresponding to module m also get local symbols.
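The three kinds of symbols can be illustrated with a small module. The file name m.c and every identifier below are hypothetical, and the section named in each comment is where GCC typically places the item on ELF systems, not a guarantee.

```c
#include <string.h>

/* Sketch of one module, m.c. Comments note which kind of linker symbol
 * each name yields in m's symbol table, and its usual section. */

int buf[2] = {1, 2};             /* global symbol defined by m (.data)        */
static int calls;                /* local symbol, visible only in m (.bss)    */

static int bump(void)            /* local symbol: static function (.text)     */
{
    return ++calls;
}

int swap_buf(void)               /* global symbol defined by m (.text)        */
{
    int tmp = buf[0];            /* nonstatic local variable: no linker symbol */
    buf[0] = buf[1];
    buf[1] = tmp;
    return bump();
}

size_t name_len(const char *s)   /* global symbol defined by m (.text)        */
{
    return strlen(s);            /* strlen: external symbol, defined in libc  */
}
```

Compiled without optimization (gcc -c m.c), running nm m.o on a typical Linux system should list buf as D, swap_buf and name_len as T, calls as b, bump as t, and strlen as U (undefined); lowercase letters mark local symbols. The variable tmp does not appear at all, because nonstatic local variables live on the stack at run time.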

The symbol table is built by the assembler, using the symbols that the compiler exports into the assembly-language .s file. The .symtab section contains an ELF symbol table, which is an array of entries. Each entry has the following format:

typedef struct {
  int name;        /* string table offset */
  int value;       /* section offset, or VM address */
  int size;        /* object size in bytes */
  char type:4,     /* data, func, section, or src file name */
       binding:4;  /* local or global */
  char reserved;   /* unused */
  char section;    /* section header index, ABS, UNDEF, or COMMON */
} elf_symbol;
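The struct above is a simplified view. In the real ELF format (Elf32_Sym and Elf64_Sym in <elf.h>), type and binding share a single byte, st_info: the binding sits in the high 4 bits and the type in the low 4 bits. The sketch below reproduces that packing with hypothetical helper and constant names; the numeric codes (local = 0, global = 1, weak = 2; object = 1, function = 2) follow the ELF specification.

```c
/* Binding and type codes with the values the ELF specification assigns
 * (hypothetical names; <elf.h> calls them STB_* and STT_*). */
enum { SYM_BIND_LOCAL = 0, SYM_BIND_GLOBAL = 1, SYM_BIND_WEAK = 2 };
enum { SYM_TYPE_NOTYPE = 0, SYM_TYPE_OBJECT = 1, SYM_TYPE_FUNC = 2,
       SYM_TYPE_SECTION = 3, SYM_TYPE_FILE = 4 };

/* Pack binding into the high nibble and type into the low nibble,
 * mirroring what ELF32_ST_INFO does. */
static unsigned char sym_info(unsigned bind, unsigned type)
{
    return (unsigned char)((bind << 4) | (type & 0xf));
}

/* Unpack the two fields again, mirroring ELF32_ST_BIND / ELF32_ST_TYPE. */
static unsigned sym_bind(unsigned char info) { return info >> 4; }
static unsigned sym_type(unsigned char info) { return info & 0xf; }
```

For example, a global function such as main is encoded as st_info = 0x12 (binding 1 in the high nibble, type 2 in the low nibble).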

Symbol resolution
The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files.
Resolving references to local symbols that are defined in the same module is straightforward. The compiler allows only one definition of each local symbol per module. The compiler also ensures that static local variables, which get local linker symbols, have unique names.
Resolving references to global symbols is trickier. When the compiler encounters a symbol (a variable or function name) that is not defined in the current module, it assumes that the symbol is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle. If the linker cannot find a definition for the referenced symbol in any of its input modules, it prints an error message and terminates.
At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file. Functions and initialized global variables are strong symbols; uninitialized global variables are weak symbols.
Based on the strength of the symbols, the linker applies the following rules:
1> Multiple strong symbols with the same name are not allowed.
2> Given one strong symbol and one or more weak symbols with the same name, choose the strong symbol.
3> Given multiple weak symbols with the same name, choose any one of them.
These rules describe the traditional Unix treatment of uninitialized globals as "common" symbols. GCC 10 and later default to -fno-common, which places uninitialized globals in .bss as ordinary definitions, so two of them with the same name also cause a multiple-definition error.
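The three rules can be sketched as a small function that picks the definition to use for one symbol name. This is a toy model: struct def and resolve_symbol are hypothetical, and a real linker works on full symbol tables rather than a list of flags. For rule 3 the sketch simply picks the first weak definition, which is one valid choice among several.

```c
#include <stddef.h>

/* One definition of a given global symbol, as seen in some input module. */
struct def {
    const char *module;   /* hypothetical module name, e.g. "a.o" */
    int strong;           /* 1 = function or initialized global, 0 = uninitialized global */
};

/* Apply the strong/weak rules to all definitions of one symbol name.
 * Returns the index of the chosen definition, or -1 on error
 * (more than one strong definition, or no definition at all). */
int resolve_symbol(const struct def *defs, size_t n)
{
    int strong_idx = -1, weak_idx = -1;
    for (size_t i = 0; i < n; i++) {
        if (defs[i].strong) {
            if (strong_idx != -1)
                return -1;              /* rule 1: multiple strong symbols */
            strong_idx = (int)i;
        } else if (weak_idx == -1) {
            weak_idx = (int)i;          /* remember the first weak one */
        }
    }
    if (strong_idx != -1)
        return strong_idx;              /* rule 2: strong beats weak */
    return weak_idx;                    /* rule 3: any weak one; -1 if none */
}
```

Calling resolve_symbol on the definitions {weak in a.o, strong in b.o, weak in c.o} selects index 1, the strong definition in b.o.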

Linking with static libraries
All compilation systems provide a mechanism for packaging related object modules into a single file called a static library, which can then be supplied as input to the linker. When the linker builds the output executable, it copies only those object modules in the library that are referenced by the application.
On Unix systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file.
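The archive layout described above can be walked with a few lines of C. This sketch assumes the traditional Unix ar format: an 8-byte global header "!<arch>\n", followed by members that each start with a 60-byte text header (name, date, uid, gid, mode, decimal size, and a 2-byte terminator "`\n") and are padded to an even offset. The function ar_count_members is hypothetical, and it treats special members such as the archive's symbol index as ordinary members.

```c
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"
#define AR_MAGIC_LEN 8
#define AR_HDR_LEN   60   /* name 16, date 12, uid 6, gid 6, mode 8, size 10, fmag 2 */

/* Count the members of an in-memory archive (.a file).
 * Returns -1 if the buffer is not a well-formed archive. */
int ar_count_members(const unsigned char *buf, size_t len)
{
    if (len < AR_MAGIC_LEN || memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) != 0)
        return -1;

    size_t off = AR_MAGIC_LEN;
    int count = 0;
    while (off < len) {
        if (len - off < AR_HDR_LEN)
            return -1;                       /* truncated member header */
        const unsigned char *hdr = buf + off;
        if (hdr[58] != '`' || hdr[59] != '\n')
            return -1;                       /* missing header terminator */

        char size_field[11];
        memcpy(size_field, hdr + 48, 10);    /* size: decimal, space padded */
        size_field[10] = '\0';
        size_t size = strtoul(size_field, NULL, 10);

        off += AR_HDR_LEN;
        if (len - off < size)
            return -1;                       /* member body runs past the end */
        off += size;
        if (off % 2 != 0)
            off++;                           /* members start on even offsets */
        count++;
    }
    return count;
}
```

On a real system the same information is available from the ar tool itself, for example with ar t to list an archive's members.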

How the linker uses static libraries to resolve references
During the symbol resolution phase, the linker scans the relocatable object files and archives from left to right, in the same order in which they appear on the compiler driver's command line. During this scan, the linker maintains a set E of relocatable object files that will be merged to form the executable, a set U of unresolved symbols (symbols referred to but not yet defined), and a set D of symbols that have been defined in previous input files. Initially, E, U, and D are empty.
1> For each input file f on the command line, the linker determines whether f is an object file or an archive. If f is an object file, the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.
2> If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the members of the archive. If some archive member m defines a symbol that resolves a reference in U, then m is added to E, and the linker updates U and D to reflect the symbol definitions and references in m. This process iterates over the member object files in the archive until a fixed point is reached where U and D no longer change. At that point, any member object files not contained in E are simply discarded, and the linker proceeds to the next input file.
3> If U is nonempty when the linker finishes scanning the input files on the command line, it prints an error and terminates. Otherwise, it merges and relocates the object files in E to build the output executable file.

Because of this algorithm, the order of libraries and object files on the command line matters: a library should come after the object files that reference it, and if members of one library reference symbols defined in another library, the referenced library must come after the one that references it.
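The scan can be simulated in C. In this toy model, which is not how a production linker is implemented, each module is reduced to the lists of symbol names it defines and references, each input is either a single object file or an archive of members, and the sets E, U, and D are small fixed-capacity arrays; module names are assumed to be unique. All types and functions (struct module, struct input, link_scan, and so on) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 32
#define MAX_SYMS  4

/* Toy relocatable module: symbols it defines and references (unused slots NULL). */
struct module {
    const char *name;
    const char *defs[MAX_SYMS];
    const char *refs[MAX_SYMS];
};

/* One command-line input: an object file (1 member) or an archive (n members). */
struct input {
    int is_archive;
    const struct module *members;
    size_t nmembers;
};

/* Fixed-capacity set of names, enough for the sketch. */
struct set {
    const char *item[MAX_NAMES];
    size_t n;
};

static int set_has(const struct set *s, const char *x)
{
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->item[i], x) == 0)
            return 1;
    return 0;
}

static void set_add(struct set *s, const char *x)
{
    if (!set_has(s, x) && s->n < MAX_NAMES)
        s->item[s->n++] = x;
}

static void set_del(struct set *s, const char *x)
{
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->item[i], x) == 0) {
            s->item[i] = s->item[--s->n];   /* order is irrelevant in a set */
            return;
        }
}

/* Add module m to E, then update D and U from its definitions and references. */
static void take(const struct module *m, struct set *E, struct set *U, struct set *D)
{
    set_add(E, m->name);
    for (size_t i = 0; i < MAX_SYMS && m->defs[i]; i++) {
        set_add(D, m->defs[i]);
        set_del(U, m->defs[i]);
    }
    for (size_t i = 0; i < MAX_SYMS && m->refs[i]; i++)
        if (!set_has(D, m->refs[i]))
            set_add(U, m->refs[i]);
}

/* Does member m define a symbol that is currently in U? */
static int defines_unresolved(const struct module *m, const struct set *U)
{
    for (size_t i = 0; i < MAX_SYMS && m->defs[i]; i++)
        if (set_has(U, m->defs[i]))
            return 1;
    return 0;
}

/* Scan the inputs left to right. Returns 0 with the selected modules in *E,
 * or -1 if some symbol is still unresolved after the last input. */
int link_scan(const struct input *in, size_t n, struct set *E)
{
    struct set U = {0}, D = {0};
    E->n = 0;
    for (size_t f = 0; f < n; f++) {
        if (!in[f].is_archive) {                  /* step 1: object file */
            take(&in[f].members[0], E, &U, &D);
            continue;
        }
        for (int changed = 1; changed; ) {        /* step 2: repeat until U and D settle */
            changed = 0;
            for (size_t j = 0; j < in[f].nmembers; j++) {
                const struct module *m = &in[f].members[j];
                if (!set_has(E, m->name) && defines_unresolved(m, &U)) {
                    take(m, E, &U, &D);
                    changed = 1;
                }
            }
        }                                         /* members never taken are discarded */
    }
    return U.n == 0 ? 0 : -1;                     /* step 3 */
}
```

For instance, with an object main.o that references pushstack and an archive whose member stack.o defines it, link_scan succeeds when main.o comes first, and fails with pushstack left in U when the archive comes first, which is exactly the command-line ordering pitfall.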

Relocation
Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (that is, a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules, and it can begin the relocation step. Relocation consists of two steps:
1> Relocating sections and symbol definitions:
In this step, the linker merges all sections of the same type into a new aggregate section of that type. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.
2> Relocating symbol references within sections:
In this step, the linker modifies every symbol reference in the code and data sections so that it points to the correct run-time address. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries.

Relocation entries:
When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory, nor does it know the locations of any externally defined functions or global variables referenced by the module. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify that reference when it merges the object file into an executable. Relocation entries for code are placed in .rel.text; relocation entries for initialized data are placed in .rel.data.
The format of an ELF relocation entry is as follows:
typedef struct {
  int offset;     /* offset of the reference to relocate */
  int symbol:24,  /* symbol the reference should point to */
      type:8;     /* relocation type */
} Elf32_Rel;

ELF defines 11 different relocation types. The two most basic are R_386_PC32, which relocates a reference that uses a 32-bit PC-relative address, and R_386_32, which relocates a reference that uses a 32-bit absolute address.
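The two relocation computations can be sketched as follows, modeled on the common textbook treatment of R_386_PC32 and R_386_32. The sketch assumes that the symbol's run-time address is already known and that the 4 bytes at the reference initially hold an addend left there by the assembler; for a PC-relative call this is typically -4, because at run time the CPU adds the displacement to the address of the next instruction, which begins 4 bytes after the start of the reference. The names reloc_type, struct reloc, and apply_reloc are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for R_386_PC32 and R_386_32 (hypothetical enum, not <elf.h>). */
enum reloc_type { RELOC_PC32, RELOC_ABS32 };

struct reloc {
    uint32_t offset;        /* offset of the 4-byte reference within the section */
    uint32_t sym_addr;      /* run-time address assigned to the target symbol */
    enum reloc_type type;
};

/* Patch one reference in a section whose contents are `sec` and whose
 * run-time address is `sec_addr`. The 4 bytes at the reference hold the
 * addend on entry and the relocated value on exit, in host byte order
 * (which matches the target when running on an x86 host). */
void apply_reloc(unsigned char *sec, uint32_t sec_addr, const struct reloc *r)
{
    unsigned char *refptr = sec + r->offset;
    uint32_t addend, value;
    memcpy(&addend, refptr, sizeof addend);

    if (r->type == RELOC_PC32) {
        uint32_t refaddr = sec_addr + r->offset;   /* run-time address of the reference */
        value = r->sym_addr + addend - refaddr;    /* distance from the reference */
    } else {
        value = r->sym_addr + addend;              /* absolute address */
    }
    memcpy(refptr, &value, sizeof value);          /* arithmetic wraps modulo 2^32 */
}
```

For example, if a section starts at run-time address 0x80483b4, the reference sits at offset 7 and holds -4, and the called function is placed at 0x80483c8, the PC-relative value written is 0x80483c8 - 4 - 0x80483bb = 0x9.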

Dynamic linker
A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is called dynamic linking and is performed by a program called a dynamic linker.
Shared libraries are "shared" in two different ways. First, in any given file system there is exactly one .so file for a particular library. The code and data in this .so file are shared by all of the executable object files that reference the library, rather than being copied and embedded in each executable as the contents of a static library are. Second, a single copy of a shared library's .text section in memory can be shared by different running processes.

Linking multiple object files
stack.c:

  #include <stdio.h>

  #define STACKSIZE 1000

  typedef struct STACK {
    int data[STACKSIZE];
    int top;
  } stack;

  stack s;
  int count = 0;

  void pushstack(int d)
  {
    s.data[s.top++] = d;
    count++;
  }

  int popstack(void)
  {
    return s.data[--s.top];
  }

  int isempty(void)
  {
    return s.top == 0;
  }


link.c:

  #include <stdio.h>

  int a, b;

  int main(void)
  {
    a = b = 1;

    pushstack(a);
    pushstack(b);
    pushstack(a);

    while (!isempty()) {
      printf("%d\n", popstack());
    }

    return 0;
  }


How to compile:

  gcc -Wall stack.c link.c -o main

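The compiler prints warnings about implicit declarations of the functions pushstack, popstack, and isempty.

Nevertheless, the program links and runs. (GCC 14 and later treat implicit function declarations as an error by default, so with those versions the build fails instead.)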

Definitions and declarations

Using static and extern with functions
These warnings appear because the compiler has not seen a prototype for these functions when it processes the calls, so it has to declare them implicitly based on the calls themselves, as follows:

  int pushstack (int); 
  int popstack (void); 
  int isempty (void); 


When compiling one file, the compiler does not know where functions are defined. In the example above, when compiling link.c the compiler has no idea that these functions are defined in stack.c. You can tell it about them with extern declarations. Modify link.c as follows:

  #include <stdio.h>

  int a, b;

  extern void pushstack(int d);
  extern int popstack(void);
  extern int isempty(void);

  int main(void)
  {
    a = b = 1;

    pushstack(a);
    pushstack(b);
    pushstack(a);

    while (!isempty()) {
      printf("%d\n", popstack());
    }

    return 0;
  }


Now the compiler no longer warns. Here the extern keyword indicates that the identifier has external linkage. That pushstack has external linkage means the following: if link.c and stack.c are linked together and pushstack is declared in both (the declaration in stack.c is also its definition), then these declarations refer to the same function; after linking they become the same global symbol and denote the same address. The extern in a function declaration can be omitted: a function declaration without extern also has external linkage.

If a function declaration is qualified with the static keyword, the identifier has internal linkage. Consider, for example, the following two source files:

  /* foo.c */
   
  static void foo (void) {} 


  /*main.c*/ 
   
  void foo (void); 
   
  int main (void) {foo (); return 0;} 


Compiling and linking these files fails, for the following reason:

Although the function foo is defined in foo.c, it is declared static and therefore has only internal linkage. When foo.c is compiled into an object file, the name foo becomes a local symbol that does not take part in linking. So when main.c calls a foo with external linkage, the linker cannot find its definition, cannot determine its address, and cannot resolve the symbol, and it reports an error.

A variable or function may be declared any number of times, but exactly one of those declarations must be a definition. If there are multiple definitions, or none at all, the linker cannot complete the link.


Using static and extern with variables
If you want link.c to access the int variable count defined in stack.c, you can use an extern declaration:

  #include <stdio.h>

  int a, b;

  extern void pushstack(int d);
  extern int popstack(void);
  extern int isempty(void);
  extern int count;

  int main(void)
  {
    a = b = 1;

    pushstack(a);
    pushstack(b);
    pushstack(a);

    printf("%d\n", count);

    while (!isempty()) {
      printf("%d\n", popstack());
    }

    return 0;
  }


The variable count has external linkage, and its storage is allocated in stack.c. The declaration extern int count; in link.c is therefore not a variable definition, because it allocates no storage.

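If you do not want code outside stack.c to access count, you can declare it with the static keyword so that it has internal linkage.
Differences between function and variable declarations
Variable declarations differ slightly from function declarations: extern may be omitted from a function declaration, but omitting it from a variable declaration changes the meaning completely. Without extern, the line int count; in link.c above would no longer declare the count defined in stack.c; it would define a second global variable count in link.c itself. Under the traditional strong/weak rules, this uninitialized definition is a weak symbol that resolves to the strong count in stack.c, but with GCC 10 and later (which default to -fno-common) the link fails with a multiple-definition error.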

Declaring functions and variables with internal linkage via the static keyword protects a module's internal state; this is a form of encapsulation. A module provides some of its functions for external use, which is also called exporting them; these functions are declared with external linkage, while everything else can be kept internal with static.
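As a sketch of this idea applied to the stack example, stack.c could hide its state behind static and export only the functions. The accessor pushcount is hypothetical, added so that other files can still read the push count after count becomes internal.

```c
#define STACKSIZE 1000

typedef struct STACK {
    int data[STACKSIZE];
    int top;
} stack;

static stack s;        /* internal linkage: invisible to other files */
static int count = 0;  /* internal linkage: "extern int count;" elsewhere no longer links */

/* Exported functions keep external linkage. As in the original,
 * there is no overflow or underflow checking. */
void pushstack(int d)
{
    s.data[s.top++] = d;
    count++;
}

int popstack(void)
{
    return s.data[--s.top];
}

int isempty(void)
{
    return s.top == 0;
}

/* Hypothetical accessor exported in place of the now-hidden count. */
int pushcount(void)
{
    return count;
}
```

With this version of stack.c, the extern int count; declaration in link.c would produce an undefined-reference error at link time, because count is now a local symbol; link.c would call pushcount() instead.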


Header files
Writing an extern declaration for every function quickly becomes tedious: if another file, say foo.c, also uses pushstack and the other functions, it would need its own copy of the same extern declarations. To avoid this repetition, you can put the declarations in a header file stack.h:

  #ifndef STACK_H
  #define STACK_H

  #define STACKSIZE 1000

  typedef struct STACK {
    int data[STACKSIZE];
    int top;
  } stack;

  extern void pushstack(int d);
  extern int popstack(void);
  extern int isempty(void);

  #endif


With this header in place, link.c only needs to include it instead of writing the three function declarations itself:

  #include <stdio.h>
  #include "stack.h"

  int a, b;

  extern int count;

  int main(void)
  {
    a = b = 1;

    pushstack(a);
    pushstack(b);
    pushstack(a);

    printf("%d\n", count);

    while (!isempty()) {
      printf("%d\n", popstack());
    }

    return 0;
  }


Why does #include <stdio.h> use angle brackets while #include "stack.h" uses double quotes? Because GCC searches for them differently:

    • For headers included with angle brackets, GCC first searches the directories specified with the -I option, then the system header directories (usually /usr/include).
    • For headers included with double quotes, GCC first searches the directory containing the .c file that includes the header, then the directories specified with -I, then the system header directories.


The #ifndef, #define, #endif sequence prevents a header file from being included more than once. Including a header repeatedly causes the following problems:

    • It slows down preprocessing, because the preprocessor processes many headers it does not need to.
    • If a.h includes b.h and b.h includes a.h, preprocessing falls into an infinite loop of inclusions.
    • Some code, such as a struct definition, may not appear twice in the same translation unit, so including the header twice causes a compile error.
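The effect of an include guard can be seen in a single file by pasting the same guarded header text twice, which is effectively what the preprocessor produces when a header is included twice. The header name point.h and the guard macro POINT_H are hypothetical. Without the guard, the second struct point definition would be a redefinition error; with it, the second copy is skipped.

```c
/* Simulates a header (hypothetically point.h, guard macro POINT_H)
 * that reaches the same .c file twice, e.g. through a.h and b.h. */

/* ---- first inclusion: POINT_H not yet defined, so the body is kept ---- */
#ifndef POINT_H
#define POINT_H
struct point { int x, y; };      /* a struct may be defined only once per file */
int point_sum(struct point p);   /* declaration only, no definition */
#endif

/* ---- second inclusion: POINT_H is now defined, so the body is skipped ---- */
#ifndef POINT_H
#define POINT_H
struct point { int x, y; };
int point_sum(struct point p);
#endif

/* The one definition lives in a single .c file, not in the header. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```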


Header files should contain only declarations of variables and functions, never definitions. If a variable or function definition appears in a header that is included by several .c files, each of those files gets its own copy of the definition, and the files cannot be linked together.
