1. Symbol classification
(1) Global symbols: Non-static global variables, non-static functions
(2) External symbols: global variables and functions that are defined in other modules and referenced by this module
(3) Local symbols: Static variables (both global and local), static functions
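For a static local variable, the compiler generates a unique name, such as x.fun1 and x.fun2, so that two functions can each have their own static x. Local symbols are not visible to other modules, so the linker never uses them to resolve references across modules.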
2. Symbol resolution
When the compiler encounters a symbol that is not defined in the current module, it assumes the symbol is defined in another module and generates a linker symbol table entry for the linker to process. If the linker cannot find the symbol in any of its input modules, it reports a link error such as undefined reference to 'xxx'. If the linker finds more than one definition of an external symbol among its input modules, it has to resolve the symbol. In this case the linker may not report an error, or even a warning, about the multiple definitions; instead it selects one of them according to its rules.
The linker classifies the global symbols that each module outputs into strong and weak symbols:
(1) Strong symbols: functions and initialized global variables
(2) Weak symbols: uninitialized global variables
Given these definitions of strong and weak symbols, the linker handles multiply defined symbols according to the following rules:
Rule 1: Multiple strong definitions of the same symbol are not allowed
Rule 2: If there is one strong symbol and one or more weak symbols, select the strong symbol
Rule 3: If there are only weak symbols, select the one with the largest size; if the sizes are equal, select the one that is linked first
The rules above are the root cause of many link errors, because the linker may silently make resolution decisions that you do not know about. The following classic examples illustrate the rules:
Example 1:
In lib1.c
int x;
void f(void)
{
    x = 1235;
}
In main1.c
#include <stdio.h>
void f(void);
int x = 1234;
int main(void)
{
    f();
    printf("x=%d\n", x);
    return 0;
}
In the code above, main prints x=1235. By Rule 2, the linker resolves the symbol x to the strong definition in main1.c. The author of lib1.c does not know this, so his use and modification of x affects main1.c. Such interactions are hard to untangle because everyone believes they are doing the right thing with the right variable, and the linker completes the whole resolution silently.
Example 2:
In lib2.c
double x;
void f(void)
{
    x = -0.0;
}
In main2.c
#include <stdio.h>
void f(void);
int x = 1234;
int y = 1235;
int main(void)
{
    f();
    printf("x=0x%x y=0x%x\n", x, y);
    return 0;
}
In this case, the program prints x=0x0 y=0x80000000, and the linker (GCC ld) gives a warning:
ld: warning: tentative definition of '_x' with size 8 from 'obj/debug/lib2.o' is being replaced by real definition of smaller size 4 from 'obj/debug/main2.o'
What the linker resolves is the address of the symbol, and adjacent global variables may occupy adjacent addresses in the .data section. Here f() stores an 8-byte double at the address of the 4-byte x, so the write also overwrites y. This is similar to a stack overflow, but harder to diagnose, because the problem spans multiple modules rather than a single function.
Example 3:
In lib3.c
#include <stdio.h>
struct
{
    int a;
    int b;
} x;
void f(void)
{
    x.a = 123;
    x.b = 456;
    printf("in f(): sizeof(x)=%d, (&x)=0x%08lx\n", (int)sizeof(x), (unsigned long)&x);
}
In main3.c
#include <stdio.h>
void f(void);
int x;
int y;
int main(void)
{
    f();
    printf("in main(): sizeof(x)=%d, (&x)=0x%08lx, (&y)=0x%08lx, x=%d, y=%d\n",
           (int)sizeof(x), (unsigned long)&x, (unsigned long)&y, x, y);
    return 0;
}
Program output:
in f(): sizeof(x)=8, (&x)=0x02489018
in main(): sizeof(x)=4, (&x)=0x02489018, (&y)=0x02489020, x=123, y=0
Always remember that resolving an external symbol yields an address, so the symbol x has one unique address in both lib3.c and main3.c, no matter how many times it is defined. Second, sizeof is evaluated by the compiler and has nothing to do with linking; the compiler sees only the definitions and declarations in the current module, which is why the two modules report different sizes for x. Finally, because the symbol x resolves to the x in lib3.c, whose size is 8, the address of y in main3.c is 8 greater than the address of x. This is the result of the linker laying out the variables from lib3.o and main3.o in the data area of the executable. So y is an unrelated variable that keeps its default value of 0; note the difference from Example 2.
3. Summary
Because of the problems that symbol resolution can cause, keep the following in mind when writing C:
(1) Hide variables and functions inside their module with the static attribute wherever possible, just as in C++ you protect private members with private wherever possible.
(2) Define as few weak symbols as possible and initialize your global variables, so that the linker reports a multiple-definition error according to Rule 1.
(3) Set the necessary build options, such as GCC's -fno-common, so that multiple definitions of a symbol are reported at link time instead of being merged silently.
4. C++ symbol resolution
C++ does not allow strong and weak definitions of the same symbol to coexist in this way: every symbol can have only one definition (function overloading stays unique because the compiler rewrites, or mangles, each function's symbol name), so C++ largely avoids the linker trouble described above for C.