Before getting into linking, let's look at a few concepts:
I. The concept of a symbol.
We know that the most important part of linking is relocating symbols, and the symbol table was mentioned earlier. So what is a symbol? In linking, functions and variables are collectively called symbols, and the function name or variable name is the symbol name. Each object file has a corresponding symbol table, which records all the symbols used in that object file. Each symbol has a corresponding value, called the symbol value. For functions and variables, the symbol value is their address. Besides functions and variables, there are several other kinds of symbols. Here is a quick overview:
1, global symbols: global symbols defined in this object file
2, external symbols: symbols referenced in this object file but not defined in it
3, section names: symbols generated by the compiler
4, local symbols: symbols that are visible only inside this object file
5, line number information
For linking, the most important of these are the global symbols and the external symbols.
II. Structure of the ELF symbol table:
A, this is the section table.
B, this is the symbol table.
Comparing the two tables, we can find the following relationships:
1, Starting at Num 6 in the symbol table: static_init is the initialized local static variable we defined. Its binding is LOCAL, and it is stored in the section with index 3 in the section table, that is, the .data section. Next, static_uninit is an uninitialized local static variable, stored in the section with index 4 in the section table, which is the .bss section.
2, Next, global_init, like static_init, is stored in the .data section. By our analysis global_uninit should be in the .bss section, like static_uninit, but here it is marked COM. Why? This depends on the language and on the compiler implementation. Some compilers place uninitialized global variables in the .bss section of the object file. Others do not store them there, and only reserve an undefined global variable symbol, waiting until the final link into the executable to allocate space in the .bss section. (COM symbols are discussed later, when we come to the common block.)
3, The func function and the main function are globally visible, and because they are functions their type is STT_FUNC. Both are located in .text.
4, printf is used but not defined here, so it does not belong to any section in this object file and is marked UND (undefined).
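The source file behind this symbol table is not reproduced here, so the following is only a guess at a file that would produce entries like the ones described. The variable and function names come from the discussion above; the initial values and the body of func are assumptions made up for illustration, and the main function is left out to keep the sketch short.

```c
#include <stdio.h>

int global_init = 84;  /* initialized global: GLOBAL binding, stored in .data */
int global_uninit;     /* uninitialized global: may be reported as COM or placed in .bss */

/* Global function: type STT_FUNC, stored in .text. */
int func(int i)
{
    static int static_init = 85;  /* initialized local static: LOCAL binding, .data */
    static int static_uninit;     /* uninitialized local static: LOCAL binding, .bss */

    static_uninit += i;
    /* printf is referenced but not defined in this file, so its symbol is UND. */
    printf("%d\n", static_init + static_uninit + global_init + global_uninit);
    return static_init + static_uninit;
}
```

Whether global_uninit shows up as COM depends on the compiler and its options. With GCC, for example, -fcommon produces a COMMON symbol while -fno-common (the default since GCC 10) places it in .bss.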
III. Strong symbols and weak symbols.
For C/C++, the compiler by default treats functions and initialized global variables as strong symbols, and uninitialized global variables as weak symbols.
Based on the concept of strong and weak symbols, the linker handles global symbols that are defined multiple times according to the following rules:
Rule 1, a strong symbol may not be defined more than once. Otherwise the linker reports an error.
Rule 2, if a symbol is strong in one object file and weak in the others, the strong symbol is chosen.
Rule 3, if a symbol is weak in all object files, the one that occupies the most space is chosen.
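The three rules can be written down as a small selection function. This is only a sketch of the decision logic, not how any real linker is implemented; the struct sym_def type and the resolve_symbol name are made up for this example.

```c
#include <stddef.h>

/* One definition of a symbol as seen in one object file (hypothetical type). */
struct sym_def {
    int    is_strong;  /* 1 = strong symbol, 0 = weak symbol */
    size_t size;       /* number of bytes the definition occupies */
};

/*
 * Choose the winning definition among n definitions of the same global symbol.
 * Returns the index of the winner, or -1 if two strong definitions clash
 * (Rule 1) or if n is 0.
 */
int resolve_symbol(const struct sym_def *defs, size_t n)
{
    int strong = -1;   /* index of the strong definition, if one was seen */
    int largest = -1;  /* index of the largest weak definition seen so far */

    for (size_t i = 0; i < n; i++) {
        if (defs[i].is_strong) {
            if (strong != -1)
                return -1;        /* Rule 1: strong symbol defined twice */
            strong = (int)i;
        } else if (largest == -1 || defs[i].size > defs[largest].size) {
            largest = (int)i;     /* candidate for Rule 3 */
        }
    }
    /* Rule 2: a strong definition beats weak ones; otherwise Rule 3 applies. */
    return strong != -1 ? strong : largest;
}
```

For instance, given one weak definition of 8 bytes, one strong definition of 4 bytes and one weak definition of 16 bytes, the function returns the index of the strong one, even though it is the smallest.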
--------------------------------------------------------------------------------------------------------------- ----------------------
Now that we know these concepts, it's time to move on to static linking.
We know that linking means merging all the object files and library files in some way and finally producing the executable file. Is it really that simple? Yes and no, so let's dig into it.
We write two files, a.c and b.c.
a.c
Here extern indicates that GKFX is defined in another file.
b.c
After compiling we get a.o and b.o. After linking we get ab.
Let's view the information for each of these three files:
First look at the sizes in ab. A quick calculation shows that for the .text and .data sections, size(ab) = size(a.o) + size(b.o), so we can conclude that the linker merges sections of the same kind.
Next look at the VMA, the Virtual Memory Address. LMA stands for Load Memory Address. Except on some embedded systems, the two values are normally the same. Both VMA and LMA are 0 in a.o and b.o, because the compile phase does not allocate memory addresses.
This kind of merging generally uses a method called two-step linking.
Step one: space and address allocation.
Question: what space is being allocated?
The space here has two meanings. One is space in the output executable file, and the other is space in the virtual address space after loading. As we can see, the ab file does not have a .bss section. For a section such as .bss, allocating space only has meaning in the virtual address space, because it occupies no space in the file. Here we focus on the virtual address space.
To summarize what the first step does:
Before linking, the VMA of every section in the object files is 0. During linking, the linker merges sections of the same kind, combining the .text and .data sections of the two object files, and assigns virtual addresses to the merged sections. Once the section addresses are known, the linker computes the address of each symbol. Because the position of each symbol within its section is fixed, the correct virtual address of each symbol can be obtained from its offset.
For example: if the virtual address of .text from a.o is 0x08048094 and the offset of main within .text is X, then the virtual address of main is 0x08048094 + X.
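The first step can be sketched in a few lines of C. This is a simplified model under stated assumptions, not a real linker: the struct input_section type and both function names are hypothetical, the section sizes used in the usage note are illustrative, and alignment padding between sections is ignored.

```c
#include <stdint.h>

/* One input section that will be merged into an output section (hypothetical type). */
struct input_section {
    uint32_t size;  /* size in bytes inside the object file */
    uint32_t vma;   /* 0 before linking; filled in by assign_addresses() */
};

/*
 * Lay the input sections out back to back, starting at address `base`.
 * Returns the total size of the merged output section.
 */
uint32_t assign_addresses(struct input_section *in, int n, uint32_t base)
{
    uint32_t offset = 0;
    for (int i = 0; i < n; i++) {
        in[i].vma = base + offset;  /* this section starts where the previous one ended */
        offset += in[i].size;
    }
    return offset;
}

/* A symbol's final address is its section's VMA plus its offset within that section. */
uint32_t symbol_address(const struct input_section *sec, uint32_t offset_in_section)
{
    return sec->vma + offset_in_section;
}
```

With two .text sections of illustrative sizes 0x28 and 0x30 and a base of 0x08048094, the first section is placed at 0x08048094, the second at 0x080480bc, and the merged size is 0x58, the sum of the two input sizes.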
Step two: symbol resolution and relocation.
After space and address allocation is complete, the linker uses the information collected in the first step to read the section data and relocation information from the input files, performs symbol resolution and relocation, and adjusts the addresses in the code.
OK, let's look into it in detail.
Take a look at the disassembly of a.o:
Code inside a program uses virtual addresses. From the disassembly above we can see that in the object file a.o the start address of main is 0, because no space has been allocated yet. Only after space allocation is complete do the functions get their positions in the virtual address space.
Look closely at these two lines. On line 11, "C7 44 24 04" is the opcode of the movl instruction, and "00 00 00 00" is the address of GKFX.
On line 20, "E8" is the opcode of the near relative call instruction, and "FC FF FF FF" is the offset of the destination address relative to the next instruction.
When the source file a.c is compiled into an object file, the compiler does not know the addresses of GKFX and swap, so it temporarily uses 0 as the address of GKFX. As for FC FF FF FF, it stands in for the offset of the called function relative to the instruction after the call. It is only a temporary placeholder: the two's complement form of the constant -4.
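To check that "FC FF FF FF" really is -4, the four bytes can be decoded as a little-endian 32-bit value, which is how x86 stores instruction operands. The helper name read_le32 is made up for this sketch.

```c
#include <stdint.h>

/* Read 4 bytes as a little-endian 32-bit value and interpret it as signed. */
int32_t read_le32(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    /* Converting to int32_t yields the two's complement value on mainstream compilers. */
    return (int32_t)v;
}
```

Passing the bytes FC FF FF FF gives 0xFFFFFFFC, which is -4 as a signed value; passing 00 00 00 00 gives 0. The value is -4 because the call operand is 4 bytes long and the offset is measured from the instruction that follows it.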
Now let's take a look at the result after linking:
From the link result, we find that in the original line 11 and the original line 20, the addresses have been rewritten. For the original line 20, the instruction after the call is leave, whose address is 0x80480b9, so the address at offset 0x00000003 from the leave instruction is 0x80480b9 + 0x3 = 0x80480bc. That is exactly the address of swap!
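The two kinds of fix-up can be sketched as follows, using the formulas the i386 ABI defines for absolute (R_386_32, S + A) and PC-relative (R_386_PC32, S + A - P) relocations, where S is the symbol address, A is the value already stored in the operand, and P is the address of the operand being patched. The function names are made up for this example, and the GKFX address in the usage note is hypothetical.

```c
#include <stdint.h>

/* Absolute relocation (R_386_32 style): the operand becomes S + A. */
uint32_t reloc_abs32(uint32_t S, int32_t A)
{
    return S + (uint32_t)A;
}

/* PC-relative relocation (R_386_PC32 style): the operand becomes S + A - P. */
int32_t reloc_pc32(uint32_t S, int32_t A, uint32_t P)
{
    /* Unsigned arithmetic wraps around, so the result is correct modulo 2^32. */
    return (int32_t)(S + (uint32_t)A - P);
}
```

For the call instruction, the 4-byte operand sits just before the next instruction, so P = 0x80480b9 - 4 = 0x80480b5. With S = 0x80480bc (swap) and A = -4, reloc_pc32 gives 3, matching the offset seen in the linked code. For the movl instruction, with A = 0 and a hypothetical GKFX address of 0x08049108, reloc_abs32 simply gives 0x08049108.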
In fact, relocation goes hand in hand with symbol resolution. After the linker has scanned all the input object files, every undefined symbol should be found in the global symbol table; otherwise the linker reports an undefined symbol error (the symbols marked UND).
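A minimal sketch of that lookup, assuming the global symbol table is just an array of name and address pairs (real linkers use hash tables and far richer records). The struct global_symbol type and the lookup_symbol name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of a simplified global symbol table (hypothetical type). */
struct global_symbol {
    const char *name;
    uint32_t    address;
};

/*
 * Look up `name` in the table. On success, store its address in *address_out
 * and return 1. Return 0 if the symbol is defined nowhere, which is the case
 * where a linker would report an undefined symbol error.
 */
int lookup_symbol(const struct global_symbol *table, size_t n,
                  const char *name, uint32_t *address_out)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *address_out = table[i].address;
            return 1;
        }
    }
    return 0;
}
```

With a table holding only main and swap, looking up swap succeeds and yields its address, while looking up a name that no input file defines returns 0.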
OK, that's it for now. In the next section we will learn about the common block and static library linking.