Symbols, relocations, the string table, and sections in the compile-and-link system

Author: liigo  Date: 2009/11

Compiling and linking is the general processing scheme for computer programming languages: compiling converts source code into object files, and linking converts object files into an executable file. Splitting compiling and linking into two relatively independent subsystems simplifies the process, divides the problem into parts that can be solved separately, and keeps the scheme general. The compiler's task is to compile program source code into object files. Generally, each programming language has its own compiler, and the output of all these compilers is object files (.obj, .lib, .o, ...). The linker's task is to process the series of object files produced by the compilers and finally generate an executable file (.exe, .dll, .so, ...).

If there were a single object-file specification, every compiler would emit object files in that format and the linker would accept object files in that format. That is probably the ideal state: compilers and linkers would be relatively easy to implement and broadly reusable, and a single linker would serve everyone. This may have been one of the original intentions behind the design of the compile-and-link system. Reality, however, is harsh. Object file formats are not unified; they differ greatly, and a rough count finds more than ten mutually incompatible formats. Developers of compilers and linkers can only choose to support one or a few of them. A truly general-purpose linker has never appeared and has become a luxury, far removed from the original design. Fortunately, a general linker is still practical within a limited scope. For example, if you compile D source code into .obj object files, you can use a C linker to link them together with other .obj files into an EXE. Another example is a new programming language such as Easy Language: rather than developing its own dedicated linker, it can generate object files in the format used by C compilers and reuse an existing C linker to produce executables. This greatly reduces development effort and R&D costs and makes the system more open.

The object file is the bridge between the compiler and the linker: it is both the output of the compiler and the input of the linker. The symbol is the core element of the object file and the most important thing the compile-and-link system operates on. In plain terms, the compiler's job is to "create symbols" (along with related auxiliary data), and the linker's job is to "use symbols" (along with related auxiliary data).

Below I describe the basic elements and internal structure of an object file based on my personal understanding (liigo), taking as the example the COFF object file format, which is the one I work with most. The basic elements of an object file are symbols, relocations, the string table, and sections. These are logical concepts.

What is a symbol? It is hard to explain clearly (for the author) and hard to grasp (for the reader). For example, the variables, constants, and functions in a program's source code each become a symbol after compilation, which other code can then refer to. Can a symbol be understood as an abstraction one level above variables, constants, and functions? Probably. It is precisely because symbols are a higher-level abstraction, detached from the programming-language concepts of variables, constants, and functions, that the linker can be independent of any particular programming language. The main attributes of a symbol are: its name (symbols are matched purely by their name text), the number of the section it belongs to, the offset of the symbol's content within that section, and its scope (private to the OBJ, or global). There are two main kinds of symbols. The first kind is a definition (such as a variable definition or a function definition); its content (such as the variable's value or the function's body) is stored at a given offset in a given section. The second kind is a declaration (such as a variable declaration or a function declaration); it has no content (so it needs no section or offset), and the linker looks up the symbol's definition by name in other OBJ or LIB files. This shows what "link" in "linker" means: one party declares (depends on, uses) a symbol, another party defines it, and the two are linked together through the symbol name.
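To make the symbol attributes above concrete, here is a minimal C sketch of how one COFF symbol record might be decoded. The 18-byte layout (8-byte name, 4-byte value, 2-byte section number, 2-byte type, 1-byte storage class, 1-byte auxiliary count) follows the PE/COFF specification; the names CoffSymbol, coff_read_symbol, and coff_symbol_is_defined are my own hypothetical names, not those of any Microsoft header. Note how a declarative symbol shows up: its section number is 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One decoded 18-byte COFF symbol record (hypothetical names; layout per
   the PE/COFF specification). */
typedef struct {
    uint8_t  name[8];        /* inline short name, or 4 zero bytes + string-table offset */
    uint32_t value;          /* for a symbol defined in a section: offset within it */
    int16_t  section_number; /* 1-based section index; 0 = undefined; -1 absolute, -2 debug */
    uint16_t type;           /* e.g. 0x20 marks a function */
    uint8_t  storage_class;  /* 2 = external (global), 3 = static (private to this OBJ) */
    uint8_t  num_aux;        /* auxiliary 18-byte records that follow this one */
} CoffSymbol;

enum { COFF_SYMBOL_SIZE = 18, SYM_CLASS_EXTERNAL = 2, SYM_CLASS_STATIC = 3 };

/* COFF fields are little-endian; reading byte by byte avoids depending on
   host byte order or on struct padding. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the COFF_SYMBOL_SIZE bytes starting at rec. */
static CoffSymbol coff_read_symbol(const uint8_t *rec) {
    assert(rec != NULL);
    CoffSymbol s;
    memcpy(s.name, rec, 8);
    s.value          = rd32(rec + 8);
    s.section_number = (int16_t)rd16(rec + 12);
    s.type           = rd16(rec + 14);
    s.storage_class  = rec[16];
    s.num_aux        = rec[17];
    return s;
}

/* A "defining" symbol has content in some section (number >= 1). A
   "declarative" external symbol has section number 0 and must be resolved
   by the linker from another OBJ or LIB. (Section 0 with a nonzero value is
   a COFF common symbol, which this sketch does not single out.) */
static int coff_symbol_is_defined(const CoffSymbol *s) {
    return s->section_number > 0;
}
```

Walking a symbol table is then a loop over 18-byte records, skipping num_aux extra records after each one.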
A symbol can be declared before it is defined, or even if it is never defined. Declaring a symbol is something the compiler does; it only records a dependency on that symbol. The matching definition can be supplied by someone else (or by the compiler) at another time; it only has to be found (in some other object file) when the linker runs. Logically, a symbol usually refers to a variable (the variable's address) or a function (the address of the first byte of the function's executable body). In the OBJ file, a symbol corresponds to an offset within a section. At link time (or later), a symbol corresponds to a fixed memory address; the linker assigns this address, and relocation can only happen once the address is known. Symbols are stored one after another in the OBJ file, and the array of all symbol records is called the symbol table. Inside an OBJ file, a symbol is usually referred to by its index (>= 0) in the symbol table. How, then, does an OBJ refer to a symbol in another OBJ? It first defines a declarative symbol with the same name in its own symbol table and then refers to that local symbol by index. When the linker later runs, all global symbols with the same name are treated as the same entity and assigned a single address.

A section is a data container: the place where data is stored. The data in a section usually includes variable values, constant values, function bodies, and so on. The basic attributes of a section include the data length, the offset of the data in the file, whether it is readable and writable, and its relocation table. At link time a section always takes part as a whole and cannot be split. Compilers therefore tend to keep sections small, which makes it easy to pull in only what is needed at link time and reduces the size of the resulting EXE or DLL.
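The name-based matching described above can be sketched as follows. This is a deliberately simplified, hypothetical model, not how any real linker is structured: each symbol carries only a name, a defined/declared flag, a global/private flag, the address the linker has already assigned to its section, and its offset within that section. Resolving a declared name means finding the single global definition with that name in any object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one symbol as the linker sees it after
   it has assigned addresses to sections. */
typedef struct {
    const char *name;
    int         defined;      /* 1 = definition, 0 = declaration only */
    int         is_global;    /* 0 = private to its OBJ (never matched by name) */
    uint32_t    section_base; /* address assigned to the symbol's section */
    uint32_t    offset;       /* offset of the symbol's content in that section */
} LinkSymbol;

typedef struct {
    const LinkSymbol *symbols;
    size_t            count;
} ObjFile;

enum { RESOLVE_OK = 0, RESOLVE_UNDEFINED = -1, RESOLVE_DUPLICATE = -2 };

/* Find the one global definition of `name` among all objects and store its
   address in *addr_out. Every declaration of `name` in any object then
   refers to this same address. */
static int resolve_name(const ObjFile *objs, size_t nobjs,
                        const char *name, uint32_t *addr_out) {
    assert(objs != NULL && name != NULL && addr_out != NULL);
    int found = 0;
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].count; j++) {
            const LinkSymbol *s = &objs[i].symbols[j];
            if (!s->defined || !s->is_global || strcmp(s->name, name) != 0)
                continue;
            if (found)
                return RESOLVE_DUPLICATE;   /* two OBJs define the same name */
            *addr_out = s->section_base + s->offset;
            found = 1;
        }
    }
    return found ? RESOLVE_OK : RESOLVE_UNDEFINED;
}
```

A real linker would also search LIB symbol indexes, handle COMDAT and common symbols, and report errors with context; those parts are left out of this sketch.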
Analysis of the OBJ files generated by the VC6 compiler shows that each function is generally stored in its own section. If you look at the source code of the C standard library, you will find that it often puts a single function in its own source file, so that compiling it produces one OBJ file per function; this keeps the pieces as fine-grained as possible. In an OBJ file, the section headers of all sections are stored one after another as an array, called the section header table or section table. Inside an OBJ file, a section is usually referred to by its number (>= 1) in the section table.

Relocation is an important element of a section; it is used to patch the address fields in the section's data. If you analyze the function code produced by a compiler, you will find that it is not complete, truly executable code but only a template. Wherever an address is involved, the compiler usually writes a placeholder such as 0x00000000 and attaches a symbol to that location so the address can be fixed up later. Why? While compiling, the compiler does not know the addresses of symbols (variables, functions, and so on). A symbol may come from another OBJ (or another LIB), and the compiler cannot even know whether it is defined at all. So it can only leave a blank for the linker to fill in. In plain terms, the compiler sets a fill-in-the-blank question and the linker answers it. The relocation table can be understood as the information the compiler hands to the linker. It is an array of relocation entries. The basic attributes of each entry are the offset, within the section data, of the address to be patched; the index of the symbol that supplies the address; and the relocation type (absolute, relative, and so on).
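The section table and each section's relocation table are both arrays of fixed-size records. Below is a minimal C sketch, assuming the 40-byte section header and 10-byte relocation record layouts from the PE/COFF specification; the type and function names are hypothetical, and only the fields discussed above are kept.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Section characteristic flags (values from the PE/COFF specification). */
#define SCN_CNT_CODE    0x00000020u
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

enum { COFF_SECTION_HEADER_SIZE = 40, COFF_RELOC_SIZE = 10 };

/* Hypothetical decoded section header: only the fields discussed above. */
typedef struct {
    char     name[9];         /* 8-byte name field plus a terminator */
    uint32_t raw_size;        /* SizeOfRawData: length of the section data */
    uint32_t raw_ptr;         /* PointerToRawData: file offset of the data */
    uint32_t reloc_ptr;       /* PointerToRelocations: file offset of the relocation table */
    uint16_t num_relocs;      /* NumberOfRelocations */
    uint32_t characteristics; /* readable / writable / executable, etc. */
} CoffSection;

/* One relocation entry: where to patch, which symbol, and how. */
typedef struct {
    uint32_t offset;          /* offset of the address field within the section data */
    uint32_t symbol_index;    /* 0-based index into the symbol table */
    uint16_t type;            /* e.g. 0x0006 absolute, 0x0014 relative (i386) */
} CoffReloc;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 40-byte section header at hdr (little-endian fields). */
static CoffSection coff_read_section(const uint8_t *hdr) {
    assert(hdr != NULL);
    CoffSection s;
    memcpy(s.name, hdr, 8);
    s.name[8]         = '\0';
    s.raw_size        = rd32(hdr + 16);
    s.raw_ptr         = rd32(hdr + 20);
    s.reloc_ptr       = rd32(hdr + 24);
    s.num_relocs      = rd16(hdr + 32);
    s.characteristics = rd32(hdr + 36);
    return s;
}

/* Decode relocation i of section sh; file points at the start of the OBJ. */
static CoffReloc coff_read_reloc(const uint8_t *file, const CoffSection *sh,
                                 uint16_t i) {
    assert(file != NULL && sh != NULL && i < sh->num_relocs);
    const uint8_t *r = file + sh->reloc_ptr + (size_t)i * COFF_RELOC_SIZE;
    CoffReloc rel;
    rel.offset       = rd32(r);
    rel.symbol_index = rd32(r + 4);
    rel.type         = rd16(r + 8);
    return rel;
}
```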
When the linker runs, it takes the symbol index from a relocation entry, gets the symbol's name, and then looks up the symbol's address (the linker is responsible for assigning symbol addresses). From the offset of the patch location within the section and the address of the section (the linker is also responsible for assigning section addresses), it computes the address of the patch location, and then writes the symbol's address there according to the relocation type. For example, the C statement a = 0x12345678;, which assigns a value to a global variable a, might compile (ignoring optimization) to mov dword ptr [0x00000000], 0x12345678, whose x86 encoding is C7 05 00 00 00 00 78 56 34 12. The four zero bytes in the middle are a placeholder that the linker will later overwrite with the address of variable a; this is absolute addressing. As another example, the C call f(); might compile (ignoring optimization) to a direct call, call 0x00000000, whose x86 encoding is E8 00 00 00 00. The four zero bytes are again a placeholder, which the linker will later overwrite with "the address of function f minus the address of the next instruction"; this is relative addressing. Whether absolute, relative, or some other kind of addressing is used is specified by the relocation table the compiler generates, and depends on the instructions the compiler chose. The placeholder need not be zero: it can be any value (positive or negative) expressing an offset before or after the target address, and the address the linker writes during relocation is actually the target address plus this value. All of the above is the relocation performed by the linker when it produces an EXE or DLL.
Later, when the DLL or EXE is loaded, the PE loader performs another round of relocation (using a base relocation table generated by the linker, which an EXE may omit). The details of relocation in these two phases differ, but the principle is the same.

The string table is an auxiliary structure in OBJ and LIB files. It stores certain names in one place, such as symbol names and section names longer than 8 bytes, and the names of link members (which are found in LIB files) longer than 15 bytes. Its purpose is to keep OBJ and LIB files small. Take symbol names as an example. In an OBJ file, the record for each symbol has a fixed size of 18 bytes, 8 of which are reserved for the symbol name. If the name is 8 bytes or shorter, it is stored directly in this record (with no terminating '\0' when it is exactly 8 bytes long). If the name is longer than 8 bytes, it is stored in the string table, and the 8-byte name area records the name's offset in the string table instead. For symbol names, the first four bytes of the area are set to zero to mark it as an offset, and the offset itself occupies the last four bytes; for section names, the offset is written as a '/' followed by the offset in decimal digits.
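Decoding a symbol name therefore comes down to checking the first four bytes of its 8-byte name field: if any is nonzero the name is inline, otherwise the last four bytes are an offset into the string table. A minimal C sketch, assuming the PE/COFF rule that the string table immediately follows the symbol table and begins with a 4-byte total size (so the first valid offset is 4); coff_symbol_name is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write the name stored in an 8-byte COFF name field into out
   (NUL-terminated). strtab points at the string table, whose first 4 bytes
   hold its total size, so valid offsets start at 4.
   Returns 0 on success, -1 on a bad offset or a too-small out buffer. */
static int coff_symbol_name(const uint8_t name8[8], const uint8_t *strtab,
                            char *out, size_t outsz) {
    assert(name8 != NULL && out != NULL);
    const uint8_t *src;
    size_t n = 0;
    if (name8[0] || name8[1] || name8[2] || name8[3]) {
        /* Short name: stored inline and NUL-padded; an 8-byte name has no NUL. */
        src = name8;
        while (n < 8 && src[n] != 0) n++;
    } else {
        /* Long name: 4 zero bytes, then a 4-byte offset into the string table. */
        if (strtab == NULL) return -1;
        uint32_t off = rd32(name8 + 4);
        uint32_t size = rd32(strtab);
        if (off < 4 || off >= size) return -1;
        src = strtab + off;
        while (off + n < size && src[n] != 0) n++;
    }
    if (n + 1 > outsz) return -1;
    memcpy(out, src, n);
    out[n] = '\0';
    return 0;
}
```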

A LIB file is much simpler than an OBJ. It merely packages and indexes OBJ files: it contains the complete contents of every OBJ file in the library, plus an index of the library's public symbol names (so you can quickly look up, by name, whether a symbol is defined in the library and in which OBJ). Physically, after an 8-byte signature, a LIB file begins with three fixed link members (linker members), followed by the contents of each OBJ file stored one after another (each also called a member), and every member is preceded by a header. The first fixed link member (1st linker member) is kept only for compatibility and has been superseded by the second fixed link member (2nd linker member), which records the symbol-name index and basic information about each OBJ member. The third fixed member records long names (and may be omitted).

This write-up is not very well organized; please forgive me. Liigo, 2009.
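The member layout of a LIB file described above can be walked with a few lines of C. This sketch assumes the archive format from the PE/COFF specification: an 8-byte "!<arch>\n" signature, then members that each start with a 60-byte header whose name field is the first 16 bytes and whose size field is 10 ASCII decimal digits at offset 48, ending with the two bytes "`\n"; member data is padded to an even length. The function names are hypothetical, and interpreting the linker members' symbol index is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AR_SIGNATURE "!<arch>\n"
enum { AR_SIG_LEN = 8, AR_HDR_LEN = 60 };

/* Every LIB starts with the 8-byte archive signature. */
static int lib_check_signature(const char *buf, size_t len) {
    return len >= AR_SIG_LEN && memcmp(buf, AR_SIGNATURE, AR_SIG_LEN) == 0;
}

/* Read the member header at *pos.
   name receives the 16-byte name field with trailing spaces removed
   ("/" for the first two linker members, "//" for the long-names member).
   On success stores the data size, advances *pos past the data and its
   padding byte, and returns 1; returns 0 at the end, -1 on a malformed header. */
static int lib_next_member(const char *buf, size_t len, size_t *pos,
                           char name[17], size_t *size_out) {
    assert(buf != NULL && pos != NULL && name != NULL && size_out != NULL);
    if (*pos + AR_HDR_LEN > len) return 0;
    const char *h = buf + *pos;
    if (h[58] != '`' || h[59] != '\n') return -1;   /* end-of-header marker */
    memcpy(name, h, 16);
    name[16] = '\0';
    for (int i = 15; i >= 0 && name[i] == ' '; i--) name[i] = '\0';
    char digits[11];
    memcpy(digits, h + 48, 10);                      /* Size: ASCII decimal */
    digits[10] = '\0';
    size_t size = (size_t)strtoul(digits, NULL, 10);
    if (*pos + AR_HDR_LEN + size > len) return -1;   /* truncated member */
    *size_out = size;
    *pos += AR_HDR_LEN + size + (size & 1);          /* data is padded to even length */
    return 1;
}
```

Starting at pos = AR_SIG_LEN and calling lib_next_member repeatedly visits the linker members first and then each OBJ member in turn.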

Reference: Microsoft Portable Executable and Common Object File Format Specification
http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx
http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v8.docx
