The C++ Compile and Link Process in Detail

Source: Internet
Author: User

Some people who write C/C++ programs (C++ is assumed below) are at a loss when they see an "unresolved external symbol" or "duplicate external symbol" error message, because such errors cannot be traced to a particular line of source. Others do not understand why some parts of the language were designed the way they were. After reading this article, you may have some answers.

First, look at how we usually write a program. If you use an IDE (Visual Studio, Eclipse, Dev-C++, etc.), you may never find out how a program is organized (which is why many people advise beginners against using an IDE). With an IDE, all you do is create a set of .cpp and .h files in a project, write the code, and click "Compile" in the menu. In the past, programmers did not work like this. They opened an editor, wrote the code as plain text, and then typed at the command line:

    cc 1.cpp -o 1.o
    cc 2.cpp -o 2.o
    cc 3.cpp -o 3.o

Here cc stands for a C++ compiler, followed by the .cpp file to compile, and -o names the output file (please forgive me for not using any particular popular compiler as the example; with GCC, for instance, the equivalent would be g++ -c 1.cpp -o 1.o). The current directory then contains:

    1.o  2.o  3.o

Finally, the programmer types

    link 1.o 2.o 3.o -o a.out

to generate the final executable a.out. Today's IDEs follow the same steps; they just automate everything.

Let's analyze this process and see what we can learn.

First, the source code is compiled separately for each .cpp file. In each compilation, unless one .cpp file #includes another .cpp file (which is very bad practice in C++), the compiler knows only the one .cpp file currently being compiled and is completely unaware that the others exist.

Second, after each .cpp file has been compiled, the resulting .o files are read by the linker, which produces the executable.
OK, with this intuitive picture in mind, let's look at how C++ programs are organized.

First, some concepts:

Compile: the process in which the compiler translates source code, which exists as text, into an object file in machine language.

Compilation unit: in C++, each .cpp file is a compilation unit. As the compilation process above shows, compilation units know nothing about each other.

Object file (sometimes translated as "target file"): the file produced by compilation. It contains all the code and data of the compilation unit in machine-code form, plus some other information.

Now let's look at the compilation process itself. We skip parsing and so on and go straight to object file generation. Suppose we have a file 1.cpp:

    int n = 1;

    void f()
    {
        ++n;
    }

The object file 1.o compiled from it will have a region (let's call it the binary section) that holds the data and functions above, namely n and f. Expressed as file offsets, it might look like this:

    Offset   Content   Length
    0x000    n         4
    0x004    f         ??

Note: this is only a guess and does not represent the true layout of an object file. The individual items in an object file are not necessarily contiguous, not necessarily in this order, and certainly do not necessarily start at 0x000.

Now let's look at the contents of f, starting at 0x004 (a guess for the x86 platform):

    0x004    inc DWORD PTR [0x000]
    0x00?    ret

Note that ++n has been translated into inc DWORD PTR [0x000], which adds 1 to the DWORD (4 bytes) at address 0x000 of this unit.

Now suppose there is another file, 2.cpp:

    extern int n;

    void g()
    {
        ++n;
    }

The binary section of its object file 2.o should then be:

    Offset   Content   Length
    0x000    g         ??
Why is there no space for n (that is, no definition of n)? Because n is declared extern, which says that the definition of n lives in a different compilation unit. Remember that at compile time it is impossible to know anything about other compilation units, so the compiler does not know where n actually is, and it cannot fill in the ??? part of inc DWORD PTR [???] in the binary code of g. What can be done? The job has to be left to the linker. So that the linker knows which addresses have not been filled in, the object file also carries an "unresolved symbol table". Similarly, the object file that provides the definition of n (that is, 1.o) carries an "export symbol table", which tells the linker which addresses it can provide.

To recap: each object file, in addition to its own data and binary code, provides at least two tables, the unresolved symbol table and the export symbol table, which tell the linker what the file needs and what it can provide. The next question is how to match up the two tables. This requires a new concept: symbols. In C++, every variable and function has its own symbol. For example, the symbol of the variable n is "n". The symbol of a function is more complex: the function name is combined with its parameters, calling convention, and so on to obtain a unique string. The symbol of f might be "_f" (this varies by compiler).

So, the export symbol table of 1.o is:

    Symbol   Address
    n        0x000
    _f       0x004

and its unresolved symbol table is empty.

The export symbol table of 2.o is:

    Symbol   Address
    _g       0x000

and its unresolved symbol table is:

    Symbol   Address
    n        0x001

Here 0x001 is where the ??? of inc DWORD PTR [???] is stored, given that the binary encoding of the instruction starts at 0x000 (this assumes that bytes 2 to 5 of the inc instruction's machine code hold the absolute address of the operand; check the manual for the exact encoding). The table tells the linker that at offset 0x001 of this compilation unit there is an address whose value is unknown but whose symbol is n.

At link time, the linker finds the unresolved symbol n in 2.o, searches all compilation units, and finds the exported symbol n in 1.o. It then writes n's address, 0x000, into offset 0x001 of 2.o.

"Stop," you may object. If this is done, won't the body of g become inc DWORD PTR [0x000]? As explained earlier, that adds 1 to the 4 bytes at address 0x000 of this unit, not to the corresponding location in 1.o. Correct: because the addresses of every compilation unit start at 0, addresses collide when the units are finally stitched together. So the linker adjusts the addresses of each unit as it stitches. In this example, suppose address 0x00000000 of 2.o is placed at 0x00001000 of the executable and address 0x00000000 of 1.o is placed at 0x00002000. Then, as far as the linker is concerned, the export symbol table of 1.o is really:

    Symbol   Address
    n        0x000 + 0x2000
    _f       0x004 + 0x2000

with an empty unresolved symbol table, and the export symbol table of 2.o is:

    Symbol   Address
    _g       0x000 + 0x1000

with the unresolved symbol table:

    Symbol   Address
    n        0x001 + 0x1000

So the final code of g becomes inc DWORD PTR [0x000 + 0x2000].

One hole remains: since the final address of n is 0x2000, the earlier code of f, inc DWORD PTR [0x000], is now wrong. For this reason the object file provides one more table, the address redirection table (what linkers usually call a relocation table).
For 1.o, the address redirection table is:

    Address
    0x005

This table needs no symbols. When the linker processes it and finds that the address stored at 0x005 must be redirected, it simply adds 0x2000 to the 4 bytes starting at 0x005.

To summarize: when the compiler compiles a .cpp file into an object file, the object file contains the data and code of that .cpp file plus at least three tables: the unresolved symbol table, the export symbol table, and the address redirection table.

The unresolved symbol table lists every symbol that is referenced in this compilation unit but not defined in it, together with the location of each reference.

The export symbol table lists the symbols that this compilation unit defines and is willing to provide to other compilation units, together with their addresses.

The address redirection table records every reference this compilation unit makes to one of its own addresses.

When linking, the linker first decides where each object file goes in the final executable. It then walks the address redirection tables of all object files and redirects the addresses recorded there (that is, it adds the address at which the compilation unit actually starts in the executable). Next it walks the unresolved symbol tables of all object files, looks up each symbol among all exported symbols, and writes the actual address into the location recorded in the unresolved symbol table (again adding the actual start address of the compilation unit that defines the symbol). Finally it writes the contents of all object files to their respective positions, does some other work, and outputs an executable.

For link 1.o 2.o ..., the resulting executable probably looks like this:

    0x00000000  ????                          (some other information)
    ...
    0x00001000  inc DWORD PTR [0x00002000]    // start of 2.o, which is the definition of g
    0x00001005  ret                           // assuming inc is 5 bytes, this is the end of g
    ...
    0x00002000  0x00000001                    // start of 1.o, which is the definition of n (initialized to 1)
    0x00002004  inc DWORD PTR [0x00002000]    // start of f
    0x00002009  ret                           // assuming inc is 5 bytes, this is the end of f
    ...

Real linking is more complicated, because real object files divide data and code into several sections and redirection is done per section, but the principle is the same.

Now we can look at a couple of classic link errors:

unresolved external symbol: clearly the linker found an unresolved symbol but no matching entry in any export symbol table. The solution, of course, is to provide a definition of the symbol in some compilation unit (note that the symbol may be a variable or a function), or to check whether a file that should have been linked was left out.

duplicate external symbol: the same symbol appears more than once in the export symbol tables, so the linker cannot decide which one to use. This may be a name clash, or there may be another cause.

Now let's look at some features of the C/C++ language:

extern: tells the compiler that the symbol is defined in a different compilation unit, that is, the symbol goes into the unresolved symbol table (external linkage).

static: when this keyword precedes the declaration of a global function or variable, the compilation unit does not export that symbol, so it cannot be used from other compilation units (internal linkage).
A static local variable is stored in the same way as a global variable, but its symbol is still not exported.

Default linkage: functions and variables have external linkage by default; const variables have internal linkage by default. (You can change the linkage by adding extern or static.)

Pros and cons of external linkage: a symbol with external linkage can be used throughout the program (because the symbol is exported). However, no other compilation unit may export the same symbol (otherwise the linker reports a duplicate external symbol).

Internal linkage: a symbol with internal linkage cannot be used from other compilation units. However, different compilation units may each have an internal-linkage symbol with the same name.

Why a header file should generally contain only declarations, not definitions: a header file can be included in several compilation units. If it contains a definition, every compilation unit that includes it defines the same symbol, and if that symbol has external linkage the result is a duplicate external symbol. Therefore, if something must be defined in a header file, the defined symbol has to have internal linkage.

Why constants default to internal linkage while variables do not:

This makes it possible to define constants such as const int n = 0 in a header file. Because constants are read-only, it does not matter that each compilation unit has its own definition. If variables defined in a header file had internal linkage, then when several compilation units defined the variable, a modification made in one unit would not affect the "same" variable in the others, which could have unintended consequences.
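The linkage rules above all follow from how the linker matches export tables against unresolved tables. As a rough illustration, here is a minimal C++11 simulation of that matching. It is a sketch under stated assumptions, not how any real linker is written: ObjectFile, Address, link_objects and demo_article_example are hypothetical names invented for this example, and an "object file" is reduced to the three tables described in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Address = std::uint32_t;

// Hypothetical, heavily simplified object file: only the three tables from the article.
struct ObjectFile {
    std::string name;
    std::map<std::string, Address> exports;                   // symbol -> offset of its definition
    std::vector<std::pair<std::string, Address>> unresolved;  // symbol -> offset of the field to fill in
    std::map<Address, Address> redirections;                  // field offset -> unit-local address stored there
};

// Returns the final content of every patched address field, keyed by its
// absolute location. bases[i] is where objects[i] is placed in the executable.
std::map<Address, Address> link_objects(const std::vector<ObjectFile>& objects,
                                        const std::vector<Address>& bases) {
    if (objects.size() != bases.size())
        throw std::invalid_argument("one base address per object file is required");

    // Collect all exported symbols at their final addresses.
    std::map<std::string, Address> symbols;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        for (const auto& entry : objects[i].exports) {
            const bool inserted = symbols.emplace(entry.first, bases[i] + entry.second).second;
            if (!inserted)
                throw std::runtime_error("duplicate external symbol: " + entry.first);
        }
    }

    std::map<Address, Address> patched;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        // Redirection: unit-local addresses move by the unit's base address.
        for (const auto& entry : objects[i].redirections)
            patched[bases[i] + entry.first] = bases[i] + entry.second;
        // Resolution: each unresolved reference receives the exporter's final address.
        for (const auto& ref : objects[i].unresolved) {
            const auto found = symbols.find(ref.first);
            if (found == symbols.end())
                throw std::runtime_error("unresolved external symbol: " + ref.first +
                                         " (needed by " + objects[i].name + ")");
            patched[bases[i] + ref.second] = found->second;
        }
    }
    return patched;
}

// The article's example: 2.o is placed at 0x1000 and 1.o at 0x2000.
void demo_article_example() {
    const ObjectFile one{"1.o", {{"n", 0x000}, {"_f", 0x004}}, {}, {{0x005, 0x000}}};
    const ObjectFile two{"2.o", {{"_g", 0x000}}, {{"n", 0x001}}, {}};
    const std::map<Address, Address> patched = link_objects({two, one}, {0x1000, 0x2000});
    assert(patched.at(0x1001) == 0x2000);  // the operand of g now refers to n
    assert(patched.at(0x2005) == 0x2000);  // the operand of f was redirected
}
```

In this model a symbol with internal linkage is simply left out of exports, so two units can each keep their own copy without conflict, while two units that both export n (the effect of putting a non-const definition in a header) make link_objects throw the duplicate error.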
Why functions default to external linkage:

Although functions are read-only, unlike constants they change often while code is being written. If functions had internal linkage by default, people would tend to define them in header files, and every modification would force all compilation units that include the header to be recompiled. In addition, any static local variables defined in the function would then also be defined in the header file.

Why static member variables of a class cannot be initialized in place: in-place initialization means something like this:

    class A
    {
        static char msg[] = "aha";
    };

This is not allowed because class declarations usually live in header files; allowing it would effectively define a non-const variable in a header file.

What happens when a header file defines a const object in C++:

Generally this is the same as defining a const int in a header file: each compilation unit that includes the header defines its own object. Because the object is const, this normally has no effect. However, two situations can break this:

    1. If the address of the const object is taken and the code relies on that address being unique, the address can differ between compilation units. (This is rarely done.)

    2. If the object has a mutable member and one compilation unit modifies it, the other compilation units are not affected.

Why static constants of a class cannot be initialized in place:

Because this would be equivalent to defining a const object in the header file. As an exception, integral types such as int and char can be initialized in place, because such constants can be optimized directly into immediate values, just like a macro.
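To make the last two points concrete, here is a small sketch of the forms that are accepted: a static member is declared in the class and defined once outside it, while an integral constant may carry its initializer inside the class. The member size and the helper message_fits are hypothetical additions for illustration; in a real project the struct would sit in a header and the two definitions in exactly one .cpp file. (Since C++17, a static data member declared inline may also be defined inside the class; that is not shown here.)

```cpp
#include <cassert>
#include <cstring>

// What would live in a header: declarations only, plus one integral constant.
struct A {
    static char msg[];            // declaration; the array has no storage yet
    static const int size = 4;    // integral constants may be initialized in the class
};

// What would live in exactly one .cpp file: the definitions that reserve storage.
char A::msg[] = "aha";
const int A::size;                // no initializer here; it was given in the class

// Small helper showing that both members are usable ("aha" plus the
// terminating zero occupies exactly A::size bytes).
bool message_fits() {
    return std::strlen(A::msg) + 1 == static_cast<std::size_t>(A::size);
}
```

Because the definitions appear in only one compilation unit, the export symbol tables contain each member once, no matter how many files include the header.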
Inline functions:

Because C++ inline functions resemble macros, they do not run into the linkage problems above. (Strictly speaking, an inline function still has external linkage by default; the language simply allows it to be defined in every compilation unit that uses it, provided the definitions are identical.)

Why inline functions for public use are defined in header files:

Because compilation units do not know about each other at compile time, if an inline function is defined in a .cpp file, its definition cannot be found when compiling other compilation units that use it, so the function cannot be expanded there. An inline function defined in a .cpp file can therefore be used only within that .cpp file.

What happens when the compiler declines to inline a function defined in a header file:

If the compiler declines to inline such a function, it automatically emits a definition of the function in each compilation unit that includes the header and does not export the symbol. (That describes early compilers; current toolchains typically emit these copies as weak or COMDAT symbols and let the linker keep a single one.)

If a static local variable is defined in an inline function that was not inlined, where is the variable defined:

Early compilers defined one in each compilation unit and therefore produced wrong results. Newer compilers solve this problem, typically by emitting the variable in a form that lets the linker merge the copies into one.

Why the export keyword is not implemented:

export (for templates) requires the compiler to look up definitions across compilation units, which makes compilers very difficult to implement. It was eventually removed from the standard in C++11.

Article source: DIY Tribe (http://www.diybl.com/course/3_program/vc/vc_js/20090307/159149.html)
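As a closing sketch of the inline-function case discussed above: the C++ standard requires a static local variable inside an inline function with external linkage to be one and the same object in every compilation unit. The example below can only show the single-file behavior; next_id and the two caller functions are hypothetical names, and the callers stand in for code that would normally live in separate .cpp files including the same header.

```cpp
#include <cassert>

// Imagine this function in a header (say counter.h, a hypothetical name)
// that several .cpp files include.
inline int next_id() {
    static int counter = 0;   // one object for the whole program
    return ++counter;
}

// Two callers that, in a real program, could live in different .cpp files.
int id_from_first_unit()  { return next_id(); }
int id_from_second_unit() { return next_id(); }
```

Calling id_from_first_unit() and then id_from_second_unit() yields 1 and then 2, because both go through the same counter. With the early compilers mentioned above, callers in different compilation units could each have seen their own counter.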
