Chapter 10-01

Source: Internet
Author: User

Please indicate the Source:http://blog.csdn.net/gaoxiangnumber1
Welcome to my Github:https://github.com/gaoxiangnumber1
The three main constraints on the C++ language are compatibility with C, zero overhead, and value semantics. §11.7 describes value semantics.
"Compatible with C" means more than compatibility with C syntax. More importantly, it means compatibility with C's compilation model and run-time model, that is, the ability to use C header files and libraries directly.
For example, the header file and prototype of the connect(2) system call are as follows:

#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The size and representation of C++ primitive types (int, pointers, and so on) must be the same as in C, or more precisely the same as in the C compiler that compiled the system libraries. The C++ compiler must be able to understand the definition of struct sockaddr in sys/socket.h, use the same alignment algorithm, generate exactly the same layout as the C compiler, and follow the C calling convention (parameter passing, return value passing, stack frame management, and so on) in order to call this C library function directly.
The native interfaces exposed by modern operating systems are usually described in C: the native API of Windows is the Windows.h header file, and POSIX is a collection of C header files. Because C++ is compatible with C, it can use these header files directly at compile time and link against the corresponding libraries. At run time it calls the C library functions directly, with no intermediate layer, which counts as one of the reasons for C++'s efficiency.
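As a minimal sketch of this direct use, the C++ code below includes the POSIX headers unchanged, fills in a C struct, and calls connect(2) itself. It assumes a POSIX system and C++11; the function names are hypothetical, and the file descriptor -1 is deliberately invalid so that no network access is needed.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <cerrno>
#include <cstring>
#include <type_traits>

// The C struct keeps a C-compatible layout when seen by the C++ compiler.
static_assert(std::is_standard_layout<sockaddr_in>::value,
              "sockaddr_in must keep its C layout");

// Hypothetical helper: calls the C function connect() directly on an
// invalid descriptor and reports the errno value that the call sets.
int errno_of_connect_on_invalid_fd() {
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_port = htons(80);
  errno = 0;
  int rc = ::connect(-1, reinterpret_cast<const sockaddr*>(&addr), sizeof addr);
  return rc == -1 ? errno : 0;
}

// Small self-check: an invalid descriptor is rejected with EBADF.
void check_connect_example() {
  assert(errno_of_connect_on_invalid_fd() == EBADF);
}
```

No wrapper or binding layer sits between the C++ caller and the C library: the call compiles to an ordinary C function call.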

Figure 10-1 shows a typical process for compiling a C++ program on Linux. The most time-consuming step is cc1plus. The boundaries between the stages in the figure are not fixed: cpp and cc1plus are usually merged into one process, cc1plus and as can communicate either through a temporary file (*.s) or through a pipe, and for small single-source-file programs there is often no need to generate .o files. The linker is also called the link editor.
The word "compile" has different meanings in different contexts.
- If we speak loosely of "compiling" a .cc file into an executable, it refers to all four steps: preprocessor, compiler, assembler, and linker.
- If we distinguish "compiling" from "linking", then "compiling" refers to the steps that generate an object file from a source file (that is, g++ -c).
- If we distinguish preprocessing, compilation (code translation), and assembly, then what the compiler actually sees is the source code after the preprocessor has finished header inclusion and macro expansion.
C++ so far (including C++11) has no module mechanism. Unlike other languages, it cannot use import or using to bring in the libraries (functions or classes from other packages or modules) that the current source file needs. Instead, it must include header files, mechanically loading the library's interface declarations into the current source file by textual substitution and then parsing them all over again.
On the one hand this makes compilation inefficient: the compiler easily has to parse tens of thousands of lines of preprocessed source code even if the source file itself is only a few hundred lines long. On the other hand it leaves serious hidden dangers. One reason is that header inclusion is transitive and introduces unnecessary dependencies. Another is that header files are used at compile time while dynamic library files are used at run time; the time gap between the two can lead to mismatches and thus to binary compatibility problems (§11.2).
10.1 The C compilation model and its origins
10.1.1 Why C needs preprocessing
The first-generation C compiler on the PDP-11 faced a hard constraint: the memory address space was only 16 bits, so program and data had to be squeezed into a tiny 64 KiB space. The compiler could not hold the complete abstract syntax tree of a single source file in memory, let alone load the entire program (made up of multiple source files) into memory to resolve cross-references (functions in different source files calling each other, using external variables, and so on).
Because of the memory limit, the compiler had to be able to compile multiple source files separately, generate multiple object files, and then find a way to link these object files into one executable.

Also because of the memory limit, a single executable could not be too large. The PDP-11 C compiler written by Dennis Ritchie was therefore not one executable but seven: cc, cpp, as, ld, c0, c1, and c2.
cc is the driver, which invokes the other programs.
cpp is the preprocessor, called the compiler control line expander at the time.
c0, c1, and c2 are the three phases of the C compiler:
c0 compiles the source program into two intermediate files;
c1 compiles the intermediate files into assembly source code;
c2 is optional and performs peephole optimization on the generated assembly code.
as is the assembler, which converts assembly code into an object file.
ld is the linker, which links object files and library files into an executable.
The compilation process is shown in Figure 10-2. Without cc, the process of manually compiling a simple program prog.c is as follows:

To achieve separate compilation while keeping memory usage low, the C language adopted implicit function declarations. When the code uses a function that has not been declared earlier, the compiler neither needs nor checks a function prototype: it checks neither the number of arguments nor the types of the arguments and the return value. The compiler assumes that an undeclared function returns int and accepts any number of int arguments.

With implicit function declarations we can already compile multiple source files separately and then link them into one large executable. So why do we still need header files and preprocessing?
According to Eric S. Raymond in §17.1.1 of The Art of UNIX Programming, the earliest Unix printed the kernel data structures (such as struct dirent) in the manual, and each program then defined the struct itself in its own code. For example, the source code of ls(1) in Unix V5 defines the structure that represents a directory on its own. With preprocessing and header files, this shared information can be put into header files under /usr/include, and programs simply include the headers they need. This reduces needless errors and improves code portability.
The earliest preprocessor had only two features: #include and #define. #include performed file content substitution, and #define supported only macro constants, not function-like macros. Early header files contained only three things: struct definitions, declarations of external variables, and macro constants. This reduced the duplicated code in each source file.
10.1.2 The C compilation model
Because the syntax tree of a whole source file could not be kept in memory, C was designed for single-pass compilation. Single-pass compilation means that the compiler scans the source code from beginning to end, generating object code immediately as it parses. The compiler can see only the code it has already parsed, before the current statement or symbol; it cannot see the code that follows, and it forgets code soon after passing it. This means that:
- C requires a struct to be defined before its members can be accessed; otherwise the compiler does not know the type and offset of the member and cannot generate object code immediately.
- A local variable must also be defined before it is used. If the definition came later, the compiler would not know the variable's type or its position on the stack when it first saw the variable, and could not generate code immediately.
- To make it easy for the compiler to allocate stack space, C requires local variables to be defined only at the beginning of a block.
- For an external variable, the compiler only needs to know its type and name, not its address, so a declaration before use is enough. In the generated object code the address of the external variable is left blank for the linker to fill in.
- When the compiler sees a function call, the implicit function declaration rule lets it immediately generate the assembly code for the call (push the arguments onto the stack, call, fetch the return value). The only unknown is the actual address of the function, which the compiler leaves blank for the linker to fill in.
For the C compiler, remembering the members and offsets of structs and knowing the types of external variables is enough to generate object code while parsing the source. Early header files and preprocessing therefore met the compiler's needs exactly. The resolution of external symbols (functions or variables) can be left to the linker.
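The declare-before-use rules above carry over to C++ (except that C++ has no implicit function declarations, so the prototype is mandatory). The sketch below shows that a declaration alone is enough for the compiler to generate code for a use; the names request_count, next_id, and next_id_plus_count are hypothetical.

```cpp
#include <cassert>

// Declarations only: the compiler learns the names and types here,
// which is all it needs to generate code for the uses below.
extern int request_count;  // hypothetical external variable
int next_id();             // hypothetical function prototype

// Uses both names before their definitions have been seen.
int next_id_plus_count() {
  return next_id() + request_count;
}

// Definitions. In a real program these could live in another source file,
// and the linker would fill in their addresses.
int request_count = 10;
int next_id() { return 1; }

// Small self-check of the example.
void check_declare_before_use() {
  assert(next_id_plus_count() == 11);
}
```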
The compilation process above shows that the C compiler has very little to do and needs very little memory. The Unix V5 C compiler does not even use dynamic memory allocation; it uses a few global stacks and arrays to handle complex expressions and nested statements, so the memory consumption of the whole compiler is fixed. I suspect that C's lack of support for defining functions nested inside other functions is also a result of this, because nested functions would mean parsing function bodies recursively, and the compiler's memory consumption would no longer be fixed.
As a consequence of "no nesting", the whole C namespace is flat: functions and structs all live in the global namespace. This is troublesome, because every library must take care to keep its own function and struct names from clashing with those of other libraries. Early C did not even allow the same member name to be used in different structs, which is why some structs have prefixed member names: the members of struct timeval are tv_sec and tv_usec, and the members of struct sockaddr_in are sin_family, sin_port, and sin_addr.
10.2 The C++ compilation model
10.2.1 Single-pass compilation
C++ inherits single-pass compilation. In single-pass compilation the compiler can make decisions based only on the code it has seen so far; code read later does not affect decisions already made. This affects name lookup and function overload resolution.
Name lookup
Names in C++ include type names, function names, variable names, typedef names, template names, and so on. Consider the following line of code:
Foo<T> a;  // None of the three names Foo, T, and a is a macro.
If the compiler does not know what the three names Foo, T, and a stand for, it cannot parse the line at all. Depending on the code that appeared earlier, this statement has at least three possible meanings:
1. Foo is a class template that takes a type parameter (for example, template <typename X> class Foo;) and T is a type. The statement instantiates Foo<T> with T as the type argument and defines the variable a.
2. Foo is a class template that takes a non-type parameter (for example, template <int X> class Foo;) and T is a const int variable. The statement instantiates Foo<T> with T as the non-type template argument and defines the variable a.
3. Foo, T, and a are all ints, and the statement is a useless expression. Do not forget that operator<() can be overloaded, so this code can express still other meanings. Another classic example is AA BB(CC);, which can either declare a function or define a variable.
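The three readings listed above can be reproduced side by side. This is only an illustrative sketch: the namespaces and the member functions kind() and value() are hypothetical, and in the third case the tokens are placed in an initializer so that the result of the expression can be observed.

```cpp
#include <cassert>

namespace type_parameter {
template <typename X> struct Foo { int kind() const { return 1; } };
typedef int T;
int demo() {
  Foo<T> a;  // defines a variable of type Foo<int>
  return a.kind();
}
}  // namespace type_parameter

namespace non_type_parameter {
template <int X> struct Foo { int value() const { return X; } };
const int T = 7;
int demo() {
  Foo<T> a;  // defines a variable of type Foo<7>
  return a.value();
}
}  // namespace non_type_parameter

namespace plain_ints {
int demo() {
  int Foo = 1, T = 2, a = 3;
  bool result = Foo<T> a;  // parsed as (Foo < T) > a, that is (1 < 2) > 3
  return result ? 1 : 0;
}
}  // namespace plain_ints

// Small self-check: the same tokens produce three different meanings.
void check_three_readings() {
  assert(type_parameter::demo() == 1);
  assert(non_type_parameter::demo() == 7);
  assert(plain_ints::demo() == 0);
}
```

Compilers typically warn about the chained comparison in the third case, which is a fair hint that such code is rarely intended.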

C++ can learn the meaning of a name only by parsing source code; it cannot obtain the information it needs (function prototypes, class definitions, and so on) by reading metadata directly from object code. This means that to understand exactly what a line of C++ code means, we have to read all the code before that line and understand the definition of every symbol, operators included. Header files make this almost impossible to do by eye. Someone may inadvertently change a header file, or merely change the order in which a source file includes its headers, and thereby change the meaning of the code and break its functionality.
To parse source code correctly, the C++ compiler's symbol table must keep at least the meaning of every name seen so far, including class member definitions, declared variables, known function prototypes, and so on. And this does not yet take templates into account; compiling templates is unimaginably harder. The compiler must also handle correctly the changes in the meaning of names caused by nested scopes: a name in an inner scope may shadow a name in an outer scope. Some other languages warn about this, and I recommend compiling with g++'s -Wshadow option. (The muduo code is compiled with -Wall -Wextra -Werror -Wconversion -Wshadow.)
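A small sketch of the kind of shadowing that -Wshadow is meant to report. The variable and function names are hypothetical; the code is legal C++ and compiles without the option, which is exactly why the warning is useful.

```cpp
#include <cassert>

int limit = 100;  // outer (namespace-scope) name

int shadowed_sum() {
  int total = 0;
  for (int i = 0; i < 3; ++i) {
    int limit = i;   // shadows ::limit inside the loop body
    total += limit;  // refers to the inner limit: 0 + 1 + 2
  }
  return total + limit;  // the inner name is out of scope: this is ::limit
}

// Small self-check: 3 from the loop plus 100 from the outer variable.
void check_shadowing() {
  assert(shadowed_sum() == 103);
}
```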
Function overload resolution
When the C++ compiler reads a function call, it must pick the best match from the functions of that name seen so far. A better match that appears later in the code does not affect the decision already made.

This means that swapping the positions of two namespace-level function definitions in the source code may change the behavior of the program. For example:

#include <stdio.h>
using namespace std;

void foo(int x)
{
    printf("foo(int)\n");
}

void bar()
{
    foo('a');
}

void foo(char ch)
{
    printf("foo(char)\n");
}

int main()
{
    bar();
    return 0;
}

If the definition of void bar() is moved to after void foo(char) during refactoring, the program prints a different result.
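To see both orderings in one file, the sketch below places each in its own namespace (the namespace names are hypothetical) and returns the chosen overload's name instead of printing it.

```cpp
#include <cassert>
#include <string>

namespace original_order {
std::string foo(int) { return "foo(int)"; }
std::string bar() { return foo('a'); }  // only foo(int) has been seen here
std::string foo(char) { return "foo(char)"; }
}  // namespace original_order

namespace after_refactoring {
std::string foo(int) { return "foo(int)"; }
std::string foo(char) { return "foo(char)"; }
std::string bar() { return foo('a'); }  // both overloads seen: exact match wins
}  // namespace after_refactoring

// Small self-check: moving bar() changes which overload it calls.
void check_overload_order() {
  assert(original_order::bar() == "foo(int)");
  assert(after_refactoring::bar() == "foo(char)");
}
```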
This example shows why C++ refactoring tools are hard to implement: a refactoring tool must understand code at the level of a compiler in order to change it without altering its original meaning. A function argument can be a complex expression, and the tool must work out the type of that expression correctly to perform overload resolution. For example, which foo() the call foo(str[0]) invokes depends on the type of str[0], and str may be a std::string, which requires the tool to understand templates correctly and instantiate them. I am afraid this is why C++ still has no decent refactoring tools.
The C++ compiler must keep a function-level syntax tree in memory to implement return value optimization (RVO) correctly; otherwise, on reaching a return statement, it cannot tell whether the object being returned is the named object that can be optimized.
Because C++ has added so many language features, a C++ compiler cannot really do forget-as-you-go single-pass compilation the way C does. But C++ must remain compatible with the semantics of C, so the compiler has to behave as if it were single-pass (more precisely, single-pass parsing), even if it makes multiple passes internally.
10.2.2 Forward declarations

C++ coding standards recommend using forward declarations wherever possible to reduce compile-time dependencies. Here I use single-pass compilation to explain why this is feasible, and often even necessary.
If the code calls the function foo(), the C++ compiler has to generate the object code for the call when it parses that point. To check the syntax and generate the code for the call, the compiler needs to know the number and types of the function's parameters and its return type; it does not need to know the function body (unless it wants to expand the call inline). So we usually put function prototypes in a header file, so that every source file that includes the header can use the function.
A prototype alone is not enough: some source file of the program must define the function, otherwise a link error (undefined symbol) results. The source file that defines foo() usually also includes the header that declares foo(). But what happens if the parameter types are written incorrectly when foo() is defined?

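// in foo.h
void foo(int);       // prototype declaration

// in foo.cc
#include "foo.h"
void foo(int, bool)  // The parameter list and return type must be copied again in the definition.
{                    // They may be miscopied, or one place may be changed later and the other forgotten.
    // do something
}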

Compiling foo.cc produces no error, because the compiler believes that foo has two overloads. But linking the whole program fails: the definition of void foo(int) cannot be found.
This is a flaw of C++: one thing is split into a declaration and a definition, the two pieces of code live in different files, and so they can become inconsistent. Many C++ errors stem from this, for example declaring extern char* name; in one source file while defining it as char name[] = "Shuo Chen"; in another.
For a function's prototype declaration and its definition, the inconsistency shows up in the parameter list and the return type. A difference in the parameter list is detected (as the link error above shows), but a difference in the return type is not necessarily detected (§10.3). It is also possible that the parameter types are the same but their order is reversed. For example, the prototype is declared as draw(int height, int width) but the definition is written as draw(int width, int height). The compiler cannot detect this kind of error, because the parameter names in a prototype declaration are meaningless to it.
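The swapped-parameter case can be reproduced in a few lines. The function aspect_percent is a hypothetical stand-in for draw(); the point is only that the declaration and the definition are accepted as the same function even though their parameter names disagree.

```cpp
#include <cassert>

// Prototype as a caller would see it in a header: height first, then width.
int aspect_percent(int height, int width);

// Definition with the two names swapped. The types still match (int, int),
// so this defines the very function declared above and compiles cleanly.
int aspect_percent(int width, int height) {
  return width * 100 / height;
}

int caller_following_the_prototype() {
  // The caller passes height = 2 and width = 4, expecting 4 * 100 / 2 = 200.
  return aspect_percent(2, 4);
}

// Small self-check: the definition computes 2 * 100 / 4 = 50 instead.
void check_swapped_parameters() {
  assert(caller_following_the_prototype() == 50);
}
```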
If you write a library for others to use, you usually put the prototype declarations of the interface functions in a header file. Inside the library's implementation, however, if no functions call each other mutually, we can arrange the order of the function definitions so that the lower-level functions appear first in the code, and then no forward declarations of function prototypes are needed. See a blog post by Yunfeng.
A function prototype declaration can be regarded as a forward declaration of the function. Besides functions, there are also forward declarations of classes.
In some cases a forward declaration of a class is required, such as the Child and Parent classes that appear in §11.7.2.
In some cases the complete definition of the class is required [CCS, Item 22], for example to access its members or to know its size in order to allocate space.
In other cases a forward declaration is enough: the compiler only needs to know that a class with that name exists.
For class Foo, the following uses do not need its complete definition:
- Defining or declaring Foo* and Foo&, including their use as function parameters, return types, local variables, class member variables, and so on. This is because the C++ memory model is flat, and the definition of Foo cannot change the meaning of a pointer or reference to Foo.
- Declaring a function that takes Foo as a parameter or return type, such as Foo bar() or void bar(Foo f). However, code that calls such a function needs the definition of Foo, because the compiler will use Foo's copy constructor and destructor and must therefore at least see their declarations (they may, for example, be in the private section).
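The two kinds of use listed above can be checked with a short sketch. Everything apart from the name Foo is hypothetical (pass_through, Holder, make_foo, consume); the first half compiles with only the forward declaration in view.

```cpp
#include <cassert>

class Foo;  // forward declaration: only the name is known

Foo* pass_through(Foo* p) { return p; }  // pointers need no definition
struct Holder { Foo* ptr; };             // neither does a pointer member
Foo make_foo();                          // declaring a by-value return is fine
void consume(Foo f);                     // so is declaring a by-value parameter

// From here on the complete definition is visible, so functions that
// create, copy, or destroy a Foo can be defined and called.
class Foo {
 public:
  explicit Foo(int v) : value(v) {}
  int value;
};

Foo make_foo() { return Foo(42); }

// Small self-check of the example.
void check_forward_declaration() {
  Foo f = make_foo();
  Holder h = { &f };
  assert(pass_through(&f) == &f);
  assert(h.ptr->value == 42);
}
```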
[CCS] Item 30 says to avoid overloading the three operators &&, ||, and , (comma). Google's C++ style guide adds that the unary operator& (address-of) should not be overloaded either, because once operator& is overloaded the class can no longer be forward-declared. For example:

class Foo;           // forward declaration

void bar(Foo& foo)
{
    Foo* p = &foo;   // This takes the address of foo, but if operator& is overloaded, the meaning changes.
}
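To make the change of meaning concrete, here is a sketch with a hypothetical class Tricky whose operator& deliberately returns a null pointer. It assumes C++11, where std::addressof from <memory> obtains the real address regardless of any overload.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that overloads unary operator&.
struct Tricky {
  Tricky* operator&() { return nullptr; }
};

bool ampersand_yields_real_address() {
  Tricky t;
  Tricky* via_operator = &t;         // calls Tricky::operator&
  Tricky* real = std::addressof(t);  // bypasses the overload
  return via_operator == real;
}

// Small self-check: with the overload in view, &t is not the address of t.
void check_overloaded_address_of() {
  assert(!ampersand_yields_real_address());
}
```

Code that only saw a forward declaration of such a class would use the built-in address-of instead, so the same expression could mean different things in different source files.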

10.3 Linking in C++
This section uses the manual compilation of a book's table of contents and cross-references as an analogy to explain how a linker basically works.

Suppose an author has written a dozen or so chapters, and your job is to edit them into a book. The chapters vary in length from 30 to 80 pages, and each has been printed separately. (The object files have been compiled from the source files.) The chapters cross-reference one another, with text such as "see section yyy on page XXX". While writing, the author does not know the chapter number of the current text, and of course does not know on which page it will end up, because he may reorder chapters and add or remove text at any time, and all of these actions affect the final chapter numbers and page numbers.
To refer to content in other chapters, the author places anchors in the text, giving names to the passages that need to be referenced. For example, the name of this chapter is Ch:cpp compilation. (This is like giving a unique name to a global function or global variable.) Where the text refers to the number or page of another chapter, the author leaves a suitably sized blank and notes which anchor's page number or chapter number should be filled in there.
Now that you have these dozen or so stacks of printed manuscript, how do you edit them into a book? You will probably think of two steps: assign page numbers and chapter numbers, then resolve the cross-references.
1. Step one:
1a. Stack the manuscripts in chapter order so that the pages can be numbered consecutively.
1b. While numbering the pages, determine the chapter numbers as well.
During steps 1a and 1b you can keep records on two sheets of paper:
- The number and title of each chapter and section and the page on which it appears, for compiling the table of contents.
- For each anchor encountered, its name and the page number and chapter number where it appears, for resolving cross-references.
2. Step two: Go through the manuscript again from the beginning. At each cross-reference blank, look up the page number and chapter number in the anchor index and fill in the blank.
At this point, if all goes well, the editing task is complete.
Two things are most likely to go wrong in this job, and they correspond to the two most common link errors:
1. A cross-reference in the text has no corresponding anchor, so the blank cannot be filled in.
2. An anchor is defined more than once, so which one should be used to fill in the cross-reference blank?
The method above reads the full text at least twice. Is there a way to number everything and resolve the cross-references in a single pass from beginning to end? There is, provided that the author only ever refers from earlier chapters to later ones. We read through the text assigning page and chapter numbers, and whenever we meet a new cross-reference blank we note it on a sheet of paper, recording the name referenced and the page on which the blank appears. We know that we will meet the corresponding anchor later. When we meet an anchor, we check the sheet for cross-references to it; if there are any, we go back to the pages with the blanks, fill them in, and then carry on numbering pages and chapters. After one sweep, the chapter numbers, page numbers, and cross-references are all done.
This is how a traditional one-pass linker works. When using such a linker, pay attention to the order of its arguments: the more basic a library is, the later it should appear. If the program uses several libraries that depend on one another (assuming there are no cyclic dependencies), the order of the linker's arguments should be a topological ordering of the dependency graph. This guarantees that every pending symbol can be found in a library that appears later. For example, if the two independent libraries A and B both depend on library C, the link order is A B C or B A C.
Why not the other way round, listing the basic libraries first and the application libraries after them? Because the former approach uses less memory. If the basic library is processed first, the linker does not know which of its symbols later code will use, so it has to remember all of them, and its memory consumption is proportional to the total size of all the libraries. If the application libraries are processed first, the linker only has to remember the symbols that have not yet been found, and its memory consumption is proportional to the number of external symbols in the program (and once a blank has been filled in, it can be forgotten).
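The order sensitivity described above can be modeled in a few dozen lines. This is a deliberately simplified sketch, not how any real linker is implemented: each library is treated as one indivisible unit, and the Library type, the symbol names, and one_pass_link are all hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a library: the symbols it defines and the symbols
// it needs from elsewhere.
struct Library {
  std::string name;
  std::set<std::string> defines;
  std::set<std::string> needs;
};

// Scans the libraries once, in command-line order, and returns the symbols
// still unresolved at the end (an empty set means the link succeeded).
std::set<std::string> one_pass_link(const std::set<std::string>& program_needs,
                                    const std::vector<Library>& order) {
  std::set<std::string> pending = program_needs;  // blanks not yet filled in
  std::set<std::string> resolved;                 // symbols already pulled in
  for (const Library& lib : order) {
    bool wanted = false;
    for (const std::string& sym : lib.defines) {
      if (pending.count(sym)) { wanted = true; break; }
    }
    if (!wanted) continue;  // nothing pending refers to it: skip and forget
    for (const std::string& sym : lib.defines) {
      pending.erase(sym);
      resolved.insert(sym);
    }
    for (const std::string& sym : lib.needs) {
      if (!resolved.count(sym)) pending.insert(sym);
    }
  }
  return pending;
}

// A and B are independent and both depend on C, as in the text.
std::vector<Library> sample_libraries() {
  Library a = {"A", {"a_func"}, {"c_func"}};
  Library b = {"B", {"b_func"}, {"c_func"}};
  Library c = {"C", {"c_func"}, {}};
  return {a, b, c};
}

// Small self-check: A B C links, C A B leaves c_func unresolved.
void check_link_order() {
  std::vector<Library> libs = sample_libraries();
  std::set<std::string> needs = {"a_func", "b_func"};
  assert(one_pass_link(needs, {libs[0], libs[1], libs[2]}).empty());
  std::set<std::string> left = one_pass_link(needs, {libs[2], libs[0], libs[1]});
  assert(left.size() == 1 && left.count("c_func") == 1);
}
```

In the failing order, C is scanned while nothing yet refers to c_func, so it is skipped and never revisited.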
The above describes the C link model. Compared with C, C++ adds two main things:
1. Function overloading, which requires type-safe linkage [D&E, §11.3], that is, name mangling.

2. Vague linkage, that is, the same symbol having multiple definitions that do not conflict with one another.
Programmers generally do not need to worry about name mangling; it is enough to use extern "C" so that the code can interoperate with C libraries. Header files of common C libraries nowadays use extern "C" appropriately so that they can also be used in C++ programs.
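The usual header idiom looks roughly like the sketch below. The function c_style_add is hypothetical, and its definition is included here only so that the example is self-contained; in practice it would be compiled by a C compiler into a library.

```cpp
#include <cassert>

// What a C library header typically does so that both C and C++ can use it.
#ifdef __cplusplus
extern "C" {
#endif

int c_style_add(int a, int b);  // hypothetical C library function

#ifdef __cplusplus
}
#endif

// Stand-in for the C library's implementation. With C linkage its symbol
// name is plain c_style_add, with no parameter types encoded in it.
extern "C" int c_style_add(int a, int b) { return a + b; }

int add_from_cpp() { return c_style_add(2, 3); }

// Small self-check of the example.
void check_extern_c() {
  assert(add_from_cpp() == 5);
}
```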
In C, a symbol can usually have only one definition in a program; otherwise a duplicate definition error results. C++ is different: when processing a single source file, the compiler does not know whether certain symbols should be defined in this compilation unit. To be safe, it has to generate a "weak definition" in every object file, and the linker then picks one copy as the final definition. This is vague linkage. Without it, undefined symbol errors would result, because the linker cannot call the compiler back to generate the missing symbols. For this mechanism to work correctly, C++ requires code to obey the One Definition Rule (ODR); otherwise the behavior of the code is unpredictable.
10.3.1 Function overloading
To implement function overloading, the C++ compiler uses name mangling to generate a unique name for each overloaded function, so that the correct overload can be found at link time. For example, foo.cc defines two overloads of foo().

// foo.cc
int foo(bool x) { return 42; }
int foo(int x) { return 100; }

$ g++ -c foo.cc
$ nm foo.o   # foo.o defines two functions with external linkage
0000000000000000 T _Z3foob
0000000000000010 T _Z3fooi
$ c++filt _Z3foob _Z3fooi   # unmangle these two function names
foo(bool)    # note that the mangled name contains no return type
foo(int)

The mangled name of an ordinary non-template function does not include the return type; recall that the return type does not take part in overload resolution. This is a hidden danger. If a source file uses an overloaded function but sees a prototype declaration with the wrong return type (a violation of the ODR), the linker cannot catch the error.
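The same unmangling that c++filt performs is available programmatically. The sketch below assumes a toolchain that follows the Itanium C++ ABI (such as g++ or clang++ on Linux) and uses abi::__cxa_demangle from <cxxabi.h>; the wrapper name demangle is hypothetical.

```cpp
#include <cxxabi.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Returns the human-readable form of a mangled name, or an empty string
// if the name cannot be demangled.
std::string demangle(const char* mangled) {
  int status = 0;
  char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || text == nullptr) return std::string();
  std::string result(text);
  std::free(text);  // the buffer is allocated with malloc
  return result;
}

// Small self-check: the parameter types are encoded, the return type is not.
void check_demangle() {
  assert(demangle("_Z3foob") == "foo(bool)");
  assert(demangle("_Z3fooi") == "foo(int)");
}
```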

// main.cc
void foo(bool);   // the return type is wrongly written as void

int main()
{
    foo(true);
}

$ g++ -c main.cc
$ nm main.o   # the object file depends on the symbol _Z3foob
                 U _Z3foob
0000000000000000 T main
$ g++ main.o foo.o   # ./a.out is generated without complaint

For built-in types this has no real impact. But if the return type is a class, the consequences are unpredictable.
10.3.2 Inline functions
For all aspects of inline functions, see [EC3] Item 30.
Because of inline functions, a function call in C++ source code does not imply a real function call (that is, a call instruction) in the generated object code. Compilers today can decide automatically whether a function is suitable for inlining, so the inline keyword is often unnecessary in source files. In header files inline is still needed, to keep the linker from complaining about multiple definitions.
C++ compilers today avoid duplicate definitions by eliminating duplicate code.
- If the compiler cannot expand the inline function, every compilation unit generates object code for it, and the linker then picks one of the copies and discards the rest (vague linkage).
- If the compiler can expand the inline function, it need not generate separate object code for it, unless a function pointer points to it.
How can you tell whether a C++ executable is a debug build or a release build, that is, whether it was compiled with -O0 or -O2? A common approach is to check whether the short member functions of class templates have been expanded inline. For example:

// vec.cc
#include <vector>
#include <stdio.h>

int main()
{
    std::vector<int> vi;
    printf("%zu\n", vi.size());  // the inline function size() is called here
}

$ g++ -Wall vec.cc   # non-optimized build
$ nm ./a.out | grep size | c++filt
00000000004007ac W std::vector<int, std::allocator<int> >::size() const
// vector<int>::size() is not expanded inline; a (weak) definition of the function appears in the object file.
$ g++ -Wall -O2 vec.cc   # optimized build
$ nm ./a.out | grep size | c++filt
// No output, because vector<int>::size() has been expanded inline.

The destructor that the compiler generates for a class automatically is an inline function. Sometimes we deliberately make it out-of-line to prevent code bloat or compile errors. The Printer class below is a public class implemented with the pimpl technique introduced in §11.4. Its header file exposes no details of the Impl class, only a forward declaration, and it deliberately declares the constructor and destructor explicitly.
Printer.h

#include <boost/scoped_ptr.hpp>

class Printer // : boost::noncopyable
{
 public:
  Printer();
  ~Printer();  // make it out-of-line
  // other member functions

 private:
  class Impl;  // forward declaration only
  boost::scoped_ptr<Impl> impl_;
};

In the source file we can then define Printer::Impl at our leisure, followed by the constructor and destructor of Printer.
printer.cc

#include "Printer.h"

class Printer::Impl
{
  // members
};

Printer::Printer()
  : impl_(new Impl)  // now the compiler has seen the definition of Impl, so this compiles
{
}

Printer::~Printer()  // Although the destructor is empty, it must be defined here. Otherwise the compiler
{                    // cannot see the declaration of Impl::~Impl() when it expands the implicitly
}                    // declared ~Printer() inline, and reports an error. See boost::checked_delete.
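For readers without Boost, the same arrangement can be sketched in a single file with std::unique_ptr from C++11 swapped in for boost::scoped_ptr. This is an illustration under that assumption, not the original code: the member function print and the prefix field are hypothetical additions so that the class does something observable.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- the part that would live in Printer.h ---
class Printer {
 public:
  Printer();
  ~Printer();  // declared here, defined where Impl is complete
  std::string print(const std::string& text) const;

 private:
  class Impl;  // forward declaration only
  std::unique_ptr<Impl> impl_;
};

// --- the part that would live in Printer.cc ---
class Printer::Impl {
 public:
  std::string prefix = "[printer] ";
};

Printer::Printer() : impl_(new Impl) {}

Printer::~Printer() {}  // Impl is complete here, so impl_ can delete it

std::string Printer::print(const std::string& text) const {
  return impl_->prefix + text;
}

// Small self-check of the example.
void check_pimpl() {
  Printer p;
  assert(p.print("hi") == "[printer] hi");
}
```

If the destructor were left implicit, every user of the header would instantiate the deletion of an incomplete Impl and the compiler would reject it, which is the error the comments in the original listing describe.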

In modern C++ systems the boundary between compiling and linking is more blurred. Traditional C++ textbooks tell us that for the compiler to be able to inline a function, the function body must be visible in the current compilation unit, so we usually put public inline functions in header files. With link-time code generation, the compiler no longer needs to see the definition of an inline function; inline expansion can be left to the linker.
Besides inline functions, g++ has a large number of built-in functions, so "function calls" in the source code such as memcpy, memset, strlen, sin, and exp do not necessarily call the library functions in libc. Moreover, because the compiler knows what these functions do, it can optimize them better.
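One way to observe that the compiler knows what strlen does is to ask for its value at compile time. The sketch below assumes GCC or Clang, whose __builtin_strlen can be evaluated in a constant expression for a string literal; the function name greeting_length is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Evaluated entirely by the compiler: no library call can be involved.
static_assert(__builtin_strlen("hello") == 5, "folded at compile time");

std::size_t greeting_length() {
  // With optimization enabled the compiler may replace this call with the
  // constant 5 instead of calling strlen in libc.
  return std::strlen("hello");
}

// Small self-check of the example.
void check_builtin_strlen() {
  assert(greeting_length() == 5);
}
```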