Understanding the C and C++ compilation models

First, a brief introduction to the C compilation model:
Limited by the hardware of the time, a C compiler could not load all of a program's code into memory at once, so the code had to be split into multiple source files that were compiled separately. Memory limits also meant the compiler itself could not be too large, so it was divided into several executables that compiled in stages. In the early years there were seven executables in total: cc (invokes the other executables), cpp (preprocessor), c0 (generates intermediate files), c1 (generates assembly files), c2 (optimizer, optional), as (assembler, generates object files), and ld (linker).
1. Implicit function declaration
To make separate compilation possible while keeping memory usage low, the C language supports implicit function declarations: when code calls a function that has not been declared earlier, the compiler does not check the function prototype. It assumes that the function exists and is being called correctly, assumes that it returns int, and generates assembly code for the call. The only unknown at this point is the function's address, which is left for the linker to fill in. For example:

int main()
{
    printf("ok\n");
    return 0;
}

GCC gives an implicit function declaration warning, but the code compiles and runs. At link time, the linker finds the definition of the printf symbol in libc and fills its address into the blank left by the compile phase. (C99 removed implicit function declarations from the standard, so newer compilers may reject this code by default.) PS: Compiling with g++ produces an error instead: use of undeclared identifier 'printf'. If you call a function that is not defined anywhere, for example by changing printf above to print, you get a link error rather than a compilation error.
2. Header files
With implicit function declarations, the compiler does not need a header file in order to compile a call: it can generate assembly code from the call site alone, assuming that the function returns int. The original purpose of C header files was to make it easy to share data structure definitions, external variables, and constants between files. Early header files contained only these three things. Note that function declarations are not among them.
Today header files also carry function declarations. This has advantages and disadvantages:
Advantages:
Interfaces can be shared between different files in a project.
Header files provide the interface description for third-party libraries.
Disadvantages:
Efficiency: to use one simple library function, the compiler may have to parse thousands of lines of header source code after preprocessing.
Transitivity: header inclusion is transitive. A change to any file in the inclusion chain forces every source file that includes it to be recompiled, even if the change is irrelevant (no source file uses the modified interface).
Variance: header files are used at compile time and dynamic libraries at run time; if their versions differ, binary compatibility problems can result.
Consistency: the parameter names in a header's function declaration are not required to match those in the source file's definition, so the declared meaning can differ from what the implementation does. For example, a function declared as void draw(int height, int width) may be implemented as void draw(int width, int height).
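The consistency problem can be sketched in a few lines of C++. The function name draw_code and its return value are hypothetical, chosen only so the effect is observable; the source's draw example returns void.

```cpp
// "Header" view: callers read this and assume the order (height, width).
int draw_code(int height, int width);

// "Source file" view: the definition swaps the parameter names.
// The compiler accepts this silently, because parameter names are not
// part of the function's type or signature.
int draw_code(int width, int height)
{
    // Encodes the arguments as the implementation understands them.
    return width * 100 + height;
}
```

A caller that writes draw_code(2, 3), meaning height 2 and width 3, gets 203 rather than the 302 that the declaration suggests, and no diagnostic is issued.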
3. Single-pass compilation (one pass)
Because compilers of the time could not keep the syntax tree of an entire source file in memory, they were in practice single-pass. The compiler reads the source file from beginning to end, generating object code as it parses, and at any point it can only see the part it has already parsed. This means:
C structures must be defined before their members can be accessed. The compiler needs the structure definition to know each member's type and offset before it can generate object code.
Local variables must be defined before they are used. The compiler needs to know the type of a local variable and its position on the stack.
External variables (global variables): the compiler only needs the type and name, not the address, to generate object code. The address is left for the linker to fill in.
Functions: thanks to implicit function declarations, the compiler can generate object code immediately, assuming the function returns int, and leave a blank function address for the linker to fill in.
Early C header files therefore provided structure definitions and external variable declarations, while resolving external symbols (functions and external variables) was left to the linker.
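As a rough illustration of the rules above, here is a minimal C++ sketch. The names Point, counter, and read_y are made up for the example. The structure must be complete before its member is accessed, while the external variable only needs a declaration giving its type and name; its definition can come later, or from another file.

```cpp
// The struct must be fully defined here, because read_y() below needs
// the type and offset of the member y.
struct Point {
    int x;
    int y;
};

// Declaration only: type and name are enough to compile code that uses it.
extern int counter;

int read_y(const Point& p)
{
    return p.y + counter;   // address of counter is resolved later
}

// The definition may appear after the use (or in another source file).
int counter = 10;
```

Calling read_y with a Point whose y is 2 yields 12 under these assumptions.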
Single-pass compilation combined with implicit function declarations leads to an interesting example:

void bar()
{
    foo('a');
}

int foo(char a)
{
    printf("foobar\n");
    return 0;
}

int main()
{
    bar();
    return 0;
}

GCC compiles the above code and gets the following error:

test.c:16:6: error: conflicting types for 'foo'
void foo(char a)
     ^
test.c:12:2: note: previous implicit declaration is here
 foo('a');

This happens because when the compiler reaches the call to foo inside bar(), it has not yet seen the definition of foo further down. It can only generate the call as if for int foo(int), based on an implicit function declaration. Note that the implicitly assumed parameter type is int rather than char: the compiler promotes the argument to int. When the compiler later reaches the more fitting int foo(char), it does not admit that its earlier guess was wrong; instead it considers the definition of foo inconsistent with the implicit declaration it generated, and reports a compilation error. Replacing foo above with void foo(int a) produces a similar error, because C requires every declaration of a symbol to agree with its single definition, including the function's return type.
If the definition of foo is moved before bar, the code compiles and runs correctly.
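Moving the definition is one fix; another is an explicit prototype ahead of the first call. The sketch below shows that variant in C++, where a declaration before use is mandatory anyway. The return type is changed to std::string purely so the result can be inspected; that is an assumption of this example, not part of the original code.

```cpp
#include <string>

// Explicit prototype: the compiler knows the exact signature before
// it reaches the call inside bar().
std::string foo(char a);

std::string bar()
{
    return foo('a');   // checked against the prototype above
}

// The definition can now safely appear after its first use.
std::string foo(char a)
{
    return std::string("foo got ") + a;
}
```

With the prototype in place, the order of bar and foo in the file no longer matters.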
C++ compilation model
The three points about the C compilation model discussed so far work well enough for C, because the C language is simple enough. C++ tries to stay compatible with these features (apart from implicit function declarations, which C++ does not have) and adds its own, such as overloading, classes, and templates, which makes it much harder to understand.
1. Single-pass compilation
C++ has no implicit function declarations, but it still follows single-pass semantics, at least on the surface. The main effects of single-pass semantics on C++ are on overload resolution and name lookup.
1.1 Overload resolution

#include <stdio.h>

void foo(int a)
{
    printf("foo(int)\n");
}

void bar()
{
    foo('a');
}

void foo(char a)
{
    printf("foo(char)\n");
}

int main()
{
    bar();
    return 0;
}

Compiled with g++ and run, the code above prints foo(int). Although a better-matching function appears later in the file, C++ sees only void foo(int) when it parses bar().
This is one of the puzzles caused by combining C++ overloading with single-pass compilation. C++ is not truly single-pass (consider forward declarations), but to stay compatible with C semantics it has to "play dumb". C++ classes are the exception: the compiler scans the whole class definition before it resolves the bodies of member functions, so all member functions with the same name in the class take part in overload resolution.
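The contrast between free functions and member functions can be sketched as follows. Widget and free_bar are hypothetical names, and the functions return strings instead of printing so that the chosen overload is easy to check.

```cpp
#include <string>

// Free functions: only declarations seen so far take part in overload
// resolution at the point of the call.
std::string foo(int)  { return "foo(int)"; }
std::string free_bar() { return foo('a'); }   // foo(char) is not visible yet
std::string foo(char) { return "foo(char)"; }

// Member functions: bodies are compiled as if placed after the complete
// class definition, so both overloads are candidates inside bar().
struct Widget {
    std::string bar() { return foo('a'); }
    std::string foo(int)  { return "Widget::foo(int)"; }
    std::string foo(char) { return "Widget::foo(char)"; }
};
```

Here free_bar() picks foo(int), while Widget::bar() picks the exact match Widget::foo(char) even though it is declared later in the class.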
Overloading has another problem: the implicit type conversions inherited from C also interfere with overload resolution:

// Case 1
void f(int) {}
void f(unsigned int) {}
void test() { f(5); } // calls f(int)

// Case 2
void f(int) {}
void f(long) {}
void test() { f(5); } // calls f(int)

// Case 3
void f(unsigned int) {}
void f(long) {}
void test() { f(5); } // error: ambiguous, the compiler doesn't know which one you mean

// Case 4
void f(unsigned int) {}
void test() { f(5); } // calls f(unsigned int)
void f(long) {}
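One way to take the guesswork out of Case 3 is to give the argument the exact type of the overload you want, using a literal suffix or a cast. This is only a sketch: the call_* helpers are hypothetical, and the overloads return strings so the selected one is visible.

```cpp
#include <string>

std::string f(unsigned int) { return "f(unsigned int)"; }
std::string f(long)         { return "f(long)"; }

// f(5) would be ambiguous here: int converts equally well to both types.
// An exact-match argument removes the ambiguity.
std::string call_unsigned() { return f(5u); }                      // unsigned literal
std::string call_long()     { return f(5L); }                      // long literal
std::string call_cast()     { return f(static_cast<long>(5)); }    // explicit cast
```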

Add the implicit conversion from a C++ derived class to its base class, overloaded conversion operators, and so on, and you have to work hard to make sure the compiler does what you expect.
1.2 Name lookup
Another effect of single-pass compilation on C++ is name lookup. C++ can only work out what a name means from the source code it has already seen. For example, the statement AA BB(CC); may declare a function or define a variable. The compiler has to combine everything it has parsed so far to determine the exact meaning of the statement. Combined with C++ templates, the difficulty grows geometrically. Inadvertently changing a header file, or changing the order in which headers are included, can change the semantics of a statement and the meaning of the code.
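The AA BB(CC); example can be made concrete with two namespaces that differ only in what CC was declared as earlier. The namespace names and the bb_is_function flags are illustrative additions; std::is_function from <type_traits> is used to show how the compiler classified BB.

```cpp
#include <type_traits>

namespace types_case {
    struct CC {};
    struct AA {};
    // CC names a type here, so this line declares a function BB
    // that takes a CC and returns an AA.
    AA BB(CC);
    constexpr bool bb_is_function = std::is_function<decltype(BB)>::value;
}

namespace value_case {
    struct AA {
        int v;
        AA(int x) : v(x) {}
    };
    int CC = 42;
    // CC names a variable here, so the same tokens define an object BB
    // of type AA, initialized from CC.
    AA BB(CC);
    constexpr bool bb_is_function = std::is_function<decltype(BB)>::value;
}
```

The two lines are identical token for token; only the declarations that precede them decide whether BB is a function or an object.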
2. Header files
When people first learn C++, putting function declarations in .h files and function implementations in .cpp files seems to be the accepted consensus. C++ has no implicit function declarations like C, and it has no package mechanism like other high-level languages. Within a project, header files therefore become the main way to share interfaces between modules and between classes.
C++ inherits every one of C's header file problems: efficiency, transitivity, variance, and consistency. In addition, C++ header files bring the following problems:
2.1 Include order
C++ header files contain more kinds of content (templates, typedef, #define, #pragma, classes, and so on), so including the same headers in a different order can produce completely different semantics, or cause compilation errors outright.
2.2 Overloading again
Because C++ supports overloading, a function declaration in a header file and a function implementation in a source file that do not match (in the number of parameters, const qualifiers, and so on) may form overloads. The "smart" C++ compiler then reports no error: it leaves the call address for the linker to fill in and treats the mismatched implementation in the source file as a new overload. The error only surfaces at link time. In C this would be a compile error, because C has no overloading and no name mangling, so the mismatch shows up as a symbol conflict at compile time.
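A single-file sketch of this situation, with a hypothetical function scale: the "header" promises two parameters, the "source file" accidentally implements one. The compiler treats them as two unrelated overloads and stays silent.

```cpp
// What the header declares (hypothetical interface).
int scale(int value, int factor);

// What the source file actually defines: one parameter is missing.
// To the compiler this is simply a second overload, not a mismatch.
int scale(int value)
{
    return value * 2;
}

int use_defined_overload()
{
    return scale(3);        // resolves to the one-parameter overload
    // return scale(3, 4);  // would compile, then fail at link time:
    //                      // scale(int, int) is declared but never defined
}
```

As long as nobody calls the two-parameter version, the program builds and runs, which is why the mistake can go unnoticed until a caller finally uses the declared signature.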
2.3 Duplicate inclusion
Because header inclusion is transitive, a header higher up the chain may be included more than once. When expanded, a header included twice can cause symbols to be redefined, for example:

// common.h
class Common
{
    //...
};

// h1.h
#include "common.h"

// h2.h
#include "common.h"

// test.cpp
#include "h1.h"
#include "h2.h"
int main()
{
    return 0;
}

If common.h contains function definitions, struct definitions, class declarations, external variable definitions, and so on, then test.cpp ends up with two expanded copies of common.h and the compiler reports a redefinition error. If common.h contains only external function declarations there is no problem, because a function may be declared in many places but defined in only one. As for class declarations: a C++ class keeps the semantics of a struct, so it is more accurate to call it a class definition. Always remember that a header file is just a collection of shared code that is pasted into the source file during preprocessing.
To deal with duplicate inclusion, C++ header files commonly use #ifndef / #define / #endif or #pragma once to make sure a header is not included more than once.
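The effect of an include guard can be simulated in one file by pasting the guarded header text twice, which is what the preprocessor does when two headers both include it. The guard macro name COMMON_H and the id() member are assumptions made for this sketch.

```cpp
// First expansion of the guarded "common.h": COMMON_H is not defined yet,
// so the class definition is kept.
#ifndef COMMON_H
#define COMMON_H
class Common {
public:
    int id() const { return 7; }
};
#endif

// Second expansion (as if reached through another header): COMMON_H is
// already defined, so the preprocessor skips the whole body and no
// redefinition error occurs.
#ifndef COMMON_H
#define COMMON_H
class Common {
public:
    int id() const { return 7; }
};
#endif
```

Removing the guards from this sketch makes the second class Common a redefinition, which is the error described above.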
2.4 Cross inclusion
Cross inclusion occurs when C++ classes refer to each other, for example when Parent contains a Child object and Child contains a reference to Parent. If each header includes the other, the compiler must expand Parent.h in order to expand Child.h, and must expand Child.h in order to expand Parent.h, and so on in an infinite loop. In the end g++ gives a compilation error: error: #include nested too deeply.
The solution is a forward declaration: write class Parent; before the definition of the Child class to declare Parent without including its header. Forward declarations work for classes as well as for functions (where they are simply explicit function declarations). They should be used liberally, since they solve most of the problems that header files cause: efficiency, transitivity, duplicate inclusion, cross inclusion, and so on. This is a little like a package mechanism: you declare (import) exactly what you need. Forward declarations also have a limitation: they can be used only where the compiler does not need the complete definition of the target class. Class A can use a forward-declared class B in the following situations:
Class A declares references or pointers to B;
Class A uses B as a function parameter type or return type without using the object itself, that is, without needing to know its constructors, destructor, or member functions.
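The Parent and Child case described above might look like this when written with a forward declaration. The member names (parent_, child_, id, parent_id) are invented for the sketch; the point is that Child only needs the name Parent until a member of Parent is actually used.

```cpp
class Parent;   // forward declaration: no need to include Parent's header

class Child {
public:
    // Holding and binding a reference works with an incomplete type.
    explicit Child(Parent& p) : parent_(p) {}
    // Declared here, defined below once Parent is complete.
    int parent_id() const;
private:
    Parent& parent_;
};

class Parent {
public:
    Parent() : child_(*this) {}
    int id() const { return 42; }
    const Child& child() const { return child_; }
private:
    Child child_;   // needs the complete definition of Child
};

// Calling a member of Parent requires its full definition, so this
// body must come after class Parent.
inline int Child::parent_id() const { return parent_.id(); }
```

In a real project the last definition would typically live in the Child source file, which includes both headers.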
2.5 How to use header files
Recommendations for using header files:
Reduce compilation dependencies between files (for example, by using forward declarations);
Include header files in a fixed order by category, such as C system headers, C++ system headers, project base headers, then project headers;
Prevent header files from being included repeatedly (#ifndef or #pragma once);
Ensure that header files and source files are consistent.
3. Summary
Some features that are fairly simple in C cause a lot of trouble in C++, mainly because of C++'s complex language features: classes, templates, all kinds of macros, and so on. For example, suppose class A has a private function that needs class B. This private function must appear in the class definition, that is, in the header file, which adds an otherwise unnecessary reference to B to the header of A. This is because a C++ class follows the semantics of the C struct: all class members must appear in the class definition, "bound" as part of the class. This is not only inconvenient to write but also easy to misread semantically. In fact, C++ member functions are not part of the object; they are much closer to ordinary functions (virtual functions excepted).
In C there is no such "class bundling", and the implementation is much simpler: put the function in A.c and do not declare it in A.h. A.c includes B.h, which removes the link between A.h and B.h. This is one of the advantages of C keeping data separate from operations.
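The C-style layout described above can be sketched in a single listing, with comments marking where each part would live. All names (A, B, a_total, helper) and the file split are hypothetical; the helper is kept file-local with an unnamed namespace, which plays the role of an undeclared function in A.c.

```cpp
// --- B.h (hypothetical) ---
struct B {
    int weight;
};

// --- A.h (hypothetical): the public interface never mentions B ---
struct A {
    int base;
};
int a_total(const A& a);

// --- A.cpp: the only file that needs to include B.h ---
namespace {
    // Private helper that uses B; it is not declared in A.h, so users
    // of A.h have no compile-time dependency on B.h.
    int helper(const A& a, const B& b)
    {
        return a.base + b.weight;
    }
}

int a_total(const A& a)
{
    B b{5};
    return helper(a, b);
}
```

Changing B.h in this arrangement forces only A.cpp to be recompiled, not every file that includes A.h.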
Finally, here is how other languages avoid these pitfalls:
In an interpreted language, import parses the source file of the corresponding module directly instead of pasting the file in;
In a compiled language, the compiled object file contains enough metadata, so there is no need to read the source file (let alone a header file).
Both approaches avoid inconsistency between definitions and declarations, because in these languages the definition and the declaration are one and the same. The import mechanism also ensures that only the names that are actually needed are brought in, with no extra symbols added.
