Summary of the source code

Source: Internet
Author: User

How does a compiler translate a C source program into machine code? You are probably curious and would like to see a concrete example, so let's walk through a very simple one that illustrates the compiler's entire working process.

Source program:

int round(f) float f; {
    return f + 0.5;
}

Stage 1: preprocessing.

Preprocessing covers macro expansion, header file inclusion, and conditional compilation: the #define, #include <xxx.h>, and #ifdef xxx directives you use all the time. The preprocessor runs as a separate program, so preprocessors and compilers can be mixed and matched. For example, the LCC compiler can use the GCC preprocessor (this works because both follow the same ANSI C standard; a good standard lets everyone interoperate without confusion). For this example, however, the source is unchanged by preprocessing, so I will not cover the preprocessor in depth here; skipping it does not affect reading the rest of the source code.
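A small sketch of the three kinds of directive mentioned above. The macro `SQUARE`, the flag `USE_DOUBLE`, the type name `real`, and both functions are invented for illustration. After preprocessing, `SQUARE(v)` has been replaced by `((v) * (v))` and only one of the two `typedef` lines remains, so the compiler proper never sees the directives themselves.

```c
#include <stdio.h>  /* #include: the header's text is pasted in here */

/* #define: SQUARE is expanded textually before the compiler proper runs */
#define SQUARE(x) ((x) * (x))

/* #ifdef: only one of the two typedefs survives preprocessing */
#define USE_DOUBLE
#ifdef USE_DOUBLE
typedef double real;
#else
typedef float real;
#endif

/* After preprocessing, this body reads: return ((v) * (v)); */
real square_real(real v) {
    return SQUARE(v);
}

/* printf is declared by <stdio.h>, which is why the #include is needed */
void print_square(real v) {
    printf("%g\n", (double)square_real(v));
}
```

Running the preprocessor on its own (for example `gcc -E file.c`) prints the expanded text, which shows exactly what the next stage receives.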

Stage 2: lexical analysis.

The task of lexical analysis is to break the program into words (tokens). Think of translating English into Chinese: regard the C source program as English and the machine code as Chinese. If I hand you an English passage (a C source program) and ask you to translate it into Chinese (machine code), what would you do? First, of course, you would split the passage into words and look each one up in the Oxford Dictionary (the C lexical rules). Likewise, lexical analysis does nothing more than split the source program into lexical units, that is, words, and record them in a table. For our example, the lexical analyzer breaks the program into the following token table:

Token code    Associated value
INT           inttype
ID            "round"
'('
ID            "f"
')'
FLOAT         floattype
ID            "f"
';'
'{'
RETURN
ID            "f"
'+'
FCON          0.5
';'
'}'
EOI

EOI marks the end of the input. The associated value carries extra information about a token: the type a keyword such as int denotes, the name of an identifier, or the value of a constant. Some symbols in the table, such as FCON (floating-point constant), may still puzzle you, but for now a rough understanding is enough: you only need to know what lexical analysis is. How the lexer actually works will be analyzed later.
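To make the token table concrete, here is a minimal hand-written scanner that produces the same token sequence for `round`. It is a sketch under simplifying assumptions, not LCC's lexer: the token codes (`T_INT`, `T_ID`, `T_FCON`, and so on), the `struct token` layout, and the functions `scan` and `tokenize` are hypothetical; only three keywords are recognized, every number is read as a floating constant, and comments and multi-character operators are not handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes: one-character tokens use the character itself,
   everything else gets a code above the character range. */
enum { T_INT = 256, T_FLOAT, T_RETURN, T_ID, T_FCON, T_EOI };

struct token {
    int code;        /* T_xxx, or a character such as '(' or ';' */
    char name[32];   /* spelling of an identifier (T_ID) */
    double fval;     /* value of a floating constant (T_FCON) */
};

/* Scans one token starting at *pp and advances *pp past it. */
static struct token scan(const char **pp) {
    const char *p = *pp;
    struct token t = { T_EOI, "", 0.0 };

    while (isspace((unsigned char)*p))          /* skip white space */
        p++;
    if (*p == '\0') {
        /* end of input: t.code stays T_EOI */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        size_t n = 0;
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n < sizeof t.name - 1)          /* longer names are truncated */
                t.name[n++] = *p;
            p++;
        }
        t.name[n] = '\0';
        /* a whole word is read first, then checked against the keywords */
        if (strcmp(t.name, "int") == 0)         t.code = T_INT;
        else if (strcmp(t.name, "float") == 0)  t.code = T_FLOAT;
        else if (strcmp(t.name, "return") == 0) t.code = T_RETURN;
        else                                    t.code = T_ID;
    } else if (isdigit((unsigned char)*p) ||
               (*p == '.' && isdigit((unsigned char)p[1]))) {
        char *end;
        t.fval = strtod(p, &end);               /* every number becomes FCON */
        t.code = T_FCON;
        p = end;
    } else {
        t.code = (unsigned char)*p++;           /* '(', ')', ';', '{', '}', '+' */
    }
    *pp = p;
    return t;
}

/* Scans src into out[] (at most max entries); the last token is T_EOI
   unless max was reached first.  Returns the number of tokens stored. */
int tokenize(const char *src, struct token *out, int max) {
    int n = 0;
    while (n < max) {
        out[n] = scan(&src);
        if (out[n++].code == T_EOI)
            break;
    }
    return n;
}
```

Calling `tokenize("int round(f) float f; { return f + 0.5; }", toks, 32)` yields 16 tokens in the order of the table, ending with `T_EOI`.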

Stage 3: syntax and semantic analysis.

Syntax analysis checks whether the program conforms to the grammar of C. For the C grammar, see Appendix A of K&R's classic The C Programming Language, which gives a complete description of ANSI C. Semantic analysis checks whether the program makes sense. A statement can be free of syntax errors and still contain a semantic error: for example, a = b + 3; is grammatically correct, but if the variable a has not been declared, it is a semantic error. Syntax analysis ultimately produces a forest (a data structure you should already know) containing many trees, each called an abstract syntax tree (AST). For this example, the following two ASTs are generated:

[Figure: the two ASTs generated for round]

Each node is labeled "operator+type". The labels mean:

ASGN+F: assignment of a float value.

ADDRF+P: the address of a formal parameter, that is, a pointer (P) to the function's parameter.

CVD+F: convert double to float. Similarly, CVD+I converts double to int, and so on.

INDIR+D: indirection yielding a double, that is, fetch a double value through an address. INDIR denotes the fetch, and the type suffix gives the type of the value fetched; for example, INDIR+F fetches a float.

Read the first AST starting at the leaf labeled "caller f ---> double" and work back toward the root. The computation is: the value f passed by the caller is a double; it is converted to float and assigned to the parameter f of the function round (the callee).

Read the second AST starting at the leaf labeled "callee f ---> float" and follow the arrows. The computation is: fetch the value of parameter f and convert it to double, add the double constant 0.5, convert the sum to int, and return it.
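One way to picture the two trees is to build them in C and print them in prefix form. The node layout (`struct node`) and the helpers `mk`, `append`, `show`, `tree1`, and `tree2` are hypothetical. The leaves are labeled caller f and callee f, following the description above, and the operators ADD+D, CVF+D, CNST+D, and RET+I are inferred from the prose about the second tree, so the exact shape may differ from LCC's own figure. Nodes are never freed, to keep the sketch short.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A hypothetical AST node labeled "operator+type", e.g. ASGN+F. */
struct node {
    const char *op;       /* ASGN, ADDRF, CVD, INDIR, ADD, CVF, CNST, RET */
    char type;            /* F float, D double, I int, P pointer */
    const char *sym;      /* leaf text: a symbol or a constant, else NULL */
    struct node *kid[2];  /* operands; unused slots are NULL */
};

static struct node *mk(const char *op, char type, const char *sym,
                       struct node *k0, struct node *k1) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->op = op;
    n->type = type;
    n->sym = sym;
    n->kid[0] = k0;
    n->kid[1] = k1;
    return n;
}

/* Appends s to buf without overflowing it. */
static void append(char *buf, size_t size, const char *s) {
    size_t len = strlen(buf);
    if (len + 1 < size)
        strncat(buf, s, size - len - 1);
}

/* Appends the tree in prefix form, e.g. CVD+F(INDIR+D(ADDRF+P[caller f])). */
static void show(const struct node *n, char *buf, size_t size) {
    char label[32];
    snprintf(label, sizeof label, "%s+%c", n->op, n->type);
    append(buf, size, label);
    if (n->sym) {                       /* leaf text in brackets */
        append(buf, size, "[");
        append(buf, size, n->sym);
        append(buf, size, "]");
    }
    if (n->kid[0]) {                    /* operands in parentheses */
        append(buf, size, "(");
        show(n->kid[0], buf, size);
        if (n->kid[1]) {
            append(buf, size, ", ");
            show(n->kid[1], buf, size);
        }
        append(buf, size, ")");
    }
}

/* First tree: the caller's double f is converted to float and stored
   in the callee's parameter f. */
struct node *tree1(void) {
    return mk("ASGN", 'F', NULL,
              mk("ADDRF", 'P', "callee f", NULL, NULL),
              mk("CVD", 'F', NULL,
                 mk("INDIR", 'D', NULL,
                    mk("ADDRF", 'P', "caller f", NULL, NULL), NULL),
                 NULL));
}

/* Second tree: return (int)((double)f + 0.5). */
struct node *tree2(void) {
    struct node *f = mk("INDIR", 'F', NULL,
                        mk("ADDRF", 'P', "callee f", NULL, NULL), NULL);
    struct node *sum = mk("ADD", 'D', NULL, mk("CVF", 'D', NULL, f, NULL),
                          mk("CNST", 'D', "0.5", NULL, NULL));
    return mk("RET", 'I', NULL, mk("CVD", 'I', NULL, sum, NULL), NULL);
}
```

Printing the first tree gives `ASGN+F(ADDRF+P[callee f], CVD+F(INDIR+D(ADDRF+P[caller f])))`; the second gives `RET+I(CVD+I(ADD+D(CVF+D(INDIR+F(ADDRF+P[callee f])), CNST+D[0.5])))`.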

Have you forgotten the source program by now? Look back at it: the return type is int, the parameter is a float, and the constant 0.5 is a double by default. So a series of implicit type conversions takes place without the programmer noticing. First, the double argument is converted to float and assigned to the parameter (round is an old-style definition without a prototype, so the default argument promotions make the caller pass a double). Next, because 0.5 is a double, f has to be converted back to double before the addition. Finally, the return type is int, so the double sum is converted to int and returned to the caller.
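The conversions can be spelled out by rewriting round with a prototype and an explicit cast at every step. This is an illustration, not what LCC emits; the name `round_explicit` is hypothetical (it avoids a clash with the standard `round()` from <math.h>), and the `double` parameter models what the caller actually passes to the old-style definition.

```c
/* round from the example with every implicit conversion made explicit. */
int round_explicit(double arg) {     /* the caller passes a double */
    float f = (float)arg;            /* CVD+F: double -> float, stored in f */
    double sum = (double)f + 0.5;    /* CVF+D, then ADD+D with the double 0.5 */
    return (int)sum;                 /* CVD+I: truncates toward zero */
}
```

Because the final conversion truncates toward zero, `round_explicit(2.7)` yields 3 but `round_explicit(-2.7)` yields -2, not -3.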

Programmers who work in high-level languages really have it easy! They can casually write code like this:

int a = 5; double b = 2.34; char c = a + b;

And it just works, because the C compiler silently does all the hard work. This is one of the benefits of a high-level language. But a machine is still a machine: you have to tell it exactly which kind of operation to perform. Anyone who has learned assembly knows that integer addition and floating-point addition are different instructions. One of the compiler's most important jobs is to hide such details and make programming simple.
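Here is that one-liner with the compiler's implicit work written out. The function name `demo_mixed` is hypothetical. Because one operand is a double, the usual arithmetic conversions turn `a` into a double, so the compiler must pick a floating-point addition; the sum (about 7.34) is then converted to `char`, which discards the fraction.

```c
/* `char c = a + b;` with its implicit conversions made explicit. */
char demo_mixed(void) {
    int a = 5;
    double b = 2.34;
    double sum = (double)a + b;  /* int -> double, then a floating-point add */
    char c = (char)sum;          /* double -> char drops the fraction */
    return c;
}
```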

Stage 4: intermediate code generation.

In the intermediate code stage, the ASTs are converted into DAGs (directed acyclic graphs). For example:

[Figure: the DAGs generated for round]

The difference from the previous figure is that labels such as ASGN+F are now written as ASGNF, to distinguish DAG operators from AST operators. The constant CNST+D (0.5) is now held in a static variable with the generated label 2, and label 1 marks the end of round.
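A common way to turn trees into a DAG is hash-consing: before creating a node, look for an existing node with the same operator, symbol, and children, and reuse it. It illustrates the general technique, not LCC's own code. Since the trees of round contain no repeated subtree, the sketch uses a made-up expression, `(a + b) * (a + b)` with local ints `a` and `b`, whose operator names (ADDRLP, INDIRI, ADDI, MULI) are styled after the fused labels above. `struct dnode`, `dag`, `hash_label`, `same_sym`, and `a_plus_b` are hypothetical. A real front end must also stop reusing fetches (INDIR nodes) once an assignment may have changed the value in memory; this sketch ignores that.

```c
#include <stdlib.h>
#include <string.h>

/* A hypothetical DAG node; the operator name already includes the type. */
struct dnode {
    const char *op;
    const char *sym;        /* leaf symbol, or NULL */
    struct dnode *kid[2];
    struct dnode *link;     /* next node in the same hash bucket */
};

#define NBUCKETS 64
static struct dnode *buckets[NBUCKETS];
static int nnodes;          /* distinct nodes created so far */

static unsigned hash_label(const char *op, const char *sym) {
    unsigned h = 0;
    for (const char *s = op; *s; s++)
        h = h * 31u + (unsigned char)*s;
    for (const char *s = sym ? sym : ""; *s; s++)
        h = h * 31u + (unsigned char)*s;
    return h % NBUCKETS;
}

static int same_sym(const char *a, const char *b) {
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Returns the existing node with this operator, symbol, and children if
   there is one; otherwise creates it.  Identical subtrees thus become a
   single shared node. */
struct dnode *dag(const char *op, const char *sym,
                  struct dnode *k0, struct dnode *k1) {
    unsigned h = hash_label(op, sym);
    for (struct dnode *p = buckets[h]; p != NULL; p = p->link)
        if (strcmp(p->op, op) == 0 && same_sym(p->sym, sym)
            && p->kid[0] == k0 && p->kid[1] == k1)
            return p;
    struct dnode *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->op = op;
    n->sym = sym;
    n->kid[0] = k0;
    n->kid[1] = k1;
    n->link = buckets[h];
    buckets[h] = n;
    nnodes++;
    return n;
}

/* Builds (a + b); calling it twice returns the same node. */
struct dnode *a_plus_b(void) {
    return dag("ADDI", NULL,
               dag("INDIRI", NULL, dag("ADDRLP", "a", NULL, NULL), NULL),
               dag("INDIRI", NULL, dag("ADDRLP", "b", NULL, NULL), NULL));
}
```

Building `dag("MULI", NULL, a_plus_b(), a_plus_b())` creates 6 distinct nodes instead of the 11 a tree would need, and both children of MULI are the same ADDI node.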

Stage 5: assembly code generation.

As you can see, the DAG describes the code to be executed clearly. The LCC code generator produces assembly code by annotating (labeling) the DAG. The annotated result is as follows:

[Figure: the DAGs for round after annotation by the code generator]

The entry and exit code of every function follows the same pattern, so the back end keeps a code template, and we only need to insert the generated code at the right place. On x86 under DOS or Windows NT, LCC works with the MASM or TASM assembler, which turns the generated assembly code into machine code for that architecture and operating system.
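A very rough sketch of this last step: walk the tree in post-order (operands before the operator that uses them) and wrap the output in a fixed entry/exit template. Real LCC back ends select actual machine instructions by tree pattern matching, with rules written for its lburg code-generator generator; this sketch skips instruction selection entirely and emits one pseudo-instruction per node, named after the node. `struct enode`, `emit`, `gen`, `gen_function`, and the `enter`/`leave` lines are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* A hypothetical node: operator with fused type (e.g. ADDD) and an
   optional operand such as a symbol or constant. */
struct enode {
    const char *op;
    const char *arg;
    struct enode *kid[2];
};

/* Appends one line of pseudo-assembly to buf (dropped if buf is full). */
static void emit(char *buf, size_t size, const char *op, const char *arg) {
    size_t len = strlen(buf);
    if (len + 1 >= size)
        return;
    if (arg != NULL)
        snprintf(buf + len, size - len, "%s %s\n", op, arg);
    else
        snprintf(buf + len, size - len, "%s\n", op);
}

/* Post-order walk: operands are emitted before the operator. */
static void gen(const struct enode *n, char *buf, size_t size) {
    for (int i = 0; i < 2; i++)
        if (n->kid[i] != NULL)
            gen(n->kid[i], buf, size);
    emit(buf, size, n->op, n->arg);
}

/* The same entry/exit template surrounds every function body.
   size must be at least 1. */
void gen_function(const char *name, const struct enode *body,
                  char *buf, size_t size) {
    buf[0] = '\0';
    emit(buf, size, "enter", name);   /* prologue template */
    gen(body, buf, size);
    emit(buf, size, "leave", name);   /* epilogue template */
}
```

Feeding it the second tree of round (RETI over CVDI over ADDD of CVFD(INDIRF(ADDRFP f)) and CNSTD 0.5) yields the lines `ADDRFP f`, `INDIRF`, `CVFD`, `CNSTD 0.5`, `ADDD`, `CVDI`, `RETI` between `enter round` and `leave round`.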

I will not go into assembly language here, to avoid unnecessary detail.

Summary:

We have quickly followed a C source program all the way to assembly code and seen how LCC works. Next comes the main topic: annotating the LCC source code. The annotations follow the same order as the authors' book "A Retargetable...". One might expect to begin with lexical analysis, since that is where compilation starts, but how are the tokens produced by lexical analysis managed? That involves symbol tables. How do symbol tables grow dynamically, and where are they stored? That involves memory management. Symbol tables are closely related to type tables, and the names stored in the symbol table involve string management. Therefore, the bottom-up order is:

Memory management ----> string management ----> symbol table management, type table management ----> lexical analysis
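The first two layers of that order can be sketched together: an arena allocator that hands out memory from large blocks and frees it all at once, and a string table that stores each distinct string exactly once, so that a later layer such as the symbol table can compare names by pointer. The names `arena_alloc`, `intern`, and `arena_release` are hypothetical and do not reproduce LCC's interfaces; the code assumes C11 for `_Alignas`, `_Alignof`, and `max_align_t`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Memory management: an arena carves chunks out of large blocks. */
struct block {
    struct block *next;
    size_t used;
    _Alignas(max_align_t) char data[4096];
};
static struct block *arena;

static void *arena_alloc(size_t n) {
    size_t a = _Alignof(max_align_t);
    n = (n + a - 1) / a * a;               /* keep every chunk aligned */
    if (n > sizeof arena->data)
        return NULL;                       /* no large-object support here */
    if (arena == NULL || arena->used + n > sizeof arena->data) {
        struct block *b = malloc(sizeof *b);
        if (b == NULL)
            abort();
        b->next = arena;
        b->used = 0;
        arena = b;
    }
    void *p = arena->data + arena->used;
    arena->used += n;
    return p;
}

/* String management: each distinct string is stored exactly once. */
struct entry {
    struct entry *next;
    const char *str;
};
#define NBUCKETS 128
static struct entry *table[NBUCKETS];

const char *intern(const char *s) {
    unsigned h = 0;
    for (const char *p = s; *p; p++)
        h = h * 31u + (unsigned char)*p;
    h %= NBUCKETS;
    for (struct entry *e = table[h]; e != NULL; e = e->next)
        if (strcmp(e->str, s) == 0)
            return e->str;                 /* already stored: reuse it */
    size_t len = strlen(s) + 1;
    struct entry *e = arena_alloc(sizeof *e);
    char *copy = arena_alloc(len);
    if (e == NULL || copy == NULL)
        return NULL;                       /* string longer than a block */
    memcpy(copy, s, len);
    e->str = copy;
    e->next = table[h];
    table[h] = e;
    return copy;
}

/* Frees every block at once; interned pointers are invalid afterwards. */
void arena_release(void) {
    while (arena != NULL) {
        struct block *b = arena;
        arena = b->next;
        free(b);
    }
    for (int i = 0; i < NBUCKETS; i++)
        table[i] = NULL;
}
```

Two calls such as `intern("round")` and `intern(buf)`, where `buf` holds a separate copy of "round", return the same pointer, which is what lets a symbol table compare identifiers with `==`.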
