Random notes (how the compiler works)

Source: Internet
Author: User

[Disclaimer: All rights reserved. Reprinting is welcome, but not for commercial purposes. Contact email: feixiaoxing@163.com]

Compilers have always been one of my favorite topics. A compiler is a magical tool that turns seemingly meaningless character data into executable code. For every computer science student, compiler principles is an essential part of the curriculum. Few developers go on to do actual compiler development in their later work, but understanding the mechanisms behind compilation is still a great help day to day. There are many books on the subject: searching online turns up the compiler principles textbooks by Aho, by Academician Chen Huowang, and by Zhang Suqin, and I think all three are good. There are also many tools for compiler development, such as Lex and Yacc. Once you have written down the basic grammar rules, designing your own compiler is not that difficult.

In fact, today's compilers have already moved beyond the original concept. For example, the code a compiler finally produces does not necessarily run on a real machine; it may run on a virtual machine. A language implementation does not necessarily have to generate an executable file at all; the code can be interpreted instead. Ideally a compiler can compile in parallel. A compiler also does not have to be large: a dozen or so source files can be enough, as with Lua. The compiler we are talking about today, however, is a traditional C compiler. If you are interested, let us look at how it helps us generate an executable file, going through lexical analysis, syntax analysis, semantic analysis, and optimization in order. Suppose we have the following code:

#include <stdio.h>
#define MAX_VALUE 7

int test(int value)
{
    return MAX_VALUE + 1 + value * 4;
}

int main(int argc, char* argv[])
{
    int p;
    p = test(3);
    printf("p = %d", p);
    return 1;
}

(1) Lexical analysis

Lexical analysis is the most basic stage of compilation. The file above is just a long run of characters, so the first step is to split it into tokens and classify each one. A rough classification looks like this:

A) whether the token is a number, for example 7, 1, 4

B) whether the token is a string literal, for example "p = %d"

C) whether the token is a keyword, such as int or return (define, strictly speaking, is a preprocessor directive rather than a keyword)

D) whether the token is a variable name, such as value, argc, argv, p

E) whether the token is an operator, for example +

F) whether the token is a parenthesis, square bracket, curly bracket, etc.
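The classification above can be sketched as a tiny hand-written tokenizer. This is only an illustration under my own assumptions: the names TokenKind, Token, and next_token are made up for this example, the keyword table is deliberately incomplete, and no preprocessing is done.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token classes, mirroring the rough classification above (hypothetical names). */
typedef enum {
    TOK_NUMBER, TOK_STRING, TOK_KEYWORD, TOK_IDENTIFIER,
    TOK_OPERATOR, TOK_BRACKET, TOK_OTHER, TOK_END
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the token in the source */
    size_t length;      /* number of characters in the token */
} Token;

/* A deliberately tiny keyword table; real C has many more keywords. */
static int is_keyword(const char *s, size_t n)
{
    static const char *const keywords[] = { "int", "char", "return" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i]) == n && strncmp(keywords[i], s, n) == 0)
            return 1;
    }
    return 0;
}

/* Read one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p = *cursor;
    assert(p != NULL);

    while (isspace((unsigned char)*p))
        p++;

    Token tok = { TOK_END, p, 0 };
    if (*p == '\0') {
        *cursor = p;
        return tok;
    }

    const char *q = p;
    if (isdigit((unsigned char)*q)) {
        while (isdigit((unsigned char)*q)) q++;
        tok.kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*q) || *q == '_') {
        while (isalnum((unsigned char)*q) || *q == '_') q++;
        tok.kind = is_keyword(p, (size_t)(q - p)) ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (*q == '"') {
        q++;
        while (*q != '\0' && *q != '"') q++;
        if (*q == '"') q++;          /* include the closing quote */
        tok.kind = TOK_STRING;
    } else if (strchr("+-*/=", *q) != NULL) {
        q++;
        tok.kind = TOK_OPERATOR;
    } else if (strchr("()[]{}", *q) != NULL) {
        q++;
        tok.kind = TOK_BRACKET;
    } else {
        q++;                         /* ';', ',', '#', and so on */
        tok.kind = TOK_OTHER;
    }

    tok.length = (size_t)(q - p);
    *cursor = q;
    return tok;
}
```

Calling next_token repeatedly on the text of the return statement in test() gives a keyword, an identifier, an operator, a number, and so on. Since this sketch does no macro expansion, MAX_VALUE comes out as an ordinary identifier.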

(2) Syntax analysis

The purpose of syntax analysis is to build a syntax tree and, in doing so, check whether the file conforms to the grammar of the programming language. For example,

A) whether a token sequence forms a valid expression

B) whether a token sequence forms a valid conditional statement

C) whether a token sequence forms a valid loop statement

D) whether a token sequence forms a valid function

E) whether an include line follows the include syntax

F) whether any variable is used without being declared (many compilers actually perform this check during semantic analysis)

(3) Semantic analysis

Semantic analysis is sometimes combined with syntax analysis, but here we treat it as a separate stage. In this article, semantic analysis means walking the syntax tree generated earlier and breaking it down into atomic statements. For example, the file above might be turned into something like this:

SET value
mov temp1[inner], 7
add temp1[inner], 1
mul temp2, value[param], 4
add temp1, temp2
mov result, temp1
pop

SET argc[param]
SET argv[param]
SET 3
call test
pop get result
mov p[inner], result
SET p
SET string "p = %d"
pop
pop
mov result, 1
pop
pop

One thing to explain here: the structure and form of this intermediate representation are defined by each compiler, and there is no universal format. The statements above are just something I came up with. They may look very different from what a real compiler produces, but the basic approach should be the same. The main points are as follows,

A) SET marks a value as a function parameter.

B) call is a function call.

C) pop is used to balance the stack.

D) data[inner] indicates that the variable is a local variable of the function.

E) data[param] indicates that the variable is an input parameter.

F) temp indicates a temporary variable that the compiler adds for its own convenience.

G) result indicates the return value.
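As a rough illustration of how a syntax tree can be broken down into atomic statements, the sketch below walks a small expression tree bottom-up and writes one statement per operation. Everything here is an assumption made for the example: the Node and Emitter types, the function names, and the uniform three-operand statement form (the listing above mixes two-operand and three-operand forms).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* used by NODE_CONST */
    const char *name;          /* used by NODE_VAR */
    const struct Node *left;   /* used by NODE_ADD / NODE_MUL */
    const struct Node *right;
} Node;

typedef struct {
    char text[512];   /* generated statements, one per line */
    size_t used;      /* bytes of text filled so far */
    int next_temp;    /* counter for compiler-made temporaries */
} Emitter;

static void emit_line(Emitter *e, const char *line)
{
    size_t n = strlen(line);
    assert(e->used + n + 2 <= sizeof e->text);   /* sketch: fixed buffer */
    memcpy(e->text + e->used, line, n);
    e->used += n;
    e->text[e->used++] = '\n';
    e->text[e->used] = '\0';
}

/* Walk the tree bottom-up; write the operand name for `node` into `out`. */
static void lower(Emitter *e, const Node *node, char *out, size_t out_size)
{
    char lhs[32], rhs[32], line[128];

    switch (node->kind) {
    case NODE_CONST:
        snprintf(out, out_size, "%d", node->value);
        return;
    case NODE_VAR:
        snprintf(out, out_size, "%s[param]", node->name);
        return;
    case NODE_ADD:
    case NODE_MUL:
        lower(e, node->left, lhs, sizeof lhs);
        lower(e, node->right, rhs, sizeof rhs);
        /* Each operation gets a fresh temporary to hold its result. */
        snprintf(out, out_size, "temp%d", ++e->next_temp);
        snprintf(line, sizeof line, "%s %s, %s, %s",
                 node->kind == NODE_ADD ? "add" : "mul", out, lhs, rhs);
        emit_line(e, line);
        return;
    }
}

/* Lower a whole "return <expr>;" statement into atomic statements. */
void lower_return(Emitter *e, const Node *expr)
{
    char operand[32], line[128];
    lower(e, expr, operand, sizeof operand);
    snprintf(line, sizeof line, "mov result, %s", operand);
    emit_line(e, line);
}
```

For the tree of (7 + 1) + (value * 4), starting from a zero-initialized Emitter, this produces four statements: add temp1, 7, 1, then mul temp2, value[param], 4, then add temp3, temp1, temp2, then mov result, temp3.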

(4) Code optimization

Code optimization is an important part of what a compiler does. Its purpose is to eliminate unnecessary computation and processing, such as:

A) Removing temporary variables that serve no purpose

B) Removing if statements whose condition does not actually need to be tested

C) Doing calculations on constant values in advance. Here, temp1 can be computed ahead of time (7 + 1 = 8).

D) Other optimization measures
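Item C) is usually called constant folding. A minimal sketch of it on a small expression tree might look like the following. The Expr type and fold_constants are hypothetical names for this example, and integer overflow is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXPR_CONST, EXPR_VAR, EXPR_ADD, EXPR_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;            /* valid when kind == EXPR_CONST */
    const char *name;     /* valid when kind == EXPR_VAR */
    struct Expr *left;    /* valid for EXPR_ADD / EXPR_MUL */
    struct Expr *right;
} Expr;

/* Fold constant subtrees in place. Returns 1 if `e` is now a constant.
 * Sketch only: overflow of the folded value is not checked. */
int fold_constants(Expr *e)
{
    assert(e != NULL);
    if (e->kind == EXPR_CONST) return 1;
    if (e->kind == EXPR_VAR) return 0;   /* value only known at run time */

    /* Fold both children first so nested constants collapse bottom-up. */
    int left_const = fold_constants(e->left);
    int right_const = fold_constants(e->right);
    if (!left_const || !right_const) return 0;

    /* Both operands are known at compile time: compute the result now. */
    int l = e->left->value, r = e->right->value;
    e->value = (e->kind == EXPR_ADD) ? l + r : l * r;
    e->kind = EXPR_CONST;
    e->left = e->right = NULL;
    return 1;
}
```

Applied to (7 + 1) + (value * 4), the left subtree collapses to the constant 8 while the multiplication stays, because value is only known at run time.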

(5) Generate assembly code

The code generated in (3) is only intermediate code, not assembly language in the full sense. The compiler therefore still has to translate it into code for the target machine, such as ARM, x86, or PowerPC. There are some subtleties along the way, such as:

A) For functions with several parameters, some CPUs pass the arguments in registers, while others pass them on the stack.

B) Some CPUs require data to be aligned in memory, while others do not.

C) Some CPUs require a particular byte order, some do not care, and on some it is configurable.

D) For temporary variables, some CPUs have enough registers to hold them, while on others the compiler has to create a temp variable in memory itself.
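Items B) and C) are visible even from C code. The sketch below shows one common way to detect the byte order at run time, to store a value in a fixed byte order regardless of the CPU, and to read a value without relying on alignment. The function names are my own and are only meant as an illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte holds the least significant
 * byte (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* look at the lowest-addressed byte */
    return first_byte == 1;
}

/* Store a 32-bit value in big-endian order regardless of the host CPU. */
void store_be32(unsigned char out[4], uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read a 32-bit value without assuming the pointer is 4-byte aligned. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* memcpy is safe even when p is misaligned */
    return v;
}
```

A code generator has to make the same decisions for every load and store it emits, which is part of why this stage is different for each CPU.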

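With that in mind, we can try to work out what the generated code might look like. If you are familiar with x86, you can try it yourself. The listing below is a simplified sketch (for example, ebp[8] stands for the first argument on the stack), not exact assembler syntax:

test:
    push ebp
    mov ebp, esp
    push ebx
    push ecx
    mov ebx, 8          ; MAX_VALUE + 1, computed in advance
    mov ecx, ebp[8]     ; value
    mul ecx, 4
    add ebx, ecx
    mov eax, ebx        ; the return value goes in eax
    pop ecx
    pop ebx
    mov esp, ebp
    pop ebp
    ret

main:
    push ebp
    mov ebp, esp
    sub esp, 0x4        ; room for p
    push 3
    call test
    add esp, 4
    mov ebp[-4], eax    ; p = test(3)
    push ebp[-4]
    push string "p = %d"
    call printf
    add esp, 8
    mov eax, 1
    add esp, 0x4        ; release p
    mov esp, ebp
    pop ebp
    ret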

(6) Assembly-level code optimization

There are quite a few optimizations at this level, but they basically come down to the following:

A) turning multiplication into shifts

B) turning division into shifts

C) register optimization

D) removing repeated register operations

E) passing some function parameters in registers
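Items A) and B) can be checked directly in C. The helpers below are hypothetical names for this example and show the rewrite for powers of two, such as the value * 4 in test(). The division form is only equivalent for unsigned (or known non-negative) values, which is why the sketch sticks to unsigned.

```c
#include <assert.h>
#include <limits.h>

/* value * 2^shift rewritten as a left shift. */
unsigned mul_pow2(unsigned value, unsigned shift)
{
    assert(shift < sizeof(unsigned) * CHAR_BIT);   /* larger shifts are undefined */
    return value << shift;
}

/* value / 2^shift rewritten as a right shift. Signed division rounds
 * toward zero, so negative values would need extra handling that this
 * unsigned-only sketch leaves out. */
unsigned div_pow2(unsigned value, unsigned shift)
{
    assert(shift < sizeof(unsigned) * CHAR_BIT);
    return value >> shift;
}
```

With shift set to 2, mul_pow2(v, 2) gives the same result as v * 4 and div_pow2(v, 2) the same as v / 4.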

(7) Link and generate an executable file

When building, we often see that every source file compiles but the link step fails. This is normal: in the final file, every variable and function must have a definition somewhere; otherwise the link fails. On any platform, linking is a deep subject in its own right. There is a lot to do at this stage, such as:

A) generating the executable file, with or without debugging information

B) linking all variables and code together

C) generating a map file

D) determining where each function and variable is defined; as soon as a lookup fails, the process stops

E) adjusting the locations of variables and function code, filling in the file structure, and generating the final executable file
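The difference between "compiles" and "links" comes down to declarations versus definitions. The sketch below puts both in one file for brevity; counter, next_id, and take_two_ids are made-up names, and in a real project the definitions would normally live in a separate source file.

```c
#include <assert.h>

/* Declarations: enough for the compiler to compile code that uses them. */
extern int counter;
int next_id(void);

/* A caller compiles against the declarations alone. */
int take_two_ids(void)
{
    int first = next_id();
    int second = next_id();
    assert(second == first + 1);
    return second;
}

/* Definitions: what the linker looks for when it resolves the symbols.
 * If these were missing, each file would still compile, and the link
 * step would stop because counter and next_id cannot be found. */
int counter = 0;

int next_id(void)
{
    return ++counter;
}
```

This is the situation described in D): the compiler is satisfied by the declarations, and only the linker notices when a definition cannot be found.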
